Download

This page describes and links to all the data dumps of the Open Citation Indexes, which are created every six months, and of the OpenCitations Corpus (OCC), which are created every month. They are made available online with the support of Figshare.

Each dump of an Open Citation Index is composed of four zip archives. Two of these archives contain the data and provenance information of the index in N-Triples, while the other two contain the same information in CSV.
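Because the CSV archives are large, it can be useful to look at the first few rows without unpacking the whole archive. The following is a minimal sketch that uses only the Python standard library. The function name peek_csv_rows is hypothetical, and the sketch assumes that the archive holds at least one file ending in .csv, encoded as UTF-8, with a header row; none of this is confirmed by this page.

```python
import csv
import io
import zipfile
from itertools import islice


def peek_csv_rows(zip_path, limit=5):
    """Return up to `limit` rows (as dicts) from the first CSV file in a zip archive.

    Hypothetical helper: the member layout and the UTF-8 encoding are assumptions.
    """
    with zipfile.ZipFile(zip_path) as archive:
        csv_members = [
            name for name in archive.namelist() if name.lower().endswith(".csv")
        ]
        if not csv_members:
            raise ValueError(f"no CSV file found in {zip_path}")
        # ZipFile.open yields a binary stream that is decompressed on the fly,
        # so only the rows actually read are expanded in memory.
        with archive.open(csv_members[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            return list(islice(reader, limit))
```

Calling peek_csv_rows("path/to/archive.zip", limit=3) would return the first three records keyed by whatever column names the header row declares.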

By contrast, each dump of the OpenCitations Corpus is composed of several zip archives, each containing either the data or the provenance information of a particular sub-dataset within the OCC. After unzipping an archive, you need to use Disk ARchive (DAR), a multi-platform archive tool for managing huge amounts of data, to recreate the whole OCC structure.
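The two steps described above (unzip, then restore with DAR) could be scripted roughly as follows. This is a sketch under stated assumptions, not the official procedure: it assumes that each zip archive contains DAR slices named in DAR's usual basename.N.dar pattern, that the dar command is installed and on the PATH, and that its -x (extract) and -R (restore root) options are used. All function names here are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path

FIRST_SLICE_SUFFIX = ".1.dar"  # assumption: DAR's default slice naming


def unzip_archive(zip_path, work_dir):
    """Unpack one downloaded zip archive into work_dir."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(work_dir)
    return work_dir


def dar_basenames(work_dir):
    """Find first slices (<basename>.1.dar) and return their basenames."""
    return sorted(
        str(path)[: -len(FIRST_SLICE_SUFFIX)]
        for path in Path(work_dir).rglob("*" + FIRST_SLICE_SUFFIX)
    )


def dar_extract_command(basename, target_dir):
    """Build the dar call: -x extracts an archive, -R sets where files are restored."""
    return ["dar", "-x", str(basename), "-R", str(target_dir)]


def restore(zip_path, work_dir, target_dir):
    """Unzip one archive, then restore every DAR archive found inside it."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    unzip_archive(zip_path, work_dir)
    for basename in dar_basenames(work_dir):
        # check=True raises CalledProcessError if dar reports a failure.
        subprocess.run(dar_extract_command(basename, target_dir), check=True)
```

Running restore once per downloaded archive, always with the same target_dir, would rebuild the sub-datasets side by side under one directory, provided the assumptions above hold for the archives in question.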

Open Citation Indexes

COCI, the OpenCitations Index of Crossref open DOI-to-DOI references
Most recent COCI data dump - July 2018 Dump

Dump created on 2018-07-04. This dump includes information on:

Type and format             Archive  Size
Citation data (CSV)         ZIP      49 GB (8.2 GB zipped)
Citation data (N-Triple)    ZIP      333 GB (16 GB zipped)
Provenance data (CSV)       ZIP      54 GB (4.5 GB zipped)
Provenance data (N-Triple)  ZIP      206 GB (8 GB zipped)

OpenCitations Corpus (OCC)

Most recent OCC data dump - December 2017 Dump
25 December 2017 Dump

Dump created on 2017-12-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 November 2017 Dump

Dump created on 2017-11-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 October 2017 Dump

Dump created on 2017-10-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 September 2017 Dump

Dump created on 2017-09-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 August 2017 Dump

Dump created on 2017-08-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance, data (single n-quads file)

25 July 2017 Dump

Dump created on 2017-07-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 June 2017 Dump

Dump created on 2017-06-25. This dump includes information on:

Type                          Archive
agent roles (ar)              (data not available for technical reasons), provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

May 2017 OCC Dump

Dump created on 2017-05-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

April 2017 OCC Dump

Dump created on 2017-04-26. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance, data+provenance (single n-quads file)

March 2017 OCC Dump

Dump not submitted for technical reasons.

February 2017 OCC Dump

Dump not submitted for technical reasons.

January 2017 OCC Dump

Dump not submitted for technical reasons.

December 2016 OCC Dump

Dump created on 2016-12-24. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

November 2016 OCC Dump

Dump not submitted for technical reasons.

October 2016 OCC Dump

Dump created on 2016-10-24. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

September 2016 OCC Dump

Dump created on 2016-09-24. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance