Download

This page describes and links to all the data dumps of the OpenCitations Indexes, the Open Biomedical Citations in Context Corpus (CCC), and the OpenCitations Corpus (OCC). The dumps are hosted online with the support of Figshare and the Internet Archive.

Each dump of an OpenCitations Index is composed of four zip archives. Two of these archives contain the actual data and provenance information of the index in N-Triples, while the other two contain the same information in CSV. Some of the dumps listed below additionally provide the citation data in Scholix format or as a Blazegraph triplestore.
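
Because the CSV archives are large, it can be convenient to read them without unpacking them to disk first. The following is a minimal sketch, assuming that an archive contains plain UTF-8 `.csv` files with a header row; the function name `iter_csv_rows` is hypothetical and the column names depend on the dump itself.

```python
import csv
import io
import zipfile


def iter_csv_rows(zip_path):
    """Yield each row of every .csv member of a zip archive as a dict.

    Assumption: members are UTF-8 CSV files whose first line is a header.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(name) as raw:
                # Wrap the binary member so csv can read it as text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.DictReader(text)
```

Rows are produced one at a time, so memory use stays small even when the unzipped data would be hundreds of gigabytes.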

Each dump of the Open Biomedical Citations in Context Corpus is composed of a single zip archive containing all the data and provenance information, stored in JSON-LD.
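
JSON-LD files are ordinary JSON, so they can be loaded with a standard JSON parser before any RDF processing. The sketch below assumes the archive members end in `.json` or `.jsonld`; that naming, and the helper name `iter_jsonld_documents`, are assumptions and not something this page specifies.

```python
import json
import zipfile


def iter_jsonld_documents(zip_path, suffixes=(".json", ".jsonld")):
    """Yield (member name, parsed JSON) for each JSON-LD file in a zip archive.

    Assumption: JSON-LD members can be recognised by their file suffix.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(suffixes):
                with archive.open(name) as raw:
                    # json.load accepts a binary file object.
                    yield name, json.load(raw)
```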

Each dump of the OpenCitations Corpus, by contrast, is composed of several zip archives, each containing either data or provenance information relating to a particular sub-dataset within the OCC. After unzipping an archive, use Disk ARchive (DAR), a multi-platform archive tool for managing huge amounts of data, to recreate the whole OCC structure.
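
The DAR step can be scripted. The following is a minimal sketch, assuming the `dar` command-line tool is installed and that its `-x` (extract) and `-R` (target root directory) options are used; the archive basename and target directory are hypothetical placeholders, and the function names are illustrative only.

```python
import shutil
import subprocess


def build_dar_extract_command(archive_basename, target_dir):
    """Return the dar invocation that extracts an archive into target_dir.

    dar takes the archive basename, i.e. the file name without the
    ".<slice number>.dar" ending.
    """
    return ["dar", "-x", archive_basename, "-R", target_dir]


def extract_dar(archive_basename, target_dir):
    """Run dar to restore the archived files under target_dir."""
    if shutil.which("dar") is None:
        raise RuntimeError("dar executable not found on PATH")
    subprocess.run(build_dar_extract_command(archive_basename, target_dir), check=True)
```

Repeating the extraction for each unzipped sub-dataset archive, into the same target directory, would be one way to rebuild the full directory structure.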

OpenCitations Indexes

COCI, the OpenCitations Index of Crossref open DOI-to-DOI references
Most recent COCI data dump - September 2021 Dump

Dump created on 2021-09-03, based on open references to works with DOIs within the Crossref dump dated August 2021. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      193.6 GB (29.44 GB zipped)
Citation data (N-Triple)       ZIP      1.28 TB (57.9 GB zipped)
Citation data (Scholix)        ZIP      1.06 TB (31.12 GB zipped)
Provenance data (CSV)          ZIP      267.5 GB (15.37 GB zipped)
Provenance data (N-Triple)     ZIP      2.65 TB (62.4 GB zipped)

July 2021 Dump

Dump created on 2021-07-29, based on open references to works with DOIs within the Crossref dump dated January 2021. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      178.6 GB (26.86 GB zipped)
Citation data (N-Triple)       ZIP      1.18 TB (53.15 GB zipped)
Citation data (Scholix)        ZIP      979 GB (28.61 GB zipped)
Provenance data (CSV)          ZIP      246.5 GB (14 GB zipped)
Provenance data (N-Triple)     ZIP      2.45 TB (57.36 GB zipped)
Triplestore data (Blazegraph)  TAR.GZ   159 GB zipped

December 2020 Dump

Dump created on 2020-12-07, based on open references to works with DOIs within the Crossref dump dated November 2020. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      122.6 GB (18.4 GB zipped)
Citation data (N-Triple)       ZIP      808 GB (37.9 GB zipped)
Citation data (Scholix)        ZIP      678 GB (20.1 GB zipped)
Provenance data (CSV)          ZIP      168.5 GB (9.4 GB zipped)
Provenance data (N-Triple)     ZIP      1.7 TB (40.1 GB zipped)

September 2020 Dump

Dump created on 2020-09-06, based on open references to works with DOIs within the Crossref dump dated August 2020. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      118.4 GB (17.7 GB zipped)
Citation data (N-Triple)       ZIP      780 GB (36.6 GB zipped)
Citation data (Scholix)        ZIP      654 GB (19.4 GB zipped)
Provenance data (CSV)          ZIP      162.7 GB (9.1 GB zipped)
Provenance data (N-Triple)     ZIP      1.6 TB (38.7 GB zipped)

July 2020 Dump

Dump created on 2020-07-04, based on open references to works with DOIs within the Crossref dump dated June 2020. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      116.5 GB (17.4 GB zipped)
Citation data (N-Triple)       ZIP      767 GB (36 GB zipped)
Citation data (Scholix)        ZIP      643 GB (19.1 GB zipped)
Provenance data (CSV)          ZIP      160.1 GB (8.9 GB zipped)
Provenance data (N-Triple)     ZIP      1.58 TB (38.1 GB zipped)

May 2020 Dump

Dump created on 2020-05-12, based on open references to works with DOIs within the Crossref dump dated April 2020. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      113.4 GB (16.9 GB zipped)
Citation data (N-Triple)       ZIP      746 GB (35 GB zipped)
Citation data (Scholix)        ZIP      626 GB (18.6 GB zipped)
Provenance data (CSV)          ZIP      155.9 GB (8.7 GB zipped)
Provenance data (N-Triple)     ZIP      1.54 TB (37.1 GB zipped)

March 2020 Dump

Dump created on 2020-03-23, based on open references to works with DOIs within the Crossref dump dated February 2019. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      106 GB (15.7 GB zipped)
Citation data (N-Triple)       ZIP      697 GB (31.7 GB zipped)
Citation data (Scholix)        ZIP      584 GB (17.3 GB zipped)
Provenance data (CSV)          ZIP      144.9 GB (8.1 GB zipped)
Provenance data (N-Triple)     ZIP      1.44 TB (34.6 GB zipped)

January 2020 Dump

Dump created on 2020-01-21, based on open references to works with DOIs within the Crossref dump dated November 2019. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      101.1 GB (15 GB zipped)
Citation data (N-Triple)       ZIP      665 GB (30.3 GB zipped)
Citation data (Scholix)        ZIP      556 GB (16.5 GB zipped)
Provenance data (CSV)          ZIP      138 GB (7.7 GB zipped)
Provenance data (N-Triple)     ZIP      1.37 TB (33 GB zipped)

November 2018 Dump

Dump created on 2018-11-12, based on open references to works with DOIs within the Crossref dump dated October 2018. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      72 GB (11 GB zipped)
Citation data (N-Triple)       ZIP      481 GB (22 GB zipped)
Provenance data (CSV)          ZIP      77 GB (5.3 GB zipped)
Provenance data (N-Triple)     ZIP      292 GB (11 GB zipped)
Triplestore DB (Blazegraph)    ZIP      484 GB (67 GB zipped)

July 2018 Dump

Dump created on 2018-07-04. This dump includes information on:

Type and format                Archive  Size
Citation data (CSV)            ZIP      49 GB (8.2 GB zipped)
Citation data (N-Triple)       ZIP      333 GB (16 GB zipped)
Provenance data (CSV)          ZIP      54 GB (4.5 GB zipped)
Provenance data (N-Triple)     ZIP      206 GB (8 GB zipped)

Open Biomedical Citations in Context Corpus (CCC)

Most recent CCC data dump - March 2021 Dump
20 March 2021 Dump

Dump created on 2021-03-20. This dump includes information on:

Type                          Archive
all entities                  data + provenance

OpenCitations Corpus (OCC)

Most recent OCC data dump - December 2017 Dump
25 December 2017 Dump

Dump created on 2017-12-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 November 2017 Dump

Dump created on 2017-11-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 October 2017 Dump

Dump created on 2017-10-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 September 2017 Dump

Dump created on 2017-09-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 August 2017 Dump

Dump created on 2017-08-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance, data (single n-quads file)

25 July 2017 Dump

Dump created on 2017-07-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

25 June 2017 Dump

Dump created on 2017-06-25. This dump includes information on:

Type                          Archive
agent roles (ar)              (data not available for technical reasons), provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

May 2017 OCC Dump

Dump created on 2017-05-25. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

April 2017 OCC Dump

Dump created on 2017-04-26. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance, data+provenance (single n-quads file)

March 2017 OCC Dump

Dump not submitted for technical reasons.

February 2017 OCC Dump

Dump not submitted for technical reasons.

January 2017 OCC Dump

Dump not submitted for technical reasons.

December 2016 OCC Dump

Dump created on 2016-12-24. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

November 2016 OCC Dump

Dump not submitted for technical reasons.

October 2016 OCC Dump

Dump created on 2016-10-24. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance

September 2016 OCC Dump

Dump created on 2016-09-24. This dump includes information on:

Type                          Archive
agent roles (ar)              data, provenance
bibliographic entries (be)    data, provenance
bibliographic resources (br)  data, provenance
identifiers (id)              data, provenance
responsible agents (ra)       data, provenance
resource embodiment (re)      data, provenance
corpus                        triplestore, provenance