This page describes and links to all the data dumps of the OpenCitations Indexes, the Open Biomedical Citations in Context Corpus, OpenCitations Meta, and the OpenCitations Corpus (OCC). They are made available online with the support of Figshare and the Internet Archive.
Each dump of an OpenCitations Index is composed of four zip archives. Two of these archives contain the actual data and provenance information of the index in N-Triples, while the other two contain the same information in CSV.
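Because the CSV archives are many times larger once unzipped, it can be convenient to read them without extracting them to disk first. The following is a minimal sketch, assuming that an archive holds one or more UTF-8 `.csv` files, each with a header row; the function name `iter_csv_rows` is hypothetical, and the internal layout of a given dump should be checked before relying on it.

```python
import csv
import io
import zipfile
from typing import Dict, Iterator


def iter_csv_rows(zip_path) -> Iterator[Dict[str, str]]:
    """Yield each row of every .csv member of a zip archive as a dict.

    `zip_path` may be a file path or a binary file-like object.
    Assumption: members are UTF-8 CSV files with a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            with archive.open(name) as raw:
                # Decode on the fly so a member is never held fully in memory.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.DictReader(text)
```

Calling `sum(1 for _ in iter_csv_rows(path))` on a downloaded archive would then count its rows in a single streaming pass.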
Each dump of the Open Biomedical Citations in Context Corpus consists of a single zip archive containing all the data and provenance information, stored in JSON-LD.
In contrast, each dump of the OpenCitations Corpus is composed of several zip archives, each containing either data or provenance information relating to a particular sub-dataset within the OCC. After unzipping an archive, one needs to use Disk ARchive (DAR), a multi-platform archive tool for managing huge amounts of data, to recreate the whole OCC structure.
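The unzip-then-DAR procedure can be sketched as follows. This is only an illustration under stated assumptions: the `dar` command-line tool is installed and on the `PATH`, each unzipped archive contains DAR slices named `<basename>.<N>.dar`, and the helper names `dar_extract_command` and `restore_dump` are hypothetical. The `-x` (extract) and `-R` (restore root) options are standard `dar` options.

```python
import subprocess
import zipfile
from pathlib import Path
from typing import List


def dar_extract_command(first_slice: Path, target_dir: Path) -> List[str]:
    """Build the `dar` call that restores the archive starting at `first_slice`.

    DAR names slices <basename>.<N>.dar and expects only <basename> after -x.
    """
    suffix = ".1.dar"
    if not first_slice.name.endswith(suffix):
        raise ValueError(f"not a first DAR slice: {first_slice}")
    basename = first_slice.with_name(first_slice.name[: -len(suffix)])
    # -x: extract the archive; -R: directory under which files are restored.
    return ["dar", "-x", str(basename), "-R", str(target_dir)]


def restore_dump(zip_path: Path, work_dir: Path, target_dir: Path) -> None:
    """Unzip one downloaded archive, then run dar on each DAR archive found."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(work_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    for first_slice in sorted(work_dir.rglob("*.1.dar")):
        subprocess.run(dar_extract_command(first_slice, target_dir), check=True)
```

Running `restore_dump` once per downloaded zip archive, always with the same `target_dir`, would restore every sub-dataset under one directory, which is one way to recreate the whole OCC structure.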
Dump created on 2023-06-28. This dump includes information on:
99,851,773 bibliographic entities
318,409,360 authors and 2,406,318 editors (counted by their roles, without disambiguating individuals)
659,214 publication venues
34,394 publishers
Type and format | Archive | Size |
---|---|---|
Metadata (CSV) | ZIP | 39.9 GB (9 GB zipped) on NTFS |
Metadata and provenance (RDF) | ZIP | 35.1 GB zipped on NTFS; the size does not change once extracted because the archive contains zipped JSON files |
Dump created on 2022-12-13, based on the latest DataCite dump, dated 22 October 2021. This dump includes information on:
1,753,858 bibliographic resources;
169,822,752 citations.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 19.4 GB (2.4 GB zipped) |
Citation data (N-Triple) | ZIP | 130 GB (5.4 GB zipped) |
Citation data (Scholix) | ZIP | 140.3 GB (2.8 GB zipped) |
Provenance data (CSV) | ZIP | 33 GB (1.4 GB zipped) |
Provenance data (N-Triple) | ZIP | 305 GB (7.3 GB zipped) |
Dump created on 2022-12-27, based on the NIH Open Citation Collection dump dated November 2022. This dump includes information on:
29,005,551 bibliographic resources;
717,654,703 citations.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 50 GB (9.6 GB zipped) |
Citation data (N-Triple) | ZIP | 610 GB (25 GB zipped) |
Citation data (Scholix) | ZIP | 608 GB (11.8 GB zipped) |
Provenance data (CSV) | ZIP | 122 GB (5 GB zipped) |
Provenance data (N-Triple) | ZIP | 731 GB (32 GB zipped) |
Dump created on 2023-01-23, based on open references to works with DOIs within the Crossref dump dated December 2022. This dump includes information on:
77,045,952 bibliographic resources;
1,463,920,523 citations.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 238.5 GB (37.5 GB zipped) |
Citation data (N-Triple) | ZIP | 1.6 TB (73.1 GB zipped) |
Citation data (Scholix) | ZIP | 1.3 TB (38.8 GB zipped) |
Provenance data (CSV) | ZIP | 330 GB (20 GB zipped) |
Provenance data (N-Triple) | ZIP | 3.3 TB (78 GB zipped) |
In addition, a dump containing the number of incoming citations to each bibliographic entity (identified by a DOI) in COCI is provided:
Type and format | Archive | Size |
---|---|---|
Citation count data (CSV) | ZIP | 2.27 GB (0.64 GB zipped) |
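The number of incoming citations to an entity is the number of citation rows in which that entity appears as the cited work. The sketch below shows how such counts could be derived from citation data in CSV. It assumes that the citation CSV has a column named `cited` holding the cited DOI; the column name, the function name `count_incoming_citations`, and the sample DOIs are illustrative assumptions, not a description of how the published count dump was produced.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def count_incoming_citations(
    csv_files: Iterable[TextIO], cited_column: str = "cited"
) -> Counter:
    """Count how many citation rows point at each cited identifier.

    Assumption: every file has a header row containing `cited_column`.
    """
    counts: Counter = Counter()
    for handle in csv_files:
        for row in csv.DictReader(handle):
            counts[row[cited_column]] += 1
    return counts


# Tiny made-up sample: two works cite 10.1000/c, one work cites 10.1000/b.
sample = io.StringIO(
    "citing,cited\n"
    "10.1000/a,10.1000/c\n"
    "10.1000/b,10.1000/c\n"
    "10.1000/a,10.1000/b\n"
)
counts = count_incoming_citations([sample])
print(counts.most_common(1))  # [('10.1000/c', 2)]
```

An in-memory counter like this suits small extracts only; at the scale of a full index, downloading the precomputed count dump is likely the more practical option.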
Dump created on 2022-10-31, based on open references to works with DOIs within the Crossref dump dated October 2022. This dump includes information on:
76,072,926 bibliographic resources;
1,392,036,835 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 226.5 GB (35.5 GB zipped) |
Citation data (N-Triple) | ZIP | 1.5 TB (69.1 GB zipped) |
Citation data (Scholix) | ZIP | 1.24 TB (36.9 GB zipped) |
Provenance data (CSV) | ZIP | 313 GB (18.92 GB zipped) |
Provenance data (N-Triple) | ZIP | 3.15 TB (73.9 GB zipped) |
In addition, a dump containing the number of incoming citations to each bibliographic entity (identified by a DOI) in COCI is provided:
Type and format | Archive | Size |
---|---|---|
Citation count data (CSV) | ZIP | 2.24 GB (0.66 GB zipped) |
Dump created on 2022-08-31, based on open references to works with DOIs within the Crossref dump dated August 2022. This dump includes information on:
75,030,924 bibliographic resources;
1,363,718,366 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 221.8 GB (34.81 GB zipped) |
Citation data (N-Triple) | ZIP | 1.46 TB (67.64 GB zipped) |
Citation data (Scholix) | ZIP | 1.21 TB (36.13 GB zipped) |
Provenance data (CSV) | ZIP | 308 GB (19.71 GB zipped) |
Provenance data (N-Triple) | ZIP | 3.08 TB (72.5 GB zipped) |
Dump created on 2022-06-18, based on open references to works with DOIs within the Crossref dump dated June 2022. This dump includes information on:
73,103,457 bibliographic resources;
1,315,379,811 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 213.8 GB (33.41 GB zipped) |
Citation data (N-Triple) | ZIP | 1.41 TB (65.03 GB zipped) |
Citation data (Scholix) | ZIP | 1.17 TB (34.83 GB zipped) |
Provenance data (CSV) | ZIP | 296.3 GB (18.96 GB zipped) |
Provenance data (N-Triple) | ZIP | 2.98 TB (69.7 GB zipped) |
Dump created on 2022-03-26, based on open references to works with DOIs within the Crossref dump dated March 2022. This dump includes information on:
72,268,850 bibliographic resources;
1,294,283,603 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 210.4 GB (32.81 GB zipped) |
Citation data (N-Triple) | ZIP | 1.39 TB (63.93 GB zipped) |
Citation data (Scholix) | ZIP | 1.15 TB (34.28 GB zipped) |
Provenance data (CSV) | ZIP | 291.6 GB (18.63 GB zipped) |
Provenance data (N-Triple) | ZIP | 2.94 TB (68.55 GB zipped) |
Dump created on 2022-01-29, based on open references to works with DOIs within the Crossref dump dated January 2022. This dump includes information on:
71,337,645 bibliographic resources;
1,271,360,867 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 206.7 GB (32.06 GB zipped) |
Citation data (N-Triple) | ZIP | 1.37 TB (62.62 GB zipped) |
Citation data (Scholix) | ZIP | 1.13 TB (33.59 GB zipped) |
Provenance data (CSV) | ZIP | 286.5 GB (18.2 GB zipped) |
Provenance data (N-Triple) | ZIP | 2.89 TB (67.16 GB zipped) |
Dump created on 2021-11-25, based on open references to works with DOIs within the Crossref dump dated October 2021. This dump includes information on:
69,897,400 bibliographic resources;
1,235,170,583 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 200.9 GB (30.88 GB zipped) |
Citation data (N-Triple) | ZIP | 1.33 TB (60.52 GB zipped) |
Citation data (Scholix) | ZIP | 1.1 TB (32.49 GB zipped) |
Provenance data (CSV) | ZIP | 278.5 GB (16.81 GB zipped) |
Provenance data (N-Triple) | ZIP | 2.75 TB (65 GB zipped) |
Dump created on 2021-09-03, based on open references to works with DOIs within the Crossref dump dated August 2021. This dump includes information on:
69,074,291 bibliographic resources;
1,186,958,898 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 193.6 GB (29.44 GB zipped) |
Citation data (N-Triple) | ZIP | 1.28 TB (57.9 GB zipped) |
Citation data (Scholix) | ZIP | 1.06 TB (31.12 GB zipped) |
Provenance data (CSV) | ZIP | 267.5 GB (15.37 GB zipped) |
Provenance data (N-Triple) | ZIP | 2.65 TB (62.4 GB zipped) |
Dump created on 2021-07-29, based on open references to works with DOIs within the Crossref dump dated January 2021. This dump includes information on:
65,835,422 bibliographic resources;
1,094,394,688 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 178.6 GB (26.86 GB zipped) |
Citation data (N-Triple) | ZIP | 1.18 TB (53.15 GB zipped) |
Citation data (Scholix) | ZIP | 979 GB (28.61 GB zipped) |
Provenance data (CSV) | ZIP | 246.5 GB (14 GB zipped) |
Provenance data (N-Triple) | ZIP | 2.45 TB (57.36 GB zipped) |
Triplestore data (Blazegraph) | TAR.GZ | 159 GB zipped |
Dump created on 2020-12-07, based on open references to works with DOIs within the Crossref dump dated November 2020. This dump includes information on:
60,778,357 bibliographic resources;
759,516,507 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 122.6 GB (18.4 GB zipped) |
Citation data (N-Triple) | ZIP | 808 GB (37.9 GB zipped) |
Citation data (Scholix) | ZIP | 678 GB (20.1 GB zipped) |
Provenance data (CSV) | ZIP | 168.5 GB (9.4 GB zipped) |
Provenance data (N-Triple) | ZIP | 1.7 TB (40.1 GB zipped) |
Dump created on 2020-09-06, based on open references to works with DOIs within the Crossref dump dated August 2020. This dump includes information on:
59,455,917 bibliographic resources;
733,367,140 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 118.4 GB (17.7 GB zipped) |
Citation data (N-Triple) | ZIP | 780 GB (36.6 GB zipped) |
Citation data (Scholix) | ZIP | 654 GB (19.4 GB zipped) |
Provenance data (CSV) | ZIP | 162.7 GB (9.1 GB zipped) |
Provenance data (N-Triple) | ZIP | 1.6 TB (38.7 GB zipped) |
Dump created on 2020-07-04, based on open references to works with DOIs within the Crossref dump dated June 2020. This dump includes information on:
58,876,621 bibliographic resources;
721,655,392 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 116.5 GB (17.4 GB zipped) |
Citation data (N-Triple) | ZIP | 767 GB (36 GB zipped) |
Citation data (Scholix) | ZIP | 643 GB (19.1 GB zipped) |
Provenance data (CSV) | ZIP | 160.1 GB (8.9 GB zipped) |
Provenance data (N-Triple) | ZIP | 1.58 TB (38.1 GB zipped) |
Dump created on 2020-05-12, based on open references to works with DOIs within the Crossref dump dated April 2020. This dump includes information on:
58,028,534 bibliographic resources;
702,772,530 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 113.4 GB (16.9 GB zipped) |
Citation data (N-Triple) | ZIP | 746 GB (35 GB zipped) |
Citation data (Scholix) | ZIP | 626 GB (18.6 GB zipped) |
Provenance data (CSV) | ZIP | 155.9 GB (8.7 GB zipped) |
Provenance data (N-Triple) | ZIP | 1.54 TB (37.1 GB zipped) |
Dump created on 2020-03-23, based on open references to works with DOIs within the Crossref dump dated February 2019. This dump includes information on:
55,622,845 bibliographic resources;
655,602,113 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 106 GB (15.7 GB zipped) |
Citation data (N-Triple) | ZIP | 697 GB (31.7 GB zipped) |
Citation data (Scholix) | ZIP | 584 GB (17.3 GB zipped) |
Provenance data (CSV) | ZIP | 144.9 GB (8.1 GB zipped) |
Provenance data (N-Triple) | ZIP | 1.44 TB (34.6 GB zipped) |
Dump created on 2020-01-21, based on open references to works with DOIs within the Crossref dump dated November 2019. This dump includes information on:
53,464,457 bibliographic resources;
624,183,532 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 101.1 GB (15 GB zipped) |
Citation data (N-Triple) | ZIP | 665 GB (30.3 GB zipped) |
Citation data (Scholix) | ZIP | 556 GB (16.5 GB zipped) |
Provenance data (CSV) | ZIP | 138 GB (7.7 GB zipped) |
Provenance data (N-Triple) | ZIP | 1.37 TB (33 GB zipped) |
Dump created on 2018-11-12, based on open references to works with DOIs within the Crossref dump dated October 2018. This dump includes information on:
46,534,705 bibliographic resources;
445,826,118 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 72 GB (11 GB zipped) |
Citation data (N-Triple) | ZIP | 481 GB (22 GB zipped) |
Provenance data (CSV) | ZIP | 77 GB (5.3 GB zipped) |
Provenance data (N-Triple) | ZIP | 292 GB (11 GB zipped) |
Triplestore DB (Blazegraph) | ZIP | 484 GB (67 GB zipped) |
Dump created on 2018-07-04. This dump includes information on:
45,145,889 bibliographic resources;
316,243,802 citation links.
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 49 GB (8.2 GB zipped) |
Citation data (N-Triple) | ZIP | 333 GB (16 GB zipped) |
Provenance data (CSV) | ZIP | 54 GB (4.5 GB zipped) |
Provenance data (N-Triple) | ZIP | 206 GB (8 GB zipped) |
Dump created on 2021-03-20. This dump includes information on:
64,810 citing bibliographic resources;
7,045,425 citation links.
Type | Archive |
---|---|
all entities | data + provenance |
Dump created on 2017-12-25. This dump includes information on:
298,797 citing bibliographic resources;
6,488,914 cited bibliographic resources;
12,652,601 citation links.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump created on 2017-11-25. This dump includes information on:
283,215 citing bibliographic resources;
6,251,266 cited bibliographic resources;
11,976,217 citation links.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump created on 2017-10-25. This dump includes information on:
264,835 citing bibliographic resources;
5,967,178 cited bibliographic resources;
11,207,388 citation links.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump created on 2017-09-25. This dump includes information on:
245,504 citing bibliographic resources;
5,683,435 cited bibliographic resources;
10,457,170 citation links.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump created on 2017-08-25. This dump includes information on:
225,645 citing bibliographic resources;
5,358,479 cited bibliographic resources;
9,607,259 citation links.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance, data (single n-quads file) |
Dump created on 2017-07-25. This dump includes information on:
203,301 citing bibliographic resources;
4,972,748 cited bibliographic resources;
8,652,486 citation links.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump created on 2017-06-25. This dump includes information on:
180,403 citing bibliographic resources;
4,575,269 cited bibliographic resources;
7,730,161 citation links.
Type | Archive |
---|---|
agent roles (ar) | (data not available for technical reasons), provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump created on 2017-05-25. This dump includes information on:
159,451 citing bibliographic resources;
4,173,318 cited bibliographic resources;
6,819,745 citation links.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump created on 2017-04-26. This dump includes information on:
139,822 citing bibliographic resources;
3,781,435 cited bibliographic resources;
5,976,771 citation links.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance, data+provenance (single n-quads file) |
Dump not submitted for technical reasons.
Dump not submitted for technical reasons.
Dump not submitted for technical reasons.
Dump created on 2016-12-24. This dump includes information on:
60,119 citing bibliographic resources;
1,935,661 cited bibliographic resources;
2,586,233 citation links;
40,314 containers (journals, books, etc.);
136,365 ORCID identifiers.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump not submitted for technical reasons.
Dump created on 2016-10-24. This dump includes information on:
1,461,441 citing/cited bibliographic resources;
1,796,048 citation links;
32,301 containers (journals, books, etc.);
115,542 ORCID identifiers.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |
Dump created on 2016-09-24. This dump includes information on:
953,736 citing/cited bibliographic resources;
1,106,920 citation links;
25,772 containers (journals, books, etc.);
92,833 ORCID identifiers.
Type | Archive |
---|---|
agent roles (ar) | data, provenance |
bibliographic entries (be) | data, provenance |
bibliographic resources (br) | data, provenance |
identifiers (id) | data, provenance |
responsible agents (ra) | data, provenance |
resource embodiment (re) | data, provenance |
corpus | triplestore, provenance |