This page contains details of, and links to, all the data dumps of OpenCitations Meta and the OpenCitations Index. They are made available online with the support of Figshare and the Internet Archive.
The OpenCitations Meta database stores and delivers bibliographic metadata for all publications involved in the OpenCitations Index.
This dataset's dump, released on 2025-02-13, extends the previous version with new data from the Crossref November 2024 dump and from the November 2024 dump of JaLC (Japan Link Center). This dump includes information on:
121,302,680 bibliographic entities
368,061,399 authors, 2,718,222 editors, and 101,612,475 publishers (counted by their roles, without disambiguating individuals)
698,995 publication venues
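Counting "by role" means that every name listed on a work adds one to the total, so the same person appearing on two works is counted twice. The sketch below illustrates this. It assumes that the Meta CSV `author` field lists names separated by `"; "`, each followed by bracketed identifiers. That layout is an assumption, and the names and OMIDs are invented. The function `count_roles` is hypothetical.

```python
# Assumption: the Meta CSV "author" field looks like
# "Surname, Given [ids]; Surname, Given [ids]". Sample values are invented.
authors_column = [
    "Doe, Jane [omid:ra/1]; Roe, Richard [omid:ra/2]",
    "Doe, Jane [omid:ra/1]",
    "",  # a work with no listed authors
]


def count_roles(field_values: list[str]) -> int:
    """Count one role per listed name, without merging repeated people."""
    total = 0
    for value in field_values:
        # Split on the assumed separator and skip empty fragments.
        total += sum(1 for part in value.split("; ") if part.strip())
    return total


# Jane Doe is counted once per work she appears on.
print(count_roles(authors_column))
```

Here the result is 3 author roles, even though only two distinct names appear.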
Type and format | Archive | Size |
---|---|---|
Metadata (CSV) | tar | 12 GB (48 GB uncompressed) on ext4 |
Metadata and provenance (RDF) | tar.gz | 47 GB (145 GB uncompressed) on ext4 |
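After the CSV metadata archive has been unpacked, each file can be read row by row. The minimal sketch below assumes an `id` column that holds space-separated `prefix:value` identifiers, such as an OMID together with a DOI. The column names and the sample row are assumptions made for illustration. The helpers `parse_ids` and `iter_entities` are hypothetical.

```python
import csv
import io

# Hypothetical sample: the column names follow our understanding of the Meta CSV
# layout, and the values are invented for illustration.
SAMPLE = '''id,title,pub_date,type
"omid:br/0000001 doi:10.1234/example pmid:123456",An example article,2020-05-01,journal article
'''


def parse_ids(field: str) -> dict[str, list[str]]:
    """Split a space-separated 'prefix:value' field into {prefix: [values]}."""
    ids: dict[str, list[str]] = {}
    for token in field.split():
        # Split on the first ':' only, so DOIs containing ':' stay intact.
        prefix, _, value = token.partition(":")
        ids.setdefault(prefix, []).append(value)
    return ids


def iter_entities(text_stream):
    """Yield (parsed ids, raw row) for each bibliographic entity in a CSV file."""
    for row in csv.DictReader(text_stream):
        yield parse_ids(row["id"]), row


for ids, row in iter_entities(io.StringIO(SAMPLE)):
    print(ids["omid"], ids.get("doi"), row["title"])
```

For real files, `io.StringIO(SAMPLE)` would be replaced by `open(path, encoding="utf-8", newline="")`.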
In addition:
Type and format | Archive | Size |
---|---|---|
A CSV dump storing a mapping between all OMIDs and their corresponding PIDs (e.g., DOI, ORCID, PMID) | ZIP | 6.5 GB (1.5 GB zipped) |
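The OMID-to-PID mapping is typically used in reverse, to find the OMID for a known DOI or ORCID. The sketch below assumes that the file has an `omid` column and an `id` column containing space-separated PIDs. Both column names and the sample rows are assumptions, not the documented schema. The function `build_pid_index` is hypothetical.

```python
import csv
import io

# Hypothetical excerpt: the "omid"/"id" column names and the space-separated
# PID layout are assumptions. The values are invented.
SAMPLE = '''omid,id
omid:br/0000001,doi:10.1234/example pmid:123456
omid:ra/0000002,orcid:0000-0000-0000-0000
'''


def build_pid_index(text_stream) -> dict[str, str]:
    """Map each PID (e.g. 'doi:10.1234/example') to the OMID it belongs to."""
    index: dict[str, str] = {}
    for row in csv.DictReader(text_stream):
        for pid in row["id"].split():
            index[pid] = row["omid"]
    return index


index = build_pid_index(io.StringIO(SAMPLE))
print(index["doi:10.1234/example"])
```

An in-memory dictionary works for small samples. For the full 6.5 GB file, a disk-backed store such as Python's `sqlite3` may be more practical.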
Dump created on 2024-06-20. Compared to the previous dump, this one adds the metadata contained in the Crossref dump dated March 2024. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
Dump created on 2024-04-06. Compared to the previous dump, this one incorporates OpenAlex IDs, leveraging data from the OpenAlex dump. Dump available in CSV (metadata) and RDF (metadata and provenance) formats.
Dump created on 2023-11-29. Compared to the previous dump, this one adds the metadata contained in the Japan Link Center (JaLC). Dump available in CSV (metadata) format.
Dump created on 2023-10-24. Compared to the previous dump, this one adds the metadata contained in OpenAIRE and in the Crossref dump dated September 2023. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
Dump created on 2023-06-28. Compared to the previous dump, this one adds the metadata contained in the dump of NIH Open Citation Collection dated November 2022. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
Dump created on 2023-02-24. Compared to the previous dump, this one adds the metadata contained in the last dump of DataCite dated 22 October 2021. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
Dump created on 2022-12-20, based on open references to works with DOIs within the Crossref dump dated December 2022. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
The OpenCitations Index stores OMID-to-OMID references representing all the references gathered from several sources.
Dump created on 2025-03-24. Compared to the previous dump, this one adds the citation data contained in the Crossref dump dated November 2024. This dump includes information on:
2,155,497,918 citations
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 220 GB (34.4 GB zipped) |
Citation data (N-Triple) | ZIP | 1.9 TB (80.6 GB zipped) |
Citation data (Scholix) | ZIP | 1.9 TB (40 GB zipped) |
Provenance data (CSV) | ZIP | 410 GB (18 GB zipped) |
Provenance data (N-Triple) | ZIP | 3.1 TB (95 GB zipped) |
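The uncompressed citation data runs to hundreds of gigabytes, so it can help to read CSV files directly from the ZIP archive rather than extracting them to disk first. The sketch below does this with Python's standard `zipfile` and `csv` modules. The column names (`oci`, `citing`, `cited`) in the demo archive are assumptions, and the helpers `iter_zip_csv_rows` and `build_demo_archive` are hypothetical. If a downloaded archive contains further ZIP files rather than CSVs, the function would need to recurse into them.

```python
import csv
import io
import zipfile
from typing import Iterator


def iter_zip_csv_rows(zip_source) -> Iterator[dict[str, str]]:
    """Yield rows from every .csv member of a ZIP archive without extracting it."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip READMEs and other non-CSV members
            # Wrap the binary member stream so csv can read decoded text.
            with io.TextIOWrapper(archive.open(name), encoding="utf-8", newline="") as text:
                yield from csv.DictReader(text)


def build_demo_archive() -> io.BytesIO:
    """Build a small in-memory ZIP with invented citation rows, for illustration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("part_1.csv", "oci,citing,cited\n1-2,omid:br/1,omid:br/2\n")
        archive.writestr("README.txt", "not a CSV")
    buf.seek(0)
    return buf


for row in iter_zip_csv_rows(build_demo_archive()):
    print(row)
```

In practice, `build_demo_archive()` would be replaced by the path to a downloaded archive.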
In addition:
Type and format | Archive | Size |
---|---|---|
Citation data sources' info (N-Triple): information regarding the data source collection (e.g., COCI, DOCI, POCI) of all the citation data | ZIP | 388 GB (23.7 GB zipped) |
Citation data sources' info (CSV): information regarding the data source collection (e.g., COCI, DOCI, POCI) of all the citation data | ZIP | 97 GB (21 GB zipped) |
Citation count data (CSV): the number of incoming citations to each bibliographic entity (identified by an OMID) in OpenCitations Index | TBA | TBA |
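An incoming citation count tallies how many citation rows name a given OMID as the cited entity. The minimal sketch below assumes that each citation row has a `cited` column holding the cited entity's OMID. That column name and the sample rows are assumptions, and the function `incoming_citation_counts` is hypothetical rather than the pipeline OpenCitations uses to produce this dump.

```python
from collections import Counter
from typing import Iterable, Mapping


def incoming_citation_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Map each cited OMID to its number of incoming citations.

    Assumes one row per citation with the cited OMID in a "cited" column.
    Entities that are never cited do not appear (a Counter returns 0 for them).
    """
    return Counter(row["cited"] for row in rows)


# Invented citation rows for illustration.
rows = [
    {"citing": "omid:br/1", "cited": "omid:br/3"},
    {"citing": "omid:br/2", "cited": "omid:br/3"},
    {"citing": "omid:br/3", "cited": "omid:br/1"},
]
counts = incoming_citation_counts(rows)
print(counts.most_common(1))
```

Over the full Index, the counter would hold one entry per cited entity, which may need substantial memory.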
Dump created on 2024-07-01. Dump available in CSV (citation data), N-Triple (citation data), Scholix (citation data), CSV (provenance data), and N-Triple (provenance data). In addition, an N-Triple dump containing information regarding the data source collection, and a citation count dump with the number of incoming citations to each bibliographic entity (identified by an OMID).
Dump created on 2023-11-29. Dump available in CSV (citation data), N-Triple (citation data), Scholix (citation data), CSV (provenance data), and N-Triple (provenance data). In addition, an N-Triple dump containing information regarding the data source collection, and a citation count dump with the number of incoming citations to each bibliographic entity (identified by an OMID).
Dump created on 2023-10-25. Dump available in CSV (citation data), N-Triple (citation data), Scholix (citation data), CSV (provenance data), and N-Triple (provenance data). In addition, an N-Triple dump containing information regarding the data source collection.