This page contains details of and links to all the data dumps of OpenCitations Meta and the OpenCitations Index. They are made available online with the support of Figshare and the Internet Archive.
The OpenCitations Meta database stores and delivers bibliographic metadata for all publications involved in the OpenCitations Index.
This dataset's dump, created on 2024-04-06, enhances its previous version by incorporating OpenAlex IDs, leveraging data from the OpenAlex dump available at https://openalex.s3.amazonaws.com/browse.html. This dump includes information on:
114,703,611 bibliographic entities
343,715,548 authors and 2,533,136 editors (counted by their roles, without disambiguating individuals)
711,711 publication venues
241,783 publishers
Type and format | Archive | Size |
---|---|---|
Metadata (CSV) | ZIP | 46 GB (11 GB zipped) on ext4 |
Metadata and provenance (RDF) | XZ (LZMA2 compression algorithm) | 43 GB (29 GB compressed) on ext4 |
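Because the CSV dump is tens of gigabytes once unpacked, it can be convenient to read it row by row straight from the ZIP archive instead of extracting it first. The following is a minimal sketch using only the Python standard library. The function name `iter_zipped_csv_rows` is hypothetical, and the assumptions that the archive contains UTF-8 `.csv` members with a header row, and the `id` and `title` columns in the sample, are illustrative rather than taken from this page.

```python
import csv
import io
import zipfile
from typing import BinaryIO, Dict, Iterator, Union


def iter_zipped_csv_rows(archive: Union[str, BinaryIO]) -> Iterator[Dict[str, str]]:
    """Yield one dict per row from every .csv member of a ZIP archive.

    Rows are streamed, so the archive is never fully extracted to disk
    or loaded into memory.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            with zf.open(name) as raw:
                # Assumption: members are UTF-8 encoded with a header row.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.DictReader(text)


# Tiny in-memory archive standing in for the real dump (illustrative data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sample/0.csv", "id,title\nomid:br/1,First\n")
    zf.writestr("sample/1.csv", "id,title\nomid:br/2,Second\n")
buf.seek(0)

rows = list(iter_zipped_csv_rows(buf))
for row in rows:
    print(row["id"], row["title"])
```

For the real dump, pass the path of the downloaded ZIP file instead of the in-memory buffer and process the rows as they arrive rather than collecting them in a list.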
In addition, a CSV dump is available containing a mapping between all the bibliographic resources identified by an OMID (e.g., br/12345) and their corresponding PID(s) (e.g., DOI, PMID):
Type and format | Archive | Size |
---|---|---|
BR OMID map (CSV) | ZIP | 4.4 GB (1.7 GB zipped) |
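A mapping of this kind is typically used to look up the OMID of a resource from a known DOI or PMID. The sketch below shows one way to build such a lookup in Python. The column names `omid` and `id`, the space-separated list of prefixed identifiers (such as `doi:...` and `pmid:...`), and the function name `build_pid_index` are assumptions made for illustration, so check the header of the actual file before relying on them.

```python
import csv
import io
from typing import Dict, Iterable


def build_pid_index(
    lines: Iterable[str],
    omid_col: str = "omid",  # assumed column name
    pid_col: str = "id",     # assumed column name
) -> Dict[str, str]:
    """Map every external identifier (PID) to the OMID of its resource."""
    index: Dict[str, str] = {}
    for row in csv.DictReader(lines):
        # Assumption: PIDs are space-separated and carry a scheme prefix.
        for pid in row[pid_col].split():
            index[pid] = row[omid_col]
    return index


# Illustrative sample, not real data from the dump.
sample = io.StringIO(
    "omid,id\n"
    "br/12345,doi:10.1000/example pmid:999\n"
    "br/67890,doi:10.1000/other\n"
)
pid_index = build_pid_index(sample)
print(pid_index["pmid:999"])
```

An in-memory dictionary is fine for a sample, but the full 4.4 GB file may not fit comfortably in RAM; a key-value store or database may be a better fit at that scale.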
Dump created on 2023-11-29. Compared to the previous dump, this one adds the metadata contained in the Japan Link Center (JaLC). Dump available in CSV (metadata) format.
Dump created on 2023-10-24. Compared to the previous dump, this one adds the metadata contained in OpenAIRE and in the Crossref dump dated September 2023. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
Dump created on 2023-06-28. Compared to the previous dump, this one adds the metadata contained in the dump of NIH Open Citation Collection dated November 2022. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
Dump created on 2023-02-24. Compared to the previous dump, this one adds the metadata contained in the last dump of DataCite dated 22 October 2021. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
Dump created on 2022-12-20, based on open references to works with DOIs within the Crossref dump dated December 2022. Dump available in CSV (metadata) and JSON-LD (metadata and provenance) formats.
The OpenCitations Index stores OMID-to-OMID references representing all the references gathered from several sources.
Dump created on 2023-11-29. Compared to the previous dump, this one adds the citation data contained in the Japan Link Center (JaLC) and in the Crossref dump dated September 2023. This dump includes information on:
89,920,081 bibliographic resources
1,975,552,846 citations
Type and format | Archive | Size |
---|---|---|
Citation data (CSV) | ZIP | 171 GB (26.8 GB zipped) |
Citation data (N-Triple) | ZIP | 1.4 TB (62.3 GB zipped) |
Citation data (Scholix) | ZIP | 1.7 TB (37 GB zipped) |
Provenance data (CSV) | ZIP | 312 GB (14 GB zipped) |
Provenance data (N-Triple) | ZIP | 2.5 TB (79 GB zipped) |
In addition:
Type and format | Archive | Size |
---|---|---|
Citation data sources' info (N-Triple): information regarding the data source collection (e.g., COCI, DOCI, POCI) of all the citation data | ZIP | 351 GB (19 GB zipped) |
Citation count data (CSV): the number of incoming citations to each bibliographic entity (identified by an OMID) in OpenCitations Index | ZIP | 1.7 GB (0.4 GB zipped) |
Reference count data (CSV): the number of references of each bibliographic entity (identified by an OMID) in OpenCitations Index | ZIP | 1.7 GB (0.35 GB zipped) |
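The count files lend themselves to simple streaming analyses, such as finding the most cited entities without loading the whole file. The following Python sketch keeps only the top `n` rows in memory. The column names `omid` and `citations` and the function name `top_cited` are assumptions for illustration and may differ from the header of the actual file.

```python
import csv
import heapq
import io
from typing import Iterable, List, Tuple


def top_cited(
    lines: Iterable[str],
    n: int = 3,
    omid_col: str = "omid",        # assumed column name
    count_col: str = "citations",  # assumed column name
) -> List[Tuple[str, int]]:
    """Return the n entities with the highest citation counts."""
    reader = csv.DictReader(lines)
    # Generator of (count, omid) pairs; nlargest holds at most n of them.
    pairs = ((int(row[count_col]), row[omid_col]) for row in reader)
    return [(omid, count) for count, omid in heapq.nlargest(n, pairs)]


# Illustrative sample, not real data from the dump.
sample_counts = io.StringIO(
    "omid,citations\n"
    "br/1,5\n"
    "br/2,12\n"
    "br/3,1\n"
    "br/4,7\n"
)
top_two = top_cited(sample_counts, n=2)
print(top_two)
```

The same function could be pointed at the reference count file by passing a different `count_col`, again assuming the layout is similar.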
Dump created on 2023-10-25. Dump available in CSV (citation data), N-Triple (citation data), Scholix (citation data), CSV (provenance data), and N-Triple (provenance data) formats. In addition, an N-Triple dump containing information regarding the data source collection is available.