University of Arizona

University of Arizona authors' scholarly works published and cited works year 2022 from OpenAlex

dataset
posted on 2025-04-14, 16:36 authored by Yan Han

Two datasets, works_published and works_cited, for the year 2022 from the OpenAlex database.

License: see https://github.com/ourresearch/openalex-docs/blob/main/license.md, which states: "OpenAlex data is made available under the CC0 license. That means it's in the public domain, and free to use in any way you like. We appreciate attribution where it's convenient, but it's not at all necessary. There is one exception: the MAG Format snapshot is released under ODC-BY, as per the original MAG license applied by Microsoft (it reuses their schema). See the LICENSE.txt file in the MAG format snapshot distribution for attribution requirement details."

Data Quality Considerations:

  • OpenAlex has improved the accuracy of its data with help from algorithms and institutions.
  • Our current data quality assessment showed precision and recall of 95% or higher.

The first dataset, "works_published", contains the publications authored by individuals affiliated with the University of Arizona (UArizona). The data is retrieved with the openalexR package by querying the OpenAlex database with UArizona's Research Organization Registry (ROR) ID (03m2x1q45) and a specific publication date range. Key aspects of this dataset:

  • Scope: It contains records of scholarly works associated with UArizona authors, including various publication types such as journals, repositories (like PubMed and arXiv), and others. It is also possible to filter the results to include only "journal" type publications using the primary_location.source.type = "journal" parameter in the oa_fetch function.
  • Temporal Coverage: This dataset covers publication year 2022. The accompanying source code also demonstrates fetching data for other specific years (e.g., 2019, 2020, 2021, 2023).
  • Data Retrieval: The process uses the oa_fetch function from the openalexR package with the entity="works" parameter and the institutions.ror filter set to UArizona's ROR ID.
  • Data Structure: Each record in this dataset represents a publication and includes various fields. Some fields are themselves data frames (nested tables).
  • Usage: This dataset is used as a starting point for various data analyses and data mining.
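The retrieval described above can be sketched outside of R as well. The deposited script uses openalexR; the Python below is only an illustrative equivalent that builds the corresponding OpenAlex REST API request URL. The function name build_works_query, its parameters, and the choice of page size and cursor paging are assumptions for illustration, not taken from the deposited code.

```python
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"
UARIZONA_ROR = "03m2x1q45"  # ROR ID quoted in the dataset description


def build_works_query(ror_id, year, journals_only=False, per_page=200):
    """Return an OpenAlex works URL for one institution and one publication year.

    Hypothetical helper: it mirrors the filters named in the description
    (institutions.ror, publication date range, optional journal-only filter).
    """
    filters = [
        f"institutions.ror:{ror_id}",
        f"from_publication_date:{year}-01-01",
        f"to_publication_date:{year}-12-31",
    ]
    if journals_only:
        # Same restriction as primary_location.source.type = "journal" in oa_fetch
        filters.append("primary_location.source.type:journal")
    params = {"filter": ",".join(filters), "per-page": per_page, "cursor": "*"}
    # Keep ':', ',' and '*' unescaped so the filter string stays readable
    return f"{OPENALEX_WORKS_URL}?{urlencode(params, safe=':,*')}"


if __name__ == "__main__":
    print(build_works_query(UARIZONA_ROR, 2022))
    print(build_works_query(UARIZONA_ROR, 2022, journals_only=True))
```

Fetching the URL and following the paging cursor is left out on purpose; the sketch only shows how the ROR ID and date range combine into one filter.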

The second dataset, "works_cited", contains the scholarly works cited by the publications in the works_published dataset. It is created by extracting the OpenAlex IDs from the referenced_works field of the works_published data and then using the oa_fetch function to retrieve the full metadata for these cited works. Key aspects of this dataset:

  • Scope: It includes metadata for a wide range of scholarly works that have been cited by UArizona-affiliated publications. This can encompass articles, books, preprints, book chapters, and other types of scholarly outputs.
  • Data Derivation: The dataset is derived from the referenced_works field of the works_published dataset.
  • Data Structure: Each record in this dataset represents a cited work and contains various fields retrieved by the OpenAlex API.
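The derivation step above (collect the IDs in referenced_works, then look each one up) can be sketched as follows. This is a minimal Python illustration, not the deposited R code: it assumes each works_published record is a dict with a referenced_works list, and the helper names extract_cited_ids and chunk, as well as the batch size, are hypothetical.

```python
def extract_cited_ids(works_published):
    """Collect unique OpenAlex IDs from every record's referenced_works list.

    First-seen order is preserved; records with a missing or empty
    referenced_works field are skipped.
    """
    seen = {}
    for work in works_published:
        for ref_id in work.get("referenced_works") or []:
            seen.setdefault(ref_id, None)
    return list(seen)


def chunk(ids, size):
    """Split a list of IDs into batches of at most `size` for later lookups."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return [ids[i:i + size] for i in range(0, len(ids), size)]


if __name__ == "__main__":
    # Made-up records with placeholder IDs, for illustration only
    sample = [
        {"id": "W1", "referenced_works": ["W10", "W11"]},
        {"id": "W2", "referenced_works": ["W11", "W12"]},
        {"id": "W3", "referenced_works": None},
    ]
    cited = extract_cited_ids(sample)
    print(cited)
    print(chunk(cited, 2))
```

Deduplicating before the lookup matters because many publications cite the same work, so the number of metadata requests is usually far smaller than the total number of references.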

The third file (institution_publications.r) is the source code used to retrieve the above datasets.

  • Note that the code also retrieves years other than 2022.

Usage: Both datasets are crucial for performing publication and citation analysis and mining, including:

  • Identifying the most frequently cited works and journals.
  • Analyzing the journal usage and publisher distribution of cited works.
  • Understanding the scholarly landscape influencing UArizona research.
  • Identifying potential resources for library collections based on citation frequency.
  • Investigating the presence and frequency of citations from specific publishers or to specific works.
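As a rough illustration of the first two analyses listed above, the following Python sketch counts how many UArizona publications cite each work and then aggregates those counts by journal. It assumes records shaped as dicts with id and referenced_works keys; the source_name key for a cited work's journal is a hypothetical field name, and the real column name in the deposited files may differ.

```python
from collections import Counter


def citation_counts(works_published):
    """Count, for each cited work, how many publications reference it."""
    counts = Counter()
    for work in works_published:
        # set() so a work repeated within one reference list is counted once
        counts.update(set(work.get("referenced_works") or []))
    return counts


def journal_counts(counts, works_cited):
    """Aggregate per-work citation counts by the cited work's journal.

    `source_name` is an assumed field name used only for this sketch.
    """
    source_of = {w["id"]: w.get("source_name") for w in works_cited}
    by_journal = Counter()
    for work_id, n in counts.items():
        name = source_of.get(work_id)
        if name:  # skip cited works with no known source
            by_journal[name] += n
    return by_journal


if __name__ == "__main__":
    # Made-up records with placeholder IDs and journal names
    published = [
        {"id": "W1", "referenced_works": ["W10", "W11"]},
        {"id": "W2", "referenced_works": ["W11", "W12"]},
    ]
    cited = [
        {"id": "W10", "source_name": "Journal A"},
        {"id": "W11", "source_name": "Journal B"},
        {"id": "W12", "source_name": "Journal A"},
    ]
    counts = citation_counts(published)
    print(counts.most_common(1))
    print(journal_counts(counts, cited).most_common())
```

The same per-journal totals could feed a collection review, for example by ranking journals by how often UArizona authors cite them.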




For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu


This item is part of University of Arizona authors' scholarly works published and cited works
