---------------------------------------------

# Borderlands newspaper text 1897 - 1963

Preferred citation (DataCite format):

Oliver, Jeffrey C (2020). Borderlands newspaper text 1897 - 1963. University of Arizona Research Data Repository. Dataset. https://doi.org/10.25422/azu.data.12735992

Corresponding Author: Jeffrey C Oliver, University of Arizona, jcoliver@email.arizona.edu

License: CC0

DOI: https://doi.org/10.25422/azu.data.12735992

---------------------------------------------

## Summary

The data files are text of eight newspapers published in the border region of Arizona, USA and Sonora, Mexico between 1897 and 1963. The text was generated from newspaper scans processed with OCR by the Library of Congress Chronicling America Project. Details on the digitization process can be found at [https://chroniclingamerica.loc.gov/about/](https://chroniclingamerica.loc.gov/about/). Text from these newspapers is considered to be in the public domain (see [https://chroniclingamerica.loc.gov/about/#rights_and_reproductions](https://chroniclingamerica.loc.gov/about/#rights_and_reproductions)).

---------------------------------------------

## Files and Folders

The file full-titles.csv provides publication location, Library of Congress Control Number (LCCN), language, directory name, and dates of coverage for each newspaper title included.

Within each title's directory are two directories: pages and volumes.

The pages directories contain one text file per scanned page. Each file is named with the date of publication in YYYYMMDD form, followed by a hyphen, the integer page number, and the .txt extension. For example, the text of page 2 of the El Tucsonense paper from January 3, 1925 is stored in el-tucsonense/pages/19250103-2.txt.

The volumes directories contain text files of the concatenated individual pages for a day's publication; files are named with the YYYYMMDD.txt format, where YYYYMMDD refers to the date of publication.
For example, the text of all pages of the El Tucsonense paper published on January 3, 1925 is stored in el-tucsonense/volumes/19250103.txt.

### apache-sentinel: Scanned text of _Apache Sentinel_ issues
#### apache-sentinel/pages: Text files of individual pages
#### apache-sentinel/volumes: Text files of concatenated pages for a single day

### arizona-citizen: Scanned text of _Arizona Citizen_ issues
#### arizona-citizen/pages: Text files of individual pages
#### arizona-citizen/volumes: Text files of concatenated pages for a single day

### arizona-post: Scanned text of _Arizona Post_ issues
#### arizona-post/pages: Text files of individual pages
#### arizona-post/volumes: Text files of concatenated pages for a single day

### arizona-sun: Scanned text of _Arizona Sun_ issues
#### arizona-sun/pages: Text files of individual pages
#### arizona-sun/volumes: Text files of concatenated pages for a single day

### bisbee-daily-review: Scanned text of _Bisbee Daily Review_ issues
#### bisbee-daily-review/pages: Text files of individual pages
#### bisbee-daily-review/volumes: Text files of concatenated pages for a single day

### border-vidette: Scanned text of _Border Vidette_ issues
#### border-vidette/pages: Text files of individual pages
#### border-vidette/volumes: Text files of concatenated pages for a single day

### el-fronterizo: Scanned text of _El Fronterizo_ issues
#### el-fronterizo/pages: Text files of individual pages
#### el-fronterizo/volumes: Text files of concatenated pages for a single day

### el-mosquito: Scanned text of _El Mosquito_ issues
#### el-mosquito/pages: Text files of individual pages
#### el-mosquito/volumes: Text files of concatenated pages for a single day

### el-sol: Scanned text of _El Sol_ issues
#### el-sol/pages: Text files of individual pages
#### el-sol/volumes: Text files of concatenated pages for a single day

### el-tucsonense: Scanned text of _El Tucsonense_ issues
#### el-tucsonense/pages: Text files of individual pages
#### el-tucsonense/volumes: Text files of concatenated pages for a single day

### phoenix-tribune: Scanned text of _Phoenix Tribune_ issues
#### phoenix-tribune/pages: Text files of individual pages
#### phoenix-tribune/volumes: Text files of concatenated pages for a single day

### the-oasis: Scanned text of _The Oasis_ issues
#### the-oasis/pages: Text files of individual pages
#### the-oasis/volumes: Text files of concatenated pages for a single day

### the-daily-morning-oasis: Scanned text of _The Daily Morning Oasis_ issues
#### the-daily-morning-oasis/pages: Text files of individual pages
#### the-daily-morning-oasis/volumes: Text files of concatenated pages for a single day

### the-weekly-orb: Scanned text of _The Weekly Orb_ issues
#### the-weekly-orb/pages: Text files of individual pages
#### the-weekly-orb/volumes: Text files of concatenated pages for a single day

### tucson-citizen: Scanned text of _Tucson Citizen_ issues
#### tucson-citizen/pages: Text files of individual pages
#### tucson-citizen/volumes: Text files of concatenated pages for a single day

---------------------------------------------

## Materials & Methods

Individual pages were downloaded via the Chronicling America API [https://chroniclingamerica.loc.gov/about/api/](https://chroniclingamerica.loc.gov/about/api/). Text files for volumes were created using Python scripts. Source code for downloading the original text files and assembling the volumes is available at [https://github.com/jcoliver/borderlands-newspapers](https://github.com/jcoliver/borderlands-newspapers). All text files are encoded with UTF-8.

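
The relationship between page files and volume files can be illustrated with a short Python sketch. This is not the code from the repository linked above; it is a minimal example that assumes only the naming convention described under Files and Folders. The function names `parse_page_name` and `assemble_volumes`, the `out_dir` argument, and the newline separator placed between pages are all hypothetical choices made for illustration, so the output may not match the distributed volume files byte for byte.

```python
import re
from collections import defaultdict
from pathlib import Path

# Page files are named with an 8-digit date, a hyphen, and an integer
# page number, e.g. 19250103-2.txt
PAGE_NAME = re.compile(r"(\d{8})-(\d+)\.txt")


def parse_page_name(filename):
    """Return (date string, page number) for a page file name, else None."""
    match = PAGE_NAME.fullmatch(filename)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def assemble_volumes(pages_dir, out_dir, separator="\n"):
    """Join each day's pages, in page order, into out_dir/YYYYMMDD.txt.

    The separator is an assumption made for this sketch, not a documented
    property of the dataset.
    """
    pages_by_date = defaultdict(list)
    for path in Path(pages_dir).glob("*.txt"):
        parsed = parse_page_name(path.name)
        if parsed is not None:
            date, page = parsed
            pages_by_date[date].append((page, path))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for date, pages in sorted(pages_by_date.items()):
        # Sort on the integer page number so page 10 follows page 9
        texts = [p.read_text(encoding="utf-8") for _, p in sorted(pages)]
        out_path = out_dir / f"{date}.txt"
        out_path.write_text(separator.join(texts), encoding="utf-8")
        written.append(out_path)
    return written
```

For example, `assemble_volumes("el-tucsonense/pages", "scratch/el-tucsonense")` would write one file per publication date into a scratch directory, leaving the distributed volumes directory untouched so the two can be compared.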

---------------------------------------------

## Contributor Roles

The roles are defined by the CRediT taxonomy http://credit.niso.org/

- Jeffrey Oliver, University of Arizona: Methodology, Software, Validation, Data Curation

---------------------------------------------

## Additional Notes

Links:

- https://chroniclingamerica.loc.gov/
- https://chroniclingamerica.loc.gov/about/api/
- https://github.com/jcoliver/borderlands-newspapers
- https://github.com/jcoliver/dig-coll-borderlands