File(s) stored somewhere else

https://imagery.library.arizona.edu/cocadata/

Please note: Linked content is NOT stored by the University of Arizona, and we cannot guarantee its availability, quality, or security, or accept any liability for it.

Corpus of Contemporary American English (COCA) 1990 to 2012 Datasets

Version 2 2022-05-16, 14:39

Version 1 2021-10-19, 20:18

dataset

posted on 2021-10-19, 20:18, authored by University of Arizona Libraries

Dataset available only to University of Arizona affiliates. To obtain access, you must log into ReDATA with your NetID. Data is for research use by each individual downloader only. Sharing and/or redistribution of any portion of this dataset is prohibited.

In no case can substantial amounts of the full-text data (typically a total of 50,000 words or more) be distributed outside the University of Arizona. If portions of the derived data are made available to others, they cannot include substantial portions of the raw frequency of words. Any publications or products that are based on the data should contain a reference to the source of the data: http://corpus.byu.edu/full-text/. COCA cannot be used to create software or products that will be sold to others.

This database is only available on the COCA website. To access the data, follow the link provided (https://imagery.library.arizona.edu/cocadata/).

The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English, and the only large and balanced corpus of American English. The corpus was created by Mark Davies of Brigham Young University, and it is used by tens of thousands of users every month (linguists, teachers, translators, and other researchers). COCA is also related to other large corpora that we have created. The corpus contains more than 450 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words for each year from 1990 to 2012, and the corpus is also updated regularly (the most recent texts are from Summer 2012). Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language.

For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests or trouble downloading) can be directed to data-management@arizona.edu.