Maintained by the Association for Computational Linguistics; hosts thousands of papers on computational linguistics and natural language processing
Brigham Young University projects that offer a number of specialist corpora, primarily covering US legal material
100-million-word collection of written and spoken modern British English representing a "unique snapshot of the English language". The data was collected between 1991 and 1994 and represents a wide variety of written (90%) and spoken (10%) language.
The corpus contains more than one billion words of text (1990-) from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, and (added in the March 2020 update) TV and movie subtitles, blogs, and other web pages.
Useful links to some major English-language corpora
Provides information about corpora at Essex, and links to corpus tools and corpus resources.
Vast resource of corpora, data, software & research papers hosted by the University of Pennsylvania. Essex has purchased access to a very limited number of corpora.
OPUS Open Parallel Corpora
EU collaborative project to promote availability of open parallel corpora resources
An index of words for happiness in different languages
Global registry of research data repositories, all subject areas, including linguistics
Produced by the University of Liverpool, this site offers a suite of tools which allows access to the World Wide Web as a corpus. It can aid research on how particular words and phrases are used, especially those which are too new or too rare to appear in any dictionary or standard corpus.