Lookup NU author(s): Dr Malcolm Farrow
Full text for this publication is not currently held within this repository. Alternative links are provided below where available.
The chi-squared test is used to find the vocabulary most typical of seven different ICAME corpora, each representing the English used in a particular country. In a closely related study, Leech and Fallon (1992, Computer corpora - what do they tell us about culture? ICAME Journal, 16: 29-50) found differences between the vocabulary used in the Brown Corpus of American English and that used in the Lancaster-Oslo-Bergen Corpus of British English. They were mainly interested in those vocabulary differences which they assumed to be due to cultural differences between the United States and Britain, but we are equally interested in vocabulary differences which reveal linguistic preferences in the various countries where English is spoken. Whether vocabulary differences are cultural or linguistic in nature, they can be used to classify texts of unknown provenance automatically according to the variety of English in which they are written. The extent to which the vocabulary differences between the corpora represent differences between the varieties of English as a whole depends on how well the corpora cover the full range of topics typical of their associated cultures. There is thus a need for corpora designed to represent the topics and vocabulary of cultures or dialects, rather than being stratified across a fixed range of topics and genres. This will require methods to determine the range of topics addressed in each culture, and then methods to sample adequately from each topical domain. © 2007 Oxford University Press.
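The abstract does not spell out how the contingency tables are built, so the following is only a minimal sketch of one common way to apply the chi-squared test to find the words most typical of one corpus. It assumes that each corpus is compared against all the other corpora pooled together, using a 2x2 table of (this word, all other tokens) by (target corpus, rest), with no Yates correction. The function names `chi_squared_2x2` and `typical_words` are hypothetical and are not taken from the article.

```python
from collections import Counter
from typing import Dict, List, Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]] (no Yates correction).

    a: occurrences of the word in the target corpus
    b: occurrences of the word in the other corpora combined
    c: all other tokens in the target corpus
    d: all other tokens in the other corpora combined
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def typical_words(corpora: Dict[str, Counter], target: str,
                  top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank words over-represented in `target` relative to the pooled other corpora."""
    target_counts = corpora[target]
    rest: Counter = Counter()
    for name, counts in corpora.items():
        if name != target:
            rest.update(counts)  # pool every non-target corpus
    target_total = sum(target_counts.values())
    rest_total = sum(rest.values())

    scores = []
    for word in set(target_counts) | set(rest):
        a, b = target_counts[word], rest[word]
        # Keep only words whose relative frequency is higher in the target corpus.
        if a * rest_total <= b * target_total:
            continue
        scores.append((word, chi_squared_2x2(a, b, target_total - a, rest_total - b)))
    # Highest chi-squared first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


if __name__ == "__main__":
    corpora = {
        "US": Counter("the color of the truck the color".split()),
        "GB": Counter("the colour of the lorry the colour".split()),
    }
    print(typical_words(corpora, "US"))
```

On this toy input, "color" ranks above "truck" for the US corpus, while shared words such as "the" are filtered out. The counts here are far too small for the chi-squared approximation to be reliable; with real corpora, words with very low expected frequencies would normally be excluded or handled with a different statistic.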
Author(s): Oakes MP, Farrow M
Publication type: Article
Publication status: Published
Journal: Literary and Linguistic Computing
Year: 2007
Volume: 22
Issue: 1
Pages: 85-99
ISSN (print): 0268-1145
ISSN (electronic): 1477-4615
Publisher: Oxford University Press
URL: http://dx.doi.org/10.1093/llc/fql044
DOI: 10.1093/llc/fql044