

Use of the chi-squared test to examine vocabulary differences in English language corpora representing seven different countries

Lookup NU author(s): Dr Malcolm Farrow

Downloads

Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Abstract

The chi-squared test is used to find the vocabulary most typical of seven different ICAME corpora, each representing the English used in a particular country. In a closely related study, Leech and Fallon (1992, Computer corpora - what do they tell us about culture? ICAME Journal, 16: 29-50) found differences in the vocabulary used in the Brown Corpus of American English and the Lancaster-Oslo-Bergen Corpus of British English. They were mainly interested in those vocabulary differences which they assumed to be due to cultural differences between the United States and Britain, but we are equally interested in vocabulary differences which reveal linguistic preferences in the various countries in which English is spoken. Whether vocabulary differences are cultural or linguistic in nature, they can be used for the automatic classification according to variety of English of texts of unknown provenance. The extent to which the vocabulary differences between the corpora represent vocabulary differences between the varieties of English as a whole depends on the extent to which the corpora represent the full range of topics typical of their associated cultures, and thus there is a need for corpora designed to represent the topics and vocabulary of cultures or dialects, rather than stratified across a set range of topics and genres. This will require methods to determine the range of topics addressed in each culture, then methods to sample adequately from each topical domain. © 2007 Oxford University Press.
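As an illustration of the kind of test the abstract describes, the following is a minimal sketch of a Pearson chi-squared statistic for a single word across several corpora. It is not taken from the article: the function name `chi_squared_for_word`, the 2 x k table layout (the word versus all other tokens, one column per corpus) and the counts in the usage lines are assumptions made for this example.

```python
def chi_squared_for_word(word_counts, corpus_sizes):
    """Pearson chi-squared statistic for one word across k corpora.

    Assumed layout (hypothetical, not from the article): a 2 x k table
    whose first row holds the word's count in each corpus and whose
    second row holds the count of all other tokens in that corpus.
    """
    if len(word_counts) != len(corpus_sizes):
        raise ValueError("word_counts and corpus_sizes must have equal length")

    other_counts = [size - count for count, size in zip(word_counts, corpus_sizes)]
    if any(count < 0 for count in word_counts) or any(c < 0 for c in other_counts):
        raise ValueError("counts must be non-negative and no larger than corpus size")

    grand_total = sum(corpus_sizes)
    rows = [word_counts, other_counts]
    row_totals = [sum(row) for row in rows]
    if grand_total == 0 or 0 in row_totals or 0 in corpus_sizes:
        raise ValueError("every row and column of the table must be non-empty")

    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, column_total in zip(row, corpus_sizes):
            # Expected count if the word were equally frequent in every corpus.
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: one word in two corpora of 100 tokens each.
print(chi_squared_for_word([20, 10], [100, 100]))
```

Under these assumptions the statistic has k - 1 degrees of freedom, and a large value suggests that the word's frequency differs between the corpora. A word whose relative frequency is identical in every corpus yields a statistic of zero.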


Publication metadata

Author(s): Oakes MP, Farrow M

Publication type: Article

Publication status: Published

Journal: Literary and Linguistic Computing

Year: 2007

Volume: 22

Issue: 1

Pages: 85-99

ISSN (print): 0268-1145

ISSN (electronic): 1477-4615

Publisher: Oxford University Press

URL: http://dx.doi.org/10.1093/llc/fql044

DOI: 10.1093/llc/fql044

