The computer-aided analysis of naturalistic or experimental speech data provides new insights into the nature of the language faculty and new means of testing linguistic theories. This course covers corpus methods for phonology, emphasizing connections between data mining, statistical testing, and phonological analysis (especially issues concerning productivity and potential discrepancies between different sources of linguistic data). The first week focuses on extracting distributions and predictors of variable phenomena (e.g. allophony, end-weight, clitic placement) from a naturalistic corpus and translating them into probabilistic, multifactorial models of grammar. The second week turns to the place of corpus data in phonological theory, including the use of naturalistic distributions to test current theoretical proposals and the reconciliation of apparent contradictions between different types of linguistic data: corpus data, lexical data, intuitions, perceptual experiments, and wug tests. Essential Python and R scripting is covered; no prior programming experience is assumed.