Computational Corpus Lexicography
With the availability of large corpora and the increased computational power that has enabled their efficient processing, modern lexicography has undergone a revolution. Statistical techniques, now central to lexicography, enable lexicographers to produce higher-quality dictionaries at lower cost. This course introduces modern corpus lexicography, with a focus on monolingual dictionaries, using English as an exemplar. We discuss the need for large corpora, how to build them, and the key statistical corpus methods used in modern lexicography. We introduce relevant linguistic theories, specifically Fillmore’s frame semantics and Hanks’s theory of norms and exploitations. We also address the practical question of how to apply these theoretical insights, along with statistical information from corpora, to draw distinctions between word meanings. Throughout, we take a hands-on, practical approach. Students are expected to bring a laptop or tablet to class for in-class exercises.