Computational Linguistics

Advanced Probabilistic Modeling in R

Roger Levy

Probabilistic modeling is transforming the study of human language, ranging from novel theories of linguistic cognition to sophisticated techniques for statistical analysis of complex, structured linguistic data to practical methods for automated processing of language. Doing cutting-edge research in these areas requires skill with probability and statistics, familiarity with formalisms from computational linguistics, ability to use and develop new computational tools, and comfort with handling complex datasets. This course will cover both theory and application, covering conceptual fundame
Computational Approaches to Sound Change

James Kirby, Morgan Sonderegger

Decades of empirical research have led to an increasingly nuanced picture of the nature of phonetic and phonological change, incorporating insights from speech production and perception, cognitive biases, and social factors. However, there remains a significant gap between observed patterns and proposed mechanisms, in part due to the difficulty of conducting the type of controlled studies necessary to test hypotheses about historical change. Computational and mathematical models provide an alternative means by which such hypotheses can be fruitfully explored.
Computational Corpus Lexicography

Paul Cook, Ed Finegan

With the availability of large corpora and the increased computational power that has enabled their efficient processing, modern lexicography has undergone a revolution. Statistical techniques, now central to lexicography, enable lexicographers to produce higher-quality dictionaries at lower cost. This course introduces modern corpus lexicography, with a focus on monolingual dictionaries, using English as an exemplar. We discuss the need for large corpora, how to build them, and the key statistical corpus methods used in modern lexicography.
Computational Learning of Syntax

Alexander Clark

This course will look at the computational and mathematical theory of how grammars can be learned from strings.
Any theory of language acquisition must at bottom rest on some solution to this problem.
Computational Lexical Semantics

Dan Jurafsky

Survey of computational models for representing and processing lexical semantics. Topics include semantic role labeling, online dictionaries and thesauri, word sense disambiguation, distributional (vector) semantics, and sentiment analysis.

Computational Minimalism

Greg Kobele

A precise formal understanding of a linguistic theory is vital for distinguishing between contentful and notational aspects of a linguistic proposal, for pinpointing cross-framework agreements and disagreements, and for making principled connections to other empirical domains.

This course will present recent transformational syntax ('minimalism') in terms of Stabler's minimalist grammar (MG) formalism. To get a feel for the formalism, we will engage in a hands-on analysis of basic aspects of constructions like raising, auxiliaries, expletives, and passives.
Computational Phonology

Jeffrey Heinz, Jason Riggle

This course teaches foundational concepts in computer science and mathematical linguistics as they apply to phonology. The material bears on rule-based and constraint-based theories of phonology, including several varieties of SPE and of OT (the latter including harmonic grammar). The course has two main foci. First, it will show how computational analysis allows the expressive power of the theories to be compared. Second, it will show how computational analysis can make significant inroads on problems relating to learning phonological patterns from data.
Computational Psycholinguistics

Roger Levy, Klinton Bicknell

Over the last two and a half decades, computational linguistics has been revolutionized as a result of three closely related developments: increases in computing power, the advent of large linguistic datasets, and a paradigm shift toward probabilistic modeling.
Continuations and Natural Language

Chris Barker

This course will make a case that continuations, a concept from the theory of programming languages, are an indispensable element in any complete account of natural language meaning. The continuation of an expression is a portion of its surrounding context. The main applications of continuations to be considered include scope, binding, crossover, reconstruction, negative polarity licensing, the compositional semantics of the adjective "same", and sluicing. The course will follow the 2014 Barker and Shan book, 'Continuations and natural language' (Oxford).
Data-driven Computational Pragmatics

Shlomo Engelson Argamon, Jonathan Dunn

This course introduces data-driven computational pragmatics, an empirical approach to pragmatics which uses large amounts of linguistic data with only computational annotations to learn models describing pragmatic phenomena. Data-driven computational pragmatics offers two important advantages: (1) experiments which require no direct human intervention can be run on massive amounts of linguistic data; (2) subtle pragmatic phenomena which are below the level of consciousness of individual analysts can be detected and described.
Gradient Symbolic Computation

Paul Smolensky, Matt Goldrick

Classical, discrete representations (e.g., syntactic trees) have been the foundation for much of modern linguistic theory, providing key insights into the structure of linguistic knowledge and language processing. However, such frameworks fail to capture the gradient computational principles that underlie human cognition and behavior: not simply performance, but also our competence.
Introduction to Bilingualism

Virginia Yip, Ping Li

This course introduces theoretical and methodological issues in the study of bilingualism. The first half of the course focuses on bilingual acquisition in early childhood. We examine how children develop two languages in families where they are exposed to dual input from birth. The issues covered include language differentiation, cross-linguistic influence, and code-mixing in bilingual development. Data from the language development of children acquiring Chinese and English, as well as other language pairs, will be used for illustration.
Introduction to Computational Linguistics

Sharon Goldwater

This course provides an overview of the main methods and algorithms used in computational linguistics, motivated by some examples of questions they can be used to investigate. We will cover the basics of: information theory (entropy and mutual information), n-gram models (for computing the probabilities of phone or word sequences), finite-state automata and hidden Markov models, parsing algorithms, and distributional semantic models. In addition to lectures, we will include some hands-on labs in Python to help students gain practical experience with some of these concepts.
Introduction to Statistics with R

Stefan Th. Gries

This course introduces the participants to the basic logic underlying statistical description and analysis, teaches them how to compute and visualize descriptive statistics, and how to perform monofactorial statistical tests of linguistic data from both observational and experimental settings. The course will use the open source programming language and environment R and will be loosely based on Gries (2013), the second edition of my textbook 'Statistics for linguistics with R'.
Language Variation through the Lens of Web Data

Sravana Reddy

The rise of social media has resulted in an unprecedented quantity of user-generated data such as text on Twitter or speech and video on YouTube. This content is often associated with demographic information – the gender, geographic location, ethnicity, and social network connections of the author – which opens up the opportunity to study language variation from a corpus-based "big data" point of view.
Speech Technologies

Karen Livescu

This course will introduce techniques used in speech technologies, mainly focusing on automatic speech recognition (ASR). Speech recognition is one of the oldest and most complex sequence prediction tasks receiving significant research and commercial attention, and also a good example of the effectiveness of combining linguistic knowledge and speech science with statistics and machine learning. Course topics will include historical and phonetic background, acoustic features, dynamic time warping, hidden Markov models, statistical language models, and current research in ASR.
The Computational Theory of the Error-driven Ranking Model of the Acquisition of Phonotactics

Giorgio Magri

Nine-month-olds already display knowledge of their native phonotactics: they react differently to licit versus illicit sound combinations. Children must thus rely on a remarkably efficient phonotactic learning procedure. What does it look like? Assume that the learner is provided with the typology of OT grammars corresponding to all rankings of a given constraint set. Data come in a stream and consist of licit phonological forms.
The Data Gold Rush: Exploiting freely available web data for linguistic research

Andrew Wedel, Bodo Winter

The web is full of freely available data that is just waiting to be explored by the capable analyst. In this course, we will survey some of the freely available web data sources and discuss linguistic research projects that have been conducted with them. We will emphasize the Buckeye Corpus, the Lexicon Projects, dictionary data, Google Ngram and the TV News Archive, as well as resources from less-studied languages. We discuss projects that relate to a broad range of linguistic topics, including speech/phonetics, semantics and gesture.