Computational Linguistics
-
Advanced Probabilistic Modeling in R
Roger Levy
Probabilistic modeling is transforming the study of human language, ranging from novel theories of linguistic cognition to sophisticated techniques for statistical analysis of complex, structured linguistic data to practical methods for automated processing of language. Doing cutting-edge research in these areas requires skill with probability and statistics, familiarity with formalisms from computational linguistics, ability to use and develop new computational tools, and comfort with handling complex datasets. This course will cover both theory and application, covering conceptual fundame
-
Computational Approaches to Sound Change
James Kirby, Morgan Sonderegger
Decades of empirical research have led to an increasingly nuanced picture of the nature of phonetic and phonological change, incorporating insights from speech production and perception, cognitive biases, and social factors. However, there remains a significant gap between observed patterns and proposed mechanisms, in part due to the difficulty of conducting the type of controlled studies necessary to test hypotheses about historical change. Computational and mathematical models provide an alternative means by which such hypotheses can be fruitfully explored.
-
Computational Corpus Lexicography
Paul Cook, Ed Finegan
With the availability of large corpora and the increased computational power that has enabled their efficient processing, modern lexicography has undergone a revolution. Statistical techniques, now central to lexicography, enable lexicographers to produce higher-quality dictionaries at lower cost. This course introduces modern corpus lexicography, with a focus on monolingual dictionaries, using English as an exemplar. We discuss the need for large corpora, how to build them, and the key statistical corpus methods used in modern lexicography.
-
Computational Learning of Syntax
Alexander Clark
This course will look at the computational and mathematical theory of how grammars can be learned from strings.
Any theory of language acquisition must at bottom rest on some solution to this problem.
-
Computational Lexical Semantics
Dan Jurafsky
Survey of computational models for representing and processing lexical semantics. Topics include semantic role labeling, online dictionaries and thesauri, word sense disambiguation, distributional (vector) semantics, and sentiment analysis.
-
Computational Minimalism
Greg Kobele
A precise formal understanding of a linguistic theory is vital for distinguishing between contentful and notational aspects of a linguistic proposal, for pinpointing cross-framework agreements and disagreements, and for making principled connections to other empirical domains.
This course will present recent transformational syntax ('minimalism') in terms of Stabler's minimalist grammar (MG) formalism. To get a feel for the formalism, we will engage in a hands-on analysis of basic aspects of constructions like raising, auxiliaries, expletives, and passives.
-
Computational Phonology
Jeffrey Heinz, Jason Riggle
This course teaches foundational concepts in computer science and mathematical linguistics as they apply to phonology. This material is related to rule-based and constraint-based theories of phonology, including several varieties of SPE and OT, among them harmonic grammar. The course has two main foci. First, it will show how computational analysis allows the expressive power of the theories to be compared. Second, it will show how computational analysis can make significant inroads on problems relating to learning phonological patterns from data.
-
Computational Psycholinguistics
Roger Levy, Klinton Bicknell
Over the last two and a half decades, computational linguistics has been revolutionized as a result of three closely related developments: increases in computing power, the advent of large linguistic datasets, and a paradigm shift toward probabilistic modeling.
-
Continuations and Natural Language
Chris Barker
This course will make a case that continuations, a concept from the theory of programming languages, are an indispensable element in any complete account of natural language meaning. The continuation of an expression is a portion of its surrounding context. The main applications of continuations to be considered include scope, binding, crossover, reconstruction, negative polarity licensing, the compositional semantics of the adjective "same", and sluicing. The course will follow the 2014 Barker and Shan book, 'Continuations and natural language' (Oxford).
-
Data-driven Computational Pragmatics
Shlomo Engelson Argamon, Jonathan Dunn
This course introduces data-driven computational pragmatics, an empirical approach to pragmatics which uses large amounts of linguistic data with only computational annotations to learn models describing pragmatic phenomena. Data-driven computational pragmatics offers two important advantages: (1) experiments which require no direct human intervention can be run on massive amounts of linguistic data; (2) subtle pragmatic phenomena which are below the level of consciousness of individual analysts can be detected and described.
-
Gradient Symbolic Computation
Paul Smolensky, Matt Goldrick
Classical, discrete representations (e.g., syntactic trees) have been the foundation for much of modern linguistic theory, providing key insights into the structure of linguistic knowledge and language processing. However, such frameworks fail to capture the gradient computational principles that underlie human cognition and behavior--not simply performance, but also our competence.
-
Introduction to Bilingualism
Virginia Yip, Ping Li
This course introduces theoretical and methodological issues in the study of bilingualism. The first half of the course focuses on bilingual acquisition in early childhood. We examine how children develop two languages in families where they are exposed to dual input from birth. The issues covered include language differentiation, cross-linguistic influence, and code-mixing in bilingual development. Data from the language development of children acquiring Chinese and English, as well as other language pairs, will be used for illustration.
-
Introduction to Computational Linguistics
Sharon Goldwater
This course provides an overview of the main methods and algorithms used in computational linguistics, motivated by some examples of questions they can be used to investigate. We will cover the basics of: information theory (entropy and mutual information), n-gram models (for computing the probabilities of phone or word sequences), finite-state automata and hidden Markov models, parsing algorithms, and distributional semantic models. In addition to lectures, we will include some hands-on labs in Python to help students gain practical experience with some of these concepts.
-
Introduction to Statistics with R
Stefan Th. Gries
This course introduces participants to the basic logic underlying statistical description and analysis, and teaches them how to compute and visualize descriptive statistics and how to perform monofactorial statistical tests of linguistic data from both observational and experimental settings. The course will use the open source programming language and environment R and will be loosely based on Gries (2013), the second edition of my textbook 'Statistics for linguistics with R'.
-
Language Variation through the Lens of Web Data
Sravana Reddy
The rise of social media has resulted in an unprecedented quantity of user-generated data such as text on Twitter or speech and video on YouTube. This content is often associated with demographic information – the gender, geographic location, ethnicity, and social network connections of the author – which opens up the opportunity to study language variation from a corpus-based "big data" point of view.
-
Speech Technologies
Karen Livescu
This course will introduce techniques used in speech technologies, mainly focusing on automatic speech recognition (ASR). Speech recognition is one of the oldest and most complex sequence prediction tasks receiving significant research and commercial attention, and also a good example of the effectiveness of combining linguistic knowledge and speech science with statistics and machine learning. Course topics will include historical and phonetic background, acoustic features, dynamic time warping, hidden Markov models, statistical language models, and current research in ASR.
-
The Computational Theory of the Error-driven Ranking Model of the Acquisition of Phonotactics
Giorgio Magri
Nine-month-olds already display knowledge of the native phonotactics: they react differently to licit versus illicit sound combinations. Children must thus rely on a remarkably efficient phonotactic learning procedure. What does it look like? Assume that the learner is provided with the typology of OT grammars corresponding to all rankings of a given constraint set. Data come in a stream and consist of licit phonological forms.
-
The Data Gold Rush: Exploiting freely available web data for linguistic research
Andrew Wedel, Bodo Winter
The web is full of freely available data that is just waiting to be explored by the capable analyst. In this course, we will survey some of the freely available web data sources and discuss linguistic research projects that have been conducted with them. We will emphasize the Buckeye Corpus, the Lexicon Projects, dictionary data, Google Ngram, and the TV News Archive, as well as resources from less-studied languages. We discuss projects that relate to a broad range of linguistic topics, including speech/phonetics, semantics, and gesture.