Computational Linguistics

  • Advanced Probabilistic Modeling in R

    Roger Levy

    Probabilistic modeling is transforming the study of human language, ranging from novel theories of linguistic cognition to sophisticated techniques for statistical analysis of complex, structured linguistic data to practical methods for automated processing of language. Doing cutting-edge research in these areas requires skill with probability and statistics, familiarity with formalisms from computational linguistics, ability to use and develop new computational tools, and comfort with handling complex datasets. This course will address both theory and application, covering conceptual fundame…

  • Computational Approaches to Sound Change

    James Kirby, Morgan Sonderegger

    Decades of empirical research have led to an increasingly nuanced picture of the nature of phonetic and phonological change, incorporating insights from speech production and perception, cognitive biases, and social factors. However, there remains a significant gap between observed patterns and proposed mechanisms, in part due to the difficulty of conducting the type of controlled studies necessary to test hypotheses about historical change. Computational and mathematical models provide an alternative means by which such hypotheses can be fruitfully explored.

  • Computational Corpus Lexicography

    Paul Cook, Ed Finegan

    With the availability of large corpora and the increased computational power that has enabled their efficient processing, modern lexicography has undergone a revolution. Statistical techniques, now central to lexicography, enable lexicographers to produce higher-quality dictionaries at lower cost. This course introduces modern corpus lexicography, with a focus on monolingual dictionaries, using English as an exemplar. We discuss the need for large corpora, how to build them, and the key statistical corpus methods used in modern lexicography.

  • Computational Learning of Syntax

    Alexander Clark

    This course will look at the computational and mathematical theory of how grammars can be learned from strings. Any theory of language acquisition must ultimately rest on some solution to this problem.

  • Computational Lexical Semantics

    Dan Jurafsky

    Survey of computational models for representing and processing lexical semantics. Topics include semantic role labeling, online dictionaries and thesauri, word sense disambiguation, distributional (vector) semantics, and sentiment analysis.

  • Computational Minimalism

    Greg Kobele

    A precise formal understanding of a linguistic theory is vital for distinguishing between contentful and notational aspects of a linguistic proposal, for pinpointing cross-framework agreements and disagreements, and for making principled connections to other empirical domains.

    This course will present recent transformational syntax ("minimalism") in terms of Stabler's minimalist grammar (MG) formalism. To get a feel for the formalism, we will engage in a hands-on analysis of basic aspects of constructions like raising, auxiliaries, expletives, and passives.

  • Computational Phonology

    Jeffrey Heinz, Jason Riggle

    This course teaches foundational concepts in computer science and mathematical linguistics as they apply to phonology. This material is related to rule-based and constraint-based theories of phonology, including several varieties of SPE and OT, among them Harmonic Grammar. The course has two main foci. First, it will show how computational analysis allows the expressive power of the theories to be compared. Second, it will show how computational analysis can make significant inroads on problems relating to learning phonological patterns from data.

  • Computational Psycholinguistics

    Roger Levy, Klinton Bicknell

    Over the last two and a half decades, computational linguistics has been revolutionized as a result of three closely related developments: increases in computing power, the advent of large linguistic datasets, and a paradigm shift toward probabilistic modeling.

  • Continuations and Natural Language

    Chris Barker

    This course will make a case that continuations, a concept from the theory of programming languages, are an indispensable element in any complete account of natural language meaning. The continuation of an expression is a portion of its surrounding context. The main applications of continuations to be considered include scope, binding, crossover, reconstruction, negative polarity licensing, the compositional semantics of the adjective "same", and sluicing. The course will follow the 2014 book by Barker and Shan, 'Continuations and Natural Language' (Oxford).

  • Data-driven Computational Pragmatics

    Shlomo Engelson Argamon, Jonathan Dunn

    This course introduces data-driven computational pragmatics, an empirical approach to pragmatics that uses large amounts of linguistic data, annotated only computationally, to learn models describing pragmatic phenomena. Data-driven computational pragmatics offers two important advantages: (1) experiments that require no direct human intervention can be run on massive amounts of linguistic data; (2) subtle pragmatic phenomena that are below the level of consciousness of individual analysts can be detected and described.

  • Gradient Symbolic Computation

    Paul Smolensky, Matt Goldrick

    Classical, discrete representations (e.g., syntactic trees) have been the foundation for much of modern linguistic theory, providing key insights into the structure of linguistic knowledge and language processing. However, such frameworks fail to capture the gradient computational principles that underlie human cognition and behavior: not simply performance, but competence as well.

  • Introduction to Bilingualism

    Virginia Yip, Ping Li

    This course introduces theoretical and methodological issues in the study of bilingualism. The first half of the course focuses on bilingual acquisition in early childhood. We examine how children develop two languages in families where they are exposed to dual input from birth. The issues covered include language differentiation, cross-linguistic influence and code-mixing in bilingual development. Data from the language development of children acquiring Chinese and English, as well as other language pairs will be used for illustration.

  • Introduction to Computational Linguistics

    Sharon Goldwater

    This course provides an overview of the main methods and algorithms used in computational linguistics, motivated by some examples of questions they can be used to investigate. We will cover the basics of: information theory (entropy and mutual information), n-gram models (for computing the probabilities of phone or word sequences), finite-state automata and hidden Markov models, parsing algorithms, and distributional semantic models. In addition to lectures, we will include some hands-on labs in Python to help students gain practical experience with some of these concepts.

  • Introduction to Statistics with R

    Stefan Th. Gries

    This course introduces the participants to the basic logic underlying statistical description and analysis, teaches them how to compute and visualize descriptive statistics, and how to perform monofactorial statistical tests of linguistic data from both observational and experimental settings. The course will use the open source programming language and environment R and will be loosely based on Gries (2013), the second edition of my textbook 'Statistics for linguistics with R'.

  • Language Variation through the Lens of Web Data

    Sravana Reddy

    The rise of social media has resulted in an unprecedented quantity of user-generated data such as text on Twitter or speech and video on YouTube. This content is often associated with demographic information – the gender, geographic location, ethnicity, and social network connections of the author – which opens up the opportunity to study language variation from a corpus-based "big data" point of view.

  • Speech Technologies

    Karen Livescu

    This course will introduce techniques used in speech technologies, mainly focusing on automatic speech recognition (ASR). Speech recognition is one of the oldest and most complex sequence prediction tasks receiving significant research and commercial attention, and also a good example of the effectiveness of combining linguistic knowledge and speech science with statistics and machine learning. Course topics will include historical and phonetic background, acoustic features, dynamic time warping, hidden Markov models, statistical language models, and current research in ASR.

  • The Computational Theory of the Error-driven Ranking Model of the Acquisition of Phonotactics

    Giorgio Magri

    Nine-month-olds already display knowledge of native-language phonotactics: they react differently to licit versus illicit sound combinations. Children must thus rely on a remarkably efficient phonotactic learning procedure. What does it look like? Assume that the learner is provided with the typology of OT grammars corresponding to all rankings of a given constraint set. Data come in a stream and consist of licit phonological forms.

  • The Data Gold Rush: Exploiting freely available web data for linguistic research

    Andrew Wedel, Bodo Winter

    The web is full of freely available data that is just waiting to be explored by the capable analyst. In this course, we will survey some of the freely available web data sources and discuss linguistic research projects that have been conducted with them. We will emphasize the Buckeye Corpus, the Lexicon Projects, dictionary data, Google Ngram and the TV News Archive, as well as resources from less-studied languages. We discuss projects that relate to a broad range of linguistic topics, including speech/phonetics, semantics and gesture.

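The Introduction to Computational Linguistics description mentions n-gram models for computing the probabilities of word sequences, together with entropy. As a rough illustration only (not course material), here is a minimal Python sketch of a maximum-likelihood bigram model and the Shannon entropy of a unigram distribution. The toy corpus, the `<s>`/`</s>` boundary markers, and the function names are hypothetical, and no smoothing is applied, so any unseen bigram gets probability zero.

```python
import math
from collections import Counter

def bigram_probs(tokens):
    """Maximum-likelihood estimates of P(w2 | w1) from one token stream."""
    context_counts = Counter(tokens[:-1])           # how often each w1 has a successor
    pair_counts = Counter(zip(tokens, tokens[1:]))  # how often each (w1, w2) occurs
    return {(w1, w2): c / context_counts[w1] for (w1, w2), c in pair_counts.items()}

def sequence_prob(tokens, probs):
    """Product of bigram probabilities; unseen bigrams get 0 (no smoothing)."""
    p = 1.0
    for pair in zip(tokens, tokens[1:]):
        p *= probs.get(pair, 0.0)
    return p

def entropy(tokens):
    """Shannon entropy, in bits, of the unigram distribution over tokens."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy corpus treated as a single stream, so the boundary bigram ("</s>", "<s>") is counted too.
corpus = "<s> the cat sat </s> <s> the dog sat </s>".split()
probs = bigram_probs(corpus)
print(probs[("the", "cat")])                                 # 0.5
print(sequence_prob("<s> the cat sat </s>".split(), probs))  # 0.5
print(entropy(corpus))                                       # about 2.52 bits
```

Here "the" is followed by "cat" once and "dog" once, so P(cat | the) = 0.5, and every other bigram in the test sentence has probability 1 in this tiny corpus.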
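The Computational Lexical Semantics description lists distributional (vector) semantics. A minimal Python sketch of the basic idea, assuming raw co-occurrence counts within a two-word window and cosine similarity between the resulting count vectors, follows; practical systems usually reweight counts (for example with PMI) and use much larger corpora. The sentences, window size, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][sent[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sentences = [s.split() for s in [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks fell sharply today",
]]
vecs = cooccurrence_vectors(sentences)
print(cosine(vecs["cat"], vecs["dog"]))     # high: shared contexts "the", "drinks", "chases"
print(cosine(vecs["cat"], vecs["stocks"]))  # 0.0: no shared contexts
```

Words that occur in similar contexts end up with similar count vectors, which is the distributional hypothesis in its simplest form.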
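The Speech Technologies description lists dynamic time warping among its topics. The sketch below is a minimal Python version of the textbook DTW recurrence over one-dimensional sequences with an absolute-difference local cost. In speech recognition the sequences would be per-frame acoustic feature vectors with a vector distance, which this sketch does not model; the function name and example values are hypothetical.

```python
import math

def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Minimal cumulative alignment cost between sequences x and y."""
    n, m = len(x), len(y)
    # D[i][j] = cheapest cost of aligning the first i items of x with the first j of y
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # advance in x only
                                 D[i][j - 1],      # advance in y only
                                 D[i - 1][j - 1])  # advance in both
    return D[n][m]

template = [1, 2, 3, 4]
slowed = [1, 1, 2, 2, 3, 3, 4, 4]  # same pattern spoken "twice as slowly"
print(dtw_distance(template, slowed))        # 0.0
print(dtw_distance(template, [4, 3, 2, 1]))  # positive: patterns differ
```

Because the recurrence lets either sequence stall while the other advances, a template and a uniformly stretched copy of it align at zero cost, which is why DTW was useful for matching utterances spoken at different rates.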