
Language Variation through the Lens of Web Data



The rise of social media has resulted in an unprecedented quantity of user-generated data such as text on Twitter or speech and video on YouTube. This content is often associated with demographic information – the gender, geographic location, ethnicity, and social network connections of the author – which opens up the opportunity to study language variation from a corpus-based "big data" point of view.

This class will introduce relevant technologies in machine learning, text and signal processing, and statistics, with a view towards applying these methods to study language variation. For example, can we identify when a certain linguistic feature entered a community? Which gender was responsible for the adoption of that feature? What kinds of language contact phenomena are observed in mixed and immigrant populations in the US and elsewhere? Students will gain exposure to the relevant machine learning and statistical ideas, and practice writing programs to mine and analyze linguistic data from web sources.

Course Status: Closed

This course is currently at capacity.

Course Number:


Course Session:

First two-week Session


1:10 pm-3:00 pm