The Data Gold Rush: Exploiting freely available web data for linguistic research

The web is full of freely available data waiting to be explored by the capable analyst. In this course, we will survey a range of these web data sources and discuss linguistic research projects that have been conducted with them. We will emphasize the Buckeye Corpus, the Lexicon Projects, dictionary data, Google Ngram and the TV News Archive, as well as resources for less-studied languages. The projects we discuss relate to a broad range of linguistic topics, including speech/phonetics, semantics and gesture. As a crucial part of this survey, we will provide background on quantitative methods that are helpful in analyzing these kinds of data. The course acts as a launch pad for generating research ideas with web-based data by exploring what has been done, and what could still be done, with legacy data.

Course Status: Closed

This course is currently at capacity. Log in to be added to the course's waiting list.

Course Number:

336

Course Session:

First two-week Session

Times:

Monday: 3:10 pm-5:00 pm
Thursday: 3:10 pm-5:00 pm

Instructor(s):

Prerequisites:

No programming skills are required, but participants with programming skills (in particular, R or Python) will be able to take on more involved projects. The course is appropriate for both undergraduate and graduate students with a good background in basic linguistics.