A Corpus-based Approach to Building Ontology of an Endangered Language

The half-day workshop on “A corpus-based approach to building ontology of an endangered language” will be presented by the faculty and graduate students of the Yami research team at the Institute of Linguistics, National Chung Cheng University and the Department of Computer Science and Communication Engineering, Providence University in Taiwan. The participants will learn about the linguistic issues and computer techniques in corpus linguistics from an endangered language documentation team.

Please pre-register for the workshop here: https://docs.google.com/a/gm.pu.edu.tw/forms/d/1PVMCtOnQ3pih6YBORAn0Eh9b...

The workshop schedule is as follows.

9:30-10:50 Part One: Ontology

1. Introduction, Online dictionary and ontology building for Austronesian languages in Taiwan, Victoria Rau (5 minutes)
2. A corpus-based approach to the classification of Yami emotion, Victoria Rau (15 minutes)
3. Constructing fish culture from Yami language documentation corpus, Yi-Hsin Catherine Wu (15 minutes)
4. Categorization and conceptualization of body parts in Yami, Ann Hui-Huan Chang (15 minutes)
5. Comments by invited discussants: Anthony Woodbury, Lenore Grenoble, Keren Rice, and Martin Haspelmath (30 minutes)

10:50-11:00 Break

11:00-12:00 Part Two: Technology
6. On developing online lexical resources for endangered languages in Taiwan: An example of Yami, Meng-Chien Yang (15 minutes)
7. Constructing high quality audio learning materials using Lexique Pro, Ann Chang (15 minutes)
8. Comments by invited discussants: Anthony Woodbury, Lenore Grenoble, Keren Rice, and Martin Haspelmath (20 minutes)
9. Q & A (10 minutes)

Online Dictionary and Ontology Building for Austronesian Languages in Taiwan
Victoria Rau (1), Meng-Chien Yang (2), Hui-Huan Ann Chang (1), and Maa-neu Dong (3)
(1) National Chung Cheng University, (2) Providence University, and (3) National Museum of Natural Sciences
Abstract
This paper provides a model of language documentation and conservation in Taiwan to illustrate how online dictionaries have been produced by a collaborative team, and how technology has been used in the process to create a formalized model of existing indigenous knowledge. Our interactions with the Yami community over the past decade have led us to believe that a cooperation framework involving three groups of experts provides necessary “scaffolding” before an “egalitarian” wiki style of online dictionary or ontology building can be attempted. In addition, ontology building requires triangulation of various sources of human interpretation. It is not possible to build an ontology based only on sophisticated machine reasoning. We hope this collaboration can serve as a feasible model for future projects in language revitalization and capacity building.

A corpus approach to Yami emotion
Victoria Rau (1), Yi-Hsin Wu (1), and Meng-Chien Yang (2)
(1) National Chung Cheng University and (2) Providence University
Abstract
This study aims to demonstrate a corpus approach to identifying emotion in Yami. Following Huang’s (2002) study on emotion in Tsou, which focused on a grammatical model to conceptualize emotion concepts, we began our study by extracting all 1763 tokens of ika- from our Yami language documentation website (http://yamiproject.cs.pu.edu.tw/yami/database.htm) and identifying the set of lexical items related to emotion. The extraction of all the ika- tokens helped us identify the construction meaning (Goldberg 1995) of ika- as “the reason/cause for a certain feeling or state.”
The final set of 126 emotion terms with the ika- prefix, drawn from 166 texts in the Yami corpus, was classified based on Clore et al.’s (1987) taxonomy of Filipino emotion terms. Yami emotion constitutes three internal conditions: affective, cognitive, and physical, with the three affective conditions and the two cognitive conditions “hypercognized.” This finding supported Church et al.’s recommendation that the terms in all three subcategories of affective-cognitive states in Clore et al.’s (1987) taxonomy of emotion terms be viewed as referring to emotions. Overall, Yami negative emotion is much more finely lexicalized than positive emotion. This methodology can be readily applied to future studies of emotion in other Austronesian languages.
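The extraction step described in this abstract can be illustrated with a minimal sketch, assuming the corpus texts are available as plain strings. The function name, the regular expression, and the sample strings below are hypothetical placeholders for illustration; they are not the team's actual tooling and not real Yami data.

```python
import re
from collections import Counter


def extract_prefixed_tokens(texts, prefix="ika"):
    """Count word forms that begin with `prefix` (case-insensitive).

    Assumption: words are runs of letters, digits, or underscores; a real
    Yami orthography may need a different tokenization rule.
    """
    pattern = re.compile(rf"\b{re.escape(prefix)}\w+", re.IGNORECASE)
    counts = Counter()
    for text in texts:
        for form in pattern.findall(text):
            counts[form.lower()] += 1
    return counts


# Invented placeholder strings, NOT real Yami words or sentences.
sample_texts = [
    "ikafoo word ikabar",
    "another Ikafoo here, and mika",
]

counts = extract_prefixed_tokens(sample_texts)
total_tokens = sum(counts.values())   # number of tokens (running words)
distinct_types = len(counts)          # number of distinct word forms
```

Separating token counts from distinct word forms mirrors the difference between the full set of ika- tokens and the smaller set of lexical items that are later reviewed by hand for emotion meaning.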

Constructing fish culture from Yami language documentation corpus
Yi-Hsin Catherine Wu and Victoria Rau
Institute of Linguistics, National Chung Cheng University
Abstract
Fish constitute an integral part of the Yami maritime culture of Orchid Island. Most previous studies on Yami fish culture have been conducted in the field of anthropology, and until recently no corpus-based linguistic analysis had been attempted due to the lack of a database.
This study aimed to construct a fish ontology using the Yami corpus built during the Yami language documentation project (http://yamiproject.cs.pu.edu.tw). We started by selecting all the paragraphs containing the keyword “fish” and coded the contents into categories, such as taboo, food, and fishing techniques. After categorization, we drew tree diagrams to show the relationships of hyponymy. We also consulted Yami fishermen about the face validity of our preliminary tree structure and revised the final analysis by adding one major branch at the top of the diagram to distinguish flying fish from other fish.
The method of using discourse analysis to build a Yami fish ontology from the bottom up has provided a near-complete picture of Yami fish culture. The same approach can be used to discover relationships in other domains of Yami culture, paving the way for future investigation of ontology using corpus analysis.
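The workflow in this abstract (select paragraphs by keyword, code them into categories, arrange the result as a tree) can be sketched roughly as follows. This is only an illustration under stated assumptions: the cue words, the sample English paragraphs, and the function names are hypothetical, and the coding in the study was done by the researchers and checked with Yami fishermen, not by automatic matching.

```python
import re


def select_paragraphs(paragraphs, keyword="fish"):
    """Keep paragraphs that contain the keyword (case-insensitive substring)."""
    return [p for p in paragraphs if keyword.lower() in p.lower()]


# Hypothetical cue words per category; only the category names
# (taboo, food, fishing techniques) come from the abstract.
CATEGORY_CUES = {
    "taboo": {"taboo", "forbidden"},
    "food": {"eat", "cook"},
    "fishing techniques": {"net", "hook"},
}


def code_paragraph(paragraph):
    """Return the sorted list of categories whose cue words occur as whole words."""
    words = set(re.findall(r"\w+", paragraph.lower()))
    return sorted(cat for cat, cues in CATEGORY_CUES.items() if words & cues)


def render(tree, depth=0):
    """Flatten a nested dict into indented lines, one line per node."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


# Invented English sample paragraphs, not corpus data.
paragraphs = [
    "It is taboo to cook flying fish with other fish.",
    "The men repaired the net before the fishing season.",
    "The boat was painted red and white.",
]

selected = select_paragraphs(paragraphs)
coded = {p: code_paragraph(p) for p in selected}

# Top-level split reported in the abstract: flying fish versus other fish.
fish_tree = {"fish": {"flying fish": {}, "other fish": {}}}
tree_lines = render(fish_tree)
```

A nested dictionary is used for the tree because each key can hold its own sub-branches, so categories found during coding can be attached under either top-level branch as the analysis is revised.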

Categorization and conceptualization of body parts in Yami
Ann Hui-Huan Chang
Institute of Linguistics, National Chung Cheng University
Abstract
This study provides a description and analysis of body part terminology used in Yami, a Philippine language spoken on Orchid Island with a population of about 4000. The first part of the study presents an inventory of Yami body part terms, which is used to examine how Yami people organize and conceptualize body parts. Then, the relations between Yami body part terms are analyzed to discuss their partonomic relations, i.e., part-whole relations between body parts. The paper also addresses how body part categories interact with semantic categories such as metaphor and metonymy, which supports Hilpert’s (2007) conceptual mappings and Lakoff and Johnson’s (1980) conceptual metaphor theory. There are three semantic extensions: 1. body-part based measures (e.g. rokap ‘palm’ > ‘the width of a palm’), 2. spatial orientation (e.g. likod ‘back (body)’ > ‘behind, in back’), and 3. metaphorical configuration (e.g. ai ‘foot’ > ‘pillar’). Finally, the study shows that Yami people segment their body into five major parts: oo ‘head’, lima ‘hand’, kataotao ‘upper body’, ai ‘leg’, and likod ‘back’. Thus, Yami hierarchical partonomy has no more than five levels, which supports Andersen’s (1978) ‘depth principle,’ claiming that the hierarchical part-whole relation of body parts rarely exceeds five levels, and never six.
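The notion of partonomic depth can be made concrete with a small sketch. The five major parts below are taken from the abstract; the root node "body", the placement of rokap ‘palm’ under lima ‘hand’, and the function name are assumptions made only for illustration and are not the paper's actual partonomy.

```python
# Nested dict: each key is a body part term, each value holds its sub-parts.
partonomy = {
    "body": {                      # assumed root node
        "oo": {},                  # 'head'
        "lima": {"rokap": {}},     # 'hand' > 'palm' (placement is an assumption)
        "kataotao": {},            # 'upper body'
        "ai": {},                  # 'leg'
        "likod": {},               # 'back'
    }
}


def depth(tree):
    """Number of levels in a nested-dict tree (an empty tree has 0 levels)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


MAX_LEVELS = 5  # the limit stated for Yami in the abstract

levels = depth(partonomy)
within_limit = levels <= MAX_LEVELS
```

With a fuller inventory of terms, the same check would show whether any chain of part-whole relations goes beyond the five levels discussed in the abstract.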

On developing online lexical resources for endangered languages in Taiwan: An example of Yami
Meng-Chien Yang (1) and Victoria Rau (2)
(1) Providence University and (2) National Chung Cheng University
Abstract
Online lexical resources play an important role in language processing in the information era. However, it is very costly to establish such resources for a target language, especially when the language is endangered. This paper describes a framework and guidelines for developing online lexical resources for endangered languages in Taiwan, using Yami as an example of a language we have curated over the past two decades. We also describe considerations specific to endangered languages. In addition, we discuss reasons why endangered languages need online lexical resources.

Constructing high quality audio learning materials using Lexique Pro
Ann Hui-Huan Chang (1), Victoria Rau (1), and Maa-neu Dong (2)
(1) National Chung Cheng University and (2) National Museum of Natural Sciences
Abstract
Applying modern information technologies to audio documentation and conservation of endangered languages is now one of the most urgent missions in documentary linguistics. The purpose of this paper is to introduce how the Yami research team applied two essential features of Lexique Pro to create audiovisual supplementary material for the forthcoming publication The Teacher’s Grammar of Yami. This paper describes how to construct both personal and online versions of audio entries, as well as example sentences pertaining to the book. The resulting materials not only offer reusable documentary records of linguistic data but can also be openly shared with a wider audience. We discuss (1) how to select suitable audio equipment for the highest possible quality recording of speech, (2) how to use a digital audio editor to edit sound recordings, and (3) how to edit the audio data and integrate them with the text to construct high quality audio learning materials. These procedures offer an example of best practices for documenting endangered languages.

References
Chang, Ann Hui-Huan. (2014). Categorization and conceptualization of body parts in Yami. Providence Forum: Language and Humanities 15.1:183-207.
Chang, Hui-Huan, Victoria Rau, and Maa-neu Dong. (2015). Constructing high quality audio learning materials using Lexique Pro. Electronic poster presented at ICLDC 4, February 26-March 1, 2015, University of Hawaii at Manoa.
Rau, Victoria, Yi-Hsin Catherine Wu, and Meng-Chien Yang. (2015 forthcoming). A corpus-based approach to the classification of Yami emotion. In E. Zeitoun, Stacy Teng, and Joy Wu (Eds.), New Advances in Formosan Languages. A-PL.
Rau, D. Victoria, Meng-Chien Yang, Ann Hui-Huan Chang, and Maa-neu Dong. (December 2009). Online dictionary and ontology building for Austronesian languages in Taiwan. Journal of Language Documentation and Conservation, University of Hawaii 3.2:207-224. (http://www.ccunix.ccu.edu.tw/~lngrau/_private/A-2.pdf)
Wu, Yi-hsin Catherine, and Victoria Rau. (2015 forthcoming). Constructing fish culture from Yami language documentation corpus (in Chinese). In Tseng, Ming-yu (Ed.), Language and Material. Kaohsiung: National Sun Yat-sen University.
Yang, Meng-Chien, and D. Victoria Rau. (2015). On developing online lexical resources for endangered languages in Taiwan: An example of Yami. Unpublished manuscript.

Websites:
1. Digital archiving Yami language documentation
http://yamiproject.cs.pu.edu.tw/yami/
2. Yami e-learning
http://yamiproject.cs.pu.edu.tw/elearn/
3. Yami online dictionary
http://yamibow.cs.pu.edu.tw/index_en.htm; http://yamionto.cs.pu.edu.tw/tao_dict/lexicon/main.htm
4. Yami audio learning materials (in Chinese)
http://www.ccunix.ccu.edu.tw/~lngrau/TAO_Teaching_Web/taoteaching.html

Date:

Saturday, July 11, 2015 - 9:30am to 12:30pm

Location:

Franke Institute
Regenstein S-102
1100 E. 57th St.
Chicago, IL 60637

Speakers / Organizers:

Victoria Rau