Language Processing
The knowledge of language needed to engage in complex language behavior can be separated into six distinct categories :
- morphology : the way words are built up from small meaning-bearing units
- syntax : the structural relationships between words
- semantics : the meaning of words and sentences
- phonetics and phonology : linguistic sounds
- pragmatics : how language is used to accomplish goals
- discourse : the study of linguistic units larger than a single utterance
In addition, language processing must deal with ambiguity, which means that multiple alternative linguistic structures can be built for a single input.
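One concrete form of ambiguity in Chinese, where words are written without spaces, is segmentation ambiguity. The Python sketch below enumerates every way to split a string into words from a small lexicon. The lexicon and the `all_segmentations` helper are hypothetical and for illustration only; they are not taken from the CKIP lexicon or any tool mentioned on this page. On the classic phrase 南京市长江大桥 the sketch finds the intended reading 南京市/长江大桥 ("Nanjing Yangtze River Bridge") alongside alternatives such as 南京/市长/江大桥 ("Nanjing mayor Jiang Daqiao").

```python
from functools import lru_cache

# Hypothetical toy lexicon; "江大桥" stands for a person's name.
LEXICON = {"南京", "南京市", "市长", "长江", "大桥", "江", "长江大桥", "江大桥"}


def all_segmentations(text: str, lexicon: set[str]) -> list[list[str]]:
    """Return every way to split `text` into words found in `lexicon`."""
    max_len = max(map(len, lexicon))

    @lru_cache(maxsize=None)
    def segment_from(start: int) -> tuple[tuple[str, ...], ...]:
        # Memoized: all segmentations of the suffix text[start:].
        if start == len(text):
            return ((),)  # exactly one way to segment the empty suffix
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in segment_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in segment_from(0)]


if __name__ == "__main__":
    for seg in all_segmentations("南京市长江大桥", LEXICON):
        print("/".join(seg))
    # 南京/市长/江/大桥
    # 南京/市长/江大桥
    # 南京市/长江/大桥
    # 南京市/长江大桥
```

Memoizing on the start position keeps the search polynomial in the length of the input even when the number of segmentations grows quickly.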
One should also take into account the following tasks :
- word segmentation
- POS (part-of-speech) tagging
- phrase identification
- parsing
- grammar development
- lexicon acquisition
- corpus development
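Word segmentation, the first task listed above, is often introduced through dictionary-based forward maximum matching: scan the text left to right and, at each position, take the longest word found in the lexicon. The Python sketch below shows that common baseline. It is not the method of CKIP or of any segmentation standard mentioned here, and the toy lexicon and the default `max_len` of 4 are hypothetical choices made for illustration.

```python
# Hypothetical toy lexicon for illustration only.
LEXICON = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥",
           "研究", "研究生", "生命", "起源"}


def forward_maximum_matching(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word; characters not covered by any word become single tokens."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


if __name__ == "__main__":
    print(forward_maximum_matching("南京市长江大桥", LEXICON))  # ['南京市', '长江大桥']
    print(forward_maximum_matching("研究生命起源", LEXICON))    # ['研究生', '命', '起源']
```

The second call shows the weakness of greedy matching: the intended reading of 研究生命起源 is 研究/生命/起源 ("study the origin of life"), but the matcher commits to the longer word 研究生 ("graduate student") first. This is why segmentation interacts with the ambiguity problem described earlier.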
1992 : Segmentation Standard, announcement by the PRC government of the first national standard for word segmentation (GB 13715).
1993 : Lexicon, completion and release of the first version of the CKIP lexicon (with the category set and ICG thematic roles); first version of K. Chen's parser for Chinese.
1998 : Segmentation Standard, official announcement of CNS14366 for Taiwan.
2000 : Treebanks, simultaneous completion and announcement of two Chinese treebanks :
* Penn Chinese Treebank (see LDC)
* Sinica Treebank
LDC : Linguistic Data Consortium