| Title | : | Cross-lingual genre classification |
| Researcher | : | Petrenz, Philipp |
| Keywords | : | genre, cross-lingual, text classification |
| Institution | : | Edinburgh Research Archive, United Kingdom |
| Contributors | : | Webber, Bonnie; Lavrenko, Victor; Engineering and Physical Sciences Research Council (EPSRC); Google Research Award |
| Year of publication | : | 2557 BE (2014 CE) |
| Reference | : | http://hdl.handle.net/1842/9658 |
| Source | : | - |
| Expertise | : | - |
| Relation | : | Petrenz, P. (2012). Cross-lingual genre classification. In Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 11–21. Association for Computational Linguistics. Petrenz, P. and Webber, B. (2011). Stable classification of text genres. Computational Linguistics, 37:385–393. Petrenz, P. and Webber, B. (2012). Label propagation for fine-grained cross-lingual genre classification. In Proceedings of the NIPS Workshop on Cross-Lingual Technologies (xLiTe). Petrenz, P. and Webber, B. (2012). Robust cross-lingual genre classification through comparable corpora. In The 5th Workshop on Building and Using Comparable Corpora, page 1. |
| Scope of content | : | - |
| Abstract/Description | : | Automated classification of texts into genres can benefit NLP applications, in that the structure, location and even interpretation of information within a text are dictated by its genre. Cross-lingual methods promise such benefits to languages which lack genre-annotated training data. While there has been work on genre classification for over two decades, none had considered cross-lingual methods before the start of this project. My research aims to fill this gap. It follows previous approaches to monolingual genre classification that exploit simple, low-level text features, many of which can be extracted in different languages and have similar functions. This contrasts with work on cross-lingual topic or sentiment classification of texts, which typically uses word frequencies as features. These have been shown to have limited use when it comes to genres. Many such methods also assume cross-lingual resources, such as machine translation, which limits the range of their application. A selection of these approaches is used as baselines in my experiments. I report the results of two semi-supervised methods for exploiting genre-labelled source language texts and unlabelled target language texts. The first is a relatively simple algorithm that bridges the language gap by exploiting cross-lingual features and then iteratively re-trains a classification model on previously predicted target texts. My results show that this approach works well where only a few cross-lingual resources are available and texts are to be classified into broad genre categories. It is also shown that further improvements can be achieved through multi-lingual training or cross-lingual feature selection if genre-annotated texts are available in several source languages. The second is a variant of the label propagation algorithm. This graph-based classifier learns genre-specific feature set weights from both source and target language texts and uses them to adjust the propagation channels for each text. This allows further feature sets to be added as additional resources, such as part-of-speech taggers, become available. While the method performs well even with basic text features, it is shown to benefit from additional feature sets. Results also indicate that it handles fine-grained genre classes better than the iterative re-labelling method. |
| Bibliography | : |
Petrenz, Philipp. (2557 BE). Cross-lingual genre classification. Bangkok: Edinburgh Research Archive, United Kingdom.
Petrenz, Philipp. 2557 BE. "Cross-lingual genre classification". Bangkok: Edinburgh Research Archive, United Kingdom.
Petrenz, Philipp. "Cross-lingual genre classification." Bangkok: Edinburgh Research Archive, United Kingdom, 2557 BE. Print.
Petrenz, Philipp. Cross-lingual genre classification. Bangkok: Edinburgh Research Archive, United Kingdom; 2557 BE.
|
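
The first method described in the abstract (bridge the language gap with features shared across languages, then iteratively re-train on previously predicted target texts) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the nearest-centroid classifier, the two-dimensional feature vectors, the genre names, and the function names `fit_centroids`, `predict` and `self_train` are hypothetical and are not taken from the thesis, whose actual classifier and feature sets are not specified in this record.

```python
from collections import defaultdict
from math import dist


def fit_centroids(vectors, labels):
    """Return {label: mean feature vector} for a nearest-centroid model."""
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vectors):
    """Assign each vector the label of its closest centroid."""
    return [
        min(centroids, key=lambda label: dist(centroids[label], vec))
        for vec in vectors
    ]


def self_train(source_x, source_y, target_x, max_rounds=10):
    """Label target texts with a source-trained model, then re-train on them."""
    # Step 1: bridge the language gap with a model trained on source texts only.
    centroids = fit_centroids(source_x, source_y)
    target_y = predict(centroids, target_x)
    for _ in range(max_rounds):
        # Step 2: re-train on the target texts using the labels predicted so far.
        # A genre that received no target text keeps its previous centroid.
        centroids = {**centroids, **fit_centroids(target_x, target_y)}
        new_y = predict(centroids, target_x)
        if new_y == target_y:  # stop once the predicted labels are stable
            break
        target_y = new_y
    return target_y


# Hypothetical language-independent features, e.g. (sentence length, punctuation rate).
source_x = [(1.0, 1.0), (1.2, 0.8), (3.0, 3.0), (3.2, 2.8)]
source_y = ["news", "news", "fiction", "fiction"]
target_x = [(1.8, 1.7), (2.0, 1.9), (2.3, 2.1), (4.0, 4.1), (4.2, 3.9), (3.9, 4.2)]
print(self_train(source_x, source_y, target_x))
```

In this toy data the target-language texts are shifted relative to the source-language ones, so the source-trained model alone mislabels the third target text; re-training on the predicted target labels moves the centroids towards the target distribution. Whether such a shift is typical of real cross-lingual genre data is not something this sketch can show.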
