Standard No. | TTAK.KO-10.1100 | Former standard No. | |
---|---|---|---|
Date enacted/revised | 2018-12-19 | Total pages | 20 |
Korean title | 대용량 텍스트 데이터 처리 효율 개선을 위한 형태소 품사 태그 세트 | ||
English title | Part-of-Speech Tag Set for Improving the Processing Efficiency of Large-scale Text Data | ||
Summary (translated from Korean) | Morphological analyzers often use the part-of-speech (POS) tag set for corpus construction, established in 2015, as is. From a practical natural language processing standpoint, however, POS tags based on vocabulary usage are easier to classify than POS tags based on form and meaning, and this difference also affects higher-level language processing (named entity recognizers, syntactic parsers) that takes morphological analysis results as features. To reduce the POS ambiguity of each morpheme in a morphological analyzer, this standard revises the POS tag set to center on vocabulary usage. In particular, the POS of morphemes that occur frequently but have few headwords and allow ambiguous readings, namely copulas (~이다, ~아니다), particles (nominative and complement case particles, adverbial case particles and conjunctive particles), endings, and roots, were revised and merged, giving a tag set of 38 POS tags in total. The standard also provides a POS tag set for named entity recognition and a POS tag set for syntactic parsing. | ||
Summary (English) | Morphological analyzers often use the POS tag set defined in 2015 for corpus annotation as is. However, from a natural language processing perspective, a vocabulary-usage-based POS tag is easier to classify than a form-semantic-based POS tag, and this affects higher-level language processors such as named entity recognizers and dependency parsers. To reduce ambiguity in the morphological analyzer, this standard changes the POS tag set from a form-semantic basis to a vocabulary-usage basis. In particular, the POS of copula, postposition, ending, and root morphemes, which appear frequently but have few headwords, were revised and integrated, giving a set of 38 POS tags. The standard also provides POS tag sets for named entity recognition and dependency parsing. | ||
International standard | |||
Related file | TTAK.KO-10.1100.pdf |