
TTA Standards Status


Standard No. TTAK.KO-10.1100 Former Standard No.
Date of enactment/revision 2018-12-19 Total pages 20
Korean title 대용량 텍스트 데이터 처리 효율 개선을 위한 형태소 품사 태그 세트
English title Part-of-Speech Tag Set for Improving the Processing Efficiency of Large-scale Text Data
Korean summary (translated) Morphological analyzers often reuse, unchanged, the POS tag set for corpus construction that was established in 2015. From a practical natural language processing standpoint, however, POS tags based on lexical usage are easier to classify than POS tags based on form and meaning, and this difference also affects higher-level language processing that takes morphological analysis results as features (named entity recognizers, syntactic parsers). To reduce the POS ambiguity of each morpheme in a morphological analyzer, this standard revises the POS tag set to center on lexical usage. In particular, it revises and merges the POS of morphemes that occur frequently but have few headwords and allow ambiguous readings: copulas (~이다, ~아니다), particles (nominative and complement case particles; adverbial case particles and conjunctive particles), endings, and roots. The result is a set of 38 POS tags in total. The standard also provides a POS tag set for named entity recognition and a POS tag set for syntactic parsing.
English summary Morphological analyzers often use the POS tag set established in 2015 for corpus annotation as is. However, from a natural language processing standpoint, vocabulary-usage-based POS tags are easier to classify than form-semantic-based POS tags, and this affects higher-level language processors such as named entity recognizers and dependency parsers. To reduce ambiguity in the morphological analyzer, this standard modifies the POS tag set from a form-semantic basis to a vocabulary-usage basis. In particular, the POS of morphemes for copulas, postpositions, word endings, and roots, which appear frequently but have few headwords, were revised and integrated into a set of 38 POS tags. The standard also provides POS tag sets for named entity recognizers and dependency parsers.
International standard
Related file TTAK.KO-10.1100.pdf
