[Academic Journal]

김현중 (Seoul National University), 조성준 (Seoul National University), 강필성 (Seoul National University of Science and Technology)

UCI(KEPA) : I410-ECN-0101-2015-500-001128203

Abstract

A word is the smallest unit of text analysis, and most text-mining algorithms rest on the premise that the words in a given document can be recognized perfectly. However, newly coined words, spelling and spacing errors, and domain adaptation problems make it difficult to recognize words correctly. To make matters worse, obtaining enough training data to cover every situation is both unrealistic and inefficient. An automatic word extraction method that requires no training process is therefore needed. WordRank, the most widely used unsupervised word extraction algorithm for Chinese and Japanese, performs poorly on Korean because of differences in language structure. In this paper, we first discuss why WordRank performs poorly on Korean, and then propose a customized WordRank algorithm for Korean, named KR-WordRank, that takes the linguistic characteristics of Korean into account and is more robust to noise in text documents. Experimental results show that KR-WordRank performs significantly better than the original WordRank on Korean. In addition, we find that the proposed algorithm not only extracts proper words but also identifies candidate keywords for effective document summarization.

Table of Contents

1. Introduction
2. WordRank
3. KR-WordRank
4. Performance Evaluation of KR-WordRank
5. Conclusion
References
