
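Basic paper information

Document type
Academic journal
Author information
(Seoul National University) (Seoul National University) (Seoul National University of Science and Technology)
Journal information
Korean Institute of Industrial Engineers, Journal of the Korean Institute of Industrial Engineers, Vol. 40, No. 1
Publication year
Pages
18-33 (16 pages)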


Abstract · Keywords

The word is the smallest unit of text analysis, and most text-mining algorithms assume that the words in a given document can be recognized perfectly. However, newly coined words, spelling and spacing errors, and domain adaptation problems make it difficult to recognize words correctly. To make matters worse, obtaining enough training data to cover every situation is both unrealistic and inefficient. An automatic word extraction method that requires no training process is therefore needed. WordRank, the most widely used unsupervised word extraction algorithm for Chinese and Japanese, performs poorly on Korean because of differences in language structure. In this paper, we first discuss why WordRank performs poorly on Korean, and then propose KR-WordRank, a WordRank algorithm customized for Korean that takes the language's linguistic characteristics into account and is more robust to noise in text documents. Experimental results show that KR-WordRank performs significantly better than the original WordRank on Korean. In addition, the proposed algorithm not only extracts proper words but also identifies candidate keywords for effective document summarization.

Table of contents

  1. Introduction
  2. WordRank
  3. KR-WordRank
  4. Performance Evaluation of KR-WordRank
  5. Conclusion
  References

UCI(KEPA) : I410-ECN-0101-2015-500-001128203