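Basic Paper Information

Document type
Thesis
Author

윤영신 (Hallym University, Hallym University Graduate School)

Advisor
김유섭
Year of publication
2017
Copyright
Hallym University theses are protected by copyright.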

Abstract

In machine-learning-based natural language processing, how words are represented as model input is very important. Previous studies have used one-hot representations, in which the vectors are large and the features that make up each vector are assumed to be completely independent of one another. Word embedding, by contrast, analyzes the correlations among these features and represents words with a new, smaller set of features, which can improve the performance of various natural language processing models. Its effectiveness has also been demonstrated in a number of Bio-NLP (Biomedical Natural Language Processing) areas. In this thesis, we use word embedding to calculate the similarity between certain disease markers and certain diseases. Using the similarity results, we analyze whether, and how strongly, specific diseases and ovarian cancer biomarker/microorganism word pairs are related.
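The contrast between one-hot and embedded representations can be illustrated with a minimal sketch. The three-word vocabulary and the two-dimensional dense vectors below are hypothetical values chosen for illustration; they are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vocabulary (assumption, for illustration only).
vocab = ["cancer", "tumor", "bacteria"]


def one_hot(word):
    """One dimension per vocabulary word; exactly one entry is 1."""
    return [1.0 if w == word else 0.0 for w in vocab]


# Hypothetical dense embeddings with fewer dimensions than the vocabulary.
dense = {
    "cancer": [0.9, 0.1],
    "tumor": [0.8, 0.2],
    "bacteria": [0.1, 0.9],
}

# Distinct one-hot vectors are orthogonal, so every pair looks unrelated.
one_hot_sim = cosine(one_hot("cancer"), one_hot("tumor"))

# Dense vectors can place related words close together.
dense_sim_related = cosine(dense["cancer"], dense["tumor"])
dense_sim_unrelated = cosine(dense["cancer"], dense["bacteria"])
```

With one-hot vectors, any two different words have similarity zero, so no relatedness can be read from the representation itself; the dense vectors allow graded similarity.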
The target text consists of the titles and abstracts of PubMed biomedical papers. First, the titles and abstracts of papers related to ovarian cancer biomarkers or microorganisms are extracted. Among established word embedding models, this study uses CCA (Canonical Correlation Analysis). CCA finds k-dimensional projection vectors that maximize the correlation between the representation of a word and the representation of its context. These vectors are mapped to two dimensions with t-SNE and visualized, and the results are used to calculate the similarity between disease names and ovarian cancer biomarkers/microorganisms. Based on the similarity results, we analyze the relationship between each disease and each biomarker/microorganism. For the 20 disease and biomarker/microorganism pairs with the highest similarity, we search Google Scholar for related papers to check whether each pair has actually been studied, and we identify the pairs that have not. In the experiment, for 85% of the high-similarity pairs, research on the relationship between the disease and the marker had actually been carried out, while the remaining 15% of pairs had not been well studied. For low-similarity pairs, all but 15% of the pairs were found not to be well researched, in line with their similarity scores. The relationship between diseases and microorganisms is analyzed in the same way as for diseases and biomarkers. The results showed that for 85% of the high-similarity disease-microorganism pairs, research on the relationship between the paired words is actually under way.
When the relationship between disease and biomarker/microorganism word pairs is analyzed based on the similarity scores, the analysis works well and indicates that the paired words are related to each other.
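The ranking step described above (cosine similarity between disease vectors and biomarker/microorganism vectors, followed by selection of the top-scoring pairs) might look roughly like the following sketch. The function names, the placeholder words, and the three-dimensional vectors are assumptions made for illustration; the thesis's actual CCA embeddings and code are not shown on this page.

```python
import math
from itertools import product


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_pairs(disease_vecs, marker_vecs, top_k=20):
    """Score every (disease, marker) pair and return the top_k by similarity."""
    scored = [
        (cosine_similarity(d_vec, m_vec), disease, marker)
        for (disease, d_vec), (marker, m_vec) in product(
            disease_vecs.items(), marker_vecs.items()
        )
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings (placeholder names and values, not real results).
disease_vecs = {
    "disease_a": [0.9, 0.1, 0.0],
    "disease_b": [0.0, 0.2, 0.9],
}
marker_vecs = {
    "marker_x": [0.8, 0.2, 0.1],
    "marker_y": [0.1, 0.1, 0.9],
}

top_pairs = rank_pairs(disease_vecs, marker_vecs, top_k=2)
for score, disease, marker in top_pairs:
    print(f"{disease} / {marker}: {score:.3f}")
```

In the setting the abstract describes, `top_k` would be 20 and each returned pair would then be checked manually against the literature.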

Table of Contents

Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Overview
1.2 Word Embedding
1.2.1 Word2Vec
1.2.2 GloVe
1.2.3 CCA (Canonical Correlation Analysis)
2. Related Work
2.1 Bio-NLP
2.2 Related Studies
3. Disease Analysis
3.1 Data
3.1.1 Biomarker Analysis Data
3.1.2 Microorganism Analysis Data
3.2 Disease Analysis Method
3.2.1 Data Extraction
3.2.2 Word Embedding
3.2.3 Cosine Similarity Calculation
3.2.4 Similarity Result Analysis
3.3 Biomarker Analysis
3.4 Microorganism Analysis
4. Experimental Results
4.1 Biomarker Analysis Results
4.2 Microorganism Marker Analysis Results
5. Conclusion
References
Abstract in Korean
Publications
