Text Mining of CEO Letters: A Study of Specialized Text Types Using German Companies as a Case

방경원

DOI: https://dx.doi.org/10.52743/HR.62.13

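Document type: Academic journal article

Author: 방경원 (Hankuk University of Foreign Studies)

Journal: 인문학연구, No. 62 (Chosun University Institute of Humanities)

Published: August 2021

Pages: 371-402 (32 pages)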

Abstract and Keywords

This paper aims to characterize a text type, the CEO's letter to shareholders, by the topics it addresses, using text mining techniques. As a study in German linguistics, it draws on data from German companies. Text mining is one of the techniques for processing big data, and now that big data methods can be applied to text, it is being tried out in a wide range of academic fields. It is also of interest to text linguists. Words are the only entities that can be identified directly within a text, which is why they have become a central concern for researchers who study the production and reception of texts from a practical perspective. Characterizing the topics of CEO letters through word frequencies and associations between words is therefore expected to contribute to the practical study of texts as well.

CEO letters collected from 35 German companies served as the source data. The data were converted into a corpus, cleaned, and then converted into a document-term matrix (DTM) object, using the quanteda package in R. The prepared data object was visualized as a word cloud and bar graphs based on word frequency. The highest-frequency words were grouped by topic: agents and related parties of rational action, time, business performance, comparison, domain, and rational action. These topics are regarded as a cross-section of the world as it appears in CEO letters. Next, the connections between words were mapped as a co-occurrence network, which yielded a network centered on kunden (customers), a related party of rational action. A topic model then yielded three largely independent topics, two of which were summarized as domain and business performance, respectively. The third was not treated as a separate topic because the frequency ratios of its words differed little.

Up to this point, words were classified by frequency alone, without categories defined in advance. Words can, however, also be classified into predefined categories. In this dictionary approach, a dictionary compiled by the author is added to the DTM object to examine the data. Drawing on the topics that emerged from the word frequencies and on the findings of earlier topic studies of CEO letters, five topics were defined in advance: zeit (time), ergebnis (results), aktie (share), handeln (action), and esg (environment, social, governance). Words belonging to each topic were collected, and the dictionary was built from these word lists. For comparison, the 35 CEO letters were divided into eight industry sectors, and the distribution of topics was examined for each sector. The distribution differs by sector, but on the whole every topic is addressed regardless of sector, with none omitted. Because the results of automatic topic classification can inform normative topic classification, research on identifying topics specific to a text type is expected to improve. Topics can thus be established as one of the features that characterize a text type.
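The steps named in the abstract (corpus, document-term matrix, word frequency, dictionary lookup) can be sketched in a few lines. The study itself used quanteda in R; the block below is not the author's code but a plain-Python stand-in using only the standard library. The two sample letters, the stopword list, and the word lists under each topic are hypothetical and chosen only for illustration; only the five topic names come from the abstract.

```python
import re
from collections import Counter

# Hypothetical miniature corpus (assumption); the paper's 35 letters are not reproduced here.
letters = {
    "firm_a": "Unsere Kunden und Mitarbeiter prägten das Jahr. "
              "Das Ergebnis stieg, die Dividende je Aktie auch.",
    "firm_b": "Im Jahr 2020 haben wir für unsere Kunden gehandelt. "
              "Nachhaltigkeit und Klimaschutz bleiben Ziele.",
}

# Tiny illustrative stopword list (assumption), standing in for the cleaning step.
STOPWORDS = {"und", "das", "die", "der", "im", "wir", "für",
             "unsere", "haben", "je", "auch"}

# Topic names follow the abstract; the member words are hypothetical.
TOPIC_DICTIONARY = {
    "zeit": ["jahr", "quartal"],
    "ergebnis": ["ergebnis", "umsatz", "gewinn"],
    "aktie": ["aktie", "dividende"],
    "handeln": ["handeln", "gehandelt"],
    "esg": ["nachhaltigkeit", "klimaschutz"],
}


def tokenize(text):
    """Lowercase, keep alphabetic tokens (including umlauts), drop stopwords."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_dtm(docs):
    """Document-term matrix as a mapping: document id -> Counter of term counts."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def term_frequencies(dtm):
    """Sum term counts over all documents (the basis for a frequency ranking)."""
    total = Counter()
    for counts in dtm.values():
        total.update(counts)
    return total


def dictionary_lookup(dtm, dictionary):
    """For each document, count the tokens that fall under each predefined topic."""
    return {
        doc_id: {topic: sum(counts[w] for w in words)
                 for topic, words in dictionary.items()}
        for doc_id, counts in dtm.items()
    }


dtm = build_dtm(letters)
freq = term_frequencies(dtm)
topics = dictionary_lookup(dtm, TOPIC_DICTIONARY)

print(freq.most_common(5))
print(topics)
```

Grouping the per-document topic counts by industry sector would give the kind of sector-by-topic comparison the abstract describes. The co-occurrence network and the topic model are left out of this sketch.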

#CEO's letter #text mining #topic modeling #specialized text types #topics

About the author

방경원

Affiliation: Hankuk University of Foreign Studies

Main research field: Humanities > General humanities
