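Basic Thesis Information
- Resource type: Thesis
- Author information
- Advisor: 김진아
- Year of publication: 2023
- Copyright: Theses of Hankuk University of Foreign Studies (한국외국어대학교) are protected by copyright.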
Abstract · Keywords
The present study evaluates the quality of Korean neural machine translation of Chinese statutory provisions using automatic assessment and human judgement, and compares and analyzes the results from the perspective of Chesterman’s expectancy norms within descriptive translation studies.
Its purpose is to examine whether neural machine translations of Chinese legal texts meet the reader’s expected standard, in order to explore the translation quality of neural machine translation and how human evaluators can use automatic assessment efficiently.
The study covers a total of 648 sentences from 219 articles of the Constitution of the People’s Republic of China, the Chinese Nationality Act, the Chinese State Reimbursement Act, and the Anti-foreign Sanctions Act, as provided by the World Laws Information Center under the Office of Legislation. The expected standards for legal text translation were set as accuracy, readability, and consistency, in accordance with the legal translation guidelines of the Office of Legislation and the Korea Legislation Research Institute, the public institutions in charge of legal translation in Korea. Machine translations of the legal texts were produced with Naver Papago (N2MT) and Google Translation (GNMT), and were evaluated by applying an automatic assessment model and a human judgement model. The machine translation output was generated on June 30, 2022, and three human evaluators took part in the human judgement.
In the data analysis, both automatic assessment and human judgement scores, except for the Bleu score, were 0.8 or higher. The correlation among the automatic assessment metrics was 0.60 to 0.85, a medium to high correlation, and the correlation among human evaluators was 0.45 to 0.55, a weak to medium correlation. The correlation between automatic assessment and human judgement was the lowest: 0.22 to 0.26 for Bleu score, 0.27 to 0.41 for Bert score, and 0.00 to 0.18 for Laser score.
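The correlations reported above can be illustrated with a minimal sketch. It assumes a Pearson correlation coefficient computed over per-sentence scores; the abstract does not state which coefficient the study used, and the function name `pearson` and all score values below are made up for illustration rather than taken from the thesis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-sentence scores (illustrative only, not thesis data).
metric_scores = [0.91, 0.85, 0.78, 0.88, 0.95]
human_scores = [0.80, 0.90, 0.70, 0.85, 0.90]
print(f"r = {pearson(metric_scores, human_scores):.2f}")
```

A value near 1 would mean the metric ranks sentences much as the human evaluator does; the ranges in the abstract suggest that this agreement was weak for all three metrics.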
In terms of accuracy, automatic assessment has an advantage because its metrics were developed around accuracy, but errors emerged where it could not fully judge some sentences or their context. In human judgement, evaluators differed at the word, phrase, and text layers, but not at the sentence layer.
In terms of readability, automatic assessment and human judgement showed different patterns in judging context. Some sentences received high scores from all human evaluators, yet automatic assessment tended to judge them as errors when their vocabulary or logical relationships had changed.
In terms of consistency, automatic assessment showed that the same word was translated into different vocabulary even within a single sentence, and that legal titles were not rendered uniformly. In human judgement, overall consistency and coherence were judged with generally similar probabilities.
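The kind of inconsistency described above, one source term rendered in several ways, can be checked mechanically. The sketch below is not the study's procedure. It assumes that source and target terms have already been aligned into pairs, and the function name `inconsistent_terms` and the example term pairs are hypothetical.

```python
from collections import defaultdict


def inconsistent_terms(term_pairs):
    """Return source terms that were rendered with more than one target term."""
    renderings = defaultdict(set)
    for source_term, target_term in term_pairs:
        renderings[source_term].add(target_term)
    # Keep only the source terms with two or more distinct renderings.
    return {src: sorted(tgts) for src, tgts in renderings.items() if len(tgts) > 1}


# Hypothetical aligned (Chinese term, Korean rendering) pairs, for illustration only.
pairs = [
    ("公民", "공민"),
    ("公民", "시민"),
    ("国家", "국가"),
    ("国家", "국가"),
]
print(inconsistent_terms(pairs))
```

A check like this only flags surface variation. Deciding whether a given variation is an actual error in a legal text would still need a human evaluator.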
To recap, among the expected standards for legal texts (‘accuracy’, ‘readability’, and ‘consistency’), automatic assessment had an advantage in evaluating ‘accuracy’, while it could evaluate ‘readability’ and ‘consistency’ only partially. Human judgement had the advantage of being able to evaluate all of ‘accuracy’, ‘readability’, and ‘consistency’, but was subject to the evaluator’s subjectivity. From the perspective of the expected standards, efficiently evaluating the machine translation of Chinese legal provisions requires considering items for ‘consistency’ and a ‘weight factor’.
The results of the present study indicate that Korean neural machine translation still cannot fully meet the expected standards for Chinese legal texts, and that it needs to improve through technological development such as improving similarity. For human evaluators to use neural machine translation in the translation quality evaluation of Chinese legal texts, a methodological plan is needed, such as using statute books, analyzing and synthesizing the weak points of neural machine translation, and developing evaluation criteria specialized for Chinese legal texts.
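Table of Contents
- 1. Introduction
  - 1.1. Research Background and Purpose
  - 1.2. Research Questions and Methods
- 2. Theoretical Background and Previous Studies
  - 2.1. Descriptive Translation Studies (DTS)
    - 2.1.1. Chesterman’s Expectancy Norms
    - 2.1.2. Translation Quality Assessment (TQA) and TQA Indicators
    - 2.1.3. TQA Models
  - 2.2. Neural Machine Translation (NMT)
    - 2.2.1. Concept of Neural Machine Translation
    - 2.2.2. Quality Evaluation Metrics for Neural Machine Translation
    - 2.2.3. Neural Machine Translation Models
  - 2.3. Legal Translation
    - 2.3.1. The Korean Legal System and the Structure of Statutory Provisions
    - 2.3.2. The Chinese Legal System and the Structure of Statutory Provisions
    - 2.3.3. Legal Translation and Translation Strategies
- 3. Research Method
  - 3.1. Research Questions and Research Model
  - 3.2. Research Procedure
    - 3.2.1. Data Collection
    - 3.2.2. Data Analysis
      - 3.2.2.1. Automatic Assessment Model
        - 3.2.2.1.1. Bleu score
        - 3.2.2.1.2. Bert score
        - 3.2.2.1.3. Laser score
      - 3.2.2.2. Human Judgement Model
        - 3.2.2.2.1. Korean-Chinese Neural Machine Translation Evaluation Model
      - 3.2.2.3. Correlation Coefficient and p-value
- 4. Data Analysis Results
  - 4.1. Automatic Assessment Analysis
    - 4.1.1. Automatic Assessment Model Analysis
      - 4.1.1.1. Bleu score
      - 4.1.1.2. Bert score
      - 4.1.1.3. Laser score
    - 4.1.2. Analysis by Statute
      - 4.1.2.1. Constitution
      - 4.1.2.2. Nationality Act
      - 4.1.2.3. State Reimbursement Act
      - 4.1.2.4. Anti-foreign Sanctions Act
    - 4.1.3. Correlation Coefficient and p-value
    - 4.1.4. Summary
  - 4.2. Human Judgement Analysis
    - 4.2.1. Human Judgement Model Analysis
      - 4.2.1.1. Word Layer
      - 4.2.1.2. Phrase Layer
      - 4.2.1.3. Sentence Layer
      - 4.2.1.4. Text Layer
    - 4.2.2. Analysis by Statute
      - 4.2.2.1. Constitution
      - 4.2.2.2. Nationality Act
      - 4.2.2.3. State Reimbursement Act
      - 4.2.2.4. Anti-foreign Sanctions Act
    - 4.2.3. Correlation Coefficient and p-value
    - 4.2.4. Summary
  - 4.3. Analysis of Expectancy Norms through Automatic Assessment and Human Judgement
    - 4.3.1. Accuracy
    - 4.3.2. Readability
    - 4.3.3. Consistency
    - 4.3.4. Summary
- 5. Conclusion
  - 5.1 Research Summary
  - 5.2 Limitations and Significance of the Study
- References
- Appendix
- English Abstract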