Outlier Detection in Water Quality Data Using Ensemble Empirical Mode Decomposition

박상수; 박노석; 김성수; 조귀래; 윤석민

Material type: Academic journal

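Author information: 박상수 (Gyeongsangbuk-do Institute of Health and Environment), 박노석 (Gyeongsang National University), 김성수 (Korea Water Resources Corporation), 조귀래 (Gyeongsang National University), 윤석민 (Gyeongsang National University)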

Journal information: Korean Society of Environmental Engineers, Journal of Korean Society of Environmental Engineers, Vol. 43, No. 3

Publication date: March 2021

Pages: 160-170 (11 pages)

Abstract and Keywords

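Objectives: This study was conducted to propose a methodology for efficiently detecting and removing the various outliers that can occur in data collected through Korea's automated waterworks water quality monitoring network. To this end, water temperature data were collected from the domestic G water treatment plant, and the effect of applying each outlier detection method to the collected data was tested.
Methods: The analytical procedure applied in this study to detect outliers in water quality data is as follows. First, a normality test is performed on the collected water temperature data; outliers are detected using the Z-score if normality is satisfied and using quartiles if it is not, and the limitations of these existing methods are analyzed. Second, the water temperature data are decomposed into intrinsic mode functions using empirical mode decomposition and ensemble empirical mode decomposition, and the occurrence of mode mixing is examined. Finally, the statistical characteristics of the intrinsic mode functions are used to select the reference group of intrinsic mode functions for identifying outliers, after which outliers are removed and interpolated using regression analysis and a cutoff criterion on Cook's distance, and the performance is verified.
Results and Discussion: The water temperature data did not satisfy normality, and outlier detection with the modified quartile method could not identify any of the outliers distributed within the seasonal component. With empirical mode decomposition, mode mixing occurred because of the influence of the outliers, whereas ensemble empirical mode decomposition resolved the mode mixing and separated a distinct seasonal component as intrinsic mode functions. Among the intrinsic mode functions obtained from ensemble empirical mode decomposition, the signals with a strong statistical relationship to the raw water temperature data were then synthesized. A regression model was developed from the synthesized intrinsic mode functions and the raw water temperature data, and outlier detection based on Cook's distance was found to detect the various outliers distributed within the seasonal component effectively.
Conclusions: Outlier detection is an essential step in deriving sound statistical results from data collected through automated waterworks water quality monitoring networks. Conventional univariate outlier detection techniques, however, perform markedly worse on data with strong inherent variability, and they offer no way to interpolate the outliers they detect. In contrast, the outlier detection method proposed in this study, based on ensemble empirical mode decomposition and regression analysis, identifies outliers in data with strong inherent variability well, and because it provides a statistical cutoff criterion it minimizes the analyst's subjective judgment. A further advantage is that the intrinsic mode functions obtained from ensemble empirical mode decomposition can be used to interpolate the data after outliers are removed. Given the limited applicability of conventional univariate techniques, the outlier detection and interpolation approach proposed in this study is expected to serve as a more effective analysis tool.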
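
The first step of the procedure can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the abstract does not name the normality test or the thresholds, so the Jarque-Bera statistic (with the 5% chi-square critical value 5.991), the |z| > 3 rule, and the 1.5 x IQR fences are assumed here, and the synthetic seasonal series stands in for real water temperature data.

```python
import math
import statistics


def jarque_bera(values):
    """Jarque-Bera statistic; large values suggest the data are not normal."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def zscore_outliers(values, threshold=3.0):
    """Indices whose absolute Z-score exceeds the (assumed) threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


def iqr_outliers(values, k=1.5):
    """Indices outside the fences Q1 - k*IQR and Q3 + k*IQR (k is assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def detect_univariate_outliers(values, jb_critical=5.991):
    """Use the Z-score if the data look normal, otherwise the quartile rule."""
    if jarque_bera(values) < jb_critical:
        return "zscore", zscore_outliers(values)
    return "iqr", iqr_outliers(values)


# Synthetic two-year seasonal series (assumed stand-in for water temperature).
series = [15.0 + 10.0 * math.sin(2.0 * math.pi * d / 365.0) for d in range(730)]
# One reading that is wrong for its day but still inside the annual range.
series[100] = 8.0

method, flagged = detect_univariate_outliers(series)
print(method, flagged)
```

On this synthetic series the seasonal shape fails the normality check, so the quartile rule is used, and the anomalous reading at index 100 is not flagged because it lies inside the annual range. That is the kind of limitation of univariate rules the procedure sets out to analyze.
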
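
The difference between the two decompositions comes down to an ensemble-averaging step, which can be written compactly. The formulation below is the commonly used definition of ensemble empirical mode decomposition, not one quoted from this paper; the symbols (noise amplitude \varepsilon, ensemble size N, white-noise realization w_j, intrinsic mode function c_{k,j}, residue r_j) are notation assumed for this sketch.

```latex
\begin{align}
  x_j(t) &= x(t) + \varepsilon\, w_j(t), \qquad j = 1, \dots, N \\
  x_j(t) &= \sum_{k=1}^{K} c_{k,j}(t) + r_j(t) \\
  \bar{c}_k(t) &= \frac{1}{N} \sum_{j=1}^{N} c_{k,j}(t)
\end{align}
```

Each noisy copy x_j(t) is decomposed by ordinary empirical mode decomposition, and the k-th intrinsic mode functions are averaged across the ensemble. The added noise tends to cancel in the average while helping each scale land consistently in the same mode, which is the usual explanation for the reduced mode mixing.
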

Objectives: This study was conducted to propose a new methodology for efficiently identifying and removing the various outliers that occur in data collected through automated water quality monitoring systems. Water temperature data were collected from the domestic G_water supply system, and the performance of the proposed methodology was tested on these data.
Methods: We applied the following analytical procedure to identify outliers in the water quality data. First, a normality test was performed on the collected data. If the normality condition was satisfied, the Z-score was used; if it was not, outliers were identified using quartiles. The limitations of these existing methods were then analyzed. Second, we decomposed the collected data into intrinsic mode functions using empirical mode decomposition and ensemble empirical mode decomposition, and examined the occurrence of mode mixing. Finally, a reference group of intrinsic mode functions for identifying outliers was selected using their statistical characteristics, outliers were removed and interpolated using regression analysis and Cook's distance, and the performance of the method was verified.
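
The selection of a reference group of intrinsic mode functions can be illustrated with a small standard-library sketch. The abstract does not specify which statistical characteristics were used, so the Pearson correlation and the 0.5 threshold below are assumptions, and the two hand-built components are hypothetical stand-ins for the output of an actual decomposition.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def synthesize_reference(raw, imfs, min_corr=0.5):
    """Sum the IMFs whose correlation with the raw series is at least min_corr.

    min_corr is an assumed threshold, not a value taken from the paper.
    """
    selected = [k for k, imf in enumerate(imfs) if pearson(raw, imf) >= min_corr]
    reference = [sum(imfs[k][t] for k in selected) for t in range(len(raw))]
    return selected, reference


# Hypothetical stand-ins for decomposed modes: a weekly ripple and a seasonal mode.
n = 730
fast = [0.5 * math.sin(2.0 * math.pi * t / 7.0) for t in range(n)]
seasonal = [10.0 * math.sin(2.0 * math.pi * t / 365.0) for t in range(n)]
raw = [15.0 + f + s for f, s in zip(fast, seasonal)]

selected, reference = synthesize_reference(raw, [fast, seasonal])
print(selected)
```

Only the seasonal component passes the threshold in this toy case, so the synthesized reference signal is the seasonal mode alone. With real decomposition output, several modes may be selected and summed.
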
Results and Discussion: Because the water temperature data did not satisfy the normality condition, outliers were identified with the modified quartile method, which could not identify any of the outliers distributed within the seasonal component. With empirical mode decomposition, mode mixing occurred because of the effect of the outliers. With ensemble empirical mode decomposition, however, mode mixing was resolved and a distinct seasonal component was separated as intrinsic mode functions. The intrinsic mode functions that showed a strong statistical correlation with the raw water temperature data were then synthesized. A regression model was developed from the synthesized intrinsic mode functions and the raw water temperature data, and an outlier search based on Cook's distance effectively identified the various outliers distributed within the seasonal component.
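
The regression step can be sketched as follows, assuming a simple linear regression of the raw series on the synthesized reference signal. The cutoff of 4/n is a common rule of thumb chosen here for illustration; the abstract does not state the cutoff the authors used, and the synthetic data are hypothetical.

```python
import math


def cooks_distances(x, y):
    """Cook's distance of every point for the simple regression y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    leverages = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [
        (e * e) / (p * mse) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


def flag_outliers(x, y, cutoff=None):
    """Indices whose Cook's distance exceeds the cutoff (default 4/n, assumed)."""
    distances = cooks_distances(x, y)
    if cutoff is None:
        cutoff = 4.0 / len(x)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Hypothetical reference signal (synthesized seasonal component) and raw data.
n = 730
reference = [15.0 + 10.0 * math.sin(2.0 * math.pi * t / 365.0) for t in range(n)]
raw = [r + 0.2 * math.sin(2.0 * math.pi * t / 7.0) for t, r in enumerate(reference)]
# One reading that is wrong for its day but still inside the annual range.
raw[100] = 8.0

flagged = flag_outliers(reference, raw)
print(flagged)
```

Because each reading is judged against what the seasonal reference predicts for that time step, the reading at index 100 is flagged even though its value sits well inside the overall range of the series.
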
Conclusions: Outlier identification is an essential step in deriving sound statistical results from data collected through automated water quality monitoring systems. However, conventional univariate outlier search methods perform poorly on data with strong inherent variability and provide no means of interpolating the outliers they find. In contrast, the outlier identification method proposed in this study, based on ensemble empirical mode decomposition and regression analysis, shows excellent discrimination performance for outliers distributed in data with strong inherent variability. Moreover, by presenting statistical cutoff criteria, the method reduces dependence on the analyst's subjective judgment. An additional advantage is that the data can be interpolated using the intrinsic mode functions after outliers are removed. Therefore, the outlier search and interpolation method proposed in this study is expected to be a more effective and more widely applicable analysis tool than the existing univariate outlier search methods.

#Automated Water Quality Monitoring System #Water Quality Data #Outlier Detection #Ensemble Empirical Mode Decomposition #Cook's Distance #Water Supply System

UCI(KEPA) : I410-ECN-0101-2021-539-001637372
