Matrix factorization modeling for recommender system in big data environment (빅데이터 환경에서의 추천시스템을 위한 행렬분해 모델링)

지자치

Document type: Thesis

Author: 지자치 (Wonkwang University)

Advisor: 鄭榮址

Year of publication: 2018

Copyright: Wonkwang University theses are protected by copyright.

Abstract · Keywords

With the explosive growth in the number of users and products on the Internet, the vast amount of web data makes it difficult to find the information one is looking for. A recommender system (RS) is a powerful tool for tackling this problem, and it benefits both customers and service providers.
From the consumers' perspective, an RS analyzes users' historical behavior and item features, then provides personalized information to users. From the service providers' perspective, an RS helps find the consumers of a product more quickly and presents them with what they are interested in, thereby increasing business income. RS is therefore of great value in both theory and practice.
In practice, RS is confronted with challenges such as data sparsity, cold start, scalability, accuracy, and interpretability. Several recommendation approaches have been proposed to solve such problems. This study attempts to solve the problems of data sparsity, scalability, and recommendation accuracy.
With respect to the data sparsity problem, part of the missing data indicates that users are unwilling to rate the items, so these missing entries can be treated as negative examples of users' interest. Based on this view, we propose three methods to model negative examples in missing data: a weighted method, a random sample method, and a k-nearest neighbor sample method. Integrating each method into the SVD++ model yields the SVD++_W, SVD++_R, and SVD++_KNN recommendation models, respectively. The experimental results show that the proposed models effectively improve precision and recall by over 1% in Top-N recommendation.
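
As an illustration of the random sample idea only, the following minimal Python sketch draws unobserved items uniformly at random and treats them as negative examples. The function name `sample_negatives`, the `ratio` parameter (negatives per positive), and the toy data are assumptions made for this example; the abstract does not specify the thesis's actual sampling procedure.

```python
import random

def sample_negatives(observed, all_items, ratio=1, seed=0):
    """Draw unobserved items uniformly at random as negatives for each user.

    observed:  dict mapping user -> set of items the user interacted with
    all_items: set of every item in the catalogue
    ratio:     hypothetical knob, number of negatives per positive
    """
    rng = random.Random(seed)
    negatives = {}
    for user, pos_items in observed.items():
        # Candidates are the "missing" entries for this user.
        candidates = sorted(all_items - pos_items)
        k = min(len(candidates), ratio * len(pos_items))
        negatives[user] = set(rng.sample(candidates, k))
    return negatives

# Toy data (made up for illustration).
observed = {"u1": {"a", "b"}, "u2": {"c"}}
all_items = {"a", "b", "c", "d", "e", "f"}
negatives = sample_negatives(observed, all_items, ratio=1)
```

The sampled negatives would then be fed to the factorization model alongside the observed positives; a weighted or k-nearest neighbor variant would replace the uniform draw with a different selection rule.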
To solve the scalability problem, user-based collaborative filtering and item-based collaborative filtering are improved by introducing a user feature vector and an item feature vector, respectively, and the algorithms are designed to be implemented on Spark. The results show that, even when the data volume is large, the running time drops sharply, by a factor of four, as the number of compute nodes increases.
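
A rough single-machine sketch of neighbor search over user feature vectors is given below. It assumes, purely for illustration, that each user is described by a short dense vector and that similarity is cosine similarity; the abstract does not say how the thesis builds the feature vectors or which similarity it uses, and the names `cosine` and `top_k_neighbors` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_neighbors(target, features, k=2):
    """Return the k users whose feature vectors are most similar to target's."""
    scores = [
        (cosine(features[target], vec), user)
        for user, vec in features.items()
        if user != target
    ]
    scores.sort(reverse=True)
    return [user for _, user in scores[:k]]

# Toy feature vectors (made up for illustration).
features = {
    "u1": [1.0, 0.0],
    "u2": [0.9, 0.1],
    "u3": [0.0, 1.0],
    "u4": [0.5, 0.5],
}
neighbors = top_k_neighbors("u1", features, k=2)
```

Comparing compact feature vectors instead of full rating rows keeps each similarity computation cheap, and the per-user searches are independent of one another, which is the kind of work a cluster framework such as Spark can spread across compute nodes.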
To improve recommendation accuracy, a deep convolutional neural network is used to extract character-level features from textual information, and the extracted features are then seamlessly integrated into the SVD++ model. The experiments show that the precision of the proposed model exceeds that of the best competing models by 1.59% on ML-1m, 2.44% on ML-10m, and 2.45% on AIV.
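
For reference, the SVD++ model mentioned throughout the abstract is usually written as the prediction rule below. This is the standard formulation, not the thesis's extended model; the abstract does not state how the CNN features or the negative examples enter the equation. Here \mu is the global mean rating, b_u and b_i are user and item biases, p_u and q_i are user and item latent factor vectors, N(u) is the set of items on which user u gave implicit feedback, and y_j is the implicit-feedback factor vector of item j.

```latex
\hat{r}_{ui} = \mu + b_u + b_i
  + q_i^{\top}\Bigl(p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j\Bigr)
```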

Chapter 1. Introduction
1.1 Background
1.2 Motivation
1.3 Organization
Chapter 2. Related Work
2.1 Collaborative filtering Recommendation
2.1.1 User-based Collaborative Filtering
2.1.2 Item-based Collaborative Filtering
2.1.3 Model-based Collaborative Filtering
2.2 Content-based Recommendation
2.3 Top-N Recommendation from Implicit Feedback Datasets
2.4 Other Recommendation Approach
2.5 Evaluating Recommender Systems
2.5.1 Rating prediction evaluation
2.5.2 Top-N recommendation evaluation
Chapter 3. Matrix Factorization for Implicit Feedback Based on Missing Data Modeling
3.1 Introduction
3.2 Modeling Method for Missing Data
3.2.1 Weighted Method
3.2.2 Random Sample Method
3.2.3 K-Nearest Neighbor Sample Method
3.3 Proposed Recommender Algorithm
3.3.1 Improved Algorithm Using Weighted Method
3.3.2 Improved Algorithm Using Random Sample Method
3.3.3 Improved Algorithm Using K-nearest Neighbor Sample Method
3.4 Experiment evaluation
3.4.1 Dataset and evaluation metrics
3.4.2 Configuration
3.4.3 Parameter analysis and choose
3.4.4 Experimental results
3.4.5 Summary
Chapter 4. Distributed Collaborative Filtering Algorithm in Big Data Environment
4.1 Introduction
4.2 An improved User-based CF Based on Spark
4.2.1 The problem of user-based CF
4.2.2 An improved user-based CF algorithm
4.2.3 Parallel implementation on Spark framework
4.3 An improved Item-based CF Based on Spark
4.3.1 The problem of item-based CF
4.3.2 An improved item-based CF algorithm
4.3.3 Parallel implementation on Spark framework
4.4 Experiment
4.4.1 Dataset
4.4.2 Configuration
4.4.3 Experimental results
4.4.4 Summary
Chapter 5. Deep Learning Based Matrix Factorization recommendation for Big Data
5.1 Introduction
5.2 Character-level deep convolution neural network matrix factorization
5.2.1 Deep CNN architecture
5.2.2 Char-DCNN-MF model
5.3 Experiment and analyze
5.3.1 Experimental configuration
5.3.2 Compare with baseline model
5.3.3 Compare with other matrix factorization models
5.3.4 Summary
Chapter 6. Results and Discussions
Chapter 7. Conclusion
