Efficient Data Management Schemes for High-Performance Storage Devices

송내영

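Document type: Dissertation

Author: 송내영 (Seoul National University, Seoul National University Graduate School)

Year of publication: 2018

Copyright: Seoul National University dissertations are protected by copyright.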

Abstract · Keywords

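With recent advances in hardware technology, storage devices have evolved, and high-performance storage devices such as Solid State Drives (SSDs) have emerged.
High-performance storage devices provide high bandwidth, low latency, and high I/O parallelism, and because they eliminate the mechanical overhead of conventional Hard Disk Drives (HDDs), they make data access tens to hundreds of times faster.
However, if these devices are used as-is under the existing software layers, the overhead of those layers prevents them from reaching their full performance.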

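This dissertation optimizes data management schemes to match the characteristics of high-performance storage devices.
Because high-performance storage devices have low latency, the overhead of the existing software layers becomes more exposed.
The first software overhead identified in this dissertation is the page reclaim overhead.
In a system based on a high-performance storage device, the unmap overhead of the software layer stands out when mapped pages are reclaimed.
To reduce it, this dissertation proposes a page recycling scheme that confines the unmap overhead to the application concerned, which improved the performance of the whole system.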

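The second is the metadata lookup operation. Existing Linux systems manage files by path.
The metadata operation that must be performed before a path-based file is accessed involves repeated hash table lookups,
which causes overhead on every file access.
This overhead is more pronounced on high-performance storage devices because the overhead of the data access itself becomes relatively small.
Therefore, for an efficient metadata lookup operation, this dissertation proposes backward finding, which reverses the search direction when accessing the hash table.
In this way, the number of metadata lookup operations was reduced, and so was the total file access time.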
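The traditional Log-Structured Merge (LSM) algorithm was implemented with complicated data structures, on the assumption that conventional storage devices have long latency.
These complicated data structures, however, have the side effect of increasing read/write amplification.
With a high-performance storage device, data can be accessed quickly without resorting to complicated data structures.
This dissertation therefore modifies the existing LSM algorithm to use a simple data structure, and also makes the compaction process, which causes write amplification, more efficient by using only appends to files according to the range of the data.
When this algorithm was implemented in HBase and evaluated, write throughput improved and read/write amplification decreased.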
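
To make the write amplification argument concrete, the following toy model compares a compaction that rewrites a merged run on every flush with one that only appends each record to the file covering its key range. It is a minimal sketch under stated assumptions, not the dissertation's algorithm or HBase code: the function names, the record-count cost unit, and the fixed range boundaries are all hypothetical.

```python
import bisect

def merge_rewrite(flushes):
    """Toy model: every flush is merged into one sorted run, which is rewritten in full.

    Returns the number of records written to storage (hypothetical cost unit).
    """
    run = {}
    written = 0
    for batch in flushes:
        run.update(batch)
        written += len(run)  # the whole merged run is written again
    return written

def range_append(flushes, boundaries):
    """Toy model: each flushed record is appended once to the file for its key range.

    `boundaries` is a sorted list of split keys (an assumption of this sketch).
    Returns (records written, per-range append-only files).
    """
    files = [[] for _ in range(len(boundaries) + 1)]
    written = 0
    for batch in flushes:
        for key, value in sorted(batch.items()):
            files[bisect.bisect_right(boundaries, key)].append((key, value))
            written += 1
    return written, files

def write_amplification(written, user_records):
    """Records written to storage divided by records written by the user."""
    return written / user_records

flushes = [{1: "a", 2: "b"}, {3: "c", 4: "d"}, {5: "e", 6: "f"}]
user_records = sum(len(batch) for batch in flushes)
rewrite_cost = merge_rewrite(flushes)
append_cost, files = range_append(flushes, boundaries=[3])
print(write_amplification(rewrite_cost, user_records))
print(write_amplification(append_cost, user_records))
```

In this model the append-only variant writes each record exactly once, while the rewriting variant pays again for every record already in the run. The price is that an append-only file is no longer sorted internally, which is only tolerable when reads from the device are cheap.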

With recent advances in hardware technology, high-performance storage devices such as Solid-State Drives (SSDs) have appeared. These devices offer high bandwidth, low latency, and high I/O parallelism. They eliminate the mechanical overhead of traditional Hard Disk Drives (HDDs), making data access tens to hundreds of times faster. However, if these devices are used under the existing software layers unchanged, their performance cannot be fully exploited because of the overhead of those layers.
In this dissertation, we optimize the existing software layers to exploit high-performance storage devices. To this end, we propose data management schemes in the memory management and VFS layers. Because high-performance storage devices access data with very low latency, the overheads in the existing software layers become more visible.
The first software overhead pointed out in this dissertation is the page reclaim overhead incurred when data are accessed through the memory-mapped interface. This interface maps physical pages into the process's virtual address space. On a system based on a high-performance storage device, the unmap overhead of the software layer stands out when memory-mapped pages are reclaimed. To reduce the unmap overhead, we propose a page recycling scheme that limits the unmap overhead to the corresponding application. With this scheme, we can increase the performance of the whole system.
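
One way to picture the idea of keeping unmap work inside the application that causes it is a per-application page pool that reuses its own oldest frame when it is full. The sketch below is an assumption-laden user-space simulation, not the kernel mechanism proposed in the dissertation: the class name `RecyclingMapper`, the fixed pool size, and the oldest-first policy are all hypothetical.

```python
from collections import OrderedDict

class RecyclingMapper:
    """Toy per-application page pool (hypothetical): a full pool recycles its own oldest page."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.mapped = OrderedDict()  # file offset -> frame id, oldest mapping first
        self.next_frame = 0
        self.unmaps = 0              # unmap operations charged to this application

    def touch(self, offset):
        """Return the frame backing `offset`, mapping it first if necessary."""
        if offset in self.mapped:
            return self.mapped[offset]
        if len(self.mapped) < self.pool_size:
            frame = self.next_frame  # pool not full yet: take a fresh frame
            self.next_frame += 1
        else:
            # Pool full: unmap this application's oldest page and reuse its frame.
            _, frame = self.mapped.popitem(last=False)
            self.unmaps += 1
        self.mapped[offset] = frame
        return frame

busy = RecyclingMapper(pool_size=2)
idle = RecyclingMapper(pool_size=2)
frames = [busy.touch(offset) for offset in (0, 1, 2)]
idle.touch(0)
print(frames, busy.unmaps, idle.unmaps)
```

In this model only `busy` pays for unmapping, because it recycles frames from its own pool; `idle` never sees an unmap caused by another application.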
The second is the metadata lookup operation. The metadata lookup operations that must be performed before path-based file access are redundant. The overhead of these redundant lookups becomes more visible as data access latency drops on high-performance storage devices. Therefore, we propose a backward finding mechanism for efficient metadata operations. In this way, we can reduce the number of metadata operations.
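
The effect of reversing the search direction can be illustrated with a small probe-counting model. It assumes a cache keyed by full path prefix and compares a root-to-leaf walk with a search that starts at the full path and moves toward the root. This is one possible reading of backward finding, sketched for illustration only: the function names and the set-based cache are hypothetical, and permission checks, symbolic links, and mount points are ignored.

```python
def split_prefixes(path):
    """Return every prefix of an absolute path, shortest first."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]

def forward_lookup(cache, path):
    """Probe the cache once per component, root to leaf; return the probe count."""
    probes = 0
    for prefix in split_prefixes(path):
        probes += 1
        if prefix not in cache:
            cache.add(prefix)  # stands in for loading metadata from storage
    return probes

def backward_lookup(cache, path):
    """Probe from the full path toward the root, stopping at the first cached prefix."""
    prefixes = split_prefixes(path)
    probes = 0
    resolved = 0  # number of leading prefixes known to be cached
    for i in range(len(prefixes) - 1, -1, -1):
        probes += 1
        if prefixes[i] in cache:
            resolved = i + 1
            break
    for prefix in prefixes[resolved:]:
        cache.add(prefix)  # fill in the components that missed
    return probes

cache = set()
cold = forward_lookup(cache, "/a/b/c/d")         # warms the cache
warm_forward = forward_lookup(cache, "/a/b/c/d")
warm_backward = backward_lookup(cache, "/a/b/c/d")
sibling = backward_lookup(cache, "/a/b/c/e")
print(cold, warm_forward, warm_backward, sibling)
```

In this model a repeated access needs one probe instead of one per component, and a sibling file needs two. On a cold cache the backward search saves nothing, so the benefit depends on how often paths are revisited.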
The last scheme is for the Log-Structured Merge (LSM) algorithm. The traditional LSM algorithm was designed on the assumption that the storage device is slow, so it uses complicated in-memory data structures to reduce storage accesses for data management. As a result, the LSM algorithm suffers from side effects such as write amplification. With high-performance storage devices, complicated data structures are unnecessary because the storage latency is low. In this dissertation, we remove the software overhead in the LSM algorithm by using a simplified data structure, which in turn reduces write amplification.

Chapter 1 Introduction
1.1 Approaches and Contributions
1.2 Dissertation Structure
Chapter 2 Background and Motivation
2.1 Large scale systems
2.2 High-performance storage devices
2.3 Exposed software overheads
2.3.1 Overhead of un-mapping in memory-mapped I/O
2.3.2 Overhead of redundant metadata operations
2.3.3 Overhead of LSM algorithm in key-value store
Chapter 3 Design and Implementation
3.1 Memory-mapped I/O optimization
3.1.1 Design
3.1.2 Implementation
3.2 Metadata operation optimization
3.2.1 Design
3.2.2 Implementation
3.3 LSM algorithm optimization
3.3.1 Design
3.3.2 Implementation
Chapter 4 Evaluation
4.1 Memory-mapped I/O performance
4.1.1 Synthetic benchmark results
4.1.2 Macro benchmark results
4.2 Metadata operation performance
4.2.1 Micro benchmarks
4.2.2 Real-world workload
4.3 RLSM performance
4.3.1 Write performance
4.3.2 Performance under the mixed workload
Chapter 5 Related Work
5.1 Effort to adopt the changes
Chapter 6 Conclusion
Abstract (in Korean)
