Thesis Information
- Material type
- Doctoral dissertation
- Author information
- Advisor
- 오희국
- Publication year
- 2022
- Copyright
- Hanyang University theses are protected by copyright.
Abstract · Keywords
Patch analysis is a reverse engineering technique for exploring the patched content in binary executables, and it is traditionally used in applications such as vulnerability discovery and 1-day exploit generation. For patch analysis at the binary level, we need to compare two different versions of a binary executable and find the functions that were patched or modified, while filtering out the unpatched functions. In reverse engineering, binary diffing is the process of discovering the differences and similarities in functionality between two binary programs, and it is traditionally considered the best choice for patch analysis. Previous research on binary diffing approaches it as a function matching problem: an initial 1:1 mapping between functions is formulated, and a sequence matching ratio is then computed to classify two functions as an exact match (unpatched), a partial match (patched), or no-match (error/new functions). In our empirical analysis of patch analysis, we discovered that the accuracy of existing techniques is best only when detecting exact matches; they are not efficient at detecting partially changed functions, especially those with minor patches such as CWE-478 and CWE-476.
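The classification step that prior tools derive from a sequence matching ratio can be sketched with Python's `difflib`. This is a minimal illustration of the traditional approach described above, not any specific tool's implementation; the `EXACT` and `PARTIAL` thresholds are assumed values chosen for the example.

```python
import difflib

# Illustrative thresholds (assumptions for this sketch, not from any tool).
EXACT, PARTIAL = 1.0, 0.5

def classify_pair(func_a: list[str], func_b: list[str]) -> str:
    """Classify two 1:1-mapped functions by the similarity ratio of
    their instruction sequences, the way traditional diffing does."""
    ratio = difflib.SequenceMatcher(None, func_a, func_b).ratio()
    if ratio >= EXACT:
        return "exact match"    # unpatched
    if ratio >= PARTIAL:
        return "partial match"  # likely patched
    return "no-match"           # error / new function

old = ["push rbp", "mov rbp, rsp", "mov eax, 0", "pop rbp", "ret"]
new = ["push rbp", "mov rbp, rsp", "mov eax, 1", "pop rbp", "ret"]
print(classify_pair(old, old))  # exact match
print(classify_pair(old, new))  # partial match (ratio 0.8)
```

A one-instruction change drops the ratio to 0.8, so the pair lands in the partial-match bucket; as the abstract argues next, this ratio alone cannot say whether the change is a patch or compiler noise.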
The drawbacks of existing research stem from two major challenges. (i) In the 1:1 mapping phase, a strict policy is used to match function features. Existing research defines a set of heuristics and applies them to match functions in a sequential manner. Some heuristics are over-trusted, and prioritizing them produces many false matching results. (ii) In the classification phase, an assembly snippet is treated as normal text, and a sequence matching algorithm is used for similarity comparison. Instructions have a unique structure: mnemonics and registers occupy specific positions within an instruction and also carry a semantic relationship, which makes assembly code different from general text. In our empirical analysis, we discovered that small instruction-level changes, whether caused by a patch or by compiler-introduced randomness, look much the same to a sequence matching algorithm. Sequence matching performs best on general text, but it fails to detect structural and semantic changes at the instruction level; thus, its use for classification produces many false results.
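The point that a sequence matcher cannot separate compiler noise from a real patch can be shown with a toy example. The instruction sequences and whitespace tokenization below are invented for illustration: a benign register rename touches more tokens, and therefore scores as more changed, than a security-relevant inversion of a branch condition.

```python
import difflib

def token_ratio(a: list[str], b: list[str]) -> float:
    """Text-style similarity over instruction tokens, ignoring the
    positional/semantic structure of the instructions."""
    ta = [t for ins in a for t in ins.replace(",", "").split()]
    tb = [t for ins in b for t in ins.replace(",", "").split()]
    return difflib.SequenceMatcher(None, ta, tb).ratio()

base    = ["mov eax, [rbp-8]", "test eax, eax", "je 0x40", "call free"]
# Compiler-introduced randomness: register renamed, semantics unchanged.
renamed = ["mov ecx, [rbp-8]", "test ecx, ecx", "je 0x40", "call free"]
# A real patch: the branch condition is inverted.
patched = ["mov eax, [rbp-8]", "test eax, eax", "jne 0x40", "call free"]

print(token_ratio(base, renamed))  # 0.7 -- benign change, lower score
print(token_ratio(base, patched))  # 0.9 -- semantic change, higher score
```

The semantically meaningless rename scores 0.7 while the behavior-altering patch scores 0.9, so a ratio threshold misorders exactly the cases patch analysis cares about.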
In this dissertation, we address the aforementioned underlying challenges with a two-fold solution. First, for the 1:1 mapping phase, we empirically analyzed the heuristics in Diaphora, an industry-standard tool, discovered the drawbacks of each heuristic, and proposed a set of computationally inexpensive feature vectors, which are then compared with a distance-based selection criterion to map similar functions and filter out unmatched functions. Second, for the classification phase, we proposed a Siamese binary-classification neural network in which each branch is an attention-based distributed-learning embedding neural network: it learns the semantic similarity among assembly instructions and learns to highlight the actual changes at the instruction level, while a final-stage fully connected layer learns to accurately classify two 1:1-mapped functions as either an exact or a partial match. The proposed neural network is sophisticated enough to differentiate between compiler-caused and patch-based changes at the instruction level. Finally, we proposed an efficient neural-network-assisted binary diffing algorithm that integrates our proposed 1:1 mapping phase and classification phase. The proposed binary diffing algorithm accurately classifies two binary functions as an exact match, a partial match, or no-match.
We thoroughly evaluated the proposed feature vectors, the different design choices, and the parameters of the neural network. For training, we used x86 XNU kernel binaries, and we evaluated the proposed neural-network-assisted binary diffing algorithm on kernel binaries not included in training and on the CWE dataset. We achieved ∼99% classification accuracy, which is higher than that of existing binary diffing techniques and tools.
Table of Contents
- 1 Introduction
- 1.1 Overview: Patch Analysis and Binary Diffing
- 1.1.1 Binary Diffing
- 1.2 Motivation
- 1.2.1 Problem Definition
- 1.2.2 Assumptions
- 1.3 Contributions
- 1.4 Organization of Dissertation
- 2 Background and Preliminaries
- 2.1 Function Matching and Binary Diffing
- 2.2 Function Matching Related Work
- 2.2.1 Binary Diffing
- 2.2.2 Binary Code Clones
- 2.2.3 Deep Learning
- 2.3 Binary Diffing Tool Analysis - Diaphora
- 3 Feature Engineering - 1:1 Mapping Phase
- 3.1 Tally Vector
- 3.2 Edge Type Vector
- 3.3 Vertex Type Vector
- 3.4 Vertex Degree Vector
- 3.5 Digraph Dominance Relationship (DDR)
- 3.5.1 Piecewise Hashing
- 3.5.2 Projection-based Hashing
- 3.6 Opcode Vector
- 3.7 Assembly Embedding Representation
- 3.7.1 An Instruction Embedding Model
- 3.7.2 Function Modeling
- 3.8 Proposed Function Matching (FMA)
- 3.8.1 Features Encoding
- 3.8.2 Representation Vector Matching
- 3.8.3 Function Matching Algorithm
- 4 Attention-based Siamese Binary Classification Neural Network
- 4.1 Modeling Assembly Functions
- 4.1.1 Assembly Formatting
- 4.1.2 Assembly Representation
- 4.1.3 Dataset Collection
- 4.2 Proposed Learning Model
- 4.2.1 Assembly as a Bag of Instructions
- 4.2.2 Attention Model
- 4.2.3 Siamese Binary-Classification Model
- 4.2.4 Training
- 4.2.5 Utility
- 4.3 Diffing Algorithm
- 4.4 Design Decisions and Limitations
- 4.4.1 Oneshot vs Sequential
- 4.4.2 Expressiveness of the Embedding Layers
- 4.4.3 Granularity of Operands Tokenization
- 4.4.4 Distance Function
- 5 Empirical Evaluation
- 5.1 Test Environment
- 5.2 Empirical Evaluation - 1:1 Mapping Phase
- 5.3 Empirical Evaluation - Classification Phase
- 5.3.1 RQ1a: Training Accuracy
- 5.3.2 RQ1b: Prediction Accuracy
- 5.3.3 RQ2: Alternative Designs Comparison
- 5.3.4 RQ3: Effect of Attention Mechanism
- 5.3.5 RQ4: Comparison with Binary Diffing Baselines
- 5.3.6 RQ5: Evaluation for CWEs Binary Dataset
- 6 Case Studies
- 6.1 Case Study - CVE-2019-8605
- 7 Conclusions
- 7.1 Summary
- 7.2 Precautions for Using Neural Network
- 7.2.1 Parameter l
- 7.2.2 Architecture
- 7.2.3 Optimizations
- 7.3 Importing to Register-based Architectures
- 7.3.1 1:1 Mapping Phase
- 7.3.2 Classification Phase
- 7.4 Importing to Stack-based Architectures