

Thesis Basic Information

Material Type
Dissertation (degree thesis)
Author Information

(Hanyang University, Hanyang University Graduate School)

Advisor
오희국
Year of Publication
Copyright
Hanyang University theses are protected by copyright.


Abstract · Keywords


Patch analysis is a reverse engineering technique for exploring the patched content in binary executables; traditionally, it is used in applications such as vulnerability discovery and 1-day exploit generation. For patch analysis at the binary level, we need to compare two different versions of a binary executable and find the functions that were patched or modified, while filtering out the unpatched functions. In reverse engineering, binary diffing is the process of discovering the differences and similarities in functionality between two binary programs, and it is traditionally considered the best choice for patch analysis. Previous research on binary diffing approaches it as a function matching problem: an initial 1:1 mapping between functions is formulated, and a sequence matching ratio is later computed to classify two functions as an exact match (unpatched), a partial match (patched), or a no-match (error/new function). In our empirical analysis of patch analysis, we discovered that the accuracy of existing techniques is at its best only when detecting exact matches; they are not efficient at detecting partially changed functions, especially those with minor patches such as CWE-478 and CWE-476.
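The sequence-matching-ratio classification that prior diffing tools perform can be illustrated with Python's `difflib`; the instruction listings and the `exact`/`partial` thresholds below are illustrative assumptions, not values from the dissertation.

```python
from difflib import SequenceMatcher

def classify_pair(old_asm, new_asm, exact=1.0, partial=0.5):
    """Classify a 1:1-mapped function pair by its sequence matching ratio.

    `exact` and `partial` are illustrative threshold assumptions; real
    tools tune such cutoffs per dataset.
    """
    ratio = SequenceMatcher(None, old_asm, new_asm).ratio()
    if ratio >= exact:
        return "exact match (unpatched)"
    if ratio >= partial:
        return "partial match (patched)"
    return "no-match (error/new function)"

old = ["push rbp", "mov rbp, rsp", "mov eax, edi", "pop rbp", "ret"]
new = ["push rbp", "mov rbp, rsp", "test edi, edi", "je 0x10",
       "mov eax, edi", "pop rbp", "ret"]

print(classify_pair(old, old))  # identical sequences -> exact match
print(classify_pair(old, new))  # patched sequence   -> partial match
```

A high ratio maps to "exact", an intermediate one to "partial", and a low one to "no-match", mirroring the three-way classification described above.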

The drawbacks in existing research are due to two major challenges. (i) In the 1:1 mapping phase, a strict policy is used to match function features: existing research defines a set of heuristics and applies them to match functions in a sequential manner. Some heuristics are over-trusted, and prioritizing them produces many false matching results. (ii) In the classification phase, an assembly snippet is treated as normal text and a sequence matching algorithm is used for similarity comparison. However, an instruction has a unique structure: mnemonics and registers occupy specific positions within an instruction and also carry semantic relationships, which makes assembly code different from general text. In our empirical analysis, we discovered that small instruction-level changes, whether caused by a patch or by compiler-introduced randomness, look pretty much the same to a sequence matching algorithm. Sequence matching performs best on general text, but it fails to detect structural and semantic changes at the instruction level; thus, its use for classification produces many false results.
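The instruction-structure problem in (ii) can be demonstrated with a small hypothetical example: to a token-level sequence matcher, a semantics-preserving compiler register rename can even score as a *bigger* change than a genuine patch that alters a constant, so the two kinds of change cannot be told apart. The snippets below are illustrative, not taken from the dissertation's dataset.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    # Token-level sequence matching over whitespace-split instructions,
    # mirroring the "assembly as plain text" treatment criticized above.
    return SequenceMatcher(None, " ".join(a).split(), " ".join(b).split()).ratio()

base    = ["mov eax, [rbx]", "add eax, 1", "mov [rbx], eax"]
renamed = ["mov ecx, [rbx]", "add ecx, 1", "mov [rbx], ecx"]  # compiler register rename
patched = ["mov eax, [rbx]", "add eax, 2", "mov [rbx], eax"]  # real patch: constant changed

print(ratio(base, renamed))  # benign rename lowers the ratio at several positions
print(ratio(base, patched))  # semantic patch changes only one token
```

The purely cosmetic rename touches more tokens than the patch does, so a ratio threshold that flags the patch would also misclassify the renamed (unpatched) function.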

In this dissertation, we address the aforementioned underlying challenges by proposing a two-fold solution. First, for the 1:1 mapping phase, we empirically analyzed the heuristics used in Diaphora, an industry-standard tool, discovered the drawbacks of each heuristic, and proposed a set of computationally inexpensive feature vectors, which are later compared using a distance-based selection criterion to map similar functions and filter out unmatched functions. Second, for the classification phase, we proposed a Siamese binary-classification neural network in which each branch is an attention-based distributed-learning embedding neural network: the branches learn the semantic similarity among assembly instructions and learn to highlight the actual changes at the instruction level, and a final-stage fully connected layer learns to accurately classify two 1:1-mapped functions as either an exact or a partial match. The proposed neural network is sophisticated enough to differentiate between compiler-caused and patch-based changes at the instruction level. Finally, we propose an efficient neural-network-assisted binary diffing algorithm that integrates our proposed 1:1 mapping phase and classification phase. The proposed binary diffing algorithm accurately classifies two binary functions as an exact match, a partial match, or a no-match.
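The distance-based 1:1 mapping idea can be sketched as follows. The "tally vector" here (opcode counts over a fixed vocabulary) and the distance threshold are stand-in assumptions for exposition; the dissertation's actual feature set (edge/vertex type vectors, DDR, etc.) and selection criteria are richer.

```python
import math

def tally_vector(asm, opcodes=("mov", "add", "cmp", "jmp", "call", "ret")):
    # Illustrative feature vector: counts of a fixed opcode vocabulary.
    return [sum(ins.split()[0] == op for ins in asm) for op in opcodes]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def map_functions(old_funcs, new_funcs, threshold=1.5):
    """Greedy distance-based 1:1 mapping; functions with no candidate
    within `threshold` (an illustrative value) are filtered out."""
    mapping, taken = {}, set()
    for name, asm in old_funcs.items():
        v = tally_vector(asm)
        best, best_d = None, threshold
        for cand, casm in new_funcs.items():
            if cand in taken:
                continue
            d = euclidean(v, tally_vector(casm))
            if d < best_d:
                best, best_d = cand, d
        if best is not None:
            mapping[name] = best
            taken.add(best)
    return mapping

old_funcs = {"f": ["mov eax, 1", "ret"], "g": ["call h", "add eax, 2", "ret"]}
new_funcs = {"f2": ["mov eax, 1", "ret"], "g2": ["call h", "add eax, 3", "ret"],
             "new": ["cmp eax, 0", "jmp 0x10", "jmp 0x20", "jmp 0x30"]}
print(map_functions(old_funcs, new_funcs))  # {'f': 'f2', 'g': 'g2'}; 'new' is filtered
```

Each old function is paired with its nearest new function in feature space, and anything farther than the threshold is left unmatched rather than being forced into a mapping.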

We have thoroughly evaluated the proposed feature vectors, different design choices, and the parameters of the neural network. For training the neural network, we used x86 XNU kernel binaries, and we evaluated the proposed neural-network-assisted binary diffing algorithm on kernel binaries (not included in training) and on the CWE dataset. We achieved ~99% classification accuracy, which is higher than that of existing binary diffing techniques and tools.
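The shape of the classification phase described in the abstract (attention-pooled Siamese branches feeding a fully connected layer) can be sketched as a single forward pass. Every vocabulary entry, dimension, and weight below is an illustrative, randomly initialized assumption; the dissertation's model learns these parameters from x86 XNU kernel binaries.

```python
import math
import random

random.seed(0)

# Toy instruction-token vocabulary and randomly initialized parameters
# (all sizes/values are assumptions for illustration only).
VOCAB = ["mov", "add", "cmp", "ret", "eax", "ebx", "1", "2"]
DIM = 8
EMB = {t: [random.gauss(0, 1) for _ in range(DIM)] for t in VOCAB}
W_ATT = [random.gauss(0, 1) for _ in range(DIM)]   # attention scoring vector
W_FC = [random.gauss(0, 1) for _ in range(DIM)]    # final fully connected layer
B_FC = 0.0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def embed_function(asm):
    """One Siamese branch: token embeddings pooled by softmax attention."""
    tokens = [t.strip(",") for ins in asm for t in ins.split()]
    vecs = [EMB[t] for t in tokens]
    scores = [math.exp(dot(v, W_ATT)) for v in vecs]
    total = sum(scores)
    weights = [s / total for s in scores]          # softmax attention weights
    return [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(DIM)]

def classify(asm_a, asm_b):
    """Siamese head: |branch_a - branch_b| -> fully connected -> sigmoid."""
    diff = [abs(a - b) for a, b in zip(embed_function(asm_a), embed_function(asm_b))]
    p_partial = 1.0 / (1.0 + math.exp(-(dot(diff, W_FC) + B_FC)))
    return "partial match" if p_partial > 0.5 else "exact match"

f = ["mov eax, 1", "ret"]
print(classify(f, f))  # zero difference -> sigmoid(0) = 0.5 -> exact match
```

With untrained random weights only the identical-input case is meaningful; training fits the embeddings, attention vector, and fully connected layer so that compiler-caused renames yield a near-zero branch difference while patch-based changes do not.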

Table of Contents

1 Introduction 1
  1.1 Overview: Patch Analysis and Binary Diffing 1
    1.1.1 Binary Diffing 3
  1.2 Motivation 5
    1.2.1 Problem Definition 9
    1.2.2 Assumptions 11
  1.3 Contributions 11
  1.4 Organization of Dissertation 12
2 Background and Preliminaries 14
  2.1 Function Matching and Binary Diffing 14
  2.2 Function Matching Related Work 15
    2.2.1 Binary Diffing 15
    2.2.2 Binary Code Clones 17
    2.2.3 Deep Learning 18
  2.3 Binary Diffing Tool Analysis: Diaphora 20
3 Feature Engineering: 1:1 Mapping Phase 25
  3.1 Tally Vector 25
  3.2 Edge Type Vector 27
  3.3 Vertex Type Vector 28
  3.4 Vertex Degree Vector 29
  3.5 Digraph Dominance Relationship (DDR) 30
    3.5.1 Piecewise Hashing 33
    3.5.2 Projection-based Hashing 33
  3.6 Opcode Vector 34
  3.7 Assembly Embedding Representation 35
    3.7.1 An Instruction Embedding Model 37
    3.7.2 Function Modeling 37
  3.8 Proposed Function Matching (FMA) 39
    3.8.1 Features Encoding 40
    3.8.2 Representation Vector Matching 41
    3.8.3 Function Matching Algorithm 43
4 Attention-based Siamese Binary Classification Neural Network 47
  4.1 Modeling Assembly Functions 47
    4.1.1 Assembly Formatting 48
    4.1.2 Assembly Representation 49
    4.1.3 Dataset Collection 52
  4.2 Proposed Learning Model 55
    4.2.1 Assembly as a Bag of Instructions 58
    4.2.2 Attention Model 59
    4.2.3 Siamese Binary-Classification Model 64
    4.2.4 Training 66
    4.2.5 Utility 67
  4.3 Diffing Algorithm 68
  4.4 Design Decisions and Limitations 70
    4.4.1 Oneshot vs Sequential 70
    4.4.2 Expressiveness of the Embedding Layers 70
    4.4.3 Granularity of Operands Tokenization 71
    4.4.4 Distance Function 72
5 Empirical Evaluation 73
  5.1 Test Environment 73
  5.2 Empirical Evaluation: 1:1 Mapping Phase 76
  5.3 Empirical Evaluation: Classification Phase 78
    5.3.1 RQ1a: Training Accuracy 80
    5.3.2 RQ1b: Prediction Accuracy 81
    5.3.3 RQ2: Alternative Designs Comparison 86
    5.3.4 RQ3: Effect of Attention Mechanism 89
    5.3.5 RQ4: Comparison with Binary Diffing Baselines 90
    5.3.6 RQ5: Evaluation for CWEs Binary Dataset 95
6 Case Studies 98
  6.1 Case Study: CVE-2019-8605 98
7 Conclusions 101
  7.1 Summary 101
  7.2 Precautions for Using the Neural Network 102
    7.2.1 Parameter l 102
    7.2.2 Architecture 102
    7.2.3 Optimizations 103
  7.3 Importing to Register-based Architectures 103
    7.3.1 1:1 Mapping Phase 103
    7.3.2 Classification Phase 104
  7.4 Importing to Stack-based Architectures 104
