Paper Information

Material type
Academic journal
Author information
(Kookmin University) (Kookmin University) (Kookmin University) (Kookmin University)
Journal information
대한전기학회 전기학회논문지, Vol. 74, No. 9
Publication year
2025
Pages
1,581 - 1,590 (10 pages)
DOI
10.5370/KIEE.2025.74.9.1581


Abstract · Keywords

While the Swin Transformer effectively reduces computational cost using window-based attention, it struggles to model global dependencies across windows. Prior work, such as the Refined Transformer, attempts to overcome this limitation by incorporating CBAM-style channel and spatial attention mechanisms. However, these sequential attention operations often introduce representational bias by overemphasizing specific features. To address this, we propose two key components: (1) the Efficient Head Self-Attention (EHSA) module, which dynamically calibrates the relative contribution of each attention head within a window, and (2) the Hierarchical Local-to-Global Spatial Attention (HLSA) module, which captures long-range interactions across windows in a hierarchical manner. By integrating these into a Swin-T backbone, our architecture improves both local detail modeling and global context aggregation. Experiments on ImageNet-1K and ImageNet100 demonstrate that our model surpasses the Refined Transformer and other window-based approaches in accuracy, while maintaining a comparable level of computational efficiency. These results validate the effectiveness of our design in enhancing local-global interactions within Vision Transformers.
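The following is a minimal, illustrative PyTorch sketch of the head-calibration idea the abstract attributes to EHSA: a window-based multi-head self-attention layer whose per-head outputs are rescaled by a gate computed from the pooled window features. All names (HeadGatedWindowAttention, gate) and design details here are hypothetical assumptions for illustration, not the paper's actual EHSA or HLSA implementation.

    # Illustrative sketch only; not the paper's implementation.
    import torch
    import torch.nn as nn

    class HeadGatedWindowAttention(nn.Module):
        def __init__(self, dim, num_heads, window_size):
            super().__init__()
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.scale = self.head_dim ** -0.5
            self.window_size = window_size  # tokens per window: window_size ** 2
            self.qkv = nn.Linear(dim, dim * 3)
            self.proj = nn.Linear(dim, dim)
            # Assumption: a squeeze-and-excitation style per-head gate
            # predicted from the window's mean-pooled features.
            self.gate = nn.Sequential(nn.Linear(dim, num_heads), nn.Sigmoid())

        def forward(self, x):
            # x: (num_windows * B, N, C), with N tokens per window
            B_, N, C = x.shape
            qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, self.head_dim)
            q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B_, heads, N, head_dim)

            attn = (q * self.scale) @ k.transpose(-2, -1)   # (B_, heads, N, N)
            attn = attn.softmax(dim=-1)
            out = attn @ v                                  # (B_, heads, N, head_dim)

            # Calibrate the relative contribution of each head within the window.
            g = self.gate(x.mean(dim=1))                    # (B_, heads)
            out = out * g[:, :, None, None]

            out = out.transpose(1, 2).reshape(B_, N, C)
            return self.proj(out)

In a Swin-style backbone such a block would sit in place of the standard window attention inside each transformer block; the gate adds only one small linear layer per block, which is in line with the abstract's statement that computational efficiency remains comparable.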

Table of Contents

  Abstract
  1. Introduction
  2. Related Work
  3. Proposed Model
  4. Experiments
  5. Conclusion
  References


UCI(KEPA) : I410-151-25-02-094036306