Paper Information
- Material type: Academic journal
- Author information:
- Publication date: 2026.5
- Pages: 746 - 756 (11 pages)
- DOI: 10.9717/kmms.2026.29.5.746
Abstract · Keywords
The rapid growth of online and remote interactions has increased demand for AI systems that can deliver emotionally supportive, empathetic responses, yet many conversational agents still fail to reflect the richness of human empathy in face-to-face, multimodal settings. This study proposes a multimodal empathetic response generation framework that explicitly models emotional dynamics in dyadic interactions to produce dynamic empathetic speaking videos. Given a speaker’s audio and video, the framework generates an empathizer’s responses across linguistic, acoustic, and visual modalities. The framework integrates three modules. First, a Video Conversation (ViCo)-based facial representation module encodes the speaker’s expression dynamics as 3D Morphable Model (3DMM) coefficients and predicts the empathizer’s facial expressions and head movements to enhance visual empathy. Second, an AnyGPT-based module generates semantically coherent and emotionally appropriate empathetic utterances from acoustic features. Third, to mitigate temporal mismatch between generated speech and facial motion, a lip-synchronization module (Wav2Lip) aligns lip movements with audio and produces natural conversational videos. Quantitative and qualitative evaluations show that the proposed framework generates more empathetic, dynamic, and context-consistent responses, highlighting the importance of multimodal integration for emotionally expressive, human-centered AI.
#Empathetic Response Generation
#Multimodal Learning
#Dyadic Interaction
#3D Morphable Model (3DMM)
#Affective Computing
Table of Contents
- ABSTRACT
- 1. Introduction
- 2. Related Work
- 3. Proposed Empathy Generation Model
- 4. Experimental Results
- 5. Conclusion
- REFERENCE