Multimedia

Authors and titles for recent submissions

[ total of 32 entries: 1-25 | 26-32 ]

Fri, 5 Dec 2025

[1] arXiv:2512.04112 [pdf, ps, other]: Title: MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation

Authors: Aleksandr Farseev, Marlo Ongpin, Qi Yang, Ilia Gossoudarev, Yu-Yi Chu-Farseeva, Sergey Nikolenko

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[2] arXiv:2512.04398 (cross-list from cs.HC) [pdf, ps, other]: Title: What is Beyond Presence? Dimensionality, Control, and Information Spaces

Authors: E. Ch'ng

Comments: 38 pages, accepted for Presence: Virtual and Augmented Reality 2026(37)

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

Thu, 4 Dec 2025

[3] arXiv:2512.03521 [pdf, ps, other]: Title: Cross-Space Synergy: A Unified Framework for Multimodal Emotion Recognition in Conversation

Authors: Xiaosen Lyu, Jiayu Xiong, Yuren Chen, Wanlong Wang, Xiaoqing Dai, Jing Wang

Comments: Accepted to AAAI 2026

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[4] arXiv:2512.03087 [pdf, ps, other]: Title: When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI

Authors: Yanhui Li, Qi Zhou, Zhihong Xu, Huizhong Guo, Wenhai Wang, Dongxia Wang

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[5] arXiv:2512.03566 (cross-list from cs.CV) [pdf, ps, other]: Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models

Authors: Hao Sun, Lei Fan, Donglin Di, Shaohui Liu

Comments: Accepted by ACM MM Asia 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Wed, 3 Dec 2025

[6] arXiv:2512.02584 [pdf, ps, other]: Title: Stepwise Schema-Guided Prompting Framework with Parameter Efficient Instruction Tuning for Multimedia Event Extraction

Authors: Xiang Yuan, Xinrong Chen, Haochen Li, Hang Yang, Guanyu Wang, Weiping Li, Tong Mo

Comments: Accepted by 2025 IEEE International Conference on Multimedia and Expo

Subjects: Multimedia (cs.MM)
[7] arXiv:2512.02533 [pdf, ps, other]: Title: PopSim: Social Network Simulation for Social Media Popularity Prediction

Authors: Yijun Liu, Wu Liu, Xiaoyan Gu, Allen He, Weiping Wang, Yongdong Zhang

Subjects: Multimedia (cs.MM)
[8] arXiv:2512.02906 (cross-list from cs.CV) [pdf, ps, other]: Title: MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding

Authors: Fan Yang, Kaihao Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[9] arXiv:2512.02792 (cross-list from cs.CV) [pdf, ps, other]: Title: HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

Authors: Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, Weili Guan

Comments: Accepted by ACM MM 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[10] arXiv:2512.02652 (cross-list from cs.SD) [pdf, ps, other]: Title: Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training

Authors: Hong-Jie You, Jie-Jing Shao, Xiao-Wen Yang, Lin-Han Jia, Lan-Zhe Guo, Yu-Feng Li

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[11] arXiv:2512.02650 (cross-list from cs.CV) [pdf, ps, other]: Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Authors: Junwon Lee, Juhan Nam, Jiyoung Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 2 Dec 2025

[12] arXiv:2512.01442 [pdf, ps, other]: Title: PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis

Authors: Heng Xie, Kang Zhu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Ruibo Fu, Changsheng Li

Comments: AAAI 2026 accepted

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[13] arXiv:2512.01267 [pdf, ps, other]: Title: ZO-ASR: Zeroth-Order Fine-Tuning of Speech Foundation Models without Back-Propagation

Authors: Yuezhang Peng, Yuxin Liu, Yao Li, Sheng Wang, Fei Wen, Xie Chen

Comments: 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Subjects: Multimedia (cs.MM); Sound (cs.SD)
[14] arXiv:2512.00928 [pdf, ps, other]: Title: Augmenting Intra-Modal Understanding in MLLMs for Robust Multimodal Keyphrase Generation

Authors: Jiajun Cao, Qinggang Zhang, Yunbo Tang, Zhishang Xiang, Chang Yang, Jinsong Su

Subjects: Multimedia (cs.MM)
[15] arXiv:2512.00883 [pdf, ps, other]: Title: Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound

Authors: Jiahua Wang, Shannan Yan, Leqi Zheng, Jialong Wu, Yaoxin Mao

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[16] arXiv:2512.01603 (cross-list from cs.CL) [pdf, ps, other]: Title: MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark

Authors: Yuezhang Peng, Chonghao Cai, Ziang Liu, Shuai Fan, Sheng Jiang, Hua Xu, Yuxin Liu, Qiguang Chen, Kele Xu, Yao Li, Sheng Wang, Libo Qin, Xie Chen

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[17] arXiv:2512.00537 (cross-list from cs.HC) [pdf, ps, other]: Title: Speculating on the Role of Media Architecture in Post-disaster Rebuilding and Recovery: Insights from Architects and Interaction Designers

Authors: Berk Goksenin Tan, Oguzhan Ozcan

Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Emerging Technologies (cs.ET); Multimedia (cs.MM)
[18] arXiv:2512.00451 (cross-list from cs.SD) [pdf, ps, other]: Title: STCTS: Generative Semantic Compression for Ultra-Low Bitrate Speech via Explicit Text-Prosody-Timbre Decomposition

Authors: Siyu Wang, Haitao Li, Donglai Zhu

Comments: The complete source code and online speech reconstruction demo are publicly available at this https URL

Subjects: Sound (cs.SD); Multimedia (cs.MM)
[19] arXiv:2512.00120 (cross-list from cs.SD) [pdf, ps, other]: Title: Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment

Authors: Jiaying Hong, Ting Zhu, Thanet Markchom, Huizhi Liang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[20] arXiv:2512.00115 (cross-list from cs.SD) [pdf, ps, other]: Title: MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning

Authors: Kyeongha Rho, Hyeongkeun Lee, Jae Won Cho, Joon Son Chung

Comments: 10 pages, 5 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Mon, 1 Dec 2025 (showing first 5 of 12 entries)

[21] arXiv:2511.22576 [pdf, ps, other]: Title: A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization

Authors: Janak Kapuriya, Ali Hatami, Paul Buitelaar

Subjects: Multimedia (cs.MM)
[22] arXiv:2511.22463 [pdf, ps, other]: Title: Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation

Authors: Xinyi Che, Wenbo Wang, Jian Guan, Qijun Zhao

Comments: 10 pages, 1 figure

Subjects: Multimedia (cs.MM)
[23] arXiv:2511.22447 [pdf, ps, other]: Title: Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation

Authors: Xinyi Che, Wenbo Wang, Yuanbo Hou, Mingjie Xie, Qijun Zhao, Jian Guan

Comments: 10 pages, 7 figures

Subjects: Multimedia (cs.MM)
[24] arXiv:2511.22229 [pdf, ps, other]: Title: VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task

Authors: Yuyue Wang, Xin Cheng, Yihan Wu, Xihua Wang, Jinchuan Tian, Ruihua Song

Comments: MM Asia 2025

Subjects: Multimedia (cs.MM)
[25] arXiv:2511.21780 [pdf, ps, other]: Title: 3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation

Authors: Yaoru Li, Heyu Si, Federico Landi, Pilar Oplustil Gallegos, Ioannis Koutsoumpas, O. Ricardo Cortez Vazquez, Ruiju Fu, Qi Guo, Xin Jin, Shunyu Liu, Mingli Song

Subjects: Multimedia (cs.MM); Sound (cs.SD)
