Multimedia

Authors and titles for recent submissions, skipping first 20

[ total of 32 entries: 1-10 | 11-20 | 21-30 | 31-32 ]

Mon, 1 Dec 2025 (showing first 10 of 12 entries)

[21] arXiv:2511.22576 [pdf, ps, other]: Title: A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization

Authors: Janak Kapuriya, Ali Hatami, Paul Buitelaar

Subjects: Multimedia (cs.MM)
[22] arXiv:2511.22463 [pdf, ps, other]: Title: Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation

Authors: Xinyi Che, Wenbo Wang, Jian Guan, Qijun Zhao

Comments: 10 pages, 1 figure

Subjects: Multimedia (cs.MM)
[23] arXiv:2511.22447 [pdf, ps, other]: Title: Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation

Authors: Xinyi Che, Wenbo Wang, Yuanbo Hou, Mingjie Xie, Qijun Zhao, Jian Guan

Comments: 10 pages, 7 figures

Subjects: Multimedia (cs.MM)
[24] arXiv:2511.22229 [pdf, ps, other]: Title: VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task

Authors: Yuyue Wang, Xin Cheng, Yihan Wu, Xihua Wang, Jinchuan Tian, Ruihua Song

Comments: MM Asia 2025

Subjects: Multimedia (cs.MM)
[25] arXiv:2511.21780 [pdf, ps, other]: Title: 3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation

Authors: Yaoru Li, Heyu Si, Federico Landi, Pilar Oplustil Gallegos, Ioannis Koutsoumpas, O. Ricardo Cortez Vazquez, Ruiju Fu, Qi Guo, Xin Jin, Shunyu Liu, Mingli Song

Subjects: Multimedia (cs.MM); Sound (cs.SD)
[26] arXiv:2511.21698 [pdf, ps, other]: Title: TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement

Authors: Zhiyong Ma, Jiahao Chen, Qingyuan Chuai, Zhengping Li

Comments: Submitted to ICASSP2026

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[27] arXiv:2511.21694 [pdf, ps, other]: Title: A Survey of Information Disorder on Video-Sharing Platforms

Authors: Meiyu Li, Wei Ai, Naeemul Hassan

Comments: Accepted by 2025 IEEE International Conference on Content-Based Multimedia Indexing

Subjects: Multimedia (cs.MM); Computers and Society (cs.CY)
[28] arXiv:2511.21693 [pdf, ps, other]: Title: Designing a Multimodal Viewer for Piano Performance Analysis -- a Pedagogy-First Approach

Authors: Joonhyung Bae, Hyeyoon Cho, Kirak Kim, Dawon Park, Taegyun Kwon, Yoon-Seok Choi, Hyeon Hur, Shigeru Kai, Yohei Wada, Satoshi Obata, Akira Maezawa, Jaebum Park, Jonghwa Park, Juhan Nam

Subjects: Multimedia (cs.MM)
[29] arXiv:2511.22805 (cross-list from cs.CV) [pdf, ps, other]: Title: From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

Authors: Yiming Chen, Junlin Han, Tianyi Bai, Shengbang Tong, Filippos Kokkinos, Philip Torr

Comments: Project page with codes/datasets/models: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[30] arXiv:2511.22715 (cross-list from cs.CV) [pdf, ps, other]: Title: ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering

Authors: Alberto Compagnoni, Marco Morini, Sara Sarto, Federico Cocchi, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
