Multimedia

Authors and titles for recent submissions, skipping first 20

[ total of 32 entries: 1-10 | 11-20 | 21-30 | 31-32 ]

Mon, 1 Dec 2025 (showing first 10 of 12 entries)

[21]  arXiv:2511.22576 [pdf, ps, other]
Title: A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization
Subjects: Multimedia (cs.MM)
[22]  arXiv:2511.22463 [pdf, ps, other]
Title: Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation
Comments: 10 pages, 1 figure
Subjects: Multimedia (cs.MM)
[23]  arXiv:2511.22447 [pdf, ps, other]
Title: Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation
Comments: 10 pages, 7 figures
Subjects: Multimedia (cs.MM)
[24]  arXiv:2511.22229 [pdf, ps, other]
Title: VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task
Comments: MM Asia 2025
Subjects: Multimedia (cs.MM)
[25]  arXiv:2511.21780 [pdf, ps, other]
Title: 3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation
Subjects: Multimedia (cs.MM); Sound (cs.SD)
[26]  arXiv:2511.21698 [pdf, ps, other]
Title: TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement
Comments: Submitted to ICASSP2026
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[27]  arXiv:2511.21694 [pdf, ps, other]
Title: A Survey of Information Disorder on Video-Sharing Platforms
Comments: Accepted by 2025 IEEE International Conference on Content-Based Multimedia Indexing
Subjects: Multimedia (cs.MM); Computers and Society (cs.CY)
[28]  arXiv:2511.21693 [pdf, ps, other]
Title: Designing a Multimodal Viewer for Piano Performance Analysis -- a Pedagogy-First Approach
Subjects: Multimedia (cs.MM)
[29]  arXiv:2511.22805 (cross-list from cs.CV) [pdf, ps, other]
Title: From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
Comments: Project page with codes/datasets/models: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[30]  arXiv:2511.22715 (cross-list from cs.CV) [pdf, ps, other]
Title: ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)