Multimedia

Authors and titles for recent submissions, skipping first 3

[ total of 16 entries: 1-25 | 4-16 ]

Tue, 9 Dec 2025

[4]  arXiv:2512.07209 [pdf, ps, other]
Title: Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD)
[5]  arXiv:2512.07571 (cross-list from cs.CL) [pdf, ps, other]
Title: A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[6]  arXiv:2512.06811 (cross-list from cs.CV) [pdf, ps, other]
Title: RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models
Comments: Accepted by AAAI 2026 (Oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[7]  arXiv:2512.06282 (cross-list from cs.CV) [pdf, ps, other]
Title: A Sleep Monitoring System Based on Audio, Video and Depth Information
Comments: Accepted in the Computer Vision, Graphics and Image Processing (CVGIP 2013)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[8]  arXiv:2512.06022 (cross-list from cs.SD) [pdf, ps, other]
Title: DreamFoley: Scalable VLMs for High-Fidelity Video-to-Audio Generation
Comments: 10 pages; Bytedance
Subjects: Sound (cs.SD); Multimedia (cs.MM)

Mon, 8 Dec 2025

[9]  arXiv:2512.05745 (cross-list from cs.CR) [pdf, ps, other]
Title: ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior
Subjects: Cryptography and Security (cs.CR); Multimedia (cs.MM)
[10]  arXiv:2512.05438 (cross-list from cs.HC) [pdf, ps, other]
Title: EXR: An Interactive Immersive EHR Visualization in Extended Reality
Comments: 11 pages, 6 figures. Preprint version. This paper has been accepted to IEEE ICIR 2025. This is the author-prepared version and not the final published version. The final version will appear in IEEE Xplo
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[11]  arXiv:2512.05126 (cross-list from eess.AS) [pdf, ps, other]
Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Fri, 5 Dec 2025

[12]  arXiv:2512.04112 [pdf, ps, other]
Title: MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[13]  arXiv:2512.04398 (cross-list from cs.HC) [pdf, ps, other]
Title: What is Beyond Presence? Dimensionality, Control, and Information Spaces
Authors: E. Ch'ng
Comments: 38 pages, accepted for Presence: Virtual and Augmented Reality 2026(37)
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

Thu, 4 Dec 2025

[14]  arXiv:2512.03521 [pdf, ps, other]
Title: Cross-Space Synergy: A Unified Framework for Multimodal Emotion Recognition in Conversation
Comments: Accepted to AAAI 2026
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[15]  arXiv:2512.03087 [pdf, ps, other]
Title: When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[16]  arXiv:2512.03566 (cross-list from cs.CV) [pdf, ps, other]
Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models
Comments: Accepted by ACM MM Asia 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
