Multimedia

Authors and titles for recent submissions

[ total of 35 entries: 1-25 | 26-35 ]

Wed, 13 May 2026

[1]  arXiv:2605.12034 [pdf, ps, other]
Title: Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[2]  arXiv:2605.11400 [pdf, ps, other]
Title: UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
Subjects: Multimedia (cs.MM)
[3]  arXiv:2605.10966 [pdf, ps, other]
Title: MMTB: Evaluating Terminal Agents on Multimedia-File Tasks
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[4]  arXiv:2605.11864 (cross-list from cs.IR) [pdf, ps, other]
Title: Very Efficient Listwise Multimodal Reranking for Long Documents
Comments: To appear in ICML 2026
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[5]  arXiv:2605.11732 (cross-list from cs.IR) [pdf, ps, other]
Title: AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Multiagent Systems (cs.MA); Multimedia (cs.MM)
[6]  arXiv:2605.11061 (cross-list from cs.CV) [pdf, ps, other]
Title: HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer
Comments: Source codes and models are available at Github: this https URL and Huggingface: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[7]  arXiv:2605.10995 (cross-list from eess.IV) [pdf, ps, other]
Title: Streaming of rendered content with adaptive frame rate and resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)

Tue, 12 May 2026

[8]  arXiv:2605.10622 [pdf, ps, other]
Title: Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination
Comments: Accepted by ACL 2026 Main
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[9]  arXiv:2605.10357 [src]
Title: RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild
Comments: This submission was made in error. It was intended to replace the existing submission arXiv:2512.22933 rather than create a new submission
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[10]  arXiv:2605.10228 [pdf, ps, other]
Title: FLARE: Full-Modality Long-Video Audiovisual Retrieval Benchmark with User-Simulated Queries
Subjects: Multimedia (cs.MM)
[11]  arXiv:2605.09468 [pdf, ps, other]
Title: Mitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognition
Comments: Accepted by ICMR 2026 (Main Track, Long Paper)
Subjects: Multimedia (cs.MM)
[12]  arXiv:2605.08836 [pdf, ps, other]
Title: Accelerating Multi-Condition T2I Generation via Adaptive Condition Offloading and Pruning
Comments: accepted by IEEE ICME 2026
Subjects: Multimedia (cs.MM)
[13]  arXiv:2605.09897 (cross-list from eess.IV) [pdf, ps, other]
Title: Tube-Structured Incremental Semantic HARQ for Generative Video Receivers
Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[14]  arXiv:2605.09572 (cross-list from cs.CV) [pdf, ps, other]
Title: KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation
Comments: Accepted at Neurocomputing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[15]  arXiv:2605.09479 (cross-list from eess.IV) [pdf, ps, other]
Title: ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[16]  arXiv:2605.09420 (cross-list from cs.CV) [pdf, ps, other]
Title: Relational Retrieval: Leveraging Known-Novel Interactions for Generalized Category Discovery
Comments: Accepted by ICMR 2026. Generalized category discovery, semi-supervised learning, contrastive learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[17]  arXiv:2605.09395 (cross-list from cs.AI) [pdf, ps, other]
Title: Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
Comments: 18 pages, 12 figures, 6 tables. Preprint
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Multimedia (cs.MM)
[18]  arXiv:2605.09348 (cross-list from cs.CL) [pdf, ps, other]
Title: HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities
Comments: 12 pages, 4 figures, 7 tables, accepted at LREC2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Multimedia (cs.MM)
[19]  arXiv:2605.09279 (cross-list from cs.GR) [pdf, ps, other]
Title: CAGS: Color-Adaptive Volumetric Video Streaming with Dynamic 3D Gaussian Splatting
Comments: SIGGRAPH 2026 Conference Paper. Code is available at this https URL
Journal-ref: ACM SIGGRAPH 2026
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Networking and Internet Architecture (cs.NI); Image and Video Processing (eess.IV)
[20]  arXiv:2605.09024 (cross-list from cs.CV) [pdf, ps, other]
Title: Relightable Gaussian Splatting for Virtual Production Using Image-Based Illumination
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[21]  arXiv:2605.08729 (cross-list from cs.CV) [pdf, ps, other]
Title: Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM); Sound (cs.SD)
[22]  arXiv:2605.08723 (cross-list from cs.CV) [pdf, ps, other]
Title: EAR: Enhancing Uni-Modal Representations for Weakly Supervised Audio-Visual Video Parsing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[23]  arXiv:2605.08699 (cross-list from eess.IV) [pdf, ps, other]
Title: Thin-Client Interactive Gaussian Adaptive Streaming over HTTP/3
Subjects: Image and Video Processing (eess.IV); Emerging Technologies (cs.ET); Multimedia (cs.MM)

Mon, 11 May 2026 (showing first 2 of 6 entries)

[24]  arXiv:2605.07825 [pdf, ps, other]
Title: Anisotropic Modality Align
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[25]  arXiv:2605.07489 (cross-list from cs.SD) [pdf, ps, other]
Title: A Decomposed Retrieval-Edit-Rerank Framework for Chord Generation
Comments: Accepted by the 2026 ACM International Conference on Multimedia Retrieval (ICMR 2026)
Subjects: Sound (cs.SD); Multimedia (cs.MM); Signal Processing (eess.SP)
