Multimedia

Authors and titles for recent submissions

[ total of 35 entries: 1-25 | 26-35 ]

Wed, 13 May 2026

[1] arXiv:2605.12034 [pdf, ps, other]: Title: Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation

Authors: Che Liu, Lichao Ma, Xiangyu Tony Zhang, Yuxin Zhang, Haoyang Zhang, Xuerui Yang, Fei Tian

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[2] arXiv:2605.11400 [pdf, ps, other]: Title: UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning

Authors: Hayes Bai, Yinyi Luo, Wenwen Wang, Qingsong Wen, Jindong Wang

Subjects: Multimedia (cs.MM)
[3] arXiv:2605.10966 [pdf, ps, other]: Title: MMTB: Evaluating Terminal Agents on Multimedia-File Tasks

Authors: Chiyeong Heo, Jaechang Kim, Junhyuk Kwon, Hoyoung Kim, Dongmin Park, Jonghyun Lee, Jungseul Ok

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[4] arXiv:2605.11864 (cross-list from cs.IR) [pdf, ps, other]: Title: Very Efficient Listwise Multimodal Reranking for Long Documents

Authors: Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh

Comments: To appear in ICML 2026

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[5] arXiv:2605.11732 (cross-list from cs.IR) [pdf, ps, other]: Title: AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents

Authors: Jiarui Jin, Zexuan Yan, Shijian Wang, Wenxiang Jiao, Yuan Lu

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Multiagent Systems (cs.MA); Multimedia (cs.MM)
[6] arXiv:2605.11061 (cross-list from cs.CV) [pdf, ps, other]: Title: HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer

Authors: Qi Cai, Jingwen Chen, Chengmin Gao, Zijian Gong, Yehao Li, Yingwei Pan, Yi Peng, Zhaofan Qiu, Kai Yu, Yiheng Zhang, Hao Ai, Siying Bai, Yang Chen, Zhihui Chen, Fengbin Gao, Ying Guo, Dong Li, Zhen Shen, Leilei Shi, Jing Wang, Siyu Wang, Yimeng Wang, Rui Zheng, Ting Yao, Tao Mei

Comments: Source codes and models are available at Github: this https URL and Huggingface: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[7] arXiv:2605.10995 (cross-list from eess.IV) [pdf, ps, other]: Title: Streaming of rendered content with adaptive frame rate and resolution

Authors: Yaru Liu, Joseph G. March, Rafal K. Mantiuk

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)

Tue, 12 May 2026

[8] arXiv:2605.10622 [pdf, ps, other]: Title: Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination

Authors: Yangneng Chen, Junlin Li, Weijun Yao, Xilai Ma, Guodong Du, Wenya Wang, Jing Li

Comments: Accepted by ACL 2026 Main

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[9] arXiv:2605.10357 [src]: Title: RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild

Authors: Danni Xu, Shaojing Fan, Harry Cheng, Mohan Kankanhalli

Comments: This submission was made in error. It was intended to replace the existing submission arXiv:2512.22933 rather than create a new submission

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[10] arXiv:2605.10228 [pdf, ps, other]: Title: FLARE: Full-Modality Long-Video Audiovisual Retrieval Benchmark with User-Simulated Queries

Authors: Qijie You, Hao Liang, Mingrui Chen, Bohan Zeng, Meiyi Qiang, Zhenhao Wong, Wentao Zhang

Subjects: Multimedia (cs.MM)
[11] arXiv:2605.09468 [pdf, ps, other]: Title: Mitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognition

Authors: Yifan Wang, Peiwu Wang, Yunxian Chi, Zhinan Gou, Kai Gao

Comments: Accepted by ICMR 2026 (Main Track, Long Paper)

Subjects: Multimedia (cs.MM)
[12] arXiv:2605.08836 [pdf, ps, other]: Title: Accelerating Multi-Condition T2I Generation via Adaptive Condition Offloading and Pruning

Authors: Yuxin Kong, Peng Yang, Chongbin Yi, Fan Wu, Feng Lyu

Comments: Accepted by IEEE ICME 2026

Subjects: Multimedia (cs.MM)
[13] arXiv:2605.09897 (cross-list from eess.IV) [pdf, ps, other]: Title: Tube-Structured Incremental Semantic HARQ for Generative Video Receivers

Authors: Xuesong Wang, Xinyan Xie, Runxin Zhang

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[14] arXiv:2605.09572 (cross-list from cs.CV) [pdf, ps, other]: Title: KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation

Authors: Guanyi Du, Lintao Wang, Kun Hu, Ziyang Wang

Comments: Accepted at Neurocomputing

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[15] arXiv:2605.09479 (cross-list from eess.IV) [pdf, ps, other]: Title: ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

Authors: Feng Ding, Haisheng Fu, Jie Liang, Qihan Xu, Siyu Zhu, Jingning Han

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[16] arXiv:2605.09420 (cross-list from cs.CV) [pdf, ps, other]: Title: Relational Retrieval: Leveraging Known-Novel Interactions for Generalized Category Discovery

Authors: Yulin Xu, Chunqi Guo, Yuanzhen Shuai, Jianyuan Ni

Comments: Accepted by ICMR 2026. Generalized category discovery, semi-supervised learning, contrastive learning

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[17] arXiv:2605.09395 (cross-list from cs.AI) [pdf, ps, other]: Title: Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

Authors: Lin Li, Jiawei Huang, Qihao Quan, Dan Li, Boxin Li, Xiao Zhang, Erli Meng, Wenjie Feng, Jian Lou, See-Kiong Ng

Comments: 18 pages, 12 figures, 6 tables. Preprint

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Multimedia (cs.MM)
[18] arXiv:2605.09348 (cross-list from cs.CL) [pdf, ps, other]: Title: HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities

Authors: Shusaku Egami, Aoi Ohta, Tomoki Tsujimura, Masaki Asada, Tatsuya Ishigaki, Ken Fukuda, Masahiro Hamasaki, Hiroya Takamura

Comments: 12 pages, 4 figures, 7 tables, accepted at LREC2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Multimedia (cs.MM)
[19] arXiv:2605.09279 (cross-list from cs.GR) [pdf, ps, other]: Title: CAGS: Color-Adaptive Volumetric Video Streaming with Dynamic 3D Gaussian Splatting

Authors: Daheng Yin, Yili Jin, Jianxin Shi, Isaac Ding, Miao Zhang, Fangxin Wang, Zhaowu Huang, Cong Zhang, Jiangchuan Liu, Fang Dong

Comments: SIGGRAPH 2026 Conference Paper. Code is available at this https URL

Journal-ref: ACM SIGGRAPH 2026

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Networking and Internet Architecture (cs.NI); Image and Video Processing (eess.IV)
[20] arXiv:2605.09024 (cross-list from cs.CV) [pdf, ps, other]: Title: Relightable Gaussian Splatting for Virtual Production Using Image-Based Illumination

Authors: Adrian Azzarelli, Nantheera Anantrasirichai, James Pollock, David R. Bull

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[21] arXiv:2605.08729 (cross-list from cs.CV) [pdf, ps, other]: Title: Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation

Authors: Shihao Cheng, Jiaxu Zhang, Quanyue Song, Shansong Liu, Zhizhi Guo, Xiaolei Zhang, Chi Zhang, Xuelong Li, Zhigang Tu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM); Sound (cs.SD)
[22] arXiv:2605.08723 (cross-list from cs.CV) [pdf, ps, other]: Title: EAR: Enhancing Uni-Modal Representations for Weakly Supervised Audio-Visual Video Parsing

Authors: Huilai Li, Xiaomeng Di, Ying Xing, Yonghao Dang, Yiming Wang, Jianqin Yin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[23] arXiv:2605.08699 (cross-list from eess.IV) [pdf, ps, other]: Title: Thin-Client Interactive Gaussian Adaptive Streaming over HTTP/3

Authors: Emanuele Artioli, Philipp Fößl, Daniele Lorenzi, Farzad Tashtarian, Mahdi Dolati, Cheng-Hsin Hsu, Christian Timmerer

Subjects: Image and Video Processing (eess.IV); Emerging Technologies (cs.ET); Multimedia (cs.MM)

Mon, 11 May 2026 (showing first 2 of 6 entries)

[24] arXiv:2605.07825 [pdf, ps, other]: Title: Anisotropic Modality Align

Authors: Xiaomin Yu, Yijiang Li, Yuhui Zhang, Hanzhen Zhao, Yue Yang, Hao Tang, Yue Song, Xiaobin Hu, Chengwei Qin, Shuicheng Yan, Hui Xiong

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[25] arXiv:2605.07489 (cross-list from cs.SD) [pdf, ps, other]: Title: A Decomposed Retrieval-Edit-Rerank Framework for Chord Generation

Authors: Qiqi He, Dichucheng Li, Xiaoheng Sun, Anqi Huang

Comments: Accepted by the 2026 ACM International Conference on Multimedia Retrieval (ICMR 2026)

Subjects: Sound (cs.SD); Multimedia (cs.MM); Signal Processing (eess.SP)
