Multimedia

Authors and titles for recent submissions

[ total of 33 entries: 1-25 | 26-33 ]

Fri, 27 Mar 2026

[1]  arXiv:2603.25727 (cross-list from cs.AI) [pdf, ps, other]
Title: Back to Basics: Revisiting ASR in the Age of Voice Agents
Comments: 10 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[2]  arXiv:2603.25202 (cross-list from cs.CV) [pdf, ps, other]
Title: CIV-DG: Conditional Instrumental Variables for Domain Generalization in Medical Imaging
Comments: 10 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[3]  arXiv:2603.25140 (cross-list from cs.CV) [pdf, ps, other]
Title: SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[4]  arXiv:2603.25004 (cross-list from cs.CV) [pdf, ps, other]
Title: Interpretable Zero-shot Referring Expression Comprehension with Query-driven Scene Graphs
Comments: Accepted by T-MM
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[5]  arXiv:2603.24793 (cross-list from cs.CV) [pdf, ps, other]
Title: AVControl: Efficient Framework for Training Audio-Visual Controls
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[6]  arXiv:2603.24721 (cross-list from cs.CV) [pdf, ps, other]
Title: Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)

Thu, 26 Mar 2026

[7]  arXiv:2603.24030 (cross-list from cs.CV) [pdf, ps, other]
Title: Decompose and Transfer: CoT-Prompting Enhanced Alignment for Open-Vocabulary Temporal Action Detection
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[8]  arXiv:2603.23947 (cross-list from cs.SD) [pdf, ps, other]
Title: Variable-Length Audio Fingerprinting
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[9]  arXiv:2603.23810 (cross-list from eess.AS) [pdf, ps, other]
Title: Rethinking Masking Strategies for Masked Prediction-based Audio Self-supervised Learning
Comments: 6+1 pages, 2 figures, 3 tables, accepted at IJCNN 2026
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)

Wed, 25 Mar 2026

[10]  arXiv:2603.22850 [pdf, ps, other]
Title: A Video Steganography for H.265/HEVC Based on Multiple CU Size and Block Structure Distortion
Subjects: Multimedia (cs.MM)
[11]  arXiv:2603.22663 [pdf, ps, other]
Title: Short-Form Video Viewing Behavior Analysis and Multi-Step Viewing Time Prediction
Subjects: Multimedia (cs.MM)
[12]  arXiv:2603.23445 (cross-list from cs.HC) [pdf, ps, other]
Title: MRATTS: An MR-Based Acupoint Therapy Training System with Real-Time Acupoint Detection and Evaluation Standards
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[13]  arXiv:2603.23272 (cross-list from cs.CV) [pdf, ps, other]
Title: Multi-Modal Image Fusion via Intervention-Stable Feature Learning
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[14]  arXiv:2603.23192 (cross-list from cs.GR) [pdf, ps, other]
Title: GTLR-GS: Geometry-Texture Aware LiDAR-Regularized 3D Gaussian Splatting for Realistic Scene Reconstruction
Subjects: Graphics (cs.GR); Multimedia (cs.MM)
[15]  arXiv:2603.23118 (cross-list from cs.CV) [pdf, ps, other]
Title: SMSP: A Plug-and-Play Strategy of Multi-Scale Perception for MLLMs to Perceive Visual Illusions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[16]  arXiv:2603.22492 (cross-list from cs.CV) [pdf, ps, other]
Title: Tiny Inference-Time Scaling with Latent Verifiers
Comments: Findings of CVPR 2026 - Code at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[17]  arXiv:2603.22466 (cross-list from cs.CV) [pdf, ps, other]
Title: Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing
Comments: Accepted at CVPR 2026 (Main track)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

Tue, 24 Mar 2026 (showing first 8 of 13 entries)

[18]  arXiv:2603.21948 [pdf, ps, other]
Title: Look, Listen and Segment: Towards Weakly Supervised Audio-visual Semantic Segmentation
Comments: Accepted by ICASSP 2026
Subjects: Multimedia (cs.MM)
[19]  arXiv:2603.20894 [pdf, ps, other]
Title: AcoustEmo: Open-Vocabulary Emotion Reasoning via Utterance-Aware Acoustic Q-Former
Comments: 6 pages
Subjects: Multimedia (cs.MM)
[20]  arXiv:2603.20354 [pdf, ps, other]
Title: Leum-VL Technical Report
Comments: 27 pages, 5 figures
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[21]  arXiv:2603.20201 [pdf, ps, other]
Title: FIGURA: A Modular Prompt Engineering Method for Artistic Figure Photography in Safety-Filtered Text-to-Image Models
Authors: Luca Cazzaniga
Comments: 10 pages, 6 tables. Preprint
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
[22]  arXiv:2603.21939 (cross-list from cs.CV) [pdf, ps, other]
Title: FeatDistill: A Feature Distillation Enhanced Multi-Expert Ensemble Framework for Robust AI-generated Image Detection
Comments: 6th place (6/507) technical report at the NTIRE 2026: Robust AI-Generated Image Detection in the Wild Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[23]  arXiv:2603.21697 (cross-list from cs.CR) [pdf, ps, other]
Title: Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models
Comments: 31 pages
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[24]  arXiv:2603.21661 (cross-list from cs.CV) [pdf, ps, other]
Title: Cross-Scenario Deraining Adaptation with Unpaired Data: Superpixel Structural Priors and Multi-Stage Pseudo-Rain Synthesis
Comments: We aim at addressing the cross-scenario (i.e., O.O.D) de-rain challenge, which has been neglected for a long period
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[25]  arXiv:2603.21493 (cross-list from cs.CV) [pdf, ps, other]
Title: StreamingEval: A Unified Evaluation Protocol towards Realistic Streaming Video Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)