Sound

Authors and titles for recent submissions

[ total of 79 entries: 1-25 | 26-50 | 51-75 | 76-79 ]

Mon, 23 Mar 2026

[1]  arXiv:2603.20165 [pdf, ps, other]
Title: Audio Avatar Fingerprinting: An Approach for Authorized Use of Voice Cloning in the Era of Synthetic Audio
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:2603.19857 [pdf, ps, other]
Title: FoleyDirector: Fine-Grained Temporal Steering for Video-to-Audio Generation via Structured Scripts
Comments: Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026, 18 pages
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[3]  arXiv:2603.19798 [pdf, ps, other]
Title: Borderless Long Speech Synthesis
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[4]  arXiv:2603.19739 [pdf, ps, other]
Title: MOSS-TTSD: Text to Spoken Dialogue Generation
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[5]  arXiv:2603.19615 [pdf, ps, other]
Title: CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation
Comments: A condensed version of this work has been submitted to Interspeech 2026. Section 10 is an extended analysis added in this version
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[6]  arXiv:2603.19468 [pdf, ps, other]
Title: Listen First, Then Answer: Timestamp-Grounded Speech Reasoning
Comments: Submitted to Interspeech 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2603.20118 (cross-list from eess.AS) [pdf, ps, other]
Title: BioDCASE 2026 Challenge Baseline for Cross-Domain Mosquito Species Classification
Comments: BioDCASE 2026 CD-MSC Baseline, source code and models: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2603.19697 (cross-list from eess.AS) [pdf, ps, other]
Title: Plug-and-Steer: Decoupling Separation and Selection in Audio-Visual Target Speaker Extraction
Comments: Submitted to Interspeech 2026; demo available this https URL
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)

Fri, 20 Mar 2026

[9]  arXiv:2603.19176 [pdf, ps, other]
Title: Few-shot Acoustic Synthesis with Multimodal Flow Matching
Comments: To appear at CVPR 2026. 23 pages, 16 figures. Project Page: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[10]  arXiv:2603.18678 [pdf, ps, other]
Title: Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
Comments: The paper is currently under review
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[11]  arXiv:2603.18359 [pdf, ps, other]
Title: Towards Interpretable Framework for Neural Audio Codecs via Sparse Autoencoders: A Case Study on Accent Information
Subjects: Sound (cs.SD)
[12]  arXiv:2603.18090 [pdf, ps, other]
Title: MOSS-TTS Technical Report
Comments: Project page: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[13]  arXiv:2603.19195 (cross-list from eess.AS) [pdf, ps, other]
Title: How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
Comments: Project website: this https URL
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[14]  arXiv:2603.18758 (cross-list from cs.HC) [pdf, ps, other]
Title: Dual-Model Prediction of Affective Engagement and Vocal Attractiveness from Speaker Expressiveness in Video Learning
Comments: Preprint. Accepted for publication in IEEE Transactions on Computational Social Systems
Journal-ref: IEEE Transactions on Computational Social Systems, 2026
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[15]  arXiv:2603.18612 (cross-list from cs.CL) [pdf, ps, other]
Title: DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
Comments: 6 pages, 2 figures. Submitted to Interspeech 2026
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16]  arXiv:2603.18299 (cross-list from cs.LG) [pdf, ps, other]
Title: ALIGN: Adversarial Learning for Generalizable Speech Neuroprosthesis
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
[17]  arXiv:2603.18103 (cross-list from cs.CR) [pdf, ps, other]
Title: STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[18]  arXiv:2603.18082 (cross-list from cs.MM) [pdf, ps, other]
Title: EgoAdapt: Enhancing Robustness in Egocentric Interactive Speaker Detection Under Missing Modalities
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[19]  arXiv:2603.18048 (cross-list from cs.AI) [pdf, ps, other]
Title: DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models
Comments: 14 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2603.18024 (cross-list from eess.AS) [pdf, ps, other]
Title: ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[21]  arXiv:2603.18023 (cross-list from eess.AS) [pdf, ps, other]
Title: PCOV-KWS: Multi-task Learning for Personalized Customizable Open Vocabulary Keyword Spotting
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)

Thu, 19 Mar 2026 (showing first 4 of 12 entries)

[22]  arXiv:2603.17769 [pdf, ps, other]
Title: Modeling Overlapped Speech with Shuffles
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[23]  arXiv:2603.16926 [pdf, ps, other]
Title: Music Source Restoration with Ensemble Separation and Targeted Reconstruction
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[24]  arXiv:2603.16914 [pdf, ps, other]
Title: Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[25]  arXiv:2603.17558 (cross-list from cs.CL) [pdf, ps, other]
Title: Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition
Comments: 13 pages, 8 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
