Audio and Speech Processing

Authors and titles for recent submissions

[ total of 62 entries: 1-25 | 26-50 | 51-62 ]

Mon, 23 Mar 2026

[1]  arXiv:2603.20118 [pdf, ps, other]
Title: BioDCASE 2026 Challenge Baseline for Cross-Domain Mosquito Species Classification
Comments: BioDCASE 2026 CD-MSC Baseline, source code and models: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2]  arXiv:2603.19831 [pdf, ps, other]
Title: Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech?
Comments: Accepted at The 2nd International Workshop on Bodily Expressed Emotion Understanding (BEEU) at AAAI 2026 [non-archival]
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[3]  arXiv:2603.19697 [pdf, ps, other]
Title: Plug-and-Steer: Decoupling Separation and Selection in Audio-Visual Target Speaker Extraction
Comments: Submitted to Interspeech 2026; demo available this https URL
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[4]  arXiv:2603.20165 (cross-list from cs.SD) [pdf, ps, other]
Title: Audio Avatar Fingerprinting: An Approach for Authorized Use of Voice Cloning in the Era of Synthetic Audio
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5]  arXiv:2603.19798 (cross-list from cs.SD) [pdf, ps, other]
Title: Borderless Long Speech Synthesis
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[6]  arXiv:2603.19468 (cross-list from cs.SD) [pdf, ps, other]
Title: Listen First, Then Answer: Timestamp-Grounded Speech Reasoning
Comments: Submitted to Interspeech 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 20 Mar 2026

[7]  arXiv:2603.19195 [pdf, ps, other]
Title: How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
Comments: Project website: this https URL
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[8]  arXiv:2603.18485 [pdf, ps, other]
Title: ARTT: Augmented Reverberant-Target Training for Unsupervised Monaural Speech Dereverberation
Comments: in submission
Subjects: Audio and Speech Processing (eess.AS)
[9]  arXiv:2603.18024 [pdf, ps, other]
Title: ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[10]  arXiv:2603.18023 [pdf, ps, other]
Title: PCOV-KWS: Multi-task Learning for Personalized Customizable Open Vocabulary Keyword Spotting
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[11]  arXiv:2603.19176 (cross-list from cs.SD) [pdf, ps, other]
Title: Few-shot Acoustic Synthesis with Multimodal Flow Matching
Comments: To appear at CVPR 2026. 23 pages, 16 figures. Project Page: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[12]  arXiv:2603.18612 (cross-list from cs.CL) [pdf, ps, other]
Title: DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
Comments: 6 pages, 2 figures. Submitted to Interspeech 2026
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13]  arXiv:2603.18048 (cross-list from cs.AI) [pdf, ps, other]
Title: DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models
Comments: 14 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14]  arXiv:2603.17769 (cross-list from cs.SD) [pdf, ps, other]
Title: Modeling Overlapped Speech with Shuffles
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Thu, 19 Mar 2026 (showing first 11 of 18 entries)

[15]  arXiv:2603.17837 [pdf, ps, other]
Title: The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[16]  arXiv:2603.17822 [pdf, ps, other]
Title: Multi-Source Evidence Fusion for Audio Question Answering
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[17]  arXiv:2603.17383 [pdf, ps, other]
Title: Robust Nasality Representation Learning for Cleft Palate-Related Velopharyngeal Dysfunction Screening in Real-World Settings
Comments: 2 figures. Machine learning for speech-based VPD screening under domain shift
Subjects: Audio and Speech Processing (eess.AS)
[18]  arXiv:2603.17377 [pdf, ps, other]
Title: Uncertainty Quantification and Risk Control for Multi-Speaker Sound Source Localization
Comments: 13 pages, 4 figures. Code available at: this https URL
Subjects: Audio and Speech Processing (eess.AS)
[19]  arXiv:2603.17025 [pdf, ps, other]
Title: Shared Representation Learning for Reference-Guided Targeted Sound Detection
Comments: Accepted to IEEE ICASSP 2026
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[20]  arXiv:2603.16972 [pdf, ps, other]
Title: Over-the-air White-box Attack on the Wav2Vec Speech Recognition Neural Network
Comments: 9 pages, 5 figures, 1 table
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[21]  arXiv:2603.16941 [pdf, ps, other]
Title: The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs
Comments: 5 pages, 3 figures, 1 table, Submitted to Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[22]  arXiv:2603.16924 [pdf, ps, other]
Title: SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[23]  arXiv:2603.16923 [pdf, ps, other]
Title: Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
Comments: Submitted to Interspeech 2026. 9 Pages
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24]  arXiv:2603.16922 [pdf, ps, other]
Title: Learnable Pulse Accumulation for On-Device Speech Recognition: How Much Attention Do You Need?
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25]  arXiv:2603.16920 [pdf, ps, other]
Title: Synthetic Data Domain Adaptation for ASR via LLM-based Text and Phonetic Respelling Augmentation
Comments: Accepted by ICASSP 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
