Audio and Speech Processing

Authors and titles for recent submissions

[ total of 42 entries: 1-25 | 26-42 ]

Wed, 13 May 2026

[1]  arXiv:2605.12287 [pdf, ps, other]
Title: The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking
Comments: 6 pages, 3 figures. Technical report on beat tracking failure modes; prepared for ISMIR 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2]  arXiv:2605.12107 [pdf, ps, other]
Title: Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
Subjects: Audio and Speech Processing (eess.AS)
[3]  arXiv:2605.12036 [pdf, ps, other]
Title: Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
Subjects: Audio and Speech Processing (eess.AS)
[4]  arXiv:2605.11422 [pdf, ps, other]
Title: Chunkwise Aligners for Streaming Speech Recognition
Journal-ref: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026, pp. 18282-18286
Subjects: Audio and Speech Processing (eess.AS)
[5]  arXiv:2605.12135 (cross-list from cs.SD) [pdf, ps, other]
Title: STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
Authors: Joshua Opria
Comments: 9 pages, 4 figures, 3 tables. Code and models: this https URL<your-github-username>/autocharter
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6]  arXiv:2605.11286 (cross-list from eess.SP) [pdf, ps, other]
Title: Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming
Comments: 5 pages, 8 figures
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2509.13548 (cross-list from cs.SD) [pdf, ps, other]
Title: Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

Tue, 12 May 2026

[8]  arXiv:2605.10398 [pdf, ps, other]
Title: SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements
Subjects: Audio and Speech Processing (eess.AS)
[9]  arXiv:2605.10084 [pdf, ps, other]
Title: PoDAR: Power-Disentangled Audio Representation for Generative Modeling
Comments: 9 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[10]  arXiv:2605.09627 [pdf, ps, other]
Title: Single-Microphone Audio Point Source Discriminative Localization From Reverberation Late Tail Estimation
Comments: Published at IEEE ICASSP 2026
Subjects: Audio and Speech Processing (eess.AS)
[11]  arXiv:2605.09568 [pdf, ps, other]
Title: RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations
Comments: Submitted to APSIPA 2026
Subjects: Audio and Speech Processing (eess.AS)
[12]  arXiv:2605.09413 [pdf, ps, other]
Title: Evaluating the Expressive Appropriateness of Speech in Rich Contexts
Comments: 19 pages, 6 figures
Subjects: Audio and Speech Processing (eess.AS)
[13]  arXiv:2605.09386 [pdf, ps, other]
Title: Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech
Comments: Under Review
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[14]  arXiv:2605.08608 [pdf, ps, other]
Title: Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation
Subjects: Audio and Speech Processing (eess.AS)
[15]  arXiv:2605.08431 [pdf, ps, other]
Title: Latent Secret Spin: Keyed Orthogonal Rotations for Blind Speech Watermarking in Anisotropic Latent Spaces
Subjects: Audio and Speech Processing (eess.AS)
[16]  arXiv:2605.08189 [pdf, ps, other]
Title: DiffVQE: Hybrid Diffusion Voice Quality Enhancement Under Acoustic Echo and Noise
Comments: 6 pages, 4 figures, submitted to Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS)
[17]  arXiv:2605.08186 [pdf, ps, other]
Title: Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models
Comments: Submitted to INTERSPEECH 2026
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[18]  arXiv:2605.08165 [pdf, ps, other]
Title: Low-Cost Detection of Degraded Voice Clones via Source-Output Acoustic Consistency
Comments: 7 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS)
[19]  arXiv:2605.10815 (cross-list from cs.AI) [pdf, ps, other]
Title: Probing Cross-modal Information Hubs in Audio-Visual LLMs
Comments: Accepted by ICML 2026
Subjects: Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20]  arXiv:2605.10203 (cross-list from cs.SD) [pdf, ps, other]
Title: Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration
Comments: Accepted by ICML 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:2605.10199 (cross-list from cs.CL) [pdf, ps, other]
Title: How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[22]  arXiv:2605.08961 (cross-list from cs.CL) [pdf, ps, other]
Title: Dolphin-CN-Dialect: Where Chinese Dialects Matter
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[23]  arXiv:2605.08214 (cross-list from cs.SD) [pdf, ps, other]
Title: Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
Comments: 3 figures and 5 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[24]  arXiv:2605.08194 (cross-list from cs.SD) [pdf, ps, other]
Title: ShipEcho -- An Interactive Tool for Global Mapping of Underwater Radiated Noise from Vessels
Comments: 34 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Mon, 11 May 2026 (showing first 1 of 6 entries)

[25]  arXiv:2605.07694 [pdf, ps, other]
Title: Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation
Comments: Submitted to IWAENC 2026
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)