Audio and Speech Processing

Authors and titles for recent submissions

[ total of 42 entries: 1-25 | 26-42 ]

Wed, 13 May 2026

[1] arXiv:2605.12287 [pdf, ps, other]: Title: The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking

Authors: Jaehoon Ahn, Tae Gum Hwang, Moon-Ryul Jung

Comments: 6 pages, 3 figures. Technical report on beat tracking failure modes; prepared for ISMIR 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2] arXiv:2605.12107 [pdf, ps, other]: Title: Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement

Authors: Danilo de Oliveira, Tal Peer, Timo Gerkmann

Subjects: Audio and Speech Processing (eess.AS)
[3] arXiv:2605.12036 [pdf, ps, other]: Title: Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

Authors: Guojian Li, Zhixian Zhao, Zhennan Lin, Jingbin Hu, Qirui Zhan, Yuang Cao, Pengyuan Xie, Chuan Xie, Jie Liu, Qiang Zhang, Zhonghua Fu, Lei Xie

Subjects: Audio and Speech Processing (eess.AS)
[4] arXiv:2605.11422 [pdf, ps, other]: Title: Chunkwise Aligners for Streaming Speech Recognition

Authors: Wen Shen Teo, Takafumi Moriya, Masato Mimura

Journal-ref: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026, pp. 18282-18286

Subjects: Audio and Speech Processing (eess.AS)
[5] arXiv:2605.12135 (cross-list from cs.SD) [pdf, ps, other]: Title: STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts

Authors: Joshua Opria

Comments: 9 pages, 4 figures, 3 tables. Code and models: this https URL<your-github-username>/autocharter

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6] arXiv:2605.11286 (cross-list from eess.SP) [pdf, ps, other]: Title: Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming

Authors: Manan Mittal, Ryan M. Corey, John R. Buck, Andrew C. Singer

Comments: 5 pages, 8 figures

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:2509.13548 (cross-list from cs.SD) [pdf, ps, other]: Title: Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers

Authors: Manan Mittal, Thomas Deppisch, Joseph Forrer, Chris Le Sueur, Zamir Ben-Hur, David Lou Alon, Daniel D.E. Wong

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

Tue, 12 May 2026

[8] arXiv:2605.10398 [pdf, ps, other]: Title: SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements

Authors: Ege Erdem, Shoichi Koyama, Tomohiko Nakamura, Orchisama Das, Zoran Cvetković

Subjects: Audio and Speech Processing (eess.AS)
[9] arXiv:2605.10084 [pdf, ps, other]: Title: PoDAR: Power-Disentangled Audio Representation for Generative Modeling

Authors: Alejandro Luebs, Mithilesh Vaidya, Ishaan Kumar, Sumukh Badam, Stephen W. Bailey, Matthew Bendel, Jose Sotelo, Xingzhe He

Comments: 9 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[10] arXiv:2605.09627 [pdf, ps, other]: Title: Single-Microphone Audio Point Source Discriminative Localization From Reverberation Late Tail Estimation

Authors: Matthew Maciejewski

Comments: Published at IEEE ICASSP 2026

Subjects: Audio and Speech Processing (eess.AS)
[11] arXiv:2605.09568 [pdf, ps, other]: Title: RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations

Authors: Hieu-Thi Luong, Xuechen Liu, Ivan Kukanov, Zheng Xin Chai, Kong Aik Lee

Comments: Submitted to APSIPA 2026

Subjects: Audio and Speech Processing (eess.AS)
[12] arXiv:2605.09413 [pdf, ps, other]: Title: Evaluating the Expressive Appropriateness of Speech in Rich Contexts

Authors: Tianrui Wang, Ziyang Ma, Yizhou Peng, Haoyu Wang, Zhikang Niu, Zikang Huang, Yihao Wu, Yi-Wen Chao, Yu Jiang, Yuheng Lu, Guanrou Yang, Xuanchen Li, Hexin Liu, Chunyu Qiang, Cheng Gong, Yifan Yang, Tianchi Liu, Junyu Wang, Nana Hou, Meng Ge, Fuming You, Wei Yang, Zhongqian Sun, Haifeng Hu, Xiaobao Wang, Eng Siong Chng, Xie Chen, Longbiao Wang, Jianwu Dang

Comments: 19 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS)
[13] arXiv:2605.09386 [pdf, ps, other]: Title: Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

Authors: Dong Yang, Yiyi Cai, Haoyu Zhang, Yuki Saito, Hiroshi Saruwatari

Comments: Under Review

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[14] arXiv:2605.08608 [pdf, ps, other]: Title: Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation

Authors: Zheng Wang, Xiaobin Rong, Hang Su, Tianyi Tan, Junnan Wu, Lichun Fan, Zhenbo Luo, Jian Luan, Jing Lu

Subjects: Audio and Speech Processing (eess.AS)
[15] arXiv:2605.08431 [pdf, ps, other]: Title: Latent Secret Spin: Keyed Orthogonal Rotations for Blind Speech Watermarking in Anisotropic Latent Spaces

Authors: Emma Coletta, Massimiliano Todisco, Michele Panariello, Antonio Faonio, Nicholas Evans

Subjects: Audio and Speech Processing (eess.AS)
[16] arXiv:2605.08189 [pdf, ps, other]: Title: DiffVQE: Hybrid Diffusion Voice Quality Enhancement Under Acoustic Echo and Noise

Authors: Haljan Lugo Girao, Ernst Seidel, Pejman Mowlaee, Ziyue Zhao, Tim Fingscheidt

Comments: 6 pages, 4 figures, submitted to Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS)
[17] arXiv:2605.08186 [pdf, ps, other]: Title: Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models

Authors: Wei-Ping Huang, Chee-En Yu, Guan-Ting Lin, Hung-yi Lee

Comments: Submitted to INTERSPEECH 2026

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[18] arXiv:2605.08165 [pdf, ps, other]: Title: Low-Cost Detection of Degraded Voice Clones via Source-Output Acoustic Consistency

Authors: Jana Shokr, Minos Papadopoulos, Jeremy Cooperstock, Pavo Orepic

Comments: 7 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS)
[19] arXiv:2605.10815 (cross-list from cs.AI) [pdf, ps, other]: Title: Probing Cross-modal Information Hubs in Audio-Visual LLMs

Authors: Jihoo Jung, Chaeyoung Jung, Ji-Hoon Kim, Joon Son Chung

Comments: Accepted by ICML 2026

Subjects: Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:2605.10203 (cross-list from cs.SD) [pdf, ps, other]: Title: Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration

Authors: Haowen Li, Tianxiang Li, Yi Yang, Boyu Cao, Qi Liu

Comments: Accepted by ICML 2026

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2605.10199 (cross-list from cs.CL) [pdf, ps, other]: Title: How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Authors: Hui Lu, Xueyuan Chen, Huimeng Wang, Shuhai Peng, Shiyin Kang, Xixin Wu, Zhiyong Wu

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[22] arXiv:2605.08961 (cross-list from cs.CL) [pdf, ps, other]: Title: Dolphin-CN-Dialect: Where Chinese Dialects Matter

Authors: Yangyang Meng, Huihang Zhong, Guodong Lin, Guanbo Wang, Hu Du, Zhiming Shao, Yukai Huang, Ke Li, Wei-Qiang Zhang

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[23] arXiv:2605.08214 (cross-list from cs.SD) [pdf, ps, other]: Title: Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization

Authors: Mohammed Aman Bhuiyan, Md Sazzad Hossain Adib, Samiul Basir Bhuiyan, Amit Chakraborty, Aritra Islam Saswato, Ahmed Faizul Haque Dhrubo, Mohammad Ashrafuzzaman Khan

Comments: 3 figures and 5 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[24] arXiv:2605.08194 (cross-list from cs.SD) [pdf, ps, other]: Title: ShipEcho -- An Interactive Tool for Global Mapping of Underwater Radiated Noise from Vessels

Authors: Mark Shipton, Valentino Denona, Đula Nađ, Roee Diamant

Comments: 34 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Mon, 11 May 2026 (showing first 1 of 6 entries)

[25] arXiv:2605.07694 [pdf, ps, other]: Title: Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation

Authors: Michael Neri, Archontis Politis, Tuomas Virtanen

Comments: Submitted to IWAENC 2026

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
