Audio and Speech Processing

Authors and titles for recent submissions, skipping first 9

Total of 23 entries; showing entries 10-23.

Wed, 3 Dec 2025

[10] arXiv:2512.02891

Title: Perceptual evaluation of Acoustic Level of Detail in Virtual Acoustic Environments

Authors: Stefan Fichna, Steven van de Par, Bernhard U. Seeber, Stephan D. Ewert

Comments: This work has been submitted to Acoustics for possible publication. Template provided by MDPI

Subjects: Audio and Speech Processing (eess.AS)

[11] arXiv:2512.02759

Title: Towards Language-Independent Face-Voice Association with Multimodal Foundation Models

Authors: Aref Farhadipour, Teodora Vukovic, Volker Dellwo

Comments: This paper presents the system description of the UZH-CL team for the FAME2026 Challenge at ICASSP 2026. Our model achieved second place in the final ranking

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)

[12] arXiv:2512.02027

Title: On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts

Authors: Kashaf Gulzar, Dominik Wagner, Sebastian P. Bayerl, Florian Hönig, Tobias Bocklet, Korbinian Riedhammer

Comments: 6 pages, 1 figure. Accepted to ASRU 2025. This is the arXiv preprint of the accepted paper

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

[13] arXiv:2512.02650 (cross-list from cs.CV)

Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Authors: Junwon Lee, Juhan Nam, Jiyoung Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[14] arXiv:2512.02593 (cross-list from cs.CL)

Title: Spoken Conversational Agents with Large Language Models

Authors: Chao-Han Huck Yang, Andreas Stolcke, Larry Heck

Comments: Accepted to EMNLP 2025 Tutorial

Subjects: Computation and Language (cs.CL); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 2 Dec 2025

[15] arXiv:2512.01466

Title: Identifiability Conditions for Acoustic Feedback Cancellation with the Two-Channel Adaptive Feedback Canceller Algorithm

Authors: Arnout Roebben, Toon van Waterschoot, Jan Wouters, Marc Moonen

Comments: Accepted for publication in IEEE Open Journal of Signal Processing (OJSP)

Subjects: Audio and Speech Processing (eess.AS)

[16] arXiv:2512.00937

Title: Arabic TTS with FastPitch: Reproducible Baselines, Adversarial Training, and Oversmoothing Analysis

Authors: Lars Nippert

Subjects: Audio and Speech Processing (eess.AS)

[17] arXiv:2512.00511

Title: A Low-Complexity Speech Codec Using Parametric Dithering for ASR

Authors: Ellison Murray, Morriel Kasher, Predrag Spasojevic

Comments: 10 pages, 8 figures, Accepted 2026 Data Compression Conference

Subjects: Audio and Speech Processing (eess.AS)

[18] arXiv:2512.00482

Title: Beyond Performance: Probing Representation Dynamics In Speech Enhancement Models

Authors: Yair Amar, Amir Ivry, Israel Cohen

Subjects: Audio and Speech Processing (eess.AS)

Mon, 1 Dec 2025

[19] arXiv:2511.23098

Title: Group-Aware Partial Model Merging for Children's Automatic Speech Recognition

Authors: Thomas Rolland, Alberto Abad

Comments: IEEE ASRU 2025 Workshop AI4CSL

Subjects: Audio and Speech Processing (eess.AS)

[20] arXiv:2511.22687 (cross-list from cs.SD)

Title: PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

Authors: Jiatong Shi, Haoran Wang, William Chen, Chenda Li, Wangyou Zhang, Jinchuan Tian, Shinji Watanabe

Comments: Accepted by ASRU2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[21] arXiv:2511.22503 (cross-list from cs.CL)

Title: Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking

Authors: Katia Vendrame, Bolaji Yusuf, Santosh Kesiraju, Šimon Sedláček, Oldřich Plchot, Jan Černocký

Comments: submitted to ICASSP 2026

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[22] arXiv:2511.22293 (cross-list from cs.SD)

Title: GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis

Authors: Teysir Baoueb, Xiaoyu Bie, Mathieu Fontaine, Gaël Richard

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[23] arXiv:2511.21872 (cross-list from cs.SD)

Title: Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

Authors: Bruno Padovese, Fabio Frazao, Michael Dowd, Ruth Joy

Comments: 16 pages, 6 Figures, 2 Tables, submitted to Marine Mammal Science as part of a special issue on Machine Learning and Artificial Intelligence in Marine Mammal Research

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
