Audio and Speech Processing

Authors and titles for recent submissions

[ total of 23 entries: 1-23 ]

Fri, 5 Dec 2025

[1] arXiv:2512.04964: Title: HiPPO: Exploring A Novel Hierarchical Pronunciation Assessment Approach for Spoken Languages

Authors: Bi-Cheng Yan, Hsin-Wei Wang, Fu-An Chao, Tien-Hong Lo, Yung-Chang Hsu, Berlin Chen

Comments: Accepted and to appear in AACL-IJCNLP2025

Subjects: Audio and Speech Processing (eess.AS)

[2] arXiv:2512.04945: Title: TripleC Learning and Lightweight Speech Enhancement for Multi-Condition Target Speech Extraction

Authors: Ziling Huang (Shanghai Normal University, China)

Comments: Submitted to ICASSP2026

Subjects: Audio and Speech Processing (eess.AS)

[3] arXiv:2512.04792: Title: Towards predicting binaural audio quality in listeners with normal and impaired hearing

Authors: Thomas Biberger, Stephan D. Ewert

Comments: accepted for publication in Forum Acusticum

Subjects: Audio and Speech Processing (eess.AS)

[4] arXiv:2512.04552 (cross-list from cs.SD): Title: RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS

Authors: Cong Wang, Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Yingming Gao, Ya Li

Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[5] arXiv:2512.04551 (cross-list from cs.SD): Title: Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention

Authors: Cong Wang, Yizhong Geng, Yuhua Wen, Qifei Li, Yingming Gao, Ruimin Wang, Chunfeng Wang, Hao Li, Ya Li, Wei Chen

Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Thu, 4 Dec 2025

[6] arXiv:2512.03486: Title: A Universal Harmonic Discriminator for High-quality GAN-based Vocoder

Authors: Nan Xu, Zhaolong Huang, Xiao Zeng

Comments: Accepted by ASRU2025

Subjects: Audio and Speech Processing (eess.AS)

[7] arXiv:2512.03301: Title: Comparing Unsupervised and Supervised Semantic Speech Tokens: A Case Study of Child ASR

Authors: Mohan Shi, Natarajan Balaji Shankar, Kaiyuan Zhang, Zilai Wang, Abeer Alwan

Comments: ASRU-AI4CSL

Subjects: Audio and Speech Processing (eess.AS)

[8] arXiv:2512.03636 (cross-list from cs.HC): Title: Head, posture, and full-body gestures in dyadic conversations

Authors: Ľuboš Hládek, Bernhard U. Seeber

Comments: 7 figures, 10 tables, 29 pages

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[9] arXiv:2512.03458 (cross-list from eess.SP): Title: A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses

Authors: Maryam Maghsoudi, Mohsen Rezaeizadeh, Shihab Shamma

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 3 Dec 2025

[10] arXiv:2512.02891: Title: Perceptual evaluation of Acoustic Level of Detail in Virtual Acoustic Environments

Authors: Stefan Fichna, Steven van de Par, Bernhard U. Seeber, Stephan D. Ewert

Comments: This work has been submitted to Acoustics for possible publication. Template provided by MDPI

Subjects: Audio and Speech Processing (eess.AS)

[11] arXiv:2512.02759: Title: Towards Language-Independent Face-Voice Association with Multimodal Foundation Models

Authors: Aref Farhadipour, Teodora Vukovic, Volker Dellwo

Comments: This paper presents the system description of the UZH-CL team for the FAME2026 Challenge at ICASSP 2026. Our model achieved second place in the final ranking

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)

[12] arXiv:2512.02027: Title: On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts

Authors: Kashaf Gulzar, Dominik Wagner, Sebastian P. Bayerl, Florian Hönig, Tobias Bocklet, Korbinian Riedhammer

Comments: 6 pages, 1 figure. Accepted to ASRU 2025. This is the arXiv preprint of the accepted paper

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

[13] arXiv:2512.02650 (cross-list from cs.CV): Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Authors: Junwon Lee, Juhan Nam, Jiyoung Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[14] arXiv:2512.02593 (cross-list from cs.CL): Title: Spoken Conversational Agents with Large Language Models

Authors: Chao-Han Huck Yang, Andreas Stolcke, Larry Heck

Comments: Accepted to EMNLP 2025 Tutorial

Subjects: Computation and Language (cs.CL); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 2 Dec 2025

[15] arXiv:2512.01466: Title: Identifiability Conditions for Acoustic Feedback Cancellation with the Two-Channel Adaptive Feedback Canceller Algorithm

Authors: Arnout Roebben, Toon van Waterschoot, Jan Wouters, Marc Moonen

Comments: Accepted for publication in IEEE Open Journal of Signal Processing (OJSP)

Subjects: Audio and Speech Processing (eess.AS)

[16] arXiv:2512.00937: Title: Arabic TTS with FastPitch: Reproducible Baselines, Adversarial Training, and Oversmoothing Analysis

Authors: Lars Nippert

Subjects: Audio and Speech Processing (eess.AS)

[17] arXiv:2512.00511: Title: A Low-Complexity Speech Codec Using Parametric Dithering for ASR

Authors: Ellison Murray, Morriel Kasher, Predrag Spasojevic

Comments: 10 pages, 8 figures, Accepted 2026 Data Compression Conference

Subjects: Audio and Speech Processing (eess.AS)

[18] arXiv:2512.00482: Title: Beyond Performance: Probing Representation Dynamics In Speech Enhancement Models

Authors: Yair Amar, Amir Ivry, Israel Cohen

Subjects: Audio and Speech Processing (eess.AS)

Mon, 1 Dec 2025

[19] arXiv:2511.23098: Title: Group-Aware Partial Model Merging for Children's Automatic Speech Recognition

Authors: Thomas Rolland, Alberto Abad

Comments: IEEE ASRU 2025 Workshop AI4CSL

Subjects: Audio and Speech Processing (eess.AS)

[20] arXiv:2511.22687 (cross-list from cs.SD): Title: PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

Authors: Jiatong Shi, Haoran Wang, William Chen, Chenda Li, Wangyou Zhang, Jinchuan Tian, Shinji Watanabe

Comments: Accepted by ASRU2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[21] arXiv:2511.22503 (cross-list from cs.CL): Title: Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking

Authors: Katia Vendrame, Bolaji Yusuf, Santosh Kesiraju, Šimon Sedláček, Oldřich Plchot, Jan Černocký

Comments: submitted to ICASSP 2026

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[22] arXiv:2511.22293 (cross-list from cs.SD): Title: GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis

Authors: Teysir Baoueb, Xiaoyu Bie, Mathieu Fontaine, Gaël Richard

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[23] arXiv:2511.21872 (cross-list from cs.SD): Title: Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

Authors: Bruno Padovese, Fabio Frazao, Michael Dowd, Ruth Joy

Comments: 16 pages, 6 Figures, 2 Tables, submitted to Marine Mammal Science as part of a special issue on Machine Learning and Artificial Intelligence in Marine Mammal Research

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
