Audio and Speech Processing

Authors and titles for recent submissions

[ total of 23 entries: 1-23 ]

Fri, 5 Dec 2025

[1]  arXiv:2512.04964 [pdf, ps, other]
Title: HiPPO: Exploring A Novel Hierarchical Pronunciation Assessment Approach for Spoken Languages
Comments: Accepted and to appear in AACL-IJCNLP2025
Subjects: Audio and Speech Processing (eess.AS)
[2]  arXiv:2512.04945 [pdf, ps, other]
Title: TripleC Learning and Lightweight Speech Enhancement for Multi-Condition Target Speech Extraction
Authors: Ziling Huang (Shanghai Normal University, China)
Comments: Submitted to ICASSP2026
Subjects: Audio and Speech Processing (eess.AS)
[3]  arXiv:2512.04792 [pdf, ps, other]
Title: Towards predicting binaural audio quality in listeners with normal and impaired hearing
Comments: Accepted for publication in Forum Acusticum
Subjects: Audio and Speech Processing (eess.AS)
[4]  arXiv:2512.04552 (cross-list from cs.SD) [pdf, ps, other]
Title: RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[5]  arXiv:2512.04551 (cross-list from cs.SD) [pdf, ps, other]
Title: Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Thu, 4 Dec 2025

[6]  arXiv:2512.03486 [pdf, ps, other]
Title: A Universal Harmonic Discriminator for High-quality GAN-based Vocoder
Comments: Accepted by ASRU2025
Subjects: Audio and Speech Processing (eess.AS)
[7]  arXiv:2512.03301 [pdf, ps, other]
Title: Comparing Unsupervised and Supervised Semantic Speech Tokens: A Case Study of Child ASR
Comments: ASRU-AI4CSL
Subjects: Audio and Speech Processing (eess.AS)
[8]  arXiv:2512.03636 (cross-list from cs.HC) [pdf, ps, other]
Title: Head, posture, and full-body gestures in dyadic conversations
Comments: 7 figures, 10 tables, 29 pages
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9]  arXiv:2512.03458 (cross-list from eess.SP) [pdf, ps, other]
Title: A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 3 Dec 2025

[10]  arXiv:2512.02891 [pdf, ps, other]
Title: Perceptual evaluation of Acoustic Level of Detail in Virtual Acoustic Environments
Comments: This work has been submitted to Acoustics for possible publication. Template provided by MDPI
Subjects: Audio and Speech Processing (eess.AS)
[11]  arXiv:2512.02759 [pdf, ps, other]
Title: Towards Language-Independent Face-Voice Association with Multimodal Foundation Models
Comments: This paper presents the system description of the UZH-CL team for the FAME2026 Challenge at ICASSP 2026. Our model achieved second place in the final ranking
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[12]  arXiv:2512.02027 [pdf, ps, other]
Title: On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts
Comments: 6 pages, 1 figure. Accepted to ASRU 2025. This is the arXiv preprint of the accepted paper
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[13]  arXiv:2512.02650 (cross-list from cs.CV) [pdf, ps, other]
Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14]  arXiv:2512.02593 (cross-list from cs.CL) [pdf, ps, other]
Title: Spoken Conversational Agents with Large Language Models
Comments: Accepted to EMNLP 2025 Tutorial
Subjects: Computation and Language (cs.CL); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 2 Dec 2025

[15]  arXiv:2512.01466 [pdf, ps, other]
Title: Identifiability Conditions for Acoustic Feedback Cancellation with the Two-Channel Adaptive Feedback Canceller Algorithm
Comments: Accepted for publication in IEEE Open Journal of Signal Processing (OJSP)
Subjects: Audio and Speech Processing (eess.AS)
[16]  arXiv:2512.00937 [pdf, ps, other]
Title: Arabic TTS with FastPitch: Reproducible Baselines, Adversarial Training, and Oversmoothing Analysis
Authors: Lars Nippert
Subjects: Audio and Speech Processing (eess.AS)
[17]  arXiv:2512.00511 [pdf, ps, other]
Title: A Low-Complexity Speech Codec Using Parametric Dithering for ASR
Comments: 10 pages, 8 figures, Accepted 2026 Data Compression Conference
Subjects: Audio and Speech Processing (eess.AS)
[18]  arXiv:2512.00482 [pdf, ps, other]
Title: Beyond Performance: Probing Representation Dynamics In Speech Enhancement Models
Subjects: Audio and Speech Processing (eess.AS)

Mon, 1 Dec 2025

[19]  arXiv:2511.23098 [pdf, ps, other]
Title: Group-Aware Partial Model Merging for Children's Automatic Speech Recognition
Comments: IEEE ASRU 2025 Workshop AI4CSL
Subjects: Audio and Speech Processing (eess.AS)
[20]  arXiv:2511.22687 (cross-list from cs.SD) [pdf, ps, other]
Title: PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
Comments: Accepted by ASRU2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:2511.22503 (cross-list from cs.CL) [pdf, ps, other]
Title: Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking
Comments: Submitted to ICASSP 2026
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22]  arXiv:2511.22293 (cross-list from cs.SD) [pdf, ps, other]
Title: GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[23]  arXiv:2511.21872 (cross-list from cs.SD) [pdf, ps, other]
Title: Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection
Comments: 16 pages, 6 Figures, 2 Tables, submitted to Marine Mammal Science as part of a special issue on Machine Learning and Artificial Intelligence in Marine Mammal Research
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)