Sound

Authors and titles for recent submissions, skipping first 39

[ total of 48 entries: 1-25 | 15-39 | 40-48 ]

Mon, 1 Dec 2025

[40]  arXiv:2511.23178 [pdf, ps, other]
Title: HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding
Comments: Accepted by AAAI 2026
Subjects: Sound (cs.SD)
[41]  arXiv:2511.22696 [pdf, ps, other]
Title: Probabilistic Fusion and Calibration of Neural Speaker Diarization Models
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[42]  arXiv:2511.22687 [pdf, ps, other]
Title: PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
Comments: Accepted by ASRU2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43]  arXiv:2511.22293 [pdf, ps, other]
Title: GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[44]  arXiv:2511.21872 [pdf, ps, other]
Title: Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection
Comments: 16 pages, 6 Figures, 2 Tables, submitted to Marine Mammal Science as part of a special issue on Machine Learning and Artificial Intelligence in Marine Mammal Research
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[45]  arXiv:2511.23142 (cross-list from cs.LG) [pdf, ps, other]
Title: Adapting Neural Audio Codecs to EEG
Comments: Foundation Models for the Brain and Body (BrainBodyFM@NeurIPS)
Subjects: Machine Learning (cs.LG); Sound (cs.SD)
[46]  arXiv:2511.22503 (cross-list from cs.CL) [pdf, ps, other]
Title: Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking
Comments: submitted to ICASSP 2026
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47]  arXiv:2511.21780 (cross-list from cs.MM) [pdf, ps, other]
Title: 3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation
Subjects: Multimedia (cs.MM); Sound (cs.SD)
[48]  arXiv:2511.21704 (cross-list from cs.CL) [pdf, ps, other]
Title: On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models
Subjects: Computation and Language (cs.CL); Sound (cs.SD)

