Sound

Authors and titles for recent submissions, skipping first 17

[ total of 51 entries: 1-25 | 18-42 | 43-51 ]

Tue, 9 Dec 2025 (continued, showing last 2 of 19 entries)

[18]  arXiv:2512.06304 (cross-list from eess.AS) [pdf, ps, other]
Title: Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Sound (cs.SD)
[19]  arXiv:2512.05994 (cross-list from eess.AS) [pdf, ps, other]
Title: KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)

Mon, 8 Dec 2025

[20]  arXiv:2512.05592 [pdf, ps, other]
Title: The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models
Comments: Accepted by IEEE ASRU 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:2512.05508 [pdf, ps, other]
Title: Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
Comments: 8 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[22]  arXiv:2512.05528 (cross-list from q-bio.NC) [pdf, ps, other]
Title: Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[23]  arXiv:2512.05201 (cross-list from cs.NI) [pdf, ps, other]
Title: MuMeNet: A Network Simulator for Musical Metaverse Communications
Comments: To appear in 2025 IEEE 6th International Symposium on the Internet of Sounds (IS2) proceedings
Subjects: Networking and Internet Architecture (cs.NI); Sound (cs.SD)
[24]  arXiv:2512.05126 (cross-list from eess.AS) [pdf, ps, other]
Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Fri, 5 Dec 2025

[25]  arXiv:2512.04847 [pdf, ps, other]
Title: Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[26]  arXiv:2512.04827 [pdf, ps, other]
Title: Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs
Authors: Wenzhang Du
Comments: 11 pages, 3 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[27]  arXiv:2512.04814 [pdf, ps, other]
Title: Shared Multi-modal Embedding Space for Face-Voice Association
Comments: Ranked 1st in Fame 2026 Challenge, ICASSP
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[28]  arXiv:2512.04793 [pdf, ps, other]
Title: YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases
Comments: 17 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[29]  arXiv:2512.04779 [pdf, ps, other]
Title: YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
Comments: 13 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[30]  arXiv:2512.04720 [pdf, ps, other]
Title: M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis
Comments: Submitted to ICASSP 2026
Subjects: Sound (cs.SD)
[31]  arXiv:2512.04711 [pdf, ps, other]
Title: Large Speech Model Enabled Semantic Communication
Comments: 15 pages, 9 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[32]  arXiv:2512.04616 [pdf, ps, other]
Title: Standard audiogram classification from loudness scaling data using unsupervised, supervised, and explainable machine learning techniques
Subjects: Sound (cs.SD); Medical Physics (physics.med-ph)
[33]  arXiv:2512.04552 [pdf, ps, other]
Title: RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34]  arXiv:2512.04551 [pdf, ps, other]
Title: Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Thu, 4 Dec 2025

[35]  arXiv:2512.03637 [pdf, ps, other]
Title: AaPE: Aliasing-aware Patch Embedding for Self-Supervised Audio Representation Learning
Comments: 11 pages, 4 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Machine Learning (stat.ML)
[36]  arXiv:2512.03563 [pdf, ps, other]
Title: State Space Models for Bioacoustics: A comparative Evaluation with Transformers
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[37]  arXiv:2512.03783 (cross-list from cs.AI) [pdf, ps, other]
Title: Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
[38]  arXiv:2512.03636 (cross-list from cs.HC) [pdf, ps, other]
Title: Head, posture, and full-body gestures in dyadic conversations
Comments: 7 figures, 10 tables, 29 pages
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39]  arXiv:2512.03458 (cross-list from eess.SP) [pdf, ps, other]
Title: A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 3 Dec 2025 (showing first 3 of 12 entries)

[40]  arXiv:2512.02783 [pdf, ps, other]
Title: Exploring Definitions of Quality and Diversity in Sonic Measurement Spaces
Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE)
[41]  arXiv:2512.02669 [pdf, ps, other]
Title: SAND Challenge: Four Approaches for Dysartria Severity Classification
Comments: 7 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[42]  arXiv:2512.02652 [pdf, ps, other]
Title: Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)