Sound

Authors and titles for recent submissions, skipping first 8

[ total of 47 entries: 1-50 | 9-47 ]

Thu, 11 Dec 2025

[9]  arXiv:2512.09504
Title: DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance
Subjects: Sound (cs.SD)
[10]  arXiv:2512.09285
Title: Who Speaks What from Afar: Eavesdropping In-Person Conversations via mmWave Sensing
Subjects: Sound (cs.SD)
[11]  arXiv:2512.09066
Title: ORCA: Open-ended Response Correctness Assessment for Audio Question Answering
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[12]  arXiv:2512.08973
Title: Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture
Authors: Karamvir Singh
Comments: 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13]  arXiv:2512.09786 (cross-list from cs.LG)
Title: TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers
Subjects: Machine Learning (cs.LG); Performance (cs.PF); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[14]  arXiv:2512.09327 (cross-list from cs.CV)
Title: UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[15]  arXiv:2512.09299 (cross-list from cs.CV)
Title: VABench: A Comprehensive Benchmark for Audio-Video Generation
Comments: 24 pages, 25 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

Wed, 10 Dec 2025

[16]  arXiv:2512.08812
Title: Emovectors: assessing emotional content in jazz improvisations for creativity evaluation
Authors: Anna Jordanous
Comments: Presented at IEEE Big Data 2025 3rd Workshop on AI Music Generation (AIMG 2025). this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[17]  arXiv:2512.08403
Title: DFALLM: Achieving Generalizable Multitask Deepfake Detection by Optimizing Audio LLM Components
Subjects: Sound (cs.SD)
[18]  arXiv:2512.08238
Title: SpeechQualityLLM: LLM-Based Multimodal Assessment of Speech Quality
Comments: 9 pages, 5 figures, 8 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[19]  arXiv:2512.08203
Title: Error-Resilient Semantic Communication for Speech Transmission over Packet-Loss Networks
Comments: submitted to IEEE in Nov. 2025
Subjects: Sound (cs.SD)
[20]  arXiv:2512.08006
Title: Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[21]  arXiv:2512.07872
Title: LocaGen: Sub-Sample Time-Delay Learning for Beam Localization
Comments: 7 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[22]  arXiv:2512.07845
Title: AudioScene: Integrating Object-Event Audio into 3D Scenes
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23]  arXiv:2512.08282 (cross-list from cs.CV)
Title: PAVAS: Physics-Aware Video-to-Audio Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Tue, 9 Dec 2025

[24]  arXiv:2512.07627
Title: Incorporating Structure and Chord Constraints in Symbolic Transformer-based Melodic Harmonization
Comments: Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025), Brussels, Belgium, September 10th-12th
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Symbolic Computation (cs.SC)
[25]  arXiv:2512.07352
Title: MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection
Subjects: Sound (cs.SD)
[26]  arXiv:2512.07168
Title: JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention
Comments: UniReps: Unifying Representations in Neural Models (NeurIPS 2025 Workshop)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27]  arXiv:2512.07005
Title: Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition
Comments: Accepted by ACMMM 2025
Journal-ref: Proceedings of the 33rd ACM International Conference on Multimedia (ACMMM 2025), Pages 12714-12721, October 27, 2025. Dublin, Ireland
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[28]  arXiv:2512.06999
Title: Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model
Comments: Accepted to ACMMM 2025 oral
Journal-ref: Proceedings of the 33rd ACM International Conference on Multimedia (ACMMM 2025), Pages 12227-12236
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[29]  arXiv:2512.06890
Title: What Needs to be Known in Order to Perform a Meaningful Scientific Comparison Between Animal Communications and Human Spoken Language
Authors: Roger K. Moore
Comments: 5 pages, 1 figure, Proc. Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR-24), Kos, Greece, 6 Sept. 2024
Journal-ref: Proc. Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR-24), pp 22-26, Kos, Greece, 6 Sept. 2024
Subjects: Sound (cs.SD)
[30]  arXiv:2512.06757
Title: XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association
Comments: FAME 2026 Technical Report
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[31]  arXiv:2512.06380
Title: Protecting Bystander Privacy via Selective Hearing in LALMs
Comments: Dataset: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[32]  arXiv:2512.06259
Title: Who Will Top the Charts? Multimodal Music Popularity Prediction via Adaptive Fusion of Modality Experts and Temporal Engagement Modeling
Comments: 8 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[33]  arXiv:2512.06041
Title: Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34]  arXiv:2512.06040
Title: Physics-Guided Deepfake Detection for Voice Authentication Systems
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[35]  arXiv:2512.06022
Title: DreamFoley: Scalable VLMs for High-Fidelity Video-to-Audio Generation
Comments: 10 pages; Bytedance
Subjects: Sound (cs.SD); Multimedia (cs.MM)
[36]  arXiv:2512.07741 (cross-list from cs.LG)
Title: A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data
Subjects: Machine Learning (cs.LG); Sound (cs.SD)
[37]  arXiv:2512.07351 (cross-list from cs.CV)
Title: DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD)
[38]  arXiv:2512.07226 (cross-list from eess.AS)
Title: Unsupervised Single-Channel Audio Separation with Diffusion Source Priors
Comments: 15 pages, 31 figures, accepted by The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39]  arXiv:2512.07209 (cross-list from cs.MM)
Title: Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD)
[40]  arXiv:2512.06417 (cross-list from cs.LG)
Title: Hankel-FNO: Fast Underwater Acoustic Charting Via Physics-Encoded Fourier Neural Operator
Authors: Yifan Sun (1), Lei Cheng (1), Jianlong Li (1), Peter Gerstoft (2) ((1) College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, (2) Scripps Institution of Oceanography, University of California San Diego, La Jolla, USA)
Subjects: Machine Learning (cs.LG); Sound (cs.SD)
[41]  arXiv:2512.06304 (cross-list from eess.AS)
Title: Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Sound (cs.SD)
[42]  arXiv:2512.05994 (cross-list from eess.AS)
Title: KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)

Mon, 8 Dec 2025

[43]  arXiv:2512.05592
Title: The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models
Comments: Accepted by IEEE ASRU 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44]  arXiv:2512.05508
Title: Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
Comments: 8 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[45]  arXiv:2512.05528 (cross-list from q-bio.NC)
Title: Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[46]  arXiv:2512.05201 (cross-list from cs.NI)
Title: MuMeNet: A Network Simulator for Musical Metaverse Communications
Comments: To appear in 2025 IEEE 6th International Symposium on the Internet of Sounds (IS2) proceedings
Subjects: Networking and Internet Architecture (cs.NI); Sound (cs.SD)
[47]  arXiv:2512.05126 (cross-list from eess.AS)
Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)