Sound

Authors and titles for recent submissions

[ total of 49 entries; showing entries 1-25 ]

Wed, 13 May 2026

[1]  arXiv:2605.12387 [pdf, ps, other]
Title: A Semi-Supervised Framework for Speech Confidence Detection using Whisper
Comments: 12 pages, 9 Figures, Submitted to IEEE Transactions on Audio, Speech and Language Processing
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[2]  arXiv:2605.12310 [pdf, ps, other]
Title: Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling
Comments: Accepted by ICASSP 2026
Subjects: Sound (cs.SD)
[3]  arXiv:2605.12135 [pdf, ps, other]
Title: STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
Authors: Joshua Opria
Comments: 9 pages, 4 figures, 3 tables. Code and models: this https URL<your-github-username>/autocharter
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4]  arXiv:2605.11866 [pdf, ps, other]
Title: AuDirector: A Self-Reflective Closed-Loop Framework for Immersive Audio Storytelling
Subjects: Sound (cs.SD)
[5]  arXiv:2605.11192 [pdf, ps, other]
Title: Exploring Token-Space Manipulation in Latent Audio Tokenizers
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[6]  arXiv:2605.11098 [pdf, ps, other]
Title: AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling
Comments: Accepted to ACL Findings 2026
Subjects: Sound (cs.SD)
[7]  arXiv:2605.12287 (cross-list from eess.AS) [pdf, ps, other]
Title: The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking
Comments: 6 pages, 3 figures. Technical report on beat tracking failure modes; prepared for ISMIR 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2605.11286 (cross-list from eess.SP) [pdf, ps, other]
Title: Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming
Comments: 5 pages, 8 figures
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 12 May 2026 (showing first 17 of 18 entries)

[9]  arXiv:2605.10494 [pdf, ps, other]
Title: Multi-layer attentive probing improves transfer of audio representations for bioacoustics
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[10]  arXiv:2605.10281 [pdf, ps, other]
Title: Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[11]  arXiv:2605.10256 [pdf, ps, other]
Title: A Cold Diffusion Approach for Percussive Dereverberation
Comments: Accepted for the 2026 IEEE World Congress on Computational Intelligence, IJCNN Track, 21-26 June 2026, Maastricht, the Netherlands
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[12]  arXiv:2605.10203 [pdf, ps, other]
Title: Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration
Comments: Accepted by ICML 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13]  arXiv:2605.10153 [pdf, ps, other]
Title: APEX: Audio Prototype EXplanations for Classification Tasks
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[14]  arXiv:2605.09846 [pdf, ps, other]
Title: ChladniSonify: A Visual-Acoustic Mapping Method for Chladni Patterns in New Media Art Creation
Comments: 9 pages, 5 figures, IEEE conference format
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[15]  arXiv:2605.09259 [pdf, ps, other]
Title: Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[16]  arXiv:2605.09087 [pdf, ps, other]
Title: Towards Trustworthy Audio Deepfake Detection: A Systematic Framework for Diagnosing and Mitigating Gender Bias
Comments: Submitted to SMC 2026 conference
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[17]  arXiv:2605.08762 [pdf, ps, other]
Title: Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search
Comments: 43 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[18]  arXiv:2605.08554 [pdf, ps, other]
Title: Online Segmented Beamforming via Dynamic Programming
Comments: 4 pages, 2 figures
Subjects: Sound (cs.SD)
[19]  arXiv:2605.08214 [pdf, ps, other]
Title: Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
Comments: 3 figures and 5 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20]  arXiv:2605.08194 [pdf, ps, other]
Title: ShipEcho -- An Interactive Tool for Global Mapping of Underwater Radiated Noise from Vessels
Comments: 34 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[21]  arXiv:2605.10084 (cross-list from eess.AS) [pdf, ps, other]
Title: PoDAR: Power-Disentangled Audio Representation for Generative Modeling
Comments: 9 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[22]  arXiv:2605.09908 (cross-list from cs.LG) [pdf, ps, other]
Title: Voice Biomarkers for Depression and Anxiety
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
[23]  arXiv:2605.09906 (cross-list from cs.AI) [pdf, ps, other]
Title: Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
[24]  arXiv:2605.09120 (cross-list from cs.IR) [pdf, ps, other]
Title: Reddit2Deezer: A Scalable Dataset for Real-World Grounded Conversational Music Recommendation
Subjects: Information Retrieval (cs.IR); Sound (cs.SD)
[25]  arXiv:2605.08729 (cross-list from cs.CV) [pdf, ps, other]
Title: Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM); Sound (cs.SD)