Electrical Engineering and Systems Science
New submissions
New submissions for Fri, 6 Feb 26
- [1] arXiv:2602.04944 [pdf, ps, other]
Title: Smart Diagnosis and Early Intervention in PCOS: A Deep Learning Approach to Women's Reproductive Health
Authors: Shayan Abrar, Samura Rahman, Ishrat Jahan Momo, Mahjabin Tasnim Samiha, B. M. Shahria Alam, Mohammad Tahmid Noor, Nishat Tasnim Niloy
Comments: 6 pages, 12 figures. This is the author's accepted manuscript of a paper accepted for publication in the Proceedings of the 16th International IEEE Conference on Computing, Communication and Networking Technologies (ICCCNT 2025). The final published version will be available via IEEE Xplore
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Polycystic Ovary Syndrome (PCOS) is a widespread disorder in women of reproductive age, characterized by hormonal imbalance, irregular periods, and multiple ovarian cysts. Infertility, metabolic syndrome, and cardiovascular risks are long-term complications that make early detection essential. In this paper, we design a transfer learning framework that uses DenseNet201 and ResNet50 to classify ovarian ultrasound images. The model was trained on an online dataset containing 3856 ultrasound images of cyst-infected and non-infected patients. Each ultrasound frame was resized to 224x224 pixels and encoded with precise pathological indicators. The MixUp and CutMix augmentation strategies were used to improve generalization, yielding a peak validation accuracy of 99.80% by DenseNet201 and a validation loss of 0.617 with alpha values of 0.25 and 0.4, respectively. We evaluated the model's interpretability using leading Explainable AI (XAI) approaches such as SHAP, Grad-CAM, and LIME, which provide explicit visual explanations of the model's behavior and thereby increase its transparency. This study proposes an automated system for medical image diagnosis that may be used effectively and confidently in clinical practice.
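The MixUp augmentation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard MixUp formulation, in which the reported alpha is the parameter of a Beta(alpha, alpha) distribution that draws the mixing coefficient. The function name `mixup_pair` and the toy four-pixel "images" are hypothetical and are not taken from the paper.

```python
import random

def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.25, rng=random):
    """Blend two samples and their one-hot labels (standard MixUp, assumed here)."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam

# Toy example: two flattened 4-pixel "images" with one-hot labels
# (class 0 = no cyst, class 1 = cyst; the class order is an assumption).
rng = random.Random(0)
x_mix, y_mix, lam = mixup_pair(
    [0.0, 0.2, 0.4, 0.6], [1.0, 0.0],
    [1.0, 0.8, 0.6, 0.4], [0.0, 1.0],
    alpha=0.25, rng=rng,
)
```

With a small alpha such as 0.25, most draws of the coefficient fall near 0 or 1, so each mixed sample stays close to one of its two sources.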
- [2] arXiv:2602.04971 [pdf, ps, other]
Title: Multi-Sensor Scheduling for Remote State Estimation over Wireless MIMO Fading Channels with Semantic Over-the-Air Aggregation
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
In this work, we study multi-sensor scheduling for remote state estimation over wireless multiple-input multiple-output (MIMO) fading channels using a novel semantic over-the-air (SemOTA) aggregation approach. We first revisit Kalman filtering with conventional over-the-air (OTA) aggregation and highlight its transmit power limitations. To balance power efficiency and estimation performance, we formulate the scheduling task as a finite-horizon dynamic programming (DP) problem. By analyzing the structure of the optimal Q-function, we show that the resulting scheduling policy exhibits a semantic structure that adapts online to the estimation error covariance and channel variations. To obtain a practical solution, we derive a tractable upper bound on the Q-function via a positive semidefinite (PSD) cone decomposition, which enables an efficient approximate scheduling policy and a low-complexity remote estimation algorithm. Numerical results confirm that the proposed scheme outperforms existing methods in both estimation accuracy and power efficiency.
- [3] arXiv:2602.04983 [pdf, ps, other]
Title: AI-Based Detection of In-Treatment Changes from Prostate MR-Linac Images
Authors: Seungbin Park, Peilin Wang, Ryan Pennell, Emily S. Weg, Himanshu Nagar, Timothy McClure, Mert R. Sabuncu, Daniel Margolis, Heejong Kim
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Purpose: To investigate whether routinely acquired longitudinal MR-Linac images can be leveraged to characterize treatment-induced changes during radiotherapy, particularly subtle inter-fraction changes over short intervals (average of 2 days). Materials and Methods: This retrospective study included a series of 0.35T MR-Linac images from 761 patients. An artificial intelligence (deep learning) model was used to characterize treatment-induced changes by predicting the temporal order of paired images. The model was first trained with the images from the first and the last fractions (F1-FL), then with all pairs (All-pairs). Model performance was assessed using quantitative metrics (accuracy and AUC), compared to a radiologist's performance, and qualitative analyses - the saliency map evaluation to investigate affected anatomical regions. Input ablation experiments were performed to identify the anatomical regions altered by radiotherapy. The radiologist conducted an additional task on partial images reconstructed by saliency map regions, reporting observations as well. Quantitative image analysis was conducted to investigate the results from the model and the radiologist. Results: The F1-FL model yielded near-perfect performance (AUC of 0.99), significantly outperforming the radiologist. The All-pairs model yielded an AUC of 0.97. This performance reflects therapy-induced changes, supported by the performance correlation to fraction intervals, ablation tests and expert's interpretation. Primary regions driving the predictions were prostate, bladder, and pubic symphysis. Conclusion: The model accurately predicts temporal order of MR-Linac fractions and detects radiation-induced changes over one or a few days, including prostate and adjacent organ alterations confirmed by experts. This underscores MR-Linac's potential for advanced image analysis beyond image guidance.
- [4] arXiv:2602.05034 [pdf, ps, other]
Title: Phase-Only Positioning in Distributed MIMO Under Phase Impairments: AP Selection Using Deep Learning
Authors: Fatih Ayten, Musa Furkan Keskin, Akshay Jain, Mehmet C. Ilter, Ossi Kaltiokallio, Jukka Talvitie, Elena Simona Lohan, Mikko Valkama
Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS)
Carrier phase positioning (CPP) can enable cm-level accuracy in next-generation wireless systems, while recent literature shows that accuracy remains high using phase-only measurements in distributed MIMO (D-MIMO). However, the impact of phase synchronization errors on such systems remains insufficiently explored. To address this gap, we first show that the proposed hyperbola intersection method achieves highly accurate positioning even in the presence of phase synchronization errors, when trained on appropriate data reflecting such impairments. We then introduce a deep learning (DL)-based D-MIMO antenna point (AP) selection framework that ensures high-precision localization under phase synchronization errors. Simulation results show that the proposed framework improves positioning accuracy compared to prior-art methods, while reducing inference complexity by approximately 19.7%.
- [5] arXiv:2602.05103 [pdf, ps, other]
Title: Learning Nonlinear Continuous-Time Systems for Formal Uncertainty Propagation and Probabilistic Evaluation
Comments: 10 pages, 4 figures, to appear in ACM Int'l Conf. on Hybrid Systems: Computation and Control (HSCC), and ACM/IEEE Int'l Conference on Cyber-Physical Systems (ICCPS) 2026
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)
Nonlinear ordinary differential equations (ODEs) are powerful tools for modeling real-world dynamical systems. However, propagating initial state uncertainty through nonlinear dynamics, especially when the ODE is unknown and learned from data, remains a major challenge. This paper introduces a novel continuum dynamics perspective for model learning that enables formal uncertainty propagation by constructing Taylor series approximations of probabilistic events. We establish sufficient conditions for the soundness of the approach and prove its asymptotic convergence. Empirical results demonstrate the framework's effectiveness, particularly when predicting rare events.
- [6] arXiv:2602.05104 [pdf, ps, other]
Title: Personalized White Matter Bundle Segmentation for Early Childhood
Authors: Elyssa M. McMaster, Michael E. Kim, Nancy R. Newlin, Gaurav Rudravaram, Adam M. Saunders, Aravind R. Krishnan, Jongyeon Yoon, Ji S. Kim, Bryce L. Geeraert, Meaghan V. Perdue, Catherine Lebel, Daniel Moyer, Kurt G. Schilling, Laurie E. Cutting, Bennett A. Landman
Subjects: Image and Video Processing (eess.IV)
White matter segmentation methods from diffusion magnetic resonance imaging range from streamline clustering-based approaches to bundle mask delineation, but none have proposed a pediatric-specific approach. We hypothesize that a deep learning model with a similar approach to TractSeg will improve similarity between an algorithm-generated mask and an expert-labeled ground truth. Given a cohort of 56 manually labelled white matter bundles, we take inspiration from TractSeg's 2D UNet architecture, and we modify the inputs to match bundle definitions as determined by pediatric experts, the evaluation to use k-fold cross-validation, and the loss function to a masked Dice loss. We evaluate Dice score, volume overlap, and volume overreach of 16 major regions of interest compared to the expert-labeled dataset. To test whether our approach offers statistically significant improvements over TractSeg, we compare Dice voxels, volume overlap, and adjacency voxels with a Wilcoxon signed-rank test followed by false discovery rate correction. We find statistical significance across all bundles for all metrics with one exception in volume overlap. After we run TractSeg and our model, we combine their output masks into a 60-label atlas to evaluate whether TractSeg and our model combined can generate a robust, individualized atlas, and we observe smoothed, continuous masks in cases where TractSeg did not produce an anatomically plausible output. With the improvement of white matter pathway segmentation masks, we can further understand neurodevelopment on a population-level scale, and we can produce reliable estimates of individualized anatomy in pediatric white matter diseases and disorders.
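The three overlap metrics named in this abstract can be sketched on binary masks as follows. The definitions of volume overlap and volume overreach are assumptions based on common tractography-evaluation usage: true-positive and false-positive voxel counts, each divided by the ground-truth volume. The paper may define them differently. The function name `overlap_metrics` is hypothetical.

```python
def overlap_metrics(pred, truth):
    """Return (Dice, volume overlap, volume overreach) for flat 0/1 voxel masks.

    Assumed definitions:
      Dice      = 2 * |pred AND truth| / (|pred| + |truth|)
      overlap   = |pred AND truth| / |truth|
      overreach = |pred AND NOT truth| / |truth|
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    dice = 2.0 * tp / (n_pred + n_truth) if (n_pred + n_truth) else 1.0
    overlap = tp / n_truth if n_truth else 0.0
    overreach = fp / n_truth if n_truth else 0.0
    return dice, overlap, overreach

# Toy masks: two voxels agree, one predicted voxel lies outside the truth.
dice, overlap, overreach = overlap_metrics([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0])
```

For these toy masks the Dice score and the overlap are both 2/3 and the overreach is 1/3.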
- [7] arXiv:2602.05116 [pdf, ps, other]
Title: GPU-to-Grid: Voltage Regulation via GPU Utilization Control
Subjects: Systems and Control (eess.SY)
While the rapid expansion of data centers poses challenges for power grids, it also offers new opportunities as potentially flexible loads. Existing power system research often abstracts data centers as aggregate resources, while computer system research primarily focuses on optimizing GPU energy efficiency and largely ignores the grid impacts of optimized GPU power consumption. To bridge this gap, we develop a GPU-to-Grid framework that couples device-level GPU control with power system objectives. We study distribution-level voltage regulation enabled by flexibility in LLM inference, using batch size as a control knob that trades off the voltage impacts of GPU power consumption against inference latency and token throughput. We first formulate this problem as an optimization problem and then realize it as an online feedback optimization controller that leverages measurements from both the power grid and GPU systems. Our key insight is that reducing GPU power consumption alleviates violations of lower voltage limits, while increasing GPU power mitigates violations near upper voltage limits in distribution systems; this runs counter to the common belief that minimizing GPU power consumption is always beneficial to power grids.
- [8] arXiv:2602.05121 [pdf, ps, other]
Title: Trojan Attacks on Neural Network Controllers for Robotic Systems
Comments: Paper submitted to the 2026 IEEE Conference on Control Technology and Applications (CCTA)
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Neural network controllers are increasingly deployed in robotic systems for tasks such as trajectory tracking and pose stabilization. However, their reliance on potentially untrusted training pipelines or supply chains introduces significant security vulnerabilities. This paper investigates backdoor (Trojan) attacks against neural controllers, using a differential-drive mobile robot platform as a case study. In particular, assuming that the robot's tracking controller is implemented as a neural network, we design a lightweight, parallel Trojan network that can be embedded within the controller. This malicious module remains dormant during normal operation but, upon detecting a highly specific trigger condition defined by the robot's pose and goal parameters, compromises the primary controller's wheel velocity commands, resulting in undesired and potentially unsafe robot behaviours. We provide a proof-of-concept implementation of the proposed Trojan network, which is validated through simulation under two different attack scenarios. The results confirm the effectiveness of the proposed attack and demonstrate that neural network-based robotic control systems are subject to potentially critical security threats.
- [9] arXiv:2602.05201 [pdf, ps, other]
Title: Diffusion-aided Extreme Video Compression with Lightweight Semantics Guidance
Comments: Accepted by ICASSP 2026
Subjects: Image and Video Processing (eess.IV)
Modern video codecs and learning-based approaches struggle with semantic reconstruction at extremely low bit-rates due to their reliance on low-level spatiotemporal redundancies. Generative models, especially diffusion models, offer a new paradigm for video compression by leveraging high-level semantic understanding and powerful visual synthesis. This paper proposes a video compression framework that integrates generative priors to drastically reduce bit-rate while maintaining reconstruction fidelity. Specifically, our method compresses high-level semantic representations of the video, then uses a conditional diffusion model to reconstruct frames from these semantics. To further improve compression, we characterize motion information with global camera trajectories and foreground segmentation: background motion is compactly represented by camera pose parameters, while foreground dynamics are represented by sparse segmentation masks. This significantly boosts compression efficiency, enabling decent video reconstruction at extremely low bit-rates.
- [10] arXiv:2602.05207 [pdf, ps, other]
Title: ARCHI-TTS: A flow-matching-based Text-to-Speech Model with Self-supervised Semantic Aligner and Accelerated Inference
Comments: Accepted by ICASSP 2026
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
Although diffusion-based, non-autoregressive text-to-speech (TTS) systems have demonstrated impressive zero-shot synthesis capabilities, their efficacy is still hindered by two key challenges: the difficulty of text-speech alignment modeling and the high computational overhead of the iterative denoising process. To address these limitations, we propose ARCHI-TTS, which features a dedicated semantic aligner to ensure robust temporal and semantic consistency between text and audio. To overcome the high computational cost of inference, ARCHI-TTS employs an efficient inference strategy that reuses encoder features across denoising steps, drastically accelerating synthesis without performance degradation. An auxiliary CTC loss applied to the condition encoder further enhances the semantic understanding. Experimental results demonstrate that ARCHI-TTS achieves a WER of 1.98% on LibriSpeech-PC test-clean, and 1.47%/1.42% on SeedTTS test-en/test-zh with a high inference efficiency, consistently outperforming recent state-of-the-art TTS systems.
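The WER figures quoted in this abstract are word error rates. As a reading aid, the metric can be sketched as the word-level edit distance between a reference transcript and a recognized hypothesis, divided by the reference length. This is the generic textbook definition. The text normalization and the speech recognizer used in the paper's evaluation are not specified here, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```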
- [11] arXiv:2602.05208 [pdf, ps, other]
Title: Context-Aware Asymmetric Ensembling for Interpretable Retinopathy of Prematurity Screening via Active Query and Vascular Attention
Comments: 16 pages, 6 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Retinopathy of Prematurity (ROP) is among the major causes of preventable childhood blindness. Automated screening remains challenging, primarily due to limited data availability and the complexity of a condition that involves both structural staging and microvascular abnormalities. Current deep learning models depend heavily on large private datasets and passive multimodal fusion, which commonly fail to generalize on small, imbalanced public cohorts. We thus propose the Context-Aware Asymmetric Ensemble Model (CAA Ensemble) that simulates clinical reasoning through two specialized streams. First, the Multi-Scale Active Query Network (MS-AQNet) serves as a structure specialist, utilizing clinical contexts as dynamic query vectors to spatially control visual feature extraction for localization of the fibrovascular ridge. Second, VascuMIL encodes Vascular Topology Maps (VMAP) within a gated Multiple Instance Learning (MIL) network to precisely identify vascular tortuosity. A synergistic meta-learner ensembles these orthogonal signals to resolve diagnostic discordance across multiple objectives. Tested on a highly imbalanced cohort of 188 infants (6,004 images), the framework attained state-of-the-art performance on two distinct clinical tasks: a Macro F1-Score of 0.93 for Broad ROP staging and an AUC of 0.996 for Plus Disease detection. Crucially, the system features 'Glass Box' transparency through counterfactual attention heatmaps and vascular threat maps, proving that clinical metadata dictates the model's visual search. Additionally, this study demonstrates that architectural inductive bias can serve as an effective bridge for the medical AI data gap.
- [12] arXiv:2602.05209 [pdf, ps, other]
Title: Integrated Sensing, Communication, and Control for UAV-Assisted Mobile Target Tracking
Comments: 13 pages, 10 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Unmanned aerial vehicles (UAVs) are increasingly deployed in mission-critical applications such as target tracking, where they must simultaneously sense dynamic environments, ensure reliable communication, and achieve precise control. A key challenge here is to jointly guarantee tracking accuracy, communication reliability, and control stability within a unified framework. To address this issue, we propose an integrated sensing, communication, and control (ISCC) framework for UAV-assisted target tracking, where the considered tracking system is modeled as a discrete-time linear control process, with the objective of driving the deviation between the UAV and target states toward zero. We formulate a stochastic model predictive control (MPC) optimization problem for joint control and beamforming design, which is highly non-convex and intractable in its original form. To overcome this difficulty, the target state is first estimated using an extended Kalman filter (EKF). Then, by deriving the closed-form optimal beamforming solution under a given control input, the original problem is equivalently reformulated into a tractable control-oriented form. Finally, we convexify the remaining non-convex constraints via a relaxation-based convex approximation, yielding a computationally tractable convex optimization problem that admits an efficient global solution. Numerical results show that the proposed ISCC framework achieves tracking accuracy comparable to a non-causal benchmark while maintaining stable communication, and it significantly outperforms the conventional control and tracking method.
- [13] arXiv:2602.05236 [pdf, ps, other]
Title: Exterior sound field estimation based on physics-constrained kernel
Comments: This paper has been accepted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2026
Subjects: Audio and Speech Processing (eess.AS)
Exterior sound field interpolation is a challenging problem that often requires specific array configurations and prior knowledge of the source conditions. We propose an interpolation method based on Gaussian processes using a point source reproducing kernel with a trainable inner product formulation made to fit exterior sound fields. While this estimation does not have a closed formula, it allows for the definition of a flexible estimator that is not restricted by microphone distribution and attenuates higher harmonic orders automatically with parameters directly optimized from the recordings, meaning an arbitrary distribution of microphones can be used. The proposed kernel estimator is compared in simulated experiments to the conventional method using spherical wave functions and an established physics-informed machine learning model, achieving lower interpolation error by approximately 2 dB on average within the analyzed frequency range of 100 Hz to 2.5 kHz and reconstructing the ground truth sound field more consistently within the target region.
- [14] arXiv:2602.05263 [pdf, ps, other]
Title: Nonlinear Predictive Cost Adaptive Control of Pseudo-Linear Input-Output Models Using Polynomial, Fourier, and Cubic Spline Observables
Subjects: Systems and Control (eess.SY)
Control of nonlinear systems with high levels of uncertainty is practically relevant and theoretically challenging. This paper presents a numerical investigation of an adaptive nonlinear model predictive control (MPC) technique that relies entirely on online system identification without prior modeling, training, or data collection. In particular, the paper considers predictive cost adaptive control (PCAC), which is an extension of generalized predictive control. Nonlinear PCAC (NPCAC) uses recursive least squares (RLS) with subspace of information forgetting (SIFt) to identify a discrete-time, pseudo-linear, input-output model, which is used with iterative MPC for nonlinear receding-horizon optimization. The performance of NPCAC is illustrated using polynomial, Fourier, and cubic-spline basis functions.
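The online identification step described in this abstract can be illustrated with a minimal recursive least squares (RLS) sketch. It uses plain exponential forgetting as a stand-in for the subspace of information forgetting (SIFt) variant named in the abstract, which is not reproduced here. The first-order plant, its coefficients, and the function name `rls_update` are hypothetical choices for illustration.

```python
import random

def rls_update(theta, P, phi, y, lam=0.99):
    """One RLS step with exponential forgetting factor lam (not SIFt).

    theta: parameter estimate, P: symmetric covariance-like matrix,
    phi: regressor vector, y: measured output.
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    err = y - sum(t * p for t, p in zip(theta, phi))  # one-step prediction error
    theta = [t + g * err for t, g in zip(theta, gain)]
    # P stays symmetric, so phi^T P equals (P phi)^T.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P

# Hypothetical plant y_k = 0.8 * y_{k-1} + 0.5 * u_{k-1}, identified online.
rng = random.Random(0)
a_true, b_true = 0.8, 0.5
theta = [0.0, 0.0]
P = [[1000.0, 0.0], [0.0, 1000.0]]
y_prev, u_prev = 0.0, 0.0
for _ in range(200):
    y = a_true * y_prev + b_true * u_prev
    theta, P = rls_update(theta, P, [y_prev, u_prev], y)
    u_prev = rng.uniform(-1.0, 1.0)  # random excitation input
    y_prev = y
```

In a pseudo-linear model of the kind the abstract mentions, the regressor would additionally contain basis-function values (polynomial, Fourier, or spline) of past inputs and outputs. The update itself keeps the same form.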
- [15] arXiv:2602.05308 [pdf, ps, other]
Title: A Migration-Assisted Deep Learning Scheme for Imaging Defects Inside Cylindrical Structures via GPR: A Case Study for Tree Trunks
Authors: Jiwei Qian, Yee Hui Lee, Kaixuan Cheng, Qiqi Dai, Arda Yalcinkaya, Mohamed Lokman Mohd Yusof, James Wang, Abdulkadir C. Yucel
Journal-ref: IEEE Transactions on Geoscience and Remote Sensing, 2026
Subjects: Signal Processing (eess.SP)
Ground-penetrating radar (GPR) has emerged as a prominent tool for imaging internal defects in cylindrical structures, such as columns, utility poles, and tree trunks. However, accurately reconstructing both the shape and permittivity of the defects inside cylindrical structures remains challenging due to complex wave scattering phenomena and the limited accuracy of the existing signal processing and deep learning techniques. To address these issues, this study proposes a migration-assisted deep learning scheme for reconstructing the shape and permittivity of defects within cylindrical structures. The proposed scheme involves three stages of GPR data processing. First, a dual-permittivity estimation network extracts the permittivity values of the defect and the cylindrical structure, the latter of which is estimated with the help of a novel structural similarity index measure-based autofocusing technique. Second, a modified Kirchhoff migration incorporating the extracted permittivity of the cylindrical structure maps the signals reflected from the defect to the imaging domain. Third, a shape reconstruction network processes the migrated image to recover the precise shape of the defect. The image of the interior defect is finally obtained by combining the reconstructed shape and extracted permittivity of the defect. The proposed scheme is validated using both synthetic and experimental data from a laboratory trunk model and real tree trunk samples. Comparative results show superior performance over existing deep learning methods, while generalization tests on live trees confirm its feasibility for in-field deployment. The underlying principle can further be applied to other circumferential GPR imaging scenarios. The code and database are available at: https://github.com/jwqian54/Migration-Assisted-DL.
- [16] arXiv:2602.05342 [pdf, ps, other]
Title: Joint Optimization of Latency and Accuracy for Split Federated Learning in User-Centric Cell-Free MIMO Networks
Subjects: Signal Processing (eess.SP)
This paper proposes a user-centric split federated learning (UCSFL) framework for user-centric cell-free multiple-input multiple-output (CF-MIMO) networks to support split federated learning (SFL). In the proposed UCSFL framework, users deploy split sub-models locally, while complete models are maintained and updated at access point (AP)-side distributed processing units (DPUs), followed by a two-level aggregation procedure across DPUs and the central processing unit (CPU). Under standard machine learning (ML) assumptions, we provide a theoretical convergence analysis for UCSFL, which reveals that the AP-cluster size is a key factor influencing model training accuracy. Motivated by this result, we introduce a new performance metric, termed the latency-to-accuracy ratio, defined as the ratio of a user's per-iteration training latency to the weighted size of its AP cluster. Based on this metric, we formulate a joint optimization problem to minimize the maximum latency-to-accuracy ratio by jointly optimizing uplink power control, downlink beamforming, model splitting, and AP clustering. The resulting problem is decomposed into two sub-problems operating on different time scales, for which dedicated algorithms are developed to handle the short-term and long-term optimizations, respectively. Simulation results verify the convergence of the proposed algorithms and demonstrate that UCSFL effectively reduces the latency-to-accuracy ratio of the VGG16 model compared with baseline schemes. Moreover, the proposed framework adaptively adjusts splitting and clustering strategies in response to varying communication and computation resources. An MNIST-based handwritten digit classification example further shows that UCSFL significantly accelerates the convergence of the VGG16 model.
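As a reading aid, the latency-to-accuracy ratio and the min-max objective described in this abstract can be written as below. All symbols are assumptions introduced here for illustration and are not the paper's notation: $T_k$ is the per-iteration training latency of user $k$, $\mathcal{M}_k$ is its AP cluster, and $w_m$ is a hypothetical weight assigned to AP $m$.

```latex
\rho_k \;=\; \frac{T_k}{\sum_{m \in \mathcal{M}_k} w_m},
\qquad
\min_{\text{power control, beamforming, model splitting, AP clustering}}
\;\; \max_{k} \; \rho_k
```

Under this reading, a larger weighted cluster lowers the ratio for a fixed latency, which matches the abstract's statement that the AP-cluster size is a key factor in training accuracy.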
- [17] arXiv:2602.05363 [pdf, ps, other]
Title: Policy-Driven Orchestration Framework for Multi-Operator Non-Terrestrial Networks
Comments: Accepted for publication in IEEE Transactions on Communications
Subjects: Systems and Control (eess.SY)
Non-terrestrial networks (NTNs) have gained significant attention for their scalability and wide coverage in next-generation communication systems. A large number of NTN nodes, such as satellites, are required to establish a global NTN, but not all operators have the capability to deploy such a system. Therefore, cooperation among multiple operators, facilitated by an orchestrator, enables the construction of virtually large-scale constellations. In this paper, we propose a weak-control-based orchestration framework that coordinates multiple NTN operators while ensuring that operations align with the policies of both the orchestrator and the individual operators. Unlike centralized orchestration frameworks, where the orchestrator determines the entire route from source to destination, the proposed framework allows each operator to select preferred routes from multiple candidates provided by the orchestrator. To evaluate the effectiveness of our proposed framework, we conducted numerical simulations under various scenarios and network configurations including dynamic NTN environments with time-varying topologies, showing that inter-operator cooperation improves the availability of feasible end-to-end routes. Furthermore, we analyzed the iterative negotiation process to address policy conflicts and quantitatively demonstrated the "price of autonomy," where strict individual policies degrade global feasibility and performance. The results also demonstrate that outcomes of the proposed framework depend on the operators' policies and that hop count and latency increase as the number of operators grows. These findings validate the proposed framework's ability to deliver practical benefits of orchestrated multi-operator collaboration in future NTN environments.
- [18] arXiv:2602.05442 [pdf, ps, other]
Title: Robust data-driven model-reference control of linear perturbed systems via sliding mode generation
Subjects: Systems and Control (eess.SY)
This paper introduces a data-based integral sliding mode control scheme for robustification of model-reference controllers, accommodating generic multivariable linear systems with unknown dynamics and affected by matched disturbances. Specifically, an integral sliding mode control (ISMC) law is recast into a data-based framework relying on an integral sliding variable depending only on the reference model, without the need to model the plant. The main strength of the proposed approach is the enforcement of the desired reference model in closed loop under sliding mode conditions, despite the lack of knowledge of the model dynamics and the presence of the matched disturbances. Moreover, the conditions required to guarantee an integral sliding mode generation and the closed-loop stability are formally analyzed in the paper, highlighting the generality of the proposed data-driven integral sliding mode control (DD-ISMC) with respect to the related model-based counterpart. Finally, the main practices for the data-based design of the proposed control scheme are discussed in depth, and the proposed method is tested in simulation on a benchmark example and experimentally on a real laboratory setup. Simulation and experimental evidence fully corroborates the theoretical analysis, thus motivating further research in this direction.
- [19] arXiv:2602.05443 [pdf, ps, other]
Title: Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech Generation from SSL features
Comments: Accepted by IEEE ICASSP 2026. 5 pages, 3 figures, and 2 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
We propose WaveTrainerFit, a neural vocoder that performs high-quality waveform generation from data-driven features such as SSL features. WaveTrainerFit builds upon the WaveFit vocoder, which integrates a diffusion model and a generative adversarial network. Furthermore, the proposed method incorporates the following key improvements: 1. By introducing trainable priors, the inference process starts from noise close to the target speech instead of Gaussian noise. 2. Reference-aware gain adjustment is performed by imposing constraints on the trainable prior to match the speech energy. These improvements are expected to reduce the complexity of waveform modeling from data-driven features, enabling high-quality waveform generation with fewer inference steps. Through experiments, we showed that WaveTrainerFit can generate highly natural waveforms with improved speaker similarity from data-driven features, while requiring fewer iterations than WaveFit. Moreover, we showed that the proposed method works robustly with respect to the depth at which SSL features are extracted. Code and pre-trained models are available from https://github.com/line/WaveTrainerFit.
- [20] arXiv:2602.05453 [pdf, ps, other]
Title: Towards Segmenting the Invisible: An End-to-End Registration and Segmentation Framework for Weakly Supervised Tumour Analysis
Authors: Budhaditya Mukhopadhyay, Chirag Mandal, Pavan Tummala, Naghmeh Mahmoodian, Andreas Nürnberger, Soumick Chatterjee
Comments: Accepted for AIBio at ECAI 2025
Journal-ref: Artificial Intelligence for Biomedical Data, AIBIO 2025, CCIS 2696, pp 1-14, 2026
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
Liver tumour ablation presents a significant clinical challenge: whilst tumours are clearly visible on pre-operative MRI, they are often effectively invisible on intra-operative CT due to minimal contrast between pathological and healthy tissue. This work investigates the feasibility of cross-modality weak supervision for scenarios where pathology is visible in one modality (MRI) but absent in another (CT). We present a hybrid registration-segmentation framework that combines MSCGUNet for inter-modal image registration with a UNet-based segmentation module, enabling registration-assisted pseudo-label generation for CT images. Our evaluation on the CHAOS dataset demonstrates that the pipeline can successfully register and segment healthy liver anatomy, achieving a Dice score of 0.72. However, when applied to clinical data containing tumours, performance degrades substantially (Dice score of 0.16), revealing the fundamental limitations of current registration methods when the target pathology lacks corresponding visual features in the target modality. We analyse the "domain gap" and "feature absence" problems, demonstrating that whilst spatial propagation of labels via registration is feasible for visible structures, segmenting truly invisible pathology remains an open challenge. Our findings highlight that registration-based label transfer cannot compensate for the absence of discriminative features in the target modality, providing important insights for future research in cross-modality medical image analysis. Code and weights are available at: https://github.com/BudhaTronix/Weakly-Supervised-Tumour-Detection
- [21] arXiv:2602.05483 [pdf, ps, other]
-
Title: Toward Operationalizing Rasmussen: Drift Observability on the Simplex for Evolving SystemsAuthors: Anatoly A. KrasnovskySubjects: Systems and Control (eess.SY); Computers and Society (cs.CY); Applications (stat.AP)
Monitoring drift into failure is hindered by Euclidean anomaly detection that can conflate safe operational trade-offs with risk accumulation in signals expressed as shares, and by architectural churn that makes fixed schemas (and learned models) stale before rare boundary events occur. Rasmussen's dynamic safety model motivates drift under competing pressures, but operationalizing it for software is difficult because many high-value operational signals (effort, remaining margin, incident impact) are compositional and their parts evolve. We propose a vision for drift observability on the simplex: model drift and boundary proximity in Aitchison geometry to obtain coordinate-invariant direction and distance-to-safety in interpretable balance coordinates. To remain comparable under churn, a monitor would continuously refresh its part inventory and policy-defined boundaries from engineering artifacts and apply lineage-aware aggregation. We outline early-warning diagnostics and falsifiable hypotheses for future evaluation.
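As a rough illustration of why Aitchison geometry suits signals expressed as shares, the sketch below computes the Aitchison distance through the centred log-ratio (clr) transform. The function names are hypothetical, and this is only the textbook distance, not the monitor outlined in the abstract.

```python
import math

def clr(x):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between the clr images of x and y."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))
```

Because the clr transform subtracts the mean log, rescaling a composition (for example, reporting hours instead of fractions of a week) leaves the distance unchanged, which is not true of the plain Euclidean distance on the raw shares.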
- [22] arXiv:2602.05554 [pdf, ps, other]
-
Title: Beamformed Fingerprint-Based Transformer Network for Trajectory Estimation and Path Determination in Outdoor mmWave MIMO SystemsAuthors: Mohammad Shamsesalehi, Mahmoud Ahmadian Attari, Mohammad Amin Maleki Sadr, Benoit ChampagneComments: 14 pages, 11 figuresSubjects: Signal Processing (eess.SP)
Radio transmissions in millimeter wave (mmWave) bands have gained significant interest for applications demanding precise device localization and trajectory estimation. This paper explores novel neural network (NN) architectures suitable for trajectory estimation and path determination in a mmWave multiple-input multiple-output (MIMO) outdoor system based on localization data from beamformed fingerprint (BFF). The NN architecture captures sequences of BFF signals from different users and, through the application of learning mechanisms, subsequently estimates their trajectories. In turn, this information is employed to find the shortest path to the target, thereby enabling more efficient navigation. Specifically, we propose a two-stage procedure for trajectory estimation and optimal path finding. In the first stage, a transformer network (TN) based on attention mechanisms is developed to predict trajectories of wireless devices using BFF sequences captured in a mmWave MIMO outdoor system. In the second stage, a novel algorithm based on Informed Rapidly-exploring Random Trees (iRRT*) is employed to determine the optimal path to target locations using trajectory estimates derived in the first stage. The effectiveness of the proposed schemes is validated through numerical experiments, using a comprehensive dataset of radio measurements, generated using ray tracing simulations to model outdoor propagation at 28 GHz. We show that our proposed TN-based trajectory estimator outperforms other methods from the recent literature and can successfully generalize to new trajectories outside the training set. Furthermore, our proposed iRRT* algorithm is able to consistently provide the shortest path to the target.
- [23] arXiv:2602.05560 [pdf, ps, other]
-
Title: Depth estimation of a monoharmonic source using a vertical linear array at fixed distanceSubjects: Signal Processing (eess.SP); Atmospheric and Oceanic Physics (physics.ao-ph)
Estimating the depth of a monoharmonic sound source at a fixed range using a vertical linear array (VLA) is challenging in the absence of seabed environmental parameters, and relevant research remains scarce. This paper proposes the orthogonality constrained modal search based depth estimation (OCMS-D) method, which enables the estimation of the depth of a monoharmonic source at a fixed range using a VLA under unknown seabed parameters. Using the sparsity of propagating normal modes and the orthogonality of mode depth functions, OCMS-D first estimates the normal mode parameters under a fixed source-array distance. The estimated normal mode parameters are then used to estimate the source depth. To ensure the precision of the source depth estimation, the method utilizes information on both the amplitude distribution and the sign (positive/negative) patterns of the estimated mode depth functions at the inferred source depth. Numerical simulations evaluate the performance of OCMS-D under different conditions. The effectiveness of OCMS-D is also verified by the Yellow Sea experiment and the SWellEx-96 experiment. In the Yellow Sea experiment, the absolute depth estimation errors of OCMS-D with a 4-second time window are less than 2.4 m. In the SWellEx-96 experiment, the absolute depth estimation errors with a 10-second time window are less than 5.4 m for the shallow source and less than 10.8 m for the deep source.
- [24] arXiv:2602.05579 [pdf, ps, other]
-
Title: Physics-Aware Tensor Reconstruction for Radio Maps in Pixel-Based Fluid Antenna SystemsSubjects: Signal Processing (eess.SP)
The deployment of pixel-based antennas and fluid antenna systems (FAS) is hindered by prohibitive channel state information (CSI) acquisition overhead. While radio maps enable proactive mode selection, reconstructing high-fidelity maps from sparse measurements is challenging. Existing physics-agnostic or data-driven methods often fail to recover fine-grained shadowing details under extreme sparsity. We propose a Physics-Regularized Low-Rank Tensor Completion (PR-LRTC) framework for radio map reconstruction. By modeling the signal field as a three-way tensor, we integrate environmental low-rankness with deterministic antenna physics. Specifically, we leverage Effective Aerial Degrees-of-Freedom (EADoF) theory to derive a differential gain topology map as a physical prior for regularization. The resulting optimization problem is solved via an efficient Alternating Direction Method of Multipliers (ADMM)-based algorithm. Simulations show that PR-LRTC achieves a 4 dB gain over baselines at a 10% sampling ratio. It effectively preserves sharp shadowing edges, providing a robust, physics-compliant solution for low-overhead beam management.
- [25] arXiv:2602.05581 [pdf, ps, other]
-
Title: Physics-Inspired Target Shape Detection and Reconstruction in mmWave Communication SystemsComments: Accepted by GLOBECOM 2023Subjects: Signal Processing (eess.SP)
The integration of sensing and communication (ISAC) is an essential function of future wireless systems. Due to their large available bandwidth, millimeter-wave (mmWave) ISAC systems are able to achieve high sensing accuracy. In this paper, we consider the multiple base-station (BS) collaborative sensing problem in a multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) mmWave communication system. Our aim is to sense a remote target shape with the collected signals, which consist of both reflection and scattering signals. We first characterize the mmWave scattering and reflection effects based on the Lambertian scattering model. Then we apply the periodogram technique to obtain rough scattering point detection, and further incorporate the subspace method to achieve more precise scattering and reflection point detection. Based on these, a reconstruction algorithm based on the Hough Transform and principal component analysis (PCA) is designed for a single convex polygon target scenario. To improve the accuracy and completeness of the reconstruction results, we propose a method to further fuse the scattering and reflection points. Extensive simulation results validate the effectiveness of the proposed algorithms.
- [26] arXiv:2602.05584 [pdf, ps, other]
-
Title: Fairness-aware design of nudging policies under stochasticity and prejudicesComments: Submitted to IFAC WC 2026Subjects: Systems and Control (eess.SY)
We present an injustice-aware innovation-diffusion model extending the Generalized Linear Threshold framework by assigning agents activation thresholds drawn from a Beta distribution to capture the stochastic nature of adoption shaped by inequalities. Because incentive policies themselves can inadvertently amplify these inequalities, building on this model, we design a fair Model Predictive Control (MPC) scheme that incorporates equality and equity objectives for allocating incentives. Simulations using real mobility-habit data show that injustice reduces overall adoption, while equality smooths incentive distribution and equity reduces disparities in the final outcomes. Thus, incorporating fairness ensures effective diffusion without exacerbating existing social inequalities.
- [27] arXiv:2602.05586 [pdf, ps, other]
-
Title: Observer-based Control of Multi-agent Systems under STL SpecificationsComments: This paper has been submitted for consideration to the 23rd IFAC World CongressSubjects: Systems and Control (eess.SY)
This paper proposes a decentralized controller for large-scale heterogeneous multi-agent systems subject to bounded external disturbances, where agents must satisfy Signal Temporal Logic (STL) specifications requiring cooperation among non-communicating agents. To address the lack of direct communication, we employ a decentralized k-hop Prescribed Performance State Observer (k-hop PPSO) to provide each agent with state estimates of those agents it cannot communicate with. By leveraging the performance bounds on the state estimation errors guaranteed by the k-hop PPSO, we first modify the space robustness of the STL tasks to account for these errors, and then exploit the modified robustness to design a decentralized continuous-time feedback controller that ensures satisfaction of the STL tasks even under worst-case estimation errors. A simulation result is provided to validate the proposed framework.
- [28] arXiv:2602.05644 [pdf, ps, other]
-
Title: UAV Trajectory Optimization via Improved Noisy Deep Q-NetworkSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
This paper proposes an Improved Noisy Deep Q-Network (Noisy DQN) to enhance the exploration and stability of an Unmanned Aerial Vehicle (UAV) when applying deep reinforcement learning in simulated environments. The method enhances exploration by combining a residual NoisyLinear layer with an adaptive noise scheduling mechanism, while improving training stability through a smooth loss and soft target network updates. Experiments show that the proposed model achieves faster convergence and up to $+40$ higher rewards compared to standard DQN, and quickly reaches the minimum number of steps required for the task (28) in the 15 × 15 grid navigation environment. The results show that our comprehensive improvements to the network structure of NoisyNet, exploration control, and training stability contribute to enhancing the efficiency and reliability of deep Q-learning.
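One of the stabilising ingredients named above, the soft target network update, can be sketched in a few lines. This is the generic Polyak-averaging rule, assuming parameters stored as flat lists of floats; the function name and the value of `tau` are hypothetical and not taken from the paper.

```python
def soft_update(target_params, online_params, tau=0.01):
    """Blend target parameters toward the online ones: target <- (1 - tau) * target + tau * online."""
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    # tau is an assumed mixing rate; small values make the target network change slowly.
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

Compared with copying the online weights into the target network every fixed number of steps, this update moves the bootstrapping target gradually, which is the usual motivation for using it to steady Q-learning.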
- [29] arXiv:2602.05715 [pdf, ps, other]
-
Title: Sound Field Estimation Using Optimal Transport Barycenters in the Presence of Phase ErrorsSubjects: Signal Processing (eess.SP)
This study introduces a novel approach for estimating plane-wave coefficients in sound field reconstruction, specifically addressing challenges posed by error-in-variable phase perturbations. Such systematic errors typically arise from sensor mis-calibration, including uncertainties in sensor positions and response characteristics, leading to measurement-induced phase shifts in plane wave coefficients. Traditional methods often result in biased estimates or non-convex solutions. To overcome these issues, we propose an optimal transport (OT) framework. This framework operates on a set of lifted non-negative measures that correspond to observation-dependent shifted coefficients relative to the unperturbed ones. By applying OT, the supports of the measures are transported toward an optimal average in the phase space, effectively morphing them into an indistinguishable state. This optimal average, known as barycenter, is linked to the estimated plane-wave coefficients using the same lifting rule. The framework addresses the ill-posed nature of the problem, due to the large number of plane waves, by adding a constant to the ground cost, ensuring the sparsity of the transport matrix. Convex consistency of the solution is maintained. Simulation results confirm that our proposed method provides more accurate coefficient estimations compared to baseline approaches in scenarios with both additive noise and phase perturbations.
- [30] arXiv:2602.05724 [pdf, ps, other]
-
Title: Reciprocity Calibration of Dual-Antenna Repeaters via MMSE EstimationComments: 13 pages, 9 figuresSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
This paper proposes a novel Bayesian reciprocity calibration method that consistently ensures uplink and downlink channel reciprocity in repeater-assisted multiple-input multiple-output (MIMO) systems. The proposed algorithm is formulated under the minimum mean-square error (MMSE) criterion. Its Bayesian framework incorporates complete statistical knowledge of the signal model, noise, and prior distributions, enabling a coherent design that achieves both low computational complexity and high calibration accuracy. To further enhance phase alignment accuracy, which is critical for calibration tasks, we develop a von Mises denoiser that exploits the fact that the target parameters lie on the circle in the complex plane. Simulation results demonstrate that the proposed MMSE algorithm achieves substantially improved estimation accuracy compared with conventional deterministic non-linear least-squares (NLS) methods, while maintaining comparable computational complexity. Furthermore, the proposed method exhibits remarkably fast convergence, making it well suited for practical implementation.
- [31] arXiv:2602.05738 [pdf, ps, other]
-
Title: Disc-Centric Contrastive Learning for Lumbar Spine Severity GradingSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
This work examines a disc-centric approach for automated severity grading of lumbar spinal stenosis from sagittal T2-weighted MRI. The method combines contrastive pretraining with disc-level fine-tuning, using a single anatomically localized region of interest per intervertebral disc. Contrastive learning is employed to help the model focus on meaningful disc features and reduce sensitivity to irrelevant differences in image appearance. The framework includes an auxiliary regression task for disc localization and applies weighted focal loss to address class imbalance. Experiments demonstrate a 78.1% balanced accuracy and a reduced severe-to-normal misclassification rate of 2.13% compared with supervised training from scratch. Detecting discs with moderate severity can still be challenging, but focusing on disc-level features provides a practical way to assess lumbar spinal stenosis.
- [32] arXiv:2602.05770 [pdf, ps, other]
-
Title: Zero-Shot TTS With Enhanced Audio Prompts: Bsc Submission For The 2026 Wildspoof Challenge TTS TrackComments: Accepted to ICASSP 2026Subjects: Audio and Speech Processing (eess.AS)
We evaluate two non-autoregressive architectures, StyleTTS2 and F5-TTS, to address the spontaneous nature of in-the-wild speech. Our models utilize flexible duration modeling to improve prosodic naturalness. To handle acoustic noise, we implement a multi-stage enhancement pipeline using the Sidon model, which significantly outperforms standard Demucs in signal quality. Experimental results show that fine-tuning on enhanced audio yields superior robustness, achieving up to 4.21 UTMOS and 3.47 DNSMOS. Furthermore, we analyze the impact of reference prompt quality and length on zero-shot synthesis performance, demonstrating the effectiveness of our approach for realistic speech generation.
- [33] arXiv:2602.05802 [pdf, ps, other]
-
Title: Discrete Aware Tensor Completion via Convexized $\ell_0$-Norm ApproximationSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
We consider a novel algorithm for the completion of partially observed low-rank tensors, where each entry of the tensor is chosen from a discrete finite alphabet set, as in common image processing problems where the entries represent RGB values. The proposed low-rank tensor completion (TC) method builds on the conventional nuclear norm (NN) minimization-based low-rank TC paradigm by adding a discrete-aware regularizer that enforces discreteness in the objective of the problem. The regularizer is an $\ell_0$-norm term approximated by a continuous and differentiable function normalized via fractional programming (FP), and the resulting problem is solved under a proximal gradient (PG) framework. Simulation results demonstrate the superior performance of the new method, both in terms of normalized mean square error (NMSE) and convergence, compared to conventional state-of-the-art (SotA) techniques, including NN minimization approaches as well as a mixture of the latter with a matrix factorization approach.
- [34] arXiv:2602.05803 [pdf, ps, other]
-
Title: Privacy-Preserving Dynamic Average Consensus by Masking Reference SignalsComments: Accepted at ACC 2026Subjects: Systems and Control (eess.SY)
In multi-agent systems, dynamic average consensus (DAC) is a decentralized estimation strategy in which a set of agents tracks the average of time-varying reference signals. Because DAC requires exchanging state information with neighbors, attackers may gain access to these states and infer private information. In this paper, we develop a privacy-preserving method that protects each agent's reference signal from external eavesdroppers and honest-but-curious agents while achieving the same convergence accuracy and convergence rate as conventional DAC. Our approach masks the reference signals by having each agent draw a random real number for each neighbor, exchanges that number over an encrypted channel at the initialization, and computes a masking value to form a masked reference. Then the agents run the conventional DAC algorithm using the masked references. Convergence and privacy analyses show that the proposed algorithm matches the convergence properties of conventional DAC while preserving the privacy of the reference signals. Numerical simulations validate the effectiveness of the proposed privacy-preserving DAC algorithm.
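The masking idea described above can be illustrated with a small sketch. It assumes one common construction, in which each agent's mask is the sum of the numbers it sent minus the numbers it received, so that the masks cancel across the network; the abstract does not spell out the exact masking rule, so this construction, the function name, and the sampling range are assumptions.

```python
import random

def masked_references(refs, edges, seed=0):
    """Return masked references whose network-wide average equals that of the true references.

    refs  : list of initial reference values, one per agent
    edges : list of undirected links (i, j) between neighbouring agents
    """
    rng = random.Random(seed)
    masks = [0.0] * len(refs)
    for i, j in edges:
        s_ij = rng.uniform(-10.0, 10.0)  # drawn by agent i and sent to j (assumed encrypted link)
        s_ji = rng.uniform(-10.0, 10.0)  # drawn by agent j and sent to i
        # Assumed rule: mask = sent - received, so the two contributions cancel pairwise.
        masks[i] += s_ij - s_ji
        masks[j] += s_ji - s_ij
    return [r + m for r, m in zip(refs, masks)]
```

Because the masks sum to zero, running an unmodified consensus algorithm on the masked values would track the same average as on the true references, while each individual masked value differs from the private one.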
- [35] arXiv:2602.05876 [pdf, ps, other]
-
Title: IDSOR: Intensity- and Distance-Aware Statistical Outlier Removal for Weather-Robust LiDAR Point CloudsSubjects: Signal Processing (eess.SP)
LiDAR point clouds captured in rain or snow are often corrupted by weather-induced returns, which can degrade perception and safety-critical scene understanding. This paper proposes Intensity- and Distance-Aware Statistical Outlier Removal (IDSOR), a range-adaptive filtering method that jointly exploits intensity cues and neighborhood sparsity. By incorporating an empirical, range-dependent distribution of weather returns into the threshold design, IDSOR suppresses weather-induced points while preserving fine structural details without cumbersome manual parameter tuning. We also propose a variant that uses a previously proposed method to estimate the weather return distribution from data, and integrates it into IDSOR. Experiments on simulation-augmented level-crossing measurements and on the Winter Adverse Driving dataset (WADS) demonstrate that IDSOR achieves a favorable precision-recall trade-off, maintaining both precision and recall above 90% on WADS.
- [36] arXiv:2602.06026 [pdf, ps, other]
-
Title: GUARDIAN: Safety Filtering for Systems with Perception Models Subject to Adversarial AttacksComments: 6 pages, 4 figures, submitted to L-CSS/CDCSubjects: Systems and Control (eess.SY)
Safety filtering is an effective method for enforcing constraints in safety-critical systems, but existing methods typically assume perfect state information. This limitation is especially problematic for systems that rely on neural network (NN)-based state estimators, which can be highly sensitive to noise and adversarial input perturbations. We address these problems by introducing GUARDIAN: Guaranteed Uncertainty-Aware Reachability Defense against Adversarial INterference, a safety filtering framework that provides formal safety guarantees for systems with NN-based state estimators. At runtime, GUARDIAN uses neural network verification tools to provide guaranteed bounds on the system's state estimate given possible perturbations to its observation. It then uses a modified Hamilton-Jacobi reachability formulation to construct a safety filter that adjusts the nominal control input based on the verified state bounds and safety constraints. The result is an uncertainty-aware filter that ensures safety despite the system's reliance on an NN estimator with noisy, possibly adversarial, input observations. Theoretical analysis and numerical experiments demonstrate that GUARDIAN effectively defends systems against adversarial attacks that would otherwise lead to a violation of safety constraints.
Cross-lists for Fri, 6 Feb 26
- [37] arXiv:2602.04904 (cross-list from cs.LG) [pdf, ps, other]
-
Title: DCER: Dual-Stage Compression and Energy-Based ReconstructionComments: 13 pages, 2 figures, 8 tables. Submitted to ICML 2026. Code will be available on GitHubSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Multimodal fusion faces two robustness challenges: noisy inputs degrade representation quality, and missing modalities cause prediction failures. We propose DCER, a unified framework addressing both challenges through dual-stage compression and energy-based reconstruction. The compression stage operates at two levels: within-modality frequency transforms (wavelet for audio, DCT for video) remove noise while preserving task-relevant patterns, and cross-modality bottleneck tokens force genuine integration rather than modality-specific shortcuts. For missing modalities, energy-based reconstruction recovers representations via gradient descent on a learned energy function, with the final energy providing intrinsic uncertainty quantification ($\rho > 0.72$ correlation with prediction error). Experiments on CMU-MOSI, CMU-MOSEI, and CH-SIMS demonstrate state-of-the-art performance across all benchmarks, with a U-shaped robustness pattern favoring multimodal fusion at both complete and high-missing conditions. The code will be available on GitHub.
- [38] arXiv:2602.04932 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Comparing Euclidean and Hyperbolic K-Means for Generalized Category DiscoveryComments: 11 pages, 4 figures. To be published in the VISAPPSubjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Hyperbolic representation learning has been widely used to extract implicit hierarchies within data, and recently it has found its way to the open-world classification task of Generalized Category Discovery (GCD). However, prior hyperbolic GCD methods only use hyperbolic geometry for representation learning and transform back to Euclidean geometry when clustering. We hypothesize this is suboptimal. Therefore, we present Hyperbolic Clustered GCD (HC-GCD), which learns embeddings in the Lorentz Hyperboloid model of hyperbolic geometry, and clusters these embeddings directly in hyperbolic space using a hyperbolic K-Means algorithm. We test our model on the Semantic Shift Benchmark datasets, and demonstrate that HC-GCD is on par with the previous state-of-the-art hyperbolic GCD method. Furthermore, we show that using hyperbolic K-Means leads to better accuracy than Euclidean K-Means. We carry out ablation studies showing that clipping the norm of the Euclidean embeddings leads to decreased accuracy in clustering unseen classes, and increased accuracy for seen classes, while the overall accuracy is dataset dependent. We also show that using hyperbolic K-Means leads to more consistent clusters when varying the label granularity.
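To make the clustering step concrete, here is a minimal sketch of the assignment step of K-Means on the Lorentz hyperboloid with curvature -1. It uses only the standard Lorentzian inner product and geodesic distance; the function names are hypothetical, the centroid update is omitted, and nothing here is taken from the HC-GCD implementation.

```python
import math

def lift(v):
    """Map a Euclidean vector v onto the hyperboloid: x0 = sqrt(1 + |v|^2), spatial part v."""
    return [math.sqrt(1.0 + sum(c * c for c in v))] + list(v)

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0 * y0 + sum of products of the spatial coordinates."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_distance(x, y):
    """Geodesic distance arccosh(-<x, y>_L); the clamp guards against rounding below 1."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

def assign(points, centroids):
    """K-Means assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: lorentz_distance(p, centroids[k]))
        for p in points
    ]
```

The only change relative to the Euclidean assignment step is the distance function, which is what allows cluster membership to follow the geometry in which the embeddings were learned.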
- [39] arXiv:2602.05011 (cross-list from math.OC) [pdf, ps, other]
-
Title: Banach Control Barrier Functions for Large-Scale Swarm ControlComments: Accepted by IEEE ACC 2026Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper studies the safe control of very large multi-agent systems via a generalized framework that employs so-called Banach Control Barrier Functions (B-CBFs). Modeling a large swarm as a probability distribution over a spatial domain, we show how B-CBFs can be used to appropriately capture a variety of macroscopic constraints that can integrate with large-scale swarm objectives. Leveraging this framework, we define stable and filtered gradient flows for large swarms, paying special attention to optimal transport algorithms. Further, we show how to derive agent-level, microscopic algorithms that are consistent with their macroscopic counterparts in the large-scale limit. We then identify conditions under which a group of agents can compute a distributed solution that only requires local information from other agents within a communication range. Finally, we showcase the theoretical results on swarm systems in the simulations section.
- [40] arXiv:2602.05078 (cross-list from cs.CV) [pdf, ps, other]
-
Title: Food Portion Estimation: From Pixels to CaloriesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Reliance on images for dietary assessment is an important strategy to accurately and conveniently monitor an individual's health, making it a vital mechanism in the prevention and care of chronic diseases and obesity. However, image-based dietary assessment suffers from the difficulty of estimating the three-dimensional size of food from 2D image inputs. Many strategies have been devised to overcome this critical limitation, such as the use of auxiliary inputs like depth maps, multi-view inputs, or model-based approaches such as template matching. Deep learning also helps bridge the gap by using either monocular images or combinations of the image and the auxiliary inputs to precisely predict the output portion from the image input. In this paper, we explore the different strategies employed for accurate portion estimation.
- [41] arXiv:2602.05156 (cross-list from cs.RO) [pdf, ps, other]
-
Title: PLATO Hand: Shaping Contact Behavior with Fingernails for Precise ManipulationSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
We present the PLATO Hand, a dexterous robotic hand with a hybrid fingertip that embeds a rigid fingernail within a compliant pulp. This design shapes contact behavior to enable diverse interaction modes across a range of object geometries. We develop a strain-energy-based bending-indentation model to guide the fingertip design and to explain how guided contact preserves local indentation while suppressing global bending. Experimental results show that the proposed robotic hand design demonstrates improved pinching stability, enhanced force observability, and successful execution of edge-sensitive manipulation tasks, including paper singulation, card picking, and orange peeling. Together, these results show that coupling structured contact geometry with a force-motion transparent mechanism provides a principled, physically embodied approach to precise manipulation.
- [42] arXiv:2602.05157 (cross-list from cs.SE) [pdf, ps, other]
-
Title: The Necessity of a Holistic Safety Evaluation Framework for AI-Based Automation FeaturesSubjects: Software Engineering (cs.SE); Systems and Control (eess.SY)
The intersection of Safety of Intended Functionality (SOTIF) and Functional Safety (FuSa) analysis of driving automation features has traditionally excluded Quality Management (QM) components from rigorous safety impact evaluations. While QM components are not typically classified as safety-relevant, recent developments in artificial intelligence (AI) integration reveal that such components can contribute to SOTIF-related hazardous risks. Compliance with emerging AI safety standards, such as ISO/PAS 8800, necessitates re-evaluating safety considerations for these components. This paper examines the necessity of conducting holistic safety analysis and risk assessment on AI components, emphasizing their potential to introduce hazards with the capacity to violate risk acceptance criteria when deployed in safety-critical driving systems, particularly in perception algorithms. Using case studies, we demonstrate how deficiencies in AI-driven perception systems can emerge even in QM-classified components, leading to unintended functional behaviors with critical safety implications. By bridging theoretical analysis with practical examples, this paper argues for the adoption of comprehensive FuSa, SOTIF, and AI standards-driven methodologies to identify and mitigate risks in AI components. The findings demonstrate the importance of revising existing safety frameworks to address the evolving challenges posed by AI, ensuring comprehensive safety assurance across all component classifications spanning multiple safety standards.
- [43] arXiv:2602.05304 (cross-list from cs.LG) [pdf, ps, other]
-
Title: A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG AlgorithmsSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
Stochastic variance-reduced algorithms such as Stochastic Average Gradient (SAG) and SAGA, and their deterministic counterparts like the Incremental Aggregated Gradient (IAG) method, have been extensively studied in large-scale machine learning. Despite their popularity, existing analyses for these algorithms are disparate, relying on different proof techniques tailored to each method. Furthermore, the original proof of SAG is known to be notoriously involved, requiring computer-aided analysis. Focusing on finite-sum optimization with smooth and strongly convex objective functions, our main contribution is to develop a single unified convergence analysis that applies to all three algorithms: SAG, SAGA, and IAG. Our analysis features two key steps: (i) establishing a bound on delays due to stochastic sub-sampling using simple concentration tools, and (ii) carefully designing a novel Lyapunov function that accounts for such delays. The resulting proof is short and modular, providing the first high-probability bounds for SAG and SAGA that can be seamlessly extended to non-convex objectives and Markov sampling. As an immediate byproduct of our new analysis technique, we obtain the best known rates for the IAG algorithm, significantly improving upon prior bounds.
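For orientation, the sketch below shows the textbook SAGA update on a one-dimensional least-squares finite sum. The test problem, function names, and step size are assumptions chosen for illustration; the sketch does not reproduce the unified analysis or the Lyapunov function from the paper.

```python
import random

def saga_least_squares(a, b, step, iters, seed=0):
    """Minimise (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 over a scalar x using SAGA."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0

    def grad(i, point):
        # Gradient of the i-th component function at the given point.
        return a[i] * (a[i] * point - b[i])

    table = [grad(i, x) for i in range(n)]  # stored (possibly stale) component gradients
    avg = sum(table) / n                    # running average of the stored gradients
    for _ in range(iters):
        j = rng.randrange(n)                # uniform sampling with replacement
        g_new = grad(j, x)
        # SAGA direction: fresh gradient minus stale gradient plus the table average.
        x -= step * (g_new - table[j] + avg)
        avg += (g_new - table[j]) / n
        table[j] = g_new
    return x
```

Replacing the random index with a cyclic one and stepping along the refreshed table average gives an IAG-style method, and the stale entries in `table` are the "delays" that the analysis described above has to control.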
- [44] arXiv:2602.05311 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier CertificatesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Neural Lyapunov and barrier certificates have recently been used as powerful tools for verifying the safety and stability properties of deep reinforcement learning (RL) controllers. However, existing methods offer guarantees only under fixed, ideal, unperturbed dynamics, limiting their reliability in real-world applications where dynamics may deviate due to uncertainties. In this work, we study the problem of synthesizing robust neural Lyapunov barrier certificates that maintain their guarantees under perturbations in system dynamics. We formally define a robust Lyapunov barrier function and specify sufficient conditions based on Lipschitz continuity that ensure robustness against bounded perturbations. We propose practical training objectives that enforce these conditions via adversarial training, a Lipschitz neighborhood bound, and global Lipschitz regularization. We validate our approach in two practically relevant environments, Inverted Pendulum and 2D Docking. The former is a widely studied benchmark, while the latter is a safety-critical task in autonomous systems. We show that our methods significantly improve both certified robustness bounds (up to $4.6$ times) and empirical success rates under strong perturbations (up to $2.4$ times) compared to the baseline. Our results demonstrate the effectiveness of training robust neural certificates for safe RL under perturbations in dynamics.
- [45] arXiv:2602.05324 (cross-list from cs.GT) [pdf, ps, other]
-
Title: A Data Driven Structural Decomposition of Dynamic Games via Best Response Maps
Comments: 11 pages, 6 figures, 5 tables, Submitted to RSS 2026
Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Dynamic games are powerful tools to model multi-agent decision-making, yet computing Nash (generalized Nash) equilibria remains a central challenge in such settings. Complexity arises from tightly coupled optimality conditions, nested optimization structures, and poor numerical conditioning. Existing game-theoretic solvers address these challenges by directly solving the joint game, typically requiring explicit modeling of all agents' objective functions and constraints, while learning-based approaches often decouple interaction through prediction or policy approximation, sacrificing equilibrium consistency. This paper introduces a conceptually novel formulation for dynamic games by restructuring the equilibrium computation. Rather than solving a fully coupled game or decoupling agents through prediction or policy approximation, a data-driven structural reduction of the game is proposed that removes nested optimization layers and derivative coupling by embedding an offline-compiled best-response map as a feasibility constraint. Under standard regularity conditions, when the best-response operator is exact, any converged solution of the reduced problem corresponds to a local open-loop Nash (GNE) equilibrium of the original game; with a learned surrogate, the solution is approximately equilibrium-consistent up to the best-response approximation error. The proposed formulation is supported by mathematical proofs and accompanied by a large-scale Monte Carlo study in a two-player open-loop dynamic game motivated by the autonomous racing problem. Comparisons are made against state-of-the-art joint game solvers, and results are reported on solution quality, computational cost, and constraint satisfaction.
- [46] arXiv:2602.05376 (cross-list from math.OC) [pdf, ps, other]
-
Title: Distributed Model Predictive Control for Energy and Comfort Optimization in Large Buildings Using Piecewise Affine Approximation
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
The control of large buildings encounters challenges in computational efficiency due to their size and nonlinear components. To address these issues, this paper proposes a Piecewise Affine (PWA)-based distributed scheme for Model Predictive Control (MPC) that optimizes energy and comfort through PWA-based quadratic programming. We utilize the Alternating Direction Method of Multipliers (ADMM) for effective decomposition and apply the PWA technique to handle the nonlinear components. To solve the resulting large-scale nonconvex problems, the paper introduces a convex ADMM algorithm that transforms the nonconvex problem into a series of smaller convex problems, significantly enhancing computational efficiency. Furthermore, we demonstrate that the convex ADMM algorithm converges to a local optimum of the original problem. A case study involving 36 zones validates the effectiveness of the proposed method. Our proposed method reduces execution time by 86% compared to the centralized version.
- [47] arXiv:2602.05456 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Ontology-Driven Robotic Specification Synthesis
Comments: 8 pages, 9 figures, 3 tables, journal
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
This paper addresses robotic system engineering for safety- and mission-critical applications by bridging the gap between high-level objectives and formal, executable specifications. The proposed method, Robotic System Task to Model Transformation Methodology (RSTM2), is an ontology-driven, hierarchical approach using stochastic timed Petri nets with resources, enabling Monte Carlo simulations at mission, system, and subsystem levels. A hypothetical case study demonstrates how the RSTM2 method supports architectural trades, resource allocation, and performance analysis under uncertainty. Ontological concepts further enable explainable AI-based assistants, facilitating fully autonomous specification synthesis. The methodology offers particular benefits to complex multi-robot systems, such as the NASA CADRE mission, representing decentralized, resource-aware, and adaptive autonomous systems of the future.
- [48] arXiv:2602.05458 (cross-list from cs.SE) [pdf, ps, other]
-
Title: Emergence-as-Code for Self-Governing Reliable Systems
Authors: Anatoly A. Krasnovsky
Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Systems and Control (eess.SY)
SLO-as-code has made per-service reliability declarative, but user experience is defined by journeys whose reliability is an emergent property of microservice topology, routing, redundancy, timeouts/fallbacks, shared failure domains, and tail amplification. As a result, journey objectives (e.g., "checkout p99 < 400 ms") are often maintained outside code and drift as the system evolves, forcing teams to either miss user expectations or over-provision and gate releases with ad-hoc heuristics. We propose Emergence-as-Code (EmaC), a vision for making journey reliability computable and governable via intent plus evidence. An EmaC spec declares journey intent (objective, control-flow operators, allowed actions) and binds it to atomic SLOs and telemetry. A runtime inference component consumes operational artifacts (e.g., tracing and traffic configuration) to synthesize a candidate journey model with provenance and confidence. From the last accepted model, the EmaC compiler/controller derives bounded journey SLOs and budgets under explicit correlation assumptions (optimistic independence vs. pessimistic shared fate), and emits control-plane artifacts (burn-rate alerts, rollout gates, action guards) that are reviewable in a Git workflow. An anonymized artifact repository provides a runnable example specification and generated outputs.
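To make the two correlation assumptions named in the abstract concrete, here is a small sketch of how a journey availability could be bracketed for a redundant step. It is an illustration only, not the EmaC compiler: the function names, the three-step journey, and the availability figures are all hypothetical, and "shared fate" is modeled here as perfectly correlated replica failures.

```python
from math import prod


def redundant_independent(avails):
    """Optimistic: replicas fail independently, so the group is down only if all are down."""
    return 1.0 - prod(1.0 - a for a in avails)


def redundant_shared_fate(avails):
    """Pessimistic: replicas share one failure domain (perfectly correlated failures),
    so redundancy adds nothing beyond the best single replica."""
    return max(avails)


def serial(avails):
    """Series composition of journey steps, assuming the steps fail independently."""
    return prod(avails)


# Hypothetical journey: gateway -> two redundant checkout replicas -> database.
gateway, database = 0.999, 0.9995
replicas = [0.99, 0.99]

upper = serial([gateway, redundant_independent(replicas), database])
lower = serial([gateway, redundant_shared_fate(replicas), database])
print(f"journey availability between {lower:.6f} and {upper:.6f}")
```

The gap between the two figures is the part of the journey budget that depends on how correlated the replicas really are, which is why stating the assumption explicitly matters.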
- [49] arXiv:2602.05517 (cross-list from cs.CR) [pdf, ps, other]
-
Title: GNSS SpAmming: a spoofing-based GNSS denial-of-service attack
Authors: Sergio Angulo Cosín, Javier Junquera-Sánchez, Carlos Hernando-Ramiro, José-Antonio Gómez-Sánchez
Subjects: Cryptography and Security (cs.CR); Signal Processing (eess.SP)
GNSSs are vulnerable to attacks of two kinds: jamming (i.e. denying access to the signal) and spoofing (i.e. impersonating a legitimate satellite). These attacks have been extensively studied, and we have a myriad of countermeasures to mitigate them. In this paper we expose a new type of attack: SpAmming, which combines both approaches to achieve the same effects in a more subtle way.
Exploiting the CDMA multiplexing present in most GNSSs, this approach uses a spoofing attack to make the receiver lose access to the signal of a legitimate satellite, which is equivalent to a denial of service. In this case, however, the existing countermeasures against jamming or spoofing would not protect the receiver, as the attack is neither of the two.
An experimental proof-of-concept is presented in which its impact is evaluated as a function of the previous state of the receiver. Using an SDR-based system developed at the Space Security Centre, the attack is executed against a cold-started receiver, a warm-started receiver, and a receiver that has already acquired the PVT solution and is navigating. Different attack configurations are also tested, starting from a raw emission of the false signal, to surgical Doppler effect configuration, code offset, etc. Although it is shown to be particularly successful against cold-started receivers, the results show that it is also effective in other scenarios, especially if accompanied by other attacks. We will conclude the article by outlining possible countermeasures to detect and, eventually, counteract it; and possible avenues of research to better understand its impact, especially for authenticated services such as OSNMA, and to characterize it in order to improve the response to similar attacks.
- [50] arXiv:2602.05565 (cross-list from cs.CE) [pdf, ps, other]
-
Title: On Path-based Marginal Cost of Heterogeneous Traffic Flow for General Networks
Subjects: Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Path marginal cost (PMC) is a crucial component in solving path-based system-optimal dynamic traffic assignment (SO-DTA), dynamic origin-destination demand estimation (DODE), and network resilience analysis. However, accurately evaluating PMC in heterogeneous traffic conditions poses significant challenges. Previous studies often focus on homogeneous traffic flow of a single vehicle class and do not adequately address the interactive effects of heterogeneous traffic flows and the resulting computational issues. This study proposes a novel but simple method for approximately evaluating PMC in complex heterogeneous traffic conditions. The method decomposes PMC into intra-class and inter-class terms and uses a conversion factor derived from heterogeneous link dynamics to explicitly model the intricate relationships between vehicle classes. Additionally, the method accounts for the non-differentiability issue that arises when mixed traffic flow approaches system-optimum conditions. The proposed method is tested on a small corridor network with synthetic demand and a large-scale network with calibrated demand from real-world data. Results demonstrate that our method exhibits superior performance in solving bi-class SO-DTA problems, yielding lower total travel cost and capturing the multi-class flow competition at the system-optimum state.
- [51] arXiv:2602.05666 (cross-list from cs.IT) [pdf, ps, other]
-
Title: Low-complexity Design for Beam Coverage in Near-field and Far-field: A Fourier Transform Approach
Comments: 13 pages, 7 figures, submitted to IEEE for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In this paper, we study efficient beam coverage design for multi-antenna systems in both far-field and near-field cases. To reduce the computational complexity of existing sampling-based optimization methods, we propose a new low-complexity yet efficient beam coverage design. To this end, we first formulate a general beam coverage optimization problem to maximize the worst-case beamforming gain over a target region. For the far-field case, we show that the beam coverage design can be viewed as a spatial-frequency filtering problem, where angular coverage can be achieved by weight-shaping in the antenna domain via an inverse FT, yielding an infinite-length weighting sequence. Under the constraint of a finite number of antennas, a surrogate scheme is proposed by directly truncating this sequence, which inevitably introduces a roll-off effect at the angular boundaries, yielding degraded worst-case beamforming gain. To address this issue, we characterize the finite-antenna-induced roll-off effect, based on which a roll-off-aware design with a protective zoom is developed to ensure a flat beamforming-gain profile within the target angular region. Next, we extend the proposed method to the near-field case. Specifically, by applying a first-order Taylor approximation to the near-field channel steering vector (CSV), the two-dimensional (2D) beam coverage design (in both angle and inverse-range) can be transformed into a 2D inverse FT, leading to a low-complexity beamforming design. Furthermore, an inherent near-field range defocusing effect is observed, indicating that sufficiently wide angular coverage results in range-insensitive beam steering. Finally, numerical results demonstrate that the proposed FT-based approach achieves worst-case beamforming performance comparable to that of conventional sampling-based optimization methods while significantly reducing the computational complexity.
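The far-field idea of truncating an inverse-FT weighting sequence, and the roll-off this causes at the region boundary, can be illustrated with a short sketch. It assumes a uniform linear array with half-wavelength spacing, a symmetric target region |sin θ| ≤ u0, and unnormalized weights. These choices and the function names are illustrative assumptions, not the paper's design, and the roll-off-aware correction is not included.

```python
import cmath
import math


def truncated_sinc_weights(u0, half_len):
    """Weights for antennas n = -half_len..half_len: the Fourier coefficients of an
    ideal flat response over |u| <= u0 (u = sin(theta)), truncated to a finite array."""
    weights = []
    for n in range(-half_len, half_len + 1):
        x = n * u0
        sinc = 1.0 if n == 0 else math.sin(math.pi * x) / (math.pi * x)
        weights.append(u0 * sinc)
    return weights


def array_response(weights, half_len, u):
    """Array factor for half-wavelength spacing: sum_n w_n * exp(j * pi * n * u)."""
    indices = range(-half_len, half_len + 1)
    return sum(w * cmath.exp(1j * math.pi * n * u) for n, w in zip(indices, weights))


half_len, u0 = 32, 0.3  # 65 antennas, target region |sin(theta)| <= 0.3 (assumed values)
w = truncated_sinc_weights(u0, half_len)
gain_center = abs(array_response(w, half_len, 0.0))   # inside the target region
gain_edge = abs(array_response(w, half_len, u0))      # at the region boundary
gain_outside = abs(array_response(w, half_len, 0.8))  # well outside the region
print(gain_center, gain_edge, gain_outside)
```

The response is close to flat at the center and close to zero far outside the region, but it drops to roughly half its nominal value at the boundary, which is the kind of roll-off that degrades the worst-case gain.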
- [52] arXiv:2602.05670 (cross-list from cs.SD) [pdf, ps, other]
-
Title: HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection
Comments: 20 pages, 8 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Advances in AIGC technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human auditory perception. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal/spectral features or pairwise relations, overlooking high-order interactions (HOIs). HOIs capture discriminative patterns that emerge from multiple feature components beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework that explicitly models these synergistic HOIs through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments demonstrate that HyperPotter surpasses its baseline by an average relative gain of 22.15% across 11 datasets and outperforms state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets, demonstrating superior generalization to diverse attacks and speakers.
- [53] arXiv:2602.05683 (cross-list from cs.RO) [pdf, ps, other]
-
Title: From Vision to Decision: Neuromorphic Control for Autonomous Navigation and Tracking
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Robotic navigation has historically struggled to reconcile reactive, sensor-based control with the decisive capabilities of model-based planners. This duality becomes critical when the absence of a predominant option among goals leads to indecision, challenging reactive systems to break symmetries without computationally intensive planners. We propose a parsimonious neuromorphic control framework that bridges this gap for vision-guided navigation and tracking. Image pixels from an onboard camera are encoded as inputs to dynamic neuronal populations that directly transform visual target excitation into egocentric motion commands. A dynamic bifurcation mechanism resolves indecision by delaying commitment until a critical point induced by the environmental geometry. Inspired by recently proposed mechanistic models of animal cognition and opinion dynamics, the neuromorphic controller provides real-time autonomy with a minimal computational burden and a small number of interpretable parameters, and it can be seamlessly integrated with application-specific image processing pipelines. We validate our approach in simulation environments as well as on an experimental quadrotor platform.
- [54] arXiv:2602.05798 (cross-list from stat.ME) [pdf, ps, other]
-
Title: Learning False Discovery Rate Control via Model-Based Neural Networks
Comments: Accepted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2026
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Controlling the false discovery rate (FDR) in high-dimensional variable selection requires balancing rigorous error control with statistical power. Existing methods with provable guarantees are often overly conservative, creating a persistent gap between the realized false discovery proportion (FDP) and the target FDR level. We introduce a learning-augmented enhancement of the T-Rex Selector framework that narrows this gap. Our approach replaces the analytical FDP estimator with a neural network trained solely on diverse synthetic datasets, enabling a substantially tighter and more accurate approximation of the FDP. This refinement allows the procedure to operate much closer to the desired FDR level, thereby increasing discovery power while maintaining effective approximate control. Through extensive simulations and a challenging synthetic genome-wide association study (GWAS), we demonstrate that our method achieves superior detection of true variables compared to existing approaches.
- [55] arXiv:2602.05908 (cross-list from physics.app-ph) [pdf, ps, other]
-
Title: Self-Portrait of the Focusing Process in Speckle: III. Tailoring Complex Spatio-Temporal Focusing Laws To Overcome Reverberations in Reflection Imaging
Comments: 29 pages, 8 figures, 2 tables
Subjects: Applied Physics (physics.app-ph); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph); Optics (physics.optics)
This is the third article in a series of three dealing with the exploitation of speckle for imaging purposes. In complex media, a fundamental limit is the multiple scattering phenomenon that completely blurs the imaging process in depth. Matrix imaging can provide a relevant framework for solving this problem. As it proved to be an adequate tool for probing reverberations in speckle [E. Giraudat et al., Part I], we will show how it can be used to tailor complex spatio-temporal focusing laws to monitor the interference between the multiply-reflected paths and the ballistic component of the wave-field. To do so, we extend the distortion matrix concept to the frequency domain. An iterative phase reversal process operated from the space-time Fourier space is then used to compensate for reverberations and optimize both the axial and transverse resolution of the confocal image. Here, we first present an experimental proof-of-concept consisting in imaging a tissue-mimicking phantom through a reverberating plate before outlining the potential and the limits of this strategy for transcranial ultrasound and beyond.
- [56] arXiv:2602.05967 (cross-list from cs.LG) [pdf, ps, other]
-
Title: A Hybrid Data-Driven Algorithm for Real-Time Friction Force Estimation in Hydraulic Cylinders
Comments: Published in: 2025 33rd International Conference on Electrical Engineering (ICEE), Publisher IEEE
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Hydraulic systems are widely utilized in industrial applications due to their high force generation, precise control, and ability to function in harsh environments. Hydraulic cylinders, as actuators in these systems, apply force and position through the displacement of hydraulic fluid, but their operation is significantly influenced by friction force. Achieving precision in hydraulic cylinders requires an accurate friction model under various operating conditions. Existing analytical models, often derived from experimental tests, necessitate the identification or estimation of influencing factors but are limited in adaptability and computational efficiency. This research introduces a data-driven, hybrid algorithm based on Long Short-Term Memory (LSTM) networks and Random Forests for nonlinear friction force estimation. The algorithm effectively combines feature detection and estimation processes using training data acquired from an experimental hydraulic test setup. It achieves a consistent and stable model error of less than 10% across diverse operating conditions and external load variations, ensuring robust performance in complex situations. The computational cost of the algorithm is 1.51 milliseconds per estimation, making it suitable for real-time applications. The proposed method addresses the limitations of analytical models by delivering high precision and computational efficiency. The algorithm's performance is validated through detailed analysis and experimental results, including direct comparisons with the LuGre model. The comparison highlights that while the LuGre model offers a theoretical foundation for friction modeling, its performance is limited by its inability to dynamically adjust to varying operational conditions of the hydraulic cylinder, further emphasizing the advantages of the proposed hybrid approach in real-time applications.
- [57] arXiv:2602.05974 (cross-list from math.OC) [pdf, ps, other]
-
Title: Normalization of ReLU Dual for Cut Generation in Stochastic Mixed-Integer Programs
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
We study the Rectified Linear Unit (ReLU) dual, an existing dual formulation for stochastic programs that reformulates non-anticipativity constraints using ReLU functions to generate tight, non-convex, and mixed-integer representable cuts. While this dual reformulation guarantees convergence with mixed-integer state variables, it admits multiple optimal solutions that can yield weak cuts. To address this issue, we propose normalizing the dual in the extended space to identify solutions that yield stronger cuts. We prove that the resulting normalized cuts are tight and Pareto-optimal in the original state space. We further compare normalization with existing regularization-based approaches for handling dual degeneracy and explain why normalization offers key advantages. In particular, we show that normalization can recover any cut obtained via regularization, whereas the converse does not hold. Computational experiments demonstrate that the proposed approach outperforms existing methods by consistently yielding stronger cuts and reducing solution times on harder instances.
Replacements for Fri, 6 Feb 26
- [58] arXiv:2403.13694 (replaced) [pdf, ps, other]
-
Title: Overview of Publicly Available Degradation Data Sets for Tasks within Prognostics and Health Management
Subjects: Databases (cs.DB); Signal Processing (eess.SP)
- [59] arXiv:2404.02687 (replaced) [pdf, ps, other]
-
Title: Dynamic Resource Allocation with Karma: An Experimental Study
Subjects: General Economics (econ.GN); Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)
- [60] arXiv:2408.11717 (replaced) [src]
-
Title: Evaluating S-Band Interference: Impact of Satellite Systems on Terrestrial Networks
Comments: This submission is withdrawn because it duplicates arXiv:2501.05462, which should be considered the authoritative version
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
- [61] arXiv:2409.12636 (replaced) [pdf, ps, other]
-
Title: Image inpainting for corrupted images by using the semi-super resolution GAN
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [62] arXiv:2409.13901 (replaced) [pdf, ps, other]
-
Title: Self-Portrait of the Focusing Process in Speckle: II. Gouy Phase Shift for Defocus Correction and Pixel Depth Reassignment
Authors: Flavien Bureau, Emma Brenner, Naiara Korta Martiartu, Elsa Giraudat, Arthur Le Ber, William Lambert, Louis Carmier, Aymeric Guibal, Mathias Fink, Alexandre Aubry
Comments: 43 pages, 8 figures, 3 tables
Subjects: Medical Physics (physics.med-ph); Image and Video Processing (eess.IV); Applied Physics (physics.app-ph)
- [63] arXiv:2501.02279 (replaced) [pdf, ps, other]
-
Title: Stochastic Generalized Dynamic Games with Coupled Chance Constraints
Subjects: Systems and Control (eess.SY)
- [64] arXiv:2502.10154 (replaced) [pdf, ps, other]
-
Title: Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries
Comments: IEEE Transactions on Multimedia, 2026, in print
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
- [65] arXiv:2504.07053 (replaced) [pdf, ps, other]
-
Title: TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Comments: ICLR 2026
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [66] arXiv:2505.19447 (replaced) [pdf, ps, other]
-
Title: A Contrastive Learning Foundation Model Based on Perfectly Aligned Sample Pairs for Remote Sensing Images
Comments: This article has been accepted for publication in Geo-spatial Information Science, published by Taylor & Francis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [67] arXiv:2506.08520 (replaced) [pdf, ps, other]
-
Title: Plug-and-play linear attention with provable guarantees for training-free image restoration
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [68] arXiv:2506.20628 (replaced) [pdf, ps, other]
-
Title: Maximum Likelihood Estimation for System Identification of Networks of Dynamical Systems
Comments: This work has been submitted to the IEEE for possible publication. Submitted to IEEE Transactions on Automatic Control
Subjects: Systems and Control (eess.SY)
- [69] arXiv:2507.16838 (replaced) [pdf, ps, other]
-
Title: Segmentation-free Goodness of Pronunciation
Comments: The article has been accepted for publication by IEEE TASLPRO
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [70] arXiv:2508.09020 (replaced) [pdf, ps, other]
-
Title: Improved SINR Approximation for Downlink SDMA-based Networks with Outdated Channel State Information
Authors: Maria Cecilia Fernández Montefiore, Gustavo González, F. Javier López-Martínez, Fernando Gregorio
Comments: 5 pages, 3 figures. This work has been submitted to the IEEE for publication
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
- [71] arXiv:2508.14422 (replaced) [pdf, ps, other]
-
Title: A Sliced Learning Framework for Online Disturbance Identification in Quadrotor SO(3) Attitude Control
Comments: v3: Major revision--Revised title; introduced the Sliced Learning framework; added comparative experiments, extended theoretical results, and supplementary materials (such as algorithms and proofs)
Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Optimization and Control (math.OC)
- [72] arXiv:2508.19345 (replaced) [pdf, ps, other]
-
Title: Privacy-Preserving Distributed Control for a Networked Battery Energy Storage System
Comments: Accepted for publication in Journal of Energy Storage
Subjects: Systems and Control (eess.SY)
- [73] arXiv:2509.00260 (replaced) [pdf, ps, other]
-
Title: Sensor Insoles: A Review
Authors: Bastian Latsch, Felix Herbst, Mark Suppelt, Julian Seiler, Stephan Schaumann, Sven Suppelt, Alexander A. Altmann, Martin Grimmer, and Mario Kupnik
Comments: 20 pages, 8 figures, review article published in IEEE Sensors Journal
Journal-ref: IEEE Sensors Journal, vol. 26, no. 3, pp. 3577-3596, Dec. 2025
Subjects: Signal Processing (eess.SP)
- [74] arXiv:2509.03070 (replaced) [pdf, ps, other]
-
Title: YOLO-based Bearing Fault Diagnosis With Continuous Wavelet Transform
Comments: 5 pages, 2 figures, 2 tables, submitted to IEEE Signal Processing Letters
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [75] arXiv:2509.13745 (replaced) [pdf, ps, other]
-
Title: Theoretical Validation of the Latent Optimally Partitioned-$\ell_2/\ell_1$ Penalty with Application to Angular Power Spectrum Estimation
Subjects: Signal Processing (eess.SP)
- [76] arXiv:2509.24187 (replaced) [pdf, ps, other]
-
Title: Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition
Authors: Bo-Hao Su, Hui-Ying Shih, Jinchuan Tian, Jiatong Shi, Chi-Chun Lee, Carlos Busso, Shinji Watanabe
Subjects: Audio and Speech Processing (eess.AS)
- [77] arXiv:2510.00771 (replaced) [pdf, ps, other]
-
Title: UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
Comments: Accepted to ICASSP 2026
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
- [78] arXiv:2510.03055 (replaced) [pdf, ps, other]
-
Title: Compressed Multiband Sensing in FR3 Using Alternating Direction Method of Multipliers
Authors: Dexin Wang, Isha Jariwala, Ahmad Bazzi, Sundeep Rangan, Theodore S. Rappaport, Marwa Chafii
Comments: accepted to IEEE Wireless Communications and Networking Conference (WCNC) 2026. This replacement is the final camera-ready version for the accepted paper
Subjects: Signal Processing (eess.SP)
- [79] arXiv:2510.06528 (replaced) [pdf, ps, other]
-
Title: BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music
Comments: Accepted by IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [80] arXiv:2510.08176 (replaced) [pdf, ps, other]
-
Title: Leveraging Whisper Embeddings for Audio-based Lyrics Matching
Comments: Accepted at ICASSP 2026 (IEEE International Conference on Acoustics, Speech and Signal Processing)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [81] arXiv:2510.24750 (replaced) [pdf, ps, other]
-
Title: Opportunistic Screening of Wolff-Parkinson-White Syndrome using Single-Lead AI-ECG Mobile System: A Real-World Study of over 3.5 million ECG Recordings in China
Authors: Shun Huang, Deyun Zhang, Sumei Fan, Gongzheng Tang, Shijia Geng, Yujie Xiao, Xingliang Wu, Mingke Yan, Haoyu Wang, Rui Zhang, Zhaoji Fu, Shenda Hong
Subjects: Signal Processing (eess.SP)
- [82] arXiv:2511.11963 (replaced) [pdf, ps, other]
-
Title: Noisy MRI Reconstruction via MAP Estimation with an Implicit Deep-Denoiser Prior
Comments: 6 pages, 5 figures, conference paper
Subjects: Image and Video Processing (eess.IV)
- [83] arXiv:2511.13690 (replaced) [pdf, ps, other]
-
Title: Novel Stability Criteria for Discrete and Hybrid Systems via Ramanujan Inner Products
Comments: 7 pages, 2 figures
Subjects: Systems and Control (eess.SY)
- [84] arXiv:2511.15458 (replaced) [pdf, ps, other]
-
Title: Division-based Receiver-agnostic RFF Identification in WiFi Systems
Subjects: Signal Processing (eess.SP)
- [85] arXiv:2512.01023 (replaced) [pdf, ps, other]
-
Title: Approximating Analytically-Intractable Likelihood Densities with Deterministic Arithmetic for Optimal Particle Filtering
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
- [86] arXiv:2512.12016 (replaced) [pdf, ps, other]
-
Title: Bandit-Based Rate Adaptation for a Single-Server Queue
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT)
- [87] arXiv:2601.13948 (replaced) [pdf, ps, other]
-
Title: Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models
Comments: Accepted by ICASSP2026. Demo/code: this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
- [88] arXiv:2601.18535 (replaced) [pdf, ps, other]
-
Title: Audio Inpainting in Time-Frequency Domain with Phase-Aware Prior
Comments: submitted to IEEE for review
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [89] arXiv:2601.19117 (replaced) [pdf, ps, other]
-
Title: Optimized $k$-means color quantization of digital images in machine-based and human perception-based colorspaces
Authors: Ranjan Maitra
Comments: 25 pages, 11 figures, 5 tables, accepted in the Journal of Electronic Imaging
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
- [90] arXiv:2601.19462 (replaced) [pdf, ps, other]
-
Title: Physical Human-Robot Interaction: A Critical Review of Safety Constraints
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
- [91] arXiv:2601.21069 (replaced) [pdf, ps, other]
-
Title: CompSRT: Quantization and Pruning for Image Super Resolution Transformers
Subjects: Image and Video Processing (eess.IV)
- [92] arXiv:2601.22052 (replaced) [pdf, ps, other]
-
Title: Learning to Dial-a-Ride: A Deep Graph Reinforcement Learning Approach to the Electric Dial-a-Ride Problem
Subjects: Systems and Control (eess.SY)
- [93] arXiv:2602.02603 (replaced) [pdf, ps, other]
-
Title: EchoJEPA: A Latent Predictive Foundation Model for Echocardiography
Authors: Alif Munim, Adibvafa Fallahpour, Teodora Szasz, Ahmadreza Attarpour, River Jiang, Brana Sooriyakanthan, Maala Sooriyakanthan, Heather Whitney, Jeremy Slivnick, Barry Rubin, Wendy Tsang, Bo Wang
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [94] arXiv:2602.03070 (replaced) [pdf, ps, other]
-
Title: ProOPF: Benchmarking and Improving LLMs for Professional-Grade Power Systems Optimization Modeling
Authors: Chao Shen, Zihan Guo, Xu Wan, Zhenghao Yang, Yifan Zhang, Wengi Huang, Jie Song, Zongyan Zhang, Mingyang Sun
Subjects: Systems and Control (eess.SY); Software Engineering (cs.SE)
- [95] arXiv:2602.03891 (replaced) [pdf, ps, other]
-
Title: Sounding Highlights: Dual-Pathway Audio Encoders for Audio-Visual Video Highlight Detection
Comments: 5 pages, 2 figures, to appear in ICASSP 2026
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
- [96] arXiv:2602.04795 (replaced) [pdf, ps, other]
-
Title: Maximum-Volume Nonnegative Matrix Factorization
Comments: arXiv admin note: substantial text overlap with arXiv:2412.06380 (this paper is an updated version of Chapter 7 of the thesis of the first author, available from arXiv:2412.06380). The code is available from this https URL
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Numerical Analysis (math.NA); Machine Learning (stat.ML)