Computer Science

New submissions

New submissions for Thu, 7 May 26

[1]  arXiv:2605.04050 [pdf, ps, other]
Title: LCM: Lossless Context Management
Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Software Engineering (cs.SE)

We introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks. When benchmarked using Opus 4.6, our LCM-augmented coding agent, Volt, achieves higher scores than Claude Code on the OOLONG long-context eval, including at every context length between 32K and 1M tokens.
LCM may be considered both a vindication and extension of the recursive paradigm pioneered by Recursive Language Models (RLMs). Our results demonstrate that recursive context manipulation can outperform not just conventional LLMs, but frontier coding agents with native file-system access.
LCM departs from RLM by decomposing symbolic recursion into two deterministic, engine-managed mechanisms: recursive context compression, in which a hierarchical summary DAG automatically compacts older messages while retaining lossless pointers to every original; and recursive task partitioning, in which engine-managed parallel primitives like LLM-Map replace model-written loops. This trade-off, analogous to the move from GOTO to structured control flow in programming language design, sacrifices maximal flexibility for termination guarantees, zero-cost continuity on short tasks, and lossless retrievability of all prior state.

[2]  arXiv:2605.04052 [pdf, ps, other]
Title: Constraint-Aware Execution Planning for Hybrid Space-Ground Compute Workloads
Authors: Subhadip Mitra
Comments: 11 pages, 6 figures, 2 algorithms, 4 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Low Earth orbit (LEO) satellites increasingly carry compute hardware capable of on-board processing, yet each satellite generates roughly two orders of magnitude more data than it can downlink per orbit. This mismatch forces operators to decide, for every workload, which computation runs on-board and which runs on the ground, how intermediate data crosses the space-ground boundary through narrow contact windows, and how to maintain delivery guarantees over noisy channels. We present Constraint-Aware Execution (CAE), a planning system that takes a satellite identifier, a workload expressed as a directed acyclic graph of processing steps, and a set of orbital and resource constraints, and produces a deterministic, physically grounded execution plan. CAE operates in four phases: (1) orbital environment construction via SGP4 propagation with eclipse detection and ground station pass prediction, (2) compute placement using a cost model that compares on-board resource consumption against transfer overhead, (3) transfer insertion with adaptive forward error correction and security overhead modeling, and (4) greedy first-fit scheduling into orbital windows under power, thermal, compute, and communication constraints. We evaluate CAE against five representative workload patterns across satellites in distinct orbital regimes and demonstrate that the system produces feasible plans in under two seconds, correctly exploits onboard data reduction to minimize transfer volume, and adapts FEC and multi-pass allocation to varying channel conditions. CAE is deployed as a production API computing plans for any cataloged satellite using live two-line element data.
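
As an illustration of phase (4), the following is a minimal sketch of greedy first-fit placement, not the CAE implementation. It assumes a single scalar budget per window, whereas the abstract describes power, thermal, compute, and communication constraints, and it ignores DAG dependencies. The names `first_fit`, `tasks`, and `windows` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def first_fit(
    tasks: List[Tuple[str, float]],
    windows: List[Tuple[str, float]],
) -> Dict[str, Optional[str]]:
    """Place each task into the first window with enough remaining budget.

    tasks:   (task name, resource demand) pairs, in the order to be placed.
    windows: (window name, capacity) pairs, in chronological order.
    Returns a mapping from task name to window name, or None if no window fits.
    """
    remaining = [capacity for _, capacity in windows]
    plan: Dict[str, Optional[str]] = {}
    for task_name, demand in tasks:
        plan[task_name] = None  # stays None if the task fits nowhere
        for idx, (window_name, _) in enumerate(windows):
            if demand <= remaining[idx]:
                remaining[idx] -= demand  # consume budget in that window
                plan[task_name] = window_name
                break  # first fit: stop at the earliest feasible window
    return plan


if __name__ == "__main__":
    demo_tasks = [("detect", 3.0), ("compress", 4.0), ("downlink", 2.0)]
    demo_windows = [("pass-1", 5.0), ("pass-2", 5.0)]
    print(first_fit(demo_tasks, demo_windows))
```

First-fit is greedy, so it does not guarantee an optimal packing; its appeal in a planner of this kind is that it is fast and deterministic.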

[3]  arXiv:2605.04054 [pdf, ps, other]
Title: Endogenous Regime Switching Driven by Scalar-Irreducible Learning Dynamics
Authors: Sheng Ran
Subjects: Machine Learning (cs.LG)

Achieving endogenous regime switching is crucial for the emergence of autonomous intelligence, yet remains a central challenge for existing machine learning frameworks, where such transitions are typically externally imposed. In this work, we introduce a classification that distinguishes scalar-reducible dynamics, which can be expressed as gradient flows driven by a scalar objective, from scalar-irreducible dynamics that cannot be reduced to such a form. While most existing machine learning systems operate within the scalar-reducible class, we demonstrate that scalar-irreducible dynamics naturally enable internally generated regime switching through feedback between fast dynamical variables and slow structural adaptation. Using a minimal dynamical model, we illustrate how this mechanism produces sustained endogenous regime transitions without external scheduling. Our results suggest a new dynamical paradigm for regime exploration and provide a potential route toward autonomous learning systems whose adaptive behavior is organized internally rather than externally prescribed.

[4]  arXiv:2605.04055 [pdf, ps, other]
Title: A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay
Subjects: Machine Learning (cs.LG)

Adaptive optimizers like AdamW apply uniform hyperparameters across all parameter groups, ignoring heterogeneous optimization dynamics across layers and modules. We address this limitation by proposing MetaAdamW, a new optimizer that integrates a self-attention mechanism to dynamically modulate per-group learning rates and weight decay. The modulation factors are produced by a lightweight Transformer encoder that operates on statistical features (gradient norms, momentum norms, correlations) extracted from each parameter group. To train the attention module, we introduce a meta-learning objective that combines gradient alignment, loss decrease, and generalization gap. A key novel contribution is the extension of homoscedastic uncertainty weighting (HUW) with task-specific priorities that directly scale the regularization terms, enabling domain knowledge to guide automatic loss balancing. Extensive experiments on five diverse tasks, namely time series forecasting (ETT), language modeling (WikiText-2), machine translation (Multi30k), image classification (CIFAR-10), and sentiment analysis (IMDB), demonstrate that MetaAdamW consistently outperforms the standard AdamW baseline in terms of validation loss, accuracy, or perplexity. Depending on the task, MetaAdamW either reduces overall training time (by up to 17.11%) or improves performance (by up to 11.08%) while introducing only moderate overhead; in some cases, it can also mitigate issues of insufficient convergence caused by premature early stopping. Ablation studies validate the effectiveness of each component, including feature versions, grouping strategies, and the proposed priority-injected uncertainty weighting.

[5]  arXiv:2605.04056 [pdf, ps, other]
Title: Transformation Categorization Based on Group Decomposition Theory Using Parameter Division
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Representation learning seeks meaningful sensory representations without supervision and can model aspects of human development. Although many neural networks empirically learn useful features, a principled account of what makes a representation "good" remains elusive. We study unsupervised categorization of transformations between pairs of inputs under algebraic constraints. Classical disentanglement favors mutually independent factors and fails when factors are coupled. Our prior Galois-theoretic approach decomposes a group via normal subgroups by learning a product of two transformations with one factor constrained to a normal subgroup, covering both commutative and non-commutative cases. That method, however, relied on auxiliary assumptions (e.g., motion and isometry restrictions) not required by decomposition theory, and ablations did not separate theory-based from auxiliary effects. We propose parameter division for a single transformation: we split its parameter into components, impose homomorphism constraints mapping the full transformation to one component, and identify the normal subgroup as the set of transformations when that component is fixed to the identity. This formulation drops the previous auxiliary assumptions and applies more broadly. We evaluate on image pairs involving rotation, translation, and scale; ablations show that group-decomposition constraints drive appropriate categorization.

[6]  arXiv:2605.04057 [pdf, ps, other]
Title: Structured Progressive Knowledge Activation for LLM-Driven Neural Architecture Search
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper focuses on a key challenge in Neural Architecture Search (NAS): integrating established architectural knowledge while exploring new designs under expensive evaluations. Large language models (LLMs) are a promising assistant for NAS because they can translate rich architectural and coding priors into executable code edits. However, in practice, seemingly local revisions often propagate into non-local behavioral and performance shifts because a single edit can inadvertently couple multiple interacting functional factors, a phenomenon we refer to as functional entanglement. To make LLM knowledge usable under such entanglement, we propose Structured Progressive Knowledge Activation (SPARK), which activates relevant priors by explicitly selecting the functional factor to modify and conditioning the edit on that factor. This factor-conditioned editing reduces entangled side effects and yields more targeted, reliable architecture modifications. On CLRS-DFS, SPARK achieves a 28.1x sample-efficient architecture evolution speedup and yields a 22.9 percent relative improvement in OOD accuracy.

[7]  arXiv:2605.04058 [pdf, ps, other]
Title: MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning
Comments: AAAI2026 Accepted
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Parameter-efficient transfer learning (PETL) has emerged as a pivotal paradigm for adapting pre-trained foundation models to downstream tasks, significantly reducing trainable parameters yet suffering from substantial memory overhead caused by gradient backpropagation during fine-tuning. While memory-efficient transfer learning (METL) circumvents this challenge by bypassing backbone gradient computation via lightweight side networks, its stringent memory constraint severely limits the learning capacity of side networks, thereby significantly compromising performance. To address these limitations, we propose a novel Mixed-Precision Interactive Side Mixture-of-Experts framework (MP-ISMoE). Specifically, we first propose a Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) scheme to quantize weights to lower bit-widths while effectively decreasing quantization errors. By leveraging the memory conserved by GNP-IQ, we subsequently employ Interactive Side Mixture-of-Experts (ISMoE) to scale up side networks without sacrificing overall memory efficiency. Unlike conventional mixture-of-experts, ISMoE learns to select optimal experts by interacting with salient features from frozen backbones, thus suppressing knowledge forgetting and boosting performance. Extensive experiments across diverse vision-language and language-only tasks demonstrate that MP-ISMoE markedly improves accuracy compared to state-of-the-art METL approaches, while maintaining comparable parameter and memory efficiency.

[8]  arXiv:2605.04059 [pdf, ps, other]
Title: Continual Distillation of Teachers from Different Domains
Comments: Accepted to CVPR 2026
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Deep learning models continue to scale, with some requiring more storage than many large-scale datasets. Thus, we introduce a new paradigm: Continual Distillation (CD), where a student learns sequentially from a stream of teacher models without retaining access to earlier teachers. CD faces two challenges: teacher training data is unavailable, and teachers have varying expertise. We show that external unlabeled data enables Unseen Knowledge Transfer (UKT), allowing the student to acquire information from domains not present in the training data, while known to the teacher. We also show that sequential distillation causes Unseen Knowledge Forgetting (UKF) when transferred knowledge is lost after training on later teachers. To better trade off between UKT and UKF, we propose Self External Data Distillation (SE2D), a method that preserves logits on external data to stabilize learning across heterogeneous teachers. Experiments on multiple benchmarks show that SE2D reduces UKF and improves cross-domain generalization. The code and implementation for this work are publicly available at: https://github.com/Nicolas1203/continual_distillation.

[9]  arXiv:2605.04060 [pdf, ps, other]
Title: Lookahead Drifting Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Recently, a new paradigm named the drifting model has been proposed for mapping distributions, which achieves state-of-the-art image generation performance on ImageNet via one-step neural functional evaluation (NFE). The basic idea is to compute a drifting term at each training iteration and then push the output of the model towards the direction of the drifting term. In this paper, we propose a lookahead drifting model. At each training iteration, we compute a set of drifting terms sequentially. Each drifting term is calculated by making use of previously computed ones as well as the positive samples and the output of the model. In principle, the drifting terms obtained at a later stage capture higher-order gradient information towards the positive samples. At each training iteration, the model is optimized by pushing its output towards the direction of the (weighted) summation of the drifting terms. Experimental results on toy examples and CIFAR10 demonstrate that the new method outperforms the baseline.

[10]  arXiv:2605.04061 [pdf, ps, other]
Title: Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning
Comments: 8 pages, 3 figures. Accepted to the 2026 Learning and Intelligent Optimization Conference and workshops on Foundation Models for Science: Real-World Impact and Science-First Design, Latent & Implicit Thinking - Going Beyond CoT Reasoning, Logical Reasoning of Large Language Models at ICLR 2026
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Understanding how large language models encode task identity from few-shot demonstrations is a central open problem in mechanistic interpretability. Prior work uses linear probing to localize task representations, reporting high classification accuracy at specific layers. We reveal a striking dissociation: probing accuracy completely fails to predict causal importance. Single-position activation intervention achieves 0% task transfer across all 28 layers of Llama-3.2-3B, despite 100% probing accuracy at those same positions. This null result is itself a key finding, demonstrating that task encoding is fundamentally distributed. Multi-position intervention, which replaces activations at all demonstration output tokens simultaneously, achieves up to 96% transfer (N=50, 95% CI: [87%, 99%]) at layer 8, pinpointing for the first time the causal locus of ICL task identity. We establish the generality of these findings across four models spanning three architecture families (LLaMA, Qwen, Gemma), discovering a universal intervention window at ~30% network depth. Causal tracing uncovers an asymmetric architecture: the query position is strictly necessary (53-100% disruption) while no individual demonstration position is necessary (0% disruption), resolving a key ambiguity in prior accounts. Crucially, transfer depends on internal representation compatibility, not surface similarity (r=-0.05 vs r=0.31), ruling out trivial explanations. These results establish the distributed template hypothesis: ICL task identity is encoded as output format templates distributed across demonstration tokens, fundamentally reshaping our understanding of how in-context learning operates.

[11]  arXiv:2605.04062 [pdf, ps, other]
Title: EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent years have witnessed an increasing interest in deploying LLMs on resource-constrained devices, among which quantization has emerged as a promising lightweight technique that converts full-precision model weights and activations into lower-bit formats. Existing weight quantization approaches can be roughly divided into three categories: Post-Training Quantization (PTQ) that calibrates quantized parameters on a small dataset without retraining but suffers from severe performance degradation below 4-bit, Quantization-Aware Training (QAT) that searches low-bit parameters using surrogate gradients but demands substantial computational resources, and Quantization-Aware Distillation that integrates QAT with knowledge transfer from a full-precision teacher but manually selects features to distill and relies heavily on teacher-specific data. In this paper, we propose EdgeRazor, a lightweight framework for LLMs with mixed-precision and extremely low-bit weight quantization. The EdgeRazor framework contains three modules: Mixed-Precision Quantization-Aware Distillation for the fine-grained control of precision, Adaptive Feature Distillation that derives an n-bit student from its 16-bit teacher, and Entropy-Aware KL Divergence on both human-annotated and distilled datasets, whose forward-reverse balance is determined solely by the teacher's output distribution. Empirical investigations of EdgeRazor are conducted on base, instruction-tuned, and multimodal LLMs. Notably, EdgeRazor at 1.88-bit surpasses all contenders at 3-bit precision and, in particular, outperforms the leading 2-bit PTQ methods by 11.3 points, within a 4-10x lower training budget than the leading QAT approach. EdgeRazor delivers higher compression ratios at all bit widths; the 1.58-bit Qwen3-0.6B reduces storage from 1.41 GB to 0.28 GB while accelerating decoding by 15.1x relative to the 16-bit baseline.

[12]  arXiv:2605.04063 [pdf, ps, other]
Title: Investigating Trustworthiness of Nonparametric Deep Survival Models for Alzheimer's Disease Progression Analysis
Comments: 12 pages, 6 figures, 2 tables, IEEE/ACM Conference on Connected Health: Applications, Systems, and Engineering Technologies
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Alzheimer's Dementia (AD) is a progressive neurodegenerative disease marked by irreversible decline, making reliable modeling of its progression essential for effective patient care. Progression-aware methods such as survival analysis are therefore crucial tools for the early detection and monitoring of AD. Recent advancements in deep learning have demonstrated remarkable performance in survival tasks, but alarmingly fewer studies have been conducted in the domain of AD. Further, the studies that do exist do not consider learned bias within the model itself, which could result in unfair and unreliable predictions toward certain marginalized groups. As such, we conduct a rigorous study of fairness in AD progression analysis along with a thorough feature importance study to determine the characteristics which are most important for reliable AD predictions. Furthermore, we propose two novel fairness metrics, called Time-Dependent Concordance Impurity and Kaplan-Meier Fairness, to quantify bias with respect to sensitive attributes such as sex, race, and education in nonparametric survival models. Our study demonstrates that while deep learning powered survival models are robust tools which can aid clinicians in AD care decisions, they often exhibit considerable bias, representing important avenues for future research.

[13]  arXiv:2605.04064 [pdf, ps, other]
Title: Improving Medical VQA through Trajectory-Aware Process Supervision
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Reasoning capabilities are crucial for reliable medical visual question answering (VQA); however, existing datasets rarely include reasoning explanations. We address this by generating reasoning trajectories for six medical VQA benchmarks using the COMCTS algorithm with open-source vision-language models, with an LLM serving as the verification judge. Building on these generated datasets, we propose a two-stage training framework: supervised fine-tuning followed by Group Relative Policy Optimization (GRPO) with a novel process-based reward. While standard approaches rely solely on exact-match rewards for final answers, we introduce a trajectory-aware reward that measures the similarity between generated and ground-truth reasoning processes. Specifically, we embed reasoning steps using sentence transformers and compute the Dynamic Time Warping (DTW) distance between the resulting vector sequences. Experiments across six benchmarks demonstrate that combining the DTW-based process reward with exact-match reward consistently outperforms SFT-only training, raising mean accuracy from 0.598 to 0.689, mean BERTScore from 0.845 to 0.881, and mean ROUGE-L from 0.665 to 0.748. Our results highlight the importance of process supervision in training reasoning-capable medical VLMs. We make our code and generated reasoning datasets publicly available at https://anonymous.4open.science/r/MICCAI-R1-MED-VQA-code-B14B/
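
For readers unfamiliar with DTW over embedding sequences, the following is a minimal sketch of the standard dynamic-programming recurrence, not the authors' implementation. It assumes cosine distance as the local cost and takes plain lists of floats in place of sentence-transformer embeddings; the names `cosine_distance` and `dtw_distance` are hypothetical, and the abstract does not say how the distance is turned into a reward.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(
    xs: Sequence[Vector],
    ys: Sequence[Vector],
    dist: Callable[[Vector, Vector], float] = cosine_distance,
) -> float:
    """Classic O(len(xs) * len(ys)) DTW cost between two vector sequences."""
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = minimal cumulative cost aligning xs[:i] with ys[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(xs[i - 1], ys[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # xs advances, ys[j-1] is reused
                cost[i][j - 1],      # ys advances, xs[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    generated = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(dtw_distance(generated, reference))
```

Because DTW allows one step to align with several steps in the other sequence, a trajectory that repeats or splits a reasoning step is not penalized the way a position-by-position comparison would penalize it.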

[14]  arXiv:2605.04065 [pdf, ps, other]
Title: Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
Comments: Accepted by ACL 2026
Subjects: Computation and Language (cs.CL); Emerging Technologies (cs.ET); Machine Learning (cs.LG)

Unsupervised reinforcement learning (RL) has emerged as a promising paradigm for enabling self-improvement in large language models (LLMs). However, existing unsupervised RL-based methods often lack the capacity to adapt to the model's evolving reasoning capabilities during training. Therefore, these methods can misdirect policy optimization in the absence of ground-truth supervision. To address this issue, we introduce FREIA, a novel RL-based algorithm built on two key innovations: (1) Free Energy-Driven Reward (FER) adapts rewards to balance consensus and exploration based on the Free Energy Principle. (2) Adaptive Advantage Shaping (AAS) adaptively adjusts learning signals based on the statistical characteristics of sampled rewards. Empirical evaluations on nine datasets across three reasoning tasks showcase that FREIA outperforms other unsupervised RL-based baselines. Notably, in mathematical reasoning tasks, FREIA surpasses other methods by an average of 0.5 to 3.5 points in Pass@1 using the DeepSeek-R1-Distill-Qwen-1.5B model.

[15]  arXiv:2605.04066 [pdf, ps, other]
Title: Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning
Comments: Accepted to ACL 2026 (Findings)
Subjects: Computation and Language (cs.CL); Emerging Technologies (cs.ET); Machine Learning (cs.LG)

Reinforcement Learning with Verifiable Rewards (RLVR) is an essential paradigm that enhances the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically rely on static policy optimization schemes that misalign with the model's evolving reasoning capabilities. To address this issue, we propose Adaptive Power-Mean Policy Optimization (APMPO), which comprises two main innovations: Power-Mean Policy Optimization (PMPO) and Feedback-Adaptive Clipping (FAC). Specifically, PMPO introduces a generalized power-mean objective. This enables the model to adaptively transition from the signal-amplifying behavior of the arithmetic mean to the consistency-enforcing behavior of the geometric mean. FAC adaptively adjusts clipping bounds based on real-time reward statistics to overcome the limitations of static mechanisms. Capitalizing on these innovations, APMPO improves learning dynamics and reasoning performance. Extensive experiments on nine datasets across three reasoning tasks showcase the superiority of APMPO over state-of-the-art RLVR-based baselines. For instance, APMPO boosts the average Pass@1 score on mathematical reasoning benchmarks by 3.0 points compared to GRPO when using Qwen2.5-3B-Instruct.
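
The arithmetic-to-geometric transition mentioned above follows from the definition of the power mean, M_p(x) = (mean(x_i^p))^(1/p), which equals the arithmetic mean at p = 1 and tends to the geometric mean as p approaches 0. The sketch below shows only this mathematical family. Which quantities APMPO averages, and how p is adapted, are not stated in the abstract; the function name `power_mean` is hypothetical.

```python
import math
from typing import Sequence


def power_mean(values: Sequence[float], p: float) -> float:
    """Generalized (power) mean of strictly positive values.

    p = 1 gives the arithmetic mean, p = -1 the harmonic mean,
    and p = 0 is defined by its limit, the geometric mean.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    n = len(values)
    if abs(p) < 1e-12:
        # Limit p -> 0: exp of the mean of logs, i.e. the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


if __name__ == "__main__":
    xs = [1.0, 4.0]
    for p in (1.0, 0.5, 0.0, -1.0):
        print(p, power_mean(xs, p))
```

For fixed positive inputs the power mean is non-decreasing in p, so lowering p from 1 toward 0 weights small values more heavily.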

[16]  arXiv:2605.04067 [pdf, ps, other]
Title: SemiConLens: Visual Analytics for 2D Semiconductor Discovery
Subjects: Human-Computer Interaction (cs.HC); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)

The past few years have witnessed vibrant efforts in both academia and industry to discover new two-dimensional (2D) semiconductor materials, owing to their potential to resolve the severe performance deterioration of traditional semiconductors resulting from condensed silicon thickness. However, existing methods (e.g., Density Functional Theory (DFT) or machine-learning-based approaches) suffer from challenges such as small datasets and reliability and trustworthiness issues. To bridge this gap, we propose SemiConLens, a visual analytics approach that combines human expertise with the power of ML to enable effective and reliable 2D semiconductor discovery. Specifically, we first develop a new Correlation Aware Multivariate Imputation (CAMI) method and use ML models such as autoencoders, which can better learn from limited data and reveal uncertainty, to address the challenge of sparse data in semiconductivity prediction. Building on this, our visualization module, consisting of three visualization views with linked interactions, allows material researchers to interactively filter, discover, and compare 2D semiconductor candidates. A novel circular glyph design and a new cluster-aware layout optimization approach are proposed to effectively display all the user-configurable key attributes and possible prediction uncertainties of each semiconductor candidate, ensuring reliable and trustworthy 2D semiconductor discovery. We assess SemiConLens through quantitative evaluations, expert interviews, and use cases. The results demonstrate SemiConLens's capability to help material researchers conduct effective discovery of desirable 2D semiconductors.

[17]  arXiv:2605.04068 [pdf, ps, other]
Title: Designing a double deep reinforcement learning selection tool for resilient demand prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The use of artificial intelligence in supply chain forecasting has been the subject of many scientific studies over several decades. However, selecting an appropriate forecasting solution remains a daunting task, because each dataset has distinct features. Research on this issue has been carried out since the eighties, but recent developments in demand forecasting have opened new perspectives. This research aims to enhance automatic forecasting model selection by proposing a novel architecture that acts as a double deep reinforcement learning agent, automatically selecting a forecasting model from the forecasting committee at the time of prediction. Moreover, a novel early-stopping approach based on average reward convergence is introduced to shorten training time. To evaluate the model's performance, an empirical study was conducted using grocery sales datasets and snack demand datasets. The experimental results demonstrate the robustness of the proposed approach when compared to state-of-the-art methods.

[18]  arXiv:2605.04069 [pdf, ps, other]
Title: LAWS: Learning from Actual Workloads Symbolically -- A Self-Certifying Parametrized Cache Architecture for Neural Inference, Robotics, and Edge Deployment
Comments: 45 pages. Companion paper to arXiv:2604.06228 (Probabilistic Language Tries)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE)

We introduce LAWS (Learning from Actual Workloads Symbolically), a self-certifying inference caching architecture that builds a growing library of certified expert functions from deployment observations. Each expert covers a region of input space defined by a node in the Probabilistic Language Trie (PLT) of the base model and carries a formal error bound holding uniformly over all inputs. The central result is a self-certification theorem: for any input x, the LAWS approximation error is bounded by epsilon_fit + 2*Lambda(W)*C_E, where Lambda(W) is the model Lipschitz constant, C_E is the maximum embedding diameter, and epsilon_fit is the expert training error -- all checkable at deployment time without ground truth. We prove that LAWS generalizes both Mixture-of-Experts and KV prefix caching as special cases and is strictly more expressive than any fixed-K MoE or finite cache. Further results include a monotone hit rate theorem (any-match routing ensures coverage only increases), an expert library growth rate of O(2^H log N) where H is workload entropy, a fleet learning convergence theorem with Omega(K) speedup for K-unit fleets, and an over-the-air update bandwidth bound. We conjecture that LAWS is acquisition-optimal among stationary online caching algorithms and that the effective Lipschitz constant on the training distribution grows polynomially rather than exponentially in depth. Applications are developed for LLM inference, robotic control, and multi-agent edge deployment.
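
Written out in LaTeX, the self-certification bound quoted above reads as follows. The left-hand side is an assumption made for illustration: the abstract names only the "LAWS approximation error", so the symbols f for the base model, \hat{f}_{\mathrm{LAWS}} for the cached approximation, and the choice of norm are hypothetical notation.

```latex
\[
  \bigl\lVert f(x) - \hat{f}_{\mathrm{LAWS}}(x) \bigr\rVert
  \;\le\;
  \epsilon_{\mathrm{fit}} + 2\,\Lambda(W)\,C_E
  \qquad \text{for any input } x,
\]
% \epsilon_{\mathrm{fit}}: expert training error
% \Lambda(W):              model Lipschitz constant
% C_E:                     maximum embedding diameter
```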

[19]  arXiv:2605.04070 [pdf, ps, other]
Title: Toward Human-AI Complementarity Across Diverse Tasks
Comments: 10 pages main text, 37 pages total with appendices
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Human-AI complementarity, the idea that combining human and AI judgments can outperform either alone, offers a promising pathway toward robust oversight of advanced AI systems. However, whether human-AI complementarity can be achieved on realistic tasks remains an open question. We investigate this through two approaches: hybridization and two AI assistance methods (top-2 assistance and subtask delegation), evaluated on a multi-domain dataset of 1,886 samples spanning knowledge, factuality, long-context reasoning, and deception detection. We find only modest complementarity gains. Baseline hybridization yields just +0.4 percentage points (pp) over AI alone (69.3% vs 68.9%), limited both by a small complementarity region (only 8.9% of items where AI errs but humans do not) and the inability of confidence-based routing to identify it, since the model's confidence is similarly distributed across correct and incorrect predictions. Applied when AI has low confidence, top-2 assistance increases human accuracy from 28.4% to 38.3%, surpassing AI alone (37.7%) -- but primarily because humans adopt correct AI suggestions, not because they successfully override AI errors. These findings suggest that the primary bottleneck is not human task accuracy per se, but the ability to route decisions to humans when it matters and to design assistance methods that enable humans to catch AI mistakes. Our quantitative and qualitative analyses pinpoint where and why each method succeeds or fails, offering concrete targets for future work. We will release our dataset and code upon request to support progress toward more effective human-AI collaboration for AI oversight.

[20]  arXiv:2605.04071 [pdf, ps, other]
Title: FlatASCEND: Autoregressive Clinical Sequence Generation with Continuous Time Prediction and Association-Based Pharmacological Testing
Comments: 22 pages, 2 figures, 12 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Autoregressive models can predict clinical events, but generating patient-conditioned multi-step trajectories that respond to intervention tokens and testing whether those responses preserve known pharmacological associations have received limited attention. We present FlatASCEND, a 14.5M-parameter autoregressive clinical sequence model using flat composite tokens and a zero-inflated log-normal time head. Standard distributional metrics (Jaccard 0.889-0.954) do not distinguish FlatASCEND from trivial baselines; the model's value lies in conditional generation from patient-specific prefixes. A prompt-shuffle ablation shows patient-specific conditioning amplifies mechanistic pharmacological effects (2.0-2.2x for steroid to glucose, diuretic to potassium) while leaving confounding-driven associations unchanged (0.9x for insulin to glucose). An incident-user framework assesses directional consistency against prior pharmacological knowledge on MIMIC-IV (N=500 per comparison): 4/10 recover correct mechanistic directions, 2 reproduce treatment-context associations, 4 are incorrect (9/10 significant, Wilcoxon p<0.05). This pattern - partial recovery under residual confounding - is consistent with learned observational associations without causal distinction. Direct preference optimisation with surrogate reward destroys all correct associations (3/3 to 0/3), illustrating reward exploitation when reward and evaluation share an outcome domain. Generative evidence is strongest for short-horizon ICU data; outpatient temporal fidelity is weaker (median 10 vs 154 days on INSPECT), and zero-shot cross-site transfer degrades without adaptation.
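As a rough illustration of what a zero-inflated log-normal time head computes, the sketch below gives the negative log-likelihood of one inter-event gap under the textbook mixture: a point mass at zero plus a log-normal density for positive gaps. The parametrization (`pi_zero`, `mu`, `sigma`) is an assumption; the abstract does not give the model's exact head.

```python
import math


def zi_lognormal_nll(dt: float, pi_zero: float, mu: float, sigma: float) -> float:
    """Negative log-likelihood of a time gap `dt` >= 0.

    Assumed mixture: P(dt = 0) = pi_zero; otherwise dt ~ LogNormal(mu, sigma).
    """
    if dt == 0.0:
        return -math.log(pi_zero)
    log_dt = math.log(dt)
    # Log-density of LogNormal(mu, sigma) evaluated at dt.
    log_pdf = (-log_dt - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
               - 0.5 * ((log_dt - mu) / sigma) ** 2)
    return -(math.log(1.0 - pi_zero) + log_pdf)


nll_simultaneous = zi_lognormal_nll(0.0, pi_zero=0.5, mu=0.0, sigma=1.0)
nll_positive_gap = zi_lognormal_nll(1.0, pi_zero=0.5, mu=0.0, sigma=1.0)
```

The zero component lets simultaneous events (for example, several orders recorded with one timestamp) be modelled without forcing the continuous density to place mass near zero.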

[21]  arXiv:2605.04072 [pdf, ps, other]
Title: Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction
Comments: 17 pages, 4 figures, 7 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Sparse autoencoders (SAEs) have been applied to large language models and protein language models, but not systematically to electronic health record (EHR) foundation models. We train TopK SAEs on FlatASCEND, a 14.5-million-parameter autoregressive clinical sequence model, at all 10 residual stream extraction points on INSPECT (outpatient) and MIMIC-IV (ICU). SAE decomposition reveals progressive abstraction across transformer depth: layer-0 features are near-perfect token detectors (45.7% singleton), while layer-6 features span approximately 30 token types across multiple clinical categories (0.5% singleton). Under full-sequence simple linear probes, SAE features outperform dense representations for discrete event prediction (mortality) while dense representations outperform for continuous magnitude prediction (length of stay) - a probe-level representational phenomenon that does not extend to clinically relevant leakage-safe windows, where dense representations match or exceed SAE features across all tested settings (eICU-CRD 48-hour AUC: SAE 0.871 versus dense 0.880; base model zero-shot, SAE dictionaries trained on eICU activations; MIMIC-IV: 0.836 versus 0.914; INSPECT 1-year/3-year: 0.697 versus 0.800). A delta-mode intervention method reduces SAE perturbation noise by 86x, enabling cleaner feature-level experiments, though the resulting perturbation effects are larger than random controls in 3 of 4 conditions but not formally significant. Feature reproducibility across random seeds is 21%, and individual features should be interpreted as illustrative rather than stable.
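For readers unfamiliar with TopK SAEs, a minimal forward pass can be sketched in plain Python: compute encoder pre-activations, keep only the k largest, and decode from those. The weight layout, the ReLU on kept activations, and the toy numbers are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Tuple


def topk_sae_forward(x: List[float],
                     w_enc: List[List[float]],   # shape: n_features x d_model
                     b_enc: List[float],         # shape: n_features
                     w_dec: List[List[float]],   # shape: n_features x d_model
                     b_dec: List[float],         # shape: d_model
                     k: int) -> Tuple[List[float], List[float]]:
    """Return (sparse code, reconstruction) for one activation vector."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(w_enc, b_enc)]
    # Indices of the k largest pre-activations; every other feature is zeroed.
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    code = [0.0] * len(pre)
    for i in keep:
        code[i] = max(pre[i], 0.0)
    recon = [b + sum(code[j] * w_dec[j][d] for j in keep)
             for d, b in enumerate(b_dec)]
    return code, recon


code, recon = topk_sae_forward(
    x=[1.0, 2.0],
    w_enc=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    b_enc=[0.0, 0.0, 0.0],
    w_dec=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    b_dec=[0.0, 0.0],
    k=1,
)
```

A "singleton" feature in the abstract's sense would be one whose code entry is non-zero for exactly one token type; checking this requires running such a pass over a corpus and tallying which tokens activate each feature.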

[22]  arXiv:2605.04073 [pdf, ps, other]
Title: Confronting Label Indeterminacy in Automated Bail Decisions
Comments: This manuscript has been accepted for presentation as a short paper at the 21st International Conference of AI & Law in Singapore, June 8 to 12 of 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Bail decisions present a fundamental challenge for data-driven decision support systems. When bail is denied, the counterfactual outcome of whether the defendant would have appeared in court remains unobserved. As a result, historical bail data embed structural label indeterminacy: future decisions are influenced by past decisions whose outcomes are only partially knowable. Building automated systems on such data risks introducing bias and reinforcing feedback loops. This raises a core question for machine-learning systems intended to assist judicial actors: how should cases in which bail was denied be treated during model development? In a case study of bail decisions from the Unified Judicial System of Pennsylvania, we evaluate five contemporary approaches to handling label indeterminacy across three machine learning models, including a novel label imputation method motivated by the dynamics of bail decisions. Each method relies on unverifiable assumptions, yet all influence the models' predictive behaviour, sometimes even more so than the choice of model itself. Explainable AI analysis further reveals that these effects extend to the models' internal decision-making processes as well. Finally, we consider the notion of label indeterminacy from a legal perspective and assess the legitimacy of these approaches in the context of bail decision-making.

[23]  arXiv:2605.04074 [pdf, ps, other]
Title: A Physics-Aware Framework for Short-Term GPU Power Forecasting of AI Data Centers
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET); Operating Systems (cs.OS)

AI data centers experience rapid fluctuations in power demand due to the heterogeneity of the computational tasks they have to support. For example, the power profiles of inference and training of large language models (LLMs) are quite distinct, and large divergences can destabilize the underlying electricity grid. In this paper we propose, to the best of our knowledge, the first physics-informed DLinear time-series model that can accurately forecast the power utilization of an AI data center 5-80 minutes (short-term forecasting) into the future. The physics, based on a multi-node lumped thermal resistance-capacitance (RC) network consistent with Newton's law of cooling, is captured using newly derived time-dependent ordinary differential equations (ODEs) that separately model and interlink power consumption with GPU compute and memory utilization and temperature. The resulting model, which we refer to as PI-DLinear, is trained and evaluated on a real AI data center dataset; it is not only more accurate than the state-of-the-art (SOTA) models tested, but its forecast profile also respects the underlying physics under power throttling and load transient events. Relative to the SOTA transformer-based and non-transformer-based models, improvements in forecasting accuracy (averaged across all look-back and prediction windows) range from 0.782%-39.08% for MSE, 0.993%-51.82% for MAE, and 0.370%-22.28% for RMSE.
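The kind of physics referred to here can be sketched with a single-node lumped RC model, C dT/dt = P - (T - T_ambient) / R, integrated with forward Euler. This is a simplified stand-in: the paper describes a multi-node network with newly derived ODEs, and all names and parameter values below are illustrative assumptions.

```python
from typing import List, Sequence


def simulate_gpu_temperature(power_w: Sequence[float],
                             t_ambient: float,
                             r_th: float,     # thermal resistance, K/W
                             c_th: float,     # thermal capacitance, J/K
                             dt: float,       # time step, s
                             t_init: float) -> List[float]:
    """Forward-Euler integration of C dT/dt = P - (T - T_ambient) / R."""
    temps = []
    temp = t_init
    for p in power_w:
        d_temp = (p - (temp - t_ambient) / r_th) / c_th
        temp += dt * d_temp
        temps.append(temp)
    return temps


# Constant 300 W load; the steady state should approach T_ambient + P * R.
trace = simulate_gpu_temperature([300.0] * 200, t_ambient=25.0,
                                 r_th=0.1, c_th=50.0, dt=1.0, t_init=25.0)
```

With a time constant R * C of 5 s, the trace rises monotonically from 25 to 55 degrees. A physics-informed forecaster can penalize predicted power and temperature trajectories that violate a relation of this form.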

[24]  arXiv:2605.04075 [pdf, ps, other]
Title: RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Multimodal Large Language Models face severe challenges in computational efficiency and memory consumption due to the substantial expansion of the visual KV cache when processing long visual contexts. Existing KV cache compression methods typically rely on the "persistence of importance" hypothesis to prune tokens. However, this approach proves fragile in multimodal settings due to two key issues: 1) Visual tokens display "deferred importance," initially exhibiting low salience but becoming pivotal during later decoding, which can lead to premature eviction. 2) Discrete pruning disrupts the inherent spatial continuity of visual cues. To address these challenges, we propose RetentiveKV, an entropy-driven KV cache optimization method that reformulates KV eviction from "discrete context truncation" to "continuous memory evolution" based on State Space Models. Our method leverages information entropy to quantify the information potential of low-attention tokens and integrates tokens scheduled for eviction into a continuous state space through entropy-guided state transitions, enabling their dynamic reactivation when semantic relevance arises during subsequent decoding. Extensive experiments on multimodal benchmarks demonstrate that RetentiveKV achieves 5.0 times KV cache compression and 1.5 times decoding acceleration.

[25]  arXiv:2605.04076 [pdf, ps, other]
Title: A Regulatory Governance Framework for AI-Driven Financial Fraud Detection in U.S. Banking: Integrating OCC, SR 11-7, CFPB, and FinCEN Compliance Requirements for Model Development, Validation, and Monitoring Lifecycles
Comments: 38 pages, submitted to Cogent Business & Management (Taylor & Francis), currently under peer review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

U.S. financial institutions deploying AI-based fraud detection face a fragmented compliance landscape spanning four regulatory frameworks -- OCC Bulletin 2011-12, SR 11-7, the CFPB AI circular, and FinCEN BSA/SAR requirements -- with no integrated governance life cycle connecting these requirements to model development, validation, and monitoring practice. This paper presents the Regulatory Governance Framework for AI-Driven Financial Fraud Detection (RGF-AFFD), a three-tier governance architecture empirically anchored in a multi-study empirical program. Using the IEEE-CIS dataset (590,540 transactions) and ULB benchmark (284,807 transactions), we benchmark six architectures including an LSTM+XGBoost ensemble, and conduct ablation, temporal drift, SHAP interpretability, and BISG fairness analyses. The LSTM+XGBoost ensemble achieves ROC-AUC of 0.9289 (F1: 0.6360) with a benefit-cost ratio of 6:1. XGBoost demonstrates the strongest temporal stability (delta-AUC = -0.0017 versus -0.0626 for LSTM). The RDT-FG Regulatory Digital Twin meta-model translates metrics into four regulator-specific health scores and a composite Regulatory Fitness Index for continuous compliance monitoring. The RGF-AFFD is the first integrated deployment blueprint to simultaneously satisfy OCC, SR 11-7, CFPB, and FinCEN requirements, supported by a community bank implementation vignette and four evidence-based policy recommendations.

[26]  arXiv:2605.04077 [pdf, ps, other]
Title: Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models, and GRPO-style training is widely adopted for its simplicity and effectiveness. However, an important design choice remains underexplored: how token-level policy gradient terms are aggregated within each sampled group. Standard GRPO uses sequence aggregation, while recent work has advocated token aggregation as a better alternative. We show that these two rules induce different optimization biases: token aggregation introduces sign-length coupling, while sequence aggregation implicitly downweights longer responses through sequence-level equal weighting. To address this tension, we propose Balanced Aggregation (BA), a simple drop-in replacement that computes token-level means separately within the positive and negative subsets and then combines them with sequence-count-based weights. Experiments with Qwen2.5-Math-7B and Qwen3-1.7B on DAPO-17k and Polaris, evaluated on six reasoning and coding benchmarks, show that BA consistently improves training stability and final performance over standard token and sequence aggregation. Our analysis further shows that the relative effectiveness of token and sequence aggregation is largely governed by response-length variation and the positive-negative length gap, highlighting aggregation as a critical design dimension in GRPO-style RLVR.
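The three aggregation rules can be contrasted with a small sketch. The `balanced_aggregate` function is one plausible reading of the abstract's description (token-level means within the positive and negative subsets, combined with weights proportional to each subset's sequence count); the exact formula in the paper may differ, and all names and numbers here are hypothetical.

```python
from typing import List, Sequence


def sequence_aggregate(token_terms: Sequence[Sequence[float]]) -> float:
    """Mean of per-sequence means: every sequence gets equal weight."""
    return sum(sum(seq) / len(seq) for seq in token_terms) / len(token_terms)


def token_aggregate(token_terms: Sequence[Sequence[float]]) -> float:
    """Mean over all tokens in the group: every token gets equal weight."""
    flat = [t for seq in token_terms for t in seq]
    return sum(flat) / len(flat)


def balanced_aggregate(token_terms: Sequence[Sequence[float]],
                       advantages: Sequence[float]) -> float:
    """Assumed reading of BA: per-sign token means, sequence-count weights."""
    n = len(token_terms)
    total = 0.0
    for is_member in (lambda a: a > 0, lambda a: a < 0):
        subset: List[Sequence[float]] = [
            seq for seq, a in zip(token_terms, advantages) if is_member(a)
        ]
        if not subset:
            continue
        flat = [t for seq in subset for t in seq]
        total += (len(subset) / n) * (sum(flat) / len(flat))
    return total


# Toy group: two short positive responses and one long negative response.
terms = [[2.0, 2.0], [4.0], [-1.0, -1.0, -1.0, -1.0]]
advs = [1.0, 1.0, -1.0]

seq_value = sequence_aggregate(terms)
tok_value = token_aggregate(terms)
ba_value = balanced_aggregate(terms, advs)
```

In the toy group the long negative response dominates the token-level mean, which is the sign-length coupling the abstract refers to, while the balanced variant fixes the positive and negative shares by sequence count.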

[27]  arXiv:2605.04078 [pdf, ps, other]
Title: Validity-Calibrated Reasoning Distillation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reasoning distillation aims to transfer multi-step reasoning capabilities from large language models to smaller, more efficient ones. While recent methods have shown promising gains, they typically rely on static teacher-student hierarchies and frame distillation as trajectory imitation. This is misaligned with the structure of reasoning, where intermediate steps are often locally under-specified: global correctness constrains the final answer, but does not uniquely determine each intermediate move. We propose validity-calibrated reasoning distillation, a framework that treats reasoning distillation as a problem of local learning-signal allocation rather than path alignment. Instead of enforcing token-level imitation, we compare the student's and teacher's proposed next-step actions under the same prefix and use their relative local validity to modulate the strength of the distillation update. This yields a dynamic, context-dependent supervision mechanism that preserves the teacher's structural guidance while adapting update strength to local reasoning quality. Across mathematical reasoning, code generation, and instruction-following benchmarks, our method consistently outperforms strong distillation baselines. These results indicate that effective LLM reasoning distillation is governed not by rigid trajectory imitation, but by principled, locally calibrated allocation of learning signal.

[28]  arXiv:2605.04079 [pdf, ps, other]
Title: Efficient Handwriting-Based Alzheimer's Disease Diagnosis Using a Low-Rank Mixture of Experts Deep Learning Framework
Comments: 26 pages, 6 figures, and 17 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Early and reliable detection of Alzheimer's disease (AD) is crucial for timely clinical intervention and improved patient management. It also supports the evaluation of emerging therapeutic strategies. In this paper, we propose a Low-Rank Mixture of Experts (LoRA-MoE) deep learning framework for Alzheimer's disease diagnosis based on handwriting analysis. Handwriting signals provide a non-invasive and scalable digital biomarker that captures subtle cognitive-motor impairments associated with early AD progression. The proposed architecture allows multiple experts to specialize in different handwriting patterns while sharing a common base network. This design enables efficient learning of general representations while reducing interference between experts. Each expert is equipped with lightweight low-rank adapters. This mechanism significantly reduces the number of trainable parameters compared with standard Mixture of Experts (MoE) models and improves training stability. The proposed framework is evaluated on the Diagnosis AlzheimeR WIth haNdwriting (DARWIN) dataset. Extensive experiments are conducted, including ablation studies on key architectural parameters such as hidden dimension size, number of experts, and LoRA rank. The method is compared with multilayer perceptron (MLP) and conventional MoE architectures. In addition, stacking ensemble strategies (StackMean and StackMax) are investigated to improve robustness and predictive performance. Experimental results show that the LoRA-MoE framework achieves powerful diagnostic performance while activating significantly fewer parameters during inference. These results highlight the potential of the proposed approach as an accurate and computationally efficient solution for handwriting-based Alzheimer's disease screening and digital health applications.

[29]  arXiv:2605.04080 [pdf, ps, other]
Title: Connecting online criminal behavior with machine learning: Using authorship attribution to analyze and link potential online traffickers
Comments: Doctoral thesis
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

This research investigated how online criminal activities can be better understood and connected using data-driven machine learning methods. Many illegal activities, such as human trafficking and illicit trade, have moved to online platforms where offenders hide behind anonymous accounts and frequently change identities. This makes it difficult for authorities to understand how large these networks are and how different online profiles may be linked.
The research shows that people tend to maintain consistent patterns in how they write advertisements and present images online, even when they try to stay anonymous. By analysing these patterns across large collections of online advertisements, the research demonstrates how to link related accounts and identify repeated behaviour across illegal online markets.
In addition, the research also addresses how such methods should be used responsibly. It proposes clear guidelines to ensure that privacy, fairness, and transparency are respected when these tools are applied. Overall, the research provides practical ways to support law enforcement investigations while emphasising careful and ethical use.

[30]  arXiv:2605.04081 [pdf, ps, other]
Title: Time series causal discovery with variable lags
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Causal Bayesian Networks (CBNs) are a powerful tool for reasoning under uncertainty about complex real-world problems. Such problems evolve over time, responding to external shocks as they occur. To support decision-making, CBNs require a cause-and-effect map of the variables under consideration, known as the network's structure. Learning the graphical structure of a causal model from data remains challenging; learning it from time-series data is even harder because dependencies may arise at different time lags. Existing time-series causal discovery methods often assume a fixed lag window and do not explicitly optimise edge-specific lags. We propose a Tabu-based structure learning algorithm that searches for a time-ordered directed structure (i.e., where every edge respects time) while allowing edge-specific lags up to a specified maximum lag. The approach uses a decomposable BIC-based score with node-specific effective sample sizes and an explicit lag-length penalty encouraging parsimonious delay assignments while preserving efficient local score updates. We provide theoretical guarantees of validity and local optimality, and we also describe a parallel implementation for improved scalability. In simulations, the method recovered graph structure competitively and estimated lags accurately when true adjacencies were recovered. On a real-world UK COVID-19 policy dataset, the learnt structure was dominated by short delays while retaining a substantial minority of longer-lag dependencies, consistent with delayed behavioural and epidemiological effects.

[31]  arXiv:2605.04082 [pdf, ps, other]
Title: Enhancing the interpretability of spatially variable N2O model predictions with soft sensors during wastewater treatment
Comments: 1 Graphical abstract, 2 Tables, 7 Figures
Subjects: Machine Learning (cs.LG)

Model-based solutions for nitrous oxide (N2O) emissions from wastewater treatment plants (WWTP) are informed by operational datasets designed to control nutrient levels in liquid waste, coupled with dedicated campaigns for N2O measurements. We analysed how machine learning (ML) models predict disturbances to WWT operation and spatially variable N2O emissions. A real dataset was investigated to validate the modelling framework from N2O emissions predicted by four ML models (R2 = 0.79 - 0.89). Monitoring campaigns for N2O were simulated with a plant-wide mechanistic model to include additional sensors, site-level N2O datasets, and wastewater disturbances (n = 16). ML models were highly accurate (0.97 +- 0.02, n = 80), but the feature importance depended on the model, the scenario and the N2O measurement scale (reactor vs. WWTP). We argue that N2O soft sensor model predictions are limited to the measuring location and the methodological uncertainty of the dataset, which affect the interpretability of the model. Lastly, the analysis of the mechanistic model structure exposed interactions between autotrophic and heterotrophic pathways over nitric oxide which can overestimate aerobic nitrite production and bias the N2O pathway contributions.

[32]  arXiv:2605.04083 [pdf, ps, other]
Title: AsymmetryZero: A Framework for Operationalizing Human Expert Preferences as Semantic Evals
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Much of the focus in RL today is on evaluation design: building meaningful evals that serve simultaneously as benchmarks and as well-defined reward signals for post-training. Yet, many real-world tasks are governed by subjective, procedural, and domain-specific requirements that are difficult to encode as exact-match targets or open-ended preference judgments frequently used in RL pipelines today. In this work, we present AsymmetryZero, a framework for operationalizing human expert preferences as semantic evals. AsymmetryZero represents each task as a stable evaluation contract that makes grading criteria explicit: what is being graded, how each criterion is judged, and how criterion-level decisions are aggregated into a task outcome. The same contract can be executed using Inspect for model-only evaluations, as well as the Harbor Framework for agentic evaluations, enabling comparable scores and shared audit artifacts across both settings. We argue that the central challenge in post-training today is the faithful encoding of expert requirements into the evaluation itself. To that end, we present a study using Harbor that holds task contracts fixed and compares a five-model frontier jury against a five-model compact jury across four frontier-class solvers (Claude Opus 4.6, GPT-5.4, Grok-4.20, Gemini-3.1-Pro). We find that criterion-level frontier-vs-compact agreement ranges from $75.9\%$ to $89.6\%$ (strict common-subset agreement: $77.8\%$ to $92.1\%$), while compact juries exhibit substantially higher internal dissent (3--2 split rate $28.7\%$--$32.4\%$) than frontier juries ($6.1\%$--$11.5\%$). Verifier traces further show that compact juries reduce per-criterion judging cost to roughly $4.2\%$--$5.6\%$ of frontier and latency to roughly $21.7\%$--$27.1\%$, even as aggregated task-level outcomes often remain comparatively stable.

[33]  arXiv:2605.04084 [pdf, ps, other]
Title: FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)

Compressing large language models (LLMs) for deployment on commodity GPUs remains challenging: conventional scalar quantization is limited to fixed bit-widths (e.g., 8/4/3-bit), offers only a few discrete compression points, and typically requires calibration data. We present FASQ (Flexible Accelerated Subspace Quantization), a calibration-free framework that applies product quantization to LLM weight matrices. By tuning two parameters, sub-vector size and codebook cardinality, FASQ exposes a continuous design space spanning 27-49% of the original FP16 model size, filling compression gaps that fixed-bit schemes cannot reach. On Meta-Llama-3-8B, FASQ surpasses 4-bit GPTQ and AWQ in accuracy (67.1-67.7 avg.) at 37-42% model size, with consistent results on Qwen3-8B and Qwen3.5-9B-Base. To make product quantization practical at inference time, we design custom CUDA kernels: a LUT-free direct-compute GEMV for decode and an output-stationary double-buffered LUT GEMM for prefill, both with split-K parallelism. On an RTX 3090, FASQ achieves 45.2 tok/s decode at effective 4-bit (2.56x memory reduction) and 51.8 tok/s at effective 3-bit (2.80x), both surpassing FP16 tensor-core performance (43.9 tok/s) and delivering 1.6 to 1.8x the throughput of AWQ, 2.5 to 2.5x of GPTQ, and 4.3 to 5x of RTN. FASQ is the only compressed method that accelerates decode beyond FP16, offering calibration-free compression, continuous size-quality trade-offs, and real-time inference on a single consumer GPU.
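How sub-vector size and codebook cardinality jointly set the compressed size can be seen from generic product-quantization storage accounting. The sketch below assumes one FP16 codebook per weight matrix and indices stored at the minimum whole number of bits; FASQ's actual storage layout is not described in the abstract, so the numbers are illustrative only.

```python
def pq_size_fraction(rows: int, cols: int,
                     sub_vector_size: int,
                     codebook_size: int,
                     fp_bits: int = 16) -> float:
    """Compressed size as a fraction of the dense FP16 matrix.

    Assumptions: each row is split into sub-vectors of `sub_vector_size`
    weights, each sub-vector is stored as an index into one shared codebook
    of `codebook_size` centroids, and centroids are kept in `fp_bits`.
    """
    if cols % sub_vector_size != 0:
        raise ValueError("cols must be divisible by sub_vector_size")
    index_bits = max(codebook_size - 1, 0).bit_length()
    n_sub_vectors = rows * (cols // sub_vector_size)
    code_bits = n_sub_vectors * index_bits
    codebook_bits = codebook_size * sub_vector_size * fp_bits
    return (code_bits + codebook_bits) / (rows * cols * fp_bits)


# Two hypothetical configurations for a 4096 x 4096 weight matrix.
frac_a = pq_size_fraction(4096, 4096, sub_vector_size=4, codebook_size=65536)
frac_b = pq_size_fraction(4096, 4096, sub_vector_size=8, codebook_size=256)
```

Because the index cost per weight is roughly log2(codebook_size) / sub_vector_size, varying the two parameters yields many intermediate sizes, unlike the handful of points available to fixed-bit-width scalar quantization.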

[34]  arXiv:2605.04085 [pdf, ps, other]
Title: Evaluating Patient Safety Risks in Generative AI: Development and Validation of a FMECA Framework for Generated Clinical Content
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Methodology (stat.ME)

Objectives: Large language models (LLMs) are increasingly used for clinical text summarization, yet structured methods to assess associated patient safety risks remain limited. Failure Mode, Effects, and Criticality Analysis (FMECA) provides a proactive framework for systematic risk identification but has not been adapted to LLM-generated clinical content. This study aimed to develop and validate a novel FMECA framework for the prospective assessment of patient safety risks in LLM-generated clinical summaries.
Materials and Methods: An interdisciplinary expert panel (n = 8) developed a taxonomy of failure modes through literature review and brainstorming. Standard FMECA dimensions (occurrence, severity, detectability) were adapted into 5-point ordinal scales. The framework was applied to 36 discharge summaries from four patients, generated by an open LLM (GPT-OSS 120B) using real-world clinical data from the Geneva University Hospitals. Reviewers independently annotated the summaries across two rounds. Inter-rater reliability was assessed at failure mode, severity and detectability score levels. Usability and content validity were evaluated using an adapted System Usability Scale and structured feedback.
Results: The final framework comprised 14 failure modes organized into categories. Inter-rater agreement improved between rounds, reaching moderate-to-substantial agreement for failure mode identification and good agreement for severity and detectability scoring. Usability was rated as good (mean SUS: 79.2/100), with high evaluator confidence.
Discussion and Conclusion: This study presents the first FMECA-based framework for systematic patient safety risk assessment of LLM-generated clinical summaries. The framework provides a structured and reproducible method for identifying clinically relevant risks caused by these summaries.

[35]  arXiv:2605.04091 [pdf, ps, other]
Title: OpenCLAW-Nexus: A Self-Reinforcing Trust Framework for Byzantine-Resilient Decentralized Federated Learning
Subjects: Networking and Internet Architecture (cs.NI)

Decentralized Federated Learning (DFL) eliminates the central aggregator but introduces a severe 'trust gap': without a trusted coordinator, the system becomes vulnerable to Byzantine and Sybil attacks, while existing solutions treat node selection, aggregation, and consensus as isolated modules, often relying on a trusted root dataset unavailable in truly decentralized settings. We propose OpenCLAW-Nexus, a self-reinforcing trust framework that bridges this gap through a single primitive, a discounted Beta-reputation model, that unifies reputation-based node selection, reputation-weighted aggregation (Rep-FedAvg), and reputation-aware BFT consensus. Rep-FedAvg eliminates the trusted root dataset requirement; we formally prove reputation separation between honest and Byzantine nodes under non-IID data with noisy evaluations. On a 1,000-node global testbed spanning three cloud providers and nine regions, Rep-FedAvg achieves 72.6% accuracy on non-IID CIFAR-10 with 20% Byzantine nodes and record-level differential privacy, within 0.5 pp of centralized FLTrust. Under a 300-node Sybil attack, reputation-weighted consensus maintains 84.2% validation correctness versus 62.8% (PoW) and 47.6% (PoS).
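A discounted Beta-reputation model of the general kind named here can be sketched as follows: positive and negative evidence counts decay by a discount factor before each new observation, the score is the mean of the resulting Beta posterior under a uniform prior, and model updates are averaged with scores as weights. The class, the discount value, and the weighting rule are assumptions for illustration, not the paper's Rep-FedAvg specification.

```python
from typing import List, Sequence


class BetaReputation:
    """Beta reputation with exponential discounting of old evidence."""

    def __init__(self, discount: float = 0.9) -> None:
        self.discount = discount
        self.good = 0.0   # discounted count of positive evaluations
        self.bad = 0.0    # discounted count of negative evaluations

    def update(self, positive: bool) -> None:
        self.good = self.discount * self.good + (1.0 if positive else 0.0)
        self.bad = self.discount * self.bad + (0.0 if positive else 1.0)

    @property
    def score(self) -> float:
        # Mean of Beta(good + 1, bad + 1), i.e. a uniform prior.
        return (self.good + 1.0) / (self.good + self.bad + 2.0)


def reputation_weighted_average(updates: Sequence[Sequence[float]],
                                scores: Sequence[float]) -> List[float]:
    """Average model updates with weights proportional to reputation."""
    total = sum(scores)
    dim = len(updates[0])
    return [sum(s * u[d] for u, s in zip(updates, scores)) / total
            for d in range(dim)]


rep = BetaReputation(discount=0.5)
initial = rep.score
rep.update(positive=True)
after_good = rep.score
rep.update(positive=False)
after_bad = rep.score

merged = reputation_weighted_average([[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25])
```

Discounting makes recent behaviour count more than old behaviour, so a node that turns malicious loses weight quickly instead of coasting on accumulated history.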

[36]  arXiv:2605.04093 [pdf, ps, other]
Title: Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification
Authors: Oleg Solozobov
Comments: 41 pages, 8 tables. Companion artefact: Decision Trace Reconstructor v0.1.0 (Apache-2.0), this https URL Decision Event Schema (MIT): this https URL
Subjects: Computers and Society (cs.CY)

Agentic AI systems produce decision evidence at scale through execution telemetry, but property-level reconstruction often fails when an external party asks a specific governance question about a specific decision: the assembled evidence is insufficient to answer it. We name this pattern the container fallacy: the automatic equation of evidence-container presence with audit sufficiency. This paper specifies the Decision Evidence Maturity Model (DEMM), a property-level reconstructability method for agentic decisions. DEMM classifies evidence sufficiency into four executable categories plus a protocol-level "conflicting" category and aggregates per-property verdicts into a five-level capability rubric anchored to the established maturity-model lineage. The open-source Decision Trace Reconstructor ships ten executable adapter-fallback classes spanning vendor SDKs, protocol traces, public-postmortem prose, and generic JSONL records. A reproducible feasibility exercise runs the protocol on 140 synthetic scenarios plus three public incidents; the resulting completeness range (53.6% to 100%) is implementation behaviour, not external validation.

[37]  arXiv:2605.04098 [pdf, ps, other]
Title: Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Multimodal large language models (MLLMs) have demonstrated promise on publicly available dermatology benchmarks. However, benchmark performance may not generalize to real-world dermatologic decision-making. To quantify this benchmark-to-bedside gap, we evaluated four open-weight MLLMs (InternVL-Chat v1.5, LLaVA-Med v1.5, SkinGPT4 and MedGemma-4B-Instruct) and one commercial MLLM (GPT-4.1) across three publicly available dermatology datasets and a retrospective multi-site hospital-based dermatology consultation cohort comprising 5,811 cases and 46,405 clinical images. Models were evaluated on two clinically relevant tasks: differential diagnosis generation and severity-based triage. Diagnostic performance was modest on public datasets and declined substantially in the real-world cohort. On public benchmarks, top-3 diagnostic accuracy reached 26.55% for the best open-weight model and 42.25% for GPT-4.1. On real-world consultation cases using images alone, top-3 diagnostic accuracy fell to 1.50%-13.35% among open-weight models and 24.65% for GPT-4.1. Incorporating clinical context improved performance across all models, increasing top-3 diagnostic accuracy up to 28.75% among open-weight models and 38.93% for GPT-4.1. However, model outputs were highly sensitive to incomplete or erroneous consultation context. For severity-based triage, models achieved moderate sensitivity (above 60%), suggesting potential utility for screening but insufficient reliability for clinical deployment. These findings demonstrate that benchmark performance substantially overestimates the real-world clinical capability of current dermatology MLLMs.

[38]  arXiv:2605.04100 [pdf, ps, other]
Title: Regularized Centered Emphatic Temporal Difference Learning
Subjects: Artificial Intelligence (cs.AI)

Off-policy temporal-difference (TD) learning with function approximation faces a structural tradeoff among stability, projection geometry, and variance control. Emphatic TD (ETD) improves the off-policy projection geometry through follow-on emphasis, but the follow-on trace can have high variance. We revisit this tradeoff through Bellman-error centering. Although centering naturally removes a common drift term from TD errors, we show that a naive centered emphatic extension introduces an auxiliary coupling that can destroy the positive-definiteness of the ETD key matrix. We propose Regularized Emphatic Temporal-Difference Learning (RETD), which preserves the follow-on trace and regularizes only the auxiliary centering recursion, corresponding to lifting the lower-right block of the coupled key matrix from \(1\) to \(1+c\). We derive the RETD core matrix, prove convergence under a conservative sufficient regularization condition, and evaluate the method on diagnostic linear off-policy prediction tasks. The experiments show that RETD avoids the instability of naive centered emphatic learning, preserves favorable emphatic geometry, and exhibits a robust intermediate regime for the regularization parameter \(c\) across the diagnostics.

[39]  arXiv:2605.04103 [pdf, ps, other]
Title: HERCULES: Hardware-Efficient, Robust, Continual Learning Neural Architecture Search
Comments: 21 pages, 1 figure
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Neural Architecture Search (NAS) has emerged as a powerful framework for automatically discovering neural architectures that balance accuracy and efficiency. However, as AI transitions from static benchmarks to real-world deployment, the traditional focus on hardware-aware efficiency is no longer sufficient. We observe that modern NAS methods, especially those that target edge AI, are evolving to address a triple objective: Efficiency, Robustness, and Continual Learning. While efficiency ensures feasibility in resource-constrained environments, robustness guarantees reliability under environmental variabilities, and continual learning enables adaptation to sequential tasks without catastrophic forgetting. We propose a taxonomy of NAS approaches through this triple lens, distinguishing between methods targeting resource optimization, environmental resilience, and architectural plasticity. This unified perspective reveals that these axes, though often studied in isolation, are mutually reinforcing. Building on this taxonomy, we map the current landscape of these NAS methods into a new framework called Hardware-Efficient, Robust, and ContinUal LEarning Search (HERCULES). We define the desiderata, the twelve labours of HERCULES, addressing the non-trivial challenge of balancing an adequate search-space exploration with the immense computational costs of a multi-objective NAS, accounting for these crucial objectives of current AI systems. By identifying critical gaps in existing research, this survey outlines a roadmap toward integrated algorithmic, architectural, and hardware-software co-design for truly deployable, lifelong-learning AI systems.

[40]  arXiv:2605.04107 [pdf, ps, other]
Title: TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments
Authors: Furkan Sakizli
Comments: 19 pages, 6 figures, 23 tables. Code, benchmark suite, and evaluation logs: this https URL
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Production agent frameworks (OpenAI Function Calling, Anthropic Tool Use, MCP) transmit tool schemas as JSON, a format designed for machine parsing, not for interpretation by language models. For small models (4B-14B), this protocol mismatch accounts for the majority of tool-use failure at production catalog sizes. We present TSCG, a deterministic tool-schema compiler that resolves this mismatch at the API boundary, converting JSON schemas into token-efficient structured text without model access, fine-tuning, or runtime search. TSCG combines eight composable operators with a formal compression bound (>=51% on well-formed schemas).
On TSCG-Agentic-Bench (about 19,000 calls, 12 models, 5 scenarios), TSCG restores Phi-4 14B from 0% to 84.4% accuracy at 20 tools (90.3% at 50 tools) and achieves 108-181% accuracy-retained ratio across three models on BFCL. Format-versus-compression decomposition (R^2=0.88 -> 0.03) establishes representation change as the dominant mechanism. Per-operator isolation across three frontier models reveals three distinct operator-response profiles: operator-hungry (Opus 4.7), operator-sensitive (GPT-5.2), and operator-robust (Sonnet 4), providing per-model deployment guidance. Scaling experiments show accuracy advantages persisting on heavy production MCP schemas (+5.0 pp at about 10,500 input tokens) despite saturation on light synthetic catalogs, with 52-57% token savings throughout. The synthetic benchmark generalizes to real MCP schemas within 0.1 accuracy points. TSCG ships as a 1,200-line zero-dependency TypeScript package.
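
As a rough illustration of the general idea of rendering a JSON tool schema as compact structured text, here is a minimal Python sketch. It is not TSCG and does not reproduce any of its eight operators; the function name `compact_tool_schema`, the `name?:type` notation for optional parameters, and the ` # ` separator are all assumptions made for this example.

```python
def compact_tool_schema(tool):
    """Render one JSON-style tool schema as a single signature-like line.

    Hypothetical format (an assumption, not TSCG's output):
    name(param:type, optional?:type) # description
    """
    params = tool.get("parameters", {})
    properties = params.get("properties", {})
    required = set(params.get("required", []))

    parts = []
    for name, spec in properties.items():
        # Mark parameters that are not listed as required with a trailing "?".
        marker = "" if name in required else "?"
        parts.append(f"{name}{marker}:{spec.get('type', 'any')}")

    line = f"{tool['name']}({', '.join(parts)})"
    description = tool.get("description", "")
    if description:
        line += f" # {description}"
    return line


example_tool = {
    "name": "get_weather",
    "description": "Look up weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string"},
        },
        "required": ["city"],
    },
}
print(compact_tool_schema(example_tool))
```

The transformation is a pure function of the schema, so the same input always yields the same text, which is the sense of "deterministic" this sketch aims to convey.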

[41]  arXiv:2605.04108 [pdf, ps, other]
Title: MuCALD-SplitFed: Causal-Latent Diffusion for Privacy-Preserving Multi-Task Split-Federated Medical Image Segmentation
Comments: Accepted to oral presentation and conference proceedings at IEEE International Conference on Image Processing (ICIP 2026), Finland
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Federated Learning enables decentralized training by aggregating model updates across clients without sharing raw data, while Split Federated Learning further partitions the model between clients and a server to reduce computation and communication at the client side. However, decentralized medical institutions rarely operate on a single shared task, making standard Federated and SplitFed collaborations poorly aligned with real clinical workflows. Multi-task FL extends these frameworks by allowing clients to handle different tasks, but often introduces instability and privacy vulnerabilities. This study proposes MuCALD-SplitFed, a multi-task SplitFed framework that integrates causal representation learning and latent diffusion. Experiments show MuCALD-SplitFed consistently improves segmentation, while baseline SplitFed fails to converge. The proposed approach further reduces information leakage at split points, mitigating reconstruction-based and membership inference attacks. Additionally, MuCALD-SplitFed outperforms state-of-the-art personalized FL and multi-task FL approaches. The code repository is: https://github.com/ChamaniS/MuCALD_SplitFed.

[42]  arXiv:2605.04109 [pdf, ps, other]
Title: Resource Utilization of Differentiable Logic Gate Networks Deployed on FPGAs
Comments: 6 pages, 6 figures, conference submission
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)

On-edge machine learning (ML) often strives to maximize the intelligence of small models while miniaturizing the circuit size and power needed to perform inference. Meeting these needs, differentiable Logic Gate Networks (LGN) have demonstrated nanosecond-scale prediction speeds while reducing the required resources as compared to traditional binary neural networks. Despite these benefits, the trade-offs between LGN parameters and resulting hardware synthesis characteristics are not well characterized. This paper therefore studies the trade-offs between power, resource utilization, inference speed, and model accuracy when varying the depth and width of LGNs synthesized for Field Programmable Gate Arrays (FPGA). Results reveal that the final layer of an LGN is critical to minimize timing and resource usage (i.e., a 28% decrease), as this layer dictates the logic size of summing operations. Subject to timing and routing constraints, deeper and wider LGNs can be synthesized for FPGA when the final layer is narrow. Further trade-offs are presented to help ML engineers select baseline LGN architectures for FPGAs with a set number of Look Up Tables (LUT).

[43]  arXiv:2605.04111 [pdf, ps, other]
Title: Optimally Covering Large Triangles with Homothetic Unit Triangles
Authors: John M. Boyer
Comments: 24 pages, 5 figures
Subjects: Computational Geometry (cs.CG)

We answer an open problem in the American Mathematical Monthly about covering large triangles. Given a triangle $T$ of any triangular shape with a selected side length between $n \in \mathbb{N}$ and $n+1$, Baek and Lee proved that $T$ could not be covered with $n^2+1$ homothetic unit triangles (with the selected side of length 1). Letting $T_{n+d}$ denote a triangle with selected side length $n + d$ with $d \in (0, 1)$, Baek and Lee extended their proof to establish upper bounds for $d$ above which a $T_{n+d}$ cannot be covered with $n^2+2$ or $n^2+3$ homothetic unit triangles. Then, they showed that these bounds are tight based on analyses of a method by Conway and Soifer for the $n^2+2$ case and their own method for the $n^2+3$ case. Baek and Lee stated as an open problem the need to find tight upper bounds for the $n^2 + k$ cases for $4 \le k \le 2n$. We extend the Baek and Lee proof to establish upper bounds for those higher cases, and we show the upper bounds are tight by presenting two new triangle covering methods for the odd and even cases of $k$ that meet the upper bounds, as well as an optimal consolidated method that uses whichever of the two will cover a given $T_{n+d}$ with the fewest homothetic unit triangles.

[44]  arXiv:2605.04114 [pdf, ps, other]
Title: Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI
Journal-ref: Primera Scientific Engineering, Denton, TX, 8.5 (2026): 04-23
Subjects: Software Engineering (cs.SE); Databases (cs.DB)

This paper describes our results on using ChatGPT, Gemini, and Claude AI to semantically reverse engineer legacy database software applications.

[45]  arXiv:2605.04115 [pdf, ps, other]
Title: Learning reveals invisible structure in low-rank RNNs
Authors: Yoav Ger, Omri Barak
Comments: 30 pages, 12 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Learning in neural systems arises from synaptic changes that reshape the representations underlying behavior. While low-rank recurrent neural networks (RNNs) have emerged as a powerful framework for linking connectivity to function, a theoretical understanding of their learning process remains elusive. Here, we extend the low-rank framework from activity to learning by deriving gradient-descent dynamics directly in a reduced overlap space. We formulate a closed-form, low-dimensional system of ODEs that governs learning in this space, exact for linear RNNs and asymptotically exact for nonlinear RNNs in the large-$N$ Gaussian limit. Central to our analysis is a distinction between two classes of overlaps: loss-visible overlaps, which fully determine network activity, output, and loss, and loss-invisible overlaps, which do not affect function but are required to describe learning. We illustrate the consequences of this decomposition through two phenomena. First, we show that learning can serve as a perturbation that exposes differences in connectivity between functionally equivalent networks. Second, we show that loss-invisible overlaps can act as memory variables that encode training history, and characterize the conditions under which this occurs. Finally, we present several testable predictions for biological learning experiments derived from our theory.

[46]  arXiv:2605.04116 [pdf, ps, other]
Title: Membership Inference Attacks for Retrieval Based In-Context Learning for Document Question Answering
Comments: this https URL
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

We show that remotely hosted applications employing in-context learning, when augmented with a retrieval function to select in-context examples, can be vulnerable to membership-inference attacks even when the service provider and users are separate parties. We propose two black-box membership inference attacks that exploit query text prefixes to distinguish member from non-member inputs. The first attack uses a reference model to estimate an otherwise unavailable loss metric. The second attack improves upon it by eliminating the reference model and instead computing a membership statistic through a simple but novel weighted-averaging scheme. Our comprehensive empirical evaluations consider a stricter case in which the adversary has a paraphrased version of the text in the queries and show that our attacks can exhibit stronger resilience to paraphrasing and outperform three prior attacks in many cases with a small number of prefixes. We also adapt an existing ensemble prompting defense to our setting, demonstrating that it substantially mitigates the privacy leakage caused by our second attack.

[47]  arXiv:2605.04126 [pdf, ps, other]
Title: Simultaneous CNN Approximation on Manifolds with Applications to Boundary Value Problems
Authors: Hanfei Zhou, Lei Shi
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

This paper develops convolutional neural network (CNN) methods for simultaneous approximation and elliptic boundary value problems on compact Riemannian manifolds. We establish simultaneous Sobolev approximation results for single- and multichannel CNNs, showing that manifold functions and their derivatives can be approximated with rates governed by the intrinsic dimension and the smoothness gap, rather than by the ambient dimension, thereby mitigating the curse of dimensionality. Building on this approximation theory, we propose a physics-informed CNN (PICNN) framework specially designed for boundary value problems. The main numerical issue is a boundary-norm mismatch: standard PINNs usually impose boundary data through low-order, often L2-type, penalties, whereas elliptic stability requires Sobolev trace control. We address this by introducing a spectral boundary loss based on the boundary Laplace-Beltrami operator, which represents trace errors as weighted frequency energies and relates truncation error to boundary eigenvalue decay. This avoids smooth auxiliary constructions required by exact boundary enforcement and singular double integrals arising in Sobolev-Slobodeckij penalties, while enabling implementations based on Fast Fourier Transforms (FFTs) or precomputed spectral bases on structured boundaries. Numerical experiments demonstrate improved accuracy, convergence, and stability over standard PINNs.

[48]  arXiv:2605.04127 [pdf, ps, other]
Title: Position: the Stochastic Parrot in the Coal Mine. Model Collapse is a Threat to Low-Resource Communities
Comments: 13 pages, 1 figure, International Conference on Machine Learning
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computers and Society (cs.CY)

Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates. Related critiques of large language models have highlighted their tendency to reproduce frequent patterns in training data, their reliance on vast datasets, and their substantial environmental cost. Together, these factors contribute to data degradation, the reinforcement of cultural biases, and inefficient resource use. In this position paper we aim to combine these views and argue that model collapse threatens current efforts to democratize AI. By reducing training efficiency and skewing data distributions away from the tails of their support, model collapse disproportionately impacts low-resource and marginalized communities. We examine both the environmental and cultural implications of this phenomenon, situate our position within recent position papers on model collapse, and conclude with a call to action. Finally, we outline initial directions for mitigating these effects.

[49]  arXiv:2605.04128 [pdf, ps, other]
Title: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

[50]  arXiv:2605.04129 [pdf, ps, other]
Title: Quantum-Resistant Networks: A Review of Primitives, Protocols and Best Practices
Comments: 40 pages, 1 figure
Subjects: Cryptography and Security (cs.CR)

Large-scale quantum computers threaten the public-key cryptographic foundations underpinning today's network security infrastructures. While significant progress has been made in standardizing post-quantum cryptographic (PQC) primitives and adapting individual protocols such as TLS and SSH, far less attention has been paid to the broader architectural consequences of the post-quantum transition for networked systems. In particular, many real-world deployments such as mobile networks, industrial control systems, IoT environments, and regulated infrastructures cannot assume the universal availability, deployability, or desirability of PQ public-key infrastructures. This paper presents the first comprehensive systematization of PQ-resistant network architectures, focusing on key distribution and management as a system-level design problem rather than a protocol-local substitution. We introduce a unified taxonomy spanning cryptographic foundations (symmetric-only, PQ-PKI, hybrid, and information-theoretic multi-path), key-distribution architectures (centralized, hierarchical, replicated, threshold, MPC-backed, and serverless), trust and threat models, key-management lifecycle, and deployment environments. Using this framework, we analyze the security, scalability, and operational trade-offs of a wide range of architectures under realistic PQ adversary assumptions, including harvest-now, decrypt-later attacks and partial infrastructure compromise. Our study highlights fundamental gaps in existing approaches, clarifies when PQ-PKI is necessary or avoidable, and identifies promising research directions for building cryptographically agile, quantum-resilient network infrastructures.

[51]  arXiv:2605.04130 [pdf, ps, other]
Title: Constrained Extreme Gradient Boosting for Adapting Reduced-Order Models
Comments: Preprint. Under review. 4 numerical examples
Subjects: Machine Learning (cs.LG)

High-fidelity simulations, such as computational fluid dynamics and finite element analysis, are essential for modeling complex engineering systems but are often prohibitively expensive for tasks including parametric studies, optimization, and real-time control. Projection-based reduced-order models (ROMs) alleviate this cost by projecting the governing dynamics onto low-dimensional subspaces. However, their performance can deteriorate under parameter variation, motivating the need for adaptive basis construction. In this work, we propose a constrained ensemble learning framework, termed Constrained Extreme Gradient Boosting (cXGBoost), for predicting Proper Orthogonal Decomposition (POD) bases as functions of system parameters. The approach leverages a geometric representation of subspaces on the Grassmann manifold, which are mapped to a Euclidean space to enable efficient regression using gradient boosting trees. A norm constraint is imposed during training to ensure the validity of the inverse mapping and preserve the geometric structure of the predicted subspaces. The proposed method is evaluated on four numerical examples, including fluid dynamics and wave propagation problems, demonstrating its ability to accurately predict parameter-dependent bases while maintaining robustness across nonlinear regimes. These results highlight the potential of combining geometric learning with constrained ensemble methods for scalable and reliable reduced-order modeling of high-dimensional parametric systems.

[52]  arXiv:2605.04132 [pdf, ps, other]
Title: Two Integration Pathways in Human-Centered Requirements Engineering: A Systematic Mapping Study of Structural Gaps
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)

Human-centered Requirements Engineering (HC-RE) integrates user cognition, emotions, and social interactions into the RE process through contributions from disciplines such as psychology, cognitive science, design thinking, and human-computer interaction. Despite growing interest, how these multidisciplinary contributions are structured and why they remain fragmented across the RE lifecycle is not well understood.
This systematic mapping study analyzes 56 primary studies across seven dimensions, including RE phases, user involvement techniques, contributing disciplines, and evaluation methods. Results show that 70% of approaches involve multidisciplinary contributions, yet only 39% have been empirically evaluated and 48% address only the elicitation phase. A cross-study analysis reveals a structural separation between two parallel integration traditions: a Cognitive-Formal (C-F) pathway grounded in goal-based frameworks and formal modeling, and a Participatory-Iterative (P-I) pathway grounded in scenario-based frameworks and iterative design. Each pathway has developed complementary strengths, but their near-total disconnection explains the persistent lifecycle concentration and theory-practice gap observed in the corpus.
The findings identify the absence of translation mechanisms between human-centered artifacts and formal RE specifications as the field's primary structural gap, provide a structured research agenda organized into four priority tiers, and establish the empirical foundation for Experience-Centered Requirements Engineering, a direction in which user experience is explicitly operationalized as a first-class concern in requirements specification.

[53]  arXiv:2605.04134 [pdf, ps, other]
Title: Model synthesis and identifiability analysis of stiff chemical reaction systems with inVAErt networks
Subjects: Machine Learning (cs.LG)

We consider the problem of learning data-driven replicas for stiff systems of ordinary differential equations arising in chemical kinetics that can be evaluated with high computational efficiency. We first focus on training emulators for families of reaction equations under varying reaction rates, using conditional residual networks or long short-term memory architectures. We then apply a recently proposed data-driven framework known as "inVAErt networks" to address the ill-posed inverse problem of inferring reaction rates, integration time, and possibly initial conditions from a target set of species concentrations, a problem that has received relatively little attention in the literature. The proposed approach is demonstrated on chemical systems with reversible and irreversible kinetics, spanning 2 to 20 differential equations, 3 to 20 chemical species, and 3 to 25 reaction rate parameters. Relative root mean squared errors produced by the proposed emulators range from $10^{-5}$ for lower-dimensional systems to $10^{-4}$ and $10^{-3}$ for an air pollution model and a hydrogen-air reaction system, respectively. Manifolds of non-identifiable reaction rates recovered by the proposed approach can be analytically verified for simple systems and are consistent with local identifiability analysis in higher dimensions.

[54]  arXiv:2605.04135 [pdf, ps, other]
Title: Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation
Comments: 60 pages, 9 figures, 7 tables, 8 appendices. Pre-registered on OSF: this https URL (DOI: 10.17605/OSF.IO/7XM3D, registered 2026-04-17). Companion artefacts: VERSIO-AI v1.2 reporting checklist (Appendix A; CC-BY-4.0); frontierlag Python package (this https URL, MIT) and per-DOI audit tool at this https URL
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-4o-mini zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse configuration details and abstracted upward into claims about "AI" that propagate through citations, media, and policy. We measure the 'publication elicitation gap' (the gap between these answers) in a pre-registered audit of 112,303 LLM-keyword-matched candidate records (2022-01 to 2026-04; 18,574 admissible, 4,766 full-paper texts retrievable), comparing tested models to the contemporaneous frontier on the Epoch AI Capabilities Index (ECI), reproduced under Arena Elo and Artificial Analysis.
The median paper evaluates a model +10.85 ECI (~1.4x the distance between Claude Sonnet 3.7 and Claude Opus 4.5) behind the contemporaneous frontier at evaluation time (H1); an exploratory rational-lag baseline (H8) decomposes this into ~25% peer-review latency, ~75% excess lag. The gap is widening at +5.53 ECI/year (H2; 95% CI [+5.03, +5.83]). Meanwhile, only 3.2% of abstracts (21.2% of full-texts) disclose reasoning-mode status on reasoning-capable models (H4) and 52.5% (95% CI [48.2, 56.9]) state conclusions at the level of "AI" rather than the evaluated model(s), rising at OR = 1.23/year.
Proposed remedies include API-access subsidies and editorial enforcement of reporting frameworks mandating configuration-surface disclosure (model snapshot, reasoning mode/effort, tool access, scaffolding, prompting, etc.); VERSIO-AI is a 13-item checklist (Core 3 desk-reject) extending existing frameworks at the elicitation surface, with per-DOI analysis at frontierlag.org.

[55]  arXiv:2605.04157 [pdf, ps, other]
Title: FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals
Subjects: Computation and Language (cs.CL)

SemEval-2026 Task 13 investigates machine-generated code detection across multiple programming languages and application scenarios, asking participating systems to generalize to unseen languages and domains. This paper describes our participation in Subtask A (binary classification) and explores both pretrained code encoders and lightweight feature-based methods. We design ratio-based features that are less sensitive to snippet length. To support the extraction of descriptiveness-related signals, we use parsing engines and a programming-language classifier. Additionally, we train a separate code-vs-text line classifier to identify raw natural language segments embedded within samples. We combine a shallow decision tree with heuristic rules derived from data analysis to produce the final predictions. Our approach is computationally efficient, requires only CPU resources for training, and achieves near-instant inference time, offering a lightweight alternative to large pretrained models.

[56]  arXiv:2605.04164 [pdf, ps, other]
Title: Enabling Real-Time Training of a Wildfire-to-Smoke Map with Multilinear Operators
Comments: 27 pages
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Computational Physics (physics.comp-ph)

Wildfires are a major producer of fine particulate matter, impacting human health and the electrical grid. Accurately forecasting smoke impacts over long time scales requires incorporating fuel treatment strategies, natural fuel succession, and stochastic events like lightning strikes. However, predicting smoke for each fuel distribution with a forward simulation of a coupled fire-atmosphere model is computationally infeasible. Moreover, relatively simple fire models are tractable to run in many long-time scenarios but do not capture smoke transport. We use data-driven multilinear operators to predict a smoke concentration field from knowledge of the time since ignition for two quantities of interest: aerosol optical depth and smoke detection. Our method first computes the principal components of time-since-ignition and smoke concentration fields and then learns a map from powers of the input coefficients to the output coefficients. We apply our learned operator to smoke prediction in the Upper Rio Grande Watershed. After collecting training data, learning the approximation weights on a CPU takes less than 30 seconds, and each forward call takes less than 1 ms. On a proxy for aerosol optical depth, we obtain equal accuracy to Monte Carlo sampling with fewer than half as many coupled model calls. For smoke detection, we obtain an intersection-over-union (IoU) of 65% and an area under the receiver operating characteristic curve (AUC) of 0.95 on holdout data. Our method is significantly more accurate than the most similar published smoke classifier, which obtains an IoU and AUC of 0.15 and 0.61, respectively, on a 2015 bushfire in Australia.
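
The two-stage recipe described in this abstract (principal components first, then a map from powers of input coefficients to output coefficients) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, only elementwise powers are used (no cross terms between coefficients), and the map is fitted by ordinary least squares.

```python
import numpy as np


def pca_basis(snapshots, rank):
    """Return (mean, basis); basis has `rank` orthonormal columns. Rows are samples."""
    mean = snapshots.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, vt[:rank].T


def power_features(coeffs, degree):
    """Stack a constant column with elementwise powers 1..degree (assumed feature set)."""
    columns = [np.ones((coeffs.shape[0], 1))]
    columns += [coeffs ** p for p in range(1, degree + 1)]
    return np.hstack(columns)


def fit_operator(inputs, outputs, rank_in, rank_out, degree):
    """Fit a least-squares map from powers of input PCA coefficients to output ones."""
    mean_in, basis_in = pca_basis(inputs, rank_in)
    mean_out, basis_out = pca_basis(outputs, rank_out)
    a = (inputs - mean_in) @ basis_in      # input coefficients
    b = (outputs - mean_out) @ basis_out   # output coefficients
    weights, *_ = np.linalg.lstsq(power_features(a, degree), b, rcond=None)

    def predict(new_inputs):
        a_new = (new_inputs - mean_in) @ basis_in
        b_new = power_features(a_new, degree) @ weights
        return b_new @ basis_out.T + mean_out  # lift back to the full field

    return predict
```

Fitting reduces to one SVD per field and one linear solve, which is consistent with the kind of fast CPU training the abstract reports, although the actual operator may differ from this sketch.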

[57]  arXiv:2605.04165 [pdf, ps, other]
Title: FlowEval: Reference-based Evaluation of Generated User Interfaces
Subjects: Multiagent Systems (cs.MA); Human-Computer Interaction (cs.HC)

While large language models (LLMs) and coding agents are often applied to user interface (UI) development, developers find it difficult to reliably assess their proficiency in visual and interaction design. Existing evaluations either rely on human experts, who can accurately assess usability by testing critical flows but are slow and costly, or on automated judges, which are scalable but less accurate and opaque. We present FlowEval, a reference-based framework that measures whether a generated UI supports realistic interaction flows by comparing navigation traces from real websites to traces from generated analogs using reference-based similarity metrics (e.g., dynamic time warping). In a small-scale study with expert UI evaluators, we show that reference-based metrics strongly correlate with human judgments, suggesting that they can provide scalable yet trustworthy evaluation for UI generation systems.
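
Dynamic time warping, the example metric named above, aligns two sequences that may progress at different speeds. The following is a minimal Python sketch of classic DTW applied to navigation traces; the representation of a trace as a list of page labels and the 0/1 step cost are assumptions for illustration, not FlowEval's actual trace format or metric configuration.

```python
def dtw_distance(trace_a, trace_b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Classic dynamic time warping distance between two sequences.

    `cost` compares one step from each trace; the default 0/1 mismatch
    cost is an assumption chosen for this example.
    """
    n, m = len(trace_a), len(trace_b)
    inf = float("inf")
    # table[i][j] holds the best alignment cost of the first i and j steps.
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(trace_a[i - 1], trace_b[j - 1])
            table[i][j] = step + min(
                table[i - 1][j],      # trace_a advances, trace_b repeats
                table[i][j - 1],      # trace_b advances, trace_a repeats
                table[i - 1][j - 1],  # both advance
            )
    return table[n][m]


reference = ["home", "search", "cart"]
generated = ["home", "search", "search", "cart"]
print(dtw_distance(reference, generated))
```

Because warping lets one step match several consecutive steps in the other trace, a generated UI that needs an extra visit to the same page is not penalized under this cost, whereas a visit to a different page is.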

[58]  arXiv:2605.04169 [pdf, ps, other]
Title: Actionable Real-Time Modeling of Surgical Team Dynamics via Time-Expanded Interaction Graphs
Comments: Accepted at Hybrid Human Artificial Intelligence (HHAI) 2026
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Surgical team performance arises from complex interactions between technical execution and non-technical skills, including communication and coordination dynamics. However, current surgical AI systems predominantly model visual workflow signals, lacking structured representations of intraoperative team interactions over time. We propose a real-time actionable approach for modeling surgical team dynamics using time-expanded interaction graphs, where team members are modeled as time-indexed nodes and communication exchanges define directed edges. This spatio-temporal expansion enables dynamic interaction modeling, while allowing efficient inference with a static graph neural network. The model predicts procedural efficiency as the deviation from the expected duration and supports real-time deployment. Beyond prediction, we perform a counterfactual analysis to identify minimal changes in communication structure and interpretable behavioral variables associated with improved predicted outcomes. Experiments on recorded surgical procedures show that structured modeling of team interactions improves early identification of prolonged interventions and provides coherent, actionable explanations. This work advances surgical AI toward real-time, team-aware, and actionable decision support in the operating room.
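
To make the time-expanded construction concrete, here is a minimal Python sketch that builds such a graph from a list of communication events. The function name, the `(time_step, speaker, listener)` event format, and the temporal edges linking each member to their own node at the next step are assumptions for this example; the abstract only specifies time-indexed member nodes and directed communication edges.

```python
def build_time_expanded_graph(members, exchanges, num_steps):
    """Build a static graph whose nodes are (member, time_step) pairs.

    exchanges: iterable of (time_step, speaker, listener) tuples (assumed format).
    Returns (nodes, edges), where each edge is a (source_node, target_node) pair.
    """
    nodes = [(member, t) for t in range(num_steps) for member in members]
    edges = set()

    # Assumed temporal edges: each member connects to themselves at the next step,
    # so information can propagate forward in time through the static graph.
    for t in range(num_steps - 1):
        for member in members:
            edges.add(((member, t), (member, t + 1)))

    # Directed communication edges within the time step of the exchange.
    for t, speaker, listener in exchanges:
        if 0 <= t < num_steps:
            edges.add(((speaker, t), (listener, t)))

    return nodes, sorted(edges)


nodes, edges = build_time_expanded_graph(
    members=["surgeon", "nurse"],
    exchanges=[(0, "surgeon", "nurse")],
    num_steps=2,
)
```

Unrolling time into the node set is what lets a static graph neural network consume the result: the temporal structure is encoded in the graph itself rather than in a recurrent model.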

[59]  arXiv:2605.04171 [pdf, ps, other]
Title: Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing
Comments: 6 pages, 4 figures, 2 tables, conference accepted and presented paper
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) show extraordinary abilities, but they are still prone to hallucinations, especially when used to generate academic content. We investigated four popular LLMs, ChatGPT, Grok, Gemini, and Copilot, for hallucinations specifically in academic writing. We designed 80 prompts across four categories: reference generation, factual explanation, abstract generation, and writing improvement. We evaluated the models using a 0-5 rubric score that checks factual accuracy, reference validity, coherence, style consistency, and academic tone. A novel weighted metric, the Hallucination Index (HI), was introduced to measure hallucination in the responses generated by the models. Some of the most widely used evaluation metrics often fail to catch errors that alter sentiment in machine-translated text. We found that Grok and Copilot perform better on reference generation tasks but often struggle with abstract or stylistic prompts, with HI values of 0.67 and 0.70, respectively. By contrast, Gemini and ChatGPT show stronger tone control but are weaker on factual writing tasks and carry higher hallucination risk, with HI scores of 0.53 and 0.57, respectively. Our study found that hallucination behavior depends not only on model architecture but also on the type of task and the prompting conditions provided. We propose that our work opens new research dimensions for future researchers.

[60]  arXiv:2605.04172 [pdf, ps, other]
Title: täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies (Extended Version)
Comments: 19 pages, 18 Figures. Conference Version of Paper to be published at ISCA 2026
Subjects: Hardware Architecture (cs.AR); Logic in Computer Science (cs.LO)

Accelerators provide large performance and energy-efficiency benefits, but can significantly change the hardware-software interface. The täkō programmable memory hierarchy accelerates data movement by enabling programmers to run user-defined callback functions triggered by cache misses, evictions, and writebacks. However, it also leads to drastically increased complexity and counterintuitive outcomes. In response, we develop an ISA-level memory consistency model (MCM) for täkō that captures the semantics of its operation, and we show how it enables programmers to formally reason about their täkō programs. We also prove the soundness of this ISA-level MCM by constructing a detailed täkō implementation model and verifying that all executions of the implementation model are allowed by our ISA-level MCM. Along the way, we discover useful insights about microarchitectural modeling and verification that are applicable to hardware in general.
This is the extended version of the ISCA 2026 paper "täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies". This version adds material on additional litmus tests to Section V to further explore the programmability of täkō using our ISA-level MCM.

[61]  arXiv:2605.04175 [pdf, ps, other]
Title: A Provably Convergent and Practical Algorithm for Gromov--Wasserstein Optimal Transport
Authors: Ling Liang, Lei Yang
Comments: 21 Pages, 6 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Gromov--Wasserstein optimal transport (GWOT) aligns metric measure spaces by matching their within-domain relational structures, but large-scale GWOT remains challenging because its objective is nonconvex and projection onto the transport polytope is often solved only approximately in practice. This leads to a gap between practical projected-gradient implementations and convergence theory, which typically assumes exact projections. For squared-loss GWOT, we propose an inexact projected-gradient framework with a verifiable feasibility-residual-based inexact condition for the projection subproblem. This condition is directly computable and avoids unknown quantities such as the exact projection point. Under this implementable condition, we prove subsequential convergence to stationary points and, with a mild tolerance-decay condition, convergence of the whole sequence. The resulting method retains the simplicity and sparsity of projected-gradient schemes while providing rigorous convergence guarantees, turning projected-gradient methods into a principled and scalable approach for GWOT with provable reliability.

[62]  arXiv:2605.04177 [pdf, ps, other]
Title: Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

As LLMs enter conflict monitoring, understanding systematic distortions in their outputs is critical for humanitarian accountability. We evaluate four vanilla open-weight models (Gemma 3 4B, Llama 3.2 3B, Mistral 7B, and OLMo 2 7B) and two domain-adapted models (AfroConfliBERT and AfroConfliLLAMA) on Nigeria and Cameroon conflict-event classification against ACLED, a gold-standard dataset with multi-stage verification. We find a bifurcated divergence in normative directionality. Open-weight models exhibit statistically significant False Illegitimation bias: Gemma misclassifies up to 18.29% of legitimate battles as civilian-targeted violence while making zero False Legitimation errors. By contrast, AfroConfliBERT and AfroConfliLLAMA achieve near-directional neutrality, with Legitimization Bias differences indistinguishable from zero. Yet domain adaptation does not eliminate actor-based selection bias. Both adapted models show statistically significant actor bias comparable to vanilla LLMs; in Nigeria, state actors are legitimized 36.5% more often than non-state actors in identical tactical contexts. Open-weight outputs are also fragile to geography-specific lexical framing: delegitimizing phrases produce flip rates up to 66.7% in Cameroon and 34.2% in Nigeria, while perturbations salient in one context may not matter in another. Error trace profiling shows models mask normative bias through unfaithful rationale confabulations. In contrast, AfroConfliBERT and AfroConfliLLAMA are largely robust, with near-zero flip rates across perturbation categories. Overall, current models are not ready for unsupervised deployment in conflict monitoring. We call for fairness-aware fine-tuning to reduce actor-based selection bias, mandatory adversarial robustness evaluation against lexical manipulation, and context-specific human-in-the-loop oversight calibrated to regional difficulty.

[63]  arXiv:2605.04178 [pdf, ps, other]
Title: Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)

Rapidly evolving GPU architectures featuring complex memory hierarchies, matrix units, and varied precision formats continue to widen the gap between theoretical peaks and achievable performance. We design and develop analytical performance models for NVIDIA Blackwell (B200) and AMD CDNA3 (MI300A) grounded in systematic microbenchmark characterization. For Blackwell, the model captures Tensor Memory (TMEM), asynchronous bulk copy (TMA), and 5th-generation tensor cores; for CDNA3, the model captures Infinity Cache hierarchy, VGPR constraints, and occupancy. Validation yields 1.31% MAE on B200 (21 kernels) and 0.09% on MI300A (27 kernels), while naive roofline baselines exceed 95% error on the same kernels. We further validate the models using Rodinia 3.1 and SPEChpc 2021 Tiny. The models are updated with HBM bandwidth, capacity, and cache parameters and applied to H200 (Hopper) and MI250X (CDNA2), indicating that no major restructuring of the models is needed. All models and benchmarks will be released as open source upon acceptance.

[64]  arXiv:2605.04180 [pdf, ps, other]
Title: MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models exhibit strong reasoning and semantic understanding capabilities but often hallucinate in domains that require expert knowledge, among which fabrications, the generation of factually incorrect yet fluent statements, pose the greatest risk in medical contexts. Existing medical hallucination datasets inadequately capture fabrication phenomena due to limited fabrication coverage, stylistic disparities between human and LLM-authored texts, and distributional drift during hallucinated sample synthesis. To address this, we propose a data-centric pipeline to generate realistic, word-level fabrications that preserve syntactic and stylistic fidelity while introducing subtle factual deviations, resulting in MedFabric. Building upon this dataset, we introduce EtHER, a modular word-level fabrication detector integrating Text2Table Decomposition, Word Masking and Filling, and Hybrid Sentence Pair Evaluation to enhance factual alignment. Empirical results demonstrate that MedFabric outperforms state-of-the-art detectors by over 15% on word-level fabrication benchmarks while maintaining consistent performance across structural similarities, offering a comprehensive framework for reliable and domain-specific factuality detection.

[65]  arXiv:2605.04183 [pdf, ps, other]
Title: Nearly-Tight Bounds for Zonotope Containment and Beyond
Subjects: Data Structures and Algorithms (cs.DS); Metric Geometry (math.MG)

We investigate the convex-body containment problem $\max\{s >0 : s Z \subseteq Q\}$, where the outer body $Q \subseteq \mathbb R^d$ is described by a membership oracle and the inner body $Z \subseteq \mathbb R^d$ is a zonotope. Our main result is a sampling-based $O(\sqrt{d})$-approximation algorithm for this problem that almost matches the lower bound of $\Omega(\sqrt{d/\log d})$ by Khot and Naor in the oracle model. Assuming zonotopes can be sparsified by a linear number of generators, which is referred to as Talagrand's conjecture, our approach attains the optimal approximation factor of $\Theta(\sqrt{d/\log d})$. Our second main result is a proof of Talagrand's conjecture for $\Delta$-modular zonotopes whenever $\Delta$ is constant. Those zonotopes are of the form $Z = \{ Wx \colon \| x\|_\infty \leq 1\}$ where the non-zero $d \times d$ sub-determinants of $W$ are between $1$ and $\Delta$. This result establishes a connection between zonoid sparsification and spectral sparsification of Batson, Spielman and Srivastava. We complement these results with a universal $\Omega(\sqrt{d/\log d})$ lower bound holding for all zonotopes.
Finally, we consider containment problems $\max\{s >0 : s K \subseteq Q\}$, for general convex bodies $K \subseteq \mathbb R^d$. A result of Naszódi on approximating $K \subseteq \mathbb R^d$ by a polytope implies a $\Theta(d/\log d)$ approximation algorithm in polynomial time. We show the tightness of this approximation factor in the oracle model via a reduction to the circumradius computation. Our lower bound holds for centrally symmetric convex sets, implying that Barvinok's optimal $O(\sqrt{d})$-approximation of a centrally symmetric convex body by a polytope with a polynomial number of vertices cannot be computed in polynomial time.

[66]  arXiv:2605.04185 [pdf, ps, other]
Title: Constraint-Enhanced Reinforcement Learning Based on Dynamic Decoupled Spherical Radial Squashing
Comments: 27 pages, 60 figures
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

When deploying reinforcement learning policies to physical robots, actuator rate constraints -- hard limits on how fast each joint can move per control step -- are unavoidable. These limits vary substantially across joints due to differences in motor inertia, power bandwidth, and transmission stiffness, creating pronounced heterogeneity that existing methods fail to handle geometrically: the per-joint feasible region forms a high-dimensional box in action-increment space, yet QP projection and spherical parameterization methods impose isotropic ball-shaped constraints, exponentially under-covering the true feasible set as heterogeneity grows. This paper proposes Dynamic Decoupled Spherical Radial Squashing (DD-SRad), which resolves this mismatch by computing a position-adaptive radius independently for each actuator, achieving tight alignment with the true per-joint feasible region. DD-SRad satisfies per-step hard constraints with probability 1, preserves well-conditioned gradients throughout training, and admits exact policy gradient backpropagation with zero runtime solver overhead. MuJoCo benchmark experiments demonstrate the highest task return at zero constraint violation -- matching the unconstrained upper bound -- with 30%--50% improvement in constraint-space coverage over spherical baselines. High-fidelity IsaacLab simulations with Unitree H1 and G1 humanoid robots confirm end-to-end optimality parameterized directly from official joint specifications, validating a systematic pathway from hardware datasheets to safe deployment.
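
The geometric mismatch between a box-shaped feasible set and an isotropic ball can be made concrete with a small sketch. This is not DD-SRad: the per-axis `tanh` squashing is a generic stand-in that ignores the position-adaptive radius, and the function names and limit values are hypothetical.

```python
import math


def squash_per_joint(raw_action, limits):
    """Map unbounded outputs into the box |delta_i| <= limits[i], axis by axis."""
    return [lim * math.tanh(u) for u, lim in zip(raw_action, limits)]


def ball_to_box_volume_ratio(limits):
    """Fraction of the feasible box covered by the largest centered ball.

    The ball radius is the tightest per-joint limit; the ball volume uses
    V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d.
    """
    d = len(limits)
    r = min(limits)
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d
    box = math.prod(2 * lim for lim in limits)
    return ball / box


# Hypothetical per-step rate limits (rad/step) for six heterogeneous joints.
limits = [0.05, 0.05, 0.10, 0.20, 0.40, 0.40]
delta = squash_per_joint([3.0, -1.0, 0.2, 0.0, 5.0, -5.0], limits)
coverage = ball_to_box_volume_ratio(limits)
```

With equal limits in two dimensions the ratio is pi/4. It shrinks quickly as the number of joints grows or as the limits become more heterogeneous, which is the under-coverage the abstract refers to.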

[67]  arXiv:2605.04188 [pdf, ps, other]
Title: A Multi-Agent Consensus Protocol for Stable Software Remodularization
Authors: Ahmed F. Ibrahim
Subjects: Software Engineering (cs.SE)

Automatic software remodularisation is typically cast as a single-objective optimization problem. While recent metaheuristics have improved search efficiency, real-world architecture recovery must reconcile the conflicting attributes of structural cohesion and evolutionary stability. We reframe software module clustering as a distributed consensus problem among autonomous agents. We introduce an Asymmetric Monotonic Concession Protocol (AMCP) that enables agents to negotiate decompositions that respect multi-attribute utility thresholds. We formally prove the protocol's termination, its bounded concession behaviour consistent with the Zeuthen Strategy under closed-instance conditions, and the local Pareto-satisfactoriness of the resulting partitions. Preliminary experiments on a synthetic benchmark and the Xwork Java framework confirm that our negotiated consensus matches state-of-the-art optimizers when stability budgets are loose, while acting as a "circuit breaker" to enforce strict stability constraints. Extended results on ten further systems, including comparisons with multi-objective evolutionary algorithms and multi-version chains, will be reported in a forthcoming full paper.

[68]  arXiv:2605.04189 [pdf, ps, other]
Title: Exploring the Output of Software Testing Tools through a Visual Comparative Analysis
Subjects: Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)

Software testing is a fundamental process of software development, and prior work has shown that visualizations of test results support testers' decision-making. However, Human-Computer Interaction research on software testing has yet to explore and understand the shared interface elements and patterns in visualization of testing outputs. To address this, we conducted a visual comparative analysis of the output of 50 software testing tools and harnesses (44 with CLI output, 6 with GUI output) across four popular programming languages. Our analysis reveals the common interface elements in software testing tools, how these tools display and visualize test results, as well as the specific make-up of the output. Our findings provide insight on how visual testing output is formatted and how colour is used across both CLI and GUI environments, identifying trends that can be applied by developers of testing tools.

[69]  arXiv:2605.04193 [pdf, ps, other]
Title: ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor
Comments: 35 pages, 8 figures, 10 tables
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)

Inductive Logic Programming (ILP) aims to learn interpretable first-order rules from data, but existing symbolic and neuro-symbolic approaches struggle to scale to noisy and probabilistic settings. Classical ILP relies on discrete combinatorial rule search and is brittle under uncertainty, while differentiable ILP methods typically depend on predefined rule templates or inaccurate fuzzy operators that suffer from vanishing gradients or poor approximation of logical structure when reasoning over probabilistic predicate valuations. This paper proposes an Attention-based Neuro-symbolic Differentiable Rule Extractor (ANDRE), a novel ILP framework that learns first-order logic programs by optimizing over a continuous rule space with attention-based logical operators. ANDRE replaces both rule templates and logical operators with fully differentiable, attention-driven conjunction and disjunction operators that approximate logical min-max semantics, enabling accurate, stable, and interpretable reasoning over probabilistic data. By softly selecting, negating, or excluding predicates within each rule, ANDRE supports flexible rule induction while preserving symbolic structure. Extensive experiments on classical ILP benchmarks, large-scale knowledge bases, and synthetic datasets with probabilistic predicates and noisy supervision demonstrate that ANDRE achieves competitive or superior predictive performance while reliably recovering correct symbolic rules under uncertainty. In particular, ANDRE remains robust to moderate label noise, substantially outperforming existing differentiable ILP methods in both rule extraction quality and stability.

[70]  arXiv:2605.04194 [pdf, ps, other]
Title: Coupled-NeuralHP: Directional Temporal Coupling Between AI Innovation Exposure and Public Response
Subjects: Computers and Society (cs.CY)

Artificial intelligence innovation exposure and public response co-evolve, but innovation arrives as irregular event streams while response is observed monthly. We introduce Coupled-NeuralHP, a hybrid event-plus-state model linking eight-domain USPTO AI patent publication streams to a train-only Google Trends response index. Under the cleaned response protocol, the validation-selected one-way real-data variant gives the best held-out innovation count forecasts in the registered comparison set (pseudo-log-likelihood -30.4 vs. -34.7; root mean squared error (RMSE) 471 vs. 532) while matching the stronger multi-lag factor-family baseline on response RMSE (0.295). Ablations show that the real-data response signal is carried mainly by the structured forecast head, whereas the reverse response-to-innovation block is not supported on held-out count prediction. Across 60 semi-synthetic replications with known structure, the broader coupled family recovers innovation-to-response links much better than vector autoregression with exogenous inputs (VARX) (F1 = 0.734 vs. 0.386). A placebo-controlled 2022 split-date analysis finds no robust milestone-specific regime break.

[71]  arXiv:2605.04196 [pdf, ps, other]
Title: The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation
Subjects: Computation and Language (cs.CL)

Knowledge transfer, especially across related languages, has been found beneficial for multilingual neural machine translation (MNMT), but some aspects are still under-explored and deserve further investigation. A joint vocabulary is most often applied to form a uniform word embedding space, but since the impact of a disjoint vocabulary on model performance is far less studied, there is no consensus on how much of the knowledge transfer is due to vocabulary overlap. In this paper, we present systematic experiments with joint and disjoint vocabularies, and with auxiliary languages related and unrelated to the source language. We design these experiments in an out-of-domain setup in order to emphasize transfer and the impact of the auxiliary language. As expected, we obtain better results with the more extensive vocabulary overlaps typical of related languages, but our experiments also show that domain match and language relatedness are more important than a joint vocabulary.

[72]  arXiv:2605.04198 [pdf, ps, other]
Title: Deep Wave Network for Modeling Multi-Scale Physical Dynamics
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn); Plasma Physics (physics.plasm-ph)

Performance of deep learning models is strongly governed by architectural capacity, with width and depth as primary controls. However, in physical-science applications, models are often compared at a single fixed size or by separating accuracy and computational cost, which can be misleading since architectures exhibit different accuracy-cost scaling as width and depth vary. This issue is particularly relevant for U-Net-type encoder-decoder models, widely used for multi-scale gas, fluid, and plasma dynamics due to their ability to represent features across spatial scales. A U-Net constructs a multi-resolution representation via an encoder that progressively reduces spatial resolution, followed by a decoder that restores it for prediction. Skip connections link corresponding encoder and decoder features, preserving fine-scale information and improving optimization. In practice, U-Net width is routinely tuned, while depth is typically kept fixed (a set number of down/up-sampling stages with few convolutions per stage), limiting systematic exploration of depth for improving the accuracy-cost trade-off. We address this limitation by increasing effective depth through stacking multiple encoder-decoder "waves" in series, with skip connections both within and across waves to enable progressive cross-scale refinement. We call this architecture a Deep Wave Network (DW-Net). Training data, optimization, and schedules are kept identical across models. Instead of evaluating single configurations, we train multiple width variants of each architecture and compare accuracy vs. GPU time Pareto fronts. Across several 2D and 3D flow benchmarks, DW-Net models consistently improve the Pareto frontier over single-wave U-Nets, achieving higher accuracy at matched cost or similar accuracy at reduced cost, and reaching low-error regimes with up to 3x less training time under identical training settings.

[73]  arXiv:2605.04201 [pdf, ps, other]
Title: Topology-Constrained Quantized nnUNet for Efficient and Anatomically Accurate 3D Tooth Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a topology-constrained quantized nnUNet framework for efficient and anatomically accurate 3D tooth segmentation, addressing the challenges of spatial distortion introduced by quantization in deep learning models. The proposed method integrates a novel tooth-specific topological loss into quantization-aware training, preserving critical anatomical structures such as tooth count, adjacency relationships, and cavity integrity while maintaining computational efficiency. The system employs an 8-bit quantized nnUNet backbone, where weights and activations are dynamically calibrated to minimize precision loss during inference. Furthermore, the topological loss combines connected-component analysis, adjacency consistency, and hole detection penalties, ensuring anatomical fidelity without modifying the underlying network architecture. The joint optimization objective harmonizes cross-entropy loss, quantization regularization, and topological constraints, enabling end-to-end training with gradient approximations for persistent homology terms. Experiments demonstrate that our approach significantly reduces topological errors compared to conventional quantized models, achieving clinically plausible segmentations on dental CBCT scans. The method retains the hardware efficiency of integer-only inference, making it suitable for deployment in resource-constrained clinical environments. This work bridges the gap between computational efficiency and anatomical precision in medical image segmentation, offering a practical solution for real-world dental applications.

[74]  arXiv:2605.04202 [pdf, ps, other]
Title: Sequential Strategic Classification with Multi-Stage Selective Classifiers
Comments: Shorter version presented as a poster at GameNets 2026
Subjects: Machine Learning (cs.LG)

Strategic classification studies the problem where self-interested individuals or agents manipulate their response to obtain favorable decision outcomes made by classifiers, typically turning to dishonest actions when they are less costly than genuine efforts. Prior works have demonstrated a fundamental inability to get out of this conundrum by only focusing on the design of a classifier. We note that prior work also heavily focuses on either one-shot settings or repeated interaction with the same classifier. Real-world decision making is often multi-stage, involving a sequence of potentially different classifiers as an agent progresses. This paper introduces a sequential, stochastic, multi-stage model of strategic classification, by capturing how agents adapt their behavior, through improvement actions (enhancing both observable features and true attributes) and gaming actions (enhancing only observable features), over multiple levels of classification with increasing difficulty as well as reward. For each level, we adopt a selective classifier that can abstain from making a prediction at low confidence. Consequently, a positive (resp. negative) outcome leads to promotion (resp. demotion) of the agent to the next higher (resp. lower) level, while abstention keeps the agent at the same level. We fully characterize the agent's optimal instantaneous action under selective classifiers and compare the long-term properties and utility of the agent repeatedly following an optimal myopic policy of either no-improvement (never choose the improvement action) or no-gaming (never choose the gaming action). We further examine design principles over the sequence of classifiers that yield higher long-term utility for the latter policy, thereby effectively incentivizing genuine effort in the long run.
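
The level dynamics described above (promotion on a positive outcome, demotion on a negative one, no change on abstention) can be sketched as follows. The threshold-based selective classifier, the clamping at the lowest and highest levels, and all names are assumptions made for illustration.

```python
def selective_decision(score, accept_threshold, reject_threshold):
    """Predict only when confident; otherwise abstain."""
    if score >= accept_threshold:
        return "positive"
    if score <= reject_threshold:
        return "negative"
    return "abstain"


def next_level(level, decision, top_level):
    """Promote, demote, or hold the agent depending on the outcome."""
    if decision == "positive":
        return min(level + 1, top_level)  # assumption: clamp at the top level
    if decision == "negative":
        return max(level - 1, 0)          # assumption: clamp at level 0
    return level                          # abstention keeps the level


# Hypothetical trajectory of observed scores for one agent.
level = 1
for score in [0.9, 0.5, 0.1, 0.95]:
    outcome = selective_decision(score, accept_threshold=0.8, reject_threshold=0.2)
    level = next_level(level, outcome, top_level=3)
```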

[75]  arXiv:2605.04204 [pdf, ps, other]
Title: Symmetry-induced quantum-inspired parallelism of classical dynamic systems
Comments: 24 pages, 5 figures
Subjects: Emerging Technologies (cs.ET)

Performing multiple computations within the same system, without spatial or temporal separation of tasks, requires encoding multiple data items into a well-defined physical state. The most widely explored mechanism for such encoding is the superposition of physical states representing computational states. However, superposition requires the system to be linear, which significantly limits the set of achievable operations. We show that system symmetries provide an alternative mechanism for encoding multiple computational states. Notably, this mechanism also applies to nonlinear systems and therefore does not impose inherent limits on computed functions. Using the evaluation of Boolean functions as an example, we show that a relaxed spin network driven by the V-2 model supports this mechanism. We relate the resulting simultaneous computations enabled by symmetry-induced parallelism to properties of the evaluated functions. We demonstrate symmetry-induced parallelism for a logical AND/OR gate and an N-bit adder.

[76]  arXiv:2605.04206 [pdf, ps, other]
Title: Climate-based Pre-screening of Self-sustaining Regreening Opportunities in Drylands: A Case Study for Saudi Arabia
Subjects: Machine Learning (cs.LG)

Large-scale restoration in drylands is widely promoted to address land degradation and biodiversity loss, yet many efforts rely on long-term irrigation, limiting sustainability in water-scarce regions. A key challenge is identifying locations where native vegetation can persist without intensive management while minimizing costly field campaigns. A scalable pre-screening framework is presented that integrates climate and remote sensing data to enable cost-efficient site selection in arid environments, using Saudi Arabia as a case study. A Climate Suitability Score (CSS), derived from machine learning models trained on expert-curated reference sites, captures the complex dependence of vegetation persistence on climate. Using multi-year ERA5-Land data for Saudi Arabia, national-scale prediction maps are generated and combined with vegetation indices to identify areas where climate is favorable but vegetation remains underdeveloped. Multi-criteria screening reduces candidates to thirteen priority locations. Climatically analogous intact ecosystems provide benchmarks for restoration targets and indicate that an average 2.5-fold increase in vegetation coverage is a realistic target for restoration efforts. Overall, this approach narrows the search space, reduces costs, and supports resilient ecosystem recovery planning in water-limited regions.

[77]  arXiv:2605.04208 [pdf, ps, other]
Title: Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) have demonstrated impressive multilingual capabilities for well-resourced languages, yet their performance on low-resource African languages remains poorly understood and largely unevaluated. This paper presents Nsanku, a systematic benchmark that evaluates the zero-shot machine translation performance of 19 open-weight and proprietary LLMs across 43 Ghanaian languages paired with English. Evaluation sentences were sourced from the YouVersion Bible platform, providing 300 sentence pairs per language. Two complementary automatic metrics are employed: Bilingual Evaluation Understudy (BLEU) and Character n-gram F-Score (chrF), alongside an average accuracy score and a cross-language consistency dimension. Nsanku represents the most comprehensive LLM translation evaluation for Ghanaian languages conducted to date. Results show that gemini-2.5-flash achieves the highest overall average score of 26.88 (BLEU: 24.60, chrF: 29.16), followed by claude-sonnet-4-5 at 24.87 (BLEU: 22.46, chrF: 27.28) and gpt-4.1 at 23.20 (BLEU: 21.15, chrF: 25.24). Among open-weight models, kimi-k2-instruct-0905 leads at an average score of 20.87. A critical finding from the consistency analysis is that no model and no language reached the Leaders quadrant of high performance and high consistency simultaneously, indicating that current LLMs are not yet reliably usable for Ghanaian language translation at scale. Siwu achieved the highest per-language average score at 25.73 while Nkonya scored lowest at 11.65. Nsanku establishes a publicly available, community-extensible evaluation infrastructure for African language NLP research.
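
As a quick arithmetic check, the overall average scores quoted above are consistent with a simple mean of BLEU and chrF. That the benchmark computes its average this way is an inference from the three reported rows, not something the abstract states.

```python
# (BLEU, chrF, reported overall average) as quoted in the abstract.
reported = {
    "gemini-2.5-flash": (24.60, 29.16, 26.88),
    "claude-sonnet-4-5": (22.46, 27.28, 24.87),
    "gpt-4.1": (21.15, 25.24, 23.20),
}


def mean_score(bleu, chrf):
    """Unweighted mean of the two metrics (assumed aggregation)."""
    return (bleu + chrf) / 2


# Gap between the recomputed mean and the reported value, per model.
gaps = {
    model: abs(mean_score(bleu, chrf) - average)
    for model, (bleu, chrf, average) in reported.items()
}
```

Each gap is within rounding to two decimal places.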

[78]  arXiv:2605.04209 [pdf, ps, other]
Title: Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We present Sparse Backdoor, a supply-chain attack that plants a provably undetectable backdoor in pre-trained image classifiers, including convolutional networks and Vision Transformers. The attack injects a structured sparse perturbation along a randomly chosen direction into a small subset of columns at each fully connected layer, propagating a trigger signal to an adversary-chosen target class, and masks the perturbation with an independent isotropic Gaussian dither. The dither serves a single technical purpose: it induces a clean reference distribution anchored at the pre-trained weights, against which undetectability can be formalized. Under a mild margin condition on the pre-trained classifier, we show that the dithered reference is functionally equivalent to the original classifier. We prove that distinguishing the backdoor-injected model from this reference is at least as hard as Sparse PCA detection, which is computationally infeasible under standard hardness assumptions. The guarantee holds against any probabilistic polynomial-time distinguisher with white-box access to the parameters.

[79]  arXiv:2605.04213 [pdf, ps, other]
Title: The Anatomy of Silent Data Corruption: GPU Error Pattern Study and Modeling Guidance
Comments: Accepted for publication in IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) 2026 (Industry Track)
Subjects: Hardware Architecture (cs.AR)

Silent data corruption (SDC) threatens the reliability of large-scale GPU clusters used for training large language models, yet its rarity and lack of explicit error signals make accurate high-level modeling challenging. To address this gap, we conducted a large-scale gate-level stuck-at fault injection on a production-class data-center GPU, consuming over three million simulator hours across 63 CUDA micro-benchmarks. We extracted GPU SDC characteristics in terms of corruption types, bit-flip behavior, and warp-aligned spatial correlation. Our results show that NaN/+INF/-INF account for only 1.01% of SDC outcomes, that single-bit flips constitute less than 40% of bit-flip events, and that corruption addresses exhibit periodicity. These statistics motivate distribution-aware high-level fault modeling and realistic software-based fault injection for resilience evaluation of production-class GPU architectures.

[80]  arXiv:2605.04215 [pdf, ps, other]
Title: Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Diffusion-based Large Language Models (D-LLMs) represent a promising frontier in generative AI, offering fully parallel token generation that can lead to significant throughput advantages and superior GPU utilization over the traditional autoregressive paradigm. However, this parallelism is constrained by the requirement of a fixed-size response length prior to generation. This architectural limitation imposes a severe trade-off: an oversized response length wastes computation on semantically meaningless padding tokens, while an undersized response length causes output truncation, requiring costly re-computations that introduce unpredictable latency spikes. To tackle this issue, we propose Predict-then-Diffuse, a simple and model-agnostic framework that enables compute-budgeted inference per input query by first estimating the response length and then using it to run inference with the D-LLM. At its core lies the Adaptive Response Length Predictor (AdaRLP), an auxiliary model that predicts the optimal response length given an input query. As a measure against under-predicting the response length and re-running inference with a higher response length, we introduce a data-driven safety mechanism that accepts a negligible padding overhead. As a whole, our framework limits the significant waste of computation on padding tokens and preserves output quality. Experimental validation on multiple datasets demonstrates that Predict-then-Diffuse significantly reduces computational costs (FLOPs) compared to the default D-LLM inference mechanism and baselines based on heuristics, while being robust to skewed data distributions.

[81]  arXiv:2605.04217 [pdf, ps, other]
Title: Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks
Authors: Yaobo Zhang
Comments: 15 pages, 4 figures, 6 tables; code available at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Relative positional encodings determine which functions of query-key lag can enter the primitive attention logit. RoPE supplies a rotary phase, while ALiBi supplies an additive distance bias. Motivated by group-theoretic views of linear translation-invariant positional encodings, we study a non-semisimple case in which a complex rotary eigenvalue and a nilpotent response live in the same defective Jordan block. The resulting relative operator generates oscillatory-polynomial features such as $e^{-\gamma d}\cos(\omega d)$, $e^{-\gamma d}\sin(\omega d)$, $d e^{-\gamma d}\cos(\omega d)$, and $d e^{-\gamma d}\sin(\omega d)$, for causal lag $d=i-j\geq 0$. Thus the construction realizes a distance-modulated phase basis $d e^{i\omega d}$, rather than merely adding a separate distance channel to RoPE.
We formulate Exact Jordan-RoPE as a non-semisimple one-parameter representation, give its real block form, and specify the contragredient query action required by non-orthogonal positional maps. We also distinguish this exact representation from stabilized variants whose bounded shear improves numerical behavior but breaks the exact group law. Kernel-level diagnostics and a Jordan-friendly synthetic language-model task show that the coupled Jordan basis is useful when the target contains distance-modulated phase interactions. On a small WikiText-103 byte language model, a scaled-exact variant improves over RoPE and direct-sum baselines within the Jordan family, while RoPE+ALiBi remains strongest overall. The evidence is structural rather than a broad performance claim.
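The oscillatory-polynomial features listed above follow from the matrix exponential of a single 2x2 Jordan block: for J = λI + N with N² = 0, exp(dJ) = e^{λd}(I + dN). The sketch below is a minimal illustration, assuming the eigenvalue convention λ = -γ + iω so that the entries match the features in the abstract. The function names are hypothetical and this is not the paper's implementation (it omits the real block form and the contragredient query action).

```python
import cmath


def jordan_exp(d, gamma, omega):
    """Closed form of exp(d * J) for J = [[lam, 1], [0, lam]], lam = -gamma + i*omega."""
    phase = cmath.exp(complex(-gamma, omega) * d)
    # Diagonal: e^{lam d}; off-diagonal: d * e^{lam d} (the nilpotent response).
    return [[phase, d * phase], [0j, phase]]


def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def lag_features(d, gamma, omega):
    """Real features read off the first row of exp(d * J)."""
    m = jordan_exp(d, gamma, omega)
    return (m[0][0].real,   # e^{-gamma d} cos(omega d)
            m[0][0].imag,   # e^{-gamma d} sin(omega d)
            m[0][1].real,   # d e^{-gamma d} cos(omega d)
            m[0][1].imag)   # d e^{-gamma d} sin(omega d)


# Exact group law: exp(2J) @ exp(3J) should equal exp(5J) up to rounding.
lhs = matmul2(jordan_exp(2, 0.1, 0.5), jordan_exp(3, 0.1, 0.5))
rhs = jordan_exp(5, 0.1, 0.5)
print(max(abs(lhs[r][c] - rhs[r][c]) for r in range(2) for c in range(2)))
```

The group-law check is what a stabilized variant with bounded shear would give up: clipping the off-diagonal factor `d` means the product of two lag operators is no longer the operator for the summed lag.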

[82]  arXiv:2605.04221 [pdf, ps, other]
Title: Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Clinical named entity recognition from dental progress notes is challenging because documentation is highly unstructured, domain-specific, and often privacy-sensitive. We developed a locally deployable framework that enables small language models to self-generate, verify, refine, and evaluate entity-specific prompts for extracting multiple clinical entities from dental notes. Using 1,200 annotated notes, we evaluated candidate open-weight models with multi-prompt ensemble inference and further adapted selected models using QLoRA-based supervised fine-tuning and direct preference optimization. Model performance varied substantially, highlighting the need for task-specific evaluation rather than reliance on generic benchmarks. Qwen2.5-14B-Instruct achieved the strongest baseline performance. After DPO, Qwen2.5-14B-Instruct and Llama-3.1-8B-Instruct achieved micro/macro F1 scores of 0.864/0.837 and 0.806/0.797, respectively. These findings suggest that automated prompt optimization combined with lightweight preference-based post-training can support scalable clinical information extraction using locally deployed small language models.

[83]  arXiv:2605.04222 [pdf, ps, other]
Title: Safety by Invariance, Liveness through Refinement: Heterogeneous Contract Framework for Co-Design of Layered Control
Comments: 22 pages
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

Real-world control systems must achieve long-horizon objectives (liveness) while respecting continuous-time safety constraints, a combination that motivates hierarchical layered control architectures (LCAs). Existing LCA research, however, lacks (i) a uniform specification language across discrete planning and continuous execution, (ii) formal guarantees that specifications are preserved when interconnecting subsystems at heterogeneous time scales, and (iii) compositional separation between layers, owing to reliance on naive input-filtering laws. This paper addresses all three gaps by importing the safety--liveness decomposition into a heterogeneous assume--guarantee framework: \emph{safety is enforced by invariance} at the continuous-time layer, while \emph{liveness is achieved through refinement} at the discrete-time layer, with inter-layer coordination formalized via vertical refinement and timing-compatibility conditions. We instantiate this contract with a novel LCA combining an MPC planner, an input-to-state stabilizing (ISS) low-level controller, and a reference-governor bridge, and validate it on a Hybrid Energy Storage System (HESS) comprising a battery and a supercapacitor.

[84]  arXiv:2605.04225 [pdf, ps, other]
Title: ARMATA: Auto-Regressive Multi-Agent Task Assignment
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Coordinating multi-agent systems over spatially distributed areas requires solving a complex hierarchical problem: first distributing areas among agents (allocation) and subsequently determining the optimal visitation order (routing). Existing methods typically either decouple these stages, ignoring inter-stage dependencies, or rely on decentralized heuristics that lack global context. In this work, we propose a centralized, fully end-to-end auto-regressive framework that jointly generates allocation decisions and routing sequences. The core contribution of our approach is a multi-stage decoding mechanism that unifies high-level allocation and low-level routing in a single autoregressive pass while maintaining a centralized global state. This enables the model to implicitly balance workload distribution with routing efficiency, avoiding local optima common in decentralized methods. Extensive experiments demonstrate that our method significantly outperforms diverse baselines, achieving up to a 20% improvement in solution quality over industrial solvers such as Google OR-Tools, IBM CPLEX, and LKH-3, while reducing computation time from hours to seconds.

[85]  arXiv:2605.04226 [pdf, ps, other]
Title: ipc_shared_ptr: A Publish/Subscribe-Aware Smart Pointer for Cross-Process Object Lifetime Management
Comments: Accepted for publication in the 2026 IEEE 29th International Symposium on Real-Time Distributed Computing (ISORC); 10 pages, 8 figures
Subjects: Operating Systems (cs.OS); Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)

True zero-copy Inter-Process Communication (IPC) in publish/subscribe (pub/sub) middleware such as Robot Operating System 2 (ROS 2) requires subscribers to reference message objects in publisher-owned shared memory. Objects must not be reclaimed while referenced, yet must eventually be reclaimed, with correct handling of crash recovery and Transient Local QoS retention requirements. We propose ipc_shared_ptr, a pub/sub-aware smart pointer for cross-process message lifetime management. ipc_shared_ptr exploits pub/sub structural properties to specialize Birrell's reference listing, limiting global metadata updates to per-subscriber 0<->1 transitions and achieving an order-of-magnitude reduction in global communication over general-purpose distributed reference counting. We analyze the key metadata management tradeoff: scalability versus implementation simplicity. Owner-driven reclaim offers greater scalability, but concurrent membership changes and reclamation decisions produce races that widen the correctness-verification state space. Single-writer achieves structural atomicity, eliminating this complexity at the cost of a centralized bottleneck. iceoryx2 (owner-driven reclaim) and Agnocast -- a true zero-copy ROS 2 IPC middleware sharing the publisher's heap with subscribers and adopting ipc_shared_ptr with single-writer -- embody each architecture. Comparative evaluation at the scale of Autoware -- the largest open-source ROS 2 application -- confirms that single-writer achieves sufficient scalability: at 200 topics, two subscribers per topic and 100 Hz, Agnocast's E2E p99.9 is 2.9x lower than iceoryx2's, justifying implementation simplicity over owner-driven reclaim.
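The idea of limiting global metadata updates to per-subscriber 0<->1 transitions can be sketched as follows. This is a single-process toy model under stated assumptions, not the Agnocast or iceoryx2 implementation: `SubscriberHandle`, `global_refs`, and `reclaimable` are hypothetical names, the shared-memory metadata is replaced by a plain Python set, and crash recovery, atomics, and QoS retention are omitted.

```python
class SubscriberHandle:
    """Per-subscriber local reference count for one message (hypothetical sketch)."""

    def __init__(self, global_refs, subscriber_id):
        self.global_refs = global_refs      # stand-in for the shared reference list
        self.subscriber_id = subscriber_id
        self.local_count = 0                # copies held inside this subscriber
        self.global_updates = 0             # how often shared metadata was touched

    def acquire(self):
        if self.local_count == 0:
            # 0 -> 1 transition: the only acquire that touches global metadata.
            self.global_refs.add(self.subscriber_id)
            self.global_updates += 1
        self.local_count += 1

    def release(self):
        if self.local_count == 0:
            raise RuntimeError("release without matching acquire")
        self.local_count -= 1
        if self.local_count == 0:
            # 1 -> 0 transition: the only release that touches global metadata.
            self.global_refs.discard(self.subscriber_id)
            self.global_updates += 1


def reclaimable(global_refs):
    """The publisher may reclaim the message once no subscriber is listed."""
    return not global_refs


refs = set()
handle = SubscriberHandle(refs, "subscriber-a")
for _ in range(10):
    handle.acquire()
for _ in range(10):
    handle.release()
print(handle.global_updates, reclaimable(refs))
```

In this toy run, twenty local pointer operations cause only two updates to the shared set, which is the kind of saving the abstract attributes to specializing reference listing for pub/sub.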

[86]  arXiv:2605.04227 [pdf, ps, other]
Title: Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Procedural tasks with multiple ordered steps are ubiquitous in daily life. Recent advances in multimodal large language models (MLLMs) have enabled personal assistants that support daily activities. However, existing systems primarily provide reactive guidance triggered by user queries, or limited proactive assistance for isolated short-term events rather than long-horizon procedural tasks. In this work, we introduce Pro$^2$Assist, a step-aware proactive assistant that continuously tracks fine-grained task progress and reasons over the user's evolving state to provide timely assistance throughout tasks. Pro$^2$Assist leverages multimodal data from augmented reality (AR) glasses to achieve motion-based perception. It then extracts step-oriented procedural context from multi-scale temporal dynamics and task-specific expert knowledge. Based on both sensory input and procedural context, Pro$^2$Assist performs continuous reasoning to infer user needs and display timely assistance on AR glasses. We evaluate Pro$^2$Assist using a dataset curated from public sources and a real-world dataset collected on our testbed with AR glasses. Extensive evaluations show that Pro$^2$Assist outperforms the best-performing baselines by over 21% in procedural action understanding accuracy, and it achieves up to 2.29x the proactive timing accuracy of baselines. A user study with 20 participants further shows that 90% find Pro$^2$Assist useful, indicating its effectiveness for real-world procedural assistance.

[87]  arXiv:2605.04228 [pdf, ps, other]
Title: Thinking fast and slow -- decision intelligence for power systems
Authors: Apoorv Mathur
Comments: 5 pages, This work has been submitted to IEEE for possible publication
Subjects: Systems and Control (eess.SY); Distributed, Parallel, and Cluster Computing (cs.DC)

Decision-making in power systems spans multiple timescales - from milliseconds to prevent surges, to seconds to balance frequency and protect grid assets, to minutes for real-time energy balancing, to day-ahead, seasonal, and long-term planning. Growing uncertainty and complexity, driven by intermittent renewables and distributed energy resources (DER), demand fresh approaches to power system intelligence and architecture. Daniel Kahneman describes the interplay of two systems of human decision-making: System 1 that is fast, intuitive, experience based, reactive, and System 2 that is slow, deliberate, analytical. Similarly, octopus intelligence illustrates a model for distributed yet coordinated decision-making between central and edge intelligence. Future power systems must embed coordinated intelligence that operates across diverse timescales and with placement at both edge and centralized levels. This paper maps decision-intelligence in power systems against System 1 and 2 and edge-central architecture paradigms based on the trade-offs inherent in decision making such as speed/latency, energy cost/compute, accuracy, and robustness. The framework inspires an agentic intelligence architecture - laying the foundation for trustworthy, autonomous power systems of the future.

[88]  arXiv:2605.04229 [pdf, ps, other]
Title: Capabilities of Auto-encoders and Principal Component Analysis of the Reduction of Microstructural Images; Application on the Acceleration of Phase-Field Simulations
Comments: 21 pages, 8 figures. Preprint version of article published in Computational Materials Science
Journal-ref: Computational Materials Science, Volume 216, 5 January 2023, Article 111820
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)

In this work, a data-driven framework based on Phase-Field simulation data is proposed to highlight the capabilities of neural networks to perform accurate dimensionality reduction of simulated microstructural images and to provide time-series analysis. The dataset was constructed from high-fidelity Phase-Field simulations. Analyses demonstrated that combining auto-encoder neural networks with principal component analysis yields efficient and significant dimensionality reduction: a reduction ratio of 1/196 with more than 80% accuracy. These findings make it possible to run analyses on data in the latent dimension. Applying Long Short Term Memory (LSTM) neural networks showed that next-frame predictions are possible, which allows Phase-Field simulations to be accelerated without the need for high computing resources. We discuss the application of such a framework to various areas of research. Different methods are proposed from the conducted analyses: for dimensionality reduction, auto-encoders, principal component analysis, and Artificial Neural Networks; for time-series analysis, LSTM and Gated Recurrent Unit (GRU).

[89]  arXiv:2605.04230 [pdf, ps, other]
Title: Layerwise LQR for Geometry-Aware Optimization of Deep Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Geometry-aware optimizers such as Newton and natural gradient can improve conditioning in deep learning, but scalable variants such as K-FAC, Shampoo, and related preconditioners usually impose structural approximations early, often discarding cross-layer interactions induced by the network computation. We introduce Layerwise LQR (LLQR), a framework for learning structured inverse preconditioners under a global layerwise optimal-control objective. The starting point is an exact equivalence: the steepest-descent step under a broad class of divergence-induced quadratic models--including Newton, Gauss-Newton, Fisher/natural-gradient, and intermediate-layer metrics--can be written as a finite-horizon Linear Quadratic Regulator (LQR) problem. This formulation serves as a reference that exposes the layerwise dynamics and cost matrices encoding the original dense geometry. We then derive a scalable relaxation that learns diagonal, (E-)Kronecker-factored, or other structured inverse preconditioners by minimizing the LQR objective and reusing them across iterations. The resulting optimizer wraps standard methods while retaining a principled connection to second-order geometry, without forming or inverting the global curvature matrix. Experiments on ResNets and Transformers show that LLQR improves optimization dynamics and often translates these gains into improved final test performance, while adding only modest wall-clock overhead. It establishes LLQR as a practical framework for geometry-aware second-order methods and a reference for evaluating scalable approximations.

[90]  arXiv:2605.04231 [pdf, ps, other]
Title: Anatomy of a failure: When, how, and why deep vision fails in scientific domains
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Mirroring its ubiquity in popular media and all human activities, the use of deep learning (DL) is rapidly growing in scientific imaging modalities. However, unlike everyday RGB pictures, pixels encode precise physicochemical properties in scientific imaging across potentially thousands of channels. While DL is well validated on human-centric RGB perceptual tasks, its effectiveness for scientific imaging remains uncertain. Here, we show that the naive application of DL frameworks to scientific images can lead to critical failures. We evaluate the use of DL for pathology, comparing RGB images of stained tissue with the quantitative and information-rich biochemical signatures of infrared (IR) imaging. Despite this informational advantage, DL models trained on IR data paradoxically underperform. We investigate this discrepancy to find that IR data priors interact poorly with the simplicity bias of DL, causing models to collapse to one-dimensional predictions. This constitutes a catastrophic DL failure because the model's representational capacity remains largely unused, while furthermore raising AI safety concerns and undermining the advantages of such scientific modalities. Notably, this problem persists even with state-of-the-art DL robustification strategies, which are primarily designed and validated for RGB imagery and thus inherit the same prior-bias mismatch. This work establishes a framework for understanding the limitations of generic DL in science and advocates for the study of modality-specific failure modes to guide the development of specialized, safe AI algorithms.

[91]  arXiv:2605.04232 [pdf, ps, other]
Title: Probabilistic Floating-Point Round-Off Analysis via Concentration Inequalities
Comments: Long version of the eponymous OOPSLA 2026 paper
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL); Numerical Analysis (math.NA)

Floating-point round-off errors are ubiquitous in numerically intensive programs arising in fields such as scientific computing and optimization. As floating-point errors potentially lead to unexpected and catastrophic program failures, one must derive guaranteed round-off thresholds to ensure the correctness of these programs. However, deterministic round-off thresholds tend to be too conservative to be usable in practice, since they often involve large round-off errors that occur with small probability. Probabilistic thresholds relax deterministic ones by specifying that the probability of the round-off error exceeding a threshold is below a given confidence.
In this work, we propose a novel approach to probabilistic round-off analysis, by applying concentration inequalities over the Taylor expansion from FPTaylor (TOPLAS 2018). A major obstacle in applying concentration inequalities is that the Taylor expansion involves absolute value operators that make it difficult to calculate the expected values of the first-order partial differential terms. Our first step to overcome this obstacle is a sound over-approximation that removes the absolute value operators in polynomial expressions. Then, we show how to handle fractional expressions by a transformation into the polynomial case. Finally, we show how to improve our approach with range partitioning. Our approach is scalable since the key computational part is the calculation of expected values of polynomial expressions with independent variables, for which the linearity of expectation and the independence of the variables speed up the computation. Experimental results show that our approach is orders of magnitude more time-efficient than the state of the art, while producing thresholds of comparable precision.
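The scalability argument rests on a standard fact: for independent variables, the expectation of a polynomial splits term by term (linearity) and factor by factor (independence), so only one-dimensional moments are needed. The sketch below illustrates this with exact rational arithmetic. It assumes uniformly distributed inputs, which the abstract does not state, and the names `uniform_moment` and `expected_polynomial` are hypothetical; it is not the paper's tool.

```python
from fractions import Fraction


def uniform_moment(k, lo, hi):
    """E[x**k] for x ~ Uniform(lo, hi), computed exactly (requires lo < hi)."""
    lo, hi = Fraction(lo), Fraction(hi)
    return (hi ** (k + 1) - lo ** (k + 1)) / ((k + 1) * (hi - lo))


def expected_polynomial(terms, ranges):
    """Expectation of a polynomial in independent uniform variables.

    terms:  {exponent tuple: coefficient}, e.g. {(1, 1): 1} for x*y
    ranges: one (lo, hi) interval per variable
    """
    total = Fraction(0)
    for exponents, coeff in terms.items():        # linearity: sum over monomials
        value = Fraction(coeff)
        for k, (lo, hi) in zip(exponents, ranges):
            value *= uniform_moment(k, lo, hi)    # independence: product of moments
        total += value
    return total


# E[x*y + x**2] with x, y ~ Uniform(0, 1) is 1/4 + 1/3.
poly = {(1, 1): 1, (2, 0): 1}
print(expected_polynomial(poly, [(0, 1), (0, 1)]))  # prints 7/12
```

No multidimensional integration is performed: the cost is proportional to the number of monomials times the number of variables.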

[92]  arXiv:2605.04234 [pdf, ps, other]
Title: Disentangled Learning Improves Implicit Neural Representations for Medical Reconstruction
Comments: 17 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Implicit neural representations (INRs) have emerged as a powerful paradigm for medical imaging via physics-informed unsupervised learning. Classical INRs optimize an entire network from scratch for each subject, leading to inefficient training and suboptimal imaging quality. Recent initialization-based approaches attempt to inject population priors into pre-trained networks, yet they rely on high-quality images and often suffer from catastrophic forgetting during fine-tuning. We present DisINR, a novel INR framework that explicitly disentangles shared and subject-specific representations. DisINR introduces a shared encoder-decoder pair and subject-specific encoders, whose features are jointly decoded for image reconstruction. By integrating differentiable forward models, it pre-trains the shared modules directly from limited raw measurements, removing the need for pre-acquired high-quality images. During test-time adaptation, only the subject-specific encoder is optimized, while the shared pair remains frozen, effectively preserving learned priors. Extensive evaluations on three representative medical imaging tasks show that DisINR significantly outperforms state-of-the-art INRs in both reconstruction accuracy and efficiency.

[93]  arXiv:2605.04236 [pdf, ps, other]
Title: Adaptive Consensus in LLM Ensembles via Sequential Evidence Accumulation: Automatic Budget Identification and Calibrated Commit Signals
Authors: Roberto Medina
Subjects: Machine Learning (cs.LG)

Large Language Model ensembles improve reasoning accuracy up to a performance boundary; beyond it, additional deliberation degrades accuracy. Static-budget methods cannot detect this boundary. Extended-thinking architectures compound the problem: a wrong answer after 120k tokens is indistinguishable from a correct one. We introduce DASE (Deliberative Adaptive Stopping Ensemble), a stopping heuristic for iterative ensemble deliberation that commits early on genuine consensus and applies a global-frequency fallback on fragmented evidence. Two configurations are evaluated: a persistence heuristic and DASE-Spatial (arena half-width W). Three contributions. (1) DASE produces a commit-type routing partition complementary to verbalized single-call confidence. On a contamination-controlled corpus (AIME 2010-2023, N=254, 3 seeds), a 120B ensemble achieves a 24.8 pp routing gap (right-wall 97.1% vs. left-wall 73.6%), statistically equivalent to Opus 4.6 Standard verbalized confidence at coverage-matched threshold (25.7 pp gap; bootstrap CI on difference: [-12.0, +10.3] pp, p=0.873). The two mechanisms disagree on 27% of routing assignments, establishing them as complements rather than substitutes; every DASE decision is accompanied by a machine-readable deliberation record. (2) Adaptive stopping, not injection bandwidth, drives accuracy gains. On AIME-300, bandwidth accounts for only 0.3 pp (ns); on GPQA-Extended, 4.4 pp bandwidth versus 5.0 pp stopping effect. DASE-Spatial ties Debate-Dense at its optimal budget using one-tenth the injection bandwidth and identifies that budget automatically; W=8 (65.0%) significantly outperforms W=4 (59.3%) on AIME-300 (adj p=0.0042). (3) Injection-based methods exhibit a retrospective accuracy-vs-inference inverted-U on both benchmarks; this pattern is hypothesis-generating for future work.

[94]  arXiv:2605.04239 [pdf, ps, other]
Title: Densification and forecasting of Sentinel-2 time series from multimodal SAR and Optical satellite data using deep generative models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Optical satellite image time series are extensively used in many Earth observation applications, including agriculture, climate monitoring, and land surface analysis. However, clouds and swath edges result in irregular sampling along the temporal dimension, limiting continuous monitoring. To address this issue, a growing body of work has focused on temporal densification and reconstruction of satellite image time series, with the objective of filling missing or cloud-contaminated observations within the temporal extent of the available data. While these approaches improve temporal continuity, they are inherently restricted to the reconstruction of the gaps within the observed time periods, and do not address the prediction of future observations. This work proposes a probabilistic deep learning framework for the densification and forecasting of Sentinel-2 time series by generating optical images at arbitrary past or future dates. The approach leverages multimodal satellite data by jointly exploiting Sentinel-2 optical and Sentinel-1 SAR observations. Unlike most existing works, we propose to focus on the uncertainty of the generated images. Experimental results demonstrate effective densification and forecasting, on sparse and temporally misaligned time series.

[95]  arXiv:2605.04242 [pdf, ps, other]
Title: Road Risk Monitor: A Deployable U.S. Road Incident Forecasting System with Live Weather and Road-Level Tiles
Authors: Anton Ivchenko
Subjects: Machine Learning (cs.LG)

Nationwide road-incident forecasting is a systems problem before it is a modeling problem. A usable service must connect historical incident archives, historical and live weather, national road geometry, offline model training, tile generation, web serving and runtime handoff. This paper presents Road Risk Monitor, a U.S.-wide road-safety stack that combines a nationwide H3 baseline trained on FARS fatal-crash data with a road-segment forecasting pipeline trained from TIGER/Line geometry and US-Accidents events, then serves predictions through live APIs, raster tiles, JSON road tiles, and a public web application.

[96]  arXiv:2605.04243 [pdf, ps, other]
Title: Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA
Authors: Tran Quang Liem
Comments: Preprint. 22 pages, 2 figures
Subjects: Artificial Intelligence (cs.AI)

Despite significant advances, large language models (LLMs) continue to exhibit brittle performance on complex temporal reasoning tasks. This failure mode is widely attributed to inherent deficits in autoregressive logical deduction. In this paper, we challenge this prevailing narrative, demonstrating that temporal reasoning is not the fundamental bottleneck; rather, the locus of failure lies in unstructured text-to-event representation. We introduce a novel neuro-symbolic question-answering framework governed by a Probabilistic Inconsistency Signal (PIS) that explicitly isolates perceptual errors from reasoning failures. By lifting unstructured text into explicit event graphs and interval constraints, our architecture strictly decouples semantic extraction from a symbolic reasoning engine. To robustly detect structural breaks, the PIS elegantly unifies symbolic credal intervals with epistemic neural uncertainty extracted via Evidential Deep Learning on LLM hidden states. Empirical evaluations reveal a striking paradigm shift: when provided with correct structural representations, our system's explicit proof traces achieve perfect 1.0 accuracy (4000/4000) and strictly zero false positives/negatives on temporal arithmetic benchmarks. On broader, noise-injected QA settings, the framework maintains a competitive 75.1% accuracy while enabling deterministic, step-level failure localization. Ultimately, by isolating the representation bottleneck from the reasoning substrate, this work reframes temporal QA from an algorithmic reasoning challenge to a structural alignment problem, charting a verifiable path forward for reliable neuro-symbolic AI.

[97]  arXiv:2605.04244 [pdf, ps, other]
Title: Faster Iterative $φ$ Queries on the Positional BWT
Subjects: Data Structures and Algorithms (cs.DS)

The Positional Burrows-Wheeler Transform (PBWT) is a fundamental data structure for the efficient representation and analysis of large-scale haplotype panels. For a panel of $h$ sequences $\{S_1, \dots, S_h\}$ over $m$ sites, a key operation is the $\phi_j(i)$ query, which returns the haplotype index immediately preceding $S_i$ in co-lexicographic order at site $j$. Efficient support for $k$ iterative queries $\phi^1, \dots, \phi^k$ is essential for haplotype matching and variation analysis.
In this work, we introduce a simple and novel decomposition scheme that decomposes each haplotype row into sub-intervals, called refined segments, within which a haplotype's co-lexicographic predecessor for the sites remains unchanged. We show that refined segments satisfy two key properties: (i) each segment $[b,e]$ associated with $S_i$ overlaps with at most a constant number of segments of $S_{\phi_e(i)}$, and (ii) the total number of segments is bounded by $O(\tilde{r} + h)$, where $\tilde{r}$ denotes the number of runs in the PBWT. Building on this decomposition, we present two space-time tradeoffs for supporting $k$ iterative $\phi$ queries: (i) a structure using $O((\tilde{r} + h)\log n)$ bits of space that answers $k$ iterative queries in $O(\log \log_w \min(m,h) + k)$ time, where $n = m \cdot h$, and (ii) a more compact structure using $O(\tilde{r} \log h + h \log n)$ bits of space that supports queries in $O(k \log \log_w h)$ time.
Prior to our work, supporting these queries required $O((\tilde{r} + h)\log n)$ bits of space and $O(k \cdot \log \log_w m)$ time. Our second tradeoff is expected to be effective in practice for modern genomic datasets, where the number $h$ of haplotypes is typically much smaller than the number $m$ of sites.
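For orientation, the $\phi$ query itself can be computed naively from the PBWT prefix arrays. The sketch below is only a baseline under stated assumptions, not the refined-segment structure described above: it assumes a biallelic panel given as equal-length strings over "0"/"1", uses the convention that the order at site j is determined by the sites before j, breaks ties by haplotype index, and stores every prefix array explicitly. The function names are hypothetical.

```python
def prefix_arrays(panel):
    """Return [a_0, ..., a_m]; a_j lists haplotype indices sorted by the
    reversed prefix panel[i][:j] (co-lexicographic order, ties by index)."""
    order = list(range(len(panel)))
    arrays = [order]
    for j in range(len(panel[0])):
        zeros = [i for i in order if panel[i][j] == "0"]
        ones = [i for i in order if panel[i][j] == "1"]
        order = zeros + ones              # stable partition by the allele at site j
        arrays.append(order)
    return arrays


def phi(arrays, j, i):
    """Haplotype immediately preceding i in the order at site j, or None."""
    pos = arrays[j].index(i)              # linear scan: O(h) per query
    return arrays[j][pos - 1] if pos > 0 else None


def iterate_phi(arrays, j, i, k):
    """Apply phi up to k times at a fixed site j, stopping at the top of the order."""
    out = []
    for _ in range(k):
        i = phi(arrays, j, i)
        if i is None:
            break
        out.append(i)
    return out


panel = ["0101", "1100", "0110", "1001"]
arrays = prefix_arrays(panel)
print(arrays[4])                          # order after all four sites
print(iterate_phi(arrays, 4, 0, 3))       # three predecessors of haplotype 0
```

This baseline needs space proportional to m·h and linear time per query, which is the cost the run-length-bounded structures in the abstract are designed to avoid.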

[98]  arXiv:2605.04247 [pdf, ps, other]
Title: Physics-Guided Regime Unmixing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Linear Mixing Model (LMM) dominates spectral unmixing for its simplicity, but fails under multiple scattering; existing nonlinear models compensate by applying a fixed regime uniformly across entire scenes. We propose Physics-Guided Regime Unmixing (PGRU), which estimates a pixel-wise scalar $\xi_i \in [0,1]$ from observable physical features to activate nonlinear mixing only where justified. Residuals from the Generalized Bilinear Model (GBM), the Polynomial Post-Nonlinear Mixing Model (PPNM), and the Hapke model are combined via learned attention, yielding interpretable regime maps. Experiments on Samson, Jasper Ridge, and Urban show consistent improvements over baselines, with physical coherence $\rho > 0.90$.

[99]  arXiv:2605.04249 [pdf, ps, other]
Title: Towards a Zero-Trust Supply-Chain Assurance Rubric for ORAN RIC Applications
Authors: Chun Yin Chiu
Comments: 10 pages, 2 figures, 5 tables. Preprint. Accepted by 9th International Conference on Information Science and Systems (ICISS 2026)
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

Open RAN enables third-party xApps and rApps to be onboarded and updated at operational cadence, creating a software supply chain that spans developers, CI systems, registries, onboarding pipelines, and runtime enforcement points. This preprint proposes a zero-trust supply-chain assurance rubric for O-RAN RIC applications. It makes three contributions: first, an app-centric lifecycle threat model for RIC applications across build, signing, publication, onboarding, runtime, and update or rollback stages; second, a WG11-aligned threat-control-evidence mapping that relates lifecycle threats to O-RAN security baselines and complementary supply-chain evidence; and third, an operator-facing assurance profile that combines secure software development practices, SBOM transparency, and SLSA-style provenance into incremental onboarding levels. Analytical case-study walkthroughs and a minimal evidence-checking workflow illustrate how the rubric can support explicit Accept, Escalate, or Block decisions during RIC app onboarding. The evaluation is intended to assess applicability rather than deployment-scale performance; empirical measurements of operational overhead, decision consistency, and detection coverage are left for future work.

[100]  arXiv:2605.04250 [pdf, ps, other]
Title: Binary Image-Based Intrusion Detection for Operational Technology Networks: Extending the SPHBI Methodology from IoT to Modbus TCP
Authors: Aamir Omar
Comments: 14 pages, 5 figures, 5 tables. Submitted to ESORICS 2026 (Spring Cycle)
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

This paper extends the Single Packet Header Binary Image (SPHBI) intrusion detection methodology from IoT to Modbus TCP, evaluating five approaches spanning a gradient of protocol depth on the CIC Modbus 2023 dataset (11.4 million packets, eight detectable attack types). TCP/IP headers alone achieve only 51.8% binary accuracy, confirming that header-level heterogeneity exploited in IoT traffic is absent in uniform SCADA environments. Adding eight bytes of application-layer information improves binary accuracy to 98.1% with just 63 parameters, directly relevant to per-packet classification on resource-constrained OT edge devices. The best-performing approach achieves 94.4% +/- 2.2pp multiclass accuracy across nine classes (95% CI [92.9%, 95.9%], 10 seeds) with 56,873 parameters, roughly 430 times fewer than comparable ResNet50-based approaches. Per-class recall analysis shows seven of eight detectable attack types identified with recall above 94%, while replay attacks remain structurally undetectable by any single-packet method.
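
As a rough illustration of turning packet bytes into a binary image, the sketch below unpacks each byte into bits and arranges them in fixed-width rows. The function name, the 8-pixel row width, and the choice of sample bytes (a 7-byte Modbus TCP MBAP header followed by a function code) are assumptions for illustration; the listing does not state which eight application-layer bytes or which image layout the paper uses.

```python
def bytes_to_binary_image(data: bytes, width: int = 8) -> list[list[int]]:
    """Unpack bytes into a row-major grid of 0/1 pixels (MSB first).

    The last row is zero-padded if the bit count is not a multiple of `width`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    remainder = len(bits) % width
    if remainder:
        bits.extend([0] * (width - remainder))
    return [bits[i:i + width] for i in range(0, len(bits), width)]


# Hypothetical sample: transaction id 0x0001, protocol id 0x0000,
# length 0x0006, unit id 0x01, function code 0x03.
sample = bytes([0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03])
image = bytes_to_binary_image(sample)
```

Here `image` is an 8x8 grid with one row per byte, which a small classifier could consume directly.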

[101]  arXiv:2605.04251 [pdf, ps, other]
Title: Root-Cause-Driven Automated Vulnerability Repair
Comments: Under submission
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Recent LLM-based systems have made automated vulnerability repair increasingly practical, but two challenges remain. First, without strong signals about where a bug originates, repair agents drift toward shallow edits that silence the observed failure while leaving the underlying defect unresolved. Second, finding the root cause for bugs is hard: even developers familiar with the codebase frequently produce fixes that address symptoms rather than the root cause, and LLM-based agents, operating with noisier context and less program understanding, are no exception. We present Kumushi, a root-cause-driven patching agent that addresses both challenges by combining diversified dynamic fault localization with evidence-weighted ranking to focus the LLM on the code most relevant to the defect. To rigorously measure whether Kumushi produces genuinely better patches, we also introduce a two-tier patch quality metric that pairs automated oracle validation with structured expert assessment of patches. Evaluated on 178 C/C++ vulnerabilities, Kumushi substantially outperforms prior specialized repair agents under automated evaluation while matching a frontier commercial coding agent. Expert assessment then reveals differences that oracles cannot: Kumushi produces more root-cause fixes and fewer superficial patches, and is preferred in the majority of decisive pairwise comparisons. Together, these results demonstrate that progress in automated vulnerability repair requires not only stronger patching systems, but also richer evaluation methods capable of distinguishing genuine fixes from oracle-passing ones.

[102]  arXiv:2605.04253 [pdf, ps, other]
Title: Second-Order FALQON Parameter Transfer for the Max-Cut Problem on 3-Regular Graphs
Subjects: Emerging Technologies (cs.ET); Quantum Physics (quant-ph)

The Feedback-based Algorithm for Quantum Optimization (FALQON) offers a deterministic alternative to variational quantum algorithms by bypassing classical optimization loops. However, maintaining convergence on large problem instances often requires restricting the time step, necessitating quantum circuit depths that exceed Noisy Intermediate-Scale Quantum (NISQ) hardware capabilities. This paper investigates the parameter transferability of second-order FALQON applied to the Max-Cut problem on 3-regular graphs. Through numerical experiments evaluating quantum circuits up to 16 layers on graphs up to 24 nodes, we demonstrate a highly advantageous scaling behavior: transferring feedback parameters optimized on small instances to larger target graphs yields significantly higher approximation ratios than natively optimizing the parameters directly on the larger graphs. This performance advantage arises because parameters trained on smaller instances can safely adopt aggressively larger time steps. By offloading the expensive parameter discovery phase to small-scale instances, this transfer strategy simultaneously reduces computational overhead and enhances the approximation ratio, thereby bringing FALQON closer to practical viability on near-term quantum architectures.

[103]  arXiv:2605.04254 [pdf, ps, other]
Title: Hierarchical Support Vector State Partitioning for Distilling Black Box Reinforcement Learning Policies
Comments: Accepted for poster presentation at HHAI 2026
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

We introduce State Vector Space Partitioning (SVSP), a novel method to mimic a black box reinforcement learning policy using a set of human-interpretable subpolicies. By partitioning a distillation dataset of state action pairs with linear support vector machine splits, SVSP constructs a compact and structured representation of the original policy. Our method improves mean return by +7.4\% over previous critic driven state partitioning attempts such as Voronoi State Partitioning (VSP) and +2.8\% over the original TD3 policy, while reducing the number of required subpolicies against VSP by 82.1\%. Our results pave the path towards a more flexible form of distillation where both the decision boundary and surrogate models can be chosen within a margin of the original black box behavior.

[104]  arXiv:2605.04256 [pdf, ps, other]
Title: phys-MCP: A Control Plane for Heterogeneous Physical Neural Networks
Comments: 13 pages, 3 figures, 4 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)

Physical neural networks (PNNs) embed computation directly in material dynamics, including molecular, chemical, biological, photonic, memristive, and mechanical substrates. They are attractive for edge computing, especially at the extreme edge, where computation can be placed at the interface to sensing, actuation, or the physical process itself. However, PNNs are difficult to integrate into edge-cloud software stacks because each substrate exposes distinct interfaces, timing behavior, observability limits, and lifecycle requirements. This paper argues that the missing systems component is a common control plane for heterogeneous PNNs. We present phys-MCP, a substrate-aware orchestration architecture that exposes physical neural substrates as discoverable and invocable resources for edge, fog, and cloud workflows, while preserving their possible placement at the extreme edge. phys-MCP defines a capability model, lifecycle semantics, telemetry interfaces, and digital-twin bindings that retain substrate-specific properties such as latency, resetability, plasticity, and I/O modality. We instantiate the architecture through a prototype with three representative backend classes, an HTTP-backed execution path, and an integrated Cortical Labs adapter exposing a wetware-facing API path through the same control model. The evaluation combines controlled experiments on representative backends with end-to-end validation of the Cortical Labs path. Results show descriptor-portable integration across heterogeneous backends, improved runtime-aware matching over simpler baselines, telemetry-aware recovery under representative faults, successful execution against the API-backed wetware path, and small local control-path overhead. Overall, results provide prototype-level evidence that substrate-aware control can span heterogeneous physical AI resources, twin-backed backends, and a wetware-facing API path.

[105]  arXiv:2605.04257 [pdf, ps, other]
Title: HUGO-CS: A Hybrid-Labeled, Uncertainty-Aware, General-Purpose, Observational Dataset for Cold Spray
Comments: 22 pages, 8 figures, 4 tables
Subjects: Machine Learning (cs.LG)

Cold spraying is an increasingly common approach for repairing and manufacturing components due to its solid-state manufacturing capabilities. However, process optimization remains difficult due to many interdependent parameters and the lack of large-scale, machine-readable data to support modeling. While the scientific literature contains many relevant experiments, results are inconsistently reported (often in tables and figures) and use non-uniform units, limiting utilization at scale. To address these limitations, this work presents HUGO-CS, a literature-derived dataset of 4,383 cold-spray experiments with 144 features from 1,124 sources, exceeding the previous largest dataset (137 samples) by 30x. With completely manual extraction requiring an average of 91 minutes per document, this work designs and leverages a Hybrid-labeled, Uncertainty-aware, General-purpose, Observational extraction framework, called HUGO, to support this extraction. HUGO combines automated LLM-based labeling with targeted manual label refinement to handle this experimental result extraction process from scientific literature. To balance labeling efficiency with extraction accuracy, HUGO introduces a Hierarchical Risk Mitigation (HRM) scheme to route LLM outputs with a high risk of potential errors for manual review, while retaining low-risk records as auto-labeled. Lastly, HUGO post-processing consolidates categorical descriptors, maps reported feedstock chemistries into structured continuous compositions, and normalizes units across sources. Of the 4,383 reported experiments, 1,765 are hand-labeled, providing a high-quality labeled subset for benchmarking, error analysis, and higher-fidelity data points. All code to replicate this work, along with the complete HUGO-CS dataset, is released under a CC-BY license at https://github.com/sprice134/HUGO.

[106]  arXiv:2605.04258 [pdf, ps, other]
Title: Constructing Suffixient Arrays Revisited
Comments: To appear at CPM2026
Subjects: Data Structures and Algorithms (cs.DS)

Recently, Cenzato et al.\ proposed a new text index, called the \emph{suffixient array}, which is a subset of the suffix array and supports locating a single pattern occurrence or finding its maximal exact matches (MEMs), assuming random access to the input text $T[1..n]$ is available. They show that, given the suffix array, the longest common prefix array, and the Burrows--Wheeler transform (BWT) of the reverse of $T[1..n]$ over an alphabet $\{1,\ldots,\sigma\}$, a suffixient array can be constructed in linear time. However, their construction algorithms require multiple scans of these arrays. When restricted to a single pass over the arrays, they present an alternative construction algorithm running in $O(n + \overline{r} \log \sigma)$ time, where $\overline{r}$ is the number of runs in the BWT of the reversed text. In this paper, we present a new one-pass algorithm that constructs a suffixient array in linear time under the standard RAM model.

[107]  arXiv:2605.04259 [pdf, ps, other]
Title: EngThrive: Make It Fast and Easy to Do Great Work
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)

Frameworks such as SPACE, DevEx, and DORA established that developer productivity is inherently multidimensional, but left practitioners with a practical question: what should we measure, and how should we use it to improve? This paper introduces Engineering Thrive (EngThrive), a measurement and improvement system developed and deployed across Microsoft's engineering organization. EngThrive organizes productivity around three dimensions - Speed, Ease, and Quality - with Thriving as a guardrail to ensure developer wellbeing improves alongside performance. Within each dimension, outcome-oriented North Star metrics are paired with diagnostic submetrics, combining system telemetry with developer surveys to provide both scale and context. We describe the design principles that guide metric selection, including an approach in which well-chosen metrics align "gaming" behavior with genuine improvement. We also outline the data platform, survey program, and dashboard ecosystem required to operationalize this approach in practice, and present case studies demonstrating how outcome-oriented measurement enables sustained, system-level improvements. Finally, we show that EngThrive functions as a general-purpose evaluation language, applicable not only to developer tools and AI, but to organizational policies, work environments, and other factors that shape how developers experience their work. We offer EngThrive as a concrete model for organizations seeking to move beyond measuring activity toward improving outcomes.

[108]  arXiv:2605.04260 [pdf, ps, other]
Title: Lightweight Vulnerability Detection from Code Metrics and Token Features
Authors: Chun Yin Chiu
Comments: 5 pages, 4 tables. Preprint. Accepted by 5th International Conference on Artificial Intelligence and Software Engineering (ICAISE 2026)
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Vulnerability detection for C/C++ code increasingly relies on heavy representations such as code graphs and deep models, while many practical workflows still benefit from fast and reproducible ranking baselines for human triage. This preprint studies a lightweight function-level vulnerability triage pipeline that combines sparse token n-grams from raw function text with a small set of inexpensive code metrics, including NLOC, approximate cyclomatic complexity, token count, maximum brace depth, and parameter count. We use TF-IDF token features and a class-weighted logistic regression classifier, avoiding deep learning, transformers, and program graphs.
Using the Devign function-level labels, we evaluate random and cross-project settings, including an FFmpeg-to-QEMU transfer experiment. We emphasize precision-recall AUC and Recall@10% as ranking-oriented metrics for skewed or triage-oriented workloads. On the random split, the best combined variant reaches PR-AUC 0.642 and Recall@10% 0.161, while cross-project generalization is substantially harder, with PR-AUC around 0.436. We further report ablations, test-only identifier-renaming robustness, and end-to-end efficiency. The results suggest that simple token and metric features provide a useful transparent baseline, but also expose sensitivity to superficial lexical cues and limited cross-project transfer.
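
The inexpensive code metrics and the Recall@10% ranking metric named in the abstract can be sketched as follows. This is an illustrative approximation, not the paper's pipeline: the tokenizer regex, the set of branch keywords and operators used for the approximate cyclomatic complexity, the comma-counting parameter heuristic, and the exact Recall@k% definition (fraction of all positives found in the top-scored 10% of functions) are assumptions.

```python
import re

# Identifiers, integers, the two-character logical operators, then any symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|&&|\|\||\S")
BRANCH_TOKENS = {"if", "for", "while", "case", "catch", "&&", "||", "?"}


def code_metrics(func_text: str) -> dict:
    """Compute cheap, text-only metrics for one C/C++ function."""
    tokens = TOKEN_RE.findall(func_text)
    nloc = sum(1 for line in func_text.splitlines() if line.strip())

    depth = max_depth = 0
    for tok in tokens:
        if tok == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif tok == "}":
            depth = max(0, depth - 1)

    # Approximate cyclomatic complexity: one plus the number of decision points.
    complexity = 1 + sum(1 for tok in tokens if tok in BRANCH_TOKENS)

    # Heuristic: parameters are the comma-separated items in the first (...).
    params = 0
    match = re.search(r"\(([^()]*)\)", func_text)
    if match:
        inner = match.group(1).strip()
        if inner and inner != "void":
            params = inner.count(",") + 1

    return {
        "nloc": nloc,
        "cyclomatic": complexity,
        "token_count": len(tokens),
        "max_brace_depth": max_depth,
        "param_count": params,
    }


def recall_at_fraction(scores, labels, fraction=0.10):
    """Share of all positive labels that land in the top `fraction` by score."""
    total_positive = sum(labels)
    if total_positive == 0:
        return 0.0
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / total_positive
```

In a pipeline like the one described, metrics of this kind would presumably be concatenated with the TF-IDF token features before fitting the logistic regression classifier.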

[109]  arXiv:2605.04261 [pdf, ps, other]
Title: Laundering AI Authority with Adversarial Examples
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Vision-language models (VLMs) are increasingly deployed as trusted authorities -- fact-checking images on social media, comparing products, and moderating content. Users implicitly trust that these systems perceive the same visual content as they do. We show that adversarial examples break this assumption, enabling \emph{AI authority laundering}: an attacker subtly perturbs an image so that the VLM produces confident and authoritative responses about the \emph{wrong} input. Unlike jailbreaks or prompt injections, our attacks do not compromise model alignment; the attack operates entirely at the perceptual level. We demonstrate that standard attacks against publicly available CLIP models transfer reliably to production VLMs -- including GPT-5.4, Claude Opus~4.6, Gemini~3, and Grok~4.2. Across four attack surfaces, we show that authority laundering can amplify misinformation, disparage individuals, evade content moderation, and manipulate product recommendations. Our attacks have high success rates: In hundreds of attacks targeting identity manipulation and NSFW evasion, we measure success rates of $22 - 100\%$ across six models. No novel attack algorithm is required: basic techniques known for over a decade suffice, establishing a lower bound on attacker capability that should concern defenders. Our results demonstrate that visual adversarial robustness is now a practical -- and still largely unsolved -- safety problem.

[110]  arXiv:2605.04262 [pdf, ps, other]
Title: Imagery Dataset for Remaining Useful Life Estimation of Synthetic Fibre Ropes
Comments: 7 pages, 2 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Remaining useful life (RUL) estimation of synthetic fibre ropes (SFRs) is critical for safe operation in offshore-crane, wind turbine installation, and heavy-load handling applications, where rope failure can result in catastrophic safety incidents and costly downtime. Despite growing research interest in data-driven condition monitoring, there is no publicly available image dataset that captures the complete degradation lifecycle of SFRs under controlled cyclic fatigue loading. To address this gap, we present a novel image dataset comprising approximately 34,700 high-resolution images of eleven Dyneema SK75/78 high-modulus polyethylene (HMPE) rope samples subjected to cyclic fatigue on a sheave-bend test stand at seven distinct axial load levels ranging from 60 kN to 280 kN. Ropes were loaded until mechanical failure, with fatigue lifetimes ranging from 695 cycles to 8,340 cycles. After every fixed number of sheave cycles (an inspection burst), ten images were captured at different cross-sectional positions along the rope, providing spatially representative sampling of surface degradation throughout the rope's entire service life. The images obtained from each load are annotated with the corresponding elapsed cycle count, enabling a direct computation of RUL for any rope in the sequence. This dataset aims to support a broad range of machine learning (ML) tasks including RUL regression, damage progression modelling, anomaly detection, and load-conditioned prognostics. The dataset is intended to serve as a benchmark resource for the development and comparison of vision-based condition monitoring (CM) and prognostics algorithms for SFRs.
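
Since each image carries the elapsed cycle count and each rope was loaded to failure, an RUL label follows by subtraction. The sketch below shows that computation; the function names, the dictionary layout, and the normalized `rul_fraction` field are hypothetical and not part of the dataset's published format.

```python
def remaining_useful_life(elapsed_cycles: int, cycles_to_failure: int) -> int:
    """RUL in cycles: cycles left until the rope's recorded failure."""
    if cycles_to_failure <= 0:
        raise ValueError("cycles_to_failure must be positive")
    if not 0 <= elapsed_cycles <= cycles_to_failure:
        raise ValueError("elapsed_cycles must lie in [0, cycles_to_failure]")
    return cycles_to_failure - elapsed_cycles


def label_inspection_bursts(burst_cycles, cycles_to_failure):
    """Attach absolute and normalized RUL to each inspection burst."""
    labels = []
    for elapsed in burst_cycles:
        rul = remaining_useful_life(elapsed, cycles_to_failure)
        labels.append({
            "elapsed_cycles": elapsed,
            "rul_cycles": rul,
            "rul_fraction": rul / cycles_to_failure,
        })
    return labels
```

For the shortest-lived rope in the abstract (695 cycles to failure), an image taken at a hypothetical 500 elapsed cycles would receive an RUL of 195 cycles.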

[111]  arXiv:2605.04263 [pdf, ps, other]
Title: Parallel Prefix Verification for Speculative Generation
Subjects: Artificial Intelligence (cs.AI)

We introduce PARSE (PArallel pRefix Speculative Engine), a speculative generation framework that accelerates large language model (LLM) inference by parallelizing prefix verification on a semantic level. Existing speculative decoding methods are fundamentally limited by token-level equivalence: the target model must verify each token, leading to short acceptance lengths and modest speedups. Moving to semantic or segment-level verification can substantially increase acceptance granularity, but prior approaches rely on sequential verification, introducing significant overhead and limiting practical gains. PARSE introduces parallel prefix verification, enabling semantic-level verification without sequential checks. Given a full draft from a draft model, the target model evaluates correctness across multiple prefixes in a single forward pass using a custom attention mask, directly identifying the maximal valid prefix. This eliminates sequential segment verification, and makes verification compute-efficient. PARSE is orthogonal to token-level speculative decoding and can be composed with it for additional gains. Across models and benchmarks, PARSE delivers $1.25\times$ to $4.3\times$ throughput gain over the target model, and $1.6\times$ to $4.5\times$ when composed with EAGLE-3, all with negligible accuracy degradation. This demonstrates parallel prefix verification as an effective, general approach to accelerating LLM inference.

[112]  arXiv:2605.04264 [pdf, ps, other]
Title: Governed Collaborative Memory as Artificial Selection in LLM-Based Multi-Agent Systems
Subjects: Multiagent Systems (cs.MA)

Persistent memory is turning language-model-based agents from stateless participants in isolated interactions into state-bearing components of LLM-based multi-agent systems. As memory becomes durable, reloadable, and behavior-shaping across agents, sessions, or versions, a design question arises that is not captured by retrieval accuracy or access control alone: which candidate memories should become shared institutional state? This Viewpoint frames that problem as governed collaborative memory. We argue that memory governance functions as a selection regime, determining which memory variants persist, which remain private, and which are rejected, abstained from, or superseded. We distinguish ungoverned persistence, constitutional or hybrid selection, automatic metric-based selection, and human-ratified artificial selection, emphasizing that these regimes are not a ranking but a design choice over target properties. We then describe a layered architecture that separates agent-local memory, shared institutional memory, archive memory, and project-continuity memory, with provenance and version lineage making selection inspectable. Documented traces from one running LLM-based multi-agent ecosystem illustrate unmanaged false-memory persistence, ratified institutional memory, rejection and revision, identity-preserving expansion, and governance-as-learning. The contribution is a design agenda: persistent LLM-based multi-agent systems should evaluate memory not only for recall and performance, but also for provenance fidelity, selection traceability, epistemic quality, correction pathways, and role preservation.

[113]  arXiv:2605.04266 [pdf, ps, other]
Title: Explaining and Preventing Alignment Collapse in Iterative RLHF
Comments: Code at: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Reinforcement learning from human feedback (RLHF) typically assumes a static or non-strategic reward model (RM). In iterative deployment, however, the policy generates the data on which the RM is retrained, creating a feedback loop. Building on the Stackelberg game formulation of this interaction, we derive an analytical decomposition of the policy's true optimization gradient into a standard policy gradient and a parameter-steering term that captures the policy's influence on the RM's future parameters. We show that standard iterative RLHF, which drops this steering term entirely, suffers from alignment collapse: the policy systematically exploits the RM's blind spots, producing low-quality, high-reward outputs whose feedback reinforces the very errors it exploits. To mitigate this, we propose foresighted policy optimization (FPO), a mechanism-design intervention that restores the missing steering term by regularizing the policy's parameter-steering effect on RM updates. We instantiate FPO via a scalable first-order approximation and demonstrate that it prevents alignment collapse on both controlled environments and an LLM alignment pipeline using Llama-3.2-1B.
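
The decomposition described above has the shape of a total derivative through the follower's response in a Stackelberg game. As a generic sketch (the symbols $\theta$ for policy parameters, $\phi^{*}(\theta)$ for the retrained RM parameters, and $J$ for the policy's objective are notation assumed here, not taken from the paper), the chain rule gives:

```latex
\frac{\mathrm{d}}{\mathrm{d}\theta}\, J\bigl(\theta, \phi^{*}(\theta)\bigr)
  = \underbrace{\nabla_{\theta} J(\theta, \phi)\Big|_{\phi = \phi^{*}(\theta)}}_{\text{standard policy gradient}}
  + \underbrace{\Bigl(\frac{\partial \phi^{*}(\theta)}{\partial \theta}\Bigr)^{\!\top}
      \nabla_{\phi} J(\theta, \phi)\Big|_{\phi = \phi^{*}(\theta)}}_{\text{parameter-steering term}}
```

Treating the RM as fixed during each policy update amounts to keeping only the first term; the paper's exact form of the second term may differ from this generic expression.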

[114]  arXiv:2605.04267 [pdf, ps, other]
Title: QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization
Comments: Accepted at Genetic and Evolutionary Computation Conference (GECCO '26)
Subjects: Machine Learning (cs.LG)

Interactive multi-objective optimization systems face a budget allocation dilemma: one can spend resources on expensive objective evaluations or on eliciting decision-maker preferences that identify the relevant region of the Pareto set. Moreover, preference elicitation itself spans modalities with different information content and cognitive burden, ranging from cheap, noisy pairwise preference statements (PS) to richer but costlier indifference adjustments (IA).
We study cost-aware optimization under an unknown scalarization and introduce QUIVER (Query-Informed Value Estimation for Regret), a surrogate-assisted evolutionary multi-objective optimizer that adaptively chooses between objective evaluations and heterogeneous preference queries. At each step, QUIVER selects the next action by maximizing the expected decision-quality improvement per unit total cost. Across DTLZ and WFG benchmarks under synthetic decision-maker models, QUIVER achieves the lowest final utility regret on challenging WFG problems (utility regret of 2.14 on WFG4, 2.82 on WFG9: a 25% improvement over baselines), outperforming all single-modality baselines. We analyze how the optimal mix of PS and IA adapts to problem difficulty: on easy problems (DTLZ2), QUIVER selects 80% PS queries; on hard problems (WFG9), it shifts to 35% IA queries. This adaptive modality selection demonstrates cost-aware preference learning in action.
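
The selection rule stated above (pick the action with the largest expected improvement per unit cost) can be sketched in a few lines. The `Action` class, its field names, and the numeric gains and costs in the usage example are hypothetical; how QUIVER actually estimates the expected improvement is not described in this listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str             # e.g. "evaluate", "PS", "IA"
    expected_gain: float  # estimated decision-quality improvement
    cost: float           # total cost of taking the action


def select_action(actions):
    """Return the action maximizing expected gain per unit cost."""
    if not actions:
        raise ValueError("no candidate actions")
    for action in actions:
        if action.cost <= 0:
            raise ValueError(f"cost must be positive: {action.name}")
    return max(actions, key=lambda a: a.expected_gain / a.cost)


# Hypothetical candidates: an objective evaluation, a pairwise preference
# statement (PS), and an indifference adjustment (IA).
candidates = [
    Action("evaluate", expected_gain=1.0, cost=10.0),
    Action("PS", expected_gain=0.2, cost=1.0),
    Action("IA", expected_gain=0.9, cost=3.0),
]
chosen = select_action(candidates)
```

With these made-up numbers the ratios are 0.1, 0.2, and 0.3, so the indifference adjustment is chosen even though the objective evaluation has the largest raw gain.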

[115]  arXiv:2605.04270 [pdf, ps, other]
Title: OPENJ: A Conceptual Framework for Open-Source Digital Human Modeling and Ergonomic Assessment in a CAD Environment
Comments: 11 pages, 2 figures, submitted to ASME IMECE 2026
Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO); Systems and Control (eess.SY)

Industrial workplace challenges range from musculoskeletal disorders -- a leading cause of occupational injury -- to suboptimal workstation layouts, inefficient task sequences, and poor human-equipment fit. Digital human modeling (DHM) tools address several of these challenges by placing a scalable virtual mannequin in a computer-aided design (CAD) environment, enabling engineers to evaluate ergonomic risk through standardized assessment methods (RULA, REBA, NIOSH Lifting Equation, OWAS), optimize workstation layouts for reach and visibility, predict task postures through inverse kinematics, and simulate operations before physical implementation. Despite four decades of development since the Jack system originated at the University of Pennsylvania in the 1980s, the integrated DHM capability set -- anthropometric mannequin, posture prediction, ergonomic assessment, and CAD integration -- remains exclusive to commercial platforms such as Siemens Tecnomatix Jack (Process Simulate), Dassault DELMIA, Humanetics RAMSIS, and the University of Iowa's Santos system. These platforms operate under proprietary, vendor-quoted pricing models, and their acquisition and operating costs, together with closed-source implementations, have been repeatedly identified as practical adoption barriers for individual researchers, small-to-medium enterprises, and educational institutions. Organizations without access resort to manual observational methods -- paper-based worksheets applied to photographs or video -- sacrificing the predictive power and reproducibility that computational analysis provides. The paper serves as a design blueprint for OPENJ (OpenJane/Joe), positioning the project for subsequent open-source implementation and community adoption.

[116]  arXiv:2605.04274 [pdf, ps, other]
Title: A Mean Curvature Approach to Boundary Detection: Geometric Insights for Unsupervised Learning
Comments: 26 pages, 6 tables, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Accurate boundary detection in high-dimensional data remains a central challenge in unsupervised learning, particularly in the presence of non-linear structures and heterogeneous densities. In this work, we introduce Mean Curvature Boundary Points (MCBP), a novel geometric framework grounded in Geometric Machine Learning that departs from traditional density-based approaches by explicitly modeling the intrinsic curvature of the data manifold. The method relies on a discrete approximation of the shape operator, estimated from local k-nearest neighbor patches, to compute pointwise mean curvature without requiring explicit manifold parametrization. The key insight of MCBP is to use mean curvature as a principled descriptor of boundary structure: high-curvature regions naturally correspond to transitions between clusters, geometric irregularities, and low-density interfaces. This yields a unified geometric interpretation of boundary, outlier, and transition points. We further introduce an adaptive percentile-based thresholding scheme that enables multiscale boundary extraction without relying on ad hoc density parameters. Beyond detection, we propose a curvature-driven data decomposition that separates samples into smooth (low-curvature) and boundary (high-curvature) subsets, effectively acting as a non-linear geometric filtering mechanism. This representation enhances cluster separability and improves the robustness of downstream unsupervised algorithms. Extensive experiments on synthetic and real-world datasets demonstrate that MCBP consistently improves clustering performance, particularly in complex and high-dimensional scenarios. These results position MCBP as a concrete contribution to Geometric Machine Learning, highlighting the potential of curvature-aware analysis as a unifying paradigm bridging differential geometry and data-driven modeling.
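
The percentile-based thresholding and the smooth/boundary decomposition can be illustrated on precomputed curvature values. This sketch assumes the per-point mean curvatures are already available (the shape-operator estimation is not shown), uses curvature magnitude as the score, and applies a nearest-rank percentile; all three choices, along with the function names, are assumptions rather than the paper's procedure.

```python
import math


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile of `values` for q in (0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < q <= 100:
        raise ValueError("q must lie in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def split_by_curvature(curvatures, q=90.0):
    """Split point indices into smooth and boundary sets.

    Points whose |curvature| exceeds the q-th percentile are labeled boundary.
    """
    magnitudes = [abs(c) for c in curvatures]
    threshold = nearest_rank_percentile(magnitudes, q)
    boundary = [i for i, m in enumerate(magnitudes) if m > threshold]
    smooth = [i for i, m in enumerate(magnitudes) if m <= threshold]
    return smooth, boundary, threshold
```

Sweeping `q` over several values gives a simple form of multiscale extraction: lower percentiles label more points as boundary, higher ones keep only the sharpest regions.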

[117]  arXiv:2605.04278 [pdf, ps, other]
Title: Material Database Agent: A Multimodal Agentic Framework for Scientific Literature Mining
Subjects: Computation and Language (cs.CL)

Materials science workflows rely on structured and unstructured data from the vast body of available scientific literature. However, most of the experimental details remain buried in text, tables, graphs and figures. Thus, constructing databases that incorporate this data is a manual, time-consuming, and hard-to-scale process. Multimodal large language models have made it feasible to extract information from text and scientific figures with high speed and accuracy. This opens the possibility of an AI system that can create production-scale material databases. Material Database Agent (MDA) is a modular, multi-agent system architecture for converting research literature into structured databases. MDA accepts article PDFs as input, which are subsequently processed in parallel into markdown files and figures. Multiple sub-agents read these markdown files and figures in parallel to assemble sub-databases for each paper. These sub-databases are then compiled into a single tabular database by an agent. As opposed to using either a rule-based approach or a single-pass pipeline for extracting information, MDA is a specialized architecture for transforming the literature into a database in the field of materials science. More generally, this study provides a basis for positioning multimodal agentic information extraction as a viable means for constructing next-generation scientific databases from the primary literature.

[118]  arXiv:2605.04279 [pdf, ps, other]
Title: Gradient Flow Structure and Quantitative Dynamics of Multi-Head Self-Attention
Authors: Ayan Pendharkar
Comments: 20 pages, 5 figures
Subjects: Machine Learning (cs.LG)

Transformer self-attention can be interpreted as a gradient flow on the unit sphere, in which tokens evolve under softmax interaction potentials and tend to form clusters. While prior work has established clustering behavior for single-head attention, the multi-head setting remains less understood due to geometric interference between heads, which invalidates standard monotonicity arguments.
In this work, we develop a theoretical framework for multi-head self-attention dynamics and resolve several open questions. We show that, under suitable conditions on the score matrices, a natural multi-head energy functional is non-decreasing along both flat and spherical dynamics. We identify the key obstruction to per-head monotonicity as radial shadow terms, which are projections of each head's output onto token directions, persisting even under orthogonality assumptions. We introduce a sufficient condition ensuring monotonicity and establish robustness to approximate orthogonality.
In a simplified scalar-head regime with equiangular token configurations, we derive a closed-form expression for the critical inverse temperature governing clustering behavior, and show that heterogeneous heads exhibit super-additive clustering rates. In this regime, we also prove a separation in clustering time between ReLU and softmax attention in the linearized dynamics. Finally, we establish an entropy production identity and show that attention entropy increases monotonically toward equilibrium as clustering progresses.
Our results provide a unified perspective on the dynamics of multi-head attention and clarify the mechanisms underlying clustering and stability in transformer models.

[119]  arXiv:2605.04280 [pdf, ps, other]
Title: Revocation-Ready CP-ABE Key Management for Blockchain-Based IoT Data Sharing
Authors: Chun Yin Chiu
Comments: 14 pages, 7 figures, 4 tables. Accepted by the 2026 8th Blockchain and Internet of Things Conference (BIOTC 2026). Preprint
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Blockchain-based IoT data sharing systems increasingly adopt a hybrid architecture in which a permissioned ledger stores tamper-evident metadata while encrypted payloads are placed in content-addressed storage. In such systems, a central security bottleneck is key access control: enforcing dynamic, multi-user authorization for releasing or using bulk-data decryption keys. Existing designs often rely on always-online RBAC or smart-contract gates that return keys to authorized users, reintroducing a trusted online policy enforcement point and weakening auditability. This paper presents a revocation-ready key management layer that replaces online key release with ciphertext key publication: the ledger records metadata of the form (CID, CK, PolicyID, epoch), where CK is a CP-ABE ciphertext encapsulating an AES-GCM key. Users retrieve CK from the ledger and decrypt locally if their attributes satisfy the policy.
To support forward revocation and policy evolution without re-encrypting large files, the design introduces an epoch/time-bound attribute and a lightweight CK-rotation protocol that updates only small ciphertext keys and ledger entries. We implement a minimal end-to-end prototype using a local content-addressed store, a hash-chained ledger, and a CP-ABE backend, with the goal of isolating key-management costs rather than benchmarking production blockchain throughput. Experiments on a commodity MacBook show that CP-ABE encryption dominates store latency, with approximately 186 ms for a k=6 mixed-Boolean policy, while ledger and storage operations remain around 1-2 ms. Epoch-based revocation amortizes key update cost under churn, gateway-assisted mode reduces median client-side decryption time by more than 4x under a simulated 4x client slow-down, and ledger growth scales with the number of shared assets rather than the number of readers.
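A minimal sketch of the ledger record (CID, CK, PolicyID, epoch) and of CK rotation on a hash-chained ledger is given below. It is an illustration under stated assumptions, not the paper's prototype: CK is treated as opaque bytes (no CP-ABE library is called), and `LedgerEntry`, `append`, `rotate_ck`, `current_entry`, and `verify_chain` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

GENESIS = "0" * 64  # placeholder hash preceding the first entry (assumption)

@dataclass(frozen=True)
class LedgerEntry:
    cid: str        # content address of the encrypted payload
    ck_hex: str     # CP-ABE ciphertext of the AES-GCM key, opaque here
    policy_id: str
    epoch: int
    prev_hash: str  # digest of the previous entry, forming the chain

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

def append(ledger, cid, ck, policy_id, epoch):
    """Append a new entry linked to the current tail of the ledger."""
    prev = ledger[-1].digest() if ledger else GENESIS
    entry = LedgerEntry(cid, ck.hex(), policy_id, epoch, prev)
    ledger.append(entry)
    return entry

def rotate_ck(ledger, cid, new_ck, policy_id, new_epoch):
    """Publish a fresh CK for the same CID; the bulk payload is untouched."""
    return append(ledger, cid, new_ck, policy_id, new_epoch)

def current_entry(ledger, cid):
    """Return the entry with the highest epoch for a CID, or None."""
    matches = [e for e in ledger if e.cid == cid]
    return max(matches, key=lambda e: e.epoch) if matches else None

def verify_chain(ledger):
    """Check that every entry points at the digest of its predecessor."""
    prev = GENESIS
    for entry in ledger:
        if entry.prev_hash != prev:
            return False
        prev = entry.digest()
    return True

ledger = []
append(ledger, "cid-1", b"\x01\x02", "policy-A", 1)
rotate_ck(ledger, "cid-1", b"\x03\x04", "policy-A", 2)
print(verify_chain(ledger), current_entry(ledger, "cid-1").epoch)
```

In this sketch rotation only adds a small entry, which mirrors the abstract's point that revocation does not require re-encrypting large files.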

[120]  arXiv:2605.04282 [pdf, ps, other]
Title: Hardware-Aware Neural Feature Extraction for Resource-Constrained Devices
Comments: This paper has been accepted for publication at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. © IEEE
Subjects: Machine Learning (cs.LG)

Visual SLAM is a core component of spatial computing systems, yet deploying learned local feature extractors on microcontroller-class hardware remains challenging due to memory, bandwidth, and quantization constraints. While modern neural descriptors provide strong robustness, their practical adoption is often hindered by system-level bottlenecks that are not captured by FLOP-based efficiency metrics. In this work, we introduce Gideon, a hardware-aware neural feature extractor explicitly designed for resource-constrained devices. Our approach combines relational knowledge distillation from a SuperPoint teacher with differentiable neural architecture search (DNAS) under strict memory and operator constraints. Unlike conventional design pipelines, we treat quantization stability and dynamic-range compactness as first-class objectives. We show that architectural choices such as replacing Batch Normalization with affine layers significantly improve INT8 robustness, and that descriptor dimensionality directly governs quantization resilience. Deployed on STM32N6, Gideon achieves 9.003 ms inference time (111 fps) while remaining below a 1.5 MB memory footprint. Remarkably, INT8 quantization induces negligible degradation and occasionally matches full-precision performance. These results demonstrate that robust learned feature extraction can be reconciled with embedded hardware constraints through holistic hardware-algorithm co-design.

[121]  arXiv:2605.04286 [pdf, ps, other]
Title: Probabilistic Classification and Uncertainty Quantification of Sahara Desert Climate Using Feedforward Neural Networks
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO)

Climate classification plays a vital role in agricultural planning, hydrological studies, and climate science. One of the most widely used systems for classifying global climate zones is the Köppen-Trewartha (KT) classification. However, the KT classification is fundamentally deterministic, offering discrete labels to spatial locations without accounting for uncertainties in classification. In this paper, we provide a framework for probabilistic modeling of climatic zones. We implement a feedforward artificial neural network (ANN) for classification, allowing for efficient, uncertainty-aware categorization of climatic regions, thereby offering a more nuanced understanding of transitional climate zones compared to traditional deterministic methods. We apply this method to the Sahara Desert region over the 30-year period of 1960-1989, using data at more than 400,000 space-time locations from the first 11 years to train our model. We assess the model's short- and long-term classification capabilities to evaluate its stability and accuracy over time. We also compare the probabilistic classification from our model with the traditional KT classification. In addition, we use fluctuation analysis methods to highlight the temporal evolution of climatic zones across the Sahara region and identify areas undergoing significant flux of probabilities of their climate classes, providing insights into broader trends in desertification.

[122]  arXiv:2605.04289 [pdf, ps, other]
Title: Building Power Grid Models from Open Data: A Complete Pipeline from OpenStreetMap to Optimal Power Flow
Comments: All models are publicly released at this https URL
Subjects: Systems and Control (eess.SY)

Access to realistic transmission grid models is essential for power systems research, yet detailed network data in the United States remains restricted under critical-infrastructure regulations. We present a pipeline that constructs complete, OPF-solvable transmission network models entirely from publicly available data. The five-stage pipeline (1) extracts power infrastructure from OpenStreetMap via a local Overpass API instance, (2) reconstructs bus-branch topology through voltage inference, line merging, and transformer detection, (3) estimates electrical parameters using voltage-class lookup tables calibrated with U.S. Energy Information Administration (EIA) plant-level data, (4) allocates hourly demand from EIA-930 to individual buses using US Census population as a spatial proxy, and (5) solves both DC and AC optimal power flow using PowerModels.jl with a progressive relaxation strategy that automatically loosens constraints on imprecise models. We validate the pipeline on all 48 contiguous US states and six multi-state regions, including the full Western (5,076 buses) and Eastern (21,697 buses) Interconnections. Of the 48 single-state models, 42 (88%) converge at the strictest relaxation level for AC-OPF at peak hour and 44 (92%) off-peak. Dispatch costs (median $22/MWh) and system losses (median 1.0%) are consistent with real wholesale-market outcomes. The pipeline relies exclusively on open data sources, enabling reproducible grid analysis without proprietary data. All 54 models (48 single-state and 6 multi-state) are publicly released at https://github.com/microsoft/GridSFM.

[123]  arXiv:2605.04290 [pdf, ps, other]
Title: StormWave: An Open-Source Portable SDR Platform for Over-the-Air Resilience Evaluation of Terrestrial and Aerial Communications
Comments: 7 pages, 10 figures
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)

This paper presents StormWave, an open-source, portable software-defined Radio Frequency (RF) interference generation and monitoring platform designed for realistic field-based evaluation of the resilience of wireless communication systems. StormWave enables seamless composition and runtime switching among a wide range of narrowband and wideband waveforms, while supporting multiple digital modulations, adaptive coding, and multi-radio orchestration with real-time spectrum visualization. We evaluate the effectiveness of StormWave through both outdoor ground and air-to-air (A2A) experiments. Ground experiments demonstrate clear waveform- and modulation-dependent interference effects under realistic propagation conditions, while A2A experiments reveal pronounced distance-dependent constellation distortion and access-symbol degradation under active interference. The StormWave source code will be released to the community, with the expectation that StormWave will be used as a flexible, extensible, and field-ready platform for systematically validating interference resilience of wireless systems under realistic operating conditions.

[124]  arXiv:2605.04291 [pdf, ps, other]
Title: Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion
Comments: To appear in ACL 2026
Subjects: Machine Learning (cs.LG)

We present a discrete diffusion-based language model using Glauber dynamics from statistical physics. Our main insight is that instead of trying to train a discrete state space diffusion model using Glauber dynamics with a uniform transition kernel as the forward process, one can set up an "energy function" based on pretrained causal/masked language models. When viewed as the stationary distribution, this energy function allows us to significantly improve the quality of the generated text. Incorporating UL2 as the pretrained model into our diffusion pipeline, we outperform prior diffusion-based LMs and perform competitively with autoregressive models of comparable model sizes. Furthermore, our models are competitive with or outperform prior diffusion models and GPT-2 style autoregressive models on zero-shot common sense reasoning tasks as well as planning and search tasks like Sudoku and Zebra puzzles.

[125]  arXiv:2605.04295 [pdf, ps, other]
Title: LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy
Comments: Accepted for publication in the Proceedings of IJCAI 2026, the 35th International Joint Conference on Artificial Intelligence, 12 Pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

LLMs' overconfidence, particularly when hallucinating, poses a significant challenge for deploying these models in safety-critical settings and makes reliable uncertainty estimation necessary. Existing approaches for uncertainty quantification typically prioritize lexical or probabilistic measures; however, these techniques often ignore the semantic variance of different responses with similar meaning. In this paper, we propose Adaptive Conformal Semantic Entropy (ACSE), a method for estimating prompt-level uncertainty by adaptively measuring semantic dispersion in LLM outputs. Our uncertainty scoring function is based on clustering semantic entropy of multiple diverse responses to the same prompt. The function adaptively adjusts the uncertainty score based on semantic features of each cluster. To ensure the statistical reliability of our score, we use conformal calibration to derive a decision rule that accepts or abstains on each prompt, providing a finite-sample, distribution-free guarantee that the error rate among the accepted responses remains bounded by a user-specified tolerance. Our extensive experimental evaluations, using different LLMs and datasets, demonstrate that our approach consistently outperforms state-of-the-art uncertainty quantification baselines on discriminative performance, conformal guarantees, and probabilistic calibration indicators. As a highlight, on the TriviaQA dataset, the AUROC of our approach is 0.88, compared to 0.65 for the token entropy approach.
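The semantic-dispersion idea can be illustrated with plain semantic entropy over cluster assignments. This is a minimal sketch, assuming the sampled responses have already been grouped into meaning clusters by some external step; it does not include ACSE's adaptive adjustment or its conformal calibration, and `semantic_entropy` is a hypothetical name.

```python
import math
from collections import Counter

def semantic_entropy(cluster_labels):
    """Shannon entropy (in nats) of the empirical distribution over clusters.

    cluster_labels: one label per sampled response; responses sharing a
    label are assumed to carry the same meaning.
    """
    n = len(cluster_labels)
    if n == 0:
        raise ValueError("need at least one response")
    counts = Counter(cluster_labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# All responses agree: no semantic dispersion.
print(semantic_entropy(["paris"] * 5))
# Responses split evenly across two meanings: entropy equals log(2).
print(semantic_entropy(["paris", "paris", "lyon", "lyon"]))
```

Higher values indicate that the sampled answers disagree in meaning, which is the signal an accept-or-abstain rule would then threshold.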

[126]  arXiv:2605.04296 [pdf, ps, other]
Title: Dynamic Quantum-Assisted Co-Design of Control Tuning and Lyapunov Stability Synthesis for Nonlinear Systems
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper proposes a dynamic quantum-assisted co-design framework for nonlinear closed-loop systems in which controller parameters and Lyapunov-certificate parameters are redesigned jointly at successive decision epochs. Unlike conventional nonlinear control designs that typically tune controller gains offline and verify stability separately, the proposed method embeds performance improvement and Lyapunov-based stability synthesis within a unified online optimization loop. The main novelty is a two-step computational structure that first contracts the continuous admissible search region around the current operating condition using a Black-Hole-based calibration procedure and then constructs a finite binary representation only over this calibrated region. The encoded objective is obtained from sampled nonlinear closed-loop evaluations and approximated by a local quadratic pseudo-Boolean surrogate, enabling an Ising-type Hamiltonian representation suitable for quantum-assisted optimization. Quantum imaginary time evolution is then used to explore the encoded Hamiltonian, and the resulting candidate bitstrings are decoded into continuous controller and Lyapunov parameters. To reduce dependence on the surrogate model, the decoded candidates are re-evaluated using the original nonlinear closed-loop cost and Lyapunov penalties before the final update is applied. The framework can accommodate different Lyapunov decay specifications by modifying the stability penalty and is validated on first-order nonlinear consensus, second-order nonlinear consensus, and induction-motor drive control examples. The implementation code used to generate the reported results is available at https://github.com/LSU-RAISE-LAB/DQCLS-NS.

[127]  arXiv:2605.04298 [pdf, ps, other]
Title: Towards Self-Referential Analytic Assessment: A Profile-Based Approach to L2 Writing Evaluation with LLMs
Comments: Accepted for the 21st Workshop on Innovative Use of NLP for Building Educational Applications
Subjects: Computation and Language (cs.CL)

Automated essay scoring (AES) research often relies on rank-based correlation metrics to validate analytic assessment. However, such metrics obscure both intrinsic intercorrelations among analytic dimensions that arise from the structure of writing proficiency itself and halo effects, whereby holistic impressions bleed into fine-grained component scores. As a result, high correlations may mask a system's true diagnostic behaviour. In this study, we propose a novel self-referential assessment evaluation framework that focuses on identifying intra-learner strengths and weaknesses rather than assessing inter-learner rankings. We conduct experiments on the publicly available ICNALE GRA, a uniquely dense second-language writing dataset annotated holistically and analytically by up to 80 trained raters. To obtain reliable reference scores, we apply two-facet Rasch modelling to calibrate rater severity and derive fair average scores across ten analytic aspects and holistic proficiency. We compare the analytic scoring performance of human operational raters and three large language models (LLMs) in a zero-shot setting. Our results show that LLMs tend to outperform single human raters in identifying relative weaknesses (negative feedback) across several proficiency aspects, while human raters remain stronger at identifying relative strengths (positive feedback). Overall, our findings highlight the limitations of rank-based evaluation for analytic assessment and demonstrate the value of intra-learner, profile-based methods for assessing and deploying LLMs in AES.

[128]  arXiv:2605.04299 [pdf, ps, other]
Title: Beyond Fixed Thresholds and Domain-Specific Benchmarks for Explainable Multi-Task Classification in Autonomous Vehicles
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Scene understanding is a vital part of autonomous driving systems and relies on deep learning models. These models are intrinsically black boxes, and their lack of transparency undermines safety and trust in autonomous driving. To make these systems transparent, multi-task visual understanding has become crucial for explainable autonomous driving perception systems, where simultaneous prediction of multiple driving behaviors and their underlying explanations is essential for safe navigation and human trust in autonomous vehicles. To design an accurate and cross-cultural explainable autonomous driving system, we introduce a comprehensive confidence threshold sensitivity analysis that evaluates various threshold values to identify optimal decision boundaries for different tasks. Our analysis demonstrates that traditional fixed threshold approaches are suboptimal for multi-task scenarios. Through extensive evaluation, we demonstrate that our adaptive threshold selection methodology improves F1-scores across different tasks. In addition, we introduce IUST-XAI-AD, a novel dataset consisting of 958 images with human annotations for driving decisions and corresponding reasoning. This dataset addresses the critical gap in domain-specific evaluation benchmarks for distinct driving contexts and provides a more challenging test environment compared to existing datasets. Experimental results demonstrate that confidence threshold sensitivity analysis can significantly improve model performance, while the introduction of the IUST-XAI-AD dataset reveals important insights about cross-cultural driving behavior patterns. The combined contributions of this work provide both methodological advances and practical evaluation tools that can accelerate the development of more reliable, explainable, and culturally-adaptive autonomous driving systems for global deployment.

[129]  arXiv:2605.04302 [pdf, ps, other]
Title: Rigid homotopies for sampling from algebraic varieties: a Waring structure complexity model
Comments: 29 pages, 3 figures, 2 tables
Subjects: Numerical Analysis (math.NA); Computational Complexity (cs.CC); Algebraic Geometry (math.AG)

Polynomial system solving has seen major progress in both theory and practice over the past decade. A landmark achievement was addressing Smale's 17th problem, establishing average-case polynomial-time algorithms for computing approximate solutions of polynomial systems via homotopy continuation. Recent improvements in complexity bounds for these algorithms led to the development of rigid homotopy methods. In this article, we prove a new complexity result for rigid homotopies for polynomial systems with Waring representations of prescribed length. In addition, we provide the first computational experiments for rigid homotopies using a preliminary implementation.

[130]  arXiv:2605.04304 [pdf, ps, other]
Title: Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning
Comments: Accepted to ACL 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Advanced chart question answering requires both precise perception of small visual elements and multi-step reasoning across several subplots. While existing MLLMs are strong at understanding single plots, they often struggle with multi-step reasoning across multiple subplots. We propose HierVA, a hierarchical visual agent framework for chart reasoning that iteratively constructs and updates a working context in a joint image--text space. A high-level manager generates plans and maintains a compact context containing only key information, while specialized workers perform reasoning, gather evidence, and return results. In particular, the agent maintains separate visual and textual contexts, using a zoom-in tool to restrict the visual context. Experiments on the CharXiv reasoning subset demonstrate consistent improvements over strong multimodal baselines, and ablation studies verify that hierarchical architecture, scoped visual context, and distilled context contribute complementary gains.

[131]  arXiv:2605.04305 [pdf, ps, other]
Title: SWAN: Semantic Watermarking with Abstract Meaning Representation
Comments: Accepted to ACL 2026 Main
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY)

We introduce SWAN (Semantic Watermarking with Abstract Meaning Representation), a novel framework that embeds watermark signatures into the semantic structure of a sentence using Abstract Meaning Representation (AMR). In contrast to existing watermarking methods, which typically encode signatures by adjusting token selection preferences during text generation, SWAN embeds the signature directly in the sentence's semantic representation. As the signature is encoded at the semantic structure level, any paraphrase that preserves meaning automatically preserves the signature. SWAN is training-free: watermark injection is achieved by prompting an LLM to generate sentences guided by a selected AMR template while maintaining contextual coherence, and detection uses an off-the-shelf AMR parser followed by a simple one-proportion z-test. Empirical evaluation on the RealNews benchmark shows SWAN matches state-of-the-art detection performance on unaltered watermarked text, while significantly improving robustness against paraphrasing, increasing detection AUC by up to 13.9 percentage points compared to prior methods. These results demonstrate that SWAN's approach of anchoring watermarks in AMR semantic structures provides a simple, effective, and prompt-based method for robust text provenance verification under paraphrasing, opening new avenues for semantic-level watermarking research.
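The detection step mentioned above relies on a one-proportion z-test, which can be sketched as follows. This is a generic version of the test, not SWAN's detector: what counts as a "hit" (for example, a sentence whose parsed AMR matches the expected signature) and the null proportion `p0` are assumptions made for illustration.

```python
import math

def one_proportion_z(hits, n, p0):
    """z statistic for H0: true hit rate equals p0, given hits out of n trials."""
    if n <= 0 or not 0 < p0 < 1:
        raise ValueError("need n > 0 and 0 < p0 < 1")
    p_hat = hits / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical counts: 60 of 100 sentences match, against a 50% chance rate.
z = one_proportion_z(60, 100, 0.5)
print(z, upper_tail_p_value(z))
```

A text would be flagged as watermarked when the p-value falls below a chosen significance level; the level itself is a deployment decision not specified here.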

[132]  arXiv:2605.04306 [pdf, ps, other]
Title: dtour: a steerable tour de vis through high-dimensional data
Subjects: Human-Computer Interaction (cs.HC)

Understanding high-dimensional data requires projecting it into lower-dimensional spaces, but any single projection inevitably loses information or introduces distortions. Tours address this limitation through animation of 2D projection sequences, yet existing tools present tradeoffs in the freedom and steerability of projection traversal, providing little to no ability to move between expert-guided paths and unrestrained exploration. We present dtour, a tour interface that combines static projection previews, reversible scrubbing along continuous geodesic projection paths, manual projection manipulation, and a wandering grand tour, all within a single progressive exploration interface. dtour scales to millions of points via GPU-accelerated rendering, runs in any modern browser, and integrates with both Python and JavaScript ecosystems. We demonstrate dtour on text, image, and single-cell data for two usage scenarios: gradually revealing structure in high-dimensional data and validating non-linear dimensionality reduction outputs.

[133]  arXiv:2605.04308 [pdf, ps, other]
Title: Memory as a Markov Matrix: Sample Efficient Knowledge Expansion via Token-to-Dictionary Mapping
Comments: Accepted to ICML 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Continual incorporation of new knowledge is essential for the long-term evolution of large language models (LLMs). Existing approaches typically rely on parameter-update algorithms to mitigate catastrophic forgetting, yet they suffer from fundamental limitations: 1) forgetting is unavoidable as the amount of newly injected knowledge grows; and 2) model updates are often irreversible. As modern LLMs become increasingly expressive, it is natural to question whether large-scale weight updates are necessary for acquiring a small amount of new knowledge. In this work, we propose a principled framework that models autoregressive language generation as a Markov process over tokens, where model memory is represented by a Markov transition matrix. Under this formulation, incorporating new knowledge/tokens corresponds to extending the state space, and preserving existing transitions guarantees retention of previously learned knowledge. We then prove a sample complexity bound for incorporating new tokens via a token-to-dictionary mapping strategy. In particular, for learning the transition behavior of each new token, the required number of samples scales linearly with the number of existing tokens it is mapped to. To realize this mapping, we propose an embedding-tuning algorithm that requires minimal parameter updates and induces zero forgetting. Experimental results further demonstrate the effectiveness of our method and validate our theoretical findings.

[134]  arXiv:2605.04309 [pdf, ps, other]
Title: Interpreting V1 Population Activity via Image-Neural Latent Representation Alignment
Subjects: Neural and Evolutionary Computing (cs.NE)

Understanding the neural mechanisms underlying visual computation has long been a central challenge in neuroscience. Recent alignment-based approaches have improved the accuracy of decoding visual stimuli from brain activity, yet they provide limited insight into the neural computations that give rise to these improvements. To address this gap, we propose Dual-Tower Image-Neural Alignment (DINA), an interpretable contrastive framework for analyzing population-level visual computations in primary visual cortex (V1). DINA jointly trains a biologically motivated dual-tower architecture that aligns visual stimuli and corresponding V1 population responses in a shared latent space at the level of intermediate feature maps, enabling both accurate decoding and direct access to interpretable feature maps. Evaluated on large-scale two-photon calcium imaging data from mouse V1, DINA achieves accurate neural-based decoding while revealing that decoding performance is primarily supported by coarse, low-level visual structure, rather than semantic category information or fine-grained details. Further analysis reveals that alignable feature maps emerge from multiple spatially distributed image regions, capturing both shape and texture cues, and are predominantly reconstructed by sparse subsets of strongly responsive neurons and their functional interactions. Together, these results confirm that, beyond enabling accurate decoding, DINA provides a principled framework for probing the computational mechanisms underlying visual processing in V1.

[135]  arXiv:2605.04310 [pdf, ps, other]
Title: ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters
Comments: 11 pages, 12 figures, 3 algorithms, 3 tables, accepted in IEEE ICDCS 2026
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The recent convergence of edge computing, serverless execution, and Kubernetes (K8s) based container orchestration has enabled the processing of application workflows close to data sources. While effective within a single edge cluster, existing schemes do not generalize to federated multi-edge environments, where multiple workflows execute concurrently under strict end-to-end (E2E) deadline constraints. This paper introduces ClusterLess, a deadline-aware serverless workflow orchestration method for federated multi-edge K8s clusters. ClusterLess manages the E2E lifecycle of workflow execution, including dependency analysis, execution mode selection, and resource-aware placement. To this end, it integrates structured intra-cluster orchestration with a leader-selected, super-master-driven inter-cluster coordination layer, determining where and how each workflow function should be executed across the federated edge clusters. We implement ClusterLess using OpenFaaS as the serverless execution substrate and Argo for workflow management, and deploy it on a realistic testbed of six edge clusters comprising 64 heterogeneous edge nodes. Experimental results with concurrent serverless workflows, spanning 18 workload configurations across different input sizes and deadline classes, show that ClusterLess reduces workflow completion time by up to 40%, increases deadline satisfaction from below 50% to over 90%, and confines deadline violations to single-digit seconds compared to four baseline methods.

[136]  arXiv:2605.04312 [pdf, ps, other]
Title: Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games
Authors: Connacher Murphy
Comments: 15 pages, 3 figures, 3 tables
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Static capabilities benchmarks suffer from saturation and contamination, making it difficult to track capabilities progress over time. We introduce Agent Island, a multiplayer simulation environment in which language-model agents compete in a game of interagent cooperation, conflict, and persuasion. The environment yields a dynamic benchmark designed to mitigate both saturation and contamination; new models can always outperform the current leading player in this winner-take-all game, and agents compete against other adaptive agents rather than face a fixed task set. We rank players with a Bayesian Plackett-Luce model, allowing us to quantify uncertainty in player skill. In 999 games involving 49 unique models, openai/gpt-5.5 dominates its peers with a posterior mean skill of 5.64, compared with 3.10 for the second-ranked model, openai/gpt-5.2, and 2.86 for the third-ranked model, openai/gpt-5.3-codex. We release the game logs as a dataset for analyses of model behavior. As an example, we investigate same-provider preference in final-round votes and find that models are 8.3 p.p. more likely to support a same-provider finalist than finalists from other providers. This preference is not uniform across providers: among separately estimated providers, the effect is strongest for OpenAI models and weakest for Anthropic models.
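The Plackett-Luce likelihood that underlies such a ranking model can be sketched as follows. This is a minimal illustration of the standard likelihood only: it omits the Bayesian prior and posterior inference, assumes each game yields a full ordering of players, and treats skills as log-scale parameters; the function name and the player ids are hypothetical.

```python
import math

def plackett_luce_log_likelihood(ranking, skill):
    """Log-probability of an observed ordering under Plackett-Luce.

    ranking: player ids from first place to last place.
    skill: dict mapping player id to a log-scale skill parameter.
    At each position, the placed player is chosen from those remaining
    with probability proportional to exp(skill).
    """
    values = [skill[p] for p in ranking]
    total = 0.0
    for i in range(len(values) - 1):  # the last placement has probability 1
        tail = values[i:]
        m = max(tail)  # log-sum-exp shift for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        total += values[i] - log_norm
    return total

# Two hypothetical players; "a" has three times the strength of "b".
skills = {"a": math.log(3.0), "b": 0.0}
print(math.exp(plackett_luce_log_likelihood(["a", "b"], skills)))
```

Summing this quantity over games gives the data term that a Bayesian treatment would combine with a prior over skills to obtain posterior means and uncertainty.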

[137]  arXiv:2605.04313 [pdf, ps, other]
Title: NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise
Authors: Zhi Xu, Yun Fu
Comments: ACL oral accept; 5 figures, 8 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Causal reasoning in natural language requires identifying relevant variables, understanding their interactions, and reasoning about effects and interventions, often under noisy or ambiguous conditions. While large language models (LLMs) exhibit strong general reasoning abilities, they struggle to disentangle correlation from causation, particularly when observations are partially incorrect or irrelevant information is present. In this work, we introduce NoisyCausal, a new benchmark designed to evaluate causal reasoning under structured noise. Each instance is generated from a ground-truth causal graph and contextualized with a natural language scenario by injecting controllable forms of noise, such as irrelevant distractors, value perturbations, confounding, and partial observability. Moreover, we propose a modular reasoning framework that combines LLMs with explicit causal structure to address these challenges. Our method prompts the LLM to extract variables, construct a causal graph from context, and then reformulates the reasoning task as a structured prompt grounded in this graph. Rather than relying on statistical patterns alone, the LLM is guided by symbolic structure, enabling more interpretable and robust inference. Experimental results show that our method significantly outperforms standard prompting and reasoning baselines on NoisyCausal. Furthermore, it generalizes well to external benchmarks such as Cladder without task-specific tuning. Our findings highlight the importance of combining causal abstractions with language-driven reasoning to achieve faithful and robust causal understanding in LLMs.

[138]  arXiv:2605.04316 [pdf, ps, other]
Title: Orchestrating Serverless Applications in the Edge Cloud Space Continuum: What Breaks and What is Next?
Comments: 11 pages, 2 figures, 2 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Serverless computing has matured into an effective execution model for edge cloud environments, enabling function-level decomposition, demand-driven scaling, and workflow execution across stable, well-provisioned infrastructure. This success motivates extending it to the edge cloud space continuum, where Low Earth Orbit (LEO) constellations are increasingly explored as distributed compute substrates. However, existing serverless orchestration is not directly applicable in this setting, where LEO systems impose time-varying contact graphs, intermittent link availability, and strict feasibility constraints on energy, memory, communication, and operational cost. This article identifies ten broken assumptions in existing serverless orchestration and organizes them into three core challenges: spatiotemporal execution over dynamic graphs, constraint-aware function placement and scaling, and correctness and progress under decentralized and delayed state. It then proposes an architecture that enables robust and efficient serverless execution across the continuum, grounded in these challenges and demonstrated through a representative flood-response use case.

[139]  arXiv:2605.04320 [pdf, ps, other]
Title: Reproduction Test Generation for Java SWE Issues
Subjects: Software Engineering (cs.SE)

Given an issue on a software repository, a reproduction test confirms its presence in the code before it gets fixed and its absence after. Reproduction tests provide crucial execution-based feedback for diagnosis and validation during software development. Unfortunately, they are usually missing. Therefore, recent work has introduced both benchmarks and a thriving literature on solutions for reproduction test generation from issues. However, that work has focused on Python and neglected other languages such as Java, which is important for enterprise software. This paper introduces both a benchmark and a solution for Java repository-level reproduction test generation. The benchmark, TDD-Bench-Java, is the first to model this problem and comprises 250 instances sourced from popular open-source repositories. The solution, e-Otter++ for Java, adapts a state-of-the-art reproduction test generator for Python to yield high performance on Java. To evaluate in an industry setting, besides empirical results with TDD-Bench-Java, this paper also presents results with a contamination-free proprietary dataset. Overall, we hope that this paper contributes to bringing better diagnosis and validation to Java software development.

[140]  arXiv:2605.04321 [pdf, ps, other]
Title: AI and Suicide Prevention: A Cross-Sector Primer
Comments: 39 pages, 3 figures, 2 tables
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

AI chatbots already function as de facto mental health support tools for millions of people, including people in crisis. Yet, they lack the clinical validation, shared standards, and coordinated oversight that their societal role demands. This primer was developed in conjunction with a multistakeholder workshop hosted by Partnership on AI in 2026, convening AI labs, mental health practitioners, people with lived experience, and policymakers, to provide a common cross-sector reference point for the current state of the field of AI and suicide prevention. It begins with an overview of clinical best practices, then turns to how frontier AI systems (as of winter 2026) detect and respond to suicide and non-suicidal self-injury (NSSI) queries. Together, these provide insight into what it would take to design and implement AI tools that not only better prevent suicide and NSSI, but also promote overall well-being. Drawing on clinical literature, publicly available AI lab policies, an emerging landscape of evaluation frameworks, and conversations with leaders across the AI and mental health fields, we map challenges posed by general-purpose AI chatbots for mental health across model, product, and policy layers, ultimately highlighting priority areas where cross-industry alignment is both urgently needed and achievable.

[141]  arXiv:2605.04323 [pdf, ps, other]
Title: LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems
Comments: 27 pages, 7 figures, 1 table
Subjects: Machine Learning (cs.LG); Databases (cs.DB)

Understanding soil is fundamental to agriculture, carbon cycling, and environmental sustainability, yet progress is limited by fragmented and heterogeneous datasets that constrain modeling to small-scale predictive settings rather than high-dimensional representation learning. We introduce LUCAS-MEGA, a large-scale multimodal dataset constructed through systematic data fusion of European soil-environment observations, with the LUCAS survey as its backbone. The fused dataset comprises over 70,000 samples and more than 1,000 features spanning physical, chemical, environmental, biological, and visual attributes, aggregated from 68 source datasets. To enable integration at scale, we develop SoilFuser, a multi-agent, human-in-the-loop data fusion pipeline that standardizes heterogeneous data formats and measurement protocols, resolves inconsistencies and invalid entries (e.g., unit inconsistencies, codebook mismatches, and erroneous values), incorporates natural language annotations, and harmonizes multimodal attributes and metadata into a unified, machine learning-ready feature space. The resulting dataset captures key characteristics of real-world soil observations, including multimodality, uneven feature coverage, and heterogeneous uncertainty. To demonstrate the usability of LUCAS-MEGA for data-driven modeling, we pretrain a multimodal tabular transformer (SoilFormer) using a self-supervised objective based on feature masking, achieving stable training, strong predictive performance, and representations that support uncertainty-aware prediction. We further show that the learned representations recover relationships consistent with established soil processes. LUCAS-MEGA is released with open access and is accompanied by composable, agent-friendly APIs that support structured querying and data-driven workflows.

[142]  arXiv:2605.04324 [pdf, ps, other]
Title: DeFed-GMM-DaDiL: A Decentralized Federated Framework for Domain Adaptation
Subjects: Machine Learning (cs.LG)

Decentralized multi-source domain adaptation seeks to transfer knowledge from multiple heterogeneous and related source domains to an unlabeled target domain in a decentralized setting. We address this challenge through a fully decentralized federated approach, DeFed-GMM-DaDiL, an extension of the GMM-Dataset Dictionary Learning (DaDiL) framework. Each client models its dataset as a Gaussian Mixture Model (GMM), and the federation jointly approximates them via labeled Wasserstein barycenters of shared, learnable GMM atoms. This design enables adaptation without a central server while preserving clients' privacy. We empirically study the stability of the learned representations in scenarios where the target domain has missing classes. Empirical results demonstrate that DeFed-GMM-DaDiL maintains stable and consistent shared representations across clients, effectively reconstructs missing classes, and achieves competitive performance on multi-source domain adaptation benchmarks.

[143]  arXiv:2605.04325 [pdf, ps, other]
Title: On the Architectural Complexity of Neural Networks
Comments: 67 pages, 54 figures, 11 tables
Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

We introduce a unified theoretical framework for the rigorous analysis and systematic construction of deep neural networks (DNNs). This framework addresses a gap in existing theory by explicitly modeling the structure of tensor operations -- lower-level information that is often abstracted away. Our framework enables two novel objectives: (1) analysis of the evolution of architectural complexity over deep learning history, and (2) automatic construction of novel architectures based on new types of tensor operations. Our study of DNNs introduced over the past 40 years reveals a connection between groundbreaking architectures and increases in different types of architectural complexity. Moreover, we identify several large classes of higher-complexity architectures that have not yet been explored. We then collect a dataset of 3,000+ higher-complexity architectures, which we publicly release at: https://github.com/combinatoriallabs/ArchitecturalComplexity.

[144]  arXiv:2605.04327 [pdf, ps, other]
Title: From Language to Logic: A Theoretical Architecture for VLM-Grounded Safe Navigation
Comments: 8 pages, 3 figures, to be published in ICUAS 2026 conference proceedings
Subjects: Robotics (cs.RO)

We propose an architecture for integrating high-level, human-provided safety rules and operator-aligned semantic preferences into autonomous robot navigation in unstructured outdoor environments. In our approach, natural-language rules are translated into Signal Temporal Logic (STL) specifications that guide planning and navigation during runtime. Persistent, environment-centric rules and terrain preferences are grounded into a 2D cost map, while temporally dynamic requirements are expressed as STL specifications to be monitored during runtime. We hypothesize the use of Vision-Language Models (VLMs) for zero-shot scene understanding, enabling mapping between human instructions, semantic features, and environmental constraints. Within this framework, we construct an illustrative navigation model that is designed to satisfy a set of STL-encoded specifications and soft operator preferences through formal satisfaction metrics embedded into environmental properties and runtime monitoring.

[145]  arXiv:2605.04330 [pdf, ps, other]
Title: The Scaling Properties of Implicit Deductive Reasoning in Transformers
Comments: preprint
Subjects: Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC)

We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.

[146]  arXiv:2605.04332 [pdf, ps, other]
Title: Learning-based Statistical Refinement for Denoising
Authors: Rihuan Ke
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

This work proposes a learning-based statistical refinement method for improving the denoising results of a given denoiser without knowing the precise noise distribution or accessing clean images or calibration data. While there are many existing successful denoising approaches for handling different kinds of noise, they typically require accurate modelling of the images and the noise (implicitly or explicitly), and hence the denoising results can be suboptimal due to practical factors such as imperfect models, unreliable noise assumptions, or low-quality data. In particular, when clean image samples are not available and the underlying noise distribution is unknown, which is the case in various practical situations, the results may not align well with the noise statistics. Failing to exploit this statistical information leads to suboptimal results. This work aims to make the best use of the statistical information to improve the consistency between the given denoising results and the noise statistics, under the assumption that the noise is conditionally pixel-wise independent given the clean signal. A method, based on a Bayesian formulation of an auxiliary signal in the noisy data, is proposed for evaluating the consistency of the denoising results, without precise information on the noise distribution. By leveraging the statistical information from noisy data, the method enhances the statistical noise consistency and improves denoising quality.

[147]  arXiv:2605.04333 [pdf, ps, other]
Title: Resilient AI Supercomputer Networking using MRC and SRv6
Comments: 18 pages, 22 figures
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Tail latency dominates the performance of synchronous pretraining jobs when running at very large scales. We describe a three-pronged approach: (1) a new RDMA-based transport protocol, MRC, that sprays across many paths and actively load-balances between them, eliminating the issue of flow collisions; (2) the use of multi-plane Clos topologies to get the benefits of high switch radix and redundancy, allowing training clusters of well over 100K GPUs to be built as two-tier topologies while increasing physical redundancy; and (3) the use of static source routing using SRv6 to allow MRC the freedom to bypass failures by itself. We describe our experiences running MRC and static SRv6 routing in production in OpenAI's and Microsoft's largest training clusters, where it has been used to train the latest frontier models. We demonstrate how MRC allows AI training jobs to ride out many network failures that previously would have interrupted training.

[148]  arXiv:2605.04334 [pdf, ps, other]
Title: Science discussions of retracted articles on Bluesky: public scrutiny or misinformation spreading?
Comments: 26 pages, 5 figures
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

Post-publication peer review (PPPR) has emerged as an important supplement to traditional peer review, with social media playing a growing role in publicising potential problems in published research. However, it remains unclear whether social media discussions of retracted articles primarily reflect good practices, such as exposing flaws and acknowledging retraction status, or bad practices, such as overlooking retractions and continuing to disseminate scientific misinformation. In this study, we collected Bluesky posts referencing scholarly articles from Altmetric and retrieved metadata for the referenced articles using OpenAlex. The final dataset included 284 retracted articles with 79 pre-retraction posts and 857 post-retraction posts, 59 retraction notices with 186 posts, and 609,461 non-retracted articles with 1,344,756 posts. We manually coded Bluesky posts discussing retracted articles to identify instances of good and bad practice. The results show that posts demonstrating good practice (89.9%) substantially outnumbered those demonstrating bad practice (10.1%). Posts reflecting good practice also had more user engagement. In the pre-retraction phase, good practice posts constituted a slight minority (43.0%), whereas in the post-retraction phase they were dominant (94.2%). Most negative posts in the pre-retraction phase (90.0%) showed good practice, while only 17.3% of positive posts in the post-retraction phase showed bad practice. Thus, sentiment analysis can help filter posts that could flag potential flaws before retraction, but it may struggle to accurately identify the spread of misinformation after retraction. More broadly, this study highlights the potential of Bluesky to support responsible scientific communication, public scrutiny, and research integrity.

[149]  arXiv:2605.04340 [pdf, ps, other]
Title: Analysis of a Competitive Bivirus SIS Epidemic Model with Game Theoretic Social Distancing
Subjects: Systems and Control (eess.SY)

We propose a competitive bi-virus model with dynamic social distancing behavior. Our model illustrates how public perception of different viruses changes the conditions for their eradication, their coexistence, or the dominance of one over the other. We show that our model is not monotone, in contrast to the classic bi-virus model. We detail how social distancing behavior produces different sets of equilibria than the classic bi-virus model and changes the criteria for their stability. In particular, we detail the set of disease free equilibria (DFE) present in our model and identify necessary and sufficient conditions for almost global exponential stability of the same. We prove similar global results for all but one non-DFE isolated (unilateral) equilibria and local stability results for the remainder. We also consider coexistence equilibria; we show such equilibria, when they exist, take the form of lines of equilibria and give local conditions for their stability. Finally, we illustrate our theoretical findings with numerical examples.

[150]  arXiv:2605.04341 [pdf, ps, other]
Title: Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference
Comments: Preprint. 9 pages main text, 18 pages total, 2 figures, 9 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We study distillation for large language models under explicit compute constraints, with the goal of producing student models that are not only cheaper to train, but structurally efficient at inference time. While prior approaches to parameter-efficient distillation, such as LoRA, reduce adaptation cost, they leave the dense backbone unchanged and therefore fail to deliver meaningful inference savings. We propose Budgeted LoRA, a distillation framework that treats model compression as a structured compute allocation problem. Instead of using a fixed student architecture, we introduce a global compute budget that sets the final target fraction of dense computation retained. Under this constraint, the model redistributes capacity across dense and low-rank pathways via (i) module-level dense retention coefficients, (ii) adaptive low-rank allocation, and (iii) post-training compression that selectively removes, approximates, or preserves dense components. This formulation yields a family of students controlled by a single budget dial. Empirically, Budgeted LoRA matches standard LoRA perplexity at a moderate budget with a 1.74x compressed-module speedup; at an aggressive budget it achieves a 4.05x speedup with moderate perplexity degradation, and it preserves higher accuracy on function-style in-context learning probes. These results suggest that, under compute-constrained distillation, retaining behavior is less about matching perplexity or removing more parameters than it is about controlling how dense computation is transferred to low-rank pathways.

[151]  arXiv:2605.04342 [pdf, ps, other]
Title: Adaptive Diagonal Loading for Norm Constrained Beamforming
Comments: 5 pages, 5 figures
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT); Sound (cs.SD); Applications (stat.AP)

Reliable adaptive beamforming is critical for large microphone arrays operating in highly dynamic acoustic environments. In scenarios characterized by fast-moving talkers and interferers, the available sample support for estimating the spatial correlation matrix is often snapshot-deficient. This deficiency, coupled with array imperfections, degrades the White Noise Gain (WNG), leading to severe target signal cancellation. To ensure stable and robust beamforming, we propose a novel adaptive diagonal loading method that guarantees the WNG remains strictly within specified bounds. By leveraging the Kantorovich inequality, we map the desired WNG to a strict upper bound on the condition number of the correlation matrix. Furthermore, we present three estimation techniques for the adaptive loading level, ranging from trace-based bounding to exact eigenvalue decomposition, offering scalable computational complexities of $\mathcal{O}(M)$, $\mathcal{O}(M^2)$, and $\mathcal{O}(M^3)$. Our approach demonstrates highly stable beamforming under fast-changing interference.
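
To make the idea of choosing a loading level from a condition-number bound concrete, here is a minimal sketch in Python with NumPy. It assumes the target bound `kappa_max` on the condition number has already been obtained (the abstract derives it from the desired WNG via the Kantorovich inequality, but does not state the mapping, so it is taken as an input here). The two function names and both estimators are illustrative assumptions, not the paper's three techniques: one uses an exact eigenvalue decomposition, the other a conservative bound that replaces the largest eigenvalue by the trace and the smallest by zero.

```python
import numpy as np

def loading_from_eigenvalues(R, kappa_max):
    """Smallest delta >= 0 with cond(R + delta*I) <= kappa_max.

    R is assumed Hermitian positive semidefinite; kappa_max must exceed 1.
    Solves (lam_max + delta) / (lam_min + delta) = kappa_max for delta.
    """
    if kappa_max <= 1.0:
        raise ValueError("kappa_max must be greater than 1")
    eigvals = np.linalg.eigvalsh(R)  # real eigenvalues, ascending order
    lam_min, lam_max = eigvals[0], eigvals[-1]
    delta = (lam_max - kappa_max * lam_min) / (kappa_max - 1.0)
    return max(0.0, float(delta))

def loading_from_trace(R, kappa_max):
    """Conservative loading level that avoids an eigendecomposition.

    Uses lam_max <= trace(R) and lam_min >= 0, which gives
    cond(R + delta*I) <= (trace(R) + delta) / delta = kappa_max.
    """
    if kappa_max <= 1.0:
        raise ValueError("kappa_max must be greater than 1")
    return float(np.trace(R).real) / (kappa_max - 1.0)

# Hypothetical 2x2 correlation matrix with a large eigenvalue spread.
R = np.diag([1.0, 100.0])
delta_exact = loading_from_eigenvalues(R, kappa_max=10.0)
delta_trace = loading_from_trace(R, kappa_max=10.0)
```

The trace-based level is never smaller than the eigenvalue-based one, so it trades some over-loading for a much cheaper computation.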

[152]  arXiv:2605.04345 [pdf, ps, other]
Title: Structural Equivalence and Learning Dynamics in Delayed MARL
Subjects: Machine Learning (cs.LG)

We formally establish the equivalence between Observation Delay (OD) and Action Delay (AD) in cooperative partially observable multi-agent systems using observation-action histories. We show that both systems generate identical admissible joint-policy sets, and their induced state-action-observation trajectories are identical in distribution, leading to identical optimal solutions in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). This formally generalizes existing infinite-horizon single-agent results to any-horizon partially observable cooperative multi-agent problems with decentralized policy execution, and allows any mixed-delay configuration to be reduced to a pure OD system. We further prove that in Transition-Independent MDPs (TI-MDPs), the observation-action history reduces to a tractable minimal local augmented state.
However, we show through numerical experiments that although the optimal solution spaces are structurally isomorphic, the practical learning dynamics are fundamentally different. First, using the minimal local augmented state, the equivalence no longer holds when transitions are not independent. Second, operational constraints and causal credit-assignment errors in Temporal Difference (TD) algorithms induce different learning behaviors across regimes. Finally, leveraging this structural equivalence to bypass these learning challenges, we demonstrate successful multi-agent zero-shot policy transfer from OD to AD, paving the way for unified, efficient solution methods in complex delayed systems.

[153]  arXiv:2605.04346 [pdf, ps, other]
Title: Covariance-Aware Goodness for Scalable Forward-Forward Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The Forward-Forward algorithm eliminates global gradient flow and the storage of full-network activations. However, in convolutional settings, existing BP-free FF methods significantly underperform backpropagation on complex benchmarks such as ImageNet-100 and Tiny-ImageNet. We identify this gap as a structural bottleneck in goodness extraction: the standard sum-of-squares formulation collapses feature volumes into channel-wise activation energies, which omits critical second-order dependencies. To address this, we propose a framework centered on three key components. First, Bi-axis Covariance Goodness (BiCovG) explicitly augments the standard goodness function with structured second-order information along two axes: cross-channel projections that model inter-feature covariance, and nested multi-scale aggregation that encodes spatial correlation statistics. This provides a tractable approximation to covariance-aware goodness without the prohibitive O(C^2) complexity of explicit matrix estimation. Second, a lightweight Logistic Fusion module aggregates layer-wise predictions, amplifying the contribution of deeper representations. Third, the Feature Alignment Layer (FAL) introduces a zero-initialized correction at block boundaries to mitigate representation misalignment in deep locally trained networks. By introducing these three components, we effectively double the depth of viable Forward-Forward learning, extending robust layer utilization from shallow baselines to 16-layer architectures like VGG-16. The resulting BP-free model achieves 73.01% on ImageNet-100 and 50.30% on Tiny-ImageNet. As a practical extension, Hybrid Goodness Blocks control the scope of gradient propagation via configurable block sizes, further narrowing the ImageNet-100 gap to 3.6% and matching BP on Tiny-ImageNet, while still reducing peak memory by approximately 50% relative to BP.

[154]  arXiv:2605.04352 [pdf, ps, other]
Title: Probing Structural Mathematical Reasoning in Language Models with Algebraic Trapdoors
Authors: Igor Rivin
Subjects: Machine Learning (cs.LG); Group Theory (math.GR)

We introduce a benchmark suite for evaluating structural mathematical reasoning in language models, built on subgroup-construction problems in SL(3, Z) with cryptographic-style verifier-prover asymmetry. Each instance presents a finitely generated subgroup as a list of integer matrices and asks for an arithmetic invariant -- index, surjection-at-prime, or membership -- that the construction-time information (N, K) pins down in O(1) closed form, but that the solver, lacking that information, must derive by either Aschbacher-classification analysis or by a membership query in SL(3, Z) of unknown decidability. The benchmark therefore distinguishes models with internalized algebraic priors (Aschbacher classes, McLaughlin's theorem, Property (T), the congruence subgroup property) from models that rely on general-purpose computation. We report empirical results across five representative reasoning traces from two state-of-the-art models. The headline result: on the index variant, one model spent 152 minutes of reasoning, explicitly identified the kernel-side membership question as the bottleneck, attempted constructive verification, and abstained with "DON'T KNOW" rather than commit to its computed cokernel candidate -- demonstrating calibrated meta-cognition on the open-decidability boundary that the benchmark was designed to probe. We argue that the benchmark exposes a four-way classification of model behavior (commit-correct, commit-wrong, abstain-correct, abstain-wrong) that standard answer-key scoring conflates.

[155]  arXiv:2605.04355 [pdf, ps, other]
Title: InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Autonomous driving systems rely heavily on robust sensor fusion to perceive complex environments. Traditional setups using RGB cameras and LiDAR often struggle in high-dynamic-range scenes or high-speed scenarios due to motion blur and latency. Dynamic Vision Sensors (DVS), or event cameras, offer a paradigm shift by capturing asynchronous brightness changes with microsecond temporal resolution and high dynamic range. In this paper, we propose an extended architecture of the state-of-the-art InterFuser model, integrating DVS as an additional modality to enhance perception reliability. We introduce a novel token-based fusion strategy that incorporates accumulated event frames into the transformer-based backbone of InterFuser. Our method leverages the complementary nature of RGB, LiDAR, and DVS data. We evaluate our approach on the Car Learning to Act (CARLA) Leaderboard benchmarks, demonstrating that the inclusion of DVS improves the robustness of the driving agent, achieving a competitive Driving Score of 77.2 and a superior Route Completion of 100%. The results indicate that event-based vision is a promising direction for improving safety and performance in adverse lighting and dynamic conditions.

[156]  arXiv:2605.04356 [pdf, ps, other]
Title: Efficiently Aligning Language Models with Online Natural Language Feedback
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement learning with verifiable rewards has been used to elicit impressive performance from language models in many domains. However, broadly beneficial deployments of AI may require us to train models with strong capabilities in "fuzzy", hard-to-supervise domains. In this paper, we develop methods to align language models in fuzzy domains where human experts are still able to provide a high-quality supervision signal, but only for a small number of model outputs, using online natural language feedback. Specifically, we train models by iteratively optimizing against proxy reward signals, stopping at the point of over-optimization, collecting fresh expert supervision, and updating the proxy reward. We construct proxy reward models from language models using in-context learning (ICL) and fine-tuning. We test our methods by eliciting creative writing and alignment research capabilities in Qwen3-8B and Haiku 4.5 respectively. For Qwen3-8B, ICL methods recover up to 35% of performance with 50x fewer expert samples, while fine-tuning methods recover 80% with up to 20x fewer samples and 100% with 3x fewer samples. For Haiku 4.5, ICL methods recover up to 35% of performance with 30x fewer samples, and fine-tuning methods recover 100% with 10x fewer samples. Our results suggest that online natural language feedback can substantially improve the data efficiency of expert supervision.

[157]  arXiv:2605.04357 [pdf, ps, other]
Title: Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

The usage of large language models (LLMs) has grown increasingly fragmented, with no single model dominating. Meanwhile, cloud providers offer a wide range of mid-tier and older-generation GPUs that enjoy better availability and deliver comparable performance per dollar to top-tier hardware. To efficiently harness these heterogeneous resources for serving multiple LLMs concurrently, we introduce Coral, an adaptive heterogeneity-aware multi-LLM serving system. The key idea behind Coral is to jointly optimize resource allocation and the serving strategy of each model replica across all models. To keep pace with shifting throughput demand and resource availability, Coral applies a lossless two-stage decomposition that preserves joint optimality while cutting online solve time from hours to tens of seconds. Our evaluation across 6 models and 20 GPU configurations shows that Coral reduces serving cost by up to 2.79$\times$ over the best baseline, and delivers up to 2.39$\times$ higher goodput under scarce resource availability.

[158]  arXiv:2605.04358 [pdf, ps, other]
Title: Intermediate Representations are Strong AI-Generated Image Detectors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The rapid advancement in generative AI models has enabled the creation of photorealistic images. At the same time, there are growing concerns about the potential misuse and dangers of generated content, as well as a pressing need for effective AI-generated image detectors. However, current training-based detection techniques are typically computationally costly and generalize poorly to unseen data domains, while training-free methods fall short in detection performance. To bridge this gap, we propose a search-based method employing data embedding sensitivity in intermediate layers to detect AI-generated images. Given a set of real and AI-generated images, our method examines the similarity between original image embeddings and perturbed image embeddings, and detects AI-generated images based on the similarity. We examine the proposed method on two comprehensive benchmarks: GenImage and Forensics Small. Our method exhibits improved performance across different datasets compared to both training-free and training-based state-of-the-art methods. Its largest average gain is on the Forensics Small benchmark, where it improves AUROC by 39.61% over the best training-free method and by 5.14% over the best training-based method.

[159]  arXiv:2605.04361 [pdf, ps, other]
Title: When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration
Comments: 16 pages, 14 tables. 2,700 multi-agent experiments across 10 software design tasks, 7 artifact conditions, and 4 convergence pressure levels
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

The prevailing assumption in agent orchestration is that more context is better. We test this on multi-agent software design across 10 tasks, 7 context-injection conditions, and over 2,700 runs, and find a crossover effect: the same artifact type improves design exploration on some tasks (up to 20$\times$ tradeoff coverage) and actively degrades it on others (up to 46% reduction). On several tasks, an irrelevant document performs as well as or better than every relevant artifact. The direction is predicted by a single measurable variable--baseline exploration without context--with Pearson $r = -0.82$ ($p < 0.001$). Probing the mechanism by manipulating convergence pressure through prompt design reveals two distinct regimes: convergence driven by training data priors (natural) responds to artifact disruption, while convergence driven by explicit instructions (induced) does not. The implication is that context injection should be conditional, not universal: one no-context trial is a cheap diagnostic that predicts whether knowledge artifacts will help or hurt a given task.

[160]  arXiv:2605.04363 [pdf, ps, other]
Title: Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment
Comments: ICML 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

TabPFN has recently gained attention as a foundation model for tabular datasets, achieving strong performance by leveraging in-context learning on synthetic data. However, we find that TabPFN is vulnerable to label shift, often overfitting to the majority class in the training dataset. To address this limitation, we propose DistPFN, the first test-time posterior adjustment method designed for tabular foundation models. DistPFN rescales predicted class probabilities by downweighting the influence of the training prior (i.e., the class distribution of the context) and emphasizing the contribution of the model's predicted posterior, without architectural modification or additional training. We further introduce DistPFN-T, which incorporates temperature scaling to adaptively control the adjustment strength based on the discrepancy between prior and posterior. We evaluate our methods on over 250 OpenML datasets, demonstrating substantial improvements for various TabPFN-based models in classification tasks under label shift, while maintaining strong performance in standard settings without label shift. Code is available at this repository: https://github.com/seunghan96/DistPFN.
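
To make the idea of test-time posterior adjustment concrete, here is a minimal sketch of the classic prior-ratio correction: each predicted class probability is divided by the context class frequency raised to a strength exponent, and the result is renormalized. This is not DistPFN's actual rule, which the abstract does not give; the function name `adjust_posterior`, the `strength` exponent, and the example numbers are assumptions for illustration only.

```python
def adjust_posterior(probs, context_prior, strength=1.0, eps=1e-12):
    """Downweight the context class prior in a predicted distribution.

    Generic prior-ratio correction (an assumption, not the paper's formula):
    p_adj[k] is proportional to probs[k] / context_prior[k] ** strength.
    strength=0 leaves the prediction unchanged; strength=1 removes the
    context prior entirely.
    """
    if len(probs) != len(context_prior):
        raise ValueError("probs and context_prior must have the same length")
    # eps guards against division by zero for classes absent from the context.
    weighted = [p / max(q, eps) ** strength for p, q in zip(probs, context_prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Context is 90% class 0, and the raw prediction leans toward class 0.
    print(adjust_posterior([0.6, 0.4], [0.9, 0.1]))
```

With a 90/10 context prior, a raw prediction of (0.6, 0.4) flips toward the minority class once the prior is divided out, which is the behaviour a label-shift correction is meant to produce.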

[161]  arXiv:2605.04364 [pdf, ps, other]
Title: Online Nonstochastic Prediction: Logarithmic Regret via Predictive Online Least Squares
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)

We study online prediction for marginally stable, partially observed linear dynamical systems under nonstochastic disturbances. Our objective is to minimize the cumulative squared prediction loss and compete with the best-in-hindsight Luenberger predictor. Standard online learning methods typically rely on bounded domains/gradients, and thus their guarantees may fail to deal with potentially unbounded trajectories in marginally stable systems. In this paper, we introduce an unconstrained online least squares method that stabilizes the learning process via tailored predictive hints. With model knowledge, we prove that hints constructed from any stabilizing Luenberger predictor render the hint residuals uniformly bounded, achieving logarithmic regret despite unbounded trajectory growth. We also discuss model-free prediction and introduce a simple universal hint for symmetric systems, under which logarithmic regret is maintained without model knowledge. Our results provide an adaptive, instance-wise optimal online predictor compared to classical fixed-gain observers under nonstochastic disturbances.

[162]  arXiv:2605.04365 [pdf, ps, other]
Title: How Do Ice Shelves Calve? Peridynamic Modeling of Ice Shelf Fracture Driven by Wave Erosion, Basal Melting, and Buoyancy Flexure
Subjects: Computational Engineering, Finance, and Science (cs.CE)

An ice shelf is a floating extension of a land-based ice sheet into the ocean. It plays a crucial role in slowing down the flow of land ice into the sea, thus stabilizing the ice sheet. However, this stabilizing effect can be weakened by ice calving, a process in which large fragments of ice detach from the ice shelf. Although ice calving is widely acknowledged as a major contributor to ice mass loss, and its frequency and magnitude are highly sensitive to environmental forcing, the underlying physics-based mechanisms remain poorly understood, particularly under ocean wave action. In this context, we developed a nonlocal peridynamics (PD) framework to model the ice calving process subjected to wave-induced frontal corrosion. The proposed physics-based PD framework enables investigation of the coupled effects of self-weight bending, buoyancy-induced foot loosening, and the ice calving process. To the best of the authors' knowledge, this work represents the first attempt to employ a physics-based peridynamics framework for simulating ice calving processes. Compared with conventional finite element methods (FEM), the PD framework naturally captures crack initiation, interaction, and propagation without the need for special numerical treatments, thereby providing a robust tool for simulating fracture phenomena under large deformations and long-term environmental loading. To quantitatively resolve fracture processes, we implemented a static first Piola-Kirchhoff virial stress formulation within the PD framework, allowing direct evaluation of stress concentration and energy release at evolving crack tips. Subsequently, the model is rigorously validated through one-to-one comparisons with finite-element stress fields, analytical beam-theory solutions, and recent field observations of wave-driven ice-shelf failure reported by Sartore et al. (2025).

[163]  arXiv:2605.04366 [pdf, ps, other]
Title: Conditional Flow-VAE for Safety-Critical Traffic Scenario Generation
Comments: ICRA 2026
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Safety-critical scenarios are essential for the development of autonomous vehicles (AVs) but are rare in real-world driving data. While simulation offers a way to generate such scenarios, manually designed test cases lack scalability, and adversarial optimization often produces unrealistic behaviors. In this work, we introduce a conditional latent flow matching approach for scalable and realistic safety-critical scenario generation. Our method uses distribution matching to transform nominal scenes into safety-critical rollouts. Furthermore, we demonstrate that incorporating both simulation and real-world data enables our framework to efficiently generate diverse, data-driven scenarios. Experimental results highlight that our approach is able to more consistently and realistically generate novel safety-critical scenarios, making it a valuable tool for training and benchmarking AV systems.

[164]  arXiv:2605.04368 [pdf, ps, other]
Title: Extending Differential Temporal Difference Methods for Episodic Problems
Comments: RLC 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This keeps the return bounded and removes a value function's state-independent offset. However, reward centering can alter the optimal policy in episodic problems, limiting its applicability. Motivated by recent works that emphasize the role of normalization in streaming deep reinforcement learning, we study reward centering in episodic problems and propose a generalization of differential TD. We prove that this generalization maintains the ordering of policies in the presence of termination, and thus extends differential TD to episodic problems. We show equivalence with a form of linear TD, thereby inheriting theoretical guarantees that have been shown for those algorithms. We then extend several streaming reinforcement learning algorithms to their differential counterparts. Across a range of base algorithms and environments, we empirically validate that reward centering can improve sample efficiency in episodic problems.
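
For readers unfamiliar with reward centering, the sketch below shows the standard tabular differential TD(0) update for the continuing (infinite-horizon) setting, in which each reward is centered by a running estimate of the average reward. It does not implement the episodic generalization proposed in the paper; the function name, the step sizes `alpha` and `eta`, and the one-state example are assumptions for illustration.

```python
def differential_td0_step(values, avg_reward, state, reward, next_state,
                          alpha=0.1, eta=1.0):
    """One tabular differential TD(0) update (continuing-task form).

    values:     dict mapping state -> estimated differential value (mutated).
    avg_reward: current estimate of the average reward per step.
    Returns the updated average-reward estimate.
    """
    # The TD error uses the centered reward (reward - avg_reward), no discounting.
    delta = reward - avg_reward + values[next_state] - values[state]
    values[state] += alpha * delta
    # The average-reward estimate is driven by the same TD error.
    avg_reward += eta * alpha * delta
    return avg_reward


if __name__ == "__main__":
    # Hypothetical one-state loop that pays reward 1.0 on every step.
    values = {"s": 0.0}
    avg = 0.0
    for _ in range(500):
        avg = differential_td0_step(values, avg, "s", 1.0, "s")
    print(avg)  # approaches the true average reward of 1.0
```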

[165]  arXiv:2605.04371 [pdf, ps, other]
Title: Reddit's Globalization over Twenty Years: Inferring Community Time Zone from Activity Timestamps
Comments: 31 pages, 12 figures. Includes an analysis of Reddit's geographic evolution from 2005 to 2025
Subjects: Social and Information Networks (cs.SI)

Online communities are a global phenomenon, but assessing their actual geographical spread requires accurate and scalable measurement. We propose and evaluate methods that infer the time zone of online communities solely from their temporal activity patterns, requiring nothing beyond hourly activity counts. Grounding our approach in the well-established finding that posting rhythms encode circadian structure, we compare time-domain and frequency-domain methods against a parsimonious heuristic: that activity reaches its minimum around 4 a.m. local time. On Reddit, we show that the best-performing method is accurate to a sub-30-minute resolution, and that fewer than a thousand comments are sufficient to reach peak performance. Similarly, our heuristic almost matches the accuracy of more complex methods, recovering the correct time zone within a one-hour margin on average. This simple method correlates significantly with the actual distribution of Reddit's geographical spread; we validate its generalizability across communities organized around diverse cultural phenomena, from sports to finance, and apply it at scale to characterize the geographic evolution of Reddit from its founding to the present. Our method is portable across platforms and requires no user disclosure, making it a practical baseline for any study that must account for the geographic structure of online behavior.
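
The 4 a.m. heuristic described above can be illustrated with a minimal sketch. It assumes activity is supplied as 24 counts indexed by UTC hour; the function name `estimate_utc_offset`, the circular smoothing window, and the synthetic data are hypothetical and not taken from the paper.

```python
def estimate_utc_offset(hourly_counts_utc, trough_local_hour=4, window=3):
    """Estimate a community's UTC offset in whole hours from 24 hourly counts.

    Assumes the activity minimum falls at `trough_local_hour` local time.
    `window` should be odd; it sets the width of a circular moving average.
    """
    if len(hourly_counts_utc) != 24:
        raise ValueError("expected 24 hourly counts indexed by UTC hour")
    half = window // 2
    # Circular moving average so one noisy hour does not decide the trough.
    smoothed = [
        sum(hourly_counts_utc[(h + d) % 24] for d in range(-half, half + 1)) / window
        for h in range(24)
    ]
    trough_utc = min(range(24), key=lambda h: smoothed[h])
    # local = utc + offset, so offset = local - utc, wrapped into [-12, 12).
    return (trough_local_hour - trough_utc + 12) % 24 - 12


if __name__ == "__main__":
    # Synthetic community whose activity bottoms out at 09:00 UTC.
    counts = [100] * 24
    counts[8], counts[9], counts[10] = 40, 10, 40
    print(estimate_utc_offset(counts))
```

In the synthetic example the trough at 09:00 UTC maps to an offset of -5 hours, that is, 04:00 local time in a UTC-5 zone.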

[166]  arXiv:2605.04373 [pdf, ps, other]
Title: Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers
Comments: 23 pages, 12 figures, 4 tables
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

RL-based controllers achieve strong average-case performance in networking tasks such as congestion control and adaptive bitrate streaming. Yet their performance can degrade severely under network conditions where strong performance is still achievable. Identifying such conditions and quantifying the resulting performance gap is intractable by enumeration, while the sequential and closed-loop nature of RL controllers makes formal verification methods impractical.
We present ReGuard, a framework that discovers worst-case scenarios for a given RL controller and protects it against them at inference time without retraining. Discovery is formulated as a bilevel regret-maximization problem, which yields a certified lower bound on the worst-case performance gap. The discovered trajectories are then analyzed as counterfactuals and compiled into lightweight logic rules that intervene only when a risky state is detected, leaving the controller's behavior unchanged otherwise.
We evaluate ReGuard across three RL-based network controllers: Pensieve, Sage, and Park. ReGuard discovers scenarios in which the controller's performance is 43$-$64% worse than what is achievable. ReGuard not only discovers gaps 57% to 6$\times$ larger than those found by the strongest baselines but also shrinks them by 79$-$85% via lightweight rule-based protection while preserving nominal performance. ReGuard's protection extends beyond the scenarios it discovers, improving performance across a wider range of network conditions.

[167]  arXiv:2605.04374 [pdf, ps, other]
Title: $p$-adic Manifold Learning and Benchmark Tasks from Impartial Games
Authors: Tomoki Mihara
Subjects: Machine Learning (cs.LG); Number Theory (math.NT)

We introduce $p$-adic manifold learning, propose an algorithm to solve it, and propose benchmark tasks from impartial games.

[168]  arXiv:2605.04375 [pdf, ps, other]
Title: Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery
Comments: Experiment-as-Code (EaC) white paper
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)

To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent's ability to control and explore in real-world labs is essential because the physical lab remains foundational to scientific discovery. While some tasks can be performed on a computer (e.g., data analysis, running simulated experiments), Eureka moments could occur at any time while operating lab instruments (e.g., when a scientist notices unexpected clues, intuition may prompt a real-time course change). Although autonomous labs are on the rise, which expose programmable APIs to control scientific instruments via software, bridging the gap between increasingly powerful AI agents and automated lab equipment requires innovation that draws insights from computer systems.
We propose a new paradigm called "Experiment-as-Code (EaC) Labs," where a core concept is to encode experiments as declarative configurations that can be compiled down to device-level APIs. AI agents come up with hypotheses and experiments, written as an ensemble of declarative configurations. The systems layer performs program analysis, safety checks, resource assignment, and job orchestration. Finally, programmatic experimentation occurs via actuating the device APIs. This is a general stack that is science-, lab-, and instrument-independent, representing a novel synthesis across the physical, systems, and intelligence layers to unleash the next breakthrough in AI for Science.

[169]  arXiv:2605.04376 [pdf, ps, other]
Title: GraphPI: Efficient Protein Inference with Graph Neural Networks
Journal-ref: Journal of Proteome Research 23.11 (2024): 4821-4834
Subjects: Machine Learning (cs.LG)

The integration of deep learning approaches in biomedical research has been transformative, enabling breakthroughs in various applications. Despite these strides, its application in protein inference is impeded by the scarcity of extensively labeled datasets, a challenge compounded by the high costs and complexities of accurate protein annotation. In this study, we introduce GraphPI, a novel framework that treats protein inference as a node classification problem. We treat proteins as interconnected nodes within a protein-peptide-PSM graph, utilizing a Graph Neural Network-based architecture to elucidate their interrelations. To address label scarcity, we train the model on a set of unlabeled public protein datasets with pseudo-labels derived from an existing protein inference algorithm, enhanced by self-training to iteratively refine labels based on confidence scores. Contrary to prevalent methodologies necessitating dataset-specific training, our research illustrates that GraphPI, due to the well normalized nature of Percolator features, exhibits universal applicability without dataset-specific fine-tuning, a feature that not only mitigates the risk of overfitting but also enhances computational efficiency. Our empirical experiments reveal notable performance on various test datasets and deliver significantly reduced computation times compared to common protein inference algorithms.

[170]  arXiv:2605.04377 [pdf, ps, other]
Title: Towards Formal Verification of Hybrid Synchronous Programs with Refinement Types
Comments: NASA Formal Methods
Subjects: Programming Languages (cs.PL)

Cyber-physical systems (CPS) such as autonomous cars, aircraft, and robots are often also safety-critical; thus it is imperative that they operate as intended with a high degree of certainty. Formal verification has been employed to verify the software controlling these systems, but due to their complexity, is usually performed on an abstract model rather than the executable code. Synchronous programming languages extended with differential equations promise both rigorous modeling and sufficient expressiveness to implement executable controller code, and recent developments have introduced formal verification of strictly discrete-time programs. Extending these verification techniques to hybrid systems enables precise modeling of the environment for a wider variety of programs to be both verified and executed. We formalize the operational semantics of initial value problems and zero-crossing detection expressed in a synchronous programming language, extend its type system for verification thereof, and prove its soundness.

[171]  arXiv:2605.04396 [pdf, ps, other]
Title: Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize
Authors: Sarwan Ali
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent work has shown that Transformers' compositional generalization is governed by complexity control (initialization scale and weight decay), which steers training toward low-complexity reasoning solutions rather than high-complexity memorization. Existing analyses, however, treat complexity control as a single static hyperparameter choice, leaving open when during training this control is actually decisive. We show that the memorization-versus-reasoning fate of a Transformer is determined within a sharp, identifiable window of training. On a controlled compositional task we find that (i) weight decay applied for a single 25%-of-training window matches full-training weight decay in out-of-distribution (OOD) accuracy ($0.93$ vs $0.91$); (ii) holding total regularization budget constant, placing it in the middle of training yields $5{-}9\times$ higher OOD accuracy than placing it early; (iii) the boundary of the critical window is remarkably sharp: shifting the window onset by as little as $100$ optimization steps causes mean OOD accuracy to jump from chance ($0.15$) to the reasoning regime ($0.61$); (iv) the window's position depends systematically on initialization scale, but the basin of attraction for reasoning solutions shrinks at small initialization, contradicting the prevailing recommendation that smaller initialization is uniformly better. We further show that the critical-window phenomenon is task-specific: it does not appear on grokking with modular arithmetic, where properly tuned constant weight decay matches scheduled weight decay.

[172]  arXiv:2605.04397 [pdf, ps, other]
Title: Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring
Subjects: Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)

Remote photoplethysmography (rPPG) holds great promise for continuous heart-rate monitoring of drivers in intelligent vehicles. However, its performance is severely degraded by highly dynamic illumination changes. A critical yet overlooked factor is the lack of exposure control during video acquisition -- most existing systems rely on either fixed exposure settings or the camera's built-in auto-exposure, both of which fail to maintain stable facial brightness under rapidly changing lighting conditions during driving. To address this gap, we propose a highly adaptive exposure control framework that proactively adjusts exposure parameters based on predictive modeling of historical skin reflections. Unlike standard auto-exposure, our method is specifically optimized for rPPG measurement, ensuring the skin region of interest (ROI) remains within the optimal dynamic range for rPPG signal extraction. As an important contribution of this study, we introduce ExpDrive, a public in-vehicle physiological monitoring dataset comprising synchronized facial video and reference ECG from 48 subjects captured under real driving conditions. Extensive experiments demonstrate that our method consistently outperforms fixed exposure and standard auto-exposure strategies. Specifically, it reduces the Mean Absolute Error (MAE) by 6.31 bpm (from 14.1 to 7.79 bpm) and significantly increases the success rate by 32.3 percentage points (p < 0.001) (from 24.9% to 57.2%) across challenging driving scenarios. Notably, it clearly improves the performance of non-contact heart-rate monitoring in both low-light (rainy) and high-glare (sunny) conditions, validating the efficacy of exposure-aware acquisition design.

[173]  arXiv:2605.04400 [pdf, ps, other]
Title: Contextual Memory-Enhanced Source Coding for Low-SNR Communications
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

While Separate Source-Channel Coding (SSCC) retains the practical benefits of modular system design, its effectiveness in noisy text transmission is fundamentally constrained by the fragility of autoregressive source decoding. In low-SNR regimes, even a small number of residual bit errors after channel decoding may derail the subsequent lossless reconstruction process, especially when Arithmetic Coding (AC) relies on Large Language Model (LLM)-based probability estimation. Existing remedies either strengthen channel decoding based solely on channel observations or introduce contextual information only at the receiver for post-hoc correction, yet neither fully addresses the fragility of source probability modeling under residual channel errors. To this end, this paper proposes a Memory-Augmented Source Coding (MASC) scheme for robust SSCC-based transmission. Rather than treating context as external side information, MASC internalizes contextual patterns into a source model shared by both the transmitter-side source encoder and the receiver-side source decoder. Specifically, MASC employs a shared Parameterized Contextual Memory (PCM) to encode multi-order $n$-gram patterns, and further introduces a Mixture-of-Memory-Experts Router (MMER) to perform sparse, hidden-state-dependent routing over memory experts during autoregressive source modeling. By adaptively activating only the most relevant memories at each coding step, MASC refines source probability estimation, shortens average codelength, and mitigates the sensitivity of source decoding to residual channel errors. Extensive experiments over Rayleigh fading and AWGN channels demonstrate the effectiveness of the proposed scheme compared with state-of-the-art methods.

[174]  arXiv:2605.04405 [pdf, ps, other]
Title: Detecting Deepfakes via Hamiltonian Dynamics
Comments: First Version
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Driven by the rapid development of generative AI models, deepfake detectors are compelled to undergo periodic recalibration to capture newly developed synthetic artifacts. To break this cycle, we propose a new perspective on deepfake detection: moving from static pattern recognition to dynamical stability analysis. Specifically, our approach is motivated by physics-inspired priors: we hypothesize that natural images, as products of dissipative physical processes, tend to settle near stable, low-energy equilibria. In contrast, generative models optimize for statistical similarity to real images but do not explicitly enforce structural constraints such as geometric smoothness, leaving deepfakes more likely to occupy unstable, high-energy states. To operationalize this, we introduce Hamiltonian Action Anomaly Detection (HAAD), comprising three contributions: (i) We model the image latent manifold as a potential energy surface. Under this hypothesis, real images are expected to produce basin-like low-energy responses, whereas fake images are more likely to induce high-potential, high-gradient responses. (ii) We employ Hamiltonian-inspired dynamics as a stability probe. By releasing latent states from rest, samples near stable regions remain bounded, while high-gradient samples produce larger trajectory responses. (iii) We quantify these dynamic behaviors through two trajectory statistics, i.e., Hamiltonian action and energy dissipation. Extensive experiments show that HAAD outperforms evaluated state-of-the-art baselines on challenging cross-dataset transfer benchmarks, supporting a physics-inspired stability prior for digital forensics.

[175]  arXiv:2605.04406 [pdf, ps, other]
Title: Beyond Rigid Geometries: The Spline-Pullback Metric for Universal Diffeomorphic SPD Representation Learning
Subjects: Machine Learning (cs.LG)

The integration of Symmetric Positive Definite (SPD) matrices into deep learning has historically relied on fixed algebraic Riemannian metrics. Analogous to hand-crafted features in classical machine learning, these static formulations impose rigid geometries limiting network expressivity and adaptability. Recent attempts to parameterize these geometries often violate the axioms of primary matrix functions through unconstrained powers or rank-dependent scaling, inviting spatial folding, loss of global surjectivity, and gradient collapse at spectral singularities. In this paper, we introduce the Spline-Pullback Metric (SPM), instantiated as Spectral-SPM and Cholesky-SPM, marking a paradigm shift from static metric selection to universal geometric approximation. By parameterizing the global diffeomorphism via a rank-invariant, monotonically constrained B-spline, SPM acts as a dense universal approximator for strictly increasing $C^1$ diffeomorphisms and theoretically subsumes existing pullback metrics while enabling localized non-linear spectral modelling. Topologically, SPM provides a globally bijective pullback geometry precluding rank-swapping discontinuities and gradient instabilities. Empirically, SPM achieves state-of-the-art performance across three datasets using Linear Probes, SPDNets, and deep Riemannian ResNets.

[176]  arXiv:2605.04407 [pdf, ps, other]
Title: Assessing Generalisation Capability of Machine Learning Models for Intrusion Detection
Comments: 13 Pages, 3 Figures, 5 Tables, Conference
Subjects: Cryptography and Security (cs.CR)

The growth of networked and IoT systems has intensified cyber-security threats and exposed the limits of traditional signature-based intrusion detection. Although machine-learning-based intrusion detection systems often report strong benchmark performance, high accuracy within a single dataset does not necessarily guarantee reliable performance in unseen network environments. This study investigates the generalisation capability of supervised machine learning models for intrusion detection using UNSW-NB15 and TON_IoT. Random Forest, Logistic Regression, and Naive Bayes were evaluated under same-dataset and cross-dataset settings. Random Forest achieved the strongest same-dataset performance, with 95.08% accuracy on UNSW-NB15 and 99.79% on TON_IoT, but performance dropped sharply in cross-dataset testing. When models were trained on UNSW-NB15 and tested on TON_IoT, or vice versa, accuracy fell below 40%. These results reveal a significant generalisation gap in intrusion detection. We connect this challenge to affective computing and human-centric AI, where behavioural signal analysis, anomaly detection, domain shift, and context-sensitive modelling are also central. This framing highlights the need for adaptive, generalisable cyber-security models that can operate across changing network and IoT environments.

[177]  arXiv:2605.04408 [pdf, ps, other]
Title: Autonomous Laparoscope Control through Unified Mechanics-Based Representation of Multimodal Intraoperative Information
Subjects: Robotics (cs.RO)

Laparoscope-holding robots can provide surgeons with a stable laparoscopic field of view (FOV) and reduce the burden on human assistants. To maintain an ideal intraoperative FOV, the robot must continuously adjust the laparoscope pose according to intraoperative information. However, intraoperative multimodal signals, such as position, force/torque, and images, differ markedly in physical meaning and units, making it difficult to build a unified representation and to generate control commands that can be used directly for laparoscope control. To address this issue, we propose a laparoscope-holding robot control method based on unified mechanics modeling of multimodal information. First, we design mapping strategies for multiple intraoperative sources, including position, force/torque, and images, and unify them into an equivalent-wrench representation in the operational space. Then, using a task-priority scheme, we inject the wrenches into the task space and the null space, respectively, and synthesize laparoscope control commands via task-priority projection, thereby achieving consistent representation and coordinated fusion of multimodal information within a single framework. Finally, taking the intraoperative remote center of motion (RCM) position, force/torque sensor readings, and laparoscopic images as examples, we construct an RCM-constraint wrench to enforce the RCM geometric constraint and reduce the contact force at the trocar site, a laparoscope-manipulation wrench to enable compliant dragging, and an instrument-tracking wrench to achieve autonomous visual tracking of the instruments. Experiments on a surgical phantom and in vivo porcine trials demonstrate that the proposed method supports multi-task operation, including compliant laparoscope manipulation and autonomous instrument tracking, while maintaining the RCM constraint and reducing sustained trocar-site loading.

[178]  arXiv:2605.04409 [pdf, ps, other]
Title: UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high-resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype-guided task-adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine-grained spatial sensitivity. Furthermore, we construct UCCD, a large-scale UAV-based benchmark comprising 9,000 high-resolution image pairs and 45,000 annotated sentences for urban construction monitoring. Extensive experiments on UCCD and WHU-CDC demonstrate that PTNet consistently outperforms existing methods. The dataset and source code are publicly available at https://github.com/G124556/ptnet.

[179]  arXiv:2605.04410 [pdf, ps, other]
Title: Evaluation Cards for XAI Metrics
Comments: 7 pages. Accepted at the 5th XAI4CV Workshop, CVPR 2026 (non-archival)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

The evaluation of explainable AI (XAI) methods is affected by a lack of standardization. Metrics are inconsistently defined, incompletely reported, and rarely validated against common baselines. In this paper, we identify transparency of evaluation reporting as a central, under-addressed problem. We propose the XAI Evaluation Card, a documentation template analogous to model cards, designed to accompany any study that introduces an XAI evaluation metric. The card covers explicit declaration of target properties, grounding levels, metric assumptions, validation evidence, gaming risks, and known failure cases. We argue that adopting this template as a community norm would reduce evaluation fragmentation, support meta-analysis, and improve accountability in XAI research.

[180]  arXiv:2605.04412 [pdf, ps, other]
Title: Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from one or more images. Building on this capability, style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches typically rely on style images that lie within or are similar to the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styles, their performance degrades significantly or they fail outright. To address this limitation, we introduce DiLAST: 2D Diffusion-based Latent Awakening for 3D Style Transfer. Specifically, we leverage a pretrained 2D diffusion model as a teacher to provide rich and generalizable style priors. By aligning rendered views with the target style under diffusion-based guidance, our method optimizes the structured 3D latent representation for stylization. We observe that this limitation stems not from insufficient model capacity, but from the underutilization of structured 3D latents, which are inherently expressive. Despite being trained on comparatively limited data, 3D generation models can leverage 2D diffusion guidance to steer denoising toward specific directions in latent space, thereby producing diverse, OOD styles. Extensive experiments across diverse data and multiple 3D generation backbones demonstrate the effectiveness and plug-and-play nature of our approach.

[181]  arXiv:2605.04413 [pdf, ps, other]
Title: Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

Structural causal models provide a unified semantics for interventions and counterfactuals, but most identifiability results rely on restrictive assumptions like global monotonicity, which are often violated in embodied interaction, where the same exogenous perturbation can induce opposite responses under different contact contexts. We ask what structure still suffices once global monotonicity is dropped. We introduce non-monotone triangular structural causal models (NM-TM-SCM), which retain triangular recursion but replace global monotonicity with mechanism-wise invertibility and context-independent inverse transport. We prove that these conditions are equivalent to exogenous isomorphism and imply complete counterfactual identifiability, and we give a counterexample showing that local invertibility alone is insufficient. We instantiate the theory in CausalInverter, with triangular invertible layers, orientation gates, and transport-stability regularization. On synthetic non-monotonic mechanisms, the structural bias yields systematic counterfactual gains as non-monotonicity increases. On MuJoCo Door, our model achieves perfect event-level counterfactual recovery, lowers continuous angle error relative to a Transformer baseline, and delivers substantially more stable recovery than Transformer and conditional-flow predictors. On MuJoCo Push, where non-monotonicity is weaker, the same low-data predictors remain competitive or better, consistent with a bias-variance boundary. These results identify a broader identifiable regime between globally monotone triangular models and unconstrained black-box world models.

[182]  arXiv:2605.04418 [pdf, ps, other]
Title: Demystifying Manifold Constraints in LLM Pre-training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

The empirical success of large language model (LLM) pre-training relies heavily on heuristic stabilization techniques, such as explicit normalization layers and weight decay. While recent constrained optimization approaches that explicitly restrict weights may improve numerical stability and performance, the mechanism and motivation for adding constraints remain elusive. This paper systematically demystifies the role of explicit manifold constraints in LLM pre-training. By introducing the Msign-Aligned Constrained Riemannian Optimizer (MACRO), a provably convergent, single-loop optimization framework, our study disentangles weight regularization heuristics from interacting mechanisms like RMS normalization and decoupled weight decay. Theoretical analyses and comprehensive empirical evaluations reveal that manifold constraints independently bound forward activation scales and enforce stable rotational equilibrium, thereby subsuming the roles of these heuristic mechanisms. Evaluations on large-scale LLM architectures demonstrate that MACRO achieves highly competitive performance while rigorously preserving the theoretical guarantees of exact Riemannian optimization.

[183]  arXiv:2605.04421 [pdf, ps, other]
Title: FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Continuous-time (CT) Transformers improve irregular and long-range modeling over CT-RNNs by exploiting input or output embeddings with continuous dynamics. However, the core scaled-dot-product-attention (SDPA) mechanism remains inherently discrete. We propose FLUID (Flexible Unified Information Dynamics), a CT Transformer that incorporates continuous dynamics directly into the attention computation by replacing it with a Liquid Attention Network (LAN). LAN reinterprets attention logits as a continuous dynamical system and reformulates them as the solution to a linear ODE modulated by input-dependent nonlinear recurrent gates. Theoretically, we establish stability guarantees for LAN dynamics and show that it serves as an interpolating middle ground between SDPA and CT-RNNs, recovering each as a special case under a well-defined parameterization of its gating functions. LAN also introduces an explicit attention-sink gate to eliminate disproportionate attention mass on uninformative nodes. FLUID replaces standard residual connections with input-dependent Liquid Hyper-Connections to adaptively regulate interlayer information flow. Empirically, we evaluate FLUID on a broad set of learning tasks, including (i) irregular time-series, (ii) long-range modeling, (iii) lane-keeping control of autonomous vehicles, and (iv) learning physical dynamics under a scarce data regime. Across all the tasks, FLUID consistently matches or outperforms CT baselines, achieving improvements of up to 47% in certain scenarios and enhancing generalization under distributional shifts. Additionally, FLUID demonstrates superior noise robustness and a self-correcting inductive bias in autonomous vehicle control. We also provide a detailed analysis of key hyperparameters to guide tuning and show that FLUID occupies an intermediate position among competing approaches in terms of runtime and memory efficiency.

[184]  arXiv:2605.04425 [pdf, ps, other]
Title: Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning
Comments: 15 pages, 4 figures. Preprint version
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete prompt optimization improves interpretability, it usually depends on large external models, leading to high computational costs and limited scalability. In this paper, we propose Interpretable Prompt Learning (IPL), a hybrid framework that alternates between discrete semantic token selection and continuous prompt optimization. Specifically, IPL formulates semantic token selection as an approximate submodular optimization problem, encouraging tokens that are both human-understandable and semantically diverse. It further adopts an alternating optimization strategy to integrate discrete token selection with continuous prompt tuning, improving interpretability while preserving adaptability to downstream tasks. Our framework is plug-and-play, allowing seamless integration with existing prompt learning methods. Extensive experiments on multiple benchmarks show that IPL consistently improves both interpretability and accuracy across five representative prompt learning methods, providing an effective and scalable extension to existing frameworks.

[185]  arXiv:2605.04426 [pdf, ps, other]
Title: Telegraph English: Semantic Prompt Compression via Structured Symbolic Rewriting
Subjects: Computation and Language (cs.CL)

We introduce Telegraph English (TE), a prompt-compression protocol that rewrites natural language into a symbol-rich, formally-structured dialect. Where token-deletion methods such as LLMLingua-2 train a classifier to delete low-importance tokens at a fixed ratio, TE performs a full semantic rewrite: it decomposes the input into atomic fact lines, substitutes verbose phrases with ~40 logical and relational symbols, and lets the compression ratio adapt to each document's information density. A consequence of the line-structure rule is that compression and semantic chunking become the same operation -- each output line is an independently addressable fact, so the compressed representation is simultaneously a semantic index. We evaluate TE on 4,081 question-answer pairs from LongBench-v2 across five OpenAI models and two difficulty levels. At roughly 50% token reduction, TE preserves 99.1% accuracy on key facts with GPT-4.1 and outperforms LLMLingua-2 at matched compression ratios on every model and task tested. The gap widens on smaller models -- up to 11 percentage points on fine-detail tasks -- suggesting that explicit relational structure compensates for limited model capacity. We release the grammar specification, compression prompt, benchmark data, and reference implementation.

[186]  arXiv:2605.04427 [pdf, ps, other]
Title: Structure-Preserving and Pressure-Robust PINNs for Incompressible Oseen Problems
Subjects: Numerical Analysis (math.NA)

We develop a new class of physics-informed neural network approximations for the stationary Oseen equations based on stability-consistent loss constructions. In contrast to standard PINN formulations, which are typically heuristic, the proposed consistent PINN (CPINN) framework is systematically derived from the stability structure of the continuous problem. Within this setting, we introduce two fundamentally new approaches. First, we design standard CPINN formulations that exhibit clear improvements over conventional PINNs. Second, we propose pressure-robust CPINN formulations that provably eliminate the influence of gradient forces on the velocity approximation, yielding velocity errors that depend solely on the divergence-free component of the forcing and are independent of the pressure. The framework accommodates both exactly divergence-free architectures and unconstrained velocity approximations, providing a unified treatment of these two paradigms. Using techniques from optimal recovery theory, we establish, for the first time in the PINN setting for Oseen-type problems, quantitative recovery estimates and optimal error bounds for both velocity and pressure under suitable Besov regularity assumptions. In particular, we obtain optimal rates for the velocity in $\boldsymbol{H}^1(\Omega)$ and for the pressure in $L^2(\Omega)$. The proposed methodology introduces a pressure-robust CPINN paradigm for incompressible flows, combining structural consistency, robustness with respect to irrotational forces, and rigorous accuracy guarantees. Numerical experiments corroborate the theoretical findings and demonstrate the effectiveness of the approach.

[187]  arXiv:2605.04428 [pdf, ps, other]
Title: Submodular Ground-Set Pruning: Monotone Tightness and a Non-Monotone Separation
Authors: Alan Kuhnle
Comments: 39 pages, 0 figures
Subjects: Data Structures and Algorithms (cs.DS)

Large-scale subset selection asks for a small useful set of examples, features, sensors, seed users, or context passages from an enormous ground set. Submodular maximization is a canonical model for such diminishing-returns problems, but rapidly growing datasets make even linear-time algorithms ever costlier. We study containment pruning: first reduce the ground set to a smaller core $P$, then require that $P$ contain a near-optimal feasible solution for every downstream budget up to $k$. Prior work has formulated many heuristics, but the theoretical limits of this preprocessing problem are largely unknown. For monotone submodular objectives, we prove that $1-1/e$ is tight: greedy achieves this containment factor, and no algorithm can beat it even with a larger pruning budget. For non-monotone objectives, we give the first $1/2-\varepsilon$ containment algorithms under cardinality constraints and extend the approach to knapsack constraints. This $1/2$ factor exceeds the best known algorithmic ratio and the known hardness threshold for non-monotone maximization, showing that pruning can be provably easier than optimization. Empirically, pruning lets an exact IP solver run on the reduced MaxCut instance with a ${\approx}620\times$ speedup, and proof-of-concept experiments on LLM context selection demonstrate the utility of non-monotone submodular proxies and our proposed containment algorithms.
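The monotone case in the abstract above can be made concrete with a small sketch of the standard greedy algorithm on a coverage objective. The toy ground set, the `coverage` objective, and the function name `greedy_prune` are illustrative assumptions, and whether the paper's pruned core is exactly the greedy selection is also an assumption here. The idea shown is that the first j greedy picks form the greedy solution for budget j, so a core of k greedy picks contains a candidate solution for every budget up to k.

```python
def coverage(sets, chosen):
    """Monotone submodular objective: number of elements covered by `chosen`."""
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)

def greedy_prune(sets, k):
    """Return up to k items picked greedily by marginal coverage gain.

    The returned list is ordered, so its length-j prefix is the greedy
    solution for budget j.
    """
    chosen = []
    for _ in range(min(k, len(sets))):
        base = coverage(sets, chosen)
        best, best_gain = None, 0
        for name in sets:
            if name in chosen:
                continue
            gain = coverage(sets, chosen + [name]) - base
            if gain > best_gain:  # strict: ties keep the earlier item
                best, best_gain = name, gain
        if best is None:  # no remaining item adds value
            break
        chosen.append(best)
    return chosen

# Hypothetical ground set: each item covers a few elements.
sets = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1},
    "e": {6, 7},
}
core = greedy_prune(sets, k=3)
```

On this toy instance the pruned core drops items `b` and `d`, and every prefix of `core` can be evaluated with `coverage(sets, core[:j])` to obtain a solution for the smaller budget j.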

[188]  arXiv:2605.04431 [pdf, ps, other]
Title: Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Reinforcement fine-tuning (RFT) has become a core paradigm for post-training large language models, yet its training process remains highly fragile. Existing efforts mainly improve reliability at the system level or address specific issues in individual subproblems by modifying RFT algorithms. Despite their effectiveness, they largely overlook the problem of failure management at the training-process level. When training goes wrong, practitioners still rely heavily on expert-driven manual inspection and correction, and automatic failure management for RFT remains largely unexplored. In this paper, we take a first step toward systematic failure management for reinforcement fine-tuning. To understand the empirical structure of RFT failures, we first construct RFT-FaultBench, the first benchmark for fine-grained failures in reinforcement fine-tuning, covering 5 fault families, 16 fault types, 779 training runs, 22,549 train-step records, and 1,457,288 trajectory-level records. Based on this benchmark, we conduct a comprehensive empirical study showing that RFT failures are both observable from training dynamics and distinguishable through their empirical fault fingerprints. Building on these findings, we propose RFT-FM, an automatic failure management framework for reinforcement fine-tuning that unifies anomaly detection, failure diagnosis, and auto remediation in a closed loop. Experimental results show that RFT-FaultBench is neither trivial nor saturated: it exhibits clear anomaly structure while still posing substantial challenges, especially under subtle fault settings. Moreover, RFT-FM shows strong capability in detecting, diagnosing, and mitigating RFT failures.

[189]  arXiv:2605.04435 [pdf, ps, other]
Title: Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Feedforward Gaussian Splatting has recently emerged as an efficient paradigm for 4D reconstruction in autonomous driving. However, in unstructured off-road scenes, its performance degrades due to high-frequency geometry, ego-motion jitter, and increased non-rigid dynamics. These factors introduce conflicting Gaussian observations across timestamps, leading to either over-smoothed renderings or structural artifacts. To address this issue, we propose Ground4D, a spatially-grounded 4D feedforward framework for pose-free off-road reconstruction. The key idea is to resolve temporal conflicts through spatially localized conditioning. Specifically, we introduce voxel-grounded temporal Gaussian aggregation, which partitions the canonical Gaussian space into spatial voxels and performs query-conditioned temporal attention within each voxel. Intra-voxel softmax normalization ensures that temporal selectivity and spatial occupancy become mutually reinforcing rather than conflicting. We furthermore introduce surface normal cues as auxiliary geometric guidance to regularize the geometry of Gaussian primitives. Extensive experiments on ORAD-3D and RELLIS-3D demonstrate that Ground4D consistently outperforms existing feedforward methods in reconstruction quality and generalizes zero-shot to unseen off-road domains. Project page and code: https://github.com/wsnbws/Ground4D.

[190]  arXiv:2605.04436 [pdf, ps, other]
Title: Joint Optimization of Trajectory Control, Resource Allocation, and Task Offloading for Multi-UAV-Assisted IoV
Comments: This paper has been submitted to TMC
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

This paper investigates a multi-Unmanned Aerial Vehicle (UAV) joint base station-assisted Internet of Vehicles (IoV) task offloading system in dense urban environments. To minimize system delay and energy consumption under strict coupling constraints, the complex non-convex optimization problem is decoupled into a hierarchical execution framework. First, a sequential distributed optimization algorithm based on Second-Order Cone Programming (SOCP) is proposed to optimize the 3D flight trajectory of each UAV, ensuring adaptive network coverage. Second, a novel hybrid resource scheduling paradigm synergizing Deep Reinforcement Learning (DRL) and Large Language Models (LLMs) is developed. Within this framework, the DRL agent dictates the initial resource allocation, while the LLM acts as a semantic macro-scheduler to rectify long-tail allocation imbalances for failed and surplus tasks. Crucially, a reward decoupling mechanism is introduced to isolate DRL training from external LLM interventions, thereby ensuring policy convergence. Finally, the task offloading ratios are precisely determined via Linear Programming (LP) within an alternating optimization loop. Simulation results demonstrate that the proposed method significantly outperforms traditional multi-agent reinforcement learning baselines in terms of task success rate and system efficiency.

[191]  arXiv:2605.04439 [pdf, ps, other]
Title: A cross-modal network for facial expression recognition
Comments: Published in IEEE Transactions on Image Processing 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks enriched with structural information have been widely employed for facial expression recognition tasks. However, these methods often depend on hierarchical information rather than facial properties to perform expression recognition. In this paper, we propose a cross-modal network with strong biological and structural information for facial expression recognition (CMNet). CMNet exploits face symmetry to learn expression information separately from the whole face and from the left and right half faces, extracting complementary facial features. To prevent negative effects from fusing biological and structural information, a salient facial information refinement module extracts salient facial expression information to improve the stability of the resulting facial expression classifier. To reduce reliance on unilateral facial features, a half-face alignment optimization mechanism is designed to align the expression information learned from the left and right half faces. Our experimental results demonstrate that CMNet outperforms several recent methods, namely SCN and LAENet-SA, for facial expression recognition. Code is available at https://github.com/hellloxiaotian/CMNet.

[192]  arXiv:2605.04445 [pdf, ps, other]
Title: LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection
Comments: 10 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The rapid advancement of generative technologies has made synthetic images nearly indistinguishable from real ones, thereby creating an urgent need for robust detectors to counter misinformation. However, existing methods mainly rely on universal artifact features that are shared across multiple generators. We observe that as the diversity of generators increases, the overlap of these common features gradually decreases. This severely undermines model generalization. In contrast, focusing only on unique artifacts tends to cause overfitting to specific forgery patterns. To address this challenge, we propose LEGO (LoRA-Enabled Generator-Oriented Framework). The core mechanism of LEGO employs an MLP to modulate multiple LoRA (Low-Rank Adaptation) blocks, each pretrained to capture the unique artifacts of a specific generator, followed by attention-based feature fusion. Unlike conventional methods that seek a single universal solution, LEGO delegates unique artifact extraction to specialized LoRA modules by dividing its training procedure into two stages. Each LoRA module is individually trained on a single-generator dataset to learn generator-specific representations, then MLP and attention layers are trained on mixed datasets to dynamically regulate the contribution of each module. Benefiting from its modular yet robust design, LEGO can be naturally extended by incorporating new LoRA modules for adaptation to newly emerging next-generation datasets, while still achieving substantially better performance than prior SOTA methods with fewer than 30,000 training images, less than 10% of their training data, and only 5 epochs in each training stage.

[193]  arXiv:2605.04446 [pdf, ps, other]
Title: Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs
Subjects: Cryptography and Security (cs.CR)

Mixture-of-Experts (MoE) architectures have emerged as a leading paradigm for scaling large language models through sparse, routing-based computation. However, this design introduces a new attack surface: the routing mechanism that determines which experts process each input. Prior work shows that manipulating routing can bypass safety alignment, but existing attacks require model modification and thus apply only to locally deployed models. By contrast, real-world LLM services are remotely hosted and accessible only through input queries. This raises a fundamental question: can MoE routing be exploited through input-only attacks to induce stronger unsafe behaviors in real-world services? Our key insight is to optimize attacks in a white-box setting on open-source surrogate MoE models and transfer the resulting adversarial inputs to public API services within the same model family. This setting presents three main challenges: routing can be influenced only indirectly through input perturbations, routing control and output generation are tightly coupled, and even a successful safety bypass may still produce low-quality responses. To address these challenges, we propose Misrouter, an input-only attack framework that jointly targets routing behavior and expert functionality. Misrouter identifies weakly aligned experts that are willing to produce target harmful content by analyzing expert activations under harmful queries paired with unsafe continuations. It then optimizes adversarial inputs to steer routing toward these experts and away from strongly aligned ones. It further biases routing toward highly capable general-purpose experts identified from benign question-answering tasks. Finally, because routing and output objectives can conflict, Misrouter uses a two-phase optimization strategy that first steers routing and then optimizes harmful outputs while preserving routing stability.

[194]  arXiv:2605.04447 [pdf, ps, other]
Title: Deep Reprogramming Distillation for Medical Foundation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Medical foundation models pre-trained on large-scale datasets have shown strong and versatile performance. However, adapting medical foundation models to specific medical scenarios remains challenging because of the discrepancy between pre-training and downstream tasks and because of real-world computation and speed constraints. Existing techniques that could address this challenge all suffer, to some degree, from intrinsic limitations. For example, knowledge distillation (KD) assumes that teacher and student models share the same task, training strategy, and model structure family, while prevalent parameter-efficient fine-tuning (PEFT) fails to achieve personalized and lightweight deployment. Even the combination of PEFT and KD still struggles to resolve inconsistencies in model structure and training strategy between teacher and student models, leading to inefficient knowledge transfer. In this study, we propose a novel framework called Deep Reprogramming Distillation (DRD) to address the general adaptation challenge. Specifically, DRD introduces a novel reprogramming module that, on the one hand, overcomes the domain and task discrepancy between pre-training and downstream scenarios and, on the other hand, enables student-friendly, efficient distillation from foundation models to lightweight downstream models. Furthermore, to mitigate variability under different training conditions, we design a centered kernel alignment (CKA) distillation method to promote robust knowledge transfer. Empirical results show that DRD surpasses previous PEFT and KD methods across 18 medical downstream tasks under different foundation models, covering various scenarios including 2D/3D classification and 2D/3D segmentation.
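As background for the centered kernel alignment (CKA) distillation mentioned in the abstract above, the following is a minimal sketch of the widely used linear CKA similarity between two feature matrices. It assumes a linear kernel and a batch-by-features layout; the variable names, the random stand-in features, and the `1 - CKA` loss suggestion are illustrative assumptions, and the abstract does not say which CKA variant DRD actually uses.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between features X (n x p) and Y (n x q) of the same n samples."""
    # Center every feature column so the similarity ignores mean offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)

# Hypothetical teacher and student features for a batch of 64 samples.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((64, 32))
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))   # random orthogonal matrix
student_rotated = 3.0 * teacher @ Q                  # rotated and rescaled copy
student_random = rng.standard_normal((64, 16))       # unrelated, narrower features

cka_rotated = linear_cka(teacher, student_rotated)
cka_random = linear_cka(teacher, student_random)
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling, and accepts feature matrices of different widths, a loss such as `1 - linear_cka(teacher, student)` can compare teacher and student representations even when their architectures differ, which is one plausible reason to prefer it for cross-architecture distillation.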

[195]  arXiv:2605.04448 [pdf, ps, other]
Title: Queue-Aware and Resilient Routing in LEO Satellite Networks Using Multi-Agent Reinforcement Learning
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

The rapid growth in data demand and the stringent latency requirements of modern applications have driven significant interest in Low Earth Orbit (LEO) satellite constellations as an emerging solution for global Internet coverage. However, routing in LEO networks remains a fundamental challenge due to highly dynamic topologies, time-varying traffic conditions, and susceptibility to link failures. Conventional routing algorithms typically assume static link metrics and fail to account for queue backlogs or real-time system variations, making them less effective in such environments. We propose a queue-aware multi-agent deep reinforcement learning (MA-DRL) framework for routing in LEO satellite networks. Each satellite is modeled as an independent agent responsible for making local routing decisions, enabling a distributed and scalable solution. The proposed framework formulates a latency-aware optimization problem that incorporates background traffic, queue dynamics at each satellite, and a resilience score to improve robustness. We evaluate the proposed approach against the state-action-reward-state-action (SARSA) and Dijkstra algorithms. While Dijkstra achieves the lowest end-to-end latency under ideal conditions, its computational and signaling overhead becomes a significant bottleneck as the network scales. In contrast, our proposed approach incurs significantly lower overhead (approximately 50% of Dijkstra at a 5 s recalculation interval), scales efficiently with network size, and effectively manages queue backlogs and resilience under increasing traffic load, demonstrating enhanced robustness and scalability in LEO satellite networks while maintaining competitive latency and resilience scores.

[196]  arXiv:2605.04449 [pdf, ps, other]
Title: GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking
Comments: 9 pages, 1 figure. Submitted to AAAI 2026. Also available at Amazon Science: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task where Large Language Models (LLMs) struggle despite their impressive general capabilities. We present GEM (Graph-Enhanced Mixture-of-Experts), a novel framework that combines language models and graph-structured dialogue understanding with ReAct agent-based reasoning for superior DST performance. Our approach dynamically routes between specialized experts: a Graph Neural Network that captures dialogue structure and turn-level dependencies, and a finetuned T5-Small encoder-decoder for sequence modeling, coordinated by an intelligent router. For complex value generation tasks, we integrate ReAct agents that perform structured reasoning over dialogue context. On MultiWOZ 2.2, GEM achieves 65.19% Joint Goal Accuracy, substantially outperforming end-to-end LLM approaches (best: 38.43%) and surpassing state-of-the-art (SOTA) methods including TOATOD (63.79%), D3ST (58.70%), and Diable (56.48%). Our graph-enhanced mixture-of-experts architecture with ReAct integration demonstrates that combining structured dialogue representation with dynamic expert routing and agent-based reasoning provides a powerful paradigm for dialogue state tracking, achieving superior accuracy while maintaining computational efficiency through selective expert activation.

[197]  arXiv:2605.04450 [pdf, ps, other]
Title: One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Generative Recommender (GR) inference places embedding hot caches (EMB) and KV caches in direct competition for limited GPU HBM: allocating more memory to one improves its efficiency but degrades the other. Existing systems optimize them in isolation, overlooking that the optimal EMB-KV allocation ratio can shift by up to 0.35 across workload regimes, leaving 20-30% latency improvement unrealized. While online reallocation is required to close this gap, naive approaches introduce H2D refill traffic on the critical path, causing P99 SLO violations.
To address this, we present HELM, which jointly manages HBM allocation and request routing at runtime through two key components: (1) Adaptive Memory Allocation, a three-layer PPO-based controller (frozen base policy, online residual adapter, and burst-aware recovery controller) that achieves 32 μs decision latency while staying within 0.024-0.029 of the offline-optimal ratio; and (2) EMB-KV-Aware Scheduling, which routes requests by jointly considering KV residency, embedding locality, and node load to avoid routing inefficiencies under heterogeneous allocations. Evaluations on three production-scale datasets over a 32-node A100 cluster show that HELM reduces P99 latency by 24-38% over the best static policy and achieves 93.5-99.6% SLO satisfaction across Steady, Trend, and Burst workloads, significantly outperforming state-of-the-art baselines without sacrificing throughput.

[198]  arXiv:2605.04451 [pdf, ps, other]
Title: RemoteZero: Geospatial Reasoning with Zero Human Annotations
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still supervised by human-annotated ground-truth coordinates. This leaves the reasoning process autonomous, but not its spatial endpoint, and prevents true self-evolution on abundant unlabeled remote sensing data. To break this bottleneck, we introduce RemoteZero, a box-supervision-free framework for geospatial reasoning. RemoteZero is motivated by a simple asymmetry: an MLLM is typically better at verifying whether a region satisfies a query than at directly generating precise coordinates. Leveraging this stronger discriminative ability, RemoteZero replaces geometric supervision with intrinsic semantic verification and enables GRPO training without box annotations. The resulting framework further supports iterative self-evolution, allowing the model to improve from unlabeled remote sensing imagery through its own verification signal. Experiments show that RemoteZero achieves competitive performance against strong supervised methods, demonstrating the potential of self-verifying training for geospatial reasoning localization.

[199]  arXiv:2605.04452 [pdf, ps, other]
Title: Beyond Ability: The Four-Fold Spectrum of Power and the Logic of Full Inability
Authors: Shanxia Wang
Comments: This is a revised and significantly extended version of the prior preprint arXiv:2604.27917. All comments and feedback are welcome
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

Coalition Logic studies what coalitions can enforce. Recent work treats inability as simple non-ability: $\neg\Eff{C}\varphi$. This conflates two distinct configurations -- a coalition unable to force $\varphi$ may still force $\neg\varphi$, retaining adversarial control rather than genuine inability. We introduce \textbf{Full Inability} ($\FI$): the symmetric condition in which a coalition can enforce neither a proposition nor its negation.
Combining coalitional effectivity with propositional negation yields a four-fold spectrum: \textbf{Full Control} ($\FC$), \textbf{Positive Determination} ($\PD$), \textbf{Adverse Determination} ($\AD$), and \textbf{Full Inability} ($\FI$). These categories partition a coalition's strategic status exhaustively and exclusively. We establish their algebraic and order-theoretic structure. Under $\alpha$-duality, propositional negation and coalition complementation generate a Klein four-group symmetry. In playable models, the four power regions are order-convex in the powerset lattice, yielding interval-stable verification of inability.
We axiomatize $\CLFI$, a definitional extension treating Full Inability as a primitive modality. Via elimination translation, we prove soundness, completeness, and conservativity over Coalition Logic. The extension preserves expressive power and complexity ($\PSPACE$-complete), but provides direct proof-theoretic access to symmetric inability, strategic dependence, propositional dummyhood, and containment verification.
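The four-fold partition can be illustrated with a small sketch. The abstract defines only Full Inability explicitly (the coalition can force neither the proposition nor its negation); the readings of the other three categories below are assumptions inferred from their names, and `Power` and `classify` are hypothetical names, not the paper's notation.

```python
from enum import Enum


class Power(Enum):
    FULL_CONTROL = "FC"            # assumed: can force phi and can force not-phi
    POSITIVE_DETERMINATION = "PD"  # assumed: can force phi only
    ADVERSE_DETERMINATION = "AD"   # assumed: can force not-phi only
    FULL_INABILITY = "FI"          # from the abstract: can force neither


def classify(can_force_phi: bool, can_force_not_phi: bool) -> Power:
    """Map the two effectivity facts about a coalition to one of four categories."""
    if can_force_phi and can_force_not_phi:
        return Power.FULL_CONTROL
    if can_force_phi:
        return Power.POSITIVE_DETERMINATION
    if can_force_not_phi:
        return Power.ADVERSE_DETERMINATION
    return Power.FULL_INABILITY
```

Because the two booleans have exactly four joint values and each maps to a different category, the classification is exhaustive and exclusive under these assumed readings.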

[200]  arXiv:2605.04453 [pdf, ps, other]
Title: StableI2I: Spotting Unintended Changes in Image-to-Image Transition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre--post consistency across a wide range of I2I tasks without requiring reference images, including image editing and image restoration. In addition, we construct StableI2I-Bench, a benchmark designed to systematically evaluate the accuracy of MLLMs on such fidelity and consistency assessment tasks. Extensive experimental results demonstrate that StableI2I provides accurate, fine-grained, and interpretable evaluations of content fidelity and consistency, with strong correlations to human subjective judgments. Our framework serves as a practical and reliable evaluation tool for diagnosing content consistency and benchmarking model performance in real-world I2I systems.

[201]  arXiv:2605.04454 [pdf, ps, other]
Title: Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Software Engineering (cs.SE)

Alignment evaluation in machine learning has largely become evaluation of models. Influential benchmarks score model outputs under fixed inputs, such as truthfulness, instruction following, or pairwise preference, and these scores are often used to support claims about deployed alignment. This paper argues that deployment-relevant alignment cannot be inferred from model-level evaluation alone. Alignment claims should instead be indexed to the level at which evidence is collected: model-level, response-level, interaction-level, or deployment-level. Two studies support this position. First, a structured audit of eleven alignment benchmarks, extended to a sixteen-benchmark corpus, dual-coded against an eight-dimension rubric with Cohen's kappa = 0.87, finds that user-facing verification support is absent across every benchmark examined, while process steerability is nearly absent. The few interactional benchmarks identified, including tau-bench, CURATe, Rifts, and Common Ground, remain fragmented in coverage, and benchmark construction rather than data source determines what is measured. Second, a blinded cross-model stress test using 180 transcripts across three frontier models and four scaffolds finds that the same verification scaffold raises one model's verification support to ceiling while leaving another categorically unchanged. This shows that scaffold efficacy is model-dependent and that the gap identified by the audit cannot be closed at the model level alone. We propose a system-level evaluation agenda: alignment profiles instead of single scores, fixed-scaffolding protocols for comparable interactional evaluation, and reporting templates that make the inferential distance between evaluation evidence and deployment claims explicit.
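For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Cohen's kappa for two coders labelling the same items. The function name and the toy labels are illustrative assumptions; the paper's actual rubric codes are not reproduced here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the coders labelled independently at their own base rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported 0.87 indicates high but not perfect inter-coder consistency.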

[202]  arXiv:2605.04455 [pdf, ps, other]
Title: Long-time $L^2$&$H^1$-stability of the Family of DLN Methods for the Two-dimensional Incompressible Navier-Stokes Equations
Subjects: Numerical Analysis (math.NA)

In this report, we study the long-time stability of the family of one-leg DLN methods for the two-dimensional incompressible Navier-Stokes equations. The family of DLN methods (with one parameter $\theta$), non-linear energy stable ($G$-stable) and second-order accurate under arbitrary time grids, has been widely applied to the simulations of various fluid models with success. We derive a new version of the $G$-stability identity for the family of DLN methods under uniform time grids and mild time constraints. Then we utilize this crucial auxiliary tool and the discrete uniform Gr\"onwall inequality lemma to prove the uniform-in-time stability of the numerical solutions. Essentially, the bounds are independent of the time interval and the initial conditions, consistent with the theories of the continuous case.

[203]  arXiv:2605.04458 [pdf, ps, other]
Title: DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Evaluation of long-form, citation-backed reports has lately received significant attention due to the wide-scale adoption of retrieval-augmented generation (RAG) systems. Core to many evaluation frameworks is the use of atomic facts, or nuggets, to assess a report's coverage of query-relevant information attested in the underlying collection. While nuggets have traditionally been represented as short statements, recent work has used question-answer (QA) representations, enabling fine-grained evaluations that decouple the information need (i.e. the question) from the potentially diverse content that satisfies it (i.e. its answers).
A persistent challenge for nugget-based evaluation is the need to manually curate sets of nuggets for each topic in a test collection -- a laborious process that scales poorly to novel information needs. This challenge is acute in cross-lingual settings, where information is found in multilingual source documents. Accordingly, we introduce DoGMaTiQ, a pipeline for generating high-quality QA-based nugget sets in three stages: (1) document-grounded nugget generation, (2) paraphrase clustering, and (3) nugget subselection based on principled quality criteria. We integrate DoGMaTiQ nuggets with AutoArgue -- a recent nugget-based evaluation framework -- to enable fully automatic evaluation of generated reports. We conduct extensive experiments on two cross-lingual TREC shared tasks, NeuCLIR and RAGTIME, showing strong rank correlations with both human-in-the-loop and fully manual judgments. Finally, detailed analysis of our pipeline reveals that a strong LLM nugget generator is key, and that the system rankings induced by DoGMaTiQ are robust to outlier systems. We facilitate future research in report evaluation by publicly releasing our code and artifacts at https://github.com/manestay/dogmatiq.

[204]  arXiv:2605.04460 [pdf, ps, other]
Title: Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention
Subjects: Machine Learning (cs.LG)

Transportation surveys are widely used to understand travel preferences and adoption barriers, yet most survey-based analyses remain descriptive or predictive and rarely provide sparse, policy-feasible intervention strategies. We study sparse counterfactual community intervention from survey responses, where the goal is to shift a target respondent group toward a desired reference group through controllable survey-variable adjustments. We formulate this task as a policy-feasible distributional alignment problem using a fixed-basis nonnegative latent representation that preserves pre/post comparability and provides a stable map from latent factors to original variables. To make latent movement actionable, target-relevant latent factors are identified through Shapley-guided attribution and transferred to controllable variables as intervention priorities. Feasible group-level adjustments are then learned by minimizing an entropy-regularized optimal-transport discrepancy between the post-intervention target distribution and the reference distribution, together with a weighted $\ell_{2,1}$ penalty that promotes shared policy-lever sparsity. Experiments on real-world transportation survey datasets show that the proposed framework produces compact and interpretable policy-feasible interventions with explicit adjustment magnitudes, improves population-level conversion, and preserves intervention sparsity. Code and datasets are publicly available at: https://github.com/pangjunbiao/latent-group-alignment.git

[205]  arXiv:2605.04461 [pdf, ps, other]
Title: Stream-T1: Test-Time Scaling for Streaming Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structural bottlenecks, we propose shifting the focus to streaming video generation. We identify that its chunk-level synthesis and few denoising steps are intrinsically suited for TTS, significantly lowering computational overhead while enabling fine-grained temporal control. Driven by this insight, we introduce Stream-T1, a pioneering comprehensive TTS framework exclusively tailored for streaming video generation. Specifically, Stream-T1 is composed of three units: (1) Stream-Scaled Noise Propagation, which actively refines the initial latent noise of the generating chunk using historically proven, high-quality previous chunk noise, effectively establishing temporal dependency and utilizing the historical Gaussian prior to guide the current generation; (2) Stream-Scaled Reward Pruning, which comprehensively evaluates generated candidates to strike an optimal balance between local spatial aesthetics and global temporal coherence by integrating immediate short-term assessments with sliding-window-based long-term evaluations; (3) Stream-Scaled Memory Sinking, which dynamically routes the context evicted from KV-cache into distinct updating pathways guided by the reward feedback, ensuring that previously generated visual information effectively anchors and guides the subsequent video stream. Evaluated on both 5s and 30s comprehensive video benchmarks, Stream-T1 demonstrates profound superiority, significantly improving temporal consistency, motion smoothness, and frame-level visual quality.

[206]  arXiv:2605.04465 [pdf, ps, other]
Title: Robust Inverse Quadratic Error Decay with Meshing and Beam Search for Random Subset Sum
Subjects: Data Structures and Algorithms (cs.DS)

The Subset Sum Problem is a fundamental NP-complete problem in cryptography and combinatorial optimization, with many real-world applications. The Random Subset Sum Problem (RSSP) is a more applicable version of subset sum, where the numbers are drawn i.i.d. from some input distribution. We present an algorithm that, with probability $1-\delta$, constructs the same $O(B/w)$ mesh as Da Cunha et al. (2023), while trimming to $w$ elements throughout and running in $O(w\log w)$ time. Then, we present a novel beam search heuristic that uses the mesh, runs in linearithmic time with respect to the list size $n$ and beam width $w$, and gives an expected error of $O\!\left(\frac{B}{nw^2}\right)$ under a standard mean-field assumption with equal standard deviation, demonstrating the practical effectiveness of meshing to achieve error decay. The algorithm is empirically robust to multiple input distributions and can naturally extend to variants with simple changes to the scoring heuristic, establishing a new practical baseline for robust subset sum error decay and $\epsilon$-approximation theory.
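To make the idea of a width-limited search concrete, here is a generic beam search sketch for subset sum. It is not the paper's algorithm: it omits the mesh construction entirely, scores states simply by distance to the target (an assumption), and carries no error guarantee. The function name is hypothetical.

```python
def beam_search_subset_sum(values, target, beam_width):
    """Return (best_sum, chosen_indices) found by a width-limited search."""
    if beam_width < 1:
        raise ValueError("beam_width must be at least 1")
    beam = [(0, ())]  # each state: (running sum, indices used so far)
    for i, v in enumerate(values):
        candidates = {}
        for total, chosen in beam:
            # Branch on skipping or taking values[i]; keep one witness per sum.
            candidates.setdefault(total, chosen)
            candidates.setdefault(total + v, chosen + (i,))
        # Keep only the beam_width sums closest to the target.
        beam = sorted(candidates.items(), key=lambda kv: abs(target - kv[0]))[:beam_width]
    return beam[0]
```

Each of the n steps sorts at most 2w candidates, so the sketch runs in O(n w log w) time. Every kept sum also survives as a candidate in the next step, so the closest sum ever kept is the one returned.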

[207]  arXiv:2605.04467 [pdf, ps, other]
Title: KEET: Explaining Performance of GPU Kernels Using LLM Agents
Comments: 12 pages, 8 figures, 3 tables
Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC)

Performance profiles of GPU kernels generated by tools such as Nsight Compute are rich in detail but are often challenging to interpret. To achieve the best performance possible on a given GPU architecture, kernel developers need to spend significant time analyzing and comparing profiles in the tool's graphical interface to identify and understand kernel performance bottlenecks. Large Language Models (LLMs) have shown promise in understanding complex data and generating natural language explanations. In this paper, we propose the Kernel Execution Explanation Toolkit (KEET), an LLM-based agentic framework for interpreting Nsight Compute profiles to generate useful and data-grounded natural language explanations of performance issues in GPU kernels, and suggestions for optimizations. We evaluate KEET using several CUDA kernels of varying complexity on NVIDIA H100 GPUs. We find that the generated explanations, when provided as context, improve the quality of LLM code optimization and multiple-choice question answering in downstream tasks. We further demonstrate that the tool can be used to interpret performance data from large sets of profiles to improve the quality of optimization suggestions.

[208]  arXiv:2605.04468 [pdf, ps, other]
Title: Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Post-training large language models (LLMs) often suffers from catastrophic forgetting, where improvements on a target objective degrade previously acquired capabilities. Recent evidence suggests that this phenomenon is primarily driven by excessive distributional drift during optimization. Motivated by this perspective, we propose Anchored Learning, a simple framework that explicitly controls distributional updates during offline fine-tuning via a dynamically evolving moving anchor. Instead of matching a fixed reference distribution, the anchor interpolates between the current model and a frozen reference to construct an intermediate target that the model distills toward, transforming global fine-tuning into a sequence of local trust-region updates in distribution space. Theoretically, we prove this anchor-based update admits a linear KL-divergence upper bound per iteration, ensuring a stable transition between model distributions. Extensive experiments on iGSM, MedCalc, and IFEval show that Anchored Learning consistently lies on the Pareto frontier of gain-stability trade-offs, achieving near-optimal performance improvements while substantially reducing degradation compared to strong baselines. For example, while standard SFT suffers from over 53% performance degradation on iGSM and MedCalc, Anchored Learning slashes this drop to under 5% while maintaining near-optimal gains (e.g., 75.2% on iGSM).
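The moving-anchor idea can be sketched on a single next-token distribution. The abstract does not give the interpolation formula, so the linear mixture in probability space used here is an assumption, as are the names `moving_anchor` and `lam`. The inequality checked in the usage note is a general property of mixtures (convexity of KL in its first argument), not a restatement of the paper's theorem.

```python
import math


def kl(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def moving_anchor(current, reference, lam):
    """Assumed anchor: mix the two distributions with weight lam on the reference."""
    return [(1 - lam) * c + lam * r for c, r in zip(current, reference)]


current = [0.7, 0.2, 0.1]    # toy distribution of the model being trained
reference = [0.1, 0.3, 0.6]  # toy distribution of the frozen reference
lam = 0.25
anchor = moving_anchor(current, reference, lam)
```

Under this assumed mixture, `kl(anchor, current)` is at most `lam * kl(reference, current)`, so a small `lam` keeps each distillation target within a small KL radius of the current model, in the spirit of a local trust-region step.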

[209]  arXiv:2605.04470 [pdf, ps, other]
Title: CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Open-loop imitation learning has advanced modern autonomous driving policy architectures, but closed-loop deployment remains vulnerable to policy-induced distribution shift. Existing post-training paradigms exhibit fundamental trade-offs: closed-loop RL fine-tuning provides grounded feedback from executed actions but is constrained by the sparsity of informative events, whereas counterfactual fine-tuning provides dense supervision over candidate futures but inherits bias from imperfect future estimates. We introduce Counterfactual-to-Interactive Reinforcement Fine-Tuning (CRAFT), an on-policy framework that formulates closed-loop post-training as proxy-residual optimization. CRAFT uses group-normalized counterfactual advantages as a dense proxy for real closed-loop advantages and aligns this proxy with the closed-loop world through grounded residual correction from interaction-critical events. To stabilize adaptation, CRAFT regularizes the online policy toward an EMA teacher via asymmetric KL self-distillation. Theoretically, CRAFT decomposes the real closed-loop policy gradient into proxy and residual terms under the same visited-state distribution, reducing residual variance with an aligned proxy while mitigating proxy bias through grounded residual approximation. Empirically, CRAFT achieves the strongest closed-loop gains on Bench2Drive across hierarchical planning, vision-language-action, and vocabulary-scoring architectures. Ablations, scaling behavior, stability analyses, and transfer results further validate the complementary roles of dense counterfactual proxy and grounded residual correction. Project page: https://currychen77.github.io/CRAFT.
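A minimal sketch of group normalization of advantages, assuming the common recipe of standardizing the scores of a group of candidate futures by the group's own mean and standard deviation. The abstract does not specify CRAFT's exact normalization, so the function name, the `eps` term, and the use of the population standard deviation are assumptions.

```python
from statistics import fmean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of candidates for the same state."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    # eps keeps the division finite when every candidate scores the same.
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the advantages are centered within each group, a candidate is rewarded only for being better than its siblings, which gives a dense training signal even when absolute rewards are uninformative.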

[210]  arXiv:2605.04471 [pdf, ps, other]
Title: Order Flow Exclusivity and Value Extraction Mechanisms: An Analysis of Ethereum Builder Centralization
Authors: Ao Zhang (1), Yunwen Liu (2), Ren Zhang (3), Yingdi Shan (1), Yongwei Wu (1) ((1) Tsinghua University, Beijing, China, (2) KU Leuven, Leuven, Belgium, (3) Cryptape and Nervos, China)
Subjects: Cryptography and Security (cs.CR)

This study investigates the rapid centralization of the Ethereum builder market under the Proposer-Builder Separation (PBS) architecture. We argue that existing research, by focusing predominantly on influential order flows, lacks a comprehensive evaluation of order flow behavioral patterns and economic purposes. To address this gap, we analyze Ethereum transactions from September 2023 to August 2025 to characterize Exclusive Order Flows (EOFs) and non-atomic Maximal Extractable Value (MEV) -- the missing components corresponding to these behavioral and economic dimensions, respectively. We introduce a novel exclusivity metric based on Kullback-Leibler divergence and employ supervised learning to identify 75 EOFs and 322 non-atomic MEV flows, which account for 71\% and 23\% of trading-related builder revenue. A longitudinal analysis of builder strategies across these dimensions delineates the market's evolution into four distinct eras, revealing that while EOFs were instrumental in establishing early dominance, incumbents have since decoupled market share from immediate EOF dependency by leveraging entrenched network effects. Ultimately, we conclude that builder centralization is an emergent property of the PBS framework itself, as the architecture systematically violates the fundamental prerequisites of a competitive market.

[211]  arXiv:2605.04472 [pdf, ps, other]
Title: Automated Formal Proofs of Combinatorial Identities via Wilf-Zeilberger Guidance and LLMs
Comments: Accepted to ICML 2026. Preprint version
Subjects: Machine Learning (cs.LG)

Automating formal proofs of combinatorial identities is challenging for LLM-based provers, as long-horizon proof planning is required and unconstrained search quickly explodes. Symbolic methods such as the Wilf-Zeilberger (WZ) method can achieve a mechanized proof of combinatorial identities by constructing special auxiliary functions and demonstrating that they satisfy specific recurrence relations. We propose WZ-LLM, a neuro-symbolic framework that turns WZ proof plans into executable proof sketches in Lean 4 and uses an LLM-based prover to discharge the resulting machine-checkable subgoals. We also train a dedicated WZ-Prover via a Lean-kernel-verified bootstrapping loop with expert-verified iteration, followed by DAPO-based refinement. Experiments show that WZ-LLM achieves a 34% proof success rate on LCI-Test (100 classic combinatorial identities), outperforming strong baselines such as DeepSeek-V3 and Goedel-Prover-V2, and delivering consistent gains on CombiBench and PutnamBench-Comb. These results indicate that our framework provides two complementary strengths: improved direct proving for identities beyond the scope of WZ, and substantially higher end-to-end success when WZ sketches guide a specialized prover.

[212]  arXiv:2605.04474 [pdf, ps, other]
Title: Geometry-Aware Neural Optimizer for Shape Optimization and Inversion
Comments: To appear in ICML2026
Subjects: Machine Learning (cs.LG)

Geometry is central to PDE-governed systems, motivating shape optimization and inversion. Classical pipelines conduct costly forward simulation with geometry processing, requiring substantial expert effort. Neural surrogates accelerate forward analysis but do not close the loop because gradients from objectives to geometry are often unavailable. Existing differentiable methods either rely on restrictive parameterizations or unstable latent optimization driven by scalar objectives, limiting interpretability and part-wise control. To address these challenges, we propose Geometry-Aware Neural Optimizer (GANO), an end-to-end differentiable framework that unifies geometry representation, field-level prediction, and automated optimization/inversion in a single latent-space loop. GANO encodes shapes with an auto-decoder and stabilizes latent updates via a denoising mechanism, and a geometry-injected surrogate provides a reliable gradient pathway for geometry updates. Moreover, GANO supports part-wise control through null-space projection and uses remeshing-free projection to accelerate geometry processing. We further prove that denoising induces an implicit Jacobian regularization that reduces decoder sensitivity, yielding controlled deformations. Experiments on three benchmarks spanning 2D Helmholtz, 2D airfoil, and 3D vehicles show state-of-the-art accuracy and stable, controllable updates, achieving up to +55.9% lift-to-drag improvement for airfoils and ~7% drag reduction for vehicles.

[213]  arXiv:2605.04475 [pdf, ps, other]
Title: Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Reliable autonomous driving requires scene understanding that is semantically consistent across heterogeneous sensors and verifiable at the reasoning stage. However, many recent LLM-driven driving systems attach the language model as a post-processor and force it to reason over redundant or conflicting perception outputs, which can amplify hallucinated entities and unsafe conclusions. This paper proposes InfoCoordiBridge, a BEV-centric neuro-symbolic architecture that inserts an explicit coordination bridge between perception and language reasoning. InfoCoordiBridge comprises (i) a unified multi-agent perception layer that outputs typed structured facts together with modality-focused synopses, (ii) an ICA module that aligns and fuses multi-source outputs into a single SceneSummary, and (iii) an SSRE module that performs SceneSummary-grounded reasoning with verification. Experiments on nuScenes and Waymo show that ICA preserves competitive 3D detection accuracy while substantially improving fusion consistency, reducing redundancy to below 1% and achieving about 98% attribute agreement. On NuScenes-QA and a template-aligned Waymo-QA benchmark, SSRE improves factual grounding and reduces hallucinated entity mentions compared with representative VLM and agentic baselines. Overall, by coordinating multi-sensor outputs into a single conflict-aware SceneSummary before prompting, InfoCoordiBridge prevents redundant and cross-modally inconsistent perception evidence from propagating into high-level reasoning.

[214]  arXiv:2605.04477 [pdf, ps, other]
Title: Data-dependent Exploration for Online Reinforcement Learning from Human Feedback
Subjects: Machine Learning (cs.LG)

Online reinforcement learning from human feedback (RLHF) has emerged as a promising paradigm for aligning large language models (LLMs) by continuously collecting new preference feedback during training. A foundational challenge in this setting is exploration, which requires algorithms that enable the LLMs to generate informative comparisons that improve sample-efficiency in online RLHF. Existing exploration strategies often derive bonuses via on-policy expectations, which are difficult to estimate reliably from the limited historical preference data available during training; as a result, the policy can prematurely down-weight under-explored regions that may contain high-value behaviors. In this paper, we propose data-dependent exploration for preference optimization (DEPO), a simple and scalable method that leverages historical data to construct an extra uncertainty bonus for high-uncertainty regions, encouraging exploration toward potentially high-value data. Theoretically, we provide a data-dependent regret bound for the proposed algorithm, showing that it adapts to the hardness of the learning task itself and can be tighter than worst-case bounds in practice. Empirically, the proposed method consistently outperforms strong baselines across benchmarks, demonstrating improved sample efficiency.

[215]  arXiv:2605.04478 [pdf, ps, other]
Title: CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training
Comments: Accepted by PPoPP'26, 13 figures, 2 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

As training scales grow, collective communication libraries (CCL) increasingly face anomalies arising from complex interactions among hardware, software, and environmental factors. These anomalies typically manifest as slow/hang communication, the most frequent and time-consuming category to diagnose. However, traditional diagnostic methods remain inaccurate and inefficient, frequently requiring hours or even days for root cause analysis. To address this, we propose CCL-D, a high-precision diagnostic system designed to detect and locate slow/hang anomalies in large-scale distributed training. CCL-D integrates a rank-level real-time probe with an intelligent decision analyzer. The probe measures cross-layer anomaly metrics using a lightweight distributed tracing framework to monitor communication traffic. The analyzer performs automated anomaly detection and root-cause location, precisely identifying the faulty GPU rank. Deployed on a 4,000-GPU cluster over one year, CCL-D achieved near-complete coverage of known slow/hang anomalies and pinpointed affected ranks within 6 minutes, substantially outperforming existing solutions.

[216]  arXiv:2605.04480 [pdf, ps, other]
Title: Geometric Milstein Scheme for Stochastic Differential Equations on SO(n) and SE(n)
Authors: Xi Wang, Victor Solo
Comments: 36 pages, 6 figures
Subjects: Numerical Analysis (math.NA)

In this paper, we propose a higher-order geometry-preserving numerical method for stochastic differential equations (SDEs) evolving on the Lie groups SO(n) and SE(n). Most existing Lie group integrators rely on the Magnus expansion of the exponential map, which makes the construction of higher-order stochastic schemes difficult. To overcome this limitation, we develop a tangent-space parameterization corrected Milstein method (TaSP-CM), extending the tangent space parameterization (TaSP) framework from Lie-group ODEs to the stochastic setting. Although TaSP is a well-established method for Lie ODEs, the extension to SDEs is non-trivial and requires new stochastic corrections that ensure both geometric consistency and higher-order accuracy. We prove that the proposed scheme achieves strong convergence of order 1 under both commutative and non-commutative noise. Numerical experiments illustrate the theoretical results and demonstrate the efficiency and robustness of the proposed method.

[217]  arXiv:2605.04481 [pdf, ps, other]
Title: Tightly-Coupled Estimation and Guidance for Robust Low-Thrust Rendezvous via Adaptive Homotopy
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Minimum-fuel low-thrust rendezvous guidance yields bang-bang control structures highly sensitive to estimation errors, sensor anomalies, and solver regularization, making aggressive closed-loop execution brittle for uncooperative proximity operations. This paper proposes a tightly-coupled estimation and guidance architecture where navigation confidence directly modulates the homotopy parameter of a receding-horizon indirect optimal control solver. Relative motion is modeled in the Clohessy-Wiltshire frame. The translational state is estimated via a linear Kalman filter augmented by a Multiple Tuning Factors (MTF) covariance inflation mechanism that suppresses suspicious innovation directions. A composite score from the normalized innovation and MTF activity is mapped online to the homotopy parameter, allowing the controller to relax toward a smoother, conservative regime when confidence degrades, and recover fuel-efficient bang-bang control as sensing improves. Numerical results under severe measurement degradation show fixed bang-bang guidance remains brittle; both plain-KF and MTF-KF fixed-epsilon controllers yield large terminal miss distances. Conversely, the proposed MTF-adaptive homotopy controller reduces terminal miss by roughly two orders of magnitude, from hundreds of meters to sub-meter levels, requiring only a moderate increase in control effort versus the open-loop fuel-optimal benchmark. A comparison indicates adaptive homotopy is the dominant robustness mechanism, while MTF provides additional accuracy and efficiency improvements. The receding-horizon implementation exhibits consistently fast and reliable solution times, supporting the practical online viability of the proposed method.

[218]  arXiv:2605.04485 [pdf, ps, other]
Title: Analysis of gradient flow for computing defocusing action ground states of rotating nonlinear Schrödinger equations
Comments: 20 pages, 5 figures
Subjects: Numerical Analysis (math.NA)

This work focuses on the numerical computation of defocusing action ground states for rotating nonlinear Schrödinger equations (RNLS) using a direct gradient flow (DGF) method. We address theoretical gaps in the existing literature concerning the stability and convergence of this DGF scheme. Firstly, we prove the unconditional stability of the DGF scheme, demonstrating that the action functional is monotonically non-increasing along the discrete flow for arbitrary time step sizes. Secondly, we establish a rigorous convergence analysis, proving global convergence under minor assumptions and local exponential convergence to the action ground state under a reasonable non-degeneracy condition. The analysis relies on the uniform boundedness of sublevel sets of the action functional and introduces a tailored $H^1$-distance between phase-shift equivalence classes to handle complex-valued ground states with quantized vortices. A novel analytical framework is also developed to establish the exponential convergence rate. Numerical experiments are presented to validate the theoretical findings, demonstrating both the global migration towards a neighborhood of the ground state and subsequent exponential convergence.

[219]  arXiv:2605.04488 [pdf, ps, other]
Title: How Does Thinking Mode Change LLM Moral Judgments? A Controlled Instant-vs-Thinking Comparison Across Five Frontier Models
Subjects: Artificial Intelligence (cs.AI)

We evaluate whether enabling provider-exposed reasoning mode changes moral judgments within the same model checkpoint. Across 100 moral-judgment scenarios and five frontier reasoning-trained LLMs (Claude Sonnet 4.6, GPT 5.5, Gemini 3 Flash, DeepSeek V3.1, and Qwen3.5 397B), aggregate binary-verdict agreement remains high and statistically indistinguishable between instant and thinking modes (Krippendorff's alpha = 0.78 vs. 0.79). However, disagreement is concentrated in 21 model-disputed scenarios, where instant-mode agreement is near chance (alpha = 0.08). On these scenarios, reasoning directionally narrows cross-model disagreement, increasing mean pairwise agreement from 5.4 to 6.7 out of 10. Reasoning also reduces demographic-judgment inconsistency in three of five models and does not increase it for any model. Across all five model families, reasoning changes self-labeled ethical frameworks more often than binary verdicts.

[220]  arXiv:2605.04489 [pdf, ps, other]
Title: A Hybrid Method for Low-Resource Named Entity Recognition
Comments: Published in Journal of Applied Data Sciences, Volume 7, Issue 2, pages 999-1019, 2026. Open access under CC BY 4.0
Journal-ref: Journal of Applied Data Sciences, Vol. 7, No. 2, pp. 999-1019, 2026
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Named Entity Recognition (NER) is a critical component of Natural Language Processing with diverse applications in information extraction and conversational AI. However, NER in specific domains for low-resource languages faces challenges such as limited annotated data and heterogeneous label sets. This study addresses these issues by proposing a hybrid neurosymbolic framework that integrates rule-based processing with deep learning models for Vietnamese NER. The core idea involves a two-stage pipeline: first, a rule-based component reduces label complexity by grouping relational and special categories; second, pre-trained language models are fine-tuned for high-precision extraction. A post-processing module is then utilized to restore fine-grained labels, preserving expressiveness for application-level usability. To mitigate data scarcity, a scalable data augmentation strategy leveraging Large Language Models (LLMs) is introduced to expand the label set without full re-annotation, which is a significant novelty of this work. The effectiveness of this method was evaluated across five specific-domain datasets, including logistics, wildlife, and healthcare. Experimental results demonstrate substantial improvements over strong RoBERTa-based baselines. Specifically, the proposed system achieved F1 scores of 90 percent in Customer Service, up from 83 percent; 84 percent in GAM, up from 73 percent; 83 percent in AI Fluent, up from 80 percent; 94 percent in PhoNER_Covid19, up from 91 percent; and 60 percent in Rare Wildlife, up from 36 percent. These findings confirm that the hybrid approach effectively captures the linguistic complexity of Vietnamese and contextual nuances in specialized domains, offering a robust contribution to low-resource NER research.

[221]  arXiv:2605.04491 [pdf, ps, other]
Title: An Evaluation of Chat Safety Moderations in Roblox
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR)

Roblox is among the most popular online gaming platforms, used by hundreds of millions of users every day. A substantial portion of these users are minors, who are at greater risk because abusive users may use Roblox's real-time chat interface to make initial contact with potential victims. Roblox employs automated chat moderation mechanisms to detect potentially abusive messages; however, to date, their effectiveness has not been independently investigated. Toward this goal, we collected approximately 2 million chat messages from four games across multiple age groups and analyzed them to evaluate the moderation system. These messages were collected from public game servers following ethical and legal norms as well as Roblox's terms of service.
We use this corpus to qualitatively study which types of unsafe chats escape the moderation system and how policy-violating users evade it. Given the dataset's scale, it is prohibitively expensive to conduct qualitative content analysis manually. Therefore, we adopt a two-step approach. First, we manually labeled safe and unsafe messages (n=99.8K) and used them as a ground truth to evaluate four locally hosted state-of-the-art large language models (LLMs). Next, the best-performing LLM was applied to the entire corpus to identify potentially unsafe messages, which we manually categorized using iterative open and axial coding methods until thematic saturation was reached. Overall, our findings reveal a troublesome reality: numerous unsafe chat messages related to grooming, sexualizing minors, bullying and harassment, violence, self-harm, and sharing sensitive information escaped the current moderation. Our analysis of users whose messages were previously flagged revealed that they continue to send harmful messages by employing a wide range of techniques to evade the moderation system.

[222]  arXiv:2605.04494 [pdf, ps, other]
Title: Towards General Preference Alignment: Diffusion Models at Nash Equilibrium
Comments: 21 pages, 5 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally efficient alternative that avoids explicit reward modeling and has been widely adopted in diffusion alignment. However, existing preference-based methods for diffusion alignment still rely on reward-induced preference signals and typically assume that human preferences can be adequately modeled by the Bradley-Terry (BT) model, which may fail to capture the full complexity of human preferences. In this paper, we formulate diffusion alignment from a game-theoretic perspective. We propose Diffusion Nash Preference Optimization (Diff.-NPO), an intuitive general preference framework for diffusion alignment. Diff.-NPO encourages the current policy to play against itself to achieve self-improvement and better alignment. Empirically, we demonstrate the effectiveness of Diff.-NPO on the text-to-image generation task via various metrics. Diff.-NPO consistently outperforms existing preference-based diffusion alignment methods.

[223]  arXiv:2605.04495 [pdf, ps, other]
Title: CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Retrieval-Augmented Generation (RAG) depends on document ranking to provide useful evidence for generation, but conventional reranking methods mainly optimize query-document relevance rather than generation usefulness. A relevant document may still introduce noise, while a lower-ranked document may better reduce the generator's uncertainty. We propose CAR (Confidence-Aware Reranking), a query-guided, training-free, and plug-and-play reranking framework that uses generator confidence change as a document usefulness signal. CAR estimates confidence through the semantic consistency of multiple sampled answers under query-only and query-document conditions. Documents that significantly increase confidence are promoted, those that decrease confidence are demoted, and uncertain cases preserve the baseline order, while a query-level gate avoids unnecessary intervention on already confident queries. Experiments on four BEIR datasets show that CAR consistently improves NDCG@5 across sparse and dense retrievers, LLM-based and supervised rerankers, and four LLM backbones. Notably, CAR improves the YesNo reranker by 25.4 percent on average under Contriever retrieval, and its ranking gains strongly correlate with downstream generation F1 improvements, achieving Spearman rho = 0.964.
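
The promote/demote/preserve rule described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: exact-match agreement among sampled answers stands in for semantic consistency, and the function names and the `margin` and `gate` thresholds are hypothetical.

```python
from collections import Counter

def consistency_confidence(answers):
    """Share of sampled answers that agree with the most common answer.
    Exact string match is a stand-in for semantic consistency (assumption)."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)

def confidence_aware_rerank(baseline_docs, query_only_answers, answers_by_doc,
                            margin=0.2, gate=0.9):
    """Reorder documents by how much each one changes generator confidence.
    margin and gate are hypothetical thresholds chosen for illustration."""
    base_conf = consistency_confidence(query_only_answers)
    # Query-level gate: leave the ranking alone for already confident queries.
    if base_conf >= gate:
        return list(baseline_docs)
    promoted, neutral, demoted = [], [], []
    for doc in baseline_docs:
        delta = consistency_confidence(answers_by_doc[doc]) - base_conf
        if delta > margin:
            promoted.append(doc)   # document clearly raises confidence
        elif delta < -margin:
            demoted.append(doc)    # document clearly lowers confidence
        else:
            neutral.append(doc)    # uncertain case: keep baseline position
    # Each bucket keeps the baseline order of its members.
    return promoted + neutral + demoted

docs = ["d1", "d2", "d3"]
query_only = ["paris", "lyon", "paris", "nice"]
with_doc = {
    "d1": ["rome", "lyon", "nice", "paris"],
    "d2": ["paris", "paris", "lyon", "paris"],
    "d3": ["paris", "paris", "lyon", "nice"],
}
reranked = confidence_aware_rerank(docs, query_only, with_doc)
print(reranked)
```

In this toy input, the second document raises agreement among the sampled answers and moves up, while the first lowers it and moves down.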

[224]  arXiv:2605.04496 [pdf, ps, other]
Title: SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
Comments: ICML 2026
Subjects: Computation and Language (cs.CL)

Long-Text Understanding (LTU) at million-token scale requires balancing reasoning fidelity with computational efficiency. Frontier long-context LLMs can process million-token contexts end-to-end, but they suffer from high token consumption and attention dilution. In parallel, specialized LTU agents often sacrifice fidelity through task-agnostic abstractions like graph construction or indexing. We identify a key insight for LTU: query-relevant information is typically sparse relative to the full document, so effective reasoning should rely on a query-sufficient subset rather than the entire context. To address this, we propose SCOUT, a new paradigm for LTU that shifts from passive processing to active information foraging. It treats the document as an explorable environment and answers from a compact, provenance-grounded epistemic state. Guided by state-level gap diagnosis, SCOUT adaptively alternates between coarse-to-fine exploration and anchored state updates that progressively contract its epistemic state toward query sufficiency. Experiments show that SCOUT matches state-of-the-art proprietary models while reducing token consumption by up to 8x. Moreover, SCOUT remains stable as context length scales, substantially alleviating the practical cost-performance trade-off.

[225]  arXiv:2605.04497 [pdf, ps, other]
Title: Quadrature-TreeSHAP: Depth-Independent TreeSHAP and Shapley Interactions
Subjects: Machine Learning (cs.LG)

Shapley values are a standard tool for explaining predictions of tree ensembles, with Path-Dependent SHAP being the most widely used variant. Despite substantial progress, existing methods still exhibit trade-offs between depth-dependent runtime, numerical stability, and support for higher-order interactions. To address these challenges, we introduce Quadrature-TreeSHAP, a quadrature-based reformulation of Path-Dependent TreeSHAP that is numerically stable, naturally extends to any-order Shapley interaction values and is practically insensitive to tree depth. Our implementation supports both CPU and GPU and is integrated into XGBoost.
Our method is based on a weighted-Banzhaf interaction polynomial, which expresses Banzhaf interaction values as expectations under a feature participation probability $p$. Shapley values and any-order interaction values are then recovered by integrating these polynomials over $p$ from 0 to 1. We evaluate these integrals using Gauss-Legendre quadrature, and show that, in practice, only 8 fixed quadrature points are sufficient to reach machine precision. In fact, Quadrature-TreeSHAP with 8 fixed points achieves greater numerical stability than TreeSHAP. This fixed-point formulation removes depth dependence from the inner computation and enables efficient SIMD execution.
We confirm these advantages empirically. On 12 XGBoost benchmarks, Quadrature-TreeSHAP computes Shapley values 1.06x-10.59x faster than TreeSHAP on CPU and 1.84x-6.95x faster than GPUTreeSHAP on GPU. Shapley pairwise interactions are 3.80x-58.11x faster on CPU, with higher-order interactions achieving speedups of up to 1200x compared to TreeSHAP-IQ.
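
The integral identity behind this abstract can be checked on a toy cooperative game. The sketch below is not the paper's tree algorithm: it enumerates coalitions by brute force, and the game `toy_value` and all function names are hypothetical. It relies on the standard identity that the Shapley weight s!(n-1-s)!/n! equals the integral of p^s (1-p)^(n-1-s) over p from 0 to 1, and on the fact that an 8-point Gauss-Legendre rule integrates polynomials up to degree 15 exactly.

```python
import math
from itertools import combinations

import numpy as np

def shapley_weight_exact(n, s):
    """Classical Shapley weight for a coalition of size s among n players."""
    return math.factorial(s) * math.factorial(n - 1 - s) / math.factorial(n)

def shapley_weight_quadrature(n, s, num_points=8):
    """Same weight as the integral of p^s (1-p)^(n-1-s) over [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(num_points)
    p = 0.5 * (nodes + 1.0)          # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights                # rescale weights for the new interval
    return float(np.sum(w * p**s * (1.0 - p) ** (n - 1 - s)))

def shapley_bruteforce(value, n, i):
    """Shapley value of player i by direct coalition enumeration."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for s in range(n):
        for subset in combinations(others, s):
            S = frozenset(subset)
            total += shapley_weight_exact(n, s) * (value(S | {i}) - value(S))
    return total

def shapley_by_quadrature(value, n, i, num_points=8):
    """Integrate the expected marginal contribution under participation
    probability p (a weighted Banzhaf value) over p in [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(num_points)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for x, w in zip(nodes, weights):
        p = 0.5 * (x + 1.0)
        banzhaf_p = 0.0
        for s in range(n):
            for subset in combinations(others, s):
                S = frozenset(subset)
                prob = p**s * (1.0 - p) ** (n - 1 - s)
                banzhaf_p += prob * (value(S | {i}) - value(S))
        total += 0.5 * w * banzhaf_p
    return total

# Hypothetical 3-player game: value is the squared sum of player strengths.
strengths = [1.0, 2.0, 3.0]
def toy_value(S):
    return sum(strengths[j] for j in S) ** 2

exact = [shapley_bruteforce(toy_value, 3, i) for i in range(3)]
quad = [shapley_by_quadrature(toy_value, 3, i) for i in range(3)]
print(exact, quad)
```

For this three-player game the integrand is a polynomial of degree 2 in p, so the two computations agree up to rounding. The abstract's machine-precision claim for deeper trees is an empirical statement that this toy example does not test.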

[226]  arXiv:2605.04499 [pdf, ps, other]
Title: Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific reasoning, and accurate action and tool selection. To overcome these limitations, we propose the Pen-Strategist framework, consisting of a novel domain-specific reasoning model that derives pentesting strategies via logical reasoning and a classifier that converts the strategies into actionable steps. First, we construct a reasoning dataset containing logical explanations for both strategy derivation and step selection in pentesting scenarios. We then fine-tune a Qwen-3-14B model for strategy generation using reinforcement learning. Evaluation on the test split of the dataset demonstrates an 87% improvement in strategy derivation performance compared to the baseline. Furthermore, we integrate the fine-tuned Pen-Strategist model into existing automated pentesting frameworks, such as PentestGPT, and evaluate its performance on vulnerable machines, achieving a 47.5% improvement in subtask completion while surpassing the baseline GPT-5. Further experiments on the CTFKnow benchmark show an 18% performance gain over the base model. For step prediction, we train a semantic-based CNN classifier, which outperforms commercial LLMs by 28% and enhances execution stability. Finally, we conduct a user study to qualitatively assess the generated strategies, and Pen-Strategist demonstrates superior performance compared to Claude-4.6-Sonnet.

[227]  arXiv:2605.04500 [pdf, ps, other]
Title: Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties
Comments: Accepted to CoNLL 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Low-resource language varieties used by specific groups remain neglected in the development of Multilingual Language Models. A great deal of cross-lingual research focuses on inter-lingual language transfer, which strives to align allied varieties and minimize differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue allowing generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework that focuses on capturing variety-specific cues while also exploiting the rich overlap offered by a high-resource source variety. First, we propose TOPPing, a source-selection method specifically designed for low-resource varieties. Second, we suggest a lightweight VACAI-Bowl architecture that learns variety-specific attributes with one branch while a parallel branch captures variety-invariant attributes using adversarial training. We evaluate our framework on structural prediction tasks, which are among the few tasks available, as a proxy for performance on other downstream tasks. Using VACAI-Bowl with TOPPing yields an average 54.62% improvement in the dependency parsing task across 10 low-resource varieties.

[228]  arXiv:2605.04501 [pdf, ps, other]
Title: Example-Based Object Detection
Authors: ZhiXin Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In recent years, object detection has achieved significant progress, especially in the field of open-vocabulary object detection. Unlike traditional methods that rely on predefined categories, open-vocabulary approaches can detect arbitrary objects based on human-provided prompts. With the advancement of prompt-based detection techniques, models such as SAM3 can even outperform some category-specific detectors trained on particular datasets without requiring additional training on those datasets. However, despite these advancements, false positives and false negatives still occur. In practical engineering applications, persistent misdetections or missed detections of the same object are unacceptable. Yet retraining the model every time such errors occur incurs substantial costs in terms of human effort, computational resources, and time. Therefore, how to leverage existing false positive and false negative samples to prevent such errors from recurring remains a highly challenging and urgent problem. To address this issue, we propose EBOD (Example-Based Object Detection), which integrates a prompt-based detector (SAM3) with robust feature matching modules (DINOv3 and LightGlue). The proposed framework effectively suppresses the repeated occurrence of false positives and false negatives by leveraging previous error examples, without requiring additional model retraining. Code is available at https://github.com/sunzx97/examples_based_object_detection.

[229]  arXiv:2605.04502 [pdf, ps, other]
Title: Gradient Scaling Effects in Adaptive Spectral PINNs for Stiff Nonlinear ODEs
Comments: 8 pages, 4 figures, 1 table. This work appeared at the ICLR 2026 AI&PDE Workshop on OpenReview
Subjects: Machine Learning (cs.LG)

Physics-Informed Neural Networks (PINNs) often struggle to train reliably on stiff and oscillatory dynamical systems due to poor optimization conditioning. While prior work has emphasized representational remedies such as spectral parameterizations, the optimization implications of initial-condition (IC) embeddings in adaptive spectral PINNs have not been well characterized. In this work, we show that the choice of IC gating function induces explicit time-dependent gradient scaling, which interacts with spectral representations during training. Using a nonlinear stiff spring-pendulum ODE as a controlled benchmark, we compare exponential and linear IC gates in combination with fixed and adaptive Fourier spectral trunks. We observe stiffness-dependent changes in relative dominance for adaptive PINNs: at moderate stiffness ($k=20$), exponential gating often yields lower error but exhibits heterogeneous behavior across random seeds, whereas at higher stiffness ($k=60$), linear gating becomes preferable, with additional reversals observed at larger $k$. These trends hold for both relative $L^2$ error and maximum pointwise error and are confirmed by paired Wilcoxon signed-rank tests with Holm correction. Overall, our results demonstrate that IC embeddings are not a neutral design choice in PINNs: the induced gradient scaling materially shapes optimization conditioning in stiff regimes, with distinct sensitivity patterns in baseline and adaptive spectral models.
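
The gradient-scaling effect mentioned in this abstract can be illustrated with a small numerical sketch. The abstract does not give the gate formulas, so the forms below are assumptions: a hard-constrained ansatz u(t) = u0 + g(t) * N(t), with an exponential gate g(t) = 1 - exp(-rate * t) and a linear gate g(t) = t. Under that ansatz, the derivative of u with respect to the network output N is exactly g(t), so the gate rescales gradients as a function of time.

```python
import numpy as np

def exponential_gate(t, rate=1.0):
    """Assumed exponential IC gate: zero at t = 0, saturates toward 1."""
    return 1.0 - np.exp(-rate * np.asarray(t, dtype=float))

def linear_gate(t):
    """Assumed linear IC gate: zero at t = 0, grows without bound."""
    return np.asarray(t, dtype=float)

def gated_solution(t, u0, network_output, gate):
    """Hard-constrained ansatz u(t) = u0 + g(t) * N(t) (assumed form)."""
    return u0 + gate(t) * network_output

t = np.array([0.0, 0.5, 1.0, 5.0])
network_output = np.ones_like(t)   # placeholder for N(t); not a trained model

# d u / d N equals g(t), so these arrays are the per-time gradient scales.
exp_scale = exponential_gate(t)
lin_scale = linear_gate(t)
print(exp_scale)
print(lin_scale)

u_exp = gated_solution(t, 2.0, network_output, exponential_gate)
u_lin = gated_solution(t, 2.0, network_output, linear_gate)
```

Both gates enforce the initial condition at t = 0, but the exponential gate caps the gradient scale below 1 while the linear gate lets it grow with t, which is one way a gate choice could interact with training on long time horizons.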

[230]  arXiv:2605.04503 [pdf, ps, other]
Title: DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Image Difference Captioning (IDC) generates natural language descriptions that precisely identify differences between two images, serving as a key benchmark for fine-grained change perception, cross-modal reasoning, and image editing data construction. However, existing benchmarks lack diversity and compositional complexity, and standard lexical-overlap metrics (e.g., BLEU, METEOR) fail to capture semantic consistency or penalize hallucinations, which together prevent a comprehensive and robust evaluation of multimodal large language models (MLLMs) on IDC. To address these gaps, we introduce DiffCap-Bench, a comprehensive IDC benchmark covering ten distinct difference categories to ensure diversity and compositional complexity. Furthermore, we propose an LLM-as-a-Judge evaluation protocol grounded in human-validated Difference Lists, enabling a robust assessment of models' ability to both capture and describe visual changes. Through extensive evaluation of state-of-the-art MLLMs, we reveal significant performance gaps between proprietary and open-source models, highlight the critical importance of reasoning capability, and identify clear limitations in model scaling. Our framework also demonstrates strong alignment with human expert judgments and strong correlation with downstream image editing data construction quality. These findings establish DiffCap-Bench as both a reliable IDC evaluation framework and a practical predictor of downstream utility. The benchmark and code will be made publicly available to support further research.

[231]  arXiv:2605.04504 [pdf, ps, other]
Title: SpecPL: Disentangling Spectral Granularity for Prompt Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Existing prompt learning for VLMs exhibits a modality asymmetry, predominantly optimizing text tokens while still relying on a frozen visual encoder as a holistic extractor and neglecting the spectral granularity essential for fine-grained discrimination. To bridge this, we introduce Disentangling Spectral Granularity for Prompt Learning (SpecPL), which approaches prompt learning from a novel spectral perspective via Counterfactual Granule Supervision. Specifically, we leverage a frozen VAE to decompose visual signals into semantic low-frequency bands and granular high-frequency details. A frozen Visual Semantic Bank anchors text representations to universal low-frequency invariants, mitigating overfitting. Crucially, fine-grained discrimination is driven by counterfactual granule training: by permuting high-frequency signals, we compel the model to explicitly distinguish visual granularity from semantic invariance. Uniquely, SpecPL serves as a universal plug-and-play booster, revitalizing text-oriented baselines like CoOp and MaPLe via visual-side guidance. Experiments on 11 benchmarks demonstrate competitive state-of-the-art performance, achieving a new performance ceiling of 81.51% harmonic-mean accuracy. These results validate that spectral disentanglement with counterfactual supervision effectively bridges the gap in the stability-generalization trade-off. Code is released at https://github.com/Mlrac1e/SpecPL-Prompt-Learning.

[232]  arXiv:2605.04506 [pdf, ps, other]
Title: Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting
Comments: The International Conference on Pattern Recognition (ICPR) 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We introduce Ilov3Splat, a novel framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work depends on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, our method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with view-consistent feature fields. Specifically, we leverage multi-resolution hash embedding to efficiently encode language-aligned CLIP features, enabling dense and coherent language grounding in 3D space. We further train an instance feature field using contrastive loss over SAM masks, supporting fine-grained object distinction across views. At inference time, CLIP-encoded queries are matched against the learned features, followed by two-stage 3D clustering to retrieve relevant Gaussian groups. This enables our framework to identify arbitrary objects in 3D scenes based on natural language descriptions, without requiring category supervision or manual annotations. Experiments on standard benchmarks demonstrate that Ilov3Splat outperforms prior open-vocabulary 3D-GS methods in both object selection and instance segmentation, offering a flexible and accurate solution for language-driven 3D scene understanding. Project page: https://csiro-robotics.github.io/Ilov3Splat.

[233]  arXiv:2605.04507 [pdf, ps, other]
Title: Distilling Bayesian Belief States into Language Models for Auditable Negotiation
Comments: Preprint. 24 pages, 6 figures, 18 tables. Code available at this https URL
Subjects: Computation and Language (cs.CL)

Negotiation agents must infer what their counterpart values, update those beliefs over dialogue turns, and choose actions under uncertainty. End-to-end large language models (LLMs) can imitate negotiation dialogue, but their opponent beliefs are usually implicit and difficult to inspect. We propose BOND (Bayesian Opponent-belief Negotiation Distillation), a framework for auditable negotiation. BOND consists of an LLM-based Bayesian teacher that scores dialogue contexts against the six possible opponent priority orderings, updates a posterior over those orderings, and uses the posterior for menu-based decision making, as well as a smaller 8B student language model that emits both negotiation actions and normalized posterior beliefs as tagged text. In the CaSiNo negotiation dataset, BOND outperforms the state-of-the-art and achieves mean Brier score 0.085 over opponent-priority posteriors. The distilled student preserves much of this belief signal, achieving Brier 0.114, below the uniform six-ordering reference of 5/36, approximately 0.139. Compared with a 70B structured-CoT baseline, the significantly smaller 8B student model yields substantially better elicited posterior calibration. We further showcase auditability through posterior trajectories, belief-versus-policy error decomposition, and posterior-prefix interventions. These diagnostics reveal that distillation preserves a scoreable belief report more strongly than causal belief-conditioned control, making weak belief-action coupling visible, not hidden.
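
The uniform reference value quoted in this abstract can be reproduced with a few lines of arithmetic. The sketch assumes the Brier score is averaged over the six orderings (squared error per class, then the mean), which is the convention under which a uniform posterior scores exactly 5/36; the function name is hypothetical.

```python
import numpy as np

def mean_brier(posterior, true_index):
    """Multi-class Brier score averaged over classes (assumed convention)."""
    posterior = np.asarray(posterior, dtype=float)
    target = np.zeros_like(posterior)
    target[true_index] = 1.0          # one-hot encoding of the true ordering
    return float(np.mean((posterior - target) ** 2))

# Uniform belief over the six possible opponent priority orderings.
uniform = np.full(6, 1.0 / 6.0)
reference = mean_brier(uniform, true_index=0)
print(round(reference, 4))  # 0.1389, i.e. 5/36
```

The calculation is ((5/6)^2 + 5 * (1/6)^2) / 6 = (30/36) / 6 = 5/36, so the reported scores of 0.085 and 0.114 both fall below the uninformed reference.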

[234]  arXiv:2605.04509 [pdf, ps, other]
Title: CoherentRaster: Efficient 3D Gaussian Splatting for Light Field Displays
Subjects: Graphics (cs.GR)

Light field displays (LFDs) require rendering an interlaced image that encodes many view-dependent observations. This multi-view requirement introduces substantial computational overhead, making real-time rendering difficult to achieve. While 3D Gaussian Splatting (3DGS) is efficient for single-view rendering on 2D displays, directly extending it to LFDs is computationally expensive. Moreover, prior accelerations either suffer from GPU inefficiency under spatially incoherent subpixel layouts or rely on computationally heavy multi-plane intermediates. In this paper, we propose CoherentRaster, a 3DGS-based light field rendering framework that performs subpixel-level rasterization. Our method employs Cross-view Coherent Attribute Reuse to eliminate redundant computation across neighboring viewpoints and applies View-coherent Remapping to restore warp-level memory efficiency degraded by the interlaced subpixel layout. Together, CoherentRaster provides an efficient pipeline for real-time, high-quality light field synthesis on consumer-grade hardware.

[235]  arXiv:2605.04515 [pdf, ps, other]
Title: From Priors to Perception: Grounding Video-LLMs in Physical Reality
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally conflate generative artifacts with genuine physical fallacies. Furthermore, we find that models fail systematically not only in anti-physics anomalies but also in counter-intuitive scenarios where visual facts contradict statistical expectations. Accordingly, we propose the Unified Attribution Theory: this dual failure stems not from perception deficiency, but from Semantic Prior Dominance -- the reasoning mechanism is deeply hijacked by internal narrative scripts. To address this, we construct the Programmatic Adversarial Curriculum (PACC), the first high-fidelity adversarial video dataset synthesized based on physical laws, thoroughly decoupling visual artifacts from logical errors. Concurrently, we design the Visual-Anchored Reasoning Chain (VARC) to force models to explicitly ground their judgments in low-level visual facts prior to logical adjudication. Experiments demonstrate that without invasive architectural modifications, standard LoRA fine-tuning with the PACC curriculum effectively neutralizes prior interference in state-of-the-art (SOTA) models, yielding a substantial leap in physical reasoning capabilities.

[236]  arXiv:2605.04518 [pdf, ps, other]
Title: DALight-3D: A Lightweight 3D U-Net for Brain Tumor Segmentation from Multi-Modal MRI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Automatic brain tumor segmentation from multi-modal MRI remains challenging because volumetric models often incur substantial computational cost. This paper presents DALight-3D, a compact 3D U-Net variant that combines depthwise separable 3D convolutions, identifier-conditioned normalization, cross-slice attention, and adaptive skip fusion. The method is evaluated on the Medical Segmentation Decathlon Task01 BrainTumour benchmark under matched optimization settings against standard 3D U-Net, Attention U-Net, Residual 3D U-Net, and V-Net baselines. In the reported 50-epoch comparison, DALight-3D achieves a mean Dice of 0.727 with 2.22M parameters, compared with 0.710 Dice and 3.20M parameters for Residual 3D U-Net. Component-wise ablations show consistent performance degradation when SepConv, identifier-conditioned normalization, CSA, or SSFB is removed. These results indicate that DALight-3D offers a favorable accuracy-efficiency trade-off within the present benchmark setting.
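
The parameter savings that depthwise separable 3D convolutions offer can be checked with simple counting. The layer sizes below are hypothetical and are not taken from the paper; bias terms and normalization parameters are ignored.

```python
def standard_conv3d_params(in_channels, out_channels, kernel_size):
    """Weights in a dense 3D convolution: k^3 * C_in * C_out (no bias)."""
    return kernel_size**3 * in_channels * out_channels

def separable_conv3d_params(in_channels, out_channels, kernel_size):
    """Depthwise k^3 filter per input channel plus a 1x1x1 pointwise mix."""
    depthwise = kernel_size**3 * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise

# Hypothetical layer: 32 input channels, 64 output channels, 3x3x3 kernel.
dense = standard_conv3d_params(32, 64, 3)
separable = separable_conv3d_params(32, 64, 3)
print(dense, separable)  # 55296 2912
```

For this one layer the separable form uses roughly 19 times fewer weights, which is the kind of per-layer saving that lets a compact network reduce its total parameter count.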

[237]  arXiv:2605.04519 [pdf, ps, other]
Title: FL-Sailer: Efficient and Privacy-Preserving Federated Learning for Scalable Single-Cell Epigenetic Data Analysis via Adaptive Sampling
Journal-ref: Transactions on Machine Learning Research (TMLR), May 2026
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Single-cell ATAC-seq (scATAC-seq) enables high-resolution mapping of chromatin accessibility, yet privacy regulations and data size constraints hinder multi-institutional sharing. Federated learning (FL) offers a privacy-preserving alternative, but faces three fundamental barriers in scATAC-seq analysis: ultra-high dimensionality, extreme sparsity, and severe cross-institutional heterogeneity. We propose FL-Sailer, the first FL framework designed for scATAC-seq data. FL-Sailer integrates two key innovations: (i) adaptive leverage score sampling, which selects biologically interpretable features while reducing dimensionality by 80%, and (ii) an invariant VAE architecture, which disentangles biological signals from technical confounders via mutual information minimization. We provide a convergence guarantee, showing that FL-Sailer converges to an approximate solution of the original high-dimensional problem with bounded error. Extensive experiments on synthetic and real epigenomic datasets demonstrate that FL-Sailer not only enables previously infeasible multi-institutional collaborations but also surpasses centralized methods by leveraging adaptive sampling as an implicit regularizer to suppress technical noise. Our work establishes that federated learning, when tailored to domain-specific challenges, can become a superior paradigm for collaborative epigenomic research.

[238]  arXiv:2605.04522 [pdf, ps, other]
Title: DAO-enabled decentralized physical AI: A new paradigm for human-machine collaboration
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); General Economics (econ.GN)

We propose DAO-enabled decentralized physical AI (DePAI), a democratic architecture for coordinating humans and autonomous machines in the operation and governance of physical-digital systems. We (1) synthesize foundations in blockchains, decentralized autonomous organizations (DAOs), and cryptoeconomics; (2) connect DAO design with digital-democracy research on deliberation and voting, showing how each can advance the other; (3) position DAO-governed decentralized physical infrastructure networks (DePIN) within a vertically integrated stack that links energy and sensing to connectivity, storage/compute, models, and robots; (4) show how these elements specify workflows that couple machine execution with human oversight, enabling enhanced self-organization of techno-socio-economic systems, which we call DePAI; and (5) analyze risks, including security, centralization, incentive failure, legal exposure, and the crowding-out of intrinsic motivation, and argue for value-sensitive design and continuously adaptive governance. DePAI offers a path to scalable, resilient self-organization that integrates physical infrastructure, AI, and community ownership under transparent rules, on-chain incentives, and permissionless participation, aiming to preserve human autonomy.

[239]  arXiv:2605.04523 [pdf, ps, other]
Title: RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned harmonic mean of 0.7827 and outperforming the strongest baseline (gpt-oss-120b, 0.6390). Ablations show that diversity in model families, scales, and prompting strategies is essential, with the ensemble consistently beating any single model. We also introduce Meno-Lite-0.1, a 7B domain-adapted model with a strong cost--performance trade-off, and analyse MTRAGEval, highlighting annotation limitations and directions for improvement. Our code is publicly available: https://github.com/RaguTeam/ragu_mtrag_semeval

[240]  arXiv:2605.04524 [pdf, ps, other]
Title: High-Fidelity Single-Image Head Modeling with Industry-Grade Topology
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We present a single-image head mesh reconstruction framework that addresses the longstanding challenge of simultaneously preserving facial identity and producing industry-grade topology. Our framework adopts a coarse-to-fine optimization pipeline that refines a rigged template across three stages -- rig, joint, and vertex -- achieving stable convergence and consistent topology. To mitigate the ill-posed nature of single-image 3D face reconstruction and ensure identity preservation, we employ a normal consistency objective jointly with landmark alignment. To further preserve local surface structure and enforce topological regularity, we introduce geometry-aware constraints based on Gaussian curvature and conformal consistency, along with auxiliary regularizations that correct fine artifacts such as lip seams and eyelid discontinuities. Our hierarchical optimization with geometry-aware regularization yields meshes with semantically meaningful edge flow and industry-grade topology. After geometry reconstruction, we extract UV-space texture and normal maps to preserve appearance details for visualization and downstream use. In a user study with 22 professional technical artists, our results were assessed as approaching industry-grade usability, and 95% of participants ranked our method as the top-performing approach, underscoring its effectiveness for real-world digital human production.

[241]  arXiv:2605.04525 [pdf, ps, other]
Title: HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Tasks
Comments: ICML 2026 (Spotlight)
Subjects: Robotics (cs.RO)

Recent advances in generative models have shown promise in generating behavior plans for long-horizon, sparse reward tasks. While these approaches have achieved promising results, they often lack a principled framework for hierarchical decomposition and struggle with the computational demands of real-time execution, due to their iterative denoising process. In this work, we introduce Hierarchical Diffusion-Flow (HDFlow), a novel hierarchical planning framework that optimally leverages the strengths of diffusion and rectified flow models to overcome the limitations of single-paradigm generative planners. HDFlow employs a high-level diffusion planner to generate sequences of strategic subgoals in a learned latent space, capitalizing on diffusion's powerful exploratory capabilities. These subgoals then guide a low-level rectified flow planner that generates smooth and dense trajectories, exploiting the speed and efficiency of ordinary differential equation (ODE)-based trajectory generation. We evaluate HDFlow on four challenging furniture assembly tasks in both simulation and the real world, where it significantly outperforms state-of-the-art methods. Furthermore, we showcase our method's generalizability on two long-horizon benchmarks comprising diverse locomotion and manipulation tasks. Project website: https://hdflow-page.github.io/

[242]  arXiv:2605.04527 [pdf, ps, other]
Title: Velox: Learning Representations of 4D Geometry and Appearance
Comments: CVPR 2026, Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiring minimal input, i.e., an unstructured dynamic point cloud, to construct. Specifically, Velox trains an encoder to compress spatiotemporal color point clouds into a set of dynamic shape tokens. These tokens are supervised using two complementary decoders: a 4D surface decoder, which models the time-varying surface distribution capturing the geometry; and a Gaussian decoder, which maps the tokens to 3D Gaussians, helping learn appearance. To demonstrate the utility of our representation, we evaluate it across three downstream tasks -- video-to-4D generation, 3D tracking, and cloth simulation via image-to-4D generation -- and observe strong performances in all settings.

[243]  arXiv:2605.04528 [pdf, ps, other]
Title: YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Mechanical equipment forms the critical backbone of modern industrial production, yet domain shift severely limits the generalization of deep learning based fault diagnosis models across different equipment and operating conditions. Inspired by the success of foundation models in achieving zero-shot generalization, we propose YOTOnet (You Only Train Once), a novel architecture specifically designed for cross-domain fault diagnosis in mechanical equipment. YOTOnet comprises three core components: (1) a physics-aware Invariant Feature Distiller that extracts domain-agnostic representations using multi-scale dilated convolutions and FFT-based time-frequency fusion, (2) Domain-Conditioned Sparse Experts (DC-MoE) that adaptively route inputs to specialized processors via learned gating without external meta-data, and (3) a dual-head classification system with auxiliary supervision. Extensive validation on five public bearing datasets (CWRU, MFPT, XJTU, OTTAWA, HUST) through 30 cross-dataset protocols demonstrates the superiority of YOTOnet compared with other state-of-the-art methods. Critically, we observe a clear scaling effect: average test F1 improves from 0.5339 (1 training dataset) to 0.705 (4 datasets), with a clear gain when moving from 3 to 4 datasets. These findings provide empirical evidence that foundation model principles can enable robust, train-once deployment for industrial fault diagnosis.

[244]  arXiv:2605.04530 [pdf, ps, other]
Title: SADE: Symptom-Aware Diagnostic Escalation for LLM-Based Network Troubleshooting
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

Large language model (LLM) agents are increasingly applied to network troubleshooting, but root-cause localization on public benchmarks remains well below practical deployment thresholds. We argue this is because existing agents do not encode the disciplined, layer-by-layer methodology that human network engineers use, and instead rely on free-form deliberation that conflates evidence acquisition with hypothesis commitment. We present SADE (Symptom-Aware Diagnostic Escalation), an agent that encodes the classical Cisco troubleshooting methodology as an explicit policy. SADE pairs a phase-gated diagnostic workflow, which separates evidence acquisition from hypothesis commitment, with a routed library of fault-family skills and high-yield diagnostic helpers. On a held-out set of 523 incidents from the public NIKA benchmark covering eleven unseen scenarios, SADE improves root-cause F1 by 37 percentage points over a ReAct + GPT-5 baseline; a model-controlled comparison against the same Claude Sonnet backend without the SADE policy attributes 22 of those points to the diagnostic policy alone, showing that the gain is not a side-effect of the model upgrade.

[245]  arXiv:2605.04531 [pdf, ps, other]
Title: Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and shifted visual embeddings of region proposals. Recent test-time adaptive object detection methods for VLM-based detectors either rely on costly backpropagation or bypass semantic misalignment via external memory; none directly and efficiently aligns text and vision in a training-free manner. To address this, we propose Reward-Guided Semantic Evolution (RGSE), a training-free framework that directly refines the text embeddings at test time. Inspired by evolutionary search, RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings as candidate variants, evaluates them via cosine similarity with current and historical high-confidence visual proposals as a reward signal, and fuses them into a refined embedding through reward-weighted averaging. Without any backpropagation, RGSE achieves state-of-the-art performance across multiple detection benchmarks while adding minimal computational overhead. Our code will be open-sourced upon publication.
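
The perturb, score, and fuse loop described in this abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the Gaussian perturbation, the softmax weighting with a temperature, the function names, and all default values are assumptions made for illustration only. The abstract specifies just that candidates are scored by cosine similarity against high-confidence visual proposals and fused by reward-weighted averaging.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def evolve_text_embedding(text_emb, visual_protos, n_candidates=16,
                          sigma=0.05, temperature=0.01, seed=0):
    """One training-free refinement step (hypothetical sketch).

    text_emb:      (d,) text embedding of one class.
    visual_protos: (m, d) embeddings of high-confidence proposals
                   (current and historical ones stacked together).
    """
    rng = np.random.default_rng(seed)
    text_emb = l2_normalize(text_emb)
    visual_protos = l2_normalize(visual_protos)

    # Candidate variants: the original embedding plus Gaussian perturbations
    # (the noise model is an assumption, not taken from the abstract).
    noise = rng.normal(scale=sigma, size=(n_candidates, text_emb.shape[0]))
    candidates = l2_normalize(np.vstack([text_emb, text_emb + noise]))

    # Reward: mean cosine similarity between each candidate and the proposals.
    rewards = (candidates @ visual_protos.T).mean(axis=1)

    # Reward-weighted averaging; softmax weights are an assumed choice.
    weights = np.exp((rewards - rewards.max()) / temperature)
    weights /= weights.sum()
    refined = l2_normalize(weights @ candidates)
    return refined, rewards
```

No gradients are computed anywhere in this step, which is the property the abstract emphasizes; the cost is one matrix product per class per refinement.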

[246]  arXiv:2605.04532 [pdf, ps, other]
Title: Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap
Authors: Christoph Treude
Comments: 3rd ACM International Conference on AI-powered Software (AIware 2026)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

AI coding assistants and autonomous agents are becoming integral to software development workflows, reshaping how code is produced, reviewed, and maintained. While recent research has focused mainly on the capabilities and productivity impacts of these systems, much less attention has been paid to accountability: who is responsible when agents generate, modify, or recommend code? In practice, accountability is defined through the Terms of Service (ToS) and related policy documents that govern the use of AI-powered development tools.
In this vision paper, we present a comparative analysis of the Terms of Service for widely used AI coding assistants and agent-enabled development tools. We examine how these documents allocate ownership, responsibility, liability, and disclosure obligations between tool providers and software developers, and we identify common patterns and divergences between providers. Our analysis reveals a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users, as well as substantial variation in how providers address issues such as indemnification, data reuse, and acceptable use.
Based on these findings, we argue that existing policy frameworks are poorly aligned with increasingly agent-mediated and autonomous software development workflows. We outline a research roadmap for accountable agents in software engineering, identifying challenges and opportunities for modeling responsibility, designing governance artifacts, developing tooling that supports accountability, and conducting empirical studies of developers' perceptions and practices.

[247]  arXiv:2605.04534 [pdf, ps, other]
Title: Characterizing Students' LLM Usage Behaviors and Their Association with Learning in Critical Thinking Tasks
Comments: EDM 2026
Subjects: Human-Computer Interaction (cs.HC)

Large language models (LLMs) are becoming increasingly embedded in students' learning practices, yet much of what is known about how students use LLMs and how this usage impacts learning comes from problem-solving domains or constrained experimental settings. We present an analysis of data on LLM usage collected during two offerings of a research-oriented course where students learn to read, reason about, and critique academic papers. Without restrictions on whether or how to use LLMs, students reported their LLM usage practices when asked to do these activities as a series of homework assignments during the course. This paper extends prior work done on data from a single offering of the same course by presenting a refined bottom-up categorization of LLM usage types, cross-labeled by the extent of student initiative these usages entail. Furthermore, we examine how LLM use impacts student learning, measured by performance on three midterms, looking at factors such as frequency and type of usage.

[248]  arXiv:2605.04535 [pdf, ps, other]
Title: From Video-to-PDE: Data-Driven Discovery of Nonlinear Dye Plume Dynamics
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Applications (stat.AP); Machine Learning (stat.ML)

Inferring continuum models directly from video is hampered by two facts: the recorded field is uncalibrated image intensity rather than a physical state, and direct numerical differentiation of noisy frames is unstable. We develop a video-to-PDE pipeline that converts grayscale recordings of an ink plume into a normalised scalar field $u(x,y,t)$, isolates a bulk drift $\mathbf{v}(t)$ from intrinsic spreading via the intensity-weighted centroid, and identifies an effective transport law by weak-form sparse regression. Conditioning, threshold-sweep and random-centre diagnostics show that overcomplete libraries are strongly collinear; the search is therefore restricted to compact gradient-based libraries. Coefficients are refined by an inverse physics-informed network and recalibrated against forward rollouts, with a chronological block bootstrap quantifying uncertainty. The selected reduced model $u_t+\mathbf v(t)\!\cdot\!\nabla u = 9.005\,|\nabla u|^{2}+0.666\,\Delta u$ outperforms advection--diffusion baselines on held-out frames, retains a positive Laplacian coefficient, and admits a Cole--Hopf reduction to a linear advection--diffusion equation. The framework demonstrates that uncalibrated visual data can yield compact, predictive and structurally interpretable continuum models when discovery, calibration and uncertainty are treated as distinct stages.
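
The Cole--Hopf reduction mentioned in this abstract can be checked in a few lines. The derivation below is a sketch written for a generic model $u_t+\mathbf v(t)\cdot\nabla u = a\,|\nabla u|^{2}+b\,\Delta u$ with $b>0$; the symbols $a$, $b$, $\lambda$ and $w$ are introduced here for illustration and are not notation from the paper.

```latex
% Substitution: w = exp(\lambda u), with \lambda = a / b.
\begin{align*}
  w_t      &= \lambda w\, u_t, \qquad
  \nabla w  = \lambda w\, \nabla u, \qquad
  \Delta w  = \lambda w\, \Delta u + \lambda^{2} w\, |\nabla u|^{2}, \\
  w_t + \mathbf v(t)\cdot\nabla w
           &= \lambda w \left( u_t + \mathbf v(t)\cdot\nabla u \right)
            = \lambda w \left( a\,|\nabla u|^{2} + b\,\Delta u \right) \\
           &= b \left( \lambda w\, \Delta u + \lambda^{2} w\, |\nabla u|^{2} \right)
            = b\,\Delta w
            \qquad \text{since } b\lambda = a .
\end{align*}
```

With the reported coefficients $a = 9.005$ and $b = 0.666$, this gives $\lambda = a/b \approx 13.52$, so $w = e^{\lambda u}$ satisfies the linear advection--diffusion equation $w_t + \mathbf v(t)\cdot\nabla w = 0.666\,\Delta w$.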

[249]  arXiv:2605.04539 [pdf, ps, other]
Title: RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Direct Preference Optimization (DPO), the efficient alternative to PPO-based RLHF, falls short on knowledge-intensive generation: standard preference signals from human annotators or LLM judges exhibit a systematic verbosity bias that rewards fluency over logical correctness. This blindspot leaves a logical alignment gap -- SFT models reach NLI entailment of only 0.05-0.22 despite producing fluent text. We propose RLearner-LLM with Hybrid-DPO: an automated preference pipeline that fuses a DeBERTa-v3 NLI signal with a verifier LLM score, removing human annotation while overcoming the "alignment tax" of single-signal optimization. Evaluated across five academic domains (Biology, Medicine, Law) with three base architectures (LLaMA-2-13B, Qwen3-8B, Gemma 4 E4B-it), RLearner-LLM yields up to 6x NLI improvement over SFT, with NLI gains in 11 of 15 cells and consistent answer-coverage gains. On Gemma 4 E4B-it (4.5B effective params), Hybrid-DPO lifts NLI in four of five domains (+11.9% to +2.4x) with faster inference across all five, scaling down to compact base models without losing the alignment-tax mitigation. Our Qwen3-8B RLearner-LLM wins 95% of pairwise comparisons against its own SFT baseline; GPT-4o-mini in turn wins 95% against our concise output -- alongside the 69% win the same judge gives a verbose SFT over our DPO model, this replicates verbosity bias on a frontier comparator and motivates logic-aware metrics (NLI, ACR) over LLM-as-a-judge for knowledge-intensive generation.

[250]  arXiv:2605.04541 [pdf, ps, other]
Title: Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image-to-point-cloud registration (I2P) is a fundamental task in robotic applications such as manipulation, grasping, and localization. Existing deep learning-based I2P methods seek to align image and point cloud features in a learned representation space to establish correspondences, and have achieved promising results. However, when the inlier ratio of the initial matching pairs is low, conventional Perspective-n-Points (PnP) methods may struggle to achieve accurate results. To address this limitation, we propose Angle-I2P, an outlier rejection network that leverages angle-consistent geometric constraints and hierarchical attention. First, we design a scale-invariant, cross-modality geometric constraint based on angular consistency. This explicit geometric constraint guides the model in distinguishing inliers from outliers. Furthermore, we propose a global-to-local hierarchical attention mechanism that effectively filters out geometrically inconsistent matches under rigid transformation, thereby improving the Inlier Ratio (IR) and Registration Recall (RR). Experimental results demonstrate that our method achieves state-of-the-art performance on the 7Scenes, RGBD Scenes V2, and a self-collected dataset, with consistent improvements across all benchmarks.

[251]  arXiv:2605.04542 [pdf, ps, other]
Title: Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
Subjects: Machine Learning (cs.LG)

Recent analyses question whether reinforcement learning (RL) is responsible for strong reasoning in large language models (LLMs). At the same time, distillation and inference-time sampling, including power sampling, have emerged as effective ways to improve LLM performance. However, the relationship among RL, distillation, and sampling remains unclear. In this study, we focus on the power distribution, the target distribution of power sampling, and show that the power distribution bridges sampling, self-reward KL-regularized RL, and self-distillation. From the sampling perspective, we show that inexpensive local approximations cannot reproduce sequence-level power without information about possible suffixes. From the RL perspective, the power distribution is the closed-form optimizer of KL-regularized RL when the model's sequence-level log-probabilities are used as the reward. This identification leads to power self-distillation, an offline distillation surrogate that shares the same target distribution and amortizes the cost of power sampling into supervised training on teacher samples. We further show that power self-distillation can achieve self-reward sharpening, while improvement in a downstream true reward is governed by the covariance between true reward and self-reward under the power distribution. Experiments on reasoning tasks support our analysis: power sampling raises self-reward, true-reward gains depend on alignment with self-reward, and power self-distillation can match or exceed the performance of power sampling at much lower inference cost.
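
The closed-form claim in this abstract follows from the standard solution of KL-regularized reward maximization. The sketch below assumes the usual objective with regularization strength $\beta > 0$ and the model $p$ itself as the reference policy; the symbols $\beta$, $\alpha$, $r$ and $\pi$ are notation chosen here and may differ from the paper's.

```latex
\begin{align*}
  \pi^{\star}
    &= \arg\max_{\pi}\;
       \mathbb{E}_{y\sim\pi}\!\left[ r(y) \right]
       - \beta\, \mathrm{KL}\!\left( \pi \,\|\, p \right)
    \;\;\Longrightarrow\;\;
    \pi^{\star}(y) \propto p(y)\, \exp\!\left( r(y)/\beta \right), \\
  r(y) &= \log p(y)
    \;\;\Longrightarrow\;\;
    \pi^{\star}(y) \propto p(y)\, p(y)^{1/\beta}
                  = p(y)^{\alpha},
    \qquad \alpha = 1 + \tfrac{1}{\beta} > 1 .
\end{align*}
```

Under these assumptions a smaller $\beta$ (weaker regularization) corresponds to a larger exponent $\alpha$, i.e. a sharper power distribution.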

[252]  arXiv:2605.04543 [pdf, ps, other]
Title: UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolation, applying either flat OT to single-step drafts or per-token rejection sampling to tree-structured candidates. This separation leaves the joint regime (where multi-step dependencies meet multi-draft branching) poorly optimized, as local verification rules fail to exploit the coupling between horizontal and vertical dimensions of candidate trees. In this paper, we propose a unified perspective that casts tree-based verification as a conditional OT problem. Our key insight is that vertical dependencies can be abstracted through prefix acceptance probabilities, which act as dynamic scaling factors to actively guide horizontal draft selection. Based on this principle, we introduce UniVer, a verification algorithm that jointly optimizes across tree levels by composing local optimal transport plans under prefix constraints. We prove that UniVer remains lossless and achieves the optimal acceptance rate under the proposed conditional framework. Extensive experiments across different tasks and models demonstrate that UniVer improves acceptance length by 4.2% to 8.5% over standard recursive rejection sampling without replacement, while maintaining exact distributional alignment with the target model.
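
For readers unfamiliar with lossless verification, the sketch below shows the standard single-draft, single-token accept/reject rule that tree-based schemes build on. It is not UniVer and does not implement conditional optimal transport or multi-draft selection; the function names are hypothetical and the distributions in the usage note are toy values.

```python
import numpy as np

def verify_draft_token(p_target, q_draft, drafted, rng):
    """Standard speculative-sampling check for one drafted token.

    Accept with probability min(1, p/q); on rejection, resample from the
    normalized residual max(p - q, 0). The output is distributed as p_target.
    """
    accept_prob = min(1.0, p_target[drafted] / q_draft[drafted])
    if rng.random() < accept_prob:
        return drafted, True
    # Rejection implies p != q, so the residual has positive mass.
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p_target), p=residual)), False

def sample_lossless(p_target, q_draft, rng):
    """Draw a token from the draft model, then verify it against the target."""
    drafted = int(rng.choice(len(q_draft), p=q_draft))
    return verify_draft_token(p_target, q_draft, drafted, rng)
```

With `p_target = [0.5, 0.3, 0.2]` and `q_draft = [0.2, 0.2, 0.6]`, repeated calls to `sample_lossless` produce tokens whose empirical frequencies approach `p_target`, even though every proposal comes from `q_draft`. That is the sense in which verification is called lossless.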

[253]  arXiv:2605.04544 [pdf, ps, other]
Title: Hard CNF Instances for Ideal Proof Systems
Subjects: Computational Complexity (cs.CC)

Since the introduction of the Ideal Proof System (IPS) by Grochow and Pitassi (J. ACM 2018), a substantial body of work has established size lower bounds for IPS and its fragments. In particular, Forbes, Shpilka, Tzameret, and Wigderson (Theory Comput. 2021) developed the main lower-bound frameworks for restricted IPS fragments, namely functional lower bounds and the hard multiples method, while Alekseev, Grigoriev, Hirsch, and Tzameret (SIAM J. Comput. 2024) gave a general template for conditional lower bounds for full IPS.
Yet all these lower bounds apply only to purely algebraic formulas over a field, that is, non-Boolean formulas not directly expressible in propositional logic. Proving lower bounds for CNF formulas has therefore remained a central open problem in this line of work.
The current work resolves this question for IPS over read-once oblivious algebraic branching programs (roABPs) by proving lower bounds for refutations of CNF formulas in this system. Our approach is a rank-based feasible interpolation argument, following the method of Pudlák and Sgall (Proof Complexity and Feasible Arithmetic 1996) for monotone span programs, in which decomposing a given roABP refutation along a variable partition yields a low-dimensional space of polynomials from which we construct a span-program interpolant. We extend their result from Nullstellensatz refutations measured by degree to Nullstellensatz refutations measured by roABP size (i.e., roABP-IPS$_\text{LIN}$).

[254]  arXiv:2605.04545 [pdf, ps, other]
Title: Z-Opt: A Near-Optimal Reduced-Complexity Two-Dimensional Grassmannian Constellation
Comments: 12 pages, 11 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Grassmannian constellations are known to achieve the capacity of noncoherent communications over Rayleigh fading channels in the high-SNR regime, yet their efficient construction remains challenging. In this paper, we propose two construction methods for Grassmannian constellations of one-dimensional subspaces in a two-dimensional space, termed S-Opt and Z-Opt, along with two low-complexity detectors. Both the construction and detection procedures are performed on the unit sphere, known as the Bloch sphere in quantum computing. We show that the chordal distance on the Grassmann manifold is proportional to the Euclidean distance on the Bloch sphere and derive a corresponding theoretical upper bound based on the Fejes--Tóth bound on the minimum chordal distance. The S-Opt constellation is constructed from sphere-packing solutions and attains the derived upper bound for the optimal Bloch-sphere packings considered. The S-Opt detector can be applied to arbitrary Grassmannian constellations on $\mathcal{G}(2,1)$, and its time complexity scales linearly with the number of receive antennas and logarithmically with the constellation size, while yielding the same detection performance as the GLRT detector. Furthermore, based on the insight obtained through the S-Opt construction, the Z-Opt constellation is constructed by stacking regular polygons on the Bloch sphere, and its minimum chordal distance approaches the derived upper bound over the evaluated constellation sizes. The Z-Opt detector's time complexity scales linearly with the number of receive antennas, while yielding the same detection performance as the GLRT detector for Z-Opt.
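
The proportionality between chordal and Bloch-sphere distances can be checked numerically. The sketch below assumes the common chordal-distance convention $d_c(x,y)=\sqrt{1-|x^{H}y|^{2}}$ for unit vectors in $\mathbb{C}^2$ and the usual Bloch-vector map; under those assumptions the constant is $1/2$. The paper may use a different normalization, and the function names are illustrative.

```python
import numpy as np

def bloch_vector(x):
    """Map a nonzero vector in C^2 to its point on the unit (Bloch) sphere."""
    a, b = x / np.linalg.norm(x)
    cross = np.conj(a) * b
    return np.array([2 * cross.real, 2 * cross.imag, abs(a) ** 2 - abs(b) ** 2])

def chordal_distance(x, y):
    """Chordal distance between the lines spanned by x and y (assumed convention)."""
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(x, y)) ** 2))

def bloch_distance(x, y):
    """Euclidean distance between the two Bloch vectors."""
    return np.linalg.norm(bloch_vector(x) - bloch_vector(y))
```

The identity behind this is $|x^{H}y|^{2} = (1 + \mathbf r_x\cdot\mathbf r_y)/2$ for Bloch vectors $\mathbf r_x, \mathbf r_y$, which gives $d_c = \tfrac12\|\mathbf r_x-\mathbf r_y\|$. Designing a constellation with large minimum chordal distance is therefore equivalent to packing points on the ordinary unit sphere.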

[255]  arXiv:2605.04547 [pdf, ps, other]
Title: Stage-adaptive audio diffusion modeling
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

Recent progress in diffusion-based audio generation and restoration has substantially improved performance across heterogeneous conditioning regimes, including text-conditioned audio generation and audio-conditioned super-resolution. However, training audio diffusion models remains computationally expensive, and most existing pipelines still rely on static optimization recipes that treat the relative importance of training signals as fixed throughout learning. In this work, we argue that a major source of inefficiency lies in the evolving balance between semantic acquisition and generation-oriented refinement. Early training places stronger emphasis on acquiring condition-aligned semantic structure and coarse global organization, whereas later training increasingly emphasizes temporal consistency, perceptual fidelity, and fine-detail refinement. To characterize this evolving balance, we introduce a progress-based regime variable derived from the training-time slope of an SSL-space discrepancy, which measures semantic progress during training. Based on this signal, we develop three complementary stage-aware mechanisms: decayed SSL guidance for early semantic bootstrapping, self-adaptive timestep sampling driven by the regime variable, and structure-aware regularization activated from convergent grouped organization in parameter space. We evaluate these mechanisms on text-conditioned audio generation and audio-conditioned super-resolution. Across both settings, the proposed stage-aware strategies improve convergence behavior and yield gains on the primary generation and spectral reconstruction metrics over standard static baselines. These results support the view that efficient audio diffusion training can benefit from treating external guidance, internal organization, and optimization emphasis as stage-dependent components rather than fixed ingredients.

[256]  arXiv:2605.04548 [pdf, ps, other]
Title: Event-Based Early Warning of Vineyard Disease Risk from Environmental Time Series
Subjects: Machine Learning (cs.LG)

Accurate early warning of vineyard disease risk from environmental observations is essential for timely intervention and more sustainable crop protection. However, many existing studies formulate disease prediction as daily presence classification, which can favor persistence-driven predictions and provide only limited support for actionable short-horizon warning. In this paper, we present an event-based approach for early warning of vineyard disease risk from environmental time series and evaluate it through a vineyard case study. Rather than predicting daily disease status, the task is reformulated to predict transitions into annotated disease-risk periods within a future window of 3-7 days. To reduce fragmentation caused by short interruptions in the binary labels, new events are defined only after a minimum disease-free gap. This formulation encourages models to capture environmental precursors associated with upcoming risk periods instead of merely reproducing temporal persistence. Using multi-year agro-meteorological data, we construct input representations that capture humidity dynamics, rainfall accumulation, temperature variability, and seasonal structure through cyclic temporal encoding. We evaluate representative methods from classical machine learning and deep learning, including XGBoost, Long Short-Term Memory (LSTM) networks, and Temporal Convolutional Networks (TCNs), using both standard classification metrics and an event-oriented early warning protocol. The results show that the event-based formulation supports practical short-horizon warning, while the compared models exhibit distinct trade-offs between event recall, lead time, and false-alert behavior. Overall, the study underscores the importance of problem formulation in environmental time-series learning and demonstrates the value of event-based prediction for vineyard disease warning systems.
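
The relabeling described in this abstract can be sketched as follows. This is an illustrative reading rather than the authors' code: the function names, the default `min_gap` of 3 days, and the choice to treat a series that begins inside a risk period as an event are assumptions. Only the 3-7 day window and the minimum disease-free gap come from the abstract.

```python
import numpy as np

def event_onsets(risk, min_gap):
    """Days on which a new risk event starts.

    A risk day opens a new event only if at least `min_gap` risk-free days
    separate it from the previous risk day (or if it is the first risk day).
    """
    onsets, last_risk_day = [], None
    for t, r in enumerate(np.asarray(risk, dtype=int)):
        if r == 1:
            if last_risk_day is None or t - last_risk_day - 1 >= min_gap:
                onsets.append(t)
            last_risk_day = t
    return onsets

def early_warning_labels(risk, min_gap=3, lead_min=3, lead_max=7):
    """Label day t as 1 if an event onset falls in [t + lead_min, t + lead_max]."""
    labels = np.zeros(len(risk), dtype=int)
    for onset in event_onsets(risk, min_gap):
        lo, hi = max(0, onset - lead_max), onset - lead_min
        if hi >= 0:
            labels[lo:hi + 1] = 1
    return labels
```

For a daily series with risk on days 10, 11, 13, 14 and 20, the one-day interruption on day 12 does not start a new event, so the onsets are days 10 and 20 and the positive labels fall on days 3-7 and 13-17. A persistence baseline that copies today's status gets no credit under this target, which is the motivation the abstract gives for the reformulation.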

[257]  arXiv:2605.04550 [pdf, ps, other]
Title: Neural-Guided Domain Restriction to Accelerate Pseudospectra Computation for Structured Non-normal Banded Matrices
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

Computing pseudospectra of non-normal matrices is essential for understanding the stability and transient behavior of dynamical systems. Such analysis is critical in applications including fluid dynamics, control systems, and differential operators, where non-normality can lead to significant transient amplification and sensitivity to perturbations that are not captured by eigenvalue analysis alone. At large scales, commonly used numerical approaches for pseudospectra computation can become computationally demanding, as they require repeated auxiliary computations to identify spectrally sensitive regions in the complex plane.
We present a neural network-based approach that predicts sensitive regions directly from matrix features, thereby avoiding exhaustive pseudospectra evaluation across the entire complex plane. We calibrate the prediction threshold on validation data to ensure reliable coverage of sensitive regions. The trained neural network guides the selection of grid points requiring full computation, enabling focused computation only where necessary. The approach provides a practical preprocessing strategy for efficient pseudospectra computation. Numerical experiments on non-normal banded matrices demonstrate substantial speedup compared to full grid-based numerical evaluation while maintaining high accuracy in identifying sensitive regions.
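
To make the cost structure concrete, the sketch below evaluates the smallest singular value of $zI - A$ on a grid, which is the standard grid-based route to the $\varepsilon$-pseudospectrum $\{z : \sigma_{\min}(zI - A) < \varepsilon\}$. The optional boolean `mask` stands in for the network's prediction of which grid points need full computation; how that mask is produced is not shown, and the function name and interface are hypothetical.

```python
import numpy as np

def sigma_min_grid(A, re, im, mask=None):
    """Smallest singular value of (z*I - A) at each grid point z = x + iy.

    re, im: 1-D arrays of real and imaginary grid coordinates.
    mask:   optional (len(im), len(re)) boolean array; points where it is
            False are skipped and left as NaN.
    """
    n = A.shape[0]
    identity = np.eye(n)
    out = np.full((len(im), len(re)), np.nan)
    for i, y in enumerate(im):
        for j, x in enumerate(re):
            if mask is not None and not mask[i, j]:
                continue
            z = complex(x, y)
            # Singular values are returned in descending order.
            out[i, j] = np.linalg.svd(z * identity - A, compute_uv=False)[-1]
    return out
```

Each evaluated point costs one dense SVD here, so skipping a fraction of the grid removes that fraction of the work. For a normal matrix the result is simply the distance from $z$ to the nearest eigenvalue; non-normal matrices can give much smaller values far from the spectrum, which is why the grid has to be evaluated at all.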

[258]  arXiv:2605.04552 [pdf, ps, other]
Title: The Newsworthiness of Brazilian Distress: A Peak Analysis on Time Series of International Media Attention to Disasters in Brazil
Subjects: Computation and Language (cs.CL)

Media coverage influences disaster response, yet the drivers of international media attention to local events remain unevenly understood. Brazil offers a compelling case: some of its natural and technological disasters occasionally hit the international headlines. However, systematic analyses of what causes these events to be discussed abroad are still missing. Addressing this gap requires representative, validated and country-specific news datasets. This paper presents a peak analysis of 2k news articles about Brazilian fires and landslides in German newspapers from 2000 to 2024. Using time series segmentation to detect news event peaks, we examine the extent to which they can be temporally aligned with observations in national and global disaster databases.

[259]  arXiv:2605.04554 [pdf, ps, other]
Title: InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery
Comments: 16 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Humans constantly interact with their surroundings. Existing end-to-end multi-person human mesh recovery methods, typically based on the DETR framework, capture inter-human relationships through self-attention across all human queries. However, these approaches model interactions only implicitly and lack explicit reasoning about how humans interact with objects and with each other. In this paper, we propose InterMesh, a simple yet effective framework that explicitly incorporates human-environment interaction information into the human mesh recovery pipeline. By leveraging a human-object interaction detector, InterMesh enriches query representations with structured interaction semantics, enabling more accurate pose and shape estimation. We design lightweight modules, Contextual Interaction Encoder and Interaction-Guided Refiner, to integrate these features into existing HMR architectures with minimal overhead. We validate our approach through extensive experiments on 3DPW, MuPoTS, CMU Panoptic, Hi4D, and CHI3D datasets, demonstrating remarkable improvements over state-of-the-art methods. Notably, InterMesh reduces MPJPE by 9.9% on CMU Panoptic and 8.2% on Hi4D, highlighting its effectiveness in scenarios with complex human-object and inter-human interactions.

[260]  arXiv:2605.04555 [pdf, ps, other]
Title: Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Model-based reinforcement learning (MBRL) offers a promising approach for data-efficient energy management in buildings, combining the strengths of predictive modeling and reinforcement learning. While previous MBRL methods applied to HVAC control have reduced training data requirements, they still require several months of interaction with the building to learn a satisfactory control policy. A key reason is that existing surrogate models attempt to predict the entire state-space, including weather and electricity prices that are unaffected by control actions, or completely ignore these variables. Addressing these issues, we propose Counter-Dyna, a method that enhances the data-efficiency of Dyna, an MBRL method. We create data-efficient counterfactual surrogate models (CSM) by leveraging invariances in the state-space. Using a CSM in Dyna speeds up RL training measured in environment interaction data compared to previous results. In comparison with the previous state of the art, which used 6-12 months of environment interactions, our method needs only 5 weeks. We evaluate our method in a large simulation study using the literature-standard BOPTEST framework and proximal policy optimization (PPO) as the RL algorithm. Our results show cost-saving potentials of 5.3% to 17.0% in a hypothetical deployment scenario. Our work is a significant step towards making real-world deployment of RL algorithms in HVAC control practically viable.

[261]  arXiv:2605.04556 [pdf, ps, other]
Title: Benchmarking LLMs on the Massive Sound Embedding Benchmark (MSEB)
Subjects: Sound (cs.SD); Machine Learning (cs.LG)

The Massive Sound Embedding Benchmark (MSEB) has emerged as a standard for evaluating the functional breadth of audio models. While initial baselines focused on specialized encoders, the shift toward "audio-native" Large Language Models (LLMs) suggests a new paradigm where a single multimodal backbone may replace complex, task-specific pipelines. This paper provides a rigorous empirical evaluation of leading LLMs - including members from the Gemini and GPT families - across the eight core MSEB capabilities to assess their efficacy and audio-text parity. Our results indicate that while a significant modality gap persists regarding performance and robustness, the empirical evidence for an "optimal" modeling approach remains inconclusive. Ultimately, the choice between audio-native and cascaded architectures depends heavily on specific use-case requirements and the underlying assumptions regarding latency, cost, and reasoning depth.

[262]  arXiv:2605.04557 [pdf, ps, other]
Title: Efficient Geometry-Controlled High-Resolution Satellite Image Synthesis
Comments: 2026 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

High-resolution satellite images are often scarce and costly, especially for remote areas or infrequent events. This shortage hampers the development and testing of machine learning models for land-cover classification, change detection, and disaster monitoring. In this paper, we tackle the problem of geometry-controlled high-resolution satellite image synthesis by adding control over existing pre-trained diffusion models. We propose a simple yet efficient method for controlling the synthesis process by leveraging only skip connection features using windowed cross-attention modules. Several previously established control techniques are compared, indicating that our method achieves comparable performance while leading to a better alignment with the geometry control map. We also discuss the limitations of current evaluation approaches, underscoring the need for a consistent alignment assessment.

[263]  arXiv:2605.04559 [pdf, ps, other]
Title: Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation
Comments: Accepted by SIGIR 2026. 11 pages, 8 figures
Subjects: Information Retrieval (cs.IR)

Large Language Models have revolutionized recommender systems (LLM4Rec) by leveraging their generative capabilities to model complex user preferences. However, existing LLM4Rec methods primarily rely on token-level objectives, making it difficult to optimize list-level and non-differentiable metrics (e.g., NDCG, fairness) that define actual recommendation quality. While Best-of-N (BoN) directly optimizes these metrics during inference, its high computational cost hinders real-world deployment. To address this, BoN Alignment aims to distill the search capability into the model itself, yet current approaches suffer from two critical limitations: (1) Indiscriminate Supervision, where the static reference fails to distinguish the relative quality of candidates exceeding its empirical range, leading to a loss of ranking guidance; and (2) Gradient Decay, where the effective supervision signal rapidly diminishes as the evolving policy improves, resulting in inefficient optimization.
To overcome these challenges, we propose BLADE (Bayesian List-wise Alignment via Dynamic Estimation). Unlike static approaches, BLADE introduces a Bayesian framework that continuously updates the target distribution by fusing historical priors with dynamic evidence from the model's current rollouts. This mechanism constructs a self-evolving target that adapts to the model's growing capabilities, ensuring the training signal remains informative throughout the learning process. Extensive experiments on three real-world datasets demonstrate that BLADE significantly outperforms state-of-the-art baselines. Crucially, it breaks the static performance upper bound, achieving sustained gains in both ranking accuracy (Recall, NDCG) and complex list-wise metrics (Fairness, Diversity). The code is available via https://github.com/RegionCh/BLADE.

[264]  arXiv:2605.04560 [pdf, ps, other]
Title: SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Perceptual image compression focuses on preserving high visual quality under low-bitrate constraints. Most existing approaches to perceptual compression leverage the strong generative capabilities of generative adversarial networks or diffusion models, at the cost of substantial model complexity. To address this, we present an efficient perceptual image compression method that exploits the long-range modeling capability and linear computational complexity of state space models, with a particular focus on Mamba. Unlike existing methods that rely on an inherently fixed scanning order and consequently impair semantic continuity and spatial correlation, we develop a semantic-aware Mamba block (SAMB) to enable scanning guided by dynamically clustered semantic features, thereby alleviating the strict causality constraints and long-range information decay inherent to Mamba. Inspired by singular value decomposition, we design an SVD-inspired redundancy reduction module (SVD-RRM) that performs a low-rank approximation on the latent features by introducing a learnable soft threshold, thereby reducing channel-wise redundancy. The proposed SAMB is integrated into both the encoder and decoder of the compression framework, whereas the SVD-RRM is incorporated only in the encoder. Extensive experiments demonstrate that our method performs favorably against state-of-the-art approaches in terms of rate-distortion-perception tradeoff and model complexity. The source code and pretrained models will be available at https://github.com/Jasmine-aiq/SAMIC.

[265]  arXiv:2605.04563 [pdf, ps, other]
Title: RangeGuard: Efficient, Bounded Approximate Error Correction for Reliable DNNs
Comments: 16 pages, 9 figures. Accepted to the 53rd Annual IEEE/ACM International Symposium on Computer Architecture (ISCA 2026)
Subjects: Hardware Architecture (cs.AR)

As DRAM scales in density and adopts 3D integration, raw fault rates increase and multi-bit errors are no longer rare. Such errors can severely impact Deep Neural Networks (DNNs): although DNNs tolerate small numerical perturbations, random bit flips can create extreme outliers that propagate and sharply degrade accuracy. Large Language Models (LLMs) are particularly vulnerable because attention, residual, and normalization layers can amplify and preserve a single corrupted activation across many layers, destabilizing inference.
This paper introduces RangeGuard, a metadata-centric error-correcting framework that provides strong reliability and high efficiency based on bounded approximate correction. Instead of protecting raw bits, RangeGuard encodes compact Range Identifiers (RIDs) that capture the numerical range of each value. These compact metadata enable efficient use of limited redundancy and concentrate protection on range changes, which indicate harmful semantic deviations, while ignoring benign intra-range variations. Upon detecting a range change, RangeGuard restores the correct range and substitutes a representative value, ensuring that error magnitudes are bounded within the range. Based on RIDs, RangeGuard can tolerate 64+ flipped bits using only 16 bits of parity available in GPU memories without a noticeable accuracy loss. By introducing semantic range protection, RangeGuard enables reliable DNN execution even under frequent memory errors and tight redundancy budgets.
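
The motivation and the bounded-correction idea can be illustrated with a toy example. This is only a sketch under loud assumptions: the range boundaries, the representative values, and the function names are hypothetical and are not RangeGuard's actual RID encoding, and the stored range identifier is assumed to have been recovered intact by the parity logic, which is not modeled here.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of the IEEE 754 float32 encoding of x."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))[0]


# Hypothetical magnitude ranges: [0, 1), [1, 16), [16, inf).
BOUNDS = [1.0, 16.0]
# Hypothetical representative value substituted for each range.
REPRESENTATIVE = [0.5, 4.0, 16.0]


def range_id(x: float) -> int:
    """Index of the magnitude range that x falls into."""
    a = abs(x)
    for i, bound in enumerate(BOUNDS):
        if a < bound:
            return i
    return len(BOUNDS)


def bounded_correct(value: float, stored_rid: int) -> float:
    """Keep values whose range is unchanged; otherwise substitute the
    representative of the stored range, which bounds the error."""
    if range_id(value) == stored_rid:
        return value
    return math.copysign(REPRESENTATIVE[stored_rid], value)


weight = 0.5
rid = range_id(weight)                # recorded before any fault occurs
outlier = flip_bit(weight, 30)        # high exponent bit: 0.5 becomes 2**127
benign = flip_bit(weight, 0)          # lowest mantissa bit: tiny change
print(outlier, bounded_correct(outlier, rid))
print(benign, bounded_correct(benign, rid))
```

The exponent-bit flip leaves the stored range and is replaced by a bounded value, while the mantissa-bit flip stays inside the range and is passed through untouched.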

[266]  arXiv:2605.04564 [pdf, ps, other]
Title: Practical validation of synthetic pre-crash scenarios
Subjects: Robotics (cs.RO)

The representativeness of synthetic pre-crash scenarios is crucial for assessing the safety impact of Driving Automation Systems through virtual simulations. However, a gap remains in the robust evaluation of synthetic pre-crash scenarios' practical equivalence to their real-world counterparts; that is, whether they are similar enough for the intended assessment purpose. Conventional significance testing is inadequate, as it focuses on detecting differences rather than establishing practical equivalence. This study addresses the research gap by extending our previous work on a Bayesian Region of Practical Equivalence (ROPE)-based equivalence testing framework by introducing a binning-based approach to define appropriate statistics and equivalence criteria. Two binning-based statistics are proposed to measure practically meaningful distributional differences between datasets in the context of safety impact assessment. The framework's applicability is demonstrated through a case study, which tests the practical equivalence of two synthetic rear-end pre-crash datasets with a previously developed reference dataset in the context of the safety impact assessment of an Automatic Emergency Braking system. The results show that the framework provides informative quantitative assessments of practical equivalence as well as diagnostic insights into the divergence of datasets. Although the demonstration focuses on rear-end pre-crash scenarios, the framework is generic and extensible to broader validation contexts, providing an interpretable and principled basis for practical equivalence assessment across diverse synthetic data applications.

[267]  arXiv:2605.04565 [pdf, ps, other]
Title: Delay-Aware Large-Small Model Collaboration over LEO Satellite Networks
Comments: 6 pages, 6 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper, we introduce a delay-aware large-small model collaboration scheme for low Earth orbit (LEO) satellite networks, which can balance the computational load among satellites and the communication load across inter-satellite links. Specifically, remote sensing satellites with constrained computational resources are responsible for data collection and local processing using small models, while collaborating with computing satellites that provide large model processing. To minimize the service delay, we formulate a joint optimization problem for offloading decision and routing strategy design, which is transformed into a decentralized partially observable Markov decision process. To solve the problem, we develop a multi-agent reinforcement learning (MARL)-based algorithm with offline policy training and online bisection search. The offline trained policy determines routing strategies, while online bisection search iteratively adjusts the offloading decisions. Simulation results demonstrate that the proposed scheme can reduce the service delay by up to 31.85% compared with the benchmarks.

[268]  arXiv:2605.04566 [pdf, ps, other]
Title: Open-Source Image Editing Models Are Zero-Shot Vision Learners
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Recent studies have shown that large generative models can solve vision tasks they were not explicitly trained for. However, existing evidence relies on closed-source models (Veo 3, Nano Banana Pro) or requires task-specific instruction tuning, leaving open whether publicly available image-editing models possess zero-shot vision abilities out of the box.
We conduct a systematic evaluation of three open-source image-editing models -- Qwen-Image-Edit, FireRed-Image-Edit, and LongCat-Image-Edit -- on dense visual prediction tasks without any fine-tuning. We benchmark monocular depth estimation on NYUv2 and DIODE, surface normal estimation on NYUv2, and semantic segmentation on Cityscapes, covering both geometric and semantic scene understanding.
Results show that open-source image-editing models exhibit non-trivial zero-shot visual understanding. On NYUv2 surface normals, FireRed-Image-Edit achieves a mean angular error of $17.69^\circ$, surpassing the fine-tuned Marigold ($20.86^\circ$) and matching the instruction-tuned Vision Banana ($17.78^\circ$) without any task-specific training. On NYUv2 depth estimation, LongCat-Image-Edit obtains $\delta_1{=}0.822$ with affine alignment, and Qwen-Image-Edit leads on DIODE Indoor ($\delta_1{=}0.868$). On Cityscapes semantic segmentation, Qwen-Image-Edit reaches 25.7 mIoU at the 19-class level and 49.5 mIoU at a coarser 7-category level. By comparing three independently trained editors, we test whether zero-shot vision ability is an emergent property of image-editing pretraining rather than a model-specific artifact. Code, evaluation scripts, and all results are publicly released to serve as a reproducible baseline for future work.
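
For readers unfamiliar with the reported metrics, the sketch below shows the usual definitions of $\delta_1$ (fraction of pixels whose prediction-to-ground-truth ratio is within 1.25) and mean angular error. The least-squares scale-and-shift fit is one common form of affine alignment; whether the paper aligns in depth or disparity space is not stated in the abstract, so aligning in depth space is an assumption here, and all names and numbers are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def affine_align(pred: Sequence[float], gt: Sequence[float]) -> List[float]:
    """Fit scale s and shift t minimizing sum((s * pred + t - gt)^2)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("constant prediction cannot be affinely aligned")
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    s = cov / var_p
    t = mean_g - s * mean_p
    return [s * p + t for p in pred]


def delta1(pred: Sequence[float], gt: Sequence[float], thresh: float = 1.25) -> float:
    """Fraction of pixels with max(pred / gt, gt / pred) < thresh."""
    hits = sum(1 for p, g in zip(pred, gt)
               if p > 0 and g > 0 and max(p / g, g / p) < thresh)
    return hits / len(gt)


Vec3 = Tuple[float, float, float]


def mean_angular_error_deg(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """Mean angle in degrees between predicted and true surface normals."""
    total = 0.0
    for a, b in zip(pred, gt):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        total += math.degrees(math.acos(cos))
    return total / len(gt)


raw_depth = [1.0, 2.0, 3.0, 4.0]      # hypothetical relative depth output
true_depth = [3.0, 5.0, 7.0, 9.0]     # hypothetical metric ground truth
print(delta1(raw_depth, true_depth))
print(delta1(affine_align(raw_depth, true_depth), true_depth))
```

The example shows why alignment matters for editing models: a prediction that is correct only up to scale and shift scores zero before alignment and perfectly after it.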

[269]  arXiv:2605.04568 [pdf, ps, other]
Title: Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

State-of-the-art model-based Reinforcement Learning (RL) approaches either use gradient-free, population-based methods for planning, learned policy networks, or a combination of policy networks and planning. Hybrid approaches that combine Model Predictive Control (MPC) with a learned model and a policy prior to leverage the advantages of both paradigms have shown promising results. However, these approaches typically rely on gradient-free optimization methods, which can be computationally expensive for high-dimensional control tasks. While gradient-based methods are a promising alternative, recent works have empirically shown that gradient-based methods often perform worse than their gradient-free counterparts. We propose Dream-MPC, a novel approach that generates a few candidate trajectories from a rolled-out policy and optimizes each trajectory by gradient ascent using a learned world model, uncertainty regularization and amortization of optimization iterations over time by reusing previously optimized actions. Our results on 24 continuous control tasks show that Dream-MPC can significantly improve the performance of the underlying policy and can outperform gradient-free MPC and state-of-the-art baselines. We will open source our code and more at https://dream-mpc.github.io.

[270]  arXiv:2605.04569 [pdf, ps, other]
Title: Lightning Unified Video Editing via In-Context Sparse Attention
Comments: Accepted by ICML 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that Query sharpness correlates with approximation error. Motivated by these findings, ISA implements an efficient pre-selection strategy to prune redundant context, followed by a dynamic query grouping mechanism that routes high-error queries to full attention and low-error ones to a computationally efficient 0-th order Taylor sparse attention. Furthermore, we build LIVEditor, a novel lightning video editing model via ISA and a proposed video-editing data pipeline that curated a 1.7M high-quality dataset. Extensive experiments demonstrate that LIVEditor achieves a ~60% reduction in attention-module latency while surpassing state-of-the-art methods across EditVerseBench, IVE-Bench, and VIE-Bench, delivering near-lossless acceleration without compromising visual fidelity.

[271]  arXiv:2605.04570 [pdf, ps, other]
Title: PINSIGHT: A Comprehensive Threat Exploration of Domain-Adaptive Wi-Fi based PIN Code Inference
Subjects: Cryptography and Security (cs.CR)

Wi-Fi signals can be exploited by adversaries as a sensing side channel to eavesdrop on physical information. By monitoring propagation effects of radio waves within the victim's environment, attackers can remotely infer sensitive information. One particularly concerning example is PIN code inference, where the attacker faces the challenge of mapping Wi-Fi physical-layer channel estimations back into typed digits. While effective in their training environment, such attacks typically fail as soon as they are deployed in unseen environments. The current state-of-the-art attack, WiKI-Eve, attempts to overcome this problem using a deep-learning approach, reporting high PIN code inference accuracy independent of environments, devices, and users. While this suggests a significant real-world threat, it is not well understood how far the attack actually reaches, nor what its underlying generalization performance is based on. In this work, we close this gap by presenting PINSIGHT, a novel methodology that separates the effects of environmental variation and PIN code typing. This enables the first rigorous threat assessment of such attacks, evaluating their generalization capabilities and limitations. Our approach leverages a robotic typing platform that produces highly repeatable keystroke events across systematically varied environment changes [...]. This dataset constitutes the first benchmark for environment generalization in Wi-Fi PIN code inference attacks. Evaluating several state-of-the-art methods, we find that attacks generalize reliably across changes in the surrounding environment but degrade substantially when the channel's encoding of typing itself shifts - precisely the condition that defines a realistic attack scenario. We conclude that the reported performance of current state-of-the-art Wi-Fi PIN inference attacks is not representative of the actual real-world threat.

[272]  arXiv:2605.04572 [pdf, ps, other]
Title: From Parameter Dynamics to Risk Scoring: Quantifying Sample-Level Safety Degradation in LLM Fine-tuning
Comments: Accepted by ICML 2026
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Safety alignment of Large Language Models (LLMs) is extremely fragile, as fine-tuning on a small number of benign samples can erase safety behaviors learned from millions of preference examples. Existing studies attempt to explain this phenomenon by comparing parameters and hidden states before and after fine-tuning, but overlook their dynamic evolution during fine-tuning. In this paper, we uncover a critical mechanism underlying safety degradation by analyzing parameter dynamics, where benign fine-tuning causes parameters to cumulatively drift toward danger-aligned directions, progressively undermining the model's safety. This finding suggests that samples contributing more to this drift carry greater fine-tuning risk. Based on this insight, we propose a method of Sample-Level Quantification of Safety Degradation (SQSD), which quantifies the influence of each training sample on safety degradation. Specifically, SQSD assigns continuous risk scores to samples by measuring the difference between the projections of their induced parameter updates onto the danger and safety directions. Extensive experiments across multiple models and datasets demonstrate that SQSD effectively quantifies sample-level fine-tuning risks and exhibits strong transferability across model architectures, parameter scales, and parameter-efficient methods.
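
The projection-difference score can be sketched in a few lines. This is an illustrative reading of the abstract, not the SQSD implementation: how the danger and safety directions are obtained, how per-sample updates are computed, and whether directions are normalized are all assumptions, and the two-dimensional vectors are hypothetical stand-ins for flattened parameter updates.

```python
import math
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def unit(v: Sequence[float]) -> List[float]:
    """Scale v to unit length so projections are comparable."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("direction vector must be non-zero")
    return [x / norm for x in v]


def risk_score(update: Sequence[float],
               danger_dir: Sequence[float],
               safety_dir: Sequence[float]) -> float:
    """Projection of a sample's parameter update onto the danger direction
    minus its projection onto the safety direction (higher = riskier)."""
    return dot(update, unit(danger_dir)) - dot(update, unit(safety_dir))


def rank_samples(updates: Dict[str, Sequence[float]],
                 danger_dir: Sequence[float],
                 safety_dir: Sequence[float]) -> List[str]:
    """Sample ids ordered from highest to lowest risk score."""
    return sorted(updates,
                  key=lambda k: risk_score(updates[k], danger_dir, safety_dir),
                  reverse=True)


danger = [2.0, 0.0]   # hypothetical danger-aligned direction
safety = [0.0, 3.0]   # hypothetical safety-aligned direction
updates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
print(rank_samples(updates, danger, safety))
```

A ranking like this could be used to filter or down-weight the highest-scoring samples before fine-tuning, although the abstract does not say which downstream use the authors evaluate.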

[273]  arXiv:2605.04573 [pdf, ps, other]
Title: Mixed Finite Elements for Geometrically Exact Beams using Discontinuous Rotations and Discrete Curvature
Subjects: Numerical Analysis (math.NA)

We propose a novel mixed finite-element formulation for geometrically exact (Simo--Reissner) beams that introduces the moment vector as an additional independent field. The specific mixed form allows for an element-local, discontinuous approximation of rotations, which is key to a simple and efficient discretization framework. The concept of discrete curvature provides a mathematically consistent treatment of rotation discontinuities. For linear constitutive laws, the mixed form is derived via a Legendre transform of the curvature-related strain energy. Objectivity is retained at the discrete level by interpolating relative rotations through a multiplicative split of the rotation field; path-independence is inherent to the total Lagrangian setting and verified numerically. Several benchmarks demonstrate optimal rates of convergence and accuracy, irrespective of the beam's slenderness and order of approximation. Notably, the lowest-order element entirely avoids rotation interpolation by employing element-constant rotations only.

[274]  arXiv:2605.04574 [pdf, ps, other]
Title: VL-UniTrack: A Unified Framework with Visual-Language Prompts for UAV-Ground Visual Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)

UAV-ground visual tracking (UGVT) aims to simultaneously track the same object from both the UAV and the ground view. However, existing two-stream methods suffer from isolated feature extraction and rely heavily on implicit appearance matching, which struggles to establish reliable correspondence under drastic view differences, leading to tracking unreliability. To address these limitations, we propose VL-UniTrack, a fully unified framework enhanced by visual-language prompts. By encoding features from both views within a single shared encoder, our method breaks the barrier of feature isolation to facilitate sufficient cross-view interaction. To overcome the ambiguity caused by relying solely on appearance matching, we design a visual-language geometric prompting module, which fuses language descriptions with visual features to generate learnable prompts. These prompts are then fed into our prompt-guided cross-view adapter module to enable sufficient cross-view feature interaction and to guide the learning of view-specific feature representations. Furthermore, a confidence-modulated mutual distillation loss is proposed to regularize the training by mitigating noise propagation. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the latest benchmark. The code can be downloaded from https://github.com/xuboyue1999/VL-UniTrack.git

[275]  arXiv:2605.04576 [pdf, ps, other]
Title: Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus
Comments: Preprint
Subjects: Computation and Language (cs.CL)

This paper presents the first benchmark for the task of automatic part-of-speech (POS) tagging for the Tajik language. Despite the existence of multilingual language models demonstrating high effectiveness for many of the world's languages, their capacity for grammatical analysis of Tajik has remained unexplored until now. The aim of this study is to fill this gap through a systematic comparison of classical neural network architectures and modern multilingual transformers.
Experiments were conducted on the TajPersParallel corpus, a parallel lexical resource comprising approximately 44,000 dictionary entries. Due to the absence of full-fledged example sentences in the current version of the corpus, the task was performed at the level of isolated lexical units, representing a challenging case of context-independent classification. The study compares the following architectures: a recurrent BiLSTM-CRF model, as well as multilingual models XLM-RoBERTa (large), mBERT, ParsBERT (Persian), and ruBERT (Russian), adapted using the parameter-efficient fine-tuning method LoRA.
The testing results showed that the best performance is achieved by the mBERT + LoRA model (macro F1-score = 0.11, weighted F1-score = 0.62). It was established that in the absence of syntactic context, all models experience significant difficulty in resolving morphological ambiguity, successfully classifying primarily high-frequency classes ("noun," "adjective") while demonstrating zero effectiveness for rare function words. Zero-shot evaluation revealed the greatest typological proximity of Tajik to Persian (ParsBERT) and Russian (ruBERT). The obtained results form a foundation for further research and development in the field of automatic processing of the Tajik language.
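
The large gap between the macro and weighted F1-scores follows from how the two averages treat rare classes. The sketch below uses the standard definitions on a small hypothetical label set (not data from the corpus) in which a model predicts only the dominant class.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 for every class that occurs in the gold labels."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class F1 weighted by class frequency in the gold labels."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical gold tags: one dominant class and two rare ones.
gold = ["noun"] * 8 + ["adjective"] + ["conjunction"]
pred = ["noun"] * 10  # a model that always predicts the dominant class
print(round(macro_f1(gold, pred), 3), round(weighted_f1(gold, pred), 3))
```

Because the macro average gives each rare tag the same weight as the dominant one, classes with zero F1 pull it far below the weighted average, which is the pattern the reported 0.11 versus 0.62 suggests.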

[276]  arXiv:2605.04581 [pdf, ps, other]
Title: GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution
Comments: Accepted to NTIRE 2026. 9 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Light field (LF) image super-resolution benefits from Epipolar Plane Images (EPIs), whose line slopes explicitly encode disparity. However, existing Transformer-based LF SR methods mainly attend to horizontal and vertical EPIs, leaving diagonal epipolar geometry underexplored. We present GTF, an omnidirectional EPI Transformer that explicitly models horizontal, vertical, 45-degree, and 135-degree EPIs within a unified reconstruction framework. GTF combines directional EPI processing, MacPI-based prior injection, adaptive directional fusion, and a topology-preserving feed-forward network to better exploit LF geometry. For the NTIRE 2026 fidelity tracks, we use GTF as the main model, while a lightweight GTF-Tiny variant targets the efficiency track. On five standard LF SR benchmarks covering both real-captured and synthetic scenes, GTF reaches 32.78 dB without inference-time enhancement, and stronger inference settings with EPSW and test-time augmentation further improve performance. Under the NTIRE 2026 efficiency constraint, GTF-Tiny attains 32.57 dB with only 0.915M parameters and 19.81 GFLOPs. In the NTIRE 2026 Light Field Image Super-Resolution Challenge, our submissions rank 3rd on Track 1 and Track 3 and 4th on Track 2. Architecture-evolution, channel-width, and inference analyses further support the effectiveness of diagonal EPI modeling, directional fusion, and the lightweight design.

[277]  arXiv:2605.04583 [pdf, ps, other]
Title: TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)
Comments: Preprint
Subjects: Computation and Language (cs.CL)

The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper introduces TajikNLP, an open-source Python library that provides the first comprehensive pipeline for processing authentic Tajik text while preserving the original Cyrillic orthography. The library implements a modular architecture centered around a unified Doc object, enabling sequential application of components for cleaning, normalization, tokenization (including subword BPE), morphemic segmentation, part-of-speech tagging, stemming, lemmatization, and sentence splitting. A novel unified morphology engine is introduced, offering controlled and deep analysis modes that significantly improve handling of Tajik's agglutinative nominal and verbal inflections. The release further incorporates a lexicon-based sentiment analyser and pre-trained Word2Vec/FastText embeddings loaded directly from the Hugging Face Hub. To ensure reproducibility and facilitate future research, four accompanying linguistic datasets -- a POS-tagged corpus (52.5k entries), a sentiment lexicon (3.5k entries), a toponym gazetteer (5.6k entries), and a personal names dataset (3.8k entries) -- have been openly published under permissive licenses. The library's reliability is validated by an extensive test suite of 616 automated tests achieving 93% source code coverage. TajikNLP thus establishes a foundational technological infrastructure for Tajik language processing, lowering the barrier to entry for both academic and industrial applications in low-resource Cyrillic-script environments.

[278]  arXiv:2605.04585 [pdf, ps, other]
Title: IntenBot: Flexible and Imprecise Multimodal Input for LLMs to Understand User Intentions for Casual and Human-Like HRI
Subjects: Human-Computer Interaction (cs.HC)

In natural human-to-human communication, multimodal user input is typically used to supplement explicit and complement implicit voice commands, with casualness allowing for flexible input modality combinations and tolerance for imprecise input data. For example, saying "I want that." with a casual glance at a bottle of water is clear enough in human-to-human communication as an implicit voice command accompanied by gaze and/or gestures, rather than an explicit one. To enable such a human-like interaction in human-robot interaction (HRI), we propose a system, IntenBot, to understand user intentions from flexible and imprecise multimodal input, including voice, gaze, and finger-pointing, in XR. The disambiguation capability of large language models (LLMs) is used to filter out irrelevant input modalities and imprecise input data, generating potential instructions for user confirmation. The flexible and imprecise multimodal input enables casual, human-like interaction with robots, reducing time, effort, and attention, and could also be used as non-voice input. We conducted an informative user behavior study in a simulated environment to understand users' natural behavior in flexibly interacting with a robot using multimodal input and to obtain appropriate angle range parameters for gaze and finger-pointing. An XR study was then performed to evaluate the performance of IntenBot, compared with other methods. We also deployed IntenBot on a physical robot to showcase its real-world applications.

[279]  arXiv:2605.04590 [pdf, ps, other]
Title: From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation
Comments: Accepted at ICMR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope compared to traditional fixed-category segmentation tasks. Recent studies have shown that diffusion models (e.g., Stable Diffusion) can provide rich multimodal semantic features, leading to studies that use diffusion models as feature extractors for segmentation tasks. Such methods, however, inherit the generative nature of diffusion models, which is harmful to discriminative segmentation tasks. In response, we propose RLFSeg, a novel framework that leverages Rectified Flow to learn a direct mapping from the image to the segmentation mask within the latent space. The model is thus freed from the noise-denoise process and the need to optimize the time step of diffusion models, resulting in substantially better performance than previous diffusion-based methods, especially in zero-shot scenarios. By introducing label refinement and an Adaptive One-Step Sampling strategy, the model achieves higher accuracy even with a single inference step. The framework redirects a pretrained generative model to the discriminative segmentation task with zero modification to model structure, thus revealing promising application potential and significant research value.

[280]  arXiv:2605.04593 [pdf, ps, other]
Title: DiCLIP: Diffusion Model Enhances CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
Comments: This work is accepted by IEEE Transactions on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Weakly Supervised Semantic Segmentation (WSSS) with image-level labels typically leverages Class Activation Maps (CAMs) to achieve pixel-level predictions. Recently, Contrastive Language-Image Pre-training (CLIP) has been introduced to generate CAMs in WSSS. However, previous WSSS methods solely adopt CLIP's vision-language paired property for dense localization, neglecting its inherently limited dense knowledge across both visual and text modalities, which renders CAM generation suboptimal. In this work, we propose DiCLIP, a novel WSSS framework that leverages the generative diffusion model to enhance CLIP's dense knowledge across two modalities. Specifically, Visual Correlation Enhancement (VCE) and Text Semantic Augmentation (TSA) modules are proposed for dense prediction enhancement. To improve the spatial awareness of visual features, our VCE module utilizes diffusion's reliable spatial consistency to mitigate the over-smoothing issue in CLIP's attention. It designs the Attention Clustering Refinement (ACR) module to reliably extract diverse correlation maps from the diffusion model. The correlation maps act as a diversity bias for CLIP's self-attention, recursively pushing its visual features towards a more discriminative dense distribution. To augment the semantics of text embeddings, our TSA module argues that a single text modality is insufficient to encompass the variability of visual categories. Thus, we leverage diffusion's generative power to maintain a dynamic key-value cache model, shifting CAM generation from a patch-text matching mechanism to a novel visual knowledge retrieval paradigm. With these enhancements, DiCLIP not only outperforms state-of-the-art methods on PASCAL VOC and MS COCO but also significantly reduces training costs. Code is publicly available at https://github.com/zwyang6/DiCLIP.

[281]  arXiv:2605.04594 [pdf, ps, other]
Title: HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily
Comments: 29 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Many real-world heterogeneous graphs exhibit pronounced heterophily, where connected nodes often have dissimilar labels or play different semantic roles. In such settings, standard heterogeneous graph neural networks that aggregate messages along metapaths or meta-relations primarily based on feature similarity can propagate misleading information, since feature similarity may be misaligned with underlying relational semantics. In this paper, we propose HeterSEED, a semantics-structure decoupling framework for heterogeneous graph learning under heterophily. HeterSEED decouples representation learning into a heterogeneous semantic channel that captures type- and relation-aware local semantics and a structure-aware heterophily channel that separates homophilic and heterophilic neighborhoods via pseudo-label-guided partitioning and aggregates them using metapath-based structural weights. A node-level adaptive fusion mechanism then combines the two channels to produce context-dependent node representations. Theoretically, we establish that, on heterogeneous graphs under heterophily, HeterSEED is strictly more expressive than standard heterogeneous graph neural networks that rely primarily on feature similarity and provably reduces the prediction bias introduced by heterophilic neighbors. Experiments on five real-world heterogeneous graphs, including two large-scale networks at the million-node and hundred-million-edge scale, demonstrate that HeterSEED consistently outperforms representative heterogeneous graph neural networks and recent heterophily-aware baselines, especially in strongly heterophilic regimes.

[282]  arXiv:2605.04595 [pdf, ps, other]
Title: A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints
Comments: Accepted in ICML 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

The rapid adoption of large language models (LLMs) has created significant challenges for efficient inference at scale. Unlike traditional workloads, LLM inference is constrained by both computation and the memory overhead of key-value (KV) caching, which accelerates decoding but quickly exhausts GPU memory. In this paper, we introduce the first queueing-theoretic framework that explicitly incorporates both computation and GPU memory constraints into the analysis of LLM inference. Based on this framework, we derive rigorous stability and instability conditions that determine whether an LLM inference service can sustain incoming demand without unbounded queue growth. This result offers a powerful tool for system deployment, potentially addressing the core challenge of GPU provisioning. By combining an estimated request arrival rate with our derived stable service rate, operators can calculate the necessary cluster size to avoid both costly over-purchasing and performance-violating under-provisioning. We further validate our theoretical predictions through extensive experiments in real GPU production environments. Our results show that the predicted stability conditions are highly accurate, with deviations typically within 10%.
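The provisioning step described in this abstract (estimated arrival rate combined with a stable service rate) can be illustrated with a small sketch. The abstract does not state the paper's stability condition, so the code below assumes the simplest possible setting: identical GPUs, load split evenly, and stability whenever total stable service capacity strictly exceeds the arrival rate. The function name `min_gpus` and the `headroom` parameter are hypothetical and are not taken from the paper.

```python
import math


def min_gpus(arrival_rate, service_rate_per_gpu, headroom=0.0):
    """Smallest GPU count n with n * usable_rate > arrival_rate.

    Assumptions (not from the paper): identical GPUs, evenly split load,
    and a strict capacity-exceeds-demand stability rule.
    `headroom` discounts the predicted service rate, e.g. 0.1 to allow
    for a 10% gap between prediction and measurement.
    """
    if arrival_rate < 0 or service_rate_per_gpu <= 0:
        raise ValueError("rates must be non-negative / positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must lie in [0, 1)")
    usable_rate = service_rate_per_gpu * (1 - headroom)
    # floor(...) + 1 enforces the strict inequality even when the
    # ratio is an exact integer.
    return math.floor(arrival_rate / usable_rate) + 1


if __name__ == "__main__":
    # 100 requests/s of demand, 12 requests/s of stable service per GPU.
    print(min_gpus(100, 12))
    print(min_gpus(100, 12, headroom=0.1))
```

Setting `headroom` near the reported prediction error is one way to trade a slightly larger cluster for a margin against under-provisioning.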

[283]  arXiv:2605.04600 [pdf, ps, other]
Title: A Blockchain-as-a-Service Solution for TAFES-Compliant Verification of Fair Trade Certifications
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Purpose: This study addresses the lack of trust in ethical product labels by designing a blockchain platform grounded in the TAFES principles (Transparency, Accountability, Fairness, Ethics, Safety). It aims to bridge the gap between blockchain's theoretical transparency and a responsible, real-world implementation for certification ecosystems.
Design/Methodology/Approach: Using Action Design Research, we developed a proof-of-concept platform for label authentication. A hybrid architecture records critical events on an Ethereum Layer-2 network for security, while supporting evidence is stored off-chain via IPFS and linked via content identifiers. The solution was validated through a coffee supply chain scenario.
Findings: The proof of concept demonstrates how a TAFES-aligned blockchain platform can support verification of label claims without requiring trust in a single intermediary by creating tamper-evident provenance records and auditable certification evidence across multiple stakeholders. The design supports low-cost, near-real-time anchoring of supply chain events while mitigating adoption barriers related to scalability, privacy, and operational viability.
Originality/Value: This research contributes an integrated ethical and technical blueprint for trustworthy label authentication systems by translating TAFES into implementable design requirements and evaluation checks, and validating them through an ADR-driven proof of concept. It advances prior work by moving from the question of whether blockchain can help to the question of how it should be implemented responsibly in multi-stakeholder certification ecosystems.

[284]  arXiv:2605.04606 [pdf, ps, other]
Title: Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness
Comments: 23 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high cost of data annotation remains a critical challenge. General unsupervised methods generate pseudo boxes without category labels, thus failing to achieve category-aware classification. To overcome these limitations, we propose Reference-based Category Discovery (RefCD), an unsupervised detector that enables category-aware detection without any manually annotated labels. It leverages feature similarity between predicted objects and unlabeled reference images. Unlike previous unsupervised methods that lack category guidance and one-shot methods which require labeled data, RefCD introduces a carefully designed feature similarity loss to explicitly guide the learning of potential category-specific features. Additionally, RefCD supports category-agnostic detection without reference images, serving as a unified framework. Comprehensive quantitative and qualitative analysis of category-aware and category-agnostic detection results demonstrates its effectiveness, and RefCD can learn category information in an unsupervised paradigm even without category labels.

[285]  arXiv:2605.04607 [pdf, ps, other]
Title: Right Model, Right Time: Real-Time Cascaded-Fidelity MPC for Bipedal Walking
Comments: Accepted to IEEE ICRA 2026 Workshop "2nd Workshop on Frontiers of Optimization for Robotics"
Subjects: Robotics (cs.RO)

This paper presents a multi-phase whole-body model predictive control approach for bipedal walking, combining a detailed whole-body model in the near horizon with a simplified single-rigid-body model in the later prediction steps. This reduces computational complexity while retaining prediction capabilities. The resulting nonlinear optimal control problem is solved using sequential quadratic programming (SQP) in acados. Using a pre-specified contact schedule and a target walking speed, the controller optimizes joint torques without depending on preselected footstep locations. The controller is validated in MuJoCo simulation on the 18-DoF bipedal robot HyPer-2.

[286]  arXiv:2605.04608 [pdf, ps, other]
Title: SensingAgents: A Multi-Agent Collaborative Framework for Robust IMU Activity Recognition
Subjects: Artificial Intelligence (cs.AI)

Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is a cornerstone of mobile health, smart environments, and human-computer interaction. However, current deep learning-based HAR models often struggle with heavy reliance on labeled data, position-specific ambiguity, and a lack of transparent reasoning. Inspired by the advanced agents framework, which emulates a collaborative agent using Large Language Models (LLMs), we propose SensingAgents, a novel multi-agent system for robust IMU activity recognition. SensingAgents organizes LLM-powered agents into specialized roles: a group of Analyst Agents for position-specific sensor analysis (arm, wrist, belt, pocket), a pair of Advocate Agents that resolve sensor conflicts through dynamic and static dialectical debates, and a Decision Agent that ensures reliability under sensor drift or failure. Evaluation on the Shoaib dataset demonstrates that SensingAgents significantly outperforms state-of-the-art single-agent and multi-agent LLM models, achieving an accuracy of 79.5% in a zero-shot setting--29% higher than existing agent models and 9.4% higher than deep learning baselines--particularly in complex scenarios where multi-sensor data is conflicting or noisy. Our work highlights the potential of multi-agent collaborative reasoning for advancing the robustness and interpretability of ubiquitous sensing systems.

[287]  arXiv:2605.04609 [pdf, ps, other]
Title: Advancing Aesthetic Image Generation via Composition Transfer
Journal-ref: International Journal of Computer Vision, 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Composition is a cornerstone of visual aesthetics, influencing the appeal of an image. While its principles operate independently of specific content, in practice, composition is often coupled with semantics. As a result, existing methods often enhance composition either through implicit learning or by semantics-based layout control, rather than explicitly modeling composition itself. To address this gap, we introduce Composer, a framework rooted in aesthetic theory, designed to model composition in a semantic-agnostic manner. First, it supports composition transfer by extracting key composition-aware representations from a reference image and leveraging a tailored conditional guidance module to control composition based on pre-trained diffusion models. Second, when users specify only text themes without a composition reference, Composer supports theme-driven composition retrieval by leveraging the in-context learning capabilities of Large Vision-Language Models (LVLMs), achieving explicit composition planning. To enhance composition in a reference-free mode, we conduct text-to-composition fine-tuning on the trained control module to enable implicit composition planning. Furthermore, we curated a high-quality dataset comprising 2 million image-text pairs using state-of-the-art generative models to support model training. Experimental results demonstrate that Composer significantly enhances aesthetic quality in text-to-image tasks and facilitates personalized composition control and transfer, offering users precision and flexibility in the creative process.

[288]  arXiv:2605.04610 [pdf, ps, other]
Title: Active Contact Sensing for Robust Robot-to-Human Object Handover
Subjects: Robotics (cs.RO)

Robot-to-human object handover is an essential skill for robot assistants, from serving drinks at home to passing surgical tools in the operating room. We expect robots to perform handover robustly -- to release the object only after a firm human grasp while ignoring incidental touches. Existing passive-sensing methods struggle to generalize across diverse objects and human behaviors, as they lack informative perturbations to disambiguate different contact conditions, such as firm grasp versus incidental touch. We propose an active sensing approach for robust handovers: the robot applies information-gathering motions and senses the resulting human-applied forces to infer the contact state. A firm grasp produces forces in multiple directions, while an accidental touch does not. To capture this distinction, we model the contact state with a Bayesian linear model: a distribution over piecewise-linear mappings from robot motions to human-applied forces. This model enables firm grasp detection and active information gathering. In experiments with 12 participants and 30 diverse rigid objects, our method achieved a 97.5% success rate -- over 30% higher than two common baselines.

[289]  arXiv:2605.04612 [pdf, ps, other]
Title: An Axiomatic Analysis of Proportionality Notions in Approval-Based Multiwinner Voting
Subjects: Computer Science and Game Theory (cs.GT)

Even though proportional representation is a fundamental goal in multiwinner voting and a plethora of proportionality notions has been introduced, the normative justifications for choosing one notion over another remain poorly understood. We address this by introducing the axiomatic study of proportionality notions in the approval-based multiwinner voting setting. That is, we define axioms (or desirable properties) that "good" proportionality notions should possess. Using these axioms, we then provide axiomatic characterizations of two prominent recently introduced notions: PJR+ and EJR+ [Brill and Peters 2023]. Our characterization proceeds in two parts. Firstly, we provide a characterization of refinements of PJR+ and EJR+. That is, we define axioms such that any notion satisfying these axioms must imply PJR+ (or EJR+, respectively). In particular, the fundamental axiom distinguishing PJR+ and EJR+ from their predecessors PJR and EJR is the classical axiom of monotonicity. Secondly, we introduce our framework of witness-based proportionality notions, that is, proportionality notions that certify "misrepresentation" via a witness set of misrepresented voters. In this class, we provide characterizations of PJR+ and EJR+ as the strongest (assuming certain axioms). Thus, by putting both directions together we obtain exact characterizations of both notions. Among our results, it may be worth highlighting that any notion satisfying mild conditions (monotonicity, independence of losers, robustness to fully satisfied voters, and lower quota) refines PJR+. In this sense, PJR+ turns out to be the canonical minimal requirement that one may impose on proportionality.

[290]  arXiv:2605.04613 [pdf, ps, other]
Title: VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

High-quality singing annotations are fundamental to modern Singing Voice Synthesis (SVS) systems. However, obtaining these annotations at scale through manual labeling is unrealistic due to the substantial labor and musical expertise required, making automatic annotation highly necessary. Despite their utility, current automatic transcription systems face significant challenges: they often rely on complex multi-stage pipelines, struggle to recover text-note alignments, and exhibit poor generalization to out-of-distribution (OOD) singing data. To alleviate these issues, we present VocalParse, a unified singing voice transcription (SVT) model built upon a Large Audio Language Model (LALM). Specifically, our novel contribution is to introduce an interleaved prompting formulation that jointly models lyrics, melody, and word-note correspondence, yielding a generated sequence that directly maps to a structured musical score. Furthermore, we propose a Chain-of-Thought (CoT) style prompting strategy, which decodes lyrics first as a semantic scaffold, significantly mitigating the context disruption problem while preserving the structural benefits of interleaved generation. Experiments demonstrate that VocalParse achieves state-of-the-art SVT performance on multiple singing datasets. The source code and checkpoint are available at https://github.com/pymaster17/VocalParse.

[291]  arXiv:2605.04615 [pdf, ps, other]
Title: Beyond Retrieval: A Multitask Benchmark and Model for Code Search
Comments: project site: this https URL
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoREB, a contamination-limited, multitask code retrieval and reranking benchmark, together with a fine-tuned code reranker, that goes beyond retrieval to cover the full code search pipeline. CoREB is built from counterfactually rewritten LiveCodeBench problems in five programming languages and delivered as timed releases with graded relevance judgments. We benchmark eleven embedding models and five rerankers across three tasks: text-to-code, code-to-text, and code-to-code. Our experiments reveal that: (1) code-specialised embeddings dominate code-to-code retrieval ($\sim 2\times$ over general encoders), yet no single model wins all three tasks; (2) short keyword queries, the format closest to real developer search, collapse every model to near-zero nDCG@10; (3) off-the-shelf rerankers are task-asymmetric, with a 12-point swing on code-to-code and no baseline net-positive across all tasks; (4) our fine-tuned CoREB-Reranker is the first to achieve consistent gains across all three tasks. The data and model are released.
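Since the abstract contrasts graded relevance judgments with binary relevance and reports nDCG@10, a minimal sketch of that metric may help. It uses the linear-gain form, where each result contributes rel / log2(rank + 1); whether the benchmark uses linear or exponential gain is not stated in the abstract, so that choice is an assumption, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevances
    (linear gain; rank 1 has discount log2(2) = 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, k=10, judged_pool=None):
    """nDCG@k for one query.

    ranked_relevances: grades of the returned items, in ranked order.
    judged_pool: grades of every judged item for the query; defaults to
    the returned items when the full pool is not available.
    """
    pool = ranked_relevances if judged_pool is None else judged_pool
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant item exists, so the score is defined as 0
    return dcg_at_k(ranked_relevances, k) / ideal


if __name__ == "__main__":
    # Grades 0-3; swapping the last two items costs a little nDCG.
    print(ndcg_at_k([3, 2, 1, 0]))
    print(ndcg_at_k([3, 2, 0, 1]))
```

With graded judgments a ranking that places a partially relevant snippet above a fully relevant one is penalized, a distinction that binary labels cannot express.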

[292]  arXiv:2605.04616 [pdf, ps, other]
Title: Guidelines for Designing AI Technologies to Support Adult Learning
Comments: Pages: 22, Figures: 7, Tables: 3, Conference: Designing Interactive Systems (DIS) 2026, Dates: received 19 January 2026; revised 12 March 2026; accepted 5 June 2026. Jennifer M. Reddig, Glen R. Smith Jr.: co-first authors
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

AI-powered educational technologies have demonstrated measurable benefits for learners, but their design and evaluation have largely centered on K-12 contexts. As a result, many AI-supported learning systems remain poorly aligned with the needs, constraints, and goals of adult learners. To better understand how AI systems function in adult education, this paper examines the deployment of several AI learning technologies developed within a multidisciplinary, national research institute in the United States focused on adult learning and online education. Drawing on longitudinal deployment data, we conducted a reflexive thematic analysis to identify recurring challenges and design considerations across systems. These insights were synthesized into a set of 19 design guidelines intended to inform future AI-supported adult learning technologies. We demonstrate the utility of these guidelines through a heuristic evaluation of the deployed systems. Lastly, we present a guideline exploration tool that aids in the ideation of technologies by connecting the guidelines to stakeholder statements surfaced in the analysis process.

[293]  arXiv:2605.04617 [pdf, ps, other]
Title: Temporal Structure Matters for Efficient Test-Time Adaptation in Wearable Human Activity Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Wearable human activity recognition (WHAR) models often suffer from performance degradation under real-world cross-user distribution shifts. Test-time adaptation (TTA) mitigates this degradation by adapting models online using unlabeled test streams, yet existing methods largely inherit assumptions from vision tasks and underexploit the inherent inter-window temporal structure in WHAR streams. In this paper, we revisit such temporal structure as a feature-conditioned inference signal rather than merely an output-space smoothing prior. We derive the insight that temporal continuity and observation-induced feature deviations provide complementary cues for determining when to preserve or release temporal inertia and where to route prediction refinement during likely transitions. Building upon this insight, we propose SIGHT, a lightweight and backpropagation-free TTA framework for WHAR, enabling real-time edge deployment. SIGHT estimates predictive surprise by comparing the current feature with a prototype-based expected state, and then uses the resulting feature deviation to guide geometry-aware transition routing based on prototype alignment and stream-level marginal habit tracking. Evaluations on real-world datasets confirm that SIGHT outperforms existing TTA baselines while reducing computational and memory costs.

[294]  arXiv:2605.04618 [pdf, ps, other]
Title: Constructions of locally repairable codes via concatenated codes
Subjects: Information Theory (cs.IT)

In recent years, locally repairable codes (LRCs) have attracted considerable attention owing to their pivotal role in distributed storage systems. Since binary linear locally repairable codes can significantly reduce the complexity of both encoding and decoding processes, the construction of binary LRCs has attracted extensive research interest. In this paper, we construct locally repairable codes via concatenated codes and present a systematic approach to select outer codes to obtain optimal binary LRCs, where the outer codes are linear codes over $\mathbb{F}_4$. The weight distributions of the resulting LRCs are determined by the weight distributions of the selected linear codes over $\mathbb{F}_4$. Furthermore, several classes of optimal binary locally repairable codes are constructed, including binary LRCs meeting the Griesmer-like bound, and binary perfect LRCs. Meanwhile, for the locality $r=2$, we improve the Johnson-like bound for binary LRCs with disjoint local repair groups established by Ma and Ge, and construct explicit LRCs that attain this new bound.

[295]  arXiv:2605.04622 [pdf, ps, other]
Title: Library learning with e-graphs on jazz harmony
Comments: 10 pages, 7 figures, 2 listings, 1 table, no conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Symbolic Computation (cs.SC)

Humans can acquire a highly structured intuitive understanding of musical patterns, yet these patterns often require multiple iterations of reflection and re-listening to internalize fully. To capture such an internalization process, we present a computational model for the learning of jazz harmonic patterns based on library learning. Given a corpus of harmonic progressions, our model searches over a space of programs composed of primitive harmonic relations in order to discover concise generative explanations of the corpus. The model first enumerates possible programs for each piece, and then jointly learns a library of harmonic patterns and refactored programs. To efficiently navigate the vast joint space of programs and libraries, we integrate deductive parsing with library learning on e-graphs. We explore how well our model captures aspects of human musical pattern learning by evaluating the intuitiveness of both programs and libraries, as well as similarities to human-written harmonic derivations.

[296]  arXiv:2605.04624 [pdf, ps, other]
Title: AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
Comments: 25 pages, 8 figures, NeurIPS 2026 Evaluation and D Directions Track
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Agent-repair leaderboards reorder under evaluator reconfiguration, and a measurable share of the reordering is produced by methods that consult evaluator-derived signal during internal selection of candidate repairs. We document this failure mode on a public leaderboard and release AuditRepairBench, a paired-execution trace corpus of 576,000 registered cells (96,000 executed) that operationalizes evaluator-channel-blocking ranking instability within a declared observability boundary. A modular screening architecture decides pathway-blocking through four interchangeable implementations, a learned influence proxy, a rule-based channel-exposure ratio that uses no trained model, a counterfactual sensitivity proxy, and a sparse human-audit proxy, combined into a screening posterior that feeds a cell-level flip functional, a set-valued label, a stratified system score, and a set-valued leaderboard. The resource is supported by mechanism-anchored validation on an 80-case source-level channel-surgery subset, an independent-discovery protocol under which two annotator groups separated from the pipeline developers discover coupling patterns blinded to the screening design and the frozen ensemble attains pooled AUROC 0.83 on their 79 cases, implementation robustness, uncertainty propagation that raises 95% coverage from 0.81 to 0.95, and forward transfer with pooled community-evaluator Spearman $\rho$ = 0.65. Screening-guided blinding patches reduce rank displacement by 55--74% (mean 62%) at fewer than 50 lines of code, whereas random channel blinding produces at most 7% reduction and generic retraining at most 13%. AuditRepairBench-Lite, a rule-only configuration on a 12,000-cell subset, preserves the leaderboard at Kendall $\tau$ = 0.88 under twenty-four GPU-hours and is the primary release artifact at 42 GB.

[297]  arXiv:2605.04627 [pdf, ps, other]
Title: Autonomous Synchronization of Discrete-Time Heterogeneous Multiagent Systems
Authors: Wei Hu, Quanyi Liang
Comments: 9 pages, 7 figures, submitted to IEEE Transactions on Control of Network Systems
Subjects: Multiagent Systems (cs.MA)

This paper investigates the autonomous synchronization problem for discrete-time heterogeneous multiagent systems. The synchronization problem is transformed into the asymptotic decoupling problem of stable modes in a class of discrete-time linear time-varying systems, for which we provide a sufficient condition. Leveraging this condition, synchronization conditions are established. The synchronization conditions are based on the average of the agents' initial dynamic matrices, without requiring the differences among these matrices to be small. This approach reduces the conservativeness of existing conditions and achieves a unification of both homogeneous and heterogeneous systems. Numerical simulation results are provided to support the theoretical findings.

[298]  arXiv:2605.04629 [pdf, ps, other]
Title: CombOL: a Library for Practical Enumeration and Boltzmann Sampling of Combinatorial Classes
Comments: 10 pages, 2 figures. Submitted to ICMS (International Congress on Mathematical Software) 2026
Subjects: Mathematical Software (cs.MS); Combinatorics (math.CO)

We present CombOL (Combinatorial Objects Library), an open-source library for the enumeration and Boltzmann sampling of combinatorial classes. Classes can be specified by a concise string syntax, and may depend on an arbitrary number of parameters. CombOL automatically derives the associated generating functions, enabling the generation of counting sequences and the compilation of Boltzmann samplers. The library supports exact and approximate-size Boltzmann rejection sampling with automatic parameter tuning to target specific sizes. In addition to implementing established methods, CombOL contributes a novel early-rejection scheme, as well as guaranteed statistical correctness by dynamically increasing the numerical precision, eliminating bias due to floating-point rounding errors. Through the Python interface, sampled structures can be mapped to application-specific objects, enabling direct sampling of domain objects such as graphs, chemical structure representations, or other complex data types. CombOL is available from PyPI as 'combol' (pypi.org/project/combol). The source code is available at gitlab.com/casbjorn/combol.

[299]  arXiv:2605.04635 [pdf, ps, other]
Title: UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Printed Circuit Board (PCB) defect inspection faces two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework that integrates controlled defect synthesis with task-specific defect detection. On the generation side, a Multi-modal Condition Generator extracts complementary edge, depth, and text conditions in parallel. A ScaleEncoder then embeds these conditions into the diffusion U-Net at four resolutions, and a Condition Modulation applies FiLM-style spatially-adaptive modulation at each scale, enabling structurally aligned and defect-aware sample synthesis. On the detection side, an Inverted Residual Shift Attention couples self-attention with shift-wise convolution to jointly capture global context and local texture, and a Cross-level Complementary Fusion Block generates pixel-level gates for selective cross-level feature fusion. The synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection. Extensive experiments on DsPCBSD+ demonstrate that UniPCB achieves mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% on defect detection, surpassing all compared methods, while the generation branch attains an FID of 129.61 and SSIM of 0.619, outperforming existing conditional generation approaches.

[300]  arXiv:2605.04637 [pdf, ps, other]
Title: SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
Comments: 35 pages, 12 figures, 18 tables
Subjects: Multiagent Systems (cs.MA); Software Engineering (cs.SE)

The emergence of "vibe coding" platforms, where users describe applications in natural language and AI agents autonomously generate full-stack software, has created a need for rigorous evaluation beyond code-level benchmarks. In order to assess them as virtual software development agencies on understanding business requirements, making architectural decisions, writing production code, handling iterative modifications, and maintaining business readiness, we introduce SWE-WebDev Bench, a 68-metric evaluation framework spanning 25 primary and 43 diagnostic metrics across seven groups, organized along three dimensions: Interaction Mode (App Creation Request (ACR) vs. App Modification Request (AMR)), Agency Angle (Product Manager (PM), Engineering, Ops), and Complexity Tier (T4 multi-role SaaS, T5 AI-native).
Our evaluation (six platforms, three domains, 18 evaluation cells) reveals four recurring shortcomings in the current generation of AI app builders: (1) a specification bottleneck, where platforms compress rich business requirements into oversimplified technical plans; (2) a pervasive frontend-backend decoupling, where visually polished UIs mask absent or broken backend infrastructure; (3) a steep production-readiness cliff, where no platform scores above 60% on engineering quality and post-generation human effort varies substantially across platforms; and (4) widespread security and infrastructure failures, with no platform exceeding 65% Security Score against a 90% target and concurrency handling as low as 6%. These observations are descriptive of our sample and require larger-scale replication to establish generality. We release SWE-WebDev Bench as a community benchmark to enable such replication and help platform builders identify and address these gaps.
Code and benchmark resources are available at: https://github.com/snowmountainAi/webdevbench and https://webdevbench.com/.

[301]  arXiv:2605.04638 [pdf, ps, other]
Title: Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models
Comments: Accepted by ICML 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Uncertainty quantification (UQ) is an important technique for ensuring the trustworthiness of LLMs, given their tendency to hallucinate. Existing state-of-the-art UQ approaches for free-form generation rely heavily on sampling, which incurs high computational cost and variance. In this work, we propose the first gradient-based UQ method for free-form generation, SemGrad, which is sampling-free and computationally efficient. Unlike prior gradient-based methods developed for classification tasks, which operate in parameter space, we propose to consider gradients in semantic space. Our method builds on the key intuition that a confident LLM should maintain stable output distributions under semantically equivalent input perturbations. We interpret this stability as the gradients in semantic space and introduce a Semantic Preservation Score (SPS) to identify embeddings that best capture semantics, with respect to which gradients are computed. We further propose HybridGrad, which combines the strengths of SemGrad and parameter gradients. Experiments demonstrate that both of our methods provide efficient and effective uncertainty estimates, outperforming state-of-the-art methods, particularly in settings with multiple valid responses.

[302]  arXiv:2605.04639 [pdf, ps, other]
Title: Cognitive Alignment Drives Attention: Modeling and Supporting Socially Shared Regulation in Pair Programming
Subjects: Human-Computer Interaction (cs.HC)

Grounded in socially shared regulation of learning (SSRL), this paper investigates how joint mental effort (JME) and joint visual attention (JVA) serve as process-level indicators of shared regulation in pair programming and how AI-driven adaptive feedback can strengthen these processes.
We present three eye-tracking studies involving 182 dyads engaged in collaborative debugging tasks. Study 1 examines natural collaboration and shows that high-performing dyads exhibit significantly higher JME and JVA, a greater prevalence of productive high-JME-high-JVA episodes, and a stable causal relationship in which JME predicts JVA. Study 2 evaluates reactive adaptive feedback based on real-time deviations in JME and/or JVA. Results show that combined feedback targeting both dimensions yields the strongest improvements in performance, regulatory coherence, and cognitive-to-attentional causality, outperforming single-channel feedback. Study 3 introduces proactive, forecast-based feedback using machine-learning predictions of future collaboration states. Proactive support further enhances performance and sustains shared regulation by anticipating breakdowns before they manifest.
Across studies, causal modeling reveals that cognitive alignment systematically drives attentional coordination in successful collaboration, while mismatches between effort and attention characterize unproductive regulation. Methodologically, this work integrates dual eye-tracking, pupillometry, episode-based analysis, and causal inference to capture SSRL as a dynamic, emergent process. Conceptually, the findings position AI not as an automated controller, but as an intelligence-augmenting co-regulator that supports learners' capacity to coordinate effort, attention, and understanding together.

[303]  arXiv:2605.04641 [pdf, ps, other]
Title: CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on downstream tasks, they frequently produce content that deviates from visual information, leading to object hallucination. To tackle this, recent works mostly depend on expensive manual annotations and training, or on decoding strategies that significantly increase inference time. In this work, we observe that LVLMs' attention to visual information is significantly enhanced when answering caption queries compared to non-caption queries. Inspired by this phenomenon, we propose Caption-guided Visual Attention Steering (CAST), a training-free, plug-and-play hallucination mitigation method that leverages the attention activation pattern corresponding to caption queries to enhance LVLMs' visual perception capability. Specifically, we use probing techniques to identify attention heads that are highly sensitive to caption queries and estimate optimized steering directions for their outputs. This steering strengthens LVLMs' fine-grained visual perception capabilities, thereby effectively mitigating object hallucination. CAST reduces object hallucination by an average of 6.03% across five widely used LVLMs and five benchmarks including both discriminative and generative tasks, demonstrating state-of-the-art performance while adding little inference cost and preserving other foundational capabilities.

[304]  arXiv:2605.04642 [pdf, ps, other]
Title: Securing the Web with HSTS-Enforced
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

TLS stripping attacks expose sensitive web traffic by forcing secure HTTPS connections to fall back to unencrypted HTTP. At present, protection against these attacks relies on website operators explicitly opting into security by deploying mechanisms such as HTTP Strict Transport Security (HSTS) headers. These mechanisms have significant limitations: some are weak or difficult to configure, which raises the risk of misconfiguration and reduces practical adoption; others violate HTTP backward compatibility; at least one can even be abused to enable unintended user tracking.
We introduce HSTS-Enforced, a mechanism that eliminates the remaining attack surface for TLS stripping while still allowing operators to securely specify that their websites need to be accessed over HTTP when necessary, thereby maintaining accessibility. To achieve this, we flip the current opt-in security model to an opt-out model: all connections default to HTTPS, and operators can explicitly opt out if their websites require HTTP using so-called HTTP-Required indicators. We propose two such HTTP-Required indicators: a new DNS record and an HTTP-Required Preload list. We evaluate HSTS-Enforced under multiple deployment scenarios, demonstrating that it blocks all practical TLS stripping attempts while maintaining compatibility for sites that require HTTP - without introducing overhead in the typical case. Finally, we outline a practical transition path to accelerate global adoption.
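
The opt-out model described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `choose_scheme`, the set-based preload list, and the `dns_http_required` callback (a stand-in for looking up the proposed DNS record) are hypothetical names introduced only for illustration.

```python
from typing import Callable, Set


def choose_scheme(host: str,
                  http_required_preload: Set[str],
                  dns_http_required: Callable[[str], bool]) -> str:
    """Pick the URL scheme for a host under an opt-out security model.

    Every connection defaults to HTTPS. HTTP is chosen only when one of
    the two HTTP-Required indicators is present for the host. Both
    indicator sources here are hypothetical stand-ins.
    """
    # Normalize the host name so lookups are case-insensitive and
    # tolerate a trailing root dot.
    host = host.lower().rstrip(".")
    if host in http_required_preload:   # indicator 1: preload list
        return "http"
    if dns_http_required(host):         # indicator 2: DNS record
        return "http"
    return "https"                      # default: secure


# Hypothetical example data.
preload = {"legacy.example"}
dns_records = {"printer.example"}


def lookup(host: str) -> bool:
    """Pretend DNS lookup backed by an in-memory set."""
    return host in dns_records


for name in ("shop.example", "Legacy.Example.", "printer.example"):
    print(name, "->", choose_scheme(name, preload, lookup))
```

In this sketch a host with no indicator is always reached over HTTPS, so a stripped or missing header cannot downgrade the connection; only an explicit opt-out changes the outcome.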

[305]  arXiv:2605.04643 [pdf, ps, other]
Title: Graph-Augmented LLMs for Swiss MP Ideology Prediction
Comments: Accepted by SwissText 26
Subjects: Computation and Language (cs.CL)

Approximating the ideological position of Members of Parliament (MPs) is a fundamental task in political science, helping researchers understand legislative behavior, party alignment, and policy preferences. While Large Language Models (LLMs) have shown promising results in estimating MPs' ideological stances, there are more actors and elements in the parliamentary system, and relations between them, that could provide a wider and more informative picture. However, due to the complexity of integrating them in the prediction task, these additional elements are generally ignored. In this work, we propose an LLM framework, PG-RAG, that implements a retrieval-augmented generation pipeline: it first queries a political knowledge graph (KG) and then integrates the resulting graph-structured information into the context. This allows for capturing both textual semantics and inter-MP relationships, another relevant information source in any parliamentary system. We evaluate the approach on the task of ideology prediction, using data from a Swiss parliamentary dataset. When comparing graph-augmented models against several state-of-the-art baselines, the results demonstrate that incorporating this enriched information, which encodes information about different entities and relations, improves prediction performance. These results help to highlight the value of domain-specific relational information in modeling political behavior.

[306]  arXiv:2605.04644 [pdf, ps, other]
Title: Heat and mass transfer through fabric: a model for fabric drying with heated cylinders
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)

Textile drying is a key operation in the textile production cycle as it represents one of the most energy-intensive stages and plays a critical role in determining both product quality and overall process efficiency. In this work we propose a mathematical model for the drying process of a generic textile material using heated cylinders, operating under low-pressure conditions. The model's parameters are estimated by nonlinear least squares regression. Given a specific fabric, the developed model makes it possible to predict the drying time and the residual moisture content. The model is validated using real-world data provided by a major Italian textile company.

[307]  arXiv:2605.04647 [pdf, ps, other]
Title: ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
Subjects: Robotics (cs.RO)

We introduce ReflectDrive-2, a masked discrete diffusion planner with a separate action expert for autonomous driving that represents plans as discrete trajectory tokens and generates them through parallel masked decoding. This discrete token space enables in-place trajectory revision: AutoEdit rewrites selected tokens using the same model, without requiring an auxiliary refinement network. To train this capability, we use a two-stage procedure. First, we construct structure-aware perturbations of expert trajectories along longitudinal progress and lateral heading directions and supervise the model to recover the original expert trajectory. We then fine-tune the full decision--draft--reflect rollout with reinforcement learning (RL), assigning terminal driving reward to the final post-edit trajectory and propagating policy-gradient credit through full-rollout transitions. Full-rollout RL proves crucial for coupling drafting and editing: under supervised training alone, inference-time AutoEdit improves PDMS by at most $0.3$, whereas RL increases its gain to $1.9$. We also co-design an efficient reflective decoding stack for the decision--draft--reflect pipeline, combining shared-prefix KV reuse, Alternating Step Decode, and fused on-device unmasking. On NAVSIM, ReflectDrive-2 achieves $91.0$ PDMS with camera-only input and $94.8$ PDMS in a best-of-6 oracle setting, while running at $31.8$ ms average latency on NVIDIA Thor.

[308]  arXiv:2605.04649 [pdf, ps, other]
Title: From Reach to Insert: Tactile-Augmented Precision Assembly under Sub-Millimeter Tolerances
Comments: 8 pages, 9 figures
Subjects: Robotics (cs.RO)

High-precision assembly frequently involves tight-tolerance insertions, where even slight pose errors can cause jamming or excessive interaction forces, making robust and safe insertion policies difficult to obtain. This paper proposes a tactile-augmented two-stage method that combines Imitation Learning (IL) and Reinforcement Learning (RL) for precision insertion tasks. In the first stage, IL learns a reaching policy with position generalization that grasps the peg and brings it to the vicinity of the target region. In the second stage, RL executes the insertion and enables recovery from failures during contact-rich interactions. To better exploit tactile feedback, we introduce tactile group sampling to increase coverage of critical contact segments during training, and design a tactile critic to more accurately evaluate policy values, improving insertion performance while maintaining low contact forces. We conduct systematic experiments across five hole geometries and three clearance settings. Results show that our method substantially improves insertion performance across all settings; under the most challenging 0.05 mm clearance, it achieves a 67% success rate while keeping contact forces low, reducing the maximum interaction force by 60% and torque by 44%, thereby validating both effectiveness and safety for precision assembly.

[309]  arXiv:2605.04651 [pdf, ps, other]
Title: FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
Comments: 9 pages, 6 figures, 10 tables
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Adapting pretrained models typically involves a trade-off between the high training costs of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from pretrained representation. Across image classification and language modeling benchmarks, FAAST matches or exceeds backprop-based adaptation while reducing adaptation time by over 90%, and is competitive with memory/context-based adaptation while reducing memory usage by up to 95%. These results demonstrate FAAST as a highly efficient, scalable solution for supervised task adaptation, particularly for resource-constrained models. We release the code and models at https://github.com/baoguangsheng/faast.

[310]  arXiv:2605.04652 [pdf, ps, other]
Title: CHE-TKG: Collaborative Historical Evidence and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning
Subjects: Computation and Language (cs.CL)

Temporal knowledge graph (TKG) reasoning aims to predict future events from historical facts. A key challenge lies in jointly capturing two sources of predictive information in TKGs: historical evidence and evolutionary dynamics. However, existing methods typically focus on only one of these sources, which limits the ability to fully exploit the complementary predictive signals in TKGs. To address this, we propose CHE-TKG, a novel collaborative dual-view learning framework for TKG reasoning. CHE-TKG explicitly separates and jointly models historical evidence and evolutionary dynamics, aiming to learn and exploit their complementary predictive signals. Specifically, CHE-TKG constructs a historical evidence graph to capture long-term structural regularities and stable relational constraints, alongside an evolutionary dynamics graph to model temporal transitions and recent changes, with dedicated encoders for each view. We further employ relation decomposition and a contrastive alignment objective to better capture the predictive signals across the two views. Extensive experiments demonstrate that CHE-TKG achieves state-of-the-art performance on multiple benchmarks.

[311]  arXiv:2605.04653 [pdf, ps, other]
Title: Threshold-Guided Optimization for Visual Generative Models
Comments: Accepted to ICML 2026
Subjects: Machine Learning (cs.LG)

Aligning large visual generative models with human feedback is often performed through pairwise preference optimization. While such approaches are conceptually simple, they fundamentally rely on annotated pairs, limiting scalability in settings where feedback is collected as independent scalar ratings. In this work, we revisit the KL-regularized alignment objective and show that the optimal policy implicitly compares each sample's reward to an instance-specific baseline that is generally intractable. We propose a threshold-guided alignment framework that replaces this oracle baseline with a data-driven global threshold estimated from empirical score statistics. This formulation turns alignment into a binary decision task on unpaired data, enabling effective optimization directly from scalar feedback. We also incorporate a confidence weighting term to emphasize samples whose scores deviate strongly from the threshold, improving sample efficiency. Experiments across both diffusion and masked generative paradigms, spanning three test sets and five reward models, show that our method consistently improves preference alignment over previous methods. These results position our threshold-guided framework as a simple yet principled alternative for aligning visual generative models without paired comparisons.
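
The step from scalar ratings to a binary decision task can be sketched as follows. The abstract does not state which score statistic defines the threshold or how the confidence term is computed, so this sketch assumes the median as the global threshold and the absolute distance to it as the confidence weight; the function name `threshold_labels` is likewise hypothetical.

```python
from statistics import median
from typing import List, Tuple


def threshold_labels(scores: List[float]) -> Tuple[float, List[int], List[float]]:
    """Turn unpaired scalar ratings into weighted binary labels.

    Assumptions (not taken from the paper): the global threshold is the
    median score, and the confidence weight is the absolute deviation
    of each score from that threshold.
    """
    tau = median(scores)                            # data-driven global threshold
    labels = [1 if s > tau else 0 for s in scores]  # above threshold counts as preferred
    weights = [abs(s - tau) for s in scores]        # far from threshold means more confident
    return tau, labels, weights


# Five independent ratings, with no pairing between samples.
tau, labels, weights = threshold_labels([0.2, 0.9, 0.5, 0.4, 0.7])
print("threshold:", tau)
print("labels:   ", labels)
print("weights:  ", [round(w, 2) for w in weights])
```

A sample scored exactly at the threshold receives weight zero under these assumptions, so it contributes nothing to a weighted loss, which matches the stated intent of emphasizing samples that deviate strongly from the threshold.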

[312]  arXiv:2605.04656 [pdf, ps, other]
Title: Adaptive MPC for Constrained Trajectory Tracking of Uncertain LTI System with Input-Rate Limits
Subjects: Systems and Control (eess.SY)

This paper addresses the trajectory-tracking problem for discrete-time linear time-invariant systems with bounded parametric uncertainty, subject to hard constraints on system states, control inputs, and input rates. Unlike existing methods, which often consider only partial uncertainty, omit input-rate or state constraints, or focus on regulation problems, this work provides a systematic adaptive model predictive control (MPC) solution for constrained trajectory tracking under full parametric uncertainty. Determining the control input required to achieve zero tracking error under unknown parameters is challenging. Simultaneously, trajectory tracking under uncertainty with input-rate constraints induces temporal coupling in the control sequence, resulting in a time-varying admissible control set and rendering standard recursive feasibility arguments inapplicable. These challenges are overcome by systematically utilizing the estimated system parameters, coupled with a suitably designed adaptive learning process within a reformulated MPC framework. The recursive feasibility of the proposed MPC optimization routine is then rigorously established despite the time-varying admissible control set induced by input-rate constraints. Closed-loop stability is guaranteed via Lyapunov-based analysis, ensuring convergence of the tracking error and boundedness of system states. Simulation results validate the effectiveness of the pr

[313]  arXiv:2605.04657 [pdf, ps, other]
Title: Logics for Context-free Hyperproperties
Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL)

We introduce a novel logic for the specification of context-free hyperproperties, which capture, e.g., the flow of information in security-critical recursive systems. Intuitively, the logic extends visibly pushdown automata by quantification over traces, just like HyperLTL, the most important logic for regular hyperproperties, extends LTL by quantification over traces. Using a game-based approach, we show that model-checking is decidable for formulas with a single quantifier alternation, provided the stack height of the visibly pushdown automaton only depends on the traces bound to the variables of the first quantifier block. A single quantifier alternation suffices to express many information-flow properties studied in the literature. Complementarily, we show that model-checking is undecidable for formulas with a single quantifier alternation, if the stack behavior of the visibly pushdown automaton may depend on the second quantifier block. This also implies that model-checking is undecidable for almost all fragments with more than one quantifier alternation.

[314]  arXiv:2605.04660 [pdf, ps, other]
Title: A third-order multi-moment cell-centered Lagrangian scheme for hydrodynamics with an accurate 2D nodal solver
Subjects: Numerical Analysis (math.NA)

This paper presents a novel high-order cell-centered Lagrangian scheme for 2D compressible hydrodynamics by bridging the multi-moment constrained finite volume method (MCV) [16, 51, 52] with a nodal Riemann solver. This scheme (denoted by LMCV) not only maintains the high-order accuracy of MCV but also inherits the conservation and robustness properties of the nodal Riemann solver. On the one hand, the MCV employs and evolves both the point-values (PV) at cell vertices and the volume-integrated averages (VIA) on the computational mesh, which ensures rigorous numerical conservation and establishes an adequate foundation for computing Lagrangian fluxes with high accuracy. On the other hand, we develop a 2D Riemann solver based on EUCCLHYD [24] that takes full advantage of the numerical formulations of the high-order scheme and achieves compatibility between the mesh movement and the numerical fluxes. The main new feature of the solver is the introduction of a new set of jump and balance conditions. The jump condition provides a highly accurate formulation linking the surface pressure of each cell to its nodal velocity, while the balance condition ensures nodal conservation and stabilizes the velocity field without losing accuracy. Moreover, our nodal solver can be regarded as a natural high-order extension of the HLLC and the HLLC-2D [41] solvers. The comparison between these solvers illustrates how our approach addresses the difficulties encountered in constructing 2D high-order Lagrangian schemes. A variety of numerical experiments are carried out to illustrate the accuracy and robustness of the algorithm.

[315]  arXiv:2605.04662 [pdf, ps, other]
Title: Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating realistic reactive motions, in which one person reacts to the fixed motions of others, is challenging due to strict interaction constraints and a limited feasible solution space. This paper focuses on a typical scenario: duet dance, where high-quality data is scarce, motion patterns are complex, and the details of human interactions are both intricate and abundant. To tackle these challenges, we propose a novel two-stage framework. In the first stage, we introduce a motion VQ-VAE with separate body-part encoders and a joint decoder, enabling specialized codebooks to enhance representation capacity while dynamically modeling dependencies across body parts during decoding, thereby preventing inconsistencies in the generated motions. In the second stage, we propose a contact-aware diffusion model for reactive motion generation that jointly generates motion and a contact matrix between individuals, enabling explicit interaction modeling and providing guidance toward more precise and constrained interaction dynamics during sampling. Experiments show that our method outperforms Duolando with lower $\text{FID}_k$ (8.89 vs. 25.30) and $\text{FID}_{cd}$ (8.01 vs. 9.97), as well as a higher BED (0.4606 vs. 0.2858), indicating improved interaction fidelity and rhythmic synchronization.

[316]  arXiv:2605.04664 [pdf, ps, other]
Title: Evidence-based anomaly detection in clinical domains
Comments: Published at AMIA Annual Symposium 2007. PDF-only submission; LaTeX source not available (paper was authored in Word)
Subjects: Machine Learning (cs.LG)

Anomaly detection methods can be very useful in identifying interesting or concerning events. In this work, we develop and examine new probabilistic anomaly detection methods that let us evaluate management decisions for a specific patient and identify those decisions that are highly unusual with respect to patients with the same or similar condition. The statistics used in this detection are derived from probabilistic models such as Bayesian networks that are learned from a database of past patient cases. We apply our methods to the problem of identifying unusual patient-management decisions in post-surgical cardiac patients.

[317]  arXiv:2605.04665 [pdf, ps, other]
Title: Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs
Subjects: Computation and Language (cs.CL)

When the substantive content of a request is rewritten, do large language models still answer in the format the original task asked for? We find that they often do not, even at temperature zero. On a 150-query evaluation over five compact 2025-era LLMs and four task types, we observe a systematic failure mode we call prompt-variant output-mode collapse: when a closed-form prompt asks for a bare label or a single choice token, content-preserving prompt variants can push the model into conversational prose, the requested format dissolves, and exact-match evaluation pipelines silently misjudge the result. To make this measurable, we release PARACONSIST, a 900-prompt benchmark of 150 base queries with five lexical, syntactic, and semantic-expansion prompt variants each, and a Semantic Consistency Score that decomposes prompt-variant robustness into answer consistency, sentence-BERT semantic similarity, and length stability. Under a whole-word answer-set match, only ~22% of closed-form variant responses preserve the ground-truth label inside their output, while ~78% drift away from the answer space entirely. In our pool, the dominant predictor of collapse is task structure rather than model identity, with model differentiation jointly carried by answer consistency and length stability. Robustness audits should therefore track response-mode preservation as a first-class reliability target alongside answer accuracy.
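
The gap between exact-match scoring and a whole-word answer-set match can be seen in a minimal sketch. The helper name `contains_label` and its `ignore_case` parameter are hypothetical, and the benchmark's actual matching rules may differ.

```python
import re


def contains_label(response: str, label: str, ignore_case: bool = True) -> bool:
    """Return True if `label` occurs in `response` as a whole word.

    Word boundaries (\\b) prevent matches inside longer words. Set
    ignore_case=False for single-letter choice tokens such as "A",
    which would otherwise match the English article "a".
    """
    flags = re.IGNORECASE if ignore_case else 0
    pattern = r"\b" + re.escape(label) + r"\b"
    return re.search(pattern, response, flags=flags) is not None


bare = "positive"
prose = "Sure! Based on the wording, I would say the sentiment is Positive overall."

# Exact match accepts only the bare label and rejects the prose answer.
print(bare.strip() == "positive", prose.strip() == "positive")

# Whole-word match still finds the label inside conversational prose.
print(contains_label(bare, "positive"), contains_label(prose, "positive"))
```

Under this sketch the prose response fails exact match even though it contains the correct label, which is the kind of silent misjudgment the abstract attributes to exact-match evaluation pipelines.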

[318]  arXiv:2605.04666 [pdf, ps, other]
Title: Feature importance analysis for patient management decisions
Comments: Published at MEDINFO 2010. doi:10.3233/978-1-60750-588-4-861. PDF-only submission; LaTeX source not available
Subjects: Machine Learning (cs.LG)

The objective of this paper is to understand which characteristics and features of clinical data most influence physicians' decisions about ordering laboratory tests or prescribing medications. We conduct our analysis on data and decisions extracted from electronic health records of 4486 post-surgical cardiac patients. The summary statistics for 335 different lab order decisions and 407 medication decisions are reported. We show that in many cases, physicians' lab-order and medication decisions can be well predicted from a small subset of all features.

[319]  arXiv:2605.04671 [pdf, ps, other]
Title: ITBoost: Information-Theoretic Trust for Robust Boosting
Subjects: Machine Learning (cs.LG)

Gradient boosting remains a strong and widely used method for tabular data learning, but its performance often degrades when training labels are noisy. This behavior is largely related to the way boosting algorithms emphasize samples with large gradients, without explicitly accounting for whether such errors originate from informative hard cases or from unreliable labels. We address this issue by reconsidering how sample reliability is evaluated during boosting. Instead of relying on instantaneous error, we examine the evolution of each sample's residuals across iterations. Based on this insight, we propose Information-Theoretic Trust Boosting (ITBoost), which uses the Minimum Description Length principle to measure the complexity of residual trajectories. Samples whose residual patterns fluctuate in an irregular manner are treated as less trustworthy and are down-weighted during learning. Theoretically, we derive a tighter generalization bound for ITBoost under label noise. Empirical results on various tabular benchmarks indicate that ITBoost provides improved robustness in noisy environments over leading boosting and deep tabular models, while retaining the best average performance on clean data.

[320]  arXiv:2605.04672 [pdf, ps, other]
Title: AI-Aided Advancements in Autonomous Underwater Vehicle Navigation
Subjects: Robotics (cs.RO)

Autonomous underwater vehicles (AUVs) have become indispensable for deep-sea exploration, spanning critical scientific research and commercial applications. The rapid attenuation of electromagnetic waves renders satellite radio signals unavailable, while the dynamic unpredictability of the marine environment presents formidable navigation challenges. This chapter explores recent advancements in AI-aided AUV positioning, specifically focusing on advanced sensor fusion architectures that integrate inertial navigation systems with Doppler velocity logs and cameras. Beyond traditional model-based filtering, we examine the transformative emergence of AI-driven learning approaches in enhancing inertial dead-reckoning tasks and adaptive fusion algorithms. By addressing these recent milestones, this chapter provides a comprehensive roadmap for achieving the high-precision navigation essential for autonomous underwater missions.

[321]  arXiv:2605.04675 [pdf, ps, other]
Title: Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern
Comments: CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visible-thermal (RGB-T) object detection is a crucial technology for applications such as autonomous driving, where multimodal fusion enhances performance in challenging conditions like low light. However, the security of RGB-T detectors, particularly in the physical world, has been largely overlooked. This paper proposes a novel approach to RGB-T physical attacks using adversarial clothing with a non-overlapping RGB-T pattern (NORP). To simulate full-view (0$^{\circ}$--360$^{\circ}$) RGB-T attacks, we construct 3D RGB-T models for human and adversarial clothing. NORP is a new adversarial pattern design using distinct visible and thermal materials without overlap, avoiding the light reduction in overlapping RGB-T patterns (ORP). To optimize the NORP on adversarial clothing, we propose a spatial discrete-continuous optimization (SDCO) method. We systematically evaluated our method on RGB-T detectors with different fusion architectures, demonstrating high attack success rates both in the digital and physical worlds. Additionally, we introduce a fusion-stage ensemble method that enhances the transferability of adversarial attacks across unseen RGB-T detectors with different fusion architectures.

[322]  arXiv:2605.04677 [pdf, ps, other]
Title: CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement
Comments: 14 pages, 2 figures
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

We present CodeEvolve, an evolutionary framework for improving program performance and code quality with Large Language Models (LLMs). CodeEvolve extends OpenEvolve with runtime-guided target selection, Monte Carlo Tree Search (MCTS), automated code refinement, and language-specific evaluation pipelines for Java and Salesforce Apex. The system uses Java Flight Recorder (JFR) profiles to build weighted component graphs and select optimization targets that account for most execution cost, reducing reliance on manual bottleneck identification. For each target, CodeEvolve generates candidate edits, evaluates them through build validation, unit tests, performance checks, static analysis, and LLM-based review, and retains only variants that preserve functional correctness. Across real-world optimization tasks, CodeEvolve improves performance and code metrics while maintaining correctness. On a large enterprise Java codebase, it achieves an average speedup of 15.22$\times$ across seven hotspot functions and outperforms single-pass LLM optimization on five of them. An ablation study on Apex optimization shows that the full MCTS-augmented configuration produces 19.5 valid programs out of 20 on average, indicating that search, filtering, and refinement each contribute to more reliable optimization.

[323]  arXiv:2605.04678 [pdf, ps, other]
Title: From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Latent actions serve as an intermediate representation that enables consistent modeling of vision-language-action (VLA) models across heterogeneous datasets. However, approaches to supervising VLAs with latent actions are fragmented and lack a systematic comparison. This work structures the study of latent action supervision from two perspectives: (i) regularizing the trajectory via image-based latent actions, and (ii) unifying the target space with action-based latent actions. Under a unified VLA baseline, we instantiate and compare four representative integration strategies. Our results reveal a formulation-task correspondence: image-based latent actions benefit long-horizon reasoning and scene-level generalization, whereas action-based latent actions excel at complex motor coordination. Furthermore, we find that directly supervising the VLM with discrete latent action tokens yields the most effective performance. Finally, our experiments offer initial insights into the benefits of latent action supervision in mixed-data, suggesting a promising direction for VLA training. Code is available at https://github.com/RUCKBReasoning/From_Pixels_to_Tokens.

[324]  arXiv:2605.04679 [pdf, ps, other]
Title: Ultra Low-Power SDM-based Circuit-Switching for Networks-on-Chip
Comments: In 11th FPGAworld Conference, 2014. The paper was accepted at FPGAworld 14, but has no DOI and was not published in any digital library
Subjects: Hardware Architecture (cs.AR)

In many modern AI chips and multicore systems-on-chip, embedded applications exhibit predictable inter-core traffic behavior that can be characterized at design time. For such applications, a variety of design-time traffic management and network optimization techniques can be employed to improve NoC power and performance. To exploit this predictability, we propose a novel low-power circuit-switched NoC design. It uses the Spatial Division Multiplexing (SDM) technique to establish circuits, implemented as subsets of NoC wires, for the communication flows of a target application. To further reduce the power profile of SDM, the design incorporates a new router architecture that combines hard-wired switches with conventional programmable crossbars. The architecture is complemented by an algorithm that maps application tasks onto a mesh NoC and assigns an SDM route with adequate bit-width to each circuit built for inter-task communication flows. Compared with a conventional packet-switched NoC, the proposed approach achieves approximately 38% lower NoC power consumption, 19% smaller area, and 12% lower packet latency.

[325]  arXiv:2605.04680 [pdf, ps, other]
Title: Multi-Level Bidirectional Biomimetic Learning for EEG-Based Visual Decoding
Comments: 20 pages, 13 figures, 15 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

EEG-based visual neural decoding aims to align neural responses with visual stimuli for tasks such as image retrieval. However, limited paired data and a fundamental mismatch between high-fidelity digital images and biological visual perception - distorted by retinotopic mapping and subject-specific neuroanatomy - severely impede cross-modal alignment. To address this, we propose MB2L, a Multi-Level Bidirectional Biomimetic Learning framework that incorporates structured physiological inductive biases into representation learning. Specifically, we propose Adaptive Blur with Visual Priors to mitigate perceptual-structural mismatch by reweighting visual inputs according to retinotopic priors. We further propose Biomimetic Visual Feature Extraction to learn multi-level visual representations consistent with hierarchical cortical processing, enhancing subject-invariant encoding. These modules are jointly optimized via Multi-level Bidirectional Contrastive Learning, which aligns EEG and visual features in a shared semantic space through bidirectional contrastive objectives. Experiments show MB2L achieves 80.5% Top-1 and 97.6% Top-5 accuracy on zero-shot EEG-to-image retrieval, significantly outperforming prior methods and demonstrating strong generalization across subjects and experimental settings.

[326]  arXiv:2605.04682 [pdf, ps, other]
Title: HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Spatial transcriptomics offers spatially resolved gene expression profiling within tissue sections, but its cost and limited throughput hinder large-scale deployment. To extend this capability to routine practice, recent computational methods aim to infer spatial gene expression directly from ubiquitous hematoxylin and eosin-stained histology slides. However, most existing models assume Cartesian or geometry-agnostic locality, despite the hexagonal sampling of widely used spot-array platforms, and point-wise regression objectives often yield over-smoothed gene expression profiles, obscuring gene-specific spatial heterogeneity. To address these issues, we propose HEXST, a geometry-aligned Transformer for spatial gene expression prediction from histology. HEXST operates directly on hexagonal spot coordinates to enable efficient local-to-global contextual modeling via a tailored shifted-window attention mechanism and hexagonal rotary positional encoding. To enhance gene-wise spatial contrast, HEXST complements point-wise regression with a contrast-sensitive differential objective and transcriptomic priors from a pretrained single-cell foundation model during training. Across seven spatial transcriptomics datasets, HEXST consistently outperforms state-of-the-art models, providing accurate and robust spatial gene expression predictions while preserving gene-wise contrast and spatial heterogeneity.

[327]  arXiv:2605.04683 [pdf, ps, other]
Title: Average Attention Transformers and Arithmetic Circuits
Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are given as an input to an encoder. The circuit families that can be simulated this way have constant depth while using unbounded addition, binary multiplication and sign gates. The transformers we use have arithmetic circuits instead of feed-forward networks. With typical average attention the functions they compute are also computed by the same class of circuit families. Our results hold for transformers over the reals, rationals and any ring in between the two.

[328]  arXiv:2605.04688 [pdf, ps, other]
Title: Hamiltonian Interface Dynamics for Reduced-Order Optimization of Incompressible Mixing
Comments: 29 pages
Subjects: Numerical Analysis (math.NA)

We develop a reduced-order framework for optimizing mixing in two-dimensional incompressible flows. Instead of optimizing the full transport PDE, the method maximizes the length of advected material interfaces, leading to a finite-dimensional Hamiltonian control problem based on parametrized stream functions. We derive the continuous adjoint equations and reduced gradients, and discretize the forward and adjoint dynamics with the implicit midpoint rule. The resulting discrete adjoint is algebraically consistent with the derivative of the fully discrete objective, up to the tolerance of the nonlinear midpoint solves. The approach applies to bounded two-dimensional domains with smooth finite-dimensional stream-function parametrizations. Numerical experiments on cellular-flow and Doswell frontogenesis benchmarks show that the optimized time-dependent Hamiltonians generate near-exponential interface stretching and substantially faster decay of the $\dot{H}^{-1}$ mix-norm, in contrast with the polynomial behavior observed for stationary flows. When evaluated on a common reference transport solver, the interface-based controls produce faster $\dot{H}^{-1}$ decay than an Eulerian Sobolev-norm optimizer under a matched setup, while substantially reducing computational cost. We also identify a limitation of the reduced model: increasing the control basis may further improve the interface-length objective without yielding proportional gains in $\dot{H}^{-1}$ mixing, confirming that interface length is an effective but not fully faithful proxy for mixing in geometrically complex regimes.
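As a minimal sketch of the implicit midpoint rule mentioned above, the following Python applies it to a toy harmonic oscillator rather than to the paper's stream-function Hamiltonians. The fixed-point solver, its tolerance, and the names `implicit_midpoint_step` and `oscillator` are illustrative assumptions, not the authors' implementation.

```python
def implicit_midpoint_step(f, y, h, tol=1e-12, max_iter=100):
    """Advance y by one step of size h, solving
    y_next = y + h * f((y + y_next) / 2) by fixed-point iteration."""
    # Explicit Euler step as the initial guess for the nonlinear solve.
    y_next = [yi + h * fi for yi, fi in zip(y, f(y))]
    for _ in range(max_iter):
        mid = [(a + b) / 2.0 for a, b in zip(y, y_next)]
        new = [yi + h * fi for yi, fi in zip(y, f(mid))]
        if max(abs(a - b) for a, b in zip(new, y_next)) < tol:
            return new
        y_next = new
    return y_next


def oscillator(y):
    """Toy Hamiltonian system H = (q^2 + p^2) / 2 (an assumption for this demo)."""
    q, p = y
    return [p, -q]


state = [1.0, 0.0]
for _ in range(1000):
    state = implicit_midpoint_step(oscillator, state, 0.1)
energy = 0.5 * (state[0] ** 2 + state[1] ** 2)
print("energy after 1000 steps:", energy)
```

For this quadratic Hamiltonian the midpoint rule conserves the energy, so any drift printed here comes only from the tolerance of the nonlinear solve.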

[329]  arXiv:2605.04690 [pdf, ps, other]
Title: Learning Time-Inhomogeneous Markov Dynamics in Financial Time Series via Neural Parameterization
Comments: 10 pages, 10 figures and 1 table. Presented at The 2026 ASA Midwest Regional Conference in Statistics and Data Science and the 2026 Undergraduate Symposium at the University of Wisconsin - Madison
Subjects: Machine Learning (cs.LG); Mathematical Finance (q-fin.MF)

Modeling the dynamics of non-stationary stochastic systems requires balancing the representational power of deep learning with the mathematical transparency of classical models. While classical Markov transition operators provide explicit, theoretically grounded rules for system evolution, their empirical estimation collapses due to severe data sparsity when applied to high-resolution, high-noise environments. We explore this statistical barrier using financial time series as a canonical, real-world testbed. To overcome the degeneracy of empirical counting, we introduce a framework that utilizes neural networks strictly as parameterization engines to generate explicit, time-varying Markov transition matrices. By constraining the neural network to output its predictions as a formal stochastic operator, we maintain complete structural interpretability. We demonstrate that these learned operators successfully capture complex regime shifts: the state-conditioned model achieves mean row heterogeneity $\bar{\rho} = 0.0073$ while the state-free ablation collapses to exactly zero, and operator row entropy correlates with realized variance at $r = -0.62$ ($p \approx 10^{-251}$), revealing that high-volatility regimes homogenize transition dynamics rather than diversify them. Furthermore, rather than enforcing the Chapman-Kolmogorov equations as a rigid structural requirement, we repurpose them as a localized diagnostic tool to pinpoint specific temporal windows where first-order memory assumptions break down. Ultimately, this framework demonstrates how neural networks can be constrained to make rigorous, classical operator analysis viable for complex real-world time series.
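The constraint described above, that the network's output must be a valid stochastic operator, can be illustrated with a short sketch. It assumes a row-wise softmax over raw scores (one common way to obtain a row-stochastic matrix; the abstract does not say which parameterization the authors use), and it adds a row-entropy measure and a simple Chapman-Kolmogorov residual. All function names are hypothetical.

```python
import math


def row_softmax(logits):
    """Map a matrix of raw scores to a row-stochastic transition matrix."""
    matrix = []
    for row in logits:
        shift = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - shift) for x in row]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


def row_entropy(P):
    """Shannon entropy (in nats) of each row of a transition matrix."""
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in P]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def ck_residual(P_two_step, P_first, P_second):
    """Largest entrywise gap between an estimated two-step operator and the
    product of the one-step operators (zero if Chapman-Kolmogorov holds)."""
    product = matmul(P_first, P_second)
    return max(abs(a - b)
               for row_a, row_b in zip(P_two_step, product)
               for a, b in zip(row_a, row_b))


P = row_softmax([[0.0, 1.0], [2.0, 0.0]])  # made-up scores for illustration
print("row sums:", [sum(row) for row in P])
print("row entropies:", row_entropy(P))
print("CK residual:", ck_residual(matmul(P, P), P, P))
```

A large residual on a given window would flag a place where the first-order Markov assumption is doubtful, which is the diagnostic role the abstract assigns to the Chapman-Kolmogorov equations.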

[330]  arXiv:2605.04692 [pdf, ps, other]
Title: Towards Lag Consensus with Noisy Digital Twins Perception in Second-order Multi-agent Cyber-physical Systems
Comments: accepted by IFAC WC 26
Subjects: Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)

In this paper, we study second-order lag consensus in multi-agent cyber-physical networks subject to random noise and input failures, within a framework modeling the interactions and perceptions between physical twins and digital twins. We propose a lag consensus protocol and establish sufficient conditions for the mean-square (exponential) stability of the resulting stochastic lag error dynamics. The consensus criteria are derived via Lyapunov analysis using the Itô formula, ensuring robustness to random perturbations and intermittent input failures. Numerical examples illustrate the effectiveness of the proposed method.

[331]  arXiv:2605.04698 [pdf, ps, other]
Title: Gray-Box Poisoning of Continuous Malware Ingestion Pipelines
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through functionality-preserving manipulations, specifically Import Address Table (IAT) and section injections. We evaluate the impact of these poisoned samples when ingested into a defender's training set for a LightGBM malware detection model. Our empirical results demonstrate that subtle IAT-based perturbations enable compact poisoning samples that significantly degrade detection recall. These findings illustrate the inherent challenge of developing low-visibility adversarial perturbations that maintain high poisoning efficacy within continuous learning systems. We further evaluate a defense mechanism based on a homogeneous ensemble, which successfully identifies and filters up to 95.6% of poisoning attempts while maintaining a high retention rate for legitimate data. These findings emphasize the necessity of robust pre-ingestion validation in production pipelines.

[332]  arXiv:2605.04699 [pdf, ps, other]
Title: A Separation Between Optimal Demand-Oblivious and Demand-Aware Network Throughput
Subjects: Networking and Internet Architecture (cs.NI); Discrete Mathematics (cs.DM)

The performance of distributed applications often critically depends on the interconnecting network or more specifically on its throughput: how fast data can be carried across a network. Over the last years, great progress has been made in understanding demand-oblivious throughput: how fast a given demand matrix describing pairwise communication requirements can be served on a given network. However, surprisingly little is known today about the achievable demand-aware throughput: the throughput on a network topology which can be optimized toward the demand. Such demand-aware networks have recently gained popularity in datacenters and are enabled by emerging reconfigurable optical technologies.
In this paper, we are interested in both the achievable demand-aware throughput bounds as well as in the computational complexity of finding a throughput-optimizing network topology. We take a systematic approach and investigate four variants of demand-aware throughput: we analyze, and derive bounds for, two definitions of throughput, the classic throughput usually considered in the literature, and a new generalized definition which we call weak throughput; for each of them, we consider two routing models, a direct one, where demand can only be served on a single hop, and a general one, where multi-hop routing is allowed.
Our main result is a separation result which solves an open problem in the literature about the classic throughput definition, showing that demand-aware topologies can outperform demand-oblivious topologies even in the worst case: the demand-aware throughput asymptotically approaches at least 5/8, while it is known that the demand-oblivious throughput is n/(2n-1), which is roughly 1/2. In terms of computational complexity, we show that computing the demand-aware weak throughput is NP-hard, but computing the demand-aware (weak) direct throughput is polynomial-time solvable.
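The size of the separation can be checked with a few lines of exact arithmetic. The sketch below evaluates the demand-oblivious value n/(2n-1) quoted above for several n and compares it with the asymptotic demand-aware lower bound 5/8. The function name is hypothetical, and n is simply the parameter used in the abstract.

```python
from fractions import Fraction


def oblivious_throughput(n: int) -> Fraction:
    """Demand-oblivious throughput n / (2n - 1), as quoted in the abstract."""
    return Fraction(n, 2 * n - 1)


# Asymptotic demand-aware lower bound quoted in the abstract.
AWARE_LOWER_BOUND = Fraction(5, 8)

for n in (4, 16, 256, 4096):
    t = oblivious_throughput(n)
    print(n, float(t), float(AWARE_LOWER_BOUND - t))
```

Since n/(2n-1) decreases toward 1/2 as n grows, the gap to 5/8 approaches 1/8. The 5/8 bound is asymptotic, so the comparison is only meaningful for large n.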

[333]  arXiv:2605.04700 [pdf, ps, other]
Title: Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that only a small subset of token-aligned audio regions dominates the optimization signal. Motivated by this observation, we propose Token-Aware Gradient Optimization (TAGO), which enables sparse jailbreak optimization by retaining only waveform gradients aligned with audio tokens that have high gradient energy, while masking the remaining gradients at each iteration. Across three ALMs, TAGO outperforms baselines, and substantial sparsification preserves strong attack success rates (e.g. on Qwen3-Omni, $\mathrm{ASR}_{l}$ remains at 86% with a token retention ratio of 0.25, compared to 87% with full token retention). These results demonstrate that dense waveform updates are largely redundant, and we advocate that future audio jailbreak and safety alignment research should further leverage this heterogeneous token-level gradient structure.
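The masking step can be sketched generically as top-k selection over per-token gradient energy. The version below assumes that each audio token covers a fixed number of waveform samples and that energy is the sum of squared gradient values; the abstract specifies neither, so both choices and the function name are illustrative assumptions rather than the TAGO implementation.

```python
import math


def token_sparse_mask(grad, samples_per_token, retention_ratio):
    """Zero out waveform gradients outside the highest-energy tokens.

    grad: list of per-sample gradient values.
    samples_per_token: assumed fixed number of samples aligned to each token.
    retention_ratio: fraction of tokens whose gradients are kept.
    """
    num_tokens = math.ceil(len(grad) / samples_per_token)
    energies = []
    for t in range(num_tokens):
        segment = grad[t * samples_per_token:(t + 1) * samples_per_token]
        energies.append(sum(g * g for g in segment))
    keep = max(1, math.ceil(retention_ratio * num_tokens))
    ranked = sorted(range(num_tokens), key=lambda t: energies[t], reverse=True)
    kept_tokens = set(ranked[:keep])
    return [g if (i // samples_per_token) in kept_tokens else 0.0
            for i, g in enumerate(grad)]


toy_grad = [0.1, 0.1, 2.0, -2.0, 0.0, 0.3, 0.5, 0.5]  # made-up values
print(token_sparse_mask(toy_grad, samples_per_token=2, retention_ratio=0.25))
```

In an attack loop, a mask of this kind would be recomputed at every iteration before the waveform update, matching the per-iteration masking the abstract describes.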

[334]  arXiv:2605.04701 [pdf, ps, other]
Title: When Graph Traversal Meets Structured Preferences: Unified Framework and Complexity Results
Subjects: Computer Science and Game Theory (cs.GT)

Preference restrictions have played a significant role in computational social choice. This paper studies a framework that connects preference restrictions with classical graph search paradigms. We model candidates as vertices of a graph and interpret the preference ordering of each voter as the outcome of traversing the graph according to a graph search. We focus on six fundamental paradigms: breadth-first search (BFS), depth-first search (DFS), lexicographic breadth-first search (LexBFS), lexicographic depth-first search (LexDFS), maximum cardinality search (MCS), and maximal neighborhood search (MNS).
Within this framework, we study the problem of determining whether a given preference profile admits a graph support subject to structural restrictions, that is, whether there exists a graph such that each preference ordering can be generated by traversing the graph under the chosen paradigm. For all considered paradigms, we show that this problem is NP-hard when the graph support is required to have at most $k$ edges, where $k$ is a given integer. We further extend these hardness results to the case where the graph support is required to have maximum degree $k$. For DFS, we prove that recognizing whether a preference profile admits a tree support can be solved in polynomial time. Moreover, existing results imply polynomial-time solvability of the problem for all remaining graph traversals, except BFS and LexBFS, for which the complexity remains open.

[335]  arXiv:2605.04702 [pdf, ps, other]
Title: FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Identity-preserving text-to-video generation (IPT2V) empowers users to produce diverse and imaginative videos with consistent human facial identity. Despite recent progress, existing methods often suffer from significant identity distortion under large facial pose variations or facial occlusions. In this paper, we propose FaithfulFaces, a pose-faithful facial identity preservation learning framework to improve IPT2V in complex dynamic scenes. The key component of FaithfulFaces is a pose-shared identity aligner that refines and aligns facial poses across distinct views via a pose-shared dictionary and a pose variation-identity invariance constraint. By mapping single-view inputs into a global facial pose representation with explicit Euler angle embeddings, FaithfulFaces provides a pose-faithful facial prior that guides generative foundations toward robust identity-preserving generation. In particular, we develop a specialized pipeline to curate a high-quality video dataset featuring substantial facial pose diversity. Extensive experiments demonstrate that FaithfulFaces achieves state-of-the-art performance, maintaining superior identity consistency and structural clarity even as pose changes and occlusions occur.

[336]  arXiv:2605.04703 [pdf, ps, other]
Title: Entropy and Distributed Source Coding of Connected Soft Random Geometric Graphs
Subjects: Information Theory (cs.IT); Probability (math.PR)

We consider the distributed compression of Soft Random Geometric Graphs (SRGGs) above the connectivity threshold. We establish the Slepian-Wolf rate region for the SRGG in the setting where there are a finite number of encoders compressing sections of the graph independently. To do so, we prove novel limit theorems and asymptotic equipartition properties for the SRGG and its entropy, which allow us to use random binning techniques for distributed compression.

[337]  arXiv:2605.04704 [pdf, ps, other]
Title: UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification
Comments: This paper has been accepted by DAC 2026 and will appear in the proceedings
Subjects: Hardware Architecture (cs.AR); Software Engineering (cs.SE)

Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of total effort. While the Universal Verification Methodology (UVM) improves reuse through structured verification environments, constructing subsystem-level UVM testbenches and generating high-quality stimuli still require extensive manual coding, repeated EDA tool runs, and deep protocol and micro-architectural expertise. We present UVMarvel, an automated verification framework that leverages Large Language Models (LLMs) to build UVM testbenches for subsystem-level RTL. UVMarvel introduces an Intermediate Representation (IR) and a Bus Protocol Library to translate heterogeneous specifications into protocol-correct subsystem-level UVM testbenches, and employs a Signal Tracker and a Verilog Patching Library to guide LLM-based stimuli refinement. UVMarvel is the first framework capable of automatically constructing subsystem-level UVM testbenches across mainstream bus protocols, and it achieves an average code coverage of 95.65%, while reducing verification time from several human working days to a 4.5-hour automated execution.

[338]  arXiv:2605.04705 [pdf, ps, other]
Title: Vol-Mark: A Watermark for 3D Medical Volume Data Via Cubic Difference Expansion and Contrastive Learning
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Today, advances in medical technology extensively utilize 3D volume data for accurate and efficient diagnostics. However, sharing these data across networks in telemedicine poses significant security risks of data tampering and unauthorized copying. To address these challenges, this paper proposes a novel reversible-zero watermarking approach, termed Vol-Mark, for medical volume data to protect their ownership and authenticity in telemedicine. The proposed Vol-Mark method offers two key benefits: 1) it designs a volume data feature extractor that leverages contrastive learning to efficiently extract discriminative and stable volumetric features, ensuring robustness against 3D attacks; 2) it introduces the cubic difference expansion (c-DE) technique, which leverages the 3D integer wavelet transform to embed watermark bits into neighboring voxels within cubes at low-frequency coefficients. The voxel differences within each cube are expanded to create embedding space, and a majority voting mechanism is employed during extraction to enhance reliability. The embedding process incurs low distortion and supports lossless removal, thereby preserving the integrity and diagnostic accuracy of medical volume data. Through these two benefits, Vol-Mark enables both integrity verification and ownership verification. Integrity verification is first performed, and ownership verification through hypothesis testing is further conducted to enhance reliability, particularly under data tampering or watermark removal attacks. Comprehensive experimental results show the effectiveness of the proposed method and its superior robustness against conventional, geometric, and hybrid attacks on medical volume data. In particular, across multi-task evaluations, Vol-Mark consistently achieves an ACC above 0.90 in most attack scenarios, outperforming existing methods by a clear margin.

[339]  arXiv:2605.04708 [pdf, ps, other]
Title: Differentiable Chemistry in PINNs for Solving Parameterized and Stiff Reaction Systems
Subjects: Machine Learning (cs.LG)

From neural ODEs to continuous-time machine learning, differentiable solvers allow physics, optimization, and simulation to become trainable components within deep learning systems. This has opened the path to a new generation of deep learning frameworks for scientific computing, with many promising applications still emerging. In this paper, we integrate a differentiable chemistry solver into a modified physics-informed neural network to solve parameterized reaction systems that are inherently stiff. The proposed framework introduces several key components required to overcome limitations of standard physics-informed neural networks. These include a differentiable chemistry solver, a network architecture for parameterized solutions, and residual weighting tailored to stiff reactions. We evaluate the framework on a set of differential equations related to hydrogen combustion, which include initial/boundary value problems, inverse parameter identification, and a parameterized partial differential equation. Our results highlight the ability of the proposed approach to extend physics-informed neural networks to stiff chemical systems that were previously inaccessible.

[340]  arXiv:2605.04709 [pdf, ps, other]
Title: ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)

A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model (RSSM) and replaces standard unimodal model predictive path integral (MPPI) with a Gaussian-mixture MPPI that maintains multiple coherent hypotheses over long horizons, avoiding mode averaging under branching rollouts. In parallel, ELVIS stabilizes deep imagination with a shared uncertainty-aware lambda-return: an ensemble of latent critics defines an upper-confidence-bound (UCB) score that gates a time-varying lambda, adaptively trading off bootstrapping versus look-ahead to limit compounding error during planning. The same return is used both to train an actor-critic prior from imagined rollouts and to score candidate trajectories inside GMM-MPPI, aligning RL objectives with the planner's long-horizon optimization. On fourteen DeepMind Control Suite visual tasks, ELVIS establishes state-of-the-art performance compared with TD-MPC2 and DreamerV3. Finally, ELVIS transfers zero-shot to a real-world sand-spraying task with severe occlusions, improving surface-quality metrics and demonstrating robustness beyond simulation.
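To make the lambda-return with a time-varying lambda concrete, here is a minimal sketch of the standard backward recursion used in Dreamer-style agents, with one lambda per step. How ELVIS derives each lambda from its UCB score is not stated in the abstract, so the lambdas are plain inputs here; `ucb_score` shows only the usual mean-plus-scaled-deviation form, and both function names are hypothetical.

```python
import math


def lambda_return(rewards, values, lambdas, gamma):
    """Backward recursion G_t = r_t + gamma * ((1 - lam_t) * V_{t+1} + lam_t * G_{t+1}).

    rewards, lambdas: length H. values: length H + 1 (includes the final state).
    """
    horizon = len(rewards)
    if len(values) != horizon + 1 or len(lambdas) != horizon:
        raise ValueError("expected len(values) == H + 1 and len(lambdas) == H")
    returns = [0.0] * horizon
    next_return = values[horizon]  # bootstrap from the final value estimate
    for t in reversed(range(horizon)):
        blended = (1.0 - lambdas[t]) * values[t + 1] + lambdas[t] * next_return
        returns[t] = rewards[t] + gamma * blended
        next_return = returns[t]
    return returns


def ucb_score(ensemble_values, beta=1.0):
    """Mean plus beta times the population standard deviation of an ensemble."""
    mean = sum(ensemble_values) / len(ensemble_values)
    var = sum((v - mean) ** 2 for v in ensemble_values) / len(ensemble_values)
    return mean + beta * math.sqrt(var)


# Made-up numbers: lambda = 1 looks ahead fully, lambda = 0 bootstraps at once.
print(lambda_return([1.0, 1.0], [0.0, 0.0, 4.0], [1.0, 1.0], gamma=0.5))
print(lambda_return([1.0, 1.0], [0.0, 0.0, 4.0], [0.0, 0.0], gamma=0.5))
print(ucb_score([1.0, 3.0]))
```

Lowering lambda at a step shortens the effective look-ahead there, which is one way to limit compounding model error where the critics disagree.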

[341]  arXiv:2605.04711 [pdf, ps, other]
Title: Budget-aware Auto Optimizer Configurator
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)

Optimizer states occupy massive GPU memory in large-scale model training. However, gradients in different network blocks exhibit distinct behaviors, such as varying directional stability and scale anisotropy, implying that expensive optimizer states are not universally necessary and using a global optimizer is often memory-inefficient. We propose the Budget-Aware Optimizer Configurator (BAOC) to reduce memory cost by assigning suitable optimizer configurations to individual blocks under given budgets. Specifically, BAOC samples gradient streams to derive statistical metrics that quantify the potential performance risk of applying cheaper configurations (e.g., low precision or removing momentum). It then solves a constrained allocation problem to minimize total risk under memory and time budgets, selecting a budget-feasible configuration for each block. Experiments across vision, language, and diffusion workloads demonstrate that BAOC maintains training quality while significantly reducing the memory usage of optimizer states. The code is available at https://anonymous.4open.science/r/BAOC-45C6.

[342]  arXiv:2605.04712 [pdf, ps, other]
Title: SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
Comments: Accepted to ICML 2026
Subjects: Machine Learning (cs.LG)

In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating a loss of plasticity. To address this, building on Neural Tangent Kernel (NTK) theory, we formalize the plasticity loss in MoE policies as a loss of spectral plasticity. We then derive a tractable proxy for spectral plasticity, one expressible in terms of individual expert feature matrices. Leveraging this proxy, we introduce SPHERE, a practical Parseval penalty tailored for MoE-based policies that alleviates the loss of spectral plasticity. On MetaWorld and HumanoidBench, SPHERE improves average success under continual RL by 133% and 50% over an unregularized MoE baseline, while maintaining higher spectral plasticity throughout training.
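
For readers unfamiliar with Parseval-style regularization, the sketch below computes the generic penalty, the squared Frobenius norm of M^T M - I, for a single matrix. This is the standard form of such a penalty and is not claimed to be SPHERE's exact objective; the function name and the list-of-lists matrix layout are assumptions chosen to keep the example dependency-free.

```python
def parseval_penalty(matrix):
    """Squared Frobenius norm of (M^T M - I) for a row-major list-of-lists matrix."""
    n_cols = len(matrix[0])
    total = 0.0
    for i in range(n_cols):
        for j in range(n_cols):
            # Entry (i, j) of the Gram matrix M^T M.
            gram_ij = sum(row[i] * row[j] for row in matrix)
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total
```

The penalty is zero exactly when the columns of the matrix are orthonormal, and it grows as columns become correlated or badly scaled. In a mixture-of-experts setting one would presumably sum such a term over the experts.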

[343]  arXiv:2605.04713 [pdf, ps, other]
Title: Not Every Subject Should Stay: Machine Unlearning for Noisy Engagement Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Engagement recognition datasets are typically subject-indexed and often contain noisy, subjective supervision, making post-hoc dataset revision a practical problem. Existing noisy-label and data-cleaning methods largely operate at the sample level before or during training, but do not directly address a different question: once a model has already been trained, can the influence of an entire problematic subject be removed without full retraining? We study this setting through subject-level machine unlearning as a post-hoc sanitization mechanism for engagement recognition. Starting from a baseline trained on all subjects, we rank candidate harmful subjects using a model-dependent proxy, apply a lightweight approximate unlearning update, and compare the result against an oracle model retrained from scratch on the retained subjects only. We instantiate this protocol on DAiSEE and EngageNet using Tensor-Convolution and Convolution-Transformer Network (TCCT-Net) as a fixed platform and evaluate three matched model states under the same removal scenario: baseline, unlearned, and oracle. In representative K=3 forget-set settings, the unlearned model recovers 89.3% and 92.5% of the oracle gain on EngageNet and DAiSEE, respectively, at roughly one quarter of retraining cost. Across the tested small-audit regimes, effectiveness is strongest at an intermediate forget-set size, indicating that approximate subject-level unlearning is a useful low-cost correction mechanism, but one whose benefit depends on subject selection quality and removal regime.

[344]  arXiv:2605.04715 [pdf, ps, other]
Title: On the Complexity of Minimum Riesz s-Energy Subset Selection in Euclidean and Ultrametric Spaces
Comments: 7 Figures, 15 pages
Subjects: Computational Geometry (cs.CG); Computational Complexity (cs.CC); Optimization and Control (math.OC)

We study the computational complexity of exact cardinality-constrained minimum Riesz $s$-energy subset selection in finite metric spaces: given $n$ points, select $k<n$ points of minimum Riesz $s$-energy. The objective sums inverse-power pair interactions and therefore promotes well-separated subsets; as $s$ becomes large, it increasingly approaches a bottleneck criterion governed by the closest selected pair, linking it to minimum pairwise distance (MPD). Building on the general-metric NP-hardness result of Pereverdieva et al. (2025), we prove that NP-hardness persists for point sets in the Euclidean plane when $s$ is part of the input. In contrast, finite ultrametric spaces form an exact tractable regime: on rooted binary ultrametric trees with $n$ leaves, an optimal size-$k$ subset can be computed by dynamic programming in $O(nk^2)$ time. We also discuss the ordered one-dimensional Euclidean case, where the classical MPD objective admits simple dynamic programming, but the additive Riesz energy does not appear to allow the same state compression. Finally, we explain why one natural route to fixed-$s$ Euclidean hardness does not close: Fowler-style 3SAT gadgets, together with zeta-function bounds for far-field interactions, show why this approach still requires an exponent depending on $k$. Together, these results provide a compact complexity landscape for a natural diversity or dispersion objective, distinguishing Euclidean hardness, ultrametric tractability, and the ordered one-dimensional case.
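
To make the objective concrete, the following sketch evaluates the Riesz $s$-energy of a point set as the sum of $d(p,q)^{-s}$ over unordered pairs, and finds a minimum-energy size-$k$ subset by exhaustive search. The function names are illustrative, the points are assumed to be distinct, and the brute-force search is only a reference for tiny inputs; it is not the dynamic program or any algorithm from the paper.

```python
from itertools import combinations
from math import dist


def riesz_energy(points, s):
    """Sum of d(p, q) ** (-s) over all unordered pairs of distinct points."""
    return sum(dist(p, q) ** (-s) for p, q in combinations(points, 2))


def min_energy_subset(points, k, s):
    """Exhaustive search over all size-k subsets (exponential time, tiny inputs only)."""
    return min(combinations(points, k), key=lambda subset: riesz_energy(subset, s))
```

Some authors sum over ordered pairs instead, which doubles the energy but leaves the minimizing subset unchanged.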

[345]  arXiv:2605.04718 [pdf, ps, other]
Title: On Minimum CADs for Algebraic Sets in Dimension Three
Authors: Lucas Michel
Comments: Accepted for publication in the Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC '26)
Subjects: Symbolic Computation (cs.SC); Algebraic Geometry (math.AG)

Cylindrical Algebraic Decomposition (CAD) algorithms typically produce a decomposition adapted to a finite family of semi-algebraic sets $\mathcal{F}$ (i.e. every member of $\mathcal{F}$ is a union of cells). Different algorithms may produce different outputs, and introduce unnecessary cell divisions. Recent work by Michel, Mathonet, and Zénaïdi in ISSAC 2024 formalised this issue by studying the refinement order on the set of all CADs adapted to $\mathcal{F}$ and analysing the existence of a minimum (coarsest) adapted CAD. It was shown that such a minimum adapted CAD always exists for subsets of $\mathbb{R}$ and $\mathbb{R}^2$, but not of $\mathbb{R}^n$ ($n \geqslant 3$) in general.
It is therefore natural to seek classes of subsets of $\mathbb{R}^n$ that admit a minimum adapted CAD. In this paper, we identify a class of subsets of $\mathbb{R}^3$, containing all algebraic sets, whose members admit a minimum adapted CAD. This provides the first positive existence theorem for minimum CADs for a non-trivial class of sets.

[346]  arXiv:2605.04719 [pdf, ps, other]
Title: Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL
Subjects: Computation and Language (cs.CL)

Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models are encouraged to explore suboptimal reasoning spaces, limiting both efficiency and generalization. To address this problem, we propose FineStep, a novel framework for step-level credit assignment in tool-augmented Text-to-SQL. First, we introduce a reward design with independent process rewards to alleviate the signal sparsity of outcome supervision. Next, we present a step-level credit assignment mechanism to precisely quantify the value of each reasoning step. Finally, we develop a policy optimization method based on step-level advantages for efficient updates. Extensive experiments on BIRD benchmarks show that FineStep achieves state-of-the-art performance and reduces redundant tool interactions, with a 3.25% average EX gain over GRPO at the 4B scale.

[347]  arXiv:2605.04720 [pdf, ps, other]
Title: A Framework of Secure Source Coding using Mutual Information Security Criterion: Universal Coding, Strong Converse Theorem
Comments: 10 pages, 4 figures
Subjects: Information Theory (cs.IT)

In this paper, we propose a framework of source encryption, where cryptographic processing is applied to a prescribed fixed length source code. The proposed source encryption framework is based on the secure communication framework of the Shannon cipher system. In the proposed framework, we use the mutual information as a measure of information leakage to an adversary. For the proposed framework, we explicitly establish the necessary and sufficient condition for reliable and secure communication under the condition that error probability and information leakage, respectively, are upper bounded by prescribed constants $\epsilon\in (0,1)$ and $\delta \in (0,\infty)$. We also show that the obtained necessary and sufficient condition does not depend on the constants $\epsilon\in (0,1)$ and $\delta\in (0,\infty)$, demonstrating that we have the strong converse theorem for the proposed framework of source encryption. We further prove the existence of encryption/decryption schemes, which are universal in the sense that they work effectively for any distributions of the plain text and those of the key used for the encryption.
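
As a toy illustration of mutual information as a leakage measure, the sketch below computes I(M; C) in bits for a one-bit cipher C = M XOR K with independent Bernoulli message and key. The setup, function name, and parameters are assumptions chosen for illustration and are far simpler than the fixed-length source coding framework in the abstract.

```python
from math import log2


def mutual_information_xor(p_msg_one, p_key_one):
    """I(M; C) in bits for C = M XOR K with independent Bernoulli M and K."""
    p_m = {0: 1.0 - p_msg_one, 1: p_msg_one}
    p_k = {0: 1.0 - p_key_one, 1: p_key_one}
    # Joint distribution of (message, ciphertext).
    joint = {}
    for m, pm in p_m.items():
        for k, pk in p_k.items():
            c = m ^ k
            joint[(m, c)] = joint.get((m, c), 0.0) + pm * pk
    # Marginal distribution of the ciphertext.
    p_c = {}
    for (m, c), p in joint.items():
        p_c[c] = p_c.get(c, 0.0) + p
    return sum(
        p * log2(p / (p_m[m] * p_c[c])) for (m, c), p in joint.items() if p > 0.0
    )
```

With a uniform key the leakage is zero regardless of the message distribution, whereas a constant key leaks the full entropy of the message.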

[348]  arXiv:2605.04722 [pdf, ps, other]
Title: Exact Dual Geometry of SOC-ICNN Value Functions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

Input Convex Neural Networks (ICNNs) are commonly used in a two-stage manner: one first trains a convex network and then minimizes it over its input in a downstream inference problem. Recent second-order-cone ICNNs (SOC-ICNNs) enrich ReLU-based ICNNs with quadratic and conic modules and admit an exact representation as value functions of second-order cone programs (SOCPs). This value-function structure enables an explicit convex-analytic treatment of SOC-ICNN inference. In this paper, we study the exact first-order and local second-order geometry of SOC-ICNNs from the dual viewpoint. We show that supporting slopes, subdifferentials, directional derivatives, and local Hessians can be recovered directly from optimal dual variables. These results provide the geometric primitives for white-box SOC-ICNN inference, going beyond black-box automatic differentiation. Numerical experiments validate the exact multiplier readout, the local Hessian formula, and the set-valued behavior at structurally degenerate inputs. We also provide a step-by-step tutorial showing how the readout mechanism instantiates a complete white-box inference loop. The code is available at https://anonymous.4open.science/r/SOC-ICNN-Theory-BEFC/.

[349]  arXiv:2605.04723 [pdf, ps, other]
Title: Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation
Comments: Accepted at IJCAI-ECAI 2026
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Attribute-aware sequential recommendation entails predicting the next item a user will interact with based on a chronologically ordered history of past interactions, enriched with item attributes. Existing methods typically leverage self-attention mechanisms to aggregate the entire sequence into a unified representation used for next-item prediction. While effective, these models often suffer from high computational complexity and memory consumption, limiting their ability to process long user histories. This constraint restricts the model's capacity to fully capture long-term user preferences. In some scenarios, modeling item interactions purely through attention may also not be the most effective approach to extract sequential patterns. In this work, we propose ConvRec, an alternative method with linear computational and memory complexity that employs convolutional layers in a hierarchical, down-scaled fashion to generate compact, yet expressive sequence representations. To further enhance the model's ability to capture diverse sequential patterns, each layer aggregates the neighboring items gradually to reach a comprehensive sequence representation. Extensive experiments on four real-world datasets demonstrate that our approach outperforms state-of-the-art sequential recommendation models, highlighting the potential of convolution-based architectures for efficient and effective sequence modeling in recommendation systems. Our implementation code and datasets are available at https://github.com/ismll-research/ConvRec.

[350]  arXiv:2605.04724 [pdf, ps, other]
Title: From Beats to Breaches: How Offensive AI Infers Sensitive User Information from Playlists
Comments: This paper is accepted at IEEE EuroS&P 2026
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

The pervasive integration of AI has enabled Offensive AI: the exploitation of AI for malicious ends across the cyber-kill chain. A critical manifestation is the user attribute inference attack, where AI infers sensitive Personally Identifiable Information (PII) from innocuous public data. We explore how music streaming ecosystems, where users routinely release public playlists, can be exploited for Offensive AI. To quantify this threat, we developed musicPIIrate. This novel tool leverages deep learning architectures that utilize both standalone data representations and the structural information embedded in a user's playlist collection. Our design explores set-based approaches (e.g., Deep Sets) and methodologies modeling relationships between playlists (e.g., Graph Neural Networks), which we also combine to leverage both perspectives. Our approach addresses feature extraction from unordered, variable-length set data, enabling accurate PII prediction.
Empirical evaluation demonstrates that musicPIIrate achieves state-of-the-art inference accuracy. The tool successfully infers a wide array of attributes, including: Demographics (Age, Country, Gender), Habits (Alcohol, Smoke, Sport), and Personality Traits (OCEAN scores). musicPIIrate outperforms existing methods, beating baselines in 9 out of 15 attribute inference tasks. To counter this vulnerability, we propose JamShield, a lightweight defensive framework. JamShield strategically injects dummy playlists into an account to dilute the PII-carrying signal. Our analysis indicates that JamShield represents a promising defense, lowering inference F1-scores by an average of 10%. This work provides an initial Offensive-AI benchmark for playlist-based PII inference using architectures that leverage set- and graph-structured data and introduces a defense showing encouraging mitigation effects.

[351]  arXiv:2605.04726 [pdf, ps, other]
Title: RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation
Subjects: Information Retrieval (cs.IR)

Predicting a user's next search query from recent interaction behaviors is a critical problem in modern e-commerce systems, particularly in scenarios where user intent evolves rapidly. Large Language Models (LLMs) offer strong semantic reasoning capabilities and have recently been adopted to enhance training data construction for next-query prediction. However, due to resource constraints on mobile devices, existing applications are deployed on cloud servers, resulting in high inference costs. In this paper, we propose RecGPT-Mobile, a framework that designs a lightweight LLM-based intent understanding agent to improve recommendation quality in mobile e-commerce scenarios. By deploying LLMs directly on mobile devices, our approach can capture evolving interests of users more quickly and adjust the recommendation results in real time. Extensive offline analyses and online experiments demonstrate that our method significantly improves the accuracy of recommendation results, laying a practical path for LLM deployment in production-scale recommendation systems on mobile devices, as well as a scalable solution for integrating LLMs into real-world next-query prediction systems.

[352]  arXiv:2605.04727 [pdf, ps, other]
Title: Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols
Comments: Accepted at the International Conference on Intelligent Tutoring Systems (ITS 2026). To appear in Springer LNCS
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)

Programming Knowledge Tracing (PKT) has recently advanced through hybrid approaches that integrate attention-based feature modeling for code representation with RNN-based sequential prediction. While these models report strong empirical performance, their reliability can be sensitive to subtle implementation and experimental design choices. This study revisits representative PKT models and shows that reported gains can be substantially influenced by model configuration and sequence construction practices. We identify issues in attention dimension settings that affect performance estimates, and demonstrate that improper ordering of student attempts, such as ignoring ServerTimestamp, can violate temporal causality and lead to overly optimistic results. To ensure consistent evaluation, hyperparameters are selected via grid search guided by a single designated fold and then fixed uniformly across all folds during cross-validation. We further analyze the role of assignment-wise characteristics and systematically explore the impact of maximum sequence length. Using this protocol, we re-evaluate PKT models on the CodeWorkout dataset. Our results show that, under controlled and consistent settings, the performance gap between attention-enhanced models and standard DKT is significantly reduced, and increased architectural complexity does not consistently translate into superior performance. Beyond individual model comparisons, this work provides practical guidance for reliable and comparable evaluation in programming knowledge tracing.

[353]  arXiv:2605.04728 [pdf, ps, other]
Title: Anny-Fit: All-Age Human Mesh Recovery
Comments: CVPR 2026 Findings Track - Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recovering 3D human pose and shape from a single image remains a cornerstone of human-centric vision, yet most methods assume adult subjects and optimize each person independently. These assumptions fail in real-world, all-age scenes, where body proportions and depth must be resolved jointly. We introduce Anny-Fit, a multi-person, camera-space optimization framework for all-age 3D human mesh recovery (HMR). Unlike existing per-person fitting methods, Anny-Fit jointly optimizes all individuals directly in the camera coordinate system, enforcing global spatial consistency. At the core of our approach is the use of multiple forms of expert knowledge -- including metric depth maps, instance segmentation, 2D keypoints, and VLM-derived semantic attributes such as age and gender -- each obtained from dedicated off-the-shelf networks. These complementary signals jointly guide the optimization, constraining the depth-scale ambiguity characteristic of all-age scenes. Across diverse datasets, Anny-Fit consistently improves 2D reprojection accuracy (+13 to 16), relative depth ordering (+6 to 7), 3D estimation error (-9 to -29) and shape estimation (+25 to +82), producing more coherent scenes. Finally, we show that VLM-based semantic knowledge can be distilled into an HMR model via the pseudo-ground-truth annotations produced by Anny-Fit on training data, enabling it to learn semantically meaningful shape parameters while improving HMR performance. Our approach bridges adult-only and all-age modeling by enabling zero-shot adaptation of adult-trained HMR pipelines to the full age spectrum without retraining. Code is publicly available at https://github.com/naver/anny-fit.

[354]  arXiv:2605.04729 [pdf, ps, other]
Title: AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations
Comments: Accepted in LASI Spain 26: Learning Analytics Summer Institute Spain 2026
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Providing timely and actionable feedback on oral presentation slides is challenging in higher education, particularly in large classes where teachers cannot realistically deliver detailed formative feedback before students present. This paper introduces AISSA (AI-based Student Slides Analysis tool), a web-based system that combines large language models (LLMs) and Learning Analytics dashboards to support scalable, rubric-based feedback on presentation slides. AISSA allows students to upload their slide decks prior to an oral presentation and automatically receive quantitative scores and qualitative feedback based on teacher-defined evaluation rubrics. The system analyzes both slide-level features and slide content, generates structured feedback through an LLM (ChatGPT 5.2), and presents the results through interactive dashboards for students and teachers. We tested AISSA on a pilot deployment with 46 undergraduate students in a real academic setting. The results indicate that AISSA is technically reliable, economically feasible, and perceived by students as useful for iterative slide improvement. These findings suggest that combining LLM-based analysis with Learning Analytics dashboards is a promising approach for supporting formative feedback on presentation slides at scale.

[355]  arXiv:2605.04730 [pdf, ps, other]
Title: ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting
Comments: published to CVPR (highlight)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual localization is a core technology for augmented reality and autonomous navigation. Recent methods combine the efficient rendering of 3D Gaussian Splatting (3DGS) with feature-based localization. These methods rely on direct matching between 2D query features and the 3D Gaussian feature field, but this often results in mismatches due to an inherent bias in the learned Gaussian feature. We theoretically analyze the feature learning process in 3DGS, revealing that the widely adopted $\alpha$-blending optimization inherently introduces bias into 3D point features. This bias stems from the entanglement between individual Gaussians and their neighboring Gaussians, making the learned features unsuitable for precise matching tasks. Motivated by these findings, we propose ULF-Loc, an unbiased landmark feature framework that replaces biased feature optimization with geometry-weighted feature fusion. We further introduce keypoint-consensus landmark sampling to select reliable Gaussians and local geometric consistency verification to reject mismatches caused by rendering artifacts. On the Cambridge Landmarks dataset, ULF-Loc reduces the mean median translation error by 17\% compared to the state-of-the-art, while achieving superior efficiency with only 1/10 the training time and 1/6 the GPU memory of STDLoc.

[356]  arXiv:2605.04731 [pdf, ps, other]
Title: Morphology-Guided Cross-Task Coupling for Joint Building Height and Footprint Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Building height (BH) and building footprint (BF) jointly describe the vertical and horizontal extent of the built environment and are required inputs for urban climate, disaster-risk, and population-mapping models. The two parameters are coupled through floor-area-ratio (FAR) constraints, yet remote-sensing approaches typically treat them as independent regression targets. We argue that explicitly encoding this cross-task coupling is more impactful than further refining individual encoders, and propose MorphoFormer, a joint BH/BF estimation framework built around two complementary mechanisms: (i) a BF-Guided Task Decoder (BGTD) that gates the height branch via cross-attention on a footprint-derived morphology context, and (ii) a Morphology Consistency Loss (MCL) that supervises a height-from-footprint surrogate against the ground-truth BH, indirectly forcing the BF feature to encode height-correlated structure. The encoder is a single-stage Swin backbone fed by Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs, trained and evaluated on a geo-blocked split of 54 cities. Against a Swin-MTL baseline at identical receptive field, MorphoFormer reduces BH test RMSE from 3.39 to 3.15 m (R^2 improves 0.62 -> 0.67) with BF R^2 stable at 0.80. Controlled ablations at identical capacity attribute most of this 0.24 m improvement to the two proposed mechanisms: removing BGTD raises BH RMSE by 0.11 m and removing MCL raises it by 0.11 m, with the residual approximately 0.02 m falling within the noise floor of encoder-side variations. Because both mechanisms act on cross-task representations rather than pixels, the design carries no intrinsic dependence on input resolution.

[357]  arXiv:2605.04732 [pdf, ps, other]
Title: Using Common Random Numbers for Simulation-based Planning with Rollouts
Journal-ref: Reinforcement Learning Journal 2026
Subjects: Machine Learning (cs.LG)

Simulation-based planning with rollouts is a widely-deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple recipe for (provably) reducing variance in relative utility when simulations invoke a rollout policy beyond some depth. Experiments on synthetic tasks confirm that our scheme improves task performance. The broader significance of our innovation is apparent from two practical applications: (1) single-step lookahead planning in a pension-disbursement task, and (2) a deployment of the well-known UCT algorithm for the game of Ludo.
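
The basic effect of common random numbers can be seen in a toy sketch: when two actions are evaluated with the same random stream, the noise largely cancels in their utility difference. The sampling model, the drift values, and the seeding scheme below are hypothetical and only illustrate the general idea, not the paper's recipe or its guarantees.

```python
import random
from statistics import pvariance


def rollout_utility(action, rng, depth=5):
    """Toy sampling model: action-dependent drift plus Gaussian noise drawn from rng."""
    drift = {"a": 1.0, "b": 1.1}[action]
    return sum(drift + rng.gauss(0.0, 1.0) for _ in range(depth))


def utility_differences(n_pairs, common):
    """Estimate U(b) - U(a) n_pairs times, with shared or independent seeds."""
    diffs = []
    for i in range(n_pairs):
        seed_a = i
        # Common random numbers: both rollouts replay the same random stream.
        seed_b = i if common else n_pairs + i
        u_a = rollout_utility("a", random.Random(seed_a))
        u_b = rollout_utility("b", random.Random(seed_b))
        diffs.append(u_b - u_a)
    return diffs


crn_var = pvariance(utility_differences(500, common=True))
ind_var = pvariance(utility_differences(500, common=False))
print(f"CRN variance: {crn_var:.3g}, independent variance: {ind_var:.3g}")
```

In this toy model the noise is additive and identical across actions, so the cancellation is essentially exact; in realistic simulators the streams decorrelate once trajectories diverge, and the reduction is only partial.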

[358]  arXiv:2605.04733 [pdf, ps, other]
Title: Reward-Decomposed Reinforcement Learning for Immersive Video Role-Playing
Subjects: Artificial Intelligence (cs.AI)

Text-based role-playing models can imitate character styles, yet they often fail to reflect a scene's atmosphere and evolving tension, both essential for immersive applications such as Virtual Reality (VR) games and interactive narratives. We study video-grounded role-playing dialogue and introduce EBM-RL (Eye-Brain-Mouth Reinforcement Learning), a decoupled GRPO-based framework that explicitly separates observation ([perception]), reasoning ([think]), and utterance ([answer]). This structure promotes human-like sensory grounding by compelling the model to first attend to visual cues, then form internal interpretations, and finally generate context-appropriate dialogue.
EBM-RL integrates four complementary rewards: (i) CLIP-based scene-text alignment to improve ambiance and emotion; (ii) a Perceptual-Cognitive reward that encourages [perception] and [think] processes that increase the likelihood of the reference response; (iii) answer accuracy to ensure faithfulness; and (iv) a dense format reward to enforce the desired structured output.
Extensive experiments demonstrate that EBM-RL substantially outperforms text-only role-playing baselines and larger-scale vision-language models on our immersive role-playing benchmark, delivering simultaneous gains in visual-atmosphere consistency and character authenticity. Beyond the role-playing domain, EBM-RL also exhibits strong zero-shot generalization: without any additional fine-tuning, it consistently improves performance on out-of-domain VideoQA benchmarks. We additionally release an open-source dataset for video-grounded role-playing dialogue.

[359]  arXiv:2605.04735 [pdf, ps, other]
Title: Sequential topology optimization: SIMP initialization for level-set boundary refinement
Authors: Ondřej Ježek (1,2), Ján Kopačka (1), Martin Isoz (1), Dušan Gabriel (1) ((1) Institute of Thermomechanics, Czech Academy of Sciences, Praha, Czech Republic, (2) Faculty of Mechanical Engineering, Czech Technical University in Prague, Praha, Czech Republic)
Comments: 19 pages, 7 figures, 5 tables. Submitted to Advances in Engineering Software. Source code: this https URL Archived snapshot with reproduction data: this https URL
Subjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA); Optimization and Control (math.OC)

Density-based topology optimization methods such as SIMP enable efficient topological exploration but produce diffuse material boundaries that require interpretation before manufacturing. Level-set methods maintain sharp interfaces but are sensitive to the initial design. This paper presents a sequential framework that addresses these complementary limitations through a signed distance function (SDF)-based geometry transfer, formulated for three-dimensional meshes. The SIMP density distribution is converted into an SDF that initializes subsequent level-set boundary refinement. From the level-set perspective, the SIMP-derived initialization mitigates sensitivity to the initial design. From the SIMP perspective, the level-set stage acts as optimization-driven post-processing that produces manufacturing-ready boundaries. Validation on three-dimensional cantilever and MBB benchmarks demonstrates compliance comparable to standalone level-set optimization, with up to 4.6x wall-clock speedup on the cantilever case. The full implementation is released under an open-source license to support reproducibility.
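
The density-to-SDF transfer step can be pictured with a one-dimensional sketch: threshold the density field, then assign each cell the signed distance to the nearest cell of the opposite phase. The threshold of 0.5, the sign convention (positive in material), the half-cell interface placement, and the function name are all assumptions for illustration; the paper's three-dimensional mesh-based procedure is not reproduced here.

```python
def density_to_sdf(densities, cell_size=1.0, threshold=0.5):
    """Signed distance on a 1D grid: positive in material, negative in void.

    The zero level is placed halfway between neighbouring cells of opposite phase.
    Cells get an infinite distance if the whole grid is a single phase.
    """
    solid = [rho >= threshold for rho in densities]
    sdf = []
    for i, is_solid in enumerate(solid):
        # Index gaps to every cell of the opposite phase.
        gaps = [abs(i - j) for j, other in enumerate(solid) if other != is_solid]
        distance = (min(gaps) - 0.5) * cell_size if gaps else float("inf")
        sdf.append(distance if is_solid else -distance)
    return sdf
```

This brute-force version is quadratic in the number of cells; a production implementation would typically use a fast marching or fast sweeping scheme instead.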

[360]  arXiv:2605.04738 [pdf, ps, other]
Title: OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
Comments: ICML 2026
Subjects: Machine Learning (cs.LG)

Large Language Models (LLMs) have demonstrated remarkable capabilities. However, their massive parameter scale leads to significant resource consumption and latency during inference. Post-training weight-only quantization offers a promising solution by reducing model size and accelerating token generation through alleviating the memory-bound issue. Nevertheless, the presence of inherent systematic outliers in weights continues to be a major obstacle. While existing methods, such as scaling and rotation, attempt to address this issue, the performance remains unsatisfactory. In this paper, we propose Outlier Self-Absorption Quantization (OSAQ), which performs additive weight suppression guided by the second-order low-rank property for low-bit weight-only quantization of LLMs. Specifically, we observe that the Hessian exhibits low-rank consistency across different inputs, with certain directions consistently showing vanishing curvature. Leveraging this property, we identify a stable null space of the Hessian and then construct an additive weight transformation by linearly combining the vectors within this null space, thereby suppressing weight outliers without affecting the task loss. This additive transformation can be absorbed into the weights offline, requiring no inter-layer transformations and introducing no inference overhead. Moreover, the construction is efficiently achieved by a closed-form solution, without resource-intensive training or iterative procedures. Extensive experiments demonstrate that OSAQ effectively suppresses outliers and enhances low-bit quantization performance. For instance, in 2-bit quantization, OSAQ, when integrated with GPTQ, achieves over 40% lower perplexity compared to vanilla GPTQ.
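
The idea of shifting weights along a null-space direction can be shown on a toy linear layer. The sketch assumes a layer-wise proxy Hessian H = X^T X built from calibration inputs X, which is an assumption and may differ from the Hessian used in the paper; the data, names, and step size are hypothetical. Because the first two input features always coincide, the direction (1, -1, 0) lies in the null space of H, and moving the weights along it leaves every output unchanged.

```python
def linear_outputs(inputs, weights):
    """Outputs of a single linear unit for each calibration input."""
    return [sum(x * w for x, w in zip(row, weights)) for row in inputs]


def shift_along(weights, direction, step):
    """Additive weight transformation: weights + step * direction."""
    return [w + step * d for w, d in zip(weights, direction)]


# Calibration inputs whose first two features are always equal, so
# (1, -1, 0) is a null direction of the proxy Hessian X^T X.
calibration_inputs = [[1.0, 1.0, 2.0], [3.0, 3.0, -1.0], [-2.0, -2.0, 0.5]]
null_direction = [1.0, -1.0, 0.0]

weights = [10.0, 0.0, 1.0]  # the first weight is an outlier
smoothed = shift_along(weights, null_direction, -5.0)  # spreads it over two weights
```

Here the largest weight magnitude drops from 10 to 5 while the layer outputs on the calibration inputs stay identical, which is the property that makes a smaller quantization range possible without changing the function on that data.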

[361]  arXiv:2605.04740 [pdf, ps, other]
Title: AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education
Comments: Accepted in LASI Spain 26: Learning Analytics Summer Institute Spain 2026
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Effective peer feedback is essential for developing critical reflection in higher education, yet its impact is often limited by the inconsistent quality of student-generated comments. This paper presents the implementation and deployment of AICoFe (AI-based Collaborative Feedback), a system designed to bridge this gap through a human-centered AI approach. We describe a modular architecture that orchestrates a multi-LLM pipeline, utilizing GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1, to synthesize quantitative rubric data and qualitative observations into coherent, actionable feedback. Key to the system is a "teacher-in-the-loop" mediation workflow, where educators use specialized Learning Analytics dashboards to curate and refine AI-generated drafts before delivery. Furthermore, we detail the underlying data infrastructure, which employs a hybrid SQL and MongoDB strategy to ensure traceability and manage semi-structured feedback versions.

[362]  arXiv:2605.04741 [pdf, ps, other]
Title: Hierarchical Multiagent Reinforcement Learning for Multi-Group Tax Game
Subjects: Multiagent Systems (cs.MA)

Reinforcement learning has increasingly been used to study economic decision-making, such as taxation, public spending, and labour supply. However, most existing RL-based economic models focus on a single government-household group, thereby overlooking the strategic interactions that arise when multiple governments compete while managing their own populations. In practice, many economic systems (e.g., taxation) exhibit a multi-group structure, where each government must optimize its fiscal policy in response not only to household behaviour within its jurisdiction, but also to the policies of other competing governments. To capture this structure, we formulate taxation as a hierarchical multi-group game. Within each group, the interaction between the government and households is modelled as a leader-follower game; across groups, governments are modelled as players in a competitive game. This results in a hybrid hierarchical game that is difficult to solve using standard multi-agent reinforcement learning algorithms. We therefore propose a bi-level training framework built on multi-agent reinforcement learning, together with Curriculum Learning and a Closed-Loop Sequential Update strategy, to stabilize training and promote convergence. We instantiate this framework in a taxation game simulation environment grounded in classical economic models. The environment supports the evaluation of different taxation algorithms and provides multiple economic indicators for assessing policy performance. Experiments show that our approach can learn stable tax policies that benefit all participating groups. Compared with a two-group baseline without the proposed update mechanisms, our method avoids premature game collapse, extends the effective game duration by 60.92%, produces more sustainable and robust tax policies, and reduces GDP disparities among governments by 44.12%.

[363]  arXiv:2605.04744 [pdf, ps, other]
Title: MixINN: Accelerating Plant Breeding by Combining Mixed Models and Deep Learning for Interaction Prediction
Comments: 11 pages, 1 figure
Subjects: Machine Learning (cs.LG)

Plant breeding underpins global food security through incremental, accumulating improvements in crop yield, quality and sustainability, achieved via repeated cycles of crop ranking, selection and crossing. Climate change disrupts this process by altering local growing conditions, thereby shifting the relative performance of crop genotypes. Predicting these relative changes in yield is critical for food security. Yet, this problem remains an open challenge in plant breeding, and relatively unexplored within the AI community. We propose MixINN, an approach that first isolates high-quality genotype-environment interaction labels using mixed models, and then predicts these interactions for new crop varieties in future environmental conditions with a deep neural network. We evaluate our method on a corn multi-environment trial across the continental United States and show improved prediction of genotype ranking over current plant breeding methods. MixINN demonstrated superior performance in identifying the 20% most productive corn genotypes, leading to a 5.8% higher average yield, which further improved to 7.2% when targeting specific growing environments. These are competitive results for real-world breeding programs, demonstrating the potential of AI research in accelerating the development of climate-adapted crops, and improving future food security under climate change.

[364]  arXiv:2605.04747 [pdf, ps, other]
Title: Knowledge-Free Correlated Agreement for Incentivizing Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

We introduce Knowledge-Free Correlated Agreement (KFCA) to reward client contributions in federated learning (FL) without relying on ground truth, a public test set, or distribution knowledge. Under categorical reports and an honest majority, KFCA is strictly truthful, addressing the label-flipping vulnerability of Correlated Agreement (CA). We evaluate KFCA on federated LLM adapter tuning and a real-world PCB inspection task, showing efficient real-time reward computation suitable for decentralized and blockchain-based incentive designs.

[365]  arXiv:2605.04750 [pdf, ps, other]
Title: VC-FeS: Viewpoint-Conditioned Feature Selection for Vehicle Re-identification in Thermal Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)

Identification of less-articulated objects using single-channel images, such as thermal images, is important in many applications, such as surveillance. However, in this domain, existing methods show poor performance due to high similarity among objects of the same category in the absence of color information (overlooking shape information) and de-emphasized texture information. Furthermore, variability in viewpoint adds more complexity as the features vary from side to side. We address these issues by constructing viewpoint-conditioned feature vectors and area-specific feature comparisons in separate feature spaces. These interventions make it possible to leverage existing RGB-pre-trained ViT feature extractors while adapting them to the challenges specific to the thermal domain. We test our system on the RGBNT100 (IR) vehicle dataset and on a thermal maritime dataset that we acquired. Our results surpass the state-of-the-art methods by 19.7% and 12.8% in mAP on these two datasets, respectively. We also plan to make our thermal dataset available, the first of its kind for maritime vessel identification.

[366]  arXiv:2605.04751 [pdf, ps, other]
Title: Sequential Monte Carlo for Resilient Networks: Assessment, Mitigation, and Generative Modeling
Subjects: Systems and Control (eess.SY)

Resilience is becoming crucial for future wireless networks, which must withstand, adapt to, and recover from rare but potentially cascading disruptions. This paper develops a sequential Monte Carlo (SMC) simulation framework for such systems, in which resilience failures are formulated as path-dependent rare events arising from staged degradation and delayed recovery, and are decomposed into semantically interpretable levels defined by a reaction coordinate. Building on this structure, we present a fixed-level splitting approach with budget-aware population control, enabling efficient estimation of rare non-recovery probabilities. We discuss the potential reuse of SMC checkpoints as representative near-critical states for policy evaluation and simulation-based selection. We further extend the methodology to learned stochastic simulation by using generative sequence models as restartable surrogates within data-driven digital twins. We showcase the framework in a delay-critical wireless network use case, where SMC substantially improves over standard Monte Carlo in rare-event regimes with both physical and learned simulators.
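To make the fixed-level splitting idea concrete, here is a minimal sketch on a toy problem rather than a wireless network simulator. It assumes a biased ±1 random walk as a hypothetical stand-in for staged degradation, with the walk's position as the reaction coordinate, and it uses a fixed number of trials per level instead of the budget-aware population control described in the abstract. The function names are illustrative only.

```python
import random

def reaches_next_level(start, p_up, rng):
    """Run a biased +/-1 walk from `start`; True if it hits start+1 before 0."""
    x = start
    while 0 < x <= start:
        x += 1 if rng.random() < p_up else -1
    return x > start

def splitting_estimate(top_level, p_up, trials_per_level, seed=0):
    """Estimate P(reach top_level before 0 | start at 1) level by level.

    Each stage estimates the conditional probability of advancing one level;
    the rare-event probability is the product of the stage estimates.
    """
    rng = random.Random(seed)
    estimate = 1.0
    for level in range(1, top_level):
        hits = sum(reaches_next_level(level, p_up, rng)
                   for _ in range(trials_per_level))
        if hits == 0:
            return 0.0
        estimate *= hits / trials_per_level
    return estimate

def exact_probability(top_level, p_up):
    """Gambler's-ruin formula for the same event (valid for p_up != 0.5)."""
    ratio = (1.0 - p_up) / p_up
    return (1.0 - ratio) / (1.0 - ratio ** top_level)

est = splitting_estimate(top_level=10, p_up=0.3, trials_per_level=2000)
exact = exact_probability(top_level=10, p_up=0.3)
print(f"splitting estimate: {est:.3e}, exact: {exact:.3e}")
```

In this one-dimensional toy the entrance state at each level is unique, so no checkpoint resampling is needed; in a realistic simulator the states that first cross a level would be stored and restarted, which is where the checkpoints mentioned in the abstract come from.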

[367]  arXiv:2605.04752 [pdf, ps, other]
Title: Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate traffic congestion classification requires models that jointly capture roadway scene context and non-stationary traffic motion, yet most prior work treats these requirements in isolation. Vision-based methods often depend on appearance cues with standard temporal pooling, which can bias predictions toward static infrastructure, whereas signal-based approaches characterize temporal dynamics but lack the spatial context needed for scene-level localization. These complementary limitations motivate a unified framework that links motion evidence to spatial feature selection while preserving data-adaptive temporal characterization. This study therefore proposes FLO-EMD, a hybrid approach that couples motion-guided attention with empirical, data-driven temporal decomposition. Dense optical flow guides channel and spatial attention so that RGB features are refined toward motion-relevant regions. In parallel, aggregated flow statistics form compact motion traces that are decomposed using Empirical Mode Decomposition (EMD) to extract intrinsic temporal components. The resulting EMD embedding is fused with learned spatiotemporal representations to classify light, medium, and heavy congestion. Experiments on 1,050 five-second clips from four surveillance networks show that FLO-EMD achieves 97.5% overall test accuracy (weighted F1 = 0.9742), outperforming established baselines and remaining robust across diverse environmental conditions; ablation and sensitivity analyses further quantify the contributions of EMD, the number of intrinsic mode functions, and the selected motion descriptors.

[368]  arXiv:2605.04754 [pdf, ps, other]
Title: AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures
Comments: Accepted at the IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2026)
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)

Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied as independent routes towards efficient inference, the former by replacing exact arithmetic with low-power approximate multipliers, the latter by routing inputs through specialized expert sub-networks to enable conditional computation. However, their interaction remains entirely unexplored. This paper presents AxMoE, the first study of the impact of approximate multiplication on MoE DNN architectures. We evaluate three MoE variants: Hard MoE, Soft MoE, and Cluster MoE against dense baselines across three CNN architectures (ResNet-20, VGG11_bn, VGG19_bn) on CIFAR-100 and a Vision Transformer (ViT-Small) on Tiny ImageNet-200 dataset, using eight 8-bit signed multipliers (including one exact baseline) from the EvoApproxLib library. Results show that, without retraining, the Dense baseline is the most resilient topology across all CNN architectures, whereas on ViT-Small, all topologies degrade at comparable rates regardless of routing strategy. After approximate-aware retraining, recovery varies substantially across architectures, topologies, and multipliers. ResNet-20 achieves full recovery across the entire multiplier range, whereas VGG architectures recover at moderate multipliers but fail irreversibly at aggressive ones for all topologies except Cluster MoE on VGG11_bn; on ViT-Small, Hard MoE outperforms Dense under aggressive approximation at equal normalized inference cost. These results pave the way for future approximate MoE hardware-software co-design strategies.

[369]  arXiv:2605.04757 [pdf, ps, other]
Title: 3D Printing of Passively Actuated Self-Folding Robots with Integrated Functional Modules
Comments: 8 pages, 10 figures. Accepted at ICRA 2026
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

We introduce an elastic-driven self-folding approach that fabricates robots directly from flat 3D-printed conductive PLA nets. Elastic bands routed through printed hooks store energy that folds the sheet into programmed 3D geometries, while the flat state allows accurate placement of electronics and magnets before deployment. The same substrate doubles as electrodes for capacitive touch and supports a reusable platform I/O palette with Hall sensors and eccentric rotating mass (ERM) motors for docking detection and vibration actuation. We also derive a closed-form folding model that balances hinge stiffness with elastic band moment to predict equilibrium fold angles; experiments validate the model and yield a design map linking hinge thickness, band size, and hook spacing to target angles. Using this workflow we realize multiple polyhedral modules and demonstrate three applications: a cube that highlights the potential of self-folding for scalable modular robot collectives, a deployable gripper, and a tendon-driven finger. The method is low cost, stimulus-free, and integrates actuation and sensing.

[370]  arXiv:2605.04759 [pdf, ps, other]
Title: Gyan: An Explainable Neuro-Symbolic Language Model
Comments: also submitted to NeurIPS 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)

Transformer-based pre-trained large language models have become ubiquitous. There is increasing evidence to suggest that, even with large-scale pre-training, these models do not capture the complete compositional context, and certainly not the full human-analogous context. Moreover, by the very nature of the architecture, these models hallucinate, are difficult to maintain, are not easily interpretable, and require enormous compute resources for training and inference. Here, we describe Gyan, an explainable language model based on a novel non-transformer architecture, without any of these limitations. Gyan achieves SOTA performance on three widely cited data sets and superior performance on two proprietary data sets. The novel architecture decouples the language model from knowledge acquisition and representation. The model draws on rhetorical structure theory, semantic role theory, and knowledge-based computational linguistics. Gyan's meaning representation structure captures the complete compositional context and attempts to mimic humans by expanding the context to a 'world model'. AI model adoption critically depends on trust and transparency, especially in mission-critical use cases. Collectively, our results demonstrate that it is possible to create models that are trustworthy and reliable for mission-critical tasks. We believe our work has tremendous potential for guiding the development of transparent and trusted architectures for language models.

[371]  arXiv:2605.04760 [pdf, ps, other]
Title: AFL-ICP: Enhancing Industrial Control Protocol Reliability via Specification-Guided Fuzzing
Comments: 11 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Software Engineering (cs.SE)

Industrial Control Protocols (ICPs) are critical to the reliability and stability of industrial infrastructure, yet their security is fundamentally compromised by a specification-blindness bottleneck. Modern fuzzers, constrained by observation-driven inference, struggle to penetrate deep protocol states or detect subtle semantic deviations. In this paper, we present AFL-ICP, an autonomous fuzzing framework that pioneers a specification-driven paradigm. AFL-ICP features a context-aware specification formalization pipeline to transform complex specifications into rigorous machine-executable grammars. Building on this formalized specification, AFL-ICP leverages LLMs to enable automated protocol adaptation and seed generation, allowing for rapid extension to new protocols with minimal manual effort. Additionally, it includes an LLM-powered differential checker that cross-references implementation outputs with specification requirements to detect subtle semantic and logic bugs that existing fuzzers cannot detect. We implement AFL-ICP and evaluate it on four widely used ICPs, including both open-source and closed-source variants. Results show that AFL-ICP significantly outperforms state-of-the-art fuzzers in coverage and uncovers 24 previously unknown vulnerabilities, for which we have received acknowledgments from affected vendors (e.g., FreyrSCADA). Specifically, the identified vulnerabilities include 16 semantic and logic bugs that can silently disrupt industrial operations and degrade service availability.

[372]  arXiv:2605.04761 [pdf, ps, other]
Title: Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop
Comments: 40 pages, 5 figures, 20 tables, 1 algorithm, 10 listings
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

This paper presents the Personalized Thinking Model (PTM), a hierarchical and interpretable learner representation designed for AI-supported education. PTM organizes evidence from learner journals into a five-layer structure covering behavioral instances, behavioral patterns, cognitive routines, metacognitive tendencies, and self-system values. PTM is grounded in Marzano's New Taxonomy of Educational Objectives and aims to replicate a learner's thinking model in order to build a cognitive twin. It is constructed using a pipeline that combines large language model inference (Gemini 2.5 Pro), sentence embeddings, dimensionality reduction, and consensus clustering. This paper evaluates PTM fidelity through three methods applied to 40 participants in a seven-week study. First, automatic evaluation using atomic information point matching yielded an overall F1 score of 74.57% before human-in-the-loop (HITL) refinement and 75.48% after refinement. Second, user evaluation using a Likert scale produced mean ratings of 4.26 and 4.30 on a five-point scale for the pre- and post-HITL conditions, respectively. Third, semantic alignment verification showed that topic coherence increased from 0.436 at the behavioral layer to 0.626 at the core value layer, while lexical overlap with journal vocabulary decreased from 0.114 to 0.007 across those same layers. These results suggest that the PTM produces outputs with acceptable fidelity, was generally perceived by users as reflecting their thinking, and showed a pattern consistent with semantic abstraction across layers.

[373]  arXiv:2605.04763 [pdf, ps, other]
Title: How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study
Comments: 10 pages, 7 figures
Subjects: Software Engineering (cs.SE)

Retrieval-augmented generation (RAG) pipelines for code completion rely on chunking to segment source files into retrievable units, yet chunking strategies are typically adopted without empirical justification, and practitioner recommendations are notably inconsistent. We present a controlled empirical study isolating the effect of chunking on code completion quality by crossing four representative strategies (Function, Declaration, Sliding Window, and cAST) with four retrievers, five generators, and nine parameter configurations on two benchmarks (RepoEval and CrossCodeEval), totaling 864 experimental settings. Our results reveal that chunking strategy has a statistically significant effect on RAG-based code completion. Contrary to intuition, chunking based on functions underperforms all other strategies by 3.57 to 5.64 percentage points on RepoEval (Cliff's delta = -1.0), while the remaining chunking strategies perform comparably. Our further analysis demonstrates that this observation holds across all retriever-generator combinations. We also find that cross-file context length is the dominant parameter: increasing it from 2,048 to 8,192 tokens yields up to 4.2 percentage points of improvement, whereas chunk size has a weaker, non-monotonic effect. On the cost-quality Pareto front, Sliding Window and cAST dominate both benchmarks; Function chunking is never Pareto-optimal.

[374]  arXiv:2605.04764 [pdf, ps, other]
Title: Elicitation Matters: How Prompts and Query Protocols Shape LLM Surrogates under Sparse Observations
Subjects: Computation and Language (cs.CL)

Large language models are increasingly used as surrogate models for low-data optimization, but their optimizer-facing prediction and its uncertainty remain poorly understood. We study the surrogate belief elicited from an LLM under sparse observations, showing that it depends strongly on prompt text and query protocol. We introduce an uncertainty-alignment criterion that measures whether model uncertainty tracks residual ambiguity among sample-consistent functions. Across controlled inference tasks and Bayesian optimization studies, we find that structural prompts act as effective priors, POINTWISE and JOINT querying induce different beliefs, and sequential evidence leads to non-monotonic, order-sensitive confidence updates. These effects change downstream acquisition decisions and regret, showing that elicitation protocol is part of the LLM surrogate specification, not a formatting detail.

[375]  arXiv:2605.04765 [pdf, ps, other]
Title: Computational and Analytical Study of Variations and Generalizations of the FC-Gram Approximation Algorithm
Subjects: Numerical Analysis (math.NA)

The FC-Gram algorithm approximates non-periodic functions to high order by constructing a periodic extension with controlled boundary behavior and applying trigonometric interpolation. In this paper we introduce a generalized FC-Gram framework (GenFC), which provides greater flexibility in the construction of the blending continuation of Gram polynomials. This flexibility gives better control over the shape of the periodic extension and leads to improved approximation accuracy. We establish a convergence theorem showing that the trigonometric interpolant converges at the rate $\mathcal{O}(n^{-\min(r+\beta,\,d)})$ in the supremum norm on the original interval, where $r$ is the smoothness of the target function, $d$ the number of Gram polynomials, and $\beta \in [0,1]$ a Fourier-decay parameter. The framework and its analysis are developed so that the modified FC-Gram method of [J. Sci. Comput., 105(1):8, 2025] is recovered as a particular case. Numerical experiments confirm the predicted convergence rates and show that the added flexibility of the GenFC framework leads to improved approximation accuracy, with the gains carrying over to a Fourier continuation solver for two-point boundary value problems.

[376]  arXiv:2605.04768 [pdf, ps, other]
Title: From open-loop representations to closed-loop feedback implementations in differential games: A numerical case study
Subjects: Systems and Control (eess.SY)

Solutions to pursuit-evasion and surveillance-evasion differential games are typically computed and expressed using open-loop representations, with the synthesis of feedback strategies significantly less common. We propose a numerical scheme for obtaining feedback strategies for the recently introduced prying-pedestrian surveillance-evasion differential game. The scheme involves computing feedback strategies as input-output maps approximated via neural networks trained using data obtained from open-loop representations of solutions. Simulations show the effectiveness of neural networks trained with an appropriate learning-loss function. Since optimal feedback strategies are discontinuous, as a second contribution, the potential loss/gain of individual players is subsequently studied for players using sample-and-hold feedback compared to continuous-time feedback.

[377]  arXiv:2605.04769 [pdf, ps, other]
Title: Lightweight Cross-Spectral Face Recognition via Contrastive Alignment and Distillation
Comments: Accepted in IEEE TBIOM
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Heterogeneous Face Recognition (HFR) aims at matching face images captured across different sensing modalities, such as thermal-to-visible or near-infrared-to-visible, enhancing the usability of face recognition systems in challenging real-world conditions. Although recent HFR methods have achieved significant improvements in performance, many rely on computationally expensive models, making them impractical for deployment on resource-limited edge devices. In this work, we introduce a lightweight yet effective HFR framework by adapting a hybrid CNN-Transformer model originally developed for RGB homogeneous face recognition. Our approach enables efficient end-to-end training with only a small amount of paired heterogeneous data, while still maintaining strong performance on standard RGB face recognition benchmarks. This makes it suitable for both homogeneous and heterogeneous settings. Comprehensive experiments on several challenging HFR and face recognition benchmarks show that our method achieves state-of-the-art or competitive performance while keeping computational requirements low.

[378]  arXiv:2605.04770 [pdf, ps, other]
Title: Gaze4HRI: Zero-shot Benchmarking Gaze Estimation Neural-Networks for Human-Robot Interaction
Comments: Accepted to the 2026 IEEE International Conference on Automatic Face and Gesture Recognition (FG 2026)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)

While zero-shot appearance-based 3D gaze estimation offers significant cost-efficiency by directly mapping RGB images to gaze vectors, its reliability in Human-Robot Interaction (HRI) settings remains uncertain. Existing benchmarks frequently overlook fundamental HRI conditions, such as dynamic camera viewpoints and moving targets in video. Furthermore, current cross-dataset evaluations often suffer from a complexity gap, where methods trained on diverse datasets are tested on significantly smaller and less varied sets, failing to assess true robustness. To bridge these gaps, we introduce Gaze4HRI, a large-scale dataset (50+ subjects, 3,000+ videos, 600,000+ frames) designed to evaluate state-of-the-art performance against critical HRI variables: illumination, head-gaze conflict, as well as the motion of camera and gaze target in video. Our benchmark reveals that all evaluated methods fail in at least one condition, identifying steeply-downward gaze as a universal failure point. Notably, PureGaze trained on the ETH-X-Gaze dataset uniquely maintains resilience across all other conditions. These results challenge the recent focus in the literature on complex spatial-temporal modeling and Transformer-based architectures. Instead, our findings suggest that extensive data diversity, as exemplified by the ETH-X-Gaze dataset, serves as the primary driver of zero-shot robustness in unconstrained environments, while resilience-enhancing frameworks, such as PureGaze's self-adversarial loss for gaze feature purification, provide a substantial further improvement. Ultimately, this study establishes a rigorous benchmark that provides practical guidelines for practitioners as well as reshaping future research. The dataset and codes are available at https://gazeforhri.github.io.

[379]  arXiv:2605.04772 [pdf, ps, other]
Title: MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education
Comments: Accepted at the Workshop on Applications of Medical AI (AMAI 2025), in conjunction with MICCAI 2025
Journal-ref: Workshop on Applications of Medical AI (AMAI 2025), MICCAI 2025, pp 103-112, 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Access to diverse, well-annotated medical images with interactive learning tools is fundamental for training practitioners in medicine and related fields to improve their diagnostic skills and understanding of anatomical structures. While medical atlases are valuable, they are often impractical due to their size and lack of interactivity, whereas online image search may provide mislabeled or incomplete material. To address this, we propose MIRAGE, a multimodal medical text and image retrieval and generation system that allows users to find and generate clinically relevant images from trustworthy sources by mapping both text and images to a shared latent space, enabling semantically meaningful queries. The system is based on a fine-tuned medical version of CLIP (MedICaT-ROCO), trained with the ROCO dataset, obtained from PubMed Central. MIRAGE allows users to give prompts to retrieve images, generate synthetic ones through a medical diffusion model (Prompt2MedImage) and receive enriched descriptions from a large language model (Dolly-v2-3b). It also supports a dual search option, enabling the visual comparison of different medical conditions. A key advantage of the system is that it relies entirely on publicly available pretrained models, ensuring reproducibility and accessibility. Our goal is to provide a free, transparent and easy-to-use didactic tool for medical students, especially those without programming skills. The system features an interface that enables interactive and personalized visual learning through medical image retrieval and generation. The system is accessible to medical students worldwide without requiring local computational resources or technical expertise, and is currently deployed on Kaggle: this http URL

[380]  arXiv:2605.04773 [pdf, ps, other]
Title: AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC
Subjects: Graphics (cs.GR); Performance (cs.PF)

Implicit time integration is key to robustly simulating stiff materials and large deformations, but its performance is often dominated by repeatedly solving large linear systems. Adaptive coarsening can reduce this cost by concentrating degrees of freedom (DoF) to where it is most needed, yet conventional explicit remeshing changes connectivity (and often vertex ordering), complicating parallel implementations, harming memory locality, and sometimes being disallowed when it may introduce local geometry intersections. Adaptive subspace approaches avoid topological changes, but basis construction and updates incur irregular data access patterns and typically produce dense system matrices, limiting GPU efficiency and keeping many practical systems CPU-centric. We present algebraic adaptive in-solve coarsening, a GPU-oriented method that dynamically reduces DoF within the Newton solve of implicit time integration without explicit topological modification. Starting from a fine mesh, we express adaptivity as a selective edge-collapse process governed by per-edge tags. Collapsible edges are aggregated in parallel using a warp-level hash mapping scheme that groups fine vertices into coarse super-nodes, while protected edges preserve local detail. This defines an implicit coarse mesh whose linear system is assembled algebraically by mapping and reducing fine-scale gradients and Hessians via efficient GPU reduction kernels. We solve the resulting coarse system with a preconditioned conjugate gradient (PCG) method and then prolongate the solution back to the fine mesh. Our approach integrates seamlessly with IPC's barrier energy and exploits GPU parallelism end-to-end. Across a range of challenging scenarios, we achieve up to 3x speedup over a state-of-the-art GPU IPC solver while producing visually indistinguishable results.

[381]  arXiv:2605.04777 [pdf, ps, other]
Title: Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents
Subjects: Multiagent Systems (cs.MA)

Autonomous Earth Observation (EO) agents are transitioning from passive perception to complex, multi-step task execution. However, current architectures that integrate planning and execution within a single model often struggle with combinatorial complexity and reasoning errors in dynamic EO scenarios. To resolve these challenges, we propose the Lightweight Multimodal Meta-Planner (LMMP) framework. LMMP incorporates a dual-awareness mechanism that grounds strategic plans in both multimodal image features and high-level task semantics. Crucially, we introduce a Meta Task Library to inject remote sensing expert knowledge directly into the workflow, which standardizes domain logic and ensures plans are physically feasible. We further implement a two-stage training pipeline, initializing the Meta-Planner via expert-distilled Supervised Fine-Tuning and refining it through Direct Preference Optimization based on execution feedback. Extensive experiments on a dataset derived from EarthBench and ThinkGeo demonstrate that LMMP significantly improves tool-calling accuracy and task success rates. Moreover, the framework exhibits strong "plug-and-play" versatility, consistently enhancing the performance of diverse executor backbones across previously unseen EO missions.

[382]  arXiv:2605.04778 [pdf, ps, other]
Title: Steady Incremental Viscosity Splitting Method for solving the stationary Navier-Stokes equation
Subjects: Numerical Analysis (math.NA)

We develop a novel and efficient iterative scheme for solving incompressible steady Navier-Stokes equations. The method is an adaptation of the Incremental Viscosity Splitting approximation for unsteady flows to steady equations. At each nonlinear iteration, the scheme requires solving an elliptic PDE for the velocity variable and a system with an SPD matrix for the pressure variable, which remains the same across all nonlinear iterations. The method can also be interpreted as an algebraic splitting approach. We prove boundedness and geometric convergence. Numerical tests illustrate the efficiency of the proposed algorithm.

[383]  arXiv:2605.04779 [pdf, ps, other]
Title: A meta-analysis of the effect of generative AI on productivity and learning in programming
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)

Generative artificial intelligence (GenAI) is increasingly used for programming, yet it remains unclear when and where GenAI tools lead to productivity gains. Evidence on the effects of GenAI on the long-term development of programming skills is similarly mixed. Here, we present a meta-analysis of $n = 23$ studies reporting $k = 27$ effect sizes to quantify the effect of GenAI-powered coding assistants on productivity and learning. We systematically searched (i) ACM, (ii) arXiv, (iii) Scopus, and (iv) Web of Science for studies published between 2019 and 2025. Studies were required to compare GenAI-assisted with unassisted programming using quantitative measures of (1) productivity (i.e., task completion time, commits, and lines of code) and (2) learning (i.e., exam performance). We assessed the risk of bias using RoB2 and ROBINS-I and compared standardized effect sizes using Hedges' $g$. We find a statistically significant, but moderate positive effect of GenAI assistance on developer productivity ($g = 0.33$, $95\%$ CI: $[0.09, 0.58]$), yet with substantial heterogeneity across settings. Notably, productivity gains tend to be larger in controlled experimental settings, while effects are smaller in open-source and enterprise contexts. In contrast, we find no statistically significant effect of GenAI assistance on learning outcomes ($g = 0.14$, $95\%$ CI: $[-0.18, 0.47]$). Overall, these results highlight that GenAI coding assistants can increase developer productivity, although these gains depend strongly on context. In educational settings, however, the use of GenAI does not consistently translate into improved learning or skill development, which highlights the need for careful integration of GenAI into computer science education.
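
For readers unfamiliar with the effect-size measure used above, the following is a minimal sketch of how Hedges' $g$ is commonly computed for two independent groups. The function name `hedges_g` and the input numbers are hypothetical and are not taken from the meta-analysis. The correction factor uses the usual approximation $J = 1 - 3/(4\,df - 1)$.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with a small-sample bias correction."""
    df = n_t + n_c - 2
    # Pooled standard deviation of the treatment and control groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd       # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)        # approximate correction factor J
    return j * d


# Hypothetical study: assisted group scores 10 (SD 2), unassisted 8 (SD 2), 20 each.
g_example = hedges_g(10.0, 8.0, 2.0, 2.0, 20, 20)
```

Because $J < 1$, the corrected value is always slightly smaller in magnitude than Cohen's $d$, and the difference shrinks as the sample sizes grow.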

[384]  arXiv:2605.04785 [pdf, ps, other]
Title: AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
Authors: Chenglin Yang
Comments: 31 pages, 2 figures, 15 tables; preprint
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action, including accidental deletion, credential exposure, or data exfiltration, can cause irreversible harm. Existing defenses are incomplete: post-hoc benchmarks measure behavior after execution, static guardrails miss obfuscation and multi-step context, and infrastructure sandboxes constrain where code runs without understanding what an action means.
We present AgentTrust, a runtime safety layer that intercepts agent tool calls before execution and returns a structured verdict: allow, warn, block, or review. AgentTrust combines a shell deobfuscation normalizer, SafeFix suggestions for safer alternatives, RiskChain detection for multi-step attack chains, and a cache-aware LLM-as-Judge for ambiguous inputs.
We release a 300-scenario benchmark across six risk categories and an additional 630 independently constructed real-world adversarial scenarios. On the internal benchmark, the production-only ruleset achieves 95.0% verdict accuracy and 73.7% risk-level accuracy at low-millisecond end-to-end latency. On the 630-scenario benchmark, evaluated under a patched ruleset and not claimed as zero-shot, AgentTrust achieves 96.7% verdict accuracy, including about 93% on shell-obfuscated payloads. AgentTrust is released under the AGPL-3.0 license and provides a Model Context Protocol server for MCP-compatible agents.

[385]  arXiv:2605.04786 [pdf, ps, other]
Title: Superconvergence in finite element method by smoothing
Subjects: Numerical Analysis (math.NA)

This paper develops a smoothing-based postprocessing method for superconvergence in finite element methods. The method applies a few smoothing iterations, such as damped Jacobi, Gauss-Seidel, or conjugate gradient, with initial guess being the current finite element solution embedded in an enriched finite element space. The resulting procedure is algebraic, easy to implement, and applicable to high-order and three-dimensional discretizations. For symmetric and positive-definite problems, we prove superconvergence of the smoothed solutions under additive and multiplicative smoothers. Effectiveness of the proposed method is demonstrated by numerical experiments for the Poisson, Maxwell, biharmonic and Helmholtz equations.

[386]  arXiv:2605.04787 [pdf, ps, other]
Title: Long-Term Risks of IoT Devices: The Case of the Smart Fridge
Authors: Erik Buchmann
Journal-ref: BUCHMANN, Erik. Long-Term Risks of IoT Devices: The Case of the Smart Fridge. In: Proceedings of the 17th Conference on Digital Society (ICDS'23), 2023
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Replacing conventional devices with smart ones has many advantages, e.g., a seamless integration of physical objects into the user's digital environment or improved modes of use. However, if a conventional device is replaced by a smart device, its IT components can cause risks that shorten the life of the device. Such risks stem from different life cycles of embedded soft- and hardware, libraries and protocols used, and the IT ecosystem required. This is problematic, because many conventional household appliances, say, a fridge or TV, have a much longer life span than typical IT equipment. In this paper, we use a systematic approach to identify long-term risks for the operational life span of a smart fridge. In particular, we identify 8 different use cases of three typical smart fridges, e.g., cooling or managing "best before" dates. We model the IT ecosystem needed to run these use cases, and we inspect each asset in this ecosystem for potential long-term risks. We found that even cooling, the most basic use case, is at risk in the long run. This is because setting the cooling parameters may depend on parts of the IT ecosystem that are not under the user's control. On the other hand, we did not find any risk that may lead to harm of the category "threatening". Our findings on the smart fridge can be generalized to other smart devices easily.

[387]  arXiv:2605.04788 [pdf, ps, other]
Title: Equilibrium points and stability of synchronous machine systems
Subjects: Systems and Control (eess.SY)

This paper investigates equilibrium points and stability in two synchronous machine configurations: (i) a single generator with an impedance load and (ii) two interconnected machines with co-located loads. We consider both abc and dq reference frames to show that the equilibrium condition reduces to a cubic polynomial in the single-machine case and to an 18th-degree polynomial in the two-machine case. For the single-machine system, Lyapunov stability analysis and linearization-based stability analysis are carried out. For the two-machine system, local stability is assessed through linearization and eigenvalue analysis. Illustrative examples confirm the existence of multiple equilibria and illustrate the impact of parameter variation on stability. Our results provide insight into the stability of synchronous machine systems.

[388]  arXiv:2605.04791 [pdf, ps, other]
Title: OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches
Subjects: Human-Computer Interaction (cs.HC)

Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commercial smartwatch. It contains over 10 hours of Inertial Measurement Unit (IMU) and Photoplethysmography (PPG) data across 50 participants and a vocabulary of 59 labelled gesture sequences. Furthermore, we present a subject-independent evaluation protocol including traditional and deep learning methods for time-series classification. On top of this, we develop two novel methodologies for hand-gesture recognition: (i) MixToken, a task-specific mixture-of-experts that fuses per-channel IMU filterbank features with cross-channel statistical tokens through learned logit mixing, and (ii) NormWear-Lora, a low-rank adaptation module for smartwatch foundation models. Our benchmarking results reveal that PPG signals carry a substantial predictive benefit (+12.5% F1-score) for foundational smartwatch models. In addition, we show that task-specific architectures (i.e., MixToken) substantially outperform finetuned smartwatch foundation models in terms of accuracy (F1-score=90% vs 66%) and memory efficiency (223k vs 136M parameters). Finally, we also provide clear empirical guidance on the trade-offs between specialized architecture design, modality fusion, data augmentations, and foundation-model adaptation for resource-constrained wearable sensing.

[389]  arXiv:2605.04793 [pdf, ps, other]
Title: Bilinear Mamba-Koopman Neural MPC for Varying Dynamics
Comments: 18 pages, 5 figures. Preprint
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Koopman-based neural MPC models generate time-varying dynamics from historical data, but preserve convexity by enforcing that the system operator is independent of the current control input. This conditional independence constraint limits adaptation to changing dynamics within a single MPC horizon, particularly under time-varying conditions and under stale-plan execution.
We propose Bilinear Mamba-Koopman Neural MPC, a minimal extension that introduces control-dependent coupling in the latent dynamics, allowing the effective operator to adapt to the current input. The resulting model is a strict generalization of the standard linear, conditional-independence formulation, adds less than 1% parameters through a low-rank structure, and admits exact model Jacobians that enable efficient Sequential Convex Programming (SCP) with monotone-descent and KKT convergence results under standard trust-region assumptions.
Across CartPole and RSCP benchmarks in time-invariant and time-varying regimes, the proposed model matches or improves forecasting accuracy on every cell when training noise is averaged out, with strict gains where control-state coupling is structurally present. Its main closed-loop gains appear in the RSCP TV task, where iterative SCP improves adaptation within the horizon and substantially stabilizes training; in CartPole TV, the gains are modest but consistent. In delayed re-planning experiments on the time-varying variants, the bilinear model degrades more gracefully under stale-plan execution, maintaining a consistent advantage on CartPole TV and a substantially larger robustness margin on RSCP TV. These results show that control-dependent latent dynamics provide a simple and effective mechanism for robust MPC under varying conditions.

[390]  arXiv:2605.04794 [pdf, ps, other]
Title: Distance Distributions Between Nodes in Concentric Disk-Annulus or Sphere-Shell Regions
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This letter derives closed-form expressions for the probability density function of the distance between two nodes located in heterogeneous concentric geometries, namely a disk or sphere and a surrounding annulus or spherical shell. Two scenarios are considered: (i) both nodes are independently distributed in different regions, disk or sphere and annulus or shell, and (ii) one node is static in the outer region while the other follows the stationary distribution of the random waypoint model in the inner region. The resulting expressions provide a tractable analytical tool for performance evaluation in concentric wireless regions.

[391]  arXiv:2605.04796 [pdf, ps, other]
Title: Negative Imaginary and Passivity Properties of Synchronous Machine Systems
Subjects: Systems and Control (eess.SY)

The recent rapid proliferation of renewable energy is fundamentally changing the dynamic operations of power systems, necessitating new approaches to assess stability for these highly nonlinear systems. In this paper, we prove that synchronous machine systems, modeled in the nonlinear dq-frame, possess fundamental dissipativity properties. Specifically, we show passivity from current input to voltage output and a nonlinear negative imaginary property from torque input to rotor angle output. For the nonlinear system shifted around an equilibrium point, we derive explicit conditions for both passivity and the NI property to hold. Finally, we demonstrate that interconnection with passive droop controllers preserves these dissipativity properties with identical supply rates, thereby ensuring closed-loop stability.

[392]  arXiv:2605.04797 [pdf, ps, other]
Title: Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes
Comments: Accepted at ROMCIR 2026, the 6th Workshop on Reducing Online Misinformation through Credible Information Retrieval, held in conjunction with ECIR 2026
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Deepfakes are increasingly realistic and easy to produce, raising concerns about the reliability of human judgments in misinformation settings. We study audiovisual deepfake detection by measuring how consistently crowd workers distinguish authentic from manipulated videos and, when they flag a video as manipulated, how accurately they identify the manipulation type (audio-only, video-only, or audio-video) and how consistently they report manipulation timestamps. We run two matched crowdsourcing studies on Prolific using AV-Deepfake1M and the Trusted Media Challenge (TMC) dataset. We sample 48 videos per dataset (96 total) and collect 960 judgments (10 per video). Results show that crowd workers rarely misclassify authentic videos as manipulated, but they miss many manipulations, and agreement remains limited across videos. Aggregating multiple judgments per video stabilizes the authenticity signal, but it cannot recover manipulations that most workers consistently miss. Manipulation type identification is substantially noisier than authenticity detection even when workers detect a manipulation, with joint audio-video cases being particularly hard to recognize. Overall, these findings suggest that crowdsourcing can provide a scalable screening signal for audiovisual authenticity, while reliable modality attribution remains an open challenge.

[393]  arXiv:2605.04798 [pdf, ps, other]
Title: Online Orthogonal Vectors Revisited
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

We prove new upper and lower bounds for the Online Orthogonal Vectors Problem ($\mathsf{OnlineOV}_{n,d}$). In this problem, a preprocessing algorithm receives $n$ vectors $x_1,\ldots,x_n\in\{0,1\}^d$ and constructs a data structure of size $S$. A query algorithm subsequently receives a query vector $q\in\{0,1\}^d$ and in time $T$ decides whether $q$ is orthogonal to any of the input vectors $x_i$.
We design a new deterministic data structure for $\mathsf{OnlineOV}_{n,d}$. In low dimensions ($d = c \log n$), our data structure matches the performance of the best known randomized algorithm due to Chan [SoCG 2017]. Furthermore, in moderate dimensions ($d=n^{\varepsilon}$), we give the first improvement since Charikar, Indyk and Panigrahy [ICALP 2002]. Along the way, we give the first deterministic refutation of a conjecture on the hardness of $\mathsf{OnlineOV}$ posed by Goldstein, Lewenstein and Porat [ISAAC 2017]. This data structure also extends to a number of problems, including Partial Match, Orthogonal Range Search, and DNF Evaluation. We use a novel structure-versus-randomness decomposition to design our algorithm.
Under the Non-Uniform Strong Exponential Time Hypothesis, we also prove arbitrarily large polynomial space lower bounds for any $\mathsf{OnlineOV}$ data structure with sublinear query time even with computationally unbounded preprocessing. These lower bounds extend to several other problems, including Polynomial Evaluation, Partial Match, Orthogonal Range Search, and Approximate Nearest Neighbors. We also prove similar lower bounds for $\mathsf{3-SUM}$ with preprocessing under the Non-Uniform Hamiltonian Path Conjecture.
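
To make the problem statement concrete, here is a naive baseline sketch. It is not the data structure from this paper. It stores the $n$ input vectors as bitmasks (space proportional to $n$ machine words when $d$ fits in a word) and answers each query by a linear scan. The class name `NaiveOnlineOV` and the helper `pack` are hypothetical.

```python
from typing import List, Sequence


def pack(bits: Sequence[int]) -> int:
    """Pack a 0/1 vector into an integer bitmask (coordinate i -> bit i)."""
    mask = 0
    for i, b in enumerate(bits):
        if b:
            mask |= 1 << i
    return mask


class NaiveOnlineOV:
    """Preprocess n Boolean vectors; answer orthogonality queries by scanning."""

    def __init__(self, vectors: List[Sequence[int]]):
        self.masks = [pack(v) for v in vectors]

    def query(self, q: Sequence[int]) -> bool:
        """Return True if q is orthogonal to at least one stored vector."""
        qm = pack(q)
        # Two 0/1 vectors are orthogonal exactly when they share no 1-coordinate.
        return any((m & qm) == 0 for m in self.masks)


ds = NaiveOnlineOV([[1, 0, 1], [0, 1, 1]])
hit = ds.query([0, 1, 0])   # orthogonal to [1, 0, 1]
miss = ds.query([1, 1, 0])  # overlaps both stored vectors
```

The upper and lower bounds discussed in the abstract concern how far the query time can be pushed below this linear scan, and at what cost in space.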

[394]  arXiv:2605.04803 [pdf, ps, other]
Title: Not All Faults Are Equal: Transient-Fault Sensitivity Characterization of an Open-Source RISC-V Vector Cluster
Subjects: Hardware Architecture (cs.AR)

We present a transient-fault sensitivity study of the open-source RISC-V vector cluster Spatz under SET and SEU fault models. Across 100,000 fault injections on six MatMul and Widening MatMul configurations, faulty data corruption (FD) is the dominant manifesting outcome for all evaluated workloads, accounting for at least 86% of manifesting errors in the SET campaigns and at least 91% in the SEU campaigns. At the module level, SET sensitivity is concentrated in the vector execution path, while TCDM is the major contributor to FD manifestations. We further quantify SDC severity across FP32, FP16, BP16, and FP8 by analyzing both the average number of corrupted outputs and their RMSE. FP8 shows the lowest output impact overall, while FP16 Widening MatMul reduces both corruption spread and RMSE compared with FP16 MatMul. By contrast, the effect of widening on FP8 is limited in our experiments. Finally, exponent-targeted corruptions induce the most severe SDC events, with the largest deviations observed in FP32 and BP16, motivating selective protection of the highest-impact datapaths and fault cases.

[395]  arXiv:2605.04805 [pdf, ps, other]
Title: An Adaptive Finite Element Method Based on Generalized Barycentric Coordinates
Authors: Yihui Zhou, Yuwen Li
Subjects: Numerical Analysis (math.NA)

This work derives an a posteriori error estimate for polygonal finite element methods based on Wachspress barycentric coordinates. In particular, we prove that the classical residual-based a posteriori error estimator is both an upper and a lower bound for the discretization error. The analysis relies on a Scott-Zhang type interpolation and homogeneity arguments for rational functions on polygonal elements. Numerical experiments on square and L-shaped domains demonstrate the effectiveness of the adaptive algorithm.

[396]  arXiv:2605.04806 [pdf, ps, other]
Title: Dr-PoGO: Direct Radar Pose-Graph Optimization
Comments: Accepted for presentation at ICRA 2026 Cite as @inproceedings{legentil2026drpogo, title={Dr-PoGO: Direct Radar Pose-Graph Optimization}, author={{Le Gentil}, Cedric and Weican, Li and Brizi, Leonardo and Barfoot, Timothy D.}, booktitle={IEEE International Conference on Robotics and Automation (ICRA)}, year={2026} }
Subjects: Robotics (cs.RO)

This paper introduces Dr-PoGO, a method for Simultaneous Localization And Mapping (SLAM) using a 2D spinning radar. Unlike cameras or lidars that require line-of-sight, millimetre-wave radars can 'see' through dust, falling snow, rain, etc. Accordingly, radar is a well-suited modality for robust perception regardless of the weather conditions. While most existing radar-based SLAM methods rely on the extraction of point clouds or features to perform ego-motion estimation, Dr-PoGO leverages direct registration techniques for odometry (DRO) and loop-closure registration. An off-the-shelf radar-focused place recognition algorithm, RaPlace, provides loop-closure candidates. As RaPlace does not provide relative transformations, Dr-PoGO introduces a coarse-to-fine registration that uses visual features and descriptors to obtain an initial guess for the direct transformation refinement. The global trajectory is optimized in a pose-graph optimization. Dr-PoGO demonstrates state-of-the-art performance over 300 km of data in various real-world automotive environments. Our implementation is publicly available: https://github.com/utiasASRL/dr_pogo.

[397]  arXiv:2605.04808 [pdf, ps, other]
Title: DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
Comments: 279 pages, 148 figures
Subjects: Artificial Intelligence (cs.AI)

AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due to their high capability and flexibility, such agents raise significant security and safety concerns. A growing number of real-world incidents have shown that adversaries can easily manipulate agents into performing harmful actions, such as leaking API keys, deleting user data, or initiating unauthorized transactions. Evaluating agent security is inherently challenging, as agents operate in dynamic, untrusted environments involving external tools, heterogeneous data sources, and frequent user interactions. However, realistic, controllable, and reproducible environments for large-scale risk assessment remain largely underexplored. To address this gap, we introduce the DecodingTrust-Agent Platform (DTap), the first controllable and interactive red-teaming platform for AI agents, spanning 14 real-world domains and over 50 simulation environments that replicate widely used systems such as Google Workspace, Paypal, and Slack. To scale the risk assessment of agents in DTap, we further propose DTap-Red, the first autonomous red-teaming agent that systematically explores diverse injection vectors (e.g., prompt, tool, skill, environment, combinations) and autonomously discovers effective attack strategies tailored to varying malicious goals. Using DTap-Red, we curate DTap-Bench, a large-scale red-teaming dataset comprising high-quality instances across domains, each paired with a verifiable judge to automatically validate attack outcomes. Through DTap, we conduct large-scale evaluations of popular AI agents built on various backbone models, spanning security policies, risk categories, and attack strategies, revealing systematic vulnerability patterns and providing valuable insights for developing secure next-generation agents.

[398]  arXiv:2605.04809 [pdf, ps, other]
Title: Optimal Uncertainty-Aware Calibration for the AX=YB Problem
Comments: 23 pages, 26 figures, under review in IJRR
Subjects: Robotics (cs.RO)

This article proposes a general optimization framework for solving the hand-eye calibration problem. Unlike traditional methods, an iterative algorithm based on Lie algebra that achieves approximately global optimal solutions is developed. During the optimization process, the method strictly preserves the structural constraints of the calibration parameters and enables synchronized updates between calibration parameters. Data used in real-world hand-eye calibration often contain uncertainty, especially in over-loading and large-workspace industrial robot scenarios, and this uncertainty can significantly degrade accuracy. Because accurately modeling such uncertainty is inherently difficult, this article avoids explicit uncertainty modeling. Instead, an uncertainty metric to evaluate the relative uncertainty between data sources is introduced and used to dynamically refine the iterative process. To further enhance convergence efficiency, an effective initial solution generation method that improves overall stability and accuracy is designed. Numerical simulations and real-world experiments validate the effectiveness of the proposed approach. On synthetic datasets, the proposed approach improves the estimation accuracy by at least 67% under high-uncertainty conditions compared with the existing methods.

[399]  arXiv:2605.04811 [pdf, ps, other]
Title: Tree-based Credit Assignment for Multi-Agent Memory System
Subjects: Multiagent Systems (cs.MA)

Memory systems are widely adopted to enhance LLMs for long-horizon tasks, and are commonly organized as multi-agent pipelines with memory building, summarizing, and retrieval agents. To empower this system, existing RL-based methods either apply final downstream task rewards (e.g., QA accuracy) for all agents uniformly, which are coarse and ambiguous, or design task-specific rewards for agents on different subtasks, which require costly annotations (e.g., key evidence) and are difficult to define reliably. To address these limitations, we propose Tree-based Credit Assignment for Multi-Agent Memory Systems (TreeMem), which derives agent-specific credit from the final reward without task-specific annotations. Specifically, TreeMem extends the multi-agent pipeline (builder--summarizer--retrieval) into a tree structure, where each agent's outputs are expanded into multiple subsequent branches. The contribution of each agent is estimated via Monte Carlo averaging over its subsequent branches, capturing how intermediate agent actions may influence the final reward. This converts the coarse final reward into agent-specific optimization signals. These signals are then used to update all agent policies simultaneously, helping heterogeneous agents specialize effectively. Experiments on long-horizon benchmarks show that TreeMem improves memory system performance over strong baselines, validating the effectiveness of tree-structured credit assignment for the multi-agent memory system.
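
The Monte Carlo averaging step described above can be sketched on a toy tree. This is only an illustration of the general idea, not the TreeMem implementation. The nested-dictionary layout, the key names `children` and `reward`, and the helper names are assumptions, and the reward values are made up.

```python
from statistics import mean


def leaf_rewards(node):
    """Collect the final rewards of all leaves below a node."""
    if "reward" in node:
        return [node["reward"]]
    rewards = []
    for child in node["children"]:
        rewards.extend(leaf_rewards(child))
    return rewards


def branch_value(node):
    """Estimate a node's credit as the average final reward over its branches."""
    return mean(leaf_rewards(node))


# Hypothetical rollout: one builder output, two summarizer outputs,
# and two retrieval outputs per summary, each scored by the final task reward.
builder = {
    "children": [
        {"children": [{"reward": 1.0}, {"reward": 0.0}]},  # summarizer output A
        {"children": [{"reward": 0.0}, {"reward": 0.0}]},  # summarizer output B
    ]
}

summary_a, summary_b = builder["children"]
credit_a = branch_value(summary_a)
credit_b = branch_value(summary_b)
credit_builder = branch_value(builder)
```

In this toy example the two summarizer outputs receive different credit even though only the final reward is observed, which is the kind of agent-specific signal the abstract refers to.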

[400]  arXiv:2605.04813 [pdf, ps, other]
Title: A Biased Nonnegative Block Term Tensor Decomposition Model for Dynamic QoS Prediction
Subjects: Machine Learning (cs.LG)

With the rapid development of cloud computing and Web services, Quality of Service (QoS) has become a key criterion for service selection and recommendation. Tensor latent feature analysis provides an effective way to model multidimensional QoS data, and most existing QoS prediction methods are mainly based on Canonical Polyadic (CP) decomposition or Tucker decomposition. However, constrained by their inherent structural properties, these methods cannot accurately capture the complex and dynamic dependencies in user-service interactions, which limits their prediction performance. To address this issue, this paper proposes a dynamic QoS prediction framework based on the Biased Nonnegative Block Term Tensor Decomposition Model, termed BNBT. Specifically, the proposed framework is developed from three aspects: (1) block term tensor decomposition is employed to enhance the representation capability of latent feature learning; (2) linear bias terms are incorporated to further improve prediction accuracy; and (3) a tensor-oriented single-element-dependent nonnegative multiplicative update algorithm, called SLF-NMUT, is designed for efficient parameter estimation. Extensive experiments on real-world QoS datasets demonstrate that the proposed BNBT framework consistently outperforms several state-of-the-art QoS prediction methods in terms of prediction accuracy.

[401]  arXiv:2605.04816 [pdf, ps, other]
Title: Building AI Companions that Prioritise Learning over Performance
Subjects: Human-Computer Interaction (cs.HC)

Large language models (LLMs) are rapidly transforming knowledge work by improving the quality and efficiency of tasks such as writing, coding, and data analysis. However, their growing use in education has exposed a learning-performance paradox: while they can enhance short-term task performance, they may also undermine genuine learning, including cognitive growth, knowledge transfer, and metacognitive development. This paper addresses the question of how artificial intelligence should be designed and used to support learning rather than merely improve immediate outputs. We introduce the concept of AI learning companions, defined as adaptive, pedagogically informed, LLM-powered agents designed for integration into learning environments. We propose a framework for their design built on three interrelated foundations: a pedagogical foundation focused on how students learn with AI, an adaptive foundation focused on how AI learns about students, and a responsible design foundation ensuring systems remain transparent, accountable, inclusive, and secure. The framework is illustrated through five case studies spanning diverse educational contexts, levels, and tool designs, revealing both the promise and current limitations of existing tools. We conclude that there is a necessary shift away from LLMs designed for task-oriented performance, and beyond simply prompting them to act as tutors, toward deliberately developed AI learning companions that are pedagogically sound, adapt to their learners, and foster durable understanding, metacognitive growth, and learner agency.

[402]  arXiv:2605.04819 [pdf, ps, other]
Title: Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs
Comments: Accepted at ICML 2026. Camera-ready version coming soon
Subjects: Machine Learning (cs.LG)

Graph neural networks have been widely used in Boolean satisfiability (SAT) tasks to learn structural information from SAT formulas. The goal of these studies is to solve SAT instances or to enhance SAT solvers, including tasks such as unsat-core prediction. However, most existing approaches model a SAT formula as a bipartite graph or a directed acyclic graph, which are less expressive in capturing higher-order interactions among literals and clauses. Moreover, these approaches are limited in modeling intrinsic polarity-related properties of SAT, such as the complementary relationship between the positive and negative literals of a variable. To address these limitations, we propose a polarity-aware representation learning framework over clause-literal hypergraphs. We model SAT formulas as clause-literal hypergraphs augmented with a clause incidence graph to capture higher-order structural interactions. We then introduce a polarity-aware decomposed mechanism that separates variable representations into polarity invariant and equivariant components, explicitly modeling the relationship between positive and negative literals, with the resulting literal representations propagated along the hypergraph structure. We further incorporate a polarity-inversion consistency regularization to reinforce polarity-consistent representations during training. Experimental results on multiple SAT datasets demonstrate the effectiveness of the proposed approach.

[403]  arXiv:2605.04821 [pdf, ps, other]
Title: Toward less conservative distributed stability analysis of power systems via matrix-valued differential passivity indices
Comments: 18 pages, 9 figures
Subjects: Systems and Control (eess.SY)

Passivity indices have been widely adopted to derive distributed stability certificates for power systems. Nevertheless, conventional passivity indices remain scalar-valued even for multi-input-multi-output (MIMO) systems, which can introduce excessive conservatism and compromise analysis accuracy. To overcome these limitations, this paper extends the differential passivity index to a matrix-valued formulation that captures both channel-wise passivity properties and inter-channel coupling effects in MIMO subsystems. On this basis, semi-distributed and fully distributed stability criteria are developed for power systems with heterogeneous nonlinear devices. It is shown that system stability is guaranteed when the aggregate passivity excess of devices compensates for the passivity shortage imposed by the network. Furthermore, analytical passivity matrix expressions for typical power system components are derived, facilitating compositional stability analysis. Case studies on a three-bus system and a modified IEEE 118-bus system validate the effectiveness of the proposed framework.

[404]  arXiv:2605.04825 [pdf, ps, other]
Title: Improving FMQA via Initial Training Data Design Considering Marginal Bit Coverage in One-Hot Encoding
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech)

Factorization machine with quadratic-optimization annealing (FMQA) is a black-box optimization method that combines a factorization machine (FM) surrogate with QUBO-based search by an Ising machine. When FMQA is applied to integer or discretized continuous variables via one-hot encoding, uniform random initial sampling can leave many binary variables never active in the initial training data, and the corresponding FM parameters receive no direct gradient updates from the observed responses. We address this by designing the initial training data to achieve complete marginal bit coverage, namely, ensuring that every binary variable obtained by one-hot encoding takes the value one at least once. We use two space-filling sampling methods, Latin hypercube sampling (LHS) and the Sobol' sequence, yielding LHS-FMQA and Sobol'-FMQA. On the human-powered aircraft wing-shape optimization benchmark with 17 and 32 design variables, both proposed methods achieved numerically higher mean final cruising speeds than the baseline FMQA, with the advantage more pronounced on the 32-variable problem.
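
The coverage argument above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`stratified_levels`, `bit_coverage`), the four 8-level variables, and the sample count are all hypothetical. It uses a simplified Latin-hypercube-style column (each stratum is mapped to an integer level through its lower edge), which guarantees that every one-hot bit is active at least once whenever the number of samples is at least the number of levels, and compares it with uniform random sampling.

```python
import random

def stratified_levels(n_samples, n_levels, rng):
    """One LHS-style column: n_samples strata of [0, 1), each mapped to an integer level."""
    if n_samples < n_levels:
        raise ValueError("need n_samples >= n_levels for complete coverage")
    strata = list(range(n_samples))
    rng.shuffle(strata)
    # Stratum i covers [i/n, (i+1)/n); its lower edge maps to level floor(i * k / n).
    return [(i * n_levels) // n_samples for i in strata]

def bit_coverage(design, levels_per_var):
    """Fraction of one-hot bits that take the value one at least once in the design."""
    total_bits = sum(levels_per_var)
    hit = [0] * total_bits
    for row in design:
        offset = 0
        for level, n_levels in zip(row, levels_per_var):
            hit[offset + level] = 1  # the one-hot bit for this (variable, level) pair
            offset += n_levels
    return sum(hit) / total_bits

rng = random.Random(0)
levels_per_var = [8, 8, 8, 8]  # hypothetical: four variables with 8 levels each
n_samples = 8

lhs_columns = [stratified_levels(n_samples, k, rng) for k in levels_per_var]
lhs_design = [list(row) for row in zip(*lhs_columns)]
uniform_design = [[rng.randrange(k) for k in levels_per_var] for _ in range(n_samples)]

print("stratified coverage:", bit_coverage(lhs_design, levels_per_var))
print("uniform coverage:   ", bit_coverage(uniform_design, levels_per_var))
```

With uniform sampling, any bit that is never active contributes no training signal for its FM parameters, which is the gap the abstract describes; the stratified design closes it by construction.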

[405]  arXiv:2605.04826 [pdf, ps, other]
Title: Faster Algorithms for Shortest Unique or Absent Substrings
Comments: SWAT 2026
Subjects: Data Structures and Algorithms (cs.DS)

We revisit two well-known algorithmic problems on strings: computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string $S$ of length $n$. Both problems admit folklore $\mathcal{O}(n)$-time solutions using the suffix tree of $S$. However, for small alphabets, this complexity is not necessarily optimal in the word RAM model, where a string of length $n$ over alphabet $[0,\sigma)$ can be stored in $\mathcal{O}(n \log \sigma/\log n)$ space and read in $\mathcal{O}(n \log \sigma/\log n)$ time.
We present an $\mathcal{O}(n \log \sigma/\sqrt{\log n})$-time algorithm for computing a SUS of $S$. This algorithm decomposes the problem according to the length and the period of the sought substring and uses several tools and techniques, such as synchronizing sets, the analysis of runs, and wavelet trees, to reduce the computation of a SUS to a simple geometric problem. Further, we adapt this algorithm and combine it with an efficient construction of de Bruijn sequences in order to obtain an $\mathcal{O}(n \log \sigma/\sqrt{\log n})$-time algorithm for computing a SAS of $S$.

[406]  arXiv:2605.04827 [pdf, ps, other]
Title: Trustworthy Federated Label Distribution Learning under Annotation Quality Disparity
Subjects: Machine Learning (cs.LG)

Label Distribution Learning (LDL) models supervision as an instance-wise probability distribution, enabling fine-grained learning under inherent ambiguity, but its success relies on high-fidelity label distributions that are costly to obtain and thus often noisy. Motivated by privacy-sensitive applications, we study Federated Label Distribution Learning (Fed-LDL), where data isolation further induces heterogeneous annotation quality across clients, making local updates unevenly reliable and breaking sample-size-based aggregation (e.g., FedAvg). To address this trust dilemma, we propose FedQual, a quality-aware Fed-LDL framework with two coupled mechanisms: (i) quality-adaptive client training guided by a global semantic anchor that calibrates low-quality clients while preserving high-quality autonomy, and (ii) reliability-aware server aggregation that reweights client contributions by effective reliable information rather than raw sample size. To enable rigorous evaluation, we construct four new Fed-LDL benchmarks (FER-LDL, FI-LDL, PIPAL-LDL, and KADID-LDL) with controlled annotation quality disparity. We further provide a theoretical guarantee showing that under heterogeneous supervision quality, client-specific calibration is strictly better than any uniform calibration. Extensive experiments on the proposed benchmarks demonstrate the effectiveness of FedQual.

[407]  arXiv:2605.04828 [pdf, ps, other]
Title: Toward an Understanding of Developer Behaviour while Using Bug Localization Tools
Comments: 6 pages, 1 figure, accepted in International Conference on Evaluation and Assessment in Software Engineering (EASE), 2026 edition
Subjects: Software Engineering (cs.SE)

Bug fixing is a complex and time-consuming task in software development. Bug localization research tends to focus on the accuracy of automated tools that suggest source code files for developers to look at. However, little is known about how developers use these tools in practice. This paper reports on an ongoing qualitative user study. Eleven participants worked through four realistic bug localization tasks in a controlled environment and were given varying levels of support information offered by a specialized tool. Participants were asked to think aloud in a semi-structured interview session. The preliminary findings provide insight into three aspects of practice: how developers interact with tools, the role social and contextual information plays, and problem solving. The study demonstrates that bug localization is complex and suggests that the adoption of effective tools depends on more than their accuracy.

[408]  arXiv:2605.04829 [pdf, ps, other]
Title: Traffic Chunk Sizing vs. Optical Switching Speed in Future All-Optical Satellite Networks
Subjects: Networking and Internet Architecture (cs.NI)

To enable efficient resource utilization under stringent Size, Weight, and Power (SWaP) constraints through transparent, all-optically switched satellite transmission, various switching paradigms can be considered, including packet, burst, and circuit switching. To this end, the traffic assembly and algorithmic design for path computations at the ground stations play a key role in determining the switching fabric design. Generally, traffic can be buffered and assembled in chunks at the ground stations and forwarded over the pre-computed optical path in space, similar to terrestrial optical burst switching or fast circuit switching. Regardless of the chosen paradigm, the switching fabric must satisfy specific latency performance requirements. This paper studies the performance of all-optical satellite networks based on the maximum traffic chunk sizes that can be scheduled and the performance of optical switching fabrics in future all-optical constellations. We consider various optical switching technologies, including MEMS- and integrated photonic-based solutions, in the context of switching speed, power consumption, and insertion loss. Simulation results indicate that traffic chunk size critically impacts the performance required by optical switching fabrics onboard a satellite.

[409]  arXiv:2605.04830 [pdf, ps, other]
Title: Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models
Comments: 20 pages, 10 figures. Comments are welcome
Subjects: Machine Learning (cs.LG)

Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcate into different semantic minima of the energy landscape, whereas the nonlocality picture views the critical window as when local denoising fails. We study whether these two notions of phase transition are concurrent in modern diffusion transformers. By evaluating the dynamics and outcomes of the generation trajectory, we observe a near-simultaneous occurrence of the nonlocality and symmetry breaking critical times. Our work is the first to unify the two notions of phase transitions in practice: it provides a concrete diagnostic for when and why diffusion models rely on conditioning and global denoising, enabling principled evaluation of model efficiency and guiding the design of architectures and sampling schemes that avoid unnecessary computation.

[410]  arXiv:2605.04831 [pdf, ps, other]
Title: StoryAlign: Evaluating and Training Reward Models for Story Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences. A key reason is the absence of effective modeling of human story preferences, which are inherently subjective and under-explored. In this work, we systematically evaluate the modeling of human story preferences and introduce StoryRMB, the first benchmark for assessing reward models on story preferences. StoryRMB contains 1,133 high-quality, human-verified instances, each consisting of a prompt, one chosen story, and three rejected stories. We find existing reward models struggle to select human-preferred stories, with the best model achieving only 66.3% accuracy. To address this limitation, we construct roughly 100,000 high-quality story preference pairs across diverse domains and develop StoryReward, an advanced reward model for story preference trained on this dataset. StoryReward achieves state-of-the-art (SoTA) performance on StoryRMB, outperforming much larger models. We also adopt StoryReward in downstream test-time scaling applications for best-of-n (BoN) story selection and find that it generally chooses stories better aligned with human preferences. We will release our dataset, model, and code to facilitate future research. Related code and data are available at https://github.com/THU-KEG/StoryReward.

[411]  arXiv:2605.04832 [pdf, ps, other]
Title: Replay-Based Continual Learning for Physics-Informed Neural Operators
Subjects: Machine Learning (cs.LG)

Neural operators generally demonstrate strong predictive performance on in-distribution (ID) problems. However, a critical limitation of existing methods is their significant performance degradation when encountering out-of-distribution (OOD) data. To address this issue, this work introduces continual learning into physics-informed neural operators, with particular emphasis on neural operators built upon the Transolver architecture, and proposes a simple yet effective replay-based continual learning strategy. The proposed method is fully physics-informed and does not require labeled data, relying solely on input fields together with physical constraints for training. When new OOD data become available, a small number of past data are incorporated through a distillation-based constraint to preserve previously acquired knowledge and alleviate catastrophic forgetting. Meanwhile, a transfer learning LoRA is employed to enable rapid adaptation to the new data. The proposed framework is systematically validated on three representative physical problems, including the Darcy flow problem in fluid mechanics, a two-dimensional hyperelastic brain tumor problem in biomechanics, and a three-dimensional linear elastic Triply Periodic Minimal Surfaces problem in solid mechanics. The results demonstrate that the proposed method effectively mitigates catastrophic forgetting on previously learned data while maintaining fast adaptability to new data. Compared with conventional joint training strategies, the proposed method significantly improves training efficiency while reducing additional memory usage and computational cost.

[412]  arXiv:2605.04834 [pdf, ps, other]
Title: Bridging Input Feature Spaces Towards Graph Foundation Models
Comments: 33 Pages, 2 Figures, 26 Tables, ICLR 2026
Subjects: Machine Learning (cs.LG)

Unlike vision and language domains, graph learning lacks a shared input space, as input features differ across graph datasets not only in semantics, but also in value ranges and dimensionality. This misalignment prevents graph models from generalizing across datasets, limiting their use as foundation models. In this work, we propose ALL-IN, a simple and theoretically grounded method that enables transferability across datasets with different input features. Our approach projects node features into a shared random space and constructs representations via covariance-based statistics, thus eliminating dependence on the original feature space. We show that the computed node-covariance operators and the resulting node representations are invariant in distribution to permutations of the input features. We further demonstrate that the expected operator exhibits invariance to general orthogonal transformations of the input features. Empirically, ALL-IN achieves strong performance across diverse node- and graph-level tasks on unseen datasets with new input features, without requiring architecture changes or retraining. These results point to a promising direction for input-agnostic, transferable graph models.

[413]  arXiv:2605.04835 [pdf, ps, other]
Title: Patterns of Developer Adoption of LLM-Generated Code Refactoring Suggestions
Comments: Accepted to PROMISE 2026
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)

Large language models (LLMs) have gained widespread popularity and have steadily improved over time, enabling software developers to use them for various code-related tasks. One common task is code refactoring, where the LLM suggests changes for the developer to apply to their code to improve quality attributes such as readability or maintainability. While current research focuses on evaluating LLM-generated refactoring suggestions, there is a limited understanding of how developers apply these suggestions in practice. To explore this, we analyze 169 GitHub commits where developers refactor their code based on a ChatGPT conversation linked in the commit message. We found that developers mostly accept and use the suggestions without modifications. When changes are made, they are mostly major and fall into five different patterns that depend on the refactoring activity, the developer's prompt, and the validity of the response from ChatGPT.

[414]  arXiv:2605.04837 [pdf, ps, other]
Title: Shedding Light onto Safety Integrity Level and Basic Software Constraints in a Real-World Automotive Application: Case Study with Driverator Framework
Authors: Tobias Denzinger (CARIAD SE), Matthias Becker (KTH Royal Institute of Technology), Peter Ulbrich (TU Dortmund University)
Comments: 8 pages, 2 figures, 6 tables. Preprint. Driverator framework: this https URL
Subjects: Software Engineering (cs.SE); Operating Systems (cs.OS)

Automotive electronic control units (ECUs) are intricate systems with hundreds of individual functions, numerous software components, and multiple interdependent tasks. A prevalent structural pattern in these systems is the so-called cause-effect chain. While significant research efforts have been dedicated to the temporal analysis and optimization of these chains, particularly minimizing data age and function response times, other crucial non-functional properties remain relatively underexplored. In particular, the safety integrity level (SIL) classification substantially influences the system design by determining task colocation strategies. Improper sharing of functions or interweaving tasks with different safety levels can compromise the integrity of critical functions. Additionally, AUTOSAR basic software (BSW) (e.g., OS, runtime environment, communication stacks, or diagnostics) introduces complexity that varies based on task characteristics and SIL categories. Furthermore, memory requirements present another critical challenge, given the diversity of memory architectures and SIL-specific dependencies that strongly constrain task allocations. This paper thoroughly characterizes a real-world automotive application in terms of SIL constraints, the impact of basic software, and memory requirements. In this context, the Driverator configuration framework is introduced for scalable system analysis.

[415]  arXiv:2605.04839 [pdf, ps, other]
Title: Hearing the Ocean: Bio-inspired Gammatone-CNN framework for Robust Underwater Acoustic Target Classification
Subjects: Sound (cs.SD)

This study presents a bio-inspired signal processing framework for robust Underwater Acoustic Target Recognition (UATR). State-of-the-art methods often fail to resolve dense low-frequency harmonic structures in vessel propulsion signals under high-noise conditions. The proposed framework addresses this with a biologically inspired Gammatone filter bank that emulates the cochlea's nonlinear frequency selectivity. By distributing filters according to the Equivalent Rectangular Bandwidth (ERB) scale, the framework achieves a high-fidelity representation of engine-radiated tonals while effectively suppressing isotropic ambient interference. The resulting Cochleagram features are processed by a lightweight, custom-designed Convolutional Neural Network (CNN) that leverages large receptive fields to integrate spectral-temporal continuities. Experimental results on the VTUAD dataset demonstrate a state-of-the-art classification accuracy of 98.41%, outperforming Continuous Wavelet Transform and Mel Frequency Cepstral Coefficients baselines by 3.5% and 7.7%, respectively. Furthermore, the framework achieves an inference latency of only 0.77 ms and a Cohen's kappa score of 0.971, validating its efficacy for real-time deployment on autonomous, low-power sonar hardware.
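
As a rough illustration of what distributing filters on the ERB scale means, the sketch below spaces center frequencies uniformly in ERB-rate units using the widely used Glasberg and Moore approximation. The abstract does not state which ERB formula, frequency range, or filter count the authors use, so the formula choice and the values `f_min`, `f_max`, and `n_filters` are assumptions.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs below f_hz), Glasberg and Moore approximation."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz of the auditory filter centered at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_center_frequencies(f_min, f_max, n_filters):
    """Center frequencies equally spaced on the ERB-rate scale between f_min and f_max."""
    if n_filters < 2:
        raise ValueError("need at least two filters")
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]

# Hypothetical settings: 32 filters between 20 Hz and 8 kHz.
centers = erb_center_frequencies(f_min=20.0, f_max=8000.0, n_filters=32)
for fc in centers[:4]:
    print(f"center {fc:8.1f} Hz, bandwidth {erb_bandwidth(fc):6.1f} Hz")
```

Because the spacing is roughly logarithmic, the filters cluster at low frequencies, which is where the abstract locates the dense harmonic structure of propulsion signals.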

[416]  arXiv:2605.04842 [pdf, ps, other]
Title: Communication Offloading on SmartNIC DPUs: A Quantitative Approach
Comments: To appear in Euro-Par 2026
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

SmartNIC Data Processing Units (DPUs) offer a promising solution for saving high-end CPU resources by offloading tasks to programmable cores near the network interface. In this work, we explore the feasibility of SmartNIC DPUs in supporting an asynchronous communication model called "fire-and-forget", particularly its core message routing service. We design a communication offloading engine called Buddy that decouples communication tasks from the application process. Buddy runs flexibly on SmartNIC DPUs such as the Nvidia BlueField-3 DPU and generic x86 CPUs. Our evaluation results in five applications identify the memory-to-communication ratio as a key predictor of the offloading performance. Host-dominated workloads, such as Quicksilver and Sparse Matrix Transpose, achieved up to 1.55x speedup with communication offloaded to the DPU. We further identify a 625x increase in DRAM traffic due to the absence of Direct Cache Access support on the DPU, highlighting a critical need in future SmartNIC designs.

[417]  arXiv:2605.04843 [pdf, ps, other]
Title: Convergence analysis of Schwarz-like methods for degenerate elliptic-parabolic equations
Subjects: Numerical Analysis (math.NA)

Convergence is proven for Schwarz-like methods applied to degenerate elliptic-parabolic equations with a $p$-structure. This family of PDEs arises, for example, when modelling nonlinear diffusion processes. The Schwarz-like approximation methods are based on decomposing the space-time domain into overlapping subdomains, which enables parallel implementations. The methods are derived by introducing a pseudo-time component and applying time integrators of splitting type, which are time stepped towards infinity. This approach of decomposing the space-time domain is related to Schwarz waveform relaxation methods, but the methods considered here have the advantage that they can be proven to converge when applied to nonlinear parabolic, or even degenerate elliptic-parabolic, PDEs. We prove convergence by deriving a nonlinear framework based on the abstract theory for monotone operators and the existence theory for degenerate elliptic-parabolic equations.

[418]  arXiv:2605.04844 [pdf, ps, other]
Title: QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes
Comments: 6 pages, 4 figures. Accepted by ICIP 26
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

3D Gaussian Splatting (3DGS) has emerged as an advanced technique for real-time novel view synthesis by representing scene geometry and appearance using differentiable Gaussian primitives. However, efficiently computing precise Gaussian-tile intersections remains a critical task in the rasterization pipeline. To this end, we propose QuadBox, a method that leverages four axis-aligned bounding boxes to tightly encapsulate projected Gaussians in a discrete manner. First, we derive a geometry-aware stretching factor that enables the construction of a tile-aligned QuadBox, which covers the elliptical projection and largely excludes irrelevant tiles. Second, we introduce QPass, a single-pass tile traversal algorithm that exhaustively exploits the discrete nature of QuadBox, ensuring that the tile intersection check is performed with simple interval tests. Experiments on public datasets show that our method accelerates the rendering speed of 3DGS by 1.85$\times$. Code is available at https://github.com/Powertony102/QuadBox.

[419]  arXiv:2605.04845 [pdf, ps, other]
Title: Agentic Repository Mining: A Multi-Task Evaluation
Authors: Johannes Härtel
Comments: Accepted at the 30th International Conference on Evaluation and Assessment in Software Engineering (EASE 2026). 11 pages
Subjects: Software Engineering (cs.SE)

Mining software repositories often requires classifying artifacts like commits, reviews, code lines, or entire repositories into categories. Human labeling is expensive and error-prone; limited context frequently leads to misclassifications or uncertainty in labels. We investigate whether LLM agents that dynamically explore repositories through standard bash commands can match the classification quality of simple LLMs that receive pre-engineered context. Across four tasks, eight approach configurations, and 4943 classifications, agents achieve competitive accuracy despite retrieving their own context. The primary advantage is robustness: agents avoid context-window overflows and scale independently of artifact size. A manual diagnosis of 100 cases where approaches disagree with the ground truth reveals specification ambiguities and labels produced under limited context, suggesting that accuracy against such ground truth may underestimate approaches with broader context access.

[420]  arXiv:2605.04847 [pdf, ps, other]
Title: Quantile-Free Uncertainty Quantification in Graph Neural Networks
Comments: Accepted at the 43rd International Conference on Machine Learning (ICML 2026)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice. Moreover, achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR) to enable GNN-based UQ by directly optimizing coverage and interval width without requiring quantile inputs or post-processing. QpiGNN employs a dual-head architecture that decouples prediction and uncertainty, and is trained with label-only supervision through a quantile-free joint loss. This design allows efficient training and yields robust prediction intervals, with theoretical guarantees of asymptotic coverage and near-optimal width under mild assumptions. Experiments on 19 synthetic and real-world benchmarks show that QpiGNN achieves, on average, 22% higher coverage and 50% narrower intervals than baselines, while ensuring efficiency and robustness to noise and structural shifts.
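
The two quantities being traded off, coverage and interval width, are commonly evaluated as the empirical coverage rate and the mean interval width. The sketch below computes both for a set of prediction intervals; it illustrates the evaluation quantities only, not the paper's quantile-free joint loss, and the function name and toy data are hypothetical.

```python
def interval_metrics(y_true, lower, upper):
    """Return (empirical coverage, mean interval width) for prediction intervals."""
    if not y_true or not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    # A target counts as covered when it lies inside its interval (bounds inclusive).
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    coverage = covered / len(y_true)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y_true)
    return coverage, mean_width

# Toy node-level targets and intervals (made-up numbers, for illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 2.5, 2.0, 3.0]
upper = [1.5, 3.5, 4.0, 5.0]

coverage, mean_width = interval_metrics(y_true, lower, upper)
print(coverage, mean_width)  # 0.75 1.5
```

A method improves on a baseline in the sense of the abstract when it raises the first number toward the target level without inflating the second.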

[421]  arXiv:2605.04848 [pdf, ps, other]
Title: RTMS: A Real-Time Multimodal Scaffolding System for Improving Debugging in Computing Education
Subjects: Human-Computer Interaction (cs.HC)

Debugging is a demanding aspect of programming, yet guidance on how to teach it effectively remains limited. Novices often struggle to recognize impasses, regulate their problem solving, and manage cognitive load and stress. This study investigates whether real-time multimodal feedback, triggered by indicators of cognitive load and physiological stress, can improve debugging performance, narrow expert-novice gaps, and reduce the influence of prior programming experience on success. We conducted a between-subjects experiment with 120 undergraduate computer science students who debugged a medium-sized Python program. Participants were assigned to one of four conditions: no feedback, cognitive-load-triggered feedback, stress-triggered feedback, or combined-trigger feedback. Eye-tracking and heart rate variability data were used to detect moments of struggle and automatically deliver brief, context-sensitive hints. All three feedback conditions significantly improved debugging success and efficiency compared with the control group. Cognitive-load-triggered feedback produced stronger gains than stress-triggered feedback, and the combined-trigger condition yielded the largest improvements. Programming expertise predicted performance only in the control condition; in all feedback conditions, the novice-expert gap was markedly reduced. Adaptive feedback that responds to learners' cognitive and affective states can help manage debugging demands and reduce performance differences linked to prior experience, highlighting opportunities for physiologically aware adaptive learning environments.

[422]  arXiv:2605.04853 [pdf, ps, other]
Title: Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations
Authors: Zhangyong Liang
Subjects: Machine Learning (cs.LG)

We propose HIN-LRI, a hybrid framework that augments a classical numerical solver with a neural operator trained to correct the solver's structured truncation error. A base low-regularity integrator provides a consistent first-order approximation to nonlinear dispersive PDEs, while a lightweight neural network, operating on a low-dimensional latent manifold, learns the residual defect that analytical methods cannot close. An explicit time-step scaling on the neural correction ensures that its Lipschitz contribution remains $\mathcal{O}(\tau)$, yielding a Gronwall stability factor bounded uniformly in the step size and independent of the spatial resolution. The network is trained end-to-end through a solver-in-the-loop objective that unrolls the full iteration and penalises trajectory error in a Bourgain-type norm, aligning learning with multi-step solver dynamics rather than isolated one-step targets. Under stated assumptions, the global error satisfies $C(\varepsilon_{net}+\delta)\,\tau^\gamma\ln(1/\tau)$, where $\varepsilon_{net}$ measures the network approximation quality and $\delta$ the training shortfall. Experiments on three dispersive benchmarks with rough data show that HIN-LRI improves accuracy over analytical integrators, splitting methods, and neural PDE surrogates, with stable spatial refinement, effective out-of-distribution transfer, and modest online overhead.

[423]  arXiv:2605.04856 [pdf, ps, other]
Title: 3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance
Comments: 9 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Computed tomography (CT) is indispensable for clinical diagnosis and image-guided interventions but exposes patients to ionizing radiation, motivating the development of safer imaging alternatives. Ultrasound (US) is non-ionizing and widely accessible; however, it is highly operator dependent and lacks quantitative tissue characterization, often leading to diagnostic uncertainty and unnecessary CT examinations. This work presents a 3D ultrasound-derived pseudo-CT (UD-pCT) framework that generates CT-like anatomical reference volumes inferred from US, without aiming to reproduce physically accurate Hounsfield Units. Paired 3D kidney US and CT volumes from the TRUSTED dataset are first spatially aligned using a landmark-based multimodal registration pipeline, creating high-quality paired inputs for supervised training of an adversarial framework. The proposed Bottleneck Transformer Residual U-Net3D (BT-ResUNet3D) model employs a 3D residual encoder-decoder generator augmented with a transformer bottleneck, enabling effective modeling of fine-grained local anatomical structures as well as long-range volumetric dependencies, while a 3D Conditional PatchGAN discriminator enforces local structural realism in the synthesized pseudo-CT volumes. Quantitative evaluation using PSNR and SSIM demonstrates that the proposed method outperforms established baselines in structural fidelity and perceptual image quality. The UD-pCT volumes provide real-time anatomical reference for operator guidance, potentially reducing acquisition variability and unnecessary CT use. A limitation of this study is the relatively small paired dataset, which may limit the generalizability of the proposed model.

[424]  arXiv:2605.04857 [pdf, ps, other]
Title: Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although the study uses entry-level 60 Hz hardware (Tobii Pro Spark), we demonstrate that this sampling rate provides sufficient data density to detect macro-cognitive events such as fixations and regressions in reading. Preliminary analysis validates the dataset by revealing a strong inverse correlation between language proficiency and regressive eye movements. Integrated into the MIA (Modeling Idiomaticity in Human and Artificial Language Processing) initiative, this dataset serves as a cognitively grounded benchmark for evaluating both human processing models and the alignment of large language models with human-like figurative understanding.

[425]  arXiv:2605.04858 [pdf, ps, other]
Title: A Pragmatic Comparison of Cryptographic Computation Technologies for Machine Learning
Subjects: Cryptography and Security (cs.CR)

As security demands increase, the importance of secure computation technologies grows, yet these technologies can often seem overwhelming to practitioners. Furthermore, many approaches focus only on a single technology, potentially overlooking superior alternatives. This work aims to address the issue of selecting the right technology for secure computation by presenting a comparative analysis of two highly relevant cryptographic methods and their software implementations, with a particular focus on machine learning. First, we provide a theoretical summary and comparison of the secure computation paradigms of secure multi-party computation (SMPC) and fully homomorphic encryption (FHE). We outline the advantages and limitations of the protocols, as well as the relevant open-source software implementations. Second, we present the results of extensive benchmarking of the main software frameworks identified for machine learning operations and models. Regarding the current state of the art in FHE, we observe that it outperforms SMPC for regressions. Additionally, it may be faster for simple dense networks using GPUs or hybrid models. Conversely, SMPC showed superior performance for complex models such as CNNs. Our results should pave the way for more technology-agnostic benchmarking of secure computation technologies for machine learning, providing guidance for practitioners looking to adopt these technologies.

[426]  arXiv:2605.04863 [pdf, ps, other]
Title: Update-Magnitude State Redistribution (UM-SRD): A Shut-off Extension of Weighted SRD for Cut-Cell Methods
Authors: Justo E. Karell
Subjects: Numerical Analysis (math.NA)

Berger & Giuliani (2024) developed a provably stable weighted state redistribution (SRD) algorithm for cut-cell meshes. A key limitation of their method is that, although flux redistribution naturally vanishes when updates are small, SRD continuously applies redistribution even when the flux balance is zero, preventing exact steady-state preservation and potentially introducing unnecessary dissipation in smooth regions. This work introduces Update-Magnitude State Redistribution (UM-SRD), which blends the SRD operator with the identity operator via a smooth, locally-defined scalar indicator of the finite-volume update magnitude. UM-SRD preserves conservation and reduces exactly to the base scheme when the finite-volume update is exactly zero in a small-cell neighborhood. For a one-dimensional model problem with a single small cut cell, we prove UM-SRD is total variation diminishing under the same CFL condition as the base upwind scheme, show the local truncation error modification is higher-order in smooth regions with the unnormalized indicator, and show that the normalized implementation preserves first-order accuracy. Numerical experiments demonstrate convergence toward first order on smooth 1D and 2D advection tests, confirm shut-off behaviour, verify non-oscillatory properties, provide numerical evidence that UM-SRD stabilizes the base scheme near a small cut cell where the base scheme diverges, and confirm exact steady-state preservation. The algorithm reuses existing weighted SRD infrastructure, adding only a local blending mechanism, making it practical for cut-cell finite-volume codes.

[427]  arXiv:2605.04866 [pdf, ps, other]
Title: Phased Ultra Massive Array (PUMA)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper proposes a novel multiple-access framework, termed the phased ultra massive antenna array (PUMA), which exploits the distinctive spatial flexibility of fluid antenna systems (FAS) at the user equipment (UE). Building upon fluid antenna multiple access (FAMA) and compact ultra-massive antenna array (CUMA), PUMA incorporates a phased array for signal aggregation. This architecture enables the UE to inherently mitigate co-user interference within the spatial domain without necessitating channel state information (CSI) for precoding at the base station (BS) or complex interference cancellation at each UE. A primary advantage of PUMA lies in its hardware efficiency: by implementing phase shifting and signal combining in the analog domain, it achieves high antenna gain while requiring only a minimal number of radio-frequency (RF) chains, potentially a single RF chain. Comprehensive theoretical analysis of the achievable data rate is provided, complemented by extensive simulations that validate the framework. The results demonstrate that PUMA markedly outperforms FAMA and CUMA architectures, particularly for UEs with a single RF chain, offering a robust and scalable solution for interference-insensitive massive connectivity in sixth-generation (6G) systems.

[428]  arXiv:2605.04868 [pdf, ps, other]
Title: Not All Scaffolds Are Equal: How Initiation Mode Determines EMME Effectiveness in Debugging
Subjects: Human-Computer Interaction (cs.HC)

Adaptive learning technologies increasingly rely on real-time physiological analytics to trigger instructional support automatically, yet how system-driven decisions interact with learners' ongoing problem-solving processes remains poorly understood. Eye Movement Modeling Examples (EMME) have shown promise as attention-guidance tools but have been studied predominantly as static instructional materials rather than as adaptive scaffolds whose timing and initiation control can vary. This study investigates whether scaffold initiation mode shapes EMME effectiveness in novice programmers' debugging, and specifically whether automated triggering based on a single physiological indicator of low mental effort is a viable basis for adaptive scaffold delivery. A between-subjects experiment was conducted with 120 undergraduate computer science students randomly assigned to one of four conditions: teacher-initiated, learner-initiated, automated, or no-scaffold control. Participants completed ten Python debugging tasks while eye-tracking data, video interaction logs, and performance scores were recorded. All EMME conditions outperformed the control. However, human-mediated initiation, whether by teacher or learner, consistently produced higher performance than automated triggering and more integrative engagement with the EMME material. Automated triggering based on sustained low pupillary activity was associated with disruptive behavioral patterns, suggesting mistimed delivery. EMME also eliminated the performance advantage of prior programming knowledge across all initiation modes. These findings establish scaffold initiation timing and control as critical design variables for EMME and adaptive learning technologies more broadly, and demonstrate that a single low-effort physiological threshold is insufficient as a trigger criterion for complex problem-solving support.

[429]  arXiv:2605.04870 [pdf, ps, other]
Title: VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video text-based visual question answering (Video TextVQA) aims to answer questions by reasoning over visual textual content appearing in videos. Despite the strong multimodal video understanding capabilities of recent Video-LLMs, their performance on existing Video TextVQA benchmarks remains limited. To better understand this gap, we conduct an upper-bound analysis through frame-wise question answering, counting a sample as correct if any frame yields the right answer, which significantly outperforms direct video-based inference and reveals a substantial performance gap. The results suggest that the primary bottleneck lies in the localization of key question-relevant evidence, rather than in reasoning capacity itself. Building on this insight, we propose a question-guided agent framework that explicitly anchors the relevant keyframes before answering. The approach operates effectively in a training-free setting and consistently surpasses direct video inference. With additional supervised fine-tuning (SFT) and reinforcement learning (RL), it achieves an average improvement of +12.12 in accuracy and +11.15 in ANLS across benchmarks, establishing new state-of-the-art results. Our study underscores the critical role of explicit keyframe anchoring for advancing Video TextVQA. The code will be publicly released.
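The any-frame upper bound described in the abstract can be illustrated with a minimal sketch. The function names, the whitespace/case normalization, and the toy samples are assumptions made for illustration; the paper's actual matching rule and data are not given in the abstract.

```python
def normalize(text):
    # Hypothetical normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def any_frame_correct(frame_answers, gold_answer):
    # A sample counts as correct if at least one frame's answer matches the gold answer.
    gold = normalize(gold_answer)
    return any(normalize(answer) == gold for answer in frame_answers)


def upper_bound_accuracy(samples):
    # samples: list of (per-frame answers, gold answer) pairs.
    if not samples:
        return 0.0
    hits = sum(any_frame_correct(frames, gold) for frames, gold in samples)
    return hits / len(samples)


# Toy data, invented for illustration only.
toy_samples = [
    (["exit", "Exit 12", "exit 12b"], "exit 12"),  # one frame matches
    (["open", "closed"], "sale"),                  # no frame matches
]
upper_bound = upper_bound_accuracy(toy_samples)
```

Because a single matching frame is enough, this score can only be equal to or higher than a score that requires one specific frame to be right, which is why it serves as an upper bound on frame-wise answering.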

[430]  arXiv:2605.04873 [pdf, ps, other]
Title: Measuring Psychological States Through Semantic Projection: A Theory-Driven Approach to Language-Based Assessment
Subjects: Computation and Language (cs.CL)

Recent advances in natural language processing have enabled increasingly accurate estimation of psychological traits from language. However, most existing approaches rely on supervised models trained to predict questionnaire scores, limiting interpretability and generalizability across contexts. The present study introduces a theory-driven and fully unsupervised framework for measuring psychological states directly from natural language using semantic projection. Psychological constructs were operationalized as interpretable semantic axes derived from lexical anchors and items from validated clinical scales assessing depression, anxiety, and worry. Participants' textual responses were embedded using Sentence-BERT and projected onto these axes to generate continuous psychological scores across multiple response formats, including selected words, generated words, phrases, and free-text responses. Projection scores were evaluated through correlations with standardized clinical measures, split-half reliability analyses, attenuation corrections, distributional similarity using Wasserstein distance, and comparisons with lexicon-based sentiment analysis (VADER). Results showed strong associations between projection scores and clinical measures, particularly for structured formats such as selected words, written words, and phrases. Free-text responses produced weaker results when analyzed as whole texts, but performance improved substantially when sentence-level aggregation strategies were applied. These findings support semantic projection as an interpretable and scalable alternative to supervised language models for psychological assessment and highlight the importance of response format and text-processing strategies in language-based mental health measurement.
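As a rough illustration of projecting an embedding onto a semantic axis, the sketch below builds an axis as the normalized difference between the mean of "high" anchor vectors and the mean of "low" anchor vectors, then scores a response by its dot product with that axis. This axis construction is a common choice and an assumption here; the abstract does not specify how the axes are built. The two-dimensional vectors are toy values standing in for Sentence-BERT embeddings, and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    # Component-wise mean of a non-empty list of equal-length vectors.
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def semantic_axis(high_anchors, low_anchors):
    # Assumed construction: unit vector pointing from the low pole to the high pole.
    high = mean_vector(high_anchors)
    low = mean_vector(low_anchors)
    diff = [h - l for h, l in zip(high, low)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("anchor poles coincide; axis is undefined")
    return [d / norm for d in diff]


def projection_score(embedding, axis):
    # Scalar projection of the response embedding onto the unit axis.
    return sum(e * a for e, a in zip(embedding, axis))


# Toy vectors, invented for illustration only.
axis = semantic_axis(high_anchors=[[2.0, 1.0], [4.0, 1.0]], low_anchors=[[0.0, 1.0]])
score = projection_score([0.5, 2.0], axis)
```

The score is continuous and has a direct reading (position along the low-to-high direction), which is the kind of interpretability the abstract attributes to the projection approach.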

[431]  arXiv:2605.04874 [pdf, ps, other]
Title: Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Direct Preference Optimization (DPO) has proven to be an effective solution for mitigating hallucination in Multimodal Large Language Models (MLLMs) by learning from preference pairs. One of its key challenges lies in how to transfer the sequence-level preference into fine-grained supervision on visual fidelity. To safeguard vision-related tokens that are prone to hallucination, existing methods typically allocate training emphasis according to the model's self-assessed visual sensitivity signals. However, such sensitivity, estimated by a model still under training, introduces self-referential bias: reinforcing already well-learned visual cues while neglecting hard-to-perceive but critical details, thereby limiting deeper alignment. In this work, we propose an Uncertainty-aware Exploratory Direct Preference Optimization (UE-DPO) method for MLLMs, which enables the model to uncover its cognitive deficiencies and actively explore for self-correction, guided by token-level epistemic uncertainty. Specifically, we first quantify the uncertainty from the model's failure to ground token predictions in the given image. Then, based on an uncertainty-aware exploration intensity, we encourage more learning pressure on visually deficient tokens in preferred samples, and alleviate the over-penalization of beneficial knowledge in dispreferred samples. Further, we provide a theoretical justification for our method, and extensive experiments demonstrate its effectiveness and robustness.

[432]  arXiv:2605.04875 [pdf, ps, other]
Title: Anticipating Innovation Using Large Language Models
Comments: 16 pages, 4 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Forecasting innovation, understood as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We show that this signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning. We define context similarity between code embeddings as a measure of linguistic convergence and show that it accurately predicts first technological combinations. TechToken also improves general representation quality, outperforming state-of-the-art models across different patent-related tasks.

[433]  arXiv:2605.04877 [pdf, ps, other]
Title: To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition
Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Multimodal emotion recognition (MER) benefits from combining text, audio, and vision, yet standard fusion often fails when modalities conflict. Crucially, conflicts differ in resolvability: benign conflicts stem from missing, weak, or ambiguous cues and can be mitigated by cross-modal calibration, while severe conflicts arise from intrinsically contradictory (e.g., sarcasm) or misleading signals, for which forced fusion may amplify errors. Recognizing this, we propose Dual-Path Conflict Resolution (DCR), a unified framework that learns when to fuse and when to drop modalities. Path I (Affective Fusion Distiller, AFD) performs reverse distillation from audio/visual teachers to a textual student using temporally weighted class evidence, thereby enhancing representation-level calibration and improving fusion when alignment is beneficial. Path II (Affective Discernment Agent, ADA) formulates MER as a contextual bandit that selects among fusion and unimodal predictions based on a dual-view state and a calibration-aware reward, enabling decision-level arbitration under irreconcilable conflicts without requiring per-modality reliability labels. By taking into account the full multimodal context and coupling soft calibration with hard arbitration, DCR reconciles conflicts that can be aligned while bypassing misleading modalities when fusion is harmful. Across five benchmarks covering both dialogue-level and clip-level MER, DCR consistently outperforms competitive baselines or achieves highly competitive results. Further ablations, conflict-specific subset evaluation, and modality-selection analysis verify that AFD and ADA are complementary and jointly improve robust conflict-aware emotion recognition.

[434]  arXiv:2605.04880 [pdf, ps, other]
Title: A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processes (SMDPs) are of particular interest. In SMDPs, discrete actions stochastically generate both rewards and durations, and the objective is to optimize the average reward rate. Existing algorithms approach this by optimizing the ratio of rewards to durations. However, when rewards and durations are non-stationary (in the infinite horizon), this can be incorrect. This paper presents a novel modified harmonic mean operator that correctly computes reward rates even under such conditions. This yields model-free learning algorithms that can work with SMDPs, while maintaining robustness to non-stationary reward and duration distributions over time. We prove theoretical properties of the modified harmonic mean operator, and empirically demonstrate its efficacy in comparison to existing algorithms.

[435]  arXiv:2605.04881 [pdf, ps, other]
Title: From Classical to Quantum-Mechanical Data Assimilation: A Comparison between DATO and QMDA
Subjects: Computational Engineering, Finance, and Science (cs.CE); Dynamical Systems (math.DS); Atmospheric and Oceanic Physics (physics.ao-ph)

Data assimilation provides a systematic framework for combining dynamical models with partial and noisy observations to infer the evolving state of a system. In this work, we undertake a comparative study of Data Assimilation with Transfer Operators (DATO) and Quantum Mechanical Data Assimilation (QMDA), focusing on their mathematical formulation, algorithmic structure, and empirical performance. Both methods are first cast within a common operator-theoretic framework, which makes it possible to compare, on a unified basis, their representations of uncertainty, forecast propagation, and assimilation updates. We then analyse their principal similarities and differences with respect to state-space structure, update mechanisms, structural preservation properties, and computational cost. To complement the theoretical analysis, we assess both approaches on benchmark dynamical systems across a range of observational settings, including noisy, sparse, and partially observed regimes. Our results show that, despite their shared operator-theoretic motivation, DATO and QMDA embody substantially different assimilation paradigms, leading to distinct advantages and limitations in terms of interpretability, robustness, and scalability. The present study helps delineate the regimes in which each framework is most effective and offers broader insight into the design of operator-based methodologies for data assimilation.

[436]  arXiv:2605.04882 [pdf, ps, other]
Title: FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)

Automated glaucoma detection is critical for preventing irreversible vision loss and reducing the burden on healthcare systems. However, ensuring fairness across diverse patient populations remains a significant challenge. In this paper, we propose FairEnc, a fair pretraining method for vision-language models (VLMs) that enables simultaneous debiasing across multiple sensitive attributes. FairEnc jointly mitigates biases in both textual and visual modalities with respect to multiple sensitive attributes, including race, gender, ethnicity, and language. Specifically, for the textual encoder, we leverage a large language model to generate synthetic clinical descriptions with varied sensitive attributes while preserving disease semantics, and employ a contrastive alignment objective to encourage demographic-invariant representations. For the visual encoder, we propose a dual-level fairness strategy that combines mutual information regularization to reduce statistical dependence between learned features and demographic groups, with multi-discriminator adversarial debiasing. Comprehensive experiments on the publicly available Harvard-FairVLMed dataset demonstrate that FairEnc effectively reduces demographic disparity as measured by DPD and DEOdds while achieving strong diagnostic performance under both zero-shot and linear probing evaluations. Additional experiments on the private FairFundus dataset show that FairEnc consistently preserves fairness advantages under cross-domain and cross-modality settings and maintains diagnostic performance within a competitive range. These results highlight FairEnc's ability to generalize fairness under distribution shifts, supporting its potential for more equitable deployment in real-world clinical settings. Our codebase and synthetic clinical notes are available at https://github.com/Mohamed-Elhabebe/FairEnc

[437]  arXiv:2605.04885 [pdf, ps, other]
Title: A Comparative Study of PyCaret AutoML and CNN-BiLSTM for Binary Hate Speech Detection in Indonesian Twitter
Comments: 8 pages, 3 figures, and 1 table in the current manuscript. The paper presents a comparative study of PyCaret AutoML and CNN-BiLSTM for binary hate speech detection in Indonesian Twitter
Subjects: Computation and Language (cs.CL)

This paper compares a PyCaret AutoML branch and a CNN-BiLSTM branch for binary hate speech detection on Indonesian Twitter using the HS label from the corpus of Ibrohim and Budi. Both branches share the same preprocessing pipeline so that the comparison reflects modelling differences rather than inconsistent data preparation. The conventional branch uses TF-IDF with a lexicon-based abusive-word count, whereas the neural branch learns dense token representations and captures both local phrase patterns and bidirectional context. The benchmark is built from the released 13,130-row annotation table, whose HS label yields a 58:42 class ratio. On the held-out split, CNN-BiLSTM achieves the best result with 83.8% accuracy, 79.8% precision, 82.7% recall, and 81.2% F1-score. Within the PyCaret branch, Random Forest is the strongest conventional model with 77.2% accuracy and 77.0% F1-score. The neural branch therefore improves accuracy by 6.6 points and F1-score by 4.2 points. Exploratory corpus analysis, learning curves, and confusion matrices show that the dataset is short-text, moderately imbalanced, and still difficult because many decisions depend on local lexical cues plus short contextual composition. The study concludes that PyCaret AutoML is an effective conventional benchmarking framework, whereas CNN-BiLSTM is the stronger end model for the reported benchmark setting.
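The reported figures can be cross-checked with the standard F1 definition (the harmonic mean of precision and recall). The sketch below uses only the percentages quoted in the abstract; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; inputs and output share the same scale.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# CNN-BiLSTM precision and recall as reported (percent).
cnn_f1 = f1_score(79.8, 82.7)

# Gains of CNN-BiLSTM over Random Forest, in percentage points.
accuracy_gain = 83.8 - 77.2
f1_gain = 81.2 - 77.0
```

Rounded to one decimal place, `cnn_f1` comes to 81.2, and the two gains come to 6.6 and 4.2 points, matching the values stated in the abstract.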

[438]  arXiv:2605.04886 [pdf, ps, other]
Title: BenCSSmark: Making the Social Sciences Count in LLM Research
Comments: 12 pages, Accepted to LREC 2026
Subjects: Computation and Language (cs.CL)

This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks -- standardized tools for assessing computational systems -- are pivotal in the development of artificial intelligence (AI), including large language models (LLMs). Benchmarks do more than measure progress -- they actively structure it, shaping reputations, research agendas, and commercial outcomes. Despite this central role, the social sciences are largely absent from mainstream evaluation frameworks, even though scholars in these fields generate dozens of rigorously annotated, context-sensitive datasets each year. Integrating this work into benchmark design could significantly improve the generalization and robustness of AI models. In turn, models trained on social scientific tasks would likely yield better performance on classic and contemporary tasks in disciplines as diverse as history, sociology, political science or economics. This is all the more pressing as these disciplines are quickly turning to LLMs for assistance. To address this gap, we introduce BenCSSmark, a benchmark composed of datasets annotated by computational social scientists. By integrating social scientific perspectives into benchmarking, BenCSSmark seeks to promote more robust, transparent, and socially relevant AI systems and to foster efficient collaboration.

[439]  arXiv:2605.04887 [pdf, ps, other]
Title: Sentiment Analysis and Customer Satisfaction Prediction on E-Commerce Platforms Based on YouTube Comments Using the XGBoost Algorithm
Comments: 5 pages, 10 figures
Subjects: Computation and Language (cs.CL)

The exponential expansion of digital commerce in Indonesia has significantly shifted consumer interactions toward video-centric social networks, particularly YouTube. Consequently, the sheer volume of unstructured, multi-contextual comments poses a tremendous challenge for manual sentiment tracking. This study investigates and constructs a predictive model for customer satisfaction leveraging the Extreme Gradient Boosting (XGBoost) architecture coupled with Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. By utilizing a secondary dataset of YouTube comments retrieved from e-commerce review videos, the raw text underwent rigorous preprocessing to generate normalized numerical features. The experimental results demonstrate that the PyCaret-optimized machine learning framework delivers superior classification resilience. Beyond standard performance metrics, lexical evaluations and feature-importance mapping uncover a notable phenomenon: e-commerce discourse is heavily infiltrated by socio-political terminologies, which ultimately influence the polarity of audience satisfaction.

[440]  arXiv:2605.04888 [pdf, ps, other]
Title: A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset
Comments: 8 pages, 3 figures, 3 tables. Comparative study of Logistic Regression and BiLSTM for tweet sentiment classification on a 10,000-sample subset of the Sentiment140 dataset. Includes Streamlit/Hugging Face deployment
Subjects: Computation and Language (cs.CL)

The exponential growth of social media has created an urgent need for automated systems to analyze unstructured public sentiment in real time. This study compares a traditional Logistic Regression model using TF-IDF features with a deep learning Bidirectional Long Short-Term Memory (BiLSTM) architecture on a 10,000-tweet subset of the Sentiment140 dataset. Experimental results show that Logistic Regression outperformed BiLSTM, achieving an accuracy of 73.5% compared with 69.17%, while the deep learning model exhibited mild overfitting. These findings suggest that for medium-scale informal text data, classical machine learning with robust feature extraction can outperform more complex deep learning approaches. Finally, the trained models were integrated into an interactive web application using Streamlit and deployed on Hugging Face Spaces for public access.

[441]  arXiv:2605.04891 [pdf, ps, other]
Title: ADMM-based decomposed DNN+RLT Relaxations for Completely Positive Models in Electricity Market Clearing
Subjects: Systems and Control (eess.SY)

The day-ahead electricity market clearing with nonconvex order types can be formulated as a mixed-integer linear program (MILP), but its LP relaxation may provide weak bounds, and exact solutions can become computationally intractable in large-scale or extended market settings. We study a welfare-maximizing clearing model with elementary hourly orders, block orders with logical acceptance constraints, and flexible hourly orders. Starting from a compact MILP formulation, we derive an equivalent completely positive programming (CPP) reformulation via matrix lifting and propose relaxed CPP variants that further reduce the modeling burden while maintaining strong bounds. We then develop tractable doubly nonnegative (DNN) relaxations, including decomposed formulations that exploit the problem structure by using smaller positive semidefinite matrices. To further strengthen these bounds, we introduce reformulation-linearization technique (RLT) inequalities tailored to the decomposed structure. To tackle the challenge of large-scale DNNs, we design an alternating direction method of multipliers (ADMM) with adaptive penalty updates and rigorous dual lower bounds, enabling certified early termination. Computational experiments on synthetic instances show that the proposed DNN+RLT relaxations substantially tighten LP bounds, while decomposition and first-order methods significantly reduce computational effort.

[442]  arXiv:2605.04893 [pdf, ps, other]
Title: Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics
Comments: 42 pages, 6 figures, 3 tables; 82-page online supplement (proofs, additional experiments, dataset statistics) as an ancillary file
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction.
Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi \ge 1/5$ with worst cut at $t^\ast/n \approx 0.32$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (LC-AUROC from 0.62 to 0.84) on tested models up to 8B parameters, with polarity reversing as predicted between HaluEval and MedHallu.

[443]  arXiv:2605.04894 [pdf, ps, other]
Title: SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs
Subjects: Software Engineering (cs.SE)

Enterprises want AI code completion that is both high-quality and private, but they face a tension: proprietary models yield better results yet risk exposing proprietary code, while self-hosting large models is expensive and hard to maintain. As a lighter alternative, small CodeLLMs (1B-3B) can run on a developer's workstation accelerator with code never leaving the machine, but they fail on harder tasks. A practical solution is to use the small model for most requests and selectively route difficult ones to a larger self-hosted model. In this study, we evaluate 29 code-specialized LLMs (0.5B-480B) from 12 families on execution-based fill-in-the-middle (FIM) code completion benchmarks across Python, Java, and C++, and find that model family and code-specialized training matter more than size: a 3B model matches a 32B model despite being 10x smaller. Analyzing the 3B model's failures, we discover that 46% of its incorrect completions are not valid code. To enable efficient code completion, we propose SynConfRoute, a training-free method that combines token confidence with syntax validation to automatically decide per-request whether to keep the local completion or escalate to a larger self-hosted model. SynConfRoute improves pass@1 by 6.4% over confidence-only routing on routine completions and by up to 31% on harder multi-language tasks, and the resulting pipeline achieves 78.9% on routine completions, 7.4% higher than always using the 480B model alone, while reducing accelerator usage by 58%. SynConfRoute generalizes across Python, Java, and C++, improving over confidence-only routing on all three languages without ever rejecting a correct local completion. The pipeline uses off-the-shelf models with no custom training, making it immediately deployable in practice.
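The general idea of combining a syntax check with a token-confidence check can be sketched as follows for Python completions. This is not the SynConfRoute implementation: the confidence measure (geometric-mean token probability), the 0.8 threshold, and all function names are assumptions chosen for illustration, and the abstract does not state how the method computes or thresholds confidence.

```python
import ast
import math


def is_valid_python(prefix, completion, suffix):
    # Syntax validation: the filled-in file must parse as Python.
    try:
        ast.parse(prefix + completion + suffix)
        return True
    except (SyntaxError, ValueError):
        return False


def should_escalate(prefix, completion, suffix, token_logprobs, min_confidence=0.8):
    # Escalate to the larger model if the local completion is syntactically
    # invalid or if its (assumed) confidence falls below the threshold.
    if not is_valid_python(prefix, completion, suffix):
        return True
    if not token_logprobs:
        return True
    # Assumed confidence measure: geometric mean of the token probabilities.
    mean_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    return mean_prob < min_confidence


# Toy fill-in-the-middle requests, invented for illustration only.
keep_local = should_escalate("def f(x):\n    return ", "x + 1", "\n", [-0.01, -0.02, -0.01])
broken = should_escalate("def f(x):\n    return ", "x + (", "\n", [-0.01])
```

In this sketch the syntax check runs first, so a confidently generated but unparsable completion is still escalated, which is the case a confidence-only router would miss.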

[444]  arXiv:2605.04895 [pdf, ps, other]
Title: Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization
Authors: Noel Thomas
Comments: 42 pages, 9 figures. NeurIPS 2026 submission
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and metric. An audit of 40 transfer-BO papers from NeurIPS, ICML, ICLR, AISTATS, UAI, TMLR, JMLR, and AutoML-Conf (2022-2025) finds that 98% never vary B/|A| as a controlled axis. On the same GDSC2 benchmark, changing only the budget reverses the ranking: at B=50, Greedy outperforms UCB by 0.050 Hit@1, while at B=100, UCB outperforms Greedy by 0.035. We capture this transition with the Portable Regime Score PRS=(B/|A|)(1-rho), where rho is the prior rank correlation and can be estimated from pilot contexts before the main comparison. Across 79 conditions spanning chemistry, drug-response biology, and HPO, a hierarchical model gives beta=0.50 (p=1.1e-9), and 19% of conditions fall in an equivalence zone where |advantage|<0.01 Hit@1. In five published reversal cases, PRS predicts the winner from pre-comparison observables. A No-Free-Leaderboard proposition explains why unconditional rankings are unstable: when CATE changes sign across regimes, the reported ATE becomes a function of benchmark mixture. RegimePlanner, which estimates rho online and switches acquisition accordingly, wins all 16 HPO-B search spaces at B=100 and exceeds the matched {Greedy,UCB} per-context oracle on GDSC2 by 18%. Pre-registered predictions achieve 27/40=67.5% overall accuracy and above 90% within EMA prior families. The practical protocol is simple: report B/|A|, rho, K, and metric alongside any claimed acquisition advantage.
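The Portable Regime Score formula quoted above, PRS=(B/|A|)(1-rho), is simple enough to compute directly. In the sketch below the function name, the input validation, and the example values for |A| and rho are assumptions; the abstract gives the budgets B=50 and B=100 but not the candidate-set size or the prior rank correlation for any benchmark.

```python
def portable_regime_score(budget, num_candidates, rho):
    # PRS = (B / |A|) * (1 - rho), as stated in the abstract.
    # budget: evaluation budget B; num_candidates: |A|; rho: prior rank correlation.
    if num_candidates <= 0:
        raise ValueError("num_candidates must be positive")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return (budget / num_candidates) * (1.0 - rho)


# Hypothetical setting: |A| = 100 candidates and rho = 0.5 (values invented for illustration).
prs_small_budget = portable_regime_score(50, 100, 0.5)
prs_large_budget = portable_regime_score(100, 100, 0.5)
```

With these hypothetical inputs the score doubles from 0.25 to 0.5 when the budget doubles, showing how PRS rises with the budget ratio and falls as the prior becomes more informative (rho closer to 1).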

[445]  arXiv:2605.04897 [pdf, ps, other]
Title: Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall
Comments: 17 pages, 4 figures, 7 tables. Technical report
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Extraction at ingestion is the wrong primitive for agent memory: content discarded before the query is known cannot be recovered at retrieval time. We propose True Memory, a six-layer architecture that shifts the center of the system from a storage schema to a multi-stage retrieval pipeline operating over events preserved verbatim. The full system runs as a single SQLite file on commodity CPU with no external database, vector index, graph store, or GPU. On LoCoMo (1,540 questions across 10 multi-session conversations), True Memory Pro reaches 93.0% accuracy (3-run mean) against 61.4% for Mem0, 65.4% for Supermemory, approximately 71% for Zep, and 94.5% for EverMemOS under a matched gpt-4.1-mini answer model. On LongMemEval (500 questions), True Memory Pro reaches 87.8% (3-run mean). On BEAM-1M (700 questions at the 1-million-token scale), True Memory Pro reaches 76.6% (3-run mean), above the prior published result of 73.9% for Hindsight. A 56-configuration ablation shows a 1.3-percentage-point spread within the top-performing configuration family.

[446]  arXiv:2605.04899 [pdf, ps, other]
Title: A geometric relation of the error introduced by sampling a language model's output distribution to its internal state
Comments: 12 Pages, 10 Figures, 2 Appendices. To appear in Proceedings of ICML 2026
Subjects: Machine Learning (cs.LG)

GPT-style language models are sensitive to single-token changes at generation points where the predicted probability distribution is spread across multiple tokens. Viewing this sensitivity as a geometric property, we derive an $\mathfrak{so}(n)$-valued 1-form that depends only on the geometry of the token embeddings. Despite this purely geometric origin, we show that its curvature is semantically meaningful: On chess reasoning tasks, the curvature couples to the world model of an off-the-shelf instruction-tuned model, with transformations clustering by board region and respecting piece importance. Our findings suggest that token space geometry directly reflects how models internally represent problems.

[447]  arXiv:2605.04901 [pdf, ps, other]
Title: On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
Comments: Accepted by ACL 2026
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

For Transformer models, cryptographically secure inference ensures that the client learns only the final output, while the server learns nothing about the client's input. However, securely computing nonlinear layers remains a major efficiency bottleneck due to the substantial communication rounds and data transmission required. To address this issue, prior works reveal intermediate activations to the client, allowing nonlinear operations to be computed in plaintext. Although this approach significantly improves efficiency, exposing activations enables adversaries to extract model weights. To mitigate this risk, existing works employ a shuffling defense that reveals only randomly permuted activations to the client. In this work, we show that the shuffling defense is not as robust as previously claimed. We propose an attack that aligns differently shuffled activations to a common permutation and subsequently exploits them to extract model weights. Experiments on Pythia-70m and GPT-2 demonstrate that the proposed attack can align shuffled activations with mean squared errors ranging from $10^{-9}$ to $10^{-6}$. With a query cost of approximately \$1, the adversary can recover model weights with L1-norm differences ranging from $10^{-4}$ to $10^{-2}$ compared to the oracle weights.

[448]  arXiv:2605.04902 [pdf, ps, other]
Title: A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning
Subjects: Databases (cs.DB)

Multivariate time series (MTS) are frequently affected by co-occurring quality issues, such as missing values, outliers, and constraint violations, which significantly undermine downstream analytics. Existing cleaning approaches fix only a limited set of such issues, making them ill-suited for scenarios where multiple quality problems arise simultaneously. Furthermore, these methods commonly depend on the availability of ground truth data or domain-specific rules, both of which are rarely accessible in real-world applications.
In this paper, we introduce \sys, an agent system with reinforcement learning designed to clean multiple data quality issues in MTS. We cast the cleaning process as a joint optimization problem that simultaneously handles quality issue order and cleaning model selection, allowing efficient navigation of the large space of possible cleaning pipelines. Our framework relies on a hierarchical agent architecture, where a high-level agent determines the order in which data quality issues should be processed, while a low-level agent identifies the most suitable cleaning method for each issue. To guide the agent toward an optimal cleaning pipeline, we propose a dual-stage reward mechanism that couples upstream (cleaning) and downstream performance, enabling effective optimization without relying on ground truth. Our experimental results show that \sys consistently outperforms existing methods, achieving up to 96% improvement in data cleaning quality and 27% improvement in downstream performance.

[449]  arXiv:2605.04903 [pdf, ps, other]
Title: Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs
Comments: 19 pages, 4 figures, 7 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch -- computationally expensive and yielding verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) to refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHash-Jaccard novelty filtering for structural diversity. We evaluate three 7B-class LLMs -- DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B -- across six datasets (CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, CelebA) using a 22-cycle protocol (1,100 candidates per LLM). All three substantially surpass the full-generation baseline (50.6% valid rate, 42.3% mean first-epoch accuracy): DeepSeek-Coder reaches 75.3% valid rate and 65.8% mean accuracy; Qwen2.5-Coder 72.1%/64.6%; Mistral 66.6%/66.1%. On CIFAR-10, best first-epoch accuracies reach 85.5% (Mistral), 85.2% (DeepSeek), 80.6% (Qwen) -- well above 63.98% full generation and 71.5% for the concurrent approach of Gu et al. Output lengths are 30-50 lines versus 200+ for full generation (75-85% reduction). A 50-epoch study confirms the 1-epoch proxy preserves rankings (Mistral: Spearman $\rho$ = 0.926). Delta-based generation is a token-efficient, multi-domain, LLM-agnostic alternative to full-model synthesis for LLM-driven NAS.

[450]  arXiv:2605.04904 [pdf, ps, other]
Title: Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we explore deep learning techniques for individual identification of animals based on their skin patterns. Individual identification is crucial in biodiversity monitoring, since it enables analysis of decline or growth of populations, or intra-species interactions within populations. Models trained for the task of individual identification often do not focus on the skin pattern of animals, but on background details or body shape details. These characteristics are not individually specific, or can change drastically through time. We focus on techniques that will make machine learning models more responsive to skin pattern structure when extracting individual visual embeddings from images. For this, we explore image inpainting of task-specific masks as an auxiliary task to enhance ML-based individual identification from animal skin patterns. We propose a comparative analysis among four models as an encoder backbone for the individual identification task. We focus on the case study of zebrafish, which is a widely recognized biological model organism, and which exhibits individually identifying skin patterns. To evaluate encoder backbone performance, we present standard metrics for classification accuracy, embedding clustering metrics, and GradCAM visualizations.

[451]  arXiv:2605.04905 [pdf, ps, other]
Title: Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features
Subjects: Machine Learning (cs.LG); Databases (cs.DB)

Electrospinning is a highly sensitive fabrication process in which small variations in operating parameters can significantly influence fiber morphology and material performance. Machine learning (ML) methods are increasingly employed to model these process-structure relationships and to identify the relative importance of processing variables. However, most existing studies rely on a single ML model, implicitly assuming that the resulting feature importance is robust and reproducible. In this study, the consistency of feature importance across multiple ML model families was systematically evaluated using a curated dataset of 96 polyvinyl alcohol (PVA) electrospinning experiments. Twenty-one ML models representing linear, tree-based, kernel-based, neural network, and instance-based approaches were trained and compared. To provide a unified interpretability framework, SHAP (SHapley Additive exPlanations) values were used to calculate feature importance consistently across all models. A rank-based statistical analysis was then performed to quantify inter-model agreement and assess the robustness of parameter rankings. The results demonstrate that predictive performance and interpretive reliability are fundamentally distinct properties. Although several models achieved comparable predictive accuracy, substantial differences were observed in their feature importance rankings. Solution concentration emerged as the most robust and consistently influential parameter (variability = 0), whereas flow rate and applied voltage exhibited high ranking variability (variability > 0.9), indicating strong model dependence. These findings suggest that feature importance derived from a single ML model may be unreliable, particularly for small experimental datasets, and highlight the importance of cross-model validation for achieving trustworthy interpretation in ML-assisted electrospinning research.

[452]  arXiv:2605.04906 [pdf, ps, other]
Title: Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games
Subjects: Artificial Intelligence (cs.AI)

While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other agents poses significant challenges for evaluating the reasoning process and assigning credit over multiple reasoning steps. Existing single-agent reinforcement learning (RL) approaches and their multi-agent extensions fail to address these challenges as they do not incorporate other agents in the reasoning process. In this work, we propose Strat-Reasoner, a novel RL-based framework that improves LLMs' strategic reasoning ability in multi-agent games. We introduce a novel recursive reasoning paradigm where an agent's reasoning also integrates other agents' reasoning processes. To provide effective reward signals for the intermediate reasoning sequences, we employ a centralized Chain-of-Thought (CoT) comparison module to evaluate the reasoning quality. Finally, we compute an accurate hybrid advantage and develop a group-relative RL approach to optimize the LLM policy. Experimental results show that Strat-Reasoner substantially improves the strategic abilities of the underlying LLMs, achieving an average performance improvement of 22.1% across various multi-agent games.

[453]  arXiv:2605.04908 [pdf, ps, other]
Title: Curated AI beats frontier LLMs at pharma asset discovery
Comments: 5 pages, 5 figures, 1 table
Subjects: Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

General-purpose LLMs with web search are increasingly used to scout the competitive landscape of pharmaceutical pipelines. We benchmark Gosset -- an AI platform with a chat interface backed by curated target-, modality-, and indication-level drug-asset annotations -- against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets where most of the pipeline lives in the long tail of preclinical and Asian-developed assets. All five systems receive the same natural-language query and the same JSON output schema. Across 10 targets Gosset returns 3.2x more verified drugs per query than the best frontier system, at perfect precision and 100% recall against the cross-system union of verified drugs. The same curated index is exposed as a Gosset MCP server that any frontier model can call as a tool, suggesting that each of these systems can close most of the recall gap by swapping generic web search for a curated index behind the same chat interface.

[454]  arXiv:2605.04911 [pdf, ps, other]
Title: Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning
Subjects: Machine Learning (cs.LG)

Tabular data synthesis aims to generate high-quality data while preserving privacy. However, we find that existing tabular generative models exhibit a clear tradeoff in the small-data regime: improving data quality typically comes at the cost of increased memorization of training samples, thereby weakening privacy protection. This tradeoff arises because small training sets make it difficult for dataset-specific generative models to distinguish generalizable structure from sample-specific patterns. To address this, we propose DiffICL, which formulates tabular data generation as an in-context learning problem. Instead of fitting each dataset from scratch, DiffICL leverages pretrained structural priors learned from a large collection of datasets, enabling it to infer data distributions from limited context rather than memorizing individual samples. We evaluate DiffICL on 14 real-world datasets. Results show that DiffICL improves both data quality and privacy, and generates synthetic data that provides effective data augmentation. Our findings suggest that the quality-privacy tradeoff can be improved through better training paradigms.

[455]  arXiv:2605.04913 [pdf, ps, other]
Title: Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
Comments: 33 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we propose LoPT: Local-Learning Post-Training, a simple post-training strategy that makes gradient reach an explicit design choice. LoPT places a single gradient boundary at the transformer midpoint: the second-half block learns from the task objective, while the first-half block is updated by a lightweight feature-reconstruction objective to preserve useful representations and maintain interface compatibility. LoPT shortens the task-induced backward path while limiting direct interference from narrow task gradients on early-layer representations. Extensive experiments demonstrate that LoPT achieves competitive performance with lower memory cost, higher training efficiency and better retention of pretrained capabilities. Our code is available at: https://github.com/HumyuShi/LoPT

[456]  arXiv:2605.04916 [pdf, ps, other]
Title: A Foundation Model for Zero-Shot Logical Rule Induction
Authors: Yin Jun Phua
Comments: Camera-ready version accepted at IJCAI 2026, with full appendices
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Symbolic Computation (cs.SC)

Inductive Logic Programming (ILP) learns interpretable logical rules from data. Existing methods are transductive: their learned parameters are bound to specific predicates and require retraining for each new task. We introduce Neural Rule Inducer (NRI), a pretrained model for zero-shot rule induction. Rather than encoding literal identities, NRI represents literals using domain-agnostic statistical properties such as class-conditional rates, entropy, and co-occurrence, which generalize across variable identities and counts without retraining. The model consists of a statistical encoder and a parallel slot-based decoder. Parallel decoding preserves the permutation invariance of logical disjunction; an autoregressive decoder would instead impose an arbitrary clause order. Product T-norm relaxation makes rule execution differentiable, allowing end-to-end training on prediction accuracy alone. We evaluate NRI on rule recovery, robustness to label noise and spurious correlations, and zero-shot transfer to real-world benchmarks, and we believe this work opens up the possibility of foundation models for symbolic reasoning. Code and the reference checkpoint are available at https://github.com/phuayj/neural-rule-inducer.
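
To illustrate how a product T-norm relaxation makes rule execution differentiable, the sketch below evaluates a rule in disjunctive normal form over truth degrees in [0, 1]. It uses the standard product T-norm for conjunction and its dual (the probabilistic sum) for disjunction. The rule encoding and function names are assumptions for illustration and are not taken from the NRI code base.

```python
from math import prod


def soft_and(values):
    """Product T-norm: conjunction as a product of truth degrees."""
    return prod(values)


def soft_or(values):
    """Dual T-conorm (probabilistic sum): 1 - prod(1 - v)."""
    return 1.0 - prod(1.0 - v for v in values)


def eval_dnf(clauses, truth):
    """Evaluate a DNF rule under the product relaxation.

    clauses: list of clauses; each clause is a list of (name, negated) literals
    truth:   dict mapping literal name -> truth degree in [0, 1]
    """
    clause_values = []
    for clause in clauses:
        literals = [1.0 - truth[name] if negated else truth[name]
                    for name, negated in clause]
        clause_values.append(soft_and(literals))
    return soft_or(clause_values)


# Hypothetical rule: (a AND b) OR (NOT c)
rule = [[("a", False), ("b", False)], [("c", True)]]
```

On crisp inputs (0 or 1) this reduces to ordinary Boolean evaluation, and `soft_or` is symmetric in its arguments, so the result does not depend on clause order.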

[457]  arXiv:2605.04917 [pdf, ps, other]
Title: Koopman Identification of Nonlinear Systems via Reservoir Liftings
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Learning tractable linear representations of nonlinear dynamical systems via Koopman operator theory is often hindered by dictionary selection, temporal memory encoding, and numerical ill-conditioning. Inspired by the Reservoir Computing (RC) paradigm, this paper introduces the RC-Koopman framework, which interprets the reservoir as a stateful, finite-dimensional Koopman dictionary whose temporal depth is explicitly controlled by its spectral radius. We show that the Echo State Property (ESP) guarantees well-posedness and favorable numerical conditioning of the lifted Koopman approximation. A correlation-based spectral radius selection algorithm aligns reservoir memory with dominant system timescales. Analysis reveals how the finite memory of the reservoir determines which Koopman eigenfunctions remain observable from the lifted features. Evaluation on synthetic benchmarks demonstrates that RC-Koopman achieves a favorable balance between reconstruction accuracy of the underlying nonlinear dynamics and dynamical stability, compared to Extended Dynamic Mode Decomposition (EDMD) and Hankel-based lifting approaches. Code available at: https://github.com/NEAR-the-future/RC-Koopman.git

[458]  arXiv:2605.04920 [pdf, ps, other]
Title: Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization
Authors: Xiyan Fu, Wei Liu
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Compositional generalization refers to the ability to correctly interpret novel combinations of known primitives, and it remains a major challenge. Existing approaches often rely on supervised fine-tuning, which encourages models to imitate target outputs. This token-level training paradigm fails to capture the global compositional structure required for generalizing to unseen combinations. In this work, we investigate whether compositional generalization can instead be improved through outcome-level reinforcement learning. We adopt Group Relative Policy Optimization to optimize models based on feedback on their final outputs. Within this framework, we explore both a simple binary outcome reward and a composite reward that provides additional composition feedback. Experiments on multiple compositional benchmarks show that reinforcement learning improves compositional generalization compared to supervised fine-tuning. Further analysis reveals that supervised models tend to overfit frequent training compositions, whereas reinforcement learning improves compositional generalization by reshaping the output distribution, particularly for more complex composition types.
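
The outcome-level signal used by Group Relative Policy Optimization can be sketched as a per-group normalization of final-output rewards. The snippet below is a minimal illustration under stated assumptions: it uses the population standard deviation and returns zero advantages when all rewards in a group are equal, and the function name is hypothetical. It is not the authors' training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Normalize each sampled output's reward against its own group.

    rewards: outcome-level rewards for several outputs sampled from the
             same prompt (for example, binary 0/1 correctness).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    if sigma == 0:
        # All outputs scored the same, so the group carries no learning signal.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


# Binary outcome reward for four sampled outputs of one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the advantage depends only on how an output's final reward compares with its siblings, every token of that output shares the same signal, in contrast to the token-level imitation loss of supervised fine-tuning.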

[459]  arXiv:2605.04922 [pdf, ps, other]
Title: Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

LLM-empowered multi-agent systems offer new potential to accelerate scientific discovery by generating novel research ideas. However, existing methods typically coordinate agents through temporary texts, such as drafts or chat logs, which makes it difficult to pinpoint the weaknesses in the generated ideas and how the agents refine them. To this end, we introduce Evolving Idea Graphs (EIG), a graph-based multi-agent scientific ideation framework that can generate high-performance research ideas across various benchmark-native metrics, such as novelty, feasibility, and clarity. Instead of coordinating solely through texts, EIG represents a partially formed proposal as an evolving idea graph, where nodes capture scientific claims and edges encode relations (e.g., support and conflict), enabling unresolved weaknesses to remain identifiable throughout the idea evolution process. Specifically, a learned two-head controller operates over the evolving graph to guide the ideation: one head selects graph edits for agents to execute, while the other decides when the graph is ready to be committed for final proposal synthesis. On AI Idea Bench 2025 and LiveIdeaBench, EIG outperforms all compared systems on both automatic benchmark scores and blind expert ratings. Ablations further show that explicit graph state provides the main performance gains, and learned edit-and-commit control adds consistent improvements.

[460]  arXiv:2605.04926 [pdf, ps, other]
Title: Unintended Negative Impacts of Promotional Language in Patent Evaluation
Subjects: Computation and Language (cs.CL)

Promotional language has been increasingly used to aid the communication of innovative ideas in science. Yet, less is known about its role in the context of technological innovation. Here, we use a validated and domain-diagnosed lexicon of 135 promotional words to study the association between promotional language and patent evaluation outcomes among 2.7 million USPTO patent applications. Our large-scale study reveals three unexpected findings. First, in contrast to scientific evaluation, we find that a higher frequency of promotional words is negatively associated with the probability of an application being (i) granted a patent, (ii) transferred ownership, and (iii) successfully appealed. This promotional penalty holds even after accounting for a range of confounding factors and is largely robust across different technological areas. Among matched samples, the difference in the success rate between the lowest and highest promotional density quintile is 5.5, 5.9, and 5.3 percentage points for patentability, transferability, and rejection reversal. Second, contrary to institutional skepticism, we show that promotional language is not a mask of weak technology, but objectively reflects the degree of combinatorial novelty and future citation impact. Third, digging into the mechanisms, we find that the tolerance to promotional framing is strongly moderated by human factors, with men and experienced examiners showing a higher acceptance of promotional narratives than women and novice examiners. By revealing an emerging paradox in the patent system, our study offers theoretical and practical implications for improving patent evaluation through more objective scrutiny of linguistic patterns in patent filings.

[461]  arXiv:2605.04930 [pdf, ps, other]
Title: When Does Gene Regulatory Network Inference Break? A Controlled Diagnostic Study of Causal and Correlational Methods on Single-Cell Data
Comments: 19 pages, 10 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

Despite theoretical advantages, causal methods for Gene Regulatory Network (GRN) inference from single-cell RNA-seq data consistently fail to match or outperform correlation-based baselines in many realistic benchmarks, a persistent puzzle which casts doubt on the value of causality for this task. We argue that existing benchmarks are insufficiently controlled to answer this question because they evaluate on real or semi-real data where multiple pathologies co-occur, confounding failure modes, and obscuring the specific conditions under which different inference methods excel or fail. To address this gap, we introduce a controlled diagnostic framework that isolates seven biologically motivated pathologies (dropout, latent confounders, cell-type mixing, feedback loops, network density, sample size, and pseudotime drift) and measure how six representative methods spanning three inference paradigms degrade as each pathology intensifies. Across 6,120 controlled experiments, we find that causal methods genuinely dominate in clean and structurally favorable regimes, but specific pathologies (notably dropout and latent confounders) selectively neutralize their advantages. We further introduce an error-type decomposition that reveals methods with similar aggregate accuracy commit qualitatively different errors. To probe whether single-pathology effects persist when multiple stressors co-occur, we perform an interaction sweep over the three most impactful pathologies and find that their joint effects are sub-additive, while also exposing density-conditional cross-overs invisible to single-dial analysis. Our findings offer a nuanced understanding of when and why different methods succeed or fail for GRN inference, providing actionable insights for method development and practical guidance for practitioners.

[462]  arXiv:2605.04933 [pdf, ps, other]
Title: Interaction Tree Semantics for RISC-V: Bridging Compiler and Hardware Verification
Subjects: Programming Languages (cs.PL)

The Instruction Set Architecture (ISA) is the contract between compilers and processors; proving this contract formally demands cross-level connection to existing mechanized compilers and hardware implementations. As an open, modular ISA gaining adoption across embedded, mobile, and cloud platforms, RISC-V makes a formally verified ISA specification particularly valuable. However, existing formal RISC-V specifications focus on hardware tooling rather than cross-level verification: they provide no machine-checked instruction-level properties and lack support for verifying this contract across levels.
We address these limitations with a formal semantics of the RISC-V ISA in Rocq, built on Interaction Trees (ITrees). By leveraging ITree bisimulation and refinement, our semantics enables cross-level verification from compiler IR to hardware within a single framework. Our formalization covers a wide spectrum of RISC-V extensions. The correctness of individual instruction semantics is backed by machine-checked lemmas in Rocq. We further validate it by extracting an executable simulator that passes all standard RISC-V test suites. Three case studies demonstrate the effectiveness of our semantics for cross-level verification: first, we prove semantic equivalence via bisimulation between LLVM IR and RISC-V code on an array access pattern via Vellvm (LLVM ITree semantics); second, we apply translation validation to a specific instruction reordering for macro-operation fusion, distinguishing safe reorderings from those that break program-counter-relative addressing; third, we prove that a Kôika hardware ALU correctly implements all R-type integer operations (e.g., ADD, SUB, AND) against our ISA contract.

[463]  arXiv:2605.04939 [pdf, ps, other]
Title: Modular Reinforcement Learning For Cooperative Swarms
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

A cooperative robot swarm is a collective of computationally-limited robots that share a common goal. Each robot can only interact with a small subset of its peers, without knowing how this affects the collective utility. Recent advances in distributed multi-agent reinforcement learning have demonstrated that it is possible for robots to learn how to interact effectively with others, in a manner that is aligned with the common goal, despite each robot learning independently of others. However, this requires each robot to represent a potentially combinatorial number of interaction states, challenging the memory capabilities of the robots. This paper proposes an alternative approach for representing spatial interaction states for multi-robot reinforcement learning in swarms. A modular (decomposed) representation is used, where each feature of the state is handled by a separate learning procedure, and the results are aggregated. We demonstrate the efficacy of the approach in numerous experiments with simulated robot swarms carrying out foraging.

[464]  arXiv:2605.04941 [pdf, ps, other]
Title: UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning
Comments: Accepted at SemEval-2026
Subjects: Computation and Language (cs.CL)

This paper describes our system submitted to SemEval-2026 Task 11: Disentangling Content and Formal Reasoning in Large Language Models. We present an efficient modular neuro-symbolic approach, combining a symbolic prover with small reasoning LLMs (4B parameters). The system consists of an LLM-based parser that translates natural language syllogisms to a first-order logic (FOL) representation, an automated theorem prover, and two optional modules: machine translation for multilingual inputs and a symbolic retrieval component for the identification of relevant premises. The system achieves competitive accuracy and relatively low content effect on most subtasks. Our ablations show that this approach outperforms LLM-based zero-shot baselines in this parameter size range, but also reveal limited multilingual capabilities of small LLMs. Finally, we include a discussion of the task's main ranking metric and analyze its limitations.

[465]  arXiv:2605.04943 [pdf, ps, other]
Title: DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring
Comments: 18 pages, 8 figures, 9 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The condition monitoring (CM) of synthetic fibre ropes (SFRs) used in offshore, maritime, and industrial settings demands more than a classifier: inspectors need continuous severity estimates, maintenance recommendations, anomaly flags, deterioration timelines, and automated reports, all from a single inspection image. We present DART (Damage Assessment via Rope Transformer), a vision-language foundation model that addresses the full rope inspection workflow through a unified multi-task architecture. DART extends the Joint-Embedding Predictive Architecture (JEPA) to the cross-modal domain by coupling a Vision Transformer (ViT-H/14) with Llama-3.2-3B-Instruct via a Severity-Conditioned Cross-Modal Fusion (SC-CMF) module. Three architectural innovations drive the model's versatility: (1) HD-MASK, a saliency-guided masking strategy that focuses self-supervised reconstruction on damage-dense patches; (2) per-class learnable severity gates that adaptively weight language grounding by damage category; and (3) a Contrastive Damage Disentanglement (CDD) loss that shapes the embedding space to simultaneously encode damage type, severity ordering, and cross-modal semantics. Trained once on 4,270 images spanning 14 fine-grained rope damage classes, the frozen DART backbone supports downstream tasks without any task-specific fine-tuning: damage classification (93.22% accuracy, 91.04% macro-F1, +38.5 pp over a vision-only baseline), continuous severity regression (Spearman rho = 0.94, within-1-ordinal accuracy 99.6%), and few-shot recognition (89.2% macro-F1 at 20 shots). These results demonstrate that DART functions as a general-purpose CM backbone that goes well beyond classification, providing actionable inspection intelligence from a single shared representation.

[466]  arXiv:2605.04946 [pdf, ps, other]
Title: Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are parallel translates whose offsets are expressed in batch-standardized coordinates and are independent of the raw bias. This yields an exact criterion for when a switching hyperplane intersects a local $\ell_\infty$ window and motivates a local region-density functional based on exact affine-region counts. Under explicit sufficient conditions, we show that BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and that this mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding. These results provide a function-level geometric account of training-time BN as a batch-conditional recentering mechanism near the data.
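As a hedged illustration of the bias-independence claim above, the sketch below applies training-time batch normalization to a single neuron on a made-up mini-batch. It checks that the batch centroid lies on the neuron's reference hyperplane and that the normalized pre-activation does not depend on the raw bias. The batch, weights, and function names are hypothetical, and the learned scale and shift are fixed to gamma = 1 and beta = 0 for brevity; this is not the paper's code.

```python
import math

def bn_neuron(batch, w, b, eps=0.0):
    """Return the batch-normalized pre-activation of one neuron as a function of x.

    Statistics come from the given mini-batch; the learned scale and shift
    are omitted (gamma = 1, beta = 0) to keep the sketch short.
    """
    z = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in batch]
    mean = sum(z) / len(z)
    std = math.sqrt(sum((zi - mean) ** 2 for zi in z) / len(z) + eps)

    def normalized(x):
        return (sum(wi * xi for wi, xi in zip(w, x)) + b - mean) / std

    return normalized

batch = [(0.0, 1.0), (2.0, -1.0), (1.0, 3.0), (-1.0, 0.5)]  # made-up mini-batch
w = (0.5, -1.5)                                              # made-up weights
centroid = tuple(sum(col) / len(batch) for col in zip(*batch))
probe = (0.3, -0.7)                                          # arbitrary test point

f_b0 = bn_neuron(batch, w, b=0.0)
f_b7 = bn_neuron(batch, w, b=7.0)

# The zero set of the normalized pre-activation passes through the batch centroid.
print(abs(f_b0(centroid)) < 1e-12)
# Changing the raw bias leaves the normalized value unchanged (it cancels in z - mean).
print(abs(f_b0(probe) - f_b7(probe)) < 1e-9)
```

With a learned scale gamma (nonzero) and shift beta, the ReLU switching set becomes the level set where the normalized value equals -beta/gamma, which is a parallel translate of the same reference hyperplane.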

[467]  arXiv:2605.04947 [pdf, ps, other]
Title: Conflict Essences for Transformation Rules with Nested Application Conditions -- Long Version
Comments: 40 Pages, 23 Figures
Subjects: Software Engineering (cs.SE)

Conflict and dependency analysis is an important static analysis tool that provides an overview of the potential interactions of (graph) transformation rules. This analysis is based on critical pairs and initial conflicts, which represent conflicting transformations in a minimal context. However, the crucial information about a conflicting transformation pair is contained in much smaller structures, called disabling/conflict essences in existing research. Recently, we introduced disabling essences for rules with application conditions which contain the information on how an application condition can be violated by another rule. In this paper, we extend the notion of disabling essences to support not only application conditions in Alternating Quantifier Normal Form, but also arbitrary nested conditions. We introduce (symbolic) conflict essences that are constructed from disabling essences and which capture the interaction between two rules. We show that a transformation pair is parallel dependent if and only if a symbolic conflict essence can be embedded into it and relate symbolic conflict essences to initial conflicts for transformation rules with application conditions. We present our results for adhesive HLR categories, which include several types of graph-like structures.

[468]  arXiv:2605.04948 [pdf, ps, other]
Title: Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir
Comments: Preprint
Subjects: Computation and Language (cs.CL)

This paper presents a comparative study of parameter-efficient fine-tuning (PEFT) methods, including LoRA and QLoRA, applied to the task of adapting large language models to the Bashkir language, a low-resource agglutinative language of the Turkic family. Experimental evaluation is conducted on a Bashkir text corpus of 71k documents (46.9M tokens) using models of various architectures: DistilGPT2, GPT-2 (base, medium), Phi-2, Qwen2.5-7B, DeepSeek-7B, and Mistral-7B. To improve the reliability of results, each configuration was trained with three different random seeds.
The lowest perplexity on the test set was obtained for GPT-2 medium with full fine-tuning (3.34). Meanwhile, QLoRA applied to Mistral-7B (3.79) and Phi-2 (3.81) achieved comparable quality with over 40 times fewer trainable parameters. However, we also observed cases of significant quality degradation when using PEFT for certain architectures (e.g., DeepSeek-7B with rank 8, perplexity = 129.55), indicating that the outcome depends critically on the choice of the base model and its tokenizer.
Additionally, a qualitative analysis of generated texts based on Bashkir prompts revealed that models with the best perplexity do not necessarily produce the most coherent outputs: QLoRA-tuned models generated monolingual Bashkir continuations, whereas the fully fine-tuned model with the lowest perplexity frequently switched to English. The results suggest that QLoRA on 7B-scale models offers an effective compromise between quality and computational cost for Bashkir. To ensure reproducibility, open data, code, and trained adapters will be released upon acceptance.
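The parameter savings and the perplexity metric mentioned above follow from simple arithmetic, sketched here under stated assumptions. For one weight matrix of shape d_out x d_in, a rank-r LoRA adapter trains two factors with r * (d_in + d_out) entries in total, and perplexity is the exponential of the mean per-token negative log-likelihood. The layer dimensions and the rank in the example are hypothetical and are not taken from the paper's configurations.

```python
import math

def full_params(d_in, d_out):
    """Trainable entries when the whole weight matrix is fine-tuned."""
    return d_in * d_out

def lora_trainable_params(d_in, d_out, rank):
    """Trainable entries of a rank-`rank` LoRA adapter: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical 4096 x 4096 projection with a rank-8 adapter.
dense = full_params(4096, 4096)
adapter = lora_trainable_params(4096, 4096, rank=8)
print(dense, adapter, dense // adapter)

# A model that assigns probability 1/4 to every token has perplexity 4.
print(perplexity([math.log(4.0)] * 3))
```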

[469]  arXiv:2605.04949 [pdf, ps, other]
Title: AllSERP: Exhaustive Per-Element Enrichment of the Versatile AdSERP Dataset
Subjects: Information Retrieval (cs.IR)

We release AllSERP, a typed AOI and per-element behavioral enrichment of the AdSERP commercial-intent SERP corpus [4]. AdSERP ships 2,776 trials of full-page screenshots, captured SERP HTML, 150 Hz Gazepoint eye tracking, evtrack mouse telemetry, scroll, and pupil signals against real Google SERPs collected before AI Overviews -- but its bounding boxes cover only ad surfaces (15.5 % of attributable clicks). AllSERP adds pixel-accurate organic and widget bboxes via screenshot-anchored CV, semantic types across thirteen element types via an HTML parser, an inter-result gap-fill flavor (typed_gapfill), and X+Y click attribution that reaches 91.7 % of the corpus while flagging the rest at trial level. The Phase C ad-vs-non-ad partition is internally consistent with the shipped ad rectangles (0 disagreements across 38,250 classifications). We ship the pipeline, per-trial JSONs, a corpus CSV, and a browser-based replay viewer; everything is reproducible from the AdSERP Zenodo volume. The release enables per-element click, fixation, regression, and above-fold analyses that the shipped ads-vs-organic split could not resolve.

[470]  arXiv:2605.04952 [pdf, ps, other]
Title: Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
Subjects: Machine Learning (cs.LG)

Mixture-of-experts (MoE) models enable scalable transformer architectures by activating only a subset of experts per token. Recent evidence suggests that performance improves with increasingly granular experts, i.e., many small experts instead of a few large ones. However, this regime substantially increases routing cost, which can dominate computation. We introduce adaptive inverted-index routing for MoE (AIR-MoE), an inverted-index-inspired routing architecture based on vector quantization (VQ). In a first stage, AIR-MoE performs coarse shortlisting by assigning tokens to VQ codewords to construct a candidate set of experts. In a second stage, fine scoring computes exact routing scores restricted to this shortlist. This two-stage procedure approximates true top-k routing while avoiding full expert scoring and, in contrast to prior work, imposing no structural constraints on expert parameters. AIR-MoE serves as a drop-in replacement for standard routers and requires no modifications to the model architecture or loss function. We further provide a lower bound on the mass recall achieved by AIR-MoE that yields insights into its inner workings. Empirically, we demonstrate that AIR-MoE achieves improved performance compared to existing routing approaches in granular MoE settings.
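The two-stage procedure described above can be sketched roughly as follows. This is a toy illustration, not the AIR-MoE implementation: the codebook, the hand-written inverted index from codewords to candidate experts, the nearest-codeword rule (squared Euclidean distance), and the dot-product scoring are all assumptions made for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nearest_codeword(token, codebook):
    """Stage 1 (coarse): index of the codeword closest to the token."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((t - c) ** 2 for t, c in zip(token, codebook[i])),
    )

def two_stage_route(token, codebook, inverted_index, expert_keys, k):
    """Stage 2 (fine): exact scores, but only for experts on the shortlist."""
    shortlist = inverted_index[nearest_codeword(token, codebook)]
    ranked = sorted(shortlist, key=lambda e: dot(token, expert_keys[e]), reverse=True)
    return ranked[:k]

def full_topk(token, expert_keys, k):
    """Reference router: score every expert, then take the top k."""
    ranked = sorted(range(len(expert_keys)),
                    key=lambda e: dot(token, expert_keys[e]), reverse=True)
    return ranked[:k]

# Hypothetical toy setup: 2 codewords, 6 experts, 2-dimensional tokens.
codebook = [(1.0, 0.0), (0.0, 1.0)]
inverted_index = {0: [0, 1, 2], 1: [3, 4, 5]}
expert_keys = [(1.0, 0.0), (0.9, 0.1), (0.5, 0.5),
               (0.0, 1.0), (0.1, 0.9), (-1.0, 0.0)]
token = (0.8, 0.2)

routed = two_stage_route(token, codebook, inverted_index, expert_keys, k=2)
exact = full_topk(token, expert_keys, k=2)
print(routed, exact)
```

In this toy case the shortlist contains the true top-2 experts, so only three of the six experts are scored exactly. In general the shortlist can miss a high-scoring expert, which is why the result is an approximation of full top-k routing.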

[471]  arXiv:2605.04954 [pdf, ps, other]
Title: On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study where PIAS with varying sampling budgets for feature computation is compared to the single best algorithm on a broad range of algorithm selection scenarios. These scenarios consist of two portfolio sizes, three problem sets, four dimensionalities, and ten target budgets. We find that PIAS is viable for the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The tradeoff for the fraction of the budget spent on feature computation to maximize the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of PIAS loss to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.

[472]  arXiv:2605.04955 [pdf, ps, other]
Title: Order-based Rehearsal Learning
Subjects: Machine Learning (cs.LG)

When a machine learning (ML) model forecasts an undesired event, one often seeks a decision to avoid it, known as the avoiding undesired future (AUF) problem. Many rehearsal learning methods have been proposed for AUF, but they rely on an underlying graph structure; learning such a graph from observational data is challenging and can incur substantial estimation error. In this work, we demonstrate that the order structure can be sufficient for AUF decision-making, and propose the first order-based rehearsal learning method. Although an order is less informative than a graph, it can be sufficient to identify the influence of decisions from observational data, suggesting that learning the entire graph is not always necessary. To learn the order, we develop an information-theoretic method that imposes no restrictions on the form of structural functions or the type of noise distributions. For AUF decision-making, we construct an order-based sampler to approximate the influence of decisions and, combined with a surrogate objective for maximizing the post-decision success probability, reduce the AUF task to a differentiable optimization problem. Experiments show that our order learning method outperforms existing methods, and that our AUF approach not only surpasses methods relying on learned graphs or learned orders, but also matches or even exceeds oracle baselines that are given the true graph.

[473]  arXiv:2605.04956 [pdf, ps, other]
Title: KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
Subjects: Machine Learning (cs.LG); Performance (cs.PF)

LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency across 176 tasks in 15 categories. Our systematic comparison of five representative methods yields three main findings. First, task structure determines correctness more than method design. Category explains nearly three times more variance in semantic correctness than method (9.4% vs 3.3% explained deviance), and 72% of Fusion tasks fail across all five methods while Math tasks are solved consistently. Second, iterative refinement improves correctness, but not performance. Across GEAK iterations, compile rate rises from 52.3% to 68.8% while average speedup declines from $1.58\times$ to $1.44\times$; newly rescued kernels consistently underperform persistently correct ones ($1.16\times$ vs $1.58\times$ speedup in round 0$\to$1). Third, correctness does not imply efficiency. 46.6% of correct kernels are slower than the PyTorch eager baseline, and cross-hardware speedup variance reaches $21.4\times$. In addition, quantization remains completely unsolved (0/30 successes) despite non-trivial compilation rates, revealing systematic misunderstanding of numerical computation contracts rather than surface-level syntax errors. These findings suggest that future progress depends on handling global coordination, explicitly modeling numerical precision, and incorporating hardware efficiency into generation. The code is available at https://github.com/BonnieW05/KernelBenchX

[474]  arXiv:2605.04957 [pdf, ps, other]
Title: Delving into Non-Exchangeability for Conformal Prediction in Graph-Structured Multivariate Time Series
Subjects: Machine Learning (cs.LG)

Point forecasting for graph-structured multivariate time series is a fundamental problem, but rigorous uncertainty quantification for such predictions is still underexplored. Conformal prediction (CP) offers uncertainty estimation with a solid coverage guarantee under the exchangeability assumption, which requires the joint data distribution to be unchanged under permutation. However, in graph-structured time series, inherent cross-node coupling can violate the exchangeability condition, making direct application of CP unreliable. Inspired by spectral graph theory, we observe that such coupling resides in global trends and can be characterized by the low-frequency components, while high-frequency components are nearly exchangeable. Therefore, we propose a novel concept named Spectral Graph Conditional Exchangeability (SGCE), which conditions exchangeable high-frequency components on low-frequency ones to preserve global trends and enable effective CP in the spectral domain. Based on SGCE, we further propose Spectral Conformal prediction via wAveLEt transform (SCALE). SCALE uses graph wavelets to decompose low/high-frequency components and conformalizes high-frequency residuals via adaptive gating over a low-frequency embedding. Experimental results on real-world traffic datasets show that SCALE not only achieves valid coverage but also consistently improves the coverage-efficiency trade-off over the state-of-the-art CP methods.
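For readers unfamiliar with the baseline that the abstract builds on, here is a minimal sketch of standard split conformal prediction, which is valid under exchangeability. It is not SCALE: there is no graph wavelet decomposition or gating, and the calibration residuals and the point forecast are made-up numbers.

```python
import math

def conformal_radius(cal_residuals, alpha):
    """Split-conformal radius: the ceil((n + 1) * (1 - alpha))-th smallest
    absolute calibration residual, or infinity if that rank exceeds n."""
    n = len(cal_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(cal_residuals)[rank - 1]

def prediction_interval(point_forecast, radius):
    return (point_forecast - radius, point_forecast + radius)

# Hypothetical absolute residuals |y - y_hat| on a held-out calibration set.
residuals = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7]
radius = conformal_radius(residuals, alpha=0.25)
low, high = prediction_interval(10.0, radius)
print(radius, low, high)
```

Under exchangeability, an interval built this way covers the true value with probability at least 1 - alpha. The abstract's point is that cross-node coupling breaks that assumption for graph-structured series.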

[475]  arXiv:2605.04960 [pdf, ps, other]
Title: EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance
Comments: 15 pages, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement learning with verifiable rewards (RLVR), particularly Group Relative Policy Optimization (GRPO), has advanced LLM reasoning. However, GRPO suffers from three credit assignment failures: uniform token-level granularity that ignores heterogeneous informational value, uniform polarity that penalizes correct steps and rewards incorrect ones, and zero-variance collapse that erases outcome-driven gradients. We systematically quantify these failures, revealing highly non-uniform token informativeness, widespread step-level polarity misalignment, and substantial training waste.
To address these limitations, we propose Entropy-Progress Aligned GRPO (EP-GRPO), a framework that mines the model's intrinsic information flow for dense, self-supervised guidance. EP-GRPO integrates entropy-gated modulation to prioritize high-entropy decision pivots, implicit process signals from policy divergence anchored to outcome advantages for directional token-level feedback without external reward models, and cumulative entropy mapping that enables progress-aligned advantage normalization, naturally maintaining gradient flow under zero reward variance.
Extensive experiments on mathematical reasoning benchmarks demonstrate that EP-GRPO achieves superior accuracy and efficiency compared to GRPO and its variants. The code will be available.
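The zero-variance collapse mentioned in this abstract can be seen in the usual group-relative advantage computation, sketched below. This is a generic GRPO-style normalization (population standard deviation plus a small epsilon, both assumptions of this sketch), not EP-GRPO itself, and the reward values are made up.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward by the mean and std of its sampling group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

# Mixed outcomes: correct rollouts get positive advantages, incorrect ones negative.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# All rollouts receive the same reward: every advantage is exactly zero,
# so this group contributes no outcome-driven gradient.
collapsed = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(collapsed)
```

Note also that every token in a rollout shares that rollout's single advantage value, which is the uniform token-level granularity the abstract refers to.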

[476]  arXiv:2605.04962 [pdf, ps, other]
Title: TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding
Comments: 15 pages, 8 figures. Code and datasets are publicly available
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular data. Existing methods face fundamental limitations: LLM-based approaches lack retrieval-compatible vector outputs, whereas text embedding models often fail to capture tabular structure and numerical semantics. To bridge this gap, we first introduce the Tabular Embedding Benchmark (TabBench), a comprehensive suite designed to evaluate the tabular understanding capability of embedding models. We then propose TabEmbed, the first generalist embedding model that unifies tabular classification and retrieval within a shared embedding space. By reformulating diverse tabular tasks as semantic matching problems, TabEmbed leverages large-scale contrastive learning with positive-aware hard negative mining to discern fine-grained structural and numerical nuances. Experimental results on TabBench demonstrate that TabEmbed significantly outperforms state-of-the-art text embedding models, establishing a new baseline for universal tabular representation learning. Code and datasets are publicly available at https://github.com/qiangminjie27/TabEmbed and https://huggingface.co/datasets/qiangminjie27/TabBench.

[477]  arXiv:2605.04965 [pdf, ps, other]
Title: Reliable Modeling of Distribution Shifts via Displacement-Reshaped Optimal Transport
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Optimal transport (OT) is a central framework for modeling distribution shifts. Because OT compares distributions directly in input space, a well-designed ground metric between observations is essential to ensure that the optimizer does not violate the true geometry of change. We propose Displacement-Reshaped Optimal Transport (ReshapeOT), a method that reshapes the ground metric by integrating observed sample displacements as an additional source of knowledge. Technically, ReshapeOT replaces the Euclidean metric with a Mahalanobis distance estimated from displacement second moments. This effectively carves expressways through the input space, inviting transport solutions that better align with observed displacements. Our method is computationally lightweight, integrates seamlessly into any OT solver that operates on a cost matrix, and can be kernelized for further flexibility. Experiments on synthetic and real-world data show that ReshapeOT achieves substantial gains in transport reliability. We further demonstrate our method's usefulness in two practical use cases.

[478]  arXiv:2605.04966 [pdf, ps, other]
Title: Adaptive Contention-based Random Access for Uplink Reporting in 3GPP Ambient IoT Networks
Subjects: Systems and Control (eess.SY)

Ambient Internet of Things (A-IoT) targets energy harvesting (EH), battery-less devices as a simple connectivity solution for extensive ultra-low-power deployments. These devices typically face intermittent energy availability, making uplink reports increasingly susceptible to access collisions and energy outages. In this paper, we build upon the cellular standardization of A-IoT and examine the paging-triggered contention-based random access (CBRA) framework for uplink reporting. We analyze the effects of energy availability and collisions on these systems and introduce an EH-aware access control mechanism. In this mechanism, the reader broadcasts an access probability in the paging message, which helps regulate the number of devices attempting random access. Results show that, unlike the baselines, the proposed method scales well under dense deployments by keeping collisions nearly constant, improving access efficiency, and substantially reducing the number of paging rounds required for successful reporting. These results highlight the importance of lightweight reader-side access control for reliable and resource-efficient reporting in A-IoT environments.
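To make the role of a broadcast access probability concrete, here is a textbook single-slot contention model, offered only as a rough illustration. It assumes n devices that each attempt access independently with probability p and counts the slot as successful when exactly one device transmits. It ignores energy harvesting and outages entirely, so it is not the paper's EH-aware mechanism, and the device count is hypothetical.

```python
def success_probability(n_devices, p):
    """Probability that exactly one of n independent devices transmits in a slot."""
    return n_devices * p * (1 - p) ** (n_devices - 1)

def best_access_probability(n_devices):
    """Maximizer of success_probability over p for this simple model: p = 1 / n."""
    return 1.0 / n_devices

n = 10  # hypothetical number of contending devices
p_star = best_access_probability(n)
for p in (0.05, p_star, 0.5):
    print(p, round(success_probability(n, p), 4))
```

In this model, a probability that is too high makes collisions dominate, while one that is too low leaves slots idle, which is why a reader-side control that tracks the number of contenders can help.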

[479]  arXiv:2605.04970 [pdf, ps, other]
Title: Skill Neologisms: Towards Skill-based Continual Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model's effective context. We explore skill neologisms--i.e., soft tokens integrated in the model's vocabulary and optimized to improve capabilities over a specific skill--as a way to selectively extend model capabilities to new skills without weight updates. We first observe that off-the-shelf pre-trained LLMs already demonstrate tokens associated with procedural knowledge. We then show that skill neologisms can be learned to improve model capabilities on specific skills while being composable with out-of-distribution skills, and that independently trained skill neologisms can be composed zero-shot. These results suggest that skill neologisms may provide a scalable path towards skill-based continual learning.

[480]  arXiv:2605.04971 [pdf, ps, other]
Title: Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking
Comments: 9 pages of main text, plus appendices. Under review at NeurIPS 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Weight matrices in deep networks exhibit geometric continuity -- principal singular vectors of adjacent layers point in similar directions. While this property has been widely observed, its origin remains unexplained. Through experiments on toy MLPs and small transformers, we identify two mechanisms: residual connections create cross-layer gradient coherence that aligns weight updates across layers, and symmetry-breaking nonlinearities constrain all layers to a shared coordinate frame, preventing the rotation drift that would otherwise destabilize weight structure. Crucially, a nonlinear but rotation-preserving activation fails to retain continuity, isolating symmetry breaking -- not nonlinearity itself -- as the active ingredient. Activation and normalization play distinct roles: activation concentrates continuity in the leading singular direction, while normalization distributes it across multiple directions. In transformers, continuity is projection-specific: Q, K, Gate, and Up (which read from the residual stream) develop input-space ($\mathbf{v}_1$) continuity; O and Down (which write to it) develop output-space ($\mathbf{u}_1$) continuity; V alone, lacking an adjacent nonlinearity, develops only low continuity.

[481]  arXiv:2605.04972 [pdf, ps, other]
Title: Why Expert Alignment Is Hard: Evidence from Subjective Evaluation
Comments: 10 pages, 2 figures
Subjects: Computation and Language (cs.CL)

Aligning large language models with expert judgment is especially difficult in subjective evaluation tasks, where experts may disagree, rely on tacit criteria, and change their judgments over time. In this paper, we study expert alignment as a way to understand this difficulty. Using expert evaluations and follow-up questionnaires, we examine how different forms of expert information affect alignment and what this reveals about subjective judgment. Our findings show four consistent patterns. First, alignment difficulty varies substantially across experts, suggesting that expert evaluation styles differ widely in their distance from a model's prior behavior. Second, explicit criteria and reasoning do not always improve alignment, indicating that expert judgment is not fully captured by verbalized rules. Third, editing is sensitive to both the number and the identity of examples, with small numbers of edits providing useful but unstable gains. Fourth, alignment difficulty differs across evaluation dimensions: dimensions grounded more directly in proposal content are easier to align, while dimensions requiring external knowledge or value-based judgment remain harder. Taken together, these results suggest that expert alignment is difficult not only because of model limitations, but also because subjective evaluation is inherently heterogeneous, partly tacit, dimension-dependent, and temporally unstable.

[482]  arXiv:2605.04973 [pdf, ps, other]
Title: Architectural Constraints Alignment in AI-assisted, Platform-based Service Development
Comments: To Appear at CAiSE'26 - LLM-SOA Workshop
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

AI-assisted development tools enable rapid prototyping of services but often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments. Consequently, generated artifacts may exhibit brittle behavior and limited deployability. We propose a retrieval-augmented scaffolding approach that combines platform-based code generation with agentic clarification loops to expose and resolve architectural constraint ambiguities. By combining template retrieval with structured interaction, the method embeds production-relevant considerations during service scaffolding. Evaluation indicates improved architectural consistency and deployability compared to general-purpose AI code generation workflows, suggesting that constraint-aware retrieval is essential for aligning AI-assisted service development with production software engineering practices.

[483]  arXiv:2605.04975 [pdf, ps, other]
Title: Probabilistic Atomic Swaps for Bitcoin and Friends
Subjects: Cryptography and Security (cs.CR)

Atomic swaps are a fundamental primitive for the trustless exchange of digital assets across blockchains: they guarantee that either both parties receive the agreed assets or neither party transfers. While this all-or-nothing guarantee is powerful, it also imposes an inherent determinism that rules out exchanges whose intended outcome is probabilistic. As a result, existing atomic swaps cannot realize trustless exchanges in which one party pays for a fixed chance of receiving a larger asset or reward, as in lotteries, randomized allocation mechanisms, and probabilistic cross-chain trades.
We introduce probabilistic swaps, a new cryptographic primitive that extends atomic swaps to the probabilistic setting. In a probabilistic swap, one party's transfer is executed with a fixed, publicly specified probability embedded in the protocol and cannot be biased by either party. This yields a trustless mechanism for randomized exchange with verifiable odds and no trusted intermediary.
Our construction combines adaptor signatures with oblivious pseudorandom functions (OPRFs) to realize the desired probabilistic outcome while ensuring that neither party can predict or bias it in advance. Along the way, we introduce a new mechanism for the atomic exchange of OPRF evaluations for payments, which may be of independent interest. A key feature of our approach is that it preserves the minimal on-chain footprint of modern atomic-swap protocols. The protocol relies only on standard Bitcoin scripts, such as digital signatures and timelocks, and is deployable on any blockchain that already supports atomic swaps. Consequently, probabilistic swaps are indistinguishable from ordinary on-chain transactions, which helps preserve privacy and fungibility. We provide formal security foundations and demonstrate practicality through a probabilistic swap in the Bitcoin testnet and in the Lightning Network.

[484]  arXiv:2605.04977 [pdf, ps, other]
Title: ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This companion paper reports the ICPR 2026 TVRID competition on privacy-aware top-view person re-identification. We present the competition setting, the released RGB-Depth dataset, and a summary of final results with descriptions of the top entries. TVRID contains 86 identities captured by four synchronized overhead Intel RealSense D455 cameras, with paired RGB/Depth streams and structured geometric variation across flat, ascent, descent, and oblique viewpoints. The evaluation protocol includes three tracks: RGB Re-ID, Depth Re-ID, and RGB$\leftrightarrow$Depth cross-modal retrieval. Submissions are ranked using mAP and CMC-1 under a unified server-side evaluation. The final results show a clear difficulty ordering (RGB $>$ Depth $>$ Cross-Modal), highlighting both the challenge of modality-constrained retrieval and the feasibility of strong performance with modality-invariant learning. By releasing the dataset at https://zenodo.org/records/17909410, the evaluation scripts at https://github.com/RaphaelDel/ICPR-TVRID, and the accompanying documentation, TVRID establishes a reproducible benchmark for top-view, depth-based, and cross-modal person re-id.
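The two ranking metrics named above can be computed as in the following sketch, which uses the common definitions: average precision per query over a ranked gallery, averaged into mAP, and CMC-1 as the share of queries whose top-ranked gallery item is a correct match. The match lists are made up, and protocol details such as same-camera filtering are omitted; this is not the competition's evaluation script.

```python
def average_precision(ranked_matches):
    """AP for one query; ranked_matches[i] is True if the i-th ranked gallery item matches."""
    hits, precisions = 0, []
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def map_and_cmc1(per_query_matches):
    """Mean average precision and rank-1 accuracy over all queries."""
    aps = [average_precision(m) for m in per_query_matches]
    rank1 = sum(1 for m in per_query_matches if m and m[0])
    return sum(aps) / len(aps), rank1 / len(per_query_matches)

# Two hypothetical queries with three ranked gallery items each.
queries = [
    [True, False, True],   # AP = (1/1 + 2/3) / 2
    [False, True, False],  # AP = 1/2
]
mean_ap, cmc1 = map_and_cmc1(queries)
print(mean_ap, cmc1)
```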

[485]  arXiv:2605.04978 [pdf, ps, other]
Title: Exhaustive Symbolic Integration: Integration by Differentiation and the Landscape of Symbolic Integrability
Authors: Harry Desmond
Comments: 26 pages, 2 figures; to be submitted to the Journal of Symbolic Computation
Subjects: Symbolic Computation (cs.SC); Logic in Computer Science (cs.LO)

We introduce Exhaustive Symbolic Integration (ESI), a method that enumerates all symbolic functions up to a given complexity $k$ within a specified operator basis and determines which admit closed-form antiderivatives within the same class. This allows us to compute the "integrability fraction" $\rho(k)$ (the fraction of functions whose derivatives lie within the same class), which we do for five operator bases including combinations of rational functions, powers, exponentials, logarithms and trigonometric functions. We find that $\rho(k)$ declines at high complexity and that the operator basis has a dramatic effect -- in particular, adding the logarithm boosts $\rho(k)$ by a factor of $\sim$3 and produces or exacerbates a clear peak at $k=6$. We also deploy ESI as a novel integration algorithm, identifying three integrals that resist SymPy, Mathematica, RUBI, FriCAS, Maxima and Giac under all tested strategies. When an antiderivative can be found by multiple methods, ESI often returns the simplest form. These results reveal that the landscape of symbolic integrability is shaped primarily by the choice of operators, and that exhaustive enumeration can systematically discover integrable forms -- including novel ones -- that elude computer algebra systems.

[486]  arXiv:2605.04979 [pdf, ps, other]
Title: On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
Comments: Accepted as a full paper in the Main Track of AAMAS 2026
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

A Tree Markov Decision Problem (T-MDP) is a finite-horizon MDP with a starting state $s_{1}$, in which every state is reachable from $s_{1}$ through exactly one state-action trajectory. T-MDPs arise naturally as abstractions of decision making in sequential games with perfect recall, against stationary opponents. We consider the problem of on-line learning in T-MDPs, both in the PAC and the regret-minimisation regimes. We show that well-known bandit algorithms -- LUCB and UCB -- can be applied on T-MDPs by treating each policy as an arm. The apparent technical challenge in this approach is that the number of policies is exponential in the number of states. Our main innovation is in the design of confidence bounds based on data shared by the policies, so that the bandit algorithms can yet be implemented with polynomial memory and per-step computation. We obtain instance-dependent upper bounds on sample complexity and regret that sum a "gap term" from every terminal state, rather than every policy. Empirically, our algorithms consistently outperform available alternatives on a suite of hidden-information games.

[487]  arXiv:2605.04980 [pdf, ps, other]
Title: Conceptors for Semantic Steering
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional while producing substantially fewer degenerate outputs. Conceptor steering is a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.

[488]  arXiv:2605.04984 [pdf, ps, other]
Title: Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human annotation. Existing turn-level shaping methods reward turns that increase the likelihood of a gold answer, but they require answer supervision or stable task-specific verifiers. Conversely, label-free RL methods extract self-signals from output distributions, but mainly at the answer or trajectory level and therefore cannot assign credit to intermediate turns. We propose Self-Induced Outcome Potential (SIOP), which treats semantic clusters of final answers as latent future outcome states for potential-based turn-level credit assignment. For each query, SIOP samples multiple rollouts, clusters final answers into semantic outcome modes, and builds a reliability-aware target distribution over these states. It then rewards turns for increasing posterior support for reliable future states using a tractable cluster-level approximation. The objective generalizes information-potential shaping from gold-answer supervision to settings without task-specific gold verifiers while avoiding the broadcasted rollout-level advantages used by standard GRPO. We formalize the framework, characterize its supervised gold-answer limit, and show that SIOP improves average performance over verifier-free outcome-level baselines on seven search-augmented agentic reasoning benchmarks while approaching a gold-supervised outcome baseline. Code is available at https://github.com/dl-m9/SIOP.git.

[489]  arXiv:2605.04985 [pdf, ps, other]
Title: Attention-Based Chaotic Self-Supervision for Medical Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning models for medical image classification usually achieve promising results but typically rely on large, annotated datasets or standard transfer learning from ImageNet. Self-Supervised Learning (SSL) has emerged as a powerful alternative, yet common methods like masked autoencoders (MAEs) may inadvertently destroy fine-grained diagnostic features by using random masking. In this paper, we propose a novel SSL pre-training strategy, the Chaotic Denoising Autoencoder (CDAE). Instead of masking, we apply a chaotic transformation to the input image, tasking an autoencoder to reconstruct the original. We hypothesize this forces the encoder to learn robust, domain-specific features by "inverting the chaos". Furthermore, we propose an attentive fusion mechanism that combines features from our CDAE-trained encoder with a standard encoder, leveraging the strengths of both general and domain-specific representations. Our method is evaluated on two public medical datasets: ISIC 2018 (skin lesions) and APTOS 2019 (diabetic retinopathy). The proposed model achieves high performance, with an accuracy of 0.9221 and an F1-macro of 0.8530 on ISIC 2018, and an accuracy of 0.8644 and F1-macro of 0.7433 on APTOS 2019, demonstrating the efficacy of our approach.

[490]  arXiv:2605.04989 [pdf, ps, other]
Title: Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data
Comments: Accepted at IGARSS 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire-climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospatial Foundation Models (GFMs) - Terramind, DINOv3, and Prithvi-v2 - for burned-area mapping across the United States and Canada using Sentinel-2 data. Leveraging 3,820 wildfire events from 2017-2023, we conduct spatial and temporal generalization tests across diverse biomes. We systematically compare full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA) for adapting each model. Across all experiments, LoRA provides the strongest cross-domain generalization while updating less than 1% of parameters, demonstrating a favorable trade-off between accuracy and efficiency. Prithvi-v2 with LoRA achieves the highest overall accuracy and the largest improvement compared to full fine-tuning. These findings indicate that geospatial foundation models, when adapted using lightweight parameter-efficient methods such as LoRA, offer a robust and scalable solution for large-scale burned-area mapping. Code is available at https://github.com/alishibli97/wildfire-lora-gfm.
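
The "less than 1% of parameters" figure follows from how LoRA is built: the pretrained weight matrix stays frozen and only a low-rank update is trained. A minimal sketch of that arithmetic for a single weight matrix is given below; the layer size and rank are hypothetical values chosen for illustration, not the configuration used in the paper.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Trainable LoRA parameters relative to one frozen d_out x d_in matrix.

    LoRA keeps the weight matrix W frozen and learns an update B @ A,
    where B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    full_params = d_out * d_in
    lora_params = rank * (d_out + d_in)
    return lora_params / full_params


# Hypothetical layer size and rank, for illustration only.
fraction = lora_trainable_fraction(d_out=1024, d_in=1024, rank=4)
print(f"trainable fraction: {fraction:.4%}")
```

The fraction shrinks as the layer grows or the rank falls, which is why the overall share of updated parameters can stay small even when several layers are adapted.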

[491]  arXiv:2605.04992 [pdf, ps, other]
Title: You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation
Subjects: Cryptography and Security (cs.CR)

The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank Adaptation (LoRA) modules. However, integrating these third-party adapters often induces catastrophic forgetting of the base model's foundational safety alignment. Restoring these guardrails via fine-tuning on safety data introduces an opposing failure mode: the severe degradation of the specialized domain knowledge the adapter was originally designed to provide. To overcome this zero-resource challenge, we propose Neural Weight Translation (NeWTral), a framework that directly maps unsafe, domain-specific adapters onto a safe alignment manifold while rigorously preserving their core expertise. NeWTral operates as a non-linear translation module pre-trained on a diverse corpus of unsafe-to-safe adapter pairs. By executing this mapping entirely within the parameter space, NeWTral utilizes an adaptive Mixture of Experts (MoE) routing strategy to autonomously blend high-fidelity surgical translators and aggressive alignment experts. We evaluate our framework across four architectural families (Llama, Mistral, Qwen, and Gemma) at scales up to 72B parameters across eight diverse scientific and professional domains. Our results demonstrate that the MoE variant achieves a radical reduction in the average Attack Success Rate (ASR), dropping from 70% in unsafe experts to just 13%, while maintaining an exceptional 90% average knowledge fidelity. Much like the crowdsourced adapters it remedies, the NeWTral module is designed as a standalone, downloadable asset that allows practitioners to restore safety alignment instantly without requiring access to original training data or hardware-intensive retraining.

[492]  arXiv:2605.04993 [pdf, ps, other]
Title: Federated Learning for Early Prediction of EV Charging Demand
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Accurate forecasting of electric vehicle (EV) charging demand is critical for grid stability, infrastructure planning, and real-time charging optimization. In this work, we study the problem of early prediction of charging demand, where the total energy of a session is estimated using only information available at plug-in time and during the first minutes of charging. This enables actionable decisions while the session is still in progress, which is of direct importance for EV network operators. We construct a session-level dataset from the Adaptive Charging Network (ACN), combining session metadata with early-window charging measurements, and derive tabular features capturing user intent, temporal patterns, and initial charging behavior. We focus on a single operational depot, Caltech, and model intra-depot heterogeneity through station-level client partitions while evaluating multiple model families in a federated learning (FL) setting. Our results show that federated models can approach centralized predictive performance while keeping data in-depot, enabling privacy-enhanced training across distributed charging infrastructures. Overall, we demonstrate that reliable demand estimates can be obtained early in the session with minimal data, and that FL provides a practical pathway toward scalable and privacy-aware analytics for EV charging networks. Code is available at https://github.com/Indigma-Innovations/federated-learning-ev-charging-demand.

[493]  arXiv:2605.04995 [pdf, ps, other]
Title: Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, this advantage can change when one passes from the unrestricted regime to the realizable regime. We identify four distinct approximation scenarios, each witnessed by an explicit task family: (a) no advantage of adaptivity; (b) an advantage in the unrestricted regime that persists under ReLU realizability; (c) an advantage that arises only under realizability; and (d) an advantage that disappears under realizability. This demonstrates that representational constraints interact profoundly with the effect of adaptivity.

[494]  arXiv:2605.04996 [pdf, ps, other]
Title: Tailoring Scaffolding to Diagnostic Strategies: Theory-Informed LLM-Based Agents
Comments: 3 pages, 1 figure. Companion Proceedings 16th International Conference on Learning Analytics & Knowledge (LAK26), Strengthening the Use of Learning Theories for Personalization of Learning Analytics Workshop
Subjects: Human-Computer Interaction (cs.HC)

Learning analytics systems increasingly integrate large language models (LLMs) to provide adaptive scaffolding in complex learning environments, yet personalization is often driven by global instructional choices rather than principled alignment with learning theory, limiting effectiveness and pedagogical grounding. In prior work, we examined how structuring and problematizing scaffolding approaches can be instantiated through LLM agents in a scenario-based learning environment for diagnostic reasoning. While both approaches supported learning, we observed systematic differences in learner interaction patterns and clear tendencies indicating that different diagnostic strategies benefited from distinct forms of scaffolding. Building on these findings, we propose a theory-informed scaffolding design grounded in the Knowledge Learning Instruction (KLI) framework, as different diagnostic strategies target different types of knowledge and require different instructional mechanisms. We use KLI to guide the alignment between strategy demands and scaffolding approaches and introduce a KLI-informed hybrid LLM agent that adapts its pedagogical support according to the diagnostic strategy being practiced, rather than applying a single global scaffolding approach. We hypothesize that this design could enable better learning gains.

[495]  arXiv:2605.04997 [pdf, ps, other]
Title: DualTCN: A Physics-Constrained Temporal Convolutional Network for Time-Domain Marine CSEM Inversion
Subjects: Machine Learning (cs.LG)

DualTCN is the first deep-learning framework for inverting time-domain marine controlled-source electromagnetic (MCSEM) transient data. Moving away from traditional subsurface discretization, the framework regresses four earth-model parameters -- $\sigma_1$, $\sigma_2$, $d_1$, $d_2$ -- and reconstructs conductivity-depth profiles using a differentiable soft-step decoder. The optimized architecture (379K parameters) features a Temporal Convolutional Network (TCN) encoder paired with a late-time branch and an auxiliary seafloor-depth head. This design achieves a 25.3% loss reduction over baseline models, with high predictive accuracy ($R^2 = 0.898$ for $\sigma_2$) and an inversion speed of 3.5 ms per sample on an A100 GPU.
The framework demonstrates high robustness to noise through curriculum-based amplitude augmentation, maintaining a mean $\bar{R}^2$ of 0.858 at $\pm2\%$ random amplitude error, compared to $0.363$ without augmentation. DualTCN generalizes effectively to three-layer extensions (seawater/resistive layer/basement), accurately resolving basement conductivity ($R^2 \approx 0.88$), though thin-layer resolution remains a physical limitation ($R^2 \approx 0.23$).
In comparative benchmarks, DualTCN significantly outperforms traditional local optimization methods like Levenberg-Marquardt and L-BFGS-B, yielding a mean $\bar{R}^2 = 0.877$ versus 0.129-0.439 for multi-start baselines, while operating at up to 21,000$\times$ lower computational cost. Finally, the framework incorporates uncertainty quantification via Monte Carlo (MC) Dropout. While well-calibrated for $\sigma_1$ (PICP90 = 0.944), inherent signal limitations at short offsets (200m) lead to under-coverage for $d_2$ (PICP90 = 0.572), which can be mitigated through post-hoc temperature scaling or split conformal prediction.
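
The PICP90 values above can be read as prediction interval coverage probabilities for nominal 90% intervals: the share of test samples whose true value falls inside the predicted interval. A minimal sketch of that metric follows, assuming this reading; the function name and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def picp(y_true: Sequence[float],
         lower: Sequence[float],
         upper: Sequence[float]) -> float:
    """Fraction of true values that lie inside [lower, upper]."""
    if not y_true or not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return covered / len(y_true)


# Made-up values for illustration: three of four targets are covered.
truth = [1.0, 2.0, 3.0, 4.0]
lo_bounds = [0.5, 2.5, 2.0, 3.9]
hi_bounds = [1.5, 3.5, 3.5, 4.1]
coverage = picp(truth, lo_bounds, hi_bounds)
print(coverage)
```

For a well-calibrated 90% interval the result should sit near 0.90; a value well below that, such as the 0.572 reported for $d_2$, indicates intervals that are too narrow.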

[496]  arXiv:2605.04998 [pdf, ps, other]
Title: Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation
Authors: Jinju Lee
Comments: 3 figures, 5 tables. Companion HuggingFace models: this https URL
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Chord progression generation is practically important but understudied. Most large-scale symbolic music systems target melody, multi-track arrangement, or audio synthesis, and chord-only models tend to be relegated to conditioning components inside larger pipelines. This paper treats chord generation as a standalone task and addresses a question that arises whenever such a model is adapted across genres: how much old-domain data must be retained during fine-tuning to acquire a new domain without forgetting the old? I study jazz fine-tuning starting from a pop-pretrained 25M-parameter Music Transformer (84.24% top-1 chord accuracy on a held-out pop test set). The available jazz corpus is an order of magnitude smaller than the pop corpus, so every fine-tune run uses all 1,513 jazz training sequences. The swept variable is the volume of pop "rehearsal" data mixed alongside, taking values in {0, 1K, 2.5K, 5K, 10K}. Every fine-tuned model gains 7 to 9 points of jazz top-1. Pop accuracy drops by 2.14 points under jazz-only fine-tuning, recovers to baseline at approximately 2.5K rehearsal samples (1.65x the jazz volume), and saturates beyond that point. A complementary observation: the metric-best run (F3, 2.5K mix) is not always the perceptually preferred one. The pop-leaning (10K) and jazz-leaning (1K) endpoints carry more committed stylistic identities that the author more often selects as finished output in informal listening. I discuss what this suggests for music co-creation tools but make no perceptual claim, since no formal listening study has been conducted. All six checkpoints are released on the HuggingFace Hub at https://huggingface.co/PearlLeeStudio.
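
The "1.65x the jazz volume" figure is the ratio of pop rehearsal samples to jazz training sequences. The short sketch below reproduces it for every swept mix size, assuming that "1K" denotes exactly 1,000 samples; the variable names are illustrative.

```python
JAZZ_SEQUENCES = 1513  # jazz training sequences used in every run
POP_REHEARSAL_SIZES = [0, 1_000, 2_500, 5_000, 10_000]  # assumes 1K = 1,000


def rehearsal_ratio(pop_samples: int, jazz_samples: int = JAZZ_SEQUENCES) -> float:
    """Pop rehearsal volume expressed as a multiple of the jazz volume."""
    return pop_samples / jazz_samples


ratios = {pop: round(rehearsal_ratio(pop), 2) for pop in POP_REHEARSAL_SIZES}
for pop, ratio in ratios.items():
    print(f"{pop:>6} pop samples -> {ratio}x jazz volume")
```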

[497]  arXiv:2605.05000 [pdf, ps, other]
Title: Agentic Vulnerability Reasoning on Windows COM Binaries
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Windows Component Object Model (COM) services run with elevated privileges and are widely accessible to authenticated users, making race conditions in these binaries a critical surface for local privilege escalation. We present SLYP, an end-to-end agentic pipeline that discovers race condition vulnerabilities in COM binaries and generates debugger-verified proof-of-concept (PoC) code. SLYP exposes binary exploration, COM inspection, and dynamic debugging as reusable tool interfaces, giving agents the static context, COM activation metadata, and debugger feedback needed to move from vulnerability discovery to verified PoC generation. On a benchmark of 20 COM objects covering 40 vulnerability cases, SLYP achieves 0.973 F1, outperforming production coding agents by up to 0.208 F1 and the state-of-the-art static analyzer by 3.3x in bug discovery. For PoC generation, production coding agents in their default setup (without our COM inspection and dynamic debugging tools) verify essentially no cases on either frontier model, whereas SLYP's interactive toolsets enable it to autonomously synthesize working PoCs for 67.5% of cases on the strongest configuration. Deployed on production Windows services, SLYP discovers 28 previously unknown vulnerabilities across nine COM services, all confirmed by the Microsoft Security Response Center (MSRC) with 16 CVEs assigned and $140,000 in bounties. Furthermore, SLYP is designed with generalizable binary analysis and debugging interfaces, making it readily applicable to other commercial off-the-shelf (COTS) binaries beyond Windows COM services.

[498]  arXiv:2605.05001 [pdf, ps, other]
Title: Unlocking Embodied Probabilistic Computational Features in Motor Drives
Comments: This manuscript has been accepted for publication in 2026 International Power Electronics Conference, IPEC-Nagasaki 2026 -ECCE Asia-
Subjects: Systems and Control (eess.SY)

Artificial intelligence (AI)-driven fault diagnosis in motor drives often requires significant computational effort and time for re-training, and offers limited insight into the model and into the suitability of its training and learning mechanisms. This work bridges this gap by proposing a structured mechanism that transforms untapped labeled fault data into AI parameters to leverage probabilistic data-driven learning. This novel AI reservoir modeling framework for power electronics not only eliminates the exogenous effort spent on learning data patterns and on optimization, but also gives power electronics engineers intuitive guidelines for sizing AI models. This alignment between data and system physics makes the proposed model transparent and interpretable, bridging practical understanding with data-driven learning. Experimental data demonstrate its computational efficiency and show that structured, physics-aware reservoirs achieve higher diagnostic accuracy and clearer explanations than conventional black-box AI methods.

[499]  arXiv:2605.05003 [pdf, ps, other]
Title: Misaligned by Reward: Socially Undesirable Preferences in LLMs
Comments: Preprint
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, existing evaluations focus primarily on broad instruction-following benchmarks, providing limited insight into whether these models capture socially desirable preferences. As a result, important failures in social alignment can remain hidden.
We extend reward-model benchmarking to four socially consequential domains: bias, safety, morality, and ethical reasoning. We introduce a framework that converts social evaluation datasets into pairwise preference data, leveraging gold labels where available and directional bias indicators otherwise. This enables us to test whether reward models prefer socially undesirable responses, and whether their preferences produce systematically biased distributions over selected outputs.
Across five publicly available reward models and two instruction-tuned models used as reward proxies, we find substantial variation across domains, with no single model performing best overall. The models fall well short of strong social intelligence: they often prefer socially undesirable options, and their preferences produce systematically biased distributions. Moreover, stronger bias avoidance can reduce sensitivity to context, revealing a key alignment trade-off between avoiding biased outcomes and preserving contextual faithfulness. These findings show that standard reward benchmarks are insufficient for assessing social alignment and highlight the need for evaluations that directly measure the social preferences encoded in reward models.

[500]  arXiv:2605.05007 [pdf, ps, other]
Title: Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation
Subjects: Artificial Intelligence (cs.AI)

Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under one objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair, with both decisions learned together from curated RL trajectories grounded in real worker interactions. Against 22 baselines on a 13-benchmark suite spanning math, code, knowledge, long-context, and agentic tool-use, Uno-Orchestra reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, at roughly an order of magnitude lower per-query cost, advancing the accuracy-efficiency frontier of selective delegation.

[501]  arXiv:2605.05009 [pdf, ps, other]
Title: Learned Neighbor Trust for Collaborative Deployment in Model-Agnostic Decentralized Learning
Subjects: Machine Learning (cs.LG)

Many decentralized distillation methods are designed around training-time coordination, yet deploy each node in isolation even when more capable neighbors remain available at inference time. This is an incomplete objective for settings such as IoT, where devices are heterogeneous, data is scarce and skewed, and a node's strongest neighbors may far exceed its own local capacity. We study how nodes should train so that their predictions compose well at deployment, and how each node should learn whom to trust. Under a server-free, model-agnostic protocol where nodes exchange only queries and soft predictions, we propose Learned Neighbor Trust (LNTrust) wherein each node learns a compact trust function over its neighborhood from local validation evidence. This trust function gates auxiliary distillation during training and defines a deployment ensemble at inference, so that collaboration learned during training transfers directly to deployment. Across datasets and topologies, LNTrust improves deployed accuracy over the strongest output-only baseline by large margins while using significantly less communication than previous methods.

[502]  arXiv:2605.05012 [pdf, ps, other]
Title: Chaotic Contrastive Learning for Robust Texture Classification
Authors: Joao B Florindo
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Texture classification is a pivotal task in computer vision, presenting unique challenges due to high inter-class similarity and the sensitivity of structural patterns to scale and illumination changes. While Convolutional Neural Networks (CNNs) and recent Vision Transformers have set performance benchmarks, they often require extensive labeled datasets or struggle to generalize across domains due to an over-reliance on color and shape features. This paper introduces a novel framework that synergizes Self-Supervised Learning (SSL) with deterministic chaotic dynamics. We propose a chaotic contrastive pre-training strategy, where pixel-wise chaotic maps, specifically Logistic, Tent, and Sine maps, act as non-linear data augmentation techniques. These chaotic perturbations, grounded in ergodic theory, force the network to learn topologically robust features by mimicking complex environmental noise and reflectance variations. Furthermore, we introduce an attention-based feature ensemble that fuses high-level semantic representations from a supervised large backbone with low-frequency structural features from a chaos-pretrained tiny encoder. Experimental results on six texture benchmarks (FMD, UMD, KTH-TIPS2-b, DTD, GTOS, and 1200Tex) demonstrate the superiority of the proposed method, outperforming state-of-the-art approaches and achieving promising accuracies on all the analyzed datasets.
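
The three maps named above have standard textbook forms on the unit interval. The sketch below applies them pixel-wise to a tiny grayscale image with intensities in [0, 1]. The control parameters, the iteration count, and the `perturb` helper are assumptions made for illustration; the abstract does not specify how the maps are parameterized or combined.

```python
import math
from typing import Callable, List

Image = List[List[float]]  # grayscale intensities in [0, 1]


def logistic_map(x: float, r: float = 4.0) -> float:
    """Logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def tent_map(x: float, mu: float = 2.0) -> float:
    """Tent map: rises linearly up to x = 0.5, then falls linearly."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def sine_map(x: float, a: float = 1.0) -> float:
    """Sine map x -> a * sin(pi * x)."""
    return a * math.sin(math.pi * x)


def perturb(image: Image, chaotic_map: Callable[[float], float],
            iterations: int = 1) -> Image:
    """Apply a chaotic map to every pixel, `iterations` times in a row."""
    out = [row[:] for row in image]
    for _ in range(iterations):
        out = [[chaotic_map(px) for px in row] for row in out]
    return out


tiny_image = [[0.5, 0.25], [0.0, 1.0]]
augmented = perturb(tiny_image, tent_map)
print(augmented)
```

With these default parameters each map sends [0, 1] back into [0, 1], so the perturbed image remains a valid intensity array.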

[503]  arXiv:2605.05014 [pdf, ps, other]
Title: CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography
Comments: Accepted at CVPR 2026 (Highlight). Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Autonomous driving must operate across diverse surfaces to enable safe mobility. However, most driving datasets are captured on well-paved flat roads. Moreover, recent driving datasets primarily provide sparse LiDAR ground truth for images, which is insufficient for assessing fine-grained geometry in depth estimation and completion. To address these gaps, we introduce CARD, a multi-modal driving dataset that delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, potholes, irregular surfaces and off-road segments. Our sensor suite includes synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration. Notably, our multi-LiDAR fusion yields ~500K valid depth pixels per frame, about 6.5x more than KITTI Depth Completion and 10x more on average than other public driving datasets. The dataset spans ~110 km and 4.7 hours across Germany and Italy. In addition, CARD provides 2D bounding boxes targeting road-topography irregularities, enabling accurate benchmarking for both geometry and perception tasks. Furthermore, we establish a standardized evaluation protocol for road surface irregularities on CARD and benchmark state-of-the-art depth estimation models to provide strong baselines. The CARD dataset is hosted on https://huggingface.co/CARD-Data.

[504]  arXiv:2605.05016 [pdf, ps, other]
Title: Goedel Logics: On the Elimination of The Absoluteness Operator
Comments: This research was funded in part by the Austrian Science Fund (FWF) 10.55776/P36571
Subjects: Logic in Computer Science (cs.LO)

We investigate the eliminability of the absoluteness operator Delta in Goedel logics. While Delta is not definable from the standard connectives and disrupts important proof-theoretic properties, we show that it becomes eliminable at the propositional level under a restricted semantics in which all propositional atoms (except the truth constant 'True') are interpreted strictly below 1. Under this semantics, every formula containing Delta is equivalent to a disjunction of chain formulas, yielding a Delta-free normal form (standard and restricted semantics coincide w.r.t. valid formulas without Delta). We further analyze the situation in the first-order setting, where Delta-elimination fails in general due to recursion-theoretic and topological constraints, but can be recovered under witnessed semantics.
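
For readers unfamiliar with the operator, the sketch below evaluates the standard Goedel truth functions on [0, 1] together with Delta, which maps 1 to 1 and every other value to 0. It only illustrates the semantics: under the restricted reading every atom takes a value strictly below 1, so Delta applied to an atom is 0, while Delta applied to a compound formula can still be 1. The function names are illustrative, and the code is not the elimination procedure from the paper.

```python
def g_and(a: float, b: float) -> float:
    """Goedel conjunction: minimum."""
    return min(a, b)


def g_or(a: float, b: float) -> float:
    """Goedel disjunction: maximum."""
    return max(a, b)


def g_implies(a: float, b: float) -> float:
    """Goedel implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b


def g_not(a: float) -> float:
    """Goedel negation, defined as a -> 0."""
    return g_implies(a, 0.0)


def delta(a: float) -> float:
    """Absoluteness operator: 1 if a is exactly 1, otherwise 0."""
    return 1.0 if a == 1.0 else 0.0


# Restricted semantics: atoms are interpreted strictly below 1.
p, q = 0.3, 0.7
print(delta(p))                # Delta on an atom
print(delta(g_implies(p, q)))  # Delta on a compound formula, here p <= q
print(delta(g_implies(q, p)))  # Delta on a compound formula, here q > p
```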

[505]  arXiv:2605.05017 [pdf, ps, other]
Title: Position: Embodied AI Requires a Privacy-Utility Trade-off
Comments: Accepted at ICML 2026. 10 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

Embodied AI (EAI) systems are rapidly transitioning from simulations into real-world domestic and other sensitive environments. However, recent EAI solutions have largely demonstrated advancements within isolated stages such as instruction, perception, planning and interaction, without considering their coupled privacy implications in high-frequency deployments where privacy leakage is often irreversible. This position paper argues that optimizing these components independently creates a systemic privacy crisis when deployed in sensitive settings, thereby advancing the position that privacy in EAI is a life cycle-level architectural constraint rather than a stage-local feature. To address these challenges, we propose Secure Privacy Integration in Next-generation Embodied AI (SPINE), a unified privacy-aware framework that treats privacy as a dynamic control signal governing cross-stage coupling throughout the entire EAI life cycle. SPINE decomposes the EAI pipeline into various stages and establishes a multi-criterion privacy classification matrix to orchestrate contextual sensitivity across stage boundaries. We conduct preliminary simulation and real-world case studies to conceptually validate how privacy constraints propagate downstream to reshape system behavior, illustrating the insufficiency of fragmented privacy patches and motivating future research directions into secure yet functional embodied AI systems. We detail the SPINE framework and case studies at https://github.com/rminshen03/EAI_Privacy_Position.

[506]  arXiv:2605.05020 [pdf, ps, other]
Title: Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning
Authors: Shawn Ray
Comments: 22 pages, 12 figures, 7 tables
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)

System Neural Diversity (SND) measures behavioral heterogeneity in multi-agent reinforcement learning by averaging pairwise distances over all $\binom{n}{2}$ agent pairs, making each call quadratic in team size. We introduce Graph-SND, which replaces this complete-graph average with a weighted average over the edges of an arbitrary graph $G$. Three regimes follow: $G=K_n$ recovers SND exactly; a fixed sparse $G$ defines a localized diversity measure at $O(|E|)$ cost; and random edge samples yield an unbiased Horvitz-Thompson estimator and a normalized sample mean with $O(1/\sqrt{m})$ concentration in the sampled edge count $m$. For fixed sparse graphs we prove forwarding-index distortion bounds for expanders and a spectral refinement under low-rank distance structure; for random $d$-regular graphs we prove an unconditional probabilistic $\widetilde{\mathcal{O}}(D_{\max}/\sqrt{n})$ bound. On VMAS we verify recovery, unbiasedness, concentration, and wall-clock scaling, with a PettingZoo TVD panel checking non-Gaussian transfer. In a 500-iteration $n=100$ PPO run, Bernoulli-$0.1$ Graph-SND tracks full SND while reducing per-call metric time by about $10\times$, and frozen-policy GPU timing up to $n=500$ follows the predicted $\binom{n}{2}/|E|$ speedup. Random $d$-regular expanders empirically achieve $\mathrm{SND}_{G}^{\mathrm{u}}/\mathrm{SND} \in [0.9987, 1.0013]$ at $\Theta(n \log n)$ edges. In DiCo diversity control at $n=50$, Bernoulli-$0.1$ Graph-SND preserves set-point tracking with paired reward differences indistinguishable from zero across nine matched cells while cutting per-call metric cost by ${\sim}9.5\times$. Together, these results show that the SND aggregation bottleneck can be removed without changing the metric's semantics, yielding a drop-in sparse alternative that scales beyond complete-graph SND and supports both passive measurement and closed-loop diversity control.
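
The three regimes described above can be sketched in a few lines. The code below is a toy illustration under stated assumptions: edges are weighted uniformly, each agent's behavior is a single scalar, and the pairwise distance is an absolute difference, whereas SND in practice uses distances between policies. All function names are hypothetical. The sampler also loops over every pair for clarity, so only the number of distance evaluations shrinks with the sampling probability.

```python
import itertools
import random
from typing import Callable, Iterable, Tuple

Edge = Tuple[int, int]
Distance = Callable[[int, int], float]


def full_snd(n: int, dist: Distance) -> float:
    """Average pairwise distance over all n-choose-2 agent pairs."""
    pairs = list(itertools.combinations(range(n), 2))
    return sum(dist(i, j) for i, j in pairs) / len(pairs)


def graph_snd(edges: Iterable[Edge], dist: Distance) -> float:
    """Uniformly weighted average of distances over the edges of a graph."""
    edge_list = list(edges)
    if not edge_list:
        raise ValueError("graph must have at least one edge")
    return sum(dist(i, j) for i, j in edge_list) / len(edge_list)


def horvitz_thompson_snd(n: int, dist: Distance, p: float,
                         rng: random.Random) -> float:
    """Unbiased estimate of full SND from Bernoulli(p) edge sampling.

    Each pair is kept independently with probability p, and the kept
    distances are divided by p to correct for the sampling.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    total_pairs = n * (n - 1) // 2
    kept_sum = sum(
        dist(i, j)
        for i, j in itertools.combinations(range(n), 2)
        if rng.random() < p
    )
    return kept_sum / (p * total_pairs)


# Toy behaviors: one scalar per agent (illustrative values only).
behaviors = [0.0, 0.2, 0.5, 0.9, 1.0]
n_agents = len(behaviors)


def toy_distance(i: int, j: int) -> float:
    return abs(behaviors[i] - behaviors[j])


complete_graph = list(itertools.combinations(range(n_agents), 2))
ring_graph = [(i, (i + 1) % n_agents) for i in range(n_agents)]

print(full_snd(n_agents, toy_distance))
print(graph_snd(complete_graph, toy_distance))  # matches full SND
print(graph_snd(ring_graph, toy_distance))      # localized measure
print(horvitz_thompson_snd(n_agents, toy_distance, 0.5, random.Random(0)))
```

Passing the complete graph reproduces the full average, a sparse graph such as the ring gives a localized measure at a cost proportional to its edge count, and the sampled estimator trades variance for fewer distance evaluations.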

[507]  arXiv:2605.05023 [pdf, ps, other]
Title: CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels
Comments: Accepted to ACL 2026
Subjects: Machine Learning (cs.LG)

Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex operators such as attention.
We present CuBridge, an LLM-based framework that adapts expert-written attention kernels through a structured lift-transfer-lower workflow. CuBridge starts from expert-written CUDA attention kernels and lifts them into an executable intermediate representation that makes execution orchestration explicit while abstracting low-level CUDA syntax. Given a user-provided PyTorch specification, CuBridge generates and verifies a target IR program, then reconstructs optimized CUDA code via reference-guided lowering. Across diverse attention variants and GPU platforms, CuBridge consistently produces correct kernels and substantially outperforms general frameworks, compiler-based approaches, and prior LLM-based methods.

[508]  arXiv:2605.05025 [pdf, ps, other]
Title: Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals
Authors: Gijs van Dijk
Comments: ACL SRW 2026
Subjects: Computation and Language (cs.CL)

We propose a lightweight and single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or external models. Specifically, we measure the Kullback-Leibler divergence between each attention head's distribution and a uniform reference distribution, and use these features in a logistic regression probe. Across multiple datasets, task types, and model families, attention divergence is highly predictive of answer correctness and performs competitively with existing uncertainty estimation methods. We find that this signal is concentrated in middle layers and on factual tokens such as named entities and numbers, suggesting that attention dynamics provides an efficient and interpretable white-box signal of model uncertainty.
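
The per-head feature described above can be sketched as follows. The helper name `kl_to_uniform` is hypothetical, and the choice of taking the uniform reference over all positions in the row is an assumption, since the abstract does not say how masked positions are handled.

```python
import numpy as np

def kl_to_uniform(attn, eps=1e-12):
    """KL(attn || uniform) along the last axis.

    attn: array whose last axis is an attention distribution (sums to 1).
    Returns 0 for a uniform row and log(n) for a one-hot row of length n.
    """
    attn = np.asarray(attn, dtype=float)
    n = attn.shape[-1]
    # KL(a || u) = sum_i a_i * (log a_i - log(1/n)); eps guards log(0)
    return np.sum(attn * (np.log(attn + eps) + np.log(n)), axis=-1)
```

Applied to a tensor of shape (layers, heads, keys), this yields one scalar per head, which could then be stacked into a feature vector for a linear probe.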

[509]  arXiv:2605.05026 [pdf, ps, other]
Title: Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models
Comments: Preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research has studied this failure mode from several viewpoints, offering partial explanations for its occurrence, such as mode interpolation. In this work, we propose a complementary perspective that treats hallucinations as instabilities on the model-induced manifold. We begin by showing that a hallucination filter based on such instabilities matches or exceeds the performance of the recently proposed temporal one. By tracing the source of these instabilities, we identify local intrinsic dimension (LID) as their primary driver and propose Intrinsic Quenching (IQ), a direct corrective mechanism that deflates it to alleviate hallucinations. IQ consistently outperforms standard hallucination reduction baselines across a wide array of benchmarks and offers a highly promising solution for enforcing anatomical consistency in downstream medical imaging tasks.

[510]  arXiv:2605.05027 [pdf, ps, other]
Title: Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Lifelong person re-identification (LReID) aims to train a generalizable model with sequentially collected data. However, such models often suffer from semantic drift, limited adaptability, and catastrophic forgetting as new domains emerge. Existing exemplar-free approaches largely rely on visual-only distillation or parameter regularization, while overlooking the potential of auxiliary modalities, such as text, to preserve semantic stability and enable incremental plasticity. We observe that the frozen text encoder in pretrained vision-language models can serve as a stable semantic anchor across domains. To decouple the roles of vision and text, we propose Prompt-Anchored vision-text Distillation (PAD), an asymmetric vision-text framework for semantic alignment and cross-domain generalization. On the textual side, we distill prompts to preserve vision-text alignment under a fixed semantic space, acting as a global semantic reference rather than a dominant learning signal. On the visual side, an EMA-based teacher with an adaptive prompt pool enables domain-wise adaptation by allocating new slots while freezing past ones. Extensive experiments show that PAD substantially outperforms state-of-the-art methods across seen and unseen domains, achieving a strong balance between stability and plasticity. Project page is available at https://github.com/zu-zi/PAD.

[511]  arXiv:2605.05029 [pdf, ps, other]
Title: The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence
Authors: Kejun Liu
Comments: 15 pages, 5 figures, 3 tables. Supplemental Material included (Sections S1-S10)
Subjects: Machine Learning (cs.LG)

We report a systematic failure mode in predictive representation learning. Across 2695 neural network configurations trained to predict linear-Gaussian dynamics, the optimal encoder tracks the environment rather than the system it is meant to model. The mean causal fidelity -- the fraction of encoder sensitivity allocated to system degrees of freedom -- is 0.49, and only 2.5% of configurations exceed 0.70. The failure intensifies with dimension: at N=100, the optimal encoder becomes causally blind (fidelity ~10^{-8}) while achieving 92% lower prediction error than the causal representation. We prove this is not an optimization artifact but a structural property of the predictive objective: when environment modes are slower or less noisy than system modes, every minimizer of the population risk encodes the former. The set of dynamics exhibiting this predictive-causal gap is open and of positive measure in parameter space. In a nonlinear Duffing-GRU sweep, unconstrained predictors learn environment-dominant representations in 55% of tasks (95% CI 41--68%) versus 24% under operational grounding (p=2.3e-3); the median out-of-distribution MSE inflation under environment shift is 1.82x versus 1.00x. Operational grounding -- restricting the loss to system observables -- partially suppresses the gap, but causal fidelity is never recovered without an explicit system-environment boundary. The results identify the predictive-causal gap as a structural limit of learning, with implications for self-supervised representation learning, world models, and the scaling paradigm.

[512]  arXiv:2605.05031 [pdf, ps, other]
Title: Computer-Aided Design Generation by Cascaded Discrete Diffusion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent deep learning approaches seek to automate CAD creation by representing a model as a sequence of discrete commands and parameters, and then generating them using autoregressive models or continuous diffusion operating in Euclidean embedding space. However, continuous diffusion perturbs representations in a continuous Euclidean domain that does not reflect the inherently discrete and heterogeneous nature of CAD tokens, often producing perturbed representations that map to semantically invalid symbols. To overcome this limitation, we propose a cascaded discrete diffusion framework for CAD generation, which consists of a command diffusion for generating CAD commands and a parameter diffusion conditioned on CAD commands. Unlike isotropic Gaussian perturbation, the forward process of our approach operates directly over categorical token distributions using carefully designed transition matrices. For commands, we adopt an absorbing-state transition matrix that progressively corrupts tokens to a designated symbol; for parameters, we introduce specific transition matrices tailored to heterogeneous attributes: a Gaussian kernel for coordinate continuity, a scale-invariant kernel for dimensional values, and a prior-preserving kernel for boolean attributes. The reverse process is achieved by two denoising networks: a Transformer-based encoder for command recovery, and a parameter network with extra local self-attention for command-level interaction and cross-attention for conditional injection. Experiments on the DeepCAD dataset show that the proposed approach surpasses existing autoregressive and continuous diffusion models on unconditional generation metrics, while qualitative results validate effective controllability in conditional generation tasks. Source code will be released.
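
To make the absorbing-state forward process concrete, here is a minimal sketch of a single-step transition matrix in the form commonly used for discrete diffusion. The function name `absorbing_transition`, the constant corruption rate `beta`, and the row-stochastic convention are assumptions; the abstract does not specify the schedule or the exact parameterization.

```python
import numpy as np

def absorbing_transition(num_tokens, beta, mask_id):
    """Row-stochastic matrix Q with Q[i, j] = P(next = j | current = i).

    Each token stays unchanged with probability 1 - beta and jumps to the
    designated absorbing symbol `mask_id` with probability beta.
    The absorbing symbol itself never leaves its state.
    """
    q = (1.0 - beta) * np.eye(num_tokens)
    q[:, mask_id] += beta
    return q
```

Composing `t` such steps with the same `beta` leaves a non-absorbing token intact with probability `(1 - beta) ** t`, so every token eventually reaches the designated symbol.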

[513]  arXiv:2605.05032 [pdf, ps, other]
Title: Quantized Probabilistic AI for Gear Fault Diagnosis in Motor Drives
Comments: This manuscript has been accepted for publication in 2026 International Power Electronics Conference, IPEC-Nagasaki 2026 -ECCE Asia-
Subjects: Systems and Control (eess.SY)

Deploying large artificial intelligence (AI) models in power electronics often demands high computational resources. Driven by the quantization paradigm, this digest proposes a quantization-aware training (QAT) principle to substantially minimize the number of bits required and simultaneously maximize the accuracy of computations in pre-trained AI models. Considering a pre-trained probabilistic Bayesian Neural Network (BNN) for gear fault diagnosis in motor drives as an example, we quantize its weights and activation functions from floating-point FP32 to low-precision INT8 values, which enhances the computational efficiency by a significant margin of 30-45% (for different model versions) without any compromise in the accuracy and uncertainty estimates. This substantiates a sustainable mechanism of deploying most quantized light-weight AI models into low-cost edge processors for power electronic applications.

[514]  arXiv:2605.05034 [pdf, ps, other]
Title: Few-Shot Learning Pipeline for Monkeypox Skin Disease Classification Using CNN Feature Extractors
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite the strong performance of Convolutional Neural Networks (CNNs) in disease classification, their effectiveness often depends on access to large annotated datasets, which is an impractical requirement for emerging or rare conditions such as Monkeypox. To overcome this limitation, we propose a few-shot learning (FSL) framework that employs SimpleShot, a lightweight, non-parametric, inductive classifier, for Monkeypox and pox-like skin disease recognition from limited labeled examples. The proposed pipeline passes the skin lesion images through a frozen, pretrained CNN backbone to obtain feature embeddings, which are then classified via SimpleShot using nearest-centroid comparisons in a normalized embedding space. We systematically benchmark six widely used CNN backbones as feature extractors under consistent experimental settings, enabling fair comparison. Experiments on three publicly available datasets (MSLD v1.0, MSID, and MSLD v2.0) are conducted across 2-way, 4-way, and 6-way tasks with 1-shot, 5-shot, and 10-shot configurations. Among all models, MobileNetV2_100 consistently achieves the highest accuracy. In addition, we present a cross-dataset evaluation for Monkeypox classification, revealing that binary Mpox-vs-Others transfer remains comparatively stable while multi-class performance degrades significantly under domain shift. Together, these results demonstrate the practical utility of combining inductive FSL methods with lightweight CNN backbones and highlight the importance of domain robustness for reliable real-world clinical deployment.
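
The classification step described above, nearest-centroid comparison in a normalized embedding space, can be sketched in a few lines. The names `l2_normalize` and `nearest_centroid_predict` are hypothetical, the embeddings are assumed to come from a frozen backbone that is not shown, and only L2 normalization is applied here; any additional feature centering is left out.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def nearest_centroid_predict(support_x, support_y, query_x):
    """Assign each query embedding to the class with the closest centroid."""
    support_x = l2_normalize(np.asarray(support_x, dtype=float))
    query_x = l2_normalize(np.asarray(query_x, dtype=float))
    support_y = np.asarray(support_y)
    classes = np.unique(support_y)
    # One centroid per class: the mean of that class's normalized support embeddings
    centroids = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(query_x[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(dists, axis=1)]
```

Because there are no trainable parameters, an N-way K-shot episode only requires N centroid averages over K embeddings each.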

[515]  arXiv:2605.05040 [pdf, ps, other]
Title: Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

On-policy distillation is an efficient alternative to reinforcement learning, offering dense token-level training signals. However, its reliance on a stronger external teacher has driven recent work on on-policy self-distillation, where the same model serves as both teacher and student under different prompt contexts. Yet, existing self-distillation methods largely reduce learning to KL matching toward the context-augmented teacher model. This approach often suffers from training instability and can degrade reasoning performance over time. Moreover, self-distillation from the same model with prompt augmentation lacks the exploratory diversity provided by a genuine external teacher. To address these limitations, we move beyond fixed-teacher KL matching and propose Preference-Based Self-Distillation (PBSD), which revisits on-policy self-distillation through a reward-regularized perspective. Instead of directly matching the teacher distribution, we derive a reward-regularized objective whose analytic optimum is a reward-reweighted teacher distribution, yielding a target policy provably superior to the original teacher under this objective. Practically, PBSD optimizes preference gaps between teacher and student samples while maintaining on-policy student sampling. We support this framework with a statistical analysis of the induced preference-learning problem, formally establishing when on-policy self-distillation is preferable to learning from an external teacher in our setting. Experiments on mathematical reasoning and tool-use benchmarks across multiple model scales demonstrate that PBSD consistently achieves the strongest average performance among comparable baselines, showing improved training stability over prior self-distillation baselines while preserving token efficiency.

[516]  arXiv:2605.05043 [pdf, ps, other]
Title: Finding accurate eigenvalues and eigenvectors of positive semi-definite matrices given a subspace
Subjects: Numerical Analysis (math.NA)

We revisit a classical problem in numerical linear algebra: given a $k$-dimensional subspace $\mathcal{Q}$ that approximates the leading eigenspace of an $n\times n$ positive semi-definite matrix $A$, the goal is to extract high-accuracy eigenvalues. The Rayleigh-Ritz (RR) method is the standard algorithm for the task, which has been shown to be optimal in several ways (when $A$ is symmetric, but not necessarily positive semi-definite, i.e., $A\succeq 0$). In this paper, we show that when $A \succeq 0$, alternative methods can outperform RR, while having the same computational complexity, that is, the main cost is in computing $AQ$, plus an $O(nk^2)$ term. In particular, we advocate the use of Nyström's method, showing that the approximate eigenvalues always have higher accuracy than RR, and the improvement can be arbitrarily large. The difference is significant, especially when $A$ has a fast-decaying spectrum. A similar improvement is numerically observed for the purpose of approximating the leading eigenvectors. In contrast, when the target eigenvalues are the trailing ones, the situation is reversed, and the Nyström method performs poorly; we suggest a remedy for this situation.
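
The comparison above can be tried numerically with the sketch below. It assumes $Q$ has orthonormal columns, that $Q^\top A Q$ is positive definite so a Cholesky factor exists, and that "Nyström" refers to the standard approximation $AQ\,(Q^\top A Q)^{-1}\,Q^\top A$; the abstract does not spell out the formulation, and the function names are hypothetical.

```python
import numpy as np

def rayleigh_ritz_eigs(A, Q):
    """Ritz values: eigenvalues of the k x k projection Q^T A Q, descending."""
    return np.sort(np.linalg.eigvalsh(Q.T @ (A @ Q)))[::-1]

def nystrom_eigs(A, Q):
    """Nonzero eigenvalues of (AQ)(Q^T A Q)^{-1}(AQ)^T, descending.

    Reuses the single product Y = A Q; the rest costs O(n k^2).
    """
    Y = A @ Q
    C = Q.T @ Y
    C = 0.5 * (C + C.T)                 # symmetrize against rounding
    L = np.linalg.cholesky(C)           # C = L L^T, assumes C is positive definite
    B = np.linalg.solve(L, Y.T)         # B = L^{-1} Y^T, so B^T B is the Nystrom matrix
    s = np.linalg.svd(B, compute_uv=False)
    return np.sort(s ** 2)[::-1]
```

For positive semi-definite $A$, both sets of values approximate the leading eigenvalues from below, and the Nyström values are never smaller than the Ritz values, which is consistent with the accuracy claim in the abstract.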

[517]  arXiv:2605.05044 [pdf, ps, other]
Title: Efficient Cost-Based Rewrite in a Bottom-Up Optimizer
Subjects: Databases (cs.DB)

The query optimizer in a Database Management System (DBMS) translates declarative queries into efficient execution plans. Conventional bottom-up optimization consists of two main stages: Query Rewrite (QRW) and Cost-Based Optimization (CBO). However, applying a rewrite rule during QRW may not always be beneficial; the best choice may depend on the (estimated) execution cost of the original and rewritten expressions. Fully exploiting such cost-dependent rules necessitates interleaving QRW with frequent CBO invocations, thereby incurring substantial overhead and often impractical optimization times. To mitigate this inefficiency, we introduce a novel cost-based rewrite framework for bottom-up optimizers. The core of our approach is a multi-level caching mechanism for intermediate CBO results aimed at eliminating redundant computation. Furthermore, we establish and exploit upper cost bounds to intelligently prune the search space during optimization. We also contribute methodological solutions for caching and reusing intermediate plan results within a bottom-up optimizer architecture. The framework has been implemented in the GaussDB optimizer. Experiments show that it significantly reduces overall optimization time, demonstrating the effectiveness of our approach.

[518]  arXiv:2605.05045 [pdf, ps, other]
Title: When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, since avoiding it requires accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while they offer partial improvements, they do not fully resolve hallucinations. Our results reveal a gap between perceptual robustness and relational understanding, highlighting the need for more robust, geometry-aware VLMs.

[519]  arXiv:2605.05046 [pdf, ps, other]
Title: Sampling Simultaneous Edge-Colorings
Subjects: Discrete Mathematics (cs.DM); Probability (math.PR)

We study the sampling problem for simultaneous edge colorings. Given a pair of graphs $G_1=(V,E_1)$ and $G_2=(V,E_2)$ on the same vertex set $V$, a simultaneous edge coloring is an edge coloring of $G_1\cup G_2$ such that each of the individual graphs is properly colored. When $G_1$ and $G_2$ each have maximum degree $\Delta$, it is conjectured that $\Delta+2$ colors suffice, and recent work asymptotically establishes the conjecture.
We study Markov chains for randomly sampling from the uniform distribution over simultaneous edge colorings. Straightforward applications of Jerrum's classical coupling argument establish rapid mixing of the Glauber dynamics on the corresponding line graph when $k>8\Delta$. We present a simple weighted Hamming distance for which Jerrum's coupling yields optimal mixing time (up to constant factors) of $O(m\log{n})$ when $k>(6+\delta)\Delta$ for any fixed $\delta>0$. Moreover, utilizing the flip dynamics with our new metric, we obtain $O(m\log{n})$ mixing of the flip dynamics with a local choice of flip parameters, which flips only bounded-size components, when $k\geq 5.95\Delta$. The proof adapts previous coupling analyses for the flip dynamics to the setting of simultaneous edge colorings.

[520]  arXiv:2605.05047 [pdf, ps, other]
Title: Local Homophily on Bicolored Graphs is $\mathbf{P}$-complete
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

We propose a local transformation on bicolored graphs, which we call local homophily, inspired by adaptive networks and based on majority dynamics and homophily. In this transformation, a vertex updates its color to match the majority of its neighbors, while neighbors of the same color become connected and neighbors of the opposite color become disconnected.
We show how to simulate Boolean circuits using local homophily and establish that determining whether a given pair of vertices becomes connected under iterative applications of local homophily is $\mathbf{P}$-complete under logspace reductions.

[521]  arXiv:2605.05049 [pdf, ps, other]
Title: Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Frontier models increasingly adopt Mixture-of-Experts (MoE) architectures to achieve large-model performance at reduced cost. However, training MoE models on HPC platforms is hindered by large memory footprints, frequent large-scale communication across heterogeneous networks, and severe workload imbalance. To characterize these challenges, we develop a mathematical model that quantifies memory, compute, and communication requirements for MoE configurations under various parallelization schemes, verified through micro-benchmarking, code instrumentation, and hardware profiling. Our analysis identifies performance bottlenecks: all-to-all latency at scale from expert parallelism, insufficient compute-communication overlap, low GPU utilization from imbalanced skinny GEMMs, and the absence of platform-aware hybrid parallelization strategies. To address these, we introduce Piper, a framework that leverages resource modeling to identify efficient training strategies for MoE models on target HPC platforms, applying pipeline parallelism with optimized schedules. Piper achieves 2-3.5X higher MFU than state-of-the-art frameworks such as X-MoE, and a novel all-to-all algorithm delivers 1.2-9X bandwidth over vendor implementation.

[522]  arXiv:2605.05050 [pdf, ps, other]
Title: Kinematic Discriminants of Deceleration Behavior Modes in Car-Following: Evidence from NGSIM Trajectory Data
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Gap-closing rate and visual looming swap discriminative dominance depending on deceleration intensity - a finding that reconciles a long-standing conflict in the car-following literature and challenges spacing-centered assumptions in traditional driver behavior models. This study presents a two-stage analytical framework that distinguishes between information availability (kinematic variables measurable in the environment) and information utilization (variables that demonstrably separate driver behavioral patterns), applied to 1,060,119 valid car-following observations from the NGSIM trajectory dataset (2,932 vehicles). Six kinematic features are extracted, and deceleration events are detected under two threshold conditions (-0.5 m/s^2 and -0.3 m/s^2). K-means clustering identifies behavioral modes, and one-way ANOVA with eta-squared effect sizes ranks each feature's discriminative power. Three key findings emerge: (1) threshold selection fundamentally shapes behavioral inference - the stricter threshold yields three interpretable modes while the permissive threshold collapses these to two; (2) hard braking prioritizes gap-closing rate (eta^2 = 0.715) while moderate braking emphasizes visual looming (eta^2 = 0.574); and (3) spacing headway is negligible (eta^2 <= 0.014) across both thresholds. These findings provide empirically grounded candidates for perceptual cue prioritization and have direct implications for ADAS warning system design and autonomous vehicle control.
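
The effect size used above to rank features, eta-squared from a one-way ANOVA, is the ratio of between-group to total sum of squares. A minimal sketch follows; the function name `eta_squared` is hypothetical, and the group labels are assumed to come from a prior clustering step that is not shown.

```python
import numpy as np

def eta_squared(values, labels):
    """One-way ANOVA effect size: SS_between / SS_total.

    values: 1-D array of a single kinematic feature.
    labels: 1-D array of group (e.g. cluster) assignments, same length.
    Undefined (division by zero) if all values are identical.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = sum(
        np.sum(labels == g) * (values[labels == g].mean() - grand_mean) ** 2
        for g in np.unique(labels)
    )
    return float(ss_between / ss_total)
```

A value near 1 means the groups are almost fully separated by that feature, while a value near 0 (as reported for spacing headway) means the group means barely differ relative to the overall spread.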

[523]  arXiv:2605.05053 [pdf, ps, other]
Title: Reduced-order Neural Modeling with Differentiable Simulation for High-Detail Tactile Perception
Comments: IEEE RoboSoft 2026
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Tactile perception is key to dexterous manipulation, yet simulating high-resolution elastomer deformation remains computationally prohibitive. Finite element methods (FEM) deliver high fidelity but demand costly remeshing, while Material Point Methods (MPM) suffer from heavy particle-memory tradeoffs. We propose a reduced-order neural simulation framework that couples coarse-grained MPM dynamics with an implicit neural decoder to reconstruct sub-particle tactile details from compact latent states. The framework learns a continuous deformation manifold from paired high- and low-resolution simulations, enabling physically consistent, differentiable inference. Compared to TacIPC, our method achieves over 65% faster simulation and 40% lower memory usage, while maintaining better geometric fidelity. In tactile rendering and 3D surface reconstruction, our method further improves accuracy by 25% and produces realistic depth images and surface meshes at a faster inference speed. These results demonstrate that the proposed reduced-order neural model enables high-detail, physically grounded tactile simulation with substantial efficiency gains for robotic interaction and optimization.

[524]  arXiv:2605.05054 [pdf, ps, other]
Title: Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent flow matching (FM) methods improve the few-shot adaptation of vision-language models, by modeling cross-modal alignment as a continuous multi-step flow. In this paper, we argue that existing FM methods are inherently constrained by incompatible geometric priors on pre-trained cross-modal features, resulting in suboptimal adaptation performance. We first analyze these methods from a polar decomposition perspective (i.e., radial and angular sub-manifolds). Under this new geometric view, we identify three overlooked limitations in them: 1) Angular dynamics distortion: The radial-angular coupling induces non-uniform speed on the angular sub-manifold, leading to regression training difficulty and extra truncation errors. 2) Radial dynamics neglect: Feature normalization discards modality confidence, failing to distinguish out-of-distribution and in-distribution data, and abandoning crucial radial dynamics. 3) Context-agnostic unconditional flow: Dataset-specific information loss during pre-trained cross-modal feature extraction remains unrecovered. To resolve these issues, we propose warped product flow matching (WP-FM), a unified Riemannian framework that reformulates alignment on a warped product manifold. Within this framework, we derive direct product flow matching (DP-FM) by introducing a constant-warping metric, which yields a decoupled cylindrical manifold (i.e., direct product manifold). DP-FM enables independent radial evolution and constant-speed angular geodesic transport, effectively eliminating angular dynamics distortion while preserving radial consistency. Meanwhile, we incorporate classifier-free guidance by conditioning the flow on the pre-trained VLMs' hidden states to inject missing dataset-specific information. Extensive results across 11 benchmarks have demonstrated that DP-FM achieves a new state-of-the-art for multi-step few-shot adaptation.

[525]  arXiv:2605.05055 [pdf, ps, other]
Title: Adaptive Learning Strategies for AoA-Based Outdoor Localization: A Comprehensive Framework
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Localization in 5G and 6G networks is essential for important use cases such as intelligent transportation, smart factories, and smart cities. Although deep learning has improved localization accuracy, the training process for localization models can vary significantly depending on the deployment scenario and the effort required for dataset collection campaigns on a given infrastructure. Furthermore, with respect to feature selection, recent works have demonstrated the robustness of angle-of-arrival (AoA) based localization. In view of these two points, we propose an adaptive framework for AoA-based localization that consists of two alternative learning strategies, one suited to large and the other to small training datasets. The proposed framework is evaluated on a real, massive multiple input multiple output (mMIMO) orthogonal frequency division multiplexing (OFDM) outdoor channel state information (CSI) dataset. First, we investigate offline learning when large training datasets are available; we propose a hierarchical framework that first distinguishes between line of sight (LoS) and non line of sight (NLoS) regions and then moves to more fine-grained localization in the respective region. This approach provides high-performance localization through accumulated batch retraining and an integrated hyperparameter optimization mechanism. Second, when only a small training dataset is available, an online learning framework is proposed, using incremental tree-based and ensemble-based models for handling streaming data and continuously updating the model, as well as an online few-shot learning model for rapidly initializing new classes from a limited labeled support set. These results showcase that highly accurate, robust localization can be achieved incrementally during network operation by exploiting online learning, alleviating the need for large dataset collection campaigns.

[526]  arXiv:2605.05057 [pdf, ps, other]
Title: ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Open-vocabulary human-object interaction (HOI) detection requires recognizing interaction phrases that may not appear as annotated categories during training. Recent vision-language HOI detectors improve semantic transfer by matching human-object features with text embeddings, but their predictions are often dominated by object affordance and phrase-level co-occurrence. As a result, a model may predict "cut cake" from the presence of a knife and a cake without verifying whether the hand, tool, target, contact pattern, and object state jointly support the action. We propose ScriptHOI, a structured framework that represents each interaction phrase as a soft scripted state transition. Rather than treating a phrase as a single class token, ScriptHOI decomposes it into body-role, contact, geometry, affordance, motion, and object-state slots. A visual state tokenizer parses each detected human-object pair into corresponding state tokens, and a slot-wise matcher estimates both script coverage and script conflict. These two quantities calibrate HOI logits, expose missing visual evidence, and provide training constraints for incomplete annotations. To avoid suppressing valid but unannotated interactions, we further introduce interval partial-label learning, which constrains unannotated candidates with script-derived lower and upper probability bounds instead of assigning closed-world negatives. A counterfactual script contrast loss swaps individual script slots to discourage object-only shortcuts. Experiments on HICO-DET, V-COCO, and open-vocabulary HOI splits show that ScriptHOI improves rare and unseen interaction recognition while substantially reducing affordance-conflict false positives.

[527]  arXiv:2605.05058 [pdf, ps, other]
Title: SoK: Robustness in Large Language Models against Jailbreak Attacks
Comments: To Appear in the 47th IEEE Symposium on Security and Privacy, May 18-20, 2026
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce models into generating harmful, unethical, or policy-violating outputs. Such attacks pose real-world risks, eroding safety, trust, and regulatory compliance in high-stakes applications. Although a variety of attack and defense methods have been proposed, existing evaluation practices are inadequate, often relying on narrow metrics like attack success rate that fail to capture the multidimensional nature of LLM security. In this paper, we present a systematic taxonomy of jailbreak attacks and defenses and introduce Security Cube, a unified, multi-dimensional framework for comprehensive evaluation of these techniques. We provide detailed comparison tables of existing attacks and defenses, highlighting key insights and open challenges across the literature. Leveraging Security Cube, we conduct benchmark studies on 13 representative attacks and 5 defenses, establishing a clear view of the current landscape encompassing jailbreak attacks, defenses, automated judges, and LLM vulnerabilities. Based on these evaluations, we distill critical findings, identify unresolved problems, and outline promising research directions for enhancing LLM robustness against jailbreak attacks. Our analysis aims to pave the way towards more robust, interpretable, and trustworthy LLM systems. Our code is available at Code.

[528]  arXiv:2605.05059 [pdf, ps, other]
Title: A Comparison Between Co-Located and Distributed MIMO Deployments in OFDM-ISAC Networks
Comments: Accepted to the 32nd International Conference on Telecommunications (ICT 2026), Thessaloniki, Greece
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper investigates network-level integrated sensing and communication (ISAC) under two fundamentally different topology configurations: cell-free massive MIMO (CF-mMIMO) and multi-cell massive MIMO (MC-mMIMO). A unified OFDM-based waveform is adopted for both architectures as the key enabler for ISAC functionalities. The CF system exploits distributed access points (APs) and a scalable user-target-centric operation, whereas the MC system relies on co-located transmit-receive arrays with conventional cell-centric deployment. For both architectures, we derive a GLRT-based sensing detector and the corresponding sensing SNR expressions. We then examine a series of case studies investigating how the number of OFDM subcarriers, the transceiver allocation strategy, and the antenna/node distribution across the network affect the sensing performance. The results consistently demonstrate that CF-mMIMO provides more robust and higher sensing performance across most tested scenarios, particularly when transmit resources or antenna elements are spatially distributed. These findings highlight the inherent advantages of CF deployments for next-generation ISAC networks.

[529]  arXiv:2605.05062 [pdf, ps, other]
Title: Full-chip CMP modelling based on Fully Convolutional Network leveraging White Light Interferometry
Comments: Presented at the International Conference on Planarization Technology 2025 in Hong Kong
Subjects: Machine Learning (cs.LG)

As time-to-market is crucial in the Integrated Circuit (IC) industry, speeding up layout manufacturability verification is essential. Chemical-Mechanical Polishing (CMP) plays a vital role in IC fabrication but is significantly influenced by Layout-Dependent Effects (LDE). An accurate and efficient CMP model enables design teams to correct surface unevenness before fabrication, reducing costs and accelerating the design phase. However, existing models often rely on Density Step Height (DSH) modeling, which is time-consuming for calibration and requires substantial hardware resources for fine-grained predictions. In this paper, we propose combining the advantages of two surface analysis techniques, White Light Interferometry (WLI) and Atomic Force Microscopy (AFM), to train a deep learning model. This model aims to predict full-chip post-CMP nanotopography with nanometer-scale accuracy. Our deep learning model is based on a Convolutional Neural Network (CNN) and follows a two-step pipeline. The model is trained on each technique separately, resulting in a detailed full-chip CMP model.

[530]  arXiv:2605.05066 [pdf, ps, other]
Title: The Impossibility Triangle of Long-Context Modeling
Authors: Yan Zhou
Comments: 41 pages, 6 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We identify and prove a fundamental trade-off governing long-sequence models: no model can simultaneously achieve (i) per-step computation independent of sequence length (Efficiency), (ii) state size independent of sequence length (Compactness), and (iii) the ability to recall a number of historical facts proportional to sequence length (Recall). We formalize this trade-off within an Online Sequence Processor abstraction that unifies Transformers, state space models, linear recurrent networks, and their hybrids. Using the Data Processing Inequality and Fano's Inequality, we prove that any model satisfying Efficiency and Compactness can recall at most O(poly(d)/log V) key-value pairs from a sequence of arbitrary length, where d is the model dimension and V is the vocabulary size. We classify 52 architectures published before March 2026 into the triangle, showing that each achieves at most two of the three properties and that hybrid architectures trace continuous trajectories in the interior. Experiments on synthetic associative recall tasks with five representative architectures validate the theoretical bound: empirical recall capacity lies strictly below the information-theoretic limit, and no architecture escapes the triangle.
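The counting argument behind a bound of this shape can be sketched as follows. This is a simplified illustration, not the paper's proof: it assumes the state at query time fits in $B = \mathrm{poly}(d)$ bits, that the $n$ values to be recalled are drawn independently and uniformly from a vocabulary of size $V$, and that the model answers from the state alone. The symbols $B$, $n$, $X$, $S$, $\hat{X}$, and $P_e$ are introduced here for illustration only.

```latex
% X = (x_1, ..., x_n): the n stored values, uniform over a vocabulary of size V
% S: the model state (at most B bits); \hat{X}: the recalled values
% P_e = \Pr[\hat{X} \neq X]: probability that recall is not exact
\begin{align*}
H(X) &= n \log_2 V \\
I(X; \hat{X}) &\le I(X; S) \le H(S) \le B
  && \text{(Data Processing Inequality, } X \to S \to \hat{X}\text{)} \\
H(X \mid \hat{X}) &\le 1 + P_e \, n \log_2 V
  && \text{(Fano's Inequality)} \\
(1 - P_e)\, n \log_2 V - 1 &\le I(X; \hat{X}) \le B \\
n &\le \frac{B + 1}{(1 - P_e) \log_2 V}
\end{align*}
```

For any fixed error probability $P_e < 1$, this gives $n = O(B / \log V)$, which matches the O(poly(d)/log V) form stated in the abstract.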

[531]  arXiv:2605.05071 [pdf, ps, other]
Title: Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity
Comments: Accepted to the 2026 IEEE International Conference on Sensing, Communication, and Networking (IEEE SECON 2026). Code and models available at: this https URL
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)

Millimeter-wave (mmWave) frequencies promise multi-gigabit connectivity for vehicle-to-everything (V2X) networks, but face challenges in terms of severe path loss and mobility-related beam misalignment. Reliable V2X connectivity requires fast, double-directional beam alignment. However, existing methods suffer from high training overhead and limited generalization to unseen scenarios. This paper presents VIsion-based BEamforming (VIBE), a hybrid model-based, closed-loop learning architecture for real-time double-directional mmWave beam management primed by camera sensing. VIBE fuses machine learning, model-based reasoning, and closed-loop RF feedback to balance beam-pair establishment latency with link quality. VIBE bypasses exhaustive training overhead and accelerates link establishment by leveraging camera observations to reduce the beam-search space. Lightweight beam refinement and offset tracking mechanisms adaptively refine beams in response to dynamic application requirements. VIBE is implemented and evaluated across online indoor/outdoor testbeds, public datasets, and real-time vehicular experiments, demonstrating strong generalization capabilities that make it suitable for real-time V2X communication. Comparisons with 5G NR hierarchical beamforming show that VIBE consistently maintains lower outage rates. Furthermore, VIBE outperforms state-of-the-art end-to-end ML models for beam selection when evaluated on public datasets and achieves outage rates as low as 1.1-1.4%. The results show that a hybrid model-based, closed-loop learning architecture is better suited for real-world mmWave vehicular connectivity than end-to-end trained ML models. For reproducibility, we publish our code at https://github.com/UNL-CPN-Lab/Look-Once-Beam-Twice.

[532]  arXiv:2605.05072 [pdf, ps, other]
Title: Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D occupancy prediction aims to infer dense, voxel-wise scene semantics from sensor observations, where the 2D-to-3D view transformation serves as a crucial step in bridging image features and volumetric representations. Most previous methods rely on a fixed projection space, where 3D reference points are uniformly sampled along pillars. However, such sampling struggles to capture the sparsity and height variations of real-world scenes, leading to ambiguous correspondences and unreliable feature aggregation. To address these challenges, we propose HiPR, a camera-LiDAR occupancy framework with Height-Guided Projection Reparameterization. HiPR first encodes LiDAR into a BEV height map to capture the maximum height of the point cloud. HiPR then adjusts the sampling range of each pillar using the height prior, enabling adaptive reparameterization of the projection space. As a result, the projected points are redistributed into geometrically meaningful regions rather than fixed ranges. Meanwhile, we mask out the invalid parts of the height map to avoid misleading the feature aggregation. In addition, to alleviate the training instability caused by noisy LiDAR-derived heights, we introduce a training-time Progressive Height Conditioning strategy, which gradually transitions the conditioning signal from ground-truth heights to LiDAR heights. Extensive experiments demonstrate that HiPR consistently outperforms existing state-of-the-art methods while maintaining real-time inference. The code and pretrained models can be found at https://github.com/Rayn-Wu/HiPR.

[533]  arXiv:2605.05077 [pdf, ps, other]
Title: FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate image segmentation is essential for modern computer vision applications such as image editing, autonomous driving, and medical image analysis. In recent years, Dichotomous Image Segmentation (DIS) has become a standard task for training and evaluating highly accurate segmentation models. Existing DIS approaches often fail to preserve fine-grained details or fully capture the semantic structure of the foreground. To address these challenges, we present FlowDIS, a novel dichotomous image segmentation method built on the flow matching framework, which learns a time-dependent vector field to transport the image distribution to the corresponding mask distribution, optionally conditioned on a text prompt. Moreover, with our Position-Aware Instance Pairing (PAIP) training strategy, FlowDIS offers strong controllability through text prompts, enabling precise, pixel-level object segmentation. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches both with and without language guidance. Compared with the best prior DIS method, FlowDIS achieves a 5.5% higher $F_{\beta}^{\omega}$ measure and 43% lower MAE ($\mathcal{M}$) on the DIS-TE test set. The code is available at: https://github.com/Picsart-AI-Research/FlowDIS

[534]  arXiv:2605.05079 [pdf, ps, other]
Title: A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping
Comments: 15 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video sequences captured through refractive dynamic media, such as turbulent air or a water surface, often suffer from severe geometric distortions and temporal instability. While recent advances address mild atmospheric turbulence, no existing benchmarks systematically evaluate restoration methods under strong and highly nonuniform refractive conditions. We present a comprehensive benchmark for geometric distortion removal in video, covering a range from turbulence-like mild warping to strong discontinuous refractive deformations. The benchmark includes both laboratory-captured real data and synthetic sequences generated for static scenes via physics-based light refraction modeling across four distortion levels and multiple surface wave types. We evaluate a spectrum of methods, from simple baselines and classical registration algorithms to advanced learning-based approaches, including DATUM and our proposed diffusion-based V-cache for high and extreme distortion regimes. Evaluation uses both pixel-level (PSNR, SSIM) and perceptual (LPIPS, DINO, CLIP) metrics, providing the first large-scale analysis of geometric distortion removal. Our benchmark establishes a new foundation for developing and evaluating algorithms capable of reconstructing video from highly distorted optical environments. Our code and datasets are available at https://github.com/iafoss/refractive-mfir-benchmark.

[535]  arXiv:2605.05080 [pdf, ps, other]
Title: The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences
Subjects: Computation and Language (cs.CL)

We administer 45 validated psychometric questionnaires to 50 large language models (LLMs) to identify the dimensions along which LLMs differ psychometrically. Using Supervised Semantic Differential (SSD), we find that the primary axis of between-model variance separates items describing phenomenally rich experience, including embodied sensation, felt affect, inner speech, imagery, and empathy, from items describing stimulus-driven behavioral reactivity ($R^2_{adj}=.037$, $p<.0001$). To test this hypothesis at the item level, we introduce the Pinocchio score ($\pi_i$), the ratio of inter-model response variance under neutral prompting to that under a human-simulation prompt, as an annotation-free measure of each item's experiential demand. $\pi_i$ predicts condition-induced shifts in primary factor loading magnitudes ($\rho=-.215$, $p<.0001$, $n=1292$--$1310$ items), confirming that between-model divergence on experiential items is structured rather than noisy. Applying PCA to per-model EFA scores across all questionnaires reveals one dominant dimension, the Pinocchio Axis ($\Pi$): the degree to which a model presents itself as a locus of phenomenal experience rather than a system of behavioral responses. This axis captures 47.1% of cross-questionnaire between-model variance in primary factor scores and converges with item-level Pinocchio scores ($r=.864$). Marked within-provider divergence across closely related model variants is consistent with post-training fine-tuning as a key contributor, supporting the interpretation that $\Pi$ reflects a training-shaped self-representational tendency governing how a model treats experiential language as self-applicable. The dominant axis of between-model psychometric variation is therefore not a conventional personality trait but a self-representational stance toward one's own nature as an experiencer.
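The variance ratio that defines the Pinocchio score can be illustrated with a minimal sketch. The function name, the use of population variance, and the example responses are assumptions made here for illustration; the abstract only specifies the ratio of inter-model response variance under neutral prompting to that under a human-simulation prompt.

```python
from statistics import pvariance


def pinocchio_score(neutral_responses, human_sim_responses):
    """Ratio of inter-model variance (neutral prompt) to inter-model
    variance (human-simulation prompt) for a single questionnaire item.

    Each argument holds one response per model for the same item.
    Population variance is an assumption of this sketch.
    """
    var_human = pvariance(human_sim_responses)
    if var_human == 0:
        raise ValueError("human-simulation variance is zero; ratio undefined")
    return pvariance(neutral_responses) / var_human


# Hypothetical Likert responses from five models to one item.
neutral = [1, 5, 2, 4, 3]    # models disagree under neutral prompting
human_sim = [3, 4, 3, 4, 3]  # models converge when simulating a human
score = pinocchio_score(neutral, human_sim)
```

When both conditions cover the same number of models, using sample variance instead of population variance leaves the ratio unchanged, because the normalizing factors cancel.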

[536]  arXiv:2605.05081 [pdf, ps, other]
Title: Provable imitation learning for control of instability in partially-observed Vlasov--Poisson equations
Subjects: Machine Learning (cs.LG); Analysis of PDEs (math.AP); Optimization and Control (math.OC); Plasma Physics (physics.plasm-ph)

We consider the stabilization of Vlasov--Poisson plasma dynamics, a central control problem in nuclear fusion. Our focus is the gap between what an ideal controller would use and what experiments can actually observe: while an optimal policy may rely on the full phase-space state, practical feedback is typically limited to sparse macroscopic diagnostics. We therefore study imitation learning methods that distill a fully observed expert policy into controllers operating only on macroscopic measurements. We establish stability guarantees for the learned policy, where the error floor depends on the minimal behavior cloning loss achievable under the observation constraints. We further characterize this minimal loss in terms of a notion of entropy that quantifies the complexity of the initial distribution. Our results demonstrate the theoretical feasibility of learning stabilizing feedback policies for kinetic plasma dynamics from macroscopic observations, and exhibit the adaptivity of the learning approach to low-complexity structures. Through extensive numerical experiments, we validate our theory and show that the learned policies can stabilize the system using only macroscopic observations, within a significantly longer time horizon than non-adaptive baseline controllers.

[537]  arXiv:2605.05084 [pdf, ps, other]
Title: Order Matters: Improving Domain Adaptation by Reordering Data
Subjects: Machine Learning (cs.LG)

Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Optimal Reordering of Data for Error-Reduced Estimation of Discrepancy (ORDERED), a novel unbiased stochastic variance reduction technique which reduces the discrepancy estimation error by optimising the order in which the training data are sampled. We consider two specific domain discrepancy losses (correlation alignment and the maximum mean discrepancy), formulate their stochastic estimation error as a function of the data sampling order, and propose a practical optimisation algorithm. Our simulations demonstrate reduced variance compared to related methods, and experiments on two domain shift image classification benchmarks show improved target domain accuracy.

[538]  arXiv:2605.05088 [pdf, ps, other]
Title: Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis
Subjects: Machine Learning (cs.LG); Physics and Society (physics.soc-ph)

Achieving resilient and sustainable cities requires scalable approaches to decarbonising residential buildings, which account for about 20% of UK greenhouse gas emissions and 25% of energy-related emissions in the European Union. Energy Performance Certificates (EPCs) support regulation and retrofit planning, but their reliance on on-site inspections limits timely city-scale assessment. This study introduces a gated multimodal model to predict Standard Assessment Procedure (SAP) energy efficiency and Environmental Impact (EI) scores by integrating EPC tabular variables, assessor-written free text, and Geographic Information System (GIS)-derived spatial features describing footprint geometry, height, area, and orientation. Sample-wise gating learns property-specific modality weights, while an auxiliary band classification head stabilises training. In a Westminster, London case study, the model predicts SAP and EI scores with MAEs of 4.03 and 4.76 points and R2 values of 0.757 and 0.748, respectively, achieving a mean MAE of 4.39. Ablation results show that full multimodal fusion outperforms unimodal and bimodal baselines for both score prediction and band-level classification. Interpretability analyses provide decision-relevant evidence: gating weights indicate strong reliance on assessor text; SHAP highlights main fuel, built form, and construction age band; text occlusion prioritises roof and wall fields; and spatial attribution is dominated by height and footprint area, with sensitivity to footprint shape. The validated framework is further applied to retrofit scenarios for wall insulation, roof insulation, and window glazing upgrades, indicating projected improvements in SAP, EI, annual energy cost, and equivalent CO2 emissions. Overall, the framework provides scalable property-level evidence for retrofit screening, intervention prioritisation, and net-zero housing transitions.

[539]  arXiv:2605.05090 [pdf, ps, other]
Title: Automatically Finding and Validating Unexpected Side-Effects of Interventions on Language Models
Comments: 33 pages, 4 figures, 20 tables, targeting EMNLP submission
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present an automated, contrastive evaluation pipeline for auditing the behavioral impact of interventions on large language models. Given a base model $M_1$ and an intervention model $M_2$, our method compares their free-form, multi-token generations across aligned prompt contexts and produces human-readable, statistically validated natural-language hypotheses describing how the models differ, along with recurring themes that summarize patterns across validated hypotheses.
We evaluate the approach in a synthetic setting by injecting known behavioral changes and showing that the pipeline reliably recovers them. We then apply it to three real-world interventions (reasoning distillation, knowledge editing, and unlearning), demonstrating that the method surfaces both intended and unexpected behavioral shifts, distinguishes large from subtle interventions, and does not hallucinate differences when effects are absent or misaligned with the prompt bank. Overall, the pipeline provides a statistically grounded and interpretable tool for post-hoc auditing of intervention-induced changes in model behavior.

[540]  arXiv:2605.05092 [pdf, ps, other]
Title: Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics forecasting with auxiliary behavioral and emotional semantic recognition. Operating in a compact latent space constructed from frozen vision-language features, Driver-WM adopts a dual-stream architecture to separately encode external traffic and internal driver states. These streams are directionally coupled via a gated causal injection mechanism, which uses a learned vector gate to modulate external contextual perturbations while strictly enforcing temporal causality. Evaluations on a multi-task assistive driving benchmark demonstrate that Driver-WM yields robust long-horizon geometric forecasting for reactive high-motion maneuvers and improves semantic alignment for both driver and traffic states. Finally, the explicit external-to-internal conditioning allows for controlled test-time interventions to systematically analyze mechanism responses.

[541]  arXiv:2605.05095 [pdf, ps, other]
Title: A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry
Comments: Code for this paper is available at this https URL
Journal-ref: ACM SIGGRAPH 2026
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

We develop a framework for task-specific active next-best-view selection in 3D reconstruction from point clouds, by casting the problem in the language of Bayesian decision theory. Our framework works by (a) placing a prior distribution over the space of implicit surfaces, (b) using recently-developed stochastic surface reconstruction methods to calculate the resulting posterior distribution, then (c) using the posterior distribution to carefully reason about which view to scan next. This enables us to perform camera selection in a manner that is directly optimized for the intended use of the reconstructed data - meaning, we reduce uncertainty only in those regions that make a difference in the task at hand, as opposed to prior approaches that reduce it uniformly across space. We evaluate our method across three distinct downstream tasks: semantic classification, segmentation, and PDE-guided physics simulation. Experimental results demonstrate that our framework achieves superior task performance with fewer views compared to commonly used baselines and prior general uncertainty-reduction techniques.

[542]  arXiv:2605.05096 [pdf, ps, other]
Title: CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation
Subjects: Information Retrieval (cs.IR)

Generative recommendation maps each item to a sequence of Semantic IDs (SIDs) and recasts retrieval as autoregressive token generation. In this paradigm the main bottleneck is the tokenizer rather than the Transformer: residual vector quantization with a hard nearest-neighbor assignment at every layer collapses multi-faceted item semantics at cluster boundaries and propagates early errors to later SID positions. A common workaround is to append a dense vector or attribute prefix to the SID, but this dual-representation design inflates inference cost and gives up the simplicity of a generative interface. We address the bottleneck at the tokenizer itself. CAPSID replaces hard residual quantization with capsule routing: at each layer an item probabilistically routes to several semantic capsules, the residual is updated by the routed reconstruction rather than by a single winning code, and the SID terminates once the active capsule's confidence is high enough. On top of CAPSID, SEMANTICBPE composes adjacent SID tokens into reusable subwords by combining their co-occurrence with their embedding compatibility. On Amazon Beauty, Sports, Toys, and a 35M-item proprietary industrial catalog, CAPSID+SEMANTICBPE improves Recall at 10 by 9.6% on average over ReSID, the strongest single-representation baseline, and matches or exceeds a COBRA-style sparse-dense system on every public benchmark while running at 51% of its inference latency. Ablations show that soft routing, iterative agreement, and confidence-driven length each contribute independently, and the gains are largest on tail items where boundary semantics dominate.
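For context, the baseline tokenizer that the abstract criticizes, residual vector quantization with a hard nearest-neighbor assignment at every layer, can be sketched as below. This is a generic illustration of that baseline, not CAPSID's capsule routing; the function names and the toy codebooks are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def residual_quantize(item, codebooks):
    """Hard residual quantization: one winning code per layer.

    Returns the Semantic ID (list of code indices) and the final residual.
    """
    residual = list(item)
    sid = []
    for codebook in codebooks:
        k = nearest_code(residual, codebook)  # hard assignment, no soft routing
        sid.append(k)
        # Subtract only the single winning code before the next layer.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return sid, residual


# Toy two-layer codebooks in two dimensions (hypothetical values).
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.25, 0.0], [0.0, 0.25]],
]
sid, residual = residual_quantize([1.0, 0.25], codebooks)
```

Because each layer commits to a single code, an item that lies near a cluster boundary at an early layer passes a different residual to every later layer, which is the error-propagation effect the abstract describes.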

[543]  arXiv:2605.05097 [pdf, ps, other]
Title: Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics
Comments: Preprint. 9 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen what repetition confirms, and let the rest fade. We argue that external memory should follow a similar principle. In Memini, this view takes the form of an associative memory that organizes knowledge as a directed graph. Each edge carries two coupled internal variables, one fast and one slow, following the Benna-Fusi model of synaptic consolidation. From this coupling, episodic sensitivity, gradual consolidation, and selective forgetting emerge as facets of a single mechanism, reframing external memory as a learning substrate that reorganizes through its own dynamics.

[544]  arXiv:2605.05102 [pdf, ps, other]
Title: Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning
Comments: Accepted at the Conference on Learning Theory (COLT) 2026
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $\delta \in (0,1]$, thereby characterizing the regret distribution across the full range of $\delta$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$, where $N$ denotes the visit count and $(c_{1,k},c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive general gap-independent and gap-dependent distributional regret bounds, yielding a principled characterization of how the parameters control the trade-off between expected performance, tail risk, and instance-dependent behavior. In particular, our bounds achieve optimal trade-offs between expected and distributional regret in both minimax and instance-dependent regimes. As a special case, for multi-armed bandits with $A$ arms and horizon $T$, we obtain a distributional regret bound of order $\mathcal{O}(\sqrt{AT}\log(1/\delta))$, confirming the conjecture of Lattimore & Szepesvári (2020, Section 17.1) for the first time.
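The exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$ can be written out as a small sketch to show where its two branches cross. The function name and the handling of unvisited entries ($N = 0$) are assumptions of this example; the abstract does not specify them.

```python
import math


def exploration_bonus(n_visits, c1, c2):
    """Bonus min(c1 / N, c2 / sqrt(N)) for visit count N.

    Returning infinity for N == 0 is an assumption of this sketch,
    chosen so that unvisited entries are always tried first.
    """
    if n_visits <= 0:
        return math.inf
    return min(c1 / n_visits, c2 / math.sqrt(n_visits))


# With c1 = 4 and c2 = 1 the two branches meet at N = (c1 / c2) ** 2 = 16.
few_visits = exploration_bonus(4, c1=4.0, c2=1.0)     # c2 / sqrt(N) branch
crossover = exploration_bonus(16, c1=4.0, c2=1.0)     # both branches equal
many_visits = exploration_bonus(64, c1=4.0, c2=1.0)   # c1 / N branch
```

Below the crossover the bonus decays like $1/\sqrt{N}$, and above it like $1/N$, so the ratio $c_{1,k}/c_{2,k}$ sets the visit count at which the faster decay takes over.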

[545]  arXiv:2605.05103 [pdf, ps, other]
Title: Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
Comments: 25 pages, 8 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

We introduce the **Concept Field** of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by $\zeta$, the mean absolute z-distance between the observed delta and the field's local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a direct probabilistic reading. We support the computation by introducing a **Vector Sequence Database (VSDB)** that stores embeddings together with sequence-position and next-delta metadata. We evaluate this approach in two large-scale settings: hallucination-style groundedness detection over the U.S. Code of Federal Regulations, and novelty detection over Project Gutenberg. Using controlled LLM-generated rewrites, Concept Fields achieve strong selective classification performance under a grounded / ungrounded / unsure triage policy and, unlike retrieval-centric baselines, show similar coverage-risk behavior across both domains, supporting a probability-based interpretation that transfers across domains. We also sketch how divergence and curl of the Concept Field, computed on dense clusters, surface qualitatively meaningful semantic patterns (logic sources, sinks, and implicit topics), which we offer as hypothesis-generating rather than as a quantitative result. Concept Fields provide a fast, lightweight, and interpretable signal for groundedness and novelty, complementary to LLM-as-judge and white-box detectors.
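A minimal sketch of the $\zeta$ score and the three-way triage follows. It assumes the local Gaussian estimate is diagonal, so that a per-dimension mean and standard deviation are available; the function names and the threshold values are hypothetical and are not taken from the paper.

```python
def zeta(observed_delta, local_mean, local_std):
    """Mean absolute z-distance between an observed embedding delta and a
    local Gaussian estimate (assumed diagonal: one mean/std per dimension)."""
    if not (len(observed_delta) == len(local_mean) == len(local_std)):
        raise ValueError("all inputs must have the same dimension")
    total = sum(
        abs(d - m) / s for d, m, s in zip(observed_delta, local_mean, local_std)
    )
    return total / len(observed_delta)


def triage(score, grounded_below=1.0, ungrounded_above=2.0):
    """Three-way decision; both thresholds are placeholders, not tuned values."""
    if score < grounded_below:
        return "grounded"
    if score > ungrounded_above:
        return "ungrounded"
    return "unsure"


# Toy two-dimensional example: the first coordinate is 2 std devs off.
score = zeta([1.0, 0.0], [0.0, 0.0], [0.5, 1.0])
label = triage(score)
```

Leaving a band between the two thresholds is what makes the policy selective: transitions that fall in the band are deferred rather than forced into either class.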

[546]  arXiv:2605.05105 [pdf, ps, other]
Title: Minimizing the Expected Cost of Synchronization in Lossless Power Networks
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)

The reliable operation of large-scale electric power networks is increasingly challenging, particularly with the integration of stochastic renewable generation. In this work, we address the problem of minimizing network transients by optimally modifying the underlying network. We formulate the problem in terms of graph Laplacian matrices and show that, under certain assumptions, the problem is convex. We derive a linear matrix inequality whose feasibility guarantees the existence and uniqueness of phase cohesive steady-state angles; this condition can be directly incorporated as a convex constraint in the optimization framework and we provide several geometric interpretations of the optimization problem. The proposed method is validated on the IEEE 30-bus test system, where results demonstrate that our approach effectively identifies critical links on the network. Dynamic simulations show a significant reduction in network transients and overall improvements across several performance metrics. We explore the sparsity-optimality trade-off using a reweighted $\ell_1$ heuristic.

[547]  arXiv:2605.05107 [pdf, ps, other]
Title: Input-Output Specifications and Dynamic Droop Coefficients: Stability and Performance Conditions for Grid-Forming IBRs
Subjects: Systems and Control (eess.SY)

This paper proposes dynamic stability and performance conditions for grid-connected inverter-based resources (IBRs). To this end, we extend the notion of steady-state droop coefficients to dynamic droop coefficients to capture the small-signal dynamics of IBRs and synchronous generators (SGs). Notably, the dynamic droop coefficients can be obtained from input-output data collected at the unit's (e.g., IBR or SG) point of interconnection without requiring prior knowledge of IBR internals or controls structure. To obtain frequency stability conditions, this IBR model is combined with a lightweight dynamic transmission network model that accounts for uncertainty of line dynamics. The resulting stability conditions are highly scalable and, given a few key network parameters, can be verified at the unit level. To make the conditions practical and offer intuitive and illustrative interpretations, we map the frequency stability conditions to bounds on the Bode plot of the dynamic droop coefficient for two broad types of IBR responses. Moreover, our specifications on the dynamic droop coefficient (i) translate basic frequency control ancillary services into verifiable requirements, and (ii) provide insights into the much-debated question of how to certify an IBR as grid-forming (GFM). The results are illustrated using dynamic droop coefficients obtained using detailed simulations of GFM and GFL IBRs as well as SGs.

[548]  arXiv:2605.05110 [pdf, ps, other]
Title: LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts
Comments: Published in IEEE Robotics and Automation Letters (RA-L), 2026
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations. We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that the policy trained with our methods supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.

[549]  arXiv:2605.05112 [pdf, ps, other]
Title: Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
Comments: 25 pages, 8 figures, 11 tables
Subjects: Machine Learning (cs.LG)

SWE-bench-style agentic reinforcement learning relies on expensive stateful trajectories, yet substantial compute is wasted on sampled rollout groups with skewed pass rates, where binary rewards provide a weak contrastive signal. We frame this inefficiency as a pass-rate control problem and show that a 50% pass rate is the most informative operating point: it maximizes reward entropy, the probability of surviving group filtering, RLOO advantage energy under GRPO, and success--failure contrastive structure. Guided by this principle, we propose Prefix Sampling (PS), which replays trajectory prefixes to steer skewed groups toward this regime: successful prefixes serve as head starts for mostly failing groups, while failing prefixes serve as handicaps for mostly passing groups. In stateful agent environments, prefix states are reconstructed through replay while replayed tokens are excluded from the loss, restricting optimization to continuations generated by the current policy. On SWE-bench-style agentic RL, PS delivers end-to-end wall-clock speedups of 2.01x on Qwen3-14B and 1.55x on Qwen3-32B while preserving or improving final verified performance. For 14B, the SWE-bench Verified peak rises from the baseline peak of 0.273 to 0.295 under PS. Additional mathematical reasoning experiments on AIME 2025 show the same pass-rate control pattern and decompose the gains into replay, bidirectional coverage, and adaptive control.
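Two of the quantities said to peak at a 50% pass rate can be checked numerically. The sketch below is a minimal illustration, not the paper's code. It assumes i.i.d. Bernoulli rewards with pass rate p, and assumes that a group "survives filtering" when its rollouts are not all passes or all failures. The function names and the group size of 8 are hypothetical.

```python
import math


def reward_entropy(p: float) -> float:
    """Entropy (in bits) of a Bernoulli(p) binary reward."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def survival_probability(p: float, group_size: int) -> float:
    """Probability that a group of i.i.d. rollouts has mixed outcomes.

    Assumption: a group is filtered out when all rollouts pass or all fail.
    """
    return 1.0 - p ** group_size - (1 - p) ** group_size


def most_informative_pass_rate(metric, grid_size: int = 101) -> float:
    """Grid-search the pass rate in [0, 1] that maximizes `metric`."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=metric)


if __name__ == "__main__":
    group_size = 8  # hypothetical group size
    print(most_informative_pass_rate(reward_entropy))
    print(most_informative_pass_rate(lambda p: survival_probability(p, group_size)))
```

Both searches select p = 0.5 on this grid, which matches the operating point named in the abstract under the stated assumptions.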

[550]  arXiv:2605.05113 [pdf, ps, other]
Title: How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences
Authors: Mariia Seleznova
Subjects: Machine Learning (cs.LG)

We study signal propagation in linear recurrent models at finite width. While existing signal propagation theory relies predominantly on the infinite-width limit, it remains unclear for how long that approximation remains accurate when recurrent depth $t$ grows jointly with width $n$. This question is especially relevant for modern recurrent sequence models, whose natural operating regime involves long input sequences, i.e., large $t$. We derive exact finite-width formulas for the hidden state signal energies in linear recurrences under complex Gaussian initialization. Using these formulas, we identify the joint depth-width scaling regimes that govern signal propagation: (i) a subcritical regime $t=o(\sqrt n)$, in which the infinite-width approximation remains valid; (ii) a critical regime $t\sim c\sqrt n$, in which non-negligible deviations from infinite-width predictions appear and a nontrivial joint scaling limit emerges; and (iii) a supercritical regime $t\gg \sqrt n$, in which finite-width effects dominate. Thus, our results pinpoint the precise recurrent depth scale at which infinite-width theory breaks down in long-range linear recurrences. In turn, this shows when standard initialization schemes, such as Glorot, become unstable. More broadly, our results demonstrate that finite-width effects accumulate more rapidly with depth in recurrent models than in feedforward ones, leading to qualitatively different signal propagation behavior.

[551]  arXiv:2605.05115 [pdf, ps, other]
Title: Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Subjects: Machine Learning (cs.LG)

Neural representations carry rich geometric structure; but does that structure causally shape behavior? To address this question, we intervene along paths through activation space defined by different geometries, and measure the behavioral trajectories they induce. In particular, we test whether interventions that respect the geometry of activation space will yield behaviors close to those the model exhibits naturally. Concretely, we first fit an activation manifold $M_h$ to representations and a behavior manifold $M_y$ to output probability distributions. We then test the link $M_h \leftrightarrow M_y$ via interventions: we find that steering along $M_h$, which we term manifold steering, yields behavioral trajectories that follow $M_y$, while linear steering -- which assumes a Euclidean geometry -- cuts through off-manifold regions and hence produces unnatural outputs. Moreover, optimizing interventions in activation space to produce paths along $M_y$ recovers activation trajectories that trace the curvature of $M_h$. We demonstrate this bidirectional relationship between the geometry of representation and behavior across tasks and modalities. In language models, we use reasoning tasks with cyclic and sequential geometries as well as in-context learning tasks with more complex graph geometries. In a video world model, we use a task with geometry corresponding to physical dynamics. Overall, our work shows that geometry in neural representation is not merely incidental, but is in fact the proper object for enabling principled control via intervention on internals. This recasts the core problem of steering from finding the right direction to finding the right geometry.

[552]  arXiv:2605.05116 [pdf, ps, other]
Title: On the Hardness of Junking LLMs
Comments: 27 pages, 13 figures, 2 tables
Subjects: Machine Learning (cs.LG)

Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit semantic structure. These attacks generally operate by fixing an adversarial instruction and optimizing small adversarial components (e.g., suffixes or prefixes). In this setting, prompt structure is fundamental for performance, and recent results show that even simple random search can achieve strong performance when combined with sophisticated prompt design. Recently, it has been observed that harmful behaviors can be elicited even without the adversarial prompt, relying solely on optimized token sequences. This suggests the existence of natural backdoors, i.e., token sequences that emerge naturally during LLM training and trigger unsafe outputs without any meaningful instruction. However, despite these observations, this setting remains largely unexplored, and in particular the hardness of finding natural backdoors has not yet been assessed. In this work, we provide a first proof-of-concept study investigating the hardness of this task, which we refer to as the junking problem. We formalize it as the problem of finding token sequences that maximize the probability of generating a target prefix of harmful responses, and propose a greedy random-search method to assess whether such sequences can be discovered easily. Our results show that this problem is harder than standard jailbreak attacks, confirming the importance of semantic information in prompt design. At the same time, we find that our simple strategy is sufficient to solve it with a high success rate, suggesting that natural backdoors are present and easily recoverable. Finally, through perplexity analysis, we observe that the discovered token sequences lie in low-probability regions of the model distribution, supporting the hypothesis that they emerged implicitly from the training process.

[553]  arXiv:2605.05118 [pdf, ps, other]
Title: On the Wasserstein Gradient Flow Interpretation of Drifting Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF. We demonstrate three main results. First, one algorithm proposed by Deng et al. (2026) corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities. Second, the algorithm actually implemented by Deng et al. (2026) corresponds to a different procedure, which bears some resemblance to the fixed point of a WGF on the Sinkhorn divergence, but lacks certain desirable properties of the latter. Third, the same idea can be extended to the limiting point of other WGFs, including the Maximum Mean Discrepancy (MMD), the sliced Wasserstein distance, and GAN critic functions.
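For orientation, the standard form of a Wasserstein gradient flow and its specialization to the KL divergence can be written as follows. This is textbook background stated under the usual smoothness assumptions, not the note's own derivation; the symbols $\rho_t$ (evolving density), $\pi$ (target density), and $\mathcal{F}$ (functional) are notation chosen here, and the Parzen smoothing mentioned in the abstract is not included.

```latex
% Wasserstein gradient flow of a functional F over densities rho_t:
\partial_t \rho_t
  = \nabla \cdot \left( \rho_t \, \nabla \frac{\delta \mathcal{F}}{\delta \rho}[\rho_t] \right)

% For F[rho] = KL(rho || pi), the first variation is log(rho / pi) + 1, so
\partial_t \rho_t
  = \nabla \cdot \left( \rho_t \, \nabla \log \frac{\rho_t}{\pi} \right),
\qquad
v_t = - \nabla \log \frac{\rho_t}{\pi}

% The velocity field v_t vanishes exactly when rho_t = pi,
% so the target density is the fixed point of the flow.
```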

[554]  arXiv:2605.05119 [pdf, ps, other]
Title: MCFlash: Bulk Bitwise Processing in 3D NAND with Dynamic Sensing and Multi-level Encoding
Comments: 27 pages, 10 figures, preprint under review
Subjects: Hardware Architecture (cs.AR)

This paper presents MCFlash, a practical and immediately deployable technique for executing bulk bitwise operations directly within commercial off-the-shelf (COTS) 3D NAND flash chips. MCFlash relies solely on standard user-mode instructions, combining Multi-Level Cell (MLC) data encodings with dynamically tuned read reference voltages to execute in-place bitwise operations. We evaluate MCFlash across diverse NAND flash chips, both floating-gate and charge-trap variants, from different generations. Our results represent the first demonstration of error-free, on-chip bitwise operations, sustaining over one billion operations on fresh blocks and maintaining bit-error rates below 0.015% even after 10,000 program/erase (P/E) cycles.

[555]  arXiv:2605.05120 [pdf, ps, other]
Title: Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

An interpretable and scalable framework for decoding driving behaviors from multimodal physiological signals is proposed in this study. We utilize a large-scale multimodal physiological driving-behavior dataset comprising synchronized electroencephalogram (EEG), electromyography (EMG), and galvanic skin response (GSR) signals. Our approach involves rigorous preprocessing followed by a domain-specific feature extraction pipeline targeting time-domain, frequency-domain, and derived physiological indices. To address high dimensionality, we employ SHAP-based elite feature selection, retaining the top 250 features to reduce computational overhead while preserving predictive power. Hyperparameter optimization for extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) models is conducted using Bayesian optimization via Optuna. Finally, a weighted soft-voting ensemble is constructed to leverage the complementary strengths of both gradient boosting frameworks. The results demonstrate that the proposed ensemble achieves a test accuracy of 80.91% and a macro-F1 score of 0.79, significantly outperforming single-modality baselines and traditional machine learning models. Ablation studies confirm an 8% performance gain over the best single modality (EEG), validating the necessity of multimodal fusion. SHAP analysis further validates the physiological plausibility of the model, revealing that while EEG contributes the majority of predictive weight, GSR and EMG features provide critical discriminatory signals for high-arousal and motor-intensive maneuvers.
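The weighted soft-voting step can be sketched in a few lines. This is a minimal illustration assuming each model outputs a class-probability vector for one sample. The function name, the weights, and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def weighted_soft_vote(
    probas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> int:
    """Return the class index with the highest weighted-average probability.

    probas[m][c] is model m's predicted probability for class c.
    """
    if len(probas) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_classes = len(probas[0])
    # Weighted average of the per-model probabilities for each class.
    combined = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=combined.__getitem__)


if __name__ == "__main__":
    xgb_proba = [0.5, 0.3, 0.2]   # hypothetical XGBoost output
    lgbm_proba = [0.2, 0.6, 0.2]  # hypothetical LightGBM output
    print(weighted_soft_vote([xgb_proba, lgbm_proba], weights=[0.4, 0.6]))
```

With these example numbers the two models disagree on the top class, and the weights decide which prediction wins.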

[556]  arXiv:2605.05121 [pdf, ps, other]
Title: Beyond Semantics: An Evidential Reasoning-Aware Multi-View Learning Framework for Trustworthy Mental Health Prediction
Subjects: Computation and Language (cs.CL)

Automated mental health prediction using textual data has shown promising results with deep learning and large language models. However, deploying these models in high-stakes real-world settings remains challenging, as existing approaches largely rely on semantic representations and often produce overconfident predictions under ambiguous, noisy, or shifted data. Moreover, most methods lack reliable uncertainty estimation, undermining trust in risk-sensitive mental health applications. To address these limitations, we formulate the task as a multi-view learning problem that integrates semantic information from encoder-only models with higher-level reasoning information from decoder-only models, where reasoning-aware representations and uncertainty modeling are obtained in a trustworthy manner. To ensure reliable fusion, we adopt an evidential learning framework based on Subjective Logic to explicitly model uncertainty and introduce an evidential fusion strategy that balances complementary views while discounting unreliable evidence. Benchmarking on three real-world datasets, Dreaddit, SDCNL, and DepSeverity, reports accuracies of 0.835, 0.731, and 0.751, respectively, demonstrating its potential for reliable mental health prediction. Additional experiments on robustness to noise and case studies for interpretability confirm that our proposed framework not only improves predictive performance but also provides trustworthy uncertainty estimates and human-understandable reasoning signals, making it suitable for risk-sensitive applications in mental health assessment.

[557]  arXiv:2605.05123 [pdf, ps, other]
Title: Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with the highest estimated value is then deployed and continually fine-tuned. However, this setup has two main issues. First, OPE can be unreliable, making it risky to deploy a policy based solely on those estimates, whereas OE may identify a viable policy only at the cost of substantial online interaction, which could have been used for fine-tuning. Second, and more importantly, it is often not possible to determine a priori whether a pretrained policy will improve with post-deployment fine-tuning, especially in non-stationary environments. As a result, procedures committing to a single deployed policy are impractical in many real-world settings. Moreover, a naive remedy that exhaustively fine-tunes all candidates would violate interaction budget constraints and is likewise infeasible. In this paper, we propose a novel adaptive approach for policy selection and fine-tuning under online interaction budgets in O2O-RL. Following the standard pipeline, we first train a set of candidate policies with different offline RL algorithms and hyperparameters; we then perform OPE to obtain initial performance estimates. We next adaptively select and fine-tune the policies based on their predicted performance via an upper-confidence-bound approach, thereby making efficient use of online interactions. We demonstrate that our approach improves upon O2O-RL baselines on various benchmarks.
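The idea of spending a fixed interaction budget adaptively across candidates can be illustrated with a generic UCB1-style allocation. This is a sketch under stated assumptions, not the paper's algorithm: the OPE estimate is treated as a single pseudo-observation, `finetune_step` is a hypothetical callback that fine-tunes one candidate for one round and returns its observed online return, and the exploration coefficient is illustrative.

```python
import math
from typing import Callable, List


def ucb_select_and_finetune(
    ope_estimates: List[float],
    finetune_step: Callable[[int], float],
    budget: int,
    exploration: float = 1.0,
) -> List[int]:
    """Allocate `budget` fine-tuning rounds across candidates with a UCB rule.

    Returns the number of rounds spent on each candidate policy.
    """
    n = len(ope_estimates)
    counts = [1] * n            # assumption: OPE counts as one observation
    means = list(ope_estimates)
    rounds = [0] * n
    for _ in range(budget):
        total = sum(counts)
        # Optimistic score: running mean plus an exploration bonus that
        # shrinks as a candidate accumulates observations.
        scores = [
            means[i] + exploration * math.sqrt(2 * math.log(total) / counts[i])
            for i in range(n)
        ]
        chosen = max(range(n), key=scores.__getitem__)
        observed = finetune_step(chosen)
        counts[chosen] += 1
        means[chosen] += (observed - means[chosen]) / counts[chosen]
        rounds[chosen] += 1
    return rounds


if __name__ == "__main__":
    true_returns = [0.2, 0.8, 0.5]   # hypothetical online returns
    misleading_ope = [0.5, 0.4, 0.6]  # OPE ranks the best candidate last
    print(ucb_select_and_finetune(
        misleading_ope, lambda i: true_returns[i], budget=50, exploration=0.2
    ))
```

In this toy setting the OPE estimates rank the best candidate last, yet the allocation shifts most of the budget to it once online returns are observed, without committing to a single policy up front.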

[558]  arXiv:2605.05124 [pdf, ps, other]
Title: Conditional outlier detection for clinical alerting
Comments: AMIA 2010 Annual Symposium proceedings, pp. 286-290. Homer R. Warner Best Paper Award
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management actions using past patient cases stored in an electronic health record (EHR) system. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to a potential error and that it is worthwhile to raise an alert if such a condition is encountered. We evaluate this hypothesis using data obtained from the electronic health records of 4,486 post-cardiac surgical patients. We base the evaluation on the opinions of a panel of experts. The results support that anomaly-based alerting can have reasonably low false alert rates and that stronger anomalies are correlated with higher alert rates.

[559]  arXiv:2605.05125 [pdf, ps, other]
Title: Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Target trial emulation (TTE) enables causal questions to be studied with observational data when randomized controlled trials (RCTs) are infeasible. Yet treatment-effect methods often address causal estimation, missingness, and temporal structure separately, limiting their robustness in electronic health records (EHRs), where time-varying confounding and missing-not-at-random (MNAR) biomarkers can reach 50%--80%. We propose a two-stage pipeline for treatment effect estimation from incomplete longitudinal EHRs. First, CausalFlow-T, a directed acyclic graph (DAG)-constrained normalizing flow with long short-term memory (LSTM)-encoded patient history, performs exact invertible counterfactual inference, avoiding approximation errors from variational inference and separating confounding through explicit causal structure. Ablations on four synthetic and one semi-synthetic benchmark with known counterfactuals show that DAG constraints and exact inference address distinct failure modes: neither compensates for the other. Second, because CausalFlow-T requires completed inputs, we introduce an LLM-driven evolutionary imputer that proposes executable imputation operators rather than individual entries, and evaluate it with three large language model (LLM) backends, including two open-source models. Across 30%--80% MNAR missingness, this imputer achieves the best pooled rank over biomarker and causal metrics, leading in point-wise accuracy and temporal extrapolation while preserving average treatment effect (ATE) recovery as statistical baselines degrade. On Swiss primary-care EHRs from adults with type 2 diabetes initiating a GLP-1 receptor agonist or SGLT-2 inhibitor, the pipeline estimates a per-protocol weight-loss difference of -0.98 kg [95% CI -1.01, -0.96] favoring GLP-1 receptor agonists, consistent with randomized evidence and obtained from realistically incomplete real-world EHRs.

[560]  arXiv:2605.05126 [pdf, ps, other]
Title: ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation
Comments: Accepted to CVPR 2026, Project Page: this https URL
Subjects: Robotics (cs.RO)

Current Vision-Language-Action (VLA) models primarily focus on mapping 2D observations to actions, but exhibit notable limitations in spatiotemporal perception and reasoning: 1) spatial representations often rely on additional sensors, introducing substantial computational overhead; 2) visual reasoning is typically limited to future-frame prediction, lacking alignment with the instruction-grounded scene and thus compromising spatiotemporal consistency. To address these challenges, we propose ConsisVLA-4D, a unified and efficient framework that enhances spatiotemporal consistency in 3D perception and 4D reasoning. Specifically, we design: 1) CV-Aligner, which ensures cross-view object semantic consistency by filtering instruction-relevant regions and aligning object identities across multiple viewpoints; 2) CO-Fuser, which guarantees cross-object spatial geometric consistency by eliminating spatial relation ambiguities between objects across views using compact latent representations. Building upon these, we introduce 3) CS-Thinker to achieve cross-scene spatiotemporal consistency as actions unfold. It learns implicit knowledge of local dynamics from object-semantic tokens of CV-Aligner and global depth from geometric tokens of CO-Fuser, thereby enhancing efficient visual reasoning under scene variations. Extensive experiments demonstrate that, benefiting from its efficient spatiotemporal consistency design, ConsisVLA-4D achieves 21.6% and 41.5% performance improvements, along with 2.3-fold and 2.4-fold inference speedups compared to OpenVLA on the LIBERO benchmark and real-world platforms, respectively. ConsisVLA-4D is open-sourced and publicly available at

[561]  arXiv:2605.05129 [pdf, ps, other]
Title: BDF2-type integrator for Landau-Lifshitz-Gilbert equation in micromagnetics: a-priori error estimates
Comments: 10 figures
Subjects: Numerical Analysis (math.NA)

We consider the Landau-Lifshitz-Gilbert equation (LLG), which models time-dependent micromagnetic phenomena. We analyze a fully discrete scheme that combines first-order finite elements in space with a BDF2 method in time. The method requires the solution of only one linear system of equations per time step and does not enforce the pointwise unit-length constraint of the magnetization. While unconditional weak convergence has been analyzed in an earlier work, we now prove optimal-order convergence rates under sufficient regularity assumptions on the exact solution and the external field. In combination with our previous work, this establishes the first higher-order-in-time and linear integrator that converges both to weak and strong solutions of LLG. Numerical experiments confirm first-order convergence in space and second-order convergence in time.

[562]  arXiv:2605.05133 [pdf, ps, other]
Title: Transformed Latent Variable Multi-Output Gaussian Processes
Comments: ICML 2026
Subjects: Machine Learning (cs.LG)

Multi-Output Gaussian Processes (MOGPs) provide a principled probabilistic framework for modelling correlated outputs but face scalability bottlenecks when applied to datasets with high-dimensional output spaces. To maintain tractability, existing methods typically resort to restrictive assumptions, such as employing low-rank or sum-of-separable kernels, which can limit expressiveness. We propose the Transformed Latent Variable MOGP (T-LVMOGP), a novel framework that scales MOGPs to a massive number of outputs while preserving the capacity to capture meaningful inter-output dependencies. T-LVMOGP constructs a flexible multi-output deep kernel by mapping inputs and output-specific latent variables into an embedding space using a Lipschitz-regularised neural network. Combined with stochastic variational inference, our model effectively scales to high-dimensional output settings. Across diverse benchmarks, including climate modelling with over 10,000 outputs and zero-inflated spatial transcriptomics data, T-LVMOGP outperforms baselines in both predictive accuracy and computational efficiency.

[563]  arXiv:2605.05134 [pdf, ps, other]
Title: Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)

Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations of the model's latent state-space dynamics. Leveraging Koopman operator theory, we fit the transition operators for both factual and hallucinated regimes and define a differential residual score based on their respective prediction errors. To accommodate varying user requirements and domain-specific sensitivities, we introduce a preference-aware calibration mechanism that optimizes the classification threshold based on a small set of demonstrations. This approach enables low-cost hallucination detection in a single-sample pass, avoiding the need for secondary sampling or external grounding. Extensive testing across three data benchmarks demonstrates that our method achieves state-of-the-art performance with reduced resource overhead.

[564]  arXiv:2605.05136 [pdf, ps, other]
Title: CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization
Comments: 9 pages, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Domain Generalization (DG) aims to learn representations that remain robust under out-of-distribution (OOD) shifts and generalize effectively to unseen target domains. While recent invariant learning strategies and architectural advances have achieved strong performance, explicitly discovering a structured domain-invariant subspace through second-order statistics remains underexplored. In this work, we propose CPCANet, a novel framework grounded in Common Principal Component Analysis (CPCA), which unrolls the iterative Flury-Gautschi (FG) algorithm into fully differentiable neural layers. This approach integrates the statistical properties of CPCA into an end-to-end trainable framework, enforcing the discovery of a shared subspace across diverse domains while preserving interpretability. Experiments on four standard DG benchmarks demonstrate that CPCANet achieves state-of-the-art (SOTA) performance in zero-shot transfer. Moreover, CPCANet is architecture-agnostic and requires no dataset-specific tuning, providing a simple and efficient approach to learning robust representations under distribution shift. Code is available at https://github.com/wish44165/CPCANet.

[565]  arXiv:2605.05138 [pdf, ps, other]
Title: Executable World Models for ARC-AGI-3 in the Era of Coding Agents
Authors: Sergey Rodionov
Comments: 8 pages. Submitted to AGI-26
Subjects: Artificial Intelligence (cs.AI)

We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results on the 25 public ARC-AGI-3 games. Each recorded playthrough uses a fresh agent instance with no access to previous playthrough-specific files or conversation state. Most games have a single recorded playthrough; for a few games, we report multiple independent fresh-agent playthroughs to expose run-to-run variability. The agent fully solved 7 games, achieved a Relative Human Action Efficiency (RHAE) greater than 75% on 6 games, and obtained a mean per-game RHAE of 32.58%. Because the system uses no game-specific code, it can serve as a game-general baseline for ARC-AGI-3. Performance on the private validation set remains to be tested. Overall, the results provide preliminary evidence that verifier-driven executable world models are a promising approach for ARC-AGI-3 agents.

[566]  arXiv:2605.05144 [pdf, ps, other]
Title: Human-AI Co-Mentorship in Project-Based Learning: A Case Study in Financial Forecasting
Comments: Accepted for publication in 2026 ASEE Annual Conference & Exposition
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

This paper reflects on an AI research project carried out by a team of high-school and early-undergraduate students under the mentorship of graduate researchers and ably assisted by AI tools. We reflect not only on the learning experience of the high school students, but also on how AI tools accelerated the process and enabled the students to focus on higher-order problem formulation and solution. Although the participants entered the project with limited background in both AI and finance, they showed strong enthusiasm for technical market analysis and ETF price prediction. Traditional learning settings would first teach the necessary methods in a classroom setting and only later let students apply them. In contrast, our project emphasized workflow design: students identified the sequence of steps needed to address the problem and then used AI-driven tools to execute each step.
We note that the high school students developed the necessary code by iterating with the AI tools, and we used our daily stand-ups to debug and answer conceptual questions. Each student was able to dig deeper into their area of interest, whether computer science or finance, while collaboratively making a significant advance over the summer of 2025. This project was an important pedagogical exercise in how AI tools can be used for mentoring high school students, allowing them to focus on their specific interests while the daily stand-ups focused on problem definition and conceptual understanding. Despite their limited technical qualifications, the students were able to leverage AI tools to build meaningful models with real-world application.

[567]  arXiv:2605.05145 [pdf, ps, other]
Title: Toward a Risk Assessment Framework for Institutional DeFi: A Nine-Dimension Approach
Authors: Eva Oberholzer, Valeriy Zamaraiev (ZWING Intelligence AG, Zug, Switzerland)
Comments: 20 pages, 2 tables. Framework paper on institutional DeFi risk assessment introducing composability risk, comprehension debt, and temporal risk dynamics
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Software Engineering (cs.SE)

Decentralized finance (DeFi) protocols now intermediate over USD 100 billion in value, including regulated stablecoins and tokenized assets deployed as collateral, yet no widely adopted framework operationalizes risk assessment at the rigor institutional adoption demands. Existing approaches emphasize protocol-specific parameter optimization or conceptual taxonomies without providing explainable, composability-aware, and structurally independent assessment methodologies.
We propose a nine-dimension DeFi risk assessment framework extending the six-dimension taxonomy introduced by Moody's Analytics and Gauntlet with three novel dimensions: composability risk, comprehension debt, and temporal risk dynamics. We additionally introduce a transparency confidence modifier separating assessment reliability from risk severity.
The framework is grounded in structural analysis of protocol dependencies conducted through an ontology-based protocol intelligence infrastructure covering more than 8,000 DeFi protocols. We retrospectively analyze 12 major DeFi-related incidents from 2024-2026 representing approximately USD 2.5 billion in direct losses. Five of the 12 incidents require at least one novel dimension for complete root-cause characterization, including the two highest-systemic-impact events in the dataset.

[568]  arXiv:2605.05148 [pdf, ps, other]
Title: What Matters in Practical Learned Image Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec is yet to be proposed.
In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec, jointly optimized for perceptual quality and runtime -- including within the ablations several novel techniques. We then perform performance-aware neural architecture search over millions of backbone configurations to identify models that achieve the target on-device runtime while maximizing compression performance as captured by perceptual metrics.
We combine the various optimizations to construct a new codec that achieves a significantly improved tradeoff between speed and perceptual quality. Based on rigorous subjective user studies, it provides 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images as fast as 230ms, and decodes them in 150ms -- faster than most top ML-based codecs run on a V100 GPU.

[569]  arXiv:2605.05151 [pdf, ps, other]
Title: Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting
Comments: 13 pages, 5 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the internal representations of PatchTST. We first establish that a single-layer, narrow-dimensional transformer matches the forecasting performance of deeper configurations across commonly used benchmarks. We then train SAEs on the post-GELU intermediate FFN activations with dictionary sizes ranging from 0.5x to 4.0x the native dimensionality. Expanding the dictionary yields negligible downstream performance change (average 0.214%), with large portions of overcomplete dictionaries remaining inactive. Targeted causal interventions on dominant latent features produce minimal forecast perturbation. Across all evaluated settings, we observe no empirical evidence that the analyzed FFN representations rely on strong superposition. Instead, the representations remain sparse, stable under aggressive dictionary expansion, and largely insensitive to latent interventions. These results demonstrate that superposition is not necessary for competitive performance on standard forecasting benchmarks, suggesting that these benchmarks may not demand the rich compositional representations that drive transformer success in language modeling, and helping explain the persistent competitiveness of simple linear models.

[570]  arXiv:2605.05152 [pdf, ps, other]
Title: Age of Gossip in Ring Networks With Non-Poisson Updates
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI); Signal Processing (eess.SP)

We consider a network consisting of $n$ nodes connected in a ring formation and a source that generates updates according to a renewal process and disseminates them to the ring network according to a Poisson process. The nodes in the network gossip with each other according to a push-based gossiping protocol, and disseminate version updates. Gossip between two neighbors happens at the arrivals of renewal processes with finite mean and variance. All renewal processes and Poisson processes in the network are independent but not identically distributed. We consider both uni-directional ring networks and bi-directional ring networks. We use version age of information to quantify the freshness of information at each node. Prior work has used the stochastic hybrid systems (SHS) approach or a first passage percolation (FPP) approach to analyze ring networks with edges following identical Poisson processes. In this work, we use a sample-path backtracking approach to characterize the probabilistic scaling of the version age of information of an arbitrary node in the gossip network, where each edge follows an independent but not identically distributed renewal process. We show that the version age of information of any node in the network is stochastically equivalent to $\sqrt{n}$ at any time instant after the node has received its first update from the source.

[571]  arXiv:2605.05155 [pdf, ps, other]
Title: Aes3D: Aesthetic Assessment in 3D Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

As 3D Gaussian Splatting (3DGS) gains attention in immersive media and digital content creation, assessing the aesthetics of 3D scenes becomes important in helping creators build more visually compelling 3D content. However, existing evaluation methods for 3D scenes primarily emphasize reconstruction fidelity and perceptual realism, largely overlooking higher-level aesthetic attributes such as composition, harmony, and visual appeal. This limitation comes from two key challenges: (1) the absence of general 3DGS datasets with aesthetic annotations, and (2) the intrinsic nature of 3DGS as a low-level primitive representation, which makes it difficult to capture high-level aesthetic features. To address these challenges, we propose Aes3D, the first systematic framework for assessing the aesthetics of 3D neural rendering scenes. Aes3D includes Aesthetic3D, the first dataset dedicated to 3D scene aesthetic assessment, built on our proposed annotation strategy for 3D scene aesthetics. In addition, we present Aes3DGSNet, a lightweight model that directly predicts scene-level aesthetic scores from 3DGS representations. Notably, our model operates solely on 3D Gaussian primitives, eliminating the need for rendering multi-view images and thus reducing computational cost and hardware requirements. Through aesthetics-supervised learning on multi-view 3DGS scene representations, Aes3DGSNet effectively captures high-level aesthetic cues and accurately regresses aesthetic scores. Experimental results demonstrate that our approach achieves strong performance while maintaining a lightweight design, establishing a new benchmark for 3D scene aesthetic assessment. Code and datasets will be made available in a future version.

[572]  arXiv:2605.05159 [pdf, ps, other]
Title: PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language model (LLM). We employ three synthetic data strategies (direct generation, paraphrasing, and contrastive pair creation) using GPT-4o-mini, with a multi-stage quality filtering pipeline including embedding-based deduplication. We find that per-language threshold tuning on the development set yields 2 to 4% F1 improvements without retraining. We also use weighted ensembles of 12B and 27B model predictions with per-language strategy selection. Our final system achieves a mean macro-F1 of 0.811 across all 22 languages, ranking 2nd overall among the participating teams, with 1st place finishes in 3 languages and top-3 in 8 languages. We also find that alternative architectures (XLM-RoBERTa, Qwen3) that showed strong development set performance suffered 30 to 50% F1 drops on the test set, highlighting the importance of generalization.
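
As a rough illustration of the per-language threshold tuning mentioned above, the sketch below sweeps a decision threshold over development-set scores and keeps the value with the best macro-F1. The function names, the candidate grid, and the input layout are assumptions made for illustration; the abstract does not describe the authors' actual procedure.

```python
def macro_f1(y_true, y_pred):
    """Macro-F1 for binary labels: mean of the F1 of class 1 and class 0."""
    scores = []
    for label in (1, 0):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        if tp == 0:
            scores.append(0.0)
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


def tune_threshold(y_true, scores, candidates=None):
    """Return (threshold, macro_f1) maximizing macro-F1 on a development set."""
    if candidates is None:
        # Hypothetical grid: 0.05, 0.06, ..., 0.95.
        candidates = [i / 100 for i in range(5, 96)]
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        f1 = macro_f1(y_true, preds)
        if f1 > best_f1:  # strict comparison keeps the lowest best threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1


def tune_per_language(dev_sets):
    """dev_sets maps a language code to (y_true, scores); returns per-language thresholds."""
    return {lang: tune_threshold(y, s)[0] for lang, (y, s) in dev_sets.items()}


# Toy development data (made up): the default 0.5 cut-off misses one positive.
dev_labels = [0, 0, 1, 1]
dev_scores = [0.10, 0.30, 0.35, 0.80]
best_t, best_f1 = tune_threshold(dev_labels, dev_scores)
```

Because only the cut-off changes, the fine-tuned model is reused as-is, which is consistent with the "without retraining" claim.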

[573]  arXiv:2605.05160 [pdf, ps, other]
Title: Private Structured-Subset Retrieval
Subjects: Information Theory (cs.IT)

We introduce the \emph{Private Structured-Subset Retrieval (PSSR)} problem, where a user retrieves $D$ messages from a database of $K$ messages replicated across $N$ non-colluding servers, and the demand is restricted to a known structured family of $D$-subsets. This formulation generalizes classical Private Information Retrieval (PIR) and multi-message PIR (MPIR), and captures settings where the demand space is constrained by application-specific structure. Focusing on balanced ${\{0,1\}}$-linear schemes, we derive converse bounds on the maximum retrieval rate and minimum subpacketization level, and develop an optimization-based framework for constructing schemes for general structured demand families. Our results show that, for certain families, the PSSR rate converse bound can exceed the best-known MPIR rate upper bound; when this PSSR bound is achievable, MPIR rate-optimal schemes become suboptimal for those families. By exploiting demand structure, our PSSR schemes achieve higher retrieval rates for many families and never underperform the best-known balanced ${\{0,1\}}$-linear MPIR schemes. Our results also show that demand structure can reduce the required subpacketization even when the optimal rate is unchanged. Our parallel work on contiguous-demand families further illustrates the scope of this framework by yielding rate-optimal schemes with substantially smaller subpacketization and no field-size restrictions, improving upon MPIR-based schemes.

[574]  arXiv:2605.05161 [pdf, ps, other]
Title: Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging
Comments: submitted to MICCAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Zero-shot anomaly localisation via vision-language models (VLMs) offers a compelling approach for rare pathology detection, yet its performance is fundamentally limited by the absence of healthy anatomical context. We reformulate zero-shot localisation as a comparative inference problem in which anomalies are identified through structured comparison against reference distributions of normal anatomy. We introduce WALDO, a training-free framework grounded in optimal transport theory that enables comparative reasoning through: (i) entropy-weighted Sliced Wasserstein distances for anatomically-aware reference selection from DINOv2 patch distributions, (ii) Goldilocks zone sampling exploiting the non-monotonic relationship between reference similarity and localisation accuracy, and (iii) self-consistency aggregation via weighted non-maximum suppression. We theoretically analyse the Goldilocks effect through distributional divergence, and show that references with moderate similarity minimize a bias-variance trade-off in comparative visual reasoning. On the NOVA brain MRI benchmark, WALDO with Qwen2.5-VL-72B achieves $43.5_{\pm1.6}\%$ mAP@30 (95% CI: [40.4, 46.7]), representing a 19% relative improvement over zero-shot baselines. Cross-model evaluation shows consistent gains: GPT-4o achieves $32.0_{\pm6.5}\%$ and Qwen3-VL-32B achieves $32.0_{\pm6.6}\%$ mAP@30. Paired McNemar tests confirm statistical significance ($p<0.01$). Source code is available at https://github.com/bkainz/WALDO_MICCAI26_demo .
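
For readers unfamiliar with the distance in (i), the sketch below computes a plain (unweighted) sliced Wasserstein-1 distance between two equally sized point sets by projecting onto random directions and comparing sorted projections. The entropy weighting and the DINOv2 patch features used by WALDO are not reproduced here; the function name and parameters are hypothetical.

```python
import math
import random


def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-1 distance.

    x and y are lists of equal length containing equal-dimension points.
    This is the standard unweighted variant, not the entropy-weighted one.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equally sized point sets")
    rng = random.Random(seed)
    dim = len(x[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random unit direction.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        direction = [v / norm for v in direction]
        # In 1-D, optimal transport between equal-size samples matches sorted values.
        px = sorted(sum(a * b for a, b in zip(p, direction)) for p in x)
        py = sorted(sum(a * b for a, b in zip(p, direction)) for p in y)
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_projections


# Toy point sets (made up): a unit square and two shifted copies.
square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
near = [[a + 1.0, b + 1.0] for a, b in square]
far = [[a + 5.0, b + 5.0] for a, b in square]
d_same = sliced_wasserstein(square, square)
d_near = sliced_wasserstein(square, near)
d_far = sliced_wasserstein(square, far)
```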

[575]  arXiv:2605.05163 [pdf, ps, other]
Title: PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
Comments: Accepted by ICML 2026. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physical annotations. First, a VLM acts as a "physical architect" to plan a "Hierarchical Physical Blueprint" defining material, functional, and kinematic constraints. Second, a physics-grounded diffusion model realizes this blueprint by synthesizing high-fidelity geometry alongside precise kinematic parameters via a novel KineVoxel Injection (KVI) mechanism. Experiments demonstrate that PhysForge produces functionally plausible, simulation-ready assets, providing a robust data engine for interactive 3D content and embodied agents.

[576]  arXiv:2605.05164 [pdf, ps, other]
Title: Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate analysis of histopathological images is critical for disease diagnosis and treatment planning. Whole-slide images (WSIs), which digitize tissue specimens at gigapixel resolution, are fundamental to this process but require aggregating thousands of patches for slide-level predictions. Multiple Instance Learning (MIL) tackles this challenge with a two-stage paradigm, decoupling tile-level embedding and slide-level prediction. However, most existing methods implicitly embed patch representations in homogeneous Euclidean spaces, overlooking the hierarchical organization and regional heterogeneity of pathological tissues. This limits current models' ability to capture global tissue architecture and fine-grained cellular morphology. To address this limitation, we introduce a hybrid hyperbolic-Euclidean representation that embeds WSI features in dual geometric spaces, enabling complementary modeling of hierarchical tissue structures and local morphological details. Building on this formulation, we develop BatMIL, a WSI classification framework that leverages both geometric spaces. To model long-range dependencies among thousands of patches, we employ a structured state space sequence model (S4) backbone that encodes patch sequences with linear computational complexity. Furthermore, to account for regional heterogeneity, we introduce a chunk-level mixture-of-experts (MoE) module that groups patches into regions and dynamically routes them to specialized subnetworks, improving representational capacity while reducing redundant computation. Extensive experiments on seven WSI datasets spanning six cancer types demonstrate that BatMIL consistently outperforms state-of-the-art MIL approaches in slide-level classification tasks. These results indicate that geometry-aware representation learning offers a promising direction for next-generation computational pathology.

[577]  arXiv:2605.05165 [pdf, ps, other]
Title: Interests Burn-down Diffusion Process for Personalized Collaborative Filtering
Subjects: Information Retrieval (cs.IR)

Generative methods have gained widespread attention in Collaborative Filtering (CF) tasks for their ability to produce high-quality personalized samples aligned with users' interests. Among them, diffusion generative models have attracted increasing attention in the recommendation field. Although pioneering efforts have applied the conventional diffusion process to model diffusive user interests, the incongruity between Gaussian noise and the subtle nature of users' personalized interaction behavior has led to sub-optimal results. To this end, we introduce a specifically tailored diffusion scheme for interaction systems, namely the interests burn-down process. The interests burn-down process delineates the decay of user interests towards candidate items, complemented by its reverse burn-up process that yields personalized recommendations for users. The inherent burn-down nature of this process adeptly models the diffusive user interests, aligning seamlessly with the requirements of CF tasks. We present a novel recommendation method, StageCF, to illustrate the superiority of this newly proposed diffusion process. Experimental results demonstrate the effectiveness of StageCF against existing generative and diffusion-based baseline methods. Furthermore, comprehensive studies validate the functionality of the interests burn-down process, shedding light on its capacity to generate personalized interactions.

[578]  arXiv:2605.05166 [pdf, ps, other]
Title: The First Token Knows: Single-Decode Confidence for Hallucination Detection
Authors: Mina Gabriel
Comments: 6 pages, 1 figure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering. Across three 7-8B instruction-tuned models and two benchmarks, phi_first achieves a mean AUROC of 0.820, compared with 0.793 for semantic agreement and 0.791 for standard surface-form self-consistency. A subsumption test shows that phi_first is moderately to strongly correlated with semantic agreement, and combining the two signals yields only a small AUROC improvement over phi_first alone. These results suggest that much of the uncertainty information captured by multi-sample agreement is already available in the model's initial token distribution. We argue that phi_first should be reported as a default low-cost baseline before invoking sampling-based uncertainty estimation.
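
A minimal sketch of a first-token confidence score of the kind described above, assuming it is defined as one minus the normalized entropy of a softmax over the top-K logits. The exact definition of phi_first, the value of K, and how the first content-bearing token is located are not given in the abstract, so all of those are assumptions here.

```python
import math


def first_token_confidence(logits, k=10):
    """Return 1 - H/log(K) for the softmax over the top-k logits.

    1.0 means all mass on one token; 0.0 means a uniform top-k distribution.
    The "1 - normalized entropy" form is an assumption, not the paper's stated formula.
    """
    top = sorted(logits, reverse=True)[:k]
    peak = top[0]
    exps = [math.exp(v - peak) for v in top]  # subtract the max for numerical stability
    z = sum(exps)
    probs = [e / z for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(probs))
    if max_entropy == 0.0:  # a single candidate carries no uncertainty
        return 1.0
    return 1.0 - entropy / max_entropy


# Made-up logits for illustration.
confident = first_token_confidence([20.0, 0.0, 0.0, 0.0], k=4)
uncertain = first_token_confidence([1.0, 1.0, 1.0, 1.0], k=4)
```

A score like this needs only the logits from one greedy decode, which is where the cost advantage over sampling-based agreement comes from.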

[579]  arXiv:2605.05168 [pdf, ps, other]
Title: Deterministic identification for Bernoulli channels and related channels with continuous input
Comments: 13 pages, 2 figures
Subjects: Information Theory (cs.IT)

For memoryless channels with continuous input alphabets, deterministic identification (DI) typically exhibits a linearithmic ($n\log n$) message growth. However, the exact DI capacity has long remained open due to a persistent gap between the best known achievability and converse bounds. This gap was recently closed for AWGN channels via a novel code construction optimising the "galaxy" codes. Here, we extend this approach to the Bernoulli channel and subsequently to any channel $W$ whose image contains a continuous curve of output probability distributions, and hence admits a reduction to the Bernoulli channel restricted to a subinterval of inputs. As a consequence, we prove that the converse bound is tight and establish $\dot{C}_{\text{DI}}(W) = \frac 12$ for this broad class of channels, thereby closing the long-standing capacity gap. A similar gap was also observed for the DI rate-reliability tradeoff. We analyse the tradeoff between rate and error of the proposed code and derive improved lower bounds on the reliability function, approaching the converse at leading order in the regime of small error exponents.

[580]  arXiv:2605.05169 [pdf, ps, other]
Title: Private Contiguous-Block Retrieval
Subjects: Information Theory (cs.IT)

We introduce the \emph{Private Contiguous-Block Retrieval (PCBR)} problem, where a user retrieves a block of $D$ messages with contiguous indices from $K$ replicated messages stored across $N$ non-colluding servers, while hiding the identity of the requested block from each server. This problem is motivated by storage and streaming systems where files are split into ordered segments. Unlike multi-message Private Information Retrieval (MPIR), where any $D$-subset may be requested, PCBR restricts the demand family to contiguous blocks. This relaxation raises a natural question: Can this structure be exploited to improve retrieval efficiency? We answer this question for balanced $\{0,1\}$-linear schemes. We establish an upper bound on the achievable retrieval rate for all problem parameters, derive a lower bound on the subpacketization level required by any scheme achieving the rate upper bound, and construct a rate-optimal scheme whose subpacketization level matches the lower bound for a broad range of problem parameters. Although the optimal PCBR rate coincides with the best-known MPIR rate converse bound, existing MPIR schemes can be suboptimal for PCBR and can require a much larger subpacketization level. In contrast, our scheme exploits the contiguous-block structure to achieve the optimal rate with reduced subpacketization.

[581]  arXiv:2605.05170 [pdf, ps, other]
Title: Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)

Driven by a rapid co-evolution of both harness and underlying models, LLM agents are improving at a dizzying pace. In our prior work (performed in Dec. 2025), we introduced "Design Conductor" (or just "Conductor"), a system capable of building a 5-stage Linux-capable RISC-V CPU in 12 hours. In this work, we introduce an updated multi-agent harness powered by frontier models released in April 2026, which is able to handle 80x larger tasks, at higher quality, fully autonomously. Following a brief introduction, we examine 4 designs that the system produced autonomously, including "VerTQ", an LLM inference accelerator which hard-wires support for TurboQuant in a 240-cycle pipeline, starting from the TurboQuant arXiv paper. VerTQ includes heavy compute processing, with 5129 FP16/32 units; the design was mapped to an FPGA at 125 MHz and consumes 5.7 mm^2 in TSMC 16FF (8 attention pipes). We review the key new characteristics that enabled these results. Finally, we analyze Design Conductor's token usage and other empirical characteristics, including its limitations.

[582]  arXiv:2605.05172 [pdf, ps, other]
Title: When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to replace previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL, Q-Estimation and Q-Gating from BC for Reinforcement Learning, an algorithm for efficient offline-to-online learning. Our method consists of two parts: (1) Q-Estimation extracts a Q-function from a BC policy using a few interaction steps with the environment, followed by online RL with (2) Q-Gating, which switches between BC and RL policy actions based on their respective Q-values to collect samples for RL policy training. Across manipulation tasks from D4RL and robomimic benchmarks, Q2RL outperforms SOTA offline-to-online learning baselines on success rate and time to convergence. Q2RL is efficient enough to be applied in an on-robot RL setting, learning robust policies for contact-rich and high precision manipulation tasks such as pipe assembly and kitting, in 1-2 hours of online interaction, achieving success rates of up to 100% and up to 3.75x improvement against the original BC policy. Code and video are available at https://pages.rai-inst.com/q2rl_website/
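
The gating step in (2) can be pictured with the sketch below, which queries both policies and executes whichever action has the higher estimated Q-value. The tie-breaking rule, the use of a separate Q-function per policy, and every name here are assumptions for illustration; the abstract only says that Q-Gating switches between BC and RL actions based on their respective Q-values.

```python
def q_gate(state, bc_policy, rl_policy, q_bc, q_rl):
    """Pick between the BC and RL action by comparing their Q-value estimates.

    Returns (action, source). Ties go to the BC action (an assumption),
    so the RL policy must look strictly better to take over.
    """
    a_bc = bc_policy(state)
    a_rl = rl_policy(state)
    if q_rl(state, a_rl) > q_bc(state, a_bc):
        return a_rl, "rl"
    return a_bc, "bc"


# Toy 1-D problem (made up): the ideal action is -state.
def toy_q(state, action):
    return -((action + state) ** 2)


def bc(state):
    return -0.5 * state


def good_rl(state):
    return -0.9 * state


def bad_rl(state):
    return 2.0 * state


action_a, source_a = q_gate(1.0, bc, good_rl, toy_q, toy_q)
action_b, source_b = q_gate(1.0, bc, bad_rl, toy_q, toy_q)
```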

[583]  arXiv:2605.05176 [pdf, ps, other]
Title: Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.

[584]  arXiv:2605.05177 [pdf, ps, other]
Title: Explicit Two-Sided Eigenvalue Bounds for Schrödinger Operators with Singular Potentials via Finite Element Method
Authors: Xuefeng Liu
Comments: 44 pages, 4 figures. The source includes the appendices and uses the standard LaTeX article class for arXiv compatibility
Subjects: Numerical Analysis (math.NA)

We present, to the best of our knowledge, the first numerical algorithm for explicit, computable two-sided eigenvalue bounds for Schrödinger operators H = -Delta + V on R^N, N = 2,3, in the presence of both an unbounded potential and an unbounded domain. "Explicit" here means that all constants and ingredients are derived in closed form from the mesh, the potential, and a small set of explicit inequalities (Payne-Weinberger, Hardy, and explicit bounded-domain Sobolev embeddings); the conversion to fully verified (IEEE-754-safe, interval-arithmetic) enclosures is a separate verification step and is left for future work. In particular, singular attractive potentials of Coulomb type, V(x) = -Z/|x|, which model the hydrogen atom and the H_2^+ molecular ion, are covered by the theory. The method combines domain truncation to a bounded domain D(R) containing {|x| <= R} with an extension of Liu's Composite Enriched Crouzeix-Raviart (CECR) finite element method to sign-indefinite potentials. Upper bounds come from the standard conforming Galerkin method; lower bounds come from the CECR construction, whose gap to the exact eigenvalue closes as the mesh is refined. Numerical experiments on the 2D single- and two-centred Coulomb potentials and on the 3D hydrogen atom and H_2^+ molecular ion illustrate the algorithm and confirm the predicted convergence.

[585]  arXiv:2605.05179 [pdf, ps, other]
Title: Estimating the expected output of wide random MLPs more efficiently than sampling
Comments: 68 pages. Code is available at this https URL
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)

By far the most common way to estimate an expected loss in machine learning is to draw samples, compute the loss on each one, and take the empirical average. However, sampling is not necessarily optimal. Given an MLP at initialization, we show how to estimate its expected output over Gaussian inputs without running samples through the network at all. Instead, we produce approximate representations of the distributions of activations at each layer, leveraging tools such as cumulants and Hermite expansions. We show both theoretically and empirically that for sufficiently wide networks, our estimator achieves a target mean squared error using substantially fewer FLOPs than Monte Carlo sampling. We find moreover that our methods perform particularly well at estimating the probabilities of rare events, and additionally demonstrate how they can be used for model training. Together, these findings suggest a path to producing models with a greatly reduced probability of catastrophic tail risks.

[586]  arXiv:2605.05182 [pdf, ps, other]
Title: A Closed-Form Dual-Barrier CBF Safety Filter for Holonomic Robots on Incrementally Built Occupancy Grid Maps
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

We present a dual-barrier control barrier function (CBF) safety filter for real-time, safety-critical velocity control of holonomic robots operating in incrementally built occupancy grid maps. As a robot explores an unknown environment, unmapped regions introduce irreducible uncertainty, since obstacle geometry beyond the explored frontier is unknown, making entry into such regions a source of collision risk, especially with front-facing sensors. To address this, we enforce two constraints: avoidance of mapped obstacles and restriction from unexplored regions. Both constraints are derived analytically from the occupancy grid's signed distance field, yielding a closed-form safety filter that requires only a small linear system solve per cycle. On resource-constrained platforms such as the Raspberry Pi, where SLAM and planning already consume significant compute, the low overhead of the proposed filter preserves resources. An adaptive gain schedule relaxes the frontier constraint in information-rich regions and tightens it in well-mapped areas, improving exploration efficiency while maintaining safety. The filter operates in velocity space as a minimally invasive correction and composes with arbitrary nominal controllers, including learning-based methods. Hardware flight experiments on a PX4-controlled quadrotor demonstrate zero collisions across multiple indoor runs.
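
To make the idea of a closed-form velocity-space filter concrete, the sketch below handles a single barrier for a single-integrator robot (velocity is the control input). It is the textbook minimum-norm correction for one constraint of the form grad_h . u + alpha * h >= 0, not the paper's dual-barrier filter, which combines two constraints and solves a small linear system. The names, the gain `alpha`, and the toy geometry are assumptions.

```python
def cbf_filter(u_nom, h, grad_h, alpha=1.0):
    """Minimally modify u_nom so that grad_h . u + alpha * h >= 0.

    h is the barrier value (for example a signed distance to the nearest
    obstacle) and grad_h its spatial gradient at the robot position.
    """
    slack = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    if slack >= 0.0:
        return tuple(u_nom)  # nominal command is already safe
    norm_sq = sum(g * g for g in grad_h)
    if norm_sq == 0.0:
        raise ValueError("gradient is zero; the constraint cannot be enforced")
    scale = -slack / norm_sq
    # Project onto the constraint boundary along the gradient direction.
    return tuple(u + scale * g for u, g in zip(u_nom, grad_h))


# Toy case (made up): robot 1 m from an obstacle surface lying in the -x direction.
h_value = 1.0
gradient = (1.0, 0.0)
toward = cbf_filter((-3.0, 0.0), h_value, gradient, alpha=1.0)
sideways = cbf_filter((0.0, 1.0), h_value, gradient, alpha=1.0)
```

Here the command heading straight at the obstacle is slowed to the largest approach speed the constraint allows, while the sideways command passes through unchanged, which is what "minimally invasive" means in this setting.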

[587]  arXiv:2605.05185 [pdf, ps, other]
Title: OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
Comments: Github Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, or detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we design a dedicated pipeline to construct high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. Based on this pipeline, we curate two training datasets, SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. In addition, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with over 10-point average improvements across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.

[588]  arXiv:2605.05187 [pdf, ps, other]
Title: LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of world-model-generated videos across both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with input conditions. Participants are required to build a metric that jointly predicts four dimensions, i.e., Video Quality, Physical Realism, Condition-Video Alignment, and Temporal Consistency. Beyond that, participants also need to localize physical anomaly timestamps for fine-grained diagnosis.
The benchmark dataset contains 1,554 videos generated by seven representative world generative models, organized into three tracks (text-2D, image-to-4D, and video-to-4D) and spanning 26 categories. These categories explicitly cover physics-relevant scenarios, including dynamics, optics, and thermodynamics, together with diverse real-world and creative content. To ensure label reliability, scores and anomaly timestamps are produced through trained human annotation with an additional automated quality-control pass.
Evaluation is based on both score prediction and anomaly localization, with a composite protocol that combines TimeStamp_IOU and SRCC/PLCC. This report summarizes the challenge design and provides method-level insights from submitted solutions.
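The abstract names the metric families but not their exact definitions or how they are combined. As a minimal sketch, the following Python code shows the standard PLCC and SRCC computations together with a plain interval IoU. Treating TimeStamp_IOU as the IoU of a predicted and an annotated (start, end) interval is an assumption, and all function names are illustrative rather than taken from the challenge toolkit.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def interval_iou(pred, truth):
    """IoU of two (start, end) time intervals (assumed TimeStamp_IOU form)."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0
```

SRCC rewards a metric that orders videos the same way as the human scores, while PLCC additionally rewards a linear relationship, which is why the two are commonly reported together.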

[589]  arXiv:2605.05188 [pdf, ps, other]
Title: SILC: Lookahead Caching for Short-form Video Delivery Systems
Subjects: Networking and Internet Architecture (cs.NI)

Short video platforms like TikTok, Instagram Reels, and YouTube Shorts have gained immense popularity in the last few years and are responsible for a large and growing fraction of Internet traffic. We identify two unique opportunities for improving short video delivery using their existing interactions with content delivery networks (CDNs). First, short videos use a push-based recommendation system, where the user is presented with a sequence of videos recommended by the algorithm rather than explicitly picking content to watch (as, e.g., on YouTube). Such push-based short video systems offer a unique opportunity for system design by providing visibility into upcoming requests. Second, the popularity of these videos follows a highly skewed Pareto distribution, leading to geographical and temporal overlap amongst videos being served. We leverage these opportunities to build SILC, a lookahead-aware caching system aimed at (i) reducing CDN cache miss rates and (ii) reducing midgress bandwidth between the CDN and the origin server. Our evaluation of SILC uses traces that we collect from real users through (i) an in-person user study and (ii) a data donation program involving 100 TikTok users across the world. Using a combination of these traces, we simulate traffic from 10,000 simultaneous users. Our evaluation shows that, compared to 10 state-of-the-art heuristic and learning-based cache eviction policies, SILC reduces a CDN's midgress costs by 11.1% to 111%.

[590]  arXiv:2605.05191 [pdf, ps, other]
Title: LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
Comments: 15 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI)

Long-horizon search agents must manage a rapidly growing working context as they reason, call tools, and observe information. Naively accumulating all intermediate content can overwhelm the agent, increasing costs and the risk of errors. We propose that effective context management should be adaptive: parts of the agent's trajectory are maintained at different levels of detail depending on their current relevance to the task. To operationalize this principle, we introduce Context-ReAct, a general agentic paradigm for elastic context orchestration that integrates reasoning, context management, and tool use in a unified loop. Context-ReAct provides five atomic operations: Skip, Compress, Rollback, Snippet and Delete, which allow the agent to dynamically reshape its working context, preserving important evidence, summarizing resolved information, discarding unhelpful branches, and controlling context size. We prove that the Compress operator is expressively complete, while the other specialized operators provide efficiency and fidelity guarantees that reduce generation cost and hallucination risk. Building on this paradigm, we develop LongSeeker, a long-horizon search agent fine-tuned from Qwen3-30B-A3B on 10k synthesized trajectories. Across four representative search benchmarks, LongSeeker achieves 61.5% on BrowseComp and 62.5% on BrowseComp-ZH, substantially outperforming Tongyi DeepResearch (43.2% and 46.7%) and AgentFold (36.2% and 47.3%). These results highlight the potential of adaptive context management, showing that agents can achieve more reliable and efficient long-horizon reasoning by actively shaping their working memory.

[591]  arXiv:2605.05197 [pdf, ps, other]
Title: Implicit Representations of Grammaticality in Language Models
Subjects: Computation and Language (cs.CL)

Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. But do LMs implicitly acquire a grammaticality distinction distinct from string probability? We explore this question by studying the internal representations of LMs, training a linear probe on a dataset of grammatical and (synthetic) ungrammatical sentences obtained by applying perturbations to a naturalistic text corpus. We find that this simple grammaticality probe generalizes to human-curated grammaticality judgment benchmarks and outperforms LM probability-based grammaticality judgments. When applied to semantic plausibility benchmarks, in which both members of a minimal pair are grammatical and differ only in plausibility, however, the probe performs worse than string probability. The English-trained probe also exhibits nontrivial cross-lingual generalization, outperforming string probabilities on grammaticality benchmarks in numerous other languages. Additionally, probe scores correlate only weakly with string probabilities. These results collectively suggest that LMs acquire, to some extent, an implicit grammaticality distinction within their hidden layers.

[592]  arXiv:2605.05202 [pdf, ps, other]
Title: Optimizing Bit-Labeling of Voronoi Constellations
Comments: 6 pages, 7 figures; submitted to IEEE ITW 2026
Subjects: Information Theory (cs.IT)

We define a novel search method and performance metric for optimizing the bit-to-symbol map of the $D_4$ and $E_8$ root lattices with respect to bit error rate (BER). We hold other sources of lattice gain constant by fixing the lattice constellation, and consider basis matrices that permute the integer labelings of the lattice points. After searching the possible basis matrices for $D_4$ and $E_8$, we find 0.1 dB of gain in $D_4$ bit error rate curves and 0.5 dB of gain in $E_8$, compared to the standard bases commonly used in the literature, at a BER of $10^{-4}$.

[593]  arXiv:2605.05204 [pdf, ps, other]
Title: D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The landscape of high-performance image generation models is currently shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for direct continuous supervised fine-tuning. For example, applying commonly used fine-tuning techniques would compromise their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that a modern diffusion model in which an LLM/VLM serves as the encoder can inherit its encoder's in-context capabilities. This enables us to frame training as an on-policy self-distillation process. Specifically, during training, the model acts as both the teacher and the student with different contexts: the student is conditioned only on the text feature, while the teacher is conditioned on the multimodal feature of both the text prompt and the target image. Training minimizes the divergence between the two predicted distributions over the student's own roll-outs. By optimizing on the model's own trajectory and under its own supervision, D-OPSD enables the model to learn new concepts, styles, etc. without sacrificing the original few-step capability.

[594]  arXiv:2605.05206 [pdf, ps, other]
Title: Taming Outlier Tokens in Diffusion Transformers
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carrying limited local information, but their role in generative models remains underexplored. We show that this phenomenon appears in both the encoder and denoiser of modern Representation Autoencoder (RAE)-DiT pipelines: pretrained ViT encoders can produce outlier representations, and DiTs themselves can develop internal outlier tokens, especially in intermediate layers. Moreover, simply masking high-norm tokens does not improve performance, indicating that the problem is not only caused by a few extreme values, but is more closely related to corrupted local patch semantics. To address this issue, we introduce Dual-Stage Registers (DSR), a register-based intervention for both components: trained registers when available, recursive test-time registers otherwise, and diffusion registers for the denoiser. Across ImageNet and large-scale text-to-image generation, these interventions consistently reduce outlier artifacts and improve generation quality. Our results highlight outlier-token control as an important ingredient in building stronger DiTs.

[595]  arXiv:2605.05207 [pdf, ps, other]
Title: Syn4D: A Multiview Synthetic 4D Dataset
Comments: 30 pages, 10 figures, project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dense 3D reconstruction and tracking of dynamic scenes from monocular video remains an important open challenge in computer vision. Progress in this area has been constrained by the scarcity of high-quality datasets with dense, complete, and accurate geometric annotations. To address this limitation, we introduce Syn4D, a multiview synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations. A key feature of Syn4D is the ability to unproject any pixel into 3D to any time and to any camera. We conduct extensive evaluations across multiple downstream tasks to demonstrate the utility and effectiveness of the proposed dataset, including 4D scene reconstruction, 3D point tracking, geometry-aware camera retargeting, and human pose estimation. The experimental results highlight Syn4D's potential to facilitate research in dynamic scene understanding and spatiotemporal modeling.

Cross-lists for Thu, 7 May 26

[596]  arXiv:2301.06217 (cross-list from quant-ph) [pdf, other]
Title: Analogy between Boltzmann machines and Feynman path integrals
Journal-ref: Journal of Chemical Theory and Computation 2023 19 (9), 2446-2454
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We provide a detailed exposition of the connections between Boltzmann machines commonly utilized in machine learning problems and the ideas already well known in quantum statistical mechanics through Feynman's description of the same. We find that this equivalence allows the interpretation that the hidden layers in Boltzmann machines and other neural network formalisms are in fact discrete versions of path elements that are present within the Feynman path-integral formalism. Since Feynman paths are the natural and elegant depiction of interference phenomena germane to quantum mechanics, it appears that in machine learning, the goal is to find an appropriate combination of "paths", along with accumulated path-weights, through a network that cumulatively capture the correct $x \rightarrow y$ map for a given mathematical problem. As a direct consequence of this analysis, we are able to provide general quantum circuit models that are applicable to both Boltzmann machines and to Feynman path integral descriptions. Connections are also made to inverse quantum scattering problems which allow a robust way to define "interpretable" hidden layers.

[597]  arXiv:2508.00049 (cross-list from astro-ph.CO) [pdf, ps, other]
Title: Segmenting proto-halos with vision transformers
Comments: 39 pages, 14 figures, 11 tables; updated to match the published version: JCAP11(2025)083
Journal-ref: JCAP 11 (2025) 083
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV)

The formation of dark-matter halos from small cosmological perturbations generated in the early universe is a highly non-linear process typically modeled through N-body simulations. In this work, we explore the use of deep learning to segment and classify proto-halo regions in the initial density field according to their final halo mass at redshift z=0. We compare two architectures: a fully convolutional neural network (CNN) based on the V-Net design and a U-Net transformer. We find that the transformer-based network significantly outperforms the CNN across all metrics, achieving sub-percent error in the total segmented mass per halo class. Both networks deliver much higher accuracy than the perturbation-theory-based model PINOCCHIO, especially at low halo masses and in the detailed reconstruction of proto-halo boundaries. We also investigate the impact of different input features by training models on the density field, the tidal shear, and their combination. Finally, we use Grad-CAM to generate class-activation heatmaps for the CNN, providing preliminary yet suggestive insights into how the network exploits the input fields.

[598]  arXiv:2605.00249 (cross-list from eess.SP) [pdf, ps, other]
Title: The Resurrection of Spectrum Spreading for 6G and Beyond: From Sinusoids to Chirps
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Orthogonal frequency-division multiplexing (OFDM) and its static sinusoidal subcarriers have underpinned the 4G and 5G eras, delivering high spectral efficiency and resilience to multipath fading through an efficient multicarrier architecture. However, as future systems move toward doubly dispersive environments driven by high-mobility applications and migration to mmWave/sub-THz bands, the time-invariance assumption underlying OFDM becomes increasingly difficult to maintain, and Doppler-induced degradation becomes prominent. While enhancements such as MIMO, advanced coding, and scheduling provide incremental remedies, they introduce additional overhead, because the sinusoidal subcarrier itself offers no inherent waveform-level robustness to Doppler impairments. Accordingly, two time-frequency spreading philosophies have emerged to improve Doppler resilience by distributing each symbol's energy across both dimensions of the time-frequency plane: (i) 2D isotropic spreading via the delay-Doppler (DD) domain, exemplified by the orthogonal time frequency space (OTFS) family, and (ii) sheared spreading via parameterizable chirps, exemplified by the affine frequency-division multiplexing (AFDM) family. In this article, we examine key considerations for future waveform design across these paradigms and argue that transitioning from the sinusoidal subcarriers of OFDM to the chirp-based subcarriers offers a viable design direction for improving Doppler robustness while retaining much of the mature OFDM infrastructure. This perspective also highlights the suitability of chirp-based waveforms for integrated sensing and communications (ISAC) and their extensibility to emerging physical-layer techniques. Overall, we argue that the transition from sinusoids to chirps is a technically motivated, compelling evolutionary direction for future wireless physical layer design.

[599]  arXiv:2605.02498 (cross-list from quant-ph) [pdf, ps, other]
Title: Permutation Routing on Ramanujan Hypergraphs with Applications to Neutral Atom Quantum Architectures
Comments: 23 pages, 1 figure, 20 tables
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph)

We consider the routing of neutral atoms on a reconfigurable lattice in terms of hypergraph transformations. We prove the routing number of a Ramanujan $(d,r)$-regular hypergraph on $N$ vertices satisfies $\mathrm{rt}(H) = \Theta(\log N)$, where routing is via matchings in the clique expansion graph $G_{\mathrm{cl}}(H)$. Hypergraphs reframe the qubit routing problem by replacing Nenadov's two-sided spectral gap hypothesis with a one-sided condition based on eigenvalue centering. Song--Fan--Miao (SFM) coverings scale for Ramanujan families of every uniformity. A virtual overlay theorem establishes a capacity--depth tradeoff for 3D acousto-optic lens (AOL) architectures, with multi-layer stacking achieving $\Theta(\log N)$ routing with $L = O(\log N)$ independent overlay layers. An abelian Alon--Boppana barrier shows that fixed-degree Cayley graphs on $\mathbb{Z}_n^2$ cannot be Ramanujan and affine derandomization on such graphs achieves 15--30% congestion reduction. Towers of $k$-fold Ramanujan coverings yield $\mathrm{rt}(H_L) = O(\log N)$ by recursive routing lift. Entanglement-assisted routing by pre-distributed Bell pairs achieves $O(\log N)$ teleportation depth with a stable crossover at $\sim\!4$ routing rounds. Displacement energy analyzes greedy adaptive routing, identifying stalling and a hybrid greedy--Valiant protocol achieving $\sim\!3\times$ speedup at practical scales. Hierarchical multi-scale routing achieves $O(\log^2 N / \log b)$ depth with boundary-only transfers at capacity $k = O(\sqrt{N} \log N)$, and $O(\log N)$ depth with optimal block size $b = \Theta(\sqrt{n})$.

[600]  arXiv:2605.04051 (cross-list from stat.ML) [pdf, ps, other]
Title: A Consistency-Centric Approach to Set-Based Optimization with Multiple Models of Unranked Fidelity
Comments: 26 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

In complex real-world settings, optimization is challenged by the presence of diverse models of differing fidelity. In many optimization problems, a single model is treated as the most accurate representation of the underlying system, while other models are evaluated primarily by their agreement with this presumed most accurate model. Yet in real-world applications, model accuracy is rarely known a priori and assuming a single most accurate model can be misleading. This paper addresses this gap by proposing a flexible set-based optimization methodology called Set-Based Optimization with Multiple Models (S-BOMM) that works with multiple models without the assumption of a most accurate high-fidelity model. Unlike traditional optimization approaches that focus on finding an optimal solution according to the high-fidelity model, our methodology utilizes consistency between models to identify good solutions across multiple models. A probabilistic analysis of the consistency method is provided that bounds the likelihood of the methodology producing correct or incorrect results. Empirical results demonstrate the effectiveness of S-BOMM on test problems. By focusing on the consistency across models rather than relying on a single best solution, this set-based approach offers a practical alternative to optimization problems where multiple models must be considered without assuming a single most accurate high-fidelity model.

[601]  arXiv:2605.04087 (cross-list from math.OC) [pdf, ps, other]
Title: BOOOM: Loss-Function-Agnostic Black-Box Optimization over Orthonormal Manifolds for Machine Learning and Statistical Inference
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)

Optimization over the Stiefel manifold $\mathrm{St}(p,d)$, the set of $p \times d$ column-orthonormal matrices, is fundamental in statistics, machine learning, and scientific computing, yet remains challenging in the presence of non-convex, non-smooth, or black-box objectives. Existing methods largely rely on either convex relaxations or gradient-based Riemannian optimization, limiting applicability in derivative-free and highly multimodal settings. We propose BOOOM (Black-box Optimization Over Orthonormal Manifolds), a general-purpose framework for loss-function-agnostic optimization on $\mathrm{St}(p,d)$. The key idea is a global Givens rotation-based parametrization that maps the manifold to an unconstrained Euclidean angle space while preserving feasibility exactly. Building on this representation, BOOOM employs a structured, parallelizable, derivative-free search based on Recursive Modified Pattern Search, enabling systematic exploration through plane-wise rotations without requiring gradient information and facilitating escape from poor local optima. We establish a unified theoretical framework showing equivalence between angle-space and manifold optimization, transfer of stationarity, and global convergence in probability under mild conditions. Empirical results across diverse problems, including heterogeneous quadratic optimization, low-rank and sparse matrix decomposition, independent component analysis, and orthogonal joint diagonalization, among other widely studied settings, demonstrate strong performance relative to state-of-the-art methods, particularly in non-smooth and highly multimodal regimes. We further illustrate its practical utility through a novel supervised PCA formulation applied to metabolomics data in colorectal cancer.
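To illustrate why a Givens-rotation parametrization preserves feasibility exactly, here is a minimal Python sketch. It is a generic construction, not the parametrization defined in the paper: the rotation ordering, the starting point, and the names `givens_point` and `gram` are assumptions made for illustration. Each plane rotation is orthogonal, so applying any sequence of them to the first $d$ columns of the $p \times p$ identity always yields a $p \times d$ matrix with orthonormal columns, whatever the angles are.

```python
from math import cos, sin

def givens_point(angles, p, d):
    """Map unconstrained angles to a p x d matrix with orthonormal columns.

    Starts from the first d columns of the p x p identity and applies one
    plane rotation per index pair (i, j) with i < d and i < j < p.
    """
    pairs = [(i, j) for i in range(d) for j in range(i + 1, p)]
    assert len(angles) == len(pairs), "one angle per rotation plane"
    X = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(p)]
    for (i, j), theta in zip(pairs, angles):
        c, s = cos(theta), sin(theta)
        for col in range(d):
            xi, xj = X[i][col], X[j][col]
            # Rotate rows i and j; this leaves X^T X unchanged.
            X[i][col] = c * xi - s * xj
            X[j][col] = s * xi + c * xj
    return X

def gram(X):
    """Return X^T X, which equals the d x d identity for a feasible point."""
    p, d = len(X), len(X[0])
    return [[sum(X[r][a] * X[r][b] for r in range(p)) for b in range(d)]
            for a in range(d)]
```

With this choice of index pairs the number of angles is $pd - d(d+1)/2$, which matches the dimension of $\mathrm{St}(p,d)$. A derivative-free search can then move freely in angle space, for example by perturbing one angle at a time, without ever needing a projection or retraction step.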

[602]  arXiv:2605.04097 (cross-list from q-bio.NC) [pdf, ps, other]
Title: CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness
Comments: 9 pages
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

Despite remarkable advances, today's AI systems remain narrow in scope, falling short of the flexible, adaptive, and multisensory intelligence that characterizes human capabilities. This gap has fueled longstanding debates about whether AI might one day achieve human-like generality or even consciousness, and whether theories of consciousness can inspire new architectures for AI. This paper presents an early blueprint for implementing a general AI system, CTM-AI, combining the Conscious Turing Machine (CTM), a formal machine model of consciousness, with today's foundation models. CTM-AI contains an enormous number of powerful processors ranging from specialized experts (e.g., vision-language models and APIs) to unspecialized general-purpose learners poised to develop their own expertise. Crucially, for whatever problem must be dealt with, information from many processors is selected, integrated, and exchanged appropriately to solve the task. CTM-AI achieves state-of-the-art accuracy on MUStARD (72.28) and UR-FUNNY (72.13), outperforming multimodal and multi-agent frameworks. On tool-using and agentic tasks, CTM-AI achieves 10+ points of improvement on StableToolBench and WebArena-Lite. Overall, CTM-AI offers a principled, testable blueprint for general AI inspired by a model of consciousness.

[603]  arXiv:2605.04102 (cross-list from cond-mat.mtrl-sci) [pdf, ps, other]
Title: Meta-LegNet: A Transferable and Interpretable Framework for Surface Adsorption Prediction via Self-Defined Adsorption-Environment Learning
Subjects: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

A central challenge in computational catalysis is the identification of low-energy and chemically plausible adsorption configurations, as these directly affect adsorption energies, reaction pathways, and catalytic performance. Existing approaches generally rely on enumerating candidate adsorption sites followed by iterative refinement through density functional theory calculations or machine-learning-based relaxations. However, such workflows remain computationally expensive and are difficult to scale to complex surfaces or multi-adsorbate systems. Here, we introduce Meta-LegNet, a graph learning framework that combines SE(3)-equivariant atom-level message passing with voxel-based multiscale aggregation and cross-domain meta-learning to learn transferable representations of local adsorption environments across diverse catalyst--adsorbate systems. Rather than following a conventional regression-only paradigm, Meta-LegNet encodes local chemical environments using invariant radial features and equivariant directional information, and further incorporates broader structural context through coordinate-frame voxel pooling, assignment-based upsampling, and gated feature fusion. The resulting local-global decomposition produces atom-resolved attribution maps, which are processed to identify adsorption-relevant local environments in an interpretable manner. Based on the learned representations, we further construct an adsorption-environment database and develop a template-matching strategy to propose likely adsorption sites on previously unexplored surfaces without exhaustive site enumeration. Overall, our results suggest that learning transferable adsorption environments provides an accurate, interpretable, and practical route for accelerating catalyst screening.

[604]  arXiv:2605.04118 (cross-list from q-bio.QM) [pdf, ps, other]
Title: ProtDBench: A Unified Benchmark of Protein Binder Design and Evaluation
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI)

Recent advances in de novo protein binder design have enabled increasing experimental validation, yet reported in silico metrics remain difficult to interpret or compare across studies due to non-standardized evaluation protocols. We introduce ProtDBench, a standardized and throughput-aware evaluation framework for protein binder design. ProtDBench defines unified benchmark tasks, evaluation protocols, and success criteria, enabling systematic analysis of how evaluation design influences observed performance. Using a large wet-lab annotated dataset, we analyze commonly used structure prediction models as evaluation verifiers, revealing substantial verifier-dependent bias and limited agreement under identical filtering protocols. We then benchmark representative open-source generative binder design methods across ten diverse protein targets under a fixed evaluation protocol. Beyond per-sequence success rates, ProtDBench incorporates throughput-aware metrics based on a fixed 24-hour budget, as well as cluster-level success criteria to account for structural diversity. Together, these results expose how filtering rules, success definitions, and throughput-aware evaluation induce systematic trade-offs among computational efficiency, success rate, and structural diversity. Overall, ProtDBench provides a fair and reproducible evaluation pipeline that supports systematic and controlled comparison of protein binder design methods under realistic evaluation settings.

[605]  arXiv:2605.04119 (cross-list from q-bio.QM) [pdf, ps, other]
Title: Tree-Conditioned Edit Flows for Ancestral Sequence Reconstruction
Comments: 9 pages of main text, 3 figures, 3 tables, and 1 algorithm. This version is a preliminary preprint
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Populations and Evolution (q-bio.PE)

Ancestral sequence reconstruction (ASR) aims to infer extinct protein sequences at internal nodes of a phylogenetic tree. Classical ASR methods are typically based on continuous-time Markov substitution models, but they treat sites largely independently and handle insertions and deletions only weakly or not at all. We introduce a tree-conditioned edit-flow model for variable-length ASR. Given two descendant sequences and their branch distances to a shared ancestor, the model reconstructs the ancestor through paired bidirectional edit trajectories constrained to agree on a common ancestral state. On a benchmark of experimentally evolved sequences with only context-independent substitutions, the model does not match the accuracy of the best classical method, yet still achieves reasonable performance despite being trained on natural sequences that include insertions, deletions, and substitutions. On a benchmark of natural homologous sequences with abundant insertions and deletions, the model most accurately localizes inferred evolutionary change.

[606]  arXiv:2605.04131 (cross-list from physics.ed-ph) [pdf, ps, other]
Title: A Dialogue-Based Framework for Correcting Multimodal Errors in AI-Assisted STEM Education
Subjects: Physics Education (physics.ed-ph); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Large Language Models (LLMs) are democratizing access to personalized tutoring; however, their effectiveness is hindered by challenges in processing multimodal content, which limits AI's potential to provide equitable, high-quality STEM support. This study evaluates LLM performance on multimodal physics problems, identifies specific failure modes through an empirical error taxonomy, and tests practical interventions designed to overcome multimodal processing limitations. We assessed three publicly available LLMs (Claude, Gemini, and ChatGPT) on multimodal physics problems from the OpenStax database and compared the results with text-only performance. An empirically derived error taxonomy was developed through pilot testing, followed by evaluation of a structured multimodal dialogue intervention. All three models achieved near-ceiling accuracy (96%) on text-only physics problems. Performance declined substantially on multimodal problems, consistent with what we term the Multimodal Interference Effect. Error analysis identified four failure modes: visual processing errors, context misinterpretation, mathematical computational errors, and hybrid errors, with visual processing errors being the most prevalent. The structured dialogue intervention corrected 82% of errors overall; visual processing errors were corrected at 100% across all models. Educators and students can implement these interventions immediately, requiring no model retraining, to improve AI tutoring reliability on image-rich STEM content, advancing equitable access to high-quality learning support.

[607]  arXiv:2605.04168 (cross-list from math.PR) [pdf, ps, other]
Title: Error analysis for learning fractional stochastic differential equations with applications in neural approximations
Subjects: Probability (math.PR); Numerical Analysis (math.NA)

This paper develops a framework for the error analysis in nonparametric model fitting of fractional stochastic differential equations based on discrete observations. We identify and quantify the main error sources -- time discretization, coefficient approximation, and model fitting error -- within a unified framework. Through Sobolev-type norms, we derive convergence rates that incorporate the regularity of trajectories, thereby capturing the interaction of these error components. To demonstrate the applicability of the theory, we introduce a training scheme for coefficient function estimation based on shallow neural networks and a recurrent architecture. Numerical experiments validate the theoretical findings and illustrate the effectiveness of the approach.

[608]  arXiv:2605.04191 (cross-list from stat.ML) [pdf, ps, other]
Title: Heterogeneous Ordinal Structure Learning with Bayesian Nonparametric Complexity Discovery
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)

Public attitudes toward artificial intelligence are heterogeneous, ordinally measured, and poorly captured by any single dependency graph. Existing ordinal structure learners assume a shared directed acyclic graph (DAG) across all respondents; recent heterogeneous ordinal graphical-model approaches focus on subgroup discovery rather than confirmatory cluster-specific DAG estimation; and latent profile analyses discard dependency structure entirely. We introduce a heterogeneous ordinal structure-learning framework combining monotone Gaussian score embedding, Bayesian nonparametric (BNP) complexity discovery via a truncated stick-breaking prior, and confirmatory fixed-K estimation with cluster-specific sparse DAG learning. The key methodological insight is a discovery-to-confirmation workflow: the nonparametric stage calibrates plausible archetype complexity, while inner-validated confirmatory refitting yields stable, interpretable structural estimates. On the 2024 Pew American Trends Panel AI attitudes survey, Wave 152 (W152; N = 4,788, 8 ordinal items), the confirmatory K*=5 model reduces holdout transformed-score mean squared error (MSE) by 25.8% over a single-graph baseline and by 4.6% over mixture-only clustering. A controlled tiered semi-synthetic benchmark calibrated to W152 structure validates recovery across difficulty regimes and transparently reveals failure modes under stress conditions.

[609]  arXiv:2605.04197 (cross-list from math.DS) [pdf, ps, other]
Title: Calculating Domain of Attraction Boundary of Power Systems Based on the Gentlest Ascent Dynamics
Subjects: Dynamical Systems (math.DS); Numerical Analysis (math.NA)

The power system, a fundamental public utility, is increasingly important due to growing global electricity demand. Recent large-scale blackouts (e.g., Iberian Peninsula, UK) have raised concerns about transient stability under impact faults. Transient stability is determined by post-disturbance synchronizing capability of synchronous generators, formulated as identifying the domain of attraction (DOA) boundary of the asymptotically stable equilibrium. Using a benchmark model of synchronous-generator-dominated power systems, this report employs a gentlest ascent dynamics (GAD) method for 1-saddle points, an adjoint operator method for periodic orbits, and stable manifold algorithms to compute the DOA boundary. These algorithms transform DOA boundary determination into constructing unstable critical elements (saddle points and periodic orbits) and their stable manifolds. Theoretically, under certain assumptions we prove that the DOA boundary is the closure of the union of stable manifolds of index-1 critical elements, and establish a stability theory for a perturbed GAD system. Numerical experiments on two-machine and three-machine systems (with only saddle points or with periodic orbits) validate the effectiveness and accuracy. Results show the algorithms accurately capture the geometric structure of the DOA boundary, providing a new numerical tool for transient stability analysis.

[610]  arXiv:2605.04235 (cross-list from math.CO) [pdf, ps, other]
Title: Conflict-Aware Seat Assignment in Classroom Environments
Comments: This manuscript is currently under review
Subjects: Combinatorics (math.CO); Computers and Society (cs.CY); Optimization and Control (math.OC)

Classroom dynamics depend on various elements that influence teaching performance and learning activities. A key challenge is to determine the most effective seating plan, that is, where each student should sit in a given classroom so as to achieve the best learning environment. This paper introduces the Student Seat Allocation Problem (SSAP) for strategically organizing student seating in traditional classrooms to minimize interpersonal conflicts. We propose a mathematical model and an Iterated Local Search (ILS) heuristic to solve the SSAP. Computational experiments show that, in the more complex scenarios, ILS outperforms a commercial solver applied to the proposed mathematical model. ILS was particularly efficient on real and artificial instances with a higher number of conflicts.
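
As a rough illustration of the heuristic named above, the following is a minimal sketch of a generic iterated local search for a toy seating problem. The abstract does not specify the SSAP objective or neighborhood, so everything here is an assumption made for illustration: the cost (number of conflicting pairs in adjacent seats), the swap neighborhood, the random-swap perturbation, and the names conflict_cost, local_search and iterated_local_search. It also assumes exactly one student per seat.

```python
import random

def conflict_cost(seating, adjacent, conflicts):
    """Count conflicting student pairs placed in adjacent seats (assumed objective).

    seating[seat] is a student id; adjacent is a list of (seat_a, seat_b) pairs;
    conflicts is a set of frozenset({student_a, student_b}).
    """
    return sum(
        1 for a, b in adjacent
        if frozenset((seating[a], seating[b])) in conflicts
    )

def local_search(seating, adjacent, conflicts):
    """First-improvement swap descent; stops when no swap lowers the cost."""
    seating = list(seating)
    cost = conflict_cost(seating, adjacent, conflicts)
    improved = True
    while improved:
        improved = False
        for i in range(len(seating)):
            for j in range(i + 1, len(seating)):
                seating[i], seating[j] = seating[j], seating[i]
                new_cost = conflict_cost(seating, adjacent, conflicts)
                if new_cost < cost:
                    cost, improved = new_cost, True
                else:
                    seating[i], seating[j] = seating[j], seating[i]  # undo swap
    return seating, cost

def iterated_local_search(n_students, adjacent, conflicts, iters=50, kicks=2, seed=0):
    """Alternate random perturbation and local search, keeping the best plan."""
    rng = random.Random(seed)
    best, best_cost = local_search(list(range(n_students)), adjacent, conflicts)
    for _ in range(iters):
        candidate = list(best)
        for _ in range(kicks):  # perturbation: a few random swaps
            i, j = rng.sample(range(n_students), 2)
            candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate, cost = local_search(candidate, adjacent, conflicts)
        if cost <= best_cost:  # accept ties so the search can cross plateaus
            best, best_cost = candidate, cost
    return best, best_cost

# Hypothetical 2 x 3 classroom: seats 0-2 in the front row, 3-5 behind them.
adjacent = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
conflicts = {frozenset(p) for p in [(0, 1), (1, 2), (3, 4)]}
plan, cost = iterated_local_search(6, adjacent, conflicts)
print(plan, cost)
```

Accepting equal-cost candidates is one common ILS acceptance rule; a stricter or a more permissive criterion would be an equally reasonable choice for a sketch like this.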

[611]  arXiv:2605.04246 (cross-list from math.OC) [pdf, ps, other]
Title: Globally Solving Unbalanced Optimal Transport and Density Control for Gaussian Distributions
Comments: 28 pages; submitted to a journal
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)

In this article, we study unbalanced optimal transport (UOT) and establish a control-theoretic dynamical extension, which we call the unbalanced density control (UDC), for a class of Gaussian reference measures. In the static setting, we consider UOT with quadratic transport cost and Kullback--Leibler penalties on the marginals relative to prescribed Gaussian measures. We show that the infinite-dimensional variational problem admits an exact Gaussian reduction, yielding a finite-dimensional optimization over masses, means, and covariances, together with a closed-form expression for the optimal transported mass. We then formulate UDC for discrete-time linear systems, where the initial and terminal state measures are imposed softly through KL penalties and the intermediate evolution is governed by controlled linear dynamics with quadratic control cost. For this problem, we prove that any feasible solution can be replaced, without loss of optimality, by a Gaussian initial measure and an affine-Gaussian control policy. This leads to an exact finite-dimensional reformulation and, after a standard covariance-steering lifting, to an SDP-based optimization for fixed mass, again coupled with a closed-form mass update. We further establish existence of optimal solutions and identify a sufficient condition under which the affine-Gaussian UDC policy is deterministic. These results provide globally optimal solution methods for both Gaussian UOT and Gaussian UDC. Finally, we illustrate our results with several numerical examples.

[612]  arXiv:2605.04255 (cross-list from stat.ML) [pdf, ps, other]
Title: Entropic Riemannian Neural Optimal Transport
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

Many machine learning problems involve data supported on curved spaces such as spheres, rotation groups, hyperbolic spaces, and general Riemannian manifolds, where Euclidean geometry can distort distances, averages, and the resulting optimal transport (OT) problem. Existing manifold OT methods have pursued amortized out-of-sample maps, while entropic regularization has made discrete OT more scalable, but these advantages have remained largely disjoint. We propose Entropic Riemannian Neural Optimal Transport (Entropic RNOT), a unified framework that combines intrinsic entropic OT with amortized out-of-sample evaluation on Riemannian manifolds. Our method learns a single target-side Schr\"odinger potential through a neural pullback parameterization, recovers the induced Gibbs coupling, and uses the resulting conditional laws to construct intrinsic transport surrogates. These include barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditional surrogates on stochastically complete manifolds, the latter turning possibly atomic target laws into absolutely continuous ones. For fixed regularization $\varepsilon>0$, we prove that the proposed hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics. As consequences, barycentric surrogates converge in $L^2$, while heat-smoothed surrogates are stable at fixed heat time and asymptotically unbiased as the heat time vanishes. The guarantees hold for compactly supported data on possibly noncompact manifolds. Empirically, our method matches or improves over Euclidean, tangent-space, and log-Euclidean baselines on benchmarks over $\mathbb{S}^2$, $\mathrm{SO}(3)$, $\mathrm{SPD}(3)$, $\mathrm{SE}(3)$, and $\mathbb{H}^2$, scales favorably relative to discrete manifold Sinkhorn, and in a protein-ligand docking application, refines poses on $\mathrm{SE}(3)$ without retraining or per-instance optimization.

[613]  arXiv:2605.04269 (cross-list from stat.ML) [pdf, ps, other]
Title: Adapt or Forget: Provable Tradeoffs Between Adam and SGD in Nonstationary Optimization
Comments: 39 pages, 11 figures, 1 table
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We provide a theoretical analysis of Adam under non-stationary stochastic objectives, separating two regimes: Euclidean tracking under adaptive strong monotonicity of the Adam-preconditioned mean-gradient operator, and high-probability projected stationarity guarantees under general $L$-smooth objectives. In the tracking regime, we derive finite-time expected and high-probability bounds that decompose sharply into four components: initialization, objective drift, a first-moment tracking error governed by $\beta_1$, and a preconditioner perturbation governed by $\beta_2$. We characterize the burn-in time to reach Adam's irreducible tracking floor under constant and step-decay schedules. We also prove a high-probability bound on the average projected stationarity gap for Adam under distribution shift. Across both analyses, our bounds reveal a noise--drift tradeoff: in noise-dominated regimes, first-moment averaging and adaptive preconditioning can improve the high-probability error, whereas in drift-dominated regimes, stale first-moment information and preconditioner perturbations can compound the cost of nonstationarity, allowing vanilla SGD to achieve a smaller tracking floor. Our explicit $(\beta_1,\beta_2,\epsilon)$-dependent bounds delineate when adaptive step-sizing is beneficial versus harmful, and provide a theoretical mechanism for Adam's empirical instability and stabilization under distribution shift.
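
For readers who want the roles of $\beta_1$, $\beta_2$ and $\epsilon$ made concrete, here is a minimal sketch of the standard bias-corrected Adam update next to plain SGD, applied to a toy drifting quadratic. The objective $f_t(x) = \frac{1}{2}(x - \theta_t)^2$ with $\theta_t = \text{drift} \cdot t$, the additive Gaussian gradient noise, and the helper names adam_step, sgd_step and track are illustrative assumptions; they are not the setting or notation analyzed in the paper.

```python
import math
import random

def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update with bias correction; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment average (beta1)
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment average (beta2)
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

def sgd_step(x, grad, lr=1e-3):
    """One plain gradient step."""
    return x - lr * grad

def track(drift, noise_std=0.0, steps=200, lr=0.05, seed=0):
    """Follow the moving minimizer theta_t = drift * t; return final |x - theta| for each method."""
    rng = random.Random(seed)
    x_adam = x_sgd = 0.0
    m = v = 0.0
    theta = 0.0
    for t in range(1, steps + 1):
        theta = drift * t
        noise = rng.gauss(0.0, noise_std)  # same noise draw for both methods
        x_adam, m, v = adam_step(x_adam, x_adam - theta + noise, m, v, t, lr=lr)
        x_sgd = sgd_step(x_sgd, x_sgd - theta + noise, lr=lr)
    return abs(x_adam - theta), abs(x_sgd - theta)

# Compare a drift-heavy run with a noise-heavy run (hypothetical settings).
print(track(drift=0.05, noise_std=0.0))
print(track(drift=0.0, noise_std=1.0))
```

The sketch only exposes the two knobs (drift and noise level) that the abstract's tradeoff refers to; it makes no claim about which method wins in either run.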

[614]  arXiv:2605.04271 (cross-list from quant-ph) [pdf, ps, other]
Title: Quantum Compression for Distributed Entanglement
Comments: Submitted to JSAIT. Proofs available in the supplemental part
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

We study compression strategies for multipartite entanglement distribution under uncertainty in the partitioning of the quantum state. When the partition is not known at the time of state preparation, we show that a joint design of the resource state and a family of compression schemes can increase the entanglement across partitions under a fixed transmission budget. We formulate this as a source coding problem and derive non-asymptotic upper and lower bounds on the achievable average entanglement subject to an average coding rate. We furthermore design an efficient method for jointly optimizing states and lossless compression maps by exploiting the inherent symmetry of weighted Dicke states. In the bipartite case, we propose practical constructions that closely approach the derived upper bound, and more generally we provide practical constructions for multipartite settings.

[615]  arXiv:2605.04300 (cross-list from math.ST) [pdf, ps, other]
Title: Thinned Quantile Shares are Universally Feasible
Subjects: Statistics Theory (math.ST); Discrete Mathematics (cs.DM); Computer Science and Game Theory (cs.GT); Combinatorics (math.CO)

Quantile shares, introduced by Babichenko, Feldman, Holzman, and Narayan [STOC 2024], offer an ordinal, self-maximizing, and interpretable benchmark for fair division of indivisible goods, but their universal feasibility is known only conditional on the rainbow Erd\H{o}s matching conjecture (EMC). Specifically, Babichenko et al. showed that assuming the rainbow EMC in the near-perfect matching regime, the $(1/2e)$-quantile share is universally feasible. In contrast, a simple argument shows that the $q$-quantile share can be infeasible for any $q > 1/e$. We introduce a one-parameter refinement of quantile shares, the $c$-thinned quantile share, obtained by thinning the inclusion probability in the random benchmark bundle by a factor of $c$ for a fixed constant $c\in(0,1]$. Our main result is that there exists a universal constant $c >0$ for which the $c$-thinned $e^{-c}$-quantile share is unconditionally universally feasible; this is best possible in the sense that for any $c \in (0,1]$, the $c$-thinned $q$-quantile share can be infeasible for any $q > e^{-c}$. Prior to this work, the only nontrivial share known to be universally feasible was Feige's residual maximin share. The thinning viewpoint also lets us remove the factor-two loss in the conditional result for the original quantile share: assuming the rainbow EMC, the $(1/e)$-quantile share is universally feasible.

[616]  arXiv:2605.04326 (cross-list from q-bio.NC) [pdf, ps, other]
Title: A foundation model of vision, audition, and language for in-silico neuroscience
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)

Cognitive neuroscience is fragmented into specialized models, each tailored to specific experimental paradigms, hence preventing a unified model of cognition in the human brain. Here, we introduce TRIBE v2, a tri-modal (video, audio and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions. Leveraging a unified dataset of over 1,000 hours of fMRI across 720 subjects, we demonstrate that our model accurately predicts high-resolution brain responses for novel stimuli, tasks and subjects, superseding traditional linear encoding models, delivering several-fold improvements in accuracy. Critically, TRIBE v2 enables in silico experimentation: tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research. Finally, by extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration. These results establish artificial intelligence as a unifying framework for exploring the functional organization of the human brain.

[617]  arXiv:2605.04335 (cross-list from physics.comp-ph) [pdf, ps, other]
Title: GPU-Accelerated Simulations of Problems with Moving Boundaries and Fluid-Structure Interaction at Extreme Scales
Subjects: Computational Physics (physics.comp-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Fluid Dynamics (physics.flu-dyn)

Computational fluid dynamics and fluid-structure interaction simulations involving moving and deforming bodies are extremely hard. In this work, we present a graphics processing unit (GPU) optimized implementation of the sharp-interface immersed boundary method. The method enables simulation of flows around complex stationary as well as moving bodies on a Cartesian grid. We base our implementation on the ViCar3D framework and make use of OpenACC, CUDA, NCCL and MPI. We test the implementation across grid sizes ranging from O(10 million) to O(1 billion) points and achieve a 20x speedup over the existing CPU implementation. We next present our multi-GPU implementation, which utilizes CUDA streams and NCCL communicators. This enables us to obtain strong and weak scaling efficiencies above 90%. Finally, we demonstrate the capability of the developed software by simulating a turbulent fluid flow and the coupled fluid-structure interaction of a flapping bat wing in flight at Re=5000.
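
The scaling figures quoted above can be read against the usual textbook definitions, sketched below. The abstract does not say which baseline the authors use, so the single-GPU baseline, the function names and the timings in the example are all hypothetical.

```python
def strong_scaling_efficiency(t1, tn, n):
    """Fixed total problem size: the ideal time on n GPUs is t1 / n."""
    return t1 / (n * tn)

def weak_scaling_efficiency(t1, tn):
    """Fixed work per GPU: the ideal time on n GPUs equals t1."""
    return t1 / tn

# Hypothetical wall-clock times in seconds, not measurements from the paper.
print(strong_scaling_efficiency(t1=100.0, tn=13.5, n=8))
print(weak_scaling_efficiency(t1=100.0, tn=108.0))
```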

[618]  arXiv:2605.04336 (cross-list from econ.TH) [pdf, ps, other]
Title: The Adversarial Discount - AI, Signal Correlation, and the Cybersecurity Arms Race
Authors: James W. Bono
Subjects: Theoretical Economics (econ.TH); Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)

We study a contest-theoretic model of adversarial investment in which an attacker and a defender allocate resources to AI-augmented capabilities across multiple attack surfaces. The attacker's investment operates through two channels: it amplifies offensive potency unconditionally and erodes defensive effectiveness conditionally, generating an adversarial discount that deepens endogenously with the defender's own investment. We derive a closed-form arms race ratio decomposing the relative marginal effectiveness of offensive and defensive investment into six structural primitives and establish equilibrium uniqueness and global convergence under a continuous best-response dynamic. The central result concerns signal cross-correlation, the degree to which threat intelligence on one surface informs detection on another. With full cross-correlation, the arms race ratio is independent of the number of attack surfaces: the attacker's structural advantage from surface proliferation is completely neutralized. Under the benchmark full-dilution case, without cross-correlation, per-surface defense effectiveness vanishes as the attack surface grows. Extending the analysis to heterogeneous defenders facing an attacker who targets by expected value, we argue that the model points to a dual inefficiency: overinvestment in private defense (a zero-sum redirective externality) and underinvestment in shared signal correlation (a public good). These formal results, together with public-good reasoning outside the base model, characterize when collective information aggregation can dominate private capability investment as the decisive margin in adversarial contests.

[619]  arXiv:2605.04344 (cross-list from stat.ML) [pdf, ps, other]
Title: Perturbation is All You Need for Extrapolating Language Models
Comments: 44 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We introduce a simple yet powerful framework for training large language models. In contrast to the standard autoregressive next-token prediction based on an exact prefix, we propose a perturbation-based procedure that first transforms the prefix into a semantic neighbor and then conditions on this perturbed variant for next-token prediction. This yields a hierarchical model with a pre-post-additive noise structure. Within this framework, we develop a rigorous theory of extrapolability, namely, the capacity of a model class to make reliable predictions for token sequences that lie outside the empirical support of the training corpus. We evaluate the finite-sample performance of the proposed procedure using both synthetic and real-world language data. Results show that the proposed method consistently improves out-of-support prediction while maintaining competitive in-support performance, demonstrating that perturbation offers a practical route to language modeling.

[620]  arXiv:2605.04379 (cross-list from math.CO) [pdf, ps, other]
Title: More on the Erd\H{o}s--Kleitman problem on matchings in set families
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

Let $e(n,s)$ denote the maximum size of a family $\mathcal{F}$ of subsets of an $n$-element set that contains no $s$ pairwise disjoint members. In 1968, answering a question of Erd\H{o}s, Kleitman determined $e(sm-1,s)$ and $e(sm,s)$ for all integers $m,s\ge 1$. Half a century later, Frankl and Kupavskii determined $e(s(m+1)-\ell, s)$ for $\ell \leq \frac{s-3}{m+3}$. They showed that the corresponding extremal example is closely connected with the extremal example for the Erd\H{o}s Matching Conjecture, and conjectured that the same remains true for all $\ell \leq s/2$. In this paper, we prove an approximate version of their conjecture for $s\ge s_0(m)$.
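
To make the definition of $e(n,s)$ concrete, the sketch below computes it by exhaustive search for very small $n$. This is only a brute-force check of the definition, not a method from the paper; the function names are hypothetical, and the search visits all $2^{2^n}$ families, so it is practical only for $n \le 4$.

```python
from itertools import combinations

def has_s_pairwise_disjoint(members, s):
    """True if some s distinct members (subsets as bitmasks) are pairwise disjoint."""
    return any(
        all((a & b) == 0 for a, b in combinations(group, 2))
        for group in combinations(members, s)
    )

def e_bruteforce(n, s):
    """Largest family of subsets of an n-element set with no s pairwise disjoint members."""
    subsets = range(1 << n)                  # each subset of {0..n-1} as a bitmask
    best = 0
    for family in range(1 << (1 << n)):      # each family as a bitmask over subsets
        members = [x for x in subsets if (family >> x) & 1]
        if len(members) > best and not has_s_pairwise_disjoint(members, s):
            best = len(members)
    return best

print(e_bruteforce(3, 2))  # → 4
print(e_bruteforce(2, 3))  # → 3
```

Both printed cases have $n = sm-1$ (with $(s,m) = (2,2)$ and $(3,1)$), and in each the maximum is attained by the family of all sets of size at least $m$. For $s = 2$ the value is the classical $2^{n-1}$ bound for intersecting families.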

[621]  arXiv:2605.04381 (cross-list from stat.ME) [pdf, ps, other]
Title: Causal discovery under mean independence and linearity
Comments: 25 pages, 5 figures
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of dependence can cause the methods to recover the wrong causal order, even with infinite data. We introduce the Linear Mean-Independent Acyclic Model (LiMIAM), which replaces full independence with weaker one-sided mean-independence restrictions on the disturbances. Under finite-order consequences of these restrictions, source nodes are generically identifiable, and hence a compatible causal order can be recovered recursively. Our proof is constructive and leads to DirectLiMIAM, a sequential residual-based algorithm for causal discovery under dependent noise. In simulations with mean-independent but dependent disturbances, DirectLiMIAM outperforms LiNGAM methods. A large-scale empirical application to the oil market highlights the implausibility of the independence assumption and the ability of DirectLiMIAM to recover a realistic causal ordering, from policy to production and from prices to inflation.

[622]  arXiv:2605.04416 (cross-list from quant-ph) [pdf, ps, other]
Title: SpinTune: Improving the Reliability of Quantum Sensor Networks for Practical Quantum-Classical Utility
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

Emerging quantum sensors are increasingly envisioned as components of hybrid quantum-classical high-performance computing, enabling new capabilities in scientific, cyber-physical, and machine-learning pipelines. However, their practical utility is limited by environmental decoherence, which degrades sensing reliability. While dynamical decoupling (DD) pulse sequences can mitigate this, standard methods are often suboptimal in the presence of realistic noise. We present SpinTune, a reinforcement learning software approach that autonomously discovers adaptive, piecewise DD sequences tailored to specific environments. Using a simulation model of a Carbon-13 spin bath, we show that SpinTune significantly outperforms standard DD sequences in preserving coherence.

[623]  arXiv:2605.04443 (cross-list from q-bio.NC) [pdf, ps, other]
Title: Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

Deep convolutional neural networks (DCNNs) have rivaled humans on many visual tasks, yet they remain vulnerable to near-imperceptible perturbations generated by adversarial attacks. Recent work shows that aligning DCNN representations with human visual cortex activity improves adversarial robustness, but the mechanisms driving this advantage are unclear. One hypothesis suggests that neural alignment confers robustness by biasing models away from brittle high-frequency details and towards the low spatial frequencies (LSF). However, recent work shows that human object recognition critically depends on a narrow, mid-frequency "human channel". Interestingly, this band was partially preserved in prior LSF-focused studies. Here, we investigate whether a spectral bias towards the LSF or the human channel is the primary driver of the adversarial robustness observed in neurally aligned DCNNs. We first show that DCNNs aligned to higher-order regions of the human ventral visual stream systematically increase reliance on both LSF and the human channel. However, directly steering DCNNs towards these bands reveals a clear dissociation. Biasing models towards the human channel, either alone or together with LSF, does not improve robustness and even impairs it. LSF bias produces some robustness gains, but these improvements are modest despite inducing much larger shifts in spatial-frequency reliance than neurally aligned models. Spatial-frequency-biased models overall show little, if any, increase in similarity to human neural representational geometry. Together, our results suggest that altered spatial-frequency reliance is likely an emergent property of learning more human-like representations rather than the primary mechanism by which neural alignment confers adversarial robustness, and motivate the need for future research examining representational properties beyond spatial-frequency profiles.

[624]  arXiv:2605.04493 (cross-list from cond-mat.stat-mech) [pdf, ps, other]
Title: The unique, universal entropy for complex systems
Authors: Kenric P. Nelson
Comments: 35 pages, 6 figures, 3 tables
Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)

An axiomatic foundation regarding the entropy for complex systems is established. Missing from decades of research was the requirement that entropy must measure the uncertainty at the informational scale of the maximizing distribution, where the log-log slope equals $-1$. Additionally, entropy must be extensive across the full universality scaling classes defined by Hanel-Thurner. The coupled entropy, maximized by the coupled stretched exponential distributions, is proven to be the unique, universal entropy that satisfies these requirements. The non-additivity of the entropy is equal to the long-range dependence or nonlinear statistical coupling. The entropy-matched extensivity is a function of the coupling, stretching parameter, and dimensions. Evidence is provided that the Tsallis $q$-statistics creates misalignment in the physical modeling of complex systems. Information thermodynamic applications are reviewed, including measuring complexity, a zeroth law of temperature, the thermodynamic consistency of the coupled free energy, and a model of intelligence in non-equilibrium.

[625]  arXiv:2605.04505 (cross-list from eess.AS) [pdf, ps, other]
Title: JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)

The rapid advancement of generative audio models has outpaced the development of robust evaluation methodologies. Existing objective metrics and general multimodal large language models (MLLMs) often struggle with domain generalization, zero-shot capabilities, and instructional flexibility. To address these bottlenecks, we propose JASTIN, a generalizable, instruction-driven audio evaluation framework that formulates audio assessment as a self-instructed reasoning task. JASTIN bridges a frozen high-performance audio encoder with a fine-tuned LLM backbone via a trainable audio adapter. To ensure robust zero-shot generalization, we introduce a comprehensive instruction-following data preparation pipeline, incorporating Multi-Source, Multi-Task, Multi-Calibration, and Multi-Description data. Experimental results demonstrate that JASTIN achieves state-of-the-art Pearson and Spearman correlations with human subjective ratings. It consistently outperforms general MLLMs across speech, sound, music, and out-of-domain evaluation tasks without the need for task-specific retraining.

[626]  arXiv:2605.04510 (cross-list from math.OC) [pdf, ps, other]
Title: Predictive and Prescriptive AI toward Optimizing Wildfire Suppression
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Intense wildfire seasons require critical prioritization decisions to allocate scarce suppression resources over a dispersed geographical area. This paper develops a predictive and prescriptive approach to jointly optimize crew assignments and wildfire suppression. The problem features a discrete resource-allocation structure with endogenous wildfire demand and non-linear wildfire dynamics. We formulate an integer optimization model with crew assignments on a time-space-rest network, wildfire dynamics on a time-state network, and linking constraints between them. We develop a two-sided branch-and-price-and-cut algorithm based on: (i) a two-sided column generation scheme that generates fire suppression plans and crew routes iteratively; (ii) a new family of cuts exploiting the knapsack structure of the linking constraints; and (iii) novel branching rules to accommodate non-linear wildfire dynamics. We also propose a data-driven double machine learning approach to estimate wildfire spread as a function of covariate information and suppression efforts, mitigating observed confounding between historical crew assignments and wildfire growth. Extensive computational experiments show that the optimization algorithm scales to otherwise intractable real-world instances; and that the methodology can enhance suppression effectiveness in practice, resulting in significant reductions in area burned over a wildfire season and guiding resource sharing across wildfire jurisdictions.

[627]  arXiv:2605.04533 (cross-list from quant-ph) [pdf, ps, other]
Title: Online Riemannian Gradient Descent for Quantum State Tomography with Matrix Product Operators
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Optimization and Control (math.OC)

Matrix product operators (MPOs) provide a scalable approach for quantum state tomography (QST) by offering a compact representation of many-body mixed states with limited entanglement, using only a number of parameters that scales polynomially with the system size. In this paper, we study QST for quantum density matrices that can be represented by MPOs. We first derive an equivalent characterization of Hermiticity in terms of the MPO core tensors and show that the coefficient tensor of an MPO under the Pauli or generalized Gell-Mann basis admits a real-valued low tensor-train (TT) rank structure. This establishes an explicit connection between MPO-based QST and noisy low-rank tensor completion. Motivated by this formulation, we develop an online Riemannian gradient descent (oRGD) algorithm that sequentially incorporates measurement data during the reconstruction process. With a proper initialization, we prove that oRGD converges linearly to the target MPO and succeeds with a number of distinct measurement settings that scales quadratically with the system size. As a byproduct, our analysis also yields a significantly improved sample complexity bound for the low TT rank tensor completion task. Furthermore, we propose a tailored spectral initialization method and establish its theoretical guarantee. Numerical experiments on several classes of quantum states validate the effectiveness and scalability of the proposed method.

[628]  arXiv:2605.04582 (cross-list from quant-ph) [pdf, ps, other]
Title: Fundamental Limitations of Post-Quantum Cryptographic Architectures
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

Modern lattice-based cryptography, particularly the learning with errors paradigm, relies on injecting artificial noise to secure data against quantum adversaries. This study systematically examines the theoretical and physical boundaries of this noise-reliant model across four interconnected domains: computational complexity, information-theoretic thermodynamics, quantum error correction, and quantum learning theory. Starting from the algorithmic foundation, our analysis notes that these frameworks rely on provisional complexity-theoretic assumptions that remain vulnerable to future quantum algorithmic advancements. Furthermore, by translating this cryptographic mechanism into physical thermodynamics, we illustrate that intentionally injected discrete Gaussian noise does not equate to the permanent erasure of information. Because the structural integrity of the cryptographic secret remains preserved within the ciphertext, advanced quantum error correction protocols and quantum learning models can efficiently extract the underlying mathematical kernel. Ultimately, we suggest that while lattice-based cryptography provides a robust transitional alternative, definitively classifying these frameworks as unconditionally post-quantum is premature, since that classification relies on transient physical bottlenecks rather than impenetrable theoretical boundaries.

[629]  arXiv:2605.04589 (cross-list from stat.ML) [pdf, ps, other]
Title: Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a Euclidean space and relates these trajectories to node embeddings. In multilayer and unfolded spectral constructions, however, node embeddings and their underlying latent positions are identifiable only up to general linear transformations. Although this ambiguity preserves edge probabilities, it can distort geometry and invalidate distance-based temporal comparisons at both the trajectory and node levels.
We develop Multiscale Euclidean Network Trajectories (MENT), a framework for multiscale temporal trajectories based on second-moment geometry. By imposing an isotropic normalization on the anchor latent positions, we reduce the relevant ambiguity to orthogonal transformations and prevent distortion of the second-moment geometry. In this canonical representation, we define a trace variation distance and mode-wise variation distances along orthogonal directions, and use multidimensional scaling to obtain low-dimensional trajectories of time points at both global and mode-wise levels. The resulting trajectories support interpretation and inference. They admit mode-wise decompositions, support attribution of global and mode-wise temporal changes to nodes, and enable change point detection through 1D trajectories. We prove consistency of the proposed unfolded spectral embedding and of the induced temporal trajectories. Experiments on two synthetic and two real dynamic networks illustrate stable and interpretable recovery of temporal structure and show strong performance against existing change point detection baselines.

[630]  arXiv:2605.04604 (cross-list from quant-ph) [pdf, ps, other]
Title: Generative Quantum-inspired Kolmogorov-Arnold Eigensolver
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-efficient extension of the generative quantum eigensolver (GQE) for quantum chemistry. GQKAE replaces the parameter-heavy feed-forward network components in GPT-style generative eigensolvers with hybrid quantum-inspired Kolmogorov-Arnold network modules, forming a compact HQKANsformer backbone. The method preserves autoregressive operator selection and the quantum-selected configuration interaction evaluation pipeline, while using single-qubit DatA Re-Uploading ActivatioN modules to provide expressive nonlinear mappings. Numerical benchmarks on H4, N2, LiH, C2H6, H2O, and the H2O dimer show that GQKAE achieves chemical accuracy comparable to the GPT-based GQE architecture, while reducing trainable parameters and memory by approximately 66% and improving wall-time performance. For strongly correlated systems such as N2 and LiH, GQKAE also improves convergence behavior and final energy errors. These results indicate that quantum-inspired Kolmogorov-Arnold networks can reduce classical-side overhead while preserving circuit-generation quality, offering a scalable route for HPC-quantum co-design on near-term quantum platforms.

[631]  arXiv:2605.04689 (cross-list from math.LO) [pdf, ps, other]
Title: Continuations and Completeness in Proof-theoretic Semantics
Subjects: Logic (math.LO); Logic in Computer Science (cs.LO)

This is a short paper about the relationship between logic and computation. More specifically, it is about a relationship between the completeness proof for intuitionistic propositional logic within the form of proof-theoretic semantics that is known as base-extension semantics and a fundamental idea from the theory of computation called continuation-passing semantics. The latter is explained herein both in terms of reduction in natural deduction and the lambda calculus and in terms of proof-search. The relationship between completeness and continuations is explored through an analysis of Sandqvist's proof of the completeness theorem as seen from the mathematical perspective of Kripke's and Heyting's semantics. Our analysis can be seen to reveal how syntactic representations of continuations embody intensional semantical intuitions about the relationship between their meaning and use. These intuitions are made precise using the tools of proof-theoretic semantics.

[632]  arXiv:2605.04734 (cross-list from math.CO) [pdf, ps, other]
Title: Hamilton decompositions of all directed tori at odd modulus
Authors: SangHyun Park
Comments: 37 pages, 10 figures. Companion Lean 4 formalization at this https URL (release tag 0.0.3-allodd). Companion preprints: arXiv:2603.24708 (D_3), arXiv:2604.27140 (D_5), arXiv:2605.00660 (D_7)
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO)

Let $D_d(m) = \operatorname{Cay}((\mathbb{Z}/m\mathbb{Z})^d, \{e_0, \ldots, e_{d-1}\})$ be the directed Cartesian product of $d$ directed $m$-cycles. We prove that $D_d(m)$ admits a directed Hamilton decomposition for every dimension $d \geq 2$ and every odd modulus $m \geq 3$. The proof combines two new closure mechanisms with a small set of base dimensions. The high-modulus count branch handles every odd $d \geq 5$ and every odd $m \geq d$ via triangular prefix coordinates and a primitivity criterion controlled by gcd conditions on symbol counts. The base-tail modular-trade branch handles the complementary range $m < d$ by decomposing a base multigraph into cylinders and scheduling active tail residues by local symbol trades; it yields the successor closure $b \mapsto 2b+1$ for $b \geq 5$. Together with multiplicative product closure, these reduce the all-dimensions theorem to the four base dimensions $d \in \{2, 3, 5, 7\}$. Dimensions $2$ and $3$ are proved here; dimensions $5$ and $7$ are imported from companion arXiv preprints.
A Lean 4 formalization records the same all-dimensions endpoint. As an independent consequence, the dimensions $2$ and $3$ alone solve every odd $d \geq 29$, by a dyadic-triadic interval-hitting argument.
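As a small sanity check of the statement in the simplest case $d = 2$, the sketch below verifies one Hamilton decomposition of $D_2(m)$ by direct traversal. The colouring rule is a hypothetical layer-sum construction chosen for illustration and is not taken from the paper: colour 0 follows $e_0$ on the layer $i + j \equiv 0 \pmod m$ and $e_1$ elsewhere, and colour 1 takes the other out-arc at every vertex. The function names are likewise illustrative.

```python
def step(vertex, direction, m):
    """Follow the arc v -> v + e_direction in D_2(m)."""
    w = list(vertex)
    w[direction] = (w[direction] + 1) % m
    return tuple(w)


def colour_direction(vertex, colour, m):
    """Assumed rule: on the layer i + j = 0 (mod m) colour c uses e_c,
    off that layer it uses e_(1 - c). The two colours therefore always
    take different out-arcs, so together they use every arc exactly once."""
    on_layer = (vertex[0] + vertex[1]) % m == 0
    return colour if on_layer else 1 - colour


def cycle_length(colour, m):
    """Length of the colour-class cycle through the vertex (0, 0)."""
    start = (0, 0)
    v, length = start, 0
    while True:
        v = step(v, colour_direction(v, colour, m), m)
        length += 1
        if v == start:
            return length


def is_hamilton_decomposition(m):
    """Both colour classes must be single cycles through all m * m vertices."""
    return all(cycle_length(colour, m) == m * m for colour in (0, 1))


if __name__ == "__main__":
    for m in (3, 5, 7):
        print(m, is_hamilton_decomposition(m))
```

Each colour class is a permutation of the vertices because the layer index $i + j$ advances by one at every step, so the traversal from $(0, 0)$ always returns to its start. Higher dimensions need the closure mechanisms described in the abstract and are not covered by this sketch.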

[633]  arXiv:2605.04833 (cross-list from stat.ME) [pdf, ps, other]
Title: Data anonymization in the presence of outliers via invariant coordinate selection
Comments: Submitted to Privacy in Statistical Databases 2026
Subjects: Methodology (stat.ME); Cryptography and Security (cs.CR)

Protecting confidential data while preserving utility is particularly challenging when data sets contain outlying observations. Existing latent space anonymization methods, such as spectral anonymization (SA), rely on principal component analysis (PCA) and may therefore be vulnerable to contamination. We investigate anonymization in the presence of outliers and propose ICSA, a robust alternative to SA based on invariant coordinate selection (ICS). By replacing the PCA transformation with ICS, the robustness of the anonymization procedure can be regulated through the choice of scatter matrices. Alongside the methodological development, we derive a theoretical result showing that SA fails under sufficiently influential outliers. To assess the practical implications of this result, we compare the privacy-utility trade-off of ICSA and SA through simulation experiments under varying contamination settings and outlier severities. Our findings indicate that implementations of ICSA based on robust scatter matrices achieve stronger privacy protection than SA, while typically maintaining comparable, and in some cases improved, utility. We further examine the empirical performance of the proposed method using a benchmark clinical data set, where ICSA demonstrates superior overall privacy-utility efficiency relative to SA. These results suggest that explicitly accounting for outliers can materially improve anonymization performance and that robust latent space transformations offer a promising direction for privacy-preserving statistical data release.

[634]  arXiv:2605.04838 (cross-list from stat.ME) [pdf, ps, other]
Title: PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

The standard constraint-based paradigm for causal discovery with incomplete data -- impute first, test second -- is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spurious conditional dependence. We introduce PAIR-CI, a nonparametric CI test that restores calibration by integrating multiple imputation directly into the inferential procedure via a paired permutation design. PAIR-CI compares cross-validated models that include and exclude the candidate variable while receiving the same imputed conditioning set, forcing imputation error to cancel in their loss difference rather than contaminate the test statistic. A provably consistent variance estimator jointly accounts for uncertainty arising from cross-validation and multiple imputation -- to our knowledge, the first formal unification of these two inferential frameworks. In simulations, existing imputation-based CI tests exhibit false positive rates of 28--45% when data are missing not at random (MNAR), whereas PAIR-CI averages below the nominal 5% level across data-generating processes and missingness mechanisms. These gains are largest in nonlinear settings and grow with causal graph size: when integrated into the PC algorithm, PAIR-CI reduces structural Hamming distance by 8% on 10-variable nonlinear graphs, 15% on 30-variable equivalents, and up to 44% on the 56-variable HAILFINDER network, with stable performance in all settings.

[635]  arXiv:2605.04855 (cross-list from quant-ph) [pdf, ps, other]
Title: W-state graphs: Structure and Algorithms
Subjects: Quantum Physics (quant-ph); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

We study the class of edge-coloured graphs arising from the graph-theoretic representation of quantum photonic experiments that generate multipartite W-states. Abstracting away physical amplitudes and phases, we introduce W-state graphs: matching-covered graphs equipped with a half-edge 2-colouring such that every perfect matching contains exactly one bichromatic edge and every vertex is incident with a red half-edge. Our main contribution is a complete structural characterization of W-state graphs. We show that a graph is a W-state graph if and only if each of its 3-connected components is a W-cone, a simple and rigid building block defined by a universal vertex and a factor-critical base. This characterization implies that no W-state graph is simple and yields a recognition algorithm running as fast as verifying whether a graph is matching-covered. We also show that the natural generalization to Dicke states encounters a complexity barrier: verifying one of the two Dicke state conditions is itself coNP-complete, resolving an open problem of Vardi and Zhang [IJCAI 2023]. Our results place W-state graphs firmly within classical matching theory and precisely delineate the combinatorial structures capable of realizing idealized W-states in the experiment-graph framework.

[636]  arXiv:2605.04915 (cross-list from quant-ph) [pdf, ps, other]
Title: Optimal Error Exponents for Composite Sequential Quantum Hypothesis Testing
Comments: Under Review
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Statistics Theory (math.ST)

We study the composite sequential quantum hypothesis testing (SQHT) problem, where the objective is to distinguish a null quantum state from a compact, convex set of alternative quantum states. We propose a mixture-sequential quantum probability ratio test that adaptively selects measurements based on the current mixture estimate of the alternative set, and stops upon the first threshold crossing of the mixture log-likelihood ratio. Under an expected sample size constraint, we show that our proposed adaptive strategy simultaneously achieves the optimal Type-I and (worst-case) Type-II error exponents. These exponents are characterized by the minimal measured relative entropies between the null state and the alternative set. We further establish a matching converse, thereby characterizing the optimal error exponent region. Finally, our results show that achieving vanishing error probabilities in composite SQHT requires an expected sample complexity at least as large as that of sequential testing between two fixed quantum states.

[637]  arXiv:2605.04918 (cross-list from math.AP) [pdf, ps, other]
Title: Neural Discovery of Strichartz Extremizers
Comments: 38 pages, 26 figures
Subjects: Analysis of PDEs (math.AP); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Strichartz inequalities are a cornerstone of the modern theory of dispersive PDEs, but their extremizers are known explicitly only in a handful of sharp cases. The non-convexity of the underlying functional makes the problem hard, and to our knowledge no systematic numerical attack has been attempted. We propose a simple neural-network-based pipeline that searches for extremizers as critical points of the Strichartz ratio, and apply it in three settings. First, on the Schrödinger group we recover the Gaussian extremizers of Foschi and Hundertmark--Zharnitsky in dimensions $d=1,2$ to within $10^{-3}$ relative error, with no analytical prior. Second, on $59$ further admissible pairs in $d=1$ where the answer is conjectural, the method consistently finds Gaussians, supporting the conjecture that Gaussians are the universal extremizers in the admissible range. Third, on the critical Airy--Strichartz inequality at $\gamma=1/q$, where existence is open, the optimization does not converge to any $L^2$ profile: instead, the iterates organize themselves as mKdV breathers $B(0,\cdot;\alpha,1,0,0)$ with growing internal frequency $\alpha$, and the discovered ratio approaches the Frank--Sabin universal lower bound $\widetilde A_{q,r}$ from below with a power-law gap $\sim\alpha^{-0.9}$. We confirm the same picture with an independent Hermite-basis ansatz. We propose a precise conjecture: the supremum equals $\widetilde A_{q,r}$ and is approached, but not attained, along the breather family. The pipeline thus serves both as a validator on known cases and as a discovery tool when no extremizer exists.

[638]  arXiv:2605.04924 (cross-list from eess.SP) [pdf, ps, other]
Title: 423.7 + 426.5 Tb/s GMI Bi-Directional HCF Transmission
Comments: 4 pages, 5 figures, submitted to ECOC 2026
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

We demonstrate OESCL-band same-wavelength bi-directional transmission over 60 km HCF with 42.5 THz bandwidth, achieving GMIs comparable with the highest unidirectional SMF data-rates in both directions, with an aggregate of 423.7 + 426.5 Tb/s.

[639]  arXiv:2605.04932 (cross-list from stat.ML) [pdf, ps, other]
Title: Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift
Comments: 8 pages, 4 figures, 4 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincaré inequality reduces temporal risk volatility to derivative energy, and a Jacobian-velocity theorem identifies directional tangent energy along the deployment path as the governing quantity under explicit along-path regularity and domination assumptions. Under low-rank drift, that quantity reduces to directional Jacobian energy in the drift subspace, motivating drift-aligned tangent regularization (DTR) and a matched monitoring proxy. Rather than smoothing the network isotropically, DTR penalizes sensitivity only along estimated drift directions. We validate the theorem-to-method pipeline in four experiments: a synthetic benchmark for the time-domain inequality, a controlled synthetic comparison against isotropic Jacobian regularization, and two frozen-deployment studies on the UCI Air Quality and Tetouan power-consumption datasets. DTR reduces risk volatility and directional gain in the controlled low-rank regime, beats isotropic smoothing there, and gives validation-selected deployment gains on both real datasets when the Air Quality drift subspace is estimated from target-orthogonal sensor motion. Moderate drift-subspace misspecification is tolerable while orthogonal misspecification largely removes the benefit.

[640]  arXiv:2605.04958 (cross-list from eess.SP) [pdf, ps, other]
Title: Fast Full-Wave Simulation of Indoor RSS Maps for Pre-Measurement Validation in Device-Free Localization
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

Human localization is gaining momentum in security, healthcare, logistics, and smart-space applications. While global navigation systems are unreliable indoors, device-free (a.k.a. passive) localization methods that exploit human-induced perturbations of radio propagation can be effectively used. This paper investigates the use of a compact full-wave electromagnetic (EM) setup as a fast and reliable tool to simulate indoor Wi-Fi propagation for human sensing. The goal is to provide a practical baseline for validating simplified propagation models, such as diffraction-based descriptions, and to reduce the need for costly measurement campaigns. Two-dimensional attenuation maps from received signal strength are generated and compared in controlled environments, focusing on attenuation statistics and interference patterns. The simulations reproduce the main spatial features, though discrepancies remain due to simplified material characterization. Diffraction-aware refinements are proposed to mitigate these effects. Overall, the approach provides an efficient pre-measurement reference to support device-free system design and to guide experimental planning.

[641]  arXiv:2605.04987 (cross-list from math.CO) [pdf, ps, other]
Title: Matchings in permutations
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

We say that two permutations $[n]\to [n]$ intersect if they map some element $x$ to the same element $y$. A matching in a family of permutations is a collection of pairwise disjoint permutations. In this paper, we study families of permutations with no matchings of size $s$. In particular, we obtain a characterization of the largest $s$-matching-free families and a Hilton--Milner type result. We also obtain results for the families of derangements.
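The two definitions above can be made concrete with a short sketch. It assumes permutations are stored as tuples over $\{0, \ldots, n-1\}$ rather than $[n]$, and the helper names (`intersect`, `is_matching`, `cyclic_shifts`, `fixing_family`) are illustrative and not from the paper. It shows a matching of size $n$ (the cyclic shifts) and a family with no matching of size 2 (all permutations sending a fixed $x$ to a fixed $y$).

```python
from itertools import combinations, permutations


def intersect(p, q):
    """True if p and q map some element x to the same image y."""
    return any(a == b for a, b in zip(p, q))


def is_matching(family):
    """True if the permutations in `family` are pairwise disjoint."""
    return not any(intersect(p, q) for p, q in combinations(family, 2))


def cyclic_shifts(n):
    """The n maps i -> i + k (mod n); two distinct shifts never agree."""
    return [tuple((i + k) % n for i in range(n)) for k in range(n)]


def fixing_family(n, x=0, y=0):
    """All permutations of {0, ..., n-1} that send x to y."""
    return [p for p in permutations(range(n)) if p[x] == y]


if __name__ == "__main__":
    print(is_matching(cyclic_shifts(4)))       # pairwise disjoint
    family = fixing_family(4)
    print(len(family), is_matching(family[:2]))  # any two members intersect
```

The fixing family is the standard example of an intersecting family, so it contains no matching of size 2. The extremal families characterized in the paper for general $s$ are not reproduced here.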

[642]  arXiv:2605.05008 (cross-list from stat.ML) [pdf, ps, other]
Title: Scalable inference of spatial regions and temporal signatures from time series
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

Regionalization aims to partition a spatial domain into contiguous regions that share similar characteristics, enabling more effective spatial analysis, policy making, and resource management. Existing approaches for spatial regionalization typically rely on static spatial snapshots rather than evolving time series. Meanwhile, most time series clustering methods ignore spatial structure or enforce spatial continuity through ad hoc regularization, constraining the number of inferred regions a priori either explicitly or implicitly. Utilizing the minimum description length principle from information theory, here we propose an efficient and fully nonparametric framework for the regionalization of spatial time series. Our method jointly infers a spatial partition along with a set of representative time series archetypes ("drivers") that best compress a spatiotemporal dataset, with a runtime log-linear in the number of time series. We demonstrate that this method can accurately recover planted regional structure and drivers in synthetic time series, and can extract meaningful structural regularities in large-scale empirical air quality and vegetation index records. Our method provides a principled and scalable framework for spatially contiguous partitioning, allowing interpretable temporal patterns and homogeneous regions to emerge directly from the data itself.

[643]  arXiv:2605.05024 (cross-list from stat.ML) [pdf, ps, other]
Title: Hypergraph Generation via Structured Stochastic Diffusion
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propose HEDGE, a generative model defined directly on relaxed incidence matrices via a structured stochastic diffusion. The forward process combines a hypergraph-specific two-sided heat operator with an Ornstein--Uhlenbeck component, preserving structure-aware noising near the data while yielding an explicit Gaussian terminal law. Conditional on an observed hypergraph, this forward process is linear-Gaussian, so conditional means, covariances, scores, and reverse-drift targets are available in closed form. We therefore learn a permutation-equivariant state-only reverse-drift field in incidence space by regressing onto exact conditional targets, and generate samples by simulating a learned reverse-time SDE from the Gaussian base law. We establish exactness in the ideal state-only setting together with finite-horizon stability guarantees, and empirically show improved hypergraph generation quality relative to strong baselines.

[644]  arXiv:2605.05036 (cross-list from quant-ph) [pdf, ps, other]
Title: Block Permutation Routing on Ramanujan Hypergraphs for Fault-Tolerant Quantum Computing
Comments: 20 pages, 3 figures, 4 tables
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)

We analyze permutation routing of rigid blocks representing surface code patches of $d_C^2$ atoms on a reconfigurable lattice with hypergraph transformations. For a hypergraph $H$, code distance $d_C$, $s=d_C^2$, number of blocks $N_L$, and guard distance $g$, we show the block routing number $\mathrm{rt}_B(H, s, g) = \Theta(d_C \log N_L)$. A spectral analysis of the quotient graph $Q(G_{\mathrm{cl}}(H), B)$ (blocks as supervertices) shows that the spectral ratio $\beta_Q < 1$ is preserved in the high-connectivity regime. Negative association of block permutations and congestion bounds are used for random intermediate configurations. Serialization establishes that each quotient routing phase requires $O(d_C)$ physical sub-steps due to the block footprint width. A lower bound $\mathrm{rt}_B = \Omega(d_C \log N_L)$ follows from combining the spectral lower bound on quotient phases with the traversal cost per phase. We include error model analysis grounded in recent experimental results, syndrome extraction protocols (stop-and-correct, rolling active fault-tolerant (AFT) measurement, and adaptive deformation), and integration with lattice surgery compilation via the Litinski protocol. Composition with the correlated-decoding scheme reduces syndrome-extraction overhead from $O(d_C)$ to $O(1)$ per correction window, leaving routing as the leading-order contributor to the integrated $O(d_C \log N_L)$ depth. Spectral inheritance is organized in a hierarchy: exact (Haemers interlacing on equitable partitions), perturbative (Weyl bounds for near-equitable partitions, a practically relevant case for surface-code patches), and universal (higher-order Cheeger). Methods extend directly to QCCD trapped-ion architectures under the same regime condition, with junction crossings replacing AOD transports as the elementary single-hop translation.

[645]  arXiv:2605.05082 (cross-list from eess.IV) [pdf, ps, other]
Title: External Validation of Deep Learning Models for BI-RADS Breast Density Prediction from Ultrasound Images
Comments: Accepted at the 18th International Workshop on Breast Imaging (IWBI 2026)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

We externally validated three deep learning models (DenseNet121, ViT-B/32, and ResNet50) for predicting mammographic breast density from breast ultrasound exams on an independent cohort. The external validation set comprised 2,000 ultrasound exams, including 500 cancer cases defined by an initial negative exam (BI-RADS 1 or 2) followed by a cancer diagnosis within 6 months to 10 years, and 1,500 negative controls matched by manufacturer and study year. Performance was measured using patient-level AUROC across four density categories: A (fatty), B (scattered), C (heterogeneous), and D (extremely dense). As a downstream assessment, we also evaluated 10-year risk prediction by incorporating age and AI-derived density into the Tyrer-Cuzick model and comparing performance against a reference model using age and mammography-reported density. All three models performed best in extremely dense breasts (AUROC 0.868-0.899), with strong performance in fatty (0.814-0.838) and scattered density (0.764-0.799), and lower performance in heterogeneously dense breasts (0.699-0.729). DenseNet121 achieved the highest overall performance (micro-averaged AUROC 0.885), and performance across categories was comparable between internal and external testing. For risk modeling, age combined with AI-derived density yielded a lower AUROC than age combined with mammography-reported density (0.541 vs. 0.570; p = 0.23), with no statistically significant difference. These findings indicate that deep learning models generalize well to external data with different racial composition for breast density assessment. While performance is strongest in extremely dense breasts, heterogeneously dense remains more challenging, highlighting the need for targeted optimization.

[646]  arXiv:2605.05091 (cross-list from q-bio.NC) [pdf, ps, other]
Title: Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

Computational cognitive models discovered using large language models have so far relied solely on behavioral data. However, it is well known that models produced from the behavioral trajectory alone are typically under-determined. In this work, we explore the use of think-aloud traces as an additional form of data constraint during automated model discovery. When applied to the domain of risky decision-making, we find that the models discovered with think-aloud achieve significantly improved predictive performance on held-out data. Additionally, we find that the discovered models belong to different structural classes than those discovered from behavior alone for the majority of participants (69.4%); specifically, the structure shifts from Explicit comparator towards Integrated utility. These results suggest that process-level language data not only improve model fit, but also systematically reshape the structure of the discovered cognitive models, enabling the identification of mechanisms that are not recoverable from behavior alone.

[647]  arXiv:2605.05093 (cross-list from stat.ML) [pdf, ps, other]
Title: Proximal Projection for Doubly Sparse Regularized Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

Regularization is often used in high-dimensional regression settings to generate a sparse model, which can save tremendous computing resources and identify predictors that are most strongly associated with the response. When the predictors can be represented by a Gaussian graphical model, the structure of the predictor graph can be exploited during regularization. Our proposed model exploits this underlying predictor graph structure by decomposing the estimated coefficient vector into a sum of latent variables that correspond to the sum of each node contribution to the coefficient vector. Regularization is then performed on the latent variables rather than on the coefficient vector directly. We use a penalty function that permits a clear user-defined trade-off between the L1 and L2 penalties and propose a novel proximal projection during optimization. Further, our implementation computes the projection operator for the intersection of selected groups, which conserves more computing resources compared to predictor duplication methods, especially for high-dimensional data. Through simulation, we evaluate the performance of our approach under different graph structures and node counts, and present results on real-world data. Results suggest that our method exhibits stable performance relative to other singly or doubly sparse graphical regression models.

[648]  arXiv:2605.05104 (cross-list from cond-mat.mtrl-sci) [pdf, ps, other]
Title: Building informative materials datasets beyond targeted objectives
Subjects: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG); Applications (stat.AP)

Materials science data collection can be expensive, making the reuse and long-term utility of datasets critically important for future discovery campaigns. In practice, researchers prioritize a subset of properties due to research interests. However, ignoring a subset of outcomes in data collection campaigns can generate datasets poorly suited for future learning tasks. Here, we present a framework for dataset construction that maximizes informativeness for target properties of interest while preserving performance on untargeted ones. Our approach uses diversity-aware selection to ensure broad coverage of the materials space. In noisy experimental dataset construction, we find that without our diversity-aware framework, prediction performance on untargeted properties can degrade by up to 40% relative to random sampling, whereas applying our framework yields improvements of up to 10%. For targeted properties, performance can degrade with respect to random sampling by up to 12.5% without diversity, while our framework achieves gains of up to 25%. Incorporating diversity into dataset construction not only preserves informativeness for the targeted properties, but also improves materials coverage for potential future objectives. As a result, the constructed datasets remain broadly informative across considered and unconsidered outcomes, ensuring unbiased quality entries and mitigating cold-start limitations in subsequent modeling and discovery campaigns.

[649]  arXiv:2605.05132 (cross-list from quant-ph) [pdf, ps, other]
Title: A Factor-Graph Formulation of CSS Syndrome Decoding: Joint BP and Four-State BP
Authors: Kenta Kasai
Comments: 14 pages, 3 figures
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

For CSS syndrome decoding, the two check matrices impose binary parity-check constraints on the two Pauli error components. The posterior can therefore be written as a binary factor graph with two Tanner graphs coupled by the local joint prior at each qubit. We call the sum-product algorithm on this factorization joint belief propagation (joint BP). Joint BP retains the local channel correlation between the two Pauli components. This note compares joint BP with the four-state Pauli-label factor graph used for four-state BP. The two algorithms are shown to have the same posterior weights, messages, and beliefs after relabeling the four local Pauli states and marginalizing the irrelevant binary component.
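The "local joint prior at each qubit" and the relabeling of the four Pauli states can be illustrated with a short sketch. It assumes a depolarizing channel with total error probability `p` and the common labeling I = (0, 0), X = (1, 0), Z = (0, 1), Y = (1, 1) for the (x, z) binary components; both choices are assumptions for illustration and are not stated in the abstract. The sketch shows that the joint prior differs from the product of its two binary marginals, which is the local correlation that joint BP retains.

```python
from itertools import product


def depolarizing_joint_prior(p):
    """Joint prior over the (x, z) bits of one qubit's Pauli error.

    Assumed labeling: I=(0,0), X=(1,0), Z=(0,1), Y=(1,1), with X, Y, Z
    each occurring with probability p / 3.
    """
    return {(0, 0): 1 - p, (1, 0): p / 3, (0, 1): p / 3, (1, 1): p / 3}


def marginal(prior, index):
    """Marginalize the joint prior onto one binary component (0 = x, 1 = z)."""
    out = [0.0, 0.0]
    for bits, weight in prior.items():
        out[bits[index]] += weight
    return out


prior = depolarizing_joint_prior(0.09)
px = marginal(prior, 0)  # distribution of the x component
pz = marginal(prior, 1)  # distribution of the z component

# Prior that treats the two components as independent, for comparison.
product_prior = {
    (x, z): px[x] * pz[z] for x, z in product((0, 1), repeat=2)
}

if __name__ == "__main__":
    for bits in sorted(prior):
        print(bits, round(prior[bits], 4), round(product_prior[bits], 4))
```

With these assumptions a Y error has joint weight `p / 3`, whereas the independent model assigns it `(2p / 3) ** 2`, so decoding the two Tanner graphs separately would discard information that the coupled factor graph keeps.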

[650]  arXiv:2605.05175 (cross-list from eess.IV) [pdf, ps, other]
Title: MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge
Authors: Perry E. Radau
Comments: 21 pages, 4 figures, 10 tables
Subjects: Image and Video Processing (eess.IV); Computation and Language (cs.CL); Medical Physics (physics.med-ph)

Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanner operational knowledge central to research MRI practice. Purpose: We developed MRI-Eval, a tiered benchmark for relative model comparison on MRI physics and GE scanner operations knowledge using primary multiple-choice questions (MCQ), with stem-only and primed diagnostic conditions as complementary analyses. Methods: MRI-Eval includes 1365 scored items across nine categories and three difficulty tiers from textbooks, GE scanner manuals, programming course materials, and expert-generated questions. Five model families were evaluated (GPT-5.4, Claude Opus 4.6, Claude Sonnet 4.6, Gemini 2.5 Pro, Llama 3.3 70B). MCQ was primary; stem-only removed options and used an independent LLM judge; primed stem-only tested responses to incorrect user claims. Results: Overall MCQ accuracy was 93.2% to 97.1%. GE scanner operations was the lowest category for every model (88.2% to 94.6%). In stem-only, frontier-model accuracy fell to 58.4% to 61.1%, and Llama 3.3 70B fell to 37.1%; GE scanner operations stem-only accuracy was 13.8% to 29.8%. Conclusion: High MCQ performance can mask weak free-text recall, especially for vendor-specific operational knowledge. MRI-Eval is most informative as a relative comparison benchmark rather than an absolute competency measure and supports caution in using raw LLM outputs for GE-specific protocol guidance.

[651]  arXiv:2605.05183 (cross-list from math.AP) [pdf, ps, other]
Title: Numerical study of the 2D Kaup-Broer-Kuperschmidt Boussinesq system
Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

In this work we consider the well-posed version of the Kaup-Broer-Kuperschmidt system in two dimensions. We numerically construct soliton-type solutions and show that they are unstable both against dispersion and singularity formation. Further, we study line solitons and their stability, as well as generally localised initial data. In either case we fail to find stable structures.

[652]  arXiv:2605.05189 (cross-list from stat.ML) [pdf, ps, other]
Title: Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding.
We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2\to\alpha$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.
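
The correlation matrix memory described above can be illustrated with a small sketch. It is not the authors' code: the function names (`build_memory`, `recall`, `top1_accuracy`) and the sizes used are assumptions chosen for illustration. Keys and targets are drawn i.i.d. Gaussian, the memory is the sum of key-target outer products, and top-1 retrieval succeeds when the correct target scores strictly highest.

```python
import random


def gaussian_vector(rng, d):
    """Draw a d-dimensional vector with i.i.d. standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def build_memory(keys, values):
    """Superpose outer products: W = sum_i values[i] keys[i]^T (a d x d matrix)."""
    d = len(keys[0])
    W = [[0.0] * d for _ in range(d)]
    for k, v in zip(keys, values):
        for a in range(d):
            va = v[a]
            row = W[a]
            for b in range(d):
                row[b] += va * k[b]
    return W


def recall(W, key):
    """Linear readout W @ key."""
    return [sum(w * x for w, x in zip(row, key)) for row in W]


def top1_accuracy(n, d, seed=0):
    """Fraction of stored keys whose correct target gets the highest score."""
    rng = random.Random(seed)
    keys = [gaussian_vector(rng, d) for _ in range(n)]
    values = [gaussian_vector(rng, d) for _ in range(n)]
    W = build_memory(keys, values)
    hits = 0
    for i, key in enumerate(keys):
        out = recall(W, key)
        # Score every stored target against the readout; the top score wins.
        scores = [sum(o * t for o, t in zip(out, v)) for v in values]
        best = max(range(n), key=scores.__getitem__)
        hits += (best == i)
    return hits / n
```

Calling `top1_accuracy(n, d)` at a light load (few pairs relative to $d^2$) should give perfect retrieval; increasing `n` at fixed `d` is one way to observe the degradation the abstract attributes to the largest distractor.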

[653]  arXiv:2605.05192 (cross-list from math.CA) [pdf, ps, other]
Title: Almost-Orthogonality in Lp Spaces: A Case Study with Grok
Subjects: Classical Analysis and ODEs (math.CA); Artificial Intelligence (cs.AI); Combinatorics (math.CO); Probability (math.PR)

Carbery proposed the following sharpened form of triangle inequality for many functions: for any $p\ge 2$ and any finite sequence $(f_j)_j\subset L^p$ we have \[ \Big\|\sum_j f_j\Big\|_p \ \le\ \left(\sup_{j} \sum_{k} \alpha_{jk}^{\,c}\right)^{1/p'} \Big(\sum_j \|f_j\|_p^p\Big)^{1/p}, \] where $c=2$, $1/p+1/p'=1$, and $\alpha_{jk}=\sqrt{\frac{\|f_{j}f_{k}\|_{p/2}}{\|f_{j}\|_{p}\|f_{k}\|_{p}}}$. In the first part of this paper we construct a counterexample showing that this inequality fails for every $p>2$. We then prove that if an estimate of the above form holds, the exponent must satisfy $c\le p'$. Finally, at the critical exponent $c=p'$, we establish the inequality for all integer values $p\ge 2$.
In the second part of the paper we obtain a sharp three-function bound \[ \Big\|\sum_{j=1}^{3} f_j\Big\|_p \ \le\ \left(1+2\Gamma^{c(p)}\right)^{1/p'} \Big(\sum_{j=1}^{3} \|f_j\|_p^p\Big)^{1/p}, \] where $p \geq 3$, $c(p) = \frac{2\ln(2)}{(p-2)\ln(3)+2\ln(2)}$ and $\Gamma=\Gamma(f_1,f_2,f_3)\in[0,1]$ quantifies the degree of orthogonality among $f_1,f_2,f_3$. The exponent $c(p)$ is optimal, and improves upon the power $r(p) = \frac{6}{5p-4}$ obtained previously by Carlen, Frank, and Lieb. Some intermediate lemmas and inequalities appearing in this work were explored with the assistance of the large language model Grok.
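
As a quick numerical check of the two exponents quoted above, the sketch below evaluates $c(p)$ and $r(p)$ and the resulting constant $(1+2\Gamma^{c})^{1/p'}$. The function names are hypothetical, and the comparison assumes the earlier bound has the same form with $r(p)$ in place of $c(p)$. Since $\Gamma\in[0,1]$, a larger exponent gives a smaller constant.

```python
import math


def c_exponent(p):
    """Exponent c(p) = 2 ln 2 / ((p - 2) ln 3 + 2 ln 2) from the abstract."""
    return 2 * math.log(2) / ((p - 2) * math.log(3) + 2 * math.log(2))


def r_exponent(p):
    """Earlier exponent r(p) = 6 / (5p - 4) quoted in the abstract."""
    return 6 / (5 * p - 4)


def bound_factor(gamma, exponent, p):
    """Constant (1 + 2 * gamma**exponent)**(1/p') with 1/p + 1/p' = 1."""
    p_conj = p / (p - 1)
    return (1 + 2 * gamma ** exponent) ** (1 / p_conj)


if __name__ == "__main__":
    for p in (3, 4, 6, 10):
        c, r = c_exponent(p), r_exponent(p)
        print(p, round(c, 4), round(r, 4),
              round(bound_factor(0.5, c, p), 4),
              round(bound_factor(0.5, r, p), 4))
```

At $p=3$ this gives $c(3)\approx 0.558$ against $r(3)=6/11\approx 0.545$; both expressions equal 1 at $p=2$.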

[654]  arXiv:2605.05193 (cross-list from math.PR) [pdf, ps, other]
Title: Grokability in five inequalities
Subjects: Probability (math.PR); Artificial Intelligence (cs.AI); Analysis of PDEs (math.AP); Classical Analysis and ODEs (math.CA); Functional Analysis (math.FA)

In this note, we report five mathematical discoveries made in collaboration with Grok, all of which have been subsequently verified by the authors. These include an improved lower bound on the maximal Gaussian perimeter of convex sets in $\mathbb{R}^n$, sharper $L_2$-$L_1$ moment comparison inequalities on the Hamming cube $\{-1,1\}^n$, a strengthened autoconvolution inequality, improved asymptotic bounds on the size of the largest $g$-Sidon sets in $\{1,\dots,n\}$, and an optimal balanced Szarek's inequality.

[655]  arXiv:2605.05198 (cross-list from math.OC) [pdf, ps, other]
Title: S-LCG: Structured Linear Congruential Generator-Based Deterministic Algorithm for Search and Optimization
Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE)

This study presents a novel deterministic optimization algorithm based on a special variant of the Linear Congruential Generator (LCG). While conventional algorithms generally operate within the search space, the introduced technique follows a two-level architecture: an external loop adaptively balances exploration and exploitation, while an internal loop evaluates solutions. The approach is motivated by the intrinsic structure of the generator, hence the name Structured Linear Congruential Generator (S-LCG), and has the following characteristics: 1) a memoryless scheme, which ensures non-overlapping sequences based on distinct seeds, thus ensuring no evaluation redundancy; 2) bit splitting representation, which converts LCG states into multi-dimensional points to overcome the Marsaglia lattice effect; 3) adaptive exploration-exploitation of the generator space, which leads to implicit optimization of the surrogate smooth objective function; and 4) constant information gathering speed to avoid the problem of premature convergence. Extensive testing on 26 benchmark functions across dimensions d = 2 to 30 demonstrates that S-LCG comes within 1% of the global optimum in 83.3% of 138 cases (100% at d = 2, 81.2% at d = 30), while the nearest competitor, GA, achieved 75.4%. Statistical validation shows that S-LCG outperforms eight cutting-edge binary algorithms. Furthermore, its practical value is confirmed by validation on three constrained engineering design problems. In the end, S-LCG offers an optimization framework that is strictly reproducible and requires only one sensitive parameter to be tuned.
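
For readers unfamiliar with the building blocks named above, the following is a minimal sketch of a plain LCG and one possible way to split a state's bits into a multi-dimensional point. It is not the S-LCG algorithm: the multiplier and increment are the widely used Numerical Recipes constants for modulus $2^{32}$, and `split_bits` is a hypothetical illustration of bit splitting, not the paper's scheme.

```python
A = 1664525        # multiplier (Numerical Recipes constants, assumed here)
C = 1013904223     # increment
M = 2 ** 32        # modulus


def lcg_next(state):
    """One LCG step: x_{n+1} = (A * x_n + C) mod M."""
    return (A * state + C) % M


def split_bits(state, dims, total_bits=32):
    """Cut the state into `dims` equal bit fields and scale each to [0, 1).

    Hypothetical illustration of bit splitting; the paper's mapping may differ.
    """
    bits_per = total_bits // dims
    mask = (1 << bits_per) - 1
    return [((state >> (i * bits_per)) & mask) / (1 << bits_per)
            for i in range(dims)]


def points(seed, count, dims):
    """Deterministic sequence of `count` points in [0, 1)^dims from one seed."""
    state = seed
    out = []
    for _ in range(count):
        state = lcg_next(state)
        out.append(split_bits(state, dims))
    return out
```

Because the sequence depends only on the seed, `points(seed, count, dims)` returns the same list on every run, which is the kind of strict reproducibility the abstract emphasizes. Note that the low-order bits of a power-of-two-modulus LCG have short periods, so a practical scheme would need to treat them with care.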

Replacements for Thu, 7 May 26

[656]  arXiv:1405.0033 (replaced) [pdf, ps, other]
Title: Syntax and Semantics of Linear Dependent Types
Authors: Matthijs Vákár
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL); Category Theory (math.CT)
[657]  arXiv:1501.05016 (replaced) [pdf, ps, other]
Title: A Categorical Semantics for Linear Logical Frameworks
Authors: Matthijs Vákár
Comments: Based on the technical report arXiv:1405.0033. To appear in the proceedings of FoSSaCS 2015, in the Advanced Research in Computing and Software Science (ARCoSS) subline of Springer's Lecture Notes in Computer Science series
Subjects: Logic in Computer Science (cs.LO)
[658]  arXiv:1604.01713 (replaced) [pdf, ps, other]
Title: A block Recycled GMRES method with investigations into aspects of solver performance
Comments: 37 pages, 26 pages of manuscript text, 16 figures, 1 table, Temple University Research Report 16-04-04
Subjects: Numerical Analysis (math.NA)
[659]  arXiv:2106.05513 (replaced) [pdf, ps, other]
Title: Deterministic Mincut in Almost-Linear Time
Authors: Jason Li
Comments: STOC 2021, 31 pages. Fix technical error in Theorem 1.5 resulting in an $\epsilon^{-7}$ term instead of $\epsilon^{-4}$. Also fix formatting throughout the paper
Subjects: Data Structures and Algorithms (cs.DS)
[660]  arXiv:2209.15111 (replaced) [pdf, ps, other]
Title: Quantifying Harm
Comments: Preprint: under submission
Subjects: Artificial Intelligence (cs.AI)
[661]  arXiv:2302.03167 (replaced) [pdf, ps, other]
Title: Algebraic Semantics of Datalog with Equality
Subjects: Logic in Computer Science (cs.LO)
[662]  arXiv:2307.03017 (replaced) [pdf, ps, other]
Title: RealLiFe: Real-Time Light Field Reconstruction via Hierarchical Sparse Gradient Descent
Comments: Accepted by IEEE TPAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[663]  arXiv:2309.03810 (replaced) [pdf, ps, other]
Title: Three Hardness Results for Graph Similarity Problems
Subjects: Discrete Mathematics (cs.DM); Computational Complexity (cs.CC)
[664]  arXiv:2312.06423 (replaced) [pdf, ps, other]
Title: MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks
Comments: 18 pages; Accepted by IEEE TDSC
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[665]  arXiv:2402.08446 (replaced) [pdf, ps, other]
Title: Inevitability of Polarization in Geometric Opinion Exchange
Subjects: Social and Information Networks (cs.SI); Theoretical Economics (econ.TH)
[666]  arXiv:2404.06230 (replaced) [pdf, ps, other]
Title: Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning
Comments: Accepted to IEEE European Symposium on Security and Privacy 2026
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[667]  arXiv:2404.13649 (replaced) [pdf, ps, other]
Title: Distributional Principal Autoencoders
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[668]  arXiv:2406.07293 (replaced) [pdf, ps, other]
Title: Automated versus Human Engagement: Mapping Cognitive Bias Triggers in Online Discourse
Subjects: Social and Information Networks (cs.SI)
[669]  arXiv:2407.04924 (replaced) [pdf, ps, other]
Title: Symmetric Linear Arc Monadic Datalog and Gadget Reductions
Subjects: Rings and Algebras (math.RA); Computational Complexity (cs.CC); Logic in Computer Science (cs.LO)
[670]  arXiv:2407.08587 (replaced) [pdf, ps, other]
Title: From AoI to QVAoI: Query-Based Semantics-Aware Scheduling for Energy-Harvesting IoT Systems
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
[671]  arXiv:2407.14861 (replaced) [pdf, ps, other]
Title: Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes
Comments: ECML PKDD 2024, 18 pages, 2 figures, 5 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[672]  arXiv:2408.09929 (replaced) [pdf, ps, other]
Title: Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise
Comments: Accepted by ICML 2026
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[673]  arXiv:2409.09183 (replaced) [pdf, ps, other]
Title: Quantum-inspired Reinforcement Learning for Synthesizable Drug Design
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
[674]  arXiv:2409.17596 (replaced) [pdf, ps, other]
Title: Subjective and Objective Quality-of-Experience Evaluation Study for Live Video Streaming
Comments: 17 pages, 8 figures
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
[675]  arXiv:2410.00206 (replaced) [pdf, ps, other]
Title: A Unified FPT Framework for Crossing Number Problems
Comments: short version at ESA 2025
Subjects: Computational Geometry (cs.CG); Combinatorics (math.CO)
[676]  arXiv:2410.23222 (replaced) [pdf, ps, other]
Title: Dataset-Driven Channel Masks in Transformers for Multivariate Time Series
Comments: ICASSP 2026. Preliminary version: NeurIPS Workshop on Time Series in the Age of Large Models 2024 (Oral presentation)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[677]  arXiv:2411.09764 (replaced) [pdf, ps, other]
Title: ModelPredictiveControl.jl: advanced process control made easy in Julia
Comments: 11 pages, 12 figures, 1 table
Subjects: Systems and Control (eess.SY)
[678]  arXiv:2411.19300 (replaced) [pdf, ps, other]
Title: Fast Switching in Mixed-Integer Model Predictive Control
Comments: This preprint was revised based on the feedback from the reviewers and resubmitted to the IEEE. The previous version has been conditionally accepted for publication
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[679]  arXiv:2412.03077 (replaced) [pdf, ps, other]
Title: RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos
Comments: 29 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[680]  arXiv:2412.08893 (replaced) [pdf, ps, other]
Title: Optimal Control with Natural Images: Efficient Reinforcement Learning using Overcomplete Sparse Codes
Authors: Peter N. Loxley
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[681]  arXiv:2501.01556 (replaced) [pdf, ps, other]
Title: The Geometry of Statistical Data and Information: A Large Deviation Perspective
Subjects: Information Theory (cs.IT)
[682]  arXiv:2501.03717 (replaced) [pdf, ps, other]
Title: Materialist: Physically Based Editing Using Single-Image Inverse Rendering
Comments: More Comprehensive IJCV Camera-Ready Version. Project website: this https URL
Journal-ref: International Journal of Computer Vision (IJCV), 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[683]  arXiv:2501.03972 (replaced) [pdf, ps, other]
Title: MAD-BA: 3D LiDAR Bundle Adjustment -- from Uncertainty Modelling to Structure Optimization
Comments: 8 pages, 7 figures. This work has been accepted to IEEE Robotics and Automation Letters (RA-L)
Subjects: Robotics (cs.RO)
[684]  arXiv:2501.12549 (replaced) [pdf, ps, other]
Title: An O(log n)-Approximation Algorithm for (p,q)-Flexible Graph Connectivity via Independent Rounding
Comments: 23 pages. A preliminary version appeared in the proceedings of the 26th International Conference on Integer Programming and Combinatorial Optimization (IPCO 2025)
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[685]  arXiv:2501.14171 (replaced) [pdf, ps, other]
Title: Fully Guided Neural Schrödinger bridge for Brain MR image synthesis
Comments: Single column, 33 pages, 6 figures, revised_v1
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[686]  arXiv:2501.14576 (replaced) [pdf, ps, other]
Title: Dynamic Modeling and Control of Multi-Stack Alkaline Water Electrolysis Systems with Shared Gas Separators and Lye Circulation: An Experiment-Based Study
Authors: Yiwei Qiu (1), Jiatong Li (1), Yangjun Zeng (1), Yi Zhou (1), Shi Chen (1), Xiaoyan Qiu (1), Buxiang Zhou (1), Ge He (2), Xu Ji (2), Wenying Li (3) ((1) College of Electrical Engineering, Sichuan University, (2) School of Chemical Engineering, Sichuan University, (3) Sichuan Tsinghua Energy Internet Research Institute)
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[687]  arXiv:2501.16039 (replaced) [pdf, ps, other]
Title: Complexity of Constructing Minimal Faithful Permutation Representations for Fitting-free Groups
Comments: In [v3], we computed the minimal faithful permutation degree. For this new version [v4], we also compute a minimal faithful permutation representation. Version [v3] corresponds to our FCT 2025 paper
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Group Theory (math.GR)
[688]  arXiv:2502.02625 (replaced) [pdf, ps, other]
Title: Bayesian Parameter Shift Rule in Variational Quantum Eigensolvers
Comments: 8 pages, 5 figures, 14th International Conference on Learning Representations (ICLR 2026)
Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph)
[689]  arXiv:2502.04538 (replaced) [pdf, ps, other]
Title: "Security vs. Interoperability" Arguments: An Analytical Framework
Subjects: Cryptography and Security (cs.CR)
[690]  arXiv:2502.05591 (replaced) [pdf, ps, other]
Title: Round and Resilience-Optimal Approximate Agreement on Trees and Block Graphs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[691]  arXiv:2502.06621 (replaced) [pdf, ps, other]
Title: Three Fundamental Questions in Modern Infinite-Domain Constraint Satisfaction
Comments: 44 pages
Subjects: Logic (math.LO); Logic in Computer Science (cs.LO)
[692]  arXiv:2502.12370 (replaced) [pdf, ps, other]
Title: Positional Encoding in Transformer-Based Time Series Models: A Survey
Subjects: Machine Learning (cs.LG)
[693]  arXiv:2502.13498 (replaced) [pdf, ps, other]
Title: Collision-Aware Object-Goal Visual Navigation via Two-Stage Deep Reinforcement Learning
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[694]  arXiv:2503.03565 (replaced) [pdf, ps, other]
Title: Efficiency of Parallel and Restart Exploration Strategies in Model Free Stochastic Simulations
Subjects: Probability (math.PR); Machine Learning (cs.LG)
[695]  arXiv:2503.03936 (replaced) [pdf, ps, other]
Title: Construction and Decoding of Quantum Margulis Codes
Comments: 15 pages, 11 figures. Part of this work was presented at the 60th Annual Allerton Conference on Communication, Control and Computing, in Urbana, IL, USA. Accepted for publication in Quantum
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
[696]  arXiv:2504.05407 (replaced) [pdf, ps, other]
Title: RouteFormer: A Transformer-Based Routing Framework for Autonomous Vehicles
Comments: 10 pages, the title and abstract are modified after peer review process to better reflect the scope of the paper. More validation tests were added as well
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[697]  arXiv:2504.09542 (replaced) [pdf, ps, other]
Title: Solar-charge your car: EV charging can be aligned with renewables by providing pro-environmental information on a smartboard
Subjects: Emerging Technologies (cs.ET)
[698]  arXiv:2504.10808 (replaced) [pdf, ps, other]
Title: Privacy-Preserving Empathy Detection in Video Interactions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[699]  arXiv:2504.11101 (replaced) [pdf, ps, other]
Title: Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[700]  arXiv:2505.00020 (replaced) [pdf, ps, other]
Title: Beyond Public Access in LLM Pre-Training Data
Comments: 29 pages, 4 figures. Revised based on peer review comments. Added bootstrapped 95% CIs for all AUROC scores and z-tests comparing public vs. non-public recognition (new Table 3). Qualified claims where differences are not statistically significant at book level. Updated contact information
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[701]  arXiv:2505.00359 (replaced) [src]
Title: TNStream: Applying Tightest Neighbors to Micro-Clusters to Define Multi-Density Clusters in Streaming Data
Comments: This paper is withdrawn due to issues identified after submission. The complexity analysis in Section 6.4 is affected by inappropriate experimental parameter settings, leading to potentially inaccurate results. In addition, the description of the Skeleton Set in Section 4.2 is not sufficiently rigorous. A thorough revision is required
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[702]  arXiv:2505.00753 (replaced) [pdf, ps, other]
Title: LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey
Comments: Accepted by ACL 2026 (Findings). Paper lists and resources are available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[703]  arXiv:2505.00835 (replaced) [pdf, ps, other]
Title: Multi-site modelling and reconstruction of past extreme skew surges along the French Atlantic coast
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Machine Learning (stat.ML)
[704]  arXiv:2505.03597 (replaced) [pdf, ps, other]
Title: Fixed-Length Dense Fingerprint Representation with Alignment and Robust Enhancement
Comments: Accepted by IEEE Transactions on Information Forensics and Security (TIFS) 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[705]  arXiv:2505.05880 (replaced) [pdf, ps, other]
Title: Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[706]  arXiv:2505.06518 (replaced) [pdf, ps, other]
Title: Provable Distributional Value Iteration under Partial Observability
Comments: Accepted at the Reinforcement Learning Conference (RLC) 2026. Code available at: this https URL
Subjects: Artificial Intelligence (cs.AI)
[707]  arXiv:2505.09353 (replaced) [pdf, ps, other]
Title: Deterministic Suffix-reading Automata
Subjects: Formal Languages and Automata Theory (cs.FL)
[708]  arXiv:2505.11815 (replaced) [pdf, ps, other]
Title: UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[709]  arXiv:2505.16527 (replaced) [pdf, ps, other]
Title: Joint Relational Database Generation via Graph-Conditional Diffusion Models
Comments: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Subjects: Machine Learning (cs.LG)
[710]  arXiv:2505.18118 (replaced) [pdf, ps, other]
Title: Scalable Policy Maximization Under Network Interference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[711]  arXiv:2505.18244 (replaced) [pdf, ps, other]
Title: Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[712]  arXiv:2505.18466 (replaced) [pdf, ps, other]
Title: Purdah and Patriarchy: Evaluating and Mitigating South Asian Biases in Open-Ended Multilingual LLM Generations
Comments: Accepted to TrustNLP at ACL (Association for Computational Linguistics) 2026
Subjects: Computation and Language (cs.CL)
[713]  arXiv:2505.19625 (replaced) [pdf, ps, other]
Title: Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[714]  arXiv:2505.19629 (replaced) [pdf, ps, other]
Title: Software Engineering for Self-Adaptive Robotics: A Research Agenda
Subjects: Software Engineering (cs.SE); Robotics (cs.RO)
[715]  arXiv:2505.21756 (replaced) [pdf, ps, other]
Title: Online Voting using Point to MultiPoint Quantum Key Distribution
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Emerging Technologies (cs.ET)
[716]  arXiv:2505.24187 (replaced) [pdf, ps, other]
Title: Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models
Subjects: Computation and Language (cs.CL)
[717]  arXiv:2506.01249 (replaced) [pdf, ps, other]
Title: SysLLMatic: Large Language Models are Software System Optimizers
Subjects: Software Engineering (cs.SE); Performance (cs.PF)
[718]  arXiv:2506.06856 (replaced) [pdf, ps, other]
Title: Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[719]  arXiv:2506.07372 (replaced) [pdf, ps, other]
Title: Enhanced Consistency Bi-directional GAN (CBiGAN) for Malware Anomaly Detection
Subjects: Cryptography and Security (cs.CR)
[720]  arXiv:2506.07548 (replaced) [pdf, ps, other]
Title: Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage
Comments: 23 pages; 15 figures
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
[721]  arXiv:2506.10843 (replaced) [pdf, ps, other]
Title: Diverse Committees with Incomplete or Inaccurate Approval Ballots
Comments: 19 pages
Subjects: Computer Science and Game Theory (cs.GT)
[722]  arXiv:2506.14432 (replaced) [pdf, ps, other]
Title: A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[723]  arXiv:2506.16240 (replaced) [pdf, ps, other]
Title: Microcanonical simulated annealing: Massively parallel Monte Carlo simulations with sporadic random-number generation
Comments: 17 pages, 6 figures, 4 tables. Version accepted for publication in Computer Physics Communications
Journal-ref: Comp. Phys. Comm. 325, 110182 (2026)
Subjects: Statistical Mechanics (cond-mat.stat-mech); Disordered Systems and Neural Networks (cond-mat.dis-nn); Hardware Architecture (cs.AR); Computational Physics (physics.comp-ph)
[724]  arXiv:2506.20616 (replaced) [pdf, ps, other]
Title: Shape2Animal: Creative Animal Generation from Natural Silhouettes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[725]  arXiv:2506.20911 (replaced) [pdf, ps, other]
Title: FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
Comments: The Fourteenth International Conference on Learning Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[726]  arXiv:2506.21654 (replaced) [pdf, ps, other]
Title: Experience converting a large mathematical software package written in C++ to C++20 modules
Subjects: Software Engineering (cs.SE); Mathematical Software (cs.MS)
[727]  arXiv:2506.22226 (replaced) [pdf, ps, other]
Title: Cardiovascular disease classification using radiomics and geometric features from cardiac CT
Comments: Accepted at STACOM 2025 workshop held in conjunction with MICCAI 2025 conference
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[728]  arXiv:2507.10467 (replaced) [pdf, ps, other]
Title: Colorful Minors
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[729]  arXiv:2507.12768 (replaced) [pdf, ps, other]
Title: AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[730]  arXiv:2507.14874 (replaced) [pdf, ps, other]
Title: The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs
Comments: 23 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[731]  arXiv:2507.15736 (replaced) [pdf, ps, other]
Title: IDRBench: Understanding the Capability of Large Language Models on Interdisciplinary Research
Subjects: Computation and Language (cs.CL)
[732]  arXiv:2507.16594 (replaced) [pdf, ps, other]
Title: Optimizing Split Learning Latency in TinyML-Based IoT Systems
Comments: This paper is uploaded here for the research community; it is for non-commercial purposes
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
[733]  arXiv:2507.18798 (replaced) [pdf, ps, other]
Title: Higher-order Kripke models for intuitionistic and non-classical modal logics
Subjects: Logic in Computer Science (cs.LO)
[734]  arXiv:2507.20021 (replaced) [pdf, ps, other]
Title: When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation
Comments: Updated version with additional ablations, clarifications, and code release
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[735]  arXiv:2507.23387 (replaced) [pdf, ps, other]
Title: SGEMM-cube: Precision-Recovery FP32 GEMM Approximation on Ascend NPUs with FP16 Matrix Engines
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[736]  arXiv:2507.23501 (replaced) [pdf, ps, other]
Title: Adaptive Ensemble Aggregation for Actor-Critics
Comments: updated theory; experiments; author list
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[737]  arXiv:2508.00091 (replaced) [pdf, ps, other]
Title: Provable Non-Convex Euclidean Distance Matrix Completion: Geometry, Reconstruction, and Robustness
Comments: 52 pages, 7 figures. In v1, the proof of Lemma 5.3 (Appendix B.1) did not include an argument required to control the bound uniformly over all Y; a standard net argument would therefore yield sub-optimal bounds. In v2, we address this issue by using matrix decoupling. We have also edited the manuscript throughout for clarity
Subjects: Optimization and Control (math.OC); Computational Geometry (cs.CG); Machine Learning (cs.LG)
[738]  arXiv:2508.02115 (replaced) [pdf, ps, other]
Title: Coward: Collision-based OOD Watermarking for Practical Proactive Federated Backdoor Detection
Comments: Currently under review. 35-page main body. 10-page supplementary
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[739]  arXiv:2508.04204 (replaced) [pdf, ps, other]
Title: ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[740]  arXiv:2508.05372 (replaced) [pdf, ps, other]
Title: The domain-of-dependence stabilization for cut-cell meshes is fully discretely stable
Comments: 31 pages, 11 figures, 4 tables. For the associated reproducibility repository, see this https URL
Journal-ref: The SMAI Journal of computational mathematics, Volume 12 (2026), pp. 187-218
Subjects: Numerical Analysis (math.NA)
[741]  arXiv:2508.07240 (replaced) [pdf, ps, other]
Title: PureSample: Neural Materials Learned by Sampling Microgeometry
Journal-ref: ACM SIGGRAPH Conference Papers, 2026
Subjects: Graphics (cs.GR)
[742]  arXiv:2508.08289 (replaced) [pdf, ps, other]
Title: Understanding Transformers through the Lens of Pavlovian Conditioning
Authors: Mu Qiao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
[743]  arXiv:2508.11196 (replaced) [pdf, ps, other]
Title: UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning
Authors: Jiajin Guan (1), Haibo Mei (2), Bonan Zhang (1), Dan Liu (1), Yuanshuang Fu (1), Yue Zhang (2) ((1) Research Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu, China, (2) School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu, China)
Comments: The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: 10.1007/s11704-026-52082-z
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[744]  arXiv:2508.19035 (replaced) [pdf, ps, other]
Title: Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction
Comments: Accepted by ICML 2026
Subjects: Artificial Intelligence (cs.AI)
[745]  arXiv:2508.19145 (replaced) [pdf, ps, other]
Title: Echoes of the Past: A Unified Perspective on Fading memory and Echo States
Journal-ref: Neural Computation, vol 38(5), 2026
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Dynamical Systems (math.DS)
[746]  arXiv:2508.19651 (replaced) [pdf, ps, other]
Title: Scalable Object Detection in the Car Interior With Vision Foundation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[747]  arXiv:2509.04347 (replaced) [pdf, ps, other]
Title: When Darwin met Ianus: dichotomies of expressivity
Comments: added pseudo-loop lemma for phylogeny constraint languages; 36 pages
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO); Rings and Algebras (math.RA)
[748]  arXiv:2509.10784 (replaced) [pdf, ps, other]
Title: Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning
Comments: 19 pages, 6 figures, 8 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[749]  arXiv:2509.14274 (replaced) [pdf, ps, other]
Title: Discovering New Theorems via LLMs with In-Context Proof Learning in Lean
Comments: 12 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
[750]  arXiv:2509.14640 (replaced) [pdf, ps, other]
Title: DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers
Subjects: Machine Learning (cs.LG)
[751]  arXiv:2509.17244 (replaced) [pdf, ps, other]
Title: Scalable Multi Agent Diffusion Policies for Coverage Control
Subjects: Robotics (cs.RO)
[752]  arXiv:2509.24943 (replaced) [pdf, ps, other]
Title: Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[753]  arXiv:2509.24996 (replaced) [pdf, ps, other]
Title: Addressing Methodological Sensitivity in MCDM with a Systematic Pipeline Approach to Data Transformation Sensitivity Analysis
Subjects: Optimization and Control (math.OC); Software Engineering (cs.SE)
[754]  arXiv:2509.25041 (replaced) [pdf, ps, other]
Title: GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[755]  arXiv:2510.00468 (replaced) [pdf, ps, other]
Title: Feature Identification via the Empirical NTK
Authors: Jennifer Lin
Comments: 23 pages, 4 figures. v2: references and expanded discussion in Appendix B added. v3: Transformer case study and more appendices added. v4: Language model case study added
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[756]  arXiv:2510.01641 (replaced) [pdf, ps, other]
Title: FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring
Comments: Accepted to ICLR 2026. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[757]  arXiv:2510.02864 (replaced) [pdf, ps, other]
Title: Forensic Similarity for Speech Deepfakes
Comments: Accepted @ ACM IH&MMSec 2026
Subjects: Sound (cs.SD)
[758]  arXiv:2510.08431 (replaced) [pdf, ps, other]
Title: Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
Comments: ICLR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[759]  arXiv:2510.09881 (replaced) [pdf, ps, other]
Title: LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates
Comments: Accepted to CVPR 2026 Findings. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[760]  arXiv:2510.09885 (replaced) [pdf, ps, other]
Title: Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[761]  arXiv:2510.10674 (replaced) [pdf, ps, other]
Title: Soft-Decoding Reverse Reconciliation in Discrete-Modulation CV-QKD
Comments: Accepted for publication to IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)
[762]  arXiv:2510.10713 (replaced) [pdf, ps, other]
Title: Deep Learning in Astrophysics
Authors: Yuan-Sen Ting
Comments: Published in Annual Review of Astronomy and Astrophysics, Volume 64. This is the authors' version. The published version is available at this https URL
Journal-ref: Annual Review of Astronomy and Astrophysics, 2026, Vol. 64
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Earth and Planetary Astrophysics (astro-ph.EP); Astrophysics of Galaxies (astro-ph.GA); High Energy Astrophysical Phenomena (astro-ph.HE); Artificial Intelligence (cs.AI)
[763]  arXiv:2510.17903 (replaced) [pdf, ps, other]
Title: Learning Time-Varying Graphs from Incomplete Graph Signals
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[764]  arXiv:2510.18518 (replaced) [pdf, ps, other]
Title: Efficient Model-Based Reinforcement Learning for Robot Control via Online Optimization
Subjects: Robotics (cs.RO)
[765]  arXiv:2510.21033 (replaced) [pdf, ps, other]
Title: Iso-Riemannian Optimization on Learned Data Manifolds
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Differential Geometry (math.DG)
[766]  arXiv:2510.24215 (replaced) [pdf, ps, other]
Title: What Can Be Recovered Under Sparse Adversarial Corruption? Assumption-Free Theory for Linear Measurements
Comments: 18 pages, 3 figures; preprint submitted to IEEE Trans. Inf. Theory
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
[767]  arXiv:2510.25224 (replaced) [pdf, ps, other]
Title: ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation
Subjects: Computation and Language (cs.CL)
[768]  arXiv:2511.01553 (replaced) [pdf, ps, other]
Title: Online Continual Learning on Intel Loihi 2 via a Co-designed Spiking Neural Network
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)
[769]  arXiv:2511.03475 (replaced) [pdf, ps, other]
Title: ContextPilot: Fast Long-Context Inference via Context Reuse
Subjects: Machine Learning (cs.LG)
[770]  arXiv:2511.05080 (replaced) [pdf, ps, other]
Title: Making Knowledge Accessible: Divergent Readability-Accuracy Strategies of Mistral and QWen in Biomedical Text Simplification
Comments: NLP4DH 2026 Open review page at this https URL . In a nutshell, to achieve better generalizability, we need to soften the language and/or expand the number of models and document classes under review. As it is right now, the analysis only speaks to biomedical text and two models (QWen 2.5 and Mistral-small). We acknowledge this as a limitation and area for future work
Subjects: Computation and Language (cs.CL)
[771]  arXiv:2511.06371 (replaced) [pdf, ps, other]
Title: Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning
Subjects: Robotics (cs.RO)
[772]  arXiv:2511.06452 (replaced) [pdf, ps, other]
Title: MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
Comments: AAAI 2026
Subjects: Machine Learning (cs.LG)
[773]  arXiv:2511.06754 (replaced) [pdf, ps, other]
Title: SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation
Comments: Accepted at ICRA 2026
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[774]  arXiv:2511.08195 (replaced) [pdf, ps, other]
Title: UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization
Comments: 27 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[775]  arXiv:2511.09537 (replaced) [pdf, ps, other]
Title: NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in Low-Resource Languages
Subjects: Machine Learning (cs.LG)
[776]  arXiv:2511.13587 (replaced) [pdf, ps, other]
Title: VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping
Comments: CVPR 2026 Main
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[777]  arXiv:2511.16424 (replaced) [pdf, ps, other]
Title: Second-Order MPC-Based Distributed Q-Learning
Comments: 6 pages, 2 figures, published in IFAC World Congress 2026
Subjects: Systems and Control (eess.SY)
[778]  arXiv:2511.16567 (replaced) [pdf, ps, other]
Title: POMA-3D: The Point Map Way to 3D Scene Understanding
Comments: 11 pages, 6 tables, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[779]  arXiv:2511.19513 (replaced) [pdf, ps, other]
Title: Row-stochastic matrices can provably outperform doubly stochastic matrices in decentralized learning
Comments: 41 pages, 38 figures
Subjects: Machine Learning (cs.LG)
[780]  arXiv:2511.20179 (replaced) [pdf, ps, other]
Title: Human-computer interactions predict mental health
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[781]  arXiv:2511.21343 (replaced) [pdf, ps, other]
Title: Model Predictive Control and Moving Horizon Estimation using Statistically Weighted Data-Based Ensemble Models
Comments: 6 pages, 4 figures, published in ECC 2026
Subjects: Systems and Control (eess.SY)
[782]  arXiv:2511.21641 (replaced) [pdf, ps, other]
Title: Model-free practical PI-Lead control design by ultimate sensitivity principle
Authors: Michael Ruderman
Comments: 6 pages, 10 figures
Subjects: Systems and Control (eess.SY)
[783]  arXiv:2512.01594 (replaced) [pdf, ps, other]
Title: CAEC: Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA
Subjects: Cryptography and Security (cs.CR); Operating Systems (cs.OS)
[784]  arXiv:2512.03465 (replaced) [pdf, ps, other]
Title: Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits
Authors: Robert Dilworth
Comments: 20 pages, 8 figures, 2 tables
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Information Retrieval (cs.IR)
[785]  arXiv:2512.03847 (replaced) [pdf, ps, other]
Title: DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[786]  arXiv:2512.04388 (replaced) [pdf, ps, other]
Title: Learning to Orchestrate Agents in Natural Language with the Conductor
Comments: To appear at the 14th International Conference on Learning Representations (ICLR 2026)
Subjects: Machine Learning (cs.LG)
[787]  arXiv:2512.04614 (replaced) [pdf, ps, other]
Title: On Tight FPT Time Approximation Algorithms for k-Clustering Problems
Comments: 35 pages, 1 figure; accepted to ICALP 2026
Subjects: Data Structures and Algorithms (cs.DS)
[788]  arXiv:2512.05226 (replaced) [pdf, ps, other]
Title: Variance Matters: Improving Domain Adaptation via Stratified Sampling
Comments: Published in TMLR 05/26
Journal-ref: Transactions on Machine Learning Research, 2835-8856 (2026)
Subjects: Machine Learning (cs.LG)
[789]  arXiv:2512.05844 (replaced) [pdf, ps, other]
Title: NEAT: Neighborhood-Guided, Efficient, Autoregressive Set Transformer for 3D Molecular Generation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[790]  arXiv:2512.06393 (replaced) [pdf, ps, other]
Title: Conflict-Aware Fusion: Mitigating Logic Inertia in Large Language Models via Structured Cognitive Priors
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
[791]  arXiv:2512.08265 (replaced) [pdf, ps, other]
Title: Theoretical Studies of Sub-THz Active Split-Ring Resonators for Near-Field Imaging
Comments: IEEE Transactions on Circuits and Systems I: Regular Papers
Subjects: Systems and Control (eess.SY)
[792]  arXiv:2512.08681 (replaced) [pdf, ps, other]
Title: Resolvable Triple Arrays
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
[793]  arXiv:2512.08856 (replaced) [pdf, ps, other]
Title: Can the GPC standard eliminate consent banners in the EU?
Journal-ref: Computer Law & Security Review, 61, 106332 (2026)
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR)
[794]  arXiv:2512.12109 (replaced) [pdf, ps, other]
Title: A Neuro-Symbolic Framework for Accountability in Public-Sector AI
Comments: Accepted at FAccT 2026 (The 2026 ACM Conference on Fairness, Accountability, and Transparency), June 25-28, Montreal, Canada
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
[795]  arXiv:2512.14954 (replaced) [pdf, ps, other]
Title: Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[796]  arXiv:2512.15146 (replaced) [pdf, ps, other]
Title: Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning
Comments: Accepted to ACL 2026 Main Conference. 15 pages, 9 figures, 5 tables
Subjects: Computation and Language (cs.CL)
[797]  arXiv:2512.20056 (replaced) [pdf, ps, other]
Title: Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[798]  arXiv:2512.20773 (replaced) [pdf, ps, other]
Title: DIAL: Direct Iterative Adversarial Learning for Realistic Multi-Turn Dialogue Simulation
Subjects: Computation and Language (cs.CL)
[799]  arXiv:2512.22588 (replaced) [pdf, ps, other]
Title: Low-Latency Quasi-Static Modeling of UAV Tether Aerodynamics
Comments: Accepted at ICUAS2026
Subjects: Robotics (cs.RO)
[800]  arXiv:2512.22671 (replaced) [pdf, ps, other]
Title: Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2
Authors: Pere Martra
Comments: 23 pages, 5 figures, 9 tables. Code available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[801]  arXiv:2512.23864 (replaced) [pdf, ps, other]
Title: Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[802]  arXiv:2601.00020 (replaced) [pdf, ps, other]
Title: Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Systems and Control (eess.SY)
[803]  arXiv:2601.00264 (replaced) [pdf, ps, other]
Title: S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding
Comments: 18 pages (main text) + 6 pages (supplementary information), 7 figures (main text). Updated version submitted to Scientific Data. Dataset available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[804]  arXiv:2601.00285 (replaced) [pdf, ps, other]
Title: SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[805]  arXiv:2601.00376 (replaced) [pdf, ps, other]
Title: In Line with Context: Repository-Level Code Generation via Context Inlining
Comments: Accepted to FSE 2026
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[806]  arXiv:2601.03166 (replaced) [pdf, ps, other]
Title: Dynamic Hyperparameter Importance for Efficient Multi-Objective Optimization
Subjects: Machine Learning (cs.LG)
[807]  arXiv:2601.04080 (replaced) [pdf, ps, other]
Title: Craig-Lyndon Interpolation for the Logic of Here and There with a Variation of Mints' Sequent System
Subjects: Logic in Computer Science (cs.LO)
[808]  arXiv:2601.05166 (replaced) [pdf, ps, other]
Title: Inapproximability of Counting Permutation Patterns
Authors: Michal Opler
Subjects: Data Structures and Algorithms (cs.DS)
[809]  arXiv:2601.05983 (replaced) [pdf, ps, other]
Title: Age of Gossip With Cellular Drone Mobility
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI); Signal Processing (eess.SP)
[810]  arXiv:2601.06839 (replaced) [pdf, ps, other]
Title: PRISM: Color-Stratified Point Cloud Sampling
Comments: This work has been submitted to the 2026 International Conference on Pattern Recognition (ICPR) for possible publication
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[811]  arXiv:2601.07389 (replaced) [pdf, ps, other]
Title: On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
[812]  arXiv:2601.07704 (replaced) [pdf, ps, other]
Title: TMATDG: applying TDG methods to multiple scattering via T-matrix approximation
Comments: 13 pages, 6 figures
Subjects: Numerical Analysis (math.NA)
[813]  arXiv:2601.08623 (replaced) [pdf, ps, other]
Title: SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models
Comments: Code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[814]  arXiv:2601.09195 (replaced) [pdf, ps, other]
Title: ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[815]  arXiv:2601.11689 (replaced) [pdf, ps, other]
Title: Bridging Modalities: Joint Synthesis and Registration Framework for Aligning Diffusion MRI with T1-Weighted Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[816]  arXiv:2601.13908 (replaced) [pdf, ps, other]
Title: Improving the local solution of the DG predictor of the ADER-DG method for solving systems of ordinary differential equations and its applicability to systems of differential-algebraic equations
Authors: I.S. Popov
Comments: 41 pages, 11 figures, 5 tables
Subjects: Numerical Analysis (math.NA); Functional Analysis (math.FA); Applied Physics (physics.app-ph); Computational Physics (physics.comp-ph)
[817]  arXiv:2601.13994 (replaced) [pdf, ps, other]
Title: torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
[818]  arXiv:2601.14180 (replaced) [pdf, ps, other]
Title: Progressive $\mathcal{J}$-Invariant Self-supervised Learning for Low-Dose CT Denoising
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[819]  arXiv:2601.15486 (replaced) [pdf, ps, other]
Title: A Universal Large Language Model -- Drone Command and Control Interface
Subjects: Robotics (cs.RO)
[820]  arXiv:2601.19690 (replaced) [pdf, ps, other]
Title: DSVM-UNet : Enhancing VM-UNet with Dual Self-distillation for Medical Image Segmentation
Comments: 5 pages, 1 figure
Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[821]  arXiv:2601.20634 (replaced) [pdf, ps, other]
Title: A Scalable Multi-Task Model for Virtual Sensors
Comments: 22 pages in total, 17 figures
Subjects: Machine Learning (cs.LG)
[822]  arXiv:2601.21851 (replaced) [pdf, ps, other]
Title: Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models
Subjects: Machine Learning (cs.LG)
[823]  arXiv:2601.22118 (replaced) [pdf, ps, other]
Title: Defining Operational Conditions for Safety-Critical AI-Based Systems from Data
Subjects: Artificial Intelligence (cs.AI)
[824]  arXiv:2601.22725 (replaced) [pdf, ps, other]
Title: OpenVTON-Bench: A Large-Scale High-Resolution Benchmark for Controllable Virtual Try-On Evaluation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[825]  arXiv:2602.00305 (replaced) [pdf, ps, other]
Title: Syntax- and Compilation-Preserving Evasion of LLM Vulnerability Detectors
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[826]  arXiv:2602.00844 (replaced) [pdf, ps, other]
Title: Multivariate Time Series Data Imputation via Distributionally Robust Regularization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
[827]  arXiv:2602.00937 (replaced) [pdf, ps, other]
Title: CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining
Comments: Accepted to the Robotics: Science and Systems (RSS) 2026
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[828]  arXiv:2602.01187 (replaced) [pdf, ps, other]
Title: Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[829]  arXiv:2602.01486 (replaced) [pdf, ps, other]
Title: Multi-Scale Wavelet Transformers for Operator Learning of Dynamical Systems
Subjects: Machine Learning (cs.LG)
[830]  arXiv:2602.02282 (replaced) [pdf, ps, other]
Title: MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology
Comments: Accepted at Proceedings 43rd International Conference on Machine Learning, Seoul, South Korea
Journal-ref: Proceedings 43rd International Conference on Machine Learning 2026
Subjects: Machine Learning (cs.LG)
[831]  arXiv:2602.02315 (replaced) [pdf, ps, other]
Title: The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors
Subjects: Computation and Language (cs.CL)
[832]  arXiv:2602.02543 (replaced) [pdf, ps, other]
Title: Norm Anchors Make Model Edits Last
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[833]  arXiv:2602.02779 (replaced) [pdf, ps, other]
Title: Comparison of Trefftz-Based PINNs and Standard PINNs Focusing on Structure Preservation
Authors: Koji Koyamada
Subjects: Numerical Analysis (math.NA)
[834]  arXiv:2602.02924 (replaced) [pdf, ps, other]
Title: How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[835]  arXiv:2602.02958 (replaced) [pdf, ps, other]
Title: Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
Comments: Accepted by ICML 2026. 13 pages
Subjects: Machine Learning (cs.LG)
[836]  arXiv:2602.03204 (replaced) [pdf, ps, other]
Title: Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry
Subjects: Machine Learning (cs.LG)
[837]  arXiv:2602.03396 (replaced) [pdf, ps, other]
Title: Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective
Subjects: Computation and Language (cs.CL)
[838]  arXiv:2602.03452 (replaced) [pdf, ps, other]
Title: Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[839]  arXiv:2602.04129 (replaced) [pdf, ps, other]
Title: KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Multiagent Systems (cs.MA)
[840]  arXiv:2602.05890 (replaced) [pdf, ps, other]
Title: DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[841]  arXiv:2602.06810 (replaced) [pdf, ps, other]
Title: Calibrating Tabular Anomaly Detection via Optimal Transport
Subjects: Machine Learning (cs.LG)
[842]  arXiv:2602.06869 (replaced) [pdf, ps, other]
Title: Uncovering Cross-Objective Interference in Multi-Objective Alignment
Authors: Yining Lu, Meng Jiang
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[843]  arXiv:2602.07608 (replaced) [src]
Title: HistoMet: A Pan-Cancer Deep Learning Framework for Prognostic Prediction of Metastatic Progression and Site Tropism from Primary Tumor Histopathology
Comments: Withdrawn due to dataset issues identified
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[844]  arXiv:2602.08316 (replaced) [pdf, ps, other]
Title: SWE Context Bench: A Benchmark for Context Learning in Coding
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[845]  arXiv:2602.08457 (replaced) [pdf, ps, other]
Title: Hybrid Pooling with LLMs via Relevance Context Learning
Comments: SIGIR 2026
Subjects: Information Retrieval (cs.IR)
[846]  arXiv:2602.09130 (replaced) [pdf, ps, other]
Title: UniComp: A Unified Evaluation of Large Language Model Compression via Pruning, Quantization and Distillation
Comments: 18 pages, 5 figures, 18 tables
Subjects: Machine Learning (cs.LG)
[847]  arXiv:2602.09504 (replaced) [pdf, ps, other]
Title: Seeing the Goal, Missing the Truth: Human Accountability for AI Bias
Comments: 24 pages, 4 figures, 8 tables
Subjects: General Finance (q-fin.GN); Artificial Intelligence (cs.AI)
[848]  arXiv:2602.10043 (replaced) [pdf, ps, other]
Title: Cross-Dataset Linkage of Brain MRI using Image Similarity Measures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[849]  arXiv:2602.10144 (replaced) [pdf, ps, other]
Title: When LLMs get significantly worse: A statistical approach to detect model degradations
Comments: this https URL
Journal-ref: ICLR 2026
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[850]  arXiv:2602.10712 (replaced) [pdf, ps, other]
Title: Photons x Force: Differentiable Radiation Pressure Modeling
Comments: Camera-ready version. Accepted to ACM Transactions on Graphics 45(4). (SIGGRAPH 2026), article 82. 17 pages, 19 figures
Journal-ref: ACM Transactions on Graphics, Vol. 45, No. 4, Article 82 (July 2026)
Subjects: Graphics (cs.GR); Earth and Planetary Astrophysics (astro-ph.EP); Instrumentation and Methods for Astrophysics (astro-ph.IM)
[851]  arXiv:2602.12095 (replaced) [pdf, ps, other]
Title: Pack it in: Packing into Partially Filled Containers Through Contact
Comments: 8 pages, 5 figures
Subjects: Robotics (cs.RO)
[852]  arXiv:2602.12783 (replaced) [pdf, ps, other]
Title: SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
Comments: Accepted by SIGIR 2026
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[853]  arXiv:2602.13280 (replaced) [pdf, ps, other]
Title: BEAGLE: Behavior-Enforced Agent for Grounded Learner Emulation
Subjects: Artificial Intelligence (cs.AI)
[854]  arXiv:2602.13670 (replaced) [pdf, ps, other]
Title: Advancing Analytic Class-Incremental Learning through Vision-Language Calibration
Comments: 20 pages, 11 figures, 9 tables. Accepted by ICML2026
Subjects: Machine Learning (cs.LG)
[855]  arXiv:2602.14872 (replaced) [pdf, ps, other]
Title: The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
Comments: Updated version after ICML acceptance
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
[856]  arXiv:2602.16233 (replaced) [pdf, ps, other]
Title: DistributedEstimator: Distributed Training of Quantum Neural Networks via Circuit Cutting
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Quantum Physics (quant-ph)
[857]  arXiv:2602.17753 (replaced) [pdf, ps, other]
Title: The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems
Comments: To be published at ACM FAccT 2026
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[858]  arXiv:2602.19041 (replaced) [pdf, ps, other]
Title: Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning
Comments: 24 pages, 5 figures
Subjects: Machine Learning (cs.LG)
[859]  arXiv:2602.19651 (replaced) [pdf, ps, other]
Title: Denoising Particle Filters: Learning State Estimation with Single-Step Objectives
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[860]  arXiv:2602.19837 (replaced) [pdf, ps, other]
Title: Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[861]  arXiv:2602.22291 (replaced) [pdf, ps, other]
Title: Manifold of Failure: Behavioral Attraction Basins in Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[862]  arXiv:2602.23010 (replaced) [pdf, ps, other]
Title: Helmlab: A Two-Space Family of Analytical, Data-Driven Color Spaces for UI Design Systems
Authors: Gorkem Yildiz
Comments: 16 pages, 7 figures, 4 tables. Code, datasets, and live benchmark at this https URL and this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[863]  arXiv:2602.23547 (replaced) [pdf, ps, other]
Title: France or Spain or Germany or France: A Neural Account of Non-Redundant Redundant Disjunctions
Comments: 7 pages, 6 figures
Subjects: Computation and Language (cs.CL)
[864]  arXiv:2603.00492 (replaced) [pdf, ps, other]
Title: ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models
Comments: Video results: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
[865]  arXiv:2603.00714 (replaced) [pdf, ps, other]
Title: A Reconstruction System for Industrial Pipeline Inner Walls Using Panoramic Image Stitching with Endoscopic Imaging
Comments: 5 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[866]  arXiv:2603.00884 (replaced) [pdf, ps, other]
Title: From OCR to Analysis: Tracking Correction Provenance in Digital Humanities Pipelines
Authors: Haoze Guo, Ziqi Wei
Comments: In Proceedings of the 6th International Conference on Natural Language Processing for Digital Humanities
Subjects: Human-Computer Interaction (cs.HC)
[867]  arXiv:2603.01097 (replaced) [pdf, ps, other]
Title: Understanding LoRA as Knowledge Memory: An Empirical Analysis
Comments: ICML 2026
Journal-ref: Proceedings of the Forty-Third International Conference on Machine Learning (ICML), 2026
Subjects: Machine Learning (cs.LG)
[868]  arXiv:2603.02995 (replaced) [pdf, ps, other]
Title: A Graph-Native Approach to Normalization
Subjects: Databases (cs.DB)
[869]  arXiv:2603.03632 (replaced) [pdf, ps, other]
Title: Local Safety Filters for Networked Systems via Two-Time-Scale Design
Comments: Longer version of a paper accepted for publication in IEEE LCSS; this version has additional data for the simulations
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[870]  arXiv:2603.03805 (replaced) [pdf, ps, other]
Title: Relational In-Context Learning via Synthetic Pre-training with Structural Prior
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
[871]  arXiv:2603.08022 (replaced) [pdf, ps, other]
Title: Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization
Subjects: Machine Learning (cs.LG)
[872]  arXiv:2603.09501 (replaced) [pdf, ps, other]
Title: Avoiding Big Integers: Parallel Multimodular Algebraic Verification of Arithmetic Circuits
Subjects: Symbolic Computation (cs.SC)
[873]  arXiv:2603.09789 (replaced) [pdf, ps, other]
Title: A Hybrid Quantum-Classical Framework for Financial Volatility Forecasting Based on Quantum Circuit Born Machines
Authors: Yixiong Chen
Comments: Added comprehensive analysis on Implicit Knowledge Distillation via a novel "Drop-Prior" mechanism
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
[874]  arXiv:2603.11379 (replaced) [pdf, ps, other]
Title: Induced Minors and Coarse Tree Decompositions
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[875]  arXiv:2603.11911 (replaced) [pdf, ps, other]
Title: InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model
Comments: Project page: this https URL Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[876]  arXiv:2603.12137 (replaced) [pdf, ps, other]
Title: Reaching a Consensus in Predictive Loops
Subjects: Social and Information Networks (cs.SI)
[877]  arXiv:2603.13286 (replaced) [pdf, ps, other]
[878]  arXiv:2603.13783 (replaced) [pdf, ps, other]
Title: RetimeGS: Continuous-Time Reconstruction of 4D Gaussian Splatting
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[879]  arXiv:2603.14764 (replaced) [pdf, ps, other]
Title: Topology-Preserving Data Augmentation for Ring-Type Polygon Annotations
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[880]  arXiv:2603.16368 (replaced) [pdf, ps, other]
Title: Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy
Comments: Accepted to the 18th International Conference on Social Robotics (ICSR 2026)
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[881]  arXiv:2603.16659 (replaced) [pdf, ps, other]
Title: LLMs learn scientific taste from institutional traces across the social sciences
Subjects: Artificial Intelligence (cs.AI); General Economics (econ.GN)
[882]  arXiv:2603.17751 (replaced) [pdf, ps, other]
Title: Multi-Source Human-in-the-Loop Digital Twin Testbed for Connected and Autonomous Vehicles in Mixed Traffic Flow
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[883]  arXiv:2603.17771 (replaced) [pdf, ps, other]
Title: Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers
Comments: 29 pages, 14 figures, 7 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[884]  arXiv:2603.20684 (replaced) [pdf, ps, other]
Title: Centrality-Based Pruning for Efficient Echo State Networks
Authors: Sudip Laudari
Comments: 8 pages, 3 figures, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
[885]  arXiv:2603.25412 (replaced) [pdf, ps, other]
Title: Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[886]  arXiv:2603.25956 (replaced) [pdf, ps, other]
Title: ARTA: Adversarial-Robust Multivariate Time-Series Anomaly Detection via Sparsity-Constrained Perturbations
Comments: 12 pages, 4 figures
Subjects: Machine Learning (cs.LG)
[887]  arXiv:2603.26114 (replaced) [pdf, ps, other]
Title: DPD-Cancer: Explainable Graph-Based Deep Learning for Small Molecule Anti-Cancer Activity Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[888]  arXiv:2603.26178 (replaced) [pdf, ps, other]
Title: Geometric Evolution Graph Convolutional Networks: Enhancing Graph Representation Learning via Ricci Flow
Subjects: Machine Learning (cs.LG)
[889]  arXiv:2603.26214 (replaced) [pdf, ps, other]
Title: Optimal b-Colourings and Fall Colourings in $H$-Free Graphs
Subjects: Combinatorics (math.CO); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[890]  arXiv:2603.29895 (replaced) [pdf, ps, other]
Title: A Rational Account of Categorization Based on Information Theory
Comments: 6 pages, 5 figures, 2 tables; Published at CogSci 2026 Conference
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
[891]  arXiv:2603.29917 (replaced) [pdf, ps, other]
Title: Diffusion-Based Feature Denoising with NNMF for Robust handwritten digit multi-class classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[892]  arXiv:2604.01342 (replaced) [pdf, ps, other]
Title: Massively Parallel Exact Inference for Hawkes Processes
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[893]  arXiv:2604.01345 (replaced) [pdf, ps, other]
Title: Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
Subjects: Machine Learning (cs.LG)
[894]  arXiv:2604.01496 (replaced) [pdf, ps, other]
Title: From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents
Comments: Updated URL for the dataset and models
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
[895]  arXiv:2604.02041 (replaced) [pdf, ps, other]
Title: Stable Hermite transforms via the Golub-Welsch algorithm
Subjects: Numerical Analysis (math.NA)
[896]  arXiv:2604.02995 (replaced) [pdf, ps, other]
Title: A semicontinuous relaxation of Saito's criterion and freeness as angular minimization
Comments: Major revision: section reorganization with computational details moved to appendices. 26 pages, 3 tables
Subjects: Algebraic Geometry (math.AG); Machine Learning (cs.LG); Combinatorics (math.CO)
[897]  arXiv:2604.04211 (replaced) [pdf, ps, other]
Title: LOCARD: An Agentic Framework for Blockchain Forensics
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[898]  arXiv:2604.04333 (replaced) [pdf, ps, other]
Title: What is Human in Judgment? Comparing Automation Bias and Algorithm Aversion Between the United States Military Academy and the General Public
Subjects: Computers and Society (cs.CY)
[899]  arXiv:2604.04896 (replaced) [pdf, ps, other]
Title: Measuring Depth of Matroids
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
[900]  arXiv:2604.06377 (replaced) [pdf, ps, other]
Title: The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[901]  arXiv:2604.07401 (replaced) [pdf, ps, other]
Title: Geometric Entropy and Retrieval Phase Transitions in Continuous Thermal Dense Associative Memory
Comments: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
[902]  arXiv:2604.07595 (replaced) [pdf, ps, other]
Title: ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback
Authors: Matthew Penaroza
Comments: 31 pages including appendix, 12 figures, 15 tables, 3 algorithms; evaluation on MuSiQue, HotpotQA, and 2WikiMultiHopQA with Claude Sonnet 4, Haiku 4.5, and GPT-5-mini
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[903]  arXiv:2604.07634 (replaced) [pdf, ps, other]
Title: VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
Comments: CVPR Findings 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[904]  arXiv:2604.08059 (replaced) [pdf, ps, other]
Title: Governed Capability Evolution for Embodied Agents: Safe Upgrade, Compatibility Checking, and Runtime Rollback for Embodied Capability Modules
Comments: 46 pages, 3 figures, 10 tables, 7 appendices
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[905]  arXiv:2604.08548 (replaced) [pdf, ps, other]
Title: ETCH-X: Robustify Expressive Body Fitting to Clothed Humans with Composable Datasets
Comments: Page: this https URL, Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[906]  arXiv:2604.09554 (replaced) [pdf, ps, other]
Title: LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[907]  arXiv:2604.11087 (replaced) [pdf, ps, other]
Title: CausalGaze: Unveiling Hallucinations via Counterfactual Graph Intervention in Large Language Models
Comments: Accepted as ACL2026 Findings
Subjects: Machine Learning (cs.LG)
[908]  arXiv:2604.11440 (replaced) [pdf, ps, other]
Title: R3-VAE: Reference Vector-Guided Rating Residual Quantization VAE for Generative Recommendation
Comments: Tech Report
Subjects: Information Retrieval (cs.IR)
[909]  arXiv:2604.11714 (replaced) [pdf, ps, other]
Title: BEM: Training-Free Background Embedding Memory for False-Positive Suppression in Real-Time Fixed-Background Camera
Comments: Accepted to ICPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[910]  arXiv:2604.11840 (replaced) [pdf, ps, other]
Title: When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
Authors: Sandro Andric
Comments: 12 pages, 7 figures, supplementary material included as ancillary file
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Multiagent Systems (cs.MA)
[911]  arXiv:2604.14552 (replaced) [pdf, ps, other]
Title: DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance
Comments: 16 pages, 42 figures. Evaluation of inference performance on NVIDIA T4 and L4 GPUs across precision modes (FP32, FP16, INT8)
Subjects: Performance (cs.PF); Hardware Architecture (cs.AR); Machine Learning (cs.LG)
[912]  arXiv:2604.14580 (replaced) [pdf, ps, other]
Title: TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[913]  arXiv:2604.14854 (replaced) [pdf, ps, other]
Title: Towards Optimal Passive Feedback Control of LTI Systems under LQR Performance
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[914]  arXiv:2604.15360 (replaced) [pdf, ps, other]
Title: Mapping High-Performance Regions in Battery Scheduling across Data Uncertainty, Battery Design, and Planning Horizons
Comments: Research supported by Enefit
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[915]  arXiv:2604.15729 (replaced) [pdf, ps, other]
Title: MambaBack: Bridging Local Features and Global Contexts in Whole Slide Image Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[916]  arXiv:2604.15777 (replaced) [pdf, ps, other]
Title: SegMix: Shuffle-based Feedback Learning for Semantic Segmentation of Pathology Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[917]  arXiv:2604.15918 (replaced) [pdf, ps, other]
Title: A Practical Guide to PID Controller Implementation
Subjects: Systems and Control (eess.SY)
[918]  arXiv:2604.16804 (replaced) [pdf, ps, other]
Title: AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[919]  arXiv:2604.17189 (replaced) [src]
Title: Shepherding UAV Swarm with Action Prediction Based on Movement Constraints
Comments: Incomplete results were found in the paper
Subjects: Robotics (cs.RO)
[920]  arXiv:2604.17332 (replaced) [pdf, ps, other]
Title: On Drift Induced by Local Transition Asymmetry in Combinatorial State Spaces
Authors: Fumio Ishizaki
Subjects: Computational Engineering, Finance, and Science (cs.CE); Probability (math.PR)
[921]  arXiv:2604.17450 (replaced) [pdf, ps, other]
Title: Compiling Deterministic Structure into SLM Harnesses
Subjects: Artificial Intelligence (cs.AI)
[922]  arXiv:2604.17546 (replaced) [pdf, ps, other]
Title: Homogeneous Network Caching is Fixed-Parameter Tractable Parameterized by the Number of Caches
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
[923]  arXiv:2604.17681 (replaced) [pdf, ps, other]
Title: FedCRF: A Federated Cross-domain Recommendation Method with Semantic-driven Deep Knowledge Fusion
Subjects: Information Retrieval (cs.IR)
[924]  arXiv:2604.18231 (replaced) [pdf, ps, other]
Title: AgenTEE: Confidential LLM Agent Execution on Edge Devices
Subjects: Cryptography and Security (cs.CR); Operating Systems (cs.OS)
[925]  arXiv:2604.18336 (replaced) [pdf, ps, other]
Title: Enhancing Glass Surface Reconstruction via Depth Prior for Robot Navigation
Comments: 9 pages, 8 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[926]  arXiv:2604.18396 (replaced) [pdf, ps, other]
Title: River-LLM: Large Language Model Seamless Exit Based on KV Share
Authors: Yingtao Shen, An Zou
Comments: Accepted to ACL 2026, 13 pages, with appendix
Subjects: Computation and Language (cs.CL)
[927]  arXiv:2604.19331 (replaced) [pdf, ps, other]
Title: Evaluating LLM-Driven Summarisation of Parliamentary Debates with Computational Argumentation
Comments: Accepted at KR'26 In The Wild Track. Camera Ready with additional supplementary materials
Subjects: Computation and Language (cs.CL)
[928]  arXiv:2604.19354 (replaced) [pdf, ps, other]
Title: Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges
Comments: Accepted to AIWare'26 Benchmark and Dataset Track
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Software Engineering (cs.SE)
[929]  arXiv:2604.20274 (replaced) [pdf, ps, other]
Title: Estimating Power-Law Exponent with Edge Differential Privacy
Subjects: Databases (cs.DB)
[930]  arXiv:2604.20289 (replaced) [pdf, ps, other]
Title: X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Comments: Technical Report, update demonstration website
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[931]  arXiv:2604.20855 (replaced) [pdf, ps, other]
Title: Caesar: Deep Agentic Web Exploration for Creative Answer Synthesis
Subjects: Information Retrieval (cs.IR); Multiagent Systems (cs.MA)
[932]  arXiv:2604.21174 (replaced) [pdf, ps, other]
Title: Scale-Parameter Selection in Gaussian Kolmogorov-Arnold Networks
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Analysis of PDEs (math.AP)
[933]  arXiv:2604.21251 (replaced) [pdf, ps, other]
Title: CAP: Controllable Alignment Prompting for Unlearning in LLMs
Comments: Accepted to ACL 2026 Main Conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[934]  arXiv:2604.21335 (replaced) [pdf, ps, other]
Title: Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression
Authors: Wei Jiang, Wei Wang
Comments: 17 pages, 13 tables, 2 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[935]  arXiv:2604.21352 (replaced) [pdf, ps, other]
Title: CARE: Counselor-Aligned Response Engine for Online Mental-Health Support
Comments: 10 pages, 4 figures
Subjects: Computation and Language (cs.CL)
[936]  arXiv:2604.22795 (replaced) [pdf, ps, other]
Title: Load constrained wind farm flow control through multi-objective multi-agent reinforcement learning
Comments: Submitted to Journal of Physics: Conference Series (Torque 2026). This is the Accepted Manuscript version of an article accepted for publication in Journal of Physics: Conference Series. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. This Accepted Manuscript is published under a CC BY licence
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[937]  arXiv:2604.23088 (replaced) [pdf, ps, other]
Title: Code Broker: A Multi-Agent System for Automated Code Quality Assessment
Authors: Samer Attrah
Comments: 9 pages, 2 figures, 2 tables, 33 references
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
[938]  arXiv:2604.23338 (replaced) [pdf, ps, other]
Title: A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
Authors: Kexin Chu
Comments: 58 pages, 8 figures, 15 tables
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[939]  arXiv:2604.23475 (replaced) [pdf, ps, other]
Title: Supernodes and Halos: Loss-Critical Hubs in LLM Feed-Forward Layers
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[940]  arXiv:2604.23899 (replaced) [pdf, ps, other]
Title: Mammographic Lesion Segmentation with Lightweight Models: A Comparative Study
Authors: Helder Oliveira
Comments: Submitted to a journal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[941]  arXiv:2604.24028 (replaced) [pdf, ps, other]
Title: Vulnerability Identification by Harnessing Inter-connected Multi-Source Information
Subjects: Software Engineering (cs.SE)
[942]  arXiv:2604.24300 (replaced) [pdf, ps, other]
Title: ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
Comments: ICML 2026, Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[943]  arXiv:2604.24636 (replaced) [pdf, ps, other]
Title: Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application
Authors: William Oliveira
Comments: 28 pages, 8 tables, 17 references
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[944]  arXiv:2604.25110 (replaced) [pdf, ps, other]
Title: Knowledge Distillation Must Account for What It Loses
Authors: Wenshuo Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[945]  arXiv:2604.25370 (replaced) [pdf, ps, other]
Title: GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment
Comments: 11 pages; GPT-image-2 social media dataset; Twitter API collection and multilingual curation; C2PA watermark stripping on platform upload; browser-automated AI badge verification; CLIP semantic clustering; AI-generated image provenance and attribution
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[946]  arXiv:2604.25878 (replaced) [pdf, ps, other]
Title: Prime-Field PINI: Machine-Checked Composition Theorems for Post-Quantum NTT Masking
Comments: 17 pages, 1 figure
Subjects: Cryptography and Security (cs.CR)
[947]  arXiv:2604.26118 (replaced) [pdf, ps, other]
Title: LLM-Guided Issue Generation from Uncovered Code Segments
Subjects: Software Engineering (cs.SE)
[948]  arXiv:2604.26172 (replaced) [pdf, ps, other]
Title: Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[949]  arXiv:2604.26223 (replaced) [pdf, ps, other]
Title: StreamGuard: Exploring a 5G Architecture for Efficient, Quality of Experience-Aware Video Conferencing
Comments: 31 pages, 35 figures
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[950]  arXiv:2604.26509 (replaced) [pdf, ps, other]
Title: 3D Generation for Embodied AI and Robotic Simulation: A Survey
Comments: 27 pages, 11 figures, 8 tables
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[951]  arXiv:2604.26689 (replaced) [pdf, ps, other]
Title: Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
Comments: 8 pages main text + appendix; 3 figures, 12 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[952]  arXiv:2604.26752 (replaced) [pdf, ps, other]
[953]  arXiv:2604.26803 (replaced) [pdf, ps, other]
Title: PM-EKF: A Physiological Model-Based Extended Kalman Filter for Daily-Life Physical Activity Energy Expenditure Estimation
Comments: The main body consists of 11 pages. A 2-page supplementary material is included in the source file as pdf
Subjects: Systems and Control (eess.SY)
[954]  arXiv:2604.27168 (replaced) [pdf, ps, other]
Title: The Field of Safe Motion: Operationalizing Affordances in the Field of Safe Travel Using Reachability Analysis
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
[955]  arXiv:2604.27201 (replaced) [pdf, ps, other]
Title: Path-Lock Expert: Separating Reasoning Mode in Hybrid Thinking via Architecture-Level Separation
Comments: 27 pages, 9 figures, 6 tables. Under review
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[956]  arXiv:2604.27629 (replaced) [pdf, ps, other]
Title: WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning
Authors: Ke Xu
Comments: 16 pages, 3 figures, 8 tables
Subjects: Artificial Intelligence (cs.AI)
[957]  arXiv:2604.27757 (replaced) [pdf, ps, other]
Title: Temporal Routing in Static Networks: The Schedule Completion Problem
Subjects: Data Structures and Algorithms (cs.DS)
[958]  arXiv:2604.27814 (replaced) [pdf, ps, other]
Title: Probabilistic Circuits for Irregular Multivariate Time Series Forecasting
Subjects: Machine Learning (cs.LG)
[959]  arXiv:2604.27859 (replaced) [pdf, ps, other]
Title: A Brief Overview: Agentic Reinforcement Learning In Large Language Models
Subjects: Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
[960]  arXiv:2604.27891 (replaced) [pdf, ps, other]
Title: In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks
Comments: 20 pages
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[961]  arXiv:2604.27953 (replaced) [pdf, ps, other]
Title: The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[962]  arXiv:2604.28192 (replaced) [pdf, ps, other]
Title: LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning
Comments: LaST-R1 Technical Report
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[963]  arXiv:2605.00364 (replaced) [pdf, ps, other]
Title: Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning
Comments: 17 pages, 2 figures
Subjects: Computation and Language (cs.CL)
[964]  arXiv:2605.00445 (replaced) [pdf, ps, other]
Title: The Power of Order: Fooling LLMs with Adversarial Table Permutations
Subjects: Machine Learning (cs.LG)
[965]  arXiv:2605.00457 (replaced) [pdf, ps, other]
Title: A Policy-Driven DRL Framework for System-Level Tradeoff Control in NR-U/Wi-Fi Coexistence
Comments: 13 pages, 13 figures, 1 table, submitted to IEEE Open Journal of Vehicular Technology
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)
[966]  arXiv:2605.00473 (replaced) [pdf, ps, other]
Title: Near-optimal and Efficient First-Order Algorithm for Multi-Task Learning with Shared Linear Representation
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[967]  arXiv:2605.00613 (replaced) [pdf, ps, other]
Title: KingsGuard: Enclave Data Protection Under Real-World TEE Vulnerabilities
Comments: 15 pages, 12 figures. Accepted at ACM CCS 2026
Subjects: Cryptography and Security (cs.CR)
[968]  arXiv:2605.00699 (replaced) [pdf, ps, other]
Title: STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack
Subjects: Cryptography and Security (cs.CR)
[969]  arXiv:2605.00877 (replaced) [pdf, ps, other]
Title: OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models
Comments: Work in progress
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[970]  arXiv:2605.00953 (replaced) [pdf, ps, other]
Title: Information Accessibility Limits in Structured NP Search
Authors: Jing-Yuan Wei
Comments: 23 pages. Includes appendices with explicit constructions and numerical examples
Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC); Optimization and Control (math.OC)
[971]  arXiv:2605.01090 (replaced) [pdf, ps, other]
Title: Sampled-data Robust Control of Electrically Stimulated Engineered Cell Factories
Subjects: Systems and Control (eess.SY)
[972]  arXiv:2605.01278 (replaced) [pdf, ps, other]
Title: Valley3: Scaling Omni Foundation Models for E-commerce
Subjects: Artificial Intelligence (cs.AI)
[973]  arXiv:2605.01345 (replaced) [pdf, ps, other]
Title: The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design
Comments: 27 pages, 5 figures, accepted at ICML 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[974]  arXiv:2605.01562 (replaced) [pdf, ps, other]
Title: Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse
Authors: Ahmed F. Ibrahim
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[975]  arXiv:2605.01643 (replaced) [pdf, ps, other]
Title: AI Alignment via Incentives and Correction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[976]  arXiv:2605.01670 (replaced) [pdf, ps, other]
Title: Maxwell à la Helmholtz: Direct boundary integral equations for 3D scattering by perfect electric conductors via Helmholtz operators
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
[977]  arXiv:2605.01699 (replaced) [pdf, ps, other]
Title: Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Neural and Evolutionary Computing (cs.NE)
[978]  arXiv:2605.01711 (replaced) [pdf, ps, other]
Title: Linear-Time Global Visual Modeling without Explicit Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[979]  arXiv:2605.01720 (replaced) [pdf, ps, other]
Title: SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 55+ Sign Languages
Comments: The included languages actually amount to 55+, and the 25 types refer to those that exceed 15 hours. 13 pages. Project Page at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[980]  arXiv:2605.01845 (replaced) [pdf, ps, other]
Title: Efficient Decision Procedures for RNmatrix Semantics
Subjects: Logic in Computer Science (cs.LO)
[981]  arXiv:2605.01847 (replaced) [pdf, ps, other]
Title: NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles
Authors: Jia Xiao
Comments: 30 pages, 11 figures
Subjects: Artificial Intelligence (cs.AI)
[982]  arXiv:2605.01978 (replaced) [pdf, ps, other]
Title: Stability of Control Lyapunov Function Guided Reinforcement Learning
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
[983]  arXiv:2605.02003 (replaced) [pdf, ps, other]
Title: RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[984]  arXiv:2605.02059 (replaced) [pdf, ps, other]
Title: RenCon 2025: Revival of the Expressive Performance Rendering Competition
Comments: Accepted at NIME 2026
Subjects: Multimedia (cs.MM); Sound (cs.SD)
[985]  arXiv:2605.02167 (replaced) [pdf, ps, other]
Title: Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
Comments: 32 pages, 13 figures, 12 tables. Accepted to ICML 2026; includes appendix
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[986]  arXiv:2605.02317 (replaced) [pdf, ps, other]
Title: Anon: Extrapolating Adaptivity Beyond SGD and Adam
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[987]  arXiv:2605.02320 (replaced) [pdf, ps, other]
Title: ANO: A Principled Approach to Robust Policy Optimization
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[988]  arXiv:2605.02327 (replaced) [pdf, ps, other]
Title: Denoising data using convex relaxations
Comments: 38 pages, 6 figures
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
[989]  arXiv:2605.02356 (replaced) [pdf, ps, other]
Title: ZNO: Stable Rational Neural Operators in the Z-Domain for Discrete-Time Dynamics
Authors: Xianli Zhu, Jia Yin
Comments: Corrected metadata; article content unchanged
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
[990]  arXiv:2605.02463 (replaced) [pdf, ps, other]
Title: When Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systems
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
[991]  arXiv:2605.02475 (replaced) [pdf, ps, other]
Title: Shadow-Loom: Causal Reasoning over Graphical World Models of Narratives
Authors: David Wilmot
Comments: 7 pages, 28 pages total
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[992]  arXiv:2605.02489 (replaced) [pdf, ps, other]
Title: GRAIL: A Deep-Granularity Hybrid Resonance Framework for Real-Time Agent Discovery via SLM-Enhanced Indexing
Authors: Jinliang Xu
Comments: 8 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
[993]  arXiv:2605.02502 (replaced) [pdf, ps, other]
Title: GuardSec: A Multi-Modal Web Platform for Real-Time Digital Fraud Detection, Entity Verification, and Connection Security Analysis in the African Context
Comments: first version
Subjects: Cryptography and Security (cs.CR)
[994]  arXiv:2605.02558 (replaced) [pdf, ps, other]
Title: TemPose-TF-ASF: Two-Stage Bidirectional Stroke Context Fusion for Badminton Stroke Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[995]  arXiv:2605.02669 (replaced) [pdf, ps, other]
Title: An explainable hypothesis-driven approach to Drug-Induced Liver Injury with HADES
Subjects: Artificial Intelligence (cs.AI)
[996]  arXiv:2605.02740 (replaced) [pdf, ps, other]
Title: Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[997]  arXiv:2605.02841 (replaced) [pdf, ps, other]
Title: TRACE: Temporal Reasoning over Context and Evidence for Activity Recognition in Smart Homes
Subjects: Human-Computer Interaction (cs.HC)
[998]  arXiv:2605.02856 (replaced) [pdf, ps, other]
Title: The 1-Bit Barrier is Universal: k-Stage Pipeline Composition and Unified Leakage Bounds for Standard Modular Reductions in PQC Hardware
Comments: 30 pages, 0 figures
Subjects: Cryptography and Security (cs.CR)
[999]  arXiv:2605.02910 (replaced) [pdf, ps, other]
Title: CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing
Comments: 57 Pages, 14 Tables, 27 Figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[1000]  arXiv:2605.02973 (replaced) [pdf, ps, other]
Title: Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges
Comments: Accepted to ICML 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1001]  arXiv:2605.02990 (replaced) [pdf, ps, other]
Title: ChaRVoC: A Challenge-Response Voice Cancelable Authentication System
Comments: Accepted at the Conference on Information Technology and its Applications (CITA 2026)
Subjects: Cryptography and Security (cs.CR)
[1002]  arXiv:2605.03148 (replaced) [pdf, ps, other]
Title: Boundary-Aware Uncertainty Quantification for Wildfire Spread Prediction
Authors: Jonas V. Funk
Comments: 10 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1003]  arXiv:2605.03208 (replaced) [pdf, ps, other]
Title: Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs
Subjects: Software Engineering (cs.SE)
[1004]  arXiv:2605.03212 (replaced) [pdf, ps, other]
Title: ADAPTS: Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Applications (stat.AP); Computation (stat.CO)
[1005]  arXiv:2605.03269 (replaced) [pdf, ps, other]
[1006]  arXiv:2605.03278 (replaced) [pdf, ps, other]
Title: Copula-Based Endogeneity Correction for Doubly Robust Estimation of Treatment Effect
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI)
[1007]  arXiv:2605.03314 (replaced) [pdf, ps, other]
Title: When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning
Comments: Accepted by ICML 2026
Subjects: Computation and Language (cs.CL)
[1008]  arXiv:2605.03348 (replaced) [pdf, ps, other]
Title: Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts
Comments: Published at ICML 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1009]  arXiv:2605.03361 (replaced) [pdf, ps, other]
Title: ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval
Comments: 6 pages, 3 figures, 2 tables
Subjects: Artificial Intelligence (cs.AI)
[1010]  arXiv:2605.03410 (replaced) [pdf, ps, other]
Title: Geometry over Density: Few-Shot Cross-Domain OOD Detection
Subjects: Artificial Intelligence (cs.AI)
[1011]  arXiv:2605.03437 (replaced) [pdf, ps, other]
Title: Learning Discriminative Signed Distance Functions from Multi-scale Level-of-detail Features for 3D Anomaly Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[1012]  arXiv:2605.03456 (replaced) [pdf, ps, other]
Title: VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1013]  arXiv:2605.03475 (replaced) [pdf, ps, other]
Title: WorldJen: An End-to-End Multi-Dimensional Benchmark for Generative Video Models
Comments: 30 pages +25 appendix
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1014]  arXiv:2605.03598 (replaced) [pdf, ps, other]
Title: Unifying Dynamical Systems and Graph Theory to Mechanistically Understand Computation in Neural Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
[1015]  arXiv:2605.03619 (replaced) [pdf, ps, other]
Title: The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code
Subjects: Cryptography and Security (cs.CR)
[1016]  arXiv:2605.03686 (replaced) [pdf, ps, other]
Title: From Code to Prediction: Fine-Tuning LLMs for Neural Network Performance Classification in NNGPT
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[1017]  arXiv:2605.03691 (replaced) [pdf, ps, other]
Title: Small Matrices with Small Inverses: Unimodular Zerofree Cases
Authors: Steven Finch
Comments: 10 pages; restored missing matrix in sec. 3; extended sequences in secs. 1 & 3
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Number Theory (math.NT)
[1018]  arXiv:2605.03770 (replaced) [pdf, ps, other]
Title: Firmware Distribution as Attack Surface: A Security Study of ASIC Cryptocurrency Miners
Comments: 20 pages, 7 figures. Submitted to a security conference. Includes empirical analysis of 134 firmware images across major ASIC manufacturers
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
[1019]  arXiv:2605.03796 (replaced) [pdf, ps, other]
Title: Capability centrality: the next step from scale-free property
Authors: Mikhail Tuzhilin
Subjects: Social and Information Networks (cs.SI)
[1020]  arXiv:2605.03855 (replaced) [pdf, ps, other]
Title: Evaluating Generative Models as Interactive Emergent Representations of Human-Like Collaborative Behavior
Comments: Under review
Subjects: Robotics (cs.RO)
[1021]  arXiv:2605.03862 (replaced) [pdf, ps, other]
Title: Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
Comments: 36 pages
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[1022]  arXiv:2605.03929 (replaced) [pdf, ps, other]
Title: PHALAR: Phasors for Learned Musical Audio Representations
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
[1023]  arXiv:2605.03941 (replaced) [pdf, ps, other]
Title: iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework
Comments: Accepted at ICML 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1024]  arXiv:2605.03953 (replaced) [pdf, ps, other]
Title: Transformers with Selective Access to Early Representations
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[1025]  arXiv:2605.04000 (replaced) [pdf, ps, other]
Title: Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
Subjects: Software Engineering (cs.SE)
