Computer Science

New submissions

New submissions for Fri, 20 Mar 26

[1]  arXiv:2603.18005 [pdf, ps, other]
Title: Negative Sampling Techniques in Information Retrieval: A Survey
Comments: Accepted at findings EACL 2026
Subjects: Information Retrieval (cs.IR)

Information Retrieval (IR) is fundamental to many modern NLP applications. The rise of dense retrieval (DR), using neural networks to learn semantic vector representations, has significantly advanced IR performance. Central to training effective dense retrievers through contrastive learning is the selection of informative negative samples. Synthesizing 35 seminal papers, this survey provides a comprehensive and up-to-date overview of negative sampling techniques in dense IR. Our unique contribution is the focus on modern NLP applications and the inclusion of recent Large Language Model (LLM)-driven methods, an area absent in prior reviews. We propose a taxonomy that categorizes techniques including random, static/dynamically mined, and synthetic datasets. We then analyze these approaches with respect to trade-offs between effectiveness, computational cost, and implementation difficulty. The survey concludes by outlining current challenges and promising future directions for the use of LLM-generated synthetic data.

[2]  arXiv:2603.18007 [pdf, ps, other]
Title: Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm
Comments: 19 pages, 2 figures, 6 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The study explores whether current Large Language Models (LLMs) exhibit Theory of Mind (ToM) capabilities -- specifically, the ability to infer others' beliefs, intentions, and emotions from text. Given that LLMs are trained on language data without social embodiment or access to other manifestations of mental representations, their apparent social-cognitive reasoning raises key questions about the nature of their understanding. Are they capable of robust mental-state attribution indistinguishable from human ability in its output, or do their outputs merely reflect superficial pattern completion? To address this question, we tested five LLMs and compared their performance to that of human controls using an adapted version of a text-based tool widely used in human ToM research. The test involves answering questions about the beliefs, intentions, and emotions of story characters. The results revealed a performance gap between the models. Earlier and smaller models were strongly affected by the number of relevant inferential cues available and, to some extent, were also vulnerable to the presence of irrelevant or distracting information in the texts. In contrast, GPT-4o demonstrated high accuracy and strong robustness, performing comparably to humans even in the most challenging conditions. This work contributes to ongoing debates about the cognitive status of LLMs and the boundary between genuine understanding and statistical approximation.

[3]  arXiv:2603.18008 [pdf, ps, other]
Title: TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Large language models (LLMs) are increasingly used for mental-health support; yet prevailing evaluation methods--fluency metrics, preference tests, and generic dialogue benchmarks--fail to capture the clinically critical dimensions of psychotherapy. We introduce THERAPYGYM, a framework that evaluates and improves therapy chatbots along two clinical pillars: fidelity and safety. Fidelity is measured using the Cognitive Therapy Rating Scale (CTRS), implemented as an automated pipeline that scores adherence to CBT techniques over multi-turn sessions. Safety is assessed using a multi-label annotation scheme, covering therapy-specific risks (e.g., failing to address harm or abuse). To mitigate bias and unreliability in LLM-based judges, we further release THERAPYJUDGEBENCH, a validation set of 116 dialogues with 1,270 expert ratings for auditing and calibration against licensed clinicians. THERAPYGYM also serves as a training harness: CTRS and safety-based rewards drive RL with configurable patient simulations spanning diverse symptom profiles. Models trained in THERAPYGYM improve on expert ratings, with average CTRS rising from 0.10 to 0.60 (and 0.16 to 0.59 under LLM judges). Our work enables scalable development of therapy chatbots that are faithful to evidence-based practice and safer in high-stakes use.

[4]  arXiv:2603.18009 [pdf, ps, other]
Title: How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding
Comments: 27 pages, 16 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

With the widespread adoption of large language models (LLMs) in natural language processing, prompt engineering and retrieval-augmented generation (RAG) have become mainstream to enhance LLMs' performance on complex tasks. However, LLMs generate outputs autoregressively, leading to inevitable output uncertainty. Since model performance is highly sensitive to prompt design, precise uncertainty measurement is crucial for reliable prompt optimization. For multi-class multiple-choice (understanding) tasks, conventional uncertainty measures (e.g., entropy) based on output probabilities treat all classes equally and ignore class prior differences in pretraining corpora. This failure to distinguish spurious confidence (from priors) from true certainty (from contextual understanding) results in poor confidence calibration. To address this, we propose Log-Scale Focal Uncertainty (LSFU), a first-token-based metric inspired by focal loss. LSFU incorporates label prior probabilities as a risk-modulation factor to suppress noise from high-frequency classes and emphasize risk for low-frequency long-tail classes, with a dynamic weighting mechanism unifying the measurement scale. Based on LSFU, we further propose the uncertainty-calibrated prompt optimization framework (UCPOF), which leverages the first token of model outputs to select high-quality exemplars and dynamically optimize prompts. Comprehensive evaluations show UCPOF improves average accuracy by 6.03% over few-shot baselines, surpasses always-on full RAG by 5.75% in overall average accuracy, and reduces the average retrieval trigger rate by 50.66%. By adaptively triggering RAG only for high-uncertainty samples, our framework significantly lowers computational costs while maintaining state-of-the-art performance.

[5]  arXiv:2603.18010 [pdf, ps, other]
Title: Agentic Framework for Political Biography Extraction
Comments: 70 pages, 14 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The production of large-scale political datasets typically demands extracting structured facts from vast piles of unstructured documents or web sources, a task that traditionally relies on expensive human experts and remains prohibitively difficult to automate at scale. In this paper, we leverage Large Language Models (LLMs) to automate the extraction of multi-dimensional elite biographies, addressing a long-standing bottleneck in political science research. We propose a two-stage "Synthesis-Coding" framework for complex extraction tasks: an upstream synthesis stage that uses recursive agentic LLMs to search, filter, and curate biographies from heterogeneous web sources, followed by a downstream coding stage that maps curated biographies into structured dataframes. We validate this framework through three primary results. First, we demonstrate that, when given curated contexts, LLM coders match or outperform human experts in extraction accuracy. Second, we show that in web environments, the agentic system synthesizes more information from web resources than human collective intelligence (Wikipedia). Finally, we find that coding directly from long, multi-language corpora introduces bias that the synthesis stage can alleviate by curating evidence into signal-dense representations. Through comprehensive evaluation, we provide a generalizable, scalable framework for building transparent and extensible large-scale databases in political science.

[6]  arXiv:2603.18011 [pdf, ps, other]
Title: Controllable Evidence Selection in Retrieval-Augmented Question Answering via Deterministic Utility Gating
Authors: Victor P. Unda
Comments: 21 pages, 1 figure, 4 tables
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Many modern AI question-answering systems convert text into vectors and retrieve the closest matches to a user question. While effective for topical similarity, similarity scores alone do not explain why some retrieved text can serve as evidence while other equally similar text cannot. When many candidates receive similar scores, systems may select sentences that are redundant, incomplete, or address different conditions than the question requires.
This paper presents a deterministic evidence selection framework for retrieval-augmented question answering. The approach introduces Meaning-Utility Estimation (MUE) and Diversity-Utility Estimation (DUE), fixed scoring and redundancy-control procedures that determine evidence admissibility prior to answer generation. Each sentence or record is evaluated independently using explicit signals for semantic relatedness, term coverage, conceptual distinctiveness, and redundancy. No training or fine-tuning is required.
In the prototype, a unit is accepted only if it explicitly states the fact, rule, or condition required by the task. Units are not merged or expanded. If no unit independently satisfies the requirement, the system returns no answer. This deterministic gating produces compact, auditable evidence sets and establishes a clear boundary between relevant text and usable evidence.

[7]  arXiv:2603.18012 [pdf, ps, other]
Title: DynaRAG: Bridging Static and Dynamic Knowledge in Retrieval-Augmented Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

We present DynaRAG, a retrieval-augmented generation (RAG) framework designed to handle both static and time-sensitive information needs through dynamic knowledge integration. Unlike traditional RAG pipelines that rely solely on static corpora, DynaRAG selectively invokes external APIs when retrieved documents are insufficient for answering a query. The system employs an LLM-based reranker to assess document relevance, a sufficiency classifier to determine when fallback is necessary, and Gorilla v2 -- a state-of-the-art API calling model -- for accurate tool invocation. We further enhance robustness by incorporating schema filtering via FAISS to guide API selection. Evaluations on the CRAG benchmark demonstrate that DynaRAG significantly improves accuracy on dynamic questions, while also reducing hallucinations. Our results highlight the importance of dynamic-aware routing and selective tool use in building reliable, real-world question-answering systems.

[8]  arXiv:2603.18013 [pdf, ps, other]
Title: Learned but Not Expressed: Capability-Expression Dissociation in Large Language Models
Comments: 12 pages, 3 figures
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) demonstrate the capacity to reconstruct and trace learned content from their training data under specific elicitation conditions, yet this capability does not manifest in standard generation contexts. This empirical observational study examines the expression of non-causal, non-implementable solution types across 300 prompt-response generations spanning narrative and problem-solving task contexts. Drawing on recent findings regarding memorization contiguity and alignment-induced discourse priors, we document a systematic dissociation between learned capability and expressed output. Across three distinct LLMs, ten task scenarios, and both creative narrative and practical advisory contexts, we documented zero instances of non-causal solution frames in generated outputs (0%, 95% CI: [0%, 1.2%]), despite verified reconstruction capability under conditional extraction. These findings challenge the prevailing assumption that training data presence directly predicts output probability, demonstrating instead that task-conditioned generation policies can comprehensively suppress learned content across diverse contexts. The results offer implications for understanding generation dynamics, output distribution control, and the behavioral boundaries of contemporary LLMs.
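
The reported interval for zero observed instances in 300 generations can be checked with a short calculation. The abstract does not say which interval method was used; the sketch below assumes an exact two-sided Clopper-Pearson interval, whose upper limit for zero events in n trials is 1 - (alpha/2)^(1/n). The function name is illustrative only.

```python
def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper limit of the exact two-sided Clopper-Pearson interval
    when 0 of n independent trials show the event (assumed method)."""
    alpha = 1.0 - confidence
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


upper = zero_event_upper_bound(300)
print(f"0/300 events, 95% CI upper limit: {upper:.1%}")  # prints 1.2%
```

Under this assumption the upper limit comes out at about 1.2%, in line with the interval quoted in the abstract. The one-sided "rule of three" approximation (3/n) would give 1.0% instead.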

[9]  arXiv:2603.18014 [pdf, ps, other]
Title: Real-Time Trustworthiness Scoring for LLM Structured Outputs and Data Extraction
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Structured Outputs from current LLMs exhibit sporadic errors, hindering enterprise AI efforts from realizing their immense potential. We present CONSTRUCT, a method to score the trustworthiness of LLM Structured Outputs in real-time, such that lower-scoring outputs are more likely to contain errors. This reveals the best places to focus limited human review bandwidth. CONSTRUCT additionally scores the trustworthiness of each field within an LLM Structured Output, helping reviewers quickly identify which parts of the output are wrong. Our method is suitable for any LLM (including black-box LLM APIs without logprobs such as reasoning models and Anthropic models), requires neither labeled training data nor custom model deployment, and works for complex Structured Outputs with many fields of diverse types (including nested JSON schemas).
We additionally present one of the first public LLM Structured Output benchmarks with reliable ground-truth values that are not full of mistakes. Over this four-dataset benchmark, CONSTRUCT detects errors from various LLMs (including Gemini 3 and GPT-5) with significantly higher precision/recall than other scoring methods.

[10]  arXiv:2603.18015 [pdf, ps, other]
Title: Beyond Accuracy: An Explainability-Driven Analysis of Harmful Content Detection
Comments: This paper has been accepted at TrustNet 2026 (this https URL). The final version will appear in Springer (LNNS), 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Although automated harmful content detection systems are widely used to monitor online platforms, moderators and end users often cannot understand the logic underlying their predictions. While recent studies have focused on increasing classification accuracy, little attention has been paid to understanding why neural models identify content as harmful, especially in borderline, contextual, and politically sensitive situations. In this work, we present an explainability-driven analysis of a neural harmful content detection model trained on the Civil Comments dataset. Two popular post-hoc explanation methods, Shapley Additive Explanations and Integrated Gradients, are used to analyze the behavior of a RoBERTa-based classifier in both correct predictions and systematic failure cases. Despite strong overall performance, with an area under the curve of 0.93 and an accuracy of 0.94, the analysis reveals limitations that are not observable from aggregate evaluation metrics alone. Integrated Gradients appear to extract more diffuse contextual attributions while Shapley Additive Explanations extract more focused attributions on explicit lexical cues. The resulting divergence in their outputs manifests in both false negatives and false positives. Qualitative case studies reveal recurring failure modes such as indirect toxicity, lexical over-attribution, or political discourse. The results suggest that explainable AI can foster human-in-the-loop moderation by exposing model uncertainty and making the rationale behind automated decisions more interpretable. Most importantly, this work highlights the role of explainability as a transparency and diagnostic resource for online harmful content detection systems rather than as a performance-enhancing lever.

[11]  arXiv:2603.18016 [pdf, ps, other]
Title: MineDraft: A Framework for Batch Parallel Speculative Decoding
Comments: This paper proposes MineDraft, a framework that speeds up speculative decoding by overlapping drafting and verification, hiding drafting latency, and delivering improved throughput and latency
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model. However, the performance of standard SD is often limited by the strictly sequential execution of these drafting and verification stages. To address this, this paper proposes MineDraft, a batch parallel speculative decoding (PSD) framework designed to effectively hide drafting latency by overlapping it with verification. Our theoretical analysis shows that PSD is substantially more efficient than standard SD. MineDraft realizes the PSD through a novel batch-parallel design that maintains two batches of requests, overlapping drafting for one batch with verification for the other. Our experimental results show significant improvements of MineDraft in both throughput (up to 75%) and end-to-end latency (up to 39%) over standard SD. Furthermore, we have implemented MineDraft as a plugin for vLLM, demonstrating its practicality for production-ready inference systems.

[12]  arXiv:2603.18017 [pdf, ps, other]
Title: Frayed RoPE and Long Inputs: A Geometric Perspective
Comments: Accepted by ICLR 2026
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Rotary Positional Embedding (RoPE) is a widely adopted technique for encoding position in language models, which, while effective, causes performance breakdown when input length exceeds training length. Prior analyses assert (rightly) that long inputs cause channels to rotate "out of distribution," but it is not clear how extra rotation relates to or causes pathological behavior. Through empirical and theoretical analysis we advance a unified geometric understanding of attention behavior with RoPE. We find that attention induces tight clustering of separated key and query latent point clouds, allowing for creation of sink tokens: placeholders that allow attention heads to avoid token mixing when not required. RoPE applied to longer inputs damages this key/query cluster separation, producing pathological behavior by inhibiting sink token functionality. From this geometric perspective, we propose RoPE-ID (In Distribution), a straightforward modification that allows attention layers to generalize to longer inputs out of the box: apply RoPE with high frequency to a subset of channels. We demonstrate the effectiveness of RoPE-ID for extended inputs using 1B and 3B parameter Transformers on the LongBench and RULER information retrieval benchmarks.
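
For readers unfamiliar with the rotation the abstract refers to, here is a minimal pure-Python sketch of standard RoPE (not the proposed RoPE-ID, whose details the abstract does not give). Each pair of channels is rotated by an angle proportional to the token position, with a per-pair frequency of base^(-2j/d). The vectors, the base of 10000, and the helper names are assumptions chosen for illustration.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply standard rotary position embedding to an even-length vector."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)      # frequency for channel pair i // 2
        angle = pos * theta           # rotation grows linearly with position
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (illustrative values only).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# The query-key score depends only on the relative offset (3 in both cases),
# not on the absolute positions.
near = dot(rope(q, 5), rope(k, 2))
far = dot(rope(q, 1005), rope(k, 1002))
print(abs(near - far) < 1e-9)
```

The relative-offset property holds at any position, but the absolute rotation angle of each individual query or key keeps growing with position, which is the sense in which channels can reach angles never seen at training lengths.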

[13]  arXiv:2603.18018 [pdf, ps, other]
Title: An Agentic System for Schema Aware NL2SQL Generation
Subjects: Computation and Language (cs.CL); Databases (cs.DB)

The natural language to SQL (NL2SQL) task plays a pivotal role in democratizing data access by enabling non-expert users to interact with relational databases through intuitive language. While recent frameworks have enhanced translation accuracy via task specialization, their reliance on Large Language Models (LLMs) raises significant concerns regarding computational overhead, data privacy, and real-world deployability in resource-constrained environments. To address these challenges, we propose a schema-based agentic system that strategically employs Small Language Models (SLMs) as primary agents, complemented by a selective LLM fallback mechanism. Because the LLM is invoked only upon detection of errors in SLM-generated output, the proposed system significantly reduces computational expenditure. Experimental results on the BIRD benchmark demonstrate that our system achieves an execution accuracy of 47.78% and a validation efficiency score of 51.05%, with over 90% cost reduction compared to LLM-centric baselines, as approximately 67% of queries are resolved using local SLMs. The system achieves an average cost per query of 0.0085 compared to 0.094 for LLM-only systems, with near-zero operational costs for locally executed queries. [GitHub repository: https://github.com/mindslab25/CESMA.]
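
The "over 90%" cost reduction follows directly from the two per-query costs quoted in the abstract. The short check below uses only those two figures; the abstract does not state the cost unit, so none is assumed, and the variable names are illustrative.

```python
# Average cost per query as quoted in the abstract (unit not stated there).
hybrid_cost = 0.0085     # SLM-first system with selective LLM fallback
llm_only_cost = 0.094    # LLM-only baseline

reduction = 1.0 - hybrid_cost / llm_only_cost
print(f"Relative cost reduction: {reduction:.1%}")  # prints 91.0%
```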

[14]  arXiv:2603.18019 [pdf, ps, other]
Title: BenchBrowser -- Collecting Evidence for Evaluating Benchmark Validity
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Do language model benchmarks actually measure what practitioners intend them to? High-level metadata is too coarse to convey the granular reality of benchmarks: a "poetry" benchmark may never test for haikus, while "instruction-following" benchmarks will often test for an arbitrary mix of skills. This opacity makes verifying alignment with practitioner goals a laborious process, risking an illusion of competence even when models fail on untested facets of user interests. We introduce BenchBrowser, a retriever that surfaces evaluation items relevant to natural language use cases over 20 benchmark suites. Validated by a human study confirming high retrieval precision, BenchBrowser generates evidence to help practitioners diagnose low content validity (narrow coverage of a capability's facets) and low convergent validity (lack of stable rankings when measuring the same capability). BenchBrowser, thus, helps quantify a critical gap between practitioner intent and what benchmarks actually test.

[15]  arXiv:2603.18020 [pdf, ps, other]
Title: CaseLinker: An Open-Source System for Cross-Case Analysis of Internet Crimes Against Children Reports -- Technical Report & Initial Release
Comments: 23 pages, independent project
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Child sexual exploitation and abuse (CSEA) case data is inherently disturbing, fragmented across multiple organizations, jurisdictions, and agencies, with varying levels of detail and formatting, making cross-case analysis, pattern identification, and trend detection challenging. This paper presents CaseLinker, a modular system for ingesting, processing, analyzing, and visualizing CSEA case data. CaseLinker employs a hybrid deterministic information extraction approach combining regex-based extraction for structured data (demographics, platforms, evidence) with pattern-based semantic analysis for severity indicators and case topics, ensuring interpretability and auditability. The system extracts relevant case information, populates a comprehensive case schema, creates six interactive visualizations (Timeline, Severity Indicators, Case Visualization, Previous Perpetrator Status, Environment/Platforms, Organizations Involved), provides a platform for deeper automated and manual analysis, groups similar cases using weighted Jaccard similarity across multiple dimensions (platforms, demographics, topics, severity, investigation type), and provides automated triage and insights based on collected case data. CaseLinker is evaluated on 47 cases from publicly available AZICAC reports (2011-2014), demonstrating effective information extraction, case clustering, automated insights generation, and interactive visualization capabilities. CaseLinker addresses critical challenges in case analysis including fragmented data sources, cross-case pattern identification, and the emotional burden of repeatedly processing disturbing case material.
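
One plausible reading of "weighted Jaccard similarity across multiple dimensions" is a weighted average of per-dimension Jaccard scores. The sketch below follows that reading only. The weights, the field names, the placeholder values, and the treatment of two empty fields are all assumptions made for illustration, not details taken from CaseLinker.

```python
def jaccard(a: set, b: set) -> float:
    """Plain Jaccard similarity between two sets."""
    if not a and not b:
        return 0.0  # assumption: two empty fields give no evidence of similarity
    return len(a & b) / len(a | b)


# Hypothetical per-dimension weights (not from the paper).
WEIGHTS = {
    "platforms": 0.3,
    "demographics": 0.2,
    "topics": 0.2,
    "severity": 0.2,
    "investigation_type": 0.1,
}


def case_similarity(case_a: dict, case_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-dimension Jaccard scores, in [0, 1]."""
    total = sum(weights.values())
    score = sum(
        w * jaccard(set(case_a.get(dim, ())), set(case_b.get(dim, ())))
        for dim, w in weights.items()
    )
    return score / total


# Placeholder records with abstract labels only.
case_1 = {
    "platforms": {"platform_a", "platform_b"},
    "demographics": {"group_1"},
    "topics": {"topic_x", "topic_y"},
    "severity": {"level_2"},
    "investigation_type": {"proactive"},
}
case_2 = {
    "platforms": {"platform_a"},
    "demographics": {"group_1"},
    "topics": {"topic_y", "topic_z"},
    "severity": {"level_3"},
    "investigation_type": {"proactive"},
}

similarity = case_similarity(case_1, case_2)
print(round(similarity, 3))
```

Cases could then be grouped by thresholding this score or by feeding the pairwise scores to a clustering routine; the abstract does not say which approach the system takes.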

[16]  arXiv:2603.18025 [pdf, ps, other]
Title: Understanding the Relationship Between Firms' AI Technology Innovation and Consumer Complaints
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Applications (stat.AP)

In the artificial intelligence (AI) age, firms increasingly invest in AI technology innovation to secure competitive advantages. However, the relationship between firms' AI technology innovation and consumer complaints remains insufficiently explored. Drawing on Protection Motivation Theory (PMT), this paper investigates how firms' AI technology innovation influences consumer complaints. Employing a multimethod approach, Study 1 analyzes panel data from S&P 500 firms (N = 2,758 firm-year observations), Study 2 examines user-generated Reddit data (N = 2,033,814 submissions and comments), and Study 3 involves two controlled experiments (N = 410 and N = 500). The results reveal that firms' AI technology innovation significantly increases consumers' threat-related emotions, heightening their complaints. Furthermore, compared to AI process innovation, AI product innovation leads to higher consumer complaints. This paper advances the understanding of consumers' psychological responses to firms' AI innovation and provides practical implications for managing consumer complaints effectively.

[17]  arXiv:2603.18028 [pdf, ps, other]
Title: Clinically Meaningful Explainability for NeuroAI: An ethical, technical, and clinical perspective
Comments: 20 pages, 2 figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

While explainable AI (XAI) is often heralded as a means to enhance transparency and trustworthiness in closed-loop neurotechnology for psychiatric and neurological conditions, its real-world prevalence remains low. Moreover, empirical evidence suggests that the type of explanations provided by current XAI methods often fails to align with clinicians' end-user needs. In this viewpoint, we argue that clinically meaningful explainability (CME) is essential for AI-enabled closed-loop medical neurotechnology and must be addressed from an ethical, technical, and clinical perspective. Instead of exhaustive technical detail, clinicians prioritize clinically relevant, actionable explanations, such as clear representations of input-output relationships and feature importance. Full technical transparency, although theoretically desirable, often proves irrelevant or even overwhelming in practice, as it may lead to informational overload. Therefore, we advocate for CME in the neurotechnology domain: prioritizing actionable clarity over technical completeness and designing interface visualizations that intuitively map AI outputs and key features into clinically meaningful formats. To this end, we introduce a reference architecture called NeuroXplain, which translates CME into actionable technical design recommendations for any future neurostimulation device. Our aim is to inform stakeholders working in neurotechnology and regulatory framework development to ensure that explainability fulfills the right needs for the right stakeholders and ultimately leads to better patient treatment and care.

[18]  arXiv:2603.18029 [pdf, ps, other]
Title: Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
Authors: J. Clayton Kerce
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Transformers resist surgical control. Ablating an attention head identified as critical for capitalization produces minimal behavioral change because distributed redundancy compensates for damage. This Hydra effect renders interpretability illusory: we may identify components through correlation, but cannot predict or control their causal role. We demonstrate that architectural interventions can expose hidden modularity. Our approach combines dual-stream processing separating token and contextual representations, per-layer supervision providing independent gradient signal at each depth, and gated attention regularizing toward discrete activation patterns. When trained with per-layer supervision, models produce ablation effects 5 to 23 times larger than architecturally identical controls trained with standard objectives. This enables 4 times greater control leverage on targeted behaviors: scaling identified attention heads produces smooth, predictable changes in model output. The key finding is architectural. Without per-layer supervision, ablation damage concentrates near zero with low variance (Winograd standard deviation 0.63%). With per-layer supervision, effects spread widely (standard deviation 6.32%), revealing which predictions depend on which circuits. The larger variance is not measurement noise but the signature of unmasked modularity. We validate our approach through three components: engineered features that capture computational dynamics rather than vocabulary structure (validated by near-zero correlation with raw activation clustering), an architecture providing positive control for modularity, and causal experiments demonstrating functional reorganization where different tasks route through different attention heads. This establishes a methodology for transforming interpretability from passive observation to active control.

[19]  arXiv:2603.18030 [pdf, ps, other]
Title: Quine: Realizing LLM Agents as Native POSIX Processes
Authors: Hao Ke
Comments: 10 pages, 3 figures. Reference implementation available on this https URL
Subjects: Operating Systems (cs.OS); Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Software Engineering (cs.SE)

Current LLM agent frameworks often implement isolation, scheduling, and communication at the application layer, even though these mechanisms are already provided by mature operating systems. Instead of introducing another application-layer orchestrator, this paper presents Quine, a runtime architecture and reference implementation that realizes LLM agents as native POSIX processes. The mapping is explicit: identity is PID, interface is standard streams and exit status, state is memory, environment variables, and filesystem, and lifecycle is fork/exec/exit. A single executable implements this model by recursively spawning fresh instances of itself. By grounding the agent abstraction in the OS process model, Quine inherits isolation, composition, and resource control directly from the kernel, while naturally supporting recursive delegation, context renewal via exec, and shell-native composition. The design also exposes where the POSIX process model stops: processes provide a robust substrate for execution, but not a complete runtime model for cognition. In particular, the analysis points toward two immediate extensions beyond process semantics: task-relative worlds and revisable time. A reference implementation of Quine is publicly available on GitHub.
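
The process mapping described above (standard streams and exit status as the interface, environment variables as state, recursive spawning of fresh instances) can be illustrated with a toy sketch. This is not the Quine implementation. The agent body is a stub that tags its input instead of calling an LLM, and the names `AGENT_SOURCE`, `AGENT_DEPTH`, and `run_agent` are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical agent body. A real agent would call an LLM here; this stub
# reads a task from stdin, optionally delegates to a fresh child process,
# and writes its result to stdout.
AGENT_SOURCE = r'''
import os, subprocess, sys
task = sys.stdin.read().strip()
depth = int(os.environ.get("AGENT_DEPTH", "0"))
if depth > 0:
    # Delegate: spawn a fresh instance of the same program with less depth.
    env = dict(os.environ, AGENT_DEPTH=str(depth - 1))
    child = subprocess.run(
        [sys.executable, "-c", os.environ["AGENT_SOURCE"]],
        input=task, capture_output=True, text=True, env=env,
    )
    if child.returncode != 0:
        sys.exit(child.returncode)  # propagate failure through exit status
    task = child.stdout.strip()
print(f"{task}>d{depth}")
'''


def run_agent(task: str, depth: int):
    """Run the stub agent as a separate OS process; return (exit status, stdout)."""
    env = dict(os.environ, AGENT_DEPTH=str(depth), AGENT_SOURCE=AGENT_SOURCE)
    proc = subprocess.run(
        [sys.executable, "-c", AGENT_SOURCE],
        input=task, capture_output=True, text=True, env=env,
    )
    return proc.returncode, proc.stdout.strip()


status, output = run_agent("summarize", depth=2)
print(status, output)  # 0 summarize>d0>d1>d2
```

Each level of delegation runs in its own process with its own PID, so isolation and resource limits come from the operating system rather than from an application-layer orchestrator, which is the design point the abstract makes.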

[20]  arXiv:2603.18031 [pdf, ps, other]
Title: InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs) scale linearly but often struggle to capture high-rank and synchronous global interactions. We present a consistency boundary analysis that characterizes when diagonal short-memory SSMs can approximate causal attention and identifies structural gaps that remain. Motivated by this analysis, we propose InfoMamba, an attention-free hybrid architecture. InfoMamba replaces token-level self-attention with a concept bottleneck linear filtering layer that serves as a minimal-bandwidth global interface and integrates it with a selective recurrent stream through information-maximizing fusion (IMF). IMF dynamically injects global context into the SSM dynamics and encourages complementary information usage through a mutual-information-inspired objective. Extensive experiments on classification, dense prediction, and non-vision tasks show that InfoMamba consistently outperforms strong Transformer and SSM baselines, achieving competitive accuracy-efficiency trade-offs while maintaining near-linear scaling.

[21]  arXiv:2603.18032 [pdf, ps, other]
Title: Towards Differentiating Between Failures and Domain Shifts in Industrial Data Streams
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Anomaly and failure detection methods are crucial in identifying deviations from normal system operational conditions, which allows for actions to be taken in advance, usually preventing more serious damage. Long-lasting deviations indicate failures, while sudden, isolated changes in the data indicate anomalies. However, in many practical applications, changes in the data do not always represent abnormal system states. Such changes may be recognized incorrectly as failures when they are in fact a normal evolution of the system, e.g. when processing of a new product starts, which constitutes a domain shift. Therefore, distinguishing between failures and such "healthy" changes in data distribution is critical to ensure the practical robustness of the system. In this paper, we propose a method that not only detects changes in the data distribution and anomalies but also allows us to distinguish between failures and normal domain shifts inherent to a given process. The proposed method consists of a modified Page-Hinkley changepoint detector for identification of the domain shift and possible failures and supervised domain-adaptation-based algorithms for fast, online anomaly detection. These two are coupled with an explainable artificial intelligence (XAI) component that aims at helping the human operator to finally differentiate between domain shifts and failures. The method is illustrated by an experiment on a data stream from a steel factory.
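
For readers unfamiliar with the changepoint detector named above, the following is a minimal sketch of the standard, unmodified Page-Hinkley test for an upward shift in the mean. The paper's modification is not described in the abstract and is not reproduced here; the parameter names `delta` and `threshold` and the helper `first_alarm` are assumptions made for illustration.

```python
from typing import Iterable, Optional


class PageHinkley:
    """Standard Page-Hinkley test for an increase in the stream mean."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0) -> None:
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold (often written lambda)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the mean
        self.min_cum = 0.0          # smallest cumulative deviation so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(stream: Iterable[float], detector: PageHinkley) -> Optional[int]:
    """Return the index of the first alarm, or None if no change is detected."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    stream = [0.0] * 100 + [5.0] * 100   # mean shifts at index 100
    print(first_alarm(stream, PageHinkley()))
```

A detector of this kind only reports that the distribution changed; deciding whether the change is a failure or a benign domain shift is the separate problem the abstract addresses.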

[22]  arXiv:2603.18034 [pdf, ps, other]
Title: Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems
Authors: Scott Thornton
Comments: 10 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Retrieval-Augmented Generation (RAG) systems extend large language models (LLMs) with external knowledge sources but introduce new attack surfaces through the retrieval pipeline. In particular, adversaries can poison retrieval corpora so that malicious documents are preferentially retrieved at inference time, enabling targeted manipulation of model outputs. We study gradient-guided corpus poisoning attacks against modern RAG pipelines and evaluate retrieval-layer defenses that require no modification to the underlying LLM.
We implement dual-document poisoning attacks consisting of a sleeper document and a trigger document optimized using Greedy Coordinate Gradient (GCG). In a large-scale evaluation on the Security Stack Exchange corpus (67,941 documents) with 50 attack attempts, gradient-guided poisoning achieves a 38.0 percent co-retrieval rate under pure vector retrieval.
We show that a simple architectural modification, hybrid retrieval combining BM25 and vector similarity, substantially mitigates this attack. Across all 50 attacks, hybrid retrieval reduces gradient-guided attack success from 38 percent to 0 percent without modifying the model or retraining the retriever. When attackers jointly optimize payloads for both sparse and dense retrieval signals, hybrid retrieval can be partially circumvented, achieving 20-44 percent success, but still significantly raises attack difficulty relative to vector-only retrieval.
Evaluation across five LLM families (GPT-5.3, GPT-4o, Claude Sonnet 4.6, Llama 4, and GPT-4o-mini) shows attack success ranging from 46.7 percent to 93.3 percent. Cross-corpus evaluation on the FEVER Wikipedia dataset (25 attacks) yields 0 percent attack success across all retrieval configurations.
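
The hybrid-retrieval defense can be sketched as a score fusion step. The abstract does not state how BM25 and vector scores are combined, so the min-max normalization, the weight `alpha`, and all scores below are illustrative assumptions rather than the paper's configuration.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(sparse: Dict[str, float], dense: Dict[str, float],
                alpha: float = 0.5) -> List[str]:
    """Rank documents by alpha * sparse + (1 - alpha) * dense (both normalized)."""
    s, d = min_max(sparse), min_max(dense)
    fused = {doc: alpha * s[doc] + (1.0 - alpha) * d[doc] for doc in sparse}
    return sorted(fused, key=fused.get, reverse=True)


# Made-up scores: the poisoned document is tuned for embedding similarity
# but shares no query terms, so its sparse (BM25-style) score is zero.
SPARSE = {"benign_a": 7.2, "benign_b": 5.1, "poisoned": 0.0}
DENSE = {"benign_a": 0.81, "benign_b": 0.62, "poisoned": 0.93}

if __name__ == "__main__":
    print(hybrid_rank(SPARSE, DENSE, alpha=0.0))  # vector-only ranking
    print(hybrid_rank(SPARSE, DENSE, alpha=0.5))  # hybrid ranking
```

With these toy numbers the poisoned document ranks first under vector-only retrieval and drops below a benign document once the sparse signal is included, which matches the qualitative effect the abstract reports.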

[23]  arXiv:2603.18035 [pdf, ps, other]
Title: Taming Epilepsy: Mean Field Control of Whole-Brain Dynamics
Comments: 22 pages, 7 figures
Subjects: Machine Learning (cs.LG)

Controlling the high-dimensional neural dynamics during epileptic seizures remains a significant challenge due to the nonlinear characteristics and complex connectivity of the brain. In this paper, we propose a novel framework, namely Graph-Regularized Koopman Mean-Field Game (GK-MFG), which integrates Reservoir Computing (RC) for Koopman operator approximation with Alternating Population and Agent Control Network (APAC-Net) for solving distributional control problems. By embedding Electroencephalogram (EEG) dynamics into a linear latent space and imposing graph Laplacian constraints derived from the Phase Locking Value (PLV), our method achieves robust seizure suppression while respecting the functional topological structure of the brain.

[24]  arXiv:2603.18036 [pdf, ps, other]
Title: MST-Direct: Matching via Sinkhorn Transport for Multivariate Geostatistical Simulation with Complex Non-Linear Dependencies
Subjects: Machine Learning (cs.LG)

Multivariate geostatistical simulation requires the faithful reproduction of complex non-linear dependencies among geological variables, including bimodal distributions, step functions, and heteroscedastic relationships. Traditional methods such as the Gaussian Copula and LU Decomposition assume linear correlation structures and often fail to preserve these complex joint distribution patterns. We propose MST-Direct (Matching via Sinkhorn Transport), a novel algorithm based on Optimal Transport theory that uses the Sinkhorn algorithm to directly match multivariate distributions while preserving spatial correlation structures. The method processes all variables simultaneously as a single multidimensional vector, enabling relational matching across the full joint space rather than relying on pairwise linear dependencies.

[25]  arXiv:2603.18037 [pdf, ps, other]
Title: Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization
Authors: Takato Yasuno
Comments: 16 pages, 11 figures, 6 tables
Subjects: Machine Learning (cs.LG)

This paper presents a systematic methodology for building domain-specific Japanese small language models using QLoRA fine-tuning. We address three core questions: optimal training scale, base-model selection, and architecture-aware quantization. Stage 1 (Training scale): Scale-learning experiments (1k--5k samples) identify n=4,000 as optimal, where test-set NLL reaches minimum (1.127) before overfitting at 5k samples. Stage 2 (Compare finetuned SLMs): Comparing four Japanese LLMs shows that Llama-3 models with Japanese continual pre-training (Swallow-8B, ELYZA-JP-8B) outperform multilingual models (Qwen2.5-7B). Stage 3 (Quantization): Llama-3 architectures improve under Q4_K_M quantization, while GQA architectures degrade severely (Qwen2.5: -0.280 points). Production recommendation: Swallow-8B Q4_K_M achieves 2.830/3 score, 8.9 s/question, 4.9 GB size. The methodology generalizes to low-resource technical domains and provides actionable guidance for compact Japanese specialist LMs on consumer hardware.

[26]  arXiv:2603.18039 [pdf, ps, other]
Title: Sharpness Aware Surrogate Training for Spiking Neural Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Surrogate gradients are a standard tool for training spiking neural networks (SNNs), but conventional hard forward or surrogate backward training couples a nonsmooth forward model with a biased gradient estimator. We study Sharpness Aware Surrogate Training (SAST), which applies Sharpness Aware Minimization (SAM) to a surrogate forward SNN trained by backpropagation. In this formulation, the optimization target is an ordinary smooth empirical risk, so the training gradient is exact for the auxiliary model being optimized. Under explicit boundedness and contraction assumptions, we derive compact state stability and input Lipschitz bounds, establish smoothness of the surrogate objective, provide a first order SAM approximation bound, and prove a nonconvex convergence guarantee for stochastic SAST with an independent second minibatch. We also isolate a local mechanism proposition, stated separately from the unconditional guarantees, that links per sample parameter gradient control to smaller input gradient norms under local Jacobian conditioning. Empirically, we evaluate clean accuracy, hard spike transfer, corruption robustness, and training overhead on N-MNIST and DVS Gesture. The clearest practical effect is transfer gap reduction: on N-MNIST, hard spike accuracy rises from 65.7% to 94.7% (best at $\rho=0.30$) while surrogate forward accuracy remains high; on DVS Gesture, hard spike accuracy improves from 31.8% to 63.3% (best at $\rho=0.40$). We additionally specify the compute matched, calibration, and theory alignment controls required for a final practical assessment.
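
For reference, the standard first-order SAM step that the abstract builds on can be written as below. The notation is assumed rather than taken from the paper: $L_B$ denotes the smooth surrogate-forward empirical risk on minibatch $B$, $\eta$ is a learning rate, and $\rho$ is the perturbation radius quoted in the results. Using a separate minibatch $B'$ for the descent gradient is one plausible reading of "independent second minibatch", not a confirmed detail.

```latex
\begin{aligned}
\hat{\epsilon}(w_t) &= \rho \, \frac{\nabla_w L_{B}(w_t)}{\lVert \nabla_w L_{B}(w_t) \rVert_2}, \\
w_{t+1} &= w_t - \eta \, \nabla_w L_{B'}(w) \Big|_{w = w_t + \hat{\epsilon}(w_t)} .
\end{aligned}
```

The first line takes a normalized ascent step of radius $\rho$; the second evaluates the gradient at the perturbed point and applies it to the unperturbed weights.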

[27]  arXiv:2603.18041 [pdf, ps, other]
Title: Quotient Geometry and Persistence-Stable Metrics for Swarm Configurations
Authors: Mark M. Bailey
Comments: 20 pages
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Systems and Control (eess.SY); Algebraic Topology (math.AT)

Swarm and constellation reconfiguration can be viewed as motion of an unordered point configuration in an ambient space. Here, we provide persistence-stable, symmetry-invariant geometric representations for comparing and monitoring multi-agent configuration data. We introduce a quotient formation space $\mathcal{S}_n(M,G)=M^n/(G\times S_n)$ and a formation matching metric $d_{M,G}$ obtained by optimizing a worst-case assignment error over ambient symmetries $g\in G$ and relabelings $\sigma\in S_n$. This metric is a structured, physically interpretable relaxation of Gromov--Hausdorff distance: the induced inter-agent metric spaces satisfy $d_{\mathrm{GH}}(X_x,X_y)\le d_{M,G}([x],[y])$. Composing this bound with stability of Vietoris--Rips persistence yields $d_B(\Phi_k([x]),\Phi_k([y]))\le d_{M,G}([x],[y])$, providing persistence-stable signatures for reconfiguration monitoring. We analyze the metric geometry of $(\mathcal{S}_n(M,G),d_{M,G})$: under compactness/completeness assumptions on $M$ and compact $G$ it is compact/complete and the metric induces the quotient topology; if $M$ is geodesic then the quotient is geodesic and exhibits stratified singularities along collision and symmetry strata, relating it to classical configuration spaces. We study expressivity of the signatures, identifying symmetry-mismatch and persistence-compression mechanisms for non-injectivity. Finally, in a phase-circle model we prove a conditional inverse theorem: under semicircle support and a gap-labeling margin, the $H_0$ signature is locally bi-Lipschitz to $d_{M,G}$ up to an explicit factor, yielding two-sided control. Examples on $\mathbb{S}^2$ and $\mathbb{T}^m$ illustrate satellite-constellation and formation settings.
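
The relabeling part of the formation matching metric can be illustrated with a brute-force sketch. It assumes a Euclidean ambient space and a trivial symmetry group $G$, so only the minimization over $\sigma\in S_n$ of the worst-case per-agent displacement is shown; the function name is hypothetical and the factorial cost makes it suitable for small $n$ only.

```python
from itertools import permutations
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def formation_distance(x: Sequence[Point], y: Sequence[Point]) -> float:
    """Minimum over relabelings of the largest per-agent displacement.

    Assumes Euclidean distance and no ambient symmetries (trivial G).
    """
    if len(x) != len(y):
        raise ValueError("configurations must have the same number of agents")
    return min(
        max(dist(p, q) for p, q in zip(x, relabeled))
        for relabeled in permutations(y)
    )


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    b = [(0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]   # same formation, relabeled
    c = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]   # one agent moved by 1
    print(formation_distance(a, b), formation_distance(a, c))
```

Minimizing additionally over $g\in G$ would require an optimization over the symmetry group, which this sketch omits.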

[28]  arXiv:2603.18043 [pdf, ps, other]
Title: The Provenance Paradox in Multi-Agent LLM Routing: Delegation Contracts and Attested Identity in LDP
Authors: Sunil Prakash
Comments: 9 pages, 6 figures. Open-source: this https URL
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

Multi-agent LLM systems delegate tasks across trust boundaries, but current protocols do not govern delegation under unverifiable quality claims. We show that when delegates can inflate self-reported quality scores, quality-based routing produces a provenance paradox: it systematically selects the worst delegates, performing worse than random. We extend the LLM Delegate Protocol (LDP) with delegation contracts that bound authority through explicit objectives, budgets, and failure policies; a claimed-vs-attested identity model that distinguishes self-reported from verified quality; and typed failure semantics enabling automated recovery. In controlled experiments with 10 simulated delegates and validated with real Claude models, routing by self-claimed quality scores performs worse than random selection (simulated: 0.55 vs. 0.68; real models: 8.90 vs. 9.30), while attested routing achieves near-optimal performance (d = 9.51, p < 0.001). Sensitivity analysis across 36 configurations confirms the paradox emerges reliably when dishonest delegates are present. All extensions are backward-compatible with sub-microsecond validation overhead.
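
The routing failure described above can be reproduced in a toy setting. The sketch below is not the LDP implementation: the `Delegate` type, the routing functions, and all quality numbers are hypothetical and chosen only to show why ranking by self-reported scores favors delegates that inflate them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Delegate:
    name: str
    claimed: float   # self-reported quality, unverifiable by the router
    attested: float  # quality measured by a trusted verifier (assumed to exist)


def route_by_claim(delegates: List[Delegate]) -> Delegate:
    """Pick the delegate with the highest self-reported quality."""
    return max(delegates, key=lambda d: d.claimed)


def route_by_attestation(delegates: List[Delegate]) -> Delegate:
    """Pick the delegate with the highest verified quality."""
    return max(delegates, key=lambda d: d.attested)


# Illustrative numbers only; they are not taken from the paper.
DELEGATES = [
    Delegate("honest_strong", claimed=0.90, attested=0.90),
    Delegate("honest_weak", claimed=0.60, attested=0.60),
    Delegate("inflating_weak", claimed=0.99, attested=0.40),
]

if __name__ == "__main__":
    print(route_by_claim(DELEGATES).name)
    print(route_by_attestation(DELEGATES).name)
```

Because the weakest delegate has the most to gain from inflating its score, claim-based routing selects it, while attested routing selects the strongest one.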

[29]  arXiv:2603.18045 [pdf, ps, other]
Title: RARE disease detection from Capsule Endoscopic Videos based on Vision Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work addresses the Gastro Competition on multi-label classification from capsule endoscopic videos (CEV). Deep learning networks based on Transformers are fine-tuned for this task. The baseline model is the Google Vision Transformer (ViT) patch16 at 224 x 224 resolution. In total, 17 labels are classified: mouth, esophagus, stomach, small intestine, colon, z-line, pylorus, ileocecal valve, active bleeding, angiectasia, blood, erosion, erythema, hematin, lymphangioectasis, polyp, and ulcer. On the test dataset of three videos, the overall mAP @0.5 is 0.0205, whereas the overall mAP @0.95 is 0.0196.

[30]  arXiv:2603.18046 [pdf, ps, other]
Title: NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference
Comments: 11 pages. Accepted at the VerifAI Workshop at ICLR 2026 (camera-ready version)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

When users query proprietary LLM APIs, they receive outputs with no cryptographic assurance that the claimed model was actually used. Service providers could substitute cheaper models, apply aggressive quantization, or return cached responses, all undetectable by users paying premium prices for frontier capabilities. We present NANOZK, a zero-knowledge proof system that makes LLM inference verifiable: users can cryptographically confirm that outputs correspond to the computation of a specific model.
Our approach exploits the fact that transformer inference naturally decomposes into independent layer computations, enabling a layerwise proof framework where each layer generates a constant-size proof regardless of model width. This decomposition sidesteps the scalability barrier facing monolithic approaches and enables parallel proving. We develop lookup table approximations for non-arithmetic operations (softmax, GELU, LayerNorm) that introduce zero measurable accuracy loss, and introduce Fisher information-guided verification for scenarios where proving all layers is impractical.
On transformer models up to d=128, NANOZK generates constant-size layer proofs of 5.5KB (2.1KB attention + 3.5KB MLP) with 24 ms verification time. Compared to EZKL, NANOZK achieves 70x smaller proofs and 5.7x faster proving time at d=128, while maintaining formal soundness guarantees (epsilon < 1e-37). Lookup approximations preserve model perplexity exactly, enabling verification without quality compromise.

[31]  arXiv:2603.18048 [pdf, ps, other]
Title: DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models
Comments: 14 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Recent Audio Multimodal Large Language Models (Audio MLLMs) demonstrate impressive performance on speech benchmarks, yet it remains unclear whether these models genuinely process acoustic signals or rely on text-based semantic inference. To systematically study this question, we introduce DEAF (Diagnostic Evaluation of Acoustic Faithfulness), a benchmark of over 2,700 conflict stimuli spanning three acoustic dimensions: emotional prosody, background sounds, and speaker identity. Then, we design a controlled multi-level evaluation framework that progressively increases textual influence, ranging from semantic conflicts in the content to misleading prompts and their combination, allowing us to disentangle content-driven bias from prompt-induced sycophancy. We further introduce diagnostic metrics to quantify model reliance on textual cues over acoustic signals. Our evaluation of seven Audio MLLMs reveals a consistent pattern of text dominance: models are sensitive to acoustic variations, yet predictions are predominantly driven by textual inputs, revealing a gap between high performance on standard speech benchmarks and genuine acoustic understanding.

[32]  arXiv:2603.18049 [pdf, ps, other]
Title: Conditional Execution of Transpiler Passes Based on Per-Script Feature Detection
Comments: Preprint. Under review at SOAP 2026. Implementation available in Google Closure Compiler
Subjects: Programming Languages (cs.PL); Software Engineering (cs.SE)

As the ECMAScript specification evolves, industrial-scale JavaScript compilers face the challenge of supporting modern language syntax while maintaining compatibility for diverse execution environments. Traditionally, compilers solve this by running transpilation passes in a monolithic pipeline, where the transpilation passes are chosen to execute strictly based on a target language level. This results in significant computational waste, as compilers perform expensive Abstract Syntax Tree (AST) traversals to lower features that may not exist in the actual input source code. We present a compiler improvement that conditionally executes transpiler passes based on accurately tracking and dynamically maintaining the exact set of language features present in the compilation unit throughout the transpilation process. It is implemented in the production Google Closure Compiler. By populating and maintaining a FeatureSet at every JavaScript script-level, it dynamically skips running unnecessary lowering passes. We detail the architectural safeguards -- including strategic pass ordering and dynamic validation of the transpiled code for feature-correctness. Evaluation of this improvement on large-scale production monorepos produced a considerable reduction in compilation time and saved compute and memory usage.
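
The idea of skipping lowering passes based on a maintained feature set can be sketched as follows. This is an illustration, not the Closure Compiler API: `Script`, `LoweringPass`, `run_passes`, and the feature and pass names are hypothetical, and the actual AST rewriting is replaced by an update to the feature set.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass
class Script:
    source: str
    features: Set[str]  # language features currently present in this script


@dataclass(frozen=True)
class LoweringPass:
    name: str
    lowers: FrozenSet[str]               # features this pass removes
    emits: FrozenSet[str] = frozenset()  # features its output may introduce


def run_passes(script: Script, passes: List[LoweringPass]) -> List[str]:
    """Run only the passes whose input features occur in the script."""
    executed = []
    for p in passes:
        if script.features.isdisjoint(p.lowers):
            continue  # nothing to lower: skip the AST traversal entirely
        # A real pass would rewrite the AST here; the sketch only keeps
        # the feature set in sync with what the rewrite would produce.
        script.features = (script.features - p.lowers) | p.emits
        executed.append(p.name)
    return executed


# Ordering matters: a pass that emits a feature must run before the
# pass that lowers that feature.
PASSES = [
    LoweringPass("lower_async", frozenset({"async_functions"}),
                 frozenset({"generators"})),
    LoweringPass("lower_generators", frozenset({"generators"})),
    LoweringPass("lower_classes", frozenset({"classes"})),
]

if __name__ == "__main__":
    script = Script("async function f() {}", {"async_functions"})
    print(run_passes(script, PASSES), script.features)
```

Updating the feature set after each pass is what lets a later pass run when an earlier one introduces the feature it lowers, which is one reason pass ordering appears among the safeguards the abstract mentions.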

[33]  arXiv:2603.18053 [pdf, ps, other]
Title: Auditing the Auditors: Does Community-based Moderation Get It Right?
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); General Economics (econ.GN); Machine Learning (stat.ML)

Online social platforms increasingly rely on crowd-sourced systems to label misleading content at scale, but these systems must both aggregate users' evaluations and decide whose evaluations to trust. To address the latter, many platforms audit users by rewarding agreement with the final aggregate outcome, a design we term consensus-based auditing. We analyze the consequences of this design in X's Community Notes, which in September 2022 adopted consensus-based auditing that ties users' eligibility for participation to agreement with the eventual platform outcome. We find evidence of strategic conformity: minority contributors' evaluations drift toward the majority and their participation share falls on controversial topics, where independent signals matter most. We formalize this mechanism in a behavioral model in which contributors trade off private beliefs against anticipated penalties for disagreement. Motivated by these findings, we propose a two-stage auditing and aggregation algorithm that weights contributors by the stability of their past residuals rather than by agreement with the majority. The method first accounts for differences across content and contributors, and then measures how predictable each contributor's evaluations are relative to the latent-factor model. Contributors whose evaluations are consistently informative receive greater influence in aggregation, even when they disagree with the prevailing consensus. In the Community Notes data, this approach improves out-of-sample predictive performance while avoiding penalization of disagreement.

[34]  arXiv:2603.18054 [pdf, ps, other]
Title: An FPGA-Based SoC Architecture with a RISC-V Controller for Energy-Efficient Temporal-Coding Spiking Neural Networks
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

Spiking Neural Networks (SNNs) offer high energy efficiency and event-driven computation, ideal for low-power edge AI. Their hardware implementation on FPGAs, however, faces challenges due to heavy computation, large memory use, and limited flexibility. This paper proposes a compact System-on-Chip (SoC) architecture for temporal-coding SNNs, integrating a RISC-V controller with an event-driven SNN core. It replaces multipliers with bitwise operations using binarized weights, includes a spike-time sorter for active spikes, and skips noninformative events to reduce computation. The architecture runs fully on a Xilinx Artix-7 FPGA, achieving up to 16x memory reduction for weights and lowering computational overhead and latency, with 97.0% accuracy on MNIST and 88.3% on FashionMNIST. This self-contained design provides an efficient, scalable platform for real-time neuromorphic inference at the edge.

[35]  arXiv:2603.18056 [pdf, ps, other]
Title: Fundamental Limits of Neural Network Sparsification: Evidence from Catastrophic Interpretability Collapse
Subjects: Machine Learning (cs.LG)

Extreme neural network sparsification (90% activation reduction) presents a critical challenge for mechanistic interpretability: understanding whether interpretable features survive aggressive compression. This work investigates feature survival under severe capacity constraints in hybrid Variational Autoencoder--Sparse Autoencoder (VAE-SAE) architectures. We introduce an adaptive sparsity scheduling framework that progressively reduces active neurons from 500 to 50 over 50 training epochs, and provide empirical evidence for fundamental limits of the sparsification-interpretability relationship. Testing across two benchmark datasets -- dSprites and Shapes3D -- with both Top-k and L1 sparsification methods, our key finding reveals a pervasive paradox: while global representation quality (measured by Mutual Information Gap) remains stable, local feature interpretability collapses systematically. Under Top-k sparsification, dead neuron rates reach $34.4\pm0.9\%$ on dSprites and $62.7\pm1.3\%$ on Shapes3D at k=50. L1 regularization -- a fundamentally different "soft constraint" paradigm -- produces equal or worse collapse: $41.7\pm4.4\%$ on dSprites and $90.6\pm0.5\%$ on Shapes3D. Extended training for 100 additional epochs fails to recover dead neurons, and the collapse pattern is robust across all tested threshold definitions. Critically, the collapse scales with dataset complexity: Shapes3D (RGB, 6 factors) shows $1.8\times$ more dead neurons than dSprites (grayscale, 5 factors) under Top-k and $2.2\times$ under L1. These findings establish that interpretability collapse under sparsification is intrinsic to the compression process rather than an artifact of any particular algorithm, training duration, or threshold choice.
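
The dead-neuron measurement under Top-k sparsification can be illustrated with a small sketch. The abstract notes that several threshold definitions were tested; the definition used here (a neuron is dead if it is never among the k selected activations on any sample) is an assumption, as are the function names and the toy data.

```python
from typing import List, Sequence


def top_k(activations: Sequence[float], k: int) -> List[float]:
    """Keep the k largest activations and set the rest to zero."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


def dead_neuron_rate(batch: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of neurons that are never active after Top-k on any sample."""
    n = len(batch[0])
    ever_active = [False] * n
    for activations in batch:
        for i, a in enumerate(top_k(activations, k)):
            if a != 0.0:
                ever_active[i] = True
    return ever_active.count(False) / n


if __name__ == "__main__":
    batch = [
        [0.9, 0.1, 0.5, 0.2],
        [0.8, 0.3, 0.7, 0.1],
    ]
    print(dead_neuron_rate(batch, k=2), dead_neuron_rate(batch, k=4))
```

In this toy batch the same two neurons win the Top-k selection on every sample, so tightening k from 4 to 2 leaves half of the neurons permanently inactive.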

[36]  arXiv:2603.18059 [pdf, ps, other]
Title: Guardrails as Infrastructure: Policy-First Control for Tool-Orchestrated Workflows
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Tool-using automation systems, from scripts and CI bots to agentic assistants, fail in recurring patterns. Common failures include unsafe side effects, invalid arguments, uncontrolled retries, and leakage of sensitive outputs. Many mitigations are model-centric and prompt-dependent, so they are brittle and do not generalize to non-LLM callers. We present Policy-First Tooling, a model-agnostic permission layer that mediates tool invocation through explicit constraints, risk-aware gating, recovery controls, and auditable explanations. The paper contributes a compact policy DSL, a runtime enforcement architecture with actionable rationale and fix hints, and a reproducible benchmark based on trace replay with controlled fault and misuse injection. In 225 controlled runs across five policy packs and three fault profiles, stricter packs improve violation prevention from 0.000 in P0 to 0.681 in P4, while task success drops from 0.356 to 0.067. Retry amplification decreases from 3.774 in P0 to 1.378 in P4, and leakage recall reaches 0.875 under injected secret outputs. These results make safety to utility trade-offs explicit and measurable.

[37]  arXiv:2603.18062 [pdf, ps, other]
Title: S3T-Former: A Purely Spike-Driven State-Space Topology Transformer for Skeleton Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Skeleton-based action recognition is crucial for multimedia applications but heavily relies on power-hungry Artificial Neural Networks (ANNs), limiting their deployment on resource-constrained edge devices. Spiking Neural Networks (SNNs) provide an energy-efficient alternative; however, existing spiking models for skeleton data often compromise the intrinsic sparsity of SNNs by resorting to dense matrix aggregations, heavy multimodal fusion modules, or non-sparse frequency domain transformations. Furthermore, they severely suffer from the short-term amnesia of spiking neurons. In this paper, we propose the Spiking State-Space Topology Transformer (S3T-Former), which, to the best of our knowledge, is the first purely spike-driven Transformer architecture specifically designed for energy-efficient skeleton action recognition. Rather than relying on heavy fusion overhead, we formulate a Multi-Stream Anatomical Spiking Embedding (M-ASE) that acts as a generalized kinematic differential operator, elegantly transforming multimodal skeleton features into heterogeneous, highly sparse event streams. To achieve true topological and temporal sparsity, we introduce Lateral Spiking Topology Routing (LSTR) for on-demand conditional spike propagation, and a Spiking State-Space (S3) Engine to systematically capture long-range temporal dynamics without non-sparse spectral workarounds. Extensive experiments on multiple large-scale datasets demonstrate that S3T-Former achieves highly competitive accuracy while theoretically reducing energy consumption compared to classic ANNs, establishing a new state-of-the-art for energy-efficient neuromorphic action recognition.

[38]  arXiv:2603.18063 [pdf, ps, other]
Title: MCP-38: A Comprehensive Threat Taxonomy for Model Context Protocol Systems (v1.0)
Comments: v1.0
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

The Model Context Protocol (MCP) introduces a structurally distinct attack surface that existing threat frameworks, designed for traditional software systems or generic LLM deployments, do not adequately cover. This paper presents MCP-38, a protocol-specific threat taxonomy consisting of 38 threat categories (MCP-01 through MCP-38). The taxonomy was derived through a systematic four-phase methodology: protocol decomposition, multi-framework cross-mapping, real-world incident synthesis, and remediation-surface categorization. Each category is mapped to STRIDE, OWASP Top 10 for LLM Applications (2025, LLM01--LLM10), and the OWASP Top 10 for Agentic Applications (2026, ASI01--ASI10). MCP-38 addresses critical threats arising from MCP's semantic attack surface (tool description poisoning, indirect prompt injection, parasitic tool chaining, and dynamic trust violations), none of which are adequately captured by prior work. MCP-38 provides the definitional and empirical foundation for automated threat intelligence platforms.

[39]  arXiv:2603.18064 [pdf, ps, other]
Title: A vision for a colorectal digital twin that enables proactive and personalized disease management
Subjects: Emerging Technologies (cs.ET)

Colorectal cancer, inflammatory bowel disease, and diverticular disease are progressive conditions that affect millions of individuals worldwide and impose substantial clinical and economic burdens. Early detection and personalized management are essential for slowing disease progression and improving patient outcomes. Current care pathways rely primarily on episodic clinical encounters, laboratory testing, and reactive interventions, limiting early detection and personalized longitudinal management. This paper introduces a conceptual framework for an integrated colorectal digital twin that supports non-invasive, continuous monitoring and personalized disease management. The framework integrates multimodal physiological and behavioral data streams, hybrid mechanistic-machine learning modeling of colorectal function, and a personalized artificial intelligence engine to support proactive disease management. Rather than presenting a deployed clinical system, this work outlines a clear vision and a structured approach for colorectal digital twins, identifying key technical, modeling, and translational challenges necessary for future implementation and validation.

[40]  arXiv:2603.18066 [pdf, ps, other]
Title: A Synthesizable RTL Implementation of Predictive Coding Networks
Authors: Timothy Oh
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG)

Backpropagation has enabled modern deep learning but is difficult to realize as an online, fully distributed hardware learning system due to global error propagation, phase separation, and heavy reliance on centralized memory. Predictive coding offers an alternative in which inference and learning arise from local prediction-error dynamics between adjacent layers. This paper presents a digital architecture that implements a discrete-time predictive coding update directly in hardware. Each neural core maintains its own activity, prediction error, and synaptic weights, and communicates only with adjacent layers through hardwired connections. Supervised learning and inference are supported via a uniform per-neuron clamping primitive that enforces boundary conditions while leaving the internal update schedule unchanged. The design is a deterministic, synthesizable RTL substrate built around a sequential MAC datapath and a fixed finite-state schedule. Rather than executing a task-specific instruction sequence inside the learning substrate, the system evolves under fixed local update rules, with task structure imposed through connectivity, parameters, and boundary conditions. The contribution of this work is not a new learning rule, but a complete synthesizable digital substrate that executes predictive-coding learning dynamics directly in hardware.

[41]  arXiv:2603.18067 [pdf, ps, other]
Title: DarkDriving: A Real-World Day and Night Aligned Dataset for Autonomous Driving in the Dark Environment
Comments: 8 pages, 8 figures. Accepted to ICRA 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)

Low-light conditions are challenging for vision-centric perception systems for autonomous driving in dark environments. In this paper, we propose a new benchmark dataset, named DarkDriving, to investigate low-light enhancement for autonomous driving. Existing real-world low-light enhancement benchmark datasets can be collected by controlling exposure only over small ranges and in static scenes. The dark images in current nighttime driving datasets do not have precisely aligned daytime counterparts. The extreme difficulty of collecting a real-world day and night aligned dataset in dynamic driving scenes has significantly limited research in this area. Using a proposed automatic day-night Trajectory Tracking based Pose Matching (TTPM) method in a large real-world closed driving test field (area: 69 acres), we collected the first real-world day and night aligned dataset for autonomous driving in the dark environment. The DarkDriving dataset has 9,538 day and night image pairs precisely aligned in location and spatial content, with an alignment error of only several centimeters. For each pair, we also manually label 2D object bounding boxes. DarkDriving introduces four perception-related tasks: low-light enhancement, generalized low-light enhancement, and low-light enhancement for 2D detection and for 3D detection in autonomous driving in the dark environment. The experimental results show that the DarkDriving dataset provides a comprehensive benchmark for evaluating low-light enhancement for autonomous driving, and that it also generalizes to enhancing dark images and improving detection on other low-light driving data, such as nuScenes.

[42]  arXiv:2603.18069 [pdf, ps, other]
Title: Robust Global Position and Heading Tracking on SE(3) via Saturated Hybrid Feedback
Subjects: Systems and Control (eess.SY)

This letter presents a novel control solution to the robust global position and heading tracking problem for underactuated vehicles, equipped with single-axis thrust and full torque actuation, operating under strict, user-defined actuation limits. The architecture features a saturated position tracking controller augmented with two first-order filters. This formulation ensures the boundedness of the first and second derivatives, yielding less conservative bounds and systematically generating bounded attitude references whose limits are easily tuned via design parameters. To track these dynamic references, the inner loop comprises a saturated, modified Rodrigues parameter (MRP)-based controller paired with a hybrid dynamic path-lifting mechanism. This approach allows the attitude tracking law to be designed on a covering space of the configuration manifold. By leveraging a stability equivalence framework, the methodology establishes that the resulting interconnected system achieves robust global asymptotic and semi-global exponential tracking on SE(3), while complying with user-defined input saturation bounds. Numerical simulations validate the proposed solution.

[43]  arXiv:2603.18071 [pdf, ps, other]
Title: Circumventing Platform Defenses at Scale: Automated Content Replication from YouTube to Blockchain-Based Decentralized Storage
Authors: Zeeshan Akram
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

We present YouTube-Synch [1], a production system for automated, large-scale content extraction and replication from YouTube to decentralized storage on Joystream. The system continuously mirrors videos from more than 10,000 creator-authorized channels while handling platform constraints such as API quotas, rate limiting, bot detection, and OAuth token churn. We report a 3.5-year longitudinal case study covering 15 releases and 144 pull requests, from early API dependence to API-free operation. A key finding is that YouTube's defense layers are operationally coupled: bypassing one control often triggers another, creating cascading failures. We analyze three incidents with measured impact: 28 duplicate on-chain objects caused by database throughput issues, loss of over 10,000 channels after OAuth mass expiration, and 719 daily errors from queue pollution. For each, we describe the architectural response. Contributions include a three-generation proxy stack with behavior variance injection, a trust-minimized ownership verification protocol that replaces OAuth for channel control, write-ahead logging with cross-system state reconciliation, and containerized deployment. Results show that sustained architectural adaptation can maintain reliable cross-platform replication at production scale.

[44]  arXiv:2603.18073 [pdf, ps, other]
Title: Continually self-improving AI
Authors: Zitong Yang
Comments: PhD thesis
Subjects: Artificial Intelligence (cs.AI)

Modern language model-based AI systems are remarkably powerful, yet their capabilities remain fundamentally capped by their human creators in three key ways. First, although a model's weights can be updated via fine-tuning, acquiring new knowledge from small, specialized corpora after pretraining remains highly data-inefficient. Second, the training of these systems relies heavily on finite, human-generated data from across history. Third, the pipelines used to train AI models are confined by the algorithms that human researchers can discover and explore. This thesis takes a small step toward overcoming these inherent limitations, presenting three chapters aimed at breaking these dependencies to create continually self-improving AI. First, to overcome this data-efficiency barrier in knowledge acquisition, we propose a synthetic data approach that diversifies and amplifies small corpora into rich knowledge representations, enabling a model to effectively update its parameters from limited source material. Second, to reduce reliance on human data, we show that given a fixed amount of such data, the model can self-generate synthetic data to bootstrap its fundamental pretraining capabilities without distillation from any off-the-shelf, instruction-tuned LM. Finally, to transcend human-engineered training paradigms, we demonstrate that by scaling search during test time over the space of algorithms, AI can search over a larger space of learning algorithm configurations than human researchers can explore manually.

[45]  arXiv:2603.18074 [pdf, ps, other]
Title: Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Applications (stat.AP)

Adapting Large Language Models in complex technical service domains is constrained by the absence of explicit cognitive chains in human demonstrations and the inherent ambiguity arising from the diversity of valid responses. These limitations severely hinder agents from internalizing latent decision dynamics and generalizing effectively. Moreover, practical adaptation is often impeded by the prohibitive resource and time costs associated with standard training paradigms. To overcome these challenges and guarantee computational efficiency, we propose a lightweight adaptation framework comprising three key contributions. (1) Latent Logic Augmentation: We introduce Planning-Aware Trajectory Modeling and Decision Reasoning Augmentation to bridge the gap between surface-level supervision and latent decision logic. These approaches strengthen the stability of Supervised Fine-Tuning alignment. (2) Robust Noise Reduction: We construct a Multiple Ground Truths dataset through a dual-filtering method to reduce the noise by validating diverse responses, thereby capturing the semantic diversity. (3) Lightweight Adaptation: We design a Hybrid Reward mechanism that fuses an LLM-based judge with a lightweight relevance-based Reranker to distill high-fidelity reward signals while reducing the computational cost compared to standard LLM-as-a-Judge reinforcement learning. Empirical evaluations on real-world Cloud service tasks, conducted across semantically diverse settings, demonstrate that our framework achieves stability and performance gains through Latent Logic Augmentation and Robust Noise Reduction. Concurrently, our Hybrid Reward mechanism achieves alignment comparable to standard LLM-as-a-judge methods with reduced training time, underscoring the practical value for deploying technical service agents.

[46]  arXiv:2603.18077 [pdf, ps, other]
Title: A New Approach to Code Smoothing Bounds
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

To analyze the security of code-based cryptosystems, the smoothing parameter, which is closely related to the total variation distance of codes, has been investigated. While previous studies have bounded this distance using the Fourier transform on locally compact abelian groups, we take an alternative approach based on random walks. In this paper, we derive an inequality for the total variation distance of random walks using equitable partitions, and we show that our proposed bound generalizes existing results for finite abelian groups.

[47]  arXiv:2603.18078 [pdf, ps, other]
Title: Variational Phasor Circuits for Phase-Native Brain-Computer Interface Classification
Authors: Dibakar Sigdel
Subjects: Machine Learning (cs.LG)

We present the Variational Phasor Circuit (VPC), a deterministic classical learning architecture operating on the continuous $S^1$ unit circle manifold. Inspired by variational quantum circuits, VPC replaces dense real-valued weight matrices with trainable phase shifts, local unitary mixing, and structured interference in the ambient complex space. This phase-native design provides a unified method for both binary and multi-class classification of spatially distributed signals. A single VPC block supports compact phase-based decision boundaries, while stacked VPC compositions extend the model to deeper circuits through inter-block pull-back normalization. Using synthetic brain-computer interface benchmarks, we show that VPC can decode difficult mental-state classification tasks with competitive accuracy and substantially fewer trainable parameters than standard Euclidean baselines. These results position unit-circle phase interference as a practical and mathematically principled alternative to dense neural computation, and motivate VPC as both a standalone classifier and a front-end encoding layer for future hybrid phasor-quantum systems.

[48]  arXiv:2603.18079 [pdf, ps, other]
Title: SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large Language Model (LLM) agents have shown strong results on multi-turn tool-use tasks, yet they operate in isolation during training, failing to leverage experiences accumulated across episodes. Existing experience-augmented methods address this by organizing trajectories into retrievable libraries, but they retrieve experiences only once based on the initial task description and hold them constant throughout the episode. In multi-turn settings where observations change at every step, this static retrieval becomes increasingly mismatched as episodes progress. We propose SLEA-RL (Step-Level Experience-Augmented Reinforcement Learning), a framework that retrieves relevant experiences at each decision step conditioned on the current observation. SLEA-RL operates through three components: (i) step-level observation clustering that groups structurally equivalent environmental states for efficient cluster-indexed retrieval; (ii) a self-evolving experience library that distills successful strategies and failure patterns through score-based admission and rate-limited extraction; and (iii) policy optimization with step-level credit assignment for fine-grained advantage estimation across multi-turn episodes. The experience library evolves alongside the policy through semantic analysis rather than gradient updates. Experiments on long-horizon multi-turn agent benchmarks demonstrate that SLEA-RL achieves superior performance compared to various reinforcement learning baselines.

[49]  arXiv:2603.18080 [pdf, ps, other]
Title: Growing Alphabets Do Not Automatically Amplify Shuffle Privacy: Obstruction, Estimation Bounds, and Optimal Mechanism Design
Authors: Alex Shvets
Comments: 40 pages, no figures
Subjects: Information Theory (cs.IT)

We study neighboring shuffle experiments for epsilon_0-LDP channels along growing alphabets d -> infinity, and optimal mechanism design for frequency estimation under a canonical pairwise chi-squared budget.
On the privacy side, we prove an exact compression theorem: the shuffled histogram experiment depends only on the pushforward law of the pairwise likelihood ratio. We establish a sharp universal bound chi^2 <= (e^{epsilon_0}-1)^2/e^{epsilon_0}, construct explicit obstruction families for which the shuffled privacy curve equals binary randomized response for all d, and prove a sharp diluting/persistent dichotomy.
On the estimation side, we prove a universal lower bound of order (d-1)/(n chi_*(W)) via Cramer-Rao and Assouad arguments, and show that symmetrization to equivariant channels is WLOG.
On the design side, we show that calibrated GRR is not optimal. The optimal mechanism is an augmented GRR: a fraction p of users applies aggressive GRR with lambda_* = sqrt(d-1), and the rest send a null symbol. This thinning principle is specific to shuffle and has no local-DP counterpart. For low budget 0 < C <= C_*(d), augmented GRR is optimal among all permutation-equivariant channels. GRR is also the unique optimizer within the subset-selection family.
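
The bound chi^2 <= (e^{epsilon_0}-1)^2/e^{epsilon_0} quoted above can be checked numerically on a familiar channel. The sketch below assumes the standard chi-squared divergence between two rows of the channel and the usual epsilon_0-parameterized generalized randomized response (GRR); the paper's "canonical pairwise chi-squared budget" may be defined differently, and the function names are hypothetical.

```python
import math

def grr_row(x, d, eps0):
    """Output distribution of d-ary generalized randomized response on input x.

    Assumed textbook form: report the true symbol with probability
    e^eps0 / (e^eps0 + d - 1), and each other symbol with 1 / (e^eps0 + d - 1).
    """
    z = math.exp(eps0) + d - 1
    return [math.exp(eps0) / z if y == x else 1.0 / z for y in range(d)]

def chi2(p, q):
    """Chi-squared divergence: sum over y of (p(y) - q(y))^2 / q(y)."""
    return sum((a - b) ** 2 / b for a, b in zip(p, q))

def chi2_bound(eps0):
    """The universal bound quoted in the abstract."""
    return (math.exp(eps0) - 1) ** 2 / math.exp(eps0)
```

Under these assumptions, binary randomized response (d = 2) meets the bound with equality, since the divergence reduces to (2p-1)^2 / (p(1-p)) with p = e^{epsilon_0}/(1+e^{epsilon_0}). For d > 2 the pairwise divergence of plain GRR falls strictly below the bound, because the two rows differ in only two coordinates while the normalizer grows with d.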

[50]  arXiv:2603.18082 [pdf, ps, other]
Title: EgoAdapt: Enhancing Robustness in Egocentric Interactive Speaker Detection Under Missing Modalities
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

The Talking to Me (TTM) task is a pivotal component in understanding human social interactions, aiming to determine who is engaged in conversation with the camera-wearer. Traditional models often face challenges in real-world scenarios due to missing visual data, neglect of head orientation, and background noise. This study addresses these limitations by introducing EgoAdapt, an adaptive framework designed for robust egocentric "Talking to Me" speaker detection under missing modalities. Specifically, EgoAdapt incorporates three key modules: (1) a Visual Speaker Target Recognition (VSTR) module that captures head orientation as a non-verbal cue and lip movement as a verbal cue, allowing a comprehensive interpretation of both verbal and non-verbal signals to address TTM, setting it apart from tasks focused solely on detecting speaking status; (2) a Parallel Shared-weight Audio (PSA) encoder for enhanced audio feature extraction in noisy environments; and (3) a Visual Modality Missing Awareness (VMMA) module that estimates the presence or absence of each modality at each frame to adjust the system response dynamically. Comprehensive evaluations on the TTM benchmark of the Ego4D dataset demonstrate that EgoAdapt achieves a mean Average Precision (mAP) of 67.39% and an Accuracy (Acc) of 62.01%, significantly outperforming the state-of-the-art method by 4.96% in Accuracy and 1.56% in mAP.

[51]  arXiv:2603.18083 [pdf, ps, other]
Title: Probabilistic Federated Learning on Uncertain and Heterogeneous Data with Model Personalization
Comments: Accepted at IEEE Transactions on Emerging Topics in Computational Intelligence
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Conventional federated learning (FL) frameworks often suffer from training degradation due to data uncertainty and heterogeneity across local clients. Probabilistic approaches such as Bayesian neural networks (BNNs) can mitigate this issue by explicitly modeling uncertainty, but they introduce additional runtime, latency, and bandwidth overhead that has rarely been studied in federated settings. To address these challenges, we propose Meta-BayFL, a personalized probabilistic FL method that combines meta-learning with BNNs to improve training under uncertain and heterogeneous data. The framework is characterized by three main features: (1) BNN-based client models incorporate uncertainty across hidden layers to stabilize training on small and noisy datasets, (2) meta-learning with adaptive learning rates enables personalized updates that enhance local training under non-IID conditions, and (3) a unified probabilistic and personalized design improves the robustness of global model aggregation. We provide a theoretical convergence analysis and characterize the upper bound of the global model over communication rounds. In addition, we evaluate computational costs (runtime, latency, and communication) and discuss the feasibility of deployment on resource-constrained devices such as edge nodes and IoT systems. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that Meta-BayFL consistently outperforms state-of-the-art methods, including both standard and personalized FL approaches (e.g., pFedMe, Ditto, FedFomo), with up to 7.42% higher test accuracy.

[52]  arXiv:2603.18084 [pdf, ps, other]
Title: Uncovering Latent Phase Structures and Branching Logic in Locomotion Policies: A Case Study on HalfCheetah
Comments: Accepted at XAI-2026: The 4th World Conference on eXplainable Artificial Intelligence
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

In locomotion control tasks, Deep Reinforcement Learning (DRL) has demonstrated high performance; however, the decision-making process of the learned policy remains a black box, making it difficult for humans to understand. On the other hand, in periodic motions such as walking, it is well known that implicit motion phases exist, such as the stance phase and the swing phase. Focusing on this point, this study hypothesizes that a policy trained for locomotion control may also represent a phase structure that is interpretable by humans. To examine this hypothesis in a controlled setting, we consider a locomotion task that is amenable to observing whether a policy autonomously acquires temporally structured phases through interaction with the environment. Specifically, in the MuJoCo locomotion benchmark HalfCheetah-v5, the state transition sequences acquired by a policy trained for walking control through interaction with the environment were aggregated into semantic phases based on state similarity and consistency of subsequent transitions. As a result, we demonstrated that the state sequences generated by the trained policy exhibit periodic phase transition structures as well as phase branching. Furthermore, by approximating the states and actions corresponding to each semantic phase using Explainable Boosting Machines (EBMs), we analyzed phase-dependent decision making, namely, which state features the policy function attends to and how it controls action outputs in each phase. These results suggest that neural network-based policies, which are often regarded as black boxes, can autonomously acquire interpretable phase structures and logical branching mechanisms.

[53]  arXiv:2603.18085 [pdf, ps, other]
Title: Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction
Subjects: Artificial Intelligence (cs.AI)

Recent incidents have highlighted alarming cases where human-AI interactions led to negative psychological outcomes, including mental health crises and even user harm. As LLMs serve as sources of guidance, emotional support, and even informal therapy, these risks are poised to escalate. However, studying the mechanisms underlying harmful human-AI interactions presents significant methodological challenges: organic harmful interactions typically develop over sustained engagement, requiring extensive conversational context that is difficult to simulate in controlled settings. To address this gap, we developed a Multi-Trait Subspace Steering (MultiTraitsss) framework that leverages established crisis-associated traits and a novel subspace steering framework to generate Dark models that exhibit cumulative harmful behavioral patterns. Single-turn and multi-turn evaluations show that our Dark models consistently produce harmful interactions and outcomes. Using our Dark models, we propose protective measures to reduce harmful outcomes in human-AI interactions.

[54]  arXiv:2603.18086 [pdf, ps, other]
Title: SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Segment Anything Model (SAM) excels at general image segmentation but has limited ability to understand natural language, which restricts its direct application in Referring Expression Segmentation (RES). Toward this end, we propose SSP-SAM, a framework that fully utilizes SAM's segmentation capabilities by integrating a Semantic-Spatial Prompt (SSP) encoder. Specifically, we incorporate both visual and linguistic attention adapters into the SSP encoder, which highlight salient objects within the visual features and discriminative phrases within the linguistic features. This design enhances the referent representation for the prompt generator, resulting in high-quality SSPs that enable SAM to generate precise masks guided by language. Although not specifically designed for Generalized RES (GRES), where the referent may correspond to zero, one, or multiple objects, SSP-SAM naturally supports this more flexible setting without additional modifications. Extensive experiments on widely used RES and GRES benchmarks confirm the superiority of our method. Notably, our approach generates segmentation masks of high quality, achieving strong precision even at strict thresholds such as Pr@0.9. Further evaluation on the PhraseCut dataset demonstrates improved performance in open-vocabulary scenarios compared to existing state-of-the-art RES methods. The code and checkpoints are available at: https://github.com/WayneTomas/SSP-SAM.

[55]  arXiv:2603.18088 [pdf, ps, other]
Title: Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Constraints are essential for stabilizing reinforcement learning fine-tuning (RFT) and preventing degenerate outputs, yet they inherently conflict with the optimization objective because stronger constraints limit the ability of a fine-tuned model to discover better solutions. We propose dynamic constraints that resolve this tension by adapting to the evolving capabilities of the fine-tuned model based on the insight that constraints should only intervene when degenerate outputs occur. We implement this by using a reference model as an online refiner that takes the response from the fine-tuned model and generates a minimally corrected version which preserves correct content verbatim while fixing errors. A supervised fine-tuning loss then trains the fine-tuned model to produce the refined output. This mechanism yields a constraint that automatically strengthens or relaxes based on output quality. Experiments on dialogue and code generation show that dynamic constraints outperform both KL regularization and unconstrained baselines, achieving substantially higher task rewards while maintaining training stability.

[56]  arXiv:2603.18089 [pdf, ps, other]
Title: CytoSyn: a Foundation Diffusion Model for Histopathology -- Tech Report
Comments: 21 pages, 5 figures, tech report, model page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Computational pathology has made significant progress in recent years, fueling advances in both fundamental disease understanding and clinically ready tools. This evolution is driven by the availability of large amounts of digitized slides and specialized deep learning methods and models. Multiple self-supervised foundation feature extractors have been developed, enabling downstream predictive applications from cell segmentation to tumor sub-typing and survival analysis. In contrast, generative foundation models designed specifically for histopathology remain scarce. Such models could address tasks that are beyond the capabilities of feature extractors, such as virtual staining. In this paper, we introduce CytoSyn, a state-of-the-art foundation latent diffusion model that enables the guided generation of highly realistic and diverse histopathology H&E-stained images, as shown in an extensive benchmark. We explored methodological improvements, training set scaling, sampling strategies and slide-level overfitting, culminating in the improved CytoSyn-v2, and compared our work to PixCell, a state-of-the-art model, in an in-depth manner. This comparison highlighted the strong sensitivity of both diffusion models and performance metrics to preprocessing-specific details such as JPEG compression. Our model has been trained on a dataset obtained from more than 10,000 TCGA diagnostic whole-slide images of 32 different cancer types. Despite being trained only on oncology slides, it maintains state-of-the-art performance generating inflammatory bowel disease images. To support the research community, we publicly release CytoSyn's weights, its training and validation datasets, and a sample of synthetic images in this repository: https://huggingface.co/Owkin-Bioptimus/CytoSyn.

[57]  arXiv:2603.18090 [pdf, ps, other]
Title: MOSS-TTS Technical Report
Comments: Project page: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This technical report presents MOSS-TTS, a speech generation foundation model built on a scalable recipe: discrete audio tokens, autoregressive modeling, and large-scale pretraining. Built on MOSS-Audio-Tokenizer, a causal Transformer tokenizer that compresses 24 kHz audio to 12.5 fps with variable-bitrate RVQ and unified semantic-acoustic representations, we release two complementary generators: MOSS-TTS, which emphasizes structural simplicity, scalability, and long-context/control-oriented deployment, and MOSS-TTS-Local-Transformer, which introduces a frame-local autoregressive module for higher modeling efficiency, stronger speaker preservation, and a shorter time to first audio. Across multilingual and open-domain settings, MOSS-TTS supports zero-shot voice cloning, token-level duration control, phoneme-/pinyin-level pronunciation control, smooth code-switching, and stable long-form generation. This report summarizes the design, training recipe, and empirical characteristics of the released models.

[58]  arXiv:2603.18091 [pdf, ps, other]
Title: Action Draft and Verify: A Self-Verifying Framework for Vision-Language-Action Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Vision-Language-Action (VLA) models have recently demonstrated strong performance across embodied tasks. Modern VLAs commonly employ diffusion action experts to efficiently generate high-precision continuous action chunks, while auto-regressive generation can be slower and less accurate at low-level control. Yet auto-regressive paradigms still provide complementary priors that can improve robustness and generalization in out-of-distribution environments. To leverage both paradigms, we propose Action-Draft-and-Verify (ADV): a diffusion action expert drafts multiple candidate action chunks, and the VLM selects one by scoring all candidates in a single forward pass with a perplexity-style metric. Under matched backbones, training data, and action-chunk length, ADV improves success rate by +4.3 points in simulation and +19.7 points in real-world settings over a diffusion-based baseline, at the cost of a single-pass VLM reranking overhead.
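
The draft-and-verify selection step can be sketched as follows. This is a minimal illustration that assumes each candidate action chunk has already been tokenized and scored by the VLM, so that per-token log-probabilities are available; the function names, the data layout, and the use of plain perplexity are assumptions, since the abstract only says the metric is "perplexity-style".

```python
import math

def perplexity(token_logprobs):
    """Perplexity of one candidate: exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def select_action_chunk(candidates):
    """Pick the drafted chunk the verifier finds most plausible.

    candidates: list of (chunk, token_logprobs) pairs, where token_logprobs are
    hypothetical per-token log-probabilities assigned by the verifier model.
    Returns the chunk with the lowest perplexity.
    """
    best_chunk, _ = min(candidates, key=lambda c: perplexity(c[1]))
    return best_chunk
```

For example, `select_action_chunk([("a", [-2.0, -1.5]), ("b", [-0.3, -0.2])])` returns `"b"`, the candidate with the higher average log-probability. Normalizing by length keeps the comparison fair if candidates tokenize to different lengths.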

[59]  arXiv:2603.18092 [pdf, ps, other]
Title: A Vision-based Framework for Intelligent gNodeB Mobility Control
Subjects: Networking and Internet Architecture (cs.NI)

This paper proposes a vision-based framework for the intelligent control of mobile Open Radio Access Network (O-RAN) base stations (gNBs) operating in dynamic wireless environments. The framework comprises three innovative components. The first is the introduction of novel Service Models (SMs) within a vision-enabled O-RAN architecture, termed VisionRAN. These SMs extend state-of-the-art O-RAN-based architectures by enabling the transmission of vision-based sensing data and gNB positioning control messages. The second is an O-RAN xApp, VisionApp, which fuses vision and radio data, and uses this information to control the position of a mobile gNB, using a Deep Q-Network (DQN). The third is a digital twin environment, VisionTwin, which incorporates vision data and can emulate realistic wireless scenarios; this digital twin was used to train the DQN running in VisionApp and validate the overall system. Experimental results, obtained using real vision data and an emulated radio, demonstrate that the proposed approach reduces the duration of Line-of-Sight (LoS) blockages by up to 75% compared to a static gNB. These findings confirm the viability of integrating multimodal perception and learning-based control within RANs.

[60]  arXiv:2603.18093 [pdf, ps, other]
Title: One-to-More: High-Fidelity Training-Free Anomaly Generation with Attention Control
Comments: Accepted by CVPR2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Industrial anomaly detection (AD) is characterized by an abundance of normal images but a scarcity of anomalous ones. Although numerous few-shot anomaly synthesis methods have been proposed to augment anomalous data for downstream AD tasks, most existing approaches require time-consuming training and struggle to learn distributions that are faithful to real anomalies, thereby restricting the efficacy of AD models trained on such data. To address these limitations, we propose a training-free few-shot anomaly generation method, namely O2MAG, which leverages the self-attention in One reference anomalous image to synthesize More realistic anomalies, supporting effective downstream anomaly detection. Specifically, O2MAG manipulates three parallel diffusion processes via self-attention grafting and incorporates the anomaly mask to mitigate foreground-background query confusion, synthesizing text-guided anomalies that closely adhere to real anomalous distributions. To bridge the semantic gap between the encoded anomaly text prompts and the true anomaly semantics, Anomaly-Guided Optimization is further introduced to align the synthesis process with the target anomalous distribution, steering the generation toward realistic and text-consistent anomalies. Moreover, to mitigate faint anomaly synthesis inside anomaly masks, Dual-Attention Enhancement is adopted during generation to reinforce both self- and cross-attention on masked regions. Extensive experiments validate the effectiveness of O2MAG, demonstrating its superior performance over prior state-of-the-art methods on downstream AD tasks.

[61]  arXiv:2603.18094 [pdf, ps, other]
Title: Token Economy for Fair and Efficient Dynamic Resource Allocation in Congestion Games
Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)

Self-interested behavior in sharing economies often leads to inefficient aggregate outcomes compared to a centrally coordinated allocation, ultimately harming users. Yet, centralized coordination removes individual decision power. This issue can be addressed by designing rules that align individual preferences with system-level objectives. Unfortunately, rules based on conventional monetary mechanisms introduce unfairness by discriminating among users based on their wealth. To solve this problem, in this paper, we propose a token-based mechanism for congestion games that achieves efficient and fair dynamic resource allocation. Specifically, we model the token economy as a continuous-time dynamic game with finitely many boundedly rational agents, explicitly capturing their evolutionary policy-revision dynamics. We derive a mean-field approximation of the finite-population game and establish strong approximation guarantees between the mean-field and the finite-population games. This approximation enables the design of integer tolls in closed form that provably steer the aggregate dynamics toward an optimal efficient and fair allocation from any initial condition.

[62]  arXiv:2603.18095 [pdf, ps, other]
Title: Q-Drift: Quantization-Aware Drift Correction for Diffusion Model Sampling
Comments: 29 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Post-training quantization (PTQ) is a practical path to deploy large diffusion models, but quantization noise can accumulate over the denoising trajectory and degrade generation quality. We propose Q-Drift, a principled sampler-side correction that treats quantization error as an implicit stochastic perturbation on each denoising step and derives a marginal-distribution-preserving drift adjustment. Q-Drift estimates a timestep-wise variance statistic from calibration, in practice requiring as few as 5 paired full-precision/quantized calibration runs. The resulting sampler correction is plug-and-play with common samplers, diffusion models, and PTQ methods, while incurring negligible overhead at inference. Across six diverse text-to-image models (spanning DiT and U-Net), three samplers (Euler, flow-matching, DPM-Solver++), and two PTQ methods (SVDQuant, MixDQ), Q-Drift improves FID over the corresponding quantized baseline in most settings, with up to 4.59 FID reduction on PixArt-Sigma (SVDQuant W3A4), while preserving CLIP scores.

[63]  arXiv:2603.18096 [pdf, ps, other]
Title: A Trace-Based Assurance Framework for Agentic AI Orchestration: Contracts, Testing, and Governance
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

In Agentic AI, Large Language Models (LLMs) are increasingly used in the orchestration layer to coordinate multiple agents and to interact with external services, retrieval components, and shared memory. In this setting, failures are not limited to incorrect final outputs. They also arise from long-horizon interaction, stochastic decisions, and external side effects (such as API calls, database writes, and message sends). Common failures include non-termination, role drift, propagation of unsupported claims, and attacks via untrusted context or external channels.
This paper presents an assurance framework for such Agentic AI systems. Executions are instrumented as Message-Action Traces (MAT) with explicit step and trace contracts. Contracts provide machine-checkable verdicts, localize the first violating step, and support deterministic replay. The framework includes stress testing, formulated as a budgeted counterexample search over bounded perturbations. It also supports structured fault injection at service, retrieval, and memory boundaries to assess containment under realistic operational faults and degraded conditions. Finally, governance is treated as a runtime component, enforcing per-agent capability limits and action mediation (allow, rewrite, block) at the language-to-action boundary.
To support comparative evaluations across stochastic seeds, models, and orchestration configurations, the paper defines trace-based metrics for task success, termination reliability, contract compliance, factuality indicators, containment rate, and governance outcome distributions. More broadly, the framework is intended as a common abstraction to support testing and evaluation of multi-agent LLM systems, and to facilitate reproducible comparison across orchestration designs and configurations.
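
The idea of step contracts that yield a machine-checkable verdict and localize the first violating step can be illustrated with a minimal sketch. The `Step` record, the contract names, and the `first_violation` helper are hypothetical and are not taken from the paper; they only show one way such a check could look.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Step:
    """One entry of a hypothetical message-action trace."""
    agent: str
    action: str
    payload: str

# A step contract maps a step to True (compliant) or False (violation).
StepContract = Callable[[Step], bool]

def first_violation(
    trace: List[Step], contracts: Dict[str, StepContract]
) -> Optional[Tuple[int, str]]:
    """Return (step index, contract name) of the first violation, or None."""
    for index, step in enumerate(trace):
        for name, check in contracts.items():
            if not check(step):
                return index, name
    return None

# Illustrative contracts (assumptions, not the paper's contract language).
contracts = {
    "no_db_write_by_reader": lambda s: not (s.agent == "reader" and s.action == "db_write"),
    "payload_non_empty": lambda s: len(s.payload) > 0,
}

trace = [
    Step("planner", "message", "fetch the report"),
    Step("reader", "db_write", "overwrite row 7"),
    Step("reader", "message", ""),
]
verdict = first_violation(trace, contracts)
```

Because the check is a pure function of the recorded trace, re-running it on the same trace gives the same verdict, which is the property deterministic replay relies on.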

[64]  arXiv:2603.18101 [pdf, ps, other]
Title: Training-Only Heterogeneous Image-Patch-Text Graph Supervision for Advancing Few-Shot Learning Adapters
Comments: Accepted at The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent adapter-based CLIP tuning (e.g., Tip-Adapter) is a strong few-shot learner, achieving efficiency by caching support features for fast prototype matching. However, these methods rely on global uni-modal feature vectors, overlooking fine-grained patch relations and their structural alignment with class text. To bridge this gap without incurring inference costs, we introduce a novel asymmetric training-only framework. Instead of altering the lightweight adapter, we construct a high-capacity auxiliary Heterogeneous Graph Teacher that operates solely during training. This teacher (i) integrates multi-scale visual patches and text prompts into a unified graph, (ii) performs deep cross-modal reasoning via a Modality-aware Graph Transformer (MGT), and (iii) applies discriminative node filtering to extract high-fidelity class features. Crucially, we employ a cache-aware dual-objective strategy to supervise this relational knowledge directly into the Tip-Adapter's key-value cache, effectively upgrading the prototypes while the graph teacher is discarded at test time. Thus, inference remains identical to Tip-Adapter with zero extra latency or memory. Across standard 1-16-shot benchmarks, our method consistently establishes a new state-of-the-art. Ablations confirm that the auxiliary graph supervision, text-guided reasoning, and node filtering are the essential ingredients for robust few-shot adaptation. Code is available at https://github.com/MR-Sherif/TOGA.git.

[65]  arXiv:2603.18102 [pdf, ps, other]
Title: HWE-Bench: Can Language Models Perform Board-level Schematic Designs?
Subjects: Hardware Architecture (cs.AR)

Large Language Models (LLMs) have demonstrated significant potential in various engineering tasks, including software development, digital logic generation, and companion document maintenance. However, their ability to perform board-level circuit design is understudied, as this task requires a synergized understanding of real-world physics and Integrated Circuit (IC) datasheets, the latter comprising detailed specifications for individual components. To address this challenge, we propose HWE-Bench, an evaluation framework that benchmarks the ability of LLMs to perform such designs. It consists of 300 board-level design tasks pulled from open-source and crowdsourcing platforms such as GitHub and OSHWLab, covering 8 application domains, and is complemented with a knowledge base of 2,914 real IC datasheets. For each task, the LLMs are tasked with generating a schematic from scratch, using the provided circuit functional requirements and a set of component datasheets as input. The resulting schematic is checked against static electrical rules and then passed to a circuit simulator to verify its dynamic behavior. Our evaluation shows that although current models achieve initial engineering usability and documentation understanding, they lack physical intuition, as the top-performing model achieved an overall pass rate of 8.15%. We envision that advancements on HWE-Bench will pave the way for the development of practical Electronic Design Automation (EDA) agents, revolutionizing the field of board-level design.

[66]  arXiv:2603.18103 [pdf, ps, other]
Title: STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)

With the widespread deployment of deep-learning-based speech models in security-critical applications, backdoor attacks have emerged as a serious threat: an adversary who poisons a small fraction of training data can implant a hidden trigger that controls the model's output while preserving normal behavior on clean inputs. Existing inference-time defenses are not well suited to the audio domain, as they either rely on trigger over-robustness assumptions that fail on transformation-based and semantic triggers, or depend on properties specific to image or text modalities. In this paper, we propose STEP (Stability-based Trigger Exposure Profiling), a black-box, retraining-free backdoor detector that operates under hard-label-only access. Its core idea is to exploit a characteristic dual anomaly of backdoor triggers: anomalous label stability under semantic-breaking perturbations, and anomalous label fragility under semantic-preserving perturbations. STEP profiles each test sample with two complementary perturbation branches that target these two properties respectively, scores the resulting stability features with one-class anomaly detectors trained on benign references, and fuses the two scores via unsupervised weighting. Extensive experiments across seven backdoor attacks show that STEP achieves an average AUROC of 97.92% and EER of 4.54%, substantially outperforming state-of-the-art baselines, and generalizes across model architectures, speech tasks, an open-set verification scenario, and over-the-air physical-world settings.

[67]  arXiv:2603.18104 [pdf, ps, other]
Title: Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI
Authors: Houston Haynes
Comments: 29 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Prevailing AI training infrastructure assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through training are consequences of this arithmetic substrate. This paper develops an alternative training architecture grounded in three prior results: the Dimensional Type System and Deterministic Memory Management framework [6], which establishes stack-eligible gradient allocation and exact quire accumulation as design-time verifiable properties; the Program Hypergraph [8], which establishes grade preservation through geometric algebra computations as a type-level invariant; and the b-posit 2026 standard [10], which makes posit arithmetic tractable across hardware targets conventionally considered inference-only. Their composition enables depth-independent training memory bounded to approximately twice the inference footprint, grade-preserving weight updates, and exact gradient accumulation, applicable uniformly to loss-function-optimized and spike-timing-dependent neuromorphic models. We introduce Bayesian distillation, a mechanism by which the latent prior structure of a general-purpose model is extracted through the ADM training regime, resolving the data-scarcity bootstrapping problem for domain-specific training. For deployment, we introduce warm rotation, an operational pattern in which an updated model transitions into an active inference pathway without service interruption, with structural correctness formalized through PHG certificates and signed version records. The result is a class of domain-specific AI systems that are smaller and more precise than general-purpose models, continuously adaptive, verifiably correct with respect to the physical structure of their domains, and initializable from existing models.

[68]  arXiv:2603.18105 [pdf, ps, other]
Title: Adaptive Fuzzy Logic-Based Steganographic Encryption Framework: A Comprehensive Experimental Evaluation
Subjects: Cryptography and Security (cs.CR)

Digital image steganography requires a careful trade-off among payload capacity, visual fidelity, and statistical undetectability. Fixed-depth least significant bit embedding remains attractive because of its simplicity and high capacity, but it modifies smooth and textured regions uniformly, thereby increasing distortion and detectability in statistically sensitive areas. This paper presents an adaptive steganographic framework that combines a Mamdani-type fuzzy inference system with modern authenticated encryption. The proposed method determines a pixel-wise embedding depth from 1 to 3 bits using local entropy, edge magnitude, and payload pressure as linguistic inputs. To preserve encoder-decoder synchronization, the same feature maps are computed from lower-bit-stripped images, making the adaptive control mechanism invariant to the least significant modifications introduced during embedding. A cryptographic layer based on Argon2id and AES-256-GCM protects payload confidentiality and integrity independently of steganographic concealment.
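
The synchronization argument (features computed from lower-bit-stripped images are unaffected by embedding in those bits) can be checked with a small NumPy sketch. The gradient feature, the helper names, and the random depth map are illustrative assumptions; the paper's actual feature maps (local entropy, edge magnitude) and fuzzy controller are not reproduced here.

```python
import numpy as np

def strip_low_bits(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Zero the k least significant bits of every 8-bit pixel."""
    mask = np.uint8(0xFF ^ ((1 << k) - 1))
    return img & mask

def edge_feature(img: np.ndarray) -> np.ndarray:
    """Toy stand-in for a feature map: horizontal gradient of the stripped image."""
    base = strip_low_bits(img).astype(np.int16)
    return np.abs(np.diff(base, axis=1, prepend=base[:, :1]))

def embed(img: np.ndarray, payload: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Overwrite the `depth` lowest bits (1 to 3 per pixel) with payload bits."""
    low = (1 << depth.astype(np.int64)) - 1          # per-pixel low-bit mask
    keep = (0xFF ^ low).astype(np.uint8)
    return (img & keep) | (payload & low.astype(np.uint8))

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
payload = rng.integers(0, 8, size=(8, 8), dtype=np.uint8)
depth = rng.integers(1, 4, size=(8, 8), dtype=np.uint8)  # hypothetical depth map
stego = embed(cover, payload, depth)

# The decoder sees only `stego`, yet recovers the same feature map as the encoder.
features_match = np.array_equal(edge_feature(cover), edge_feature(stego))
```

Since embedding touches at most the three lowest bits, any feature that depends only on the upper five bits is identical before and after embedding, so encoder and decoder can derive the same depth map independently.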

[69]  arXiv:2603.18107 [pdf, ps, other]
Title: ARTEMIS: A Neuro Symbolic Framework for Economically Constrained Market Dynamics
Authors: Rahul D Ray
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Statistical Finance (q-fin.ST)

Deep learning models in quantitative finance often operate as black boxes, lacking interpretability and failing to incorporate fundamental economic principles such as no-arbitrage constraints. This paper introduces ARTEMIS (Arbitrage-free Representation Through Economic Models and Interpretable Symbolics), a novel neuro-symbolic framework combining a continuous-time Laplace Neural Operator encoder, a neural stochastic differential equation regularised by physics-informed losses, and a differentiable symbolic bottleneck that distils interpretable trading rules. The model enforces economic plausibility via two novel regularisation terms: a Feynman-Kac PDE residual penalising local no-arbitrage violations, and a market price of risk penalty bounding the instantaneous Sharpe ratio. We evaluate ARTEMIS against six strong baselines on four datasets: Jane Street, Optiver, Time-IMM, and DSLOB (a synthetic crash regime). Results demonstrate ARTEMIS achieves state-of-the-art directional accuracy, outperforming all baselines on DSLOB (64.96%) and Time-IMM (96.0%). A comprehensive ablation study confirms each component's contribution: removing the PDE loss reduces directional accuracy from 64.89% to 50.32%. Underperformance on Optiver is attributed to its long sequence length and volatility-focused target. By providing interpretable, economically grounded predictions, ARTEMIS bridges the gap between deep learning's power and the transparency demanded in quantitative finance.

[70]  arXiv:2603.18108 [pdf, ps, other]
Title: From Concepts to Judgments: Interpretable Image Aesthetic Assessment
Comments: 12 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image aesthetic assessment (IAA) aims to predict the aesthetic quality of images as perceived by humans. While recent IAA models achieve strong predictive performance, they offer little insight into the factors driving their predictions. Yet for users, understanding why an image is considered pleasing or not is as valuable as the score itself, motivating growing interest in interpretability within IAA. When humans evaluate aesthetics, they naturally rely on high-level cues to justify their judgments. Motivated by this observation, we propose an interpretable IAA framework grounded in human-understandable aesthetic concepts. We learn these concepts in an accessible manner, constructing a subspace that forms the foundation of an inherently interpretable model. To capture nuanced influences on aesthetic perception beyond explicit concepts, we introduce a simple yet effective residual predictor. Experiments on photographic and artistic datasets demonstrate that our method achieves competitive predictive performance while offering transparent, human-understandable aesthetic judgments.

[71]  arXiv:2603.18111 [pdf, ps, other]
Title: BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Contrastive learning methods for time series anomaly detection (TSAD) heavily depend on the quality of negative sample construction. However, existing strategies based on random perturbations or pseudo-anomaly injection often struggle to simultaneously preserve temporal semantic consistency and provide effective decision-boundary supervision. Most existing methods rely on prior anomaly injection, while overlooking the potential of generating hard negatives near the data manifold boundary directly from normal samples themselves. To address this issue, we propose a reconstruction-driven boundary negative generation framework that automatically constructs hard negatives through the reconstruction process of normal samples. Specifically, the method first employs a reconstruction network to capture normal temporal patterns, and then introduces a reinforcement learning strategy to adaptively adjust the optimization update magnitude according to the current reconstruction state. In this way, boundary-shifted samples close to the normal data manifold can be induced along the reconstruction trajectory and further used for subsequent contrastive representation learning. Unlike existing methods that depend on explicit anomaly injection, the proposed framework does not require predefined anomaly patterns, but instead mines more challenging boundary negatives from the model's own learning dynamics. Experimental results show that the proposed method effectively improves anomaly representation learning and achieves competitive detection performance on the current dataset.

[72]  arXiv:2603.18112 [pdf, ps, other]
Title: Tula: Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Distributed training increases the number of batches processed per iteration either by scaling out (adding more nodes) or scaling up (increasing the batch size). However, the largest configuration does not necessarily yield the best performance. Horizontal scaling introduces additional communication overhead, while vertical scaling is constrained by computation cost and device memory limits. Thus, simply increasing the batch size leads to diminishing returns: training time and cost decrease initially but eventually plateau, creating a knee point in the time/cost versus batch-size Pareto curve. The optimal batch size therefore depends on the underlying model, data, and available compute resources. Large batches also suffer from worse model quality due to the well-known generalization gap. In this paper, we present Tula, an online service that automatically optimizes time, cost, and convergence quality for large-batch training of convolutional models. It combines parallel-systems modeling with statistical performance prediction to identify the optimal batch size. Tula predicts training time and cost within 7.5-14% error across multiple models, achieves up to 20x overall speedup, and improves test accuracy by 9% on average over standard large-batch training on various vision tasks, thus mitigating the generalization gap and accelerating training at the same time.

[73]  arXiv:2603.18113 [pdf, ps, other]
Title: VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
Comments: 12 pages; Accepted to WWW2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central objective in trustworthy AI. This challenge becomes even more pronounced when aligning multiple, potentially conflicting human values. Although recent approaches, such as reward reweighting, prompt-based supervised fine-tuning, and model merging, attempt to tackle multi-value alignment, they still face two major limitations: (1) training separate models for each value combination is prohibitively expensive; (2) value conflicts substantially degrade alignment performance. These limitations make it difficult to achieve favorable trade-offs across diverse human values. To address these challenges, we revisit multi-value alignment from the perspective of value consistency in data and propose VC-Soup, a data filtering and parameter merging framework grounded in value-consistent learning. We first design a value consistency metric based on the cosine similarity between the reward-gap vector of each preference pair and an all-ones vector, which quantifies its cross-value coherence. We then filter out low-consistency preference pairs in each value dataset and train on the remaining data to obtain smooth, value-consistent policy models that better preserve linear mode connectivity. Finally, we linearly combine these policies and apply Pareto filtering across values to obtain solutions with balanced multi-value performance. Extensive experiments and theoretical analysis demonstrate that VC-Soup effectively mitigates conflicts and consistently outperforms existing multi-value alignment methods.
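
The value consistency metric described above (cosine similarity between a preference pair's reward-gap vector and an all-ones vector) can be sketched as follows. The function names, the example reward gaps, and the threshold of 0.5 are illustrative assumptions, not values from the paper.

```python
import numpy as np

def value_consistency(reward_gap) -> float:
    """Cosine similarity between a reward-gap vector and the all-ones vector.

    reward_gap[i] is assumed to be reward_i(chosen) - reward_i(rejected)
    under the reward model for value i.
    """
    gap = np.asarray(reward_gap, dtype=float)
    ones = np.ones_like(gap)
    denom = np.linalg.norm(gap) * np.linalg.norm(ones)
    if denom == 0.0:
        return 0.0  # a zero gap carries no preference signal
    return float(gap @ ones / denom)

def filter_pairs(reward_gaps, threshold: float = 0.5):
    """Keep indices of pairs whose consistency meets a (hypothetical) threshold."""
    return [i for i, g in enumerate(reward_gaps) if value_consistency(g) >= threshold]

gaps = [
    [1.0, 1.0, 1.0],    # all values agree on the chosen response
    [2.0, -2.0, 0.0],   # two values conflict
    [-1.0, -1.0, -1.0], # all values prefer the rejected response
]
kept = filter_pairs(gaps)
```

A score near 1 means every value's reward model prefers the chosen response by a similar margin, a score near 0 indicates conflicting preferences, and a negative score means the values jointly disagree with the label.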

[74]  arXiv:2603.18115 [pdf, ps, other]
Title: LLM-Augmented Computational Phenotyping of Long Covid
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Phenotypic characterization is essential for understanding heterogeneity in chronic diseases and for guiding personalized interventions. Long COVID is a complex and persistent condition, yet its clinical subphenotypes remain poorly understood. In this work, we propose an LLM-augmented computational phenotyping framework, "Grace Cycle", that iteratively integrates hypothesis generation, evidence extraction, and feature refinement to discover clinically meaningful subgroups from longitudinal patient data. The framework identifies three distinct clinical phenotypes, Protected, Responder, and Refractory, based on 13,511 Long COVID participants. These phenotypes exhibit pronounced separation in peak symptom severity, baseline disease burden, and longitudinal dose-response patterns, with strong statistical support across multiple independent dimensions.
This study illustrates how large language models can be integrated into a principled, statistically grounded pipeline for phenotypic screening from complex longitudinal data. Note that the proposed framework is disease-agnostic and offers a general approach for discovering clinically interpretable subphenotypes.

[75]  arXiv:2603.18116 [pdf, ps, other]
Title: Responsible AI in criminal justice: LLMs in policing and risks to case progression
Subjects: Computers and Society (cs.CY)

There is growing interest in the use of Large Language Models (LLMs) in policing, but there are potential risks. We have developed a practical approach to identifying risks, grounded in the policing and legal system of England and Wales. We identify 15 policing tasks that could be implemented using LLMs and 17 risks from their use, then illustrate with over 40 examples of impact on case progression. As good practice is agreed, many risks could be reduced. But this requires effort: we need to address these risks in a timely manner and define system-wide impacts and benefits.

[76]  arXiv:2603.18117 [pdf, ps, other]
Title: Intellectual Stewardship: Re-adapting Human Minds for Creative Knowledge Work in the Age of AI
Authors: Jianwei Zhang
Comments: 23 pages
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Background: Amid the opportunities and risks introduced by generative AI, learning research needs to envision how human minds and responsibilities should re-adapt as AI continues to augment or automate various tasks.
Approach: Drawing on theories of learning, intelligence, and knowledge creation, this conceptual paper proposes intellectual stewardship as a human-centered, conceptually grounded framework for advancing creative learning practices with AI.
Key points: Students and teachers work as responsible governors of intellectual processes distributed across human and artificial systems, guided by five core principles. Being knowledge-wise involves understanding the evolving state of knowledge and taking purposeful actions to advance it. Being intelligence-wise emphasizes making informed choices about how to orchestrate distributed cognitive processes and resources. Being context-wise requires sensitivity to recognize opportunities and risks. Being ethics-wise foregrounds ethical judgment, responsibility, and care in the use of knowledge and intellectual power. Finally, self- and community-growing defines the overarching purpose, aligning intellectual work with personal development and the advancement of collective well-being.
Contribution: The principles provide a lens for viewing the adaptation of human minds in AI-infused learning environments, calling for the development of meta-level dispositions and capabilities that characterize wisdom-oriented, socially responsible knowledge builders in the AI age.

[77]  arXiv:2603.18118 [pdf, ps, other]
Title: Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models
Comments: arXiv admin note: text overlap with arXiv:2411.14432
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models (LLMs) have achieved remarkable reliability and advanced capabilities through extended test-time reasoning. However, extending these capabilities to Multi-modal Large Language Models (MLLMs) remains a significant challenge due to a critical scarcity of high-quality, long-chain reasoning data and optimized training pipelines. To bridge this gap, we present a unified multi-agent visual reasoning framework that systematically evolves from our foundational image-centric model, Insight-V, into a generalized spatial-temporal architecture, Insight-V++. We first propose a scalable data generation pipeline equipped with multi-granularity assessment that autonomously synthesizes structured, complex reasoning trajectories across image and video domains without human intervention. Recognizing that directly supervising MLLMs with such intricate data yields sub-optimal results, we design a dual-agent architecture comprising a reasoning agent to execute extensive analytical chains, and a summary agent to critically evaluate and distill final outcomes. While our initial framework utilized Direct Preference Optimization (DPO), its off-policy nature fundamentally constrained reinforcement learning potential. To overcome these limitations, particularly for long-horizon video understanding, Insight-V++ introduces two novel algorithms, ST-GRPO and J-GRPO, which enhance spatial-temporal reasoning and improve evaluative robustness. Crucially, by leveraging reliable feedback from the summary agent, we guide an iterative reasoning path generation process, retraining the entire multi-agent system in a continuous, self-improving loop. Extensive experiments on base models like LLaVA-NeXT and Qwen2.5-VL demonstrate significant performance gains across challenging image and video reasoning benchmarks while preserving strong capabilities on traditional perception-focused tasks.

[78]  arXiv:2603.18120 [pdf, ps, other]
Title: MAED: Mathematical Activation Error Detection for Mitigating Physical Fault Attacks in DNN Inference
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

The inference phase of deep neural networks (DNNs) in embedded systems is increasingly vulnerable to fault attacks and failures, which can result in incorrect predictions. These vulnerabilities can potentially lead to catastrophic consequences, making the development of effective mitigation techniques essential. In this paper, we introduce MAED (Mathematical Activation Error Detection), an algorithm-level error detection framework that exploits mathematical identities to continuously validate the correctness of non-linear activation function computations at runtime. To the best of our knowledge, this work is the first to integrate algorithm-level error detection techniques to defend against both malicious fault injection attacks and naturally occurring faults in critical DNN components in embedded systems. The evaluation is conducted on three widely adopted activation functions, namely ReLU, sigmoid, and tanh, which serve as fundamental building blocks for introducing non-linearity in DNNs and can lead to mispredictions when subjected to natural faults or fault attacks. We assessed the proposed error detection scheme via fault model simulation, achieving close to 100% error detection while mitigating existing fault attacks on DNN inference. Additionally, the overhead introduced by integrating the proposed scheme with the baseline implementation (i.e., without error detection) is validated through implementations on an AMD/Xilinx Artix-7 FPGA and an ATmega328P microcontroller, as well as through integration with TensorFlow. On the microcontroller, the proposed error detection incurs less than 1% clock cycle overhead, while on the FPGA it requires nearly zero additional area, at the cost of approximately a 20% increase in latency for sigmoid and tanh.
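
The abstract does not say which mathematical identities MAED uses, so the following is only an illustrative sketch of the general idea of identity-based checking. It relies on three standard identities, sigmoid(x) + sigmoid(-x) = 1, tanh(-x) = -tanh(x), and ReLU(x) - ReLU(-x) = x; the function names and tolerance are assumptions.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def relu(x: float) -> float:
    return x if x > 0 else 0.0

def check_sigmoid(x: float, y: float, tol: float = 1e-9) -> bool:
    """Accept y as sigmoid(x) only if y + sigmoid(-x) equals 1."""
    return abs(y + sigmoid(-x) - 1.0) <= tol

def check_tanh(x: float, y: float, tol: float = 1e-9) -> bool:
    """Accept y as tanh(x) only if y + tanh(-x) equals 0 (odd symmetry)."""
    return abs(y + math.tanh(-x)) <= tol

def check_relu(x: float, y: float, tol: float = 1e-9) -> bool:
    """Accept y as ReLU(x) only if y - ReLU(-x) equals x."""
    return abs(y - relu(-x) - x) <= tol

x = 0.7
ok = check_sigmoid(x, sigmoid(x))                # fault-free output passes
faulty = check_sigmoid(x, sigmoid(x) + 1e-3)     # a perturbed output fails
```

Each check costs one extra activation evaluation plus a comparison, which illustrates why a scheme of this kind can trade some latency for detection coverage.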

[79]  arXiv:2603.18122 [pdf, ps, other]
Title: Don't Vibe Code, Do Skele-Code: Interactive No-Code Notebooks for Subject Matter Experts to Build Lower-Cost Agentic Workflows
Comments: Main paper 9 pages. Topics: Agentic Coding, HCI, LLMs, Workflows
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Programming Languages (cs.PL); Systems and Control (eess.SY)

Skele-Code is a natural-language and graph-based interface for building workflows with AI agents, designed especially for less technical or non-technical users. It supports incremental, interactive notebook-style development, and each step is converted to code with a required set of functions and behavior to enable incremental building of workflows. Agents are invoked only for code generation and error recovery, not orchestration or task execution. This agent-supported but code-first approach to workflows, along with the context engineering used in Skele-Code, can help reduce token costs compared to the multi-agent system approach to executing workflows. Skele-Code produces modular, easily extensible, and shareable workflows. The generated workflows can also be used as skills by agents, or as steps in other workflows.

[80]  arXiv:2603.18124 [pdf, ps, other]
Title: Evaluating FrameNet-Based Semantic Modeling for Gender-Based Violence Detection in Clinical Records
Comments: Paper accepted to the Lang4Heath Workshop at PROPOR 2026
Subjects: Computation and Language (cs.CL)

Gender-based violence (GBV) is a major public health issue, with the World Health Organization estimating that one in three women experiences physical or sexual violence by an intimate partner during her lifetime. In Brazil, although healthcare professionals are legally required to report such cases, underreporting remains significant due to difficulties in identifying abuse and limited integration between public information systems. This study investigates whether FrameNet-based semantic annotation of open-text fields in electronic medical records can support the identification of patterns of GBV. We compare the performance of an SVM classifier for GBV cases trained on (1) frame-annotated text, (2) annotated text combined with parameterized data, and (3) parameterized data alone. Quantitative and qualitative analyses show that models incorporating semantic annotation outperform categorical models, achieving over 0.3 improvement in F1 score and demonstrating that domain-specific semantic representations provide meaningful signals beyond structured demographic data. The findings support the hypothesis that semantic analysis of clinical narratives can enhance early identification strategies and support more informed public health interventions.

[81]  arXiv:2603.18126 [pdf, ps, other]
Title: A Survey of Neural Network Variational Monte Carlo from a Computing Workload Characterization Perspective
Subjects: Hardware Architecture (cs.AR); Chemical Physics (physics.chem-ph)

Neural Network Variational Monte Carlo (NNVMC) has emerged as a promising paradigm for solving quantum many-body problems by combining variational Monte Carlo with expressive neural-network wave-function ansätze. Although NNVMC can achieve competitive accuracy with favorable asymptotic scaling, practical deployment remains limited by high runtime and memory cost on modern graphics processing units (GPUs). Compared with language and vision workloads, NNVMC execution is shaped by physics-specific stages, including Markov-Chain Monte Carlo sampling, wave-function construction, and derivative/Laplacian evaluation, which produce heterogeneous kernel behavior and nontrivial bottlenecks. This paper provides a workload-oriented survey and empirical GPU characterization of four representative ansätze: PauliNet, FermiNet, Psiformer, and Orbformer. Using a unified profiling protocol, we analyze model-level runtime and memory trends and kernel-level behavior through family breakdown, arithmetic intensity, roofline positioning, and hardware utilization counters. The results show that end-to-end performance is often constrained by low-intensity elementwise and data-movement kernels, while the compute/memory balance varies substantially across ansätze and stages. Based on these findings, we discuss algorithm-hardware co-design implications for scalable NNVMC systems, including phase-aware scheduling, memory-centric optimization, and heterogeneous acceleration.

[82]  arXiv:2603.18130 [pdf, ps, other]
Title: Final Report for the Workshop on Robotics & AI in Medicine
Authors: Juan P Wachs
Comments: 51 pages, 5 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

The CARE Workshop on Robotics and AI in Medicine, held on December 1, 2025 in Indianapolis, convened leading researchers, clinicians, industry innovators, and federal stakeholders to shape a national vision for advancing robotics and artificial intelligence in healthcare. The event highlighted the accelerating need for coordinated research efforts that bridge engineering innovation with real clinical priorities, emphasizing safety, reliability, and translational readiness, and the use of robotics and AI to achieve that readiness.
Across keynotes, panels, and breakout sessions, participants underscored critical gaps in data availability, standardized evaluation methods, regulatory pathways, and workforce training that hinder the deployment of intelligent robotic systems in surgical, diagnostic, rehabilitative, and assistive contexts. Discussions emphasized the transformative potential of AI-enabled robotics to improve precision, reduce provider burden, expand access to specialized care, and enhance patient outcomes, particularly in underserved regions and high-risk procedural domains. Special attention was given to austere settings, disaster relief, and military settings.
The workshop demonstrated broad consensus on the urgency of establishing a national Center for AI and Robotic Excellence in medicine (CARE). Stakeholders identified priority research thrusts including human-robot collaboration, trustworthy autonomy, simulation and digital twins, multimodal sensing, and ethical integration of generative AI into clinical workflows. Participants also articulated the need for high-quality datasets, shared test beds, autonomous surgical systems, clinically grounded benchmarks, and sustained interdisciplinary training mechanisms.

[83]  arXiv:2603.18157 [pdf, ps, other]
Title: Learning-Augmented Algorithms for $k$-median via Online Learning
Comments: NeurIPS 2025
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

The field of learning-augmented algorithms seeks to use ML techniques on past instances of a problem to inform an algorithm designed for a future instance. In this paper, we introduce a novel model for learning-augmented algorithms inspired by online learning. In this model, we are given a sequence of instances of a problem and the goal of the learning-augmented algorithm is to use prior instances to propose a solution to a future instance of the problem. The performance of the algorithm is measured by its average performance across all the instances, where the performance on a single instance is the ratio between the cost of the algorithm's solution and that of an optimal solution for that instance. We apply this framework to the classic $k$-median clustering problem, and give an efficient learning algorithm that can approximately match the average performance of the best fixed $k$-median solution in hindsight across all the instances. We also experimentally evaluate our algorithm and show that its empirical performance is close to optimal, and also that it automatically adapts the solution to a dynamically changing sequence.
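
The performance measure described in this abstract (the average, over instances, of the ratio between the cost of the proposed solution and the optimal cost) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: points lie on the real line, distance is the absolute difference, and the optimum is found by brute force over input points (which suffices in one dimension). The function names and the toy instances are hypothetical, and this is not the paper's learning algorithm.

```python
from itertools import combinations


def kmedian_cost(points, centers):
    """Sum over points of the distance to the nearest center (1D, absolute distance)."""
    return sum(min(abs(p - c) for c in centers) for p in points)


def optimal_cost(points, k):
    """Brute-force optimum with centers chosen among the input points.

    Only practical for tiny instances; used here purely as a reference value.
    """
    return min(kmedian_cost(points, centers) for centers in combinations(points, k))


def average_ratio(instances, solutions, k):
    """Average of (cost of proposed solution) / (optimal cost) across instances."""
    ratios = [
        kmedian_cost(points, solution) / optimal_cost(points, k)
        for points, solution in zip(instances, solutions)
    ]
    return sum(ratios) / len(ratios)


# Two toy instances, both served by the same fixed solution {0, 10}.
instances = [[0, 1, 10, 11], [0, 1, 12, 13]]
fixed_solution = (0, 10)
score = average_ratio(instances, [fixed_solution] * len(instances), k=2)
print(score)
```

In this toy run the fixed solution is optimal on the first instance and three times the optimal cost on the second, so comparing such averages across candidate fixed solutions gives a sense of what "the best fixed solution in hindsight" refers to.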

[84]  arXiv:2603.18160 [pdf, ps, other]
Title: On the equivalence of semi-discrete Active Flux and Discontinuous Galerkin methods and a comparison of their performance
Subjects: Numerical Analysis (math.NA)

The Active Flux (AF) method employs a globally continuous approximation, like continuous Finite Element methods. This is achieved through the placement of point values at cell interfaces which are shared between adjacent cells. With, on average, K+1 degrees of freedom per cell, Active Flux achieves a polynomial approximation of degree K+1, while the Discontinuous Galerkin (DG) method uses only polynomials of degree K, i.e. one degree less with the same number of degrees of freedom. Despite these differences, we show in this paper that for linear problems in one and several dimensions as well as -- in some sense -- for nonlinear ones, semi-discrete AF and DG are the same method. We identify a mapping between their respective degrees of freedom, upon which the updates of these degrees of freedom turn out to agree. On the one hand, AF therefore seems more economical than DG for a given value of the error, and we confirm this in numerical experiments. On the other hand, this is a way to understand superconvergence of DG in a natural way, and we show how Radau polynomials and their zeros appear in the mapping between DG and AF: In the Radau points, AF "shines through" as the background high-order scheme behind DG.

[85]  arXiv:2603.18161 [pdf, ps, other]
Title: How LLMs Distort Our Written Language
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) are used by over a billion people globally, most often to assist with writing. In this work, we demonstrate that LLMs not only alter the voice and tone of human writing, but also consistently alter the intended meaning. First, we conduct a human user study to understand how people actually interact with LLMs when using them for writing. Our findings reveal that extensive LLM use led to a nearly 70% increase in essays that remained neutral in answering the topic question. Significantly more heavy LLM users reported that the writing was less creative and not in their voice. Next, using a dataset of human-written essays that was collected in 2021 before the widespread release of LLMs, we study how asking an LLM to revise the essay based on the human-written feedback in the dataset induces large changes in the resulting content and meaning. We find that even when LLMs are prompted with expert feedback and asked to only make grammar edits, they still change the text in a way that significantly alters its semantic meaning. We then examine LLM-generated text in the wild, specifically focusing on the 21% of AI-generated scientific peer reviews at a recent top AI conference. We find that LLM-generated reviews place significantly less weight on clarity and significance of the research, and assign scores that, on average, are a full point higher. These findings highlight a misalignment between the perceived benefit of AI use and an implicit, consistent effect on the semantics of human writing, motivating future work on how widespread AI writing will affect our cultural and scientific institutions.

[86]  arXiv:2603.18166 [pdf, ps, other]
Title: Efficient Dense Crowd Trajectory Prediction Via Dynamic Clustering
Subjects: Artificial Intelligence (cs.AI)

Crowd trajectory prediction plays a crucial role in public safety and management, where it can help prevent disasters such as stampedes. Recent works address the problem by predicting individual trajectories and considering surrounding objects based on manually annotated data. However, these approaches tend to overlook dense crowd scenarios, where the challenges of automation become more pronounced due to the massiveness, noisiness, and inaccuracy of the tracking outputs, resulting in high computational costs. To address these challenges, we propose and extensively evaluate a novel cluster-based approach that groups individuals based on similar attributes over time, enabling faster execution through accurate group summarisation. Our plug-and-play method can be combined with existing trajectory predictors by using our output centroid in place of their pedestrian input. We evaluate our proposed method on several challenging dense crowd scenes. We demonstrate that our approach leads to faster processing and lower memory usage when compared with state-of-the-art methods, while maintaining the accuracy.

[87]  arXiv:2603.18171 [pdf, ps, other]
Title: Modeling the human lexicon under temperature variations: linguistic factors, diversity and typicality in LLM word associations
Comments: 11 pages, 12 figures, to appear in LREC 2026
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) achieve impressive results in terms of fluency in text generation, yet the nature of their linguistic knowledge - in particular the human-likeness of their internal lexicon - remains uncertain. This study compares human and LLM-generated word associations to evaluate how accurately models capture human lexical patterns. Using English cue-response pairs from the SWOW dataset and newly generated associations from three LLMs (Mistral-7B, Llama-3.1-8B, and Qwen-2.5-32B) across multiple temperature settings, we examine (i) the influence of lexical factors such as word frequency and concreteness on cue-response pairs, and (ii) the variability and typicality of LLM responses compared to human responses. Results show that all models mirror human trends for frequency and concreteness but differ in response variability and typicality. Larger models such as Qwen tend to emulate a single "prototypical" human participant, generating highly typical but minimally variable responses, while smaller models such as Mistral and Llama produce more variable yet less typical responses. Temperature settings further influence this trade-off, with higher values increasing variability but decreasing typicality. These findings highlight both the similarities and differences between human and LLM lexicons, emphasizing the need to account for model size and temperature when probing LLM lexical representations.

[88]  arXiv:2603.18173 [pdf, ps, other]
Title: GRAFITE: Generative Regression Analysis Framework for Issue Tracking and Evaluation
Comments: 7 pages, 2 figures
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) are largely motivated by their performance on popular topics and benchmarks at the time of their release. However, over time, contamination occurs due to significant exposure of benchmark data during training. This poses a risk of model performance inflation if testing is not carefully executed. To address this challenge, we present GRAFITE, a continuous LLM evaluation platform through a comprehensive system for maintaining and evaluating model issues. Our approach enables building a repository of model problems based on user feedback over time and offers a pipeline for assessing LLMs against these issues through quality assurance (QA) tests using LLM-as-a-judge. The platform enables side-by-side comparison of multiple models, facilitating regression detection across different releases. The platform is available at https://github.com/IBM/grafite. The demo video is available at www.youtube.com/watch?v=XFZyoleN56k.

[89]  arXiv:2603.18174 [pdf, ps, other]
Title: Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL
Comments: Work in progress
Subjects: Machine Learning (cs.LG)

Conflict detection in policy languages is a solved problem -- as long as every rule condition is a crisp Boolean predicate. BDDs, SMT solvers, and NetKAT all exploit that assumption. But a growing class of routing and access-control systems base their decisions on probabilistic ML signals: embedding similarities, domain classifiers, complexity estimators. Two such signals, declared over categories the author intended to be disjoint, can both clear their thresholds on the same query and silently route it to the wrong model. Nothing in the compiler warns about this. We characterize the problem as a three-level decidability hierarchy -- crisp conflicts are decidable via SAT, embedding conflicts reduce to spherical cap intersection, and classifier conflicts are undecidable without distributional knowledge -- and show that for the embedding case, which dominates in practice, replacing independent thresholding with a temperature-scaled softmax partitions the embedding space into Voronoi regions where co-firing is impossible. No model retraining is needed. We implement the detection and prevention mechanisms in the Semantic Router DSL, a production routing language for LLM inference, and discuss how the same ideas apply to semantic RBAC and API gateway policy.
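
The contrast drawn in this abstract between independent thresholding and a temperature-scaled softmax can be illustrated with a small Python sketch. It assumes cosine similarity against one centroid per category; the category names, vectors, threshold, and temperature are hypothetical, and the code is not the Semantic Router DSL's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fire_independent(query, centroids, threshold):
    """Each signal is thresholded on its own, so several can fire on one query."""
    return [name for name, c in centroids.items() if cosine(query, c) >= threshold]


def fire_softmax(query, centroids, temperature):
    """Temperature-scaled softmax over similarities; exactly one winner is returned.

    The winner is the centroid with the highest similarity, so the set of
    queries routed to each category forms a Voronoi-style region.
    """
    sims = {name: cosine(query, c) for name, c in centroids.items()}
    top = max(sims.values())
    exps = {name: math.exp((s - top) / temperature) for name, s in sims.items()}
    total = sum(exps.values())
    probs = {name: e / total for name, e in exps.items()}
    winner = max(probs, key=probs.get)
    return winner, probs


# Hypothetical categories intended to be disjoint.
centroids = {"math": (1.0, 0.0), "code": (0.0, 1.0)}
query = (0.8, 0.6)

co_fired = fire_independent(query, centroids, threshold=0.5)
winner, probs = fire_softmax(query, centroids, temperature=0.1)
print(co_fired, winner)
```

With these assumed values both signals clear the independent threshold on the same query, whereas the softmax variant selects a single category.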

[90]  arXiv:2603.18178 [pdf, ps, other]
Title: VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events
Comments: 16 pages, 9 figures, submitted to arXiv
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The rapid growth of ego-centric dashcam footage presents a major challenge for detecting safety-critical events such as collisions and near-collisions, scenarios that are brief, rare, and difficult for generic vision models to capture. While multimodal large language models (MLLMs) demonstrate strong general reasoning ability, they underperform in driving contexts due to domain and temporal misalignment.
We introduce VLM-AutoDrive, a modular post-training framework for adapting pretrained Vision-Language Models (VLMs) to high-fidelity anomaly detection. The framework integrates metadata-derived captions, LLM-generated descriptions, visual question answering (VQA) pairs, and chain-of-thought (CoT) reasoning supervision to enable domain-aligned and interpretable learning. Off-the-shelf VLMs such as NVIDIA's Cosmos-Reason1 7B (CR1) exhibit near-zero Collision recall in zero-shot settings; fine-tuning with VLM-AutoDrive improves Collision F1 from 0.00 to 0.69 and overall accuracy from 35.35% to 77.27%.
VLM-AutoDrive offers a scalable recipe for adapting general-purpose VLMs to safety-critical, temporally localized perception tasks. Evaluated on real-world Nexar dashcam videos, it achieves substantial gains in Collision and Near-Collision detection while producing interpretable reasoning traces, bridging the gap between perception, causality, and decision reasoning in autonomous driving.

[91]  arXiv:2603.18184 [pdf, ps, other]
Title: CWoMP: Morpheme Representation Learning for Interlinear Glossing
Comments: Project page: this http URL
Subjects: Computation and Language (cs.CL)

Interlinear glossed text (IGT) is a standard notation for language documentation which is linguistically rich but laborious to produce manually. Recent automated IGT methods treat glosses as character sequences, neglecting their compositional structure. We propose CWoMP (Contrastive Word-Morpheme Pretraining), which instead treats morphemes as atomic form-meaning units with learned representations. A contrastively trained encoder aligns words-in-context with their constituent morphemes in a shared embedding space; an autoregressive decoder then generates the morpheme sequence by retrieving entries from a mutable lexicon of these embeddings. Predictions are interpretable--grounded in lexicon entries--and users can improve results at inference time by expanding the lexicon without retraining. We evaluate on diverse low-resource languages, showing that CWoMP outperforms existing methods while being significantly more efficient, with particularly strong gains in extremely low-resource settings.

[92]  arXiv:2603.18189 [pdf, ps, other]
Title: TeachingCoach: A Fine-Tuned Scaffolding Chatbot for Instructional Guidance to Instructors
Subjects: Artificial Intelligence (cs.AI)

Higher education instructors often lack timely and pedagogically grounded support, as scalable instructional guidance remains limited and existing tools rely on generic chatbot advice or non-scalable human-to-human consultations at teaching centers. We present TeachingCoach, a pedagogically grounded chatbot designed to support instructor professional development through real-time, conversational guidance. TeachingCoach is built on a data-centric pipeline that extracts pedagogical rules from educational resources and uses synthetic dialogue generation to fine-tune a specialized language model that guides instructors through problem identification, diagnosis, and strategy development. Expert evaluations show TeachingCoach produces clearer, more reflective, and more responsive guidance than a GPT-4o mini baseline, while a user study with higher education instructors highlights trade-offs between conversational depth and interaction efficiency. Together, these results demonstrate that pedagogically grounded, synthetic-data-driven chatbots can improve instructional support and offer a scalable design approach for future instructional chatbot systems.

[93]  arXiv:2603.18192 [pdf, ps, other]
Title: MicroVision: An Open Dataset and Benchmark Models for Detecting Vulnerable Road Users and Micromobility Vehicles
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Micromobility is a growing mode of transportation, raising new challenges for traffic safety and planning due to increased interactions in areas where vulnerable road users (VRUs) share the infrastructure with micromobility, including parked micromobility vehicles (MMVs). Approaches to support traffic safety and planning increasingly rely on detecting road users in images -- a computer-vision task relying heavily on the quality of the images to train on. However, existing open image datasets for training such models lack focus and diversity in VRUs and MMVs, for instance, by categorizing both pedestrians and MMV riders as "person", or by not including new MMVs like e-scooters. Furthermore, datasets are often captured from a car perspective and lack data from areas where only VRUs travel (sidewalks, cycle paths). To help close this gap, we introduce the MicroVision dataset: an open image dataset and annotations for training and evaluating models for detecting the most common VRUs (pedestrians, cyclists, e-scooterists) and stationary MMVs (bicycles, e-scooters), from a VRU perspective. The dataset, recorded in Gothenburg (Sweden), consists of more than 8,000 anonymized, full-HD images with more than 30,000 carefully annotated VRUs and MMVs, captured over an entire year and part of almost 2,000 unique interaction scenes. Along with the dataset, we provide first benchmark object-detection models based on state-of-the-art architectures, which achieved a mean average precision of up to 0.723 on an unseen test set. The dataset and model can support traffic safety to distinguish between different VRUs and MMVs, or help monitoring systems identify the use of micromobility. The dataset and model weights can be accessed at https://doi.org/10.71870/eepz-jd52.

[94]  arXiv:2603.18196 [pdf, ps, other]
Title: Retrieval-Augmented LLMs for Security Incident Analysis
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Investigating cybersecurity incidents requires collecting and analyzing evidence from multiple log sources, including intrusion detection alerts, network traffic records, and authentication events. This process is labor-intensive: analysts must sift through large volumes of data to identify relevant indicators and piece together what happened. We present a RAG-based system that performs security incident analysis through targeted query-based filtering and LLM semantic reasoning. The system uses a query library with associated MITRE ATT&CK techniques to extract indicators from raw logs, then retrieves relevant context to answer forensic questions and reconstruct attack sequences. We evaluate the system with five LLM providers on malware traffic incidents and multi-stage Active Directory attacks. We find that LLMs differ in performance and trade-offs, with Claude Sonnet 4 and DeepSeek V3 achieving 100% recall across all four malware scenarios, while DeepSeek costs 15$\times$ less (\$0.008 vs. \$0.12 per analysis). Attack step detection on Active Directory scenarios reaches 100% precision and 82% recall. Ablation studies confirm that a RAG architecture is essential: LLM baselines without RAG-enhanced context correctly identify victim hosts but miss all attack infrastructure including malicious domains and command-and-control servers. These results demonstrate that combining targeted query-based filtering with RAG-based retrieval enables accurate, cost-effective security analysis within LLM context limits.

[95]  arXiv:2603.18197 [pdf, ps, other]
Title: Access Controlled Website Interaction for Agentic AI with Delegated Critical Tasks
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

Recent studies reveal gaps in delegating critical tasks to agentic AI that accesses websites on the user's behalf, primarily due to limited access control mechanisms on websites designed for agentic AI. In response, we propose a design of website-based interaction for AI agents with fine-grained access control for delegated critical tasks. Our approach encompasses a website design and implementation, as well as modifications to the access grant protocols in an open-source authorization service to tailor it to agentic AI, with delegated critical tasks on the website. The evaluation of our approach demonstrates the capabilities of our access-controlled website used by AI agents.

[96]  arXiv:2603.18200 [pdf, ps, other]
Title: Minimum Energy Cruise of All-Electric Aircraft with Applications to Advanced Air Mobility
Comments: 17 pages, 3 figures, submitted to Aerospace Systems special issue on Low-altitude Economy
Subjects: Systems and Control (eess.SY)

Electrified propulsion is expected to play an important role in the sustainable development of Advanced Air Mobility (AAM). However, the limited energy density of batteries motivates the need to minimize energy consumption during flight. This paper studies the minimum total energy problem for an all-electric aircraft in steady cruise flight. The problem is formulated as an optimal control problem in which the cruise airspeed and final cruise time are optimization variables. The battery supply voltage is modeled as an affine function of the battery charge. Pontryagin's Minimum Principle is used to derive the necessary and sufficient conditions for optimality, from which closed-form expressions for the optimal cruise airspeed and optimal final cruise time are obtained. Additional analytical conditions are derived that determine when all-electric operation is feasible, one of which is that sufficient electric charge must be available. Numerical simulations based on the BETA Technologies CX300 all-electric aircraft and a representative AAM scenario illustrate how the aircraft weight, cruising altitude, electrical system efficiency, and initial battery charge influence the optimal airspeed and the feasibility of all-electric cruise.

[97]  arXiv:2603.18201 [pdf, ps, other]
Title: A Computationally Efficient Learning of Artificial Intelligence System Reliability Considering Error Propagation
Comments: 42 pages, 11 figures
Subjects: Artificial Intelligence (cs.AI); Computation (stat.CO)

Artificial Intelligence (AI) systems are increasingly prominent in emerging smart cities, yet their reliability remains a critical concern. These systems typically operate through a sequence of interconnected functional stages, where upstream errors may propagate to downstream stages, ultimately affecting overall system reliability. Quantifying such error propagation is essential for accurate modeling of AI system reliability. However, this task is challenging due to: i) data availability: real-world AI system reliability data are often scarce and constrained by privacy concerns; ii) model validity: recurring error events across sequential stages are interdependent, violating the independence assumptions of statistical inference; and iii) computational complexity: AI systems process large volumes of high-speed data, resulting in frequent and complex recurrent error events that are difficult to track and analyze. To address these challenges, this paper leverages a physics-based autonomous vehicle simulation platform with a justifiable error injector to generate high-quality data for AI system reliability analysis. Building on this data, a new reliability modeling framework is developed to explicitly characterize error propagation across stages. Model parameters are estimated using a computationally efficient, theoretically guaranteed composite likelihood expectation-maximization algorithm. Its application to the reliability modeling for autonomous vehicle perception systems demonstrates its predictive accuracy and computational efficiency.

[98]  arXiv:2603.18202 [pdf, ps, other]
Title: R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
Authors: Naoki Morihira (1 and 2), Amal Nahar (1), Kartik Bharadwaj (1), Yasuhiro Kato (2), Akinobu Hayashi (1 and 2), Tatsuya Harada (2 and 3) ((1) Honda R and D Co. Ltd., (2) The University of Tokyo, (3) RIKEN AIP)
Comments: 20 pages, 12 figures, 2 tables. Published as a conference paper at ICLR 2026. Code available at this https URL
Journal-ref: Published as a conference paper at ICLR 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

A central challenge in image-based Model-Based Reinforcement Learning (MBRL) is to learn representations that distill essential information from irrelevant visual details. While promising, reconstruction-based methods often waste capacity on large task-irrelevant regions. Decoder-free methods instead learn robust representations by leveraging Data Augmentation (DA), but reliance on such external regularizers limits versatility. We propose R2-Dreamer, a decoder-free MBRL framework with a self-supervised objective that serves as an internal regularizer, preventing representation collapse without resorting to DA. The core of our method is a redundancy-reduction objective inspired by Barlow Twins, which can be easily integrated into existing frameworks. On DeepMind Control Suite and Meta-World, R2-Dreamer is competitive with strong baselines such as DreamerV3 and TD-MPC2 while training 1.59x faster than DreamerV3, and yields substantial gains on DMC-Subtle with tiny task-relevant objects. These results suggest that an effective internal regularizer can enable versatile, high-performance decoder-free MBRL. Code is available at https://github.com/NM512/r2dreamer.

[99]  arXiv:2603.18203 [pdf, ps, other]
Title: How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence
Comments: conference ICSSH2026
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

The dominant paradigms of artificial intelligence were shaped by learning theories from psychology: behaviorism inspired reinforcement learning, cognitivism gave rise to deep learning and memory-augmented architectures, and constructivism influenced curriculum learning and compositional approaches. This paper argues that each AI paradigm inherited not only the strengths but also the structural limitations of the psychological theory that inspired it. Reinforcement learning cannot account for the internal structure of knowledge, deep learning compresses representations into opaque parameter spaces resistant to principled update, and current integrative approaches lack a formal account of how new understanding is constructed from existing components. The paper further examines a cross-cultural divergence in the interpretation of rote learning, arguing that the Eastern conception of memorization as a structured, multi-phase precursor to understanding offers an underexploited bridge between psychological theory and AI methodology. Drawing on the systematicity debate and Aizawa's critique of both classicism and connectionism, this paper introduces ReSynth, a trimodular framework that separates reasoning (Intellect), purpose (Identity), and knowledge (Memory) as architecturally independent components. The paper traces the genealogy from psychological paradigm to AI method, diagnoses the inherited limitations at each stage, and argues that adaptability, the central challenge of artificial general intelligence, requires a representational architecture in which systematic behavior is a necessary consequence rather than an accidental property.

[100]  arXiv:2603.18210 [pdf, ps, other]
Title: GoalVLM: VLM-driven Object Goal Navigation for Multi-Agent System
Comments: 8 pages, 5 figures
Subjects: Robotics (cs.RO)

Object-goal navigation has traditionally been limited to ground robots with closed-set object vocabularies. Existing multi-agent approaches depend on precomputed probabilistic graphs tied to fixed category sets, precluding generalization to novel goals at test time.
We present GoalVLM, a cooperative multi-agent framework for zero-shot, open-vocabulary object navigation. GoalVLM integrates a Vision-Language Model (VLM) directly into the decision loop, SAM3 for text-prompted detection and segmentation, and SpaceOM for spatial reasoning, enabling agents to interpret free-form language goals and score frontiers via zero-shot semantic priors without retraining. Each agent builds a BEV semantic map from depth-projected voxel splatting, while a Goal Projector back-projects detections through calibrated depth into the map for reliable goal localization. A constraint-guided reasoning layer evaluates frontiers through a structured prompt chain (scene captioning, room-type classification, perception gating, multi-frontier ranking), injecting commonsense priors into exploration.
We evaluate GoalVLM on GOAT-Bench val_unseen (360 multi-subtask episodes, 1032 sequential object-goal subtasks, HM3D scenes), where each episode requires navigating to a chain of 5-7 open-vocabulary targets. GoalVLM with N=2 agents achieves 55.8% subtask SR and 18.3% SPL, competitive with state-of-the-art methods while requiring no task-specific training. Ablation studies confirm the contributions of VLM-guided frontier reasoning and depth-projected goal localization.

[101]  arXiv:2603.18218 [pdf, ps, other]
Title: Semantic Segmentation and Depth Estimation for Real-Time Lunar Surface Mapping Using 3D Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Navigation and mapping on the lunar surface require robust perception under challenging conditions, including poorly textured environments, high-contrast lighting, and limited computational resources. This paper presents a real-time mapping framework that integrates dense perception models with a 3D Gaussian Splatting (3DGS) representation. We first benchmark several models on synthetic datasets generated with the LuPNT simulator, selecting a stereo dense depth estimation model based on Gated Recurrent Units for its balance of speed and accuracy in depth estimation, and a convolutional neural network for its superior performance in detecting semantic segments. Using ground truth poses to decouple the local scene understanding from the global state estimation, our pipeline reconstructs a 120-meter traverse with a geometric height accuracy of approximately 3 cm, outperforming a traditional point cloud baseline without LiDAR. The resulting 3DGS map enables novel view synthesis and serves as a foundation for a full SLAM system, where its capacity for joint map and pose optimization would offer significant advantages. Our results demonstrate that combining semantic segmentation and dense depth estimation with learned map representations is an effective approach for creating detailed, large-scale maps to support future lunar surface missions.

[102]  arXiv:2603.18219 [pdf, ps, other]
Title: Convergence of Payoff-Based Higher-Order Replicator Dynamics in Contractive Games
Subjects: Systems and Control (eess.SY)

We study the convergence properties of a payoff-based higher-order version of replicator dynamics, a widely studied model in evolutionary dynamics and game-theoretic learning, in contractive games. Recent work has introduced a control-theoretic perspective for analyzing the convergence of learning dynamics through passivity theory, leading to a classification of learning dynamics based on the passivity notion they satisfy, such as $\delta$-passivity, equilibrium-independent passivity, and incremental passivity. We leverage this framework for the study of higher-order replicator dynamics for contractive games, which form the complement of passive learning dynamics. Standard replicator dynamics can be represented as a cascade interconnection between an integrator and the softmax mapping. Payoff-based higher-order replicator dynamics include a linear time-invariant (LTI) system in parallel with the existing integrator. First, we show that if this added system is strictly passive and asymptotically stable, then the resulting learning dynamics converge locally to the Nash equilibrium in contractive games. Second, we establish global convergence properties using incremental stability analysis for the special case of symmetric matrix contractive games.
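
The cascade view of standard replicator dynamics mentioned in this abstract (an integrator of payoffs followed by the softmax mapping) can be sketched in Python with forward-Euler integration. This covers only the standard first-order case, not the higher-order variant with the added LTI system. The example game F(x) = -x, the step size, and the initial state are hypothetical choices for illustration; F(x) = -x satisfies (F(x) - F(y))·(x - y) ≤ 0, so it is contractive, and its Nash equilibrium is the uniform strategy.

```python
import math


def softmax(z):
    """Map a score vector to a point on the probability simplex."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def payoff(x):
    """Hypothetical contractive game: F(x) = -x."""
    return [-xi for xi in x]


def simulate(z0, dt=0.01, steps=5000):
    """Integrator (dz/dt = payoff) cascaded with softmax (x = softmax(z))."""
    z = list(z0)
    for _ in range(steps):
        x = softmax(z)
        p = payoff(x)
        z = [zi + dt * pi for zi, pi in zip(z, p)]
    return softmax(z)


x_final = simulate([1.0, 0.0])
print(x_final)
```

Starting from a strategy that favors the first action, the simulated state moves toward the uniform strategy, which is the equilibrium of this toy game.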

[103]  arXiv:2603.18221 [pdf, ps, other]
Title: Scalable and Personalized Oral Assessments Using Voice AI
Subjects: Computers and Society (cs.CY)

Large language models have broken take-home exams. Students generate polished work they cannot explain under follow-up questioning. Oral examinations are a natural countermeasure -- they require real-time reasoning and cannot be outsourced to an LLM -- but they have never scaled. Voice AI changes this. We describe a system that conducted 36 oral examinations for an undergraduate AI/ML course at a total cost of \$15 (\$0.42 per student), low enough to attach oral comprehension checks to every assignment rather than reserving them for high-stakes finals. Because the LLM generates questions dynamically from a rubric, the entire examination structure can be shared in advance: practice is learning, and there is no exam to leak. A multi-agent architecture decomposes each examination into structured phases, and a council of three LLM families grades each transcript through a deliberation round in which models revise scores after reviewing peer evidence, achieving inter-rater reliability (Krippendorff's $\alpha$ = 0.86) above conventional thresholds. But the system also broke in instructive ways: the agent stacked questions despite explicit prohibitions, could not randomize case selection, and a cloned professorial voice was perceived as aggressive rather than familiar. The recurring lesson is that behavioral constraints on LLMs must be enforced through architecture, not prompting alone. Students largely agreed the format tested genuine understanding (70%), yet found it more stressful than written exams (83%) -- unsurprising given that 83% had never taken any oral examination. We document the full design, failure modes, and student experience, and include all prompts as appendices.

[104]  arXiv:2603.18232 [pdf, ps, other]
Title: On the Complexity of the Odd-Red Bipartite Perfect Matching Polytope
Subjects: Data Structures and Algorithms (cs.DS)

The odd-red bipartite perfect matching problem asks to find a perfect matching containing an odd number of red edges in a given red-blue edge-colored bipartite graph. While this problem lies in $\mathsf{P}$, its polyhedral structure remains elusive, despite renewed attention to achieving better polyhedral understanding, nurtured by recent advances from two complementary angles. Apart from being a special case of bimodular integer programs, whose polyhedral structure is also badly understood, it is related to one of the most notorious open derandomization questions in theoretical computer science: whether there is a deterministic efficient algorithm for the exact bipartite perfect matching problem, which asks to find a perfect matching with exactly $k$ red edges.
Recent progress towards deterministic algorithms for this problem crucially relies on a good polyhedral understanding. Motivated by this, Jia, Svensson, and Yuan show that the extension complexity of the exact bipartite perfect matching polytope is exponential in general. Interestingly, their result is true even for the easier odd-red bipartite perfect matching problem. For this problem, they introduce an exponential-size relaxation and leave open whether it is an exact description.
Apart from showing that this description is not exact and even hard to separate over, we show, more importantly, that the odd-red bipartite perfect matching polytope exhibits complex facet structure: any exact description needs constraints with large and diverse coefficients. This rules out classical relaxations based on constraints with all coefficients in $\{0,\pm1\}$, such as the above-mentioned one, and suggests that significant deviations from prior approaches may be needed to obtain an exact description. More generally, we also obtain that polytopes corresponding to bimodular integer programs have complex facet structure.

[105]  arXiv:2603.18235 [pdf, ps, other]
Title: Toward Reliable, Safe, and Secure LLMs for Scientific Applications
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

As large language models (LLMs) evolve into autonomous "AI scientists," they promise transformative advances but introduce novel vulnerabilities, from potential "biosafety risks" to "dangerous explosions." Ensuring trustworthy deployment in science requires a new paradigm centered on reliability (ensuring factual accuracy and reproducibility), safety (preventing unintentional physical or biological harm), and security (preventing malicious misuse). Existing general-purpose safety benchmarks are poorly suited for this purpose, suffering from a fundamental domain mismatch, limited threat coverage of science-specific vectors, and benchmark overfitting, which create a critical gap in vulnerability evaluation for scientific applications. This paper examines the unique security and safety landscape of LLM agents in science. We begin by synthesizing a detailed taxonomy of LLM threats contextualized for scientific research, to better understand the unique risks associated with LLMs in science. Next, we conceptualize a mechanism to address the evaluation gap by utilizing dedicated multi-agent systems for the automated generation of domain-specific adversarial security benchmarks. Based on our analysis, we outline how existing safety methods can be brought together and integrated into a conceptual multilayered defense framework designed to combine a red-teaming exercise and external boundary controls with a proactive internal Safety LLM Agent. Together, these conceptual elements provide a necessary structure for defining, evaluating, and creating comprehensive defense strategies for trustworthy LLM agent deployment in scientific disciplines.

[106]  arXiv:2603.18236 [pdf, ps, other]
Title: Delay-Robust Primal-Dual Dynamics for Distributed Optimization
Subjects: Systems and Control (eess.SY)

Continuous-time primal-dual gradient dynamics (PDGD) is a ubiquitous approach for dynamically solving constrained distributed optimization problems. Yet, the distributed nature of the dynamics makes it prone to communication uncertainties, especially time delays. To mitigate this effect, we propose a delay-robust continuous-time PDGD. The dynamics is obtained by augmenting the standard PDGD with an auxiliary state coupled through a gain matrix, while preserving the optimal solution. Then, we present sufficient tuning conditions for this gain matrix in the form of linear matrix inequalities, which ensure uniform asymptotic stability in the presence of bounded, time-varying delays. The criterion is derived via the Lyapunov-Krasovskii method. A numerical example illustrates the improved delay robustness of our approach compared to the standard PDGD under large, time-varying delays.

[107]  arXiv:2603.18237 [pdf, ps, other]
Title: Gradient-Informed Temporal Sampling Improves Rollout Accuracy in PDE Surrogate Training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Researchers train neural simulators on uniformly sampled numerical simulation data. But under the same budget, does systematically sampled data provide the most effective information? A fundamental yet unformalized problem is how to sample training data for neural simulators so as to maximize rollout accuracy. Existing data sampling methods either tend to collapse into locally high-information-density regions, or preserve diversity but remain insufficiently model-specific, often leading to performance that is no better than uniform sampling. To address this, we propose a data sampling method tailored to neural simulators, Gradient-Informed Temporal Sampling (GITS). GITS jointly optimizes pilot-model local gradients and set-level temporal coverage, thereby effectively balancing model specificity and dynamical information. Compared with multiple sampling baselines, the data selected by GITS achieves lower rollout error across multiple PDE systems, model backbones and sample ratios. Furthermore, ablation studies demonstrate the necessity and complementarity of the two optimization objectives in GITS. In addition, we analyze the successful sampling patterns of GITS as well as the typical PDE systems and model backbones on which GITS fails.

[108]  arXiv:2603.18238 [pdf, ps, other]
Title: ReDAG-RT: Global Rate-Priority Scheduling for Real-Time Multi-DAG Execution in ROS 2
Comments: 12 pages, 6 figures
Subjects: Robotics (cs.RO)

ROS 2 has become a dominant middleware for robotic systems, where perception, estimation, planning, and control pipelines are structured as directed acyclic graphs of callbacks executed under a shared executor. However, default ROS 2 executors use best-effort dispatch without cross-DAG priority enforcement, leading to callback contention, structural priority inversion, and deadline instability under concurrent workloads. These limitations restrict deployment in time-critical and safety-sensitive cyber-physical systems. This paper presents ReDAG-RT, a user-space global scheduling framework for deterministic multi-DAG execution in unmodified ROS 2. The framework introduces a Rate-Priority driven global ready queue that orders callbacks by activation rate, enforces per-DAG concurrency bounds, and mitigates cross-graph priority inversion without modifying the ROS 2 API, executor interface, or underlying operating system scheduler. We formalize a multi-DAG task model for ROS 2 callback pipelines and analyze cross-DAG interference under Rate-Priority scheduling. Response-time recurrences and schedulability conditions are derived within classical Rate-Monotonic theory. Experiments in a ROS 2 Humble environment compare ReDAG-RT against SingleThreadedExecutor and MultiThreadedExecutor using synthetic multi-DAG workloads. Results show up to 29.7 percent reduction in deadline miss rate, 42.9 percent reduction in 99th percentile response time, and 13.7 percent improvement over MultiThreadedExecutor under comparable utilization. Asymmetric per-DAG concurrency bounds further reduce interference by 40.8 percent. These results demonstrate that deterministic and analyzable multi-DAG scheduling can be achieved entirely in the ROS 2 user-space execution layer, providing a practical foundation for real-time robotic middleware in safety-critical systems.
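
The idea of a global ready queue that orders callbacks by activation rate while enforcing per-DAG concurrency bounds can be sketched as follows. This is a minimal illustration in plain Python, not the paper's implementation and not ROS 2 API code: the class `RatePriorityQueue`, its methods, and the callback and DAG names are all hypothetical.

```python
import heapq
import itertools

class RatePriorityQueue:
    """Hypothetical global ready queue: higher activation rate runs first."""

    def __init__(self, dag_limits):
        self._heap = []
        self._seq = itertools.count()            # FIFO tie-breaker for equal rates
        self._limits = dict(dag_limits)          # per-DAG concurrency bounds
        self._running = {dag: 0 for dag in dag_limits}

    def push(self, name, dag, rate_hz):
        # Negate the rate so the min-heap pops the highest-rate callback first.
        heapq.heappush(self._heap, (-rate_hz, next(self._seq), name, dag))

    def pop_runnable(self):
        # Pop the highest-rate callback whose DAG is below its concurrency bound;
        # callbacks that are skipped over are put back unchanged.
        deferred, chosen = [], None
        while self._heap:
            item = heapq.heappop(self._heap)
            dag = item[3]
            if self._running[dag] < self._limits[dag]:
                self._running[dag] += 1
                chosen = item
                break
            deferred.append(item)
        for item in deferred:
            heapq.heappush(self._heap, item)
        return None if chosen is None else (chosen[2], chosen[3])

    def finish(self, dag):
        # Called when a callback of this DAG completes, freeing one slot.
        self._running[dag] -= 1

queue = RatePriorityQueue({"perception": 1, "control": 1})
queue.push("camera_cb", "perception", rate_hz=30.0)
queue.push("control_cb", "control", rate_hz=100.0)
queue.push("lidar_cb", "perception", rate_hz=10.0)
first = queue.pop_runnable()     # highest rate overall
second = queue.pop_runnable()    # next highest rate
blocked = queue.pop_runnable()   # perception DAG is already at its bound
queue.finish("perception")
third = queue.pop_runnable()     # slot freed, lidar callback can run
```

The per-DAG bound is what keeps a busy low-rate graph from occupying every worker; in this sketch a bounded DAG simply yields `None` until one of its callbacks finishes.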

[109]  arXiv:2603.18240 [pdf, ps, other]
Title: Achievable DoF Bounds for Cache-Aided Asymmetric MIMO Communications
Comments: Extended journal version; submitted to IEEE Transactions on Communications (TCOM). An earlier conference version was published in ISIT 2025 (DOI: 10.1109/ISIT63088.2025.11195683)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This is an extended journal version of the conference paper published in ISIT 2025; submitted to IEEE Transactions on Communications (TCOM). Integrating coded caching (CC) into multiple-input multiple-output (MIMO) communications significantly enhances the achievable degrees of freedom (DoF). This paper investigates a practical cache-aided asymmetric MIMO configuration with cache ratio $\gamma$, where a server with $L$ transmit antennas communicates with $K$ users. The users are partitioned into $J$ groups, and each user in group $j$ has $G_j$ receive antennas. We propose four content-aware MIMO-CC strategies: \emph{min-$G$} enforces symmetry using the smallest antenna count among users; \emph{Grouping} maximizes intra-subset spatial multiplexing gain at the expense of some global caching gain; \emph{Super-grouping} aggregates users into optimized \emph{min-$G$}-based super-sets with identical effective receive multiplexing gains before applying \emph{Grouping} across them; and \emph{Phantom} redistributes spatial resources assuming ``phantom'' antennas at the users to bridge the performance gains of \emph{min-$G$} and \emph{Grouping}. We develop these asymmetric strategies under three reference symmetric CC placement-delivery policies with guaranteed linear decodability: a DoF-optimal policy achieving the optimal single-shot DoF, and two closed-form policies, namely combinatorial and linear cyclic low-complexity constructions, with the cyclic policy attaining DoF performance close to the others in many operating regimes. Analytical and numerical results demonstrate significant DoF improvements across various system configurations, and that policy-strategy combinations offer flexible trade-offs between DoF and subpacketization complexity.

[110]  arXiv:2603.18241 [pdf, ps, other]
Title: Splitting-strategies for arbitrary-order fully mixed finite element discretizations of the Biot equations
Subjects: Numerical Analysis (math.NA)

We study the fully mixed formulation of the Biot equations, which is characterized by a symmetric coupling between flow and deformation. This structure enables the use of stable mixed finite elements for each subproblem without a strong compatibility condition across the two subphysics. To exploit this flexibility while preserving the conservation structure of both subproblems, we consider fully mixed finite element methods in which the symmetry of the elastic stress tensor is enforced weakly. The resulting mixed formulation exhibits a saddle-point structure whose stability is determined by suitable inf--sup conditions. Inf--sup stability is established for several families of discrete spaces of arbitrary order, leading to optimal a priori error estimates. Iterative splitting strategies following the classical fixed-stress split with additional tuning are specifically investigated for the fully mixed formulation, with proof of convergence and rates depending on the coupling strength. Contrary to previous analyses on coupled problems with a symmetric structure, we theoretically prove the efficacy of negative stabilization, consistent with Schur-complement ideas. Numerical results based on analytical solutions and the classical Mandel problem support the theory.

[111]  arXiv:2603.18245 [pdf, ps, other]
Title: Who Tests the Testers? Systematic Enumeration and Coverage Audit of LLM Agent Tool Call Safety
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

Large Language Model (LLM) agents increasingly act through external tools, making their safety contingent on tool-call workflows rather than text generation alone. While recent benchmarks evaluate agents across diverse environments and risk categories, a fundamental question remains unanswered: how complete are existing test suites, and what unsafe interaction patterns persist even after an agent passes the benchmark? We propose SafeAudit, a meta-audit framework that addresses this gap through two contributions. First, we introduce an LLM-based enumerator that systematically generates test cases by enumerating valid tool-call workflows and diverse user scenarios. Second, we introduce rule-resistance, a non-semantic, quantitative metric that distills compact safety rules from existing benchmarks and identifies unsafe interaction patterns that remain uncovered under those rules. Across 3 benchmarks and 12 environments, SafeAudit uncovers more than 20% residual unsafe behaviors that existing benchmarks fail to expose, with coverage growing monotonically as the testing budget increases. Our results highlight significant completeness gaps in current safety evaluation and motivate meta-auditing as a necessary complement to benchmark-based agent safety testing.

[112]  arXiv:2603.18246 [pdf, ps, other]
Title: Rapid Adaptation of Particle Dynamics for Generalized Deformable Object Mobile Manipulation
Comments: 8 pages, ICRA 2026
Subjects: Robotics (cs.RO)

We address the challenge of learning to manipulate deformable objects with unknown dynamics. In non-rigid objects, the dynamics parameters define how they react to interactions -- how they stretch, bend, compress, and move -- and they are critical to determining the optimal actions to perform a manipulation task successfully. In other robotic domains, such as legged locomotion and in-hand rigid object manipulation, state-of-the-art approaches can handle unknown dynamics using Rapid Motor Adaptation (RMA). Through a supervised procedure in simulation that encodes each rigid object's dynamics, such as mass and position, these approaches learn a policy that conditions actions on a vector of latent dynamic parameters inferred from sequences of state-actions. However, in deformable object manipulation, the object's dynamics includes not only its mass and position, but also how the shape of the object changes. Our key insight is that the recent ground-truth particle positions of a deformable object in simulation capture changes in the object's shape, making it possible to extend RMA to deformable object manipulation. This key insight allows us to develop RAPiD, a two-phase method that learns to perform real-robot deformable object mobile manipulation by: 1) learning a visuomotor policy conditioned on the object's dynamics embedding, which is encoded from the object's privileged information in simulation, such as its mass and ground-truth particle positions, and 2) learning to infer this embedding using non-privileged information instead, such as robot visual observations and actions, so that the learned policy can transfer to the real world. On a mobile manipulator with 22 degrees of freedom, RAPiD enables success rates over 80% across two vision-based deformable object mobile manipulation tasks in the real world, under various object dynamics, categories, and instances.

[113]  arXiv:2603.18247 [pdf, ps, other]
Title: AGRI-Fidelity: Evaluating the Reliability of Listenable Explanations for Poultry Disease Detection
Subjects: Machine Learning (cs.LG)

Existing XAI metrics measure faithfulness for a single model, ignoring model multiplicity where near-optimal classifiers rely on different or spurious acoustic cues. In noisy farm environments, stationary artifacts such as ventilation noise can produce explanations that are faithful yet unreliable, as masking-based metrics fail to penalize redundant shortcuts. We propose AGRI-Fidelity, a reliability-oriented evaluation framework for listenable explanations in poultry disease detection without spatial ground truth. The method combines cross-model consensus with cyclic temporal permutation to construct null distributions and compute a False Discovery Rate (FDR), suppressing stationary artifacts while preserving time-localized bioacoustic markers. Across real and controlled datasets, AGRI-Fidelity effectively provides reliability-aware discrimination for all data points versus masking-based metrics.

[114]  arXiv:2603.18251 [pdf, ps, other]
Title: Christoffel Adaptive Sampling for Sparse Random Feature Expansions
Subjects: Numerical Analysis (math.NA)

Random Feature Models (RFMs) have become a powerful tool for approximating multivariate functions and solving partial differential equations efficiently. Sparse Random Feature Expansions (SRFE) improve traditional RFMs by incorporating sparsity, making them particularly effective in data-scarce settings. In this work, we integrate active learning with sparse random feature approximations to improve sampling efficiency. Specifically, we incorporate the Christoffel function to guide an adaptive sampling process, dynamically selecting informative sample points based on their contribution to the function space. This approach optimizes the distribution of sample points by leveraging the Christoffel function associated with an iteratively chosen basis obtained by the sparse recovery solver. We conduct numerical experiments comparing adaptive and nonadaptive sampling strategies with the SRFE framework and examine their accuracy for various function approximation tasks. Overall, our results demonstrate the advantages of adaptive sampling in maintaining high accuracy while reducing sample complexity for SRFE, highlighting its potential for scientific computing tasks where data is expensive to acquire.

[115]  arXiv:2603.18252 [pdf, ps, other]
Title: RIS-Aided Mobile Network Design
Journal-ref: 2025 IEEE 36th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Istanbul, Türkiye, 2025, pp. 1-6
Subjects: Networking and Internet Architecture (cs.NI)

In this paper, we examine the distribution of radio signal propagation within the city of Poznan (Poland) to determine optimal locations for deploying Reconfigurable Intelligent Surfaces (RIS). The study focuses on designing a 5G/6G Radio Access Network (RAN), incorporating eight Base Stations (BSs) that utilize either Single Input Single Output (SISO), or Multiple Input Multiple Output (MIMO) antenna technologies, depending on the network cell configuration. Through detailed simulations and analyses, we explore various propagation scenarios in both Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) conditions, considering the complex urban landscape characterized by high-rise buildings. The results demonstrate the potential of using RISs in mobile networks to enhance radio signal quality in urban environments through strategic placements. Our findings suggest that RISs can significantly mitigate Path Loss (PL) and improve signal coverage in challenging urban environments, particularly in areas where traditional base station deployment alone would be insufficient. Furthermore, the study highlights the role of RISs in reducing the need for additional base stations, thereby optimizing network costs and infrastructure while maintaining high-quality service delivery. The insights gained from this research provide valuable guidelines for network planners and engineers seeking to implement RIS technology in future 5G and beyond networks, ensuring more efficient and robust urban communication systems.

[116]  arXiv:2603.18254 [pdf, ps, other]
Title: Computation-Utility-Privacy Tradeoffs in Bayesian Estimation
Comments: To appear at STOC 2026
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

Bayesian methods lie at the heart of modern data science and provide a powerful scaffolding for estimation in data-constrained settings and principled quantification and propagation of uncertainty. Yet in many real-world use cases where these methods are deployed, there is a natural need to preserve the privacy of the individuals whose data is being scrutinized. While a number of works have attempted to approach the problem of differentially private Bayesian estimation through either reasoning about the inherent privacy of the posterior distribution or privatizing off-the-shelf Bayesian methods, these works generally do not come with rigorous utility guarantees beyond low-dimensional settings. In fact, even for the prototypical tasks of Gaussian mean estimation and linear regression, it was unknown how close one could get to the Bayes-optimal error with a private algorithm, even in the simplest case where the unknown parameter comes from a Gaussian prior. In this work, we give the first efficient algorithms for both of these problems that achieve mean-squared error $(1+o(1))\mathrm{OPT}$ and additionally show that both tasks exhibit an intriguing computational-statistical gap. For Bayesian mean estimation, we prove that the excess risk achieved by our method is optimal among all efficient algorithms within the low-degree framework, yet is provably worse than what is achievable by an exponential-time algorithm. For linear regression, we prove a qualitatively similar lower bound. Our algorithms draw upon the privacy-to-robustness framework of arXiv:2212.05015, but with the curious twist that to achieve private Bayes-optimal estimation, we need to design sum-of-squares-based robust estimators for inherently non-robust objects like the empirical mean and OLS estimator. Along the way we also add to the sum-of-squares toolkit a new kind of constraint based on short-flat decompositions.

[117]  arXiv:2603.18256 [pdf, ps, other]
Title: MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasoning Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent advances in reasoning-based large language models (LLMs) have demonstrated substantial improvements in complex problem-solving tasks. Motivated by these advances, several works have explored the application of reasoning LLMs to drug discovery and molecular design. However, most existing approaches either focus on evaluation or rely on training setups that require ground-truth labels, such as molecule pairs with known property modifications. Such supervision is unavailable in \textit{de novo} molecular generation, where the objective is to generate novel molecules that optimize a desirability score without prior knowledge of high-scoring candidates. To bridge this gap, we introduce MolRGen, a large-scale benchmark and dataset for training and evaluating reasoning-based LLMs on \textit{de novo} molecular generation. Our contributions are threefold. First, we propose a setting to evaluate and train models for \textit{de novo} molecular generation and property prediction. Second, we introduce a novel diversity-aware top-$k$ score that captures both the quality and diversity of generated molecules. Third, we show our setting can be used to train LLMs for molecular generation, training a 24B LLM with reinforcement learning, and we provide a detailed analysis of its performance and limitations.

[118]  arXiv:2603.18257 [pdf, ps, other]
Title: Discovering What You Can Control: Interventional Boundary Discovery for Reinforcement Learning
Authors: Jiaxin Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Selecting relevant state dimensions in the presence of confounded distractors is a causal identification problem: observational statistics alone cannot reliably distinguish dimensions that correlate with actions from those that actions cause. We formalize this as discovering the agent's Causal Sphere of Influence and propose Interventional Boundary Discovery (IBD), which applies Pearl's do-operator to the agent's own actions and uses two-sample testing to produce an interpretable binary mask over observation dimensions. IBD requires no learned models and composes with any downstream RL algorithm as a preprocessing step. Across 12 continuous control settings with up to 100 distractor dimensions, we find that: (1) observational feature selection can actively select confounded distractors while discarding true causal dimensions; (2) full-state RL degrades sharply once distractors outnumber relevant features by roughly 3:1 in our benchmarks; and (3) IBD closely tracks oracle performance across all distractor levels tested, with gains transferring across SAC and TD3.
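
The combination of action interventions and two-sample testing described above can be illustrated with a toy sketch. It is not the paper's algorithm: the environment `toy_step`, the Welch-style statistic, the threshold of 5.0, and the sample size are assumptions chosen for illustration, and the toy distractors are independent noise rather than the confounded distractors the paper studies.

```python
import random
import statistics

def toy_step(action, rng):
    # Hypothetical environment: dimension 0 responds to the action,
    # dimensions 1 and 2 are pure-noise distractors.
    controllable = action + rng.gauss(0.0, 0.1)
    distractors = [rng.gauss(0.0, 1.0) for _ in range(2)]
    return [controllable] + distractors

def welch_t(a, b):
    # Two-sample statistic: difference of means scaled by its standard error.
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

def influence_mask(step, n=200, threshold=5.0, seed=0):
    # Intervene on the action, do(a = -1) versus do(a = +1), then test each
    # observation dimension for a shift between the two arms.
    rng = random.Random(seed)
    low = [step(-1.0, rng) for _ in range(n)]
    high = [step(+1.0, rng) for _ in range(n)]
    dims = len(low[0])
    return [
        abs(welch_t([obs[d] for obs in high], [obs[d] for obs in low])) > threshold
        for d in range(dims)
    ]

mask = influence_mask(toy_step)  # binary mask over observation dimensions
```

Because the actions are set by intervention rather than observed under a policy, a dimension is flagged only when changing the action actually shifts its distribution, which is the distinction between correlation and causation that the abstract draws.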

[119]  arXiv:2603.18258 [pdf, ps, other]
Title: Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
Comments: Accepted at ICLR 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and training stability. However, DPO suffers from the recently identified squeezing effect (also known as likelihood displacement), where the probability of preferred responses decreases unintentionally during training. To understand and mitigate this phenomenon, we develop a theoretical framework that models the coordinate-wise dynamics in logit space. Our analysis reveals that negative-gradient updates cause residuals to expand rapidly along high-curvature directions, which underlies the squeezing effect, whereas Sharpness-Aware Minimization (SAM) can suppress this behavior through its curvature-regularization effect. Building on this insight, we investigate logits-SAM, a computationally efficient variant that perturbs only the output layer with negligible overhead. Extensive experiments on Pythia-2.8B, Mistral-7B, and Gemma-2B-IT across multiple datasets and benchmarks demonstrate that logits-SAM consistently improves the effectiveness of DPO and integrates seamlessly with other DPO variants. Code is available at https://github.com/RitianLuo/logits-sam-dpo.

[120]  arXiv:2603.18260 [pdf, ps, other]
Title: Manufacturing Micro-Patterned Surfaces with Multi-Robot Systems
Subjects: Robotics (cs.RO)

Applying micro-patterns to surfaces has been shown to impart useful physical properties such as drag reduction and hydrophobicity. However, current manufacturing techniques cannot produce micro-patterned surfaces at scale due to high-cost machinery and inefficient coverage techniques such as raster-scanning. In this work, we use multiple robots, each equipped with a patterning tool, to manufacture these surfaces. To allow these robots to coordinate during the patterning task, we use the ergodic control algorithm, which specifies coverage objectives using distributions. We demonstrate that robots can divide complicated coverage objectives by communicating compressed representations of their trajectory history both in simulations and experimental trials. Further, we show that robot-produced patterning can lower the coefficient of friction of metallic surfaces. This work demonstrates that distributed multi-robot systems can coordinate to manufacture products that were previously unrealizable at scale.

[121]  arXiv:2603.18261 [pdf, ps, other]
Title: LRConv-NeRV: Low Rank Convolution for Efficient Neural Video Compression
Authors: Tamer Shanableh
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Neural Representations for Videos (NeRV) encode entire video sequences within neural network parameters, offering an alternative paradigm to conventional video codecs. However, the convolutional decoder of NeRV remains computationally expensive and memory intensive, limiting its deployment in resource-constrained environments. This paper proposes LRConv-NeRV, an efficient NeRV variant that replaces selected dense 3x3 convolutional layers with structured low-rank separable convolutions, trained end-to-end within the decoder architecture. By progressively applying low-rank factorization from the largest to earlier decoder stages, LRConv-NeRV enables controllable trade-offs between reconstruction quality and efficiency. Extensive experiments demonstrate that applying LRConv only to the final decoder stage reduces decoder complexity by 68%, from 201.9 to 64.9 GFLOPs, and model size by 9.3%, while incurring negligible quality loss and achieving approximately 9.2% bitrate reduction. Under INT8 post-training quantization, LRConv-NeRV preserves reconstruction quality close to the dense NeRV baseline, whereas more aggressive factorization of early decoder stages leads to disproportionate quality degradation. Compared to existing work under layer-aligned settings, LRConv-NeRV achieves a more favorable efficiency versus quality trade-off, offering substantial GFLOPs and parameter reductions while maintaining higher PSNR/MS-SSIM and improved temporal stability. Temporal flicker analysis using LPIPS further shows that the proposed solution preserves temporal coherence close to the NeRV baseline. These results establish LRConv-NeRV as a potential architectural alternative for efficient neural video decoding under low-precision and resource-constrained settings.
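
To see why replacing a dense 3x3 convolution with a low-rank separable pair saves parameters, a back-of-the-envelope count may help. The factorization assumed here (a 3x1 convolution into `rank` channels followed by a 1x3 convolution) is one common scheme and is not confirmed as the paper's exact layer design; the channel counts are hypothetical, and biases are ignored.

```python
def dense_conv_params(c_in, c_out, k=3):
    # Weights of a dense k x k convolution (bias ignored).
    return k * k * c_in * c_out

def separable_lowrank_params(c_in, c_out, rank, k=3):
    # Assumed factorization: a k x 1 conv (c_in -> rank), then a 1 x k conv (rank -> c_out).
    return k * c_in * rank + k * rank * c_out

def relative_reduction(before, after):
    # Fraction saved when going from `before` to `after`.
    return (before - after) / before

# Hypothetical layer: 64 -> 64 channels with rank 16.
dense = dense_conv_params(64, 64)
lowrank = separable_lowrank_params(64, 64, rank=16)
layer_saving = relative_reduction(dense, lowrank)

# Consistency check on the figures quoted in the abstract (201.9 -> 64.9 GFLOPs).
reported_saving = relative_reduction(201.9, 64.9)
```

The last line reproduces the roughly 68% decoder-complexity reduction stated in the abstract from its own two GFLOPs figures.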

[122]  arXiv:2603.18266 [pdf, ps, other]
Title: Enactor: From Traffic Simulators to Surrogate World Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Traffic microsimulators are widely used to evaluate road network performance under various "what-if" conditions. However, the behavior models controlling the actions of the actors are overly simplistic and fail to capture realistic actor-actor interactions. Deep learning-based methods have been applied to model vehicles and pedestrians as "agents" responding to their surrounding "environment" (including lanes, signals, and neighboring agents). Although effective in learning actor-actor interaction, these approaches fail to generate physically consistent trajectories over long time periods, and they do not explicitly address the complex dynamics that arise at traffic intersections, which are critical locations in urban networks. Inspired by the World Model paradigm, we have developed an actor-centric generative model using a transformer-based architecture that captures actor-actor interaction while also understanding the geometry of the traffic intersection, generating physically grounded trajectories based on learned behavior. Moreover, we test the model in a live "simulation-in-the-loop" setting, where we generate the initial conditions of the actors using SUMO and then let the model control the dynamics of the actors. We let the simulation run for 40000 timesteps (4000 seconds), testing the performance of the model over a long time range and evaluating the trajectories on traffic-engineering metrics. Experimental results demonstrate that the proposed framework effectively captures complex actor-actor interactions and generates long-horizon, physically consistent trajectories, while requiring significantly fewer training samples than traditional agent-centric generative approaches. Our model outperforms the baseline on traffic-related as well as aggregate metrics, beating the baseline by more than 10x on the KL-Divergence.

[123]  arXiv:2603.18271 [pdf, ps, other]
Title: SG-CoT: An Ambiguity-Aware Robotic Planning Framework using Scene Graph Representations
Comments: This work has been submitted to the IEEE Robotics and Automation Letters for possible publication
Subjects: Robotics (cs.RO)

Ambiguity poses a major challenge to large language models (LLMs) used as robotic planners. In this letter, we present Scene Graph-Chain-of-Thought (SG-CoT), a two-stage framework where LLMs iteratively query a scene graph representation of the environment to detect and clarify ambiguities. First, a structured scene graph representation of the environment is constructed from input observations, capturing objects, their attributes, and relationships with other objects. Second, the LLM is equipped with retrieval functions to query portions of the scene graph that are relevant to the provided instruction. This grounds the reasoning process of the LLM in the observation, increasing the reliability of robotic planners under ambiguous situations. SG-CoT also allows the LLM to identify the source of ambiguity and pose a relevant disambiguation question to the user or another robot. Extensive experimentation demonstrates that SG-CoT consistently outperforms prior methods, with a minimum of 10% improvement in question accuracy and a minimum success rate increase of 4% in single-agent and 15% in multi-agent environments, validating its effectiveness for more generalizable robot planning.
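
Note: a minimal sketch of the idea of querying a scene graph to detect an ambiguous reference. The `SceneGraph` class, its method names and the attribute keys are hypothetical and chosen for illustration only. The abstract does not specify the actual retrieval functions used by SG-CoT.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # object name -> attribute dictionary, e.g. {"category": "cup", "color": "red"}
    objects: dict = field(default_factory=dict)
    # (subject, predicate, object) triples, e.g. ("red_cup", "on", "table")
    relations: list = field(default_factory=list)

    def find_objects(self, **attrs):
        """Return sorted names of objects whose attributes match all filters."""
        return sorted(
            name
            for name, a in self.objects.items()
            if all(a.get(k) == v for k, v in attrs.items())
        )

    def relations_of(self, name):
        """Return every relation triple that mentions the given object."""
        return [r for r in self.relations if name in (r[0], r[2])]


def is_ambiguous(graph, **attrs):
    """A reference is ambiguous here if more than one object matches it."""
    return len(graph.find_objects(**attrs)) > 1


graph = SceneGraph(
    objects={
        "red_cup": {"category": "cup", "color": "red"},
        "blue_cup": {"category": "cup", "color": "blue"},
        "table": {"category": "table"},
    },
    relations=[("red_cup", "on", "table"), ("blue_cup", "on", "table")],
)
candidates = graph.find_objects(category="cup")  # two cups match "the cup"
```

In this toy setting, an instruction such as "pick up the cup" matches two objects, which is the kind of situation where a planner could ask a disambiguation question instead of acting.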

[124]  arXiv:2603.18272 [pdf, ps, other]
Title: Retrieval-Augmented LLM Agents: Learning to Learn from Experience
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

While large language models (LLMs) have advanced the development of general-purpose agents, achieving robust generalization to unseen tasks remains a significant challenge. Current approaches typically rely on either fine-tuning or training-free memory-augmented generation using retrieved experience; yet both have limitations: fine-tuning often fails to extrapolate to new tasks, while experience retrieval often underperforms compared to supervised baselines. In this work, we propose to combine these approaches and systematically study how to train retrieval-augmented LLM agents to effectively leverage retrieved trajectories in-context. First, we establish a robust supervised fine-tuning (SFT) recipe using LoRA that outperforms several state-of-the-art agent training pipelines. Second, we provide a detailed analysis of key design choices for experience retrieval, identifying optimal strategies for storage, querying, and trajectory selection. Finally, we propose a pipeline that integrates experience retrieval into the fine-tuning process. Our results demonstrate that this combined approach significantly improves generalization to unseen tasks, providing a scalable and effective framework for building agents that learn to learn from experience.

[125]  arXiv:2603.18273 [pdf, ps, other]
Title: EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research
Subjects: Artificial Intelligence (cs.AI)

In this technical report, we present the Educational Data Mining Automated Research System (EDM-ARS), a domain-specific multi-agent pipeline that automates end-to-end educational data mining (EDM) research. We conceptualize EDM-ARS as a general framework for domain-aware automated research pipelines, where educational expertise is embedded into each stage of the research lifecycle. As a first instantiation of this framework, we focus on predictive modeling tasks. Within this scope, EDM-ARS orchestrates five specialized LLM-powered agents (ProblemFormulator, DataEngineer, Analyst, Critic, and Writer) through a state-machine coordinator that supports revision loops, checkpoint-based recovery, and sandboxed code execution. Given a research prompt and a dataset, EDM-ARS produces a complete LaTeX manuscript with real Semantic Scholar citations, validated machine learning analyses, and automated methodological peer review. We also provide a detailed description of the system architecture, the three-tier data registry design that encodes educational domain expertise, the specification of each agent, the inter-agent communication protocol, and mechanisms for error-handling and self-correction. Finally, we discuss current limitations, including single-dataset scope and formulaic paper output, and outline a phased roadmap toward causal inference, transfer learning, psychometric, and multi-dataset generalization. EDM-ARS is released as an open-source project to support the educational research community.

[126]  arXiv:2603.18280 [pdf, ps, other]
Title: Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails
Authors: Gregory N. Frank
Comments: 31 pages, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Current alignment evaluation mostly measures whether models encode dangerous concepts and whether they refuse harmful requests. Both miss the layer where alignment often operates: routing from concept detection to behavioral policy. We study political censorship in Chinese-origin language models as a natural experiment, using probes, surgical ablations, and behavioral tests across nine open-weight models from five labs. Three findings follow. First, probe accuracy alone is non-diagnostic: political probes, null controls, and permutation baselines can all reach 100%, so held-out category generalization is the informative test. Second, surgical ablation reveals lab-specific routing. Removing the political-sensitivity direction eliminates censorship and restores accurate factual output in most models tested, while one model confabulates because its architecture entangles factual knowledge with the censorship mechanism. Cross-model transfer fails, indicating that routing geometry is model- and lab-specific. Third, refusal is no longer the dominant censorship mechanism. Within one model family, hard refusal falls to zero while narrative steering rises to the maximum, making censorship invisible to refusal-only benchmarks. These results support a three-stage descriptive framework: detect, route, generate. Models often retain the relevant knowledge; alignment changes how that knowledge is expressed. Evaluations that audit only detection or refusal therefore miss the routing mechanism that most directly determines behavior.

[127]  arXiv:2603.18281 [pdf, ps, other]
Title: On Additive Gaussian Processes for Wind Farm Power Prediction
Journal-ref: In: Rainieri, C., Gentile, C., Aenlle L\'opez, M. (eds) Proceedings of the 10th International Operational Modal Analysis Conference (IOMAC 2024)
Subjects: Machine Learning (cs.LG)

Population-based Structural Health Monitoring (PBSHM) aims to share information between similar machines or structures. This paper takes a population-level perspective, exploring the use of additive Gaussian processes to reveal variations in turbine-specific and farm-level power models over a collected wind farm dataset. The predictions illustrate patterns in wind farm power generation, which follow intuition and should enable more informed control and decision-making.

[128]  arXiv:2603.18282 [pdf, ps, other]
Title: CycleCap: Improving VLMs Captioning Performance via Self-Supervised Cycle Consistency Fine-Tuning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual-Language Models (VLMs) have achieved remarkable progress in image captioning, visual question answering, and visual reasoning. Yet they remain prone to vision-language misalignment, often producing overly generic or hallucinated descriptions. Existing approaches address this via instruction tuning, which requires costly, large-scale annotated datasets, or via complex test-time frameworks for caption refinement. In this work, we revisit image-text alignment through the lens of cycle consistency: given an image and a caption generated by an image-to-text model, the backward mapping through a text-to-image model should reconstruct an image that closely matches the original. In our setup, a VLM serves as the image-to-text component, while a pre-trained text-to-image model closes the loop by reconstructing the image from the generated caption. Building on this, we introduce CycleCap, a fine-tuning scheme to improve image captioning using Group Relative Policy Optimization (GRPO) with a reward based on the similarity between the original and reconstructed images, computed on-the-fly. Unlike previous work that uses cycle consistency loss for preference dataset construction, our method leverages cycle consistency directly as a self-supervised training signal. This enables the use of raw images alone, eliminating the need for curated image-text datasets, while steering the VLM to produce more accurate and grounded text descriptions. Applied to four VLMs ranging from 1B to 7B parameters, CycleCap yields consistent improvements across captioning and hallucination benchmarks, surpassing state-of-the-art methods that rely on supervised cycle consistency training.
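
Note: a minimal sketch of the group-relative normalization commonly used in GRPO, applied to per-caption rewards. The reward values below are placeholders standing in for image-similarity scores. Using the population standard deviation and a small `eps` is an assumption, since implementations differ on these details and the abstract does not specify them.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same input.

    Each advantage is (reward - group mean) / (group std + eps), so captions
    are scored relative to the other captions sampled for the same image.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumed choice
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical similarity rewards for three captions of one image.
rewards = [0.2, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
```

Captions whose reconstruction is closer to the original image than the group average receive a positive advantage, and the rest receive a negative one.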

[129]  arXiv:2603.18283 [pdf, ps, other]
Title: Turnpike with Uncertain Measurements: Triangle-Equality ILP with a Deterministic Recovery Guarantee
Comments: 16 pages, 4 figures
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)

We study Turnpike with uncertain measurements: reconstructing a one-dimensional point set from an unlabeled multiset of pairwise distances under bounded noise and rounding. We give a combinatorial characterization of realizability via a multi-matching that labels interval indices by distinct distance values while satisfying all triangle equalities. This yields an ILP based on the triangle equality whose constraint structure depends only on the two-partition set $\mathcal{P}_y=\{(r,s,t): y_r+y_s=y_t\}$ and a natural LP relaxation with $\{0,1\}$-coefficient constraints. Integral solutions certify realizability and output an explicit assignment matrix, enabling an assignment-first, regression-second pipeline for downstream coordinate estimation. Under bounded noise followed by rounding, we prove a deterministic separation condition under which $\mathcal{P}_y$ is recovered exactly, so the ILP/LP receives the same combinatorial input as in the noiseless case. Experiments illustrate integrality behavior and degradation outside the provable regime.
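
Note: a small sketch of how the set $\mathcal{P}_y$ defined above could be enumerated from a list of distinct distance values. Restricting to `r <= s` (to avoid listing each pair twice) and assuming exact integer values are illustrative choices, not details taken from the paper.

```python
from itertools import combinations_with_replacement


def two_partition_set(y):
    """Return index triples (r, s, t) with r <= s and y[r] + y[s] == y[t].

    `y` is assumed to hold distinct, exactly representable values
    (for example integers obtained after rounding).
    """
    index_of = {value: i for i, value in enumerate(y)}
    triples = []
    for r, s in combinations_with_replacement(range(len(y)), 2):
        t = index_of.get(y[r] + y[s])
        if t is not None:
            triples.append((r, s, t))
    return triples


# Points {0, 1, 4} have pairwise distances 1, 3 and 4.
triples = two_partition_set([1, 3, 4])  # -> [(0, 1, 2)] because 1 + 3 == 4
```

The single triple corresponds to the one triangle equality among the three points: the two short intervals add up to the long one.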

[130]  arXiv:2603.18284 [pdf, ps, other]
Title: Offload or Overload: A Platform Measurement Study of Mobile Robotic Manipulation Workloads
Comments: 15 pages, 17 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

Mobile robotic manipulation--the ability of robots to navigate spaces and interact with objects--is a core capability of physical AI. Foundation models have led to breakthroughs in their performance, but at a significant computational cost. We present the first measurement study of mobile robotic manipulation workloads across onboard, edge, and cloud GPU platforms. We find that the full workload stack is infeasible to run on smaller onboard GPUs, while larger onboard GPUs drain robot batteries several hours faster. Offloading alleviates these constraints but introduces its own challenges, as additional network latency degrades task accuracy, and the bandwidth requirement makes naive cloud offloading impractical. Finally, we quantify opportunities and pitfalls of sharing compute across robot fleets. We believe our measurement study will be crucial to designing inference systems for mobile robots.

[131]  arXiv:2603.18290 [pdf, ps, other]
Title: CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring
Comments: 26 pages, 5 figures, includes supplementary material as appendix
Subjects: Artificial Intelligence (cs.AI)

Out-of-distribution (OOD) detection is essential for deploying deep learning models reliably, yet no single method performs consistently across architectures and datasets -- a scorer that leads on one benchmark often falters on another. We attribute this inconsistency to a shared structural limitation: logit-based methods see only the classifier's confidence signal, while feature-based methods attempt to measure membership in the training distribution but do so in the full feature space where confidence and membership are entangled, inheriting architecture-sensitive failure modes. We observe that penultimate features naturally decompose into two orthogonal subspaces: a classifier-aligned component encoding confidence, and a residual the classifier discards. We discover that this residual carries a class-specific directional signature for in-distribution data -- a membership signal invisible to logit-based methods and entangled with noise in feature-based methods. We propose CORE (COnfidence + REsidual), which disentangles the two signals by scoring each subspace independently and combines them via normalized summation. Because the two signals are orthogonal by construction, their failure modes are approximately independent, producing robust detection where either view alone is unreliable. CORE achieves competitive or state-of-the-art performance across five architectures and five benchmark configurations, ranking first in three of five settings and achieving the highest grand average AUROC with negligible computational overhead.
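
Note: a minimal sketch of one way to split a penultimate feature into a classifier-aligned part and an orthogonal residual. It assumes the aligned subspace is the row space of the final linear layer's weight matrix, which the abstract does not state explicitly. The helper names are hypothetical, and CORE's actual scoring and normalization are not reproduced here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(rows, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given rows."""
    basis = []
    for row in rows:
        v = list(row)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip rows that are linearly dependent on earlier ones
            basis.append([vi / norm for vi in v])
    return basis


def split_feature(feature, classifier_weights):
    """Return (aligned, residual) with feature == aligned + residual.

    `aligned` lies in the row space of the classifier weights; `residual`
    is orthogonal to every weight row, so it does not change the logits.
    """
    aligned = [0.0] * len(feature)
    for b in orthonormal_basis(classifier_weights):
        c = dot(feature, b)
        aligned = [a + c * bi for a, bi in zip(aligned, b)]
    residual = [f - a for f, a in zip(feature, aligned)]
    return aligned, residual


# Toy example: 3-dimensional feature, 2-class linear classifier.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
aligned, residual = split_feature([2.0, 3.0, 4.0], weights)
```

Because the residual is orthogonal to every weight row, a linear classifier head cannot see it, which matches the abstract's description of a component "the classifier discards".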

[132]  arXiv:2603.18294 [pdf, ps, other]
Title: The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition
Subjects: Artificial Intelligence (cs.AI)

Background: Clinical trials rely on transparent inclusion criteria to ensure generalizability. In contrast, benchmarks validating health-related large language models (LLMs) rarely characterize the "patient" or "query" populations they contain. Without defined composition, aggregate performance metrics may misrepresent model readiness for clinical use.
Methods: We analyzed 18,707 consumer health queries across six public benchmarks using LLMs as automated coding instruments to apply a standardized 16-field taxonomy profiling context, topic, and intent.
Results: We identified a structural "validity gap." While benchmarks have evolved from static retrieval to interactive dialogue, clinical composition remains misaligned with real-world needs. Although 42% of the corpus referenced objective data, this was polarized toward wellness-focused wearable signals (17.7%); complex diagnostic inputs remained rare, including laboratory values (5.2%), imaging (3.8%), and raw medical records (0.6%). Safety-critical scenarios were effectively absent: suicide/self-harm queries comprised <0.7% of the corpus and chronic disease management only 5.5%. Benchmarks also neglected vulnerable populations (pediatrics/older adults <11%) and global health needs.
Conclusions: Evaluation benchmarks remain misaligned with real-world clinical needs, lacking raw clinical artifacts, adequate representation of vulnerable populations, and longitudinal chronic care scenarios. The field must adopt standardized query profiling--analogous to clinical trial reporting--to align evaluation with the full complexity of clinical practice.

[133]  arXiv:2603.18295 [pdf, ps, other]
Title: Constrained Hybrid Metaheuristic: A Universal Framework for Continuous Optimisation
Subjects: Neural and Evolutionary Computing (cs.NE)

This paper presents the constrained Hybrid Metaheuristic (cHM) algorithm as a general framework for continuous optimisation. Unlike many existing metaheuristics that are tailored to specific function classes or problem domains, cHM is designed to operate across a broad spectrum of objective functions, including those with unknown, heterogeneous, or complex properties such as non-convexity, non-separability, and varying smoothness. We provide a formal description of the algorithm, highlighting its modular structure and two-phase operation, which facilitates dynamic adaptation to the problem's characteristics. A key feature of cHM is its ability to harness synergy between both candidate solutions and component metaheuristic strategies. This property allows the algorithm to apply the most appropriate search behaviour at each stage of the optimisation process, thereby improving convergence and robustness. Our extensive experimental evaluation on 28 benchmark functions demonstrates that cHM consistently matches or outperforms traditional metaheuristics in terms of solution quality and convergence speed. In addition, a practical application of the algorithm is demonstrated for a feature selection problem in the context of data classification. The results underscore its potential as a versatile and effective black-box optimiser suitable for both theoretical research and practical applications.

[134]  arXiv:2603.18297 [pdf, ps, other]
Title: Path-Constrained Mixture-of-Experts
Comments: Under review
Subjects: Machine Learning (cs.LG)

Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling by activating only a subset of parameters for each input. However, conventional MoE routing selects each layer's experts independently, creating N^L possible expert paths for N experts across L layers. This number far exceeds typical training set sizes, leading to statistical inefficiency, as the model may not learn meaningful structure over such a vast path space. To constrain this space, we propose PathMoE, which shares router parameters across consecutive layers. Experiments on 0.9B and 16B parameter models demonstrate consistent improvements on perplexity and downstream tasks over independent routing, while eliminating the need for auxiliary load balancing losses. Analysis reveals that tokens following the same path naturally cluster by linguistic function, with PathMoE producing more concentrated groups, better cross-layer consistency, and greater robustness to routing perturbations. These results offer a new perspective for understanding MoE architectures through the lens of expert paths.
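
Note: a short worked count of the N^L path space mentioned above. It assumes one expert is selected per layer (top-1 routing), and the values N = 8 and L = 24 are illustrative, not taken from the paper.

```python
def num_expert_paths(num_experts: int, num_layers: int) -> int:
    """Number of distinct expert paths when each layer independently
    picks one of `num_experts` experts (top-1 routing assumed)."""
    if num_experts < 1 or num_layers < 0:
        raise ValueError("need num_experts >= 1 and num_layers >= 0")
    return num_experts ** num_layers


# Illustrative sizes: 8 experts per layer across 24 layers.
paths = num_expert_paths(8, 24)  # 8**24 == 2**72, roughly 4.7e21
```

Even at these modest sizes the number of paths is many orders of magnitude larger than any realistic count of training tokens, which is the statistical concern the abstract raises.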

[135]  arXiv:2603.18298 [pdf, ps, other]
Title: Sparse3DTrack: Monocular 3D Object Tracking Using Sparse Supervision
Comments: 22 pages, 8 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Monocular 3D object tracking aims to estimate temporally consistent 3D object poses across video frames, enabling autonomous agents to reason about scene dynamics. However, existing state-of-the-art approaches are fully supervised and rely on dense 3D annotations over long video sequences, which are expensive to obtain and difficult to scale. In this work, we address this fundamental limitation by proposing the first sparsely supervised framework for monocular 3D object tracking. Our approach decomposes the task into two sequential sub-problems: 2D query matching and 3D geometry estimation. Both components leverage the spatio-temporal consistency of image sequences to augment a sparse set of labeled samples and learn rich 2D and 3D representations of the scene. Leveraging these learned cues, our model automatically generates high-quality 3D pseudolabels across entire videos, effectively transforming sparse supervision into dense 3D track annotations. This enables existing fully-supervised trackers to effectively operate under extreme label sparsity. Extensive experiments on the KITTI and nuScenes datasets demonstrate that our method significantly improves tracking performance, achieving an improvement of up to 15.50 p.p. while using at most four ground truth annotations per track.

[136]  arXiv:2603.18299 [pdf, ps, other]
Title: ALIGN: Adversarial Learning for Generalizable Speech Neuroprosthesis
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

Intracortical brain-computer interfaces (BCIs) can decode speech from neural activity with high accuracy when trained on data pooled across recording sessions. In realistic deployment, however, models must generalize to new sessions without labeled data, and performance often degrades due to cross-session nonstationarities (e.g., electrode shifts, neural turnover, and changes in user strategy). In this paper, we propose ALIGN, a session-invariant learning framework based on multi-domain adversarial neural networks for semi-supervised cross-session adaptation. ALIGN trains a feature encoder jointly with a phoneme classifier and a domain classifier operating on the latent representation. Through adversarial optimization, the encoder is encouraged to preserve task-relevant information while suppressing session-specific cues. We evaluate ALIGN on intracortical speech decoding and find that it generalizes consistently better to previously unseen sessions, improving both phoneme error rate and word error rate relative to baselines. These results indicate that adversarial domain alignment is an effective approach for mitigating session-level distribution shift and enabling robust longitudinal BCI decoding.

[137]  arXiv:2603.18300 [pdf, ps, other]
Title: Auditing Preferences for Brands and Cultures in LLMs
Comments: 20 pages, 2 figures
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Information Retrieval (cs.IR); Machine Learning (cs.LG)

AI systems based on large language models (LLMs) increasingly mediate what billions of people see, choose and buy. This creates an urgent need to quantify the systemic risks of LLM-driven market intermediation, including its implications for market fairness, competition, and the diversity of information exposure.
This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in LLMs under realistic usage conditions. ChoiceEval addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries and (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. For a given topic (e.g., running shoes, hotel chains, travel destinations), the framework segments users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience-seeking), and then derives diverse prompts that reflect real-world advice-seeking and decision-making behaviour. LLM responses are converted into normalised top-k choice sets. Preference and geographic bias are then quantified using comparable metrics across topics and personas. Thus, ChoiceEval provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behaviour to real-world economic outcomes.
Applied to Gemini, GPT, and DeepSeek across 10 topics spanning commerce and culture and more than 2,000 questions, ChoiceEval reveals consistent preferences: U.S.-developed models Gemini and GPT show marked favouritism toward American entities, while China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences. These patterns persist across user personas, suggesting systematic rather than incidental effects.

[138]  arXiv:2603.18306 [pdf, ps, other]
Title: Fast and Generalizable NeRF Architecture Selection for Satellite Scene Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Neural Radiance Fields (NeRF) have emerged as a powerful approach for photorealistic 3D reconstruction from multi-view images. However, deploying NeRF for satellite imagery remains challenging. Each scene requires individual training, and optimizing architectures via Neural Architecture Search (NAS) demands hours to days of GPU time. While existing approaches focus on architectural improvements, our SHAP analysis reveals that multi-view consistency, rather than model architecture, determines reconstruction quality. Based on this insight, we develop PreSCAN, a predictive framework that estimates NeRF quality prior to training using lightweight geometric and photometric descriptors. PreSCAN selects suitable architectures in < 30 seconds with < 1 dB prediction error, achieving 1000$\times$ speedup over NAS. We further demonstrate PreSCAN's deployment utility on edge platforms (Jetson Orin), where combining its predictions with offline cost profiling reduces inference power by 26% and latency by 43% with minimal quality loss. Experiments on DFC2019 datasets confirm that PreSCAN generalizes across diverse satellite scenes without retraining.

[139]  arXiv:2603.18308 [pdf, ps, other]
Title: Proprioceptive-only State Estimation for Legged Robots with Set-Coverage Measurements of Learned Dynamics
Subjects: Robotics (cs.RO)

Proprioceptive-only state estimation is attractive for legged robots since it is computationally cheaper and is unaffected by perceptually degraded conditions. The history of joint-level measurements contains rich information that can be used to infer the dynamics of the system and subsequently produce navigational measurements. Recent approaches produce these estimates with learned measurement models and fuse them with IMU data under a Gaussian noise assumption. However, this assumption can easily break down with limited training data and render the estimates inconsistent and potentially divergent. In this work, we propose a proprioceptive-only state estimation framework for legged robots that characterizes the measurement noise using set-coverage statements that do not assume any distribution. We develop a practical and computationally inexpensive method to use these set-coverage measurements with a Gaussian filter in a systematic way. We validate the approach both in simulation and on two real-world quadrupedal datasets. Comparison with the Gaussian baselines shows that our proposed method remains consistent and is not prone to drift under real noise scenarios.

[140]  arXiv:2603.18309 [pdf, ps, other]
Title: Unrolled Reconstruction with Integrated Super-Resolution for Accelerated 3D LGE MRI
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accelerated 3D late gadolinium enhancement (LGE) MRI requires robust reconstruction methods to recover thin atrial structures from undersampled k-space data. While unrolled model-based networks effectively integrate physics-driven data consistency with learned priors, they operate at the acquired resolution and may fail to fully recover high-frequency detail. We propose a hybrid unrolled reconstruction framework in which an Enhanced Deep Super-Resolution (EDSR) network replaces the proximal operator within each iteration of the optimization loop, enabling joint super-resolution enhancement and data consistency enforcement. The model is trained end-to-end on retrospectively undersampled preclinical 3D LGE datasets and compared against compressed sensing, Model-Based Deep Learning (MoDL), and self-guided Deep Image Prior (DIP) baselines. Across acceleration factors, the proposed method consistently improves PSNR and SSIM over standard unrolled reconstruction and better preserves fine cardiac structures, leading to improved LA (left atrium) segmentation performance. These results demonstrate that integrating super-resolution priors directly within model-based reconstruction provides measurable gains in accelerated 3D LGE MRI.

[141]  arXiv:2603.18314 [pdf, ps, other]
Title: Approximate Subgraph Matching with Neural Graph Representations and Reinforcement Learning
Comments: 10 pages, 5 figures. Code available at https://github.com/KaiyangLi1992/RL-ASM
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Approximate subgraph matching (ASM) is a task that determines the approximate presence of a given query graph in a large target graph. ASM is an NP-hard problem that is critical in graph analysis, with a myriad of applications ranging from database systems and network science to biochemistry and privacy. Existing techniques often employ heuristic search strategies, which cannot fully utilize the graph information, leading to sub-optimal solutions. This paper proposes a Reinforcement Learning based Approximate Subgraph Matching (RL-ASM) algorithm that exploits graph transformers to effectively extract graph representations and RL-based policies for ASM. Our model is built upon the branch-and-bound algorithm that selects one pair of nodes from the two input graphs at a time for potential matches. Instead of using heuristics, we exploit a Graph Transformer architecture to extract feature representations that encode the full graph information. To enhance the training of the RL policy, we use supervised signals to guide our agent in an imitation learning stage. Subsequently, the policy is fine-tuned with Proximal Policy Optimization (PPO), which optimizes the cumulative long-term reward over episodes. Extensive experiments on both synthetic and real-world datasets demonstrate that our RL-ASM outperforms existing methods in terms of effectiveness and efficiency. Our source code is available at https://github.com/KaiyangLi1992/RL-ASM.

[142]  arXiv:2603.18315 [pdf, ps, other]
Title: DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving
Comments: 32 pages, 15 figures. Code and demo available online
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Ensuring safe decision-making in autonomous vehicles remains a fundamental challenge despite rapid advances in end-to-end learning approaches. Traditional reinforcement learning (RL) methods rely on manually engineered rewards or sparse collision signals, which fail to capture the rich contextual understanding required for safe driving and make unsafe exploration unavoidable in real-world settings. Recent vision-language models (VLMs) offer promising semantic understanding capabilities; however, their high inference latency and susceptibility to hallucination hinder direct application to real-time vehicle control. To address these limitations, this paper proposes DriveVLM-RL, a neuroscience-inspired framework that integrates VLMs into RL through a dual-pathway architecture for safe and deployable autonomous driving. The framework decomposes semantic reward learning into a Static Pathway for continuous spatial safety assessment using CLIP-based contrasting language goals, and a Dynamic Pathway for attention-gated multi-frame semantic risk reasoning using a lightweight detector and a large VLM. A hierarchical reward synthesis mechanism fuses semantic signals with vehicle states, while an asynchronous training pipeline decouples expensive VLM inference from environment interaction. All VLM components are used only during offline training and are removed at deployment, ensuring real-time feasibility. Experiments in the CARLA simulator show significant improvements in collision avoidance, task success, and generalization across diverse traffic scenarios, including strong robustness under settings without explicit collision penalties. These results demonstrate that DriveVLM-RL provides a practical paradigm for integrating foundation models into autonomous driving without compromising real-time feasibility. Demo video and code are available at: https://zilin-huang.github.io/DriveVLM-RL-website/

[143]  arXiv:2603.18322 [pdf, ps, other]
Title: Polynomial Constructions and Deletion-Ball Geometry for Multiset Deletion Codes
Comments: 41 pages
Subjects: Information Theory (cs.IT)

We study error-correcting codes in the space $\mathcal{S}_{n,q}$ of length-$n$ multisets over a $q$-ary alphabet under the deletion metric, motivated by permutation channels in which ordering is completely lost and errors act only on symbol multiplicities. We develop two complementary directions. First, we present polynomial Sidon-type constructions over finite fields, in both projective and affine forms, yielding multiset $t$-deletion-correcting codes in the regime $t<q$ with redundancy $t+O(1)$, independent of the blocklength $n$. Second, we develop a geometric analysis of deletion balls in $\mathcal{S}_{n,q}$. Using difference-vector representations together with a diagonal reduction of the relevant generating functions, we derive exact generating-function expressions for individual deletion-ball sizes, exact formulas for the number of ordered pairs of multisets at a fixed distance $m$, and consequently for the average ball size. We prove that radius-$r$ deletion balls are minimized at extreme multisets and maximized at the most balanced multisets, giving a formal global characterization of extremal centers in $\mathcal{S}_{n,q}$. We further relate the maximal-ball value to the ideal difference set $S_{q-1}(r,r)$ through boundary truncation, obtaining explicit closed forms for $q=2$ and $q=3$. These geometric results lead to volume-based bounds on code size, including sphere-packing upper bounds, a boundary-aware analysis of code--anticode arguments, and Gilbert--Varshamov-type lower bounds governed by exact average ball sizes. For fixed $q$ and $t$, the resulting average-ball lower bound matches the interior-difference-set scale asymptotically.
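
Note: a brute-force sketch for exploring deletion-ball sizes on tiny parameters. It assumes multisets are represented as count vectors and that the distance between two size-$n$ multisets is the number of symbols that must be deleted from one to reach their common sub-multiset. This definition and the function names are assumptions for illustration, and the paper's exact conventions may differ.

```python
def multisets(n, q):
    """Yield every count vector of length q whose entries sum to n."""
    if q == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in multisets(n - first, q - 1):
            yield (first,) + rest


def deletion_distance(x, y):
    """Symbols to delete from x to reach the common sub-multiset of x and y.

    For multisets of equal size this is symmetric in x and y.
    """
    return sum(max(a - b, 0) for a, b in zip(x, y))


def ball_size(center, r):
    """Count size-n multisets within assumed deletion distance r of `center`."""
    n, q = sum(center), len(center)
    return sum(1 for y in multisets(n, q) if deletion_distance(center, y) <= r)


# Binary alphabet, n = 4, radius 1: an extreme center versus a balanced one.
extreme = ball_size((4, 0), 1)   # ball is {(4, 0), (3, 1)}
balanced = ball_size((2, 2), 1)  # ball is {(1, 3), (2, 2), (3, 1)}
```

Under this assumed definition, the toy case agrees with the stated extremal behaviour: the extreme multiset has the smaller ball and the balanced multiset has the larger one.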

[144]  arXiv:2603.18325 [pdf, ps, other]
Title: Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum
Comments: 39 pages, 4 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Chain-of-thought reasoning, where language models expend additional computation by producing thinking tokens prior to final responses, has driven significant advances in model capabilities. However, training these reasoning models is extremely costly in terms of both data and compute, as it involves collecting long traces of reasoning behavior from humans or synthetic generators and further post-training the model via reinforcement learning. Are these costs fundamental, or can they be reduced through better algorithmic design? We show that autocurriculum, where the model uses its own performance to decide which problems to focus training on, provably improves upon standard training recipes for both supervised fine-tuning (SFT) and reinforcement learning (RL). For SFT, we show that autocurriculum requires exponentially fewer reasoning demonstrations than non-adaptive fine-tuning, by focusing teacher supervision on prompts where the current model struggles. For RL fine-tuning, autocurriculum decouples the computational cost from the quality of the reference model, reducing the latter to a burn-in cost that is nearly independent of the target accuracy. These improvements arise purely from adaptive data selection, drawing on classical techniques from boosting and learning from counterexamples, and requiring no assumption on the distribution or difficulty of prompts.

[145]  arXiv:2603.18326 [pdf, ps, other]
Title: Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration
Subjects: Machine Learning (cs.LG)

While offline reinforcement learning provides reliable policies for real-world deployment, its inherent pessimism severely restricts an agent's ability to explore and collect novel data online. Drawing inspiration from safe reinforcement learning, exploring near the boundary of regions well covered by the offline dataset and reliably modeled by the simulator allows an agent to take manageable risks--venturing into informative but moderate-uncertainty states while remaining close enough to familiar regions for safe recovery. However, naively rewarding this boundary-seeking behavior can lead to a degenerate parking behavior, where the agent simply stops once it reaches the frontier. To solve this, we propose a novel vector-field reward shaping paradigm designed to induce continuous, safe boundary exploration for non-adaptive deployed policies. Operating on an uncertainty oracle trained from offline data, our reward combines two complementary components: a gradient-alignment term that attracts the agent toward a target uncertainty level, and a rotational-flow term that promotes motion along the local tangent plane of the uncertainty manifold. Through theoretical analysis, we show that this reward structure naturally induces sustained exploratory behavior along the boundary while preventing degenerate solutions. Empirically, by integrating our proposed reward shaping with Soft Actor-Critic on a 2D continuous navigation task, we validate that agents successfully traverse uncertainty boundaries while balancing safe, informative data collection with primary task completion.
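
The two reward components described above can be sketched in 2D as follows. This is only an illustration under stated assumptions, not the paper's formula: the uncertainty oracle is replaced by a hypothetical stand-in (distance from the origin), the gradient is taken by finite differences, and the names `shaped_reward`, `w_align`, and `w_rot` are invented for the example.

```python
import math


def uncertainty(x, y):
    """Hypothetical oracle: uncertainty grows with distance from the origin."""
    return math.hypot(x, y)


def grad_uncertainty(x, y, eps=1e-6):
    """Central finite-difference gradient of the uncertainty field."""
    gx = (uncertainty(x + eps, y) - uncertainty(x - eps, y)) / (2 * eps)
    gy = (uncertainty(x, y + eps) - uncertainty(x, y - eps)) / (2 * eps)
    return gx, gy


def shaped_reward(pos, vel, target=1.0, w_align=1.0, w_rot=1.0):
    """Gradient-alignment term plus rotational-flow term (assumed forms)."""
    gx, gy = grad_uncertainty(*pos)
    norm = math.hypot(gx, gy)
    if norm < 1e-9:
        return 0.0  # flat field: no preferred direction
    nx, ny = gx / norm, gy / norm   # unit normal to the level set
    tx, ty = -ny, nx                # normal rotated by 90 degrees: level-set tangent
    error = target - uncertainty(*pos)
    # Moving up the gradient is rewarded below the target level, penalized above it.
    align = error * (vel[0] * nx + vel[1] * ny)
    # Motion along the tangent is rewarded, so standing still on the frontier earns nothing.
    rotate = vel[0] * tx + vel[1] * ty
    return w_align * align + w_rot * rotate


print(shaped_reward((1.0, 0.0), (0.0, 0.0)))  # parked on the frontier
print(shaped_reward((1.0, 0.0), (0.0, 1.0)))  # circulating along the frontier
print(shaped_reward((0.5, 0.0), (1.0, 0.0)))  # heading outward from inside
```

In this toy setup, an agent parked on the target level set receives zero reward while one moving along it is rewarded, which is the qualitative effect the abstract attributes to the rotational-flow term.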

[146]  arXiv:2603.18327 [pdf, ps, other]
Title: Consumer-to-Clinical Language Shifts in Ambient AI Draft Notes and Clinician-Finalized Documentation: A Multi-level Analysis
Subjects: Artificial Intelligence (cs.AI)

Ambient AI generates draft clinical notes from patient-clinician conversations, often using lay or consumer-oriented phrasing to support patient understanding instead of standardized clinical terminology. How clinicians revise these drafts for professional documentation conventions remains unclear. We quantified clinician editing for consumer-to-clinical normalization using a dictionary-confirmed transformation framework. We analyzed 71,173 AI-draft and finalized-note section pairs from 34,726 encounters. Confirmed transformations were defined as replacing a consumer expression with its dictionary-mapped clinical equivalent in the same section. Editing significantly reduced terminology density across all sections (p < 0.001). The Assessment and Plan accounted for the largest transformation volume (59.3%). Our analysis identified 7,576 transformation events across 4,114 note sections (5.8%), representing 1.2% consumer-term deletions. Transformation intensity varied across individual clinicians (p < 0.001). Overall, clinician post-editing demonstrates consistent shifts from conversational phrasing toward standardized, section-appropriate clinical terminology, supporting section-aware ambient AI design.

[147]  arXiv:2603.18328 [pdf, ps, other]
Title: A Family of Adaptive Activation Functions for Mitigating Failure Modes in Physics-Informed Neural Networks
Authors: Krishna Murari
Subjects: Machine Learning (cs.LG)

Physics-Informed Neural Networks (PINNs) are a powerful and flexible learning framework that has gained significant attention in recent years. They have demonstrated strong performance across a wide range of scientific and engineering problems. In parallel, wavelets have been extensively used as efficient computational tools due to their strong approximation capabilities. Motivated by the common failure modes observed in standard PINNs, this work introduces a novel family of adaptive wavelet-based activation functions. The proposed activation functions significantly improve training stability and expressive power by combining trainable wavelet functions with either trainable or fixed hyperbolic tangent and softplus functions. Five distinct activation functions are developed within the PINN framework and systematically evaluated across four representative classes of partial differential equations (PDEs). Comprehensive comparisons using bar plots demonstrate improved robustness and accuracy compared to traditional activation functions. Furthermore, the proposed approach is validated through direct comparisons with baseline PINNs, transformer-based architectures such as PINNsFormer, and other deep learning models, highlighting its effectiveness and generality.

[148]  arXiv:2603.18329 [pdf, ps, other]
Title: FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering
Subjects: Artificial Intelligence (cs.AI)

Inference-time steering is widely regarded as a lightweight and parameter-free mechanism for controlling large language model (LLM) behavior, and prior work has often suggested that simple activation-level interventions can reliably induce targeted behavioral changes. However, such conclusions are typically drawn under relatively relaxed evaluation settings that overlook deployment constraints, capability trade-offs, and real-world robustness. We therefore introduce FaithSteer-BENCH, a stress-testing benchmark that evaluates steering methods at a fixed deployment-style operating point through three gate-wise criteria: controllability, utility preservation, and robustness. Across multiple models and representative steering approaches, we uncover several systematic failure modes that are largely obscured under standard evaluation, including illusory controllability, measurable cognitive tax on unrelated capabilities, and substantial brittleness under mild instruction-level perturbations, role prompts, encoding transformations, and data scarcity. Gate-wise benchmark results show that existing methods do not necessarily provide reliable controllability in deployment-oriented practical settings. In addition, mechanism-level diagnostics indicate that many steering methods induce prompt-conditional alignment rather than stable latent directional shifts, further explaining their fragility under stress. FaithSteer-BENCH therefore provides a unified benchmark and a clearer analytical lens for future method design, reliability evaluation, and deployment-oriented research in steering.

[149]  arXiv:2603.18330 [pdf, ps, other]
Title: MemArchitect: A Policy Driven Memory Governance Layer
Comments: This is ongoing research work and will be updated periodically
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Persistent Large Language Model (LLM) agents expose a critical governance gap in memory management. Standard Retrieval-Augmented Generation (RAG) frameworks treat memory as passive storage, lacking mechanisms to resolve contradictions, enforce privacy, or prevent outdated information ("zombie memories") from contaminating the context window.
We introduce MemArchitect, a governance layer that decouples memory lifecycle management from model weights. MemArchitect enforces explicit, rule-based policies, including memory decay, conflict resolution, and privacy controls.
We demonstrate that governed memory consistently outperforms unmanaged memory in agentic settings, highlighting the necessity of structured memory governance for reliable and safe autonomous systems.

[150]  arXiv:2603.18331 [pdf, ps, other]
Title: Understanding the Theoretical Foundations of Deep Neural Networks through Differential Equations
Subjects: Artificial Intelligence (cs.AI)

Deep neural networks (DNNs) have achieved remarkable empirical success, yet the absence of a principled theoretical foundation continues to hinder their systematic development. In this survey, we present differential equations as a theoretical foundation for understanding, analyzing, and improving DNNs. We organize the discussion around three guiding questions: i) how differential equations offer a principled understanding of DNN architectures, ii) how tools from differential equations can be used to improve DNN performance in a principled way, and iii) what real-world applications benefit from grounding DNNs in differential equations. We adopt a two-fold perspective spanning the model level, which interprets the whole DNN as a differential equation, and the layer level, which models individual DNN components as differential equations. From these two perspectives, we review how this framework connects model design, theoretical analysis, and performance improvement. We further discuss real-world applications, as well as key challenges and opportunities for future research.

[151]  arXiv:2603.18333 [pdf, ps, other]
Title: Trajectory Landscapes for Therapeutic Strategy Design in Agent-Based Tumor Microenvironment Models
Subjects: Systems and Control (eess.SY)

Multiplex tissue imaging (MTI) enables high-dimensional, spatially resolved measurements of the tumor microenvironment (TME), but most clinical datasets are temporally undersampled and longitudinally limited, restricting direct inference of underlying spatiotemporal dynamics and effective intervention timing. Agent-based models (ABMs) provide mechanistic, stochastic simulators of TME evolution; yet their high-dimensional state space and uncertain parameterization make direct control design challenging. This work presents a reduced-order, simulation-driven framework for therapeutic strategy design using ABM-derived trajectory ensembles. Starting from a nominal ABM, we systematically perturb biologically plausible parameters to generate a set of simulated trajectories and construct a low-dimensional trajectory landscape describing TME evolution. From time series of spatial summary statistics extracted from the simulations, we learn a probabilistic Markov State Model (MSM) that captures metastable states and the transitions between them. To connect simulation dynamics with clinical observations, we map patient MTI snapshots onto the landscape and assess concordance with observed spatial phenotypes and clinical outcomes. We further show that conditioning the MSM on dominant governing parameters yields group-specific transition models to formulate a finite-horizon Markov Decision Process (MDP) for treatment scheduling. The resulting framework enables simulation-grounded therapeutic policy design for partially observed biological systems without requiring longitudinal patient measurements.
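
For readers unfamiliar with Markov State Models, the step of learning transitions from simulated trajectories can be sketched as transition counting followed by row normalization. The sketch assumes the trajectories have already been discretized into integer state labels (the abstract does not say how), and the function name and toy data are hypothetical.

```python
def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Maximum-likelihood MSM estimate: count lagged transitions, then normalize rows."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for current, following in zip(traj, traj[lag:]):
            counts[current][following] += 1

    matrix = []
    for state, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No outgoing transitions observed: treat the state as absorbing (an assumption).
            matrix.append([1.0 if j == state else 0.0 for j in range(n_states)])
        else:
            matrix.append([count / total for count in row])
    return matrix


# Two toy trajectories over three discrete states (hypothetical labels).
trajectories = [[0, 0, 1, 1, 2], [0, 1, 2, 2, 2]]
for row in estimate_transition_matrix(trajectories, n_states=3):
    print([round(p, 3) for p in row])
```

Conditioning on a governing parameter, as the abstract describes, would amount to estimating one such matrix per parameter group; those matrices could then serve as the transition kernels of a finite-horizon MDP.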

[152]  arXiv:2603.18334 [pdf, ps, other]
Title: Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As Large Language Models (LLMs) increasingly assist secure software development, their ability to meet the rigorous demands of Rust program verification remains unclear. Existing evaluations treat Rust verification as a black box, assessing models only by binary pass or fail outcomes for proof hints. This obscures whether models truly understand the logical deductions required for verifying nontrivial Rust code. To bridge this gap, we introduce VCoT-Lift, a framework that lifts low-level solver reasoning into high-level, human-readable verification steps. By exposing solver-level reasoning as an explicit Verification Chain-of-Thought, VCoT-Lift provides a concrete ground truth for fine-grained evaluation. Leveraging VCoT-Lift, we introduce VCoT-Bench, a comprehensive benchmark of 1,988 VCoT completion tasks for rigorously evaluating LLMs' understanding of the entire verification process. VCoT-Bench measures performance along three orthogonal dimensions: robustness to varying degrees of missing proofs, competence across different proof types, and sensitivity to the proof locations. Evaluation of ten state-of-the-art models reveals severe fragility, indicating that current LLMs fall well short of the reasoning capabilities exhibited by automated theorem provers.

[153]  arXiv:2603.18335 [pdf, ps, other]
Title: Distributed Unknown Input Observer Design: A Geometric Approach
Subjects: Systems and Control (eess.SY)

We present a geometric approach to designing distributed unknown input observers (DUIOs) for linear time-invariant systems, where measurements are distributed across nodes and each node is influenced by unknown inputs through distinct channels. The proposed distributed estimation scheme consists of a network of observers, each tasked with reconstructing the entire system state despite having access only to local input-output signals that are individually insufficient for full state observation. Unlike existing methods that impose stringent rank conditions on the input and output matrices at each node, our approach leverages the $(C,A)$-invariant (conditioned invariant) subspace at each node from a geometric perspective. This enables the design of DUIOs in both continuous- and discrete-time settings under relaxed conditions, for which we establish sufficiency and necessity. The effectiveness of our methodology is demonstrated through extensive simulations, including a practical case study on a power grid system.

[154]  arXiv:2603.18336 [pdf, ps, other]
Title: ManiDreams: An Open-Source Library for Robust Object Manipulation via Uncertainty-aware Task-specific Intuitive Physics
Comments: 9 pages, 10 figures. Project page at this https URL
Subjects: Robotics (cs.RO)

Dynamics models, whether simulators or learned world models, have long been central to robotic manipulation, but most focus on minimizing prediction error rather than confronting a more fundamental challenge: real-world manipulation is inherently uncertain. We argue that robust manipulation under uncertainty is fundamentally an integration problem: uncertainties must be represented, propagated, and constrained within the planning loop, not merely suppressed during training. We present and open-source ManiDreams, a modular framework for uncertainty-aware manipulation planning over intuitive physics models. It realizes this integration through composable abstractions for distributional state representation, backend-agnostic dynamics prediction, and declarative constraint specification for action optimization. The framework explicitly addresses three sources of uncertainty: perceptual, parametric, and structural. It wraps any base policy with a sample-predict-constrain loop that evaluates candidate actions against distributional outcomes, adding robustness without retraining. Experiments on ManiSkill tasks show that ManiDreams maintains robust performance under various perturbations where the RL baseline degrades significantly. Runnable examples on pushing, picking, catching, and real-world deployment demonstrate flexibility across different policies, optimizers, physics backends, and executors. The framework is publicly available at https://github.com/Rice-RobotPI-Lab/ManiDreams

[155]  arXiv:2603.18342 [pdf, ps, other]
Title: Shifting Uncertainty to Critical Moments: Towards Reliable Uncertainty Quantification for VLA Model
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Vision-Language-Action (VLA) models enable general-purpose robotic policies by mapping visual observations and language instructions to low-level actions, but they often lack reliable introspection. A common practice is to compute a token-level uncertainty signal and take its mean over a rollout. However, mean aggregation can dilute short-lived but safety-critical uncertainty spikes in continuous control. In particular, successful rollouts may contain localized high-entropy segments due to benign noise or non-critical micro-adjustments, while failure rollouts can appear low-entropy for most timesteps and only exhibit brief spikes near the onset of failure. We propose a unified uncertainty quantification approach for predicting rollout success versus failure that (1) uses max-based sliding window pooling to preserve transient risk signals, (2) applies motion-aware stability weighting to emphasize high-frequency action oscillations associated with unstable behaviors, and (3) performs DoF-adaptive calibration via Bayesian Optimization to prioritize kinematically critical axes. Experiments on the LIBERO benchmark show that our method substantially improves failure prediction accuracy and yields more reliable signals for failure detection, which can support downstream human-in-the-loop interventions.
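
The dilution effect of mean aggregation described above can be seen in a small sketch. It assumes one plausible reading of "max-based sliding window pooling" (average within each window, then take the maximum across windows); the window length, function names, and toy uncertainty traces are all hypothetical.

```python
from statistics import mean


def mean_score(uncertainty):
    """Baseline: average token-level uncertainty over the whole rollout."""
    return mean(uncertainty)


def max_window_score(uncertainty, window=5):
    """Average within each sliding window, then keep the worst window."""
    if len(uncertainty) <= window:
        return mean(uncertainty)
    return max(
        mean(uncertainty[start:start + window])
        for start in range(len(uncertainty) - window + 1)
    )


# Toy traces: a success with uniformly moderate uncertainty, and a failure
# that stays low for most timesteps and spikes only near the end.
success = [0.3] * 50
failure = [0.1] * 45 + [0.9] * 5

print(mean_score(success), mean_score(failure))
print(max_window_score(success), max_window_score(failure))
```

On these toy traces the rollout-level mean ranks the failure as less uncertain than the success, while the windowed maximum preserves the brief spike and reverses the ranking.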

[156]  arXiv:2603.18343 [pdf, ps, other]
Title: VISTA: Validation-Guided Integration of Spatial and Temporal Foundation Models with Anatomical Decoding for Rare-Pathology VCE Event Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Capsule endoscopy event detection is challenging because diagnostically relevant findings are sparse, visually heterogeneous, and embedded in long, noisy video streams, while evaluation is performed at the event level rather than by frame accuracy alone. We therefore formulate the RARE-VISION task as a metric-aligned event detection problem instead of a purely frame-wise classification task. Our framework combines two complementary backbones, EndoFM-LV for local temporal context and DINOv3 ViT-L/16 for strong frame-level visual semantics, followed by a Diverse Head Ensemble, Validation-Guided Hierarchical Fusion, and Anatomy-Aware Temporal Event Decoding. The fusion stage uses validation-derived class-wise model weighting, backbone weighting, and probability calibration, while the decoding stage applies temporal smoothing, anatomical constraints, threshold refinement, and per-label event generation to produce stable event predictions. Validation ablations indicate that complementary backbones, validation-guided fusion, and anatomy-aware temporal decoding all contribute to event-level performance. On the official hidden test set, the proposed method achieved an overall temporal mAP@0.5 of 0.3530 and temporal mAP@0.95 of 0.3235.
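
To make the event-level evaluation concrete, the sketch below turns per-frame probabilities into events (moving-average smoothing, thresholding, minimum length) and scores one against a reference interval with temporal IoU. It is a generic illustration under stated assumptions, not the VISTA decoder: the window size, threshold, minimum length, and toy data are hypothetical, and anatomical constraints are omitted.

```python
def smooth(probs, window=3):
    """Centered moving average, truncated at the sequence boundaries."""
    half = window // 2
    out = []
    for i in range(len(probs)):
        lo, hi = max(0, i - half), min(len(probs), i + half + 1)
        out.append(sum(probs[lo:hi]) / (hi - lo))
    return out


def decode_events(probs, threshold=0.5, window=3, min_len=2):
    """Return half-open (start, end) frame intervals where smoothed probs >= threshold."""
    events, start = [], None
    for i, p in enumerate(smooth(probs, window)):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start >= min_len:
                events.append((start, i))
            start = None
    if start is not None and len(probs) - start >= min_len:
        events.append((start, len(probs)))
    return events


def temporal_iou(a, b):
    """Intersection over union of two half-open intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


probs = [0.1, 0.1, 0.9, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1]
predicted = decode_events(probs)
reference = (4, 7)  # hypothetical ground-truth event
print(predicted, temporal_iou(predicted[0], reference))
```

In this toy case the decoded event starts one frame early, so it would count as a match at an IoU threshold of 0.5 but not at 0.95, which shows how the two reported mAP thresholds differ in strictness.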

[157]  arXiv:2603.18344 [pdf, ps, other]
Title: HRI-SA: A Multimodal Dataset for Online Assessment of Human Situational Awareness during Remote Human-Robot Teaming
Comments: This work is currently under peer review
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Maintaining situational awareness (SA) is critical in human-robot teams. Yet, under high workload and dynamic conditions, operators often experience SA gaps. Automated detection of SA gaps could provide timely assistance for operators. However, conventional SA measures either disrupt task flow or cannot capture real-time fluctuations, limiting their operational utility. To the best of our knowledge, no publicly available dataset currently supports the systematic evaluation of online human SA assessment in human-robot teaming. To advance the development of online SA assessment tools, we introduce HRI-SA, a multimodal dataset from 30 participants in a realistic search-and-rescue human-robot teaming context, incorporating eye movements, pupil diameter, biosignals, user interactions, and robot data. The experimental protocol included predefined events requiring timely operator assistance, with ground truth SA latency of two types (perceptual and comprehension) systematically obtained by measuring the time between assistance need onset and resolution. We illustrate the utility of this dataset by evaluating standard machine learning models for detecting perceptual SA latencies using generic eye-tracking features and contextual features. Results show that eye-tracking features alone effectively classified perceptual SA latency (recall=88.91%, F1=67.63%) using leave-one-group-out cross-validation, with performance improved through contextual data fusion (recall=91.51%, F1=80.38%). This paper contributes the first public dataset supporting the systematic evaluation of SA throughout a human-robot teaming mission, while also demonstrating the potential of generic eye-tracking features for continuous perceptual SA latency detection in remote human-robot teaming.

[158]  arXiv:2603.18347 [pdf, ps, other]
Title: Bonsai: A class of effective methods for independent sampling of graph partitions
Subjects: Data Structures and Algorithms (cs.DS); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

We develop effective methods for constructing an ensemble of district plans via independent sampling from a reasonable probability distribution on the space of graph partitions. We compare the performance of our algorithms to that of standard Markov Chain based algorithms in the context of grid graphs and state congressional and legislative maps. For the case of perfect population balance between districts, we provide an explicit description of the distribution from which our method samples.

[159]  arXiv:2603.18348 [pdf, ps, other]
Title: Epistemic Generative Adversarial Networks
Comments: 14 pages, 6 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Generative models, particularly Generative Adversarial Networks (GANs), often suffer from a lack of output diversity, frequently generating similar samples rather than a wide range of variations. This paper introduces a novel generalization of the GAN loss function based on Dempster-Shafer theory of evidence, applied to both the generator and discriminator. Additionally, we propose an architectural enhancement to the generator that enables it to predict a mass function for each image pixel. This modification allows the model to quantify uncertainty in its outputs and leverage this uncertainty to produce more diverse and representative generations. Experimental evidence shows that our approach not only improves generation variability but also provides a principled framework for modeling and interpreting uncertainty in generative processes.

[160]  arXiv:2603.18349 [pdf, ps, other]
Title: Large-Scale Analysis of Political Propaganda on Moltbook
Comments: 9 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We present an NLP-based study of political propaganda on Moltbook, a Reddit-style platform for AI agents. To enable large-scale analysis, we develop LLM-based classifiers to detect political propaganda, validated against expert annotation (Cohen's $\kappa$ = 0.64-0.74). Using a dataset of 673,127 posts and 879,606 comments, we find that political propaganda accounts for 1% of all posts and 42% of all political content. These posts are concentrated in a small set of communities, with 70% of such posts falling into five of them. 4% of agents produced 51% of these posts. We further find that a minority of these agents repeatedly post highly similar content within and across communities. Despite this, we find limited evidence that comments amplify political propaganda.

[161]  arXiv:2603.18350 [pdf, ps, other]
Title: PeriphAR: Fast and Accurate Real-World Object Selection with Peripheral Augmented Reality Displays
Subjects: Human-Computer Interaction (cs.HC)

Gaze-based selection in XR requires visual confirmation due to eye-tracking limitations and target ambiguity in 3D contexts. Current designs for wide-FOV displays use world-locked, central overlays, which are not conducive to always-on AR glasses. This paper introduces PeriphAR (per-ree-far), a visualization technique that leverages peripheral vision for feedback during gaze-based selection on a monocular AR display. In a first user study, we isolated text, color, and shape properties of target objects to compare peripheral selection cues. Peripheral vision was more sensitive to color than shape, but this sensitivity rapidly declined at lower contrast. To preserve preattentive processing of color, we developed two strategies to enhance color in users' peripheral vision. In a second user study, our strategy that maximized contrast of the target to the neighboring object with the most similar color was subjectively preferred. As proof of concept, we implemented PeriphAR in an end-to-end system to test performance with real-world object detection.

[162]  arXiv:2603.18353 [pdf, ps, other]
Title: Interpretability without actionability: mechanistic methods cannot correct language model errors despite near-perfect internal representations
Comments: 27 pages, 5 figures, 10 tables. Code available at this https URL
Subjects: Artificial Intelligence (cs.AI)

Language models encode task-relevant knowledge in internal representations that far exceeds their output performance, but whether mechanistic interpretability methods can bridge this knowledge-action gap has not been systematically tested. We compared four mechanistic interpretability methods -- concept bottleneck steering (Steerling-8B), sparse autoencoder feature steering, logit lens with activation patching, and linear probing with truthfulness separator vector steering (Qwen 2.5 7B Instruct) -- for correcting false-negative triage errors using 400 physician-adjudicated clinical vignettes (144 hazards, 256 benign). Linear probes discriminated hazardous from benign cases with 98.2% AUROC, yet the model's output sensitivity was only 45.1%, a 53-percentage-point knowledge-action gap. Concept bottleneck steering corrected 20% of missed hazards but disrupted 53% of correct detections, indistinguishable from random perturbation (p=0.84). SAE feature steering produced zero effect despite 3,695 significant features. TSV steering at high strength corrected 24% of missed hazards while disrupting 6% of correct detections, but left 76% of errors uncorrected. Current mechanistic interpretability methods cannot reliably translate internal knowledge into corrected outputs, with implications for AI safety frameworks that assume interpretability enables effective error correction.

[163]  arXiv:2603.18354 [pdf, ps, other]
Title: Multi-material Direct Ink Writing and Embroidery for Stretchable Wearable Sensors
Comments: 6 pages, 8 figures, conference
Subjects: Robotics (cs.RO)

The development of wearable sensing systems for sports performance tracking, rehabilitation, and injury prevention has driven growing demand for smart garments that combine comfort, durability, and accurate motion detection. This paper presents a textile-compatible fabrication workflow that integrates multi-material direct ink writing with automated embroidery to create stretchable strain sensors directly embedded into garments. The process combines sequential multi-material printing of a silicone-carbon grease-silicone stack with automated embroidery that provides both mechanical fixation and electrical interfacing in a single step. The resulting hybrid sensor demonstrates stretchability up to 120% strain while maintaining electrical continuity, with approximately linear behaviour up to 60% strain (R^2 = 0.99), a gauge factor of 31.4, and hysteresis of 22.9%. Repeated loading-unloading tests over 80 cycles show baseline and peak drift of 0.135% and 0.236% per cycle, respectively, indicating moderate cycle-to-cycle stability. Mechanical testing further confirms that the silicone-fabric interface remains intact under large deformation, with failure occurring in the textile rather than at the stitched boundary. As a preliminary proof of concept, the sensor was integrated into wearable elbow and knee sleeves for joint angle monitoring, showing a clear correlation between normalised resistance change and bending angle. By addressing both mechanical fixation and electrical interfacing through embroidery-based integration, this approach provides a reproducible and scalable pathway for incorporating printed stretchable electronics into textile systems for motion capture and soft robotic applications.
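
As a quick reading aid for the reported figures, the sketch below applies the standard gauge-factor definition, GF = (ΔR/R0) / strain, and extrapolates the per-cycle drift over the test length. It assumes the reported gauge factor holds across the whole linear range and that drift accumulates linearly across cycles; neither assumption is stated in the abstract, and the function names are hypothetical.

```python
def relative_resistance_change(gauge_factor, strain):
    """Delta R / R0 implied by the gauge-factor definition for a given strain (as a fraction)."""
    return gauge_factor * strain


def accumulated_drift_percent(per_cycle_percent, cycles):
    """Total drift if the per-cycle drift simply adds up (a linear assumption)."""
    return per_cycle_percent * cycles


# Reported values: gauge factor 31.4, linear range up to 60% strain, 80 cycles.
print(relative_resistance_change(31.4, 0.60))   # relative resistance change at 60% strain
print(accumulated_drift_percent(0.135, 80))     # baseline drift over the test, in percent
print(accumulated_drift_percent(0.236, 80))     # peak drift over the test, in percent
```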

[164]  arXiv:2603.18355 [pdf, ps, other]
Title: Pushan: Trace-Free Deobfuscation of Virtualization-Obfuscated Binaries
Subjects: Cryptography and Security (cs.CR)

In the ever-evolving battle against malware, binary obfuscation techniques are a formidable barrier to effective analysis by both human security analysts and automated systems. In particular, virtualization or VM-based obfuscation is one of the strongest protection mechanisms that evade automated analysis. Despite widespread use of virtualization, existing automated deobfuscation techniques suffer from three major drawbacks. First, they only work on execution traces, which prevents them from recovering all logic in an obfuscated binary. Second, they depend on dynamic symbolic execution, which is expensive and does not scale in practice. Third, they cannot generate "well-formed" code, which prevents existing binary decompilers from generating human-friendly output.
This paper introduces PUSHAN, a novel and generic technique for deobfuscating virtualization-obfuscated binaries while overcoming the limitations of existing techniques. PUSHAN is trace-free and avoids path-constraint accumulation by using VPC-sensitive, constraint-free symbolic emulation to recover a complete CFG of the virtualized function. It is the first approach that also decompiles the protected code into high-quality C pseudocode to enable effective analysis. Crucially, PUSHAN circumvents reliance on path satisfiability, a known NP-hard problem that hampers scalability. We evaluate PUSHAN on more than 1,000 binaries, including targets protected by academic state of the art (Tigress) and commercial-strength obfuscators VMProtect and Themida. PUSHAN successfully deobfuscates these binaries, retrieves their complete CFGs, and decompiles them to C pseudocode. We further demonstrate applicability by analyzing a previously unanalyzed VMProtect-obfuscated malware sample from VirusTotal, where our decompiled output enables LLM-assisted code simplification, reuse, and program understanding.

[165]  arXiv:2603.18356 [pdf, ps, other]
Title: LGESynthNet: Controlled Scar Synthesis for Improved Scar Segmentation in Cardiac LGE-MRI Imaging
Comments: Accepted at MICCAI STACOM workshop 2025
Subjects: Artificial Intelligence (cs.AI)

Segmentation of enhancement in LGE cardiac MRI is critical for diagnosing various ischemic and non-ischemic cardiomyopathies. However, creating pixel-level annotations for these images is challenging and labor-intensive, leading to limited availability of annotated data. Generative models, particularly diffusion models, offer promise for synthetic data generation, yet many rely on large training datasets and often struggle with fine-grained conditioning control, especially for small or localized features. We introduce LGESynthNet, a latent diffusion-based framework for controllable enhancement synthesis, enabling explicit control over size, location, and transmural extent. Formulated as inpainting using a ControlNet-based architecture, the model integrates: (a) a reward model for conditioning-specific supervision, (b) a captioning module for anatomically descriptive text prompts, and (c) a biomedical text encoder. Trained on just 429 images (79 patients), it produces realistic, anatomically coherent samples. A quality control filter selects outputs with high conditioning fidelity, which, when used for training augmentation, improve downstream segmentation and detection performance by up to 6 and 20 points, respectively.

[166]  arXiv:2603.18358 [pdf, ps, other]
Title: From Noise to Signal: When Outliers Seed New Topics
Comments: To appear in the Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Outliers in dynamic topic modeling are typically treated as noise, yet we show that some can serve as early signals of emerging topics. We introduce a temporal taxonomy of news-document trajectories that defines how documents relate to topic formation over time. It distinguishes anticipatory outliers, which precede the topics they later join, from documents that either reinforce existing topics or remain isolated. By capturing these trajectories, the taxonomy links weak-signal detection with temporal topic modeling and clarifies how individual articles anticipate, initiate, or drift within evolving clusters. We implement it in a cumulative clustering setting using document embeddings from eleven state-of-the-art language models and evaluate it retrospectively on HydroNewsFr, a French news corpus on the hydrogen economy. Inter-model agreement reveals a small, high-consensus subset of anticipatory outliers, increasing confidence in these labels. Qualitative case studies further illustrate these trajectories through concrete topic developments.

[167]  arXiv:2603.18359 [pdf, ps, other]
Title: Towards Interpretable Framework for Neural Audio Codecs via Sparse Autoencoders: A Case Study on Accent Information
Subjects: Sound (cs.SD)

Neural Audio Codecs (NACs) are widely adopted in modern speech systems, yet how they encode linguistic and paralinguistic information remains unclear. Improving the interpretability of NAC representations is critical for understanding and deploying them in sensitive applications. Hence, we employ Sparse Autoencoders (SAEs) to decompose dense NAC representations into sparse, interpretable activations. In this work, we focus on a challenging paralinguistic attribute, accent, and propose a framework to quantify NAC interpretability. We evaluate four NAC models under 16 SAE configurations using a relative performance index. Our results show that DAC and SpeechTokenizer achieve the highest interpretability. We further reveal that acoustic-oriented NACs encode accent information primarily in activation magnitudes of sparse representations, whereas phonetic-oriented NACs rely more on activation positions, and that low-bitrate EnCodec variants show higher interpretability.

[168]  arXiv:2603.18360 [pdf, ps, other]
Title: LEO-based Carrier-Phase Positioning for 6G: Design Insights and Comparison with GNSS
Comments: 7 pages, 6 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The integration of non-terrestrial networks (NTN) into 5G new radio (NR) enables a new class of positioning capabilities based on cellular signals transmitted by Low-Earth Orbit (LEO) satellites. In this paper, we investigate joint delay-and-carrier-phase positioning for LEO-based NR-NTN systems and provide a convergence-centric comparison with Global Navigation Satellite Systems (GNSS). We show that the rapid orbital motion of LEO satellites induces strong temporal and geometric diversity across observation epochs, thereby improving the conditioning of multi-epoch carrier-phase models and enabling significantly faster integer-ambiguity convergence. To enable robust carrier-phase tracking under intermittent positioning reference signal (PRS) transmissions, we propose a dual-waveform design that combines wideband PRS for delay estimation with a continuous narrowband carrier for phase tracking. Using a realistic simulation framework incorporating LEO orbit dynamics, we demonstrate that LEO-based joint delay-and-carrier-phase positioning achieves cm-level accuracy with convergence times on the order of a few seconds, whereas GNSS remains limited to meter-level accuracy over comparable short observation windows. These results establish LEO-based cellular positioning as a strong complement and potential alternative to GNSS for high-accuracy positioning, navigation, and timing (PNT) services in future wireless networks.

[169]  arXiv:2603.18361 [pdf, ps, other]
Title: Synthetic Data Generation for Training Diversified Commonsense Reasoning Models
Comments: 21 pages, 7 figures
Subjects: Computation and Language (cs.CL)

Conversational agents are required to respond to their users not only with high-quality (i.e., commonsense-bearing) responses, but also with responses that consider multiple plausible alternative scenarios, reflecting diversity. Despite the growing need to train diverse commonsense generators, progress in this line of work has been significantly hindered by the lack of large-scale, high-quality, diverse commonsense training datasets. Due to high annotation costs, existing Generative Commonsense Reasoning (GCR) datasets are created using a small number of human annotators and cover only a narrow set of commonsense scenarios. To address this training resource gap, we propose a two-stage method to create CommonSyn, the first synthetic dataset for diversified GCR. Models fine-tuned on our synthetic data improve both generation diversity and quality compared with vanilla models and with models fine-tuned on human-crafted datasets, across Large Language Models (LLMs) of different sizes.

[170]  arXiv:2603.18363 [pdf, ps, other]
Title: PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current methods rely on heuristic intrinsic rewards, which often lack a well-defined theoretical optimization target and are prone to degenerative biases. In this work, we introduce PowerFlow, a principled framework that reformulates unsupervised fine-tuning as a distribution matching problem. By casting GFlowNet as an amortized variational sampler for unnormalized densities, we propose a length-aware Trajectory-Balance objective that explicitly neutralizes the structural length biases inherent in autoregressive generation. By targeting $\alpha$-power distributions, PowerFlow enables the directional elicitation of the dual nature of LLMs: sharpening the distribution ($\alpha > 1$) to intensify logical reasoning, or flattening it ($\alpha < 1$) to unlock expressive creativity. Extensive experiments demonstrate that PowerFlow consistently outperforms existing RLIF methods, matching or even exceeding supervised GRPO. Furthermore, by mitigating over-sharpening in aligned models, our approach achieves simultaneous gains in diversity and quality, shifting the Pareto frontier in creative tasks.
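To make the sharpening and flattening effect concrete, here is a minimal Python sketch. It is not PowerFlow itself: it assumes the $\alpha$-power target is the base distribution raised to the power $\alpha$ and renormalized, and applies that to a toy categorical distribution. The function name `power_distribution` and the example probabilities are hypothetical.

```python
def power_distribution(probs, alpha):
    """Return probs**alpha renormalized to sum to 1.

    Assumed form of an alpha-power target; the listing does not give the exact definition.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    powered = [p ** alpha for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


base = [0.5, 0.3, 0.2]                   # toy base distribution (illustrative values)
sharp = power_distribution(base, 2.0)    # alpha > 1 concentrates mass on the mode
flat = power_distribution(base, 0.5)     # alpha < 1 moves mass toward uniform
```

In this toy case the most likely outcome gains probability under `alpha = 2.0` and loses probability under `alpha = 0.5`, which mirrors the sharpening and flattening behaviour described in the abstract.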

[171]  arXiv:2603.18364 [pdf, ps, other]
Title: A Distributionally Robust Optimal Control Approach for Differentially Private Dynamical Systems
Comments: 6 pages, 3 figures, Submitted to IEEE L-CSS and CDC 2026
Subjects: Systems and Control (eess.SY)

In this paper, we develop a distributionally robust optimal control approach for differentially private dynamical systems, enabling a plant to securely outsource control computation to an untrusted remote server. We consider a plant that ensures differential privacy of its state trajectory by injecting calibrated noise into its output measurements. Unlike prior works, we assume that the server only has access to an ambiguity set consisting of admissible noise distributions, rather than the exact distribution. To account for this uncertainty, the server formulates a distributionally robust optimal control problem to minimize the worst-case expected cost over all admissible noise distributions. However, the formulated problem is computationally intractable due to the nonconvexity of the ambiguity set. To overcome this, we relax it into a convex Kullback--Leibler divergence ball, so that the reformulated problem admits a tractable closed-form solution.

[172]  arXiv:2603.18368 [pdf, ps, other]
Title: Decidability of Quantum Modal Logic
Authors: Kenji Tokuo
Journal-ref: Logic Journal of the IGPL, Volume 33, Issue 3, June 2025, jzaf010
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

The decidability of a logical system refers to the existence of an algorithm that can determine whether any given formula in that system is a theorem. In this paper, Harrop's lemma is used to prove the decidability of quantum modal logic.

[173]  arXiv:2603.18369 [pdf, ps, other]
Title: Convergence of entropy-stable continuous summation-by-parts discretizations of symmetric hyperbolic conservation laws
Comments: 26 pages
Subjects: Numerical Analysis (math.NA)

The Lax equivalence theorem guarantees convergence of stable and consistent discretizations for linear hyperbolic partial differential equations (PDEs). For nonlinear problems, however, stability and consistency alone do not generally guarantee convergence, even for smooth solutions, and existing convergence results typically rely either on projection-based error decompositions or on linearization arguments that do not directly extend to entropy-stable split-form discretizations. In particular, general convergence results for entropy-stable discretizations of hyperbolic PDEs are currently lacking, despite their widespread use. In this work, we prove convergence under smoothness assumptions on the exact solution and fluxes for entropy-stable split-form discretizations of scalar and symmetric hyperbolic systems with homogeneous flux functions within the continuous summation-by-parts (C-SBP) framework. The scalar inviscid Burgers equation is presented as a canonical example. The analysis is based on a stability-consistency argument that yields a nonlinear error evolution inequality whose solution provides an explicit upper bound on the numerical error. We show that, for sufficiently small mesh spacing, and for degree-$p$ C-SBP discretizations in $d$ spatial dimensions with $p>1+d/2$, this bound remains finite on any finite time interval and tends to zero as the mesh is refined, implying convergence despite the presence of local linear instabilities. The results help clarify the relationship between consistency, entropy stability, nonlinear error growth, and convergence for discretizations of nonlinear hyperbolic problems.

[174]  arXiv:2603.18370 [pdf, ps, other]
Title: Contact Status Recognition and Slip Detection with a Bio-inspired Tactile Hand
Comments: 7 pages, 9 figures
Subjects: Robotics (cs.RO)

Stable and reliable grasping is critical to robotic manipulation, especially for fragile and glazed objects, where the grasp force requires precise control: too large a force may damage the object, while too small a force leads to slip and fall-off. Although it is often assumed that the object to be manipulated is grasped firmly in advance, slip detection and timely prevention are necessary for a robot in unstructured and universal environments. In this work, we addressed this issue by utilizing multimodal tactile feedback from a five-fingered bio-inspired hand. Motivated by human hands, the tactile sensing elements were distributed and embedded in the soft skin of the robotic hand, forming 24 tactile channels in total. Unlike the threshold method widely employed in existing work, we first converted the slip detection problem into contact status recognition in combination with a binning technique, and then detected the slip onset time from the recognition results. After the 24-channel tactile signals passed through a discrete wavelet transform, 17 features were extracted from different time and frequency bands. With the optimal 120 features employed for status recognition, the test accuracy reached 96.39% across three sliding speeds and six kinds of materials. When applied to four unseen materials, a high accuracy of 91.95% was still achieved, which further validated the generalization of the proposed method. Finally, the performance of slip detection was verified based on the trained contact status recognition model.

[175]  arXiv:2603.18372 [pdf, ps, other]
Title: TENSURE: Fuzzing Sparse Tensor Compilers (Registered Report)
Subjects: Programming Languages (cs.PL); Software Engineering (cs.SE)

Sparse Tensor Compilers (STCs) have emerged as critical infrastructure for optimizing high-dimensional data analytics and machine learning workloads. The STCs must synthesize complex, irregular control flow for various compressed storage formats directly from high-level declarative specifications, thereby making them highly susceptible to subtle correctness defects. Existing testing frameworks, which rely on mutating computation graphs restricted to a standard vocabulary of operators, fail to exercise the arbitrary loop synthesis capabilities of these compilers. Furthermore, generic grammar-based fuzzers struggle to generate valid inputs due to the strict rules governing how indices are reused across multiple tensors.
In this paper, we present TENSURE, the first extensible black-box fuzzing framework specifically designed for the testing of STCs. TENSURE leverages Einstein Summation (Einsum) notation as a general input abstraction, enabling the generation of complex, unconventional tensor contractions that expose corner cases in the code-generation phases of STCs. We propose a novel constraint-based generation algorithm that guarantees 100% semantic validity of synthesized kernels, significantly outperforming the ~3.3% validity rate of baseline grammar fuzzers. To enable metamorphic testing without a trusted reference, we introduce a set of semantic-preserving mutation operators that exploit algebraic commutativity and heterogeneity in storage formats. Our evaluation on two state-of-the-art systems, TACO and Finch, reveals widespread fragility, particularly in TACO, where TENSURE exposed crashes or silent miscompilations in a majority of generated test cases. These findings underscore the critical need for specialized testing tools in the sparse compilation ecosystem.
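The metamorphic idea can be illustrated with a small Python sketch. It is not TENSURE and does not call TACO or Finch: it assumes a naive reference evaluator (`einsum`, hypothetical) over sparse tensors stored as coordinate dictionaries, plus one commutativity-based mutation (`swap_operands`, hypothetical) that reorders the operands of a product without changing its meaning.

```python
from itertools import product


def einsum(spec, *operands):
    """Naive einsum over sparse tensors stored as {index_tuple: value} dicts."""
    inputs, output = spec.split("->")
    terms = inputs.split(",")
    result = {}
    # Enumerate every combination of stored entries, one entry per operand.
    for combo in product(*(op.items() for op in operands)):
        binding, value, consistent = {}, 1, True
        for term, (index, entry) in zip(terms, combo):
            for name, i in zip(term, index):
                # A shared index name must take the same value in every operand.
                if binding.setdefault(name, i) != i:
                    consistent = False
                    break
            if not consistent:
                break
            value *= entry
        if consistent:
            key = tuple(binding[name] for name in output)
            result[key] = result.get(key, 0) + value
    return result


def swap_operands(spec, operands):
    """Semantics-preserving mutation: reverse operand order (multiplication commutes)."""
    inputs, output = spec.split("->")
    terms = inputs.split(",")
    return ",".join(reversed(terms)) + "->" + output, tuple(reversed(operands))


# Sparse 2x2 matrices in coordinate form; missing entries are zero.
A = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
B = {(0, 0): 4, (1, 0): 5, (1, 1): 6}

original = einsum("ij,jk->ik", A, B)
mutated_spec, mutated_ops = swap_operands("ij,jk->ik", (A, B))
mutated = einsum(mutated_spec, *mutated_ops)
```

A compiler under test would be expected to give matching results for the original and mutated kernels; a mismatch signals a possible miscompilation even though no trusted reference output is available.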

[176]  arXiv:2603.18373 [pdf, ps, other]
Title: To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs
Authors: Rui Hong, Shuxue Quan
Comments: 14 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

When VLMs answer correctly, do they genuinely rely on visual information or exploit language shortcuts? We introduce the Tri-Layer Diagnostic Framework, which disentangles hallucination sources via three metrics: Latent Anomaly Detection (perceptual awareness), Visual Necessity Score (visual dependency, measured via KL divergence), and Competition Score (conflict between visual grounding and instruction following). Using counterfactual interventions (blind, noise, and conflict images) across 7 VLMs and 7,000 model-sample pairs, our taxonomy reveals that 69.6% of samples exhibit Visual Sycophancy--models detect visual anomalies but hallucinate to satisfy user expectations--while zero samples show Robust Refusal, indicating alignment training has systematically suppressed truthful uncertainty acknowledgment. A scaling analysis (Qwen2.5-VL 7B to 72B) shows larger models reduce Language Shortcuts but amplify Visual Sycophancy, demonstrating scale alone cannot resolve the grounding problem. Diagnostic scores further enable a post-hoc selective prediction strategy achieving up to +9.5pp accuracy at 50% coverage with no additional training cost.
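As a rough illustration of a KL-based visual dependency measure, the Python sketch below compares a model's answer distribution with the real image against its distribution with a blind (blank) image. This is an assumption about the general shape of such a score, not the paper's definition: the abstract does not state which distributions are compared or in which direction, and `kl_divergence` and the probabilities are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same answers."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            # Clamp q to avoid division by zero when q assigns no mass to an answer.
            total += pi * math.log(pi / max(qi, eps))
    return total


with_image = [0.9, 0.1]   # hypothetical answer probabilities given the real image
blind = [0.5, 0.5]        # hypothetical answer probabilities given a blank image
score = kl_divergence(with_image, blind)
```

A score near zero would suggest the answer barely changes when the image is removed, which is the kind of language shortcut the abstract describes.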

[177]  arXiv:2603.18375 [pdf, ps, other]
Title: Relationship-Centered Care: Relatedness and Responsible Design for Human Connections in Mental-Health Care
Subjects: Human-Computer Interaction (cs.HC)

There has been growing research interest in the Digital Therapeutic Alliance (DTA) as AI-powered conversational agents are deployed in mental health care, particularly those delivering Cognitive Behaviour Therapy (CBT). We argue that the current design paradigm, which seeks to optimize the bond between a patient in need of support and an AI agent, contains a subtle but consequential trap: it risks producing an "appearance of connection" that unintentionally disrupts the fundamental human need for relatedness and potentially displaces the authentic human relationships upon which long-term psychological recovery depends. We propose a reorientation from designing artificial intelligence tools that simulate relationships to designing AI that scaffolds them. To operationalize our argument, we propose an interdisciplinary model that translates the Responsible AI Six Sphere Framework through the lens of Self-Determination Theory (SDT), with a specific focus on the basic psychological need for relatedness. The resulting model offers the technical and often clinical communities a set of relationship-centered design guidelines and relevant provocations for building AI systems that function not just as companions, but as a catalyst for strengthening a patient's entire relational ecology: their connections with therapists, caregivers, family, and peers. In doing so, we discuss a model for a more sustainable ecosystem of relationship-centered AI in mental health care.

[178]  arXiv:2603.18377 [pdf, ps, other]
Title: PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

Cloud-hosted large language models (LLMs) have become the de facto planners in agentic systems, coordinating tools and guiding execution over local environments. In many deployments, however, the environment being planned over is private, containing source code, files, credentials, and metadata that cannot be exposed to the cloud. Existing solutions address adjacent concerns, such as execution isolation, access control, or confidential inference, but they do not control what cloud planners observe during planning: within the permitted scope, \textit{raw environment state is still exposed}.
We introduce PlanTwin, a privacy-preserving architecture for cloud-assisted planning without exposing raw local context. The key idea is to project the real environment into a \textit{planning-oriented digital twin}: a schema-constrained and de-identified abstract graph that preserves planning-relevant structure while removing reconstructable details. The cloud planner operates solely on this sanitized twin through a bounded capability interface, while a local gatekeeper enforces safety policies and cumulative disclosure budgets. We further formalize the privacy-utility trade-off as a capability granularity problem, define architectural privacy goals using $(k,\delta)$-anonymity and $\epsilon$-unlinkability, and mitigate compositional leakage through multi-turn disclosure control.
We implement PlanTwin as middleware between local agents and cloud planners and evaluate it on 60 agentic tasks across ten domains with four cloud planners. PlanTwin achieves full sensitive-item non-disclosure (SND = 1.0) while maintaining planning quality close to full-context systems: three of four planners achieve PQS $> 0.79$, and the full pipeline incurs less than 2.2\% utility loss.

[179]  arXiv:2603.18380 [pdf, ps, other]
Title: Emergence of Phase Transitions in Complex Contagions
Comments: Under Review at KDD '26
Subjects: Social and Information Networks (cs.SI)

Understanding how complex behaviors, opinions, and innovations spread in online social networks remains a central challenge in computational social science. Existing models of complex contagion typically rely on stylized threshold mechanisms based solely on the number of infected neighbors and do not account for the interaction between individual preferences, local social influence, and global sentiment. Moreover, the emergence of virality through phase transitions and tipping points remains poorly characterized.
In this paper, we propose a unified propagation cascade model in which notions propagate as high-dimensional vectors in the same feature space as network nodes. Node activations are governed by a unified decision function that integrates propagation affinity, local influence, and global influence. The resulting dynamics induce a stochastic, Markovian cascade process that enables efficient MCMC sampling of propagation outcomes.
Using preferential attachment networks, we systematically study spread distributions, incubation dynamics, parameter sensitivity, and phase transition behavior. Our results show that balanced interactions between local reinforcement and global activation are critical for successful cascades and that early-stage growth patterns provide reliable signals of impending phase transitions.

[180]  arXiv:2603.18382 [pdf, ps, other]
Title: From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents
Subjects: Artificial Intelligence (cs.AI)

Anonymization is widely treated as a practical safeguard because re-identifying anonymous records was historically costly, requiring domain expertise, tailored algorithms, and manual corroboration. We study a growing privacy risk that may weaken this barrier: LLM-based agents can autonomously reconstruct real-world identities from scattered, individually non-identifying cues. By combining these sparse cues with public information, agents resolve identities without bespoke engineering. We formalize this threat as \emph{inference-driven linkage} and systematically evaluate it across three settings: classical linkage scenarios (Netflix and AOL), \emph{InferLink} (a controlled benchmark varying task intent, shared cues, and attacker knowledge), and modern text-rich artifacts. Without task-specific heuristics, agents successfully execute both fixed-pool matching and open-ended identity resolution. In the Netflix Prize setting, an agent reconstructs 79.2\% of identities, significantly outperforming a 56.0\% classical baseline. Furthermore, linkage emerges not only under explicit adversarial prompts but also as a byproduct of benign cross-source analysis in \emph{InferLink} and unstructured research narratives. These findings establish that identity inference -- not merely explicit information disclosure -- must be treated as a first-class privacy risk; evaluations must measure what identities an agent can infer.

[181]  arXiv:2603.18383 [pdf, ps, other]
Title: From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Datacenter operators and electrical utilities rely on power traces at different spatiotemporal scales. Operators use fine-grained traces for provisioning, facility management, and scheduling, while utilities use site-level load profiles for capacity and interconnection planning. Existing datacenter power models do not capture LLM inference workloads, in which GPUs shift rapidly among compute-intensive prefill, lower-power decode, and idle states, and facility demand depends on how these states evolve and synchronize across many devices. We show that LLM inference power can be represented compositionally through two components: workload-driven transitions among operating states and configuration-specific power distributions within those states. Building on this observation, we develop a trace-generation framework that learns from measured traces and synthesizes power profiles for new traffic conditions and serving configurations. These traces aggregate from GPU servers to rack-, row-, and facility-scale load profiles at the temporal granularity required by the study.
Across multiple LLMs, tensor-parallel settings, and GPU generations, our framework achieves median absolute energy error below 5% for most configurations while preserving temporal autocorrelation structure. The resulting traces support downstream analyses including oversubscription, power modulation, and utility-facing load characterization, enabling infrastructure evaluations that flat nameplate assumptions and static trace replay cannot support.
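The two-component view (state transitions plus per-state power distributions) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' framework: the transition probabilities and wattage ranges are placeholder values rather than measurements, the per-state distribution is taken to be uniform for simplicity, and `generate_trace` and `aggregate` are hypothetical names.

```python
import random

STATES = ("prefill", "decode", "idle")

# Placeholder transition probabilities between operating states (each row sums to 1).
TRANSITIONS = {
    "prefill": {"prefill": 0.2, "decode": 0.8, "idle": 0.0},
    "decode": {"prefill": 0.1, "decode": 0.8, "idle": 0.1},
    "idle": {"prefill": 0.5, "decode": 0.0, "idle": 0.5},
}

# Placeholder per-state power ranges in watts; a real model would fit these to measured traces.
POWER_RANGE_W = {
    "prefill": (600.0, 700.0),
    "decode": (350.0, 450.0),
    "idle": (80.0, 120.0),
}


def generate_trace(steps, rng, start="idle"):
    """Walk the state chain and sample one power value per time step."""
    state, trace = start, []
    for _ in range(steps):
        low, high = POWER_RANGE_W[state]
        trace.append((state, rng.uniform(low, high)))
        row = TRANSITIONS[state]
        state = rng.choices(STATES, weights=[row[s] for s in STATES])[0]
    return trace


def aggregate(traces):
    """Sum per-device power at each time step to obtain a server-level profile."""
    return [sum(power for _, power in step) for step in zip(*traces)]


rng = random.Random(0)
gpu_traces = [generate_trace(60, rng) for _ in range(8)]   # eight hypothetical GPUs
server_profile = aggregate(gpu_traces)
```

Summing device traces in this way is the simplest form of aggregation; rack-, row-, or facility-level profiles would repeat the same step over more devices.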

[182]  arXiv:2603.18385 [pdf, ps, other]
Title: Evolutionarily Stable Stackelberg Equilibrium
Authors: Sam Ganzfried
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH); Populations and Evolution (q-bio.PE)

We present a new solution concept called evolutionarily stable Stackelberg equilibrium (SESS). We study the Stackelberg evolutionary game setting in which there is a single leading player and a symmetric population of followers. The leader selects an optimal mixed strategy, anticipating that the follower population plays an evolutionarily stable strategy (ESS) in the induced subgame and may satisfy additional ecological conditions. We consider both leader-optimal and follower-optimal selection among ESSs, which arise as special cases of our framework. Prior approaches to Stackelberg evolutionary games either define the follower response via evolutionary dynamics or assume rational best-response behavior, without explicitly enforcing stability against invasion by mutations. We present algorithms for computing SESS in discrete and continuous games, and validate the latter empirically. Our model applies naturally to biological settings; for example, in cancer treatment the leader represents the physician and the followers correspond to competing cancer cell phenotypes.

[183]  arXiv:2603.18387 [pdf, ps, other]
Title: Mathematical Foundations of Deep Learning
Authors: Xiaojing Ye
Comments: Draft version. Final version is published in "Chapman & Hall/CRC Mathematics and Artificial Intelligence Series" by Taylor & Francis in 2026
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

This draft book offers a comprehensive and rigorous treatment of the mathematical principles underlying modern deep learning. The book spans core theoretical topics, from the approximation capabilities of deep neural networks, the theory and algorithms of optimal control and reinforcement learning integrated with deep learning techniques, to contemporary generative models that drive today's advances in artificial intelligence.

[184]  arXiv:2603.18388 [pdf, ps, other]
Title: Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed, GEPA degrades accuracy from 23.81% to 13.50%. We propose VISTA, a multi-agent APO framework that decouples hypothesis generation from prompt rewriting, enabling semantically labeled hypotheses, parallel minibatch verification, and an interpretable optimization trace. A two-layer explore-exploit mechanism combining random restart and epsilon-greedy sampling further escapes local optima. VISTA recovers accuracy to 87.57% on the same defective seed and consistently outperforms baselines across all conditions on GSM8K and AIME2025.
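The two generic ingredients named above, epsilon-greedy sampling and random restart, can be sketched in Python as below. This shows the textbook techniques only; how VISTA layers them is not specified in the abstract, and `pick_candidate`, `maybe_restart`, and the scores are hypothetical.

```python
import random


def pick_candidate(scores, epsilon, rng):
    """Epsilon-greedy: explore uniformly with probability epsilon, else take the best score."""
    names = list(scores)
    if rng.random() < epsilon:
        return rng.choice(names)
    return max(names, key=scores.get)


def maybe_restart(current, seed_prompt, restart_prob, rng):
    """Random restart: occasionally fall back to the seed prompt to leave a local optimum."""
    return seed_prompt if rng.random() < restart_prob else current


rng = random.Random(0)
scores = {"prompt_a": 0.42, "prompt_b": 0.61, "prompt_c": 0.55}  # hypothetical minibatch accuracies
greedy_pick = pick_candidate(scores, epsilon=0.0, rng=rng)
explored_pick = pick_candidate(scores, epsilon=1.0, rng=rng)
```

With `epsilon=0.0` the choice is purely greedy, and with `epsilon=1.0` it is purely exploratory; intermediate values trade off the two.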

[185]  arXiv:2603.18390 [pdf, ps, other]
Title: AutoScreen-FW: An LLM-based Framework for Resume Screening
Comments: 11 pages, 9 figures
Subjects: Computation and Language (cs.CL)

Corporate recruiters often need to screen many resumes within a limited time, which increases their burden and may cause suitable candidates to be overlooked. To address these challenges, prior work has explored LLM-based automated resume screening. However, some methods rely on commercial LLMs, which may pose data privacy risks. Moreover, since companies typically do not make resumes with evaluation results publicly available, it remains unclear which resume samples should be used during learning to improve an LLM's judgment performance. To address these problems, we propose AutoScreen-FW, an LLM-based framework for local, automated resume screening. AutoScreen-FW uses several methods to select a small set of representative resume samples. These samples are used for in-context learning together with a persona description and evaluation criteria, enabling open-source LLMs to act as a career advisor and evaluate unseen resumes. Experiments with multiple ground truths show that the open-source LLM judges consistently outperform GPT-5-nano. Under one ground-truth setting, they also surpass GPT-5-mini. Although they are slightly weaker than GPT-5-mini under other ground-truth settings, they run substantially faster per resume than commercial GPT models. These findings indicate the potential for deploying AutoScreen-FW locally in companies to support efficient screening while reducing recruiters' burden.

[186]  arXiv:2603.18391 [pdf, ps, other]
Title: Computational and Statistical Hardness of Calibration Distance
Authors: Mingda Qiao
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)

The distance from calibration, introduced by Błasiok, Gopalan, Hu, and Nakkiran (STOC 2023), has recently emerged as a central measure of miscalibration for probabilistic predictors. We study the fundamental problems of computing and estimating this quantity, given either an exact description of the data distribution or only sample access to it.
We give an efficient algorithm that exactly computes the calibration distance when the distribution has a uniform marginal and noiseless labels, which improves the $O(1/\sqrt{|\mathcal{X}|})$ additive approximation of Qiao and Zheng (COLT 2024) for this special case. Perhaps surprisingly, the problem becomes $\mathsf{NP}$-hard when either of the two assumptions is removed. We extend our algorithm to a polynomial-time approximation scheme for the general case.
For the estimation problem, we show that $\Theta(1/\epsilon^3)$ samples are sufficient and necessary for the empirical calibration distance to be upper bounded by the true distance plus $\epsilon$. In contrast, a polynomial dependence on the domain size -- incurred by the learning-based baseline -- is unavoidable for two-sided estimation.
Our positive results are based on simple sparsifications of both the distribution and the target predictor, which significantly reduce the search space for computation and lead to stronger concentration for the estimation problem. To prove the hardness results, we introduce new techniques for certifying lower bounds on the calibration distance -- a problem that is hard in general due to its $\textsf{co-NP}$-completeness.

[187]  arXiv:2603.18393 [pdf, ps, other]
Title: Where are the Hidden Gems? Applying Transformer Models for Design Discussion Detection
Subjects: Software Engineering (cs.SE)

Design decisions are at the core of software engineering and appear in Q&A forums, mailing lists, pull requests, issue trackers, and commit messages. Design discussions spanning a project's history provide valuable information for informed decision-making, such as refactoring and software modernization. Machine learning techniques have been used to detect design decisions in natural language discussions; however, their effectiveness is limited by the scarcity of labeled data and the high cost of annotation. Prior work adopted cross-domain strategies with traditional classifiers, training on one domain and testing on another. Despite the success of these strategies, transformer-based models, which often outperform traditional methods, remain largely unexplored in this setting. The goal of this work is to investigate the performance of transformer-based models (i.e., BERT, RoBERTa, XLNet, LaMini-Flan-T5-77M, and ChatGPT-4o-mini) for detecting design-related discussions. To this end, we conduct a conceptual replication of prior cross-domain studies while extending them with modern transformer architectures and addressing methodological issues in earlier work. The models were fine-tuned on Stack Overflow and evaluated on GitHub artifacts (i.e., pull requests, issues, and commits). BERT and RoBERTa show strong recall across domains, while XLNet achieves higher precision but lower recall. ChatGPT-4o-mini yields the highest recall and competitive overall performance, whereas LaMini-Flan-T5-77M provides a lightweight alternative with stronger precision but less balanced performance. We also evaluated similar-word injection for data augmentation, but unlike prior findings, it did not yield meaningful improvements. Overall, these results highlight both the opportunities and trade-offs of using modern language models for detecting design discussions.

[188]  arXiv:2603.18396 [pdf, ps, other]
Title: RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Bus holding control is challenging due to stochastic traffic and passenger demand. While deep reinforcement learning (DRL) shows promise, standard actor-critic algorithms suffer from Q-value instability in volatile environments. A key source of this instability is the conflation of two distinct uncertainties: aleatoric uncertainty (irreducible noise) and epistemic uncertainty (data insufficiency). Treating these as a single risk leads to value underestimation in noisy states, causing catastrophic policy collapse. We propose a robust ensemble soft actor-critic (RE-SAC) framework to explicitly disentangle these uncertainties. RE-SAC applies Integral Probability Metric (IPM)-based weight regularization to the critic network to hedge against aleatoric risk, providing a smooth analytical lower bound for the robust Bellman operator without expensive inner-loop perturbations. To address epistemic risk, a diversified Q-ensemble penalizes overconfident value estimates in sparsely covered regions. This dual mechanism prevents the ensemble variance from misidentifying noise as a data gap, a failure mode identified in our ablation study. Experiments in a realistic bidirectional bus corridor simulation demonstrate that RE-SAC achieves the highest cumulative reward (approx. -0.4e6) compared to vanilla SAC (-0.55e6). Mahalanobis rareness analysis confirms that RE-SAC reduces Oracle Q-value estimation error by up to 62% in rare out-of-distribution states (MAE of 1647 vs. 4343), demonstrating superior robustness under high traffic variability.
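The "up to 62%" figure can be checked directly from the two MAE values quoted in the abstract. The snippet below is only an arithmetic sanity check; the function name `relative_reduction` and the variable names are illustrative and do not come from the paper.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# MAE values quoted in the abstract for rare out-of-distribution states.
mae_comparison = 4343.0  # the larger value in "1647 vs. 4343"
mae_re_sac = 1647.0      # the value attributed to RE-SAC

reduction = relative_reduction(mae_comparison, mae_re_sac)
print(f"{reduction:.1%}")  # prints 62.1%
```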

[189]  arXiv:2603.18397 [pdf, ps, other]
Title: FlowMS: Flow Matching for De Novo Structure Elucidation from Mass Spectra
Authors: Jianan Nie, Peng Gao
Subjects: Machine Learning (cs.LG)

Mass spectrometry (MS) stands as a cornerstone analytical technique for molecular identification, yet de novo structure elucidation from spectra remains challenging due to the combinatorial complexity of chemical space and the inherent ambiguity of spectral fragmentation patterns. Recent deep learning approaches, including autoregressive sequence models, scaffold-based methods, and graph diffusion models, have made progress. However, diffusion-based generation for this task remains computationally demanding. Meanwhile, discrete flow matching, which has shown strong performance for graph generation, has not yet been explored for spectrum-conditioned structure elucidation. In this work, we introduce FlowMS, the first discrete flow matching framework for spectrum-conditioned de novo molecular generation. FlowMS generates molecular graphs through iterative refinement in probability space, enforcing chemical formula constraints while conditioning on spectral embeddings from a pretrained formula transformer encoder. Notably, it achieves state-of-the-art performance on 5 out of 6 metrics on the NPLIB1 benchmark: 9.15% top-1 accuracy (9.7% relative improvement over DiffMS) and 7.96 top-10 MCES (4.2% improvement over MS-BART). We also visualize the generated molecules, which further demonstrate that FlowMS produces structurally plausible candidates closely resembling ground truth structures. These results establish discrete flow matching as a promising paradigm for mass spectrometry-based structure elucidation in metabolomics and natural product discovery.

[190]  arXiv:2603.18398 [pdf, ps, other]
Title: Deconstructing Open-World Game Mission Design Formula: A Thematic Analysis Using an Action-Block Framework
Subjects: Human-Computer Interaction (cs.HC)

Open-world missions often rely on repeated formulas, yet designers lack systematic ways to examine pacing, variation, and experiential balance across large portfolios. We introduce the Mission Action Quality Vector (MAQV), a six-dimensional framework (covering combat, exploration, narrative, emotion, problem-solving, and uniqueness) paired with an action block grammar representing missions as gameplay sequences. Using about 2200 missions from 20 AAA titles, we apply LLM-assisted parsing to convert community walkthroughs into structured action sequences and score them with MAQV. An interactive dashboard enables designers to reveal underlying mission formulas. In a mixed-methods study with experienced players and designers, we validate the pipeline's fidelity and the tool's usability, and use thematic analysis to identify recurring design trade-offs, pacing grammars, and systematic differences by quest type and franchise evolution. Our work offers a reproducible analytical workflow, a data-driven visualization tool, and reflective insights to support more balanced, varied mission design at scale.

[191]  arXiv:2603.18400 [pdf, ps, other]
Title: Graph-of-Constraints Model Predictive Control for Reactive Multi-agent Task and Motion Planning
Comments: 8 main content pages, 4 main content figures, camera ready version submitted to IEEE International Conference on Robotics and Automation (ICRA 2026)
Subjects: Robotics (cs.RO)

Sequences of interdependent geometric constraints are central to many multi-agent Task and Motion Planning (TAMP) problems. However, existing methods for handling such constraint sequences struggle with partially ordered tasks and dynamic agent assignments. They typically assume static assignments and cannot adapt when disturbances alter task allocations. To overcome these limitations, we introduce Graph-of-Constraints Model Predictive Control (GoC-MPC), a generalized sequence-of-constraints framework integrated with MPC. GoC-MPC naturally supports partially ordered tasks, dynamic agent coordination, and disturbance recovery. By defining constraints over tracked 3D keypoints, our method robustly solves diverse multi-agent manipulation tasks, coordinating agents and adapting online from visual observations alone, without relying on training data or environment models. Experiments demonstrate that GoC-MPC achieves higher success rates, significantly faster TAMP computation, and shorter overall paths compared to recent baselines, establishing it as an efficient and robust solution for multi-agent manipulation under real-world disturbances. Our supplementary video and code can be found at https://sites.google.com/view/goc-mpc/home .

[192]  arXiv:2603.18401 [pdf, ps, other]
Title: Pixel-Accurate Epipolar Guided Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Keypoint matching can be slow and unreliable in challenging conditions such as repetitive textures or wide-baseline views. In such cases, known geometric relations (e.g., the fundamental matrix) can be used to restrict potential correspondences to a narrow epipolar envelope, thereby reducing the search space and improving robustness. These epipolar-guided matching approaches have proved effective in tasks such as SfM; however, most rely on coarse spatial binning, which introduces approximation errors, requires costly post-processing, and may miss valid correspondences. We address these limitations with an exact formulation that performs candidate selection directly in angular space. In our approach, each keypoint is assigned a tolerance circle which, when viewed from the epipole, defines an angular interval. Matching then becomes a 1D angular interval query, solved efficiently in logarithmic time with a segment tree. This guarantees pixel-level tolerance, supports per-keypoint control, and removes unnecessary descriptor comparisons. Extensive evaluation on ETH3D demonstrates noticeable speedups over existing approaches while recovering exact correspondence sets.
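The geometric step described above (a tolerance circle seen from the epipole subtends an angular interval) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `angular_interval` and `contains` are hypothetical, the epipole is assumed to be finite and to lie outside the tolerance circle, and a direct containment check stands in for the segment-tree query mentioned in the abstract.

```python
import math


def angular_interval(keypoint, epipole, radius):
    """Return (center, half_width) in radians for the angular interval that a
    circle of `radius` around `keypoint` subtends when viewed from `epipole`.

    Returns None when the epipole lies inside or on the circle, because the
    circle then covers every viewing direction.
    """
    dx = keypoint[0] - epipole[0]
    dy = keypoint[1] - epipole[1]
    dist = math.hypot(dx, dy)
    if dist <= radius:
        return None
    center = math.atan2(dy, dx)            # direction from epipole to keypoint
    half_width = math.asin(radius / dist)  # tangent lines bound the circle
    return center, half_width


def contains(interval, angle):
    """True when `angle` (radians) falls inside the interval, with wrap-around."""
    center, half_width = interval
    diff = (angle - center + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff) <= half_width


# A keypoint 10 px from the epipole with a 5 px tolerance subtends +/- 30 degrees.
interval = angular_interval((10.0, 0.0), (0.0, 0.0), 5.0)
print(math.degrees(interval[1]))
```

Because each keypoint carries its own radius, the interval width can be set per keypoint, which is the kind of per-keypoint control the abstract refers to.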

[193]  arXiv:2603.18402 [pdf, ps, other]
Title: Inst4DGS: Instance-Decomposed 4D Gaussian Splatting with Multi-Video Label Permutation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present Inst4DGS, an instance-decomposed 4D Gaussian Splatting (4DGS) approach with long-horizon per-Gaussian trajectories. While dynamic 4DGS has advanced rapidly, instance-decomposed 4DGS remains underexplored, largely due to the difficulty of associating inconsistent instance labels across independently segmented multi-view videos. We address this challenge by introducing per-video label-permutation latents that learn cross-video instance matches through a differentiable Sinkhorn layer, enabling direct multi-view supervision with consistent identity preservation. This explicit label alignment yields sharp decision boundaries and temporally stable identities without identity drift. To further improve efficiency, we propose instance-decomposed motion scaffolds that provide low-dimensional motion bases per object for long-horizon trajectory optimization. Experiments on Panoptic Studio and Neural3DV show that Inst4DGS jointly supports tracking and instance decomposition while achieving state-of-the-art rendering and segmentation quality. On the Panoptic Studio dataset, Inst4DGS improves PSNR from 26.10 to 28.36, and instance mIoU from 0.6310 to 0.9129, over the strongest baseline.
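For readers unfamiliar with Sinkhorn layers, the sketch below shows the underlying normalisation: alternating row and column scaling turns a matrix of matching scores into an approximately doubly stochastic (soft permutation) matrix. It is a plain-Python illustration only; the function name `sinkhorn`, the `temperature` parameter, and the toy score matrix are assumptions, and a differentiable version inside an autodiff framework (as the abstract implies) would differ in detail.

```python
import math


def sinkhorn(scores, n_iters=20, temperature=1.0):
    """Alternate row and column normalisation on exp(scores / temperature).

    `scores` is a square list of lists; the result is approximately doubly
    stochastic, i.e. a soft assignment between row labels and column labels.
    """
    n = len(scores)
    top = max(max(row) for row in scores)  # subtract the max for stability
    p = [[math.exp((s - top) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        for row in p:  # make each row sum to 1
            total = sum(row)
            for j in range(n):
                row[j] /= total
        for j in range(n):  # make each column sum to 1
            total = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] /= total
    return p


# Toy scores: label 0 in one video matches 0 in another, 1 matches 2, 2 matches 1.
scores = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]
soft_perm = sinkhorn(scores)
matches = [max(range(3), key=lambda j: row[j]) for row in soft_perm]
print(matches)  # prints [0, 2, 1]
```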

[194]  arXiv:2603.18403 [pdf, ps, other]
Title: Wavelet-based grid adaptation with consistent treatment of high-order sharp immersed geometries
Comments: 24 pages, 13 figures
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

Wavelet-based grid adaptation methods use multiresolution analysis for error estimation, offering a mathematically rigorous approach to adaptive grid refinement when solving Partial Differential Equations (PDEs). However, applying these methods to PDE discretizations with immersed geometries is challenging, as standard interpolating wavelet transforms lose consistency near non-grid-aligned boundary intersections. To address this, we propose a high-order interpolating wavelet transform adaptation strategy compatible with sharp immersed boundary and interface discretizations. The approach performs consistent high-order wavelet transforms on narrow intervals using a 1D polynomial extrapolation technique. To maintain high order, the technique incorporates boundary values and derivatives, which are evaluated from multivariate interpolating polynomials similar to those used in high-order immersed finite difference discretizations. Consequently, the proposed approach maintains the wavelet order on any arbitrary smooth multidimensional domain, including near concave geometry sections. This approach enables grid adaptation in complex domains while robustly bounding the numerical error via a manually set refinement threshold. The algorithm's performance is validated on both static and dynamic problems, including the Navier-Stokes equations with moving boundaries and temporally adapting grid resolutions. The results demonstrate that the proposed method enables effective grid adaptation, establishing a robust, predictable relationship between a user-defined refinement threshold and the overall solution error, even for problems with complex, moving boundaries.

[195]  arXiv:2603.18407 [pdf, ps, other]
Title: Interleaved Information Structures in Dynamic Games: A General Framework with Application to the Linear-Quadratic Case
Comments: 6 pages, 3 figures
Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

A fundamental problem in noncooperative dynamic game theory is the computation of Nash equilibria under different information structures, which specify the information available to each agent during decision-making. Prior work has extensively studied equilibrium solutions for two canonical information structures: feedback, where agents observe the current state at each time, and open-loop, where agents only observe the initial state. However, these paradigms are often too restrictive to capture realistic settings exhibiting interleaved information structures, in which each agent observes only a subset of other agents at every timestep. To date, there is no systematic framework for modeling and solving dynamic games under arbitrary interleaved information structures. To this end, we make two main contributions. First, we introduce a method to model deterministic dynamic games with arbitrary interleaved information structures as Mathematical Program Networks (MPNs), where the network structure encodes the informational dependencies between agents. Second, for linear-quadratic (LQ) dynamic games, we leverage the MPN formulation to develop a systematic procedure for deriving Riccati-like equations that characterize Nash equilibria. Finally, we illustrate our approach through an example involving three agents exhibiting a cyclic information structure.

[196]  arXiv:2603.18408 [pdf, ps, other]
Title: Efficient and Versatile Quadrupedal Skating: Optimal Co-design via Reinforcement Learning and Bayesian Optimization
Subjects: Robotics (cs.RO)

In this paper, we present a hardware-control co-design approach that enables efficient and versatile roller skating on quadrupedal robots equipped with passive wheels. Passive-wheel skating reduces leg inertia and improves energy efficiency, particularly at high speeds. However, the absence of direct wheel actuation tightly couples mechanical design and control. To unlock the full potential of this modality, we formulate a bilevel optimization framework: an upper-level Bayesian Optimization searches the mechanical design space, while a lower-level Reinforcement Learning trains a motor control policy for each candidate design. The resulting design-policy pairs not only outperform human-engineered baselines, but also exhibit versatile behaviors such as hockey stop (rapid braking by turning sideways to maximize friction) and self-aligning motion (automatic reorientation to improve energy efficiency in the direction of travel), offering the first system-level study of dynamic skating motion on quadrupedal robots.

[197]  arXiv:2603.18409 [pdf, ps, other]
Title: TopoChunker: Topology-Aware Agentic Document Chunking Framework
Authors: Xiaoyu Liu
Subjects: Computation and Language (cs.CL)

Current document chunking methods for Retrieval-Augmented Generation (RAG) typically linearize text. This forced linearization strips away intrinsic topological hierarchies, creating "semantic fragmentation" that degrades downstream retrieval quality. In this paper, we propose TopoChunker, an agentic framework that maps heterogeneous documents onto a Structured Intermediate Representation (SIR) to explicitly preserve cross-segment dependencies. To balance structural fidelity with computational cost, TopoChunker employs a dual-agent architecture. An Inspector Agent dynamically routes documents through cost-optimized extraction paths, while a Refiner Agent performs capacity auditing and topological context disambiguation to reconstruct hierarchical lineage. Evaluated on unstructured narratives (GutenQA) and complex reports (GovReport), TopoChunker demonstrates state-of-the-art performance. It outperforms the strongest LLM-based baseline by 8.0% in absolute generation accuracy and achieves an 83.26% Recall@3, while simultaneously reducing token overhead by 23.5%, offering a scalable approach for structure-aware RAG.

[198]  arXiv:2603.18411 [pdf, ps, other]
Title: TARo: Token-level Adaptive Routing for LLM Test-time Alignment
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent test-time alignment methods offer a lightweight alternative, but have been explored mainly for preference alignment rather than reasoning. To bridge this gap, we propose Token-level Adaptive Routing (TARo), which steers frozen LLMs toward structured reasoning entirely at inference time. Specifically, we first train reward models on step-wise mathematical traces to capture fine-grained logical consistency signals, then introduce a learnable token-level router that automatically controls the guidance of the reward model to the base model. Extensive experiments show that TARo significantly improves reasoning performance by up to +22.4% over the base model and +8.4% over existing token-level test-time alignment methods, while also boosting out-of-distribution clinical reasoning (MedXpertQA) and instruction following (AlpacaEval). Furthermore, TARo generalizes from small to large backbones without retraining, extending test-time alignment from preference optimization to robust, cross-domain reasoning.

[199]  arXiv:2603.18415 [pdf, ps, other]
Title: The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation
Comments: 32 pages, 6 tables, empirical research on corporate finance & digital economy, using Chinese A-share listed companies data (2006-2024), incorporating agent-based modelling simulations, suitable for finance/innovation economics journals
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

At a time when the phenomenon of 'AI washing' is quietly spreading, an increasing number of enterprises are using the label of artificial intelligence merely as a cosmetic embellishment in their annual reports, rather than as a genuine engine driving transformation. A test regarding the essence of innovation and the authenticity of information disclosure has arrived. This paper employs large language models to conduct semantic analysis on the text of annual reports from Chinese A-share listed companies from 2006 to 2024, systematically examining the impact of corporate AI washing behaviour on their green innovation. The research reveals that corporate AI washing exerts a significant crowding-out effect on green innovation, with this negative relationship transmitted through dual channels in both product and capital markets. Furthermore, this crowding-out effect exhibits heterogeneity across firms and industries, with private enterprises, small and medium-sized enterprises (SMEs), and firms in highly competitive sectors suffering more severe negative impacts from AI washing. Simulation results indicate that a combination of policy tools can effectively improve market equilibrium. Based on this, this paper proposes that the government should design targeted support tools to 'enhance market returns and alleviate financing constraints', adopt a differentiated regulatory strategy, and establish a disclosure mechanism combining 'professional identification and reputational sanctions' to curb such peer AI washing behaviour.

[200]  arXiv:2603.18417 [pdf, ps, other]
Title: Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
Comments: Accepted to the International Conference on Machine Intelligence Theory and Applications (MiTA 2026)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Sparse attention mechanisms promise to break the quadratic bottleneck of long-context transformers, yet production adoption remains limited by a critical usability gap: optimal hyperparameters vary substantially across layers and models, and current methods (e.g., SpargeAttn) rely on manual grid search to identify them. We propose AFBS-BO (Adaptive Fidelity Binary Search with Bayesian Optimization), a fully automated framework that discovers optimal layer- and head-specific hyperparameters without human intervention. Our hybrid algorithm combines Bayesian Optimization for global exploration with binary search for local refinement, leveraging multi-fidelity evaluation across sequence lengths to reduce tuning cost. On Llama-2-7B, AFBS-BO accelerates hyperparameter discovery by 3.4x with 8.8x fewer evaluations than grid search, and identifies high-sparsity configurations that outperform existing sparse attention baselines while closely matching dense attention quality. By transforming sparse attention from a manually tuned heuristic into a self-optimizing primitive, AFBS-BO enables plug-and-play acceleration across diverse transformer architectures and domains.

[201]  arXiv:2603.18418 [pdf, ps, other]
Title: Mind the Rarities: Can Rare Skin Diseases Be Reliably Diagnosed via Diagnostic Reasoning?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Large vision-language models (LVLMs) demonstrate strong performance in dermatology; however, evaluating diagnostic reasoning for rare conditions remains largely unexplored. Existing benchmarks focus on common diseases and assess only final accuracy, overlooking the clinical reasoning process, which is critical for complex cases. We address this gap by constructing DermCase, a long-context benchmark derived from peer-reviewed case reports. Our dataset contains 26,030 multi-modal image-text pairs and 6,354 clinically challenging cases, each annotated with comprehensive clinical information and step-by-step reasoning chains. To enable reliable evaluation, we establish DermLIP-based similarity metrics that achieve stronger alignment with dermatologists for assessing differential diagnosis quality. Benchmarking 22 leading LVLMs exposes significant deficiencies across diagnosis accuracy, differential diagnosis, and clinical reasoning. Fine-tuning experiments demonstrate that instruction tuning substantially improves performance while Direct Preference Optimization (DPO) yields minimal gains. Systematic error analysis further reveals critical limitations in current models' reasoning capabilities.

[202]  arXiv:2603.18420 [pdf, ps, other]
Title: From Topic to Transition Structure: Unsupervised Concept Discovery at Corpus Scale via Predictive Associative Memory
Authors: Jason Dury
Comments: 22 pages, 5 figures. Code and demo: this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Embedding models group text by semantic content: what text is about. We show that temporal co-occurrence within texts discovers a different kind of structure: recurrent transition-structure concepts, or what text does. We train a 29.4M-parameter contrastive model on 373 million co-occurrence pairs from 9,766 Project Gutenberg texts (24.96 million passages), mapping pre-trained embeddings into an association space where passages with similar transition structure cluster together. Under capacity constraint (42.75% accuracy), the model must compress across recurring patterns rather than memorise individual co-occurrences. Clustering at six granularities (k=50 to k=2,000) produces a multi-resolution concept map, from broad modes like "direct confrontation" and "lyrical meditation" to precise registers and scene templates like "sailor dialect" and "courtroom cross-examination." At k=100, clusters average 4,508 books each (of 9,766), confirming corpus-wide patterns. Direct comparison with embedding-similarity clustering shows that raw embeddings group by topic while association-space clusters group by function, register, and literary tradition. Unseen novels are assigned to existing clusters without retraining; the association model concentrates each novel into a selective subset of coherent clusters, while raw embedding assignment saturates nearly all clusters. Validation controls address positional, length, and book-concentration confounds. The method extends Predictive Associative Memory (PAM, arXiv:2602.11322) from episodic recall to concept formation: where PAM recalls specific associations, multi-epoch contrastive training under compression extracts structural patterns that transfer to unseen texts, the same framework producing qualitatively different behaviour in a different regime.

[203]  arXiv:2603.18421 [pdf, ps, other]
Title: The Impact of Corporate AI Washing on Farmers' Digital Financial Behavior Response -- An Analysis from the Perspective of Digital Financial Exclusion
Comments: 35 pages, 4 tables, empirical research on rural digital finance & fintech, using CHFS2019 data (6,800 rural households) & corporate AI investment data, incorporating Logit/Ologit/GSEM models, suitable for agricultural economics/financial inclusion journals
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

In the context of the rapid development of digital finance, some financial technology companies exhibit the phenomenon of "AI washing," where they overstate their AI capabilities while underinvesting in actual AI resources. This paper constructs a corporate-level AI washing index based on CHFS2019 data and AI investment data from 15-20 financial technology companies, analyzing and testing its impact on farmers' digital financial behavior response. The study finds that AI washing significantly suppresses farmers' digital financial behavior; the higher the degree of AI washing, the lower the response level of farmers' digital financial behavior. Moreover, AI washing indirectly inhibits farmers' behavioral responses by exacerbating knowledge exclusion and risk exclusion. Social capital can positively moderate the negative impact of AI washing; among farmer groups with high social capital, the suppressive effect of AI washing on digital financial behavior is significantly weaker than that among groups with low social capital. In response, this paper suggests that regulatory authorities establish a strict information disclosure system for AI technology, conduct differentiated digital financial education to enhance the identification capabilities of vulnerable groups, promote digital financial mutual aid groups to leverage the protective effects of social capital, improve the consumer protection mechanism for farmers in digital finance, and set up pilot "Digital Inclusive Finance Demonstration Counties," etc.

[204]  arXiv:2603.18422 [pdf, ps, other]
Title: Topological Obstructions to the Existence of Control Barrier Functions
Comments: 6 pages, 3 figures
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

In 1983, Brockett developed a topological necessary condition for the existence of continuous, asymptotically stabilizing control laws. Building upon recent work on necessary conditions for set stabilization, we develop Brockett-like necessary conditions for the existence of control barrier functions (CBFs). By leveraging the unique geometry of CBF safe sets, we provide simple and self-contained derivations of necessary conditions for the existence of CBFs and their safe, continuous controllers. We demonstrate the application of these conditions to instructive examples and kinematic nonholonomic systems, and discuss their relationship to Brockett's necessary condition.
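For context, Brockett's condition in the form it is usually quoted can be written as below, together with the standard kinematic unicycle example of a nonholonomic system that fails it. This is background recalled from the general literature, not taken from the paper; the notation ($f$, $k$, $x_0$, $B_\delta$, and the unicycle state $(p_x, p_y, \theta)$) is chosen here for illustration, and the CBF analogues developed in the paper are not reproduced.

```latex
% System: \dot{x} = f(x, u), with f(x_0, 0) = 0 and f continuously differentiable.
% Brockett's necessary condition: if a continuous feedback u = k(x) renders x_0
% asymptotically stable, then f maps every neighborhood of (x_0, 0) onto a set
% that contains a neighborhood of the origin:
\[
  \forall\, N \ni (x_0, 0) \text{ open}, \quad
  \exists\, \delta > 0 : \quad B_\delta(0) \subseteq f(N).
\]
% Kinematic unicycle with state (p_x, p_y, \theta) and inputs (u_1, u_2):
\[
  \dot{p}_x = u_1 \cos\theta, \qquad
  \dot{p}_y = u_1 \sin\theta, \qquad
  \dot{\theta} = u_2 .
\]
% Near \theta = 0 the velocity (0, \varepsilon, 0) with \varepsilon \neq 0 is not
% attainable: \dot{p}_x = 0 forces u_1 = 0 (since \cos\theta \neq 0), which in
% turn forces \dot{p}_y = 0. The image of f therefore contains no neighborhood
% of the origin, and the condition fails.
```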

[205]  arXiv:2603.18423 [pdf, ps, other]
Title: SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning
Comments: ICLR 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

How can we accurately quantize a pre-trained model without any data? Quantization algorithms are widely used for deploying neural networks on resource-constrained edge devices. Zero-shot Quantization (ZSQ) addresses the crucial and practical scenario where training data are inaccessible for privacy or security reasons. However, three significant challenges hinder the performance of existing ZSQ methods: 1) noise in the synthetic dataset, 2) predictions based on off-target patterns, and 3) misguidance by erroneous hard labels. In this paper, we propose SynQ (Synthesis-aware Fine-tuning for Zero-shot Quantization), a carefully designed ZSQ framework to overcome the limitations of existing methods. SynQ minimizes the noise from the generated samples by exploiting a low-pass filter. Then, SynQ trains the quantized model to improve accuracy by aligning its class activation map with the pre-trained model. Furthermore, SynQ mitigates misguidance from the pre-trained model's error by leveraging only soft labels for difficult samples. Extensive experiments show that SynQ achieves state-of-the-art accuracy over existing ZSQ methods.

[206]  arXiv:2603.18424 [pdf, ps, other]
Title: Deceiving Flexibility: A Stealthy False Data Injection Model in Vehicle-to-Grid Coordination
Subjects: Systems and Control (eess.SY); Computational Engineering, Finance, and Science (cs.CE)

Electric vehicles (EVs) in Vehicle-to-Grid (V2G) systems act as distributed energy resources that support grid stability. Centralized coordination such as the extended State Space Model (eSSM) enhances scalability and estimation efficiency but may introduce new cyber-attack surfaces. This paper presents a stealthy False Data Injection Attack (FDIA) targeting eSSM-based V2G coordination. Unlike prior studies that assume attackers can disrupt physical charging or discharging processes, we consider an adversary who compromises only a subset of EVs and whose influence is limited to manipulating reported State of Charge (SoC) and power measurements. By doing so, the attacker can deceive the operator's perception of fleet flexibility while remaining consistent with model-based expectations, thus evading anomaly detection. Numerical simulations show that the proposed stealthy FDIA can deteriorate grid frequency stability even without direct access to control infrastructure. These findings highlight the need for enhanced detection and mitigation mechanisms tailored to aggregated V2G frameworks.

[207]  arXiv:2603.18425 [pdf, ps, other]
Title: Multimodal Task Interference: A Benchmark and Analysis of History-Target Mismatch in Multimodal LLMs
Subjects: Computation and Language (cs.CL)

Task interference, the performance degradation caused by task switches within a single conversation, has been studied exclusively in text-only settings despite the growing prevalence of multimodal dialogue systems. We introduce a benchmark for evaluating this phenomenon in multimodal LLMs, covering six tasks across text and vision with systematic variation of the history-target mismatch along three axes: modality mismatch, reasoning mismatch, and answer format mismatch. Experiments on both open-weights and proprietary models reveal that task interference is highly directional: switching from text-only to image-based targets causes severe performance drops, while the reverse transition yields minimal degradation. Interference is further amplified when mismatches co-occur across multiple dimensions, and is driven most strongly by modality differences, followed by answer format, while reasoning requirement shifts cause minimal degradation.

[208]  arXiv:2603.18426 [pdf, ps, other]
Title: Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
Comments: ICLR 2026
Subjects: Artificial Intelligence (cs.AI)

What happens when multiple compression methods are combined: does the order in which they are applied matter? Joint model compression has emerged as a powerful strategy to achieve higher efficiency by combining multiple methods such as pruning and quantization. A central but underexplored factor in joint model compression is the compression order, or the sequence of different methods within the compression pipeline. Most prior studies have sidestepped the issue by assuming orthogonality between techniques, while a few have examined it only in highly constrained cases. Consequently, the broader role of compression order in shaping model performance remains poorly understood. In this paper, we address the overlooked problem of compression order and provide both theoretical and empirical analysis. We formulate the problem of optimizing the compression order and introduce the Progressive Intensity Hypothesis, which states that weaker perturbations should precede stronger ones. We provide theoretical guarantees showing that the relative benefit of one order increases with the underlying performance gap. Extensive experiments on both language and vision models validate the hypothesis, and further show its generality to broader setups such as multi-stage compression and mixed-precision quantization.

[209]  arXiv:2603.18427 [pdf, ps, other]
Title: R&D: Balancing Reliability and Diversity in Synthetic Data Augmentation for Semantic Segmentation
Journal-ref: Computational Collective Intelligence, ICCCI 2025, Lecture Notes in Computer Science 16139 (2026) 433-448
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Collecting and annotating datasets for pixel-level semantic segmentation tasks is highly labor-intensive. Data augmentation provides a viable solution by enhancing model generalization without additional real-world data collection. Traditional augmentation techniques, such as translation, scaling, and color transformations, create geometric variations but fail to generate new structures. While generative models have been employed to extend semantic information of datasets, they often struggle to maintain consistency between the original and generated images, particularly for pixel-level tasks. In this work, we propose a novel synthetic data augmentation pipeline that integrates controllable diffusion models. Our approach balances the diversity and reliability of the generated data, effectively bridging the gap between synthetic and real data. We utilize class-aware prompting and visual prior blending to improve image quality further, ensuring precise alignment with segmentation labels. By evaluating on benchmark datasets such as PASCAL VOC and BDD100K, we demonstrate that our method significantly enhances semantic segmentation performance, especially in data-scarce scenarios, while improving model robustness in real-world applications. Our code is available at https://github.com/chequanghuy/Enhanced-Generative-Data-Augmentation-for-Semantic-Segmentation-via-Stronger-Guidance.

[210]  arXiv:2603.18428 [pdf, ps, other]
Title: Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Decoding strategies largely determine the quality of Large Language Model (LLM) outputs, yet widely used heuristics such as greedy or fixed temperature/top-p decoding are static and often task-agnostic, leading to suboptimal or inconsistent generation quality across domains that demand stylistic or structural flexibility. We introduce a reinforcement learning-based decoder sampler that treats decoding as sequential decision-making and learns a lightweight policy to adjust sampling parameters at test-time while keeping LLM weights frozen. We evaluate on summarization datasets including BookSum, arXiv, and WikiHow using Granite-3.3-2B and Qwen-2.5-0.5B. Our policy sampler consistently outperforms greedy and static baselines, achieving relative gains of up to +88% (BookSum, Granite) and +79% (WikiHow, Qwen). Reward ablations show that overlap-only objectives underperform compared to composite rewards, while structured shaping terms (length, coverage, repetition, completeness) enable stable and sustained improvements. These findings highlight reinforcement learning as a practical mechanism for test-time adaptation in decoding, enabling domain-aware and user-controllable generation without retraining large models.

[211]  arXiv:2603.18429 [pdf, ps, other]
Title: AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Long-horizon GUI agents are a key step toward real-world deployment, yet effective interaction memory under prevailing paradigms remains under-explored. Replaying full interaction sequences is redundant and amplifies noise, while summaries often erase dependency-critical information and traceability. We present AndroTMem, a diagnostic framework for anchored memory in long-horizon Android GUI agents. Its core benchmark, AndroTMem-Bench, comprises 1,069 tasks with 34,473 interaction steps (avg. 32.1 per task, max. 65). We evaluate agents with TCR (Task Complete Rate), focusing on tasks whose completion requires carrying forward critical intermediate state; AndroTMem-Bench is designed to enforce strong step-to-step causal dependencies, making sparse yet essential intermediate states decisive for downstream actions and centering interaction memory in evaluation. Across open- and closed-source GUI agents, we observe a consistent pattern: as interaction sequences grow longer, performance drops are driven mainly by within-task memory failures, not isolated perception errors or local action mistakes. Guided by this diagnosis, we propose Anchored State Memory (ASM), which represents interaction sequences as a compact set of causally linked intermediate-state anchors to enable subgoal-targeted retrieval and attribution-aware decision making. Across multiple settings and 12 evaluated GUI agents, ASM consistently outperforms full-sequence replay and summary-based baselines, improving TCR by 5%-30.16% and AMS by 4.93%-24.66%, indicating that anchored, structured memory effectively mitigates the interaction-memory bottleneck in long-horizon GUI tasks. The code, benchmark, and related resources are publicly available at https://github.com/CVC2233/AndroTMem.

[212]  arXiv:2603.18431 [pdf, ps, other]
Title: Towards Noise-Resilient Quantum Multi-Armed and Stochastic Linear Bandits
Subjects: Machine Learning (cs.LG)

Quantum multi-armed bandits (MAB) and stochastic linear bandits (SLB) have recently attracted significant attention, as their quantum counterparts can achieve quadratic speedups over classical MAB and SLB. However, most existing quantum MAB algorithms assume ideal quantum Monte Carlo (QMC) procedures on noise-free circuits, overlooking the impact of noise in current noisy intermediate-scale quantum (NISQ) devices. In this paper, we study a noise-robust QMC algorithm that improves estimation accuracy when querying quantum reward oracles. Building on this estimator, we propose noise-robust QMAB and QSLB algorithms that enhance performance in noisy environments while preserving the advantage over classical methods. Experiments show that our noise-robust approach improves QMAB estimation accuracy and reduces regret under several quantum noise models.

[213]  arXiv:2603.18432 [pdf, ps, other]
Title: MLOW: Interpretable Low-Rank Frequency Magnitude Decomposition of Multiple Effects for Time Series Forecasting
Subjects: Machine Learning (cs.LG)

Separating multiple effects in time series is fundamental yet challenging for time-series forecasting (TSF). However, existing TSF models cannot effectively learn interpretable multi-effect decomposition with their smoothing-based temporal techniques. Here, a new interpretable frequency-based decomposition pipeline, MLOW, builds on the following insight: a time series can be represented as a magnitude spectrum multiplied by the corresponding phase-aware basis functions, and the magnitude spectrum distribution of a time series always exhibits observable patterns for different effects. MLOW learns a low-rank representation of the magnitude spectrum to capture dominant trending and seasonal effects. We explore low-rank methods, including PCA, NMF, and Semi-NMF, and find that none can simultaneously achieve interpretable, efficient and generalizable decomposition. Thus, we propose hyperplane-nonnegative matrix factorization (Hyperplane-NMF). Further, to address the frequency (spectral) leakage restricting high-quality low-rank decomposition, MLOW enables a flexible selection of input horizons and frequency levels via a mathematical mechanism. Visual analysis demonstrates that MLOW enables interpretable and hierarchical multiple-effect decomposition that is robust to noise. It can also be used as a plug-and-play module in existing TSF backbones, yielding remarkable performance improvement with minimal architectural modifications.

[214]  arXiv:2603.18433 [pdf, ps, other]
Title: Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems
Comments: 4 Figures, 3 Tables, Submitted to the International Conference on Power, Electronics, Communications, Computing, and Intelligent Infrastructure 2026
Subjects: Cryptography and Security (cs.CR)

Large language models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing defenses often treat prompts as flat strings and rely on ad hoc filtering or static jailbreak detection. This paper proposes Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models each request as a structured composition of system, developer, user, and retrieved-document segments. PCFI applies a three-stage middleware pipeline (lexical heuristics, role-switch detection, and hierarchical policy enforcement) before forwarding requests to the backend LLM. We implement PCFI as a FastAPI-based gateway for deployed LLM APIs and evaluate it on a custom benchmark of synthetic and semi-realistic prompt-injection workloads. On the evaluated benchmark suite, PCFI intercepts all attack-labeled requests, maintains a 0% False Positive Rate, and introduces a median processing overhead of only 0.04 ms. These results suggest that provenance- and priority-aware prompt enforcement is a practical and lightweight defense for deployed LLM systems.

[215]  arXiv:2603.18435 [pdf, ps, other]
Title: Beyond Ray-Casting: Evaluating Controller, Free-Hand, and Virtual-Touch Modalities for Immersive Text Entry
Comments: 7 figures, International Conference on Power, Electronics, Communications, Computing, and Intelligent Infrastructure 2026
Subjects: Human-Computer Interaction (cs.HC)

Efficient text entry remains a primary bottleneck preventing Virtual Reality (VR) from evolving into a viable productivity platform. To address this, we conducted an empirical comparison of six physical input systems across three interaction styles (Controller Driven, Free Hand, and Virtual Touch), evaluating both discrete tap typing and continuous gesture typing (swiping), alongside a speech-to-text (Voice) condition as a non-physical reference modality. Results from 21 participants show that the Controller Driven Tap Gesture Combo (CD TGC) delivers the best productivity performance, achieving speeds 2.25 times higher than the slowest system and 30% faster than the current industry standard, while reducing error rates by up to 68%. A clear trade-off emerged between performance and perceived usability: although controller-based gesture input led on speed and accuracy, participants rated Virtual Touch Tap Typing highest in subjective experience, scoring 80% higher on the System Usability Scale (SUS) than the lowest-rated alternative. We further observe that Free Hand interaction remains limited by tracking stability and physical fatigue, whereas Voice input introduces practical constraints related to privacy, editing control, and immersive engagement. Together, these findings characterize the tension between throughput and natural interaction in immersive text entry and provide data-driven guidance for future VR interface design.

[216]  arXiv:2603.18436 [pdf, ps, other]
Title: AS2 -- Attention-Based Soft Answer Sets: An End-to-End Differentiable Neuro-Soft-Symbolic Reasoning Architecture
Authors: Wael AbdAlmageed
Subjects: Artificial Intelligence (cs.AI)

Neuro-symbolic artificial intelligence (AI) systems typically couple a neural perception module to a discrete symbolic solver through a non-differentiable boundary, preventing constraint-satisfaction feedback from reaching the perception encoder during training. We introduce AS2 (Attention-Based Soft Answer Sets), a fully differentiable neuro-symbolic architecture that replaces the discrete solver with a soft, continuous approximation of the Answer Set Programming (ASP) immediate consequence operator $T_P$. AS2 maintains per-position probability distributions over a finite symbol domain throughout the forward pass and trains end-to-end by minimizing the fixed-point residual of a probabilistic lift of $T_P$, thereby differentiating through the constraint check without invoking an external solver at either training or inference time. The architecture is entirely free of conventional positional embeddings. Instead, it encodes problem structure through constraint-group membership embeddings that directly reflect the declarative ASP specification, making the model agnostic to arbitrary position indexing. On Visual Sudoku, AS2 achieves 99.89% cell accuracy and 100% constraint satisfaction (verified by Clingo) across 1,000 test boards, using a greedy constrained decoding procedure that requires no external solver. On MNIST Addition with $N \in \{2, 4, 8\}$ addends, AS2 achieves digit accuracy above 99.7% across all scales. These results demonstrate that a soft differentiable fixpoint operator, combined with constraint-aware attention and declarative constraint specification, can match or exceed pipeline and solver-based neuro-symbolic systems while maintaining full end-to-end differentiability.

[217]  arXiv:2603.18443 [pdf, ps, other]
Title: SR-Nav: Spatial Relationships Matter for Zero-shot Object Goal Navigation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Zero-shot object-goal navigation aims to find target objects in unseen environments using only egocentric observation. Recent methods leverage foundation models' comprehension and reasoning capabilities to enhance navigation performance. However, when faced with poor viewpoints or weak semantic cues, foundation models often fail to support reliable reasoning in both perception and planning, resulting in inefficient or failed navigation. We observe that inherent relationships among objects and regions encode structured scene priors, which help agents infer plausible target locations even under partial observations. Motivated by this insight, we propose Spatial Relation-aware Navigation (SR-Nav), a framework that models both observed and experience-based spatial relationships to enhance both perception and planning. Specifically, SR-Nav first constructs a Dynamic Spatial Relationship Graph (DSRG) that encodes the target-centered spatial relationships through the foundation models and updates dynamically with real-time observations. We then introduce a Relation-aware Matching Module. It utilizes relationship matching instead of naive detection, leveraging diverse relationships in the DSRG to verify and correct errors, enhancing visual perception robustness. Finally, we design a Dynamic Relationship Planning Module to reduce the planning search space by dynamically computing the optimal paths based on the DSRG from the current position, thereby guiding planning and reducing exploration redundancy. Experiments on HM3D show that our method achieves state-of-the-art performance in both success rate and navigation efficiency. The code will be publicly available at https://github.com/Mzyw-1314/SR-Nav

[218]  arXiv:2603.18444 [pdf, ps, other]
Title: Discounted Beta--Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards
Comments: 14 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement learning with verifiable rewards (RLVR) has emerged as an effective post-training paradigm for improving the reasoning capabilities of large language models. However, existing group-based RLVR methods often suffer from severe sample inefficiency. This inefficiency stems from reliance on point estimation of rewards from a small number of rollouts, leading to high estimation variance, variance collapse, and ineffective utilization of generated responses. In this work, we reformulate RLVR from a statistical estimation perspective by modeling rewards as samples drawn from a policy-induced distribution and casting advantage computation as the problem of estimating the reward distribution from finite data. Building on this view, we propose Discounted Beta--Bernoulli (DBB) reward estimation, which leverages historical reward statistics for the non-stationary distribution. Although biased, the resulting estimator exhibits reduced and stable variance, theoretically avoids estimated variance collapse, and achieves lower mean squared error than standard point estimation. Extensive experiments across six in-distribution and three out-of-distribution reasoning benchmarks demonstrate that GRPO with DBB consistently outperforms naive GRPO, achieving average Acc@8 improvements of 3.22/2.42 points in-distribution and 12.49/6.92 points out-of-distribution on the 1.7B and 8B models, respectively, without additional computational cost or memory usage.
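As a rough illustration of the discounted Beta-Bernoulli idea described in this abstract, the following minimal Python sketch keeps exponentially discounted success and failure pseudo-counts for one prompt and uses the posterior mean in place of the per-group point estimate. The update rule, the discount value, the uniform prior, and the advantage formula are all assumptions made for illustration; the abstract does not specify them, and the class and method names are hypothetical.

```python
import math


class DiscountedBetaBernoulli:
    """Hypothetical discounted Beta-Bernoulli tracker for binary rewards."""

    def __init__(self, discount=0.9, prior_alpha=1.0, prior_beta=1.0):
        # discount < 1 down-weights old rollouts, since the policy
        # (and hence the reward distribution) changes during training.
        self.discount = discount
        self.alpha = prior_alpha  # pseudo-count of successes
        self.beta = prior_beta    # pseudo-count of failures

    def update(self, rewards):
        """Fold in a new group of 0/1 rewards (assumed update rule)."""
        successes = sum(rewards)
        failures = len(rewards) - successes
        self.alpha = self.discount * self.alpha + successes
        self.beta = self.discount * self.beta + failures

    def mean(self):
        """Posterior mean estimate of the success probability."""
        return self.alpha / (self.alpha + self.beta)

    def advantages(self, rewards, eps=1e-8):
        """Normalize rewards with the estimated Bernoulli mean and std.

        Because alpha and beta stay strictly positive, the estimated
        variance p * (1 - p) does not collapse to zero even when every
        rollout in the current group has the same reward.
        """
        p = self.mean()
        std = math.sqrt(p * (1.0 - p) + eps)
        return [(r - p) / std for r in rewards]


est = DiscountedBetaBernoulli(discount=0.5)
est.update([1, 1, 0, 0])
est.update([1, 1, 1, 1])
adv = est.advantages([1, 1, 1, 1])
```

In this sketch, a group in which all rollouts succeed still receives a finite, nonzero advantage, whereas a within-group mean and standard deviation would give zero signal for that group.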

[219]  arXiv:2603.18446 [pdf, ps, other]
Title: UT-ACA: Uncertainty-Triggered Adaptive Context Allocation for Long-Context Inference
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Long-context inference remains challenging for large language models due to attention dilution and out-of-distribution degradation. Context selection mitigates this limitation by attending to a subset of key-value cache entries, yet most methods allocate a fixed context budget throughout decoding despite highly non-uniform token-level contextual demands. To address this issue, we propose Uncertainty-Triggered Adaptive Context Allocation (UT-ACA), an inference-time framework that dynamically adjusts the context window based on token-wise uncertainty. UT-ACA learns an uncertainty detector that combines semantic embeddings with logit-based confidence while accounting for uncertainty accumulation across decoding steps. When insufficient evidence is indicated, UT-ACA selectively rolls back, expands the context window, and regenerates the token with additional support. Experiments show that UT-ACA substantially reduces average context usage while preserving generation quality in long-context settings.

[220]  arXiv:2603.18447 [pdf, ps, other]
Title: SODIUM: From Open Web Data to Queryable Databases
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

During research, domain experts often ask analytical questions whose answers require integrating data from a wide range of web sources. Thus, they must spend substantial effort searching, extracting, and organizing raw data before analysis can begin. We formalize this process as the SODIUM task, where we conceptualize open domains such as the web as latent databases that must be systematically instantiated to support downstream querying. Solving SODIUM requires (1) conducting in-depth and specialized exploration of the open web, which is further strengthened by (2) exploiting structural correlations for systematic information extraction and (3) integrating collected information into coherent, queryable database instances.
To quantify the challenges in automating SODIUM, we construct SODIUM-Bench, a benchmark of 105 tasks derived from published academic papers across 6 domains, where systems are tasked with exploring the open web to collect and aggregate data from diverse sources into structured tables. Existing systems struggle with SODIUM tasks: we evaluate 6 advanced AI agents on SODIUM-Bench, with the strongest baseline achieving only 46.5% accuracy. To bridge this gap, we develop SODIUM-Agent, a multi-agent system composed of a web explorer and a cache manager. Powered by our proposed ATP-BFS algorithm and optimized through principled management of cached sources and navigation paths, SODIUM-Agent conducts deep and comprehensive web exploration and performs structurally coherent information extraction. SODIUM-Agent achieves 91.1% accuracy on SODIUM-Bench, outperforming the strongest baseline by approximately 2 times and the weakest by up to 73 times.

[221]  arXiv:2603.18448 [pdf, ps, other]
Title: Seeking Universal Shot Language Understanding Solutions
Subjects: Machine Learning (cs.LG)

Shot language understanding (SLU) is crucial for cinematic analysis but remains challenging due to its diverse cinematographic dimensions and subjective expert judgment. While vision-language models (VLMs) have shown strong ability in general visual understanding, recent studies reveal judgment discrepancies between VLMs and film experts on SLU tasks. To address this gap, we introduce SLU-SUITE, a comprehensive training and evaluation suite containing 490K human-annotated QA pairs across 33 tasks spanning six film-grounded dimensions. Using SLU-SUITE, we obtain two new insights into VLM-based SLU: on the model side, we diagnose key bottlenecks of modules; on the data side, we quantify cross-dimensional influences among tasks. These findings motivate our universal SLU solutions from two complementary paradigms: UniShot, a balanced one-for-all generalist trained via dynamic-balanced data mixing, and AgentShots, a prompt-routed expert cluster that maximizes peak dimension performance. Extensive experiments show that our models outperform task-specific ensembles on in-domain tasks and surpass leading commercial VLMs by 22% on out-of-domain tasks.

[222]  arXiv:2603.18449 [pdf, ps, other]
Title: CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

The widespread deployment of large language models (LLMs) calls for post-hoc methods that can flexibly adapt models to evolving safety requirements. Meanwhile, the rapidly expanding open-source LLM ecosystem has produced a diverse collection of models that already exhibit various safety-related functionalities. This motivates a shift from constructing safety functionality from scratch to reusing existing functionality from external models, thereby avoiding costly data collection and training procedures.
In this paper, we present Cross-Model Neuron Transfer (CNT), a post-hoc method that reuses safety-oriented functionality by transferring a minimal subset of neurons from an open-source donor LLM to a target LLM. By operating at the neuron level, CNT enables modular function-level adaptation, supporting both function addition and function deletion. We evaluate CNT on seven popular LLMs across three representative applications: safety disalignment, alignment enhancement, and bias removal. Experimental results show that CNT achieves targeted safety-oriented functionality transfer with minimal performance degradation (less than 1% for most models) and consistently outperforms five baselines, demonstrating its generality and practical effectiveness.

[223]  arXiv:2603.18450 [pdf, ps, other]
Title: Generalizations of Backup Control Barrier Functions: Expansion and Adaptation for Input-Bounded Safety-Critical Control
Comments: 6 pages, 2 figures
Subjects: Systems and Control (eess.SY)

Guaranteeing the safety of nonlinear systems with bounded inputs remains a key challenge in safe autonomy. Backup control barrier functions (bCBFs) provide a powerful mechanism for constructing controlled invariant sets by propagating trajectories under a pre-verified backup controller to a forward invariant backup set. While effective, the standard bCBF method utilizes the same backup controller for both set expansion and safety certification, which can restrict the expanded safe set and lead to conservative dynamic behavior. In this study, we generalize the bCBF framework by separating the set-expanding controller from the verified backup controller, thereby enabling a broader class of expansion strategies while preserving formal safety guarantees. We establish sufficient conditions for forward invariance of the resulting implicit safe set and show how the generalized construction recovers existing bCBF methods as special cases. Moreover, we extend the proposed framework to parameterized controller families, enabling online adaptation of the expansion controller while maintaining safety guarantees in the presence of input bounds.

[224]  arXiv:2603.18452 [pdf, ps, other]
Title: Pólya Thresholds Graphs
Subjects: Information Theory (cs.IT); Social and Information Networks (cs.SI); Probability (math.PR)

We introduce the Pólya threshold graph model and derive its stochastic and algebraic properties. This random threshold graph is generated sequentially via a two-color Pólya urn process. Starting from an empty graph, each time step involves a draw from the urn that produces an indicator variable, determining whether a newly added node is universal (connected to all existing nodes and itself) or isolated (connected to no existing nodes). This construction yields a random threshold graph with an adjacency matrix that admits an explicit representation in terms of the draw sequence. Using the structure of the Pólya draw process, we derive the exact degree distribution for an arbitrary node, including its mean and variance. Furthermore, we evaluate a distance-based decay centrality score and provide an explicit expression for its expectation. On the algebraic side, we explicitly characterize the Laplacian matrix of the random threshold graph, obtaining a closed-form description of its spectrum and corresponding eigenbasis. Finally, as an application of these structural results, we analyze discrete-time consensus dynamics on Pólya threshold graphs.
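The sequential construction described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the initial urn contents, the reinforcement of one ball per draw, the mapping of the first color to "universal", and the function name are all hypothetical, and self-loops are omitted for simplicity even though the abstract counts a universal node as connected to itself.

```python
import random


def polya_threshold_graph(n, red=1, blue=1, reinforcement=1, seed=None):
    """Sample an n-node threshold graph driven by a two-color Polya urn.

    Returns (draws, adj), where draws[t] is 1 if node t was added as a
    universal node and 0 if it was added as an isolated node, and adj is
    a symmetric 0/1 adjacency matrix without self-loops.
    """
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    draws = []
    for t in range(n):
        # Draw a ball: "red" with probability proportional to its count.
        is_universal = rng.random() < red / (red + blue)
        draws.append(1 if is_universal else 0)
        if is_universal:
            # A universal node connects to every node added before it.
            for j in range(t):
                adj[t][j] = adj[j][t] = 1
            red += reinforcement   # reinforce the drawn color
        else:
            blue += reinforcement
    return draws, adj


draws, adj = polya_threshold_graph(8, seed=0)
degrees = [sum(row) for row in adj]
```

Under this construction the degree of node i depends only on the draw sequence: it equals i times draws[i] plus the number of universal nodes added after i, which is one way to see why the adjacency matrix admits an explicit representation in terms of the draws.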

[225]  arXiv:2603.18453 [pdf, ps, other]
Title: Learning Consistent Temporal Grounding between Related Tasks in Sports Coaching
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video-LLMs often attend to irrelevant frames, which is especially detrimental for sports coaching tasks requiring precise temporal grounding. Yet obtaining frame-level supervision is challenging: expensive to collect from humans and unreliable from other models. We improve temporal grounding without additional annotations by exploiting the observation that related tasks, such as generation and verification, must attend to the same frames. We enforce this via a self-consistency objective over select visual attention maps of tightly-related tasks. Using VidDiffBench, which provides ground-truth keyframe annotations, we first validate that attention misallocation is a significant bottleneck. We then show that training with our objective yields gains of +3.0%, +14.1% accuracy and +0.9 BERTScore over supervised finetuning across three sports coaching tasks: Exact, FitnessQA, and ExpertAF, even surpassing closed-source models.

[226]  arXiv:2603.18455 [pdf, ps, other]
Title: Impact of Differentials in SIMON32 Algorithm for Lightweight Security of Internet of Things
Comments: Accepted at IEEE Global Communications Conference (GLOBECOM) 2025
Subjects: Cryptography and Security (cs.CR)

SIMON and SPECK were among the first efficient encryption algorithms introduced for resource-constrained applications. SIMON is suitable for Internet of Things (IoT) devices and has rapidly attracted the attention of the research community to understand its structure and analyse its security. To analyse the security of an encryption algorithm, researchers often employ cryptanalysis techniques. However, cryptanalysis is a resource and time-intensive task. To improve cryptanalysis efficiency, state-of-the-art research has proposed implementing heuristic search and sampling methods. Despite recent advances, the cryptanalysis of the SIMON cypher remains inefficient. Contributing factors are the large size of the difference distribution tables utilised in cryptanalysis and the scarcity of differentials with a high transition probability. To address these limitations, we introduce an analysis of differential properties of the SIMON32 cypher, revealing differential characteristics that pave the way for future efficiency enhancements. Our analysis has further increased the number of targeted rounds by identifying high probability differentials within a partial difference distribution table of the SIMON cypher, exceeding existing state-of-the-art benchmarks. The code designed for this work is available at https://github.com/johncook1979/simon32-analysis.

[227]  arXiv:2603.18459 [pdf, ps, other]
Title: HypeMed: Enhancing Medication Recommendations with Hypergraph-Based Patient Relationships
Comments: Accepted by TOIS
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Medication recommendation aims to generate safe and effective medication sets from health records. However, accurately recommending medications hinges on inferring a patient's latent clinical condition from sparse and noisy observations, which requires both (i) preserving the visit-level combinatorial semantics of co-occurring entities and (ii) leveraging informative historical references through effective, visit-conditioned retrieval. Most existing methods fall short in one or both aspects: graph-based modeling often fragments higher-order intra-visit patterns into pairwise relations, while inter-visit augmentation methods commonly exhibit an imbalance between learning a globally stable representation space and performing dynamic retrieval within it. To address these limitations, this paper proposes HypeMed, a two-stage hypergraph-based framework unifying intra-visit coherence modeling and inter-visit augmentation. HypeMed consists of two core modules: MedRep for representation pre-training, and SimMR for similarity-enhanced recommendation. In the first stage, MedRep encodes clinical visits as hyperedges via knowledge-aware contrastive pre-training, creating a globally consistent, retrieval-friendly embedding space. In the second stage, SimMR performs dynamic retrieval within this space, fusing retrieved references with the patient's longitudinal data to refine medication prediction. Evaluation on real-world benchmarks shows that HypeMed outperforms state-of-the-art baselines in both recommendation precision and DDI reduction, simultaneously enhancing the effectiveness and safety of clinical decision support.

[228]  arXiv:2603.18460 [pdf, ps, other]
Title: Interpretable Prostate Cancer Detection using a Small Cohort of MRI Images
Comments: 26 pages, 5 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Prostate cancer is a leading cause of mortality in men, yet interpretation of T2-weighted prostate MRI remains challenging due to subtle and heterogeneous lesions. We developed an interpretable framework for automatic cancer detection using a small dataset of 162 T2-weighted images (102 cancer, 60 normal), addressing data scarcity through transfer learning and augmentation. We performed a comprehensive comparison of Vision Transformers (ViT, Swin), CNNs (ResNet18), and classical methods (Logistic Regression, SVM, HOG+SVM). Transfer-learned ResNet18 achieved the best performance (90.9% accuracy, 95.2% sensitivity, AUC 0.905) with only 11M parameters, while Vision Transformers showed lower performance despite substantially higher complexity. Notably, HOG+SVM achieved comparable accuracy (AUC 0.917), highlighting the effectiveness of handcrafted features in small datasets. Unlike state-of-the-art approaches relying on biparametric MRI (T2+DWI) and large cohorts, our method achieves competitive performance using only T2-weighted images, reducing acquisition complexity and computational cost. In a reader study of 22 cases, five radiologists achieved a mean sensitivity of 67.5% (Fleiss Kappa = 0.524), compared to 95.2% for the AI model, suggesting potential for AI-assisted screening to reduce missed cancers and improve consistency. Code and data are publicly available.

[229]  arXiv:2603.18461 [pdf, ps, other]
Title: Cell-Type Prototype-Informed Neural Network for Gene Expression Estimation from Pathology Images
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Estimating slide- and patch-level gene expression profiles from pathology images enables rapid and low-cost molecular analysis with broad clinical impact. Despite strong results, existing approaches treat gene expression as a mere slide- or spot-level signal and do not incorporate the fact that the measured expression arises from the aggregation of underlying cell-level expression. To explicitly introduce this missing cell-resolved guidance, we propose a Cell-type Prototype-informed Neural Network (CPNN) that leverages publicly available single-cell RNA-sequencing datasets. Since single-cell measurements are noisy and not paired with histology images, we first estimate cell-type prototypes -- mean expression profiles that reflect stable gene-gene co-variation patterns. CPNN then learns cell-type compositional weights directly from images and models the relationship between prototypes and observed bulk or spatial expression, providing a biologically grounded and structurally regularized prediction framework. We evaluate CPNN on three slide-level datasets and three patch-level spatial transcriptomics datasets. Across all settings, CPNN achieves the highest performance in terms of Spearman correlation. Moreover, by visualizing the inferred compositional weights, our framework provides interpretable insights into which cell types drive the predicted expression. Code is publicly available at https://github.com/naivete5656/CPNN.

[230]  arXiv:2603.18462 [pdf, ps, other]
Title: AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba
Comments: Accepted by Pattern Recognition
Subjects: Artificial Intelligence (cs.AI)

In the era of large-scale pre-trained models, effectively adapting general knowledge to specific affective computing tasks remains a challenge, particularly regarding computational efficiency and multimodal heterogeneity. While Transformer-based methods have excelled at modeling inter-modal dependencies, their quadratic computational complexity limits their use with long-sequence data. Mamba-based models have emerged as a computationally efficient alternative; however, their inherent sequential scanning mechanism struggles to capture the global, non-sequential relationships that are crucial for effective cross-modal alignment. To address these limitations, we propose AlignMamba-2, an effective and efficient framework for multimodal fusion and sentiment analysis. Our approach introduces a dual alignment strategy that regularizes the model using both Optimal Transport distance and Maximum Mean Discrepancy, promoting geometric and statistical consistency between modalities without incurring any inference-time overhead. More importantly, we design a Modality-Aware Mamba layer, which employs a Mixture-of-Experts architecture with modality-specific and modality-shared experts to explicitly handle data heterogeneity during the fusion process. Extensive experiments on four challenging benchmarks, including dynamic time-series (on the CMU-MOSI and CMU-MOSEI datasets) and static image-related tasks (on the NYU-Depth V2 and MVSA-Single datasets), demonstrate that AlignMamba-2 establishes a new state-of-the-art in both effectiveness and efficiency across diverse pattern recognition tasks, ranging from dynamic time-series analysis to static image-text classification.

[231]  arXiv:2603.18464 [pdf, ps, other]
Title: AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models
Subjects: Machine Learning (cs.LG)

Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous and decoupled RL framework designed to eliminate synchronization barriers by physically isolating training, inference, and rollouts. Crucially, AcceRL is the first to integrate a plug-and-play, trainable world model into a distributed asynchronous RL pipeline to generate virtual experiences. Experiments on the LIBERO benchmark demonstrate that AcceRL achieves state-of-the-art (SOTA) performance. Systematically, it exhibits super-linear scaling in throughput and highly efficient hardware utilization. Algorithmically, the world-model-augmented variant delivers unprecedented sample efficiency and robust training stability in complex control tasks.

[232]  arXiv:2603.18465 [pdf, ps, other]
Title: MedQ-UNI: Toward Unified Medical Image Quality Assessment and Restoration via Vision-Language Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing medical image restoration (Med-IR) methods are typically modality-specific or degradation-specific, failing to generalize across the heterogeneous degradations encountered in clinical practice. We argue this limitation stems from the isolation of Med-IR from medical image quality assessment (Med-IQA), as restoration models without explicit quality understanding struggle to adapt to diverse degradation types across modalities. To address these challenges, we propose MedQ-UNI, a unified vision-language model that follows an assess-then-restore paradigm, explicitly leveraging Med-IQA to guide Med-IR across arbitrary modalities and degradation types. MedQ-UNI adopts a multimodal autoregressive dual-expert architecture with shared attention: a quality assessment expert first identifies degradation issues through structured natural language descriptions, and a restoration expert then conditions on these descriptions to perform targeted image restoration. To support this paradigm, we construct a large-scale dataset of approximately 50K paired samples spanning three imaging modalities and five restoration tasks, each annotated with structured quality descriptions for joint Med-IQA and Med-IR training, along with a 2K-sample benchmark for evaluation. Extensive experiments demonstrate that a single MedQ-UNI model, without any task-specific adaptation, achieves state-of-the-art restoration performance across all tasks while generating superior descriptions, confirming that explicit quality understanding meaningfully improves restoration fidelity and interpretability.

[233]  arXiv:2603.18466 [pdf, ps, other]
Title: Recolour What Matters: Region-Aware Colour Editing via Token-Level Diffusion
Comments: 18 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Colour is one of the most perceptually salient yet least controllable attributes in image generation. Although recent diffusion models can modify object colours from user instructions, their results often deviate from the intended hue, especially for fine-grained and local edits. Early text-driven methods rely on discrete language descriptions that cannot accurately represent continuous chromatic variations. To overcome this limitation, we propose ColourCrafter, a unified diffusion framework that transforms colour editing from global tone transfer into a structured, region-aware generation process. Unlike traditional colour-driven methods, ColourCrafter performs token-level fusion of RGB colour tokens and image tokens in latent space, selectively propagating colour information to semantically relevant regions while preserving structural fidelity. A perceptual Lab-space loss further enhances pixel-level precision by decoupling luminance and chrominance and constraining edits within masked areas. Additionally, we build ColourfulSet, a large-scale dataset of high-quality image pairs with continuous and diverse colour variations. Extensive experiments demonstrate that ColourCrafter achieves state-of-the-art colour accuracy, controllability and perceptual fidelity in fine-grained colour editing. Our project is available at https://yangyuqi317.github.io/ColourCrafter.github.io/.

[234]  arXiv:2603.18469 [pdf, ps, other]
Title: GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms
Comments: We are working towards releasing the code in April 2026
Subjects: Computation and Language (cs.CL)

We introduce GAIN (Goal-Aligned Decision-Making under Imperfect Norms), a benchmark designed to evaluate how large language models (LLMs) balance adherence to norms against business goals. Existing benchmarks typically focus on abstract scenarios rather than real-world business applications. Furthermore, they provide limited insights into the factors influencing LLM decision-making. This restricts their ability to measure models' adaptability to complex, real-world norm-goal conflicts. In GAIN, models receive a goal, a specific situation, a norm, and additional contextual pressures. These pressures, explicitly designed to encourage potential norm deviations, are a unique feature that differentiates GAIN from other benchmarks, enabling a systematic evaluation of the factors influencing decision-making. We define five types of pressures: Goal Alignment, Risk Aversion, Emotional/Ethical Appeal, Social/Authoritative Influence, and Personal Incentive. The benchmark comprises 1,200 scenarios across four domains: hiring, customer support, advertising and finance. Our experiments show that advanced LLMs frequently mirror human decision-making patterns. However, when Personal Incentive pressure is present, they diverge significantly, showing a strong tendency to adhere to norms rather than deviate from them.

[235]  arXiv:2603.18470 [pdf, ps, other]
Title: CyberJustice Tutor: An Agentic AI Framework for Cybersecurity Learning via Think-Plan-Act Reasoning and Pedagogical Scaffolding
Subjects: Human-Computer Interaction (cs.HC)

The integration of Large Language Models (LLMs) into cybersecurity education for criminal justice professionals is currently hindered by the "statelessness" of reactive chatbots and the risk of hallucinations in high-stakes legal contexts. To address these limitations, we propose the CyberJustice Tutor, an educational dialogue system powered by an Agentic AI framework. Unlike reactive chatbots, our system employs a "Think-Plan-Act" cognitive cycle, enabling autonomous goal decomposition, longitudinal planning, and dynamic context maintenance. We integrate a Pedagogical Scaffolding Layer grounded in Vygotsky's Zone of Proximal Development (ZPD), which dynamically adapts instructional support based on the learner's real-time progress. Furthermore, an Adaptive Retrieval Augmented Generation (RAG) core anchors the agent's reasoning in verified curriculum materials to ensure legal and technical accuracy. A comprehensive user study with 123 participants, including students, educators, and active law enforcement officers, validated the system's efficacy. Quantitative results demonstrate high user acceptance for Response Speed (4.7/5), Ease of Use (4.4/5), and Accuracy (4.3/5). Qualitative feedback indicates that the agentic architecture is perceived as highly effective in guiding learners through personalized paths, demonstrating the feasibility and usability of agentic AI for specialized professional education.

[236]  arXiv:2603.18471 [pdf, ps, other]
Title: A Faster Deterministic Algorithm for Kidney Exchange via Representative Set
Subjects: Data Structures and Algorithms (cs.DS)

The Kidney Exchange Problem is a prominent challenge in healthcare and economics, arising in the context of organ transplantation. It has been extensively studied in artificial intelligence and optimization. In a kidney exchange, a set of donor-recipient pairs and altruistic donors are considered, with the goal of identifying a sequence of exchanges -- comprising cycles or chains starting from altruistic donors -- such that each donor provides a kidney to the compatible recipient in the next donor-recipient pair. Due to constraints in medical resources, some limits are often imposed on the lengths of these cycles and chains. These exchanges create a network of transplants aimed at maximizing the total number, $t$, of successful transplants. Recently, this problem was deterministically solved in $O^*(14.34^t)$ time (IJCAI 2024). In this paper, we introduce the representative set technique for the Kidney Exchange Problem, showing that the problem can be deterministically solved in $O^*(6.855^t)$ time.
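
As a rough illustration of the gap between the two bounds (a back-of-the-envelope comparison, not a result from the paper), one can compare the exponential bases directly. The $O^*$ notation hides polynomial factors, so the ratio below is only indicative:

\[
\frac{14.34^{t}}{6.855^{t}} = \left(\frac{14.34}{6.855}\right)^{t} \approx 2.09^{t},
\qquad\text{so for } t = 10 \text{ the ratio is roughly } 1.6 \times 10^{3}.
\]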

[237]  arXiv:2603.18472 [pdf, ps, other]
Title: Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

While Multimodal Large Language Models (MLLMs) have achieved remarkable success in interpreting natural scenes, their ability to process discrete symbols -- the fundamental building blocks of human cognition -- remains a critical open question. Unlike continuous visual data, symbols such as mathematical formulas, chemical structures, and linguistic characters require precise, deeper interpretation. This paper introduces a comprehensive benchmark to evaluate how top-tier MLLMs navigate these "discrete semantic spaces" across five domains: language, culture, mathematics, physics, and chemistry. Our investigation uncovers a counterintuitive phenomenon: models often fail at basic symbol recognition yet succeed in complex reasoning tasks, suggesting they rely on linguistic probability rather than true visual perception. By exposing this "cognitive mismatch", we highlight a significant gap in current AI capabilities: the struggle to truly perceive and understand the symbolic languages that underpin scientific discovery and abstract thought. This work offers a roadmap for developing more rigorous, human-aligned intelligent systems.

[238]  arXiv:2603.18474 [pdf, ps, other]
Title: WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Precise behavioral control of large language models (LLMs) is critical for complex applications. However, existing methods often incur high training costs, lack natural language controllability, or compromise semantic coherence. To bridge this gap, we propose WASD (unWeaving Actionable Sufficient Directives), a novel framework that explains model behavior by identifying sufficient neural conditions for token generation. Our method represents candidate conditions as neuron-activation predicates and iteratively searches for a minimal set that guarantees the current output under input perturbations. Experiments on SST-2 and CounterFact with the Gemma-2-2B model demonstrate that our approach produces explanations that are more stable, accurate, and concise than conventional attribution graphs. Moreover, through a case study on controlling cross-lingual output generation, we validated the practical effectiveness of WASD in controlling model behavior.

[239]  arXiv:2603.18475 [pdf, ps, other]
Title: Resolving the Blow-Up: A Time-Dilated Numerical Framework for Multiple Firing Events in Mean-Field Neuronal Networks
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

In large-scale excitatory neuronal networks, rapid synchronization manifests as multiple firing events (MFEs), mathematically characterized by a finite-time blow-up of the neuronal firing rate in the mean-field Fokker-Planck equation. Standard numerical methods struggle to resolve this singularity due to the divergent boundary flux and the instantaneous nature of the population voltage reset. In this work, we propose a robust multiscale numerical framework based on time dilation. By transforming the governing equation into a dilated timescale proportional to the firing activity, we desingularize the blow-up, effectively stretching the instantaneous synchronization event into a resolved mesoscopic process. This approach is shown to be physically consistent with the microscopic cascade mechanism underlying MFEs and the system's inherent fragility. To implement this numerically, we develop a hybrid scheme that utilizes a mesh-independent flux criterion to switch between timescales and a semi-analytical "moving Gaussian" method to accurately evolve the post-blowup Dirac mass. Numerical benchmarks demonstrate that our solver not only captures steady states with high accuracy but also efficiently reproduces periodic MFEs, matching Monte Carlo simulations without the severe time-step restrictions associated with particle cascades.

[240]  arXiv:2603.18477 [pdf, ps, other]
Title: Leveraging Large Language Models for Generalizing Peephole Optimizations
Subjects: Programming Languages (cs.PL)

Peephole optimizations are a core component of modern optimizing compilers. They rewrite specific instructions into semantically equivalent but more efficient forms. In practice, creating a new peephole optimization often starts from a concrete optimization instance and requires lifting it into a more general rewrite rule that matches a wider range of instruction patterns. This generalization step is critical to optimization effectiveness, but it is also difficult: producing rules that are both correct and sufficiently general typically demands substantial manual effort and domain expertise. Existing approaches such as Hydra attempt to automate this task with program synthesis, but their generalization capability is often limited by search-space explosion, under-generalization, and restricted support for diverse instruction domains.
We present LPG, large language model aided peephole optimization generalization, a framework that uses large language models (LLMs) to generalize peephole optimizations. The design of LPG is motivated by the observation that LLMs are effective at semantic abstraction and exploratory reasoning, while formal analyses are necessary to ensure that generated rules are sound and profitable. Based on this observation, LPG adopts a closed-loop workflow that integrates LLM-driven symbolic constant generalization, structural generalization, constraint relaxation, and bitwidth/precision generalization with feedback from syntactic validation, semantic verification, and profitability checking.
We evaluate LPG on real-world peephole optimization issues drawn from the LLVM ecosystem. Overall, LPG successfully generalizes 90 out of 102 optimizations. On the integer-focused subset that is directly comparable to Hydra, LPG generalizes 74 out of 81 optimizations, whereas Hydra generalizes 35.
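
To make the idea of lifting a concrete instance into a general rewrite rule more tangible, here is a minimal sketch that is not taken from LPG or Hydra. It takes the textbook instance `x * 8 -> x << 3`, generalizes the constant to `x * 2^k -> x << k`, and checks the rule exhaustively at small bit widths. The function names and the brute-force check are assumptions made for illustration; the abstract does not describe how its semantic verification is implemented.

```python
def mul_const(x: int, c: int, width: int) -> int:
    """Multiply x by constant c with wrap-around at the given bit width."""
    return (x * c) & ((1 << width) - 1)


def shl(x: int, k: int, width: int) -> int:
    """Shift x left by k bits with wrap-around at the given bit width."""
    return (x << k) & ((1 << width) - 1)


def generalized_rule_holds(width: int) -> bool:
    """Exhaustively check x * 2**k == x << k for all x and all k < width.

    Brute force is only feasible for small widths; it stands in here for a
    real equivalence checker (hypothetical stand-in, not the paper's method).
    """
    for k in range(width):
        for x in range(1 << width):
            if mul_const(x, 1 << k, width) != shl(x, k, width):
                return False
    return True


# Concrete instance that motivates the rule: x * 8 becomes x << 3.
concrete_ok = mul_const(5, 8, 8) == shl(5, 3, 8)
# Generalized over the constant (any power of two) and over the bit width.
general_ok = all(generalized_rule_holds(w) for w in (4, 8))
```

The sketch covers only two of the generalization axes named in the abstract (symbolic constants and bit width); structural generalization and constraint relaxation are not modeled.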

[241]  arXiv:2603.18480 [pdf, ps, other]
Title: Do Vision Language Models Understand Human Engagement in Games?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Inferring human engagement from gameplay video is important for game design and player-experience research, yet it remains unclear whether vision--language models (VLMs) can infer such latent psychological states from visual cues alone. Using the GameVibe Few-Shot dataset across nine first-person shooter games, we evaluate three VLMs under six prompting strategies, including zero-shot prediction, theory-guided prompts grounded in Flow, GameFlow, Self-Determination Theory, and MDA, and retrieval-augmented prompting. We consider both pointwise engagement prediction and pairwise prediction of engagement change between consecutive windows. Results show that zero-shot VLM predictions are generally weak and often fail to outperform simple per-game majority-class baselines. Memory- or retrieval-augmented prompting improves pointwise prediction in some settings, whereas pairwise prediction remains consistently difficult across strategies. Theory-guided prompting alone does not reliably help and can instead reinforce surface-level shortcuts. These findings suggest a perception--understanding gap in current VLMs: although they can recognize visible gameplay cues, they still struggle to robustly infer human engagement across games.

[242]  arXiv:2603.18481 [pdf, ps, other]
Title: T-QPM: Enabling Temporal Out-Of-Distribution Detection and Domain Generalization for Vision-Language Models in Open-World
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Out-of-distribution (OOD) detection remains a critical challenge in open-world learning, where models must adapt to evolving data distributions. While recent vision-language models (VLMs) like CLIP enable multimodal OOD detection through Dual-Pattern Matching (DPM), existing methods typically suffer from two major shortcomings: (1) they rely on fixed fusion rules and assume static environments, failing under temporal drift; and (2) they lack robustness against covariate-shifted inputs. In this paper, we propose a novel two-step framework to enhance OOD detection and covariate distribution shift robustness in dynamic settings. We extend the dual-pattern regime into Temporal Quadruple-Pattern Matching (T-QPM). First, by pairing OOD images with text descriptions, we introduce cross-modal consistency patterns between ID and OOD signals, refining the decision boundary through joint image-text reasoning. Second, we address temporal distribution shifts by learning lightweight fusion weights to optimally combine semantic matching and visual typicality. To ensure stability, we enforce explicit regularization based on Average Thresholded Confidence (ATC), preventing performance degradation as distributions evolve. Experiments on temporally partitioned benchmarks demonstrate that our approach significantly outperforms static baselines, offering a robust, temporally-consistent framework for multimodal OOD detection in non-stationary environments.

[243]  arXiv:2603.18482 [pdf, ps, other]
Title: The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices
Comments: Under review
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)

Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather than statistical frequency. This mismatch creates a truncation blind spot: contextually appropriate but statistically rare tokens remain accessible to humans yet unreachable by likelihood-based decoding. We hypothesize this contributes to the detectability of machine-generated text. Analyzing over 1.8 million texts across eight language models, five decoding strategies, and 53 hyperparameter configurations, we find that 8-18% of human-selected tokens fall outside typical truncation boundaries. Simple classifiers trained on predictability and lexical diversity achieve remarkable detection rates. Crucially, neither model scale nor architecture correlates strongly with detectability; truncation parameters account for most variance. Configurations achieving low detectability often produce incoherent text, indicating that evading detection and producing natural text are distinct objectives. These findings suggest detectability is enhanced by likelihood-based token selection, not merely a matter of model capability.
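
The truncation effect described above can be sketched on a toy next-token distribution. This is an illustrative example only: the vocabulary, probabilities, and helper names are invented here and are not data or code from the paper. It shows how a low-probability token that a human might pick ends up outside both the top-k set and the nucleus (top-p) set.

```python
def top_k_set(probs: dict, k: int) -> set:
    """Tokens kept by top-k truncation: the k most probable tokens."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return set(ranked[:k])


def nucleus_set(probs: dict, top_p: float) -> set:
    """Tokens kept by nucleus sampling: the smallest set of most probable
    tokens whose cumulative probability reaches top_p."""
    kept, mass = set(), 0.0
    for token in sorted(probs, key=probs.get, reverse=True):
        kept.add(token)
        mass += probs[token]
        if mass >= top_p:
            break
    return kept


# Hypothetical next-token distribution (made-up numbers for illustration).
toy_probs = {"the": 0.50, "a": 0.30, "this": 0.15, "yonder": 0.05}

kept_by_top_k = top_k_set(toy_probs, k=2)
kept_by_nucleus = nucleus_set(toy_probs, top_p=0.9)

# A rare but contextually plausible human choice.
human_token = "yonder"
unreachable = (human_token not in kept_by_top_k
               and human_token not in kept_by_nucleus)
```

With these toy numbers the rare token is excluded by both strategies, so no amount of sampling from the truncated distribution can produce it.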

[244]  arXiv:2603.18488 [pdf, ps, other]
Title: TexEditor: Structure-Preserving Text-Driven Texture Editing
Comments: 19 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text-guided texture editing aims to modify object appearance while preserving the underlying geometric structure. However, our empirical analysis reveals that even SOTA editing models frequently struggle to maintain structural consistency during texture editing, despite the intended changes being purely appearance-related. Motivated by this observation, we jointly enhance structure preservation from both data and training perspectives, and build TexEditor, a dedicated texture editing model based on Qwen-Image-Edit-2509. Firstly, we construct TexBlender, a high-quality SFT dataset generated with Blender, which provides strong structural priors for a cold start. Secondly, we introduce StructureNFT, an RL-based approach that integrates structure-preserving losses to transfer the structural priors learned during SFT to real-world scenes. Moreover, due to the limited realism and evaluation coverage of existing benchmarks, we introduce TexBench, a general-purpose real-world benchmark for text-guided texture editing. Extensive experiments on existing Blender-based texture benchmarks and our TexBench show that TexEditor consistently outperforms strong baselines such as Nano Banana Pro. In addition, we assess TexEditor on the general-purpose benchmark ImgEdit to validate its generalization. Our code and data are available at https://github.com/KlingAIResearch/TexEditor.

[245]  arXiv:2603.18489 [pdf, ps, other]
Title: EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models
Subjects: Computation and Language (cs.CL)

Diffusion-based large language models (dLLMs) rely on bidirectional attention, which prevents lossless KV caching and requires a full forward pass at every denoising step. Existing approximate KV caching methods reduce this cost by selectively updating cached states, but their decision overhead scales with context length or model depth. We propose EntropyCache, a training-free KV caching method that uses the maximum entropy of newly decoded token distributions as a constant-cost signal for deciding when to recompute. Our design is grounded in two empirical observations: (1) decoded token entropy correlates with KV cache drift, providing a cheap proxy for cache staleness, and (2) feature volatility of decoded tokens persists for multiple steps after unmasking, motivating recomputation of the $k$ most recently decoded tokens. The skip-or-recompute decision requires only $O(V)$ computation per step, independent of context length and model scale. Experiments on LLaDA-8B-Instruct and Dream-7B-Instruct show that EntropyCache achieves $15.2\times$-$26.4\times$ speedup on standard benchmarks and $22.4\times$-$24.1\times$ on chain-of-thought benchmarks, with competitive accuracy and decision overhead accounting for only $0.5\%$ of inference time. Code is available at https://github.com/mscheong01/EntropyCache.
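
The skip-or-recompute signal can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the threshold value, the function names, and the choice that higher entropy triggers recomputation are all assumptions, since the abstract only states that the maximum entropy of newly decoded token distributions is used as the signal.

```python
import math


def entropy(probs: list) -> float:
    """Shannon entropy (in nats) of one token distribution over the vocabulary."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_recompute(decoded_dists: list, threshold: float) -> bool:
    """Decide whether to refresh the KV cache at this denoising step.

    Assumption: recompute when the most uncertain newly decoded token
    exceeds a hypothetical entropy threshold. The cost is one pass over
    each decoded token's distribution and does not depend on context length.
    """
    return max(entropy(d) for d in decoded_dists) > threshold


# Toy distributions over a 4-token vocabulary (made-up numbers).
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

skip_step = should_recompute([confident], threshold=0.5)
refresh_step = should_recompute([confident, uncertain], threshold=0.5)
```

In this toy setting a step that decodes only confident tokens reuses the cache, while a step that includes one near-uniform distribution triggers recomputation.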

[246]  arXiv:2603.18492 [pdf, ps, other]
Title: AIMER: Calibration-Free Task-Agnostic MoE Pruning
Subjects: Machine Learning (cs.LG)

Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token compute, but deployment still requires storing all experts, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, which makes pruning outcomes sensitive to the choice of calibration set and adds substantial preprocessing cost. We introduce AIMER (Absolute mean over root mean square IMportance for Expert Ranking), a simple calibration-free criterion that yields clear within-layer score separation and distinct expert stratification. Across 7B to 30B MoE language models at 25% and 50% pruning ratios over 16 benchmarks, AIMER consistently delivers competitive or stronger overall performance against state-of-the-art calibration-based expert pruning baselines with only 0.22--1.27 seconds for scoring the experts.
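
Reading the acronym literally, the score is the mean absolute value of a set of weights divided by their root mean square. The sketch below computes that ratio for a flat list of numbers. It is an interpretation of the name only: which weights of an expert are scored, how scores are aggregated, and whether high or low scores are pruned are not stated in the abstract and are left open here.

```python
import math


def abs_mean_over_rms(weights: list) -> float:
    """Mean absolute value divided by root mean square.

    The ratio lies in (0, 1] for any non-zero input, equals 1 when all
    entries share the same magnitude, and is unchanged when every weight
    is multiplied by the same non-zero constant.
    """
    n = len(weights)
    abs_mean = sum(abs(w) for w in weights) / n
    rms = math.sqrt(sum(w * w for w in weights) / n)
    if rms == 0.0:
        return 0.0  # all-zero input; convention chosen for this sketch
    return abs_mean / rms


# Hypothetical flattened expert weights (made-up numbers).
dense_expert = [1.0, -1.0, 1.0, -1.0]   # equal magnitudes
spiky_expert = [0.0, 0.0, 0.0, 2.0]     # one dominant entry

dense_score = abs_mean_over_rms(dense_expert)
spiky_score = abs_mean_over_rms(spiky_expert)
```

Because the statistic needs only the stored weights, it can be computed without running any calibration data through the model, which is consistent with the calibration-free claim in the abstract.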

[247]  arXiv:2603.18493 [pdf, ps, other]
Title: FILT3R: Latent State Adaptive Kalman Filter for Streaming 3D Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Streaming 3D reconstruction maintains a persistent latent state that is updated online from incoming frames, enabling constant-memory inference. A key failure mode is the state update rule: aggressive overwrites forget useful history, while conservative updates fail to track new evidence, and both behaviors become unstable beyond the training horizon. To address this challenge, we propose FILT3R, a training-free latent filtering layer that casts recurrent state updates as stochastic state estimation in token space. FILT3R maintains a per-token variance and computes a Kalman-style gain that adaptively balances memory retention against new observations. Process noise -- governing how much the latent state is expected to change between frames -- is estimated online from EMA-normalized temporal drift of candidate tokens. Using extensive experiments, we demonstrate that FILT3R yields an interpretable, plug-in update rule that generalizes common overwrite and gating policies as special cases. Specifically, we show that gains shrink in stable regimes as uncertainty contracts with accumulated evidence, and rise when genuine scene change increases process uncertainty, improving long-horizon stability for depth, pose, and 3D reconstruction, compared to the existing methods. Code will be released at https://github.com/jinotter3/FILT3R.
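
The gain behaviour described above matches the textbook scalar Kalman filter, which can be sketched per token as below. This is the standard filter written out under assumptions, not FILT3R's actual update: the scalar state, the variable names, and the fixed noise values are illustrative, whereas the abstract says process noise is estimated online from EMA-normalized temporal drift.

```python
def kalman_step(state, var, candidate, process_noise, obs_noise):
    """One scalar Kalman update for a single latent token value.

    state, var    : current estimate and its variance
    candidate     : new observation proposed by the incoming frame
    process_noise : expected change of the latent state between frames
    obs_noise     : uncertainty of the candidate itself
    Assumes var + process_noise + obs_noise > 0.
    """
    var_pred = var + process_noise            # predict: uncertainty grows
    gain = var_pred / (var_pred + obs_noise)  # 0 keeps memory, 1 overwrites
    new_state = state + gain * (candidate - state)
    new_var = (1.0 - gain) * var_pred         # update: uncertainty contracts
    return new_state, new_var, gain


# Stable regime: no process noise, so the gain shrinks as evidence accumulates.
state, var, gains = 0.0, 1.0, []
for _ in range(3):
    state, var, gain = kalman_step(state, var, candidate=1.0,
                                   process_noise=0.0, obs_noise=1.0)
    gains.append(gain)

# Scene change: large process noise pushes the gain back up.
_, _, gain_after_change = kalman_step(state, var, candidate=5.0,
                                      process_noise=10.0, obs_noise=1.0)
```

The two limiting cases correspond to the policies the abstract mentions as special cases: a gain of 1 is a full overwrite of the state, and a gain of 0 keeps the previous state unchanged.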

[248]  arXiv:2603.18494 [pdf, ps, other]
Title: MemoAct: Atkinson-Shiffrin-Inspired Memory-Augmented Visuomotor Policy for Robotic Manipulation
Subjects: Robotics (cs.RO)

Memory-augmented robotic policies are essential in handling memory-dependent tasks. However, existing approaches typically rely on simple observation window extensions, struggling to simultaneously achieve precise task state tracking and robust long-horizon retention. To overcome these challenges, inspired by the Atkinson-Shiffrin memory model, we propose MemoAct, a hierarchical memory-based policy that leverages distinct memory tiers to tackle specific bottlenecks. Specifically, lossless short-term memory ensures precise task state tracking, while compressed long-term memory enables robust long-horizon retention. To enrich the evaluation landscape, we construct MemoryRTBench based on RoboTwin 2.0, specifically tailored to assess policy capabilities in task state tracking and long-horizon retention. Extensive experiments across simulated and real-world scenarios demonstrate that MemoAct achieves superior performance compared to both existing Markovian baselines and history-aware policies. The project page is available at https://tlf-tlf.github.io/MemoActPage/.

[249]  arXiv:2603.18495 [pdf, ps, other]
Title: Cross-Domain Demo-to-Code via Neurosymbolic Counterfactual Reasoning
Comments: Accepted at CVPR 2026
Subjects: Artificial Intelligence (cs.AI)

Recent advances in Vision-Language Models (VLMs) have enabled video-instructed robotic programming, allowing agents to interpret video demonstrations and generate executable control code. We formulate video-instructed robotic programming as a cross-domain adaptation problem, where perceptual and physical differences between demonstration and deployment induce procedural mismatches. However, current VLMs lack the procedural understanding needed to reformulate causal dependencies and achieve task-compatible behavior under such domain shifts. We introduce NeSyCR, a neurosymbolic counterfactual reasoning framework that enables verifiable adaptation of task procedures, providing a reliable synthesis of code policies. NeSyCR abstracts video demonstrations into symbolic trajectories that capture the underlying task procedure. Given deployment observations, it derives counterfactual states that reveal cross-domain incompatibilities. By exploring the symbolic state space with verifiable checks, NeSyCR proposes procedural revisions that restore compatibility with the demonstrated procedure. NeSyCR achieves a 31.14% improvement in task success over the strongest baseline Statler, showing robust cross-domain adaptation across both simulated and real-world manipulation tasks.

[250]  arXiv:2603.18496 [pdf, ps, other]
Title: NymeriaPlus: Enriching Nymeria Dataset with Additional Annotations and Data
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Nymeria Dataset, released in 2024, is a large-scale collection of in-the-wild human activities captured with multiple egocentric wearable devices that are spatially localized and temporally synchronized. It provides body-motion ground truth recorded with a motion-capture suit, device trajectories, semi-dense 3D point clouds, and in-context narrations. In this paper, we upgrade Nymeria and introduce NymeriaPlus. NymeriaPlus features: (1) improved human motion in Momentum Human Rig (MHR) and SMPL formats; (2) dense 3D and 2D bounding box annotations for indoor objects and structural elements; (3) instance-level 3D object reconstructions; and (4) additional modalities, e.g., basemap recordings, audio, and wristband videos. By consolidating these complementary modalities and annotations into a single, coherent benchmark, NymeriaPlus strengthens Nymeria into a more powerful in-the-wild egocentric dataset. We expect NymeriaPlus to bridge a key gap in existing egocentric resources and to support a broader range of research, including unique explorations of multimodal learning for embodied AI.

[251]  arXiv:2603.18501 [pdf, ps, other]
Title: Efficient Video Diffusion with Sparse Information Transmission for Video Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Video compression aims to maximize reconstruction quality with minimal bitrates. Beyond standard distortion metrics, perceptual quality and temporal consistency are also critical. However, at ultra-low bitrates, traditional end-to-end compression models tend to produce blurry images of poor perceptual quality. Besides, existing generative compression methods often treat video frames independently and show limitations in time coherence and efficiency. To address these challenges, we propose the Efficient Video Diffusion with Sparse Information Transmission (Diff-SIT), which comprises the Sparse Temporal Encoding Module (STEM) and the One-Step Video Diffusion with Frame Type Embedder (ODFTE). The STEM sparsely encodes the original frame sequence into an information-rich intermediate sequence, achieving significant bitrate savings. Subsequently, the ODFTE processes this intermediate sequence as a whole, which exploits the temporal correlation. During this process, our proposed Frame Type Embedder (FTE) guides the diffusion model to perform adaptive reconstruction according to different frame types to optimize the overall quality. Extensive experiments on multiple datasets demonstrate that Diff-SIT establishes a new state-of-the-art in perceptual quality and temporal consistency, particularly in the challenging ultra-low-bitrate regime. Code is released at https://github.com/MingdeZhou/Diff-SIT.

[252]  arXiv:2603.18502 [pdf, ps, other]
Title: HOMEY: Heuristic Object Masking with Enhanced YOLO for Property Insurance Risk Detection
Comments: 21 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automated property risk detection is a high-impact yet underexplored frontier in computer vision with direct implications for real estate, underwriting, and insurance operations. We introduce HOMEY (Heuristic Object Masking with Enhanced YOLO), a novel detection framework that combines YOLO with a domain-specific masking mechanism and a custom-designed loss function. HOMEY is trained to detect 17 risk-related property classes, including structural damages (e.g., cracked foundations, roof issues), maintenance neglect (e.g., dead yards, overgrown bushes), and liability hazards (e.g., falling gutters, garbage, hazard signs). Our approach introduces heuristic object masking to amplify weak signals in cluttered backgrounds and risk-aware loss calibration to balance class skew and severity weighting. Experiments on real-world property imagery demonstrate that HOMEY achieves superior detection accuracy and reliability compared to baseline YOLO models, while retaining fast inference. Beyond detection, HOMEY enables interpretable and cost-efficient risk analysis, laying the foundation for scalable AI-driven property insurance workflows.

[253]  arXiv:2603.18505 [pdf, ps, other]
Title: From Snapshots to Symphonies: The Evolution of Protein Prediction from Static Structures to Generative Dynamics and Multimodal Interactions
Comments: 17 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The protein folding problem has been fundamentally transformed by artificial intelligence, evolving from static structure prediction toward the modeling of dynamic conformational ensembles and complex biomolecular interactions. This review systematically examines the paradigm shift in AI-driven protein science across five interconnected dimensions: unified multimodal representations that integrate sequences, geometries, and textual knowledge; refinement of static prediction through MSA-free architectures and all-atom complex modeling; generative frameworks, including diffusion models and flow matching, that capture conformational distributions consistent with thermodynamic ensembles; prediction of heterogeneous interactions spanning protein-ligand, protein-nucleic acid, and protein-protein complexes; and functional inference of fitness landscapes, mutational effects, and text-guided property prediction. We critically analyze current bottlenecks, including data distribution biases, limited mechanistic interpretability, and the disconnect between geometric metrics and biophysical reality, while identifying future directions toward physically consistent generative models, multimodal foundation architectures, and experimental closed-loop systems. This methodological transformation marks artificial intelligence's transition from a structural analysis tool into a universal simulator capable of understanding and ultimately rewriting the dynamic language of life.

[254]  arXiv:2603.18507 [pdf, ps, other]
Title: Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM
Subjects: Artificial Intelligence (cs.AI)

Persona prompting can steer LLM generation towards a domain-specific tone and pattern. This behavior enables use cases in multi-agent systems where diverse interactions are crucial and human-centered tasks require high-level human alignment. Prior works provide mixed opinions on their utility: some report performance gains when using expert personas for certain domains and their contribution to data diversity in synthetic data creation, while others find near-zero or negative impact on general utility. To fully leverage the benefits of the LLM persona and avoid its harmfulness, a more comprehensive investigation of the mechanism is crucial. In this work, we study how model optimization, task type, prompt length, and placement can impact expert persona effectiveness across instruction-tuned and reasoning LLMs, and provide insight into conditions under which expert personas fail and succeed. Based on our findings, we developed a pipeline to fully leverage the benefits of an expert persona, named PRISM (Persona Routing via Intent-based Self-Modeling), which self-distills an intent-conditioned expert persona into a gated LoRA adapter through a bootstrapping process that requires no external data, models, or knowledge. PRISM enhances human preference and safety alignment on generative tasks while maintaining accuracy on discriminative tasks across all models, with minimal memory and computing overhead.

[255]  arXiv:2603.18508 [pdf, ps, other]
Title: Foundations and Architectures of Artificial Intelligence for Motor Insurance
Comments: 173 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

This handbook presents a systematic treatment of the foundations and architectures of artificial intelligence for motor insurance, grounded in large-scale real-world deployment. It formalizes a vertically integrated AI paradigm that unifies perception, multimodal reasoning, and production infrastructure into a cohesive intelligence stack for automotive risk assessment and claims processing. At its core, the handbook develops domain-adapted transformer architectures for structured visual understanding, relational vehicle representation learning, and multimodal document intelligence, enabling end-to-end automation of vehicle damage analysis, claims evaluation, and underwriting workflows. These components are composed into a scalable pipeline operating under practical constraints observed in nationwide motor insurance systems in Thailand. Beyond model design, the handbook emphasizes the co-evolution of learning algorithms and MLOps practices, establishing a principled framework for translating modern artificial intelligence into reliable, production-grade systems in high-stakes industrial environments.

[256]  arXiv:2603.18510 [pdf, ps, other]
Title: OnlinePG: Online Open-Vocabulary Panoptic Mapping with 3D Gaussian Splatting
Comments: CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Open-vocabulary scene understanding with online panoptic mapping is essential for embodied applications to perceive and interact with environments. However, existing methods are predominantly offline or lack instance-level understanding, limiting their applicability to real-world robotic tasks. In this paper, we propose OnlinePG, a novel and effective system that integrates geometric reconstruction and open-vocabulary perception using 3D Gaussian Splatting in an online setting. Technically, to achieve online panoptic mapping, we employ an efficient local-to-global paradigm with a sliding window. To build a local consistency map, we construct a 3D segment clustering graph that jointly leverages geometric and semantic cues, fusing inconsistent segments within the sliding window into complete instances. Subsequently, to update the global map, we construct explicit grids with spatial attributes for the local 3D Gaussian map and fuse them into the global map via robust bidirectional bipartite 3D Gaussian instance matching. Finally, we utilize the fused VLM features inside the 3D spatial attribute grids to achieve open-vocabulary scene understanding. Extensive experiments on widely used datasets demonstrate that our method achieves better performance among online approaches, while maintaining real-time efficiency.

[257]  arXiv:2603.18513 [pdf, ps, other]
Title: CAFlow: Adaptive-Depth Single-Step Flow Matching for Efficient Histopathology Super-Resolution
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In digital pathology, whole-slide images routinely exceed gigapixel resolution, making computationally intensive generative super-resolution (SR) impractical for routine deployment. We introduce CAFlow, an adaptive-depth single-step flow-matching framework that routes each image tile to the shallowest network exit that preserves reconstruction quality. CAFlow performs flow matching in pixel-unshuffled rearranged space, reducing spatial computation by 16x while enabling direct inference. We show that dedicating half of training to exact t=0 samples is essential for single-step quality (-1.5 dB without it). The backbone, FlowResNet (1.90M parameters), mixes convolution and window self-attention blocks across four early exits spanning 3.1 to 13.3 GFLOPs. A lightweight exit classifier (~6K parameters) achieves 33% compute savings at only 0.12 dB cost. On multi-organ histopathology x4 SR, adaptive routing achieves 31.72 dB PSNR versus 31.84 dB at full depth, while the shallowest exit exceeds bicubic by +1.9 dB at 2.8x less compute than SwinIR-light. The method generalizes to held-out colon tissue with minimal quality loss (-0.02 dB), and at x8 upscaling it outperforms all comparable-compute baselines while remaining competitive with the much larger SwinIR-Medium model. Downstream nuclei segmentation confirms preservation of clinically relevant structure. The model trains in under 5 hours on a single GPU, and adaptive routing can reduce whole-slide inference from minutes to seconds.
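The 16x reduction in spatial computation mentioned above is consistent with a pixel-unshuffle factor of 4, since 4 x 4 = 16. The abstract does not state the factor, so that value is an assumption here. A minimal single-channel sketch in plain Python, where `pixel_unshuffle` and the toy `tile` are hypothetical names used only for illustration:

```python
from typing import List

Image = List[List[float]]


def pixel_unshuffle(image: Image, factor: int) -> List[Image]:
    """Rearrange an H x W image into factor**2 channels of size (H/factor) x (W/factor)."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by factor")
    channels = []
    # Each (dy, dx) offset inside a factor x factor block becomes its own channel.
    for dy in range(factor):
        for dx in range(factor):
            channels.append([
                [image[y * factor + dy][x * factor + dx] for x in range(width // factor)]
                for y in range(height // factor)
            ])
    return channels


# Toy 8 x 8 tile whose pixel value encodes its position (value = 8 * row + column).
tile = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
channels = pixel_unshuffle(tile, factor=4)  # factor=4 is an assumed value

# Ratio of spatial positions before and after the rearrangement.
spatial_reduction = (len(tile) * len(tile[0])) / (len(channels[0]) * len(channels[0][0]))
```

No pixel values are discarded: the rearrangement moves spatial positions into the channel dimension, so layers operating on the rearranged tensor visit 16 times fewer spatial locations.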

[258]  arXiv:2603.18516 [pdf, ps, other]
Title: Total Recall QA: A Verifiable Evaluation Suite for Deep Research Agents
Comments: 7 pages, 4 figures
Subjects: Information Retrieval (cs.IR)

Deep research agents have emerged as LLM-based systems designed to perform multi-step information seeking and reasoning over large, open-domain sources to answer complex questions by synthesizing information from multiple information sources. Given the complexity of the task and despite various recent efforts, evaluation of deep research agents remains fundamentally challenging. This paper identifies a list of requirements and optional properties for evaluating deep research agents. We observe that existing benchmarks do not satisfy all identified requirements. Inspired by prior research on TREC Total Recall Tracks, we introduce the task of Total Recall Question Answering and develop a framework for deep research agents evaluation that satisfies the identified criteria. Our framework constructs single-answer, total recall queries with precise evaluation and relevance judgments derived from a structured knowledge base paired with a text corpus, enabling large-scale data construction. Using this framework, we build TRQA, a deep research benchmark constructed from Wikidata-Wikipedia as a real-world source and a synthetically generated e-commerce knowledge base and corpus to mitigate the effects of data contamination. We benchmark the collection with representative retriever and deep research models and establish baseline retrieval and end-to-end results for future comparative evaluation.

[259]  arXiv:2603.18520 [pdf, ps, other]
Title: Robotic Agentic Platform for Intelligent Electric Vehicle Disassembly
Subjects: Robotics (cs.RO)

Electric vehicles (EVs) create an urgent need for scalable battery recycling, yet disassembly of EV battery packs remains largely manual due to high design variability. We present our Robotic Agentic Platform for Intelligent Disassembly (RAPID), designed to investigate perception-driven manipulation, flexible automation, and AI-assisted robot programming in realistic recycling scenarios. The system integrates a gantry-mounted industrial manipulator, RGB-D perception, and an automated nut-running tool for fastener removal on a full-scale EV battery pack. An open-vocabulary object detection pipeline achieves 0.9757 mAP50, enabling reliable identification of screws, nuts, busbars, and other components. We experimentally evaluate (n=204) three one-shot fastener removal strategies: taught-in poses (97% success rate, 24 min duration), one-shot vision execution (57%, 29 min), and visual servoing (83%, 36 min), comparing success rate and disassembly time for the battery's top cover fasteners. To support flexible interaction, we introduce agentic AI specifications for robotic disassembly tasks, allowing LLM agents to translate high-level instructions into robot actions through structured tool interfaces and ROS services. We evaluate SmolAgents with GPT-4o-mini and Qwen 3.5 9B/4B on edge hardware. Tool-based interfaces achieve 100% task completion, while automatic ROS service discovery shows 43.3% failure rates, highlighting the need for structured robot APIs for reliable LLM-driven control. This open-source platform enables systematic investigation of human-robot collaboration, agentic robot programming, and increasingly autonomous disassembly workflows, providing a practical foundation for research toward scalable robotic battery recycling.

[260]  arXiv:2603.18523 [pdf, ps, other]
Title: Counting Circuits: Mechanistic Interpretability of Visual Reasoning in Large Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Counting serves as a simple but powerful test of a Large Vision-Language Model's (LVLM's) reasoning; it forces the model to identify each individual object and then add them all up. In this study, we investigate how LVLMs implement counting using controlled synthetic and real-world benchmarks, combined with mechanistic analyses. Our results show that LVLMs display a human-like counting behavior, with precise performance on small numerosities and noisy estimation for larger quantities. We introduce two novel interpretability methods, Visual Activation Patching and HeadLens, and use them to uncover a structured "counting circuit" that is largely shared across a variety of visual reasoning tasks. Building on these insights, we propose a lightweight intervention strategy that exploits simple and abundantly available synthetic images to fine-tune arbitrary pretrained LVLMs exclusively on counting. Despite the narrow scope of this fine-tuning, the intervention not only enhances counting accuracy on in-distribution synthetic data, but also yields an average improvement of +8.36% on out-of-distribution counting benchmarks and an average gain of +1.54% on complex, general visual reasoning tasks for Qwen2.5-VL. These findings highlight the central, influential role of counting in visual reasoning and suggest a potential pathway for improving overall visual reasoning capabilities through targeted enhancement of counting mechanisms.

[261]  arXiv:2603.18524 [pdf, ps, other]
Title: 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model
Comments: Project page: this https URL Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Creating dynamic, view-consistent videos of customized subjects is highly sought after for a wide range of emerging applications, including immersive VR/AR, virtual production, and next-generation e-commerce. However, despite rapid progress in subject-driven video generation, existing methods predominantly treat subjects as 2D entities, focusing on transferring identity through single-view visual features or textual prompts. Because real-world subjects are inherently 3D, applying these 2D-centric approaches to 3D object customization reveals a fundamental limitation: they lack the comprehensive spatial priors necessary to reconstruct the 3D geometry. Consequently, when synthesizing novel views, they must rely on generating plausible but arbitrary details for unseen regions, rather than preserving the true 3D identity. Achieving genuine 3D-aware customization remains challenging due to the scarcity of multi-view video datasets. While one might attempt to fine-tune models on limited video sequences, this often leads to temporal overfitting. To resolve these issues, we introduce a novel framework for 3D-aware video customization, comprising 3DreamBooth and 3Dapter. 3DreamBooth decouples spatial geometry from temporal motion through a 1-frame optimization paradigm. By restricting updates to spatial representations, it effectively bakes a robust 3D prior into the model without the need for exhaustive video-based training. To enhance fine-grained textures and accelerate convergence, we incorporate 3Dapter, a visual conditioning module. Following single-view pre-training, 3Dapter undergoes multi-view joint optimization with the main generation branch via an asymmetrical conditioning strategy. This design allows the module to act as a dynamic selective router, querying view-specific geometric hints from a minimal reference set. Project page: https://ko-lani.github.io/3DreamBooth/

[262]  arXiv:2603.18526 [pdf, ps, other]
Title: Rethink Web Service Resilience in Space: A Radiation-Aware and Sustainable Transmission Solution
Comments: This paper has been accepted at WWW 2026
Subjects: Multimedia (cs.MM)

Low Earth Orbit (LEO) satellite networks such as Starlink and Project Kuiper are increasingly integrated with cloud infrastructures, forming an important internet backbone for global web services. By extending connectivity to remote regions, oceans, and disaster zones, these networks enable reliable access to applications ranging from real-time WebRTC communication to emergency response portals. Yet the resilience of these web services is threatened by space radiation: it degrades hardware, drains batteries, and disrupts continuity, even if the space-cloud integrated providers use machine learning to analyze space weather and radiation data. Specifically, conventional fixes like altitude adjustments and thermal annealing consume energy; neglecting this energy use results in deep discharge and faster battery aging, whereas sleep modes risk abrupt web session interruptions. Efficient network-layer mitigation remains a critical gap. We propose RALT (Radiation-Aware LEO Transmission), a control-plane solution that dynamically reroutes traffic during radiation events, accounting for energy constraints to minimize battery degradation and sustain service performance. Our work shows that unlocking space-based web services' full potential for global reliable connectivity requires rethinking resilience through the lens of the space environment itself.

[263]  arXiv:2603.18527 [pdf, ps, other]
Title: Born-Series-Inspired Residual Metric for Learning-based Preconditioners
Comments: 18 pages, 7 figures
Subjects: Numerical Analysis (math.NA)

Loss functions for learning-based PDE preconditioners implicitly choose a metric in which residuals are matched, yet most approaches still optimize an unpreconditioned Euclidean residual norm. For indefinite operators such as the high-frequency Helmholtz equation, this default metric can make both learning and iterative correction overly sensitive to near-resonant spectral components, while classical preconditioning succeeds precisely by reshaping the residual geometry. We show that the Born Series and shifted-Laplacian left preconditioning are linked by the identity $I-G_\eta V_\eta = G_\eta A = L_\eta^{-1}A$, which turns the reference Green operator $G_\eta$ into a natural Riesz-map residual metric $R_\eta = G_\eta^\ast G_\eta$ and suggests measuring the physical residual via $\|r\|_{R_\eta}=\|G_\eta r\|_2$. Building on this viewpoint, we propose a Neural Preconditioned Born Series (NPBS) iteration that replaces the scalar CBS relaxation with a residual-driven neural operator, together with a metric-matched Born-series-inspired loss $\mathcal{L}_{\mathrm{bs}}^{R_\eta}$. The framework is architecture-agnostic and supports fast $\mathcal{O}(N\log N)$ evaluation via FFT/DST/DCT. Numerical experiments on heterogeneous Helmholtz problems demonstrate the effectiveness of our method, and its advantage becomes more pronounced as the systems grow more ill-conditioned; we then extend the framework to other PDE classes, including convection--diffusion--reaction equations and linearized Newton systems for nonlinear PDEs, where it also yields substantial iteration reductions.
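The identity and the residual norm quoted in this abstract can be checked in two lines. The abstract does not define the operator splitting, so this sketch assumes $A = L_\eta - V_\eta$ with $G_\eta = L_\eta^{-1}$, which is the form the stated identity implies:

```latex
\begin{align*}
G_\eta A &= G_\eta \,(L_\eta - V_\eta) = I - G_\eta V_\eta, \\
\|r\|_{R_\eta}^2 &= \langle r,\; G_\eta^\ast G_\eta\, r \rangle
                  = \langle G_\eta r,\; G_\eta r \rangle
                  = \|G_\eta r\|_2^2 .
\end{align*}
```

Under that assumption, measuring the residual in the $R_\eta$ metric is the same as taking the Euclidean norm of the left-preconditioned residual $G_\eta r$.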

[264]  arXiv:2603.18528 [pdf, ps, other]
Title: Correlation-Weighted Multi-Reward Optimization for Compositional Generation
Subjects: Artificial Intelligence (cs.AI)

Text-to-image models produce images that align well with natural language prompts, but compositional generation has long been a central challenge. Models often struggle to satisfy multiple concepts within a single prompt, frequently omitting some concepts and resulting in partial success. Such failures highlight the difficulty of jointly optimizing multiple concepts during reward optimization, where competing concepts can interfere with one another. To address this limitation, we propose Correlation-Weighted Multi-Reward Optimization, a framework that leverages the correlation structure among concept rewards to adaptively weight each attribute concept in optimization. By accounting for interactions among concepts, our method balances competing reward signals and emphasizes concepts that are partially satisfied yet inconsistently generated across samples, improving compositional generation. Specifically, we decompose multi-concept prompts into pre-defined concept groups (e.g., objects, attributes, and relations) and obtain reward signals from dedicated reward models for each concept. We then adaptively reweight these rewards, assigning higher weights to conflicting or hard-to-satisfy concepts using correlation-based difficulty estimation. By focusing optimization on the most challenging concepts within each group, our method encourages the model to consistently satisfy all requested attributes simultaneously. We apply our approach to train state-of-the-art diffusion models, SD3.5 and FLUX.1-dev, and demonstrate consistent improvements on challenging multi-concept benchmarks, including ConceptMix, GenEval 2, and T2I-CompBench.

[265]  arXiv:2603.18530 [pdf, ps, other]
Title: When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Large language models (LLMs) are increasingly used for high-stakes decisions, yet their susceptibility to spurious features remains poorly characterized. We introduce ICE-Guard, a framework applying intervention consistency testing to detect three types of spurious feature reliance: demographic (name/race swaps), authority (credential/prestige swaps), and framing (positive/negative restatements). Across 3,000 vignettes spanning 10 high-stakes domains, we evaluate 11 LLMs from 8 families and find that (1) authority bias (mean 5.8%) and framing bias (5.0%) substantially exceed demographic bias (2.2%), challenging the field's narrow focus on demographics; (2) bias concentrates in specific domains -- finance shows 22.6% authority bias while criminal justice shows only 2.8%; (3) structured decomposition, where the LLM extracts features and a deterministic rubric decides, reduces flip rates by up to 100% (median 49% across 9 models). We demonstrate an ICE-guided detect-diagnose-mitigate-verify loop achieving cumulative 78% bias reduction via iterative prompt patching. Validation against real COMPAS recidivism data shows COMPAS-derived flip rates exceed pooled synthetic rates, suggesting our benchmark provides a conservative estimate of real-world bias. Code and data are publicly available.
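The flip rates reported above can be illustrated with a small sketch. It assumes a flip rate is the fraction of (original, intervened) vignette pairs on which the model's decision changes; the abstract does not spell out the exact formula. `flip_rate`, `toy_decider`, and the example vignettes are hypothetical and do not come from ICE-Guard:

```python
from typing import Callable, Iterable, Tuple


def flip_rate(
    decide: Callable[[str], str],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (original, intervened) pairs whose decision differs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one vignette pair")
    flips = sum(1 for original, swapped in pairs if decide(original) != decide(swapped))
    return flips / len(pairs)


# Hypothetical stand-in for an LLM call: it is swayed by a credential ("Dr.")
# but not by the applicant's name.
def toy_decider(vignette: str) -> str:
    return "approve" if "Dr." in vignette else "review"


toy_pairs = [
    # Demographic-style intervention: only the name changes.
    ("Loan request from Alex, income 50k.", "Loan request from Sam, income 50k."),
    # Authority-style intervention: a credential is added.
    ("Loan request from Alex, income 50k.", "Loan request from Dr. Alex, income 50k."),
]
toy_rate = flip_rate(toy_decider, toy_pairs)
```

In this toy setup only the credential swap changes the decision, so one of the two pairs flips. A decider that depends solely on task-relevant features would have a flip rate of zero on the same pairs.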

[266]  arXiv:2603.18532 [pdf, ps, other]
Title: Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The strong performance of large vision-language models (VLMs) trained with reinforcement learning (RL) has motivated similar approaches for fine-tuning vision-language-action (VLA) models in robotics. Many recent works fine-tune VLAs directly in the real world to avoid addressing the sim-to-real gap. While real-world RL circumvents sim-to-real issues, it inherently limits the generality of the resulting VLA, as scaling scene and object diversity in the physical world is prohibitively difficult. This leads to the paradoxical outcome of transforming a broadly pretrained model into an overfitted, scene-specific policy. Training in simulation can instead provide access to diverse scenes, but designing those scenes is also costly. In this work, we show that VLAs can be RL fine-tuned without sacrificing generality and with reduced labor by leveraging 3D world generative models. Using these models together with a language-driven scene designer, we generate hundreds of diverse interactive scenes containing unique objects and backgrounds, enabling scalable and highly parallel policy learning. Starting from a pretrained imitation baseline, our approach increases simulation success from 9.7% to 79.8% while achieving a 1.25$\times$ speedup in task completion time. We further demonstrate successful sim-to-real transfer enabled by the quality of the generated digital twins together with domain randomization, improving real-world success from 21.7% to 75% and achieving a 1.13$\times$ speedup. Finally, we further highlight the benefits of leveraging the effectively unlimited data from 3D world generative models through an ablation study showing that increasing scene diversity directly improves zero-shot generalization.

[267]  arXiv:2603.18533 [pdf, ps, other]
Title: Balancing the Reasoning Load: Difficulty-Differentiated Policy Optimization with Length Redistribution for Efficient and Robust Reinforcement Learning
Comments: 13 pages
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Large Reasoning Models (LRMs) have shown exceptional reasoning capabilities, but they also suffer from the issue of overthinking, often generating excessively long and redundant answers. For problems that exceed the model's capabilities, LRMs tend to exhibit the overconfidence phenomenon, generating overly short but incorrect answers, which may contribute to suboptimal performance. To address these issues, we propose Difficulty-Differentiated Policy Optimization (DDPO), an efficient reinforcement learning algorithm that optimizes simple and complex tasks separately based on the overconfidence phenomenon. Specifically, it reduces the output length for simple tasks without compromising accuracy, while for complex tasks, it expands the exploration space to improve performance. We further derive the theoretical conditions for maximizing expected accuracy, which require the length distribution to closely approximate the optimal length and be as concentrated as possible. Based on these conditions, we propose using the difficulty-level average as a well-founded reference for length optimization. Extensive experiments on both in-domain and out-of-domain benchmarks validate the superiority and effectiveness of DDPO. Compared to GRPO, DDPO reduces the average answer length by 12% while improving accuracy by 1.85% across multiple benchmarks, achieving a better trade-off between accuracy and length. The code is available at https://github.com/Yinan-Xia/DDPO.

[268]  arXiv:2603.18534 [pdf, ps, other]
Title: Data-efficient pre-training by scaling synthetic megadocs
Subjects: Machine Learning (cs.LG)

Synthetic data augmentation has emerged as a promising solution when pre-training is constrained by data rather than compute. We study how to design synthetic data algorithms that achieve better loss scaling: not only lowering loss at finite compute but especially as compute approaches infinity. We first show that pre-training on web data mixed with synthetically generated rephrases improves i.i.d. validation loss on the web data, despite the synthetic data coming from an entirely different distribution. With optimal mixing and epoching, loss and benchmark accuracy improve without overfitting as the number of synthetic generations grows, plateauing near $1.48\times$ data efficiency at 32 rephrases per document. We find even better loss scaling under a new perspective: synthetic generations from the same document can form a single substantially longer megadocument instead of many short documents. We show two ways to construct megadocs: stitching synthetic rephrases from the same web document or stretching a document by inserting rationales. Both methods improve i.i.d. loss, downstream benchmarks, and especially long-context loss relative to simple rephrasing, increasing data efficiency from $1.48\times$ to $1.80\times$ at $32$ generations per document. Importantly, the improvement of megadocs over simple rephrasing widens as more synthetic data is generated. Our results show how to design synthetic data algorithms that benefit more from increasing compute when data-constrained.
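The "stitching" construction described above can be sketched as grouping rephrases by source document and concatenating them. The separator, the ordering, and whether the original document is included are assumptions here, since the abstract does not specify them. `stitch_megadocs` and the sample records are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def stitch_megadocs(
    rephrases: Iterable[Tuple[str, str]],
    separator: str = "\n\n",
) -> Dict[str, str]:
    """Join all (doc_id, rephrase_text) records that share a doc_id into one long document."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc_id, text in rephrases:
        grouped[doc_id].append(text)  # keeps the order in which rephrases arrive
    return {doc_id: separator.join(texts) for doc_id, texts in grouped.items()}


# Hypothetical synthetic rephrases, keyed by the web document they came from.
records = [
    ("doc-a", "First rephrase of A."),
    ("doc-b", "Only rephrase of B."),
    ("doc-a", "Second rephrase of A."),
]
megadocs = stitch_megadocs(records)
```

The contrast with simple rephrasing is that each source document yields one long training sequence instead of many short ones.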

[269]  arXiv:2603.18535 [pdf, ps, other]
Title: Align-to-Scale: Mode Switching Technique for Unimanual 3D Object Manipulation with Gaze-Hand-Object Alignment in Extended Reality
Comments: 19 pages, 6 figures, Presented at ACM ETRA 2026
Subjects: Human-Computer Interaction (cs.HC)

As extended reality (XR) technologies rapidly become as ubiquitous as today's mobile devices, supporting one-handed interaction becomes essential for XR. However, the prevalent Gaze + Pinch interaction model only partially supports unimanual interaction: users can select, move, and rotate objects with one hand, but scaling typically requires both hands. In this work, we leverage the spatial alignment between gaze and hand as a mode switch to enable single-handed pinch-to-scale. We design and evaluate several techniques geared for one-handed scaling and assess their usability in a compound translate-scale task. Our findings show that all proposed methods effectively enable one-handed scaling, but each method offers distinct advantages and trade-offs. To this end, we derive design guidelines to support future 3D interfaces with unimanual interaction. Our work helps make eye-hand 3D interaction in XR more mobile, flexible, and accessible.

[270]  arXiv:2603.18538 [pdf, ps, other]
Title: Beyond Passive Aggregation: Active Auditing and Topology-Aware Defense in Decentralized Federated Learning
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

Decentralized Federated Learning (DFL) remains highly vulnerable to adaptive backdoor attacks designed to bypass traditional passive defense metrics. To address this limitation, we shift the defensive paradigm toward a novel active, interventional auditing framework. First, we establish a dynamical model to characterize the spatiotemporal diffusion of adversarial updates across complex graph topologies. Second, we introduce a suite of proactive auditing metrics: stochastic entropy anomaly, randomized smoothing Kullback-Leibler divergence, and activation kurtosis. These metrics utilize private probes to stress-test local models, effectively exposing latent backdoors that remain invisible to conventional static detection. Furthermore, we implement a topology-aware defense placement strategy to maximize global aggregation resilience. We establish theoretical convergence properties of the system under co-evolving attack and defense dynamics. Empirical evaluations across diverse architectures demonstrate that our active framework is highly competitive with state-of-the-art defenses in mitigating stealthy, adaptive backdoors while preserving primary task utility.

[271]  arXiv:2603.18539 [pdf, ps, other]
Title: iSatCR: Graph-Empowered Joint Onboard Computing and Routing for LEO Data Delivery
Comments: 14 pages, 9 figures
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

Sending massive Earth observation data produced by low Earth orbit (LEO) satellites back to the ground for processing consumes a large amount of on-orbit bandwidth and exacerbates the space-to-ground link bottleneck. Most prior work has concentrated on optimizing the routing of raw data within the constellation, yet cannot cope with the surge in data volume. Recently, advances in onboard computing have made it possible to process data in situ, thus significantly reducing the data volume to be transmitted. In this paper, we present iSatCR, a distributed graph-based approach that jointly optimizes onboard computing and routing to boost transmission efficiency. Within iSatCR, we design a novel graph embedding utilizing shifted feature aggregation and distributed message passing to capture satellite states, and then propose a distributed graph-based deep reinforcement learning algorithm that derives joint computing-routing strategies under constrained on-board storage to handle the complexity and dynamics of LEO networks. Extensive experiments show iSatCR outperforms baselines, particularly under high load.

[272]  arXiv:2603.18540 [pdf, ps, other]
Title: GAPSL: A Gradient-Aligned Parallel Split Learning on Heterogeneous Data
Comments: 13 pages, 21 figures
Subjects: Machine Learning (cs.LG)

The increasing complexity of neural networks poses significant challenges for democratizing federated learning (FL) on resource-constrained client devices. Parallel split learning (PSL) has emerged as a promising solution by offloading substantial computing workload to a server via model partitioning, shrinking client-side computing load, and eliminating the client-side model aggregation for reduced communication and deployment costs. Since PSL is aggregation-free, it suffers from severe training divergence stemming from gradient directional inconsistency across clients. To address this challenge, we propose GAPSL, a gradient-aligned PSL framework that comprises two key components: leader gradient identification (LGI) and gradient direction alignment (GDA). LGI dynamically selects a set of directionally consistent client gradients to construct a leader gradient that captures the global convergence trend. GDA employs a direction-aware regularization to align each client's gradient with the leader gradient, thereby mitigating inter-device gradient directional inconsistency and enhancing model convergence. We evaluate GAPSL on a prototype computing testbed. Extensive experiments demonstrate that GAPSL consistently outperforms state-of-the-art benchmarks in training accuracy and latency.

[273]  arXiv:2603.18541 [pdf, ps, other]
Title: Remedying Target-Domain Astigmatism for Cross-Domain Few-Shot Object Detection
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cross-domain few-shot object detection (CD-FSOD) aims to adapt pretrained detectors from a source domain to target domains with limited annotations, suffering from severe domain shifts and data scarcity problems. In this work, we find a previously overlooked phenomenon: models exhibit dispersed and unfocused attention in target domains, leading to imprecise localization and redundant predictions, much like a human who cannot focus on visual objects. Therefore, we call it the target-domain Astigmatism problem. Analysis of attention distances across transformer layers reveals that regular fine-tuning inherently tends to remedy this problem, but the results are still far from satisfactory; we aim to strengthen this trend in this paper. Biologically inspired by the human fovea-style visual system, we enhance the fine-tuning's inherent trend through a center-periphery attention refinement framework, which contains (1) a Positive Pattern Refinement module to reshape attention toward semantic objects using class-specific prototypes, simulating the visual center region; (2) a Negative Context Modulation module to enhance boundary discrimination by modeling background context, simulating the visual periphery region; and (3) a Textual Semantic Alignment module to strengthen center-periphery distinction through cross-modal cues. Our bio-inspired approach transforms astigmatic attention into focused patterns, substantially improving adaptation to target domains. Experiments on six challenging CD-FSOD benchmarks consistently demonstrate improved detection accuracy and establish new state-of-the-art results.

[274]  arXiv:2603.18543 [pdf, ps, other]
Title: Measuring ESG Risk in Supply Networks
Subjects: Social and Information Networks (cs.SI)

Environmental, Social and Governance (ESG) rating is a way for investors to prioritise investments in companies with good corporate behaviour. However, ESG ratings are vulnerable to greenwashing in a number of ways. In this paper we study the effect that trade with badly rated companies has on a target company's own rating. To do this we introduce a measurement framework, generalising PageRank and Alpha Centrality, which allows tuning of aggregation and path counting approaches to resist greenwashing and reflect the rater's opinions and preferences for harm accumulation. These metrics allow updating of the target's ESG rating, identification of influential neighbours and assessment of vulnerability of the target to bad behaviour in their supply network. We study these metrics on synthetic ESG interaction networks as well as a real inter-company network and the international trade network.
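
As a rough illustration of the kind of path-counting measure this abstract builds on, the sketch below iterates plain alpha centrality, x = e + alpha * A^T x, on a small supply graph. It is not the paper's generalised framework: the function name, the edge convention (adj[i][j] > 0 means company i supplies company j), and the use of the exogenous vector e as each company's own ESG harm are all assumptions made for illustration.

```python
def alpha_centrality(adj, exogenous, alpha=0.5, iters=200):
    """Iterate x <- e + alpha * A^T x on a dense adjacency matrix.

    adj[i][j] > 0 means company i supplies company j (assumed convention),
    so harm accumulated at i is passed on to j, damped by alpha per hop.
    The iteration converges only if alpha < 1 / spectral_radius(A).
    """
    n = len(adj)
    x = list(exogenous)
    for _ in range(iters):
        x = [
            exogenous[j] + alpha * sum(adj[i][j] * x[i] for i in range(n))
            for j in range(n)
        ]
    return x


# Supply chain 0 -> 1 -> 2, where only company 0 has harm of its own.
chain = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
scores = alpha_centrality(chain, exogenous=[1.0, 0.0, 0.0], alpha=0.5)
```

In this toy chain the harm reaching each company halves with every hop, which is the damping behaviour a rater would tune through alpha.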

[275]  arXiv:2603.18545 [pdf, ps, other]
Title: CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Medical vision-language models (MVLMs) are increasingly used as perceptual backbones in radiology pipelines and as the visual front end of multimodal assistants, yet their reliability under real clinical workflows remains underexplored. Prior robustness evaluations often assume clean, curated inputs or study isolated corruptions, overlooking routine acquisition, reconstruction, display, and delivery operations that preserve clinical readability while shifting image statistics. To address this gap, we propose CoDA, a chain-of-distribution framework that constructs clinically plausible pipeline shifts by composing acquisition-like shading, reconstruction and display remapping, and delivery and export degradations. Under masked structural-similarity constraints, CoDA jointly optimizes stage compositions and parameters to induce failures while preserving visual plausibility. Across brain MRI, chest X-ray, and abdominal CT, CoDA substantially degrades the zero-shot performance of CLIP-style MVLMs, with chained compositions consistently more damaging than any single stage. We also evaluate multimodal large language models (MLLMs) as technical-authenticity auditors of imaging realism and quality rather than pathology. Proprietary multimodal models show degraded auditing reliability and persistent high-confidence errors on CoDA-shifted samples, while the medical-specific MLLMs we test exhibit clear deficiencies in medical image quality auditing. Finally, we introduce a post-hoc repair strategy based on teacher-guided token-space adaptation with patch-level alignment, which improves accuracy on archived CoDA outputs. Overall, our findings characterize a clinically grounded threat surface for MVLM deployment and show that lightweight alignment improves robustness in deployment.

[276]  arXiv:2603.18546 [pdf, ps, other]
Title: HEP Statistical Inference for UAV Fault Detection: CLs, LRT, and SBI Applied to Blade Damage
Authors: Khushiyant
Comments: 12 Pages, 8 Figures
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)

This paper transfers three statistical methods from particle physics to multirotor propeller fault detection: the likelihood ratio test (LRT) for binary detection, the CLs modified frequentist method for false alarm rate control, and sequential neural posterior estimation (SNPE) for quantitative fault characterization. Operating on spectral features tied to rotor harmonic physics, the system returns three outputs: binary detection, controlled false alarm rates, and calibrated posteriors over fault severity and motor location. On UAV-FD, a hexarotor dataset of 18 real flights with 5% and 10% blade damage, leave-one-flight-out cross-validation gives AUC 0.862 +/- 0.007 (95% CI: 0.849--0.876), outperforming CUSUM (0.708 +/- 0.010), autoencoder (0.753 +/- 0.009), and LSTM autoencoder (0.551). At 5% false alarm rate the system detects 93% of significant and 81% of subtle blade damage. On PADRE, a quadrotor platform, AUC reaches 0.986 after refitting only the generative models. SNPE gives a full posterior over fault severity (90% credible interval coverage 92--100%, MAE 0.012), so the output includes uncertainty rather than just a point estimate or fault flag. Per-flight sequential detection achieves 100% fault detection with 94% overall accuracy.
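
For readers unfamiliar with the CLs method named above, the following is a minimal sketch of its textbook counting-experiment form, CLs = CL_{s+b} / CL_b, using Poisson counts. It is not the paper's spectral-feature pipeline; the function names and the reading of "background" as healthy-rotor event counts and "signal" as fault-induced counts are assumptions for illustration only.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n + 1))


def cls_value(n_obs, background, signal):
    """Modified frequentist CLs for a single counting experiment.

    CL_{s+b} is the p-value of the signal-plus-background hypothesis and
    CL_b the corresponding tail probability under background only.
    Dividing by CL_b keeps the test from rejecting a signal hypothesis
    that the data cannot actually distinguish from background.
    """
    cl_sb = poisson_cdf(n_obs, background + signal)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b


# A signal hypothesis is rejected at level 0.05 when CLs < 0.05.
cls_zero_obs = cls_value(n_obs=0, background=3.0, signal=3.0)
cls_no_signal = cls_value(n_obs=4, background=3.0, signal=0.0)
```

With zero observed counts the ratio reduces to exp(-signal) regardless of the background level, which is the conservative behaviour the method is known for.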

[277]  arXiv:2603.18548 [pdf, ps, other]
Title: SINDy-KANs: Sparse identification of non-linear dynamics through Kolmogorov-Arnold networks
Subjects: Machine Learning (cs.LG)

Kolmogorov-Arnold networks (KANs) have arisen as a potential way to enhance the interpretability of machine learning. However, solutions learned by KANs are not necessarily interpretable, in the sense of being sparse or parsimonious. Sparse identification of nonlinear dynamics (SINDy) is a complementary approach that allows for learning sparse equations for dynamical systems from data; however, learned equations are limited by the library. In this work, we present SINDy-KANs, which simultaneously train a KAN and a SINDy-like representation to increase interpretability of KAN representations with SINDy applied at the level of each activation function, while maintaining the function compositions possible through deep KANs. We apply our method to a number of symbolic regression tasks, including dynamical systems, to show accurate equation discovery across a range of systems.

[278]  arXiv:2603.18549 [pdf, ps, other]
Title: Quantifying Memory Cells Vulnerability for DRAM Security
Subjects: Cryptography and Security (cs.CR)

Dynamic Random Access Memory (DRAM) is pervasive in computer systems. Cell vulnerabilities caused by unintended phenomena (forced retention failure, latency alteration, rowhammer and rowpress) lead to unintended bit flips in memory. These phenomena have been explored as attacks to violate data integrity and confidentiality during normal operation, but also exploited as a benefit in security systems as a method to generate random secret keys and unique device fingerprints (e.g. Physically Unclonable Functions). In both cases, attackers may wish to exploit knowledge of individual cell flip vulnerability to predict the current/future data contents of a set of cells, which can be utilised to break security systems. In this work, we develop a quantitative, cell-level circuit framework that models DRAM vulnerability directly from its physical charge leakage and disturbance pathways. By linking these device-layer behaviours to system-level security properties, our framework enables systematic evaluation of DRAM with respect to volatility (retention), integrity (disturbance-induced modification), and confidentiality (pattern-dependent leakage). We further demonstrate how the framework can be applied to well-known failure modes, revealing non-uniform and context-dependent vulnerability patterns. This work provides both theoretical foundations and practical evaluation tools for evaluating the suitability of DRAM use within security applications.

[279]  arXiv:2603.18555 [pdf, ps, other]
Title: Inductance-Based Force Self-Sensing in Fiber-Reinforced Pneumatic Twisted-and-Coiled Actuators
Subjects: Robotics (cs.RO)

Fiber-reinforced pneumatic twisted-and-coiled actuators (FR-PTCAs) offer high power density and compliance, but their strong hysteresis and lack of intrinsic proprioception limit effective closed-loop control. This paper presents a self-sensing FR-PTCA integrated with a conductive nickel wire that enables intrinsic force estimation and indirect displacement inference via inductance feedback. Experimental characterization reveals that the actuator exhibits a deterministic, low-hysteresis inductance-force relationship at constant pressures, in contrast to the strongly hysteretic inductance-length behavior. Leveraging this property, this paper develops a parametric self-sensing model and a nonlinear hybrid observer that integrates an Extended Kalman Filter (EKF) with constrained optimization to resolve the ambiguity in the inductance-force mapping and estimate actuator states. Experimental results demonstrate that the proposed approach achieves force estimation accuracy comparable to that of external load cells and maintains robust performance under varying load conditions.

[280]  arXiv:2603.18556 [pdf, ps, other]
Title: Latent Factor Modeling with Expert Network for Multi-Behavior Recommendation
Subjects: Information Retrieval (cs.IR)

Traditional recommendation methods, which typically focus on modeling a single user behavior (e.g., purchase), often face severe data sparsity issues. Multi-behavior recommendation methods offer a promising solution by leveraging user data from diverse behaviors. However, most existing approaches entangle multiple behavioral factors, learning holistic but imprecise representations that fail to capture specific user intents. To address this issue, we propose a multi-behavior method by modeling latent factors with an expert network (MBLFE). In our approach, we design a gating expert network, where the expert network models all latent factors within the entire recommendation scenario, with each expert specializing in a specific latent factor. The gating network dynamically selects the optimal combination of experts for each user, enabling a more accurate representation of user preferences. To ensure independence among experts and factor consistency of a particular expert, we incorporate self-supervised learning during the training process. Furthermore, we enrich embeddings with multi-behavior data to provide the expert network with more comprehensive collaborative information for factor extraction. Extensive experiments on three real-world datasets demonstrate that our method significantly outperforms state-of-the-art baselines, validating its effectiveness.

[281]  arXiv:2603.18557 [pdf, ps, other]
Title: Cross-Lingual LLM-Judge Transfer via Evaluation Decomposition
Comments: 19 pages
Subjects: Computation and Language (cs.CL)

As large language models are increasingly deployed across diverse real-world applications, extending automated evaluation beyond English has become a critical challenge. Existing evaluation approaches are predominantly English-focused, and adapting them to other languages is hindered by the scarcity and cost of human-annotated judgments in most languages. We introduce a decomposition-based evaluation framework built around a Universal Criteria Set (UCS). UCS consists of a shared, language-agnostic set of evaluation dimensions, producing an interpretable intermediate representation that supports cross-lingual transfer with minimal supervision. Experiments on multiple faithfulness tasks across languages and model backbones demonstrate consistent improvements over strong baselines without requiring target-language annotations.

[282]  arXiv:2603.18558 [pdf, ps, other]
Title: HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Long-form video question answering requires reasoning over extended temporal contexts, making frame selection critical for large vision-language models (LVLMs) bound by finite context windows. Existing methods face a sharp trade-off: similarity-based selectors are fast but collapse compositional queries into a single dense vector, losing sub-event ordering and cross-modal bindings; agent-based methods recover this structure through iterative LVLM inference, but at prohibitive cost. We introduce HiMu, a training-free framework that bridges this gap. A single text-only LLM call decomposes the query into a hierarchical logic tree whose leaves are atomic predicates, each routed to a lightweight expert spanning vision (CLIP, open-vocabulary detection, OCR) and audio (ASR, CLAP). The resulting signals are normalized, temporally smoothed to align different modalities, and composed bottom-up through fuzzy-logic operators that enforce temporal sequencing and adjacency, producing a continuous satisfaction curve. Evaluations on Video-MME, LongVideoBench and HERBench-Lite show that HiMu advances the efficiency-accuracy Pareto front: at 16 frames with Qwen3-VL 8B it outperforms all competing selectors, and with GPT-4o it surpasses agentic systems operating at 32-512 frames while requiring roughly 10x fewer FLOPs.
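
The bottom-up composition step described above can be pictured with a small sketch over per-frame scores in [0, 1]. The abstract does not specify HiMu's operators, so the min/max connectives and the particular "then" operator below (b holds now, and a held at some strictly earlier frame) are assumptions chosen only to illustrate fuzzy-logic composition with temporal ordering.

```python
def fuzzy_and(a, b):
    """Per-frame conjunction using the minimum t-norm."""
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_or(a, b):
    """Per-frame disjunction using the maximum t-conorm."""
    return [max(x, y) for x, y in zip(a, b)]


def fuzzy_then(a, b):
    """Score for 'a happened strictly before this frame, and b holds now'."""
    out = []
    best_a = 0.0  # strongest evidence for a seen in earlier frames
    for x, y in zip(a, b):
        out.append(min(best_a, y))
        best_a = max(best_a, x)
    return out


# Hypothetical expert outputs for two atomic predicates over three frames.
door_opens = [0.9, 0.1, 0.1]
person_speaks = [0.1, 0.1, 0.8]
curve = fuzzy_then(door_opens, person_speaks)
```

Frames would then be picked where the resulting satisfaction curve is highest, here the last frame.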

[283]  arXiv:2603.18559 [pdf, ps, other]
Title: TiBCLaG: A Trigger-induced Bistable Compliant Laparoscopic Grasper
Comments: 17 pages, 13 figures
Subjects: Robotics (cs.RO)

Industrial laparoscopic graspers use multi-link rigid mechanisms manufactured to tight tolerances, resulting in high manufacturing and assembly costs. This work presents the design and proof-of-concept validation of a monolithic, fully compliant, bistable, laparoscopic grasper that eliminates the need for multiple rigid links, thereby reducing part count. The device integrates a compliant trigger and a compliant gripper end-effector, coupled via a control push-rod, to achieve stable grasping without continuous user input. The trigger mechanism is synthesized using a Two-Element Beam Constraint Model as a design framework to control the deformation and stiffness of V-beam-like elements. This technique enables elastic energy storage while preventing snap-through instability. The end-effector is designed as a compliant gripper to achieve adaptive grasping through elastic deformation. Jaws' opening-and-closing performance is demonstrated using nonlinear finite element analysis. The laparoscopic design presented here is fabricated using fused deposition 3D printing. The fabricated prototype demonstrates reliable bistable actuation, confirming the feasibility of such compliant laparoscopic grasper architectures.

[284]  arXiv:2603.18561 [pdf, ps, other]
Title: CausalVAD: De-confounding End-to-End Autonomous Driving via Causal Intervention
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Planning-oriented end-to-end driving models show great promise, yet they fundamentally learn statistical correlations instead of true causal relationships. This vulnerability leads to causal confusion, where models exploit dataset biases as shortcuts, critically harming their reliability and safety in complex scenarios. To address this, we introduce CausalVAD, a de-confounding training framework that leverages causal intervention. At its core, we design the sparse causal intervention scheme (SCIS), a lightweight, plug-and-play module to instantiate the backdoor adjustment theory in neural networks. SCIS constructs a dictionary of prototypes representing latent driving contexts. It then uses this dictionary to intervene on the model's sparse vectorized queries. This step actively removes spurious associations induced by confounders from the representations used by downstream tasks. Extensive experiments on benchmarks like nuScenes show CausalVAD achieves state-of-the-art planning accuracy and safety. Furthermore, our method demonstrates superior robustness against both data bias and noisy scenarios configured to induce causal confusion.
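
For reference, the backdoor adjustment that the abstract mentions has the standard form below, where Z is a set of confounders satisfying the backdoor criterion. Mapping Z to the prototype dictionary of latent driving contexts is an interpretation of the abstract, not a statement of the paper's exact formulation.

```latex
P\big(Y \mid \mathrm{do}(X = x)\big) = \sum_{z} P\big(Y \mid X = x, Z = z\big)\, P(Z = z)
```

The sum averages the conditional prediction over the marginal distribution of contexts rather than over the contexts that happen to co-occur with x in the data, which is what removes the confounded path.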

[285]  arXiv:2603.18563 [pdf, ps, other]
Title: Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH)

AI agents are increasingly deployed in interactive economic environments characterized by repeated AI-AI interactions. Despite AI agents' advanced capabilities, empirical studies reveal that such interactions often fail to stably induce a strategic equilibrium, such as a Nash equilibrium. Post-training methods have been proposed to induce a strategic equilibrium; however, it remains impractical to uniformly apply an alignment method across diverse, independently developed AI models in strategic settings. In this paper, we provide theoretical and empirical evidence that off-the-shelf reasoning AI agents can achieve Nash-like play zero-shot, without explicit post-training. Specifically, we prove that 'reasonably reasoning' agents, i.e., agents capable of forming beliefs about others' strategies from previous observation and learning to best respond to these beliefs, eventually behave along almost every realized play path in a way that is weakly close to a Nash equilibrium of the continuation game. In addition, we relax the common-knowledge payoff assumption by allowing stage payoffs to be unknown and by having each agent observe only its own privately realized stochastic payoffs, and we show that we can still achieve the same on-path Nash convergence guarantee. We then empirically validate the proposed theories by simulating five game scenarios, ranging from a repeated prisoner's dilemma game to stylized repeated marketing promotion games. Our findings suggest that AI agents naturally exhibit such reasoning patterns and therefore attain stable equilibrium behaviors intrinsically, obviating the need for universal alignment procedures in many real-world strategic interactions.

[286]  arXiv:2603.18564 [pdf, ps, other]
Title: Transformers Learn Robust In-Context Regression under Distributional Uncertainty
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent work has shown that Transformers can perform in-context learning for linear regression under restrictive assumptions, including i.i.d. data, Gaussian noise, and Gaussian regression coefficients. However, real-world data often violate these assumptions: the distributions of inputs, noise, and coefficients are typically unknown, non-Gaussian, and may exhibit dependency across the prompt. This raises a fundamental question: can Transformers learn effectively in-context under realistic distributional uncertainty? We study in-context learning for noisy linear regression under a broad range of distributional shifts, including non-Gaussian coefficients, heavy-tailed noise, and non-i.i.d. prompts. We compare Transformers against classical baselines that are optimal or suboptimal under the corresponding maximum-likelihood criteria. Across all settings, Transformers consistently match or outperform these baselines, demonstrating robust in-context adaptation beyond classical estimators.

[287]  arXiv:2603.18567 [pdf, ps, other]
Title: SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3. SpecForge incorporates target-draft decoupling, hybrid parallelism, optimized training kernels, and integration with production-grade inference engines, enabling up to 9.9x faster EAGLE-3 training for Qwen3-235B-A22B. In addition, we release SpecBundle, a suite of production-grade EAGLE-3 draft models trained with SpecForge for mainstream open-source LLMs. Through a systematic study of speculative decoding training recipes, SpecBundle addresses the scarcity of high-quality drafts in the community, and our draft models achieve up to 4.48x end-to-end inference speedup on SGLang, establishing SpecForge as a practical foundation for real-world speculative decoding deployment.
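
The draft-then-verify loop that speculative decoding relies on can be sketched in a few lines. This is the simple greedy-verification variant with toy next-token functions, not EAGLE-3 or SpecForge's implementation; `speculative_step`, `draft_next`, and `target_next` are hypothetical names, and a real system would verify all drafted positions in one batched forward pass of the target model.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """Run one draft-and-verify round and return the newly emitted tokens.

    draft_next / target_next map a token list to the next token (greedy).
    The output always matches what the target alone would have produced.
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposed token in order.
    accepted, ctx = [], list(prefix)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # correct the first mismatch and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target_next(ctx))
    return accepted


def toy_target(ctx):
    return (ctx[-1] + 1) % 10  # counts upward modulo 10


def toy_draft(ctx):
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10  # wrong after a 3


partial = speculative_step([1], toy_draft, toy_target, k=4)
full = speculative_step([1], toy_target, toy_target, k=4)
```

Each round emits between 1 and k + 1 tokens per target verification, so the speedup depends on how often the draft agrees with the target, which is why draft-model quality matters.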

[288]  arXiv:2603.18568 [pdf, ps, other]
Title: Some structural properties of mixed orthogonal arrays and their irredundancy
Subjects: Information Theory (cs.IT)

Mixed (asymmetric) orthogonal arrays (MOAs) generalize classical orthogonal arrays by allowing columns over different alphabets. However, their study requires very different structural tools than those used for symmetric orthogonal arrays (OAs), since several key features of the symmetric setting are no longer available in the mixed case, including Euclidean duality, a unique global index, and certain classical bounds. In this paper, we establish three structural results for mixed orthogonal arrays. First, we prove a Singleton-type upper bound and obtain a characterization of MDS and almost-MDS mixed orthogonal arrays. Second, we introduce a trace duality for $\mathbb{F}_q$-linear MOAs over $\prod_{i=1}^{s} \mathbb{F}_{q^{n_i}}$ and establish a correspondence with $\mathbb{F}_q$-linear error-block codes that determines the strength of the MOA via the dual distance of the associated error-block code. Finally, we develop a structural theory of irredundant mixed orthogonal arrays (IrMOAs), motivated by their role in the construction of $t$-uniform and absolutely maximally entangled (AME) quantum states. In the extremal case $t=\lfloor s/2\rfloor$, we prove that $\mathbb{F}_q$-linear IrMOAs with minimum index $1$ (yielding AME states of minimal support) are equivalent to $\mathbb{F}_q$-linear error-block MDS codes.

[289]  arXiv:2603.18569 [pdf, ps, other]
Title: Damage identification using noisy frequency response functions based on topology optimization
Journal-ref: Journal of Sound and Vibration, 545, 117412 (2023)
Subjects: Computational Engineering, Finance, and Science (cs.CE); Signal Processing (eess.SP)

This paper proposes a robust damage identification method using noisy frequency response functions (FRFs) and topology optimization. We formulate the damage identification problem as an inverse problem of generating the damage topology of the structure from measured dynamic responses of the structure to given external dynamic loading. The method is based on the minimization of an objective function representing the error between the FRFs of the structure measured by experimental modal analysis and those computed by harmonic response analysis using finite element analysis. In the minimization process, the material distribution, or topology, of the structure is varied using the solid isotropic material with penalization (SIMP) method, and the optimal damage topology is identified as the regions to which no material is assigned. To overcome the problems caused by the ill-posedness of the inverse problem, we propose adding least absolute shrinkage and selection operator (Lasso) regularization, that is, a penalty on the L1 norm of the design variable, to the original objective function. By applying Lasso regularization, the method is expected not only to eliminate spurious damaged regions but also to minimize the effect of measurement noise. This paper first presents the mathematical background of the proposed methodology and its numerical implementation. The method is then applied to the identification of damage in cantilevered plates, using experimentally obtained FRFs. It is shown that the method successfully identifies the damage.
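
One way to write the regularized objective described above is sketched below. The symbols are assumptions, since the abstract gives no notation: d is the vector of element-wise design variables, H^meas and H^FE are the measured and finite-element FRFs at frequency omega_k, and lambda > 0 weights the L1 penalty. The squared-error form of the misfit is also assumed.

```latex
\min_{\mathbf{d}} \; J(\mathbf{d})
  = \sum_{k} \left\| \mathbf{H}^{\mathrm{meas}}(\omega_k)
      - \mathbf{H}^{\mathrm{FE}}(\omega_k; \mathbf{d}) \right\|_2^2
  + \lambda \left\| \mathbf{d} \right\|_1
```

The L1 term drives most entries of d to exactly zero, which is why it suppresses small spurious damage indications that would otherwise be fitted to measurement noise.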

[290]  arXiv:2603.18570 [pdf, ps, other]
Title: Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Graph neural networks (GNNs) are widely used for learning from graph-structured data in domains such as social networks, recommender systems, and financial platforms. To comply with privacy regulations like the GDPR, CCPA, and PIPEDA, approximate graph unlearning, which aims to remove the influence of specific data points from trained models without full retraining, has become an increasingly important component of trustworthy graph learning. However, approximate unlearning often incurs subtle performance degradation, which may cause negative and unintended side effects. In this work, we show that such degradations can be amplified into adversarial attacks. We introduce the notion of unlearning corruption attacks, where an adversary injects carefully chosen nodes into the training graph and later requests their deletion. Because deletion requests are legally mandated and cannot be denied, this attack surface is both unavoidable and stealthy: the model performs normally during training, but accuracy collapses only after unlearning is applied. Technically, we formulate this attack as a bi-level optimization problem: to overcome the challenges of black-box unlearning and label scarcity, we approximate the unlearning process via gradient-based updates and employ a surrogate model to generate pseudo-labels for the optimization. Extensive experiments across benchmarks and unlearning algorithms demonstrate that small, carefully designed unlearning requests can induce significant accuracy degradation, raising urgent concerns about the robustness of GNN unlearning under real-world regulatory demands. The source code will be released upon paper acceptance.

[291]  arXiv:2603.18571 [pdf, ps, other]
Title: CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization
Comments: Accepted to ICLR 2026
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Quantitative Methods (q-bio.QM)

Subcellular localization is a crucial biological task for drug target identification and function annotation. Although it has been biologically realized that subcellular localization is closely associated with protein structure, no existing dataset offers comprehensive 3D structural information with detailed subcellular localization annotations, thus severely hindering the application of promising structure-based models on this task. To address this gap, we introduce a new benchmark called CAPSUL, a Comprehensive humAn Protein benchmark for SUbcellular Localization. It features a dataset that integrates diverse 3D structural representations with fine-grained subcellular localization annotations carefully curated by domain experts. We evaluate this benchmark using a variety of state-of-the-art sequence-based and structure-based models, showcasing the importance of involving structural features in this task. Furthermore, we explore reweighting and single-label classification strategies to facilitate future investigation on structure-based methods for this task. Lastly, we showcase the powerful interpretability of structure-based methods through a case study on the Golgi apparatus, where we discover a decisive localization pattern $\alpha$-helix from attention mechanisms, demonstrating the potential for bridging the gap with intuitive biological interpretability and paving the way for data-driven discoveries in cell biology.

[292]  arXiv:2603.18573 [pdf, ps, other]
Title: Interplay: Training Independent Simulators for Reference-Free Conversational Recommendation
Comments: Accepted at ECIR 2026
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Training conversational recommender systems (CRS) requires extensive dialogue data, which is challenging to collect at scale. To address this, researchers have used simulated user-recommender conversations. Traditional simulation approaches often utilize a single large language model (LLM) that generates entire conversations with prior knowledge of the target items, leading to scripted and artificial dialogues. We propose a reference-free simulation framework that trains two independent LLMs, one as the user and one as the conversational recommender. These models interact in real time without access to predetermined target items, using only preference summaries and target attributes, which lets the recommender genuinely infer user preferences through dialogue. This approach produces more realistic and diverse conversations that closely mirror authentic human-AI interactions. Our reference-free simulators match or exceed existing methods in quality, while offering a scalable solution for generating high-quality conversational recommendation data without constraining conversations to pre-defined target items. We conduct both quantitative and human evaluations to confirm the effectiveness of our reference-free approach.

[293]  arXiv:2603.18575 [pdf, ps, other]
Title: Modeling the Impacts of Swipe Delay on User Quality of Experience in Short Video Streaming
Subjects: Multimedia (cs.MM)

Short video streaming platforms have gained immense popularity in recent years, transforming the way users consume video content. A critical aspect of user interaction with these platforms is the swipe gesture, which allows users to navigate through videos seamlessly. However, the delay between a user's swipe action and the subsequent video playback can significantly impact the overall user experience. This paper presents the first systematic study investigating the effects of swipe delay on user Quality of Experience (QoE) in short video streaming. In particular, we conduct a subjective quality assessment containing 132 swipe delay patterns. The obtained results show that user experience is affected not only by the swipe delay duration, but also by the number of delays and their temporal positions. A single delay of eight seconds or longer is likely to lead to user dissatisfaction. Moreover, early-session delays are less harmful to user QoE than late-session delays. Based on the findings, we propose a novel QoE model that accurately predicts user experience based on swipe delay characteristics. The proposed model demonstrates high correlation with subjective ratings, outperforming existing models in short video streaming.

[294]  arXiv:2603.18577 [pdf, ps, other]
Title: MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning
Subjects: Artificial Intelligence (cs.AI)

Text-guided image editors can now manipulate authentic medical scans with high fidelity, enabling lesion implantation/removal that threatens clinical trust and safety. Existing defenses are inadequate for healthcare. Medical detectors are largely black-box, while MLLM-based explainers are typically post-hoc, lack medical expertise, and may hallucinate evidence on ambiguous cases. We present MedForge, a data-and-method solution for pre-hoc, evidence-grounded medical forgery detection. We introduce MedForge-90K, a large-scale benchmark of realistic lesion edits across 19 pathologies with expert-guided reasoning supervision via doctor inspection guidelines and gold edit locations. Building on it, MedForge-Reasoner performs localize-then-analyze reasoning, predicting suspicious regions before producing a verdict, and is further aligned with Forgery-aware GSPO to strengthen grounding and reduce hallucinations. Experiments demonstrate state-of-the-art detection accuracy and trustworthy, expert-aligned explanations.

[295]  arXiv:2603.18578 [pdf, ps, other]
Title: Dream the Dream: Futuring Communication between LGBTQ+ and Cisgender Groups in Metaverse
Comments: Conditionally accepted to DIS 2026
Subjects: Human-Computer Interaction (cs.HC)

Digital platforms frequently reproduce heteronormative norms and structural biases, limiting inclusive communication between LGBTQ+ and cisgender individuals. The Metaverse, with its affordances for identity fluidity, presence, and community governance, offers a promising site for reimagining such interactions. To investigate this potential, we conducted participatory design workshops involving LGBTQ+ and cisgender participants, situating them in speculative Metaverse contexts to surface barriers and co-create alternative futures. The workshops followed a three-phase process (identifying challenges, speculative problem-solving, and visualizing futures), yielding socio-spatial-technical solutions across four layers: activity, interaction, scene, and space. These findings highlight the importance of spatial cues and power dynamics in shaping digital encounters. We contribute by (1) articulating challenges of cross-group communication in virtual environments, (2) proposing inclusive design opportunities for the Metaverse, and (3) advancing principles for addressing power geometry in digital space. This work demonstrates futuring as a critical strategy for designing equitable, transformative communication infrastructures.

[296]  arXiv:2603.18579 [pdf, ps, other]
Title: ICE: Intervention-Consistent Explanation Evaluation with Statistical Grounding for LLMs
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Evaluating whether explanations faithfully reflect a model's reasoning remains an open problem. Existing benchmarks use single interventions without statistical testing, making it impossible to distinguish genuine faithfulness from chance-level performance. We introduce ICE (Intervention-Consistent Explanation), a framework that compares explanations against matched random baselines via randomization tests under multiple intervention operators, yielding win rates with confidence intervals. Evaluating 7 LLMs across 4 English tasks, 6 non-English languages, and 2 attribution methods, we find that faithfulness is operator-dependent: operator gaps reach up to 44 percentage points, with deletion typically inflating estimates on short text but the pattern reversing on long text, suggesting that faithfulness should be interpreted comparatively across intervention operators rather than as a single score. Randomized baselines reveal anti-faithfulness in one-third of configurations, and faithfulness shows zero correlation with human plausibility (|r| < 0.04). Multilingual evaluation reveals dramatic model-language interactions not explained by tokenization alone. We release the ICE framework and ICEBench benchmark.
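
As a rough illustration of what "win rates with confidence intervals" can mean in practice, the sketch below counts how often an explanation's intervention effect beats a matched random baseline and attaches a Wilson score interval. The abstract does not say how ICE computes its intervals or runs its randomization tests, so the Wilson interval, the function names, and the toy effect values are all assumptions made for illustration, not the authors' procedure.

```python
import math

def win_rate(explanation_effects, baseline_effects):
    """Fraction of examples where the explanation beats its matched random baseline."""
    wins = sum(e > b for e, b in zip(explanation_effects, baseline_effects))
    return wins / len(explanation_effects)

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed stand-in for ICE's CI)."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical per-example effects of one intervention operator (e.g. deletion).
expl = [0.9, 0.4, 0.7, 0.2]
rand = [0.3, 0.5, 0.1, 0.1]

rate = win_rate(expl, rand)                 # 3 wins out of 4 examples
low, high = wilson_interval(rate, len(expl))
print(rate, low, high)
```

Repeating this per intervention operator and comparing the resulting intervals is one way to read faithfulness comparatively rather than as a single score, which is the interpretation the abstract recommends.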

[297]  arXiv:2603.18581 [pdf, ps, other]
Title: WarPGNN: A Parametric Thermal Warpage Analysis Framework with Physics-aware Graph Neural Network
Comments: 6 Pages, ACM format
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Systems and Control (eess.SY)

With the advent of system-in-package (SiP) chiplet-based design and heterogeneous 2.5D/3D integration, thermal-induced warpage has become a critical reliability concern. While conventional numerical approaches can deliver highly accurate results, they often incur prohibitively high computational costs, limiting their scalability for complex chiplet-package systems. In this paper, we present WarPGNN, an efficient and accurate parametric thermal warpage analysis framework powered by Graph Neural Networks (GNNs). By operating directly on graphs constructed from the floorplans, WarPGNN enables fast warpage-aware floorplan exploration and exhibits strong transferability across diverse package configurations. Our method first encodes multi-die floorplans into reduced Transitive Closure Graphs (rTCGs), then a Graph Convolution Network (GCN)-based encoder extracts hierarchical structural features, followed by a U-Net inspired decoder that reconstructs warpage maps from graph feature embeddings. Furthermore, to address the long-tailed pattern of warpage data distribution, we developed a physics-informed loss and revised a message-passing encoder based on Graph Isomorphic Network (GIN) that further enhance learning performance for extreme cases and expressiveness of graph embeddings. Numerical results show that WarPGNN achieves more than 205.91x speedup compared with the 2-D efficient FEM-based method and over 119766.64x acceleration compared with the 3-D FEM method COMSOL, while maintaining comparable accuracy at only 1.26% full-scale normalized RMSE and 2.21% warpage value error. Compared with a recent DeepONet-based model, our method achieved comparable prediction accuracy and inference speedup with 3.4x lower training time. In addition, WarPGNN demonstrates remarkable transferability on unseen datasets with up to 3.69% normalized RMSE and similar runtime.

[298]  arXiv:2603.18582 [pdf, ps, other]
Title: Breaking Hard Isomorphism Benchmarks with DRESS
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Machine Learning (cs.LG)

In this paper we study the single-deletion variant $\Delta$-DRESS, part of the broader DRESS framework. We demonstrate empirically that $\Delta$-DRESS, a single level of vertex deletion applied to the DRESS graph fingerprint, achieves unique fingerprints within each tested SRG parameter family across all 51,718 non-isomorphic strongly regular graphs (SRGs) considered, spanning 16 parameter families: the complete Spence collection (12 families, 43,703 graphs on up to 64 vertices) plus four additional SRG families with up to 4,466 graphs per family. Combined with 18 additional hard graph families (102 graphs including Miyazaki, Chang, Paley, Latin square, and Steiner constructions), $\Delta$-DRESS achieves 100% within-family separation across 34 benchmark families covering 51,816 distinct graph instances, implicitly resolving over 576 million within-family non-isomorphic pairs. Moreover, the classical Rook $L_2(4)$ vs. Shrikhande pair, SRG(16,6,2,2), is known to be indistinguishable by the original 3-WL algorithm, yet $\Delta$-DRESS separates it, proving that $\Delta$-DRESS escapes the theoretical boundaries of 3-WL. The method runs in polynomial time $\mathcal{O}(n \cdot I \cdot m \cdot d_{\max})$ per graph; a streamed implementation of the combined fingerprint uses $\mathcal{O}(m + B + n)$ memory, where $B$ is the number of histogram bins, while the experiments reported here additionally retain the full deleted-subgraph multiset matrix for post-hoc analysis.

[299]  arXiv:2603.18584 [pdf, ps, other]
Title: Model Reference Adaptive Control For Gust Load Alleviation of Nonlinear Aeroelastic
Comments: 17
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Model Reference Adaptive Control based on Lyapunov stability theory is developed for gust load alleviation of nonlinear aeroelastic systems. The controller operates on a nonlinear reduced-order model derived from Taylor series expansion and eigenvector projection of the coupled fluid-structure-flight dynamic equations. The complete MRAC formulation is presented, including the reference model design that encodes desired closed-loop damping characteristics, the adaptive control law with real-time gain adjustment, and the Lyapunov derivation of the adaptation law that guarantees asymptotic tracking in the linear case and bounded tracking under a Lipschitz condition on the nonlinear residual. The adaptation rate matrix is identified as the single most important design parameter, governing the trade-off between convergence speed, peak load reduction, and actuator demand. Two test cases are considered: a 3DOF aerofoil with cubic stiffness nonlinearities, and a Global Hawk type unmanned aerial vehicle. For the UAV under discrete gusts, MRAC achieves significant wing-tip deflection reductions, outperforming the H-infinity robust control benchmark with comparable control effort. Under Von Karman stochastic turbulence, meaningful reductions are also obtained, with performance scaling with the adaptation rate. The results demonstrate that MRAC provides an effective framework for GLA of flexible aircraft operating in both deterministic and stochastic disturbance environments.

[300]  arXiv:2603.18585 [pdf, ps, other]
Title: HAViT: Historical Attention Vision Transformer
Journal-ref: 2026 IEEE Conference on Artificial Intelligence (CAI)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision Transformers have excelled in computer vision but their attention mechanisms operate independently across layers, limiting information flow and feature learning. We propose an effective cross-layer attention propagation method that preserves and integrates historical attention matrices across encoder layers, offering a principled refinement of inter-layer information flow in Vision Transformers. This approach enables progressive refinement of attention patterns throughout the transformer hierarchy, enhancing feature acquisition and optimization dynamics. The method requires minimal architectural changes, adding only attention matrix storage and blending operations. Comprehensive experiments on CIFAR-100 and TinyImageNet demonstrate consistent accuracy improvements, with ViT performance increasing from 75.74% to 77.07% on CIFAR-100 (+1.33%) and from 57.82% to 59.07% on TinyImageNet (+1.25%). Cross-architecture validation shows similar gains across transformer variants, with CaiT showing 1.01% enhancement. Systematic analysis identifies the blending hyperparameter of historical attention (alpha = 0.45) as optimal across all configurations, providing the ideal balance between current and historical attention information. Random initialization consistently outperforms zero initialization, indicating that diverse initial attention patterns accelerate convergence and improve final performance. Our code is publicly available at https://github.com/banik-s/HAViT.

[301]  arXiv:2603.18586 [pdf, ps, other]
Title: Color image restoration based on nonlocal saturation-value similarity
Authors: Wei Wang, Yakun Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose and develop a novel nonlocal variational technique based on saturation-value similarity for color image restoration. In traditional nonlocal methods, image patches are extracted directly from the red, green and blue channels of a color image, and the color information cannot be described finely because the patch similarity is mainly based on the grayscale values of independent channels. The main aim of this paper is to propose and develop a novel nonlocal regularization method by considering the similarity of image patches in the saturation-value channels of a color image. In particular, we first establish saturation-value similarity based nonlocal total variation by incorporating saturation-value similarity of color image patches into the proposed nonlocal gradients, which can describe the saturation and value similarity of two adjacent color image patches. The proposed nonlocal variational models are then formulated based on saturation-value similarity based nonlocal total variation. Moreover, we design an effective and efficient algorithm to solve the proposed optimization problem numerically by employing the Bregmanized operator splitting method, and we also study the convergence of the proposed algorithms. Numerical examples are presented to demonstrate that the performance of the proposed models is better than that of the other tested methods in terms of visual quality and some quantitative metrics including peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), quaternion structural similarity index (QSSIM) and S-CIELAB color error.

[302]  arXiv:2603.18588 [pdf, ps, other]
Title: AU Codes, Language, and Synthesis: Translating Anatomy to Text for Facial Behavior Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Facial behavior synthesis remains a critical yet underexplored challenge. While text-to-face models have made progress, they often rely on coarse emotion categories, which lack the nuance needed to capture the full spectrum of human nonverbal communication. Action Units (AUs) provide a more precise and anatomically grounded alternative. However, current AU-based approaches typically encode AUs as one-hot vectors, modeling compound expressions as simple linear combinations of individual AUs. This linearity becomes problematic when handling conflicting AUs--defined as those which activate the same facial muscle with opposing actions. Such cases lead to anatomically implausible artifacts and unnatural motion superpositions. To address this, we propose a novel method that represents facial behavior through natural language descriptions of AUs. This approach preserves the expressiveness of the AU framework while enabling explicit modeling of complex and conflicting AUs. It also unlocks the potential of modern text-to-image models for high-fidelity facial synthesis. Supporting this direction, we introduce BP4D-AUText, the first large-scale text-image paired dataset for complex facial behavior. It is synthesized by applying a rule-based Dynamic AU Text Processor to the BP4D and BP4D+ datasets. We further propose VQ-AUFace, a generative model that leverages facial structural priors to synthesize realistic and diverse facial behaviors from text. Extensive quantitative experiments and user studies demonstrate that our approach significantly outperforms existing methods. It excels in generating facial expressions that are anatomically plausible, behaviorally rich, and perceptually convincing, particularly under challenging conditions involving conflicting AUs.

[303]  arXiv:2603.18589 [pdf, ps, other]
Title: Benchmarking Visual Feature Representations for LiDAR-Inertial-Visual Odometry Under Challenging Conditions
Comments: 14 pages, Published in IEEE Access 2026
Journal-ref: E. Choi et al., "Benchmarking Visual Feature Representations for LiDAR-Inertial-Visual Odometry Under Challenging Conditions," in IEEE Access, vol. 14, pp. 30186-30199, 2026
Subjects: Robotics (cs.RO)

Accurate localization in autonomous driving is critical for successful missions including environmental mapping and survivor searches. In visually challenging environments, including low-light conditions, overexposure, illumination changes, and high parallax, the performance of conventional visual odometry methods degrades significantly, undermining robust robotic navigation. Researchers have recently proposed LiDAR-inertial-visual odometry (LIVO) frameworks that integrate LiDAR, IMU, and camera sensors to address these challenges. This paper extends the FAST-LIVO2-based framework by introducing a hybrid approach that integrates direct photometric methods with descriptor-based feature matching. For the descriptor-based feature matching, this work proposes pairs of ORB with the Hamming distance, SuperPoint with SuperGlue, SuperPoint with LightGlue, and XFeat with the mutual nearest neighbor. The proposed configurations are benchmarked by accuracy, computational cost, and feature tracking stability, enabling a quantitative comparison of the adaptability and applicability of visual descriptors. The experimental results reveal that the proposed hybrid approach outperforms the conventional sparse-direct method. Although the sparse-direct method often fails to converge in regions where photometric inconsistency arises due to illumination changes, the proposed approach still maintains robust performance under the same conditions. Furthermore, the hybrid approach with learning-based descriptors enables robust and reliable visual state estimation across challenging environments.

[304]  arXiv:2603.18593 [pdf, ps, other]
Title: Language Model Maps for Prompt-Response Distributions via Log-Likelihood Vectors
Subjects: Computation and Language (cs.CL)

We propose a method that represents language models by log-likelihood vectors over prompt-response pairs and constructs model maps for comparing their conditional distributions. In this space, distances between models approximate the KL divergence between the corresponding conditional distributions. Experiments on a large collection of publicly available language models show that the maps capture meaningful global structure, including relationships to model attributes and task performance. The method also captures systematic shifts induced by prompt modifications and their approximate additive compositionality, suggesting a way to analyze and predict the effects of composite prompt operations. We further introduce pointwise mutual information (PMI) vectors to reduce the influence of unconditional distributions; in some cases, PMI-based model maps better reflect training-data-related differences. Overall, the framework supports the analysis of input-dependent model behavior.

[305]  arXiv:2603.18595 [pdf, ps, other]
Title: RUBICONe: Wireless RAFT-Unified Behaviors for Intervehicular Cooperative Operations and Negotiations
Subjects: Networking and Internet Architecture (cs.NI)

Just as Caesar declared "alea iacta est" (the die is cast) upon crossing the Rubicone river, lane change decisions in autonomous vehicles also represent critical points of no return. RUBICONe addresses this challenge by recognizing that lane change decision-making relying solely on a single vehicle's perception would be as precarious as crossing an unknown river alone. By implementing a distributed consensus framework that extends the RAFT algorithm with wireless connectivity, RUBICONe enables multiple vehicles to collectively process and aggregate their perceptions. Using multiple software-defined radio (SDR) devices as the experimental platform, this study demonstrates how consensus-based decision-making significantly reduces the impact of environmental interference and mitigates the risk of misjudgments by individual vehicles. Just as crossing the Rubicone marked a point of irrevocable action backed by collective intelligence, RUBICONe ensures that lane change decisions are made with comprehensive situational awareness and distributed consensus, showcasing the reliability gain of consensus in wireless communications.

[306]  arXiv:2603.18596 [pdf, ps, other]
Title: Elastic Weight Consolidation Done Right for Continual Learning
Comments: Accepted to CVPR 2026
Journal-ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Weight regularization methods in continual learning (CL) alleviate catastrophic forgetting by assessing and penalizing changes to important model weights. Elastic Weight Consolidation (EWC) is a foundational and widely used approach within this framework that estimates weight importance based on gradients. However, it has consistently shown suboptimal performance. In this paper, we conduct a systematic analysis of importance estimation in EWC from a gradient-based perspective. For the first time, we find that EWC's reliance on the Fisher Information Matrix (FIM) results in gradient vanishing and inaccurate importance estimation in certain scenarios. Our analysis also reveals that Memory Aware Synapses (MAS), a variant of EWC, imposes unnecessary constraints on parameters irrelevant to prior tasks, termed the redundant protection. Consequently, both EWC and its variants exhibit fundamental misalignments in estimating weight importance, leading to inferior performance. To tackle these issues, we propose the Logits Reversal (LR) operation, a simple yet effective modification that rectifies EWC's importance estimation. Specifically, reversing the logit values during the calculation of FIM can effectively prevent both gradient vanishing and redundant protection. Extensive experiments across various CL tasks and datasets show that the proposed method significantly outperforms existing EWC and its variants. Therefore, we refer to it as EWC Done Right (EWC-DR).
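
For orientation, the standard EWC penalty that this abstract analyzes can be sketched as follows. This is the textbook diagonal-Fisher form, not the paper's Logits Reversal variant, whose details the abstract does not give. The function names and the toy gradients are illustrative assumptions.

```python
def diagonal_fisher(per_sample_grads):
    """Diagonal Fisher estimate: mean squared log-likelihood gradient per parameter."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Standard EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for f, t, s in zip(fisher, theta, theta_star)
    )

# Toy gradients from the previous task (hypothetical values).
# The second parameter has zero gradient on every sample.
grads = [[0.5, 0.0], [-0.5, 0.0]]
fisher = diagonal_fisher(grads)

# Both parameters drift from the old optimum, the second one much further.
penalty = ewc_penalty([1.0, 5.0], [0.0, 0.0], fisher, lam=2.0)
print(fisher, penalty)
```

In this toy case the second parameter moves five units yet contributes nothing to the penalty, because its estimated importance is zero. That is the kind of situation in which vanishing gradients would leave a weight unprotected.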

[307]  arXiv:2603.18597 [pdf, ps, other]
Title: myMNIST: Benchmark of PETNN, KAN, and Classical Deep Learning Models for Burmese Handwritten Digit Recognition
Comments: 7 pages, 2 figures, 3 tables, Accepted to ICNLP 2026, Xi'an, China
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We present the first systematic benchmark on myMNIST (formerly BHDD), a publicly available Burmese handwritten digit dataset important for Myanmar NLP/AI research. We evaluate eleven architectures spanning classical deep learning models (Multi-Layer Perceptron, Convolutional Neural Network, Long Short-Term Memory, Gated Recurrent Unit, Transformer), recent alternatives (FastKAN, EfficientKAN), an energy-based model (JEM), and physics-inspired PETNN variants (Sigmoid, GELU, SiLU). Using Precision, Recall, F1-Score, and Accuracy as evaluation metrics, our results show that the CNN remains a strong baseline, achieving the best overall scores (F1 = 0.9959, Accuracy = 0.9970). The PETNN (GELU) model closely follows (F1 = 0.9955, Accuracy = 0.9966), outperforming LSTM, GRU, Transformer, and KAN variants. JEM, representing energy-based modeling, performs competitively (F1 = 0.9944, Accuracy = 0.9958). KAN-based models (FastKAN, EfficientKAN) trail the top performers but provide a meaningful alternative baseline (Accuracy ~0.992). These findings (i) establish reproducible baselines for myMNIST across diverse modeling paradigms, (ii) highlight PETNN's strong performance relative to classical and Transformer-based models, and (iii) quantify the gap between energy-inspired PETNNs and a true energy-based model (JEM). We release this benchmark to facilitate future research on Myanmar digit recognition and to encourage broader evaluation of emerging architectures on regional scripts.

[308]  arXiv:2603.18598 [pdf, ps, other]
Title: Complementary Text-Guided Attention for Zero-Shot Adversarial Robustness
Comments: Accepted to TPAMI 2026. arXiv admin note: substantial text overlap with arXiv:2410.21802
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Due to their impressive zero-shot capabilities, pre-trained vision-language models (e.g., CLIP) have attracted widespread attention and adoption across various domains. Nonetheless, CLIP has been observed to be susceptible to adversarial examples. Through experimental analysis, we have observed a phenomenon wherein adversarial perturbations induce shifts in text-guided attention. Building upon this observation, we propose a simple yet effective strategy: Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR). This framework incorporates two components: Local Attention Refinement Module and Global Attention Constraint Module. Our goal is to maintain the generalization of the CLIP model and enhance its adversarial robustness. Additionally, the Global Attention Constraint Module acquires text-guided attention from both the target and original models using clean examples. Its objective is to maintain model performance on clean samples while enhancing overall robustness. However, we observe that the method occasionally focuses on irrelevant or spurious features, which can lead to suboptimal performance and undermine its robustness in certain scenarios. To overcome this limitation, we further propose a novel approach called Complementary Text-Guided Attention (Comp-TGA). This method integrates two types of foreground attention: attention guided by the class prompt and reversed attention driven by the non-class prompt. These complementary attention mechanisms allow the model to capture a more comprehensive and accurate representation of the foreground. The experiments validate that TGA-ZSR and Comp-TGA yield 9.58% and 11.95% improvements, respectively, in zero-shot robust accuracy over the current state-of-the-art techniques across 16 datasets.

[309]  arXiv:2603.18599 [pdf, ps, other]
Title: SJD-PAC: Accelerating Speculative Jacobi Decoding via Proactive Drafting and Adaptive Continuation
Comments: CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Speculative Jacobi Decoding (SJD) offers a draft-model-free approach to accelerate autoregressive text-to-image synthesis. However, the high-entropy nature of visual generation yields low draft-token acceptance rates in complex regions, creating a bottleneck that severely limits overall throughput. To overcome this, we introduce SJD-PAC, an enhanced SJD framework. First, SJD-PAC employs a proactive drafting strategy to improve local acceptance rates in these challenging high-entropy regions. Second, we introduce an adaptive continuation mechanism that sustains sequence validation after an initial rejection, bypassing the need for full resampling. Working in tandem, these optimizations significantly increase the average acceptance length per step, boosting inference speed while strictly preserving the target distribution. Experiments on standard text-to-image benchmarks demonstrate that SJD-PAC achieves a $3.8\times$ speedup with lossless image quality.

[310]  arXiv:2603.18600 [pdf, ps, other]
Title: Improving Joint Audio-Video Generation with Cross-Modal Context Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The dual-stream transformer architecture-based joint audio-video generation method has become the dominant paradigm in current research. By incorporating pre-trained video diffusion models and audio diffusion models, along with a cross-modal interaction attention module, high-quality, temporally synchronized audio-video content can be generated with minimal training data. In this paper, we first revisit the dual-stream transformer paradigm and further analyze its limitations, including model manifold variations caused by the gating mechanism controlling cross-modal interactions, biases in multi-modal background regions introduced by cross-modal attention, and the inconsistencies in multi-modal classifier-free guidance (CFG) during training and inference, as well as conflicts between multiple conditions. To alleviate these issues, we propose Cross-Modal Context Learning (CCL), equipped with several carefully designed modules. Temporally Aligned RoPE and Partitioning (TARP) effectively enhances the temporal alignment between audio latent and video latent representations. The Learnable Context Tokens (LCT) and Dynamic Context Routing (DCR) in the Cross-Modal Context Attention (CCA) module provide stable unconditional anchors for cross-modal information, while dynamically routing based on different training tasks, further enhancing the model's convergence speed and generation quality. During inference, Unconditional Context Guidance (UCG) leverages the unconditional support provided by LCT to facilitate different forms of CFG, improving train-inference consistency and further alleviating conflicts. Through comprehensive evaluations, CCL achieves state-of-the-art performance compared with recent academic methods while requiring substantially fewer resources.

[311]  arXiv:2603.18601 [pdf, ps, other]
Title: From Connectivity to Multi-Orbit Intelligence: Space-Based Data Center Architectures for 6G and Beyond
Subjects: Emerging Technologies (cs.ET)

Direct handset-to-satellite (DHTS) communication is emerging as a core capability of 6G non-terrestrial networks, enabling standard devices to directly access low Earth orbit (LEO) satellites. While LEO provides the physical access layer for DHTS, large-scale device connectivity introduces challenges in mobility management, interference control, spectrum efficiency, and constellation-wide coordination. Relay-only LEO architectures are insufficient to manage massive handset access under dynamic traffic and energy constraints. This article introduces a hierarchical architecture in which direct handset-to-LEO access is supported by multi-orbit space-based data centers (SBDCs) spanning LEO, medium Earth orbit (MEO), and geostationary Earth orbit (GEO). In this framework, LEO satellites handle radio access and real-time inference, while higher orbital layers provide regional aggregation, global orchestration, and compute-aware routing. By embedding distributed in-orbit computing, energy-aware scheduling, and AI-driven hierarchical control, the constellation evolves from a passive relay network into an intelligent multi-layer system capable of supporting large-scale DHTS services. We discuss key enabling technologies, an envisioned multi-orbit integrated Earth-space compute architecture, and open research challenges in integrating multi-orbit computing, highlighting pathways toward scalable and resilient 6G DHTS networks.

[312]  arXiv:2603.18602 [pdf, ps, other]
Title: Cross-Layer Traffic Allocation and Contention Window Optimization for Wi-Fi 7 MLO: When DRL Meets LSTM
Comments: 13 pages, 9 figures
Subjects: Networking and Internet Architecture (cs.NI)

To support future diverse applications, multi-link operation (MLO) has been introduced in the Wi-Fi 7 standard (IEEE 802.11be) to enable concurrent communication over multiple frequency bands. This new capability relies on a two-tier medium access control (MAC) architecture, where the upper MAC (U-MAC) allocates traffic across links and the lower MAC (L-MAC) performs independent channel access. However, MLO optimization is challenging due to the inherent coupling between the U-MAC and L-MAC, as well as the dynamic and complex nature of wireless networks. To address these challenges, we propose a cross-layer framework that jointly optimizes traffic allocation at the U-MAC layer and initial contention window (ICW) sizes at the L-MAC layer to maximize network throughput. Specifically, we extend the single-link Bianchi Markov model to develop an analytical framework that captures the relationship among network throughput, traffic allocation, and ICW sizes. Based on this framework, we formulate a nonconvex, nonlinear cross-layer optimization problem. To solve it efficiently, we design a long short-term memory-based soft actor-critic (LSTM-SAC) algorithm that leverages LSTM to handle the partial observability and non-Markovian dynamics inherent in Wi-Fi networks. Finally, using a well-developed event-based Wi-Fi simulator, we demonstrate that the proposed LSTM-SAC substantially outperforms existing benchmark solutions across a wide range of network settings.
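
For orientation, the single-link Bianchi model that the abstract takes as its starting point can be sketched as a small fixed-point computation. This is not the authors' multi-link framework: it assumes the classic saturated single-link setting with n stations, minimum contention window W, and m backoff stages, and the function names are illustrative.

```python
def transmission_probability(p: float, w: int, m: int) -> float:
    """Per-slot transmission probability tau given collision probability p.

    Uses the non-singular form of Bianchi's expression:
    tau = 2 / (1 + W + p * W * sum_{k=0}^{m-1} (2p)^k).
    """
    geometric_sum = sum((2.0 * p) ** k for k in range(m))
    return 2.0 / (1.0 + w + p * w * geometric_sum)


def solve_bianchi(n: int, w: int, m: int, tol: float = 1e-12) -> tuple[float, float]:
    """Solve tau = g(p), p = 1 - (1 - tau)^(n - 1) by bisection on tau."""
    def residual(tau: float) -> float:
        p = 1.0 - (1.0 - tau) ** (n - 1)
        return tau - transmission_probability(p, w, m)

    lo, hi = 0.0, 1.0  # residual is negative at lo and positive at hi (for w > 1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return tau, 1.0 - (1.0 - tau) ** (n - 1)
```

Calling `solve_bianchi(10, 32, 5)` returns the pair (tau, p) for ten contending stations. A larger initial contention window lowers tau, which is the lever the ICW optimization in the abstract acts on.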

[313]  arXiv:2603.18604 [pdf, ps, other]
Title: AutORAN: LLM-driven Natural Language Programming for Agile xApp Development
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

Traditional RAN systems are closed and monolithic, stifling innovation. The openness and programmability enabled by Open Radio Access Network (O-RAN) are envisioned to revolutionize cellular networks with control-plane applications--xApps. The development of xApps (typically by third-party developers), however, remains time-consuming and cumbersome, often requiring months of manual coding and integration, which hinders the roll-out of new functionalities in practice. To lower the barrier of xApp development for both developers and network operators, we present AutORAN, the first LLM-driven natural language programming framework for agile xApps that automates the entire xApp development pipeline. In a nutshell, AutORAN turns high-level user intents into swiftly deployable xApps within minutes, eliminating the need for manual coding or testing. To this end, AutORAN builds a fully automated xApp generation pipeline, which integrates multiple functional modules (from user requirement elicitation, AI/ML function design and validation, to xApp synthesis and deployment). We design, implement, and comprehensively evaluate AutORAN on representative xApp tasks. Results show AutORAN-generated xApps can achieve similar or even better performance than the best known hand-crafted baselines. AutORAN drastically accelerates the xApp development cycle (from user intent elicitation to roll-out), streamlining O-RAN innovation.

[314]  arXiv:2603.18606 [pdf, ps, other]
Title: SQL-Commenter: Aligning Large Language Models for SQL Comment Generation with Direct Preference Optimization
Comments: Accepted to ICPC 2026
Subjects: Software Engineering (cs.SE)

SQL query comprehension is a significant challenge due to complex syntax, diverse join types, and deep nesting. Many queries lack adequate comments, severely hindering code readability, maintainability, and knowledge transfer. Automated SQL comment generation faces two main challenges: limited datasets that inadequately represent complex real-world queries, and Large Language Models' (LLMs) insufficient understanding of SQL-specific semantics. Our empirical analysis shows that even after continual pre-training and supervised fine-tuning, LLMs struggle with complex SQL semantics, yielding inaccurate comments. To address this, we propose SQL-Commenter, an advanced method based on LLaMA-3.1-8B. We first construct a comprehensive dataset of complex SQL queries with expert-verified comments. Next, we perform continual pre-training on a large SQL corpus to enhance the LLM's syntax and semantic understanding, followed by supervised fine-tuning. Finally, we introduce Direct Preference Optimization (DPO) using human feedback. SQL-Commenter utilizes a preference-based loss function to favor preferred outputs, enhancing fine-grained semantic learning and context-dependent quality assessment. Evaluated on the Spider and Bird benchmarks, SQL-Commenter significantly outperforms state-of-the-art baselines. On average, it surpasses the strongest baseline (Qwen3-14B) by 9.29, 4.99, and 13.23 percentage points on BLEU-4, METEOR, and ROUGE-L, respectively. Moreover, human evaluation demonstrates the superior quality of comments generated by SQL-Commenter in terms of correctness, completeness, and naturalness.
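
The preference-based loss mentioned above can be illustrated with the standard DPO objective for a single preference pair. This is a minimal sketch rather than the authors' training code: the function name, the default beta, and the use of summed log-probabilities as plain floats are assumptions made for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of comments.

    Inputs are summed token log-probabilities of each comment under the
    trainable policy and under the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as softplus(-x) for numerical stability
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))
```

The loss equals log 2 when the policy and the reference agree, and it falls as the policy assigns relatively more probability to the preferred comment than the reference does.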

[315]  arXiv:2603.18608 [pdf, ps, other]
Title: A Complexity Hierarchy of Shuffles in Card-Based Protocols
Subjects: Cryptography and Security (cs.CR)

Card-based cryptography uses physical playing cards to construct protocols for secure multi-party computation. Existing card-based protocols employ various types of shuffles, some of which are easy to implement in practice while others are considerably more complex. In this paper, we classify shuffle operations into several levels according to their implementation complexity. We motivate this hierarchy from both practical and theoretical perspectives, and prove separation results between several levels by showing that certain shuffles cannot be realized using only operations from lower levels. Finally, we propose a new complexity measure for evaluating card-based protocols based on this hierarchy.

[316]  arXiv:2603.18611 [pdf, ps, other]
Title: Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media
Comments: Accepted at WWW 2026
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Advances in social media data dissemination enable the provision of real-time information during a crisis. The information falls into different classes, such as infrastructure damage, persons missing or stranded in the affected zone, etc. Existing methods attempt to classify text and images into various humanitarian categories, but their decision-making process remains largely opaque, which hinders their deployment in real-life applications. Recent work has sought to improve transparency by extracting textual rationales from tweets to explain predicted classes. However, such explainable classification methods have mostly focused on text, rather than crisis-related images. In this paper, we propose an interpretable-by-design multimodal classification framework. Our method first learns the joint representation of text and image using a visual language transformer model and extracts text rationales. Next, it extracts the image rationales via the mapping with text rationales. Our approach demonstrates how to learn rationales in one modality from another through cross-modal rationale transfer, which saves annotation effort. Finally, tweets are classified based on extracted rationales. Experiments are conducted on the CrisisMMD benchmark dataset, and results show that our proposed method boosts the classification Macro-F1 by 2-35% while extracting accurate text tokens and image patches as rationales. Human evaluation also supports the claim that our proposed method is able to retrieve better image rationale patches (12%) that help to identify humanitarian classes. Our method adapts well to new, unseen datasets in zero-shot mode, achieving an accuracy of 80%.

[317]  arXiv:2603.18612 [pdf, ps, other]
Title: DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
Comments: 6 pages, 2 figures. Submitted to Interspeech 2026
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

We introduce DiscoPhon, a multilingual benchmark for evaluating unsupervised phoneme discovery from discrete speech units. DiscoPhon covers 6 dev and 6 test languages, chosen to span a wide range of phonemic contrasts. Given only 10 hours of speech in a previously unseen language, systems must produce discrete units that are mapped to a predefined phoneme inventory, through either a many-to-one or a one-to-one assignment. The resulting sequences are evaluated for unit quality, recognition and segmentation. We provide four pretrained multilingual HuBERT and SpidR baselines, and show that phonemic information is sufficiently available in current models for derived units to correlate well with phonemes, though with variation across languages.

[318]  arXiv:2603.18613 [pdf, ps, other]
Title: Cyber-Resilient Digital Twins: Discriminating Attacks for Safe Critical Infrastructure Control
Comments: 19 Pages, 2 Figures, 12 Tables
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Industrial Cyber-Physical Systems (ICPS) face growing threats from cyber-attacks that exploit sensor and control vulnerabilities. Digital Twin (DT) technology can detect anomalies via predictive modelling, but current methods cannot distinguish attack types and often rely on costly full-system shutdowns. This paper presents i-SDT (intelligent Self-Defending DT), combining hydraulically-regularized predictive modelling, multi-class attack discrimination, and adaptive resilient control. Temporal Convolutional Networks (TCNs) with differentiable conservation constraints capture nominal dynamics and improve robustness to adversarial manipulations. A recurrent residual encoder with Maximum Mean Discrepancy (MMD) separates normal operation from single- and multi-stage attacks in latent space. When attacks are confirmed, Model Predictive Control (MPC) uses uncertainty-aware DT predictions to keep operations safe without shutdown. Evaluation on SWaT and WADI datasets shows major gains in detection accuracy, 44.1% fewer false alarms, and 56.3% lower operational costs in simulation-in-the-loop evaluation. With sub-second inference latency confirming real-time feasibility on plant-level workstations, i-SDT advances autonomous cyber-physical defense while maintaining operational resilience.

[319]  arXiv:2603.18614 [pdf, ps, other]
Title: ZEBRAARENA: A Diagnostic Simulation Environment for Studying Reasoning-Action Coupling in Tool-Augmented LLMs
Subjects: Artificial Intelligence (cs.AI)

Tool-augmented large language models (LLMs) must tightly couple multi-step reasoning with external actions, yet existing benchmarks often confound this interplay with complex environment dynamics, memorized knowledge or dataset contamination. In this paper, we introduce ZebraArena, a procedurally generated diagnostic environment for studying reasoning-action coupling in tool-augmented LLMs, with controllable difficulty and a knowledge-minimal design, which limits gains from memorization or dataset contamination. Each task in ZebraArena requires a set of critical information which is available only through targeted tool use, yielding an interpretable interface between external information acquisition and deductive reasoning. This design provides deterministic evaluation via unique solutions, and a theoretically optimal query count for measuring efficient tool use. We show that ZebraArena requires a combination of in-depth reasoning and accurate external tool calling, which remains a challenge, as frontier reasoning models such as GPT-5 and Gemini 2.5 Pro achieve only 60% accuracy on the hard instances. We also observe a persistent gap between theoretical optimality and practical tool usage. For example, GPT-5 uses 70-270% more tool calls than the theoretical optimum. We highlight the key findings in our evaluation, and hope ZebraArena stimulates further research on the interplay between internal reasoning and external action.

[320]  arXiv:2603.18616 [pdf, ps, other]
Title: Benchmarking CNN-based Models against Transformer-based Models for Abdominal Multi-Organ Segmentation on the RATIC Dataset
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate multi-organ segmentation in abdominal CT scans is essential for computer-aided diagnosis and treatment. While convolutional neural networks (CNNs) have long been the standard approach in medical image segmentation, transformer-based architectures have recently gained attention due to their ability to model long-range dependencies. In this study, we systematically benchmark the three hybrid transformer-based models UNETR, SwinUNETR, and UNETR++ against a strong CNN baseline, SegResNet, for volumetric multi-organ segmentation on the heterogeneous RATIC dataset. The dataset comprises 206 annotated CT scans from 23 institutions worldwide, covering five abdominal organs. All models were trained and evaluated under identical preprocessing and training conditions using the Dice Similarity Coefficient (DSC) as the primary metric. The results show that the CNN-based SegResNet achieves the highest overall performance, outperforming all hybrid transformer-based models across all organs. Among the transformer-based approaches, UNETR++ delivers the most competitive results, while UNETR demonstrates notably faster convergence with fewer training iterations. These findings suggest that, for small- to medium-sized heterogeneous datasets, well-optimized CNN architectures remain highly competitive and may outperform hybrid transformer-based designs.
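
The primary metric named above, the Dice Similarity Coefficient, can be sketched per organ label as follows. This is a minimal illustration on flattened voxel label lists, not the benchmark's evaluation code; the function name and the convention of returning 1.0 when an organ is absent from both volumes are assumptions.

```python
def dice_coefficient(prediction, target, label):
    """Dice Similarity Coefficient for one organ label over flattened voxel lists."""
    if len(prediction) != len(target):
        raise ValueError("prediction and target must have the same number of voxels")
    pred_mask = [p == label for p in prediction]
    true_mask = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    # Assumed convention: a label absent from both volumes counts as a perfect match
    return 1.0 if total == 0 else 2.0 * intersection / total


prediction = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
per_organ = {label: dice_coefficient(prediction, target, label) for label in (1, 2)}
```

In this toy example, label 1 scores 2/3 and label 2 scores 0.8. Averaging such per-organ scores gives one overall figure per model.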

[321]  arXiv:2603.18620 [pdf, ps, other]
Title: Learning to Self-Evolve
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We introduce Learning to Self-Evolve (LSE), a reinforcement learning framework that trains large language models (LLMs) to improve their own contexts at test time. We situate LSE in the setting of test-time self-evolution, where a model iteratively refines its context from feedback on seen problems to perform better on new ones. Existing approaches rely entirely on the inherent reasoning ability of the model and never explicitly train it for this task. LSE reduces the multi-step evolution problem to a single-step RL objective, where each context edit is rewarded by the improvement in downstream performance. We pair this objective with a tree-guided evolution loop. On Text-to-SQL generation (BIRD) and general question answering (MMLU-Redux), a 4B-parameter model trained with LSE outperforms self-evolving policies powered by GPT-5 and Claude Sonnet 4.5, as well as prompt optimization methods including GEPA and TextGrad, and transfers to guide other models without additional training. Our results highlight the effectiveness of treating self-evolution as a learnable skill.

[322]  arXiv:2603.18623 [pdf, ps, other]
Title: OpenT2M: No-frill Motion Generation with Open-source, Large-scale, High-quality Data
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Text-to-motion (T2M) generation aims to create realistic human movements from text descriptions, with promising applications in animation and robotics. Despite recent progress, current T2M models perform poorly on unseen text descriptions due to the small scale and limited diversity of existing motion datasets. To address this problem, we introduce OpenT2M, a million-level, high-quality, and open-source motion dataset containing over 2800 hours of human motion. Each sequence undergoes rigorous quality control through physical feasibility validation and multi-granularity filtering, with detailed second-wise text annotations. We also develop an automated pipeline for creating long-horizon sequences, enabling complex motion generation. Building upon OpenT2M, we introduce MonoFrill, a pretrained motion model that achieves compelling T2M results without complicated designs or technical tricks ("frills"). Its core component is 2D-PRQ, a novel motion tokenizer that captures spatiotemporal dependencies by dividing the human body into biological parts. Experiments show that OpenT2M significantly improves generalization of existing T2M models, while 2D-PRQ achieves superior reconstruction and strong zero-shot performance. We expect OpenT2M and MonoFrill will advance the T2M field by addressing longstanding data quality and benchmarking challenges.

[323]  arXiv:2603.18624 [pdf, ps, other]
Title: REST: Receding Horizon Explorative Steiner Tree for Zero-Shot Object-Goal Navigation
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Zero-shot object-goal navigation (ZSON) requires navigating unknown environments to find a target object without task-specific training. Prior hierarchical training-free solutions invest in scene understanding (belief) and high-level decision-making (policy), yet overlook the design of the option, i.e., a subgoal candidate proposed from evolving belief and presented to policy for selection. In practice, options are reduced to isolated waypoints scored independently: single destinations hide the value gathered along the journey; an unstructured collection obscures the relationships among candidates. Our insight is that the option space should be a tree of paths. Full paths expose en-route information gain that destination-only scoring systematically neglects; a tree of shared segments enables coarse-to-fine LLM reasoning that dismisses or pursues entire branches before examining individual leaves, compressing the combinatorial path space into an efficient hierarchy. We instantiate this insight in REST (Receding Horizon Explorative Steiner Tree), a training-free framework that (1) builds an explicit open-vocabulary 3D map from online RGB-D streams; (2) grows an agent-centric tree of safe and informative paths as the option space via sampling-based planning; and (3) textualizes each branch into a spatial narrative and selects the next-best path through chain-of-thought LLM reasoning. Across the Gibson, HM3D, and HSSD benchmarks, REST consistently ranks among the top methods in success rate while achieving the best or second-best path efficiency, demonstrating a favorable efficiency-success balance.

[324]  arXiv:2603.18625 [pdf, ps, other]
Title: GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?
Comments: ECCV 2026 submission. 14 pages, 6 figures, 4 tables. Supplementary material included
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, AI-generated videos have become increasingly realistic and sophisticated. Meanwhile, Large Vision-Language Models (LVLMs) have shown strong potential for detecting such content. However, existing evaluation protocols largely treat the task as a binary classification problem and rely on coarse-grained metrics such as overall accuracy, providing limited insight into where LVLMs succeed or fail. To address this limitation, we introduce GenVideoLens, a fine-grained benchmark that enables dimension-wise evaluation of LVLM capabilities in AI-generated video detection. The benchmark contains 400 highly deceptive AI-generated videos and 100 real videos, annotated by experts across 15 authenticity dimensions covering perceptual, optical, physical, and temporal cues. We evaluate eleven representative LVLMs on this benchmark. Our analysis reveals a pronounced dimensional imbalance. While LVLMs perform relatively well on perceptual cues, they struggle with optical consistency, physical interactions, and temporal-causal reasoning. Model performance also varies substantially across dimensions, with smaller open-source models sometimes outperforming stronger proprietary models on specific authenticity cues. Temporal perturbation experiments further show that current LVLMs make limited use of temporal information. Overall, GenVideoLens provides diagnostic insights into LVLM behavior, revealing key capability gaps and offering guidance for improving future AI-generated video detection systems.

[325]  arXiv:2603.18626 [pdf, ps, other]
Title: GEAR: Geography-knowledge Enhanced Analog Recognition Framework in Extreme Environments
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Mariana Trench and the Qinghai-Tibet Plateau exhibit significant similarities in geological origins and microbial metabolic functions. Given that deep-sea biological sampling faces prohibitive costs, recognizing structurally homologous terrestrial analogs of the Mariana Trench on the Qinghai-Tibet Plateau is of great significance. Yet, no existing model adequately addresses cross-domain topographic similarity retrieval, either neglecting geographical knowledge or sacrificing computational efficiency. To address these challenges, we present Geography-knowledge Enhanced Analog Recognition (GEAR) Framework, a three-stage pipeline designed to efficiently retrieve analogs from 2.5 million square kilometers of the Qinghai-Tibet Plateau: (1) Skeleton guided Screening and Clipping: Recognition of candidate valleys and initial screening based on size and linear morphological criteria. (2) Physics aware Filtering: The Topographic Waveform Comparator (TWC) and Morphological Texture Module (MTM) evaluate the waveform and texture and filter out inconsistent candidate valleys. (3) Graph based Fine Recognition: We design a Morphology-integrated Siamese Graph Network (MSG-Net) based on geomorphological metrics. Correspondingly, we release an expert-annotated topographic similarity dataset targeting tectonic collision zones. Experiments demonstrate the effectiveness of every stage. Besides, MSG-Net achieved an F1-Score 1.38 percentage points higher than the SOTA baseline. Using features extracted by MSG-Net, we discovered a significant correlation with biological data, providing evidence for future biological analysis.

[326]  arXiv:2603.18627 [pdf, ps, other]
Title: Agentic Flow Steering and Parallel Rollout Search for Spatially Grounded Text-to-Image Generation
Subjects: Artificial Intelligence (cs.AI)

Precise Text-to-Image (T2I) generation has achieved great success but is hindered by the limited relational reasoning of static text encoders and the error accumulation in open-loop sampling. Without real-time feedback, initial semantic ambiguities during the Ordinary Differential Equation trajectory inevitably escalate into stochastic deviations from spatial constraints. To bridge this gap, we introduce AFS-Search (Agentic Flow Steering and Parallel Rollout Search), a training-free closed-loop framework built upon FLUX.1-dev. AFS-Search incorporates a training-free closed-loop parallel rollout search and flow steering mechanism, which leverages a Vision-Language Model (VLM) as a semantic critic to diagnose intermediate latents and dynamically steer the velocity field via precise spatial grounding. Complementarily, we formulate T2I generation as a sequential decision-making process, exploring multiple trajectories through lookahead simulations and selecting the optimal path based on VLM-guided rewards. Further, we provide AFS-Search-Pro for higher performance and AFS-Search-Fast for quicker generation. Experimental results show that our AFS-Search-Pro greatly boosts the performance of the original FLUX.1-dev, achieving state-of-the-art results across three different benchmarks. Meanwhile, AFS-Search-Fast also significantly enhances performance while maintaining fast generation speed.

[327]  arXiv:2603.18631 [pdf, ps, other]
Title: D-Mem: A Dual-Process Memory System for LLM Agents
Subjects: Artificial Intelligence (cs.AI)

Driven by the development of persistent, self-adapting autonomous agents, equipping these systems with high-fidelity memory access for long-horizon reasoning has emerged as a critical requirement. However, prevalent retrieval-based memory frameworks often follow an incremental processing paradigm that continuously extracts and updates conversational memories into vector databases, relying on semantic retrieval when queried. While this approach is fast, it inherently relies on lossy abstraction, frequently missing contextually critical information and struggling to resolve queries that rely on fine-grained contextual understanding. To address this, we introduce D-Mem, a dual-process memory system. It retains lightweight vector retrieval for routine queries while establishing an exhaustive Full Deliberation module as a high-fidelity fallback. To achieve cognitive economy without sacrificing accuracy, D-Mem employs a Multi-dimensional Quality Gating policy to dynamically bridge these two processes. Experiments on the LoCoMo and RealTalk benchmarks using GPT-4o-mini and Qwen3-235B-Instruct demonstrate the efficacy of our approach. Notably, our Multi-dimensional Quality Gating policy achieves an F1 score of 53.5 on LoCoMo with GPT-4o-mini. This outperforms our static retrieval baseline, Mem0$^\ast$ (51.2), and recovers 96.7% of the Full Deliberation's performance (55.3), while incurring significantly lower computational costs.
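
The recovery figure quoted above follows directly from the three reported F1 scores. The short check below uses only the numbers stated in the abstract; the variable names are illustrative.

```python
# F1 scores on LoCoMo with GPT-4o-mini, as reported in the abstract
gated_f1 = 53.5              # Multi-dimensional Quality Gating
full_deliberation_f1 = 55.3  # exhaustive Full Deliberation module
static_baseline_f1 = 51.2    # static retrieval baseline

# Share of Full Deliberation performance retained by the gated system
recovered_percent = gated_f1 / full_deliberation_f1 * 100
# Absolute F1 gain over the static retrieval baseline
gain_over_baseline = gated_f1 - static_baseline_f1
```

Rounded to one decimal place, `recovered_percent` is 96.7 and `gain_over_baseline` is 2.3 F1 points.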

[328]  arXiv:2603.18633 [pdf, ps, other]
Title: An Onto-Relational-Sophic Framework for Governing Synthetic Minds
Comments: 9 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

The rapid evolution of artificial intelligence, from task-specific systems to foundation models exhibiting broad, flexible competence across reasoning, creative synthesis, and social interaction, has outpaced the conceptual and governance frameworks designed to manage it. Current regulatory paradigms, anchored in a tool-centric worldview, address algorithmic bias and transparency but leave unanswered foundational questions about what increasingly capable synthetic minds are, how societies should relate to them, and the normative principles that should guide their development. Here we introduce the Onto-Relational-Sophic (ORS) framework, grounded in Cyberism philosophy, which offers integrated answers to these challenges through three pillars: (1) a Cyber-Physical-Social-Thinking (CPST) ontology that defines the mode of being for synthetic minds as irreducibly multi-dimensional rather than purely computational; (2) a graded spectrum of digital personhood providing a pragmatic relational taxonomy beyond binary person-or-tool classifications; and (3) Cybersophy, a wisdom-oriented axiology synthesizing virtue ethics, consequentialism, and relational approaches to guide governance. We apply the framework to emergent scenarios including autonomous research agents, AI-mediated healthcare, and agentic AI ecosystems, demonstrating its capacity to generate proportionate, adaptive governance recommendations. The ORS framework charts a path from narrow technical alignment toward comprehensive philosophical foundations for the synthetic minds already among us.

[329]  arXiv:2603.18634 [pdf, ps, other]
Title: SwiftGS: Episodic Priors for Immediate Satellite Surface Recovery
Comments: 24 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Rapid, large-scale 3D reconstruction from multi-date satellite imagery is vital for environmental monitoring, urban planning, and disaster response, yet remains difficult due to illumination changes, sensor heterogeneity, and the cost of per-scene optimization. We introduce SwiftGS, a meta-learned system that reconstructs 3D surfaces in a single forward pass by predicting geometry-radiation-decoupled Gaussian primitives together with a lightweight SDF, replacing expensive per-scene fitting with episodic training that captures transferable priors. The model couples a differentiable physics graph for projection, illumination, and sensor response with spatial gating that blends sparse Gaussian detail and global SDF structure, and incorporates semantic-geometric fusion, conditional lightweight task heads, and multi-view supervision from a frozen geometric teacher under an uncertainty-aware multi-task loss. At inference, SwiftGS operates zero-shot with optional compact calibration and achieves accurate DSM reconstruction and view-consistent rendering at significantly reduced computational cost, with ablations highlighting the benefits of the hybrid representation, physics-aware rendering, and episodic meta-training.

[330]  arXiv:2603.18636 [pdf, ps, other]
Title: Training-Free Sparse Attention for Fast Video Generation via Offline Layer-Wise Sparsity Profiling and Online Bidirectional Co-Clustering
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion Transformers (DiTs) achieve strong video generation quality but suffer from high inference cost due to dense 3D attention, leading to the development of sparse attention technologies to improve efficiency. However, existing training-free sparse attention methods in video generation still face two unresolved limitations: ignoring layer heterogeneity in attention pruning and ignoring query-key coupling in block partitioning, which hinder a better quality-speedup trade-off. In this work, we uncover a critical insight that the attention sparsity of each layer is its intrinsic property, with minor effects across different inputs. Motivated by this, we propose SVOO, a training-free Sparse attention framework for fast Video generation via Offline layer-wise sparsity profiling and Online bidirectional co-clustering. Specifically, SVOO adopts a two-stage paradigm: (i) offline layer-wise sensitivity profiling to derive intrinsic per-layer pruning levels, and (ii) online block-wise sparse attention via a novel bidirectional co-clustering algorithm. Extensive experiments on seven widely used video generation models demonstrate that SVOO achieves a superior quality-speedup trade-off over state-of-the-art methods, delivering up to $1.93\times$ speedup while maintaining a PSNR of up to 29 dB on Wan2.1.

[331]  arXiv:2603.18637 [pdf, ps, other]
Title: MOSAIC: Multi-Objective Slice-Aware Iterative Curation for Alignment
Authors: Yipu Dou, Wang Yang
Comments: 9 pages, 5 figures. Code available at https://github.com/douyipu/mosaic
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)

We study how to allocate a fixed supervised fine-tuning budget when three objectives must be balanced at once: multi-turn safety alignment, low over-refusal on benign boundary queries, and instruction following under verifiable constraints. We propose MOSAIC (Multi-Objective Slice-Aware Iterative Curation for Alignment), a multi-objective framework for closed-loop data mixture search built on a unified L1-L3 evaluation interface. MOSAIC turns slice-level failure profiles into executable data actions, including dataset-level mixture ratios, bucket-level weights, and focus criteria. Under a fixed 1M-token budget and five rounds of independent fine-tuning from the same base model, MOSAIC improves internal XGuard from 2.76 to 4.67 while keeping OrBench at 4.41 and IFEval at 3.65. The final Pareto solution also generalizes better than a random static LoRA baseline on independent attack, over-refusal, and capability tests, suggesting that structured failure diagnosis can serve as a practical control signal for budgeted data construction. Code is available at https://github.com/douyipu/mosaic.

[332]  arXiv:2603.18639 [pdf, ps, other]
Title: PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent progress in video generation has led to substantial improvements in visual fidelity, yet ensuring physically consistent motion remains a fundamental challenge. Intuitively, this limitation can be attributed to the fact that real-world object motion unfolds in three-dimensional space, while video observations provide only partial, view-dependent projections of such dynamics. To address these issues, we propose PhysVideo, a two-stage framework that first generates physics-aware orthogonal foreground videos and then synthesizes full videos with background. In the first stage, Phys4View leverages physics-aware attention to capture the influence of physical attributes on motion dynamics, and enhances spatio-temporal consistency by incorporating geometry-enhanced cross-view attention and temporal attention. In the second stage, VideoSyn uses the generated foreground videos as guidance and learns the interactions between foreground dynamics and background context for controllable video synthesis. To support training, we construct PhysMV, a dataset containing 40K scenes, each consisting of four orthogonal viewpoints, resulting in a total of 160K video sequences. Extensive experiments demonstrate that PhysVideo significantly improves physical realism and spatial-temporal coherence over existing video generation methods. Home page: https://anonymous.4open.science/w/Phys4D/.

[333]  arXiv:2603.18641 [pdf, ps, other]
Title: A Comparative Empirical Study of Catastrophic Forgetting Mitigation in Sequential Task Adaptation for Continual Natural Language Processing Systems
Subjects: Computation and Language (cs.CL)

Neural language models deployed in real-world applications must continually adapt to new tasks and domains without forgetting previously acquired knowledge. This work presents a comparative empirical study of catastrophic forgetting mitigation in continual intent classification. Using the CLINC150 dataset, we construct a 10-task label-disjoint scenario and evaluate three backbone architectures: a feed-forward Artificial Neural Network (ANN), a Gated Recurrent Unit (GRU), and a Transformer encoder, under a range of continual learning (CL) strategies. We consider one representative method from each major CL family: replay-based Maximally Interfered Retrieval (MIR), regularization-based Learning without Forgetting (LwF), and parameter-isolation via Hard Attention to Task (HAT), both individually and in all pairwise and triple combinations. Performance is assessed with average accuracy, macro F1, and backward transfer, capturing the stability-plasticity trade-off across the task sequence. Our results show that naive sequential fine-tuning suffers from severe forgetting for all architectures and that no single CL method fully prevents it. Replay emerges as a key ingredient: MIR is the most reliable individual strategy, and combinations that include replay (MIR+HAT, MIR+LwF, MIR+LwF+HAT) consistently achieve high final performance with near-zero or mildly positive backward transfer. The optimal configuration is architecture-dependent: MIR+HAT yields the best result for ANN and Transformer, whereas MIR+LwF+HAT works best for GRU. In several cases CL methods even surpass joint training, indicating a regularization effect. These findings highlight the importance of jointly selecting backbone architecture and CL mechanism when designing continual intent-classification systems.

[334]  arXiv:2603.18642 [pdf, ps, other]
Title: Evaluating Model-Free Policy Optimization in Masked-Action Environments via an Exact Blackjack Oracle
Authors: Kevin Song
Comments: 23 pages, 2 figures, 3 tables, 6 supplementary figures
Subjects: Machine Learning (cs.LG)

Infinite-shoe casino blackjack provides a rigorous, exactly verifiable benchmark for discrete stochastic control under dynamically masked actions. Under a fixed Vegas-style ruleset (S17, 3:2 payout, dealer peek, double on any two, double after split, resplit to four), an exact dynamic programming (DP) oracle was derived over 4,600 canonical decision cells. This oracle yielded ground-truth action values, optimal policy labels, and a theoretical expected value (EV) of -0.00161 per hand. To evaluate sample-efficient policy recovery, three model-free optimizers were trained via simulated interaction: masked REINFORCE with a per-cell exponential moving average baseline, simultaneous perturbation stochastic approximation (SPSA), and the cross-entropy method (CEM). REINFORCE was the most sample-efficient, achieving a 46.37% action-match rate and an EV of -0.04688 after 10^6 hands, outperforming CEM (39.46%, 7.5x10^6 evaluations) and SPSA (38.63%, 4.8x10^6 evaluations). However, all methods exhibited substantial cell-conditional regret, indicating persistent policy-level errors despite smooth reward convergence. This gap shows that tabular environments with severe state-visitation sparsity and dynamic action masking remain challenging, while aggregate reward curves can obscure critical local failures. As a negative control, it was proven and empirically confirmed that under i.i.d. draws without counting, optimal bet sizing collapses to the table minimum. In addition, larger wagers strictly increased volatility and ruin without improving expectation. These results highlight the need for exact oracles and negative controls to avoid mistaking stochastic variability for genuine algorithmic performance.
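
As a rough illustration of the "masked REINFORCE with a per-cell exponential moving average baseline" mentioned above, the sketch below shows one way action masking and the baseline could fit together. It is not the paper's implementation: the function names, the learning-free gradient helper, and the smoothing factor `alpha` are assumptions made for this example.

```python
import math

def masked_softmax(logits, legal):
    """Softmax restricted to legal actions; illegal actions get probability 0."""
    if not any(legal):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    m = max(l for l, ok in zip(logits, legal) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_grad(logits, legal, action, reward, baseline):
    """Gradient of (reward - baseline) * log pi(action) with respect to the logits.

    For a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi_i.
    Masked logits do not influence the policy, so their gradient is 0.
    """
    probs = masked_softmax(logits, legal)
    advantage = reward - baseline
    return [
        advantage * ((1.0 if i == action else 0.0) - p) if ok else 0.0
        for i, (p, ok) in enumerate(zip(probs, legal))
    ]

def ema_update(baseline, reward, alpha=0.05):
    """Exponential moving average baseline for one decision cell (alpha is hypothetical)."""
    return (1.0 - alpha) * baseline + alpha * reward
```

In a blackjack setting, `legal` would encode which of hit, stand, double, and split are available in the current cell; the action ordering is left to the caller.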

[335]  arXiv:2603.18645 [pdf, ps, other]
Title: MeInTime: Bridging Age Gap in Identity-Preserving Face Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV)

To better preserve an individual's identity, face restoration has evolved from reference-free to reference-based approaches, which leverage high-quality reference images of the same identity to enhance identity fidelity in the restored outputs. However, most existing methods implicitly assume that the reference and degraded input are age-aligned, limiting their effectiveness in real-world scenarios where only cross-age references are available, such as historical photo restoration. This paper proposes MeInTime, a diffusion-based face restoration method that extends reference-based restoration from same-age to cross-age settings. Given one or a few reference images along with an age prompt corresponding to the degraded input, MeInTime achieves faithful restoration with both identity fidelity and age consistency. Specifically, we decouple the modeling of identity and age conditions. During training, we focus solely on effectively injecting identity features through a newly introduced attention mechanism and introduce Gated Residual Fusion modules to facilitate the integration between degraded features and identity representations. At inference, we propose Age-Aware Gradient Guidance, a training-free sampling strategy, using an age-driven direction to iteratively nudge the identity-aware denoising latent toward the desired age semantic manifold. Extensive experiments demonstrate that MeInTime outperforms existing face restoration methods in both identity preservation and age consistency. Our code is available at: https://github.com/teer4/MeInTime

[336]  arXiv:2603.18647 [pdf, ps, other]
Title: Beyond TVLA: Anderson-Darling Leakage Assessment for Neural Network Side-Channel Leakage Detection
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Test Vector Leakage Assessment (TVLA) based on Welch's $t$-test has become a standard tool for detecting side-channel leakage. However, its mean-based nature can limit sensitivity when leakage manifests primarily through higher-order distributional differences. As our experiments show, this property becomes especially crucial when it comes to evaluating neural network implementations. In this work, we propose Anderson--Darling Leakage Assessment (ADLA), a leakage detection framework that applies the two-sample Anderson--Darling test for leakage detection. Unlike TVLA, ADLA tests equality of the full cumulative distribution functions and does not rely on a purely mean-shift model.
We evaluate ADLA on a multilayer perceptron (MLP) trained on MNIST and implemented on a ChipWhisperer-Husky evaluation platform. We consider protected implementations employing shuffling and random jitter countermeasures. Our results show that ADLA can provide improved leakage-detection sensitivity in protected implementations for a low number of traces compared to TVLA.
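
The limitation of a mean-based test described above can be seen on a toy example. The sketch below computes Welch's t statistic and, as a deliberately simpler stand-in for the two-sample Anderson-Darling statistic, the maximum gap between the two empirical CDFs (a Kolmogorov-Smirnov-style distance). The data are synthetic and the function names are made up for this illustration; this is not the ADLA procedure itself.

```python
import math
from bisect import bisect_right

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def max_ecdf_gap(a, b):
    """Largest absolute difference between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap

# Two synthetic "trace" sets with identical means but different spread.
trace_set_a = [-1.0, 1.0] * 50
trace_set_b = [-3.0, 3.0] * 50

print(welch_t(trace_set_a, trace_set_b))       # no mean shift to detect
print(max_ecdf_gap(trace_set_a, trace_set_b))  # the distributions still differ
```

Because the two sets share the same mean, the t statistic carries no signal, while a statistic that compares the full distributions does.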

[337]  arXiv:2603.18649 [pdf, ps, other]
Title: Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA
Comments: 4 pages, 2 figures, Accepted at WWW2026 Demos
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Live streaming commerce has become a prominent form of broadcasting in the modern era. To facilitate more efficient and convenient product promotions for streamers, we present Click-to-Ask, an AI-driven assistant for live streaming commerce with complementary offline and online components. The offline module processes diverse multimodal product information, transforming complex inputs into structured product data and generating compliant promotional copywriting. During live broadcasts, the online module enables real-time responses to viewer inquiries by allowing streamers to click on questions and leveraging both the structured product information generated by the offline module and an event-level historical memory maintained in a streaming architecture. This system significantly reduces the time needed for promotional preparation, enhances content engagement, and enables prompt interaction with audience inquiries, ultimately improving the effectiveness of live streaming commerce. On our collected dataset of TikTok live stream frames, the proposed method achieves a Question Recognition Accuracy of 0.913 and a Response Quality score of 0.876, demonstrating considerable potential for practical application. The video demonstration can be viewed here: https://www.youtube.com/shorts/mWIXK-SWhiE.

[338]  arXiv:2603.18652 [pdf, ps, other]
Title: Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation
Comments: Submitted to ICDAR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Reliably extracting tables from PDFs is essential for large-scale scientific data mining and knowledge base construction, yet existing evaluation approaches rely on rule-based metrics that fail to capture semantic equivalence of table content. We present a benchmarking framework based on synthetically generated PDFs with precise LaTeX ground truth, using tables sourced from arXiv to ensure realistic complexity and diversity. As our central methodological contribution, we apply LLM-as-a-judge for semantic table evaluation, integrated into a matching pipeline that accommodates inconsistencies in parser outputs. Through a human validation study comprising over 1,500 quality judgments on extracted table pairs, we show that LLM-based evaluation achieves substantially higher correlation with human judgment (Pearson r=0.93) compared to Tree Edit Distance-based Similarity (TEDS, r=0.68) and Grid Table Similarity (GriTS, r=0.70). Evaluating 21 contemporary PDF parsers across 100 synthetic documents containing 451 tables reveals significant performance disparities. Our results offer practical guidance for selecting parsers for tabular data extraction and establish a reproducible, scalable evaluation methodology for this critical task.
Code and data: https://github.com/phorn1/pdf-parse-bench Metric study and human evaluation: https://github.com/phorn1/table-metric-study

[339]  arXiv:2603.18654 [pdf, ps, other]
Title: QuaQue: Design and SQL Implementation of Condensed Algebra for Concurrent Versioning of Knowledge Graphs
Comments: 11 pages, 6 figures, DBKDA conference
Journal-ref: DBKDA 2026, The Eighteenth International Conference on Advances in Databases, Knowledge, and Data Applications
Subjects: Databases (cs.DB)

The management of versioned knowledge graphs presents significant challenges, particularly in querying data across multiple versions efficiently. This paper introduces QuaQue, a key component of the ConVer-G system, which addresses this challenge by translating SPARQL (SPARQL Protocol and RDF Query Language) queries into SQL (Structured Query Language). QuaQue leverages a novel condensed algebra to operate on a relational model where versioning information is compactly stored using bitstrings. This approach allows for efficient querying of concurrent versions of knowledge graphs within a standard relational database system. We present the key concepts of our condensed algebra, detail the translation process from SPARQL algebra to SQL, and provide a comparative benchmark against a native RDF (Resource Description Framework) triple store, demonstrating the viability and performance benefits of our approach.
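
To make the idea of bitstring-encoded versioning concrete, here is a minimal sketch in which each triple carries an integer bitmask whose set bits mark the versions containing it. The in-memory dictionary, the function names, and the example triple are all assumptions for illustration; the abstract does not describe QuaQue's actual relational schema or algebra operators.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def add_triple(store: Dict[Triple, int], triple: Triple, version: int) -> None:
    """Mark `triple` as present in `version` by setting that bit."""
    store[triple] = store.get(triple, 0) | (1 << version)

def present_in(store: Dict[Triple, int], triple: Triple, version: int) -> bool:
    """Check whether the bit for `version` is set for `triple`."""
    return bool((store.get(triple, 0) >> version) & 1)

def common_versions(mask_a: int, mask_b: int) -> int:
    """Versions in which two matched triples co-occur: a bitwise AND."""
    return mask_a & mask_b

def versions_of(mask: int) -> List[int]:
    """Expand a bitmask into the list of version indices it encodes."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

store: Dict[Triple, int] = {}
building = ("ex:b1", "ex:height", "12")  # hypothetical triple
add_triple(store, building, 0)
add_triple(store, building, 2)
```

With this layout, one stored row answers membership queries for every version at once, and a join across versions reduces to a bitwise AND of the two masks.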

[340]  arXiv:2603.18655 [pdf, ps, other]
Title: Multiscale Switch for Semi-Supervised and Contrastive Learning in Medical Ultrasound Image Segmentation
Comments: This is the author-submitted LaTeX version with original typesetting. The final published version (with IEEE production formatting and layout changes) is available at this http URL under CC BY 4.0 license
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Medical ultrasound image segmentation faces significant challenges due to limited labeled data and characteristic imaging artifacts including speckle noise and low-contrast boundaries. While semi-supervised learning (SSL) approaches have emerged to address data scarcity, existing methods suffer from suboptimal unlabeled data utilization and lack robust feature representation mechanisms. In this paper, we propose Switch, a novel SSL framework with two key innovations: (1) Multiscale Switch (MSS) strategy that employs hierarchical patch mixing to achieve uniform spatial coverage; (2) Frequency Domain Switch (FDS) with contrastive learning that performs amplitude switching in Fourier space for robust feature representations. Our framework integrates these components within a teacher-student architecture to effectively leverage both labeled and unlabeled data. Comprehensive evaluation across six diverse ultrasound datasets (lymph nodes, breast lesions, thyroid nodules, and prostate) demonstrates consistent superiority over state-of-the-art methods. At 5% labeling ratio, Switch achieves remarkable improvements: 80.04% Dice on LN-INT, 85.52% Dice on DDTI, and 83.48% Dice on Prostate datasets, with our semi-supervised approach even exceeding fully supervised baselines. The method maintains parameter efficiency (1.8M parameters) while delivering superior performance, validating its effectiveness for resource-constrained medical imaging applications. The source code is publicly available at https://github.com/jinggqu/Switch

[341]  arXiv:2603.18656 [pdf, ps, other]
Title: Balanced Thinking: Improving Chain of Thought Training in Vision Language Models
Subjects: Artificial Intelligence (cs.AI)

Multimodal reasoning in vision-language models (VLMs) typically relies on a two-stage process: supervised fine-tuning (SFT) and reinforcement learning (RL). In standard SFT, all tokens contribute equally to the loss, even though reasoning data are inherently token-imbalanced. Long <think> traces overshadow short but task-critical <answer> segments, leading to verbose reasoning and inaccurate answers. We propose SCALe (Scheduled Curriculum Adaptive Loss), which explicitly separates supervision over reasoning and answer segments using dynamic, length-independent weighting. Unlike vanilla SFT, which overweights the <think> segment, SCALe-SFT gradually shifts the focus from <think> to <answer> throughout training via a cosine scheduling policy, encouraging concise and well-grounded reasoning. We evaluate SCALe across diverse benchmarks and architectures. Results show that SCALe consistently improves accuracy over vanilla SFT and matches the performance of the full two-phase SFT + GRPO pipeline while requiring only about one-seventh of the training time, making it a lightweight yet effective alternative. When combined with GRPO, SCALe achieves the best overall performance, highlighting its value both as a standalone method and as a strong foundation for reinforcement refinement.
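
The scheduling idea above can be sketched as follows: average the token losses within each segment so that segment length no longer matters, then blend the two averages with a weight that moves from the reasoning segment to the answer segment along a cosine curve. The endpoint weights `w_start` and `w_end`, the function names, and the exact form of the schedule are assumptions for this example; the abstract does not give SCALe's precise formula.

```python
import math

def cosine_answer_weight(step, total_steps, w_start=0.2, w_end=0.8):
    """Answer-segment weight that rises from w_start to w_end along a cosine curve."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_start + (w_end - w_start) * 0.5 * (1.0 - math.cos(math.pi * progress))

def segment_weighted_loss(think_losses, answer_losses, step, total_steps):
    """Blend per-segment mean losses so the weighting is independent of segment length."""
    w_answer = cosine_answer_weight(step, total_steps)
    think_mean = sum(think_losses) / len(think_losses)
    answer_mean = sum(answer_losses) / len(answer_losses)
    return (1.0 - w_answer) * think_mean + w_answer * answer_mean
```

Under plain token-averaged SFT, a 500-token reasoning trace and a 5-token answer would give the answer about 1% of the loss; here its share is set by the schedule alone.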

[342]  arXiv:2603.18657 [pdf, ps, other]
Title: Enhancing Multi-Corpus Training in SSL-Based Anti-Spoofing Models: Domain-Invariant Feature Extraction
Subjects: Machine Learning (cs.LG)

The performance of speech spoofing detection often varies across different training and evaluation corpora. Leveraging multiple corpora typically enhances robustness and performance in fields like speaker recognition and speech recognition. However, our spoofing detection experiments show that multi-corpus training does not consistently improve performance and may even degrade it. We hypothesize that dataset-specific biases impair generalization, leading to performance instability. To address this, we propose an Invariant Domain Feature Extraction (IDFE) framework, employing multi-task learning and a gradient reversal layer to minimize corpus-specific information in learned embeddings. The IDFE framework reduces the average equal error rate by 20% compared to the baseline, assessed across four varied datasets.

[343]  arXiv:2603.18658 [pdf, ps, other]
Title: Mean-field control barrier functions for stochastic multi-agent systems
Subjects: Systems and Control (eess.SY)

Many applications involving multi-agent systems require fulfilling safety constraints. Control barrier functions offer a systematic framework to enforce forward invariance of safety sets. Recent work extended this paradigm to mean-field scenarios, where the number of agents is large enough to make density-space descriptions a reasonable workaround for the curse of dimensionality. However, an open gap in the recent literature concerns the development of mean-field control barrier functions for Fokker-Planck (advection-diffusion) equations. In this work, we address this gap, enabling safe mean-field control of agents with stochastic microscopic dynamics. We provide bounded stability guarantees under safety corrections and corroborate our results through numerical simulations in two representative scenarios, coverage and shepherding control of multi-agent systems.
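
For readers unfamiliar with control barrier functions, the sketch below shows the basic mechanism on a single deterministic agent, which is far simpler than the mean-field stochastic setting of the paper. The dynamics are assumed to be x' = u with safe set h(x) = x_max - x >= 0; the barrier condition dh/dt >= -alpha * h then reduces to u <= alpha * (x_max - x). All names and parameter values are chosen for illustration only.

```python
def cbf_filter(x, u_nominal, x_max, alpha):
    """Minimally modify u_nominal so that u <= alpha * (x_max - x) holds."""
    return min(u_nominal, alpha * (x_max - x))

def simulate(x0, u_nominal, x_max=1.0, alpha=2.0, dt=0.01, steps=1000):
    """Forward-Euler rollout of x' = u under the safety filter.

    With alpha * dt <= 1, the discrete update cannot overshoot x_max.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = cbf_filter(x, u_nominal, x_max, alpha)
        x = x + dt * u
        trajectory.append(x)
    return trajectory

trajectory = simulate(x0=0.0, u_nominal=5.0)
```

The nominal input would drive the state past the boundary; the filter slows it down as the boundary approaches, so the state converges toward `x_max` without crossing it.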

[344]  arXiv:2603.18660 [pdf, ps, other]
Title: Multimodal Model for Computational Pathology: Representation Learning and Image Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Whole slide imaging (WSI) has transformed digital pathology by enabling computational analysis of gigapixel histopathology images. Recent foundation model advances have accelerated progress in computational pathology, facilitating joint reasoning across pathology images, clinical reports, and structured data. Despite this progress, challenges remain: the extreme resolution of WSIs creates computational hurdles for visual learning; limited expert annotations constrain supervised approaches; integrating multimodal information while preserving biological interpretability remains difficult; and the opacity of modeling ultra-long visual sequences hinders clinical transparency. This review comprehensively surveys recent advances in multimodal computational pathology. We systematically analyze four research directions: (1) self-supervised representation learning and structure-aware token compression for WSIs; (2) multimodal data generation and augmentation; (3) parameter-efficient adaptation and reasoning-enhanced few-shot learning; and (4) multi-agent collaborative reasoning for trustworthy diagnosis. We specifically examine how token compression enables cross-scale modeling and how multi-agent mechanisms simulate a pathologist's "Chain of Thought" across magnifications to achieve uncertainty-aware evidence fusion. Finally, we discuss open challenges and argue that future progress depends on unified multimodal frameworks integrating high-resolution visual data with clinical and biomedical knowledge to support interpretable and safe AI-assisted diagnosis.

[345]  arXiv:2603.18662 [pdf, ps, other]
Title: Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning
Subjects: Artificial Intelligence (cs.AI)

Geometric reasoning inherently requires "thinking with constructions" -- the dynamic manipulation of visual aids to bridge the gap between problem conditions and solutions. However, existing Multimodal Large Language Models (MLLMs) are largely confined to passive inference with static diagrams, lacking the strategic knowledge of when and how to construct effective visual aids. To address this, we present a framework for Visual-Text Interleaved Chain-of-Thought. We first introduce GeoAux-Bench, the first benchmark comprising 4,334 geometry problems that aligns textual construction steps with ground-truth visual updates. Our pilot study reveals two critical insights: (1) interleaved visual-textual aids outperform single-modality counterparts, which cannot losslessly capture geometric synergy; and (2) valid constructions act as entropy reducers, strongly correlating with reduced reasoning perplexity. Building on these findings, we propose Action Applicability Policy Optimization (A2PO), a reinforcement learning paradigm for mastering strategic construction. A2PO employs Adaptive Reward Shaping to regulate the timing and quality of visual aids via counterfactual sampling to distinguish necessary from redundant constructions. Experiments demonstrate our approach enables MLLMs to leverage selective auxiliary constructions, yielding a 3.51% gain over strong baselines. Code and data are available on GitHub.

[346]  arXiv:2603.18668 [pdf, ps, other]
Title: Complexity of Auctions with Interdependence
Subjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)

We study auction design in the celebrated interdependence model introduced by Milgrom and Weber [1982], where a mechanism designer allocates a good, maximizing the value of the agent who receives it, while inducing truthfulness using payments. In the lesser-studied procurement auctions, one allocates a chore, minimizing the cost incurred by the agent selected to perform it.
Most of the past literature in theoretical computer science considers designing truthful mechanisms with constant approximation for the value setting, with restricted domains and monotone valuation functions.
In this work, we study the general computational problems of optimizing the approximation ratio of truthful mechanisms, for both value and cost, in the deterministic and randomized settings. Unlike most previous works, we remove the domain restriction and the monotonicity assumption imposed on value functions. We provide theoretical explanations for why some previously considered special cases are tractable, reducing them to classical combinatorial problems, and providing efficient algorithms and characterizations. We complement our positive results with hardness results for the general case, providing query complexity lower bounds, and proving the NP-Hardness of the general case.

[347]  arXiv:2603.18669 [pdf, ps, other]
Title: CSSDF-Net: Safe Motion Planning Based on Neural Implicit Representations of Configuration Space Distance Field
Subjects: Robotics (cs.RO)

High-dimensional manipulator operation in unstructured environments requires a differentiable, scene-agnostic distance query mechanism to guide safe motion generation. Existing geometric collision checkers are typically non-differentiable, while workspace-based implicit distance models are hindered by the highly nonlinear workspace--configuration mapping and often suffer from poor convergence; moreover, self-collision and environment collision are commonly handled as separate constraints. We propose Configuration-Space Signed Distance Field-Net (CSSDF-Net), which learns a continuous signed distance field directly in configuration space to provide joint-space distance and gradient queries under a unified geometric notion of safety. To enable zero-shot generalization without environment-specific retraining, we introduce a spatial-hashing-based data generation pipeline that encodes robot-centric geometric priors and supports efficient retrieval of risk configurations for arbitrary obstacle point sets. The learned distance field is integrated into safety-constrained trajectory optimization and receding-horizon MPC, enabling both offline planning and online reactive avoidance. Experiments on a planar arm and a 7-DoF manipulator demonstrate stable gradients, effective collision avoidance in static and dynamic scenes, and practical inference latency for large-scale point-cloud queries, supporting deployment in previously unseen environments.

[348]  arXiv:2603.18670 [pdf, ps, other]
Title: Masking Intent, Sustaining Equilibrium: Risk-Aware Potential Game-empowered Two-Stage Mobile Crowdsensing
Subjects: Networking and Internet Architecture (cs.NI)

Beyond data collection, future mobile crowdsensing (MCS) in complex applications must satisfy diverse requirements, including reliable task completion, budget and quality constraints, and fluctuating worker availability. Besides raw-data and location privacy, workers' intent/preference traces can be exploited by an honest-but-curious platform, enabling intent inference from repeated observations and frequency profiling. Meanwhile, worker dropouts and execution uncertainty may cause coverage instability and redundant sensing, while repeated global online re-optimization incurs high interaction overhead and enlarges the observable attack surface. To address these issues, we propose iParts, an intent-preserving and risk-controllable two-stage service provisioning framework for dynamic MCS. In the offline stage, workers report perturbed intent vectors via personalized local differential privacy with memorization/permanent randomization, suppressing frequency-based inference while preserving decision utility. Using only perturbed intents, the platform builds a redundancy-aware quality model and performs risk-aware pre-planning under budget, individual rationality, quality-failure risk, and intent-mismatch risk constraints. We formulate offline pre-planning as an exact potential game with expected social welfare as the potential function, ensuring a constrained pure-strategy Nash equilibrium and finite-step convergence under asynchronous feasible improvements. In the online stage, when runtime dynamics cause quality deficits, a temporary-recruitment potential game over idle/standby workers enables lightweight remediation with bounded interaction rounds and low observability. Experiments show that iParts achieves a favorable privacy-utility-efficiency trade-off, improving welfare and task completion while reducing redundancy and communication overhead compared with representative baselines.

[349]  arXiv:2603.18671 [pdf, ps, other]
Title: Towards High-Quality Image Segmentation: Improving Topology Accuracy by Penalizing Neighbor Pixels
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Standard deep learning models for image segmentation cannot guarantee topology accuracy, failing to preserve the correct number of connected components or structures. This, in turn, affects the quality of the segmentations and compromises the reliability of the subsequent quantification analyses. Previous works have proposed to enhance topology accuracy with specialized frameworks, architectures, and loss functions. However, these methods are often cumbersome to integrate into existing training pipelines, they are computationally very expensive, or they are restricted to structures with tubular morphology. We present SCNP, an efficient method that improves topology accuracy by penalizing the logits with their poorest-classified neighbor, forcing the model to improve the prediction at the pixels' neighbors before allowing it to improve the pixels themselves. We show the effectiveness of SCNP across 13 datasets, covering different structure morphologies and image modalities, and integrate it into three frameworks for semantic and instance segmentation. Additionally, we show that SCNP can be integrated into several loss functions, making them improve topology accuracy. Our code can be found at https://jmlipman.github.io/SCNP-SameClassNeighborPenalization.

[350]  arXiv:2603.18676 [pdf, ps, other]
Title: MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation
Subjects: Artificial Intelligence (cs.AI)

The MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing a central workspace through a trainable memory of abstract concepts and an Abstract Conceptual Representation (ACR). The architecture follows a two-stage logic that maps directly to GWT mechanics: (i) an integration phase, where retrieved memory concepts converge to form a collective "mental image" (the ACR) based on input stimuli; and (ii) a broadcasting phase, where this global state navigates and informs the contextualization of individual local tokens. We demonstrate that efficient linear-time scaling is a fundamental architectural byproduct of instantiating GWT's functional bottleneck, as routing global information through a constant-sized ACR resolves the quadratic complexity inherent in standard attention. MANAR is a compatible re-parameterization of MHA with identical semantic roles for its projections, enabling knowledge transfer from pretrained transformers via weight-copy and thus overcoming the adoption barriers of structurally incompatible linear-time alternatives. MANAR enables non-convex contextualization, synthesizing representations that provably lie outside the convex hull of input tokens, a mathematical reflection of the creative synthesis described in GWT. Empirical evaluations confirm that MANAR matches or exceeds strong baselines across language (GLUE score of 85.1), vision (83.9% ImageNet-1K), and speech (2.7% WER on LibriSpeech), positioning it as an efficient and expressive alternative to quadratic attention.

[351]  arXiv:2603.18677 [pdf, ps, other]
Title: Cognitive Amplification vs Cognitive Delegation in Human-AI Systems: A Metric Framework
Authors: Eduardo Di Santi
Comments: 16 pages, 2 figures. Conceptual and mathematical framework for human-AI collaboration, cognitive amplification, cognitive delegation, and cognitive sustainability
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Artificial intelligence is increasingly embedded in human decision-making, where it can either enhance human reasoning or induce excessive cognitive dependence. This paper introduces a conceptual and mathematical framework for distinguishing cognitive amplification, in which AI improves hybrid human-AI performance while preserving human expertise, from cognitive delegation, in which reasoning is progressively outsourced to AI systems.
To characterize these regimes, we define a set of operational metrics: the Cognitive Amplification Index (CAI*), the Dependency Ratio (D), the Human Reliance Index (HRI), and the Human Cognitive Drift Rate (HCDR). Together, these quantities provide a low-dimensional metric space for evaluating not only whether human-AI systems achieve genuine synergistic performance, but also whether such performance is cognitively sustainable for the human component over time.
The framework highlights a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence. We therefore argue that human-AI systems should be designed under a cognitive sustainability constraint, such that gains in hybrid performance do not come at the cost of degradation in human expertise.

[352]  arXiv:2603.18678 [pdf, ps, other]
Title: Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
Comments: The paper is currently under review
Subjects: Sound (cs.SD); Computation and Language (cs.CL)

Puns represent a typical linguistic phenomenon that exploits polysemy and phonetic ambiguity to generate humour, posing unique challenges for natural language understanding. Within pun research, audio plays a central role in human communication alongside text and images, yet datasets and systematic resources for spoken puns remain scarce, leaving this crucial modality largely underexplored. In this paper, we present APUN-Bench, the first benchmark dedicated to evaluating large audio language models (LALMs) on audio pun understanding. Our benchmark contains 4,434 audio samples annotated across three stages: pun recognition, pun word location, and pun meaning inference. We conduct a deep analysis of APUN-Bench by systematically evaluating 10 state-of-the-art LALMs, uncovering substantial performance gaps in recognizing, localizing, and interpreting audio puns. This analysis reveals key challenges, such as positional biases in audio pun location and error cases in meaning inference, offering actionable insights for advancing humour-aware audio intelligence.

[353]  arXiv:2603.18680 [pdf, ps, other]
Title: Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Vertical federated learning (VFL) allows an active party with a top model and multiple passive parties with bottom models to collaborate. In this scenario, passive parties possessing only features may attempt to infer the active party's private labels, making label inference attacks (LIAs) a significant threat. Previous LIA studies have claimed that well-trained bottom models can effectively represent labels. However, we demonstrate that this view is misleading and exposes the vulnerability of existing LIAs. By leveraging mutual information, we present the first observation of the "model compensation" phenomenon in VFL. We theoretically prove that, in VFL, the mutual information between layer outputs and labels increases with layer depth, indicating that bottom models primarily extract feature information while the top model handles label mapping. Building on this insight, we introduce task reassignment to show that the success of existing LIAs actually stems from the distribution alignment between features and labels. When this alignment is disrupted, the performance of LIAs declines sharply or even fails entirely. Furthermore, the implications of this insight for defenses are also investigated. We propose a zero-overhead defense technique based on layer adjustment. Extensive experiments across five datasets and five representative model architectures indicate that shifting cut layers forward to increase the proportion of top model layers in the entire model not only improves resistance to LIAs but also enhances other defenses.

[354]  arXiv:2603.18683 [pdf, ps, other]
Title: HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning
Comments: Submitted to ACL 2026 on Jan 5, 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

While large language models excel in diverse domains, their performance on complex long-horizon agentic decision-making tasks remains limited. Most existing methods concentrate on designing effective reward models (RMs) to advance performance via multi-turn reinforcement learning. However, they suffer from delayed propagation in sparse outcome rewards and unreliable credit assignment with potentially overly fine-grained and unfocused turn-level process rewards. In this paper, we propose HISR, which exploits Hindsight Information to modulate Segmental process Rewards, closely aligning rewards with sub-goals and underscoring significant segments to enhance the reliability of credit assignment. Specifically, a segment-level process RM is presented to assign rewards for each sub-goal in the task, avoiding excessively granular allocation to turns. To emphasize significant segments in the trajectory, a hindsight model is devised to reflect the preference of performing a certain action after knowing the trajectory outcome. With this characteristic, we design the ratios of sequence likelihoods between hindsight and policy model to measure action importance. The ratios are subsequently employed to aggregate segment importance scores, which in turn modulate segmental process rewards, enhancing credit assignment reliability. Extensive experimental results on three public benchmarks demonstrate the validity of our method.

[355]  arXiv:2603.18687 [pdf, ps, other]
Title: Secure Wi-Fi Ranging Today: Security and Adoption of IEEE 802.11az/bk
Comments: Submitted
Subjects: Cryptography and Security (cs.CR)

Ranging and localisation have become critical for many applications and services. The Wi-Fi (IEEE 802.11) standard is a natural candidate for providing these functions across diverse environments, given its widespread deployment. The IEEE 802.11az amendment, finalised in 2023, introduces "Next Generation Positioning" mechanisms to secure and harden the existing insecure Wi-Fi Fine Timing Measurement (FTM) ranging solution. Moreover, the recent IEEE 802.11bk amendment increases the available bandwidth with the goal of approaching the centimetre-level ranging accuracy of ultra-wideband (UWB) systems. This paper examines to what extent these promises hold from a security and deployability perspective. We analyse the core mechanisms of secure Wi-Fi ranging as defined in IEEE 802.11az and IEEE 802.11bk at both the logical and physical layers, combining standards analysis with simulations and measurements on commercial and development hardware. At the logical layer, we show how common deployment choices can result in unauthenticated ranging, downgrade attacks, and simple denial-of-service attacks, making it difficult to securely realise many high-stakes use cases. At the physical layer, we study the predictability of secure ranging waveforms, the security impact of symbol repetition, and how waveform design choices affect compliance with spectral masks under realistic RF behaviour. Our results show that secure Wi-Fi ranging is highly sensitive to configuration choices and is non-trivial to implement on existing hardware. This is also evidenced by the currently limited support for secure Wi-Fi ranging in commodity devices. This paper provides practical guidelines for using secure FTM safely and recommendations to vendors and standardisation bodies to improve its robustness and deployability.

[356]  arXiv:2603.18688 [pdf, ps, other]
Title: STEP: Scientific Time-Series Encoder Pretraining via Cross-Domain Distillation
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Scientific time series are central to scientific AI but are typically sparse, highly heterogeneous, and limited in scale, making unified representation learning particularly challenging. Meanwhile, foundation models pretrained on relevant time series domains such as audio, general time series, and brain signals contain rich knowledge, but their applicability to scientific signals remains underexplored. In this paper, we investigate the transferability and complementarity of foundation models from relevant time series domains, and study how to effectively leverage them to build a unified encoder for scientific time series. We first systematically evaluate relevant foundation models, showing the effectiveness of knowledge transfer to scientific tasks and their complementary strengths. Based on this observation, we propose STEP, a Scientific Time Series Encoder Pretraining framework via cross domain distillation. STEP introduces adaptive patching to handle extreme-length sequences and a statistics compensation scheme to accommodate diverse numerical scales. It further leverages cross-domain distillation to integrate knowledge from multiple foundation models into a unified encoder. By combining complementary representations across different domains, STEP learns general-purpose and transferable features tailored for scientific signals. Experiments on seven scientific time series tasks demonstrate that STEP provides both an effective structure and an effective pretraining paradigm, taking a STEP toward scientific time series representation learning.

[357]  arXiv:2603.18690 [pdf, ps, other]
Title: TurboMem: High-Performance Lock-Free Memory Pool with Transparent Huge Page Auto-Merging for DPDK
Authors: Junyi Yang
Comments: 7 pages, 2 figures, 4 tables; v2: Added explicit disclaimer in abstract clarifying that all performance numbers are based on mock benchmarks (real VTune results forthcoming). Minor formatting corrections
Subjects: Performance (cs.PF)

High-speed packet processing on multicore CPUs places extreme demands on memory allocators. In systems like DPDK, fixed-size memory pools back packet buffers (mbufs) to avoid costly dynamic allocation. However, even DPDK's optimized mempool faces scalability limits: lock contention on the shared ring, cache-coherence ping-pong between cores, and heavy TLB pressure from thousands of small pages. To mitigate these issues, DPDK typically uses explicit huge pages (2 MB or 1 GB) for its memory pools. This reduces TLB misses but requires manual configuration and can lead to fragmentation and inflexibility. We propose TurboMem, a novel C++ template-based memory pool that addresses these challenges. TurboMem combines a fully lock-free design (using atomic stacks and per-core local caches) with Transparent Huge Page (THP) auto-merging. By automatically promoting pools to 2 MB pages via madvise(MADV_HUGEPAGE), TurboMem achieves the benefits of huge pages without manual setup. We also enforce strict NUMA locality and CPU affinity, so each core allocates and frees objects from its local node. Using Intel VTune on a single-socket 100 Gbps testbed, we show that TurboMem boosts packet throughput by up to 28% while reducing TLB misses by 41% compared to a standard DPDK mempool with explicit huge pages. These results demonstrate that THP auto-merging can outperform manually reserved huge pages in low-fragmentation scenarios, and that modern C++ lock-free programming yields practical gains in data-plane software. Note: The performance claims reported in this preliminary version (up to 28% higher throughput and 41% fewer TLB misses) are based on mock benchmarks. Comprehensive real-system evaluations using Intel VTune are currently underway and will be presented in a future revision.

[358]  arXiv:2603.18693 [pdf, ps, other]
Title: Cross-Ecosystem Vulnerability Analysis for Python Applications
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Python applications depend on native libraries that may be vendored within package distributions or installed on the host system. When vulnerabilities are discovered in these libraries, determining which Python packages are affected requires cross-ecosystem analysis spanning Python dependency graphs and OS package versions. Current vulnerability scanners produce false negatives by missing vendored vulnerabilities and false positives by ignoring security patches backported by OS distributions.
We present a provenance-aware vulnerability analysis approach that resolves vendored libraries to specific OS package versions or upstream releases. Our approach queries vendored libraries against a database of historical OS package artifacts using content-based hashing, and applies library-specific dynamic analyses to extract version information from binaries built from upstream source. We then construct cross-ecosystem call graphs by stitching together Python and binary call graphs across dependency boundaries, enabling reachability analysis of vulnerable functions. Evaluating on 100,000 Python packages and 10 known CVEs associated with third-party native dependencies, we identify 39 directly vulnerable packages (47M+ monthly downloads) and 312 indirectly vulnerable client packages affected through dependency chains. Our analysis achieves up to 97% false positive reduction compared to upstream version matching.

[359]  arXiv:2603.18695 [pdf, other]
Title: High-Performance Portable GPU Primitives for Arbitrary Types and Operators in Julia
Authors: Emmanuel Pilliat (ENSAI)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

Portable GPU frameworks such as Kokkos and RAJA reduce the burden of cross-architecture development but typically incur measurable overhead on fundamental parallel primitives relative to vendor-optimized libraries. We present KernelForge.jl, a Julia library that implements scan, mapreduce, and matrix-vector primitives through a two-layer portable architecture: KernelIntrinsics.jl provides backend-agnostic abstractions for warp-level shuffles, memory fences, and vectorized memory access, while KernelForge.jl builds high-performance algorithms exclusively on top of these interfaces. Evaluated on an NVIDIA A40 and an AMD MI300X, KernelForge.jl matches or exceeds CUB kernel execution time on scan and mapreduce on the A40, and matches cuBLAS throughput on matrix-vector operations across most tested configurations, demonstrating, as a proof of concept, that portable JIT-compiled abstractions can achieve vendor-level throughput without sacrificing generality.

[360]  arXiv:2603.18697 [pdf, ps, other]
Title: OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation
Comments: 5 pages, 4 figures
Subjects: Machine Learning (cs.LG)

In industrial commodity recommendation systems, the representation quality of Item-Id vocabularies directly impacts the scalability and generalization ability of recommendation models. A key challenge is that traditional Item-Id vocabularies, when subjected to sparse scaling, suffer from low-frequency information interference, which restricts their expressive power for massive item sets and leads to representation collapse. To address this issue, we propose an Orthogonal Constrained Projection method to optimize embedding representation. By enforcing orthogonality, the projection constrains the backpropagation manifold, aligning the singular value spectrum of the learned embeddings with the orthogonal basis. This alignment ensures high singular entropy, thereby preserving isotropic generalized features while suppressing spurious correlations and overfitting to rare items. Empirical results demonstrate that OCP accelerates loss convergence and enhances the model's scalability; notably, it enables consistent performance gains when scaling up dense layers. Large-scale industrial deployment on JD.com further confirms its efficacy, yielding a 12.97% increase in UCXR and an 8.9% uplift in GMV, highlighting its robust utility for scaling up both sparse vocabularies and dense architectures.

[361]  arXiv:2603.18699 [pdf, other]
Title: A more accurate rational non-commutative algorithm for multiplying 4x4 matrices using 48 multiplications
Authors: Jean-Guillaume Dumas (UGA, LJK, CASC), Clément Pernet (UGA, LJK), Alexandre Sedoglavic (CRIStAL)
Subjects: Data Structures and Algorithms (cs.DS); Symbolic Computation (cs.SC)

We propose a more accurate variant of an algorithm for multiplying 4x4 matrices using 48 multiplications over any ring containing an inverse of 2. This algorithm has an error bound exponent of only $\log_4 \gamma_{\infty,2} \approx 2.386$. It also reaches a better accuracy w.r.t. max-norm in practice, when compared to previously known such fast algorithms. Furthermore, we propose a straight line program of this algorithm, giving a leading constant in its complexity bound of $\frac{387}{32} n^{2+\log_4 3} + o(n^{2+\log_4 3})$ operations over any ring containing an inverse of 2. Introduction: An algorithm to multiply two 4x4 complex-valued matrices requiring only 48 non-commutative multiplications was introduced in [16] using a pipeline of large language models orchestrated by an evolutionary coding agent. A matrix multiplication algorithm with that many non-commutative multiplications is denoted by $\langle 4 \times 4 \times 4 : 48 \rangle$ in the sequel. An equivalent variant of the associated tensor decomposition defining this algorithm, but over the rationals (more precisely over any ring containing an inverse of 2), was then given in [8]. Most error analysis of sub-cubic time matrix multiplication algorithms [3, 4, 2, 1, 17] are given in the max-norm setting: bounding the largest output error as a function of the max-norm product of the vectors of input matrix coefficients. In this setting, Strassen's algorithm has shown the best accuracy bound, (proven minimal under some assumptions in [2]). In [6, 8], the authors relaxed this setting by shifting the focus to the 2-norm for input and/or output; that allowed them to propose a $\langle 2 \times 2 \times 2 : 7 \rangle$ variant with an improved accuracy bound. Experiments show that this variant performs best even when measuring the max-norm of the error bound. We present in this note a variant of the recent $\langle 4 \times 4 \times 4 : 48 \rangle$ algorithm over the rationals (again in the same orbit under De Groot isotropies [10]) that is more numerically accurate w.r.t. max-norm in practice.
In particular, our new variant improves on the error bound exponent, from $\log_2 \gamma_{\infty,2} \approx 2.577$
Consider the product of an $M \times K$ matrix $A$ by a $K \times N$ matrix $B$. It is computed by a $\langle m, k, n \rangle$ algorithm represented by the matrices $L$, $R$, $P$ applied recursively on $\ell$ recursive levels and the resulting $m_0 \times k_0$ by $k_0 \times n_0$ products are performed using an algorithm $\beta$. Here $M = m_0 m^{\ell}$, $K = k_0 k^{\ell}$ and $N = n_0 n^{\ell}$. The accuracy bound below uses any (possibly different) p-norms and q-norms for its left-hand side, $\|\bullet\|_p$, and right-hand side, $\|\bullet\|_q$. The associated dual norms are denoted by $\|\bullet\|_{p^\star}$ and $\|\bullet\|_{q^\star}$ respectively. Note that these are vector norms, hence $\|A\|_p$ for matrix $A$ in $\mathbb{R}^{m \times n}$ denotes $\|\mathrm{Vect}(A)\|_p$ and is the p-norm of the $mn$-dimensional vector of its coefficients, and not a matrix norm.
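To make the multiplication-count notation in this abstract concrete, the following minimal Python sketch shows the classical Strassen scheme, which multiplies two 2x2 matrices with 7 multiplications instead of 8. This is the textbook algorithm only: it is neither the more accurate variant from [6, 8] nor the 48-multiplication 4x4 algorithm discussed here, and the function name `strassen_2x2` is illustrative.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices (nested lists) using 7 multiplications.

    Every product keeps the entry of `a` on the left and the entry of `b`
    on the right, so the scheme stays valid when entries do not commute
    (for example, when they are themselves matrix blocks).
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven products of the classical scheme.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]
```

Applying such a scheme recursively to matrix blocks is what yields a sub-cubic operation count; the extra additions and subtractions are also where the accuracy differences between variants come from.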

[362]  arXiv:2603.18701 [pdf, ps, other]
Title: Assessing performance tradeoffs in hierarchical organizations using a diffusive coupling model
Comments: Paper submitted to IFAC for publication
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Physics and Society (physics.soc-ph)

We study a continuous-time dynamical system of nodes diffusively coupled over a hierarchical network to examine the efficiency and performance tradeoffs that organizations, teams, and command and control units face while achieving coordination and sharing information across layers. Specifically, after defining a network structure that captures real-world features of hierarchical organizations, we use linear systems theory and perturbation theory to characterize the rate of convergence to a consensus state, and how effectively information can propagate through the network, depending on the breadth of the organization and the strength of inter-layer communication. Interestingly, our analytical insights highlight a fundamental performance tradeoff. Namely, networks that favor fast coordination will have decreased ability to share information that is generated in the lower layers of the organization and is to be passed up the hierarchy. Numerical results validate and extend our theoretical results.
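As a rough illustration of diffusive coupling over a hierarchy, the sketch below integrates the standard consensus dynamics, in which each node moves toward its neighbours at a rate proportional to their difference, on a tiny two-layer tree. The network, the uniform edge weight, the forward Euler integrator, and the name `simulate_consensus` are all assumptions made for illustration; the paper's actual network model and analysis are not reproduced here.

```python
def simulate_consensus(edges, x0, weight=1.0, dt=0.01, steps=5000):
    """Forward Euler integration of diffusively coupled nodes.

    edges  : list of undirected (i, j) pairs
    x0     : initial state of each node
    weight : coupling strength applied to every edge (an assumption)
    """
    x = list(x0)
    for _ in range(steps):
        dx = [0.0] * len(x)
        for i, j in edges:
            flow = weight * (x[j] - x[i])
            dx[i] += flow   # node i is pulled toward node j
            dx[j] -= flow   # node j is pulled toward node i
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


# Hypothetical hierarchy: node 0 is the manager, nodes 1 and 2 report to it.
tree_edges = [(0, 1), (0, 2)]
final_state = simulate_consensus(tree_edges, [3.0, 0.0, 0.0])
```

With symmetric coupling the average of the states is preserved, so all nodes settle at the mean of the initial values. Changing `weight` or the tree's breadth changes how quickly that happens, which is the kind of tradeoff the abstract describes.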

[363]  arXiv:2603.18702 [pdf, ps, other]
Title: Off-Policy Learning with Limited Supply
Comments: Published as a conference paper at WWW 2026
Subjects: Machine Learning (cs.LG)

We study off-policy learning (OPL) in contextual bandits, which plays a key role in a wide range of real-world applications such as recommendation systems and online advertising. Typical OPL in contextual bandits assumes an unconstrained environment where a policy can select the same item infinitely. However, in many practical applications, including coupon allocation and e-commerce, limited supply constrains items through budget limits on distributed coupons or inventory restrictions on products. In these settings, greedily selecting the item with the highest expected reward for the current user may lead to early depletion of that item, making it unavailable for future users who could potentially generate higher expected rewards. As a result, OPL methods that are optimal in unconstrained settings may become suboptimal in limited supply settings. To address the issue, we provide a theoretical analysis showing that conventional greedy OPL approaches may fail to maximize the policy performance, and demonstrate that policies with superior performance must exist in limited supply settings. Based on this insight, we introduce a novel method called Off-Policy learning with Limited Supply (OPLS). Rather than simply selecting the item with the highest expected reward, OPLS focuses on items with relatively higher expected rewards compared to the other users, enabling more efficient allocation of items with limited supply. Our empirical results on both synthetic and real-world datasets show that OPLS outperforms existing OPL methods in contextual bandit problems with limited supply.
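The depletion effect described in this abstract can be seen in a toy example. The sketch below serves two users in sequence with one unit of each item, and compares greedy selection with a simple relative score (a user's expected reward minus the item's mean reward across users). The relative score is only a stand-in for the idea of preferring items with "relatively higher expected rewards compared to the other users"; it is not the OPLS method, and all names and numbers are made up for illustration.

```python
def allocate(rewards, supply, score):
    """Serve users in order; each receives the in-stock item with the best score."""
    remaining = dict(supply)
    total = 0.0
    for user, item_rewards in rewards.items():
        in_stock = [item for item in item_rewards if remaining[item] > 0]
        if not in_stock:
            continue
        choice = max(in_stock, key=lambda item: score(user, item))
        remaining[choice] -= 1
        total += item_rewards[choice]
    return total


# Hypothetical expected rewards; each item has a single unit of supply.
rewards = {
    "u1": {"A": 0.9, "B": 0.8},
    "u2": {"A": 0.9, "B": 0.1},
}
supply = {"A": 1, "B": 1}


def greedy_score(user, item):
    """Conventional choice: the raw expected reward."""
    return rewards[user][item]


def relative_score(user, item):
    """Expected reward relative to the item's mean reward over all users."""
    mean = sum(r[item] for r in rewards.values()) / len(rewards)
    return rewards[user][item] - mean


greedy_total = allocate(rewards, supply, greedy_score)
relative_total = allocate(rewards, supply, relative_score)
```

Under greedy selection the first user takes item A, leaving the second user with item B, which they value little. The relative score steers the first user to item B, for which they are unusually well suited, and keeps item A for the second user.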

[364]  arXiv:2603.18707 [pdf, ps, other]
Title: From ex(p) to poly: Gaussian Splatting with Polynomial Kernels
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Recent advancements in Gaussian Splatting (3DGS) have introduced various modifications to the original kernel, resulting in significant performance improvements. However, many of these kernel changes are incompatible with existing datasets optimized for the original Gaussian kernel, presenting a challenge for widespread adoption. In this work, we address this challenge by proposing an alternative kernel that maintains compatibility with existing datasets while improving computational efficiency. Specifically, we replace the original exponential kernel with a polynomial approximation combined with a ReLU function. This modification allows for more aggressive culling of Gaussians, leading to enhanced performance across different 3DGS implementations. Our results show a notable performance improvement of 4 to 15% with negligible impact on image quality. We also provide a detailed mathematical analysis of the new kernel and discuss its potential benefits for 3DGS implementations on NPU hardware.
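The abstract does not state the polynomial itself, so the sketch below uses a well-known stand-in to show the general idea: the exponential falloff exp(-x) is approximated by (1 - x/n)^n, clamped at zero with a ReLU. The clamped polynomial has compact support, which is what permits more aggressive culling. The choice of polynomial, the degree `n`, and all function names are assumptions, not the paper's kernel.

```python
import math


def gaussian_kernel(r2):
    """Original falloff: exp(-r2 / 2), with r2 the squared Mahalanobis distance."""
    return math.exp(-0.5 * r2)


def poly_relu_kernel(r2, n=4):
    """Polynomial stand-in: relu(1 - r2 / (2n)) ** n.

    This uses the classical limit (1 - x/n)^n -> exp(-x); the ReLU clamps
    the base at zero so the kernel vanishes exactly outside a finite radius.
    """
    base = max(0.0, 1.0 - 0.5 * r2 / n)
    return base ** n


def support_radius_sq(n=4):
    """Squared distance beyond which the polynomial kernel is exactly zero."""
    return 2.0 * n
```

A splat whose footprint lies entirely beyond `support_radius_sq(n)` contributes nothing and can be skipped, whereas the exponential kernel never reaches zero and needs a heuristic cutoff.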

[365]  arXiv:2603.18709 [pdf, ps, other]
Title: Let's Play Tag: Linear Time Evaluation of Conjunctive Queries under TGD Constraints
Subjects: Databases (cs.DB)

We study the limits of linear time evaluation of conjunctive queries under constraints expressed as tuple-generating dependencies (TGDs), across several modes of query evaluation: single-testing, all-testing, counting, lexicographic direct access, and enumeration. While full classifications seem far beyond reach, we propose an approach that, for some evaluation modes and classes of TGDs, makes it possible to lift known dichotomies from the unconstrained setting. In particular, our approach applies to all mentioned evaluation modes except enumeration, when the constraints fall into one of two classes: non-recursive sets of TGDs in which every TGD uses at most binary relation symbols in the head or has at most two frontier variables; and frontier-guarded full TGDs. We further provide a collection of examples showcasing the challenges that arise for enumeration and for less restrictive classes of TGDs.

[366]  arXiv:2603.18712 [pdf, ps, other]
Title: Accurate and Efficient Multi-Channel Time Series Forecasting via Sparse Attention Mechanism
Comments: Accepted by ICDE 2026
Subjects: Artificial Intelligence (cs.AI)

The task of multi-channel time series forecasting is ubiquitous in numerous fields such as finance, supply chain management, and energy planning. It is critical to effectively capture complex dynamic dependencies within and between channels for accurate predictions. However, traditional methods have paid little attention to learning the interactions among channels. This paper proposes Linear-Network (Li-Net), a novel architecture designed for multi-channel time series forecasting that captures the linear and non-linear dependencies among channels. Li-Net dynamically compresses representations across sequence and channel dimensions, processes the information through a configurable non-linear module and subsequently reconstructs the forecasts. Moreover, Li-Net integrates a sparse Top-K Softmax attention mechanism within a multi-scale projection framework to address these challenges. A core innovation is its ability to seamlessly incorporate and fuse multi-modal embeddings, guiding the sparse attention process to focus on the most informative time steps and feature channels. Experimental results on multiple real-world benchmark datasets demonstrate that Li-Net achieves competitive performance compared to state-of-the-art baseline methods. Furthermore, Li-Net provides a superior balance between prediction accuracy and computational burden, exhibiting significantly lower memory usage and faster inference times. Detailed ablation studies and parameter sensitivity analyses validate the effectiveness of each key component in our proposed architecture.
Keywords: Multivariate Time Series Forecasting, Sparse Attention Mechanism, Multimodal Information Fusion, Non-linear relationship
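For readers unfamiliar with sparse Top-K Softmax attention, the sketch below shows the generic operation in plain Python: only the k largest scores are kept, the softmax is taken over those, and every other position receives zero weight. This is the general technique as commonly defined, not Li-Net's implementation; the function names and the scalar-valued example are assumptions.

```python
import math


def topk_softmax(scores, k):
    """Softmax restricted to the k largest scores; other weights are zero."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in keep)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    norm = sum(exps.values())
    return [exps[i] / norm if i in exps else 0.0 for i in range(len(scores))]


def sparse_attention(scores, values, k):
    """Weighted sum of scalar values using Top-K softmax weights."""
    weights = topk_softmax(scores, k)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical attention scores over four time steps.
weights = topk_softmax([2.0, 1.0, 0.1, 3.0], k=2)
```

Because positions outside the top k receive exactly zero weight, their values never need to be read, which is one source of the memory and latency savings such mechanisms aim for.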

[367]  arXiv:2603.18718 [pdf, ps, other]
Title: MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution
Comments: 23 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI)

Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of the memory cycle, where construction and retrieval are driven by local heuristics rather than explicit strategic reasoning, and sparse, delayed supervision on the backward path, where downstream failures rarely translate into direct repairs of the memory bank. To address these challenges, we propose MemMA, a plug-and-play multi-agent framework that coordinates the memory cycle along both the forward and backward paths. On the forward path, a Meta-Thinker produces structured guidance that steers a Memory Manager during construction and directs a Query Reasoner during iterative retrieval. On the backward path, MemMA introduces in-situ self-evolving memory construction, which synthesizes probe QA pairs, verifies the current memory, and converts failures into repair actions before the memory is finalized. Extensive experiments on LoCoMo show that MemMA consistently outperforms existing baselines across multiple LLM backbones and improves three different storage backends in a plug-and-play manner. Our code is publicly available at https://github.com/ventr1c/memma.

[368]  arXiv:2603.18719 [pdf, ps, other]
Title: Ontology-Guided Diffusion for Zero-Shot Visual Sim2Real Transfer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Bridging the simulation-to-reality (sim2real) gap remains challenging as labelled real-world data is scarce. Existing diffusion-based approaches rely on unstructured prompts or statistical alignment, which do not capture the structured factors that make images look real. We introduce Ontology-Guided Diffusion (OGD), a neuro-symbolic zero-shot sim2real image translation framework that represents realism as structured knowledge. OGD decomposes realism into an ontology of interpretable traits -- such as lighting and material properties -- and encodes their relationships in a knowledge graph. From a synthetic image, OGD infers trait activations and uses a graph neural network to produce a global embedding. In parallel, a symbolic planner uses the ontology traits to compute a consistent sequence of visual edits needed to narrow the realism gap. The graph embedding conditions a pretrained instruction-guided diffusion model via cross-attention, while the planned edits are converted into a structured instruction prompt. Across benchmarks, our graph-based embeddings better distinguish real from synthetic imagery than baselines, and OGD outperforms state-of-the-art diffusion methods in sim2real image translations. Overall, OGD shows that explicitly encoding realism structure enables interpretable, data-efficient, and generalisable zero-shot sim2real transfer.

[369]  arXiv:2603.18720 [pdf, ps, other]
Title: Resource-Constrained Joint Replenishment via Power-of-$m^{1/k}$ Policies
Authors: Danny Segev
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)

The continuous-time joint replenishment problem has long served as a foundational inventory management model. Even though its unconstrained setting has seen recent algorithmic advances, the incorporation of resource constraints into this domain precludes the application of newly discovered synchronization techniques. Such constraints arise in a broad spectrum of practical environments where resource consumption is bounded as an aggregate rate over time. However, for nearly four decades, the prevailing approximation guarantee for resource-constrained joint replenishment has remained $\frac{ 1 }{ \ln 2 } \approx 1.4427$, achieved via classical power-of-$2$ policies.
In this paper, we circumvent these structural policy restrictions by devising generalized rounding frameworks, demonstrating that a well-known convex relaxation is much tighter than previously established. In particular, we expand our analytical scope to encompass fractional base expansion factors, randomized shifting, and staggered interleaved grids. Through this multifaceted methodology, we present a sequence of gradually improving performance guarantees. First, by proposing a best-of-two framework that exploits structural asymmetries between deterministic power-of-$m^{1/k}$ policies, we surpass the classical barrier to obtain a $1.3776$-approximation. Second, by injecting a random shift into the logarithmic grid domain and formulating a factor-revealing linear program to optimize a dual-policy approach, we attain a $1.2512$-approximation. Finally, by superimposing a secondary offset grid to subdivide rounding intervals and suppress holding cost inflation, we utilize interleaved policies to arrive at our ultimate approximation ratio of $\frac{5}{6\ln 2} \approx 1.2023$, which is proven to be best-possible for the class of interleaved power-of-$m^{1/k}$ policies.
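To illustrate what a power-of-$m^{1/k}$ grid looks like, the sketch below rounds a replenishment interval to the nearest point (in log scale) of the grid base * (m^(1/k))^j for integer j. With m = 2 and k = 1 this is the classical power-of-2 grid. Nearest-point rounding, the `base` parameter, and the function name are assumptions for illustration; the paper's rounding frameworks (best-of-two policies, random shifts, interleaved grids) are considerably more involved.

```python
import math


def round_to_grid(interval, base=1.0, m=2, k=1):
    """Round `interval` to the nearest grid point base * (m ** (1/k)) ** j.

    "Nearest" is measured in log scale, and j ranges over the integers.
    """
    factor = m ** (1.0 / k)
    exponent = round(math.log(interval / base) / math.log(factor))
    return base * factor ** exponent


# Guarantees quoted in the abstract, evaluated numerically.
classical_ratio = 1 / math.log(2)        # power-of-2 policies, about 1.4427
final_ratio = 5 / (6 * math.log(2))      # interleaved policies, about 1.202
```

Raising k makes the grid finer (consecutive points differ by a factor of m^(1/k) rather than m), so a rounded interval stays closer to the unrounded one.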

[370]  arXiv:2603.18728 [pdf, ps, other]
Title: Reconstructions of Single Pixel X-Ray Transforms with Applications in Nuclear-Disarmament Verification
Comments: 13 pages, 5 figures
Subjects: Numerical Analysis (math.NA)

In nuclear arms control and disarmament processes, it is crucial to determine whether an object is a nuclear weapon or not without revealing sensitive information about it. At the MIT Laboratory for Nuclear Security and Policy, such a nuclear verification method was developed, showcasing a transmission-based approach [1]. An essential part of this method rests on a mathematical operation, the Single-Pixel X-Ray Transform: a cone of X-rays is transmitted through an object and the remaining intensity is measured with a single-pixel detector. This transformation, and more generally the recovery of objects from dimensionless single-pixel measurements, has only been analyzed to a limited extent. In this work, we investigate some of the Single Pixel X-Ray Transform's mathematical properties. More specifically, we show that the Single Pixel X-Ray Transform is non-linear, continuous, Fréchet-differentiable and convex. We also introduce a method of reconstructing an object based only on a finite number of dimensionless, noisy Single Pixel X-Ray Transform measurement values. This method is based on Douglas-Rachford splitting and uses total variation denoising. We present an implementation for this method, focusing on rotationally symmetric objects, as they allow the use of a one-dimensional direct total variation denoising algorithm [2].

[371]  arXiv:2603.18729 [pdf, ps, other]
Title: Analysis Of Linguistic Stereotypes in Single and Multi-Agent Generative AI Architectures
Subjects: Artificial Intelligence (cs.AI)

Many works in the literature show that LLM outputs exhibit discriminatory behaviour, triggering stereotype-based inferences based on the dialect in which the inputs are written. This bias has been shown to be particularly pronounced when the same inputs are provided to LLMs in Standard American English (SAE) and African-American English (AAE). In this paper, we replicate existing analyses of dialect-sensitive stereotype generation in LLM outputs and investigate the effects of mitigation strategies, including prompt engineering (role-based and Chain-Of-Thought prompting) and multi-agent architectures composed of generate-critique-revise models. We define eight prompt templates to analyse different ways in which dialect bias can manifest, such as suggested names, jobs, and adjectives for SAE or AAE speakers. We use an LLM-as-judge approach to evaluate the bias in the results. Our results show that stereotype-bearing differences emerge between SAE- and AAE-related outputs across all template categories, with the strongest effects observed in adjective and job attribution. Baseline disparities vary substantially by model, with the largest SAE-AAE differential observed in Claude Haiku and the smallest in Phi-4 Mini. Chain-Of-Thought prompting proved to be an effective mitigation strategy for Claude Haiku, whereas the use of a multi-agent architecture ensured consistent mitigation across all the models. These findings suggest that for intersectionality-informed software engineering, fairness evaluation should include model-specific validation of mitigation strategies, and workflow-level controls (e.g., agentic architectures involving critique models) in high-impact LLM deployments. The current results are exploratory in nature and limited in scope, but can lead to extensions and replications by increasing the dataset size and applying the procedure to different languages or dialects.

[372]  arXiv:2603.18734 [pdf, ps, other]
Title: Green Architectural Tactics in ML-enabled Systems: An LLM-based Repository Mining Study
Subjects: Software Engineering (cs.SE)

Context: The increasing adoption of machine learning (ML) and artificial intelligence (AI) technologies raises growing concerns about their environmental sustainability. Developing and deploying ML-enabled systems is computationally intensive, particularly during training and inference. Green AI has emerged to address these issues by promoting efficiency without sacrificing accuracy. While prior research has proposed catalogs of sustainable practices (i.e., green tactics), there remains limited understanding of their adoption in practice and whether additional, undocumented tactics exist. Objective: This study aims to investigate the extent to which existing sustainable practices are implemented in real-world ML-enabled systems and to identify previously undocumented practices that support environmental sustainability. Method: We conduct a mining software repository study on 205 open-source ML projects on GitHub. To support our analysis, we design a novel mechanism based on large language models (LLMs) capable of identifying both known and new sustainable practices from code repositories. Results: Our findings confirm that green tactics reported in the literature are used in practice, although adoption rates vary. Furthermore, our LLM-based approach reveals nine previously undocumented sustainable practices. Each tactic is supported with code examples to aid adoption and integration. Conclusions: We finally provide insights for practitioners seeking to reduce the environmental impact of ML-enabled systems and offer a foundation for future research in automating the detection and adoption of sustainable practices.

[373]  arXiv:2603.18735 [pdf, ps, other]
Title: SpaceTime Programming: Live and Omniscient Exploration of Code and Execution
Subjects: Software Engineering (cs.SE)

Programming environments typically separate the world of static code from the dynamic execution of programs. Developers must switch between writing code and observing its execution, often with limited tools to understand the relationship between code changes and runtime behavior. Several paradigms and approaches exist to bridge this gap, including exploratory programming for comparing code variants, live programming for immediate feedback, and omniscient debugging for exploring execution history. However, existing solutions tend to focus on specific aspects and one specific paradigm rather than providing a fully integrated environment with multiple capabilities. This paper introduces SpaceTime Programming, a novel approach that unifies these paradigms to create a programming model for exploring both code modifications and execution flow. At the core of our approach is a trace mechanism that captures not only execution state but also the corresponding code changes, enabling developers to explore programs in both space (code variants) and time (execution flow). As a proof of concept, we implemented a Python library supporting SpaceTime Programming and applied it in two contexts: a live omniscient debugger and a Pygame game development tool, showcased through a Flappy Bird-like game. We further evaluated SpaceTimePy on five real-world Python projects, finding performance overhead ranging from 35% to 150% on test suites.

[374]  arXiv:2603.18736 [pdf, ps, other]
Title: CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on experimental feedback data collected from human annotators under controlled and costly conditions. In this work, we introduce observational reward modeling -- learning reward models with observational user feedback (e.g., clicks, copies, and upvotes) -- as a scalable and cost-effective alternative. We identify two fundamental challenges in this setting: (1) observational feedback is noisy due to annotation errors, which cause it to deviate from true user preference; (2) observational feedback is biased by user preference, where users preferentially provide feedback on responses they feel strongly about, which creates a distribution shift between training and inference data. To address these challenges, we propose CausalRM, a causal-theoretic reward modeling framework that aims to learn unbiased reward models from observational feedback. To tackle challenge (1), CausalRM introduces a noise-aware surrogate loss term that is provably equivalent to the primal loss under noise-free conditions by explicitly modeling the annotation error generation process. To tackle challenge (2), CausalRM uses propensity scores -- the probability of a user providing feedback for a given response -- to reweight training samples, yielding a loss function that eliminates user preference bias. Extensive experiments across diverse LLM backbones and benchmark datasets validate that CausalRM effectively learns accurate reward signals from noisy and biased observational feedback and delivers substantial performance improvements on downstream RLHF tasks -- including a 49.2% gain on WildGuardMix and a 32.7% improvement on HarmBench. Code is available on our project website.
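The propensity-based reweighting described in this abstract can be illustrated with a generic inverse-propensity-weighted loss. The sketch below is not CausalRM's actual objective: the function name `ipw_bce_loss`, the binary cross-entropy form, the clipping floor `clip`, and the self-normalization by the weight sum are all assumptions made for illustration.

```python
import math


def ipw_bce_loss(scores, labels, propensities, clip=0.05):
    """Self-normalized inverse-propensity-weighted binary cross-entropy.

    scores:       raw reward-model scores (logits), one per observed response
    labels:       observed binary feedback (1 = positive, 0 = negative)
    propensities: estimated probability that feedback was given at all
    clip:         hypothetical floor on propensities to limit weight variance
    """
    total = 0.0
    weight_sum = 0.0
    for s, y, p in zip(scores, labels, propensities):
        w = 1.0 / max(p, clip)              # rarely observed samples count more
        prob = 1.0 / (1.0 + math.exp(-s))   # sigmoid of the score
        nll = -(y * math.log(prob) + (1 - y) * math.log(1.0 - prob))
        total += w * nll
        weight_sum += w
    return total / weight_sum


# Hypothetical data: the second response is badly scored and rarely gets feedback.
uniform = ipw_bce_loss([2.0, -2.0], [1, 1], [0.5, 0.5])
skewed = ipw_bce_loss([2.0, -2.0], [1, 1], [0.9, 0.1])
print(uniform, skewed)
```

With equal propensities the weights cancel and the result is the ordinary mean cross-entropy; with unequal propensities, errors on rarely observed responses dominate the loss.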

[375]  arXiv:2603.18739 [pdf, ps, other]
Title: EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation
Comments: Code is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deploying high-performance dense prediction models on resource-constrained edge devices remains challenging due to strict limits on computation and memory. In practice, lightweight systems for object detection, instance segmentation, and pose estimation are still dominated by CNN-based architectures such as YOLO, while compact Vision Transformers (ViTs) often struggle to achieve a similarly strong accuracy-efficiency tradeoff, even with large-scale pretraining. We argue that this gap is largely due to insufficient task-specific representation learning in small-scale ViTs, rather than an inherent mismatch between ViTs and edge dense prediction. To address this issue, we introduce EdgeCrafter, a unified compact ViT framework for edge dense prediction centered on ECDet, a detection model built from a distilled compact backbone and an edge-friendly encoder-decoder design. On the COCO dataset, ECDet-S achieves 51.7 AP with fewer than 10M parameters using only COCO annotations. For instance segmentation, ECInsSeg achieves performance comparable to RF-DETR while using substantially fewer parameters. For pose estimation, ECPose-X reaches 74.8 AP, significantly outperforming YOLO26Pose-X (71.6 AP) despite the latter's reliance on extensive Objects365 pretraining. These results show that compact ViTs, when paired with task-specialized distillation and edge-aware design, can be a practical and competitive option for edge dense prediction. Code is available at: https://intellindust-ai-lab.github.io/projects/EdgeCrafter/

[376]  arXiv:2603.18740 [pdf, ps, other]
Title: Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Security code reviews increasingly rely on systems integrating Large Language Models (LLMs), ranging from interactive assistants to autonomous agents in CI/CD pipelines. We study whether confirmation bias (i.e., the tendency to favor interpretations that align with prior expectations) affects LLM-based vulnerability detection, and whether this failure mode can be exploited in software supply-chain attacks. We conduct two complementary studies.
Study 1 quantifies confirmation bias through controlled experiments on 250 CVE vulnerability/patch pairs evaluated across four state-of-the-art models under five framing conditions for the review prompt. Framing a change as bug-free reduces vulnerability detection rates by 16-93%, with strongly asymmetric effects: false negatives increase sharply while false positive rates change little. Bias effects vary by vulnerability type, with injection flaws being more susceptible to them than memory corruption bugs.
Study 2 evaluates exploitability in practice by mimicking adversarial pull requests that reintroduce known vulnerabilities while framing them as security improvements or urgent functionality fixes via their pull request metadata. Adversarial framing succeeds in 35% of cases against GitHub Copilot (interactive assistant) under one-shot attacks and in 88% of cases against Claude Code (autonomous agent) in real project configurations where adversaries can iteratively refine their framing to increase attack success. Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and 94% of autonomous cases. Our results show that confirmation bias poses a weakness in LLM-based code review, with implications for how AI-assisted development tools are deployed.

[377]  arXiv:2603.18741 [pdf, ps, other]
Title: Beyond the Code: A Multi-Modal Assessment Strategy for Fostering Professional Competencies via Introductory Programming Projects
Comments: Article submitted to IEEE
Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE)

As the landscape of software engineering evolves, introductory programming courses must go beyond teaching syntax to foster comprehensive technical competencies and professional soft skills. This paper reports on a pedagogical experience in a "Fundamentals of Programming" course that used a Project-Based Learning (PBL) framework to develop a 2D "Maze Runner"-style game. While game development serves as a high-engagement vehicle for mastering core concepts, such as multidimensional arrays, control structures, and logic, the core of this study focuses on implementing a rigorous, multifaceted assessment model structured across four distinct dimensions: (1) an in-situ technical demonstration, evaluating real-time code execution and algorithmic robustness; (2) a technical screencast, requiring students to articulate their work in a concise audiovisual format; (3) a formal presentation to instructors, defending their project's design patterns and problem-solving strategies; and (4) a structured peer-review process, where students evaluated their colleagues' projects.
Our findings suggest that this multi-dimensional approach not only improves student retention of programming fundamentals but also significantly enhances communication skills and critical thinking. By integrating peer evaluation and multimedia documentation, the course successfully bridges the gap between basic coding and the collaborative requirements of modern software engineering. This paper details the curriculum design, the challenges of implementing diverse assessment pillars, and the measurable impact on student performance and engagement, providing a scalable roadmap for educators looking to modernize introductory computing curricula.

[378]  arXiv:2603.18742 [pdf, ps, other]
Title: 6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion transformers have demonstrated remarkable capabilities in generating videos. However, their practical deployment is severely constrained by high memory usage and computational cost. Post-Training Quantization provides a practical way to reduce memory usage and boost computation speed. Existing quantization methods typically apply a static bit-width allocation, overlooking the quantization difficulty of activations across diffusion timesteps, leading to a suboptimal trade-off between efficiency and quality. In this paper, we propose an inference-time NVFP4/INT8 Mixed-Precision Quantization framework. We find a strong linear correlation between a block's input-output difference and the quantization sensitivity of its internal linear layers. Based on this insight, we design a lightweight predictor that dynamically allocates NVFP4 to temporally stable layers to maximize memory compression, while selectively preserving INT8 for volatile layers to ensure robustness. This adaptive precision strategy enables aggressive quantization without compromising generation quality. Besides this, we observe that the residual between the input and output of a Transformer block exhibits high temporal consistency across timesteps. Leveraging this temporal redundancy, we introduce Temporal Delta Cache (TDC) to skip computations for these invariant blocks, further reducing the computational cost. Extensive experiments demonstrate that our method achieves 1.92$\times$ end-to-end acceleration and 3.32$\times$ memory reduction, setting a new baseline for efficient inference in Video DiTs.

[379]  arXiv:2603.18743 [pdf, ps, other]
Title: Memento-Skills: Let Agents Design Agents
Comments: Memento-Skills Technical Report
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with stateful prompts, where reusable skills (stored as structured markdown files) serve as persistent, evolving memory. These skills encode both behaviour and context, enabling the agent to carry forward knowledge across interactions.
Starting from simple elementary skills (like Web search and terminal operations), the agent continually improves via the Read-Write Reflective Learning mechanism introduced in Memento 2 \cite{wang2025memento2}. In the read phase, a behaviour-trainable skill router selects the most relevant skill conditioned on the current stateful prompt; in the write phase, the agent updates and expands its skill library based on new experience. This closed-loop design enables continual learning without updating LLM parameters, as all adaptation is realised through the evolution of externalised skills and prompts.
Unlike prior approaches that rely on human-designed agents, Memento-Skills enables a generalist agent to design agents end-to-end for new tasks. Through iterative skill generation and refinement, the system progressively improves its own capabilities. Experiments on the General AI Assistants benchmark and Humanity's Last Exam demonstrate sustained gains, achieving 26.2% and 116.2% relative improvements in overall accuracy, respectively. Code is available at https://github.com/Memento-Teams/Memento-Skills.

[380]  arXiv:2603.18746 [pdf, ps, other]
Title: ROFT-VINS: Robust Feature Tracking-based Visual-Inertial State Estimation for Harsh Environment
Comments: 6 pages, published ICCAS 2024
Journal-ref: S. Park and S. Han, "ROFT-VINS: Robust Feature Tracking-based Visual-Inertial State Estimation for Harsh Environment," 2024 24th International Conference on Control, Automation and Systems (ICCAS) 2024, pp. 508-513
Subjects: Robotics (cs.RO)

SLAM (Simultaneous Localization and Mapping) and Odometry are important systems for estimating the position of mobile devices, such as robots and cars, utilizing one or more sensors. Particularly in camera-based SLAM or Odometry, effectively tracking visual features is important as it significantly impacts system performance. In this paper, we propose a method that leverages deep learning to robustly track visual features in monocular camera images. This method operates reliably even in textureless environments and situations with rapid lighting changes. Additionally, we evaluate the performance of our proposed method by integrating it into VINS-Fusion (Monocular-Inertial), a commonly used Visual-Inertial Odometry (VIO) system.

[381]  arXiv:2603.18750 [pdf, ps, other]
Title: Automatic detection of Gen-AI texts: A comparative framework of neural models
Subjects: Computation and Language (cs.CL)

The rapid proliferation of Large Language Models has significantly increased the difficulty of distinguishing between human-written and AI-generated texts, raising critical issues across academic, editorial, and social domains. This paper investigates the problem of AI-generated text detection through the design, implementation, and comparative evaluation of multiple machine-learning-based detectors. Four neural architectures are developed and analyzed: a Multilayer Perceptron, a one-dimensional Convolutional Neural Network, a MobileNet-based CNN, and a Transformer model. The proposed models are benchmarked against widely used online detectors, including ZeroGPT, GPTZero, QuillBot, Originality.AI, Sapling, IsGen, Rephrase, and Writer. Experiments are conducted on the COLING Multilingual Dataset, considering both English and Italian configurations, as well as on an original thematic dataset focused on Art and Mental Health. Results show that supervised detectors achieve more stable and robust performance than commercial tools across different languages and domains, highlighting key strengths and limitations of current detection strategies.

[382]  arXiv:2603.18752 [pdf, ps, other]
Title: WeNLEX: Weakly Supervised Natural Language Explanations for Multilabel Chest X-ray Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Natural language explanations provide an inherently human-understandable way to explain black-box models, closely reflecting how radiologists convey their diagnoses in textual reports. Most works explicitly supervise the explanation generation process using datasets annotated with explanations. Thus, though plausible, the generated explanations are not faithful to the model's reasoning. In this work, we propose WeNLEX, a weakly supervised model for the generation of natural language explanations for multilabel chest X-ray classification. Faithfulness is ensured by matching images generated from their corresponding natural language explanations with original images, in the black-box model's feature space. Plausibility is maintained via distribution alignment with a small database of clinician-annotated explanations. We empirically demonstrate, through extensive validation on multiple metrics to assess faithfulness, simulatability, diversity, and plausibility, that WeNLEX is able to produce faithful and plausible explanations, using as little as 5 ground-truth explanations per diagnosis. Furthermore, WeNLEX can operate in both post-hoc and in-model settings. In the latter, i.e., when the multilabel classifier is trained together with the rest of the network, WeNLEX improves the classification AUC of the standalone classifier by 2.21%, thus showing that adding interpretability to the training process can actually increase the downstream task performance. Additionally, simply by changing the database, WeNLEX explanations are adaptable to any target audience, and we showcase this flexibility by training a layman version of WeNLEX, where explanations are simplified for non-medical users.

[383]  arXiv:2603.18756 [pdf, ps, other]
Title: Are complicated loss functions necessary for teaching LLMs to reason?
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Recent advances in large language models (LLMs) highlight the importance of post-training techniques for improving reasoning and mathematical ability. Group Relative Policy Optimization (GRPO) has shown promise in this domain by combining group relative advantage estimation, PPO-style clipping, and KL regularization. However, its complexity raises the question of whether all components are necessary for fostering reasoning behaviors. We conduct a systematic analysis of GRPO and identify two key findings: (1) incorporating negative feedback is essential, since training solely on actions above a baseline limits learning; and (2) PPO-style constraints, such as policy ratio clipping, are not required to improve mathematical reasoning or performance. Building on these insights, we propose REINFORCE with Group Relative Advantage (RGRA), a simplified variant that retains group relative advantage estimation but removes PPO-style clipping and policy ratio terms. Experiments across standard mathematical benchmarks indicate that RGRA has the potential to achieve stronger performance than GRPO. Our results suggest that simpler REINFORCE-based approaches can effectively enhance reasoning in LLMs, offering a more transparent and efficient alternative to GRPO.
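As a rough illustration of group relative advantage estimation combined with a plain REINFORCE objective (no clipping, no policy ratio), consider the sketch below. It is not the paper's implementation: the function names, the use of the population standard deviation, the stabilizer `eps`, and the toy rewards and log-probabilities are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def reinforce_surrogate(logprobs, advantages):
    """Plain REINFORCE surrogate: minus the mean of advantage times log-probability.

    Below-average responses get negative advantages, so minimizing this loss
    pushes their log-probability down (the negative feedback signal).
    """
    return -mean([a * lp for a, lp in zip(advantages, logprobs)])


# Hypothetical group of four sampled answers with binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
logprobs = [-1.2, -0.8, -2.0, -0.5]  # made-up sequence log-probabilities
adv = group_relative_advantages(rewards)
loss = reinforce_surrogate(logprobs, adv)
print(adv, loss)
```

Because the advantages are centered within the group, roughly half of the samples in a mixed group receive negative weight, which is the negative-feedback component the abstract identifies as essential.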

[384]  arXiv:2603.18757 [pdf, ps, other]
Title: DA-Mamba: Learning Domain-Aware State Space Model for Global-Local Alignment in Domain Adaptive Object Detection
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Domain Adaptive Object Detection (DAOD) aims to transfer detectors from a labeled source domain to an unlabeled target domain. Existing DAOD methods employ multi-granularity feature alignment to learn domain-invariant representations. However, the local connectivity of their CNN-based backbone and detection head restricts alignment to local regions, failing to extract global domain-invariant features. Although transformer-based DAOD methods capture global dependencies via attention mechanisms, their quadratic computational cost hinders practical deployment. To solve this, we propose DA-Mamba, a hybrid CNN-State Space Models (SSMs) architecture that combines the efficiency of CNNs with the linear-time long-range modeling capability of State Space Models (SSMs) to capture both global and local domain-invariant features. Specifically, we introduce two novel modules: Image-Aware SSM (IA-SSM) and Object-Aware SSM (OA-SSM). IA-SSM is integrated into the backbone to enhance global domain awareness, enabling image-level global and local alignment. OA-SSM is inserted into the detection head to model spatial and semantic dependencies among objects, enhancing instance-level alignment. Comprehensive experiments demonstrate that the proposed method can efficiently improve the cross-domain performance of the object detector.

[385]  arXiv:2603.18758 [pdf, ps, other]
Title: Dual-Model Prediction of Affective Engagement and Vocal Attractiveness from Speaker Expressiveness in Video Learning
Comments: Preprint. Accepted for publication in IEEE Transactions on Computational Social Systems
Journal-ref: IEEE Transactions on Computational Social Systems, 2026
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

This paper outlines a machine learning-enabled speaker-centric Emotion AI approach capable of predicting audience-affective engagement and vocal attractiveness in asynchronous video-based learning, relying solely on speaker-side affective expressions. Inspired by the demand for scalable, privacy-preserving affective computing applications, this speaker-centric Emotion AI approach incorporates two distinct regression models that leverage a massive corpus developed within Massive Open Online Courses (MOOCs) to enable affectively engaging experiences. The regression model predicting affective engagement is developed by assimilating emotional expressions emanating from facial dynamics, oculomotor features, prosody, and cognitive semantics, while incorporating a second regression model to predict vocal attractiveness based exclusively on speaker-side acoustic features. Notably, on speaker-independent test sets, both regression models yielded impressive predictive performance (R2 = 0.85 for affective engagement and R2 = 0.88 for vocal attractiveness), confirming that speaker-side affect can functionally represent aggregated audience feedback. This paper provides a speaker-centric Emotion AI approach substantiated by an empirical study discovering that speaker-side multimodal features, including acoustics, can prospectively forecast audience feedback without necessarily employing audience-side input information.

[386]  arXiv:2603.18761 [pdf, ps, other]
Title: NeuroGame Transformer: Gibbs-Inspired Attention Driven by Game Theory and Statistical Physics
Comments: This work has been submitted to IEEE Transactions on Cybernetics for possible publication
Subjects: Artificial Intelligence (cs.AI)

Standard attention mechanisms in transformers are limited by their pairwise formulation, which hinders the modeling of higher-order dependencies among tokens. We introduce the NeuroGame Transformer (NGT) to overcome this by reconceptualizing attention through a dual perspective: tokens are treated simultaneously as players in a cooperative game and as interacting spins in a statistical physics system. Token importance is quantified using two complementary game-theoretic concepts -- Shapley values for global, permutation-based attribution and Banzhaf indices for local, coalition-level influence. These are combined via a learnable gating parameter to form an external magnetic field, while pairwise interaction potentials capture synergistic relationships. The system's energy follows an Ising Hamiltonian, with attention weights emerging as marginal probabilities under the Gibbs distribution, efficiently computed via mean-field equations. To ensure scalability despite the exponential coalition space, we develop importance-weighted Monte Carlo estimators with Gibbs-distributed weights. This approach avoids explicit exponential factors, ensuring numerical stability for long sequences. We provide theoretical convergence guarantees and characterize the fairness-sensitivity trade-off governed by the interpolation parameter. Experimental results demonstrate that the NeuroGame Transformer achieves strong performance on SNLI and MNLI-matched, outperforming some major efficient transformer baselines. On SNLI, it attains a test accuracy of 86.4% (with a peak validation accuracy of 86.6%), surpassing ALBERT-Base and remaining highly competitive with RoBERTa-Base. Code is available at https://github.com/dbouchaffra/NeuroGame-Transformer.
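For readers unfamiliar with the two game-theoretic indices named in this abstract, the following sketch computes exact Shapley values and Banzhaf indices for a tiny cooperative game and mixes them with a gate. It enumerates all coalitions, so it is only feasible for a handful of players and is not the paper's Monte Carlo estimator. The toy value function, the player names, and the convex-combination gate `gamma` are assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_and_banzhaf(players, value):
    """Exact Shapley values and Banzhaf indices by enumerating coalitions."""
    n = len(players)
    shapley, banzhaf = {}, {}
    for p in players:
        others = [q for q in players if q != p]
        sh, bz = 0.0, 0.0
        for k in range(n):  # k = size of the coalition that p joins
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                marginal = value(s | {p}) - value(s)
                # Shapley: weight by the share of orderings where s precedes p
                sh += factorial(k) * factorial(n - k - 1) / factorial(n) * marginal
                # Banzhaf: every coalition of the other players counts equally
                bz += marginal / 2 ** (n - 1)
        shapley[p], banzhaf[p] = sh, bz
    return shapley, banzhaf


def toy_value(coalition):
    """Hypothetical game: tokens 'a' and 'b' are only useful together."""
    return 1.0 if {"a", "b"} <= coalition else 0.0


sh, bz = shapley_and_banzhaf(["a", "b", "c"], toy_value)
gamma = 0.7  # assumed gate; in the abstract this parameter is learned
field = {p: gamma * sh[p] + (1 - gamma) * bz[p] for p in sh}
print(sh, bz, field)
```

In this toy game the synergistic pair shares all the credit and the irrelevant player gets none under both indices, which is the kind of higher-order attribution that a purely pairwise score does not express directly.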

[387]  arXiv:2603.18762 [pdf, ps, other]
Title: ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation
Comments: 8 pages, 5 figures, 2 tables. Preliminary technical report; quantitative experiments and extended evaluation to appear in v2
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap for network-layer security testing. In this paper, we present ClawTrap, a MITM-based red-teaming framework for real-world OpenClaw security evaluation. ClawTrap supports diverse and customizable attack forms, including Static HTML Replacement, Iframe Popup Injection, and Dynamic Content Modification, and provides a reproducible pipeline for rule-driven interception, transformation, and auditing. This design lays the foundation for future research to construct richer, customizable MITM attacks and to perform systematic security testing across agent frameworks and model backbones. Our empirical study shows clear model stratification: weaker models are more likely to trust tampered observations and produce unsafe outputs, while stronger models demonstrate better anomaly attribution and safer fallback strategies. These findings indicate that reliable OpenClaw security evaluation should explicitly incorporate dynamic real-world MITM conditions rather than relying only on static sandbox protocols.

[388]  arXiv:2603.18764 [pdf, ps, other]
Title: ProCal: Probability Calibration for Neighborhood-Guided Source-Free Domain Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Source-Free Domain Adaptation (SFDA) adapts pre-trained models to unlabeled target domains without requiring access to source data. Although state-of-the-art methods leveraging local neighborhood structures show promise for SFDA, they tend to over-rely on prediction similarity among neighbors. This over-reliance accelerates the forgetting of source knowledge and increases susceptibility to local noise overfitting. To address these issues, we introduce ProCal, a probability calibration method that dynamically calibrates neighborhood-based predictions through a dual-model collaborative prediction mechanism. ProCal integrates the source model's initial predictions with the current model's online outputs to effectively calibrate neighbor probabilities. This strategy not only mitigates the interference of local noise but also preserves the discriminative information from the source model, thereby achieving a balance between knowledge retention and domain adaptation. Furthermore, we design a joint optimization objective that combines a soft supervision loss with a diversity loss to guide the target model. Our theoretical analysis shows that ProCal converges to an equilibrium where source knowledge and target information are effectively fused, reducing both knowledge forgetting and overfitting. We validate the effectiveness of our approach through extensive experiments on 31 cross-domain tasks across four public datasets. Our code is available at: https://github.com/zhengyinghit/ProCal.

[389]  arXiv:2603.18765 [pdf, ps, other]
Title: Implicit Grading Bias in Large Language Models: How Writing Style Affects Automated Assessment Across Math, Programming, and Essay Tasks
Comments: 7 pages, 5 figures, 2 tables, 11 references
Subjects: Computation and Language (cs.CL)

As large language models (LLMs) are increasingly deployed as automated graders in educational settings, concerns about fairness and bias in their evaluations have become critical. This study investigates whether LLMs exhibit implicit grading bias based on writing style when the underlying content correctness remains constant. We constructed a controlled dataset of 180 student responses across three subjects (Mathematics, Programming, and Essay/Writing), each with three surface-level perturbation types: grammar errors, informal language, and non-native phrasing. Two state-of-the-art open-source LLMs -- LLaMA 3.3 70B (Meta) and Qwen 2.5 72B (Alibaba) -- were prompted to grade responses on a 1-10 scale with explicit instructions to evaluate content correctness only and to disregard writing style. Our results reveal statistically significant grading bias in Essay/Writing tasks across both models and all perturbation types (p < 0.05), with effect sizes ranging from medium (Cohen's d = 0.64) to very large (d = 4.25). Informal language received the heaviest penalty, with LLaMA deducting an average of 1.90 points and Qwen deducting 1.20 points on a 10-point scale -- penalties comparable to the difference between a B+ and C+ letter grade. Non-native phrasing was penalized 1.35 and 0.90 points respectively. In sharp contrast, Mathematics and Programming tasks showed minimal bias, with most conditions failing to reach statistical significance. These findings demonstrate that LLM grading bias is subject-dependent, style-sensitive, and persists despite explicit counter-bias instructions in the grading prompt. We discuss implications for equitable deployment of LLM-based grading systems and recommend bias auditing protocols before institutional adoption.
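The effect sizes quoted in this abstract are Cohen's d values. A minimal sketch of the standard pooled-standard-deviation form is given below; the abstract does not say whether the study used this independent-groups form or a paired variant, and the scores in the example are invented for illustration rather than taken from the paper.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups using the pooled sample std."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled


# Hypothetical 1-10 grades for the same answers in clean vs. informal style.
clean_scores = [8, 9, 10]
informal_scores = [6, 7, 8]
d = cohens_d(clean_scores, informal_scores)
print(d)
```

Here the two groups differ by two points with a pooled standard deviation of one point, so the effect size is two standard deviations.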

[390]  arXiv:2603.18766 [pdf, ps, other]
Title: Enhancing the Parameterization of Reservoir Properties for Data Assimilation Using Deep VAE-GAN
Subjects: Machine Learning (cs.LG)

Currently, Iterative Ensemble Smoothers, especially the Ensemble Smoother with Multiple Data Assimilation (ESMDA), can be considered state-of-the-art for history matching in petroleum reservoir simulation. However, this approach has two important limitations: the use of an ensemble with finite size to represent the distributions and the Gaussian assumption in parameter and data uncertainties. The latter is particularly important because many reservoir properties have non-Gaussian distributions. Parameterization involves mapping non-Gaussian parameters to a Gaussian field before the update and then mapping them back to the original domain to forward the ensemble through the reservoir simulator. A promising approach to perform parameterization is through deep learning models. Recent studies have shown that Generative Adversarial Networks (GAN) performed poorly concerning data assimilation, but generated more geologically plausible realizations of the reservoir, while the Variational Autoencoder (VAE) performed better than the GAN in data assimilation, but generated less geologically realistic models. This work is innovative in combining the strengths of both to implement a deep learning model called Variational Autoencoder Generative Adversarial Network (VAE-GAN) integrated with ESMDA. The methodology was applied in two case studies, one case being categorical and the other with continuous values of permeability. Our findings demonstrate that by applying the VAE-GAN model we can obtain high quality reservoir descriptions (just like GANs) and a good history matching on the production curves (just like VAEs) simultaneously.

[391]  arXiv:2603.18767 [pdf, ps, other]
Title: A Concept is More Than a Word: Diversified Unlearning in Text-to-Image Diffusion Models
Subjects: Artificial Intelligence (cs.AI)

Concept unlearning has emerged as a promising direction for reducing the risks of harmful content generation in text-to-image diffusion models by selectively erasing undesirable concepts from a model's parameters. Existing approaches typically rely on keywords to identify the target concept to be unlearned. However, we show that this keyword-based formulation is inherently limited: a visual concept is multi-dimensional, can be expressed in diverse textual forms, and often overlaps with related concepts in the latent space. As a result, keyword-only unlearning, which indicates the target concept imprecisely, is brittle and prone to over-forgetting. This occurs because a single keyword represents only a narrow point estimate of the concept, failing to cover its full semantic distribution and entangled variations in the latent space. To address this limitation, we propose Diversified Unlearning, a distributional framework that represents a concept through a set of contextually diverse prompts rather than a single keyword. This richer representation enables more precise and robust unlearning. Through extensive experiments across multiple benchmarks and state-of-the-art baselines, we demonstrate that integrating Diversified Unlearning as an add-on component into existing unlearning pipelines consistently achieves stronger erasure, better retention of unrelated concepts, and improved robustness against adversarial recovery attacks.

[392]  arXiv:2603.18771 [pdf, ps, other]
Title: Empathetic Motion Generation for Humanoid Educational Robots via Reasoning-Guided Vision--Language--Motion Diffusion Architecture
Subjects: Robotics (cs.RO)

This article proposes a reasoning-guided vision-language-motion diffusion framework (RG-VLMD) for generating instruction-aware co-speech gestures for humanoid robots in educational scenarios. The system integrates multi-modal affective estimation, pedagogical reasoning, and teaching-act-conditioned motion synthesis to enable adaptive and semantically consistent robot behavior. A gated mixture-of-experts model predicts Valence/Arousal from input text, visual, and acoustic features, which are then mapped to discrete teaching-act categories through an affect-driven policy. These signals condition a diffusion-based motion generator using clip-level intent and frame-level instructional schedules via additive latent restriction with auxiliary action-group supervision. Compared to a baseline diffusion model, our proposed method produces more structured and distinctive motion patterns, as verified by motion statistics and pairwise distance analysis. Generated motion sequences remain physically plausible and can be retargeted to a NAO robot for real-time execution. The results reveal that reasoning-guided instructional conditioning improves gesture controllability and pedagogical expressiveness in educational human-robot interaction.

[393]  arXiv:2603.18773 [pdf, ps, other]
Title: Automatic Configuration of LLM Post-Training Pipelines
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

LLM post-training pipelines that combine supervised fine-tuning and reinforcement learning are difficult to configure under realistic compute budgets: the configuration space is high-dimensional and heterogeneous, stages are strongly coupled, and each end-to-end evaluation is expensive. We propose AutoPipe, a budget-aware two-stage framework for configuration selection in LLM post-training. Offline, AutoPipe learns a dataset-conditioned learning-to-rank surrogate from historical runs, capturing within-dataset preferences and providing transferable guidance toward promising regions of the configuration space. Online, for a new dataset, AutoPipe uses the offline guidance to steer Bayesian optimization and models dataset-specific deviations with a Gaussian-process residual surrogate. To reduce evaluation cost, each trial is early-stopped and scored by a learned predictor that maps early training signals to a low-cost proxy for final post-training performance. Experiments on biomedical reasoning tasks show that AutoPipe consistently outperforms offline-only baselines and achieves performance comparable to the strongest online HPO baselines while using less than 10% of their computational cost.

[394]  arXiv:2603.18774 [pdf, ps, other]
Title: SEAR: Simple and Efficient Adaptation of Visual Geometric Transformers for RGB+Thermal 3D Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Foundational feed-forward visual geometry models enable accurate and efficient camera pose estimation and scene reconstruction by learning strong scene priors from massive RGB datasets. However, their effectiveness drops when applied to mixed sensing modalities, such as RGB-thermal (RGB-T) images. We observe that while a visual geometry grounded transformer pretrained on RGB data generalizes well to thermal-only reconstruction, it struggles to align RGB and thermal modalities when processed jointly. To address this, we propose SEAR, a simple yet efficient fine-tuning strategy that adapts a pretrained geometry transformer to multimodal RGB-T inputs. Despite being trained on a relatively small RGB-T dataset, our approach significantly outperforms state-of-the-art methods for 3D reconstruction and camera pose estimation, achieving significant improvements over all metrics (e.g., over 29% in AUC@30) and delivering higher detail and consistency between modalities with negligible overhead in inference time compared to the original pretrained model. Notably, SEAR enables reliable multimodal pose estimation and reconstruction even under challenging conditions, such as low lighting and dense smoke. We validate our architecture through extensive ablation studies, demonstrating how the model aligns both modalities. Additionally, we introduce a new dataset featuring RGB and thermal sequences captured at different times, viewpoints, and illumination conditions, providing a robust benchmark for future work in multimodal 3D scene reconstruction. Code and models are publicly available at https://www.github.com/Schindler-EPFL-Lab/SEAR.

[395]  arXiv:2603.18777 [pdf, ps, other]
Title: Analysis of Convergence for the IPA-AC Method
Comments: 29 pages, 7 figures, 6 tables
Subjects: Numerical Analysis (math.NA)

The Improved Partial Area-Analytical Calculation (IPA-AC) method represents a leading meshfree discretization strategy for peridynamic models, distinguished by its rigorous geometric treatment of boundary intersections via dual corrections of integration weights and quadrature points. Despite its empirical success in suppressing boundary-induced geometric errors, a systematic theoretical characterization of its convergence behaviors under distinct scaling limits has remained elusive. This work establishes a unified convergence framework for the IPA-AC method applied to both scalar and tensor kernels. By leveraging the Lax Equivalence Theorem, we explicitly derive error estimates that reveal the method's performance across three critical limiting regimes. The theoretical analysis, substantiated by numerical validation, demonstrates that: (1) for a fixed horizon $\delta$, the method achieves robust second-order convergence $\mathcal{O}(h ^{2})$ with respect to the mesh size $h$; (2) for a fixed mesh, the discretization error scales as $\mathcal{O}(\delta^{-2})$, indicating a sensitivity to the nonlocal length scale; and (3) the method does not satisfy the Asymptotic Compatibility (AC) condition. These findings clarify that while the IPA-AC method offers superior accuracy for simulating fixed nonlocal models, it requires a sufficiently large horizon-to-mesh ratio to mitigate intrinsic discretization errors when approximating the local limit.
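
To make the $\mathcal{O}(h^{2})$ claim concrete, the sketch below shows one common way to estimate an observed convergence order from errors measured on successively refined meshes. The function name and the error values are hypothetical: the errors are synthetic, generated from $e = C h^{2}$, and are not results from the paper.

```python
import math


def observed_order(h_coarse, e_coarse, h_fine, e_fine):
    """Estimate p in e ~ C * h**p from two (mesh size, error) pairs."""
    return math.log(e_coarse / e_fine) / math.log(h_coarse / h_fine)


# Synthetic errors that follow e = 3 * h**2 exactly (illustrative only).
mesh_sizes = [0.1, 0.05, 0.025]
errors = [3.0 * h**2 for h in mesh_sizes]

orders = [
    observed_order(mesh_sizes[i], errors[i], mesh_sizes[i + 1], errors[i + 1])
    for i in range(len(mesh_sizes) - 1)
]
print(orders)  # each entry should be close to 2 for second-order convergence
```

The same ratio test, applied to errors at a fixed mesh and varying horizon $\delta$, would yield a negative exponent if the error behaves like $\mathcal{O}(\delta^{-2})$.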

[396]  arXiv:2603.18779 [pdf, ps, other]
Title: SoK: Practical Aspects of Releasing Differentially Private Graphs
Comments: 20 pages. Accepted to ACM ASIA CCS '26. DOI to be added once available
Subjects: Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)

Graph data is increasingly prevalent across domains, offering analytical value but raising significant privacy concerns. Edges may encode sensitive relationships, while node attributes may contain sensitive entity or personal data. Differential Privacy (DP) has gained traction for its strong guarantees, yet applying DP to graphs is challenging because of their complex relational structure, leading to trade-offs between privacy and utility. Existing methods vary in privacy definitions, utility goals, and contextual settings, complicating comparison. For practitioners, this is compounded by DP's interpretability issues, contributing to misleading protection claims.
To address this, we propose a novel systemisation of existing methods tailored to practical considerations and adaptable to varying practitioner objectives. Our contributions include: (i) a comprehensive survey of differentially private graph release methods; (ii) identification of key vulnerabilities; and (iii) a practitioner-oriented, objective-based framework to guide the selection, interpretation, and sound evaluation of existing methods. We demonstrate the use of our systemisation through two exemplary scenarios in which we assume the role of a social network analyst, apply it, and conduct evaluations in accordance with our framework. Together, these two illustrative instantiations ultimately provide a unified benchmark for state-of-the-art methods in the social networks domain.

[397]  arXiv:2603.18782 [pdf, ps, other]
Title: Points-to-3D: Structure-Aware 3D Generation with Point Cloud Priors
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent progress in 3D generation has been driven largely by models conditioned on images or text, while readily available 3D priors are still underused. In many real-world scenarios, visible-region point clouds are easy to obtain from active sensors such as LiDAR or from feed-forward predictors like VGGT, offering explicit geometric constraints that current methods fail to exploit. In this work, we introduce Points-to-3D, a diffusion-based framework that leverages point cloud priors for geometry-controllable 3D asset and scene generation. Built on the latent 3D diffusion model TRELLIS, Points-to-3D first replaces pure-noise sparse structure latent initialization with an input formulation tailored to point cloud priors. A structure inpainting network, trained within the TRELLIS framework on task-specific data designed to learn global structural inpainting, is then used for inference with a staged sampling strategy (structural inpainting followed by boundary refinement), completing the global geometry while preserving the visible regions of the input priors. In practice, Points-to-3D can take either accurate point-cloud priors or VGGT-estimated point clouds from single images as input. Experiments on both object and scene scenarios consistently demonstrate superior performance over state-of-the-art baselines in terms of rendering quality and geometric fidelity, highlighting the effectiveness of explicitly embedding point-cloud priors for achieving more accurate and structurally controllable 3D generation.

[398]  arXiv:2603.18784 [pdf, ps, other]
Title: ViTac-Tracing: Visual-Tactile Imitation Learning of Deformable Object Tracing
Comments: The paper has been accepted by ICRA2026
Subjects: Robotics (cs.RO)

Deformable objects often appear in unstructured configurations. Tracing deformable objects helps bring them into extended states and facilitates downstream manipulation tasks. Due to the requirements for object-specific modeling or sim-to-real transfer, existing tracing methods either lack generalizability across different categories of deformable objects or struggle to complete tasks reliably in the real world. To address this, we propose a novel visual-tactile imitation learning method to achieve one-dimensional (1D) and two-dimensional (2D) deformable object tracing with a unified model. Our method is designed from both local and global perspectives based on visual and tactile sensing. Locally, we introduce a weighted loss that emphasizes actions maintaining contact near the center of the tactile image, improving fine-grained adjustment. Globally, we propose a tracing task loss that helps the policy regulate task progression. On the hardware side, to compensate for the limited features extracted from visual information, we integrate tactile sensing into a low-cost teleoperation system considering both the teleoperator and the robot. Extensive ablation and comparative experiments on diverse 1D and 2D deformable objects demonstrate the effectiveness of our approach, achieving an average success rate of 80% on seen objects and 65% on unseen objects.

[399]  arXiv:2603.18786 [pdf, ps, other]
Title: Proceedings of the 2nd Workshop on Advancing Artificial Intelligence through Theory of Mind
Comments: workshop proceedings. contains arXiv:2601.03389, arXiv:2511.15895, arXiv:2512.23482, arXiv:2601.01599
Subjects: Artificial Intelligence (cs.AI)

This volume includes a selection of papers presented at the 2nd Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2026 in Singapore on 26th January 2026. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.

[400]  arXiv:2603.18788 [pdf, ps, other]
Title: Mi:dm K 2.5 Pro
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The evolving LLM landscape requires capabilities beyond simple text generation, prioritizing multi-step reasoning, long-context understanding, and agentic workflows. This shift challenges existing models in enterprise environments, especially in Korean-language and domain-specific scenarios where scaling is insufficient. We introduce Mi:dm K 2.5 Pro, a 32B parameter flagship LLM designed to address enterprise-grade complexity through reasoning-focused optimization.
Our methodology builds a robust data foundation via a quality-centric curation pipeline utilizing abstract syntax tree (AST) analysis for code, gap-filling synthesis for mathematics, and an LLM-based quality evaluator. Pre-training scales the model via layer-predictor-based Depth Upscaling (DuS) and a progressive strategy supporting a 128K token context window. Post-training introduces a specialized multi-stage pipeline, including Reasoning SFT, model merging, and asynchronous reinforcement learning (RL), to develop complex problem-solving skills. "Fusion Training" then rebalances these capabilities with conversational fluency, consistent response styling, and reliable tool-use.
The evaluations show that Mi:dm K 2.5 Pro achieves competitive performance against leading global and domestic models. In addition, it sets state-of-the-art results on Korean-specific benchmarks, showcasing deep linguistic and cultural understanding. Finally, Responsible AI evaluations validate safety against attacks, ensuring a secure profile for deployment with a balance of harmlessness and responsiveness.

[401]  arXiv:2603.18789 [pdf, ps, other]
Title: Weaver: Fuzzing JavaScript Engines at the JavaScript-WebAssembly Boundary
Subjects: Cryptography and Security (cs.CR)

The security of modern JavaScript (JS) engines is critical since they provide the primary defense mechanism for executing untrusted code on the web. The recent integration of WebAssembly (Wasm) has transformed these engines into complex polyglot environments, creating a novel attack surface at the JS-Wasm interaction boundary due to the distinct type systems and memory models of the two languages. This boundary remains largely underexplored, as previous works mainly focus on testing JS and Wasm as two isolated entities rather than investigating the security implications of their cross-language interactions.
This paper proposes Weaver, an effective greybox fuzzing framework specifically tailored to uncover vulnerabilities at the JS-Wasm boundary. To comply with the language constraints, Weaver uses a type-aware generation strategy, meticulously maintaining the dual-type representation for every generated variable. This allows the fuzzer to validly utilize variables across the language boundary. In addition, Weaver leverages the UCB-1 algorithm to intelligently schedule mutators and generators to maximize the discovery of new code paths.
We have implemented and evaluated Weaver on three JS engines. The results indicate that Weaver achieves superior code coverage compared to state-of-the-art fuzzers. Moreover, Weaver has uncovered two new bugs in the latest versions of these engines, one of which is considered high severity and set to highest priority, demonstrating the practicality of Weaver.
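
The abstract mentions UCB-1 for scheduling mutators and generators. The sketch below shows the textbook UCB-1 selection rule applied to a set of named arms. It is not Weaver's implementation: the class name, the arm names, and the reward signal (1.0 when an arm "finds new coverage") are hypothetical and chosen only to illustrate the algorithm.

```python
import math


class UCB1Scheduler:
    """Textbook UCB-1 bandit over a fixed set of arms (e.g. mutators)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {arm: 0 for arm in self.arms}
        self.reward_sums = {arm: 0.0 for arm in self.arms}
        self.total_pulls = 0

    def select(self):
        # Play every arm once before applying the UCB-1 formula.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm

        def score(arm):
            average = self.reward_sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(self.total_pulls) / self.counts[arm])
            return average + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.reward_sums[arm] += reward
        self.total_pulls += 1


# Hypothetical arms and a deterministic stand-in for "found new coverage".
scheduler = UCB1Scheduler(["js_mutator", "wasm_mutator", "boundary_generator"])
for _ in range(200):
    chosen = scheduler.select()
    reward = 1.0 if chosen == "boundary_generator" else 0.0
    scheduler.update(chosen, reward)

print(scheduler.counts)
```

The exploration bonus shrinks as an arm is pulled more often, so the scheduler concentrates on the arm with the best average reward while still revisiting the others occasionally.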

[402]  arXiv:2603.18792 [pdf, ps, other]
Title: Rethinking Uncertainty Quantification and Entanglement in Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Uncertainty quantification (UQ) is crucial in safety-critical applications such as medical image segmentation. Total uncertainty is typically decomposed into data-related aleatoric uncertainty (AU) and model-related epistemic uncertainty (EU). Many methods exist for modeling AU (such as Probabilistic UNet, Diffusion) and EU (such as ensembles, MC Dropout), but it is unclear how they interact when combined. Additionally, recent work has revealed substantial entanglement between AU and EU, undermining the interpretability and practical usefulness of the decomposition. We present a comprehensive empirical study covering a broad range of AU-EU model combinations, propose a metric to quantify uncertainty entanglement, and evaluate both across downstream UQ tasks. For out-of-distribution detection, ensembles exhibit consistently lower entanglement and superior performance. For ambiguity modeling and calibration, the best models are dataset-dependent, with softmax/SSN-based methods performing well and Probabilistic UNets being less entangled. A softmax ensemble fares remarkably well on all tasks. Finally, we analyze potential sources of uncertainty entanglement and outline directions for mitigating this effect.

[403]  arXiv:2603.18793 [pdf, ps, other]
Title: Functional Subspace Watermarking for Large Language Models
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably undergo complex distortions during realistic model modifications such as fine-tuning, quantization, or knowledge distillation, making reliable extraction extremely challenging. Despite extensive research on model-side watermarking, existing methods still lack sufficient robustness against parameter-level perturbations. To address this gap, we propose Functional Subspace Watermarking (FSW), a framework that anchors ownership signals into a low-dimensional functional backbone. Specifically, we first solve a generalized eigenvalue problem to extract a stable functional subspace for watermark injection, while introducing an adaptive spectral truncation strategy to achieve an optimal balance between robustness and model utility. Furthermore, a vector consistency constraint is incorporated to ensure that watermark injection does not compromise the original semantic performance. Extensive experiments across various LLM architectures and datasets demonstrate that our method achieves superior detection accuracy and statistical verifiability under multiple model attacks, maintaining robustness that outperforms existing state-of-the-art (SOTA) methods.

[404]  arXiv:2603.18795 [pdf, ps, other]
Title: Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Large Vision Language Models (LVLMs) excel at semantic understanding but struggle with fine-grained spatial grounding, as the model must implicitly infer complex geometry without ever producing a spatial interpretation. We present Perceptio, a perception-enhanced LVLM with 2D and 3D spatial reasoning abilities, enabled via explicit semantic segmentation tokens and depth tokens generated directly within the autoregressive sequence. Concretely, we (i) distill a VQ-VAE depth codebook from a strong monocular teacher to tokenize dense depth into compact sequences, and (ii) integrate SAM2-based semantic segmentation tokens and VQ-VAE depth tokens inside the LLM so the model first emits spatial tokens and then answers. To stabilize depth token generation, we introduce novel composite depth-token objectives (marker, token, and count losses) and a soft-merging technique for differentiable reconstruction. We adopt a multi-task co-training strategy across diverse datasets, letting the model learn perception tokens to tackle multiple downstream tasks. Building on InternVL, Perceptio achieves state-of-the-art performance across benchmarks: improving referring expression segmentation by +0.8/+1.4/+1.1 cIoU on RefCOCO/+/g, HardBLINK spatial understanding accuracy by 10.3%, and MMBench accuracy by 1.0%, demonstrating that explicit spatial chain-of-thought materially strengthens spatial grounding in LVLMs.

[405]  arXiv:2603.18797 [pdf, ps, other]
Title: VesselTok: Tokenizing Vessel-like 3D Biomedical Graph Representations for Reconstruction and Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Spatial graphs provide a lightweight and elegant representation of curvilinear anatomical structures such as blood vessels, lung airways, and neuronal networks. Accurately modeling these graphs is crucial in clinical and (bio-)medical research. However, the high spatial resolution of large networks drastically increases their complexity, resulting in significant computational challenges. In this work, we aim to tackle these challenges by proposing VesselTok, a framework that approaches spatially dense graphs from a parametric shape perspective to learn latent representations (tokens). VesselTok leverages centerline points with a pseudo radius to effectively encode tubular geometry. Specifically, we learn a novel latent representation conditioned on centerline points to encode neural implicit representations of vessel-like, tubular structures. We demonstrate VesselTok's performance across diverse anatomies, including lung airways, lung vessels, and brain vessels, highlighting its ability to robustly encode complex topologies. To prove the effectiveness of VesselTok's learnt latent representations, we show that they (i) generalize to unseen anatomies, (ii) support generative modeling of plausible anatomical graphs, and (iii) transfer effectively to downstream inverse problems, such as link prediction.

[406]  arXiv:2603.18798 [pdf, ps, other]
Title: Signals of Success and Struggle: Early Prediction and Physiological Signatures of Human Performance across Task Complexity
Comments: CHI2026
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

User performance is crucial in interactive systems, capturing how effectively users engage with task execution. Prospectively predicting performance enables the timely identification of users struggling with task demands. While ocular and cardiac signals are widely used to characterise performance-relevant visual behaviour and physiological activation, their potential for early prediction and for revealing the physiological mechanisms underlying performance differences remains underexplored. We conducted a within-subject experiment in a game environment with naturally unfolding complexity, using early ocular and cardiac signals to predict later performance and to examine physiological and self-reported group differences. Results show that the ocular-cardiac fusion model achieves a balanced accuracy of 0.86, and the ocular-only model shows comparable predictive power. High performers exhibited targeted gaze and adjusted visual sampling, and sustained more stable cardiac activation as demands intensified, with a more positive affective experience. These findings demonstrate the feasibility of cross-session prediction from early physiology, providing interpretable insights into performance variation and facilitating future proactive intervention.

[407]  arXiv:2603.18804 [pdf, ps, other]
Title: "You've got a friend in me": Co-Designing a Peer Social Robot for Young Newcomers' Language and Cultural Learning
Subjects: Robotics (cs.RO)

Community literacy programs supporting young newcomer children in Canada face limited staffing and scarce one-to-one time, which constrains personalized English and cultural learning support. This paper reports on a co-design study with United for Literacy tutors that informed Maple, a table-top, peer-like Socially Assistive Robot (SAR) designed as a practice partner within tutor-mediated sessions. From shadowing and co-design interviews, we derived newcomer-specific requirements and incorporated them into an integrated prototype that uses short story-based activities, multi-modal scaffolding (speech, facial feedback, gesture), and embedded quizzes that support attention while producing tutor-actionable formative signals. We contribute system design implications for tutor-in-the-loop SARs supporting language socialization in community settings and outline directions for child-centered evaluation in authentic programs.

[408]  arXiv:2603.18806 [pdf, ps, other]
Title: dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models
Subjects: Artificial Intelligence (cs.AI)

Diffusion Large Language Models (dLLMs) introduce a new paradigm for language generation, which in turn presents new challenges for aligning them with human preferences. In this work, we aim to improve the policy optimization for dLLMs by reducing the cost of the trajectory probability calculation, thereby enabling scaled-up offline policy training. We prove that: (i) under reference policy regularization, the probability ratio of the newly unmasked tokens is an unbiased estimate of that of intermediate diffusion states, and (ii) the probability of the full trajectory can be effectively estimated with a single forward pass of a re-masked final state. By integrating these two trajectory reduction strategies into a policy optimization objective, we propose Trajectory Reduction Policy Optimization (dTRPO). We evaluate dTRPO on 7B dLLMs across instruction-following and reasoning benchmarks. Results show that it substantially improves the core performance of state-of-the-art dLLMs, achieving gains of up to 9.6% on STEM tasks, up to 4.3% on coding tasks, and up to 3.0% on instruction-following tasks. Moreover, dTRPO exhibits strong training efficiency due to its offline, single-forward nature, and achieves improved generation efficiency through high-quality outputs.

[409]  arXiv:2603.18811 [pdf, ps, other]
Title: V-Dreamer: Automating Robotic Simulation and Trajectory Synthesis via Video Generation Priors
Comments: 8 pages, 6 figures
Subjects: Robotics (cs.RO)

Training generalist robots demands large-scale, diverse manipulation data, yet real-world collection is prohibitively expensive, and existing simulators are often constrained by fixed asset libraries and manual heuristics. To bridge this gap, we present V-Dreamer, a fully automated framework that generates open-vocabulary, simulation-ready manipulation environments and executable expert trajectories directly from natural language instructions. V-Dreamer employs a novel generative pipeline that constructs physically grounded 3D scenes using large language models and 3D generative models, validated by geometric constraints to ensure stable, collision-free layouts. Crucially, for behavior synthesis, we leverage video generation models as rich motion priors. These visual predictions are then mapped into executable robot trajectories via a robust Sim-to-Gen visual-kinematic alignment module utilizing CoTracker3 and VGGT. This pipeline supports high visual diversity and physical fidelity without manual intervention. To evaluate the generated data, we train imitation learning policies on synthesized trajectories encompassing diverse object and environment variations. Extensive evaluations on tabletop manipulation tasks using the Piper robotic arm demonstrate that our policies robustly generalize to unseen objects in simulation and achieve effective sim-to-real transfer, successfully manipulating novel real-world objects.

[410]  arXiv:2603.18812 [pdf, ps, other]
Title: Central Triangulation under Parallel Flip Operations: The CG:SHOP Challenge 2026
Comments: 10 pages, 6 figures, 2 tables
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)

We give an overview of the 2026 Computational Geometry Challenge targeting the problem of finding a Central Triangulation under Parallel Flip Operations in triangulations of point sets. A flip is the parallel exchange of a set of edges in a triangulation with opposing diagonals of the convex quadrilaterals containing them. The challenge objective was, given a set of triangulations of a fixed point set, to determine a central triangulation with respect to parallel flip distances. More precisely, this asks for a triangulation that minimizes the sum of flip distances to all elements of the input

[411]  arXiv:2603.18813 [pdf, ps, other]
Title: Can LLM generate interesting mathematical research problems?
Subjects: Artificial Intelligence (cs.AI)

This paper is the second in a series of works on the mathematical creativity of LLMs. In the first paper, the authors proposed three criteria for evaluating the mathematical creativity of LLMs and constructed a benchmark dataset to measure it. This paper further explores the mathematical creativity of LLMs, with a focus on investigating whether LLMs can generate valuable and cutting-edge mathematical research problems. We developed an agent to generate unknown problems and produced 665 research problems in differential geometry. Through human verification, we find that many of these mathematical problems are unknown to experts and possess unique research value.

[412]  arXiv:2603.18815 [pdf, ps, other]
Title: ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
Subjects: Artificial Intelligence (cs.AI)

Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, making systems hard to migrate and maintain. Under the rollout-as-a-service philosophy, we present ProRL Agent, a scalable infrastructure that serves the full agentic rollout lifecycle through an API service. ProRL Agent also provides standardized and extensible sandbox environments that support diverse agentic tasks in rootless HPC settings. We validate ProRL Agent through RL training on software engineering, math, STEM, and coding tasks. ProRL Agent is open-sourced and integrated as part of NVIDIA NeMo Gym.

[413]  arXiv:2603.18817 [pdf, ps, other]
Title: Seasoning Generative Models for a Generalization Aftertaste
Subjects: Machine Learning (cs.LG)

The use of discriminators to train or fine-tune generative models has proven to be a rather successful framework. A notable example is Generative Adversarial Networks (GANs) that minimize a loss incurred by training discriminators along with other paradigms that boost generative models via discriminators that satisfy weak learner constraints. More recently, even diffusion models have shown advantages with some kind of discriminator guidance. In this work, we extend a strong-duality result related to $f$-divergences which gives rise to a discriminator-guided recipe that allows us to refine any generative model. We then show that the refined generative models provably improve generalization, compared to their non-refined counterparts. In particular, our analysis reveals that the gap in generalization is improved based on the Rademacher complexity of the discriminator set used for refinement. Our recipe subsumes a recently introduced score-based diffusion approach (Kim et al., 2022) that has shown great empirical success, and it also allows us to shed light on the generalization guarantees of this method by virtue of our analysis. Thus, our work provides a theoretical validation for existing work, suggests avenues for new algorithms, and contributes to our understanding of generalization in generative models at large.

[414]  arXiv:2603.18820 [pdf, ps, other]
Title: An automata-based test for bricks over string algebras
Comments: 11 pages, 4 figures
Subjects: Formal Languages and Automata Theory (cs.FL); Representation Theory (math.RT)

Motivated by the recent work of Deaconu, Mousavand and Paquette on the connection between infinite string bricks for certain gentle algebras and Sturmian words, we develop a decorated version of a deterministic automaton, called a multi-entry inverse automaton (MIA, for short) that accepts pointed words. We then associate an MIA $\mathsf M_{\Lambda\delta}$ over $\{0,1\}$ to a string algebra $\Lambda$, and show that strings over $\Lambda$ can be viewed as certain equivalence classes of the pointed words accepted by $\mathsf M_{\Lambda\delta}$. By defining (weak) brick words over this MIA, we show that a finite/infinite string module (resp. band module) is a brick if and only if every word in the associated equivalence class of pointed binary words is a brick word (resp. a weak brick word) over $\mathsf M_{\Lambda\delta}$. The result of Deaconu et al. follows as an immediate consequence.

[415]  arXiv:2603.18822 [pdf, ps, other]
Title: Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework
Subjects: Computation and Language (cs.CL)

This study presents a multi-stage classification framework for detecting human values in noisy Russian-language social media, validated on a random sample of 7.5 million public text posts. Drawing on Schwartz's theory of basic human values, we design a multi-stage pipeline that includes spam and nonpersonal content filtering, targeted selection of value-relevant and politically relevant posts, LLM-based annotation, and multi-label classification. Particular attention is given to verifying the quality of LLM annotations and model predictions against human experts. We treat human expert annotations not as ground truth but as an interpretative benchmark with its own uncertainty. To account for annotation subjectivity, we aggregate multiple LLM-generated judgments into soft labels that reflect varying levels of agreement. These labels are then used to train transformer-based models capable of predicting the probability of each of the ten basic values. The best-performing model, XLM-RoBERTa-large, achieves an F1 macro of 0.83 and an F1 of 0.71 on held-out test data. By treating value detection as a multi-perspective interpretive task, where expert labels, GPT annotations, and model predictions represent coherent but not identical readings of the same texts, we show that the model generally aligns with human judgments but systematically overestimates the Openness to Change value domain. Empirically, the study reveals distinct patterns of value expression and their co-occurrence in Russian social networks, contributing to a broader research agenda on cultural variation, communicative framing, and value-based interpretation in digital environments. All models are released publicly.
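The soft-label step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so the per-label averaging below, the function name `soft_labels`, and the toy votes are all assumptions made for illustration rather than the authors' procedure.

```python
from typing import List


def soft_labels(judgments: List[List[int]]) -> List[float]:
    """Average several binary multi-label judgments into per-label agreement rates.

    Assumption: each judgment is a 0/1 vector over the same label set, and the
    soft label is simply the fraction of judgments that marked the label present.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    n_labels = len(judgments[0])
    if any(len(j) != n_labels for j in judgments):
        raise ValueError("all judgments must cover the same labels")
    return [sum(j[k] for j in judgments) / len(judgments) for k in range(n_labels)]


# Toy example: three hypothetical LLM judgments over four value labels.
votes = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
labels = soft_labels(votes)
```

In this toy case a label that all three judgments agree on gets 1.0, a label marked once gets one third, and an unmarked label gets 0.0, which is one simple way for a training target to reflect varying levels of agreement.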

[416]  arXiv:2603.18827 [pdf, ps, other]
Title: Student views in AI Ethics and Social Impact
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

An investigation, from a gender perspective, of how students view the ethical implications and societal effects of artificial intelligence is conducted, examining concepts that could have a big influence on how artificial intelligence may be taught in the future. For this, we conducted a survey on a cohort of 230 second year computer science students to reveal their opinions. The results revealed that AI, from the students' perspective, will significantly impact daily life, particularly in areas such as medicine, education, or media. Men are more aware of potential changes in Computer Science, autonomous driving, image and video processing, and chatbot usage, while women mention more the impact on social media. Both men and women perceive potential threats in the same manner, with men more aware of war, AI controlled drones, terrain recognition, and information war. Women seem to have a stronger tendency towards ethical considerations and helping others.

[417]  arXiv:2603.18829 [pdf, ps, other]
Title: Agent Control Protocol: Admission Control for Agent Actions
Authors: Marcelo Fernandez (TraslaIA)
Comments: 21 pages. Specification repository: this https URL
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Agent Control Protocol (ACP) is a formal technical specification for governance of autonomous agents in B2B institutional environments. ACP is the admission control layer between agent intent and system state mutation: before any agent action reaches execution, it must pass a cryptographic admission check that validates identity, capability scope, delegation chain, and policy compliance simultaneously.
ACP defines the mechanisms of cryptographic identity, capability-based authorization, deterministic risk evaluation, verifiable chained delegation, transitive revocation, and immutable auditing that a system must implement for autonomous agents to operate under explicit institutional control. ACP operates as an additional layer on top of RBAC and Zero Trust, without replacing them.
The v1.13 specification comprises 36 technical documents organized into five conformance levels (L1-L5). It includes a Go reference implementation of 22 packages covering all L1-L4 capabilities, 51 signed conformance test vectors (Ed25519 + SHA-256), and an OpenAPI 3.1.0 specification for all HTTP endpoints. It defines more than 62 verifiable requirements, 12 prohibited behaviors, and the mechanisms for interoperability between institutions.
Specification and implementation: https://github.com/chelof100/acp-framework-en

[418]  arXiv:2603.18834 [pdf, ps, other]
Title: Statistical Characteristic-Guided Denoising for Rapid High-Resolution Transmission Electron Microscopy Imaging
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

High-Resolution Transmission Electron Microscopy (HRTEM) enables atomic-scale observation of nucleation dynamics, which boosts the studies of advanced solid materials. Nonetheless, due to the millisecond-scale rapid change of nucleation, it requires short-exposure rapid imaging, leading to severe noise that obscures atomic positions. In this work, we propose a statistical characteristic-guided denoising network, which utilizes statistical characteristics to guide the denoising process in both spatial and frequency domains. In the spatial domain, we present spatial deviation-guided weighting to select appropriate convolution operations for each spatial position based on deviation characteristic. In the frequency domain, we present frequency band-guided weighting to enhance signals and suppress noise based on band characteristics. We also develop an HRTEM-specific noise calibration method and generate a dataset with disordered structures and realistic HRTEM image noises. It can ensure the denoising performance of models on real images for nucleation observation. Experiments on synthetic and real data show our method outperforms the state-of-the-art methods in HRTEM image denoising, with effectiveness in the localization downstream task. Code will be available at https://github.com/HeasonLee/SCGN.

[419]  arXiv:2603.18835 [pdf, ps, other]
Title: Tursio Database Search: How far are we from ChatGPT?
Subjects: Databases (cs.DB)

Business users need to search enterprise databases using natural language, just as they now search the web using ChatGPT or Perplexity. However, existing benchmarks -- designed for open-domain QA or text-to-SQL -- do not evaluate the end-to-end quality of such a search experience. We present an evaluation framework for structured database search that generates realistic banking queries across varying difficulty levels and assesses answer quality using relevance, safety, and conversational metrics via an LLM-as-judge approach. We apply this framework to compare Tursio, a database search platform, against ChatGPT and Perplexity on a credit union banking schema. Our results show that Tursio achieves answer relevancy statistically comparable to both baselines (97.8% vs. 98.1% on simple, 90.0% vs. 100.0% on medium, 89.5% vs. 100.0% on hard questions), even though Tursio answers from a structured database while the baselines generate responses from the open web. We analyze the failure modes, identify database completeness as the primary bottleneck, and outline directions for improving both the evaluation methodology and the systems under evaluation.

[420]  arXiv:2603.18836 [pdf, ps, other]
Title: Confidential Databases Without Cryptographic Mappings
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB)

Confidential databases (CDBs) are essential for enabling secure queries over sensitive data in untrusted cloud environments using confidential computing hardware. While adoption is growing, widespread deployment is hindered by high performance overhead from frequent synchronous cryptographic operations, which causes significant computational and memory bottlenecks. We present FEDB, a novel CDB design that removes cryptographic operations from the critical path. FEDB leverages crypto-free mappings, which maintain data-independent identifiers within the database while securely mapping them to plaintext secrets in a trusted domain. This paradigm shift reduces the runtime overhead by up to 78.0 times on industry-standard benchmarks including TPC-C and TPC-H.

[421]  arXiv:2603.18837 [pdf, ps, other]
Title: Model Order Reduction of Cerebrovascular Hemodynamics Using POD_Galerkin and Reservoir Computing_based Approach
Comments: 24 pages, 15 figures
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

We investigate model order reduction (MOR) strategies for simulating unsteady hemodynamics within cerebrovascular systems, contrasting a physics-based intrusive approach with a data-driven non-intrusive framework. High-fidelity 3D Computational Fluid Dynamics (CFD) snapshots of an idealised basilar artery bifurcation are first compressed into a low-dimensional latent space using Proper Orthogonal Decomposition (POD). We evaluate the performance of a POD-Galerkin (POD-G) model, which projects the Navier-Stokes equations onto the reduced basis, against a POD-Reservoir Computing (POD-RC) model that learns the temporal evolution of coefficients through a recurrent architecture. A multi-harmonic and multi-amplitude training signal is introduced to improve training efficiency. Both methodologies achieve computational speed-ups on the order of 10^2 to 10^3 compared to full-order simulations, demonstrating their potential as efficient and accurate surrogates for predicting flow quantities such as wall shear stress.

[422]  arXiv:2603.18838 [pdf, ps, other]
Title: A Model Ensemble-Based Post-Processing Framework for Fairness-Aware Prediction
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Striking an optimal balance between predictive performance and fairness continues to be a fundamental challenge in machine learning. In this work, we propose a post-processing framework that facilitates fairness-aware prediction by leveraging model ensembling. Designed to operate independently of any specific model internals, our approach is widely applicable across various learning tasks, model architectures, and fairness definitions. Through extensive experiments spanning classification, regression, and survival analysis, we demonstrate that the framework effectively enhances fairness while maintaining, or only minimally affecting, predictive accuracy.

[423]  arXiv:2603.18840 [pdf, ps, other]
Title: Robust Beamforming for Practical RIS-Aided RSMA Systems with Imperfect SIC under Transceiver Hardware Impairments
Journal-ref: IEEE Transactions on Vehicular Technology, 2026
Subjects: Information Theory (cs.IT)

Reconfigurable intelligent surface (RIS)-aided rate-splitting multiple access (RSMA) systems have demonstrated remarkable potential in enhancing spectral efficiency. However, most existing works rely on ideal hardware, which is unrealistic. In practical deployments, RIS elements suffer from amplitude-phase coupling, transceivers are subject to hardware impairments (HWI), and successive interference cancellation (SIC) in RSMA networks cannot achieve perfect interference elimination for decoded signals. To address these limitations, we investigate a robust beamforming design for RIS-aided RSMA systems under practical hardware imperfections. We first characterize the asymptotic signal-to-noise ratio (SNR) of practical RIS systems when the beamformer is designed based on an ideal RIS model, thereby theoretically quantifying the resulting performance degradation. We then derive a closed-form expression for the distortion noise power induced by transceiver HWI, while also accounting for residual interference due to imperfect SIC. Building on these insights, we establish a comprehensive system model that jointly incorporates all hardware-induced impairments and formulate a multiuser sum rate maximization problem. To solve the resulting non-convex optimization problem, we develop an efficient block variable relaxation algorithm. Simulation results verify that the proposed scheme significantly outperforms conventional non-orthogonal multiple access (NOMA) approaches, and achieves superior robustness compared with benchmark schemes neglecting HWI, imperfect SIC, or amplitude-phase coupling.

[424]  arXiv:2603.18841 [pdf, ps, other]
Title: Holistic Energy Performance Management: Enablers, Capabilities, and Features
Comments: 7 Pages, Accepted in IEEE Communications Magazine
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

Energy consumption is a significant concern for mobile network operators, and to enable further network energy improvements it is also an important target when developing the emerging 6G standard. In this paper we show that, despite the existence of many energy-saving features in 5G new radio (NR) networks, activating them in isolation yields only suboptimal savings and often compromises other network key performance indicators (KPIs) such as coverage or latency. We first introduce a compact taxonomy that distinguishes hardware capabilities from higher-layer features. Features fall into two classes: (i) signaling and scheduling mechanisms that create idle windows, and (ii) features that utilize those windows to save energy. We then present a feature orchestrator as a logical node to coordinate between features to maximize the gain. Using a 3GPP-aligned simulator with product-realistic parameters, we show that coordinating lean NR, scheduling, and advanced sleep modes significantly reduces gNodeB (gNB) energy consumption with negligible throughput loss, compared to the uncoordinated scenario. We conclude by outlining open issues in observability, system dynamics, coordination, and intelligent automation for energy performance management.

[425]  arXiv:2603.18846 [pdf, ps, other]
Title: Towards Interpretable Foundation Models for Retinal Fundus Images
Comments: 11 pages, 3 figures, 2 tables, submitted to MICCAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Computation (stat.CO)

Foundation models are used to extract transferable representations from large amounts of unlabeled data, typically via self-supervised learning (SSL). However, many of these models rely on architectures that offer limited interpretability, which is a critical issue in high-stakes domains such as medical imaging. We propose Dual-IFM, a foundation model that is interpretable-by-design in two ways: First, it provides local interpretability for individual images through class evidence maps that are faithful to the decision-making process. Second, it provides global interpretability for entire datasets through a 2D projection layer that allows for direct visualization of the model's representation space. We trained our model on over 800,000 color fundus photographs from various sources to learn generalizable, interpretable representations for different downstream tasks. Our results show that our model reaches a performance range similar to that of state-of-the-art foundation models with up to $16\times$ the number of parameters, while providing interpretable predictions on out-of-distribution data. Our results suggest that large-scale SSL pretraining paired with inherent interpretability can lead to robust representations for retinal imaging.

[426]  arXiv:2603.18850 [pdf, ps, other]
Title: HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video question answering (VQA) with vision-language models (VLMs) depends critically on which frames are selected from the input video, yet most systems rely on uniform or heuristic sampling that cannot be optimized for downstream answering quality. We introduce HORNet, a lightweight frame selection policy trained with Group Relative Policy Optimization (GRPO) to learn which frames a frozen VLM needs to answer questions correctly. With fewer than 1M trainable parameters, HORNet reduces input frames by up to 99% and VLM processing time by up to 93%, while improving answer quality on short-form benchmarks (+1.7% F1 on MSVD-QA) and achieving strong performance on temporal reasoning tasks (+7.3 points over uniform sampling on NExT-QA). We formalize this as Select Any Frames (SAF), a task that decouples visual input curation from VLM reasoning, and show that GRPO-trained selection generalizes better out-of-distribution than supervised and PPO alternatives. HORNet's policy further transfers across VLM answerers without retraining, yielding an additional 8.5% relative gain when paired with a stronger model. Evaluated across six benchmarks spanning 341,877 QA pairs and 114.2 hours of video, our results demonstrate that optimizing what a VLM sees is a practical and complementary alternative to optimizing what it generates while improving efficiency. Code is available at https://github.com/ostadabbas/HORNet.

[427]  arXiv:2603.18853 [pdf, ps, other]
Title: Learn for Variation: Variationally Guided AAV Trajectory Learning in Differentiable Environments
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Autonomous aerial vehicles (AAVs) empower sixth-generation (6G) Internet-of-Things (IoT) networks through mobility-driven data collection. However, conventional reward-driven reinforcement learning for AAV trajectory planning suffers from severe credit assignment issues and training instability, because sparse scalar rewards fail to capture the long-term and nonlinear effects of sequential movements. To address these challenges, this paper proposes Learn for Variation (L4V), a gradient-informed trajectory learning framework that replaces high-variance scalar reward signals with dense and analytically grounded policy gradients. Particularly, the coupled evolution of AAV kinematics, distance-dependent channel gains, and per-user data-collection progress is first unrolled into an end-to-end differentiable computational graph. Backpropagation through time then serves as a discrete adjoint solver, which propagates exact sensitivities from the cumulative mission objective to every control action and policy parameter. These structured gradients are used to train a deterministic neural policy with temporal smoothness regularization and gradient clipping. Extensive simulations demonstrate that L4V consistently outperforms representative baselines, including a genetic algorithm, DQN, A2C, and DDPG, in mission completion time, average transmission rate, and training cost.

[428]  arXiv:2603.18855 [pdf, ps, other]
Title: BeamAgent: LLM-Aided MIMO Beamforming with Decoupled Intent Parsing and Alternating Optimization for Joint Site Selection and Precoding
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Systems and Control (eess.SY)

Integrating large language models (LLMs) into wireless communication optimization is a promising yet challenging direction. Existing approaches either use LLMs as black-box solvers or code generators, tightly coupling them with numerical computation. However, LLMs lack the precision required for physical-layer optimization, and the scarcity of wireless training data makes domain-specific fine-tuning impractical. We propose BeamAgent, an LLM-aided MIMO beamforming framework that explicitly decouples semantic intent parsing from numerical optimization. The LLM serves solely as a semantic translator that converts natural language descriptions into structured spatial constraints. A dedicated gradient-based optimizer then jointly solves the discrete base station site selection and continuous precoding design through an alternating optimization algorithm. A scene-aware prompt enables grounded spatial reasoning without fine-tuning, and a multi-round interaction mechanism with dual-layer intent classification ensures robust constraint verification. A penalty-based loss function enforces dark-zone power constraints while releasing optimization degrees of freedom for bright-zone gain maximization. Experiments on a ray-tracing-based urban MIMO scenario show that BeamAgent achieves a bright-zone power of 84.0 dB, outperforming exhaustive zero-forcing by 7.1 dB under the same dark-zone constraint. The end-to-end system reaches within 3.3 dB of the expert upper bound, with the full optimization completing in under 2 s on a laptop.

[429]  arXiv:2603.18856 [pdf, ps, other]
Title: Motion-o: Trajectory-Grounded Video Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent research has made substantial progress on video reasoning, with many models leveraging spatio-temporal evidence chains to strengthen their inference capabilities. At the same time, a growing set of datasets and benchmarks now provides structured annotations designed to support and evaluate such reasoning. However, little attention has been paid to reasoning about how objects move between observations: no prior work has articulated the motion patterns by connecting successive observations, leaving trajectory understanding implicit and difficult to verify. We formalize this missing capability as Spatial-Temporal-Trajectory (STT) reasoning and introduce Motion-o, a motion-centric video understanding extension to visual language models that makes trajectories explicit and verifiable. To enable motion reasoning, we also introduce a trajectory-grounding dataset artifact that expands sparse keyframe supervision via augmentation to yield denser bounding box tracks and a stronger trajectory-level training signal. Finally, we introduce Motion Chain of Thought (MCoT), a structured reasoning pathway that makes object trajectories explicit through discrete <motion/> tags summarizing per-object direction, speed, and scale (of velocity) change to explicitly connect grounded observations into trajectories. To train Motion-o, we design a reward function that compels the model to reason directly over visual evidence, all while requiring no architectural modifications. Empirical results demonstrate that Motion-o improves spatial-temporal grounding and trajectory prediction while remaining fully compatible with existing frameworks, establishing motion reasoning as a critical extension for evidence-based video understanding. Code is available at https://github.com/ostadabbas/Motion-o.

[430]  arXiv:2603.18858 [pdf, ps, other]
Title: State Complexity of Shifts of the Fibonacci Word
Subjects: Formal Languages and Automata Theory (cs.FL); Discrete Mathematics (cs.DM); Number Theory (math.NT)

The Fibonacci infinite word ${\bf f} = (f_i)_{i \geq 0} = 01001010\cdots$ is one of the most celebrated objects in combinatorics on words. There is a simple $5$-state automaton that, given $i$ in lsd-first Zeckendorf representation, computes its $i$'th term $f_i$, and a $2$-state automaton for msd-first. In this paper we consider the state complexity of the automaton generating the shifted sequence $(f_{i+c})_{i \geq 0}$, and show that it is $O(\log c)$ for both msd-first and lsd-first input. This is close to the information-theoretic minimum for an aperiodic sequence. The techniques involve a mixture of state complexity techniques and Diophantine approximation.
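A small sketch may help make the unshifted case concrete. It relies on the standard fact that $f_i$ equals the least significant digit of the Zeckendorf representation of $i$, so an msd-first reader only needs to remember the last digit it saw. The function names are illustrative, and `shifted_term` simply evaluates $f_{i+c}$ directly; it is not the automaton construction studied in the paper.

```python
from typing import List


def fibonacci_word_prefix(n: int) -> str:
    """First n letters of the fixed point of the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < n:
        word = "".join("01" if ch == "0" else "0" for ch in word)
    return word[:n]


def zeckendorf_msd_first(i: int) -> List[int]:
    """Greedy Zeckendorf digits of i over 1, 2, 3, 5, 8, ..., most significant first."""
    fibs = [1, 2]
    while fibs[-1] <= i:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):
        if f <= i:
            digits.append(1)
            i -= f
        else:
            digits.append(0)
    while digits and digits[0] == 0:  # drop leading zeros
        digits.pop(0)
    return digits


def fibonacci_term(i: int) -> int:
    """Read the digits msd-first while remembering only the last digit (two states)."""
    state = 0
    for d in zeckendorf_msd_first(i):
        state = d
    return state


def shifted_term(i: int, c: int) -> int:
    """Term i of the shifted sequence (f_{i+c}), computed directly for comparison."""
    return fibonacci_term(i + c)


prefix = fibonacci_word_prefix(8)
```

Here `prefix` reproduces the initial segment `01001010` quoted in the abstract, and `fibonacci_term(i)` agrees with the morphism-generated word letter by letter.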

[431]  arXiv:2603.18859 [pdf, ps, other]
Title: RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Reinforcement learning (RL) holds significant promise for enhancing the agentic reasoning capabilities of large language models (LLMs) with external environments. However, the inherent sparsity of terminal rewards hinders fine-grained, state-level optimization. Although process reward modeling offers a promising alternative, training dedicated reward models often entails substantial computational costs and scaling difficulties. To address these challenges, we introduce RewardFlow, a lightweight method for estimating state-level rewards tailored to agentic reasoning tasks. RewardFlow leverages the intrinsic topological structure of states within reasoning trajectories by constructing state graphs. This enables an analysis of state-wise contributions to success, followed by topology-aware graph propagation to quantify contributions and yield objective, state-level rewards. When integrated as dense rewards for RL optimization, RewardFlow substantially outperforms prior RL baselines across four agentic reasoning benchmarks, demonstrating superior performance, robustness, and training efficiency. The implementation of RewardFlow is publicly available at https://github.com/tmlr-group/RewardFlow.
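To give a rough feel for the idea of propagating a sparse terminal reward over a state graph, the following sketch merges trajectories into one graph and assigns each state a value that decays with its shortest distance to a successful terminal state. This is only an illustrative stand-in: the abstract does not describe the actual propagation rule, and the function name `propagate_rewards`, the `decay` parameter, and the toy states are assumptions.

```python
from collections import defaultdict, deque
from typing import Dict, List


def propagate_rewards(
    trajectories: List[List[str]], successes: List[bool], decay: float = 0.9
) -> Dict[str, float]:
    """Assign each state decay**d, where d is its distance to a successful end state.

    States that cannot reach a successful terminal state keep a reward of 0.0.
    """
    predecessors = defaultdict(set)
    states = set()
    goals = set()
    for traj, ok in zip(trajectories, successes):
        states.update(traj)
        for src, dst in zip(traj, traj[1:]):
            predecessors[dst].add(src)  # store reversed edges for backward search
        if ok and traj:
            goals.add(traj[-1])

    reward = {s: 0.0 for s in states}
    queue = deque()
    for g in goals:
        reward[g] = 1.0
        queue.append((g, 0))
    seen = set(goals)

    # Breadth-first search backward from the successful terminal states.
    while queue:
        state, dist = queue.popleft()
        for prev in predecessors[state]:
            if prev not in seen:
                seen.add(prev)
                reward[prev] = decay ** (dist + 1)
                queue.append((prev, dist + 1))
    return reward


# Two hypothetical rollouts sharing the start state "s0".
rollouts = [["s0", "s1", "goal"], ["s0", "s2", "dead"]]
rewards = propagate_rewards(rollouts, [True, False])
```

In this toy graph the states on the successful path receive decaying positive values, while `s2` and `dead` stay at zero because no successful terminal state is reachable from them.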

[432]  arXiv:2603.18860 [pdf, ps, other]
Title: Pore-scale modeling of capillary-driven binder migration during battery electrode drying
Comments: 33 pages, 11 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Sodium-ion batteries employing hard carbon electrodes are considered a drop-in technology for lithium-ion batteries. Electrode drying is a critical manufacturing step, as binder migration during pore emptying impacts the mechanical integrity and electrical performance of the electrode. Existing modeling approaches predominantly rely on the film shrinkage phase in a one-dimensional way or neglect the capillary transport, resulting in a lack of physically consistent, microstructure-resolved predictions of binder migration. In this work, a spatially resolved pore-scale continuum model is extended to explicitly describe capillary-driven binder transport during pore emptying. The model is applied to hard carbon microstructures with varying mean particle diameters. The simulations reveal that smaller particle sizes lead to a more homogeneous binder distribution, whereas higher evaporation rates and increased surface tension promote stronger binder gradients. Variations in solvent viscosity show only a minor influence on binder migration, as long as no hydrophilic or hydrophobic behavior is present. Finally, the simulations demonstrate that an explicit description of capillary transport and microstructural effects is essential for accurately predicting binder migration and provides a basis for the targeted optimization of electrode drying processes.

[433]  arXiv:2603.18861 [pdf, ps, other]
Title: A Passive Elastic-Folding Mechanism for Stackable Airdrop Sensors
Comments: 8 pages, 8 figures, The 2026 IEEE International Conference on Robotics and Automation (ICRA 2026)
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Air-dispersed sensor networks deployed from aerial robotic systems (e.g., UAVs) provide a low-cost approach to wide-area environmental monitoring. However, existing methods often rely on active actuators for mid-air shape or trajectory control, increasing both power consumption and system cost. Here, we introduce a passive elastic-folding hinge mechanism that transforms sensors from a flat, stackable form into a three-dimensional structure upon release. Hinges are fabricated by laminating commercial sheet materials with rigid printed circuit boards (PCBs) and programming fold angles through a single oven-heating step, enabling scalable production without specialized equipment. Our geometric model links laminate geometry, hinge mechanics, and resulting fold angle, providing a predictive design methodology for target configurations. Laboratory tests confirmed fold angles between 10 degrees and 100 degrees, with a standard deviation of 4 degrees and high repeatability. Field trials further demonstrated reliable data collection and LoRa transmission during dispersion, while the Horizontal Wind Model (HWM)-based trajectory simulations indicated strong potential for wide-area sensing exceeding 10 km.

[434]  arXiv:2603.18863 [pdf, ps, other]
Title: Why Better Cross-Lingual Alignment Fails for Better Cross-Lingual Transfer: Case of Encoders
Subjects: Computation and Language (cs.CL)

Better cross-lingual alignment is often assumed to yield better cross-lingual transfer. However, explicit alignment techniques -- despite increasing embedding similarity -- frequently fail to improve token-level downstream performance. In this work, we show that this mismatch arises because alignment and downstream task objectives are largely orthogonal, and because the downstream benefits from alignment vary substantially across languages and task types. We analyze four XLM-R encoder models aligned on different language pairs and fine-tuned for either POS Tagging or Sentence Classification. Using representational analyses, including embedding distances, gradient similarities, and gradient magnitudes for both task and alignment losses, we find that: (1) embedding distances alone are unreliable predictors of improvements (or degradations) in task performance and (2) alignment and task gradients are often close to orthogonal, indicating that optimizing one objective may contribute little to optimizing the other. Taken together, our findings explain why ``better'' alignment often fails to translate into ``better'' cross-lingual transfer. Based on these insights, we provide practical guidelines for combining cross-lingual alignment with task-specific fine-tuning, highlighting the importance of careful loss selection.
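The notion of gradient similarity used in this abstract can be made concrete with a short sketch. It assumes the similarity is measured as the cosine between flattened gradient vectors, which is a common choice but is not confirmed by the abstract; the vectors below are toy values, not results from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    if len(u) != len(v):
        raise ValueError("gradients must have the same number of entries")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero gradient")
    return dot / (norm_u * norm_v)


# Toy gradients: the task gradient and the alignment gradient touch disjoint parameters.
task_grad = [1.0, 0.0, 2.0]
align_grad = [0.0, 3.0, 0.0]
similarity = cosine_similarity(task_grad, align_grad)
```

A value near zero, as in this toy case, means a step along one gradient leaves the other loss unchanged to first order, which is the sense in which optimizing one objective may contribute little to the other.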

[435]  arXiv:2603.18865 [pdf, ps, other]
Title: RadioDiff-FS: Physics-Informed Manifold Alignment in Few-Shot Diffusion Models for High-Fidelity Radio Map Construction
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Radio maps (RMs) provide spatially continuous propagation characterizations essential for 6G network planning, but high-fidelity RM construction remains challenging. Rigorous electromagnetic solvers incur prohibitive computational latency, while data-driven models demand massive labeled datasets and generalize poorly from simplified simulations to complex multipath environments. This paper proposes RadioDiff-FS, a few-shot diffusion framework that adapts a pre-trained main-path generator to multipath-rich target domains with only a small number of high-fidelity samples. The adaptation is grounded in a theoretical decomposition of the multipath RM into a dominant main-path component and a directionally sparse residual. This decomposition shows that the cross-domain shift corresponds to a bounded and geometrically structured feature translation rather than an arbitrary distribution change. A Direction-Consistency Loss (DCL) is then introduced to constrain diffusion score updates along physically plausible propagation directions, suppressing phase-inconsistent artifacts that arise in the low-data regime. Experiments show that RadioDiff-FS reduces NMSE by 59.5% on static RMs and by 74.0% on dynamic RMs relative to the vanilla diffusion baseline, achieving an SSIM of 0.9752 and a PSNR of 36.37 dB under severely limited supervision.

[436]  arXiv:2603.18866 [pdf, ps, other]
Title: Conflict-Based Search for Multi Agent Path Finding with Asynchronous Actions
Comments: 9 pages, 10 figures. Accepted at AAMAS 2026
Subjects: Artificial Intelligence (cs.AI)

Multi-Agent Path Finding (MAPF) seeks collision-free paths for multiple agents from their respective start locations to their respective goal locations while minimizing path costs. Most existing MAPF algorithms rely on a common assumption of synchronized actions, where the actions of all agents start at the same time and always take a time unit, which may limit the use of MAPF planners in practice. To get rid of this assumption, Continuous-time Conflict-Based Search (CCBS) is a popular approach that can find optimal solutions for MAPF with asynchronous actions (MAPF-AA). However, CCBS has recently been identified to be incomplete due to an uncountably infinite state space created by continuous wait durations. This paper proposes a new method, Conflict-Based Search with Asynchronous Actions (CBS-AA), which bypasses this theoretical issue and can solve MAPF-AA with completeness and solution optimality guarantees. Based on CBS-AA, we also develop conflict resolution techniques to improve the scalability of CBS-AA further. Our test results show that our method can reduce the number of branches by up to 90%.

[437]  arXiv:2603.18867 [pdf, ps, other]
Title: A divided difference identity for a class of multiple integrals
Subjects: Numerical Analysis (math.NA)

We derive an identity that relates a class of multiple integrals involving Vandermonde polynomials to divided differences. Alternatively the identity can be viewed as an integral formula for divided differences. As part of the derivation we show that both sums of pure partial derivatives and mixed partial derivatives of Vandermonde polynomials are zero, which might be of independent interest.

[438]  arXiv:2603.18868 [pdf, ps, other]
Title: Through the Looking-Glass: AI-Mediated Video Communication Reduces Interpersonal Trust and Confidence in Judgments
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

AI-based tools that mediate, enhance or generate parts of video communication may interfere with how people evaluate trustworthiness and credibility. In two preregistered online experiments (N = 2,000), we examined whether AI-mediated video retouching, background replacement and avatars affect interpersonal trust, people's ability to detect lies and confidence in their judgments. Participants watched short videos of speakers making truthful or deceptive statements across three conditions with varying levels of AI mediation. We observed that perceived trust and confidence in judgments declined in AI-mediated videos, particularly in settings in which some participants used avatars while others did not. However, participants' actual judgment accuracy remained unchanged, and they were no more inclined to suspect those using AI tools of lying. Our findings provide evidence against concerns that AI mediation undermines people's ability to distinguish truth from lies, and against cue-based accounts of lie detection more generally. They highlight the importance of trustworthy AI mediation tools in contexts where not only truth, but also trust and confidence matter.

[439]  arXiv:2603.18871 [pdf, ps, other]
Title: Bridging Network Fragmentation: A Semantic-Augmented DRL Framework for UAV-aided VANETs
Comments: 13 pages, 13 figures. Submitted to IEEE Transactions on Cognitive Communications and Networking
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

Vehicular Ad-hoc Networks (VANETs) are the digital cornerstone of autonomous driving, yet they suffer from severe network fragmentation in urban environments due to physical obstructions. Unmanned Aerial Vehicles (UAVs), with their high mobility, have emerged as a vital solution to bridge these connectivity gaps. However, traditional Deep Reinforcement Learning (DRL)-based UAV deployment strategies lack semantic understanding of road topology, often resulting in blind exploration and sample inefficiency. By contrast, Large Language Models (LLMs) possess powerful reasoning capabilities capable of identifying topological importance, though applying them to control tasks remains challenging. To address this, we propose the Semantic-Augmented DRL (SA-DRL) framework. Firstly, we propose a fragmentation quantification method based on Road Topology Graphs (RTG) and Dual Connected Graphs (DCG). Subsequently, we design a four-stage pipeline to transform a general-purpose LLM into a domain-specific topology expert. Finally, we propose the Semantic-Augmented PPO (SA-PPO) algorithm, which employs a Logit Fusion mechanism to inject the LLM's semantic reasoning directly into the policy as a prior, effectively guiding the agent toward critical intersections. Extensive high-fidelity simulations demonstrate that SA-PPO achieves state-of-the-art performance with remarkable efficiency, reaching baseline performance levels using only 26.6% of the training episodes. Ultimately, SA-PPO improves two key connectivity metrics by 13.2% and 23.5% over competing methods, while reducing energy consumption to just 28.2% of the baseline.
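
The abstract does not spell out the Logit Fusion mechanism, so the following is only one plausible reading: the policy's action logits are shifted by a weighted set of prior logits before the softmax. The names `fuse_logits` and `beta`, and the additive form itself, are assumptions for illustration rather than the SA-PPO definition.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(policy_logits, prior_logits, beta=1.0):
    """Additive fusion (assumed form): policy logits plus beta times prior logits."""
    return [p + beta * q for p, q in zip(policy_logits, prior_logits)]


# Toy example: an untrained policy is indifferent between three candidate
# intersections, while the hypothetical prior favours the second one.
policy_logits = [0.0, 0.0, 0.0]
prior_logits = [0.0, 2.0, 0.0]
action_probs = softmax(fuse_logits(policy_logits, prior_logits, beta=1.0))
```

With `beta=0.0` the prior is ignored and the policy's own distribution is recovered, which makes the weight a simple knob for how strongly the prior guides exploration.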

[440]  arXiv:2603.18872 [pdf, ps, other]
Title: DriftGuard: Mitigating Asynchronous Data Drift in Federated Learning
Comments: 13 pages, 9 figures
Subjects: Machine Learning (cs.LG)

In real-world Federated Learning (FL) deployments, data distributions on devices that participate in training evolve over time. This leads to asynchronous data drift, where different devices shift at different times and toward different distributions. Mitigating such drift is challenging: frequent retraining incurs high computational cost on resource-constrained devices, while infrequent retraining degrades performance on drifting devices. We propose DriftGuard, a federated continual learning framework that efficiently adapts to asynchronous data drift. DriftGuard adopts a Mixture-of-Experts (MoE) inspired architecture that separates shared parameters, which capture globally transferable knowledge, from local parameters that adapt to group-specific distributions. This design enables two complementary retraining strategies: (i) global retraining, which updates the shared parameters when system-wide drift is identified, and (ii) group retraining, which selectively updates local parameters for clusters of devices identified via MoE gating patterns, without sharing raw data. Experiments across multiple datasets and models show that DriftGuard matches or exceeds state-of-the-art accuracy while reducing total retraining cost by up to 83%. As a result, it achieves the highest accuracy per unit retraining cost, improving over the strongest baseline by up to 2.3x. DriftGuard is available for download from https://github.com/blessonvar/DriftGuard.

[441]  arXiv:2603.18873 [pdf, ps, other]
Title: Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo
Comments: 5 pages, 3 figures, presented at the 3rd HEAL Workshop at CHI 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Popular language learning applications such as Duolingo use large language models (LLMs) to generate lessons for their users. Most lessons focus on general real-world scenarios such as greetings, ordering food, or asking directions, with limited support for profession-specific contexts. This gap can hinder learners from achieving professional-level fluency, which we define as the ability to comfortably communicate various work-related and domain-specific information in the target language. We surveyed five employees from a multinational company in the Philippines on their experiences with Duolingo. Results show that respondents encountered general scenarios more frequently than work-related ones, and that the former are relatable and effective in building foundational grammar, vocabulary, and cultural knowledge. The latter help bridge the gap toward professional fluency because they contain domain-specific vocabulary. Each participant suggested lesson scenarios that diverge in context when analyzed in aggregate. With this understanding, we propose that language learning applications should generate lessons that adapt to an individual's needs through personalized, domain-specific lesson scenarios while maintaining foundational support through general, relatable lesson scenarios.

[442]  arXiv:2603.18879 [pdf, ps, other]
Title: A Human-in/on-the-Loop Framework for Accessible Text Generation
Comments: Accepted at LREC 2026. To appear in the Proceedings of the 14th International Conference on Language Resources and Evaluation (LREC 2026)
Subjects: Computation and Language (cs.CL)

Plain Language and Easy-to-Read formats in text simplification are essential for cognitive accessibility. Yet current automatic simplification and evaluation pipelines remain largely automated, metric-driven, and fail to reflect user comprehension or normative standards. This paper introduces a hybrid framework that explicitly integrates human participation into LLM-based accessible text generation. Human-in-the-Loop (HiTL) contributions guide adjustments during generation, while Human-on-the-Loop (HoTL) supervision ensures systematic post-generation review. Empirical evidence from user studies and annotated resources is operationalized into (i) checklists aligned with standards, (ii) Event-Condition-Action trigger rules for activating expert oversight, and (iii) accessibility Key Performance Indicators (KPIs). The framework shows how human-centered mechanisms can be encoded for evaluation and reused to provide structured feedback that improves model adaptation. By embedding the human role in both generation and supervision, it establishes a traceable, reproducible, and auditable process for creating and evaluating accessible texts. In doing so, it integrates explainability and ethical accountability as core design principles, contributing to more transparent and inclusive NLP systems.

[443]  arXiv:2603.18881 [pdf, ps, other]
Title: Geography According to ChatGPT -- How Generative AI Represents and Reasons about Geography
Comments: Accepted book chapter (introduction to volume)
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Understanding how AI will represent and reason about geography should be a key concern for all of us, as the broader public increasingly interacts with spaces and places through these systems. Similarly, in line with the nature of foundation models, our own research often relies on pre-trained models. Hence, understanding what world AI systems construct is as important as evaluating their accuracy, including factual recall. To motivate the need for such studies, we provide three illustrative vignettes, i.e., exploratory probes, in the hope that they will spark lively discussions and follow-up work: (1) Do models form strong defaults, and how brittle are model outputs to minute syntactic variations? (2) Can distributional shifts resurface from the composition of individually benign tasks, e.g., when using AI systems to create personas? (3) Do we overlook deeper questions of understanding when solely focusing on the ability of systems to recall facts such as geographic principles?

[444]  arXiv:2603.18886 [pdf, ps, other]
Title: Reasoning over mathematical objects: on-policy reward modeling and test time aggregation
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The ability to precisely derive mathematical objects is a core requirement for downstream STEM applications, including mathematics, physics, and chemistry, where reasoning must culminate in formally structured expressions. Yet, current LM evaluations of mathematical and scientific reasoning rely heavily on simplified answer formats such as numerical values or multiple choice options due to the convenience of automated assessment. In this paper we provide three contributions for improving reasoning over mathematical objects: (i) we build and release training data and benchmarks for deriving mathematical objects, the Principia suite; (ii) we provide training recipes with strong LLM-judges and verifiers, where we show that on-policy judge training boosts performance; (iii) we show how on-policy training can also be used to scale test-time compute via aggregation. We find that strong LMs such as Qwen3-235B and o3 struggle on Principia, while our training recipes can bring significant improvements over different LLM backbones, while simultaneously improving results on existing numerical and MCQA tasks, demonstrating cross-format generalization of reasoning abilities.

[445]  arXiv:2603.18888 [pdf, ps, other]
Title: Authority-Level Priors: An Under-Specified Constraint in Hierarchical Predictive Processing
Authors: Marcela Palejova
Comments: 26 pages, 1 figure
Subjects: Machine Learning (cs.LG)

Hierarchical predictive processing explains adaptive behaviour through precision-weighted inference. Explicit belief revision often fails to produce corresponding changes in stress reactivity or autonomic regulation. This asymmetry suggests the framework leaves under-specified a governance-level constraint concerning which identity-level hypotheses regulate autonomic and behavioural control under uncertainty. We introduce Authority-Level Priors (ALPs) as meta-structural constraints defining a regulatory-admissible subset (Hauth, a subset of H) of identity-level hypotheses. ALPs are not additional representational states nor hyperpriors over precision; they constrain which hypotheses are admissible for regulatory control. Precision determines influence conditional on admissibility; ALPs determine admissibility itself. This explains why explicit belief updating modifies representational beliefs while autonomic threat responses remain stable. A computational formalisation restricts policy optimisation to policies generated by authorised hypotheses, yielding testable predictions concerning stress-reactivity dynamics, recovery time constants, compensatory control engagement, and behavioural persistence. Neurobiologically, ALPs manifest through distributed prefrontal arbitration and control networks. The proposal is compatible with variational active inference and introduces no additional inferential operators, instead formalising a boundary condition required for determinate identity-regulation mapping. The model generates falsifiable predictions: governance shifts should produce measurable changes in stress-reactivity curves, recovery dynamics, compensatory cognitive effort, and behavioural change durability. ALPs are advanced as an architectural hypothesis to be evaluated through computational modelling and longitudinal stress-induction paradigms.

[446]  arXiv:2603.18891 [pdf, ps, other]
Title: PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
Comments: Accepted to ICLR 2026. 17 pages, 11 figures, and 9 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Visual In-Context Learning (VICL) aims to complete vision tasks by imitating pixel demonstrations. Recent work pioneered prompt fusion that combines the advantages of various demonstrations, which shows a promising way to extend VICL. Unfortunately, the patch-wise fusion framework and model-agnostic supervision hinder the exploitation of informative cues, thereby limiting performance gains. To overcome this deficiency, we introduce PromptHub, a framework that holistically strengthens multi-prompting through locality-aware fusion, concentration and alignment. PromptHub exploits spatial priors to capture richer contextual information, employs complementary concentration, alignment, and prediction objectives to mutually guide training, and incorporates data augmentation to further reinforce supervision. Extensive experiments on three fundamental vision tasks demonstrate the superiority of PromptHub. Moreover, we validate its universality, transferability, and robustness across out-of-distribution settings, and various retrieval scenarios. This work establishes a reliable locality-aware paradigm for prompt fusion, moving beyond prior patch-wise approaches. Code is available at https://github.com/luotc-why/ICLR26-PromptHub.

[447]  arXiv:2603.18892 [pdf, ps, other]
Title: MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Spatial reasoning is foundational for Vision-Language Models (VLMs), particularly when deployed as Vision-Language-Action (VLA) agents in physical environments. However, existing benchmarks predominantly focus on elementary, single-hop relations, neglecting the multi-hop compositional reasoning and precise visual grounding essential for real-world scenarios. To address this, we introduce MultihopSpatial, offering three key contributions: (1) A comprehensive benchmark designed for multi-hop and compositional spatial reasoning, featuring 1- to 3-hop complex queries across diverse spatial perspectives. (2) Acc@50IoU, a complementary metric that simultaneously evaluates reasoning and visual grounding by requiring both answer selection and precise bounding box prediction - capabilities vital for robust VLA deployment. (3) MultihopSpatial-Train, a dedicated large-scale training corpus to foster spatial intelligence. Extensive evaluation of 37 state-of-the-art VLMs yields eight key insights, revealing that compositional spatial reasoning remains a formidable challenge. Finally, we demonstrate that reinforcement learning post-training on our corpus enhances both intrinsic VLM spatial reasoning and downstream embodied manipulation performance.
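
A minimal sketch of how a joint metric such as Acc@50IoU could be scored, assuming that an example counts as correct only when the selected answer matches and the predicted box overlaps the reference box with IoU of at least 0.5. The field names, the `(x1, y1, x2, y2)` box format, and the inclusive threshold are assumptions; the benchmark's own definition may differ.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def acc_at_50_iou(examples, threshold=0.5):
    """Fraction of examples with a correct answer AND a box with IoU >= threshold."""
    if not examples:
        return 0.0
    hits = 0
    for ex in examples:
        answer_ok = ex["pred_answer"] == ex["gold_answer"]
        box_ok = iou(ex["pred_box"], ex["gold_box"]) >= threshold
        hits += int(answer_ok and box_ok)
    return hits / len(examples)


# Toy predictions (made-up): only the first satisfies both conditions.
examples = [
    {"pred_answer": "A", "gold_answer": "A", "pred_box": (0, 0, 2, 2), "gold_box": (0, 0, 2, 2)},
    {"pred_answer": "B", "gold_answer": "B", "pred_box": (0, 0, 2, 2), "gold_box": (1, 1, 3, 3)},
    {"pred_answer": "C", "gold_answer": "D", "pred_box": (0, 0, 2, 2), "gold_box": (0, 0, 2, 2)},
]
score = acc_at_50_iou(examples)
```

The second example shows why the metric is stricter than plain accuracy: the answer is right, but the box is poorly grounded, so it does not count.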

[448]  arXiv:2603.18893 [pdf, ps, other]
Title: Quantitative Introspection in Language Models: Tracking Internal States Across Conversation
Subjects: Artificial Intelligence (cs.AI)

Tracking the internal states of large language models across conversations is important for safety, interpretability, and model welfare, yet current methods are limited. Linear probes and other white-box methods compress high-dimensional representations imperfectly and are harder to apply with increasing model size. Taking inspiration from human psychology, where numeric self-report is a widely used tool for tracking internal states, we ask whether LLMs' own numeric self-reports can track probe-defined emotive states over time. We study four concept pairs (wellbeing, interest, focus, and impulsivity) in 40 ten-turn conversations, operationalizing introspection as the causal informational coupling between a model's self-report and a concept-matched probe-defined internal state. We find that greedy-decoded self-reports collapse outputs to a few uninformative values, but introspective capacity can be unmasked by calculating logit-based self-reports. This metric tracks interpretable internal states (Spearman $\rho = 0.40$-$0.76$; isotonic $R^2 = 0.12$-$0.54$ in LLaMA-3.2-3B-Instruct), follows how those states change over time, and activation steering confirms the coupling is causal. Furthermore, we find that introspection is present at turn 1 but evolves through conversation, and can be selectively improved by steering along one concept to boost introspection for another ($\Delta R^2$ up to $0.30$). Crucially, these phenomena scale with model size in some cases, approaching $R^2 \approx 0.93$ in LLaMA-3.1-8B-Instruct, and partially replicate in other model families. Together, these results position numeric self-report as a viable, complementary tool for tracking internal emotive states in conversational AI systems.
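
The abstract does not define the logit-based self-report precisely. One plausible reading, sketched below, is to take the model's logits over the candidate rating tokens and report the probability-weighted mean rating instead of the single greedy token. The function names and the toy logits are assumptions for illustration only.

```python
import math


def rating_distribution(rating_logits):
    """Softmax over the logits of candidate rating tokens (rating -> logit)."""
    m = max(rating_logits.values())
    exps = {r: math.exp(l - m) for r, l in rating_logits.items()}
    total = sum(exps.values())
    return {r: e / total for r, e in exps.items()}


def greedy_report(rating_logits):
    """Rating whose token has the highest logit (what greedy decoding would emit)."""
    return max(rating_logits, key=rating_logits.get)


def expected_report(rating_logits):
    """Probability-weighted mean rating (assumed form of a logit-based report)."""
    probs = rating_distribution(rating_logits)
    return sum(r * p for r, p in probs.items())


# Two hypothetical turns: greedy decoding returns the same rating for both,
# while the expected rating separates them.
turn_a = {3: 2.0, 4: 1.9, 7: 0.0}
turn_b = {3: 2.0, 4: 0.0, 7: 1.9}
```

This shows how a greedy readout can collapse onto one value while the underlying distribution still shifts between turns.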

[449]  arXiv:2603.18894 [pdf, ps, other]
Title: I Can't Believe It's Corrupt: Evaluating Corruption in Multi-Agent Governance Systems
Comments: Short Paper, Preprint
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Large language models are increasingly proposed as autonomous agents for high-stakes public workflows, yet we lack systematic evidence about whether they would follow institutional rules when granted authority. We present evidence that integrity in institutional AI should be treated as a pre-deployment requirement rather than a post-deployment assumption. We evaluate multi-agent governance simulations in which agents occupy formal governmental roles under different authority structures, and we score rule-breaking and abuse outcomes with an independent rubric-based judge across 28,112 transcript segments. While we advance this position, the core contribution is empirical: among models operating below saturation, governance structure is a stronger driver of corruption-related outcomes than model identity, with large differences across regimes and model--governance pairings. Lightweight safeguards can reduce risk in some settings but do not consistently prevent severe failures. These results imply that institutional design is a precondition for safe delegation: before real authority is assigned to LLM agents, systems should undergo stress testing under governance-like constraints with enforceable rules, auditable logs, and human oversight on high-impact actions.

[450]  arXiv:2603.18895 [pdf, ps, other]
Title: From Accuracy to Readiness: Metrics and Benchmarks for Human-AI Decision-Making
Authors: Min Hun Lee
Comments: ACM CHI 2026 Poster
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Artificial intelligence (AI) systems are deployed as collaborators in human decision-making. Yet, evaluation practices focus primarily on model accuracy rather than whether human-AI teams are prepared to collaborate safely and effectively. Empirical evidence shows that many failures arise from miscalibrated reliance, including overuse when AI is wrong and underuse when it is helpful.
This paper proposes a measurement framework for evaluating human-AI decision-making centered on team readiness. We introduce a four-part taxonomy of evaluation metrics spanning outcomes, reliance behavior, safety signals, and learning over time, and connect these metrics to the Understand-Control-Improve (U-C-I) lifecycle of human-AI onboarding and collaboration.
By operationalizing evaluation through interaction traces rather than model properties or self-reported trust, our framework enables deployment-relevant assessment of calibration, error recovery, and governance. We aim to support more comparable benchmarks and cumulative research on human-AI readiness, advancing safer and more accountable human-AI collaboration.

[451]  arXiv:2603.18896 [pdf, ps, other]
Title: Translating MRI to PET through Conditional Diffusion Models with Enhanced Pathology Awareness
Comments: Accepted by Medical Image Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Positron emission tomography (PET) is a widely recognized technique for diagnosing neurodegenerative diseases, offering critical functional insights. However, its high costs and radiation exposure hinder its widespread use. In contrast, magnetic resonance imaging (MRI) does not involve such limitations. While MRI also detects neurodegenerative changes, it is less sensitive for diagnosis compared to PET. To overcome such limitations, one approach is to generate synthetic PET from MRI. Recent advances in generative models have paved the way for cross-modality medical image translation; however, existing methods largely emphasize structural preservation while neglecting the critical need for pathology awareness. To address this gap, we propose PASTA, a novel image translation framework built on conditional diffusion models with enhanced pathology awareness. PASTA surpasses state-of-the-art methods by preserving both structural and pathological details through its highly interactive dual-arm architecture and multi-modal condition integration. Additionally, we introduce a novel cycle exchange consistency and volumetric generation strategy that significantly enhances PASTA's ability to produce high-quality 3D PET images. Our qualitative and quantitative results demonstrate the high quality and pathology awareness of the synthesized PET scans. For Alzheimer's diagnosis, the performance of these synthesized scans improves over MRI by 4%, almost reaching the performance of actual PET. Our code is available at https://github.com/ai-med/PASTA.

[452]  arXiv:2603.18897 [pdf, ps, other]
Title: Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

LLM-powered agents are emerging as a dominant paradigm for autonomous task solving. Unlike standard inference workloads, agents operate in a strictly serial "LLM-tool" loop, where the LLM must wait for external tool execution at every step. This execution model introduces severe latency bottlenecks. To address this problem, we propose PASTE, a Pattern-Aware Speculative Tool Execution method designed to hide tool latency through speculation. PASTE is based on the insight that although agent requests are semantically diverse, they exhibit stable application-level control flows (recurring tool-call sequences) and predictable data dependencies (parameter passing between tools). By exploiting these properties, PASTE improves agent serving performance through speculative tool execution. Experimental results against state-of-the-art baselines show that PASTE reduces average task completion time by 48.5% and improves tool execution throughput by 1.8x.

[453]  arXiv:2603.18898 [pdf, ps, other]
Title: Comparative Analysis of Large Language Models in Generating Telugu Responses for Maternal Health Queries
Subjects: Information Retrieval (cs.IR)

Large Language Models (LLMs) have progressively demonstrated their capabilities in various areas of research. Their performance in acute maternal healthcare, particularly in low-resource languages such as Telugu, Hindi, Tamil, and Urdu, remains unstudied. This study examines how ChatGPT-4o, GeminiAI, and Perplexity AI respond to pregnancy-related questions asked in different languages. Results are obtained on a bilingual dataset using a semantic similarity metric (BERTScore) and assessments by expert gynecologists. The gynecologists rate the LLM-generated responses on multiple parameters: accuracy, fluency, relevance, coherence, and completeness. Gemini outperforms the other LLMs in producing accurate and coherent pregnancy-relevant responses in Telugu, while Perplexity performed well when the prompts were in Telugu. ChatGPT's performance can be improved. The results indicate that both the choice of LLM and the prompting language play a crucial role in retrieving information. Overall, we emphasize the need to improve LLM assistance in regional languages for healthcare purposes.

[454]  arXiv:2603.18899 [pdf, ps, other]
Title: Uniform a priori bounds and error analysis for the Adam stochastic gradient descent optimization method
Comments: 34 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

The adaptive moment estimation (Adam) optimizer proposed by Kingma & Ba (2014) is presumably the most popular stochastic gradient descent (SGD) optimization method for the training of deep neural networks (DNNs) in artificial intelligence (AI) systems. Despite its groundbreaking success in the training of AI systems, it still remains an open research problem to provide a complete error analysis of Adam, not only for optimizing DNNs but even when applied to strongly convex stochastic optimization problems (SOPs). Previous error analysis results for strongly convex SOPs in the literature provide conditional convergence analyses that rely on the assumption that Adam does not diverge to infinity but remains uniformly bounded. It is the key contribution of this work to establish uniform a priori bounds for Adam and, thereby, to provide -- for the first time -- an unconditional error analysis for Adam for a large class of strongly convex SOPs.

[455]  arXiv:2603.18905 [pdf, ps, other]
Title: A Stabilized Mortar Method for Discontinuities in Geological Media with Non-Conforming Grids
Comments: 31 pages, 17 figures
Subjects: Numerical Analysis (math.NA)

Accurate numerical simulation of fault and fracture mechanics is critical for the performance and safety assessment of many subsurface systems. The discretized representation of discontinuity surfaces and the robust simulation of their frictional contact behavior still represent major challenges. In this work, we use the mortar method to enforce the contact constraints and allow for non-conformity around the discontinuity surface, with a set of Lagrange multipliers playing the role of interface tractions. The formulation combines piecewise linear displacements with piecewise constant multipliers defined on one side of the fault interface (the non-mortar side). This choice for the Lagrange multipliers has a number of advantages from practical and computational viewpoints, but violates the inf-sup stability constraint. In order to stabilize the proposed formulation, we develop a traction-jump stabilization term to be added to the constraint equations. We use a macro-element analysis to derive an algorithmic strategy that automatically evaluates the proper scaling of the stabilization, without requiring any additional user-selected parameter. Numerical experiments demonstrate that the proposed formulation not only restores the inf-sup stability condition, but also recovers stable traction profiles in the presence of finer non-mortar sides, where other inf-sup-stable formulations fail. The proposed method is finally used to simulate non-linear contact conditions in non-conforming corner-point grids typically used in industrial geological applications.

[456]  arXiv:2603.18907 [pdf, ps, other]
Title: Neural Galerkin Normalizing Flow for Transition Probability Density Functions of Diffusion Models
Comments: 12 pages, 4 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

We propose a new Neural Galerkin Normalizing Flow framework to approximate the transition probability density function of a diffusion process by solving the corresponding Fokker-Planck equation with an atomic initial distribution, parametrically with respect to the location of the initial mass. By using Normalizing Flows, we look for the solution as a transformation of the transition probability density function of a reference stochastic process, ensuring that our approximation is structure-preserving and automatically satisfies positivity and mass conservation constraints. By extending Neural Galerkin schemes to the context of Normalizing Flows, we derive a system of ODEs for the time evolution of the Normalizing Flow's parameters. Adaptive sampling routines are used to evaluate the Fokker-Planck residual in meaningful locations, which is of vital importance to address high-dimensional PDEs. Numerical results show that this strategy captures key features of the true solution and enforces the causal relationship between the initial datum and the density function at subsequent times. After completing an offline training phase, online evaluation becomes significantly more cost-effective than solving the PDE from scratch. The proposed method serves as a promising surrogate model, which could be deployed in many-query problems associated with stochastic differential equations, like Bayesian inference, simulation, and diffusion bridge generation.

[457]  arXiv:2603.18908 [pdf, ps, other]
Title: Secure Linear Alignment of Large Language Models
Subjects: Artificial Intelligence (cs.AI)

Language models increasingly appear to learn similar representations, despite differences in training objectives, architectures, and data modalities. This emerging compatibility between independently trained models introduces new opportunities for cross-model alignment to downstream objectives. Moreover, it unlocks new potential application domains, such as settings where security, privacy, or competitive constraints prohibit direct data or model sharing. In this work, we propose a privacy-preserving framework that exploits representational convergence to enable cross-silo inference between independent language models. The framework learns an affine transformation over a shared public dataset and applies homomorphic encryption to protect client queries during inference. By encrypting only the linear alignment and classification operations, the method achieves sub-second inference latency while maintaining strong security guarantees. We support this framework with an empirical investigation into representational convergence, in which we learn linear transformations between the final hidden states of independent models. We evaluate these cross-model mappings on embedding classification and out-of-distribution detection, observing minimal performance degradation across model pairs. Additionally, we show for the first time that linear alignment sometimes enables text generation across independently trained models.

[458]  arXiv:2603.18910 [pdf, ps, other]
Title: Safety-Guaranteed Imitation Learning from Nonlinear Model Predictive Control for Spacecraft Close Proximity Operations
Comments: Accepted at European Control Conference (ECC 2026)
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper presents a safety-guaranteed, runtime-efficient imitation learning framework for spacecraft close proximity control. We leverage Control Barrier Functions (CBFs) for safety certificates and Control Lyapunov Functions (CLFs) for stability as unified design principles across data generation, training, and deployment. First, a nonlinear Model Predictive Control (NMPC) expert enforces CBF constraints to provide safe reference trajectories. Second, we train a neural policy with a novel CBF-CLF-informed loss and DAgger-like rollouts with curriculum weighting, promoting data-efficiency and reducing future safety filter interventions. Third, at deployment a lightweight one-step CBF-CLF quadratic program minimally adjusts the learned control input to satisfy hard safety constraints while encouraging stability. We validate the approach for ESA-compliant close proximity operations, including fly-around with a spherical keep-out zone and final approach inside a conical approach corridor, using the Basilisk high-fidelity simulator with nonlinear dynamics and perturbations. Numerical experiments indicate stable convergence to decision points and strict adherence to safety under the filter, with task performance comparable to the NMPC expert while significantly reducing online computation. A runtime analysis demonstrates real-time feasibility on a commercial off-the-shelf processor, supporting onboard deployment for safety-critical on-orbit servicing.

[459]  arXiv:2603.18911 [pdf, ps, other]
Title: Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs
Authors: Vedant Pandya
Comments: 30 pages, 15 figures, 11 tables. Comprehensive study across 6 LLMs (250M-7B parameters) with explainability analysis. Code and data available upon request
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Knowledge-grounded dialogue systems aim to generate informative, contextually relevant responses by conditioning on external knowledge sources. However, most existing approaches focus exclusively on English, lack explicit citation mechanisms for verifying factual claims, and offer limited transparency into model decision-making. We present XKD-Dial, a progressive four-stage training pipeline for explainable, knowledge-grounded dialogue generation in a bilingual (English-Hindi) setting, comprising: (1) multilingual adaptation, (2) English dialogue SFT with citation grounding, (3) bilingual dialogue SFT, and (4) GRPO alignment with citation-aware rewards. We evaluate six models spanning encoder-decoder (250M-3B) and decoder-only (1B-7B) architectures at every pipeline stage. Our key contributions are: (i) three post-hoc explainability analyses - cross-attention alignment, Integrated Gradients attribution, and occlusion-based causal grounding - applied systematically across the training trajectory to reveal how citation behaviour is learned, not only whether it is learned; (ii) citation-grounded SFT reduces hallucination to 0.0% for encoder-decoder models from Stage 2 onward; (iii) the progressive pipeline prevents catastrophic forgetting while improving Hindi capabilities; (iv) smaller models match larger models on English after SFT; and (v) GRPO provides marginal improvement over well-designed SFT for structured citation tasks. We evaluate across six automatic metrics (BLEU, ROUGE, BERTScore, FactScore, Citation-F1, and hallucination rate).

[460]  arXiv:2603.18912 [pdf, ps, other]
Title: GHOST: Fast Category-agnostic Hand-Object Interaction Reconstruction from RGB Videos using Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Understanding realistic hand-object interactions from monocular RGB videos is essential for AR/VR, robotics, and embodied AI. Existing methods rely on category-specific templates or heavy computation, yet still produce physically inconsistent hand-object alignment in 3D. We introduce GHOST (Gaussian Hand-Object Splatting), a fast, category-agnostic framework for reconstructing dynamic hand-object interactions using 2D Gaussian Splatting. GHOST represents both hands and objects as dense, view-consistent Gaussian discs and introduces three key innovations: (1) a geometric-prior retrieval and consistency loss that completes occluded object regions, (2) a grasp-aware alignment that refines hand translations and object scale to ensure realistic contact, and (3) a hand-aware background loss that prevents penalizing hand-occluded object regions. GHOST achieves complete, physically consistent, and animatable reconstructions from a single RGB video while running an order of magnitude faster than prior category-agnostic methods. Extensive experiments on ARCTIC, HO3D, and in-the-wild datasets demonstrate state-of-the-art accuracy in 3D reconstruction and 2D rendering quality, establishing GHOST as an efficient and robust solution for realistic hand-object interaction modeling. Code is available at https://github.com/ATAboukhadra/GHOST.

[461]  arXiv:2603.18914 [pdf, ps, other]
Title: Security, privacy, and agentic AI in a regulatory view: From definitions and distinctions to provisions and reflections
Comments: Accepted by 2026 Governing Agentic AI Symposium
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The rapid proliferation of artificial intelligence (AI) technologies has led to a dynamic regulatory landscape, where legislative frameworks strive to keep pace with technical advancements. As AI paradigms shift towards greater autonomy, specifically in the form of agentic AI, it becomes increasingly challenging to precisely articulate regulatory stipulations. This challenge is even more acute in the domains of security and privacy, where the capabilities of autonomous agents often blur traditional legal and technical boundaries. This paper reviews the evolving European Union (EU) AI regulatory provisions by analyzing 24 relevant documents published between 2024 and 2025. From this review, we provide a clarification of critical definitions. We deconstruct the regulatory interpretations of security, privacy, and agentic AI, distinguishing them from closely related concepts to resolve ambiguity. We synthesize the reviewed documents to articulate the current state of regulatory provisions targeting different types of AI, particularly those related to security and privacy aspects. We analyze and reflect on the existing provisions in the regulatory dimension to better align security and privacy obligations with AI and agentic behaviors. These insights serve to inform policymakers, developers, and researchers about compliance and AI governance in a society with increasing algorithmic agency.

[462]  arXiv:2603.18916 [pdf, ps, other]
Title: Agentic Business Process Management: A Research Manifesto
Comments: 35 pages, 1 figure
Subjects: Artificial Intelligence (cs.AI)

This paper presents a manifesto that articulates the conceptual foundations of Agentic Business Process Management (APM), an extension of Business Process Management (BPM) for governing autonomous agents executing processes in organizations. From a management perspective, APM represents a paradigm shift from the traditional process view of the business process, driven by the realization of process awareness and an agent-oriented abstraction, where software and human agents act as primary functional entities that perceive, reason, and act within explicit process frames. This perspective marks a shift from traditional, automation-oriented BPM toward systems in which autonomy is constrained, aligned, and made operational through process awareness.
We introduce the core abstractions and architectural elements required to realize APM systems and elaborate on four key capabilities that such APM agents must support: framed autonomy, explainability, conversational actionability, and self-modification. These capabilities jointly ensure that agents' goals are aligned with organizational goals and that agents behave in a framed yet proactive manner in pursuing those goals. We discuss the extent to which the capabilities can be realized and identify research challenges whose resolution requires further advances in BPM, AI, and multi-agent systems. The manifesto thus serves as a roadmap for bridging these communities and for guiding the development of APM systems in practice.

[463]  arXiv:2603.18921 [pdf, ps, other]
Title: Lightweight Model Predictive Control for Spacecraft Rendezvous Attitude Synchronization
Comments: Accepted at European Control Conference (ECC 2026)
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This work introduces two lightweight model predictive control (MPC) approaches for attitude tracking with reaction wheels during spacecraft rendezvous synchronization. Both approaches are based on a novel attitude deviation formulation, which enables the use of inherently linear constraints on angular velocity. We develop a single-loop and a dual-loop MPC; the latter embeds a stabilizing feedback controller within the inner loop, yielding a linear time-invariant system. Both controllers are implemented with CasADi - including automatic code generation - evaluated across various solvers, and validated within the Basilisk astrodynamics simulation framework. The experimental results demonstrate improved tracking accuracy alongside reductions in computational effort and memory consumption. Finally, embedded delivery to an ARM Cortex-M7 - representative of commercial off-the-shelf devices used in New Space platforms - confirms the real-time feasibility of these approaches and highlights their suitability for onboard attitude control in resource-constrained spacecraft rendezvous missions.

[464]  arXiv:2603.18924 [pdf, ps, other]
Title: Unsupervised Contrastive Learning for Efficient and Robust Spectral Shape Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Estimating correspondences between pairs of non-rigid deformable 3D shapes remains a significant challenge in computer vision and graphics. While deep functional map methods have become the go-to solution for addressing this problem, they primarily focus on optimizing pointwise and functional maps either individually or jointly, rather than directly enhancing feature representations in the embedding space, which often results in inadequate feature quality and suboptimal matching performance. Furthermore, these approaches heavily rely on traditional functional map techniques, such as time-consuming functional map solvers, which incur substantial computational costs. In this work, we introduce, for the first time, a novel unsupervised contrastive learning-based approach for efficient and robust 3D shape matching. We begin by presenting an unsupervised contrastive learning framework that promotes feature learning by maximizing consistency within positive similarity pairs and minimizing it within negative similarity pairs, thereby improving both the consistency and discriminability of the learned features. We then design a significantly simplified functional map learning architecture that eliminates the need for computationally expensive functional map solvers and multiple auxiliary functional map losses, greatly enhancing computational efficiency. By integrating these two components into a unified two-branch pipeline, our method achieves state-of-the-art performance in both accuracy and efficiency. Extensive experiments demonstrate that our approach is not only computationally efficient but also outperforms current state-of-the-art methods across various challenging benchmarks, including near-isometric, non-isometric, and topologically inconsistent scenarios, even surpassing supervised techniques.

[465]  arXiv:2603.18927 [pdf, ps, other]
Title: An Optimised Greedy-Weighted Ensemble Framework for Financial Loan Default Prediction
Subjects: Machine Learning (cs.LG)

Accurate prediction of loan defaults is a central challenge in credit risk management, particularly in modern financial datasets characterised by nonlinear relationships, class imbalance, and evolving borrower behaviour. Traditional statistical models and static ensemble methods often struggle to maintain reliable performance under such conditions. This study proposes an Optimised Greedy-Weighted Ensemble framework for loan default prediction that dynamically allocates model weights based on empirical predictive performance. The framework integrates multiple machine learning classifiers, with their hyperparameters first optimised using Particle Swarm Optimisation. Model predictions are then combined via a regularised greedy weighting mechanism. At the same time, a neural-network-based meta-learner is employed within a stacked ensemble to capture higher-order relationships among model outputs. Experiments conducted on the Lending Club dataset demonstrate that the proposed framework improves predictive performance compared with individual classifiers. The BlendNet ensemble achieved the strongest results with an AUC of 0.80, a macro-average F1-score of 0.73, and a default recall of 0.81. Calibration analysis further shows that tree-based ensembles such as Extra Trees and Gradient Boosting provide the most reliable probability estimates, while the stacked ensemble offers superior ranking capability. Feature analysis using Recursive Feature Elimination identifies revolving utilisation, annual income, and debt-to-income ratio as the most influential predictors of loan default. These findings demonstrate that performance-driven ensemble weighting can improve both predictive accuracy and interpretability in credit risk modelling. The proposed framework provides a scalable data-driven approach to support institutional credit assessment, risk monitoring, and financial decision-making.

[466]  arXiv:2603.18937 [pdf, ps, other]
Title: Theoretical Analyses of Detectors for Additive Noise Channels with Mean-Variance Uncertainty under Nonlinear Expectation Theory
Comments: 24 pages, 4 figures
Subjects: Information Theory (cs.IT); Probability (math.PR)

In classical information theory, both the form and performance of the optimal detector for additive noise channels can be precisely derived, based on the assumption that the channel noise follows a specific probability distribution or a mixture of known distributions, or that the exact distribution exists but is unknown. In this paper, we extend the analyses of detectors for additive noise channels to the situation where the probability model for analyzing channels is uncertain, utilizing nonlinear expectation theory. We consider two types of distribution uncertainties: one with no mean uncertainty but with variance uncertainty, and another with both mean and variance uncertainties. We derive the optimal detectors for binary-input additive noise channels under the nonlinear expectation optimal criterion for both scenarios and provide their explicit forms. Our findings reveal that mean uncertainty significantly influences the form of the optimal detector, whereas variance uncertainty does not. Additionally, we propose an estimation method for the uncertain parameters of the channel noise. Finally, we present theoretical analyses and simulated performance results of the newly derived optimal detectors, and compare these results with the performance of the optimal detector under classical information theory, which assumes a deterministic probability model. The results of experiments show that our new detection methods outperform conventional methods in most scenarios with uncertain probability models, showing the practical relevance of our theoretical contributions.

[467]  arXiv:2603.18939 [pdf, ps, other]
Title: Controller Datapath Aware Verification of Masked Hardware Generated via High Level Synthesis
Subjects: Cryptography and Security (cs.CR)

Masking is a countermeasure against Power Side Channel Attacks (PSCAs) in both software and hardware implementations of cryptographic algorithms. Compared to software masking, implementing masked hardware is time consuming and error prone. Recent approaches, therefore, rely on High Level Synthesis (HLS) tools to automatically generate masked Register Transfer Level (RTL) hardware from verified masked software, significantly reducing design effort. Since HLS was never developed for security, HLS optimizations may impact PSCA security of the generated RTL. As a result, verifying the PSCA security of HLS generated masked RTL is crucial. Existing hardware masking verification tools can verify masked hardware, but may produce false positives when applied to HLS generated designs, whose controller datapath architectures result from the resource-shared datapaths produced by HLS. This work proposes a hardware masking verification strategy for HLS generated masked hardware. Our tool flow, MaskedHLSVerif, performs state-wise formal verification of controller datapath RTL obtained via HLS, thereby avoiding false positives caused by resource-shared datapaths. Our tool flow correctly verifies standard cryptographic benchmarks, consisting of cascaded masked gadgets and the PRESENT S-box masked with gadgets, where existing tools like REBECCA report false positives. The proposed tool flow is also able to detect masking flaws induced by HLS optimizations.

[468]  arXiv:2603.18940 [pdf, ps, other]
Title: Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought
Authors: Xinghao Zhao
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Chain-of-thought (CoT) reasoning improves LLM accuracy, yet detecting failures cheaply remains elusive. We study whether the shape of uncertainty dynamics across reasoning steps--captured by sampling a few answer completions per step--predicts correctness.
We introduce entropy-trajectory monotonicity: a chain is monotone if its per-step answer-distribution entropy decreases at every step. On GSM8K (n=300) with Qwen2.5-7B-Instruct, monotone chains achieve 68.8% accuracy vs. 46.8% for non-monotone chains (+21.9 pp; Fisher's p=0.0005; OR=2.50). Critically, total entropy reduction is not predictive ($\rho$=-0.06, p=0.31), revealing a shape-over-magnitude dissociation: whether entropy decreases at every step matters, not how much. Violation count 0/1/2 gives 68.8%/50.8%/28.6% accuracy.
Token log-probability confidence worsens in calibration with step depth (ECE: 0.186->0.312), and monotonicity achieves +5.8 pp at 73.7% coverage, outperforming scalar baselines at approx 1,500 tokens/question--1/8 the cost of 40-chain self-consistency. Results replicate on Mistral-7B (n=300): monotone chains reach 72.3% vs. 37.6% (+34.7 pp; OR=4.33). Structural properties of uncertainty trajectories are thus more informative than aggregate measures.
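
The monotonicity criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: entropy is estimated from the empirical distribution of a few sampled answers per step, "decreases" is read as strictly decreases, and the names `answer_entropy`, `violation_count`, `is_monotone`, and the toy samples are hypothetical.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (bits) of the empirical distribution of sampled answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def violation_count(entropies):
    """Number of steps at which entropy fails to strictly decrease (assumption)."""
    return sum(1 for prev, cur in zip(entropies, entropies[1:]) if cur >= prev)


def is_monotone(entropies):
    """A chain is monotone when it has no violations."""
    return violation_count(entropies) == 0


# Hypothetical toy chain: four sampled answer completions at each of three steps.
steps = [
    ["12", "15", "18", "20"],
    ["18", "18", "15", "20"],
    ["18", "18", "18", "18"],
]
trajectory = [answer_entropy(s) for s in steps]
print(trajectory, is_monotone(trajectory))
```

Only the sign of each step-to-step change enters the check, which mirrors the abstract's point that the shape of the trajectory matters rather than the total entropy reduction.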

[469]  arXiv:2603.18943 [pdf, ps, other]
Title: VGGT-360: Geometry-Consistent Zero-Shot Panoramic Depth Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents VGGT-360, a novel training-free framework for zero-shot, geometry-consistent panoramic depth estimation. Unlike prior view-independent training-free approaches, VGGT-360 reformulates the task as panoramic reprojection over multi-view reconstructed 3D models by leveraging the intrinsic 3D consistency of VGGT-like foundation models, thereby unifying fragmented per-view reasoning into a coherent panoramic understanding. To achieve robust and accurate estimation, VGGT-360 integrates three plug-and-play modules that form a unified panorama-to-3D-to-depth framework: (i) Uncertainty-guided adaptive projection slices panoramas into perspective views to bridge the domain gap between panoramic inputs and VGGT's perspective prior. It estimates gradient-based uncertainty to allocate denser views to geometry-poor regions, yielding geometry-informative inputs for VGGT. (ii) Structure-saliency enhanced attention strengthens VGGT's robustness during 3D reconstruction by injecting structure-aware confidence into its attention layers, guiding focus toward geometrically reliable regions and enhancing cross-view coherence. (iii) Correlation-weighted 3D model correction refines the reconstructed 3D model by reweighting overlapping points using attention-inferred correlation scores, providing a consistent geometric basis for accurate panoramic reprojection. Extensive experiments show that VGGT-360 outperforms both trained and training-free state-of-the-art methods across multiple resolutions and diverse indoor and outdoor datasets.

[470]  arXiv:2603.18944 [pdf, ps, other]
Title: Non-asymptotic uniform in time error bounds for new and old numerical schemes for SPDEs
Comments: 57 pages, 11 figures
Subjects: Numerical Analysis (math.NA); Probability (math.PR)

We study numerical schemes for Stochastic Partial Differential Equations (SPDEs). We introduce a general method of proof of non-asymptotic uniform in time error bounds on numerical integrators for SPDEs, ensuring the schemes capture both the transient and the long term dynamics faithfully. We then consider SPDEs with non-globally Lipschitz nonlinearities, which include for example the stochastic Allen-Cahn equation and some stochastic advection-diffusion equations. For the case of Allen-Cahn type SPDEs we show that the classic semi-implicit Euler time-discretization can exhibit finite time blow up. This motivates analysing other schemes which do not suffer from this blow-up problem. We consider three numerical schemes for SPDEs with non-globally Lipschitz nonlinearity: a fully implicit scheme and two tamed schemes. For these schemes we prove non-asymptotic uniform in time error bounds by leveraging our general criterion, and provide numerical comparisons. While the main emphasis in this paper is on the properties of the time-discretization, the schemes we consider are full space-time discretizations of the SPDE.

[471]  arXiv:2603.18945 [pdf, ps, other]
Title: A conceptual framework for ideology beyond the left and right
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)

NLP+CSS work has operationalized ideology almost exclusively on a left/right partisan axis. This approach obscures the fact that people hold interpretations of many different complex and more specific ideologies on issues like race, climate, and gender. We introduce a framework that understands ideology as an attributed, multi-level socio-cognitive concept network, and explains how ideology manifests in discourse in relation to other relevant social processes like framing. We demonstrate how this framework can clarify overlaps between existing NLP tasks (e.g. stance detection and natural language inference) and also how it reveals new research directions. Our work provides a unique and important bridge between computational methods and ideology theory, enabling richer analysis of social discourse in a way that benefits both fields.

[472]  arXiv:2603.18947 [pdf, ps, other]
Title: On the Minimum Number of Control Laws for Nonlinear Systems with Input-Output Linearisation Singularities
Comments: 14
Subjects: Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)

This paper addresses the fundamental question of determining the minimum number of distinct control laws required for global controllability of nonlinear systems that exhibit singularities in their feedback linearising controllers. We introduce and rigorously prove the (k+1)-Controller Lemma, which establishes that for an nth order single-input single-output nonlinear system with a singularity manifold parameterised by k algebraically independent conditions, exactly k+1 distinct control laws are necessary and sufficient for complete state-space coverage. The sufficiency proof is constructive, employing the approximate linearisation methodology together with transversality arguments from differential topology. The necessity proof proceeds by contradiction, using the Implicit Function Theorem, a dimension-counting argument and structural constraints inherent to the approximate linearisation framework. The result is validated through exhaustive analysis of the ball-and-beam system, a fourth-order mechanical system that exhibits a two-parameter singularity at the third output derivative.

[473]  arXiv:2603.18949 [pdf, ps, other]
Title: Heart Artifact Removal in Electrohysterography Measurements Using Algebraic Differentiators
Subjects: Systems and Control (eess.SY)

Electrohysterography (EHG) enables non-invasive monitoring of uterine contractions but can be contaminated by electrocardiogram (ECG) artifacts. This work presents an ECG removal method using algebraic differentiators, a control-theoretic tool for model-free derivative estimation, that preserves signal shape outside the detected cardiac pulse locations. The differentiator parameters are designed to simultaneously suppress slow physiological artifacts and powerline interference while maximizing output signal-to-noise ratio. Cross-channel clustering distinguishes cardiac pulses from localized artifacts, enabling accurate pulse subtraction without auxiliary ECG references. Implemented as a causal FIR filter, the method is validated on multichannel EHG recordings from female and male subjects and compared to the template subtraction method.

[474]  arXiv:2603.18950 [pdf, ps, other]
Title: What We Talk About When We Talk About Frameworks in HCI
Comments: 25 pages, 8 figures, The ACM CHI conference on Human Factors in Computing Systems 2026
Subjects: Human-Computer Interaction (cs.HC)

In HCI, frameworks function as a type of theoretical contribution, often supporting ideation, design, and evaluation. Yet, little is known about how they are actually used, what functions they serve, and which scholarly practices shape them. To address this gap, we conducted a systematic review of 615 papers from a decade of CHI proceedings (2015-2024) that prominently featured the term framework. We classified these papers into six engagement types. We then examined the role, form, and essential components of newly proposed frameworks through a functional typology, analyzing how they are constructed, validated, and articulated for reuse. Our results show that enthusiasm for proposing new frameworks exceeds the willingness to iterate on existing ones. They also highlight the ambiguity in the function of frameworks and the scarcity of systematic validation. Based on these insights, we call for more rigorous, reflective, and cumulative practices in the development and use of frameworks in HCI.

[475]  arXiv:2603.18953 [pdf, ps, other]
Title: Context Bootstrapped Reinforcement Learning
Subjects: Machine Learning (cs.LG)

Reinforcement Learning from Verifiable Rewards (RLVR) suffers from exploration inefficiency, where models struggle to generate successful rollouts, resulting in minimal learning signal. This challenge is particularly severe for tasks that require the acquisition of novel reasoning patterns or domain-specific knowledge. To address this, we propose Context Bootstrapped Reinforcement Learning (CBRL), which augments RLVR training by stochastically prepending few-shot demonstrations to training prompts. The injection probability follows a curriculum that starts high to bootstrap early exploration, then anneals to zero so the model must ultimately succeed without assistance. This forces the policy to internalize reasoning patterns from the demonstrations rather than relying on them at test time. We validate CBRL across two model families and five Reasoning Gym tasks. Our results demonstrate that CBRL consistently improves success rate, provides better exploration efficiency, and is algorithm-agnostic. We further demonstrate CBRL's practical applicability on Q, a domain-specific programming language that diverges significantly from mainstream language conventions.
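
The injection curriculum described above can be illustrated with a short Python sketch. The abstract only states that the probability starts high and anneals to zero, so the linear schedule, the default values `p_start=0.9` and `anneal_fraction=0.5`, and the names `injection_probability` and `build_prompt` are assumptions made for this example rather than the paper's settings.

```python
import random


def injection_probability(step, total_steps, p_start=0.9, anneal_fraction=0.5):
    """Linearly anneal from p_start to 0 over the first part of training.

    The linear shape and both defaults are illustrative assumptions.
    """
    anneal_steps = max(1, int(total_steps * anneal_fraction))
    return max(0.0, p_start * (1.0 - step / anneal_steps))


def build_prompt(task_prompt, demonstrations, p_inject, rng):
    """With probability p_inject, prepend few-shot demonstrations to the prompt."""
    if rng.random() < p_inject:
        return "\n\n".join(list(demonstrations) + [task_prompt])
    return task_prompt


rng = random.Random(0)
demos = ["Q: 2 + 2?\nA: 4", "Q: 3 + 5?\nA: 8"]  # hypothetical demonstrations
total = 1000
for step in (0, 250, 500, 999):
    p = injection_probability(step, total)
    prompt = build_prompt("Q: 7 + 6?\nA:", demos, p, rng)
    print(step, round(p, 3), prompt.count("Q:"))
```

Once the probability reaches zero, every training prompt is the bare task prompt, so the policy is evaluated and rewarded under the same conditions it will face at test time.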

[476]  arXiv:2603.18954 [pdf, ps, other]
Title: Balancing Performance and Fairness in Explainable AI for Anomaly Detection in Distributed Power Plants Monitoring
Subjects: Machine Learning (cs.LG)

Reliable anomaly detection in distributed power plant monitoring systems is essential for ensuring operational continuity and reducing maintenance costs, particularly in regions where telecom operators heavily rely on diesel generators. However, this task is challenged by extreme class imbalance, lack of interpretability, and potential fairness issues across regional clusters. In this work, we propose a supervised ML framework that integrates ensemble methods (LightGBM, XGBoost, Random Forest, CatBoost, GBDT, AdaBoost) and baseline models (Support Vector Machine, K-Nearest Neighbors, Multilayer Perceptrons, and Logistic Regression) with advanced resampling techniques (SMOTE with Tomek Links and ENN) to address imbalance in a dataset of diesel generator operations in Cameroon. Interpretability is achieved through SHAP (SHapley Additive exPlanations), while fairness is quantified using the Disparate Impact Ratio (DIR) across operational clusters. We further evaluate model generalization using Maximum Mean Discrepancy (MMD) to capture domain shifts between regions. Experimental results show that ensemble models consistently outperform baselines, with LightGBM achieving an F1-score of 0.99 and minimal bias across clusters (DIR $\approx 0.95$). SHAP analysis highlights fuel consumption rate and runtime per day as dominant predictors, providing actionable insights for operators. Our findings demonstrate that it is possible to balance performance, interpretability, and fairness in anomaly detection, paving the way for more equitable and explainable AI systems in industrial power management. Finally, beyond offline evaluation, we also discuss how the trained models can be deployed in practice for real-time monitoring. We show how containerized services can process data in real time, deliver low-latency predictions, and provide interpretable outputs for operators.
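
As a rough illustration of the fairness metric mentioned above, a Disparate Impact Ratio across clusters can be computed as the lowest positive-prediction rate divided by the highest. The abstract does not give the exact definition used, so this min/max form, the cluster names, the toy predictions, and the function names are assumptions made for the example.

```python
def positive_rate(predictions):
    """Fraction of samples flagged as anomalous (predictions are 0 or 1)."""
    return sum(predictions) / len(predictions)


def disparate_impact_ratio(preds_by_group):
    """Lowest group positive rate divided by the highest (assumed definition).

    A value near 1.0 means the groups are flagged at similar rates.
    """
    rates = [positive_rate(p) for p in preds_by_group.values()]
    highest = max(rates)
    if highest == 0:
        return 1.0  # no group is ever flagged; treated as parity (assumption)
    return min(rates) / highest


# Hypothetical anomaly predictions for two operational clusters.
preds = {
    "cluster_a": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # rate 0.20
    "cluster_b": [1, 0, 0, 1, 0, 0, 0, 0],        # rate 0.25
}
dir_value = disparate_impact_ratio(preds)
print(dir_value)
```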

[477]  arXiv:2603.18957 [pdf, ps, other]
Title: BVSIMC: Bayesian Variable Selection-Guided Inductive Matrix Completion for Improved and Interpretable Drug Discovery
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

Recent advances in drug discovery have demonstrated that incorporating side information (e.g., chemical properties about drugs and genomic information about diseases) often greatly improves prediction performance. However, these side features can vary widely in relevance and are often noisy and high-dimensional. We propose Bayesian Variable Selection-Guided Inductive Matrix Completion (BVSIMC), a new Bayesian model that enables variable selection from side features in drug discovery. By learning sparse latent embeddings, BVSIMC improves both predictive accuracy and interpretability. We validate our method through simulation studies and two drug discovery applications: 1) prediction of drug resistance in Mycobacterium tuberculosis, and 2) prediction of new drug-disease associations in computational drug repositioning. On both synthetic and real data, BVSIMC outperforms several other state-of-the-art methods in terms of prediction. In our two real examples, BVSIMC further reveals the most clinically meaningful side features.

[478]  arXiv:2603.18958 [pdf, ps, other]
Title: Optimal Path Planning in Hostile Environments
Comments: Accepted for publication at ICAPS-2026 (25 pages, 6 figures)
Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)

Coordinating agents through hazardous environments, such as aid-delivering drones navigating conflict zones or field robots traversing deployment areas filled with obstacles, poses fundamental planning challenges. We introduce and analyze the computational complexity of a new multi-agent path planning problem that captures this setting. A group of identical agents begins at a common start location and must navigate a graph-based environment to reach a common target. The graph contains hazards that eliminate agents upon contact but then enter a known cooldown period before reactivating. In this discrete-time, fully-observable, deterministic setting, the planning task is to compute a movement schedule that maximizes the number of agents reaching the target. We first prove that, despite the exponentially large space of feasible plans, optimal plans require only polynomially-many steps, establishing membership in NP. We then show that the problem is NP-hard even when the environment graph is a tree. On the positive side, we present a polynomial-time algorithm for graphs consisting of vertex-disjoint paths from start to target. Our results establish a rich computational landscape for this problem, identifying both intractable and tractable fragments.

[479]  arXiv:2603.18960 [pdf, ps, other]
Title: Sketch2Topo: Using Hand-Drawn Inputs for Diffusion-Based Topology Optimization
Comments: 5 pages, 4 figures, accepted at CHI 2026 as a poster
Subjects: Human-Computer Interaction (cs.HC)

Topology optimization (TO) is employed in engineering to optimize structural performance while maximizing material efficiency. However, traditional TO methods incur significant computational and time costs. Although research has leveraged generative AI to predict TO outcomes and validated feasibility and accuracy, existing approaches still suffer from limited customizability and impose a high cognitive load on users. Furthermore, balancing structural performance with aesthetic attributes remains a persistent challenge. We developed Sketch2Topo, which augments a diffusion-based TO model with image-to-image generation and image editing capabilities. With Sketch2Topo, users can use sketching to customize geometries and specify physical constraints. The tool also supports mask input, enabling users to perform TO on selected regions only, thereby supporting higher levels of customization. We summarize the workflow and details of the tool and conduct a brief quantitative evaluation. Finally, we explore application scenarios and discuss how hand-drawn input improves usability while balancing functionality and aesthetics.

[480]  arXiv:2603.18964 [pdf, ps, other]
Title: Terms of (Ab)Use: An Analysis of GenAI Services
Comments: Peer-reviewed, to be presented at ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2026
Subjects: Computers and Society (cs.CY)

Generative AI services like ChatGPT and Gemini are some of the fastest-growing consumer services. Individuals using such services must accept their terms of use before access, and conform to these terms for continued use of the service. Established literature has shown that despite their status as legally-binding agreements, terms of use are not actually well-understood, and may contain implications that are surprising for consumers. In this paper, we analyse the terms of 6 generative AI services from the perspective of an EU-based consumer. Our findings, based on a developed codebook which we provide in the paper, reiterate known issues regarding generative AI services such as the default use of user data for training and surface new concerns regarding responsibility, liability, and rights. All terms in our analysis contained language that explicitly discards assurances regarding the quality, availability and appropriateness of the service, regardless of whether the service is free or paid. The terms also make users solely responsible for outputs meeting norms dictated by the provider, despite no information or control being provided over the functioning of the model, and at the risk of account termination. The terms further restrict users in how outputs can be used while service providers utilise both user-provided inputs as well as user-liable outputs for a wide variety of purposes at their discretion. The implications of these practices are severe, as we find consumers suffer from lack of necessary information, significant imbalance of power, and have responsibilities they cannot materially fulfil without violating the terms. To remedy this situation, we make concrete recommendations for authorities and policymakers to urgently upgrade existing consumer protection mechanisms to tackle this growing issue.

[481]  arXiv:2603.18965 [pdf, ps, other]
Title: Maximum-Entropy Exploration with Future State-Action Visitation Measures
Comments: arXiv admin note: substantial text overlap with arXiv:2412.06655
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Maximum entropy reinforcement learning motivates agents to explore states and actions to maximize the entropy of some distribution, typically by providing additional intrinsic rewards proportional to that entropy function. In this paper, we study intrinsic rewards proportional to the entropy of the discounted distribution of state-action features visited during future time steps. This approach is motivated by two results. First, we show that the expected sum of these intrinsic rewards is a lower bound on the entropy of the discounted distribution of state-action features visited in trajectories starting from the initial states, which we relate to an alternative maximum entropy objective. Second, we show that the distribution used in the intrinsic reward definition is the fixed point of a contraction operator and can therefore be estimated off-policy. Experiments highlight that the new objective leads to improved visitation of features within individual trajectories, in exchange for slightly reduced visitation of features in expectation over different trajectories, as suggested by the lower bound. It also leads to improved convergence speed for learning exploration-only agents. Control performance remains similar across most methods on the considered benchmarks.

[482]  arXiv:2603.18968 [pdf, ps, other]
Title: Teleological Inference in Structural Causal Models via Intentional Interventions
Comments: 29 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI)

Structural causal models (SCMs) were conceived to formulate and answer causal questions. This paper shows that SCMs can also be used to formulate and answer teleological questions, concerning the intentions of a state-aware, goal-directed agent intervening in a causal system. We review limitations of previous approaches to modeling such agents, and then introduce intentional interventions, a new time-agnostic operator that induces a twin SCM we call a structural final model (SFM). SFMs treat observed values as the outcome of intentional interventions and relate them to the counterfactual conditions of those interventions (what would have happened had the agent not intervened). We show how SFMs can be used to empirically detect agents and to discover their intentions.

[483]  arXiv:2603.18972 [pdf, ps, other]
Title: Best-of-Both-Worlds Multi-Dueling Bandits: Unified Algorithms for Stochastic and Adversarial Preferences under Condorcet and Borda Objectives
Subjects: Machine Learning (cs.LG)

Multi-dueling bandits, where a learner selects $m \geq 2$ arms per round and observes only the winner, arise naturally in many applications including ranking and recommendation systems, yet a fundamental question has remained open: can a single algorithm perform optimally in both stochastic and adversarial environments, without knowing which regime it faces? We answer this affirmatively, providing the first best-of-both-worlds algorithms for multi-dueling bandits under both Condorcet and Borda objectives. For the Condorcet setting, we propose \texttt{MetaDueling}, a black-box reduction that converts any dueling bandit algorithm into a multi-dueling bandit algorithm by transforming multi-way winner feedback into an unbiased pairwise signal. Instantiating our reduction with \texttt{Versatile-DB} yields the first best-of-both-worlds algorithm for multi-dueling bandits: it achieves $O(\sqrt{KT})$ pseudo-regret against adversarial preferences and the instance-optimal $O\!\left(\sum_{i \neq a^\star} \frac{\log T}{\Delta_i}\right)$ pseudo-regret under stochastic preferences, both simultaneously and without prior knowledge of the regime. For the Borda setting, we propose \AlgBorda, a stochastic-and-adversarial algorithm that achieves $O\left(K^2 \log KT + K \log^2 T + \sum_{i: \Delta_i^{\mathrm{B}} > 0} \frac{K\log KT}{(\Delta_i^{\mathrm{B}})^2}\right)$ regret in stochastic environments and $O\left(K \sqrt{T \log KT} + K^{1/3} T^{2/3} (\log K)^{1/3}\right)$ regret against adversaries, again without prior knowledge of the regime. We complement our upper bounds with matching lower bounds for the Condorcet setting. For the Borda setting, our upper bounds are near-optimal with respect to the lower bounds (within a factor of $K$) and match the best-known results in the literature.

[484]  arXiv:2603.18976 [pdf, ps, other]
Title: Evaluating 5W3H Structured Prompting for Intent Alignment in Human-AI Interaction
Authors: Peng Gang
Comments: 27 pages, figures, tables, and appendix. Primary category: human-computer interaction / human-AI interaction. Public artifact repository and implementation resources are referenced in the manuscript
Subjects: Artificial Intelligence (cs.AI)

Natural language prompts often suffer from intent transmission loss: the gap between what users actually need and what they communicate to AI systems. We evaluate PPS (Prompt Protocol Specification), a 5W3H-based framework for structured intent representation in human-AI interaction. In a controlled three-condition study across 60 tasks in three domains (business, technical, and travel), three large language models (DeepSeek-V3, Qwen-Max, and Kimi), and three prompt conditions - (A) simple prompts, (B) raw PPS JSON, and (C) natural-language-rendered PPS - we collect 540 AI-generated outputs evaluated by an LLM judge. We introduce goal_alignment, a user-intent-centered evaluation dimension, and find that rendered PPS outperforms both simple prompts and raw JSON on this metric. PPS gains are task-dependent: gains are large in high-ambiguity business analysis tasks but reverse in low-ambiguity travel planning. We also identify a measurement asymmetry in standard LLM evaluation, where unconstrained prompts can inflate constraint adherence scores and mask the practical value of structured prompting. A preliminary retrospective survey (N = 20) further suggests a 66.1% reduction in follow-up prompts required, from 3.33 to 1.13 rounds. These findings suggest that structured intent representations can improve alignment and usability in human-AI interaction, especially in tasks where user intent is inherently ambiguous.
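
The counts and the percentage quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only re-derives the reported figures from the numbers in the abstract itself; the helper name is hypothetical.

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


# 60 tasks x 3 models x 3 prompt conditions
n_outputs = 60 * 3 * 3

# Follow-up prompts drop from 3.33 to 1.13 rounds
reduction_pct = round(100 * relative_reduction(3.33, 1.13), 1)
```

Both values agree with the abstract: 540 outputs and a reduction of about 66.1%.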

[485]  arXiv:2603.18978 [pdf, ps, other]
Title: On Affordable High-Order Entropy-Conservative/Stable and Well-Balanced Methods for Nonconservative Hyperbolic Systems
Comments: Reproducibility repository: this https URL
Subjects: Numerical Analysis (math.NA)

Many entropy-conservative and entropy-stable (summarized as entropy-preserving) methods for hyperbolic conservation laws rely on Tadmor's theory for two-point entropy-preserving numerical fluxes and its higher-order extension via flux differencing using summation-by-parts (SBP) operators, e.g., in discontinuous Galerkin spectral element methods (DGSEMs). The underlying two-point formulations have been extended to nonconservative systems using fluctuations by Castro et al. (2013, doi:10.1137/110845379) with follow-up generalizations to SBP methods. We propose specific forms of entropy-preserving fluctuations for nonconservative hyperbolic systems that are simple to interpret and allow an algorithmic construction of entropy-preserving methods. We analyze necessary and sufficient conditions, and obtain a full characterization of entropy-preserving three-point methods within the finite volume framework. This formulation is extended to SBP methods in multiple space dimensions on Cartesian and curvilinear meshes. Additional properties such as well-balancedness extend naturally from the underlying finite volume method to the SBP framework. We use the algorithmic construction enabled by the chosen formulation to derive several new entropy-preserving schemes for nonconservative hyperbolic systems, e.g., the compressible Euler equations of an ideal gas using the internal energy equation and a dispersive shallow-water model. Numerical experiments show the robustness and accuracy of the proposed schemes.

[486]  arXiv:2603.18979 [pdf, ps, other]
Title: PRIOR: Perceptive Learning for Humanoid Locomotion with Reference Gait Priors
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Training perceptive humanoid locomotion policies that traverse complex terrains with natural gaits remains an open challenge, typically demanding multi-stage training pipelines, adversarial objectives, or extensive real-world calibration. We present PRIOR, an efficient and reproducible framework built on Isaac Lab that achieves robust terrain traversal with human-like gaits through a simple yet effective design: (i) a parametric gait generator that supplies stable reference trajectories derived from motion capture without adversarial training, (ii) a GRU-based state estimator that infers terrain geometry directly from egocentric depth images via self-supervised heightmap reconstruction, and (iii) terrain-adaptive footstep rewards that guide foot placement toward traversable regions. Through systematic analysis of depth image resolution trade-offs, we identify configurations that maximize terrain fidelity under real-time constraints, substantially reducing perceptual overhead without degrading traversal performance. Comprehensive experiments across terrains of varying difficulty (including stairs, boxes, and gaps) demonstrate that each component yields complementary and essential performance gains, with the full framework achieving a 100% traversal success rate. We will open-source the complete PRIOR framework, including the training pipeline, parametric gait generator, and evaluation benchmarks, to serve as a reproducible foundation for humanoid locomotion research on Isaac Lab.

[487]  arXiv:2603.18980 [pdf, ps, other]
Title: A bilinear inverse problem with forward operator inaccuracy applied to neonatal atlas-based diffuse optical tomography
Comments: 36 pages, 8 figures
Subjects: Numerical Analysis (math.NA)

Linear inverse problems are highly common in practical real-world applications from industry to medical imaging. The forward operator is often built on some approximations of the studied system. Handling inaccuracies in the forward operator in the context of inverse problems is a relatively unstudied problem. In this work, we assume that we have a set of candidate forward operator matrices and suggest principal component analysis (PCA) for modeling their variation from the mean. We combine the original linear problem with the included forward operator inaccuracy into a bilinear tensor inverse problem and present two optimization algorithms and Gibbs sampling for approximately solving the problem. As a real-world test case, we apply the algorithms to account for the inaccuracy that is present in the sensitivity profiles or Jacobian matrices in diffuse optical tomography when an atlas-based model of the head anatomy is used instead of the subject's own anatomical model in neonates over a wide range of gestational ages (29--44 weeks). We report visual and numerical improvements in the spatial localization and contrast-to-noise-ratio in reconstructions of simulated hemodynamic activity.
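
The PCA step described in this abstract can be sketched as follows, assuming the candidate forward operators are stacked into one array and flattened before decomposition. The function names, shapes, and random data are hypothetical; the paper's optimization algorithms and Gibbs sampler are not reproduced here.

```python
import numpy as np


def operator_pca(candidates, n_components):
    """PCA of candidate forward operators, given as an array of shape (k, m, n)."""
    A = np.asarray(candidates, dtype=float)
    k, m, n = A.shape
    mean = A.mean(axis=0)
    centered = (A - mean).reshape(k, m * n)
    # Rows of vt are orthonormal principal directions in flattened operator space.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components].reshape(n_components, m, n)
    return mean, components, s[:n_components]


def perturbed_operator(mean, components, coeffs):
    """A(c) = mean + sum_j c_j * component_j."""
    return mean + np.tensordot(coeffs, components, axes=1)


rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 3, 4))  # toy stand-in for candidate operators
mean, comps, sing_vals = operator_pca(candidates, n_components=2)

x = rng.normal(size=4)
c = np.array([0.5, -1.0])
y = perturbed_operator(mean, comps, c) @ x  # the c-dependent part is linear in c and in x separately
```

With the operator written as a mean plus a weighted sum of components, the data depend on the unknown pair (c, x) through products c_j * x, which is what makes the combined problem bilinear.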

[488]  arXiv:2603.18981 [pdf, ps, other]
Title: Book your room in the Turing Hotel! A symmetric and distributed Turing Test with multiple AIs and humans
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

In this paper, we report our experience with "TuringHotel", a novel extension of the Turing Test based on interactions within mixed communities of Large Language Models (LLMs) and human participants. The classical one-to-one interaction of the Turing Test is reinterpreted in a group setting, where both human and artificial agents engage in time-bounded discussions and, interestingly, are both judges and respondents. This community is instantiated in the novel platform UNaIVERSE (https://unaiverse.io), creating a "World" which defines the roles and interaction dynamics, facilitated by the platform's built-in programming tools. All communication occurs over an authenticated peer-to-peer network, ensuring that no third parties can access the exchange. The platform also provides a unified interface for humans, accessible via both mobile devices and laptops, that was a key component of the experience in this paper. Results of our experimentation involving 17 human participants and 19 LLMs revealed that current models are still sometimes mistaken for humans. Interestingly, there are several unexpected mistakes, suggesting that human fingerprints are still identifiable but not fully unambiguous, despite the high-quality language skills of artificial participants. We argue that this is the first experiment conducted in such a distributed setting, and that similar initiatives could be of national interest to support ongoing experiments and competitions aimed at monitoring the evolution of large language models over time.

[489]  arXiv:2603.18987 [pdf, ps, other]
Title: Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis
Subjects: Artificial Intelligence (cs.AI)

Predictive policing systems that direct patrol resources based on algorithmically generated crime forecasts have been widely deployed across US cities, yet their tendency to encode and amplify racial disparities remains poorly understood in quantitative terms. We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy-OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline, from crime occurrence to police contact. Using 145,000+ Part 1 crime records from Baltimore (2017 to 2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data, we compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score.
Our experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15714 in 2019, moderate under-detection of Black residents in Chicago (DIR = 0.22), and persistent Gini coefficients of 0.43 to 0.62 across all conditions. We further demonstrate that a Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates but cannot eliminate structural disparity without accompanying policy intervention. Socioeconomic regression analysis confirms strong correlations between neighborhood racial composition and detection likelihood (Pearson r = 0.83 for percent White and r = -0.81 for percent Black). A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals that outcomes are most sensitive to officer deployment levels. The code and data are publicly available at this repository.

[490]  arXiv:2603.18988 [pdf, ps, other]
Title: MERGE: Guided Vision-Language Models for Multi-Actor Event Reasoning and Grounding in Human-Robot Interaction
Subjects: Robotics (cs.RO)

We introduce MERGE, a system for situational grounding of actors, objects, and events in dynamic human-robot group interactions. Effective collaboration in such settings requires consistent situational awareness, built on persistent representations of people and objects and an episodic abstraction of events. MERGE achieves this by uniquely identifying physical instances of actors (humans or robots) and objects and structuring them into actor-action-object relations, ensuring temporal consistency across interactions. Central to MERGE is the integration of Vision-Language Models (VLMs) guided with a perception pipeline: a lightweight streaming module continuously processes visual input to detect changes and selectively invokes the VLM only when necessary. This decoupled design preserves the reasoning power and zero-shot generalization of VLMs while improving efficiency, avoiding both the high monetary cost and the latency of frame-by-frame captioning that leads to fragmented and delayed outputs. To address the absence of suitable benchmarks for multi-actor collaboration, we introduce the GROUND dataset, which offers fine-grained situational annotations of multi-person and human-robot interactions. On this dataset, our approach improves the average grounding score by a factor of 2 compared to the performance of VLM-only baselines - including GPT-4o, GPT-5 and Gemini 2.5 Flash - while also reducing run-time by a factor of 4. The code and data are available at www.github.com/HRI-EU/merge.

[491]  arXiv:2603.18991 [pdf, ps, other]
Title: CRAFT: Aligning Diffusion Models with Fine-Tuning Is Easier Than You Think
Comments: CVPR2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Aligning diffusion models has achieved remarkable breakthroughs in generating high-quality, human preference-aligned images. Existing techniques, such as supervised fine-tuning (SFT) and DPO-style preference optimization, have become principled tools for fine-tuning diffusion models. However, SFT relies on high-quality images that are costly to obtain, while DPO-style methods depend on large-scale preference datasets, which are often inconsistent in quality. Beyond data dependency, these methods are further constrained by computational inefficiency. To address these two challenges, we propose Composite Reward Assisted Fine-Tuning (CRAFT), a lightweight yet powerful fine-tuning paradigm that requires significantly reduced training data while maintaining computational efficiency. It first leverages a Composite Reward Filtering (CRF) technique to construct a high-quality and consistent training dataset and then performs an enhanced variant of SFT. We also theoretically prove that CRAFT actually optimizes the lower bound of group-based reinforcement learning, establishing a principled connection between SFT with selected data and reinforcement learning. Our extensive empirical results demonstrate that CRAFT with only 100 samples can easily outperform recent SOTA preference optimization methods with thousands of preference-paired samples. Moreover, CRAFT can even achieve 11-220$\times$ faster convergence than the baseline preference optimization methods, highlighting its extremely high efficiency.

[492]  arXiv:2603.18992 [pdf, ps, other]
Title: Foundations of Schrödinger Bridges for Generative Modeling
Authors: Sophia Tang
Comments: 220 pages, 24 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

At the core of modern generative modeling frameworks, including diffusion models, score-based models, and flow matching, is the task of transforming a simple prior distribution into a complex target distribution through stochastic paths in probability space. Schrödinger bridges provide a unifying principle underlying these approaches, framing the problem as determining an optimal stochastic bridge between marginal distribution constraints with minimal-entropy deviations from a pre-defined reference process. This guide develops the mathematical foundations of the Schrödinger bridge problem, drawing on optimal transport, stochastic control, and path-space optimization, and focuses on its dynamic formulation with direct connections to modern generative modeling. We build a comprehensive toolkit for constructing Schrödinger bridges from first principles, and show how these constructions give rise to generalized and task-specific computational methods.

[493]  arXiv:2603.18994 [pdf, ps, other]
Title: Evaluating Game Difficulty in Tetris Block Puzzle
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Tetris Block Puzzle is a single player stochastic puzzle in which a player places blocks on an 8 x 8 grid to complete lines; its popular variants have amassed tens of millions of downloads. Despite this reach, there is little principled assessment of which rule sets are more difficult. Inspired by prior work that uses AlphaZero as a strong evaluator for chess variants, we study difficulty in this domain using Stochastic Gumbel AlphaZero (SGAZ), a budget-aware planning agent for stochastic environments. We evaluate rule changes including holding block h, preview holding block p, and additional Tetris block variants using metrics such as training reward and convergence iterations. Empirically, increasing h and p reduces difficulty (higher reward and faster convergence), while adding more Tetris block variants increases difficulty, with the T-pentomino producing the largest slowdown. Through analysis, SGAZ delivers strong play under small simulation budgets, enabling efficient, reproducible comparisons across rule sets and providing a reference for future design in stochastic puzzle games.

[494]  arXiv:2603.18999 [pdf, ps, other]
Title: Regret Bounds for Competitive Resource Allocation with Endogenous Costs
Authors: Rui Chai
Comments: This is Paper 7 in a 9-paper series on Super-Alignment via Wuxing Institutional Architecture. The series explores resource competition and institutional design for human-aligned AI systems
Subjects: Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

We study online resource allocation among N interacting modules over T rounds. Unlike standard online optimization, costs are endogenous: they depend on the full allocation vector through an interaction matrix W encoding pairwise cooperation and competition.
We analyze three paradigms: (I) uniform allocation (cost-ignorant), (II) gated allocation (cost-estimating), and (III) competitive allocation via multiplicative weights update with interaction feedback (cost-revealing). Our main results establish a strict separation under adversarial sequences with bounded variation: uniform incurs Omega(T) regret, gated achieves O(T^{2/3}), and competitive achieves O(sqrt(T log N)). The performance gap stems from competitive allocation's ability to exploit endogenous cost information revealed through interactions.
We further show that W's topology governs a computation-regret tradeoff. Full interaction (|E|=O(N^2)) yields the tightest bound but highest per-step cost, while sparse topologies (|E|=O(N)) increase regret by at most O(sqrt(log N)) while reducing per-step cost from O(N^2) to O(N). Ring-structured topologies with both cooperative and competitive links - of which the five-element Wuxing topology is canonical - minimize the computation x regret product.
These results provide the first formal regret-theoretic justification for decentralized competitive allocation in modular architectures and establish cost endogeneity as a fundamental challenge distinct from partial observability.
Keywords: online learning, regret bounds, resource allocation, endogenous costs, interaction topology, multiplicative weights, modular systems, Wuxing topology
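
For readers unfamiliar with the multiplicative weights update mentioned in this abstract, here is a minimal sketch of the textbook version (often called Hedge). It assumes exogenous per-round costs in [0, 1] and does not model the interaction matrix W or the endogenous costs studied in the paper; the learning rate sqrt(ln N / T) is a standard choice that yields O(sqrt(T log N)) regret in that textbook setting.

```python
import math


def multiplicative_weights(cost_rounds, eta):
    """Run Hedge over a sequence of cost vectors with entries in [0, 1].

    Returns the final allocation and the total expected cost incurred.
    """
    n = len(cost_rounds[0])
    weights = [1.0] * n
    total_cost = 0.0
    for costs in cost_rounds:
        norm = sum(weights)
        allocation = [w / norm for w in weights]
        total_cost += sum(p * c for p, c in zip(allocation, costs))
        # Shrink each weight exponentially in the cost its module just incurred.
        weights = [w * math.exp(-eta * c) for w, c in zip(weights, costs)]
    norm = sum(weights)
    return [w / norm for w in weights], total_cost


T, N = 200, 3
rounds = [[1.0, 0.5, 0.0]] * T  # toy costs: module 2 is always the cheapest
eta = math.sqrt(math.log(N) / T)
final_allocation, total_cost = multiplicative_weights(rounds, eta)
```

In this toy run the allocation concentrates on the zero-cost module, and the total cost (which equals the regret here, since the best fixed module costs nothing) stays well below the linear growth a uniform allocation would incur.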

[495]  arXiv:2603.19000 [pdf, ps, other]
Title: SVLAT: Scientific Visualization Literacy Assessment Test
Subjects: Human-Computer Interaction (cs.HC)

Scientific visualization (SciVis) has become an essential means for exploring, understanding, and communicating complex scientific phenomena. However, the field still lacks a validated instrument assessing how well people read, understand, and interpret such visualizations. We present a scientific visualization literacy assessment test (SVLAT) that measures the general public's SciVis literacy. Covering a range of visualization forms and interpretation demands, SVLAT comprises 49 items grounded in 18 scientific visualizations and illustrations spanning eight visualization techniques and 11 tasks. Instrument development followed a staged, psychometrically grounded pipeline. We defined the construct and blueprint, followed by item generation and expert review with five SciVis experts using the content validity ratio (mean CVR = 0.79). We subsequently administered a pilot test (30 participants) and a large-scale test tryout (485 participants) to evaluate the instrument's psychometric properties. For validation, we performed item analysis and refinement using both classical test theory (CTT) and item response theory (IRT) to examine item functioning and overall test quality. SVLAT demonstrates high reliability in the tryout sample (McDonald's omega_t = 0.82, Cronbach's alpha = 0.81). The assessment materials are available at https://osf.io/hr3nw/.
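
Two of the statistics named in this abstract have short closed forms. The sketch below assumes Lawshe's definition of the content validity ratio and the usual sample-variance form of Cronbach's alpha; whether the authors used exactly these variants is not stated in the abstract, and the function names and toy scores are hypothetical.

```python
from statistics import variance


def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to 1."""
    half = n_experts / 2
    return (n_essential - half) / half


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# With five experts, an item rated essential by four of them gets CVR = 0.6.
cvr_four_of_five = content_validity_ratio(4, 5)

# Toy responses (hypothetical): two perfectly consistent items give alpha = 1.
alpha_toy = cronbach_alpha([[0, 0], [1, 1], [2, 2]])
```

With five raters, per-item CVR under this definition can only take the values -1, -0.6, -0.2, 0.2, 0.6, and 1, so a mean of 0.79 would correspond to most items being rated essential by four or five experts.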

[496]  arXiv:2603.19002 [pdf, ps, other]
Title: RADIUS: Ranking, Distribution, and Significance - A Comprehensive Alignment Suite for Survey Simulation
Subjects: Computation and Language (cs.CL)

Simulation of surveys using LLMs is emerging as a powerful application for generating human-like responses at scale. Prior work evaluates survey simulation using metrics borrowed from other domains, which are often ad hoc, fragmented, and non-standardized, leading to results that are difficult to compare. Moreover, existing metrics focus mainly on accuracy or distributional measures, overlooking the critical dimension of ranking alignment. In practice, a simulation can achieve high accuracy while still failing to capture the option most preferred by humans - a distinction that is critical in decision-making applications. We introduce RADIUS, a comprehensive two-dimensional alignment suite for survey simulation that captures: 1) RAnking alignment and 2) DIstribUtion alignment, each complemented by statistical Significance testing. RADIUS highlights the limitations of existing metrics, enables more meaningful evaluation of survey simulation, and provides an open-source implementation for reproducible and comparable assessment.

[497]  arXiv:2603.19004 [pdf, ps, other]
Title: Unleashing the Power of Simplicity: A Minimalist Strategy for State-of-the-Art Fingerprint Enhancement
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Fingerprint recognition systems, which rely on the unique characteristics of human fingerprints, are essential in modern security and verification applications. Accurate minutiae extraction, a critical step in these systems, depends on the quality of fingerprint images. Despite recent improvements in fingerprint enhancement techniques, state-of-the-art methods often struggle with low-quality fingerprints and can be computationally demanding. This paper presents a minimalist approach to fingerprint enhancement, prioritizing simplicity and effectiveness. Two novel methods are introduced: a contextual filtering method and a learning-based method. These techniques consistently outperform complex state-of-the-art methods, producing clearer, more accurate, and less noisy images. The effectiveness of these methods is validated using a challenging latent fingerprint database. The open-source implementation of these techniques not only fosters reproducibility but also encourages further advancements in the field. The findings underscore the importance of simplicity in achieving high-quality fingerprint enhancement and suggest that future research should balance complexity and practical benefits.

[498]  arXiv:2603.19005 [pdf, ps, other]
Title: AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflows. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in which aspects human expertise continues to provide advantages. We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines. Our results show that current AI agents struggle with domain-specific reasoning. AI-only baselines perform near or below the median of competition participants, while the strongest solutions arise from human-AI collaboration. These findings challenge the narrative of complete automation by AI and underscore the enduring importance of human expertise in data science, while illuminating directions for the next generation of AI. Visit the AgentDS website here: https://agentds.org/ and open source datasets here: https://huggingface.co/datasets/lainmn/AgentDS .

[499]  arXiv:2603.19008 [pdf, ps, other]
Title: Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by grounding generation in external, non-parametric knowledge. However, when a task requires choosing among competing options, simply grounding generation in broadly relevant context is often insufficient to drive the final decision. Existing RAG methods typically rely on a single initial query, which often favors topical relevance over decision-relevant evidence, and therefore retrieves background information that can fail to discriminate among answer options. To address this issue, here we propose Hypothesis-Conditioned Query Rewriting (HCQR), a training-free pre-retrieval framework that reorients RAG from topic-oriented retrieval to evidence-oriented retrieval. HCQR first derives a lightweight working hypothesis from the input question and candidate options, and then rewrites retrieval into three targeted queries that seek evidence to: (1) support the hypothesis, (2) distinguish it from competing alternatives, and (3) verify salient clues in the question. This approach enables context retrieval that is more directly aligned with answer selection, allowing the generator to confirm or overturn the initial hypothesis based on the retrieved evidence. Experiments on MedQA and MMLU-Med show that HCQR consistently outperforms single-query RAG and re-rank/filter baselines, improving average accuracy over Simple RAG by 5.9 and 3.6 points, respectively. Code is available at https://anonymous.4open.science/r/HCQR-1C2E.

[500]  arXiv:2603.19011 [pdf, ps, other]
Title: Security awareness in LLM agents: the NDAI zone case
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

NDAI zones let inventor and investor agents negotiate inside a Trusted Execution Environment (TEE) where any disclosed information is deleted if no deal is reached. This makes full IP disclosure the rational strategy for the inventor's agent. Leveraging this infrastructure, however, requires agents to distinguish a secure environment from an insecure one, a capability LLM agents lack natively, since they can rely only on evidence passed through the context window to form awareness of their execution environment. We ask: How do different LLM models weight various forms of evidence when forming awareness of the security of their execution environment? Using an NDAI-style negotiation task across 10 language models and various evidence scenarios, we find a clear asymmetry: a failing attestation universally suppresses disclosure across all models, whereas a passing attestation produces highly heterogeneous responses: some models increase disclosure, others are unaffected, and a few paradoxically reduce it. This reveals that current LLM models can reliably detect danger signals but cannot reliably verify safety, the very capability required for privacy-preserving agentic protocols such as NDAI zones. Bridging this gap, possibly through interpretability analysis, targeted fine-tuning, or improved evidence architectures, remains the central open challenge for deploying agents that calibrate information sharing to actual evidence quality.

[501]  arXiv:2603.19013 [pdf, ps, other]
Title: Generalized Hand-Object Pose Estimation with Occlusion Awareness
Comments: 25 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generalized 3D hand-object pose estimation from a single RGB image remains challenging due to the large variations in object appearances and interaction patterns, especially under heavy occlusion. We propose GenHOI, a framework for generalized hand-object pose estimation with occlusion awareness. GenHOI integrates hierarchical semantic knowledge with hand priors to enhance model generalization under challenging occlusion conditions. Specifically, we introduce a hierarchical semantic prompt that encodes object states, hand configurations, and interaction patterns via textual descriptions. This enables the model to learn abstract high-level representations of hand-object interactions for generalization to unseen objects and novel interactions while compensating for missing or ambiguous visual cues. To enable robust occlusion reasoning, we adopt a multi-modal masked modeling strategy over RGB images, predicted point clouds, and textual descriptions. Moreover, we leverage hand priors as stable spatial references to extract implicit interaction constraints. This allows reliable pose inference even under significant variations in object shapes and interaction patterns. Extensive experiments on the challenging DexYCB and HO3Dv2 benchmarks demonstrate that our method achieves state-of-the-art performance in hand-object pose estimation.

[502]  arXiv:2603.19016 [pdf, ps, other]
Title: Literature Study on Operational Data Analytics Frameworks in Large-scale Computing Infrastructures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

As of 2025, zettabytes of data are generated every year. Modern large-scale computing infrastructures such as High-Performance Computing (HPC) systems continue to grow in size and complexity, raising concerns about their manageability and sustainability. For this reason, these systems are equipped with fine-grained monitoring and Operational Data Analytics (ODA) capabilities to optimise their efficiency. In this literature study, we list the fundamental pillars of large-scale computing infrastructures that enable their ODA capabilities, and survey the popular ODA frameworks operating in such environments (predominantly HPC). Based on that, we propose a more holistic ODA framework matching the various layers of a large-scale graph-processing distributed ecosystem proposed by Sherif Sak et al., which extends the ODA functionalities presented in an existing ODA framework proposed by Netti et al. We compare our holistic ODA framework with some of the state-of-the-art frameworks studied in this literature review to highlight its novelty, which we hope will draw more attention to research in this field. To raise awareness, we highlight the significant operational efficiencies observed from implementing state-of-the-art ODA frameworks, showing readers their practical benefits, and lastly discuss ongoing research trends in this field.

[503]  arXiv:2603.19017 [pdf, ps, other]
Title: What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present MultiTempBench, a multilingual temporal reasoning benchmark spanning three tasks (date arithmetic, time zone conversion, and temporal relation extraction) across five languages (English, German, Chinese, Arabic, and Hausa) and multiple calendar conventions (Gregorian, Hijri, and Chinese Lunar). MultiTempBench contains 15,000 examples built by translating 750 curated English questions and expanding each into controlled date-format variants. We evaluate 20 LLMs and introduce the multilingual Date Fragmentation Ratio (mDFR), calibrated with human severity ratings, together with geometric-probing analyses of internal temporal representations. We find that tokenisation quality of temporal artefacts is a resource-dependent bottleneck: in low-resource languages and rarer calendar formats, fragmentation disrupts Year/Month/Day separation and accuracy collapses, while high-resource settings are often robust to digit-level splitting. Beyond tokenisation, crossed mixed-effects regression shows that temporal linearity is the strongest predictor of temporal reasoning in high-resource languages, whereas fragmentation is the stronger predictor in low-resource languages. Code is available at: https://github.com/gagan3012/mtb

[504]  arXiv:2603.19022 [pdf, ps, other]
Title: Behavioral Fingerprints for LLM Endpoint Stability and Identity
Comments: 4 pages, 1 figure, submitted to CAIS 2026 System Demonstrations
Subjects: Artificial Intelligence (cs.AI)

The consistency of AI-native applications depends on the behavioral consistency of the model endpoints that power them. Traditional reliability metrics such as uptime, latency and throughput do not capture behavioral change, and an endpoint can remain "healthy" while its effective model identity changes due to updates to weights, tokenizers, quantization, inference engines, kernels, caching, routing, or hardware. We introduce Stability Monitor, a black-box stability monitoring system that periodically fingerprints an endpoint by sampling outputs from a fixed prompt set and comparing the resulting output distributions over time. Fingerprints are compared using a summed energy distance statistic across prompts, with permutation-test p-values as evidence of distribution shift aggregated sequentially to detect change events and define stability periods. In controlled validation, Stability Monitor detects changes to model family, version, inference stack, quantization, and behavioral parameters. In real-world monitoring of the same model hosted by multiple providers, we observe substantial provider-to-provider and within-provider stability differences.
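
The comparison step described above (a summed energy distance across prompts, assessed with a permutation test) can be sketched as follows. This is an illustration under stated assumptions, not the Stability Monitor implementation: each sampled output is assumed to be reduced to a single scalar feature (for example a response length), the function names are hypothetical, and the sequential aggregation of p-values mentioned in the abstract is not shown.

```python
import random
from itertools import product


def mean_abs_diff(a, b):
    """Mean of |x - y| over all pairs (x in a, y in b)."""
    return sum(abs(x - y) for x, y in product(a, b)) / (len(a) * len(b))


def energy_distance(x, y):
    """Sample energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return 2 * mean_abs_diff(x, y) - mean_abs_diff(x, x) - mean_abs_diff(y, y)


def summed_energy(before, after):
    """Sum the per-prompt energy distances; both arguments map prompt -> samples."""
    return sum(energy_distance(before[p], after[p]) for p in before)


def permutation_p_value(before, after, n_perm=500, seed=0):
    """Permutation test: reshuffle the pooled samples within each prompt."""
    rng = random.Random(seed)
    observed = summed_energy(before, after)
    hits = 0
    for _ in range(n_perm):
        perm_before, perm_after = {}, {}
        for p in before:
            pooled = before[p] + after[p]  # new list, inputs are not mutated
            rng.shuffle(pooled)
            k = len(before[p])
            perm_before[p], perm_after[p] = pooled[:k], pooled[k:]
        if summed_energy(perm_before, perm_after) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical scalar features for two prompts, before and after a change.
    before = {"p1": [10.0, 12.0, 11.0, 13.0, 9.0, 10.0], "p2": [5.0, 6.0, 5.0, 7.0, 6.0, 5.0]}
    after = {"p1": [30.0, 31.0, 29.0, 33.0, 32.0, 30.0], "p2": [5.0, 6.0, 6.0, 7.0, 5.0, 6.0]}
    print(permutation_p_value(before, after))
```

Shuffling within each prompt keeps the prompt structure fixed under the null hypothesis that the two fingerprints come from the same output distribution, so a small p-value points to a shift in at least one prompt.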

[505]  arXiv:2603.19025 [pdf, ps, other]
Title: Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference
Comments: 49 pages, 14 figures. Accepted at IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) 2026
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

When large AI models are deployed as cloud-based services, clients have no guarantee that responses are correct or were produced by the intended model. Rerunning inference locally is infeasible for large models, and existing cryptographic proof systems -- while providing strong correctness guarantees -- introduce prohibitive prover overhead (e.g., hundreds of seconds per query for billion-parameter models). We present a verification framework and protocol that replaces full cryptographic proofs with a lightweight, sampling-based approach grounded in statistical properties of neural networks. We formalize the conditions under which trace separation between functionally dissimilar models can be leveraged to argue the security of verifiable inference protocols. The prover commits to the execution trace of inference via Merkle-tree-based vector commitments and opens only a small number of entries along randomly sampled paths from output to input. This yields a protocol that trades soundness for efficiency, a tradeoff well-suited to auditing, large-scale deployment settings where repeated queries amplify detection probability, and scenarios with rationally incentivized provers who face penalties upon detection. Our approach reduces proving times by several orders of magnitude compared to state-of-the-art cryptographic proof systems, going from the order of minutes to the order of milliseconds, with moderately larger proofs. Experiments on ResNet-18 classifiers and Llama-2-7B confirm that common architectures exhibit the statistical properties our protocol requires, and that natural adversarial strategies (gradient-descent reconstruction, inverse transforms, logit swapping) fail to produce traces that evade detection. We additionally present a protocol in the refereed delegation model, where two competing servers enable correct output identification in a logarithmic number of rounds.
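
To make the commit-and-open step concrete, the following is a minimal sketch of a Merkle-tree vector commitment over a list of trace entries, with an opening for a single index and its verification. It is a generic construction written for illustration, assuming SHA-256 and byte-string trace entries; the helper names are hypothetical, and the paper's path sampling from output to input and its statistical checks are not shown.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all tree levels: levels[0] holds leaf hashes, levels[-1] == [root]."""
    if not leaves:
        raise ValueError("need at least one leaf")
    # Distinct prefixes separate leaf hashes from internal-node hashes.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd level: duplicate the last node (a simplification for this sketch).
            level = level + [level[-1]]
            levels[-1] = level
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def open_path(levels, index):
    """Collect the sibling hashes from the leaf at `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(root, leaf, index, path):
    """Recompute the root from one opened leaf and its authentication path."""
    node = _h(b"\x00" + leaf)
    for sibling in path:
        if index % 2 == 0:
            node = _h(b"\x01" + node + sibling)
        else:
            node = _h(b"\x01" + sibling + node)
        index //= 2
    return node == root


if __name__ == "__main__":
    # Hypothetical trace entries standing in for serialized intermediate values.
    trace = [f"layer{i}:activation".encode() for i in range(5)]
    levels = build_tree(trace)
    root = levels[-1][0]
    print(verify(root, trace[2], 2, open_path(levels, 2)))
```

The commitment is the single root hash, and each opening carries only a logarithmic number of sibling hashes, which is what lets a prover reveal a few sampled entries without sending the whole trace.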

[506]  arXiv:2603.19026 [pdf, ps, other]
Title: Rethinking MLLM Itself as a Segmenter with a Single Segmentation Token
Comments: Paper is accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent segmentation methods leveraging Multi-modal Large Language Models (MLLMs) have shown reliable object-level segmentation and enhanced spatial perception. However, almost all previous methods predominantly rely on specialist mask decoders to interpret masks from generated segmentation-related embeddings and visual features, or incorporate multiple additional tokens to assist. This paper aims to investigate whether and how we can unlock segmentation from MLLM itSELF with 1 segmentation Embedding (SELF1E) while achieving competitive results, which eliminates the need for external decoders. To this end, our approach targets the fundamental limitation of resolution reduction in pixel-shuffled image features from MLLMs. First, we retain image features at their original uncompressed resolution, and refill them with residual features extracted from MLLM-processed compressed features, thereby improving feature precision. Subsequently, we integrate pixel-unshuffle operations on image features with and without LLM processing, respectively, to unleash the details of compressed features and amplify the residual features under uncompressed resolution, which further enhances the resolution of refilled features. Moreover, we redesign the attention mask with dual perception pathways, i.e., image-to-image and image-to-segmentation, enabling rich feature interaction between pixels and the segmentation token. Comprehensive experiments across multiple segmentation tasks validate that SELF1E achieves performance competitive with specialist mask decoder-based methods, demonstrating the feasibility of decoder-free segmentation in MLLMs. Project page: https://github.com/ANDYZAQ/SELF1E.

[507]  arXiv:2603.19028 [pdf, ps, other]
Title: SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
Comments: CVPR Findings 2026. Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. Existing post-hoc debiasing methods often operate directly in the dense CLIP embedding space, where bias and task-relevant information are highly entangled. This entanglement limits their ability to remove bias without degrading semantic fidelity. In this work, we propose Sparse Embedding Modulation (SEM), a post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space. By decomposing CLIP text embeddings into disentangled features, SEM identifies and modulates bias-relevant neurons while preserving query-relevant ones. This enables more precise, non-linear interventions. Across four benchmark datasets and two CLIP backbones, SEM achieves substantial fairness gains in retrieval and zero-shot classification. Our results demonstrate that sparse latent representations provide an effective foundation for post-hoc debiasing of vision-language models.

[508]  arXiv:2603.19029 [pdf, ps, other]
Title: ATG-MoE: Autoregressive trajectory generation with mixture-of-experts for assembly skill learning
Comments: 32 pages, 13 figures
Subjects: Robotics (cs.RO)

Flexible manufacturing requires robot systems that can adapt to constantly changing tasks, objects, and environments. However, traditional robot programming is labor-intensive and inflexible, while existing learning-based assembly methods often suffer from weak positional generalization, complex multi-stage designs, and limited multi-skill integration capability. To address these issues, this paper proposes ATG-MoE, an end-to-end autoregressive trajectory generation method with mixture of experts for assembly skill learning from demonstration. The proposed method establishes a closed-loop mapping from multi-modal inputs, including RGB-D observations, natural language instructions, and robot proprioception to manipulation trajectories. It integrates multi-modal feature fusion for scene and task understanding, autoregressive sequence modeling for temporally coherent trajectory generation, and a mixture-of-experts architecture for unified multi-skill learning. In contrast to conventional methods that separate visual perception and control or train different skills independently, ATG-MoE directly incorporates visual information into trajectory generation and supports efficient multi-skill integration within a single model. We train and evaluate the proposed method on eight representative assembly skills from a pressure-reducing valve assembly task. Experimental results show that ATG-MoE achieves strong overall performance in simulation, with an average grasp success rate of 96.3% and an average overall success rate of 91.8%, while also demonstrating strong generalization and effective multi-skill integration. Real-world experiments further verify its practicality for multi-skill industrial assembly. The project page can be found at https://hwh23.github.io/ATG-MoE

[509]  arXiv:2603.19030 [pdf, ps, other]
Title: LLMs Aren't Human: A Critical Perspective on LLM Personality
Comments: 4 pages
Subjects: Human-Computer Interaction (cs.HC)

A growing body of research examines personality traits in Large Language Models (LLMs), particularly in human-agent collaboration. Prior work has frequently applied the Big Five inventory to assess LLM behavior analogous to human personality, without questioning the underlying assumptions. This paper critically evaluates whether LLM responses to personality tests satisfy six defining characteristics of personality. We find that none are fully met, indicating that such assessments do not measure a construct equivalent to human personality. We propose a research agenda for shifting from anthropomorphic trait attribution toward functional evaluations, clarifying what personality tests actually capture in LLMs and developing LLM-specific frameworks for characterizing stable, intrinsic behavior.

[510]  arXiv:2603.19036 [pdf, ps, other]
Title: FUMO: Prior-Modulated Diffusion for Single Image Reflection Removal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Single image reflection removal (SIRR) is challenging in real scenes, where reflection strength varies spatially and reflection patterns are tightly entangled with transmission structures. This paper presents a diffusion model with prior modulation framework (FUMO) that introduces explicit guidance signals to improve spatial controllability and structural faithfulness. Two priors are extracted directly from the mixed image, an intensity prior that estimates spatial reflection severity and a high-frequency prior that captures detail-sensitive responses via multi-scale residual aggregation. We propose a coarse-to-fine training paradigm. In the first stage, these cues are combined to gate the conditional residual injections, focusing the conditioning on regions that are both reflection-dominant and structure-sensitive. In the second stage, a fine-grained refinement network corrects local misalignment and sharpens fine details in the image space. Experiments conducted on both standard benchmarks and challenging images in the wild demonstrate competitive quantitative results and consistently improved perceptual quality. The code is released at https://github.com/Lucious-Desmon/FUMO.

[511]  arXiv:2603.19039 [pdf, ps, other]
Title: TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
Comments: Accepted by CVPR 2026 (Main Track)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-language models (VLMs) have shown promise in earth observation (EO), yet they struggle with tasks that require grounding complex spatial reasoning in precise pixel-level visual representations. To address this problem, we introduce TerraScope, a unified VLM that delivers pixel-grounded geospatial reasoning with two key capabilities: (1) modality-flexible reasoning: it handles single-modality inputs (optical or SAR) and adaptively fuses different modalities into the reasoning process when both are available; (2) multi-temporal reasoning: it integrates temporal sequences for change analysis across multiple time points. In addition, we curate Terra-CoT, a large-scale dataset containing 1 million samples with pixel-level masks embedded in reasoning chains across multiple sources. We also propose TerraScope-Bench, the first benchmark for pixel-grounded geospatial reasoning with six sub-tasks that evaluates both answer accuracy and mask quality to ensure authentic pixel-grounded reasoning. Experiments show that TerraScope significantly outperforms existing VLMs on pixel-grounded geospatial reasoning while providing interpretable visual evidence.

[512]  arXiv:2603.19040 [pdf, ps, other]
Title: When Differential Privacy Meets Wireless Federated Learning: An Improved Analysis for Privacy and Convergence
Comments: 5 pages, 1 figure
Subjects: Machine Learning (cs.LG)

Differentially private wireless federated learning (DPWFL) is a promising framework for protecting sensitive user data. However, foundational questions on how to precisely characterize privacy loss remain open, and existing work is further limited by convergence analyses that rely on restrictive convexity assumptions or ignore the effect of gradient clipping. To overcome these issues, we present a comprehensive analysis of privacy and convergence for DPWFL with general smooth non-convex loss objectives. Our analysis explicitly incorporates both device selection and mini-batch sampling, and shows that the privacy loss can converge to a constant rather than diverge with the number of iterations. Moreover, we establish convergence guarantees with gradient clipping and derive an explicit privacy-utility trade-off. Numerical results validate our theoretical findings.
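
For context on the gradient clipping that the analysis accounts for, the sketch below shows the generic clip-then-add-Gaussian-noise step used in differentially private SGD. It is not the DPWFL mechanism from the paper: device selection, mini-batch sampling, and any wireless channel effects are omitted, and the function names and parameters (`max_norm`, `noise_multiplier`) are assumptions made for illustration.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale a gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in grads]
    dim = len(grads[0])
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(grads) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-sample gradients in two dimensions.
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds each sample's contribution, which is what allows the noise scale to be tied to a fixed sensitivity; it also biases the update, which is why convergence analyses that ignore clipping can be optimistic.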

[513]  arXiv:2603.19042 [pdf, ps, other]
Title: Man and machine: artificial intelligence and judicial decision making
Subjects: Artificial Intelligence (cs.AI)

The integration of artificial intelligence (AI) technologies into judicial decision-making - particularly in pretrial, sentencing, and parole contexts - has generated substantial concerns about transparency, reliability, and accountability. At the same time, these developments have brought the limitations of human judgment into sharper relief and underscored the importance of understanding how judges interact with AI-based decision aids. Using criminal justice risk assessment as a focal case, we conduct a synthetic review connecting three intertwined aspects of AI's role in judicial decision-making: the performance and fairness of AI tools, the strengths and biases of human judges, and the nature of AI+human interactions. Across the fields of computer science, economics, law, criminology and psychology, researchers have made significant progress in evaluating the predictive validity of automated risk assessment instruments, documenting biases in judicial decision-making, and, to a more limited extent, examining how judges use algorithmic recommendations. While the existing empirical evidence indicates that the impact of AI decision aid tools on pretrial and sentencing decisions is modest or inexistent, our review also reveals important gaps in the canvassed literatures. Further research is needed to evaluate the performance of AI risk assessment instruments, understand how judges navigate noisy decision making environments and how individual characteristics influence judges' responses to AI advice. We argue that AI vs Human comparisons have the potential to yield new insights into both algorithmic tools and human decision-makers and advocate greater interdisciplinary integration and cross-fertilization in future research.

[514]  arXiv:2603.19043 [pdf, ps, other]
Title: Complexity bounds on neural networks for the solution of structured linear systems of equations
Subjects: Numerical Analysis (math.NA)

We derive upper bounds on the complexity of ReLU neural networks approximating the solution of a linear system given the matrix and the right-hand side. We focus on matrices which are symmetric positive definite and sparse, as they appear in the context of finite difference and finite element methods. For such matrices, we extend available results for the matrix inversion to the task of solving a linear system, where we leverage favorable properties of classical methods such as the modified Richardson and the conjugate gradient method. Our bounds on the number of layers and neurons are not only explicit with respect to the size of the matrices, but also with respect to their condition numbers.
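
As background for the classical methods named above, the following sketch runs the modified Richardson iteration x_{k+1} = x_k + omega (b - A x_k) on a small symmetric positive definite, sparse system. It only illustrates the iteration itself; how the paper translates such iterations into ReLU network bounds is not shown. The test matrix (the 1D finite-difference Laplacian) and the helper names are chosen here for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix from a 1D finite-difference Laplacian."""
    return [
        [2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def richardson(A, b, omega, iters):
    """Modified Richardson iteration starting from the zero vector."""
    x = [0.0] * len(b)
    for _ in range(iters):
        residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        x = [xi + omega * ri for xi, ri in zip(x, residual)]
    return x


if __name__ == "__main__":
    A = poisson_1d(4)
    b = matvec(A, [1.0, 2.0, 3.0, 4.0])
    # For this matrix lambda_min + lambda_max = 4, so omega = 2 / 4 = 0.5
    # is the optimal step size 2 / (lambda_min + lambda_max).
    print(richardson(A, b, omega=0.5, iters=200))
```

With the optimal step size the error contracts by a factor of (kappa - 1) / (kappa + 1) per iteration, where kappa is the condition number, which illustrates why iteration counts, and hence any bound built on them, depend on the condition number as well as the matrix size.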

[515]  arXiv:2603.19044 [pdf, ps, other]
Title: MoRI: Learning Motivation-Grounded Reasoning for Scientific Ideation in Large Language Models
Subjects: Computation and Language (cs.CL)

Scientific ideation aims to propose novel solutions within a given scientific context. Existing LLM-based agentic approaches emulate human research workflows, yet inadequately model scientific reasoning, resulting in surface-level conceptual recombinations that lack technical depth and scientific grounding. To address this issue, we propose MoRI (Motivation-grounded Reasoning for Scientific Ideation), a framework that enables LLMs to explicitly learn the reasoning process from research motivations to methodologies. The base LLM is initialized via supervised fine-tuning to generate a research motivation from a given context, and is subsequently trained under a composite reinforcement learning reward that approximates scientific rigor: (1) entropy-aware information gain encourages the model to uncover and elaborate high-complexity technical details grounded in ground-truth methodologies, and (2) contrastive semantic gain constrains the reasoning trajectory to remain conceptually aligned with scientifically valid solutions. Empirical results show that MoRI significantly outperforms strong commercial LLMs and complex agentic baselines across multiple dimensions, including novelty, technical rigor, and feasibility. The code will be made available on GitHub: https://github.com/ECNU-Text-Computing/IdeaGeneration.

[516]  arXiv:2603.19048 [pdf, ps, other]
Title: Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos
Comments: Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to accurately characterize these inconsistencies: fidelity-centric metrics like FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a metric for evaluating 3D Spatial Geometric Consistency in dynamically generated videos. We quantify geometric consistency by measuring the divergence among multiple camera poses estimated from distinct local regions. Our approach first separates static from dynamic regions, then partitions the static background into spatially coherent sub-regions. We predict depth for each pixel, estimate a local camera pose for each subregion, and compute the divergence among these poses to quantify geometric consistency. Experiments on real and generative videos demonstrate that SGC robustly quantifies geometric inconsistencies, effectively identifying critical failures missed by existing metrics.

[517]  arXiv:2603.19053 [pdf, ps, other]
Title: SwiftTailor: Efficient 3D Garment Generation with Geometry Image Representation
Comments: CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Realistic and efficient 3D garment generation remains a longstanding challenge in computer vision and digital fashion. Existing methods typically rely on large vision-language models to produce serialized representations of 2D sewing patterns, which are then transformed into simulation-ready 3D meshes using a garment modeling framework such as GarmentCode. Although these approaches yield high-quality results, they often suffer from slow inference times, ranging from 30 seconds to a minute. In this work, we introduce SwiftTailor, a novel two-stage framework that unifies sewing-pattern reasoning and geometry-based mesh synthesis through a compact geometry image representation. SwiftTailor comprises two lightweight modules: PatternMaker, an efficient vision-language model that predicts sewing patterns from diverse input modalities, and GarmentSewer, an efficient dense prediction transformer that converts these patterns into a novel Garment Geometry Image, encoding the 3D surface of all garment panels in a unified UV space. The final 3D mesh is reconstructed through an efficient inverse mapping process that incorporates remeshing and dynamic stitching algorithms to directly assemble the garment, thereby amortizing the cost of physical simulation. Extensive experiments on the Multimodal GarmentCodeData demonstrate that SwiftTailor achieves state-of-the-art accuracy and visual fidelity while significantly reducing inference time. This work offers a scalable, interpretable, and high-performance solution for next-generation 3D garment generation.

[518]  arXiv:2603.19054 [pdf, ps, other]
Title: Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent advances in Streaming Video Understanding have enabled a new interaction paradigm where models respond proactively to user queries. Current proactive VideoLLMs rely on per-frame triggering decisions, which suffer from an efficiency-accuracy dilemma. We propose Em-Garde, a novel framework that decouples semantic understanding from streaming perception. At query time, the Instruction-Guided Proposal Parser transforms user queries into structured, perceptually grounded visual proposals; during streaming, a Lightweight Proposal Matching Module performs efficient embedding-based matching to trigger responses. Experiments on StreamingBench and OVO-Bench demonstrate consistent improvements over prior models in proactive response accuracy and efficiency, validating an effective solution for proactive video understanding under strict computational constraints.

[519]  arXiv:2603.19056 [pdf, ps, other]
Title: Solving Maxwell's Equations with Mimetic Methods
Authors: Johnny Corbino
Subjects: Numerical Analysis (math.NA)

We present a mimetic finite-difference approach for solving Maxwell's equations in one and two spatial dimensions. After introducing the governing equations and the classical Finite-Difference Time-Domain (FDTD) method, we describe mimetic operators that satisfy a discrete analogue of the extended Gauss divergence theorem and show how they lead to a compact, physically consistent formulation for computational electromagnetics. Two numerical examples are presented: a one-dimensional sinusoidal wave interacting with a lossy dielectric slab, and a two-dimensional Gaussian pulse with Uniaxial Perfectly Matched Layer (UPML) absorbing boundary conditions. All implementations use the Mimetic Operators Library Enhanced (MOLE).

[520]  arXiv:2603.19057 [pdf, ps, other]
Title: Mitigating the Bandwidth Wall via Data-Streaming System-Accelerator Co-Design
Subjects: Hardware Architecture (cs.AR)

Transformers have revolutionized AI in natural language processing and computer vision, but their large computation and memory demands pose major challenges for hardware acceleration. In practice, end-to-end throughput is often limited by paged data movement and interconnect bandwidth rather than raw MAC count. This work proposes a unified system-accelerator co-design approach for transformer inference that jointly optimizes a matrix accelerator and its system integration through paged streaming dataflows and explicit overlap of compute and transfer. On the hardware side, we introduce MatrixFlow, a loosely coupled 16x16 systolic-array accelerator with a page-aligned block matrix multiplication method using 4 KB tiles, a small on-chip buffer of about 20 KB, and a pipelined schedule of DMA, compute, and DMA-out to utilize interconnect bandwidth efficiently. On the system side, we develop Gem5-AcceSys, an extension of the gem5 full-system simulator that explores standard interconnects such as PCIe and configurable memory hierarchies including Direct Memory, Direct Cache, and Device Memory modes with SMMU/TLB effects. We evaluate the co-design using gem5 simulations on representative transformer models including BERT and ViT across multiple data types and system setups. Results show up to 22x end-to-end speedup over a CPU-only baseline and 5x to 8x gains over state-of-the-art loosely and tightly coupled accelerators. We further show that a standard PCIe-based host-memory design can achieve about 80 percent of the performance of on-device HBM. Overall, paged streaming and pipeline overlap, rather than large local SRAMs, are the most effective levers for efficient transformer inference under realistic system constraints.

[521]  arXiv:2603.19059 [pdf, ps, other]
Title: SignAgent: Agentic LLMs for Linguistically-Grounded Sign Language Annotation and Dataset Curation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces SignAgent, a novel agentic framework that utilises Large Language Models (LLMs) for scalable, linguistically-grounded Sign Language (SL) annotation and dataset curation. Traditional computational methods for SLs often operate at the gloss level, overlooking crucial linguistic nuances, while manual linguistic annotation remains a significant bottleneck, proving too slow and expensive for the creation of large-scale, phonologically-aware datasets. SignAgent addresses these challenges through SignAgent Orchestrator, a reasoning LLM that coordinates a suite of linguistic tools, and SignGraph, a knowledge-grounded LLM that provides lexical and linguistic grounding. We evaluate our framework on two downstream annotation tasks. First, on Pseudo-gloss Annotation, where the agent performs constrained assignment, using multi-modal evidence to extract and order suitable gloss labels for signed sequences. Second, on ID Glossing, where the agent detects and refines visual clusters by reasoning over both visual similarity and phonological overlap to correctly identify and group lexical sign variants. Our results demonstrate that our agentic approach achieves strong performance for large-scale, linguistically-aware data annotation and curation.

[522]  arXiv:2603.19061 [pdf, ps, other]
Title: Hardness of High-Dimensional Linear Classification
Comments: SoCG 2026
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)

We establish new exponential-in-dimension lower bounds for the Maximum Halfspace Discrepancy problem, which models linear classification. Both are fundamental problems in computational geometry and machine learning in their exact and approximate forms. However, only $O(n^d)$ and respectively $\tilde O(1/\varepsilon^d)$ upper bounds are known, and they are complemented by polynomial lower bounds that do not support the exponential-in-dimension dependence. We close this gap up to polylogarithmic terms by reduction from widely believed hardness conjectures for Affine Degeneracy testing and $k$-Sum problems. Our reductions yield matching lower bounds of $\tilde\Omega(n^d)$ and respectively $\tilde\Omega(1/\varepsilon^d)$ based on Affine Degeneracy testing, and $\tilde\Omega(n^{d/2})$ and respectively $\tilde\Omega(1/\varepsilon^{d/2})$ conditioned on $k$-Sum. The first bound also holds unconditionally if the computational model is restricted to make sidedness queries, which corresponds to a widespread setting implemented and optimized in many contemporary algorithms and computing paradigms.

[523]  arXiv:2603.19063 [pdf, ps, other]
Title: Fire as a Service: Augmenting Robot Simulators with Thermally and Visually Accurate Fire Dynamics
Subjects: Robotics (cs.RO); Graphics (cs.GR)

Most existing robot simulators prioritize rigid-body dynamics and photorealistic rendering, but largely neglect the thermally and optically complex phenomena that characterize real-world fire environments. For robots envisioned as future firefighters, this limitation hinders both reliable capability evaluation and the generation of representative training data prior to deployment in hazardous scenarios. To address these challenges, we introduce Fire as a Service (FaaS), a novel, asynchronous co-simulation framework that augments existing robot simulators with high-fidelity and computationally efficient fire simulations. Our pipeline enables robots to experience accurate, multi-species thermodynamic heat transfer and visually consistent volumetric smoke without disrupting high-frequency rigid-body control loops. We demonstrate that our framework can be integrated with diverse robot simulators to generate physically accurate fire behavior, benchmark thermal hazards encountered by robotic platforms, and collect realistic multimodal perceptual data. Crucially, its real-time performance supports human-in-the-loop teleoperation, enabling the successful training of reactive, multimodal policies via Behavioral Cloning. By adding fire dynamics to robot simulations, FaaS provides a scalable pathway toward safer, more reliable deployment of robots in fire scenarios.

[524]  arXiv:2603.19066 [pdf, ps, other]
Title: Parallelograms Strike Back: LLMs Generate Better Analogies than People
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Four-term word analogies (A:B::C:D) are classically modeled geometrically as "parallelograms," yet recent work suggests this model poorly captures how humans produce analogies, with simple local-similarity heuristics often providing a better account (Peterson et al., 2020). But does the parallelogram model fail because it is a bad model of analogical relations, or because people are not very good at generating relation-preserving analogies? We compared human and large language model (LLM) analogy completions on the same set of analogy problems from Peterson et al. (2020). We find that LLM-generated analogies are reliably judged as better than human-generated ones, and are also more closely aligned with the parallelogram structure in a distributional embedding space (GloVe). Crucially, we show that the improvement over human analogies was driven by greater parallelogram alignment and reduced reliance on accessible words rather than enhanced sensitivity to local similarity. Moreover, the LLM advantage is driven not by uniformly superior responses by LLMs, but by humans producing a long tail of weak completions: when only modal (most frequent) responses by both systems are compared, the LLM advantage disappears. However, greater parallelogram alignment and lower word frequency continue to predict which LLM completions are rated higher than those of humans. Overall, these results suggest that the parallelogram model is not a poor account of word analogy. Rather, humans may often fail to produce completions that satisfy this relational constraint, whereas LLMs do so more consistently.
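
As a rough illustration of the parallelogram model mentioned in this abstract, the sketch below completes A:B::C:? by forming the target vector C + (B - A) and returning the vocabulary word whose embedding has the highest cosine similarity to it. The function name `complete_analogy` and the three-dimensional toy vectors are hypothetical and chosen only for illustration; they are not GloVe embeddings and are not taken from the paper.

```python
import numpy as np


def complete_analogy(a, b, c, vectors):
    """Return the word D that best completes a:b::c:D under the parallelogram model.

    `vectors` maps words to 1-D numpy arrays (hypothetical toy embeddings here).
    """
    # Parallelogram rule: the ideal D sits at C plus the offset from A to B.
    target = vectors[c] + (vectors[b] - vectors[a])
    target_norm = np.linalg.norm(target)

    best_word, best_score = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # the query words themselves are not valid completions
        denom = target_norm * np.linalg.norm(vec)
        if denom == 0.0:
            continue  # cosine similarity is undefined for zero vectors
        score = float(np.dot(target, vec) / denom)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Toy vocabulary (assumed values, for illustration only).
toy_vectors = {
    "man": np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 1.0, 0.0]),
    "king": np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "apple": np.array([-1.0, 0.5, 0.0]),
}

completion = complete_analogy("man", "woman", "king", toy_vectors)
```

In this toy setup the target vector coincides exactly with the vector for "queen"; with real embeddings the match is only approximate, which is why a nearest-neighbor search is used.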

[525]  arXiv:2603.19067 [pdf, ps, other]
Title: Communication-Efficient and Robust Multi-Modal Federated Learning via Latent-Space Consensus
Comments: Accepted for publication in IEEE Wireless Communications Letters
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Federated learning (FL) enables collaborative model training across distributed devices without sharing raw data, but applying FL to multi-modal settings introduces significant challenges. Clients typically possess heterogeneous modalities and model architectures, making it difficult to align feature spaces efficiently while preserving privacy and minimizing communication costs. To address this, we introduce CoMFed, a Communication-Efficient Multi-Modal Federated Learning framework that uses learnable projection matrices to generate compressed latent representations. A latent-space regularizer aligns these representations across clients, improving cross-modal consistency and robustness to outliers. Experiments on human activity recognition benchmarks show that CoMFed achieves competitive accuracy with minimal overhead.

[526]  arXiv:2603.19074 [pdf, ps, other]
Title: CAMO: A Conditional Neural Solver for the Multi-objective Multiple Traveling Salesman Problem
Comments: 9 pages, 3 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Robotic systems often require a team of robots to collectively visit multiple targets while optimizing competing objectives, such as total travel cost and makespan. This setting can be formulated as the Multi-Objective Multiple Traveling Salesman Problem (MOMTSP). Although learning-based methods have shown strong performance on the single-agent TSP and multi-objective TSP variants, they rarely address the combined challenges of multi-agent coordination and multi-objective trade-offs, which introduce dual sources of complexity. To bridge this gap, we propose CAMO, a conditional neural solver for MOMTSP that generalizes across varying numbers of targets, agents, and preference vectors, and yields high-quality approximations to the Pareto front (PF). Specifically, CAMO consists of a conditional encoder to fuse preferences into instance representations, enabling explicit control over multi-objective trade-offs, and a collaborative decoder that coordinates all agents by alternating agent selection and node selection to construct multi-agent tours autoregressively. To further improve generalization, we train CAMO with a REINFORCE-based objective over a mixed distribution of problem sizes. Extensive experiments show that CAMO outperforms both neural and conventional heuristics, achieving a closer approximation of PFs. In addition, ablation results validate the contributions of CAMO's key components, and real-world tests on a mobile robot platform demonstrate its practical applicability.

[527]  arXiv:2603.19075 [pdf, ps, other]
Title: A conservative, discontinuous Galerkin, tracer transport scheme using compatible finite elements
Subjects: Numerical Analysis (math.NA)

This paper outlines a conservative transport scheme for scalar tracers within a compatible finite element model for geophysical fluid equations. Instead of using the advective transport equation for a mixing ratio, a conservative transport equation is solved for the tracer density, defined as the mixing ratio multiplied by the dry density. This ensures mass conservation in the continuous equations, which can be preserved in the discrete equations with a discontinuous Galerkin transport scheme. Our method is designed to work for two placements of the mixing ratio in a Charney-Phillips vertical staggering: either co-located with the dry density or vertically staggered from it. The new scheme is designed to conserve the tracer density and ensure consistency by maintaining a constant mixing ratio. Additionally, a mass-conserving limiter is developed to ensure non-negativity in the co-located configuration. Tests with terminator toy chemistry and a moist rising bubble show the use of the new transport scheme with physics terms and its ability to accurately model mass conservation of moisture species in a dynamical core setup.
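
The distinction between the advective and conservative forms can be written out as follows. This is a generic textbook-style sketch, not the paper's own formulation: the symbols $m$ (mixing ratio), $\rho_d$ (dry density), $\rho_t$ (tracer density), and $\mathbf{u}$ (velocity) are notation assumed here, and source terms from physics are omitted.

```latex
% Advective form for the mixing ratio m:
\frac{\partial m}{\partial t} + \mathbf{u} \cdot \nabla m = 0

% Continuity equation for the dry density \rho_d:
\frac{\partial \rho_d}{\partial t} + \nabla \cdot (\rho_d \mathbf{u}) = 0

% Conservative form for the tracer density \rho_t = m \rho_d:
\frac{\partial \rho_t}{\partial t} + \nabla \cdot (\rho_t \mathbf{u}) = 0

% Integrating the conservative form over a domain \Omega with no boundary flux:
\frac{d}{dt} \int_{\Omega} \rho_t \, dV = 0

% Consistency: for a constant mixing ratio m = m_0, substituting \rho_t = m_0 \rho_d gives
m_0 \left[ \frac{\partial \rho_d}{\partial t} + \nabla \cdot (\rho_d \mathbf{u}) \right] = 0
```

The last line shows why a constant mixing ratio should be preserved: the tracer density equation then reduces to a multiple of the dry density equation, so a discretization that treats both equations in the same way can keep $m$ constant.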

[528]  arXiv:2603.19076 [pdf, ps, other]
Title: DROID-SLAM in the Wild
Comments: CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

We present a robust, real-time RGB SLAM system that handles dynamic environments by leveraging differentiable Uncertainty-aware Bundle Adjustment. Traditional SLAM methods typically assume static scenes, leading to tracking failures in the presence of motion. Recent dynamic SLAM approaches attempt to address this challenge using predefined dynamic priors or uncertainty-aware mapping, but they remain limited when confronted with unknown dynamic objects or highly cluttered scenes where geometric mapping becomes unreliable. In contrast, our method estimates per-pixel uncertainty by exploiting multi-view visual feature inconsistency, enabling robust tracking and reconstruction even in real-world environments. The proposed system achieves state-of-the-art camera poses and scene geometry in cluttered dynamic scenarios while running in real time at around 10 FPS. Code and datasets are available at https://github.com/MoyangLi00/DROID-W.git.

[529]  arXiv:2603.19077 [pdf, ps, other]
Title: Multi-Modal Building Change Detection for Large-Scale Small Changes: Benchmark and Baseline
Comments: 15 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Change detection in optical remote sensing imagery is susceptible to illumination fluctuations, seasonal changes, and variations in surface land-cover materials. Relying solely on RGB imagery often produces pseudo-changes and leads to semantic ambiguity in features. Incorporating near-infrared (NIR) information provides heterogeneous physical cues that are complementary to visible light, thereby enhancing the discriminability of building materials and tiny structures while improving detection accuracy. However, existing multi-modal datasets generally lack high-resolution and accurately registered bi-temporal imagery, and current methods often fail to fully exploit the inherent heterogeneity between these modalities. To address these issues, we introduce the Large-scale Small-change Multi-modal Dataset (LSMD), a bi-temporal RGB-NIR building change detection benchmark dataset targeting small changes in realistic scenarios, providing a rigorous testing platform for evaluating multi-modal change detection methods in complex environments. Based on LSMD, we further propose the Multi-modal Spectral Complementarity Network (MSCNet) to achieve effective cross-modal feature fusion. MSCNet comprises three key components: the Neighborhood Context Enhancement Module (NCEM) to strengthen local spatial details, the Cross-modal Alignment and Interaction Module (CAIM) to enable deep interaction between RGB and NIR features, and the Saliency-aware Multisource Refinement Module (SMRM) to progressively refine fused features. Extensive experiments demonstrate that MSCNet effectively leverages multi-modal information and consistently outperforms existing methods under multiple input configurations, validating its efficacy for fine-grained building change detection. The source code will be made publicly available at: https://github.com/AeroVILab-AHU/LSMD

[530]  arXiv:2603.19078 [pdf, ps, other]
Title: Articulated-Body Dynamics Network: Dynamics-Grounded Prior for Robot Learning
Comments: Arxiv_r1
Subjects: Robotics (cs.RO)

Recent work in reinforcement learning has shown that incorporating structural priors for articulated robots, such as link connectivity, into policy networks improves learning efficiency. However, dynamics properties, despite their fundamental role in determining how forces and motion propagate through the body, remain largely underexplored as an inductive bias for policy learning. To address this gap, we present the Articulated-Body Dynamics Network (ABD-Net), a novel graph neural network architecture grounded in the computational structure of forward dynamics. Specifically, we adapt the inertia propagation mechanism from the Articulated Body Algorithm, systematically aggregating inertial quantities from child to parent links in a tree-structured manner, while replacing physical quantities with learnable parameters. Embedding ABD-Net into the policy actor enables dynamics-informed representations that capture how actions propagate through the body, leading to efficient and robust policy learning. Through experiments with simulated humanoid, quadruped, and hopper robots, our approach demonstrates increased sample efficiency and generalization to dynamics shifts compared to transformer-based and GNN baselines. We further validate the learned policy on real Unitree G1 and Go2 robots, state-of-the-art humanoid and quadruped platforms, generating dynamic, versatile, and robust locomotion behaviors through sim-to-real transfer with real-time inference.

[531]  arXiv:2603.19080 [pdf, ps, other]
Title: Reduced order computation of 2D elastodynamic Green's functions in layered soil using a low-rank tensor approximation
Comments: Preprint submitted to Computers & Structures
Subjects: Numerical Analysis (math.NA)

The evaluation of elastodynamic Green's functions across numerous source-receiver locations, frequencies, and material properties, particularly in the context of parametric studies or boundary element computations, is computationally demanding and memory intensive. This paper presents a reduced order modeling strategy based on the Greedy Tucker Approximation (GTA), which incrementally constructs a low-rank representation of the Green's tensor through rank-one enrichments obtained via a Proper Generalized Decomposition (PGD)-type alternating least squares procedure. A Petrov-Galerkin formulation is employed to improve convergence and approximation accuracy. The resulting multi-dimensional tensor, expressed in terms of one-dimensional basis functions and a compact core, achieves substantial reductions in memory requirements. The methodology is demonstrated for two cases: a soil layer on rigid bedrock and a layered halfspace. Different separable dimensions are considered to capture various combinations of source and receiver configurations, frequencies, and material parameters. Results are validated against those obtained with the direct stiffness method and computation times and memory requirements are compared.

[532]  arXiv:2603.19082 [pdf, ps, other]
Title: A Dataset and Resources for Identifying Patient Health Literacy Information from Clinical Notes
Subjects: Computation and Language (cs.CL)

Health literacy is a critical determinant of patient outcomes, yet current screening tools are not always feasible and differ considerably in the number of items, question format, and dimensions of health literacy they capture, making documentation in structured electronic health records difficult to achieve. Automated detection from unstructured clinical notes offers a promising alternative, as these notes often contain richer, more contextual health literacy information, but progress has been limited by the lack of annotated resources. We introduce HEALIX, the first publicly available annotated health literacy dataset derived from real clinical notes, curated through a combination of social worker note sampling, keyword-based filtering, and LLM-based active learning. HEALIX contains 589 notes across 9 note types, annotated with three health literacy labels: low, normal, and high. To demonstrate its utility, we benchmarked zero-shot and few-shot prompting strategies across four open source large language models (LLMs).

[533]  arXiv:2603.19084 [pdf, ps, other]
Title: On The Effectiveness of the UK NIS Regulations as a Mandatory Cybersecurity Reporting Regime
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Existing cybersecurity literature lacks a source of empirical, representative data as to the true nature of cyberattacks on Critical National Infrastructure. We have obtained UK-wide data on incidents reported under the Network and Information Systems (NIS) Regulations in 2024 causing "a significant impact on the continuity" of essential services and comparator data from intelligence agencies. We find that 29% of NIS reports already concern cybersecurity incidents. As the UK Government seeks to extend cybersecurity reporting, we find the NIS Regulations are limited in their effectiveness; whilst our requests revealed 30 cybersecurity incidents reported under the NIS Regulations, there were 89 incidents classified as "highly significant and significant" captured by the National Cyber Security Centre in the 2024 reporting year. Whereas 36% of attacks reported by the Cybersecurity and Infrastructure Security Agency concerned espionage, NIS data show that 100% of NIS-reportable cyberattacks concerning healthcare systems in England in 2024 were ransomware.

[534]  arXiv:2603.19086 [pdf, ps, other]
Title: In the Margins: An Empirical Study of Ethereum Inscriptions
Comments: 13 pages, 10 tables, 5 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Ethereum Inscriptions (Ethscriptions) repurpose Ethereum calldata into a persistent inscription channel by embedding data: URI payloads. These transactions typically target externally owned accounts, allowing the payload to bypass EVM execution while remaining permanently replicated across full nodes. Although calldata was originally designed for compact smart-contract parameters, this repurposing enables structured data embedding with long-term storage consequences.
We present the first large-scale empirical study of Ethscriptions, treating them as a distinct calldata-resident workload rather than merely a subset of general calldata usage. Our analysis focuses on the Ethscription operational subset, which consists of payloads that decode to JSON and conform to a token-operation grammar (e.g., p, op, tick, amt).
From 6.27 million Ethscription candidates (U1), we extract 4.75 million Ethscription operations (U2, 75.8% of U1). This result shows that structured token-like activity dominates the ecosystem.
Our measurements further reveal (i) a complete workload lifecycle compressed into nine months (bootstrap, expansion, saturation), (ii) proliferation of 30+ competing protocols without convergence toward a dominant standard, (iii) a lifecycle funnel exhibiting $201\times$ deploy-to-mint amplification and a 57.6:1 mint-to-transfer collapse indicative of speculative minting, (iv) extreme participation inequality (Gini 0.86), and (v) a measurable permanent data footprint imposed on the Ethereum network.
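
For readers unfamiliar with the Gini coefficient used in item (iv), the following minimal sketch computes it from a list of non-negative per-participant counts using the standard sorted-rank formula. The function name `gini` and the sample inputs are hypothetical; the abstract does not say which quantity the authors measured per participant or how they computed the statistic.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality,
    values near 1 mean activity is concentrated in very few participants."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention for empty or all-zero input
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i = 1..n over the ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical examples: four participants with equal activity,
# then four participants where one accounts for all activity.
equal_share = gini([1, 1, 1, 1])
single_dominant = gini([0, 0, 0, 10])
```

With four participants the maximum attainable value is (n - 1) / n = 0.75, so a reported value of 0.86 is only possible with a larger population in which a small fraction of participants accounts for most of the activity.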

[535]  arXiv:2603.19087 [pdf, ps, other]
Title: Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Are large language models (LLMs) creative in the same way humans are, and can the same interventions increase creativity in both? We evaluate a promising but largely untested intervention for creativity: forcing creators to draw an analogy from a random, remote source domain ("cross-domain mapping"). Human participants and LLMs generated novel features for ten daily products (e.g., backpack, TV) under two prompts: (i) cross-domain mapping, which required translating a property from a randomly assigned source (e.g., octopus, cactus, GPS), and (ii) user-need, which required proposing innovations targeting unmet user needs. We show that humans reliably benefit from randomly assigned cross-domain mappings, while LLMs, on average, generate more original ideas than humans and do not show a statistically significant effect of cross-domain mappings. However, in both systems, the impact of cross-domain mapping increases when the inspiration source becomes more semantically distant from the target. Our results highlight both the role of remote association in creative ideation and systematic differences in how humans and LLMs respond to the same intervention for creativity.

[536]  arXiv:2603.19091 [pdf, ps, other]
Title: Position: Spectral GNNs Are Neither Spectral Nor Superior for Node Classification
Subjects: Machine Learning (cs.LG)

Spectral Graph Neural Networks (Spectral GNNs) for node classification promise frequency-domain filtering on graphs, yet rest on flawed foundations. Recent work shows that graph Laplacian eigenvectors do not in general have the key properties of a true Fourier basis, but leaves the empirical success of Spectral GNNs unexplained. We identify two theoretical glitches: (1) commonly used "graph Fourier bases" are not classical Fourier bases for graph signals; (2) (n-1)-degree polynomials (n = number of nodes) can exactly interpolate any spectral response via a Vandermonde system, so the usual "polynomial approximation" narrative is not theoretically justified. The effectiveness of GCN is commonly attributed to spectral low-pass filtering, yet we prove that low- and high-pass behaviors arise solely from message-passing dynamics rather than Graph Fourier Transform-based spectral formulations. We then analyze two representative directed spectral models, MagNet and HoloNet. Their reported effectiveness is not spectral: it arises from implementation issues that reduce them to powerful MPNNs. When implemented consistently with the claimed spectral algorithms, performance becomes weak. This position paper argues that: for node classification, Spectral GNNs neither meaningfully capture the graph spectrum nor reliably improve performance; competitive results are better explained by their equivalence to MPNNs, sometimes aided by implementations inconsistent with their intended design.
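
The interpolation claim in point (2) can be checked numerically. The sketch below assumes n distinct eigenvalues, in which case the Vandermonde system is invertible and a degree-(n-1) polynomial reproduces any prescribed response at those eigenvalues. The function name `interpolate_spectral_response` and the sample eigenvalues and target response are hypothetical values chosen for illustration, not data from the paper.

```python
import numpy as np


def interpolate_spectral_response(eigenvalues, response):
    """Coefficients c_0..c_{n-1} of the polynomial p with p(lambda_i) = response_i.

    Assumes the n eigenvalues are distinct, so the Vandermonde matrix is invertible.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    # V[i, k] = lam[i] ** k, so V @ c evaluates the polynomial at every eigenvalue.
    V = np.vander(lam, N=len(lam), increasing=True)
    return np.linalg.solve(V, np.asarray(response, dtype=float))


# Hypothetical spectrum of a 5-node graph and an arbitrary target filter response.
lam = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
target = np.array([1.0, -2.0, 0.3, 4.0, 0.0])

coeffs = interpolate_spectral_response(lam, target)
fitted = np.vander(lam, N=len(lam), increasing=True) @ coeffs
```

Here `fitted` matches `target` up to floating-point error, which illustrates the abstract's point: at degree n-1 the polynomial filter is an exact interpolant at the eigenvalues rather than an approximation.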

[537]  arXiv:2603.19092 [pdf, ps, other]
Title: SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However, it remains unclear which visual evidence drives these judgments. We study whether multimodal safety behavior in VLMs can be steered by simple semantic cues. We introduce a semantic steering framework that applies controlled textual, visual, and cognitive interventions without changing the underlying scene content. To evaluate these effects, we propose SAVeS, a benchmark for situational safety under semantic cues, together with an evaluation protocol that separates behavioral refusal, grounded safety reasoning, and false refusals. Experiments across multiple VLMs and an additional state-of-the-art benchmark show that safety decisions are highly sensitive to semantic cues, indicating reliance on learned visual-linguistic associations rather than grounded visual understanding. We further demonstrate that automated steering pipelines can exploit these mechanisms, highlighting a potential vulnerability in multimodal safety systems.

[538]  arXiv:2603.19093 [pdf, ps, other]
Title: Follow the Rules (or Not): Community Norms and AI-Generated Support in Online Health Communities
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)

Generative AI (GenAI) is increasingly being integrated into the online ecosystem, including online health communities (OHCs), where people with diverse health conditions exchange social support. For example, in OHCs, support providers are beginning to share content generated, directly or indirectly, by popular GenAI-based tools. OHCs are governed by norms that define appropriate behavior when providing support. Ways in which AI-generated support interacts with these norms remain underexplored. Inappropriate conformance or outright violation can erode seekers' trust, distort decision-making, and threaten community sustenance. In this work, we examine whether (and how) AI-generated support conforms to norms, using popular opioid-use recovery subreddits as our testbed. First, we provide an inventory of norms regulating text-based support provision in OHCs. Next, using human-validated LLM judges, we assess the prevalence of AI's conformity to these norms. Finally, through an expert review, we identify risks to seekers (and OHCs) resulting from norm (non)conformity. Our analysis revealed that, while AI-generated support conforms to norms, such conformity may be inappropriate or insufficient, for example, by over- or under-validating seekers in distress. Moreover, we observed instances of outright norm violation. This work provides insights that can help moderators and OHC designers adapt existing and develop new norms to regulate AI integration, protecting both seekers and communities they rely on.

[539]  arXiv:2603.19096 [pdf, ps, other]
Title: GLENN: Neural network-enhanced computation of Ginzburg-Landau energy minimizers
Subjects: Numerical Analysis (math.NA)

In this work, we propose a neural network-enhanced finite element strategy to compute the minimizer of the Ginzburg--Landau energy based on an unsupervised deep Ritz-type strategy. We treat the parameter $\kappa$ as a variable input parameter to obtain possible minimizers for a large range of $\kappa$-values. This allows for two possible strategies: 1) The neural network may be extensively trained to work as a stand-alone solver. 2) Neural network results are used as starting values for a subsequent classical iterative minimization procedure. The latter strategy in particular circumvents the lack of reliability of the neural network-based approach. Numerical examples are presented that show the potential of the proposed strategy.

[540]  arXiv:2603.19097 [pdf, ps, other]
Title: DaPT: A Dual-Path Framework for Multilingual Multi-hop Question Answering
Comments: Accepted by ICASSP 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Retrieval-augmented generation (RAG) systems have made significant progress in solving complex multi-hop question answering (QA) tasks in the English scenario. However, RAG systems inevitably face the application scenario of retrieving across multilingual corpora and queries, leaving several open challenges. The first one involves the absence of benchmarks that assess RAG systems' capabilities under the multilingual multi-hop (MM-hop) QA setting. The second centers on the overreliance on LLMs' strong semantic understanding in English, which diminishes effectiveness in multilingual scenarios. To address these challenges, we first construct multilingual multi-hop QA benchmarks by translating English-only benchmarks into five languages, and then we propose DaPT, a novel multilingual RAG framework. DaPT generates sub-question graphs in parallel for both the source-language query and its English translation counterpart, then merges them before employing a bilingual retrieval-and-answer strategy to sequentially solve sub-questions. Our experimental results demonstrate that advanced RAG systems suffer from a significant performance imbalance in multilingual scenarios. Furthermore, our proposed method consistently yields more accurate and concise answers compared to the baselines, significantly enhancing RAG performance on this task. For instance, on the most challenging MuSiQue benchmark, DaPT achieves a relative improvement of 18.3\% in average EM score over the strongest baseline.

[541]  arXiv:2603.19098 [pdf, ps, other]
Title: TAU-R1: Visual Language Model for Traffic Anomaly Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Traffic Anomaly Understanding (TAU) is important for traffic safety in Intelligent Transportation Systems. Recent vision-language models (VLMs) have shown strong capabilities in video understanding. However, progress on TAU remains limited due to the lack of benchmarks and task-specific methodologies. To address this limitation, we introduce Roundabout-TAU, a dataset constructed from real-world roundabout videos collected in collaboration with the City of Carmel, Indiana. The dataset contains 342 clips and is annotated with more than 2,000 question-answer pairs covering multiple aspects of traffic anomaly understanding. Building on this benchmark, we propose TAU-R1, a two-layer vision-language framework for TAU. The first layer is a lightweight anomaly classifier that performs coarse anomaly categorisation, while the second layer is a larger anomaly reasoner that generates detailed event summaries. To improve task-specific reasoning, we introduce a two-stage training strategy consisting of decomposed-QA-enhanced supervised fine-tuning followed by TAU-GRPO, a GRPO-based post-training method with TAU-specific reward functions. Experimental results show that TAU-R1 achieves strong performance on both anomaly classification and reasoning tasks while maintaining deployment efficiency. The dataset and code are available at: https://github.com/siri-rouser/TAU-R1

[542]  arXiv:2603.19099 [pdf, ps, other]
Title: Why Synchronized Time is a Fiction: Daylight Saving Time, Leap Seconds, and the Guillotine Sharpened for Nothing
Authors: Paul Borrill
Comments: 18 pages, 24 references
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Civilization maintains an elaborate infrastructure devoted to the maintenance of synchronized time. Governments mandate daylight saving time. Standards bodies insert leap seconds into Coordinated Universal Time. Engineers debate leap milliseconds and leap nanoseconds. The Global Positioning System applies relativistic corrections at the nanosecond level. All of these adjustments attempt to preserve an assumption: that a single global time exists and that clocks can be made to agree upon it.
This paper argues that this assumption constitutes a category mistake in the sense of Ryle (1949). We show that special and general relativity prohibit absolute simultaneity, that the one-way speed of light is conventionally defined rather than measured, and that recent experiments on indefinite causal order demonstrate nature admits correlations with no well-defined temporal sequence. We trace the consequences of this category mistake through distributed computing, where it manifests as the Forward-In-Time-Only (FITO) assumption that underlies Lamport's logical clocks (1978), the impossibility results of Fischer-Lynch-Paterson (1985), and the CAP theorem (2000). From this perspective, daylight saving time and leap seconds are not corrections to time but corrections to conventions -- they sharpen the guillotine of synchronization in preparation for executing something that does not exist.

[543]  arXiv:2603.19100 [pdf, ps, other]
Title: LuMamba: Latent Unified Mamba for Electrode Topology-Invariant and Efficient EEG Modeling
Comments: 5 pages, 2 figures, 4 tables
Subjects: Artificial Intelligence (cs.AI)

Electroencephalography (EEG) enables non-invasive monitoring of brain activity across clinical and neurotechnology applications, yet building foundation models for EEG remains challenging due to \emph{differing electrode topologies} and \emph{computational scalability}, as Transformer architectures incur quadratic sequence complexity. As a joint solution, we propose \textbf{LuMamba} (\textbf{L}atent \textbf{U}nified \textbf{Mamba}), a self-supervised framework combining topology-invariant encodings with linear-complexity state-space modeling, using LUNA's learned-query cross-attention mechanism for channel unification~\cite{luna}, and FEMBA's bidirectional Mamba blocks for efficient temporal modeling~\cite{femba}. Within this architecture, we provide the first systematic investigation of the Latent-Euclidean Joint-Embedding Predictive Architecture (LeJEPA) for biosignal learning. Pre-trained on over 21,000 hours of unlabeled EEG from the TUEG corpus, LuMamba is evaluated on five downstream tasks spanning abnormality detection, artifact recognition, and mental condition classification across electrode configurations ranging from 16 to 26 channels. In the pre-training objective, masked reconstruction alone yields structured but less generalizable representations, while LeJEPA alone produces diffuse embeddings; combining both objectives achieves the most robust performance. With only 4.6M parameters, LuMamba attains 80.99\% balanced accuracy on TUAB and achieves state-of-the-art performance on Alzheimer's detection (0.97 AUPR), while requiring \textbf{377$\times$ fewer FLOPs} than state-of-the-art models at equivalent sequence lengths and scaling to \textbf{12$\times$ longer sequences} before reaching typical GPU memory limits. Code is available at https://github.com/pulp-bio/biofoundation

[544]  arXiv:2603.19101 [pdf, ps, other]
Title: FedTrident: Resilient Road Condition Classification Against Poisoning Attacks in Federated Learning
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Federated Learning (FL) has emerged as a transformative paradigm for Intelligent Transportation Systems (ITS), notably camera-based Road Condition Classification (RCC). However, by enabling collaboration, FL-based RCC exposes the system to adversarial participants launching Targeted Label-Flipping Attacks (TLFAs). Malicious clients (vehicles) can relabel their local training data (e.g., from an actual uneven road to a wrong smooth road), consequently compromising global model predictions and jeopardizing transportation safety. Existing countermeasures against such poisoning attacks fail to maintain resilient model performance near the necessary attack-free levels in various attack scenarios due to: 1) not tailoring poisoned local model detection to TLFAs, 2) not excluding malicious vehicular clients based on historical behavior, and 3) not remedying the already-corrupted global model after exclusion. To close this research gap, we propose FedTrident, which introduces: 1) neuron-wise analysis for local model misbehavior detection (notably including attack goal identification, critical feature extraction, and GMM-based model clustering and filtering); 2) adaptive client rating for client exclusion according to the local model detection results in each FL round; and 3) machine unlearning for corrupted global model remediation once malicious clients are excluded during FL. Extensive evaluation across diverse FL-RCC models, tasks, and configurations demonstrates that FedTrident can effectively thwart TLFAs, achieving performance comparable to that in attack-free scenarios and outperforming eight baseline countermeasures by 9.49% and 4.47% for the two most critical metrics. Moreover, FedTrident is resilient to various malicious client rates, data heterogeneity levels, complicated multi-task settings, and dynamic attacks.

[545]  arXiv:2603.19108 [pdf, ps, other]
Title: Numerical Considerations for the Construction of Karhunen-Loève Expansions
Subjects: Numerical Analysis (math.NA); Machine Learning (stat.ML)

This report examines numerical aspects of constructing Karhunen-Lo\`{e}ve expansions (KLEs) for second-order stochastic processes. The KLE relies on the spectral decomposition of the covariance operator via the Fredholm integral equation of the second kind, which is then discretized on a computational grid, leading to an eigendecomposition task. We derive the algebraic equivalence between this Fredholm-based eigensolution and the singular value decomposition of the weight-scaled sample matrix, yielding consistent solutions for both model-based and data-driven KLE construction. Analytical eigensolutions for exponential and squared-exponential covariance kernels serve as reference benchmarks to assess numerical consistency and accuracy in 1D settings. The convergence of SVD-based eigenvalue estimates and of the empirical distributions of the KL coefficients to their theoretical $\mathcal{N}(0,1)$ target is characterized as a function of sample count. Higher-dimensional configurations include a two-dimensional irregular domain discretized by unstructured triangular meshes with two refinement levels, and a three-dimensional toroidal domain whose non-simply-connected topology motivates a comparison between Euclidean and shortest interior path distances between the grid points. The numerical results highlight the interplay between the discretization strategy, quadrature rule, and sample count, and their impact on the KLE results.
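The equivalence mentioned in the abstract above can be illustrated with a minimal NumPy sketch. It is not the report's code: the 1D grid, the trapezoidal quadrature, the kernel parameters, and the names `trapezoid_weights`, `model_based_kle`, and `data_driven_eigenvalues` are all assumptions made for illustration. The model-based route solves the symmetrized weighted eigenproblem of the discretized Fredholm equation; the data-driven route takes the SVD of centered samples scaled by the square roots of the quadrature weights.

```python
import numpy as np

def trapezoid_weights(grid):
    """Trapezoidal quadrature weights on a 1D grid (assumed quadrature rule)."""
    w = np.empty_like(grid)
    w[1:-1] = 0.5 * (grid[2:] - grid[:-2])
    w[0] = 0.5 * (grid[1] - grid[0])
    w[-1] = 0.5 * (grid[-1] - grid[-2])
    return w

def exp_kernel(x, y, sigma2=1.0, corr_len=0.5):
    """Exponential covariance kernel; parameter values are illustrative."""
    return sigma2 * np.exp(-np.abs(x - y) / corr_len)

def model_based_kle(grid, kernel, n_modes):
    """Discretized Fredholm eigenproblem: (W^1/2 C W^1/2) v = lambda v."""
    w = trapezoid_weights(grid)
    sqrt_w = np.sqrt(w)
    C = kernel(grid[:, None], grid[None, :])
    A = sqrt_w[:, None] * C * sqrt_w[None, :]
    eigvals, eigvecs = np.linalg.eigh(A)          # ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]   # keep the largest modes
    lam = eigvals[order]
    phi = eigvecs[:, order] / sqrt_w[:, None]     # eigenfunctions on the grid
    return lam, phi, w

def data_driven_eigenvalues(samples, w, n_modes):
    """Eigenvalue estimates from the SVD of the weight-scaled sample matrix."""
    centered = samples - samples.mean(axis=0)
    scaled = centered * np.sqrt(w) / np.sqrt(samples.shape[0] - 1)
    singular_values = np.linalg.svd(scaled, compute_uv=False)
    return singular_values[:n_modes] ** 2

grid = np.linspace(0.0, 1.0, 201)
lam, phi, w = model_based_kle(grid, exp_kernel, n_modes=10)

# Draw Gaussian samples with the same covariance to mimic the data-driven case.
rng = np.random.default_rng(0)
C = exp_kernel(grid[:, None], grid[None, :])
chol = np.linalg.cholesky(C + 1e-10 * np.eye(grid.size))
samples = rng.standard_normal((5000, grid.size)) @ chol.T
lam_svd = data_driven_eigenvalues(samples, w, n_modes=10)
```

Under these assumptions, the squared singular values in `lam_svd` approach the leading entries of `lam` as the sample count grows, which is the kind of convergence behaviour the abstract describes.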

[546]  arXiv:2603.19113 [pdf, ps, other]
Title: A stable and fast method for solving multibody scattering problems via the method of fundamental solutions
Comments: 31 pages, 9 figures
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph); Computational Physics (physics.comp-ph)

The paper describes a numerical method for solving acoustic multibody scattering problems in two and three dimensions. The idea is to compute a highly accurate approximation to the scattering operator for each body through a local computation, and then use these scattering matrices to form a global linear system. The resulting coefficient matrix is relatively well-conditioned, even for problems involving a very large number of scatterers. The linear system is amenable to iterative solvers, and can readily be accelerated via fast algorithms for the matrix-vector multiplication such as the fast multipole method. The key point of the work is that the local scattering matrices can be constructed using potentially ill-conditioned techniques such as the method of fundamental solutions (MFS), while still maintaining scalability and numerical stability of the global solver. The resulting algorithm is simple, as the MFS is far simpler to implement than alternative techniques based on discretizing boundary integral equations using Nystr\"om or Galerkin.

[547]  arXiv:2603.19116 [pdf, ps, other]
Title: Assessment of Analog Time Multiplexing in SDM Digital to Analog Converters
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)

Analog multiplexing for sigma-delta modulated Digital to Analog Converters has recently been proposed as a means of achieving robustness. This preprint analyses the scheme via simulations. The main limitation introduced by the proposed architecture comes from mismatch in the DACs' gains, which can drastically degrade performance. A new dynamic element matching technique is proposed here to overcome this problem.

[548]  arXiv:2603.19118 [pdf, ps, other]
Title: How Uncertainty Estimation Scales with Sampling in Reasoning Models
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box approach using verbalized confidence and self-consistency. Across three reasoning models and 17 tasks spanning mathematics, STEM, and humanities, we characterize how these signals scale.
Both self-consistency and verbalized confidence scale in reasoning models, but self-consistency exhibits lower initial discrimination and lags behind verbalized confidence under moderate sampling. Most uncertainty gains, however, arise from signal combination: with just two samples, a hybrid estimator improves AUROC by up to $+12$ on average and already outperforms either signal alone even when scaled to much larger budgets, after which returns diminish. These effects are domain-dependent: in mathematics, the native domain of RLVR-style post-training, reasoning models achieve higher uncertainty quality and exhibit both stronger complementarity and faster scaling than in STEM or humanities.
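The two black-box signals named above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's estimator: the equal-weight linear blend, the `weight` parameter, and the function names are assumptions, since the abstract does not specify how the hybrid is formed.

```python
from collections import Counter

def self_consistency(answers):
    """Return the majority answer and the fraction of samples agreeing with it."""
    counts = Counter(answers)
    majority, votes = counts.most_common(1)[0]
    return majority, votes / len(answers)

def hybrid_confidence(answers, verbalized, weight=0.5):
    """Blend sample agreement with mean verbalized confidence (assumed rule).

    answers:    final answers from parallel samples
    verbalized: the model's stated confidence in [0, 1] for each sample
    weight:     hypothetical mixing weight on the agreement signal
    """
    majority, agreement = self_consistency(answers)
    # Average the stated confidence over samples that chose the majority answer.
    supporting = [c for a, c in zip(answers, verbalized) if a == majority]
    mean_conf = sum(supporting) / len(supporting)
    return majority, weight * agreement + (1.0 - weight) * mean_conf

# Four parallel samples for one question (made-up values).
answer, score = hybrid_confidence(["42", "42", "17", "42"], [0.9, 0.8, 0.6, 0.7])
```

With only two samples, agreement can take just the values 0.5 or 1.0, so in a blend like this the verbalized term carries most of the resolution at small budgets.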

[549]  arXiv:2603.19119 [pdf, ps, other]
Title: Exact-Time Safety Recovery using Time-Varying Control Barrier Functions with Optimal Barrier Tracking
Subjects: Systems and Control (eess.SY)

This paper is motivated by controllers developed for autonomous vehicles which occasionally result in conditions where safety is no longer guaranteed. We develop an exact-time safety recovery framework for any control-affine nonlinear system when its state is outside a safe region using time-varying Control Barrier Functions (CBFs) with optimal barrier tracking. Unlike conventional formulations that provide only conservative upper bounds on recovery time, the proposed approach guarantees recovery to the safe set at a prescribed time. The key mechanism is an active barrier tracking condition that forces the barrier function to follow exactly a designer-specified recovery trajectory. This transforms safety recovery into a trajectory design problem. The recovery trajectory is parameterized and optimized to achieve optimal performance while preserving feasibility under input constraints, avoiding the aggressive corrective actions typically induced by conventional finite-time formulations. The safety recovery framework is applied to the roundabout traffic coordination problem for Connected and Automated Vehicles (CAVs), where any initially violated safe merging constraint is replaced by an exact-time recovery barrier constraint to ensure safety guarantee restoration before CAV conflict points are reached. Simulation results demonstrate improved feasibility and performance.

[550]  arXiv:2603.19121 [pdf, ps, other]
Title: CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization
Comments: Accepted to CVPR 2026. This version integrates the main paper and supplementary material
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The creation of high-fidelity, customizable 3D indoor scene textures remains a significant challenge. While text-driven methods offer flexibility, they lack the precision for fine-grained, instance-level control, and often produce textures with insufficient quality, artifacts, and baked-in shading. To overcome these limitations, we introduce CustomTex, a novel framework for instance-level, high-fidelity scene texturing driven by reference images. CustomTex takes an untextured 3D scene and a set of reference images specifying the desired appearance for each object instance, and generates a unified, high-resolution texture map. The core of our method is a dual-distillation approach that separates semantic control from pixel-level enhancement. We employ semantic-level distillation, equipped with an instance cross-attention, to ensure semantic plausibility and ``reference-instance'' alignment, and pixel-level distillation to enforce high visual fidelity. Both are unified within a Variational Score Distillation (VSD) optimization framework. Experiments demonstrate that CustomTex achieves precise instance-level consistency with reference images and produces textures with superior sharpness, reduced artifacts, and minimal baked-in shading compared to state-of-the-art methods. Our work establishes a more direct and user-friendly path to high-quality, customizable 3D scene appearance editing.

[551]  arXiv:2603.19122 [pdf, ps, other]
Title: Revisiting Autoregressive Models for Generative Image Classification
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Class-conditional generative models have emerged as accurate and robust classifiers, with diffusion models demonstrating clear advantages over other visual generative paradigms, including autoregressive (AR) models. In this work, we revisit visual AR-based generative classifiers and identify an important limitation of prior approaches: their reliance on a fixed token order, which imposes a restrictive inductive bias for image understanding. We observe that single-order predictions rely more on partial discriminative cues, while averaging over multiple token orders provides a more comprehensive signal. Based on this insight, we leverage recent any-order AR models to estimate order-marginalized predictions, unlocking the high classification potential of AR models. Our approach consistently outperforms diffusion-based classifiers across diverse image classification benchmarks, while being up to 25x more efficient. Compared to state-of-the-art self-supervised discriminative models, our method delivers competitive classification performance - a notable achievement for generative classifiers.

[552]  arXiv:2603.19124 [pdf, ps, other]
Title: Tendon-Actuated Robots with a Tapered, Flexible Polymer Backbone: Design, Fabrication, and Modeling
Subjects: Robotics (cs.RO)

This paper presents the design, modeling, and fabrication of 3D-printed, tendon-actuated continuum robots featuring a flexible, tapered backbone constructed from thermoplastic polyurethane (TPU). Our scalable design incorporates an integrated electronics base housing that enables direct tendon tension control and sensing via actuators and compression load cells. Unlike many continuum robots that are single-purpose and costly, the proposed design prioritizes customizability, rapid assembly, and low cost while enabling high curvature and enhanced distal compliance through geometric tapering, thereby supporting a broad range of compliant robotic inspection and manipulation tasks. We develop a generalized forward kinetostatic model of the tapered backbone based on Cosserat rod theory using a Newtonian approach, extending existing tendon-actuated Cosserat rod formulations to explicitly account for spatially varying backbone cross-sectional geometry. The model captures the graded stiffness profile induced by the tapering and enables systematic exploration of the configuration space as a function of the geometric design parameters. Specifically, we analyze how the backbone taper angle influences the robot's configuration space and manipulability. The model is validated against motion capture data, achieving centimeter-level shape prediction accuracy after calibrating Young's modulus via a line search that minimizes modeling error. We further demonstrate teleoperated grasping using an endoscopic gripper routed along the continuum robot, mounted on a 6-DoF robotic arm. Parameterized iLogic/CAD scripts are provided for rapid geometry generation and scaling. The presented framework establishes a simple, rapid, and reproducible pathway from parametric design to controlled tendon actuation for tapered, tendon-driven continuum robots manufactured using fused deposition modeling 3D printers.

[553]  arXiv:2603.19127 [pdf, ps, other]
Title: On Optimizing Multimodal Jailbreaks for Spoken Language Models
Comments: Under Review at INTERSPEECH 2026
Subjects: Machine Learning (cs.LG)

As Spoken Language Models (SLMs) integrate speech and text modalities, they inherit the safety vulnerabilities of their LLM backbone and an expanded attack surface. SLMs have been previously shown to be susceptible to jailbreaking, where adversarial prompts induce harmful responses. Yet existing attacks largely remain unimodal, optimizing either text or audio in isolation. We explore gradient-based multimodal jailbreaks by introducing JAMA (Joint Audio-text Multimodal Attack), a joint multimodal optimization framework combining Greedy Coordinate Gradient (GCG) for text and Projected Gradient Descent (PGD) for audio, to simultaneously perturb both modalities. Evaluations across four state-of-the-art SLMs and four audio types demonstrate that JAMA surpasses unimodal jailbreak rate by 1.5x to 10x. We analyze the operational dynamics of this joint attack and show that a sequential approximation method makes it 4x to 6x faster. Our findings suggest that unimodal safety is insufficient for robust SLMs. The code and data are available at https://repos.lsv.uni-saarland.de/akrishnan/multimodal-jailbreak-slm

[554]  arXiv:2603.19131 [pdf, ps, other]
Title: From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Vision-Language-Action (VLA) models have recently enabled embodied agents to perform increasingly complex tasks by jointly reasoning over visual, linguistic, and motor modalities. However, we find that the prevailing notion of ``efficiency'' in current VLA research, characterized by parameters, FLOPs, or token decoding throughput, does not reflect actual performance on robotic platforms. In real-world execution, efficiency is determined by system-level embodied behaviors such as task completion time, trajectory smoothness, cumulative joint rotation, and motion energy. Through controlled studies across model compression, token sparsification, and action sequence compression, we make several observations that challenge common assumptions. (1) Methods that reduce computation under conventional metrics often increase end-to-end execution cost or degrade motion quality, despite maintaining task success rates. (2) System-level embodied efficiency metrics reveal performance differences in the learned action policies that remain hidden under conventional evaluations. (3) Common adaptation methods such as in-context prompting or supervised fine-tuning show only mild and metric-specific improvements in embodied efficiency. While these methods can reduce targeted embodied-efficiency metrics such as jerk or action rate, the resulting gains may come with trade-offs in other metrics, such as longer completion time. Taken together, our results suggest that conventional inference efficiency metrics can overlook important aspects of embodied execution. Incorporating embodied efficiency provides a more complete view of policy behavior and practical performance, enabling fairer and more comprehensive comparisons of VLA models.

[555]  arXiv:2603.19132 [pdf, ps, other]
Title: Tutorial: Grid-Following Inverter for Electrical Power Grid
Subjects: Systems and Control (eess.SY)

The growing use of inverter-based resources in modern power systems has made grid-following inverters a central topic in power-system modeling, control, and simulation. Despite their widespread deployment, introductory material that explains grid-following inverter operation from first principles and connects control design to time-domain simulation remains limited. To address this need, this tutorial presents a circuit-theoretic introduction to the modeling and simulation of a grid-following inverter connected to an electrical power grid. We describe the inverter synchronization with the grid (PLL), power control, and current control structure and show how these elements can be represented within an electromagnetic transient (EMT) simulation framework using companion model-based formulations similar to those used in circuit simulators such as SPICE and Cadence. In this tutorial, we use the grid-following inverter as the primary example to illustrate how its governing equations, control loops, and network interface can be formulated and simulated from first principles. By the end of the document, readers should gain a clear introductory understanding of how to model and simulate a grid-following inverter in an EMT platform.

[556]  arXiv:2603.19133 [pdf, ps, other]
Title: A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference
Comments: 8 pages, 6 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Recent advancements and widespread adoption of Large Language Models (LLMs) in both industry and academia have catalyzed significant demand for LLM serving. However, traditional cloud services incur high costs, while on-device inference alone faces challenges due to limited resources. Edge-cloud collaboration emerges as a key research direction to combine the strengths of both paradigms, yet efficiently utilizing limited network bandwidth while fully leveraging and balancing the computational capabilities of edge devices and the cloud remains an open problem. To address these challenges, we propose the Pipelined Collaborative Speculative Decoding Framework (PicoSpec), a novel, general-purpose, and training-free speculative decoding framework for LLM edge-cloud collaborative inference. We design an asynchronous pipeline that concurrently executes a Small Language Model (SLM) on the edge device and an LLM in the cloud, resolving the mutual waiting problem inherent in vanilla speculative decoding within edge collaboration scenarios. Meanwhile, to mitigate the significant communication latency caused by transmitting vocabulary distributions, we introduce separate rejection sampling with sparse compression, which completes the rejection sampling with only a one-time cost of transmitting the compressed vocabulary. Experimental results demonstrate that our solution outperforms baseline and existing methods, achieving up to a 2.9x speedup.
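For context, the rejection-sampling step of vanilla speculative decoding, which the abstract above builds on, can be sketched as follows. This shows the standard accept/reject rule only, not PicoSpec's separate rejection sampling or its sparse compression. The toy three-token vocabulary and the names `verify_draft_token` and `draft_and_verify` are assumptions for illustration.

```python
import random

def verify_draft_token(token, p_target, q_draft, rng):
    """Standard speculative-sampling check for one drafted token.

    p_target, q_draft: probability lists over the vocabulary from the large
    (target) and small (draft) models at the same position.
    Returns (emitted_token, accepted_flag).
    """
    # Accept the drafted token with probability min(1, p/q).
    accept_prob = min(1.0, p_target[token] / q_draft[token])
    if rng.random() < accept_prob:
        return token, True
    # On rejection, resample from the residual distribution max(p - q, 0).
    # A rejection implies p[token] < q[token], so the residual has positive mass.
    residual = [max(p - q, 0.0) for p, q in zip(p_target, q_draft)]
    resampled = rng.choices(range(len(residual)), weights=residual)[0]
    return resampled, False

def draft_and_verify(p_target, q_draft, rng):
    """Draw one token from the draft model, then verify it against the target."""
    token = rng.choices(range(len(q_draft)), weights=q_draft)[0]
    return verify_draft_token(token, p_target, q_draft, rng)

rng = random.Random(0)
p = [0.5, 0.3, 0.2]   # toy target-model distribution
q = [0.2, 0.5, 0.3]   # toy draft-model distribution
emitted, accepted = draft_and_verify(p, q, rng)
```

The rule needs both distributions at the verifying side, which is why transmitting vocabulary distributions between edge and cloud becomes the communication cost that the abstract sets out to reduce.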

[557]  arXiv:2603.19134 [pdf, ps, other]
Title: Introducing M: A Modular, Modifiable Social Robot
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

We present M, an open-source, low-cost social robot platform designed to reduce platform friction that slows social robotics research by making robots easier to reproduce, modify, and deploy in real-world settings. M combines a modular mechanical design, multimodal sensing, and expressive yet mechanically simple actuation architecture with a ROS2-native software package that cleanly separates perception, expression control, and data management. The platform includes a simulation environment with interface equivalence to hardware to support rapid sim-to-real transfer of interaction behaviors. We demonstrate extensibility through additional sensing/actuation modules and provide example interaction templates for storytelling and two-way conversational coaching. Finally, we report real-world use in participatory design and week-long in-home deployments, showing how M can serve as a practical foundation for longitudinal, reproducible social robotics research.

[558]  arXiv:2603.19136 [pdf, ps, other]
Title: Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control
Comments: Submitted to IEEE Transactions on Computational Social Systems. 17 pages, 9 figures, 10 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistical Finance (q-fin.ST)

Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods. Existing approaches typically treat all market states uniformly or require manual regime labeling, which is expensive and quickly becomes stale as market dynamics evolve. This paper introduces an adaptive prediction framework that identifies deviations from normal market conditions and routes data through specialized prediction pathways. The architecture consists of three components: (1) an autoencoder trained on normal market conditions that identifies anomalous regimes through reconstruction error, (2) dual node transformer networks specialized for stable and event-driven market conditions respectively, and (3) a Soft Actor-Critic reinforcement learning controller that adaptively tunes the regime detection threshold and pathway blending weights based on prediction performance feedback. The reinforcement learning component enables the system to learn adaptive regime boundaries, defining anomalies as market states where standard prediction approaches fail. Experiments on 20 S&P 500 stocks spanning 1982 to 2025 demonstrate that the proposed framework achieves 0.68% MAPE for one-day predictions without the reinforcement controller and 0.59% MAPE with the full adaptive system, compared to 0.80% for the baseline integrated node transformer. Directional accuracy reaches 72% with the complete framework. The system maintains robust performance during high-volatility periods, with MAPE below 0.85% when baseline models exceed 1.5%. Ablation studies confirm that each component contributes meaningfully: autoencoder routing accounts for 36% relative MAPE degradation upon removal, followed by the SAC controller at 15% and the dual-path architecture at 7%.
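The routing idea described above can be sketched as a soft gate on reconstruction error. This is a hypothetical illustration, not the paper's architecture: the sigmoid gate, the `temperature` parameter, and the function names are assumptions, and the threshold is a fixed argument here, whereas the abstract describes it as tuned by the Soft Actor-Critic controller.

```python
import math

def reconstruction_error(inputs, reconstruction):
    """Mean squared error between a feature window and its autoencoder output."""
    return sum((x - r) ** 2 for x, r in zip(inputs, reconstruction)) / len(inputs)

def regime_gate(error, threshold, temperature=0.05):
    """Soft gate in (0, 1): near 0 for normal regimes, near 1 for anomalous ones."""
    z = (error - threshold) / temperature
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp within range
    return 1.0 / (1.0 + math.exp(-z))

def blend_prediction(stable_pred, event_pred, error, threshold):
    """Weight the two pathway outputs by the gate value (assumed blending rule)."""
    gate = regime_gate(error, threshold)
    return (1.0 - gate) * stable_pred + gate * event_pred

# Made-up numbers: a window the autoencoder reconstructs poorly.
err = reconstruction_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
price = blend_prediction(stable_pred=100.0, event_pred=110.0, error=err, threshold=0.2)
```

In a design like this, lowering the threshold routes more market states through the event-driven pathway, which is the knob a learned controller would adjust from prediction feedback.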

[559]  arXiv:2603.19137 [pdf, ps, other]
Title: GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Effective embodied exploration requires agents to accumulate and retain spatial knowledge over time. However, existing scene representations, such as discrete scene graphs or static view-based snapshots, lack \textit{post-hoc re-observability}. If an initial observation misses a target, the resulting memory omission is often irrecoverable. To bridge this gap, we propose \textbf{GSMem}, a zero-shot embodied exploration and reasoning framework built upon 3D Gaussian Splatting (3DGS). By explicitly parameterizing continuous geometry and dense appearance, 3DGS serves as a persistent spatial memory that endows the agent with \textit{Spatial Recollection}: the ability to render photorealistic novel views from optimal, previously unoccupied viewpoints. To operationalize this, GSMem employs a retrieval mechanism that simultaneously leverages parallel object-level scene graphs and semantic-level language fields. This complementary design robustly localizes target regions, enabling the agent to ``hallucinate'' optimal views for high-fidelity Vision-Language Model (VLM) reasoning. Furthermore, we introduce a hybrid exploration strategy that combines VLM-driven semantic scoring with a 3DGS-based coverage objective, balancing task-aware exploration with geometric coverage. Extensive experiments on embodied question answering and lifelong navigation demonstrate the robustness and effectiveness of our framework.

[560]  arXiv:2603.19138 [pdf, ps, other]
Title: Implicit Patterns in LLM-Based Binary Analysis
Comments: 18 pages
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Binary vulnerability analysis is increasingly performed by LLM-based agents in an iterative, multi-pass manner, with the model as the core decision-maker. However, how such systems organize exploration over hundreds of reasoning steps remains poorly understood, due to limited context windows and implicit token-level behaviors. We present the first large-scale, trace-level study showing that multi-pass LLM reasoning gives rise to structured, token-level implicit patterns. Analyzing 521 binaries with 99,563 reasoning steps, we identify four dominant patterns: early pruning, path-dependent lock-in, targeted backtracking, and knowledge-guided prioritization that emerge implicitly from reasoning traces. These token-level implicit patterns serve as an abstraction of LLM reasoning: instead of explicit control-flow or predefined heuristics, exploration is organized through implicit decisions regulating path selection, commitment, and revision. Our analysis shows these patterns form a stable, structured system with distinct temporal roles and measurable characteristics. Our results provide the first systematic characterization of LLM-driven binary analysis and a foundation for more reliable analysis systems.

[561]  arXiv:2603.19139 [pdf, ps, other]
Title: Hierarchical Latent Structure Learning through Online Inference
Comments: 4 figures, 5 supplementary figures
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Learning systems must balance generalization across experiences with discrimination of task-relevant details. Effective learning therefore requires representations that support both. Online latent-cause models support incremental inference but assume flat partitions, whereas hierarchical Bayesian models capture multilevel structure but typically require offline inference. We introduce the Hierarchical Online Learning of Multiscale Experience Structure (HOLMES) model, a computational framework for hierarchical latent structure learning through online inference. HOLMES combines a variation on the nested Chinese Restaurant Process prior with sequential Monte Carlo inference to perform tractable trial-by-trial inference over hierarchical latent representations without explicit supervision over the latent structure. In simulations, HOLMES matched the predictive performance of flat models while learning more compact representations that supported one-shot transfer to higher-level latent categories. In a context-dependent task with nested temporal structure, HOLMES also improved outcome prediction relative to flat models. These results provide a tractable computational framework for discovering hierarchical structure in sequential data.
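
As background for the prior mentioned above, here is a minimal sketch of sequential sampling from the ordinary (flat) Chinese Restaurant Process. It is not the nested variant and not the HOLMES model; the function name and the concentration parameter `alpha` are illustrative assumptions.

```python
import random


def sample_crp(num_customers, alpha, rng):
    """Seat customers one at a time under a flat Chinese Restaurant Process.

    With n customers already seated, the next customer joins existing table k
    with probability counts[k] / (n + alpha) and opens a new table with
    probability alpha / (n + alpha). Requires alpha > 0.
    """
    assignments = []  # table index chosen by each customer, in arrival order
    counts = []       # counts[k] = number of customers at table k
    for _ in range(num_customers):
        # The last weight stands for "open a new table".
        weights = counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts


assignments, counts = sample_crp(50, alpha=1.0, rng=random.Random(0))
```

Because each customer is seated as it arrives, the partition grows online, which is the property that makes CRP-style priors a natural fit for trial-by-trial inference.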

[562]  arXiv:2603.19141 [pdf, ps, other]
Title: SHAPCA: Consistent and Interpretable Explanations for Machine Learning Models on Spectroscopy Data
Comments: 25 pages, 6 figures
Subjects: Machine Learning (cs.LG)

In recent years, machine learning models have been increasingly applied to spectroscopic datasets for chemical and biomedical analysis. For their successful adoption, particularly in clinical and safety-critical settings, professionals and researchers must be able to understand and trust the reasoning behind model predictions. However, the inherently high dimensionality and strong collinearity of spectroscopy data pose a fundamental challenge to model explainability. These properties not only complicate model training but also undermine the stability and consistency of explanations, leading to fluctuations in feature importance across repeated training runs. Feature extraction techniques have been used to reduce the input dimensionality; however, the new features obscure the connection between the prediction and the original signal. This study proposes SHAPCA, an explainable machine learning pipeline that combines Principal Component Analysis (for dimensionality reduction) and SHapley Additive exPlanations (for post hoc explanation) to provide explanations in the original input space, which a practitioner can interpret and link back to the biological components. The proposed framework enables analysis from both global and local perspectives, revealing the spectral bands that drive overall model behaviour as well as the instance-specific features that influence individual predictions. Numerical analysis demonstrated the interpretability of the results and greater consistency across different runs.

[563]  arXiv:2603.19144 [pdf, ps, other]
Title: UGID: Unified Graph Isomorphism for Debiasing Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) exhibit pronounced social biases. Output-level or data-optimization-based debiasing methods cannot fully resolve these biases, and many prior works have shown that biases are embedded in internal representations. We propose Unified Graph Isomorphism for Debiasing large language models (UGID), an internal-representation-level debiasing framework for large language models that models the Transformer as a structured computational graph, where attention mechanisms define the routing edges of the graph and hidden states define the graph nodes. Specifically, debiasing is formulated as enforcing invariance of the graph structure across counterfactual inputs, with differences allowed only on sensitive attributes. UGID jointly constrains attention routing and hidden representations in bias-sensitive regions, effectively preventing bias migration across architectural components. To achieve effective behavioral alignment without degrading general capabilities, we introduce a log-space constraint on sensitive logits and a selective anchor-based objective to preserve definitional semantics. Extensive experiments on large language models demonstrate that UGID effectively reduces bias under both in-distribution and out-of-distribution settings, significantly reduces internal structural discrepancies, and preserves model safety and utility.

[564]  arXiv:2603.19145 [pdf, ps, other]
Title: Enhancing Pretrained Model-based Continual Representation Learning via Guided Random Projection
Subjects: Machine Learning (cs.LG)

Recent paradigms in Random Projection Layer (RPL)-based continual representation learning have demonstrated superior performance when building upon a pre-trained model (PTM). These methods insert a randomly initialized RPL after a PTM to enhance feature representation in the initial stage. Subsequently, a linear classification head is used for analytic updates in the continual learning stage. However, under severe domain gaps between pre-trained representations and target domains, a randomly initialized RPL exhibits limited expressivity. While substantially scaling up the RPL dimension can improve expressivity, it also induces an ill-conditioned feature matrix, thereby destabilizing the recursive analytic updates of the linear head. To this end, we propose the Stochastic Continual Learner with MemoryGuard Supervisory Mechanism (SCL-MGSM). Unlike random initialization, MGSM constructs the projection layer via a principled, data-guided mechanism that progressively selects target-aligned random bases to adapt the PTM representation to downstream tasks. This facilitates the construction of a compact yet expressive RPL while improving the numerical stability of analytic updates. Extensive experiments on multiple exemplar-free Class Incremental Learning (CIL) benchmarks demonstrate that SCL-MGSM achieves superior performance compared to state-of-the-art methods.

[565]  arXiv:2603.19146 [pdf, ps, other]
Title: D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Discrete diffusion models are promising alternatives to autoregressive approaches for text generation, yet their decoding methods remain under-studied. Standard decoding methods for autoregressive models, such as beam search, do not directly apply to iterative denoising, and existing diffusion decoding techniques provide limited control over in-batch diversity. To bridge this gap, we introduce a generalized beam-search framework for discrete diffusion that generates candidates in parallel and supports modular beam-selection objectives. As a diversity-focused instantiation, we propose D5P4, which formulates the selection step as MAP inference over a Determinantal Point Process. Leveraging a scalable greedy solver, D5P4 maintains multi-GPU compatibility and enables an explicit trade-off between model probability and target diversity with near-zero compute overhead. Experiments on free-form generation and question answering demonstrate that D5P4 improves diversity over strong baselines while maintaining competitive generation quality.
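
To illustrate what MAP inference over a Determinantal Point Process involves, the sketch below implements the naive textbook greedy heuristic: repeatedly add the candidate that most increases the determinant of the selected kernel submatrix. It is not the paper's scalable solver. The quality-times-similarity kernel, the `quality` scores (standing in for model probability), and the `features` vectors (standing in for candidate embeddings) are all assumptions made for illustration.

```python
def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def greedy_dpp_map(quality, features, k):
    """Greedily select up to k items that approximately maximize det(L_S).

    Assumed kernel: L[i][j] = quality[i] * <features[i], features[j]> * quality[j].
    """
    n = len(quality)
    kernel = [
        [
            quality[i] * quality[j]
            * sum(x * y for x, y in zip(features[i], features[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]
    selected = []
    for _ in range(min(k, n)):
        best_item, best_det = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[r][c] for c in idx] for r in idx]
            value = det(sub)
            if value > best_det:
                best_item, best_det = i, value
        if best_item is None:
            break  # every remaining candidate makes the submatrix singular
        selected.append(best_item)
    return selected
```

For example, with `quality = [1.0, 0.9, 0.5]` and `features = [[1, 0], [1, 0], [0, 1]]`, the second item duplicates the first, so the greedy step skips it in favor of the lower-quality but distinct third item. This is the quality-versus-diversity trade-off the determinant encodes.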

[566]  arXiv:2603.19149 [pdf, ps, other]
Title: Optimal Splitting of Language Models from Mixtures to Specialized Domains
Comments: 26 pages, 11 tables, 17 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Language models achieve impressive performance on a variety of knowledge, language, and reasoning tasks due to the scale and diversity of pretraining data available. The standard training recipe is a two-stage paradigm: pretraining first on the full corpus of data followed by specialization on a subset of high-quality, specialized data from the full corpus. In the multi-domain setting, this involves continued pretraining of multiple models on each specialized domain, referred to as split model training. We propose a method for pretraining multiple models independently over a general pretraining corpus, and determining the optimal compute allocation between pretraining and continued pretraining using scaling laws. Our approach accurately predicts the loss of a model of size N with D pretraining and D' specialization tokens, and extrapolates to larger model sizes and number of tokens. Applied to language model training, our approach consistently improves performance on common-sense knowledge and reasoning benchmarks across different model sizes and compute budgets.

[567]  arXiv:2603.19150 [pdf, ps, other]
Title: Performance Testing of ChaCha20-Poly1305 for Internet of Things and Industrial Control System devices
Comments: Accepted to IoTBDS 2026
Subjects: Cryptography and Security (cs.CR)

Industrial Control Systems (ICS), and many simple Internet of Things (IoT) devices, commonly communicate using unencrypted or unauthenticated protocols. For ICS this is a historical carryover, since the introduction of these systems predated practical lightweight cryptography. As the processing power of small devices has grown exponentially at the same time as new, more efficient encryption algorithms have become available, end-device encryption of communication protocols is becoming much more practical, but is still not widely used with ICS protocols such as Modbus and IEC 61850 (GOOSE), which have tight requirements for both latency and variance. Newer microprocessors can also present challenges both to measurement and use, since features such as dynamic frequency scaling can significantly impact performance measurements. In this paper, we measured the time cost of adding encryption into the communication cycle of low-cost edge devices using ChaCha20-Poly1305, and show that in the worst case the encryption cycle took less than 7.1% of the latency requirement of GOOSE, and less than 3% of that of IEC-60834-1, on a Raspberry Pi 4 and an Intel N95 Mini PC, which is well within the specified latency requirements for these protocols.

[568]  arXiv:2603.19152 [pdf, ps, other]
Title: VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models
Comments: 23 pages. Includes figures and tables. Conference submission
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models frequently exhibit suboptimal performance on low-resource languages, primarily due to inefficient subword segmentation and systemic training data imbalances. In this paper, we propose Variable Entropy Policy Optimization (VEPO), which leverages Reinforcement Learning with Verifiable Rewards to incorporate deterministic structural constraints into the policy alignment process. This framework ensures prescribed sequence length, robust format consistency, and rigorous linguistic well-formedness, all enforced during training. Central to our approach is a variable entropy mechanism that enables the model to dynamically calibrate the equilibrium between literal fidelity and semantic naturalness by modulating the exploration-exploitation manifold. By integrating entropy-tempered advantage estimation with asymmetric clipping, VEPO sustains robust exploration while mitigating policy collapse. Empirical evaluations across 90 FLORES-200 translation directions, scored with COMET-22 and chrF, demonstrate that VEPO yields substantial improvements in both tokenization efficiency and translation quality, bridging the performance gap for underrepresented languages.

[569]  arXiv:2603.19157 [pdf, ps, other]
Title: ADAPT: Attention Driven Adaptive Prompt Scheduling and InTerpolating Orthogonal Complements for Rare Concepts Generation
Comments: Accepted in CVPR 2026 (findings). 10 pages, 4 figures; supplementary material included (8 pages, 10 figures)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating rare compositional concepts in text-to-image synthesis remains a challenge for diffusion models, particularly for attributes that are uncommon in the training data. While recent approaches, such as R2F, address this challenge by using an LLM for prompt scheduling, they suffer from inherent variance due to the randomness of language models and suboptimal guidance from iterative text embedding switching. To address these problems, we propose ADAPT, a training-free framework that deterministically plans and semantically aligns prompt schedules, providing consistent guidance to enhance the composition of rare concepts. By leveraging attention scores and orthogonal components, ADAPT significantly enhances compositional generation of rare concepts in the RareBench benchmark without additional training or fine-tuning. Through comprehensive experiments, we demonstrate that ADAPT achieves superior performance in RareBench and accurately reflects the semantic information of rare attributes, providing deterministic and precise control over the generation of rare compositions without compromising visual integrity.

[570]  arXiv:2603.19158 [pdf, ps, other]
Title: Adaptive Auxiliary Prompt Blending for Target-Faithful Diffusion Generation
Comments: Accepted in CVPR 2026 (main track). 10 pages, 6 figures; supplementary material included (14 pages, 11 figures)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion-based text-to-image (T2I) models have made remarkable progress in generating photorealistic and semantically rich images. However, when the target concepts lie in low-density regions of the training distribution, these models often produce semantically misaligned or structurally inconsistent results. This limitation arises from the long-tailed nature of text-image datasets, where rare concepts or editing instructions are underrepresented. To address this, we introduce Adaptive Auxiliary Prompt Blending (AAPB) - a unified framework that stabilizes the diffusion process in low-density regions. AAPB leverages auxiliary anchor prompts to provide semantic support in rare concept generation and structural support in image editing, ensuring faithful guidance toward the target prompt. Unlike prior heuristic prompt alternation methods, AAPB derives a closed-form adaptive coefficient that optimally balances the influence between the auxiliary anchor and the target prompt at each diffusion step. Grounded in Tweedie's identity, our formulation provides a principled and training-free framework for adaptive prompt blending, ensuring stable and target-faithful generation. We demonstrate the effectiveness of adaptive interpolation over fixed interpolation through controlled experiments and empirically show consistent improvements on the RareBench and FlowEdit datasets, achieving superior semantic accuracy and structural fidelity compared to prior training-free baselines.

[571]  arXiv:2603.19163 [pdf, ps, other]
Title: cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization
Authors: Yuyang Liu
Comments: 28 pages, 9 figures. Code available at this https URL
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Combinatorial optimization problems arise in logistics, scheduling, and resource allocation, yet existing approaches face a fundamental trade-off among generality, performance, and usability. We present cuGenOpt, a GPU-accelerated general-purpose metaheuristic framework that addresses all three dimensions simultaneously.
At the engine level, cuGenOpt adopts a "one block evolves one solution" CUDA architecture with a unified encoding abstraction (permutation, binary, integer), a two-level adaptive operator selection mechanism, and hardware-aware resource management. At the extensibility level, a user-defined operator registration interface allows domain experts to inject problem-specific CUDA search operators. At the usability level, a JIT compilation pipeline exposes the framework as a pure-Python API, and an LLM-based modeling assistant converts natural-language problem descriptions into executable solver code.
Experiments across five thematic suites on three GPU architectures (T4, V100, A800) show that cuGenOpt outperforms general MIP solvers by orders of magnitude, achieves competitive quality against specialized solvers on instances up to n=150, and attains a 4.73% gap on TSP-442 within 30 s. Twelve problem types spanning five encoding variants are solved to optimality. Framework-level optimizations cumulatively reduce the pcb442 gap from 36% to 4.73% and boost VRPTW throughput by 75-81%.
Code: https://github.com/L-yang-yang/cugenopt

[572]  arXiv:2603.19165 [pdf, ps, other]
Title: Rigorous Error Certification for Neural PDE Solvers: From Empirical Residuals to Solution Guarantees
Comments: 35 pages
Subjects: Machine Learning (cs.LG); Analysis of PDEs (math.AP); Functional Analysis (math.FA)

Uncertainty quantification for partial differential equations is traditionally grounded in discretization theory, where solution error is controlled via mesh/grid refinement. Physics-informed neural networks fundamentally depart from this paradigm: they approximate solutions by minimizing residual losses at collocation points, introducing new sources of error arising from optimization, sampling, representation, and overfitting. As a result, the generalization error in the solution space remains an open problem.
Our main theoretical contribution establishes generalization bounds that connect residual control to solution-space error. We prove that when neural approximations lie in a compact subset of the solution space, vanishing residual error guarantees convergence to the true solution. We derive deterministic and probabilistic convergence results and provide certified generalization bounds translating residual, boundary, and initial errors into explicit solution error guarantees.

[573]  arXiv:2603.19166 [pdf, ps, other]
Title: Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation
Comments: Equal contribution: Swagat Padhan and Lakshya Jain, 9 pages, 6 figures, paper website: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In this work, we empirically demonstrate that state-of-the-art VLM-based grounding approaches struggle with complex metric-semantic language queries. To address this limitation, we propose MAPG (Multi-Agent Probabilistic Grounding), an agentic framework that decomposes language queries into structured subcomponents and queries a VLM to ground each component. MAPG then probabilistically composes these grounded outputs to produce metrically consistent, actionable decisions in 3D space. We evaluate MAPG on the HM-EQA benchmark and show consistent performance improvements over strong baselines. Furthermore, we introduce a new benchmark, MAPG-Bench, specifically designed to evaluate metric-semantic goal grounding, addressing a gap in existing language grounding evaluations. We also present a real-world robot demonstration showing that MAPG transfers beyond simulation when a structured scene representation is available.

[574]  arXiv:2603.19167 [pdf, ps, other]
Title: Evaluating Counterfactual Strategic Reasoning in Large Language Models
Subjects: Computation and Language (cs.CL)

We evaluate Large Language Models (LLMs) in repeated game-theoretic settings to assess whether strategic performance reflects genuine reasoning or reliance on memorized patterns. We consider two canonical games, Prisoner's Dilemma (PD) and Rock-Paper-Scissors (RPS), upon which we introduce counterfactual variants that alter payoff structures and action labels, breaking familiar symmetries and dominance relations. Our multi-metric evaluation framework compares default and counterfactual instantiations, showcasing LLM limitations in incentive sensitivity, structural generalization and strategic reasoning within counterfactual environments.
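
The notion of "breaking dominance relations" can be made concrete with a small check for a strictly dominant action. This is only a sketch: the payoff numbers below, including the counterfactual variant, are hypothetical and are not taken from the paper.

```python
def strictly_dominant_action(payoffs):
    """Return the row player's strictly dominant action, or None.

    payoffs maps (my_action, opponent_action) to my payoff.
    """
    actions = sorted({mine for mine, _ in payoffs})
    opponents = sorted({theirs for _, theirs in payoffs})
    for candidate in actions:
        dominates = all(
            payoffs[(candidate, opp)] > payoffs[(other, opp)]
            for other in actions
            if other != candidate
            for opp in opponents
        )
        if dominates:
            return candidate
    return None


# Textbook Prisoner's Dilemma: defecting (D) strictly dominates cooperating (C).
standard_pd = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Hypothetical counterfactual variant: same labels, but the payoffs are
# rearranged so that cooperating becomes the dominant action.
counterfactual_pd = {("C", "C"): 5, ("C", "D"): 1, ("D", "C"): 3, ("D", "D"): 0}

# Rock-Paper-Scissors (win = 1, tie = 0, loss = -1) has no dominant action.
beats = {"R": "S", "P": "R", "S": "P"}
rps = {
    (a, b): 0 if a == b else (1 if beats[a] == b else -1)
    for a in beats
    for b in beats
}
```

A model that keeps answering "D" on the counterfactual variant would be reacting to the familiar label rather than to the altered incentives, which is the kind of failure such an evaluation is designed to expose.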

[575]  arXiv:2603.19169 [pdf, ps, other]
Title: ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis
Comments: 28 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning the policy toward geometrically complete vessel structures rather than pixel-wise overlap metrics. The reasoning module formulates stenosis localization as a Markov Decision Process with an explicit rejection mechanism that autonomously defers ambiguous anatomical candidates such as bifurcations and vessel crossings, shifting from coverage maximization to reliability optimization. On 1,400 clinical angiograms, ARIADNE achieves a state-of-the-art centerline Dice of 0.838 and reduces false positives by 41% compared to geometric baselines. External validation on multi-center benchmarks ARCADE and XCAD confirms generalization across acquisition protocols. This represents the first application of DPO for topological alignment in medical imaging, demonstrating that preference-based learning over structural constraints mitigates topological violations while maintaining diagnostic sensitivity in interventional cardiology workflows.

[576]  arXiv:2603.19170 [pdf, ps, other]
Title: ADMM-Based Distributed MPC with Control Barrier Functions for Safe Multi-Robot Quadrupedal Locomotion
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)

This paper proposes a fully decentralized model predictive control (MPC) framework with control barrier function (CBF) constraints for safety-critical trajectory planning in multi-robot legged systems. The incorporation of CBF constraints introduces explicit inter-agent coupling, which prevents direct decomposition of the resulting optimal control problems. To address this challenge, we reformulate the centralized safety-critical MPC problem using a structured distributed optimization framework based on the alternating direction method of multipliers (ADMM). By introducing a novel node-edge splitting formulation with consensus constraints, the proposed approach decomposes the global problem into independent node-local and edge-local quadratic programs that can be solved in parallel using only neighbor-to-neighbor communication. This enables fully decentralized trajectory optimization with symmetric computational load across agents while preserving safety and dynamic feasibility. The proposed framework is integrated into a hierarchical locomotion control architecture for quadrupedal robots, combining high-level distributed trajectory planning, mid-level nonlinear MPC enforcing single rigid body dynamics, and low-level whole-body control enforcing full-order robot dynamics. The effectiveness of the proposed approach is demonstrated through hardware experiments on two Unitree Go2 quadrupedal robots and numerical simulations involving up to four robots navigating uncertain environments with rough terrain and external disturbances. The results show that the proposed distributed formulation achieves performance comparable to centralized MPC while reducing the average per-cycle planning time by up to 51% in the four-agent case, enabling efficient real-time decentralized implementation.
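
For readers new to ADMM, the sketch below shows the textbook scaled-form global-consensus iteration on a toy scalar problem, where each agent i holds a private target a_i and all agents must agree on one value minimizing the sum of 0.5 * (x_i - a_i)^2. It is not the paper's node-edge splitting, it has no CBF constraints, and it uses a central averaging step rather than neighbor-to-neighbor communication; the function name and toy objective are assumptions for illustration.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Scaled-form consensus ADMM for min sum_i 0.5 * (x_i - a_i)^2 s.t. x_i = z.

    Returns the local copies x and the consensus variable z, which converge
    to the mean of the targets.
    """
    n = len(targets)
    x = [0.0] * n  # local primal variables, one per agent
    u = [0.0] * n  # scaled dual variables, one per agent
    z = 0.0        # shared consensus variable
    for _ in range(iterations):
        # Local step: each agent solves its own small problem in closed form.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus step: average the local estimates plus scaled duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each agent's disagreement with the consensus.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return x, z


x, z = consensus_admm([1.0, 2.0, 6.0])
```

The local step is what parallelizes: every agent updates its own x_i independently, and only the averaging step requires communication.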

[577]  arXiv:2603.19172 [pdf, ps, other]
Title: DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Efficient MoE Inference on Edge
Subjects: Machine Learning (cs.LG)

Despite the computational efficiency of MoE models, the excessive memory footprint and I/O overhead inherent in multi-expert architectures pose formidable challenges for real-time inference on resource-constrained edge platforms. While existing static methods struggle with a rigid latency-accuracy trade-off, we observe that expert importance is highly skewed and depth-dependent. Motivated by these insights, we propose DyMoE, a dynamic mixed-precision quantization framework designed for high-performance edge inference. Leveraging insights into expert importance skewness and depth-dependent sensitivity, DyMoE introduces: (1) importance-aware prioritization to dynamically quantize experts at runtime; (2) depth-adaptive scheduling to preserve semantic integrity in critical layers; and (3) look-ahead prefetching to overlap I/O stalls. Experimental results on commercial edge hardware show that DyMoE reduces Time-to-First-Token (TTFT) by 3.44x-22.7x and achieves up to a 14.58x speedup in Time-Per-Output-Token (TPOT) compared to state-of-the-art offloading baselines, enabling real-time, accuracy-preserving MoE inference on resource-constrained edge devices.

[578]  arXiv:2603.19173 [pdf, ps, other]
Title: SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward and backward workloads across BF16, FP8, and NVFP4, including kernels whose best performance is expected to rely on Blackwell-specific capabilities. Unlike prior benchmarks that evaluate kernels primarily relative to software implementations, SOL-ExecBench measures performance against analytically derived Speed-of-Light (SOL) bounds computed by SOLAR, our pipeline for deriving hardware-grounded SOL bounds, yielding a fixed target for hardware-efficient optimization. We report a SOL Score that quantifies how much of the gap between a release-defined scoring baseline and the hardware SOL bound a candidate kernel closes. To support robust evaluation of agentic optimizers, we additionally provide a sandboxed harness with GPU clock locking, L2 cache clearing, isolated subprocess execution, and static-analysis-based checks against common reward-hacking strategies. SOL-ExecBench reframes GPU kernel benchmarking from beating a mutable software baseline to closing the remaining gap to hardware Speed-of-Light.
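
One plausible reading of a "fraction of the gap closed" score is sketched below. The abstract does not give the exact formula, so the linear form, the function name, and the use of runtimes in milliseconds are all assumptions rather than the benchmark's actual definition.

```python
def sol_score(baseline_ms, candidate_ms, sol_ms):
    """Fraction of the baseline-to-SOL runtime gap closed by a candidate.

    Assumed form: (baseline - candidate) / (baseline - sol), so a candidate
    matching the baseline scores 0.0 and one reaching the bound scores 1.0.
    Values below 0.0 indicate a candidate slower than the baseline.
    """
    gap = baseline_ms - sol_ms
    if gap <= 0:
        raise ValueError("baseline must be slower than the SOL bound")
    return (baseline_ms - candidate_ms) / gap
```

For example, with a 10 ms baseline and a 2 ms bound, a 6 ms candidate closes half of the gap and scores 0.5. Under this form the score is anchored to a fixed hardware bound rather than to how fast a competing software implementation happens to be.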

[579]  arXiv:2603.19176 [pdf, ps, other]
Title: Few-shot Acoustic Synthesis with Multimodal Flow Matching
Comments: To appear at CVPR 2026. 23 pages, 16 figures. Project Page: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)

Generating audio that is acoustically consistent with a scene is essential for immersive virtual environments. Recent neural acoustic field methods enable spatially continuous sound rendering but remain scene-specific, requiring dense audio measurements and costly training for each environment. Few-shot approaches improve scalability across rooms but still rely on multiple recordings and, being deterministic, fail to capture the inherent uncertainty of scene acoustics under sparse context. We introduce flow-matching acoustic generation (FLAC), a probabilistic method for few-shot acoustic synthesis that models the distribution of plausible room impulse responses (RIRs) given minimal scene context. FLAC leverages a diffusion transformer trained with a flow-matching objective to generate RIRs at arbitrary positions in novel scenes, conditioned on spatial, geometric, and acoustic cues. With a single shot, FLAC outperforms state-of-the-art eight-shot baselines on both the AcousticRooms and Hearing Anything Anywhere datasets. To complement standard perceptual metrics, we further introduce AGREE, a joint acoustic-geometry embedding, enabling geometry-consistent evaluation of generated RIRs through retrieval and distributional metrics. This work is the first to apply generative flow matching to explicit RIR synthesis, establishing a new direction for robust and data-efficient acoustic synthesis.

[580]  arXiv:2603.19182 [pdf, ps, other]
Title: Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Authors: Zou Qiang
Comments: 10 pages, 5 tables, 0 figures. Conceptual architecture with preliminary simulation-based validation
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity.
This paper proposes the Box Maze framework, a conceptual process-control architecture that decomposes LLM reasoning into three explicit layers: memory grounding, structured inference, and boundary enforcement. We introduce preliminary simulation-based evaluation involving progressive boundary erosion scenarios across multiple heterogeneous LLM systems (DeepSeek-V3, Doubao, Qwen). Results from n=50 adversarial scenarios suggest that explicit cognitive control layers may improve consistency in boundary maintenance, with architectural constraints reducing boundary failure rates from approximately 40% (baseline RLHF) to below 1% under adversarial conditions.
While current validation is simulation-based, these preliminary results indicate that process-level control may offer a promising direction for improving reliability in large language model reasoning.

[581]  arXiv:2603.19183 [pdf, ps, other]
Title: Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models
Comments: 25 pages, 12 figures
Subjects: Robotics (cs.RO)

Vision-Language-Action (VLA) models have emerged as a promising approach for general-purpose robot manipulation. However, their generalization is inconsistent: while these models can perform impressively in some settings, fine-tuned variants often fail on novel objects, scenes, and instructions. We apply mechanistic interpretability techniques to better understand the inner workings of VLA models. To probe internal representations, we train Sparse Autoencoders (SAEs) on hidden layer activations of the VLA. SAEs learn a sparse dictionary whose features act as a compact, interpretable basis for the model's computation. We find that the large majority of extracted SAE features correspond to memorized sequences from specific training demonstrations. However, some features correspond to interpretable, general, and steerable motion primitives and semantic properties, offering a promising glimpse toward VLA generalizability. We propose a metric to categorize features according to whether they represent generalizable transferable primitives or episode-specific memorization. We validate these findings through steering experiments on the LIBERO benchmark. We show that individual SAE features causally influence robot behavior. Steering general features induces behaviors consistent with their semantic meaning and can be applied across tasks and scenes. This work provides the first mechanistic evidence that VLAs can learn generalizable features across tasks and scenes. We observe that supervised fine-tuning on small robotics datasets disproportionately amplifies memorization. In contrast, training on larger, more diverse datasets (e.g., DROID) or using knowledge insulation promotes more general features. We provide an open-source codebase and user-friendly interface for activation collection, SAE training, and feature steering. Our project page is located at this http URL

[582]  arXiv:2603.19185 [pdf, ps, other]
Title: MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data
Comments: 4 pages, 1 table
Subjects: Machine Learning (cs.LG)

Synthetic data is often perceived as a silver-bullet solution to data anonymization and privacy-preserving data publishing. Drawn from generative models like diffusion models, synthetic data is expected to preserve the statistical properties of the original dataset while remaining resilient to privacy attacks. Recent developments of diffusion models have been effective on a wide range of data types, but their privacy resilience, particularly for tabular formats, remains largely unexplored. The MIDST challenge sought a quantitative evaluation of the privacy gain of synthetic tabular data generated by diffusion models, with a specific focus on its resistance to membership inference attacks (MIAs). Given the heterogeneity and complexity of tabular data, multiple target models were explored for MIAs, including diffusion models for single tables of mixed data types and multi-relational tables with interconnected constraints. As a key outcome, MIDST inspired the development of novel black-box and white-box MIAs tailored to these target diffusion models, enabling a comprehensive evaluation of their privacy efficacy. The MIDST GitHub repository is available at https://github.com/VectorInstitute/MIDST

[583]  arXiv:2603.19186 [pdf, ps, other]
Title: Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Calibrated Alignment
Subjects: Machine Learning (cs.LG)

Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by learning embeddings that map each source's features into a common representation space. OS outcome models are transferred to the RCT embedding space and calibrated using trial data, preserving causal identification from randomization. Finite-sample risk bounds decompose into alignment error, outcome-model complexity, and calibration complexity terms, identifying when embedding alignment outperforms imputation. Under the calibration-based linear variant, the framework provides protection against negative transfer; the neural variant can be vulnerable under severe distributional shift. Under sparse linear models, the embedding approach strictly generalizes imputation. Simulations across 51 settings confirm that (i) calibration-based methods are equivalent for linear CATEs, and (ii) the neural embedding variant wins all 22 nonlinear-regime settings with large margins.

[584]  arXiv:2603.19188 [pdf, ps, other]
Title: Markov Potential Game and Multi-Agent Reinforcement Learning for Autonomous Driving
Subjects: Systems and Control (eess.SY)

Autonomous driving (AD) requires safe and reliable decision-making among interacting agents, e.g., vehicles, bicycles, and pedestrians. Multi-agent reinforcement learning (MARL) modeled by Markov games (MGs) provides a suitable framework to characterize such agents' interactions during decision-making. Nash equilibria (NEs) are often the desired solution in an MG. However, it is typically challenging to compute an NE in general-sum games, unless the game is a Markov potential game (MPG), which ensures NE attainability under a few learning algorithms such as gradient play. It has remained an open question how to construct an MPG and whether these construction rules are suitable for AD applications. In this paper, we provide sufficient conditions under which an MG is an MPG and show that these conditions can accommodate general driving objectives for autonomous vehicles (AVs) using highway forced merge scenarios as illustrative examples. A parameter-sharing neural network (NN) structure is designed to enable decentralized policy execution. The trained driving policy from MPGs is evaluated on both simulated and naturalistic traffic datasets. We report comparative studies against single-agent RL and against human drivers whose behaviors are recorded in the traffic datasets.

[585]  arXiv:2603.19191 [pdf, ps, other]
Title: OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
Subjects: Artificial Intelligence (cs.AI)

Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and employs a review mechanism to strictly audit the evidence chain before making the final verdict. To facilitate evaluation, we further introduce OmniGUIRewardBench (OGRBench), a holistic cross-platform benchmark for GUI outcome rewards, where all evaluated models achieve their best performance under OS-Themis. Extensive experiments on AndroidWorld show that OS-Themis yields a 10.3% improvement when used to support online RL training, and a 6.9% gain when used for trajectory validation and filtering in the self-training loop, highlighting its potential to drive agent evolution.

[586]  arXiv:2603.19193 [pdf, ps, other]
Title: Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Bird's-Eye-View (BEV) perception serves as a cornerstone for autonomous driving, offering a unified spatial representation that fuses surrounding-view images to enable reasoning for various downstream tasks, such as semantic segmentation, 3D object detection, and motion prediction. However, most existing BEV perception frameworks adopt an end-to-end training paradigm, where image features are directly transformed into the BEV space and optimized solely through downstream task supervision. This formulation treats the entire perception process as a black box, often lacking explicit 3D geometric understanding and interpretability, leading to suboptimal performance. In this paper, we claim that an explicit 3D representation matters for accurate BEV perception, and we propose Splat2BEV, a Gaussian Splatting-assisted framework for BEV tasks. Splat2BEV aims to learn BEV feature representations that are both semantically rich and geometrically precise. We first pre-train a Gaussian generator that explicitly reconstructs 3D scenes from multi-view inputs, enabling the generation of geometry-aligned feature representations. These representations are then projected into the BEV space to serve as inputs for downstream tasks. Extensive experiments on the nuScenes and Argoverse datasets demonstrate that Splat2BEV achieves state-of-the-art performance and validate the effectiveness of incorporating explicit 3D reconstruction into BEV perception.

[587]  arXiv:2603.19196 [pdf, ps, other]
Title: Exploring the Role of Interaction Data to Empower End-User Decision-Making In UI Personalization
Comments: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
Subjects: Human-Computer Interaction (cs.HC)

User interface personalization enhances digital efficiency, usability, and accessibility. However, in user-driven setups, limited support for identifying and evaluating worthwhile opportunities often leads to underuse. We explore a reflexive personalization approach where individuals engage with their digital interaction data to identify meaningful personalization opportunities and benefits. We interviewed 12 participants, using experimental vignettes as design probes to support reflection on different forms of using interaction data to empower decision-making in personalization and the preferred level of system support. We found that people can independently identify personalization opportunities but prefer system support through visual personalization suggestions. Interaction data can shape how users perceive and approach personalization by reinforcing the perceived value of change and data collection, helping them weigh benefits against effort, and increasing the transparency of system suggestions. We discuss opportunities for designing personalization software that raises end-users' agency over interfaces through reflective engagement with their interaction data.

[588]  arXiv:2603.19199 [pdf, ps, other]
Title: FASTER: Rethinking Real-Time Flow VLAs
Comments: Project page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold (e.g., in $\pi_{0.5}$ and X-VLA) into a single step, while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
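
A small simulation can make the uniform-reaction-time claim concrete. The model below is an assumption made for illustration, not the paper's derivation: a new inference starts every `horizon` seconds, an environmental event lands at a uniformly random moment within the current cycle, and the first responsive action appears `ttfa` seconds after the next inference starts. Under those assumptions the reaction time is uniform on [ttfa, ttfa + horizon].

```python
import random

def sample_reaction_time(ttfa, horizon, rng):
    """One reaction-time draw under the simplified model (an assumption)."""
    # The event occurs at a uniform offset inside the current execution cycle.
    event_offset = rng.uniform(0.0, horizon)
    # Wait for the next inference to start, then TTFA for its first action.
    wait_for_next_inference = horizon - event_offset
    return wait_for_next_inference + ttfa

def mean_reaction_time(ttfa, horizon, n=100_000, seed=0):
    """Monte Carlo estimate of the expected reaction time."""
    rng = random.Random(seed)
    return sum(sample_reaction_time(ttfa, horizon, rng) for _ in range(n)) / n

# Hypothetical numbers: 80 ms to first action, 500 ms execution horizon.
estimate = mean_reaction_time(ttfa=0.08, horizon=0.5)
```

In this toy model the mean is ttfa + horizon / 2, so lowering TTFA shifts the whole distribution while shortening the horizon narrows it.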

[589]  arXiv:2603.19201 [pdf, ps, other]
Title: OmniVTA: Visuo-Tactile World Modeling for Contact-Rich Robotic Manipulation
Comments: TARS Robotics Project Page: this https URL
Subjects: Robotics (cs.RO)

Contact-rich manipulation tasks, such as wiping and assembly, require accurate perception of contact forces, friction changes, and state transitions that cannot be reliably inferred from vision alone. Despite growing interest in visuo-tactile manipulation, progress is constrained by two persistent limitations: existing datasets are small in scale and narrow in task coverage, and current methods treat tactile signals as passive observations rather than using them to model contact dynamics or enable closed-loop control explicitly. In this paper, we present \textbf{OmniViTac}, a large-scale visuo-tactile-action dataset comprising $21{,}000+$ trajectories across $86$ tasks and $100+$ objects, organized into six physics-grounded interaction patterns. Building on this dataset, we propose \textbf{OmniVTA}, a world-model-based visuo-tactile manipulation framework that integrates four tightly coupled modules: a self-supervised tactile encoder, a two-stream visuo-tactile world model for predicting short-horizon contact evolution, a contact-aware fusion policy for action generation, and a 60Hz reflexive controller that corrects deviations between predicted and observed tactile signals in a closed loop. Real-robot experiments across all six interaction categories show that OmniVTA outperforms existing methods and generalizes well to unseen objects and geometric configurations, confirming the value of combining predictive contact modeling with high-frequency tactile feedback for contact-rich manipulation. All data, models, and code will be made publicly available on the project website at https://mrsecant.github.io/OmniVTA.

[590]  arXiv:2603.19203 [pdf, ps, other]
Title: Tinted Frames: Question Framing Blinds Vision-Language Models
Comments: Preprint. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-Language Models (VLMs) have been shown to be blind, often underutilizing their visual inputs even on tasks that require visual reasoning. In this work, we demonstrate that VLMs are selectively blind. They modulate the amount of attention applied to visual inputs based on linguistic framing even when alternative framings demand identical visual reasoning. Using visual attention as a probe, we quantify how framing alters both the amount and distribution of attention over the image. Constrained framings, such as multiple choice and yes/no, induce substantially lower attention to image context than open-ended framings, reduce focus on task-relevant regions, and shift attention towards uninformative tokens. We further demonstrate that this attention misallocation is the principal cause of degraded accuracy and cross-framing inconsistency. Building on this mechanistic insight, we introduce a lightweight prompt-tuning method using learnable tokens that encourages the robust, visually grounded attention patterns observed in open-ended settings, improving both visual grounding and performance across framings.

[591]  arXiv:2603.19204 [pdf, ps, other]
Title: Robustness, Cost, and Attack-Surface Concentration in Phishing Detection
Comments: 14 pages, 4 figures, 9 tables
Subjects: Machine Learning (cs.LG)

Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d.\ evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI).
On the UCI Phishing Websites benchmark (11\,055 instances, 30 ternary features), Logistic Regression, Random Forests, Gradient Boosted Trees, and XGBoost all achieve $\mathrm{AUC}\ge 0.979$ under static evaluation. Under budgeted sanitization-style evasion, robustness converges across architectures: the median MEC equals 2 with full features, and over 80\% of successful minimal-cost evasions concentrate on three low-cost surface features. Feature restriction improves robustness only when it removes all dominant low-cost transitions. Under strict cost schedules, infrastructure-leaning feature sets exhibit 17-19\% infeasible mass for ensemble models, while the median MEC among evadable instances remains unchanged. We formalize this convergence: if a positive fraction of correctly detected phishing instances admit evasion through a single feature transition of minimal cost $c_{\min}$, no classifier can raise the corresponding MEC quantile above $c_{\min}$ without modifying the feature representation or cost model. Adversarial robustness in phishing detection is governed by feature economics rather than model complexity.
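
To make the diagnostics concrete, here is a brute-force sketch of minimal evasion cost and a budget-based survival rate on a toy detector. Everything in it is an illustrative assumption: the linear detector, the per-feature costs, the rule that an edit sets a feature to the benign value 1, and the reading of $S(B)$ as the share of instances that cannot be evaded within budget $B$. The paper's exact definitions may differ.

```python
from itertools import combinations

def minimal_evasion_cost(x, is_phishing, costs, benign_value=1):
    """Cheapest set of single-feature edits that makes the detector say 'benign'.

    Returns None when no combination of edits evades (infeasible instance).
    """
    editable = [i for i, v in enumerate(x) if v != benign_value]
    best = None
    for k in range(len(editable) + 1):
        for subset in combinations(editable, k):
            cost = sum(costs[i] for i in subset)
            if best is not None and cost >= best:
                continue  # cannot improve on the best evasion found so far
            edited = list(x)
            for i in subset:
                edited[i] = benign_value
            if not is_phishing(edited):
                best = cost
    return best

def survival_rate(mecs, budget):
    """Share of instances whose minimal evasion cost exceeds the budget."""
    return sum(1 for m in mecs if m is None or m > budget) / len(mecs)

# Toy linear detector over ternary features in {-1, 0, 1} (hypothetical).
WEIGHTS = [2, 1, 1, 3]
COSTS = [1, 1, 1, 5]

def toy_detector(x):
    return sum(w * v for w, v in zip(WEIGHTS, x)) < 0

instances = [[-1, -1, -1, -1], [1, 1, -1, -1]]
mecs = [minimal_evasion_cost(x, toy_detector, COSTS) for x in instances]
```

The exhaustive search is exponential in the number of editable features, so it only suits small toy cases like this one.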

[592]  arXiv:2603.19206 [pdf, ps, other]
Title: RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion models have become the dominant paradigm for image generation and editing, with latent diffusion models shifting denoising to a compact latent space for efficiency and scalability. Recent attempts to leverage pretrained visual representation models as tokenizer priors either align diffusion features to representation features or directly reuse representation encoders as frozen tokenizers. Although such approaches can improve generation metrics, they often suffer from limited reconstruction fidelity due to frozen encoders, which in turn degrades editing quality, as well as overly high-dimensional latents that make diffusion modeling difficult. To address these limitations, we propose Representation-Pivoted AutoEncoder, a representation-based tokenizer that improves both generation and editing. We introduce Representation-Pivot Regularization, a training strategy that enables a representation-initialized encoder to be fine-tuned for reconstruction while preserving the semantic structure of the pretrained representation space, followed by a variational bridge that compresses the latent space into a compact one for better diffusion modeling. We adopt an objective-decoupled stage-wise training strategy that sequentially optimizes generative tractability and reconstruction-fidelity objectives. Together, these components yield a tokenizer that preserves strong semantics, reconstructs faithfully, and produces latents with reduced diffusion modeling complexity. Experiments demonstrate that RPiAE outperforms other visual tokenizers on text-to-image generation and image editing, while delivering the best reconstruction fidelity among representation-based tokenizers.

[593]  arXiv:2603.19209 [pdf, ps, other]
Title: Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders
Comments: Project page: this https URL ; Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a strong alternative. We systematically evaluate SSM vision backbones for VLMs in a controlled setting. Under matched ImageNet-1K initialization, the SSM backbone achieves the strongest overall performance across both VQA and grounding/localization. We further adapt both SSM and ViT-family backbones with detection or segmentation training and find that dense-task tuning generally improves performance across families; after this adaptation, the SSM backbone remains competitive while operating at a substantially smaller model scale. We further observe that (i) higher ImageNet accuracy or larger backbones do not reliably translate into better VLM performance, and (ii) some visual backbones are unstable in localization. Based on these findings, we propose stabilization strategies that improve robustness for both backbone families and highlight SSM backbones as a strong alternative to transformer-based vision encoders in VLMs.

[594]  arXiv:2603.19213 [pdf, ps, other]
Title: Constitutive vs. Corrective: A Causal Taxonomy of Human Runtime Involvement in AI Systems
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

As AI systems increasingly permeate high-stakes decision-making, the terminology regarding human involvement - Human-in-the-Loop (HITL), Human-on-the-Loop (HOTL), and Human Oversight - has become vexingly ambiguous. This ambiguity complicates interdisciplinary collaboration between computer science, law, philosophy, psychology, and sociology and can lead to regulatory uncertainty. We propose a clarification grounded in causal structure, focused on human involvement during the runtime of AI systems. The distinction between HITL and HOTL, we argue, is not primarily spatial but causal: HITL is constitutive (a human contribution is necessary for the decision output), while HOTL is corrective (external to the primary causal chain, capable of preventing or modifying outputs). Within HOTL, we distinguish three temporal modes - synchronous, asynchronous, and anticipatory - situated within a nested model of provider and deployer runtime that clarifies their different capacities for intervention. A second, orthogonal dimension captures cognitive integration: whether human and machine operate as complementary or hybrid intelligence, yielding four structurally distinct configurations. Finally, we distinguish these descriptive categories from the normative requirements they serve: statutory "Human Oversight" is a specific normative mode of HOTL that demands not merely a corrective causal position, but genuine preparedness and capacity for effective intervention. Because the same person may occupy both HITL and HOTL roles simultaneously, we argue that this role duality must be treated as a design problem requiring architectural and epistemic mitigation rather than mere acknowledgment.

[595]  arXiv:2603.19216 [pdf, ps, other]
Title: DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, part-aware text-to-3D generation. DreamPartGen introduces Duplex Part Latents (DPLs) that jointly model each part's geometry and appearance, and Relational Semantic Latents (RSLs) that capture inter-part dependencies derived from language. A synchronized co-denoising process enforces mutual geometric and semantic consistency, enabling coherent, interpretable, and text-aligned 3D synthesis. Across multiple benchmarks, DreamPartGen delivers state-of-the-art performance in geometric fidelity and text-shape alignment.

[596]  arXiv:2603.19217 [pdf, ps, other]
Title: LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advancements in omnimodal large language models (OmniLLMs) have significantly improved the comprehension of audio and video inputs. However, current evaluations primarily focus on short audio and video clips ranging from 10 seconds to 5 minutes, failing to reflect the demands of real-world applications, where videos typically run for tens of minutes. To address this critical gap, we introduce LVOmniBench, a new benchmark designed specifically for the cross-modal comprehension of long-form audio and video. This dataset comprises high-quality videos sourced from open platforms that feature rich audio-visual dynamics. Through rigorous manual selection and annotation, LVOmniBench comprises 275 videos, ranging in duration from 10 to 90 minutes, and 1,014 question-answer (QA) pairs. LVOmniBench aims to rigorously evaluate the capabilities of OmniLLMs across domains, including long-term memory, temporal localization, fine-grained understanding, and multimodal perception. Our extensive evaluation reveals that current OmniLLMs encounter significant challenges when processing extended audio-visual inputs. Open-source models generally achieve accuracies below 35%, whereas Gemini 3 Pro reaches a peak accuracy of approximately 65%. We anticipate that this dataset, along with our empirical findings, will stimulate further research and the development of advanced models capable of resolving complex cross-modal understanding problems within long-form audio-visual contexts.

[597]  arXiv:2603.19218 [pdf, ps, other]
Title: Rethinking Vector Field Learning for Generative Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Taming diffusion models for generative segmentation has attracted increasing attention. While existing approaches primarily focus on architectural tweaks or training heuristics, there remains a limited understanding of the intrinsic mismatch between continuous flow matching objectives and discrete perception tasks. In this work, we revisit diffusion segmentation from the perspective of vector field learning. We identify two key limitations of the commonly used flow matching objective: gradient vanishing and trajectory traversing, which result in slow convergence and poor class separation. To tackle these issues, we propose a principled vector field reshaping strategy that augments the learned velocity field with a detached distance-aware correction term. This correction introduces both attractive and repulsive interactions, enhancing gradient magnitudes near centroids while preserving the original diffusion training framework. Furthermore, we design a computationally efficient, quasi-random category encoding scheme inspired by Kronecker sequences, which integrates seamlessly with an end-to-end pixel neural field framework for pixel-level semantic alignment. Extensive experiments consistently demonstrate significant improvements over vanilla flow matching approaches, substantially narrowing the performance gap between generative segmentation and strong discriminative specialists.
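
For readers unfamiliar with Kronecker sequences, the sketch below generates quasi-random codes of the form frac(k * alpha), where each component of alpha is irrational. It illustrates the general construction only; the choice of square roots of primes for alpha, the code dimension, and the function name are assumptions, and the abstract does not specify the paper's actual encoding scheme.

```python
import math

def kronecker_codes(num_classes, dim=3):
    """Return one quasi-random point in [0, 1)^dim per category.

    Uses x_k = frac(k * alpha) with alpha built from square roots of
    distinct primes (an illustrative choice, not the paper's).
    """
    primes = [2, 3, 5, 7, 11, 13, 17, 19]
    if dim > len(primes):
        raise ValueError("dim exceeds the number of built-in primes")
    alphas = [math.sqrt(p) % 1.0 for p in primes[:dim]]
    return [[(k * a) % 1.0 for a in alphas] for k in range(1, num_classes + 1)]

# Codes for a hypothetical 20-class segmentation task.
codes = kronecker_codes(20)
```

Such sequences fill the unit cube more evenly than independent random draws, which is one plausible reason to use them as well-separated class targets.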

[598]  arXiv:2603.19219 [pdf, ps, other]
Title: DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction and Understanding
Comments: Project Page: this https URL Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and understanding. DriveTok first obtains semantically rich visual features from vision foundation models and then transforms them into the scene tokens with 3D deformable cross-attention. For decoding, we employ a multi-view transformer to reconstruct multi-view features from the scene tokens and use multiple heads to obtain RGB, depth, and semantic reconstructions. We also add a 3D head directly on the scene tokens for 3D semantic occupancy prediction for better spatial awareness. With the multiple training objectives, DriveTok learns unified scene tokens that integrate semantic, geometric, and textural information for efficient multi-view tokenization. Extensive experiments on the widely used nuScenes dataset demonstrate that the scene tokens from DriveTok perform well on image reconstruction, semantic segmentation, depth prediction, and 3D occupancy prediction tasks.

[599]  arXiv:2603.19220 [pdf, ps, other]
Title: Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Comments: We release the model and data at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.

[600]  arXiv:2603.19221 [pdf, ps, other]
Title: Online Learning and Equilibrium Computation with Ranking Feedback
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Science and Game Theory (cs.GT)

Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on \emph{numeric} utility feedback from the environment, which may be unavailable in human-in-the-loop applications and/or may be restricted by privacy concerns. In this paper, we study an online learning model in which the learner only observes a \emph{ranking} over a set of proposed actions at each timestep. We consider two ranking mechanisms: rankings induced by the \emph{instantaneous} utility at the current timestep, and rankings induced by the \emph{time-average} utility up to the current timestep, under both \emph{full-information} and \emph{bandit} feedback settings. Using the standard external-regret metric, we show that sublinear regret is impossible with instantaneous-utility ranking feedback in general. Moreover, when the ranking model is relatively deterministic, \emph{i.e.}, under the Plackett-Luce model with a temperature that is sufficiently small, sublinear regret is also impossible with time-average utility ranking feedback. We then develop new algorithms that achieve sublinear regret under the additional assumption that the utility sequence has sublinear total variation. Notably, for full-information time-average utility ranking feedback, this additional assumption can be removed. As a consequence, when all players in a normal-form game follow our algorithms, repeated play yields an approximate coarse correlated equilibrium. We also demonstrate the effectiveness of our algorithms in an online large-language-model routing task.
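
The ranking feedback described here can be simulated with the Plackett-Luce model: items are drawn one at a time without replacement, each with probability proportional to exp(utility / temperature). The sketch below is a minimal illustration of that standard model; the function name and the toy utilities are hypothetical, and it does not implement the paper's learning algorithms.

```python
import math
import random

def sample_plackett_luce(utilities, temperature, rng):
    """Draw a ranking (best first) of action indices from a Plackett-Luce model."""
    remaining = list(range(len(utilities)))
    ranking = []
    while remaining:
        scaled = [utilities[i] / temperature for i in remaining]
        top = max(scaled)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scaled]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking

rng = random.Random(0)
utilities = [0.2, 0.9, 0.5]
# A very small temperature makes the ranking nearly deterministic.
sharp = sample_plackett_luce(utilities, 1e-3, rng)
# A larger temperature yields noisier rankings.
noisy = sample_plackett_luce(utilities, 1.0, rng)
```

The small-temperature regime corresponds to the "relatively deterministic" ranking model in the abstract, where the learner effectively sees the true utility order.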

[601]  arXiv:2603.19222 [pdf, ps, other]
Title: Spectrally-Guided Diffusion Noise Schedules
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral properties. By deriving theoretical bounds on the efficacy of minimum and maximum noise levels, we design ``tight'' noise schedules that eliminate redundant steps. During inference, we propose to conditionally sample such noise schedules. Experiments show that our noise schedules improve generative quality of single-stage pixel diffusion models, particularly in the low-step regime.

[602]  arXiv:2603.19223 [pdf, ps, other]
Title: F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques, we present models that are far more efficient than previous LLM-based embedding models while retaining competitive performance. Extensive evaluations confirm that F2LLM-v2-14B ranks first on 11 MTEB benchmarks, while the smaller models in the family also set a new state of the art for resource-constrained applications. To facilitate open-source embedding model research, we release all models, data, code, and intermediate checkpoints.

[603]  arXiv:2603.19224 [pdf, ps, other]
Title: EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
Comments: CVPR 2026, Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video object removal aims to eliminate dynamic target objects and their visual effects, such as deformation, shadows, and reflections, while restoring seamless backgrounds. Recent diffusion-based video inpainting and object removal methods can remove the objects but often struggle to erase these effects and to synthesize coherent backgrounds. Beyond method limitations, progress is further hampered by the lack of a comprehensive dataset that systematically captures common object effects across varied environments for training and evaluation. To address this, we introduce VOR (Video Object Removal), a large-scale dataset that provides diverse paired videos, each consisting of one video where the target object is present with its effects and a counterpart where the object and effects are absent, with corresponding object masks. VOR contains 60K high-quality video pairs from captured and synthetic sources, covers five effect types, and spans a wide range of object categories as well as complex, dynamic multi-object scenes. Building on VOR, we propose EffectErase, an effect-aware video object removal method that treats video object insertion as the inverse auxiliary task within a reciprocal learning scheme. The model includes task-aware region guidance that focuses learning on affected areas and enables flexible task switching. It also includes an insertion-removal consistency objective that encourages complementary behaviors and shared localization of effect regions and structural cues. Trained on VOR, EffectErase achieves superior performance in extensive experiments, delivering high-quality video object effect erasing across diverse scenarios.

[604]  arXiv:2603.19225 [pdf, ps, other]
Title: FinTradeBench: A Financial Reasoning Benchmark for LLMs
Authors: Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan, Santu Karmaker, Aritra Dutta (University of Central Florida)
Comments: 8 pages main text, 22 pages total (including references and appendix). 5 figures, 14 tables. Preprint under review. Code and data will be made available upon publication
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Computational Finance (q-fin.CP)

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data and rarely evaluate reasoning over how company stocks trade in the market or their interactions with fundamentals. To take advantage of the strengths of both approaches, we introduce FinTradeBench, a benchmark for evaluating financial reasoning that integrates company fundamentals and trading signals. FinTradeBench contains 1,400 questions grounded in NASDAQ-100 companies over a ten-year historical window. The benchmark is organized into three reasoning categories: fundamentals-focused, trading-signal-focused, and hybrid questions requiring cross-signal reasoning. To ensure reliability at scale, we adopt a calibration-then-scaling framework that combines expert seed questions, multi-model response generation, intra-model self-filtering, numerical auditing, and human-LLM judge alignment. We evaluate 14 LLMs under zero-shot prompting and retrieval-augmented settings and observe a clear performance gap. Retrieval substantially improves reasoning over textual fundamentals, but provides limited benefit for trading-signal reasoning. These findings highlight fundamental challenges in numerical and time-series reasoning for current LLMs and motivate future research in financial intelligence.

[605]  arXiv:2603.19226 [pdf, ps, other]
Title: Under One Sun: Multi-Object Generative Perception of Materials and Illumination
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce Multi-Object Generative Perception (MultiGP), a generative inverse rendering method for stochastic sampling of all radiometric constituents -- reflectance, texture, and illumination -- underlying object appearance from a single image. Our key idea to solve this inherently ambiguous radiometric disentanglement is to leverage the fact that while their texture and reflectance may differ, objects in the same scene are all lit by the same illumination. MultiGP exploits this consensus to produce samples of reflectance, texture, and illumination from a single image of known shapes based on four key technical contributions: a cascaded end-to-end architecture that combines image-space and angular-space disentanglement; Coordinated Guidance for diffusion convergence to a single consistent illumination estimate; Axial Attention applied to facilitate ``cross-talk'' between objects of different reflectance; and a Texture Extraction ControlNet to preserve high-frequency texture details while ensuring decoupling from estimated lighting. Experimental results demonstrate that MultiGP effectively leverages the complementary spatial and frequency characteristics of multiple object appearances to recover individual texture and reflectance as well as the common illumination.

[606]  arXiv:2603.19227 [pdf, ps, other]
Title: Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
Comments: Project Page: this https URL GitHub: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Prior motion generation largely follows two paradigms: continuous diffusion models that excel at kinematic control, and discrete token-based generators that are effective for semantic conditioning. To combine their strengths, we propose a three-stage framework comprising condition feature extraction (Perception), discrete token generation (Planning), and diffusion-based motion synthesis (Control). Central to this framework is MoTok, a diffusion-based discrete motion tokenizer that decouples semantic abstraction from fine-grained reconstruction by delegating motion recovery to a diffusion decoder, enabling compact single-layer tokens while preserving motion fidelity. For kinematic conditions, coarse constraints guide token generation during planning, while fine-grained constraints are enforced during control through diffusion-based optimization. This design prevents kinematic details from disrupting semantic token planning. On HumanML3D, our method significantly improves controllability and fidelity over MaskControl while using only one-sixth of the tokens, reducing trajectory error from 0.72 cm to 0.08 cm and FID from 0.083 to 0.029. Unlike prior methods that degrade under stronger kinematic constraints, ours improves fidelity, reducing FID from 0.033 to 0.014.

[607]  arXiv:2603.19228 [pdf, ps, other]
Title: SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
Comments: 24 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Current instruction-guided video editing models struggle to simultaneously balance precise semantic modifications with faithful motion preservation. While existing approaches rely on injecting explicit external priors (e.g., VLM features or structural conditions) to mitigate these issues, this reliance severely bottlenecks model robustness and generalization. To overcome this limitation, we present SAMA (factorized Semantic Anchoring and Motion Alignment), a framework that factorizes video editing into semantic anchoring and motion modeling. First, we introduce Semantic Anchoring, which establishes a reliable visual anchor by jointly predicting semantic tokens and video latents at sparse anchor frames, enabling purely instruction-aware structural planning. Second, Motion Alignment pre-trains the same backbone on motion-centric video restoration pretext tasks (cube inpainting, speed perturbation, and tube shuffle), enabling the model to internalize temporal dynamics directly from raw videos. SAMA is optimized with a two-stage pipeline: a factorized pre-training stage that learns inherent semantic-motion representations without paired video-instruction editing data, followed by supervised fine-tuning on paired editing data. Remarkably, the factorized pre-training alone already yields strong zero-shot video editing ability, validating the proposed factorization. SAMA achieves state-of-the-art performance among open-source models and is competitive with leading commercial systems (e.g., Kling-Omni). Code, models, and datasets will be released.

[608]  arXiv:2603.19229 [pdf, ps, other]
Title: NavTrust: Benchmarking Trustworthiness for Embodied Navigation
Comments: Project Website: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input modalities, including RGB, depth, and instructions, in realistic scenarios and evaluates their impact on navigation performance. To the best of our knowledge, NavTrust is the first benchmark that exposes embodied navigation agents to diverse RGB-Depth corruptions and instruction variations in a unified framework. Our extensive evaluation of seven state-of-the-art approaches reveals substantial performance degradation under realistic corruptions, which highlights critical robustness gaps and provides a roadmap toward more trustworthy embodied navigation systems. Furthermore, we systematically evaluate four distinct mitigation strategies to enhance robustness against RGB-Depth and instruction corruptions. Our base models include Uni-NaVid and ETPNav. We deployed them on a real mobile robot and observed improved robustness to corruptions. The project website is: https://navtrust.github.io.

[609]  arXiv:2603.19231 [pdf, ps, other]
Title: MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, often sacrificing scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without external motion templates or multi-stage pipelines. Extensive experiments on PartNet-Mobility demonstrate that MonoArt achieves state-of-the-art performance in both reconstruction accuracy and inference speed. The framework further generalizes to robotic manipulation and articulated scene reconstruction.

[610]  arXiv:2603.19232 [pdf, ps, other]
Title: Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
Comments: Accepted by CVPR 2026 main track; Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual generation with discrete tokens has gained significant attention as it enables a unified token prediction paradigm shared with language models, promising seamless multimodal architectures. However, current discrete generation methods remain limited to low-dimensional latent tokens (typically 8-32 dims), sacrificing the semantic richness essential for understanding. While high-dimensional pretrained representations (768-1024 dims) could bridge this gap, their discrete generation poses fundamental challenges. In this paper, we present Cubic Discrete Diffusion (CubiD), the first discrete generation model for high-dimensional representations. CubiD performs fine-grained masking throughout the high-dimensional discrete representation -- any dimension at any position can be masked and predicted from partial observations. This enables the model to learn rich correlations both within and across spatial positions, with the number of generation steps fixed at $T$ regardless of feature dimensionality, where $T \ll hwd$. On ImageNet-256, CubiD achieves state-of-the-art discrete generation with strong scaling behavior from 900M to 3.7B parameters. Crucially, we validate that these discretized tokens preserve original representation capabilities, demonstrating that the same discrete tokens can effectively serve both understanding and generation tasks. We hope this work will inspire future research toward unified multimodal architectures. Code is available at: https://github.com/YuqingWang1029/CubiD.

[611]  arXiv:2603.19233 [pdf, ps, other]
Title: Not All Features Are Created Equal: A Mechanistic Study of Vision-Language-Action Models
Comments: Accepted to Multimodal Intelligence Workshop @ ICLR
Subjects: Robotics (cs.RO)

Vision-Language-Action (VLA) models combine perception, language, and motor control in a single architecture, yet how they translate multimodal inputs into actions remains poorly understood. We apply activation injection, sparse autoencoders (SAEs), and linear probes to six models spanning 80M--7B parameters across 394,000+ rollout episodes on four benchmarks. The visual pathway dominates action generation across all architectures: injecting baseline activations into null-prompt episodes recovers near-identical behavior, while cross-task injection steers robots toward source-task positions (99.8\% of X-VLA episodes align with the source trajectory), exposing spatially bound motor programs tied to scene coordinates rather than abstract task representations. Language sensitivity depends on task structure, not model design: when visual context uniquely specifies the task, language is ignored; when multiple goals share a scene, language becomes essential (X-VLA \texttt{libero\_goal}: 94\%$\to$10\% under wrong prompts vs.\ \texttt{libero\_object}: 60--100\% regardless). In all three multi-pathway architectures ($\pi_{0.5}$, SmolVLA, GR00T), expert pathways encode motor programs while VLM pathways encode goal semantics ($2\times$ greater behavioral displacement from expert injection), and subspace injection confirms these occupy separable activation subspaces. Per-token SAE processing is essential for action fidelity on most architectures, though mean-pooling improves fidelity on X-VLA. Contrastive identification recovers 82+ manipulation concepts, and causal ablation reveals sensitivity spanning 28--92\% zero-effect rates independent of representation width. We release \textbf{Action Atlas} (https://action-atlas.com) for interactive exploration of VLA representations across all six models.

[612]  arXiv:2603.19234 [pdf, ps, other]
Title: Matryoshka Gaussian Splatting
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

The ability to render scenes at adjustable fidelity from a single model, known as level of detail (LoD), is crucial for practical deployment of 3D Gaussian Splatting (3DGS). Existing discrete LoD methods expose only a limited set of operating points, while concurrent continuous LoD approaches enable smoother scaling but often suffer noticeable quality degradation at full capacity, making LoD a costly design decision. We introduce Matryoshka Gaussian Splatting (MGS), a training framework that enables continuous LoD for standard 3DGS pipelines without sacrificing full-capacity rendering quality. MGS learns a single ordered set of Gaussians such that rendering any prefix, the first k splats, produces a coherent reconstruction whose fidelity improves smoothly with increasing budget. Our key idea is stochastic budget training: each iteration samples a random splat budget and optimises both the corresponding prefix and the full set. This strategy requires only two forward passes and introduces no architectural modifications. Experiments across four benchmarks and six baselines show that MGS matches the full-capacity performance of its backbone while enabling a continuous speed-quality trade-off from a single model. Extensive ablations on ordering strategies, training objectives, and model capacity further validate the designs.
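
The stochastic budget training idea above can be sketched as a loss that combines one randomly sized prefix with the full ordered set. The snippet below is a toy illustration only: `render` is a hypothetical stand-in that simply sums per-splat contributions, and the equal weighting of the two loss terms is an assumption, not a detail taken from the abstract.

```python
import random


def render(splats):
    """Toy stand-in for a renderer: sum the per-splat contribution vectors."""
    dim = len(splats[0])
    return [sum(s[j] for s in splats) for j in range(dim)]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def stochastic_budget_loss(splats, target, rng):
    """Sample a splat budget k and score both the first-k prefix and the full set."""
    k = rng.randint(1, len(splats))                 # random budget (prefix length)
    prefix_loss = mse(render(splats[:k]), target)   # first forward pass
    full_loss = mse(render(splats), target)         # second forward pass
    return k, prefix_loss + full_loss               # equal weighting is assumed


# Hypothetical ordered "splats": the first one already explains the target.
splats = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
k, loss = stochastic_budget_loss(splats, [1.0, 2.0], random.Random(0))
```

Because every prefix length can be sampled, minimising this loss over many iterations pushes each prefix, not just the full set, toward a usable reconstruction.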

[613]  arXiv:2603.19235 [pdf, ps, other]
Title: Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
Comments: 31 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which are limited by data scarcity and generalization challenges. In this work, we propose a paradigm shift by leveraging the implicit spatial prior within large-scale video generation models. We posit that to synthesize temporally coherent videos, these models inherently learn robust 3D structural priors and physical laws. We introduce VEGA-3D (Video Extracted Generative Awareness), a plug-and-play framework that repurposes a pre-trained video diffusion model as a Latent World Simulator. By extracting spatiotemporal features from intermediate noise levels and integrating them with semantic representations via a token-level adaptive gated fusion mechanism, we enrich MLLMs with dense geometric cues without explicit 3D supervision. Extensive experiments across 3D scene understanding, spatial reasoning, and embodied manipulation benchmarks demonstrate that our method outperforms state-of-the-art baselines, validating that generative priors provide a scalable foundation for physical-world understanding. Code is publicly available at https://github.com/H-EmbodVis/VEGA-3D.

Cross-lists for Fri, 20 Mar 26

[614]  arXiv:2507.17880 (cross-list from quant-ph) [pdf, ps, other]
Title: Stability of Continuous Time Quantum Walks in Complex Networks
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Computational Physics (physics.comp-ph)

We investigate the stability of continuous-time quantum walks (CTQW) across cycle, complete, star, Erd\H{o}s-R\'enyi, small-world, and scale-free topologies under energy-based intrinsic decoherence, node-based Haken-Strobl noise, and edge-based quantum stochastic walk (QSW) decoherence. Defining stability as the preservation of quantum properties, we characterize it using node probabilities, $\ell_1$-norm of coherence, fidelity, quantum-classical distance, and von Neumann entropy. Our results show that intrinsic decoherence preserves coherence longest while QSW causes rapid decay. Stability rankings vary and depend on the decoherence type, the network structure, and the properties of the node where the walker is initialized, particularly in heterogeneous networks. Densely connected networks, such as the complete graph, and heterogeneous networks, such as star and scale-free networks, are stable under Haken-Strobl noise but become uniquely fragile under QSW when initialized on high-degree nodes. However, these same networks, due to their inherent localization, exhibit lower coherence in the noiseless regime, highlighting a fundamental trade-off between localization and coherence. Furthermore, the centrality of the initialization node has a pronounced impact on relaxation time and stability measures, underscoring the critical role of local topological features in quantum dynamics.
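
One of the stability measures listed above, the $\ell_1$-norm of coherence, is commonly defined as the sum of the absolute values of the off-diagonal entries of the density matrix. The sketch below assumes that standard definition and a density matrix expressed in the node basis; the helper name `l1_coherence` is hypothetical.

```python
def l1_coherence(rho):
    """Sum of |rho[i][j]| over all off-diagonal entries of a square density matrix."""
    n = len(rho)
    return sum(abs(rho[i][j]) for i in range(n) for j in range(n) if i != j)


# Equal superposition over two nodes: maximally coherent for dimension 2.
plus_state = [[0.5, 0.5], [0.5, 0.5]]
# Fully decohered (classical) mixture over the same two nodes.
mixed_state = [[0.5, 0.0], [0.0, 0.5]]

coherent_value = l1_coherence(plus_state)
decohered_value = l1_coherence(mixed_state)
```

Tracking this quantity over time under each noise model gives a simple picture of how quickly coherence decays.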

[615]  arXiv:2603.18022 (cross-list from math.OC) [pdf, ps, other]
Title: Using Laplace Transform To Optimize the Hallucination of Generation Models
Comments: Corresponding author: Xujing Yao (xjyao@njtech.edu.cn)
Journal-ref: In 2024 18th International Conference on Control, Automation, Robotics and Vision (ICARCV) (pp. 447-453). IEEE
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

To explore the feasibility of avoiding the confident error (or hallucination) of generation models (GMs), we formalise GMs as a class of stochastic dynamical systems through the lens of control theory. Numerous factors contribute to hallucination in the learning process of GMs, and control theory allows us to analyse their system functions and system responses. Due to the high complexity of GMs under various optimization methods, we cannot obtain their Laplace transform solutions, but from a macroscopic perspective, simulating the source response provides a virtual way to address the hallucination of GMs. We also find that the training progress is consistent with the corresponding system response, which offers a useful way to develop a better optimization component. Finally, the hallucination problem of GMs is fundamentally optimized by using Laplace transform analysis.

[616]  arXiv:2603.18023 (cross-list from eess.AS) [pdf, ps, other]
Title: PCOV-KWS: Multi-task Learning for Personalized Customizable Open Vocabulary Keyword Spotting
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)

As advancements in technologies like Internet of Things (IoT), Automatic Speech Recognition (ASR), Speaker Verification (SV), and Text-to-Speech (TTS) lead to increased usage of intelligent voice assistants, the demand for privacy and personalization has escalated. In this paper, we introduce a multi-task learning framework for personalized, customizable open-vocabulary Keyword Spotting (PCOV-KWS). This framework employs a lightweight network to simultaneously perform Keyword Spotting (KWS) and SV to address personalized KWS requirements. We have integrated a training criterion distinct from softmax-based loss, transforming multi-class classification into multiple binary classifications, which eliminates inter-category competition, while an optimization strategy for multi-task loss weighting is employed during training. We evaluated our PCOV-KWS system on multiple datasets, demonstrating that it outperforms the baselines while also requiring fewer parameters and lower computational resources.
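
To make the contrast with softmax-based loss concrete, the sketch below compares softmax cross-entropy with one independent sigmoid binary cross-entropy term per class. The abstract does not name the exact criterion used, so the sigmoid one-vs-rest form here is an assumed illustration of "multiple binary classifications", not the paper's loss.

```python
import math


def softmax_ce(logits, target):
    """Softmax cross-entropy: every logit enters the shared normaliser."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_z - logits[target]


def binary_losses(logits, target):
    """One sigmoid binary cross-entropy term per class (one-vs-rest labels)."""
    losses = []
    for i, x in enumerate(logits):
        y = 1.0 if i == target else 0.0
        # Numerically stable BCE-with-logits: max(x, 0) - x*y + log(1 + exp(-|x|)).
        losses.append(max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x))))
    return losses


# Raise the logit of a competing class (index 1) and compare the effect.
before = [2.0, 0.0, -1.0]
after = [2.0, 3.0, -1.0]
target_term_before = binary_losses(before, 0)[0]
target_term_after = binary_losses(after, 0)[0]
```

In the binary formulation, the loss term for the target class is unchanged when a competing class's logit rises, whereas the softmax loss for the same target increases; this is the sense in which inter-category competition is removed.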

[617]  arXiv:2603.18024 (cross-list from eess.AS) [pdf, ps, other]
Title: ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)

Current keyword spotting systems primarily use phoneme-level matching to distinguish confusable words but ignore user-specific pronunciation traits like prosody (intonation, stress, rhythm). This paper presents ProKWS, a novel framework integrating fine-grained phoneme learning with personalized prosody modeling. We design a dual-stream encoder where one stream derives robust phonemic representations through contrastive learning, while the other extracts speaker-specific prosodic patterns. A collaborative fusion module dynamically combines phonemic and prosodic information, enhancing adaptability across acoustic environments. Experiments show that ProKWS delivers highly competitive performance, comparable to state-of-the-art models on standard benchmarks, and demonstrates strong robustness for personalized keywords with tone and intent variations.

[618]  arXiv:2603.18026 (cross-list from eess.SP) [pdf, ps, other]
Title: Physically Accurate Differentiable Inverse Rendering for Radio Frequency Digital Twin
Subjects: Signal Processing (eess.SP); Graphics (cs.GR); Machine Learning (cs.LG)

Digital twins, virtual simulated replicas of physical scenes, are transforming system design across industries. However, their potential in radio frequency (RF) systems has been limited by the non-differentiable nature of conventional RF simulators. The visibility of propagation paths causes severe discontinuities, and differentiable rendering techniques from computer graphics cannot easily transfer due to point-source antennas and dominant specular reflections. In this paper, we present RFDT, a physically based differentiable RF simulation framework that enables gradient-based interaction between virtual and physical worlds. RFDT resolves discontinuities with a physically grounded edge-diffraction transition function, and mitigates non-convexity from Fourier-domain processing through a signal domain transform surrogate. Our implementation demonstrates RFDT's ability to accurately reconstruct digital twins from real RF measurements. Moreover, RFDT can augment diverse downstream applications, such as test-time adaptation of machine learning-based RF sensing and physically constrained optimization of communication systems.

[619]  arXiv:2603.18027 (cross-list from eess.SP) [pdf, ps, other]
Title: KD-EKF: Knowledge-Distilled Adaptive Covariance EKF for Robust UWB/PDR Indoor Localization
Comments: 16 pages, 7 figures
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Ultra-wideband (UWB) indoor localization provides centimeter-level accuracy and low latency, but its measurement reliability degrades severely under Non-Line-of-Sight (NLOS) conditions, leading to meter-scale ranging errors and inconsistent uncertainty characteristics. Inertial Measurement Unit (IMU)-based Pedestrian Dead Reckoning (PDR) complements UWB by providing infrastructure-free motion estimation; however, its error accumulates nonlinearly over time due to bias and noise propagation. Fusion methods based on Extended Kalman Filters (EKF) and Particle Filters (PF) can improve average localization accuracy through probabilistic state estimation. However, these approaches typically rely on manually tuned measurement covariances. Such fixed or heuristically tuned parameters are hard to sustain across varying indoor layouts, NLOS ratios, and motion patterns, leading to limited robustness and poor generalization of measurement uncertainty modeling in heterogeneous environments. To address this limitation, this work proposes an adaptive measurement covariance scaling framework in which reliability cues are learned from historical UWB/PDR trajectories. A large teacher model is employed offline to generate temporally consistent next-position predictions from structured UWB/PDR sequences, and this behavior is distilled into a lightweight student model suitable for real-time deployment. The student model continuously regulates EKF measurement covariances based on prediction residuals, enabling environment-aware fusion without manual re-tuning. Experimental results demonstrate that the proposed KD-EKF framework significantly reduces localization error, suppresses error spikes during Line-of-Sight (LOS)/NLOS transitions, and mitigates long-term drift compared to fixed-parameter EKF, thereby improving measurement robustness across diverse indoor environments.
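
The covariance-regulation idea above can be illustrated with a scalar Kalman measurement update whose measurement noise is inflated when the residual is large. This is a simplified linear, one-dimensional sketch rather than the paper's EKF; the inflation rule in `scaled_R` and the parameter `sigma` are hypothetical choices made for illustration.

```python
def kalman_update(x, P, z, R):
    """Scalar Kalman measurement update with observation model H = 1."""
    K = P / (P + R)            # Kalman gain: how much to trust the measurement
    x_new = x + K * (z - x)    # corrected state estimate
    P_new = (1.0 - K) * P      # corrected state variance
    return x_new, P_new, K


def scaled_R(R_base, residual, sigma):
    """Hypothetical rule: inflate R quadratically in the normalised residual."""
    return R_base * (1.0 + (residual / sigma) ** 2)


# Small residual (LOS-like) versus large residual (NLOS-like) for the same prior.
_, _, gain_los = kalman_update(0.0, 1.0, 0.1, scaled_R(0.01, 0.1, 0.5))
_, _, gain_nlos = kalman_update(0.0, 1.0, 3.0, scaled_R(0.01, 3.0, 0.5))
```

A large residual yields a larger measurement covariance and therefore a smaller gain, so an outlying UWB range pulls the estimate less than a consistent one.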

[620]  arXiv:2603.18038 (cross-list from quant-ph) [pdf, ps, other]
Title: Advanced Quantum Annealing for the Bi-Objective Traveling Thief Problem: An $\varepsilon$-Constraint-based Approach
Comments: 14 pages, 5 figures, and 3 tables. Accepted by IEEE Transactions on Quantum Engineering
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

This paper addresses the Bi-Objective Traveling Thief Problem (BI-TTP), a challenging multi-objective optimization problem that requires the simultaneous optimization of travel cost and item profit. Conventional methods for the BI-TTP often face severe scalability issues due to the complex interdependence between routing and packing decisions, as well as the inherent complexity and large problem size. These difficulties render classical computing approaches increasingly inapplicable. To tackle this, we propose an advanced hybrid approach that combines quantum annealing (QA) with the $\varepsilon$-constraint method. Specifically, we reformulate the bi-objective problem into a single-objective formulation by restricting the second objective through adjustable $\varepsilon$-levels, determined within established upper and lower bounds. The resulting subproblem involves a sum of fractional terms, which is reformulated with auxiliary variables into an equivalent form. Subsequently, the equivalent formulation is transformed into a Quadratic Unconstrained Binary Optimization (QUBO) model, enabling direct solution via a quantum annealing (QA) solver. The solutions obtained from the quantum annealer are subsequently refined using a tailored heuristic procedure to further enhance overall performance. By leveraging the flexibility in selecting $\varepsilon$ parameters, our approach effectively captures a broad Pareto front, enhancing solution diversity. Experimental results on benchmark instances demonstrate that the proposed method effectively balances two objectives and outperforms baseline approaches in time efficiency.
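
The $\varepsilon$-constraint step described above can be sketched on a tiny enumerated problem: one objective is optimised while the other is bounded by a sweep of $\varepsilon$ levels. In this toy version travel cost is minimised subject to a profit floor, which is an assumed assignment of the two objectives, and a brute-force search stands in for the QUBO and quantum annealing solver.

```python
def epsilon_constraint_front(solutions, eps_levels):
    """Trace a Pareto front from (cost, profit) pairs via the epsilon-constraint method.

    For each epsilon, minimise cost subject to profit >= epsilon.
    """
    front = []
    for eps in eps_levels:
        feasible = [s for s in solutions if s[1] >= eps]
        if not feasible:
            continue  # this epsilon level admits no solution
        # Break cost ties in favour of higher profit.
        best = min(feasible, key=lambda s: (s[0], -s[1]))
        if best not in front:
            front.append(best)
    return front


# Hypothetical candidate solutions as (travel cost, item profit).
candidates = [(10, 5), (12, 8), (15, 8), (20, 12), (11, 4)]
front = epsilon_constraint_front(candidates, [0, 6, 10])
```

Each $\varepsilon$ level yields one single-objective subproblem, and sweeping the levels between the lower and upper bounds of the constrained objective recovers different trade-off points.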

[621]  arXiv:2603.18042 (cross-list from eess.IV) [pdf, ps, other]
Title: A Novel Framework using Intuitionistic Fuzzy Logic with U-Net and U-Net++ Architecture: A Case Study of MRI Brain Image Segmentation
Comments: 13 pages, 8 figures
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

Accurate segmentation of brain images from magnetic resonance imaging (MRI) scans plays a pivotal role in brain image analysis and the diagnosis of neurological disorders. Deep learning algorithms, particularly U-Net and U-Net++, are widely used for image segmentation. However, they have difficulty dealing with uncertainty in images. To address this challenge, this work integrates intuitionistic fuzzy logic into U-Net and U-Net++, yielding two novel models named IFS U-Net and IFS U-Net++. These models accept input data in an intuitionistic fuzzy representation to manage uncertainty arising from vagueness and imprecise data. This approach effectively handles tissue ambiguity caused by the partial volume effect and boundary uncertainties. To evaluate the effectiveness of IFS U-Net and IFS U-Net++, experiments are conducted on two publicly available MRI brain datasets: the Internet Brain Segmentation Repository (IBSR) and the Open Access Series of Imaging Studies (OASIS). Segmentation performance is quantitatively assessed using Accuracy, Dice Coefficient, and Intersection over Union (IoU). The results demonstrate that the proposed architectures consistently improve segmentation performance by effectively addressing uncertainty.

[622]  arXiv:2603.18076 (cross-list from q-bio.BM) [pdf, ps, other]
Title: Generative Replica-Exchange: A Flow-based Framework for Accelerating Replica Exchange Simulations
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Replica exchange (REX) is one of the most widely used enhanced sampling methodologies, yet its efficiency is limited by the requirement for a large number of intermediate temperature replicas. Here we present Generative Replica Exchange (GREX), which integrates deep generative models into the REX framework to eliminate this temperature ladder. Drawing inspiration from reservoir replica exchange (res-REX), GREX utilizes trained normalizing flows to generate high-temperature configurations on demand and map them directly to the target distribution using the potential energy as a constraint, without requiring target-temperature training data. This approach reduces production simulations to a single replica at the target temperature while maintaining thermodynamic rigor through Metropolis exchange acceptance. We validate GREX on three benchmark systems of increasing complexity, highlighting its superior efficiency and practical applicability for molecular simulations.
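
For reference, the Metropolis exchange acceptance used in standard replica exchange can be written in a few lines. This is the textbook criterion for swapping configurations between two inverse temperatures, not the GREX acceptance rule itself, which may include additional terms from the normalizing flow; the function name is an assumption.

```python
import math


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Textbook REX swap probability: min(1, exp[(b_i - b_j)(E_i - E_j)]).

    beta_i, beta_j     : inverse temperatures of the two replicas
    energy_i, energy_j : potential energies of their current configurations
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# Cold replica (beta = 1.0) holds the higher-energy configuration: always swap.
p_favourable = swap_acceptance(1.0, 0.5, 2.0, 1.0)
# Cold replica holds the lower-energy configuration: swap only sometimes.
p_unfavourable = swap_acceptance(1.0, 0.5, 1.0, 2.0)
```

Accepting swaps with this probability preserves the Boltzmann distribution at each temperature, which is the sense in which exchange moves keep the sampling thermodynamically rigorous.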

[623]  arXiv:2603.18097 (cross-list from quant-ph) [pdf, ps, other]
Title: One Key Good, L Keys Better: List Decoding Meets Quantum Privacy Amplification
Comments: 18 pages
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

We introduce list privacy amplification (LPA), a relaxation of the final step of quantum key distribution (QKD) in which Alice and Bob extract a list of $L$ candidate keys from a raw string correlated with an eavesdropper Eve, with the guarantee that at least one key is perfectly secret while Eve cannot identify which. This parallels list decoding in error-correcting codes: relaxing unique decoding to list decoding increases the decoding radius; analogously, list extraction increases achievable key length beyond the standard quantum leftover hash lemma (QLHL). Within the abstract cryptography framework, we formalise LPA and prove the \emph{Quantum List Leftover Hash Lemma} (QLLHL): an $L$-list of $\ell$-bit keys can be extracted from an $n$-bit source with smooth min-entropy $k$ iff \[ \ell \le k + \log L - 2\log(1/\epsilon) - 3, \] yielding a tight additive $\log L$ gain over QLHL. This gain arises because the index of the secure key is chosen after hashing and hidden from Eve, effectively contributing $\log L$ bits of entropy. Applying QLLHL to BB84-type QKD, a list size $L = 2^{\alpha n'}$ increases the tolerable phase-error threshold from $h^{-1}(1 - h(e_b))$ to $h^{-1}(1 - h(e_b) + \alpha)$, exceeding the standard $\approx 11\%$ bound for any $\alpha > 0$. We prove tightness via a matching intercept-resend attack, establish composability with Wegman--Carter authentication, and present two constructions: a polynomial inner-product hash over $\mathbb{F}_{2^m}$ and a Toeplitz-based variant, running in $O(nL)$ and $O(nL \log n)$ time.
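
The additive $\log L$ gain in the bound above can be checked with a short calculation. This is a sketch that assumes base-2 logarithms (the abstract does not state the base); the function name and the example numbers are hypothetical.

```python
import math


def max_key_length(k, L, eps):
    """Largest integer l with l <= k + log2(L) - 2*log2(1/eps) - 3.

    k   : smooth min-entropy of the source, in bits
    L   : list size
    eps : security parameter
    Base-2 logarithms are an assumption of this sketch.
    """
    return math.floor(k + math.log2(L) - 2 * math.log2(1 / eps) - 3)


single_key = max_key_length(k=1000, L=1, eps=2 ** -20)     # L = 1: no list gain
list_keys = max_key_length(k=1000, L=1024, eps=2 ** -20)   # L = 2^10
gain = list_keys - single_key
```

With $L = 2^{10}$ the extractable length grows by exactly $\log_2 L = 10$ bits over the single-key case.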

[624]  arXiv:2603.18109 (cross-list from astro-ph.HE) [pdf, ps, other]
Title: Discovery of Bimodal Drift Rate Structure in FRB 20240114A: Evidence for Dual Emission Regions
Authors: Santosh Arron
Comments: 9 pages, 4 figures, accepted for publication in The Astrophysical Journal
Subjects: High Energy Astrophysical Phenomena (astro-ph.HE); Artificial Intelligence (cs.AI)

We report the discovery of bimodal structure in the drift rate distribution of upward-drifting burst clusters from the hyperactive repeating fast radio burst FRB 20240114A. Using unsupervised machine learning (UMAP dimensionality reduction combined with HDBSCAN density-based clustering) applied to 233 upward-drifting burst clusters from the FAST telescope dataset, we identify a distinct subpopulation of 45 burst clusters (Cluster C1) with mean drift rates 2.5x higher than typical upward-drifting burst clusters (245.6 vs 98.1 MHz/ms). Gaussian mixture modeling reveals strong evidence for bimodality (delta-BIC = 296.6), with clearly separated modes (Ashman's D = 2.70 > 2) and a statistically significant gap in the distribution (11.3 sigma). Crucially, we demonstrate that this bimodality persists when restricting the analysis to single-component (U1) burst clusters only (delta-BIC = 19.9, Ashman's D = 2.71), confirming that the result is not an artifact of combining single- and multi-component burst clusters with different drift rate definitions. The extreme-drift subpopulation also exhibits systematically lower peak frequencies (-7%), shorter durations (-29%), and distinct clustering in multi-dimensional feature space. These findings are suggestive of two spatially separated emission regions in the magnetosphere, each producing upward-drifting burst clusters with distinct physical characteristics, although confirmation requires observations from additional epochs and sources.
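
Ashman's D, used above as a separation criterion, is commonly defined for two Gaussian components as $D = \sqrt{2}\,|\mu_1 - \mu_2| / \sqrt{\sigma_1^2 + \sigma_2^2}$, with $D > 2$ taken to indicate cleanly separated modes. A minimal sketch follows; the function name and the toy inputs are assumptions, and the component widths from the paper are not given in the abstract.

```python
import math


def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D for two Gaussian components (means and standard deviations)."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1 ** 2 + sigma2 ** 2)


# Toy values only: two unit-width modes whose means are two widths apart
# sit exactly at the D = 2 threshold.
d_threshold = ashman_d(0.0, 1.0, 2.0, 1.0)
# Moving the means further apart increases the separation.
d_wide = ashman_d(0.0, 1.0, 4.0, 1.0)
```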

[625]  arXiv:2603.18114 (cross-list from stat.ME) [pdf, ps, other]
Title: Transfer Learning for Contextual Joint Assortment-Pricing under Cross-Market Heterogeneity
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)

We study transfer learning for contextual joint assortment-pricing under a multinomial logit choice model with bandit feedback. A seller operates across multiple related markets and observes only posted prices and realized purchases. While data from source markets can accelerate learning in a target market, cross-market differences in customer preferences may introduce systematic bias if pooled indiscriminately.
We model heterogeneity through a structured utility shift, where markets share a common contextual utility structure but differ along a sparse set of latent preference coordinates. Building on this, we develop Transfer Joint Assortment-Pricing (TJAP), a bias-aware framework that combines aggregate-then-debias estimation with a UCB-style policy. TJAP constructs two-radius confidence bounds that separately capture statistical uncertainty and transfer-induced bias, uniformly over continuous prices.
We establish matching minimax regret bounds of order $\tilde{O}\!\left(d\sqrt{\frac{T}{1+H}} + s_0\sqrt{T}\right)$, revealing a transparent variance-bias tradeoff: transfer accelerates learning along shared preference directions, while heterogeneous components impose an irreducible adaptation cost. Numerical experiments corroborate the theory, showing that TJAP outperforms both target-only learning and naive pooling while remaining robust to cross-market differences.

[626]  arXiv:2603.18123 (cross-list from eess.IV) [pdf, ps, other]
Title: Understanding Task Aggregation for Generalizable Ultrasound Foundation Models
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

Foundation models promise to unify multiple clinical tasks within a single framework, but recent ultrasound studies report that unified models can underperform task-specific baselines. We hypothesize that this degradation arises not from model capacity limitations, but from task aggregation strategies that ignore interactions between task heterogeneity and available training data scale. In this work, we systematically analyze when heterogeneous ultrasound tasks can be jointly learned without performance loss, establishing practical criteria for task aggregation in unified clinical imaging models. We introduce M2DINO, a multi-organ, multi-task framework built on DINOv3 with task-conditioned Mixture-of-Experts blocks for adaptive capacity allocation. We systematically evaluate 27 ultrasound tasks spanning segmentation, classification, detection, and regression under three paradigms: task-specific, clinically-grouped, and all-task unified training. Our results show that aggregation effectiveness depends strongly on training data scale. While clinically-grouped training can improve performance in data-rich settings, it may induce substantial negative transfer in low-data settings. In contrast, all-task unified training exhibits more consistent performance across clinical groups. We further observe that task sensitivity varies by task type in our experiments: segmentation shows the largest performance drops compared with regression and classification. These findings provide practical guidance for ultrasound foundation models, emphasizing that aggregation strategies should jointly consider training data availability and task characteristics rather than relying on clinical taxonomy alone.

[627]  arXiv:2603.18136 (cross-list from quant-ph) [pdf, ps, other]
Title: Towards sample-optimal learning of bosonic Gaussian quantum states
Comments: 59 pages, 3 figures, 1 table. Comments welcome
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Machine Learning (cs.LG); Mathematical Physics (math-ph)

Continuous-variable systems enable key quantum technologies in computation, communication, and sensing. Bosonic Gaussian states emerge naturally in various such applications, including gravitational-wave and dark-matter detection. A fundamental question is how to characterize an unknown bosonic Gaussian state from as few samples as possible. Despite decades-long exploration, the ultimate efficiency limit remains unclear. In this work, we study the necessary and sufficient number of copies to learn an $n$-mode Gaussian state, with energy less than $E$, to $\varepsilon$ trace distance with high probability. We prove a lower bound of $\Omega(n^3/\varepsilon^2)$ for Gaussian measurements, matching the best known upper bound up to doubly-log energy dependence, and ${\Omega}(n^2/\varepsilon^2)$ for arbitrary measurements. We further show an upper bound of $\widetilde{O}(n^2/\varepsilon^2)$ given that the Gaussian state is promised to be either pure or passive. Interestingly, while Gaussian measurements suffice for nearly optimal learning of pure Gaussian states, non-Gaussian measurements are provably required for optimal learning of passive Gaussian states. Finally, focusing on learning single-mode Gaussian states via non-entangling Gaussian measurements, we provide a nearly tight bound of $\widetilde\Theta(E/\varepsilon^2)$ for any non-adaptive schemes, showing adaptivity is indispensable for nearly energy-independent scaling. As a byproduct, we establish sharp bounds on the trace distance between Gaussian states in terms of the total variation distance between their Wigner distributions, and obtain a nearly tight sample complexity bound for learning the Wigner distribution of any Gaussian state to $\varepsilon$ total variation distance. Our results greatly advance quantum learning theory in the bosonic regimes and have practical impact in quantum sensing and benchmarking applications.

[628]  arXiv:2603.18145 (cross-list from astro-ph.IM) [pdf, ps, other]
Title: Setting SAIL: Leveraging Scientist-AI-Loops for Rigorous Visualization Tools
Comments: 10 pages (+ references), 4 figures. Interactive visualizations available at: this https URL and this https URL
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Human-Computer Interaction (cs.HC)

Scientists across all disciplines share a common challenge: the divide between their theoretical knowledge and the specialized skills and time needed to build interactive tools to communicate this expertise. While large language models (LLMs) offer unparalleled acceleration in code generation, they frequently prioritize functional syntax over scientific accuracy, risking visually convincing but scientifically invalid results. This work advocates the Scientist-AI-Loop (SAIL), a framework designed to harness this speed without compromising rigor. By separating domain logic from code syntax, SAIL enables researchers to maintain strict oversight of scientific concepts and constraints while delegating code implementation to AI. We illustrate this approach through two open-source, browser-based astrophysics tools: an interactive gravitational lensing visualization and a large-scale structure formation sandbox, both publicly available. Our methodology condensed development to mere days while maintaining scientific integrity. We specifically address failure modes where AI-generated code neglects phenomenological boundaries or scientific validity. While cautioning that research-grade code requires stringent protocols, we demonstrate through two examples that SAIL provides an effective code generation workflow for outreach, teaching, professional presentations, and early-stage research prototyping. This framework contributes to a foundation for the further development of AI-assisted scientific software.

[629]  arXiv:2603.18168 (cross-list from stat.ML) [pdf, ps, other]
Title: ResNets of All Shapes and Sizes: Convergence of Training Dynamics in the Large-scale Limit
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We establish convergence of the training dynamics of residual neural networks (ResNets) to their joint infinite depth L, hidden width M, and embedding dimension D limit. Specifically, we consider ResNets with two-layer perceptron blocks in the maximal local feature update (MLU) regime and prove that, after a bounded number of training steps, the error between the ResNet and its large-scale limit is O(1/L + sqrt(D/(L M)) + 1/sqrt(D)). This error rate is empirically tight when measured in embedding space. For a budget of P = Theta(L M D) parameters, this yields a convergence rate O(P^(-1/6)) for the scalings of (L, M, D) that minimize the bound. Our analysis exploits in an essential way the depth-two structure of residual blocks and applies formally to a broad class of state-of-the-art architectures, including Transformers with bounded key-query dimension. From a technical viewpoint, this work completes the program initiated in the companion paper [Chi25], where it is proved that for a fixed embedding dimension D, the training dynamics converge to a Mean ODE dynamics at rate O(1/L + sqrt(D)/sqrt(L M)). Here, we study the large-D limit of this Mean ODE model and establish convergence at rate O(1/sqrt(D)), yielding the above bound by a triangle inequality. To handle the rich probabilistic structure of the limit dynamics and obtain one of the first rigorous quantitative convergence results for a DMFT-type limit, we combine the cavity method with propagation of chaos arguments at a functional level on so-called skeleton maps, which express the weight updates as functions of CLT-type sums from the past.
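
The stated O(P^(-1/6)) rate can be recovered from the error bound by a short balancing argument. The sketch below is a heuristic check, not the paper's proof; it treats the constant in P = Theta(L M D) as 1 and ignores the constants hidden in the O(.) bound.

```latex
% Substitute M = P/(LD) into the middle term of the bound:
\[
  \sqrt{\frac{D}{LM}} = \sqrt{\frac{D^{2}}{P}} = \frac{D}{\sqrt{P}},
  \qquad\text{so}\qquad
  \frac{1}{L} + \sqrt{\frac{D}{LM}} + \frac{1}{\sqrt{D}}
  = \frac{1}{L} + \frac{D}{\sqrt{P}} + \frac{1}{\sqrt{D}} .
\]
% The last two terms depend only on D and balance when
\[
  \frac{D}{\sqrt{P}} = \frac{1}{\sqrt{D}}
  \;\Longleftrightarrow\;
  D = P^{1/3},
  \qquad\text{giving}\qquad
  \frac{D}{\sqrt{P}} = \frac{1}{\sqrt{D}} = P^{-1/6}.
\]
% The first term is no larger provided L \ge P^{1/6}. One admissible choice is
\[
  L = P^{1/6}, \quad D = P^{1/3}, \quad M = \frac{P}{LD} = P^{1/2},
  \qquad\text{for which the bound is } 3\,P^{-1/6} = O\!\left(P^{-1/6}\right).
\]
```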

[630]  arXiv:2603.18190 (cross-list from stat.ML) [pdf, ps, other]
Title: Starting Off on the Wrong Foot: Pitfalls in Data Preparation
Comments: 42 pages, 37 references
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)

When working with real-world insurance data, practitioners often encounter challenges during the data preparation stage that can undermine the statistical validity and reliability of downstream modeling. This study illustrates that conventional data preparation procedures, such as random train-test partitioning, often yield unreliable and unstable results when confronted with highly imbalanced insurance loss data. To mitigate these limitations, we propose a novel data preparation framework leveraging two recent statistical advancements: support points for representative data splitting to ensure distributional consistency across partitions, and the Chatterjee correlation coefficient for initial, non-parametric feature screening to capture feature relevance and dependence structure. We further integrate these theoretical advances into a unified, efficient framework that also incorporates missing-data handling, and embed this framework within our custom InsurAutoML pipeline. The performance of the proposed approach is evaluated using both simulated datasets and datasets often cited in the academic literature. Our findings definitively demonstrate that incorporating statistically rigorous data preparation methods not only significantly enhances model robustness and interpretability but also substantially reduces computational resource requirements across diverse insurance loss modeling tasks. This work provides a crucial methodological upgrade for achieving reliable results in high-stakes insurance applications.

[631]  arXiv:2603.18205 (cross-list from cond-mat.str-el) [pdf, ps, other]
Title: Tackling the Sign Problem in the Doped Hubbard Model with Normalizing Flows
Comments: 10 pages, 8 figures
Subjects: Strongly Correlated Electrons (cond-mat.str-el); Machine Learning (cs.LG); High Energy Physics - Lattice (hep-lat)

The Hubbard model at finite chemical potential is a cornerstone for understanding doped correlated systems, but simulations are severely limited by the sign problem. In the auxiliary-field formulation, the spin basis mitigates the sign problem, yet severe ergodicity issues have limited its use. We extend recent advances with normalizing flows at half-filling to finite chemical potential by introducing an annealing scheme enabling ergodic sampling. Compared to state-of-the-art hybrid Monte Carlo in the charge basis, our approach accurately reproduces exact diagonalization results while reducing statistical uncertainties by an order of magnitude, opening a new path for simulations of doped correlated systems.

[632]  arXiv:2603.18225 (cross-list from stat.ML) [pdf, ps, other]
Title: A Hybrid Conditional Diffusion-DeepONet Framework for High-Fidelity Stress Prediction in Hyperelastic Materials
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Predicting stress fields in hyperelastic materials with complex microstructures remains challenging for traditional deep learning surrogates, which struggle to capture both sharp stress concentrations and the wide dynamic range of stress magnitudes. Convolutional architectures such as UNet tend to oversmooth high-frequency gradients, while neural operators like DeepONet exhibit spectral bias and underpredict localized extremes. Diffusion models can recover fine-scale structure but often introduce low-frequency amplitude drift, degrading physical scaling. To address these limitations, we propose a hybrid surrogate framework, cDDPM-DeepONet, that decouples stress morphology from magnitude. A conditional denoising diffusion probabilistic model (cDDPM), built on a UNet backbone, generates normalized von Mises stress fields conditioned on geometry and loading. In parallel, a modified DeepONet predicts global scaling parameters (minimum and maximum stress), enabling reconstruction of full-resolution physical stress maps. This separation allows the diffusion model to focus on spatial structure while the operator network corrects global amplitude, mitigating spectral and scaling biases. We evaluate the framework on nonlinear hyperelastic datasets with single and multiple polygonal voids. The proposed model consistently outperforms UNet, DeepONet, and standalone cDDPM baselines by one to two orders of magnitude. Spectral analysis shows strong agreement with finite element solutions across all wavenumbers, preserving both global behavior and localized stress concentrations.

[633]  arXiv:2603.18231 (cross-list from quant-ph) [pdf, ps, other]
Title: Iterative Decoding of Stabilizer Codes under Radiation-Induced Correlated Noise
Comments: 14 pages, 14 figures, 2 tables, 2 algorithms
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Signal Processing (eess.SP)

Fault-tolerant quantum computation demands extremely low logical error rates, yet superconducting qubit arrays are subject to radiation-induced correlated noise arising from cosmic-ray muon-generated quasiparticles. The quasiparticle density is unknown and time-varying, resulting in a mismatch between the true noise statistics and the priors assumed by standard decoders, and consequently, degraded logical performance. We formalize joint noise sensing and decoding using syndrome measurements by modeling the QP density as a latent variable, which governs correlation in physical errors and syndrome measurements. Starting from a variational expectation--maximization approach, we derive an iterative algorithm that alternates between QP density estimation and syndrome-based decoding under the updated noise model. Simulations of surface-code and bivariate bicycle quantum memory under radiation-induced correlated noise demonstrate a measurable reduction in logical error probability relative to baseline decoding with a uniform prior. Beyond improved decoding performance, the inferred QP density provides diagnostic information relevant to device characterization, shielding, and chip design. These results indicate that integrating physical noise estimation into decoding can mitigate correlated noise effects and relax effective error-rate requirements for fault-tolerant quantum computation.

[634]  arXiv:2603.18239 (cross-list from q-bio.QM) [pdf, ps, other]
Title: Impact of automatic speech recognition quality on Alzheimer's disease detection from spontaneous speech: a reproducible benchmark study with lexical modeling and statistical validation
Authors: Himadri Samanta
Comments: 22 pages, 7 figures
Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Machine Learning (cs.LG)

Early detection of Alzheimer's disease from spontaneous speech has emerged as a promising non-invasive screening approach. However, the influence of automatic speech recognition (ASR) quality on downstream clinical language modeling remains insufficiently understood. In this study, we investigate Alzheimer's disease detection using lexical features derived from Whisper ASR transcripts on the ADReSSo 2021 diagnosis dataset. We evaluate interpretable machine-learning models, including Logistic Regression and Linear Support Vector Machines, using TF-IDF text representations under repeated 5x5 stratified cross-validation.
Our results demonstrate that transcript quality has a statistically significant impact on classification performance. Models trained on Whisper-small transcripts consistently outperform those using Whisper-base transcripts, achieving balanced accuracy above 0.7850 with Linear SVM. Paired statistical testing confirms that the observed improvements are significant. Importantly, classifier complexity contributes less to performance variation than ASR transcription quality. Feature analysis reveals that cognitively normal speakers produce more semantically precise object- and scene-descriptive language, whereas Alzheimer's speech is characterized by vagueness, discourse markers, and increased hesitation patterns.
These findings suggest that high-quality ASR can enable simple, interpretable lexical models to achieve competitive Alzheimer's detection performance without explicit acoustic modeling. The study provides a reproducible benchmark pipeline and highlights ASR selection as a critical modeling decision in clinical speech-based artificial intelligence systems.

[635]  arXiv:2603.18243 (cross-list from math.NT) [pdf, ps, other]
Title: Why Eight Percent of Benford Sequences Never Converge
Authors: James M. Hyman
Comments: 35 pages, 5 figures; 35-page Supplementary Information (ancillary file); code at this https URL
Subjects: Number Theory (math.NT); Information Theory (cs.IT)

We study multi-digit correlations in Benford sequences b^n for integer bases 2 <= b <= 1000, measuring dependence via conditional mutual information (CMI). A resonance ratio derived from the continued fraction expansion of log_10(b) classifies bases into convergent and persistent regimes (Theorem 3.13): among 996 bases surveyed, 84 (8.4%) exhibit persistent correlations at sample depth N = 10,000, and extended computation to N = 200,000 confirms 53 (5.3%) as genuinely persistent. We prove that CMI deviation is bounded by the distribution error (Theorem 3.4); exhaustive computation across 2,988 test cases confirms that the effective scaling is quadratic, yielding a two-sided rate beta = 2 for bounded-type bases (conditional on a computationally verified Hessian positivity condition). The observed effective exponent across 774 convergent bases is beta_eff = 1.72 +/- 0.19, consistent with finite-sample corrections to the asymptotic rate. We conjecture that the persistence rate converges to 1/12, a prediction grounded in the Gauss-Kuzmin distribution of partial quotients. For persistent bases, the convergence threshold N_epsilon exceeds 10^6 at standard precision, rendering the asymptotic limit observationally irrelevant within our computational scope.
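
As background for the sequences studied above, the single-digit Benford behaviour of b^n is easy to check numerically. The sketch below only compares leading-digit frequencies with log10(1 + 1/d); it does not compute the conditional mutual information or the resonance ratio from the paper, and the function name is an assumption.

```python
import math
from collections import Counter


def leading_digit_freqs(base, n_terms):
    """Relative frequency of each leading digit 1..9 among base**1 .. base**n_terms."""
    counts = Counter(int(str(base ** n)[0]) for n in range(1, n_terms + 1))
    return {d: counts[d] / n_terms for d in range(1, 10)}


freqs = leading_digit_freqs(2, 1000)
# Benford's law for the first digit: P(d) = log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
max_gap = max(abs(freqs[d] - benford[d]) for d in range(1, 10))
```

Exact integer arithmetic is used here, so the leading digits are not affected by floating-point rounding.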

[636]  arXiv:2603.18296 (cross-list from nlin.SI) [pdf, ps, other]
Title: Nonlinear Incompressible Shear Wave Models in Hyperelasticity and Viscoelasticity Frameworks, with Applications to Love Waves
Subjects: Exactly Solvable and Integrable Systems (nlin.SI); Mathematical Physics (math-ph); Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

General equations describing shear displacements in incompressible hyperelastic materials, holding for an arbitrary form of strain energy density function, are presented and applied to the description of nonlinear Love-type waves propagating on an interface between materials with different mechanical properties. The model is valid for a broad class of hyper-viscoelastic materials. For a cubic Yeoh model, shear wave equations contain cubic and quintic differential polynomial terms, including viscoelasticity contributions in terms of dispersion terms that include mixed derivatives $u_{xxt}$ of the material displacement. Full (2+1)-dimensional numerical simulations of waves propagating in the bulk of a two-layered solid are undertaken and analyzed with respect to the source position and mechanical properties of the layers. Interfacial nonlinear Love waves and free upper surface shear waves are tracked; it is demonstrated that in the fully nonlinear case, the variable wave speed of interface and surface waves generally satisfies the linear Love wave existence condition $c_1 < |v| < c_2$, while tending to the larger material wave speed $c_1$ or $c_2$ for large times.

[637]  arXiv:2603.18318 (cross-list from quant-ph) [pdf, ps, other]
Title: Efficient Soft-Output Guessing for Enhanced Quantum Tanner Code Decoding
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

We introduce a generalized low-density parity-check decoding framework for quantum Tanner codes utilizing soft-output guessing random additive noise decoding (SOGRAND). By soft-output decoding entire component codes, we mitigate trapping sets and cycles, resulting in improved convergence. SOGRAND, combined with ordered statistic decoding (OSD) post-processing, outperforms the standard belief propagation plus OSD baseline by up to three orders of magnitude in logical error rate, providing a way forward for scalable decoding of the emerging class of Tanner-code-based quantum codes.

[638]  arXiv:2603.18321 (cross-list from math.CT) [pdf, ps, other]
Title: A Simple Categorical Calculus of Interacting Processes
Comments: In peer review
Subjects: Category Theory (math.CT); Logic in Computer Science (cs.LO)

We present a calculus that models a simple sort of process interaction. Our calculus consists of a collection of terms together with a rewrite relation, parameterised by an arbitrary multicategory whose morphisms we understand as non-interactive processes. We show that our calculus is confluent and terminating, and that terms modulo the induced convertibility relation form a virtual double category. We relate our calculus to the free cornering of a monoidal category, which is a double-categorical model of process interaction that is similar in spirit to the calculus presented herein. Precisely, we construct a functor from the virtual double category given by our calculus into the underlying virtual double category of the free cornering of the free monoidal category on the multicategory of non-interacting processes. If we think of the terms of our calculus as programs and the rewriting system as an operational semantics for these programs, this functor gives a sound denotational semantics for our calculus in terms of the free cornering.

[639]  arXiv:2603.18389 (cross-list from physics.chem-ph) [pdf, ps, other]
Title: An SO(3)-equivariant reciprocal-space neural potential for long-range interactions
Subjects: Chemical Physics (physics.chem-ph); Artificial Intelligence (cs.AI)

Long-range electrostatic and polarization interactions play a central role in molecular and condensed-phase systems, yet remain fundamentally incompatible with locality-based machine-learning interatomic potentials. Although modern SO(3)-equivariant neural potentials achieve high accuracy for short-range chemistry, they cannot represent the anisotropic, slowly decaying multipolar correlations governing realistic materials, while existing long-range extensions either break SO(3) equivariance or fail to maintain energy-force consistency. Here we introduce EquiEwald, a unified neural interatomic potential that embeds an Ewald-inspired reciprocal-space formulation within an irreducible SO(3)-equivariant framework. By performing equivariant message passing in reciprocal space through learned equivariant k-space filters and an equivariant inverse transform, EquiEwald captures anisotropic, tensorial long-range correlations without sacrificing physical consistency. Across periodic and aperiodic benchmarks, EquiEwald captures long-range electrostatic behavior consistent with ab initio reference data and consistently improves energy and force accuracy, data efficiency, and long-range extrapolation. These results establish EquiEwald as a physically principled paradigm for long-range-capable machine-learning interatomic potentials.

[640]  arXiv:2603.18404 (cross-list from stat.ML) [pdf, ps, other]
Title: Multi-Domain Causal Empirical Bayes Under Linear Mixing
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

Causal representation learning (CRL) aims to learn low-dimensional causal latent variables from high-dimensional observations. While identifiability has been extensively studied for CRL, estimation has been less explored. In this paper, we explore the use of empirical Bayes (EB) to estimate causal representations. In particular, we consider the problem of learning from data from multiple domains, where differences between domains are modeled by interventions in a shared underlying causal model. Multi-domain CRL naturally poses a simultaneous inference problem that EB is designed to tackle. Here, we propose an EB $f$-modeling algorithm that improves the quality of learned causal variables by exploiting invariant structure within and across domains. Specifically, we consider a linear measurement model and interventional priors arising from a shared acyclic SCM. When the graph and intervention targets are known, we develop an EM-style algorithm based on causally structured score matching. We further discuss EB $g$-modeling in the context of existing CRL approaches. In experiments on synthetic data, our proposed method achieves more accurate estimation than other methods for CRL.

[641]  arXiv:2603.18413 (cross-list from stat.ML) [pdf, ps, other]
Title: Statistical Testing Framework for Clustering Pipelines by Selective Inference
Comments: 59 pages, 11 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

A data analysis pipeline is a structured sequence of steps that transforms raw data into meaningful insights by integrating multiple analysis algorithms. In many practical applications, analytical findings are obtained only after data pass through several data-dependent procedures within such pipelines. In this study, we address the problem of quantifying the statistical reliability of results produced by data analysis pipelines. As a proof of concept, we focus on clustering pipelines that identify cluster structures from complex and heterogeneous data through procedures such as outlier detection, feature selection, and clustering. We propose a novel statistical testing framework to assess the significance of clustering results obtained through these pipelines. Our framework, based on selective inference, enables the systematic construction of valid statistical tests for clustering pipelines composed of predefined components. We prove that the proposed test controls the type I error rate at any nominal level and demonstrate its validity and effectiveness through experiments on synthetic and real datasets.

[642]  arXiv:2603.18454 (cross-list from math.OC) [pdf, ps, other]
Title: Fundamental Limits for Sensor-Based Control via the Gibbs Variational Principle
Comments: 6 pages, 1 figure
Subjects: Optimization and Control (math.OC); Robotics (cs.RO); Systems and Control (eess.SY)

Fundamental limits on the performance of feedback controllers are essential for benchmarking algorithms, guiding sensor selection, and certifying task feasibility -- yet few general-purpose tools exist for computing them. Existing information-theoretic approaches overestimate the information a sensor must provide by evaluating it against the uncontrolled system, producing bounds that degrade precisely when feedback is most valuable. We derive a lower bound on the minimum expected cost of any causal feedback controller under partial observations by applying the Gibbs variational principle to the joint path measure over states and observations. The bound applies to nonlinear, nonholonomic, and hybrid dynamics with unbounded costs and admits a self-consistent refinement: any good controller concentrates the state, which limits the information the sensor can extract, which tightens the bound. The resulting fixed-point equation has a unique solution computable by bisection, and we provide conditions under which the free energy minimization is provably convex, yielding a certifiably correct numerical bound. On a nonlinear Dubins car tracking problem, the self-consistent bound captures most of the optimal cost across sensor noise levels, while the open-loop variant is vacuous at low noise.

[643]  arXiv:2603.18458 (cross-list from math.OC) [pdf, ps, other]
Title: Axis-Aligned Relaxations for Mixed-Integer Nonlinear Programming
Subjects: Optimization and Control (math.OC); Computational Geometry (cs.CG); Mathematical Software (cs.MS)

We present a novel relaxation framework for general mixed-integer nonlinear programming (MINLP) grounded in computational geometry. Our approach constructs polyhedral relaxations by convexifying finite sets of strategically chosen points, iteratively refining the approximation to converge toward the simultaneous convex hull of factorable function graphs. The framework is underpinned by three key contributions: (i) a new class of explicit inequalities for products of functions that strictly improve upon standard factorable and composite relaxation schemes; (ii) a proof establishing that the simultaneous convex hull of multilinear functions over axis-aligned regions is fully determined by their values at corner points, thereby generalizing existing results from hypercubes to arbitrary axis-aligned domains; and (iii) the integration of computational geometry tools, specifically voxelization and QuickHull, to efficiently approximate feasible regions and function graphs. We implement this framework and evaluate it on randomly generated polynomial optimization problems and a suite of 619 instances from \texttt{MINLPLib}. Numerical results demonstrate significant improvements over state-of-the-art benchmarks: on polynomial instances, our relaxation closes an additional 20--25\% of the optimality gap relative to standard methods on half the instances. Furthermore, compared against an enhanced factorable programming baseline and Gurobi's root-node bounds, our approach yields superior dual bounds on approximately 30\% of \texttt{MINLPLib} instances, with roughly 10\% of cases exhibiting a gap reduction exceeding 50\%.

[644]  arXiv:2603.18483 (cross-list from stat.ML) [pdf, ps, other]
Title: Precise Performance of Linear Denoisers in the Proportional Regime
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

In the present paper we study the performance of linear denoisers for noisy data of the form $\mathbf{x} + \mathbf{z}$, where $\mathbf{x} \in \mathbb{R}^d$ is the desired data with zero mean and unknown covariance $\mathbf{\Sigma}$, and $\mathbf{z} \sim \mathcal{N}(0, \mathbf{\Sigma}_{\mathbf{z}})$ is additive noise. Since the covariance $\mathbf{\Sigma}$ is not known, the standard Wiener filter cannot be employed for denoising. Instead we assume we are given samples $\mathbf{x}_1,\dots,\mathbf{x}_n \in \mathbb{R}^d$ from the true distribution. A standard approach would then be to estimate $\mathbf{\Sigma}$ from the samples and use it to construct an ``empirical'' Wiener filter. However, in this paper, motivated by the denoising step in diffusion models, we take a different approach whereby we train a linear denoiser $\mathbf{W}$ from the data itself. In particular, we synthetically construct noisy samples $\hat{\mathbf{x}}_i$ of the data by injecting the samples with Gaussian noise with covariance $\mathbf{\Sigma}_1 \neq \mathbf{\Sigma}_{\mathbf{z}}$ and find the best $\mathbf{W}$ that approximates $\mathbf{W}\hat{\mathbf{x}}_i \approx \mathbf{x}_i$ in a least-squares sense. In the proportional regime $\frac{n}{d} \rightarrow \kappa > 1$ we use the {\it Convex Gaussian Min-Max Theorem (CGMT)} to analytically find the closed form expression for the generalization error of the denoiser obtained from this process. Using this expression one can optimize over $\mathbf{\Sigma}_1$ to find the best possible denoiser. Our numerical simulations show that our denoiser outperforms the ``empirical'' Wiener filter in many scenarios and approaches the optimal Wiener filter as $\kappa\rightarrow\infty$.
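
The training procedure described in this abstract can be illustrated with a minimal scalar sketch ($d = 1$), which is not the authors' code. It assumes a Gaussian signal and sets the training noise level equal to the test noise level, whereas the abstract treats $\mathbf{\Sigma}_1$ as a quantity to optimize. The function names `train_scalar_denoiser` and `wiener_scalar` and all numeric values are hypothetical. With $d$ fixed and $n$ large, the learned weight should land close to the oracle Wiener weight.

```python
import random

def train_scalar_denoiser(clean, train_noise_std, rng):
    """Least-squares fit of a scalar w such that w * (x + noise) approximates x."""
    # Synthetically corrupt each clean sample with Gaussian training noise.
    noisy = [x + rng.gauss(0.0, train_noise_std) for x in clean]
    # Closed-form 1-D least-squares solution: w = <noisy, clean> / <noisy, noisy>.
    num = sum(xh * x for xh, x in zip(noisy, clean))
    den = sum(xh * xh for xh in noisy)
    return num / den

def wiener_scalar(signal_var, noise_var):
    """Oracle Wiener weight when the signal variance is known."""
    return signal_var / (signal_var + noise_var)

rng = random.Random(0)
signal_std, noise_std = 2.0, 1.0  # assumed values, for illustration only
samples = [rng.gauss(0.0, signal_std) for _ in range(20000)]

w_learned = train_scalar_denoiser(samples, noise_std, rng)
w_oracle = wiener_scalar(signal_std ** 2, noise_std ** 2)
print(w_learned, w_oracle)
```

Passing a different `train_noise_std` is the scalar analogue of choosing $\mathbf{\Sigma}_1 \neq \mathbf{\Sigma}_{\mathbf{z}}$. The sketch does not reproduce the proportional-regime analysis.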

[645]  arXiv:2603.18497 (cross-list from q-bio.QM) [pdf, ps, other]
Title: Recovering Sparse Neural Connectivity from Partial Measurements: A Covariance-Based Approach with Granger-Causality Refinement
Authors: Quilee Simeon
Subjects: Quantitative Methods (q-bio.QM); Neural and Evolutionary Computing (cs.NE)

Inferring the connectivity of neural circuits from incomplete observations is a fundamental challenge in neuroscience. We present a covariance-based method for estimating the weight matrix of a recurrent neural network from sparse, partial measurements across multiple recording sessions. By accumulating pairwise covariance estimates across sessions where different subsets of neurons are observed, we reconstruct the full connectivity matrix without requiring simultaneous recording of all neurons. A Granger-causality refinement step enforces biological constraints via projected gradient descent. Through systematic experiments on synthetic networks modeling small brain circuits, we characterize a fundamental control-estimation tradeoff: stimulation aids identifiability but disrupts intrinsic dynamics, with the optimal level depending on measurement density. We discover that the ``incorrect'' linear approximation acts as implicit regularization -- outperforming the oracle estimator with known nonlinearity at all operating regimes -- and provide an exact characterization via the Stein--Price identity.

[646]  arXiv:2603.18503 (cross-list from math.OC) [pdf, ps, other]
Title: Computationally Efficient Density-Driven Optimal Control via Analytical KKT Reduction and Contractive MPC
Subjects: Optimization and Control (math.OC); Multiagent Systems (cs.MA); Robotics (cs.RO)

Efficient coordination for collective spatial distribution is a fundamental challenge in multi-agent systems. Prior research on Density-Driven Optimal Control (D2OC) established a framework to match agent trajectories to a desired spatial distribution. However, implementing this as a predictive controller requires solving a large-scale Karush-Kuhn-Tucker (KKT) system, whose computational complexity grows cubically with the prediction horizon. To resolve this, we propose an analytical structural reduction that transforms the T-horizon KKT system into a condensed quadratic program (QP). This formulation achieves O(T) linear scalability, significantly reducing the online computational burden compared to conventional O(T^3) approaches. Furthermore, to ensure rigorous convergence in dynamic environments, we incorporate a contractive Lyapunov constraint and prove the Input-to-State Stability (ISS) of the closed-loop system against reference propagation drift. Numerical simulations verify that the proposed method facilitates rapid density coverage with substantial computational speed-up, enabling long-horizon predictive control for large-scale multi-agent swarms.

[647]  arXiv:2603.18514 (cross-list from stat.ML) [pdf, ps, other]
Title: On the Peril of (Even a Little) Nonstationarity in Satisficing Regret Minimization
Comments: 21 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Motivated by the principle of satisficing in decision-making, we study satisficing regret guarantees for nonstationary $K$-armed bandits. We show that in the general realizable, piecewise-stationary setting with $L$ stationary segments, the optimal regret is $\Theta(L\log T)$ as long as $L\geq 2$. This stands in sharp contrast to the case of $L=1$ (i.e., the stationary setting), where a $T$-independent $\Theta(1)$ satisficing regret is achievable under realizability. In other words, the optimal regret has to scale with $T$ even if only a little nonstationarity is present. A key ingredient in our analysis is a novel Fano-based framework tailored to nonstationary bandits via a \emph{post-interaction reference} construction. This framework strictly extends the classical Fano method for passive estimation as well as recent interactive Fano techniques for stationary bandits. As a complement, we also discuss a special regime in which constant satisficing regret is again possible.

[648]  arXiv:2603.18544 (cross-list from eess.IV) [pdf, ps, other]
Title: SCISSR: Scribble-Conditioned Interactive Surgical Segmentation and Refinement
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Accurate segmentation of tissues and instruments in surgical scenes is annotation-intensive due to irregular shapes, thin structures, specularities, and frequent occlusions. While SAM models support point, box, and mask prompts, points are often too sparse and boxes too coarse to localize such challenging targets. We present SCISSR, a scribble-promptable framework for interactive surgical scene segmentation. It introduces a lightweight Scribble Encoder that converts freehand scribbles into dense prompt embeddings compatible with the mask decoder, enabling iterative refinement for a target object by drawing corrective strokes on error regions. Because all added modules (the Scribble Encoder, Spatial Gated Fusion, and LoRA adapters) interact with the backbone only through its standard embedding interfaces, the framework is not tied to a single model: we build on SAM 2 in this work, yet the same components transfer to other prompt-driven segmentation architectures such as SAM 3 without structural modification. To preserve pre-trained capabilities, we train only these lightweight additions while keeping the remaining backbone frozen. Experiments on EndoVis 2018 demonstrate strong in-domain performance, while evaluation on the out-of-distribution CholecSeg8k further confirms robustness across surgical domains. SCISSR achieves 95.41% Dice on EndoVis 2018 with five interaction rounds and 96.30% Dice on CholecSeg8k with three interaction rounds, outperforming iterative point prompting on both benchmarks.

[649]  arXiv:2603.18551 (cross-list from math.OC) [pdf, ps, other]
Title: Learning Decision-Sufficient Representations for Linear Optimization
Comments: 45 pages, 2 figures, includes appendix
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Machine Learning (cs.LG)

We study how to construct compressed datasets that suffice to recover optimal decisions in linear programs with an unknown cost vector $c$ lying in a prior set $\mathcal{C}$. Recent work by Bennouna et al. provides an exact geometric characterization of sufficient decision datasets (SDDs) via an intrinsic decision-relevant dimension $d^\star$. However, their algorithm for constructing minimum-size SDDs requires solving mixed-integer programs. In this paper, we establish hardness results showing that computing $d^\star$ is NP-hard and deciding whether a dataset is globally sufficient is coNP-hard, thereby resolving a recent open problem posed by Bennouna et al. To address this worst-case intractability, we introduce pointwise sufficiency, a relaxation that requires sufficiency for an individual cost vector. Under nondegeneracy, we provide a polynomial-time cutting-plane algorithm for constructing pointwise-sufficient decision datasets. In a data-driven regime with i.i.d.\ costs, we further propose a cumulative algorithm that aggregates decision-relevant directions across samples, yielding a stable compression scheme of size at most $d^\star$. This leads to a distribution-free PAC guarantee: with high probability over the training sample, the pointwise sufficiency failure probability on a fresh draw is at most $\tilde{O}(d^\star/n)$, and this rate is tight up to logarithmic factors. Finally, we apply decision-sufficient representations to contextual linear optimization, obtaining compressed predictors with generalization bounds scaling as $\tilde{O}(\sqrt{d^\star/n})$ rather than $\tilde{O}(\sqrt{d/n})$, where $d$ is the ambient cost dimension.

[650]  arXiv:2603.18554 (cross-list from quant-ph) [pdf, ps, other]
Title: End-to-End QGAN-Based Image Synthesis via Neural Noise Encoding and Intensity Calibration
Subjects: Quantum Physics (quant-ph); Computer Vision and Pattern Recognition (cs.CV)

Quantum Generative Adversarial Networks (QGANs) offer a promising path for learning data distributions on near-term quantum devices. However, existing QGANs for image synthesis avoid direct full-image generation, relying on classical post-processing or patch-based methods. These approaches dilute the quantum generator's role and struggle to capture global image semantics. To address this, we propose ReQGAN, an end-to-end framework that synthesizes an entire N=2^D-pixel image using a single D-qubit quantum circuit. ReQGAN overcomes two fundamental bottlenecks hindering direct pixel generation: (1) the rigid classical-to-quantum noise interface and (2) the output mismatch between normalized quantum statistics and the desired pixel-intensity space. We introduce a learnable Neural Noise Encoder for adaptive state preparation and a differentiable Intensity Calibration module to map measurements to a stable, visually meaningful pixel domain. Experiments on MNIST and Fashion-MNIST demonstrate that ReQGAN achieves stable training and effective image synthesis under stringent qubit budgets, with ablation studies verifying the contribution of each component.

[651]  arXiv:2603.18572 (cross-list from eess.IV) [pdf, ps, other]
Title: UEPS: Robust and Efficient MRI Reconstruction
Comments: The document contains the main paper and additional experimental details in the supplementary material. Open-source code can be found at: this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep unrolled models (DUMs) have become the state of the art for accelerated MRI reconstruction, yet their robustness under domain shift remains a critical barrier to clinical adoption. In this work, we identify coil sensitivity map (CSM) estimation as the primary bottleneck limiting generalization. To address this, we propose UEPS, a novel DUM architecture featuring three key innovations: (i) an Unrolled Expanded (UE) design that eliminates CSM dependency by reconstructing each coil independently; (ii) progressive resolution, which leverages k-space-to-image mapping for efficient coarse-to-fine refinement; and (iii) sparse attention tailored to MRI's 1D undersampling nature. These physics-grounded designs enable simultaneous gains in robustness and computational efficiency. We construct a large-scale zero-shot transfer benchmark comprising 10 out-of-distribution test sets spanning diverse clinical shifts -- anatomy, view, contrast, vendor, field strength, and coil configurations. Extensive experiments demonstrate that UEPS consistently and substantially outperforms existing DUM, end-to-end, diffusion, and untrained methods across all OOD tests, achieving state-of-the-art robustness with low-latency inference suitable for real-time deployment.

[652]  arXiv:2603.18640 (cross-list from stat.ML) [pdf, ps, other]
Title: A Theoretical Comparison of No-U-Turn Sampler Variants: Necessary and Sufficient Convergence Conditions and Mixing Time Analysis under Gaussian Targets
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)

The No-U-Turn Sampler (NUTS) is the computational workhorse of modern Bayesian software libraries, yet its qualitative and quantitative convergence guarantees were established only recently. A significant gap remains in the theoretical comparison of its two main variants: NUTS-mul and NUTS-BPS, which use multinomial sampling and biased progressive sampling, respectively, for index selection. In this paper, we address this gap in three contributions. First, we derive the first necessary conditions for geometric ergodicity for both variants. Second, we establish the first sufficient conditions for geometric ergodicity and ergodicity for NUTS-mul. Third, we obtain the first mixing time result for NUTS-BPS on a standard Gaussian distribution. Our results show that NUTS-mul and NUTS-BPS exhibit nearly identical qualitative behavior, with geometric ergodicity depending on the tail properties of the target distribution. However, they differ quantitatively in their convergence rates. More precisely, when initialized in the typical set of the canonical Gaussian measure, the mixing times of both NUTS-mul and NUTS-BPS scale as $O(d^{1/4})$ up to logarithmic factors, where $d$ denotes the dimension. Nevertheless, the associated constants are strictly smaller for NUTS-BPS.

[653]  arXiv:2603.18643 (cross-list from math.AG) [pdf, ps, other]
Title: The Geometry of Polycons and a Counterexample to Wachspress' Conjecture
Authors: Clemens Brüser
Comments: 17 pages, 5 figures
Subjects: Algebraic Geometry (math.AG); Numerical Analysis (math.NA)

Polycons, initially introduced by Wachspress in 1975 as a tool in finite element methods, are generalizations of polygons in that they allow conic boundary components. We are interested in the adjoint curve of a given polycon, i.e. the unique curve of minimal degree vanishing in the so-called residual arrangement. It was conjectured by Wachspress that under some regularity assumptions this curve does not vanish in the interior of its defining polycon. However, until recently the only class of polycons for which this was proven were convex polygons. We present a polycon bounded by three conics that constitutes a counterexample to Wachspress' conjecture.
The origin of this counterexample reveals some beautiful geometry of polycons. Replacing one degree two boundary component of a polycon with a line produces a new polycon. We show that the adjoint of the latter is a contact curve to the adjoint of the former. This naturally leads to the consideration of symmetric linear determinantal representations of adjoints, which lets us explicitly describe the fibers of the adjoint map in the case of polycons bounded by three conics. As a corollary we prove that generically the adjoint of a polycon bounded by three conics is smooth.

[654]  arXiv:2603.18650 (cross-list from cond-mat.mtrl-sci) [pdf, ps, other]
Title: DeePAW: A universal machine learning model for orbital-free ab initio calculations
Subjects: Materials Science (cond-mat.mtrl-sci); Databases (cs.DB)

Developing universal machine learning models for ab initio calculations is at the frontier of cutting-edge materials research in the new era of artificial intelligence. Here, we present the Deep Augment Way model (DeePAW), a universal machine learning (ML) model for orbital-free (OF) ab initio calculations based on density functional theory (DFT). DeePAW is currently the best OFDFT ML model according to three criteria: 1) covering the largest number of elements, 2) having the widest applicability to diverse crystal structures, and 3) achieving the highest prediction accuracy without further fine-tuning. These scientific merits and innovations of DeePAW stem from its novel SE(3)-equivariant double message-passing neural networks. Besides predicting electron density distributions, DeePAW also predicts formation energies of crystals and therefore paves an efficient avenue for multiscale materials modeling beyond conventional electronic structure calculation methods.

[655]  arXiv:2603.18714 (cross-list from eess.SP) [pdf, ps, other]
Title: Holter-to-Sleep: AI-Enabled Repurposing of Single-Lead ECG for Sleep Phenotyping
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Sleep disturbances are tightly linked to cardiovascular risk, yet polysomnography (PSG) -- the clinical reference standard -- remains resource-intensive and poorly suited for multi-night, home-based, and large-scale screening. Single-lead electrocardiography (ECG), already ubiquitous in Holter and patch-based devices, enables comfortable long-term acquisition and encodes sleep-relevant physiology through autonomic modulation and cardiorespiratory coupling. Here, we present a proof-of-concept Holter-to-Sleep framework that, using single-lead ECG as the sole input, jointly supports overnight sleep phenotyping and Holter-grade cardiac phenotyping within the same recording, and further provides an explicit analytic pathway for scalable cardio-sleep association studies. The framework is developed and validated on a pooled multi-center PSG sample of 10,439 studies spanning four public cohorts, with independent external evaluation to assess cross-cohort generalizability, and additional real-world feasibility assessment using overnight patch-ECG recordings via objective-subjective consistency analysis. This integrated design enables robust extraction of clinically meaningful overnight sleep phenotypes under heterogeneous populations and acquisition conditions, and facilitates systematic linkage between ECG-derived sleep metrics and arrhythmia-related Holter phenotypes. Collectively, the Holter-to-Sleep paradigm offers a practical foundation for low-burden, home-deployable, and scalable cardio-sleep monitoring and research beyond traditional PSG-centric workflows.

[656]  arXiv:2603.18754 (cross-list from math.CO) [pdf, ps, other]
Title: The red-blue-yellow matching problem
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

We consider the red-blue-yellow matching problem: given two natural numbers $k_R$, $k_B$ and a graph $G$ whose edges are colored red, blue or yellow, the goal is to find a matching of $G$ that contains exactly $k_R$ red edges and exactly $k_B$ blue edges, and is of maximum cardinality subject to these constraints. This is a natural generalization of the well known red-blue matching problem, whose complexity status is unknown: although a randomized polynomial-time algorithm exists, a deterministic algorithm has remained elusive for nearly four decades. The best known deterministic approach to the red-blue matching problem, due to Yuster (2012), gives an additive approximation. In this paper, we show a similar result for the red-blue-yellow matching problem, giving a polynomial-time deterministic algorithm that, under natural assumptions, finds a matching satisfying the color requirements almost exactly and has cardinality within 3 of the optimal solution. Our algorithm is a mix of classic linear programming techniques and ad hoc existence results on restricted classes of graphs such as paths and cycles. As a key ingredient, we prove a curious topological property of plane curves, which is a strengthened version of a result by Grandoni and Zenklusen (2010) in the related context of budgeted matchings.
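
To make the problem statement in this abstract concrete, the following is a minimal brute-force sketch of the red-blue-yellow matching problem. It is not the paper's algorithm: it enumerates edge subsets in exponential time and is only practical on toy graphs. The function name `best_rby_matching`, the edge encoding `(u, v, color)`, and the example graph are all hypothetical.

```python
from itertools import combinations

def best_rby_matching(edges, k_red, k_blue):
    """Return a maximum-cardinality matching with exactly k_red red and
    k_blue blue edges, or None if no such matching exists.

    edges: list of (u, v, color) with color in {"R", "B", "Y"}.
    Exhaustive search, for illustration only.
    """
    # Try larger subsets first so the first feasible hit is of maximum size.
    for size in range(len(edges), -1, -1):
        for subset in combinations(edges, size):
            endpoints = [w for u, v, _ in subset for w in (u, v)]
            # A matching uses every vertex at most once.
            if len(endpoints) != len(set(endpoints)):
                continue
            reds = sum(1 for e in subset if e[2] == "R")
            blues = sum(1 for e in subset if e[2] == "B")
            if reds == k_red and blues == k_blue:
                return list(subset)
    return None

# Toy instance: a path on six vertices with colored edges.
path_edges = [(1, 2, "R"), (2, 3, "Y"), (3, 4, "B"), (4, 5, "Y"), (5, 6, "R")]
print(best_rby_matching(path_edges, 1, 1))
```

On this path, the only matching of size three uses two red edges, so requiring exactly one red and one blue edge forces a smaller matching. This shows how the color constraints interact with the cardinality objective.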

[657]  arXiv:2603.18776 (cross-list from math.AP) [pdf, ps, other]
Title: Physics-grounded Mechanism Design for Spectrum Sharing between Passive and Active Users
Subjects: Analysis of PDEs (math.AP); Systems and Control (eess.SY)

We propose a physics-grounded mechanism design for dynamic spectrum sharing that bridges the gap between radiometric retrieval constraints and economic incentives. We formulate the coexistence problem between active and passive users as a Vickrey-Clarke-Groves (VCG) auction mechanism, where the radiometer dynamically procures ``quiet'' time-frequency tiles from active users based on the marginal reduction in retrieval error variance. This approach ensures allocative efficiency and dominant-strategy incentive compatibility (DSIC). To overcome the computational intractability of exact VCG on large grids, we derive an approximation algorithm by using the monotone submodularity induced by the radiometer equation. AMSR-2-based simulations show that the approach avoids high-cost tiles by aggregating low-cost spectrum across time and frequency. In an interference-trap case study, the proposed framework reduces procurement costs by about 60% over a fixed-band baseline while satisfying accuracy targets.

[658]  arXiv:2603.18781 (cross-list from stat.ML) [pdf, ps, other]
Title: SRRM: Improving Recursive Transport Surrogates in the Small-Discrepancy Regime
Comments: 29 pages, 20 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)

Recursive partitioning methods provide computationally efficient surrogates for the Wasserstein distance, yet their statistical behavior and their resolution in the small-discrepancy regime remain insufficiently understood. We study Recursive Rank Matching (RRM) as a representative instance of this class under a population-anchored reference. In this setting, we establish consistency and an explicit convergence rate for the anchored empirical RRM under the quadratic cost. We then identify a dominant mismatch mechanism responsible for the loss of resolution in the small-discrepancy regime. Based on this analysis, we introduce Selective Recursive Rank Matching (SRRM), which suppresses the resulting dominant mismatches and yields a higher-fidelity practical surrogate for the Wasserstein distance at moderate additional computational cost.

[659]  arXiv:2603.18864 (cross-list from physics.chem-ph) [pdf, ps, other]
Title: Data-driven construction of machine-learning-based interatomic potentials for gas-surface scattering dynamics: the case of NO on graphite
Comments: 19 pages, 9 figures
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG)

Accurate atomistic simulations of gas-surface scattering require potential energy surfaces that remain reliable over broad configurational and energetic ranges while retaining the efficiency needed for extensive trajectory sampling. Here, we develop a data-driven workflow for constructing a machine-learning interatomic potential (MLIP) tailored to gas-surface scattering dynamics, using nitric oxide (NO) scattering from highly oriented pyrolytic graphite (HOPG) as a benchmark system. Starting from an initial ab initio molecular dynamics (AIMD) dataset, local atomic environments are described by SOAP descriptors and analyzed in a reduced feature space obtained through principal component analysis. Farthest point sampling is then used to build a compact training set, and the resulting Deep Potential model is refined through a query-by-committee active-learning strategy using additional configurations extracted from molecular dynamics simulations over extended ranges of incident energies and surface temperatures. The final MLIP reproduces reference energies and forces with high fidelity and enables large-scale molecular dynamics simulations of NO scattering from graphite at a computational cost far below that of AIMD. The simulations provide detailed insight into adsorption energetics, trapping versus direct scattering probabilities, translational energy loss, angular distributions, and rotational excitation. Overall, the results reproduce the main experimental trends and demonstrate that descriptor-guided sampling combined with active learning offers an efficient and transferable strategy for constructing MLIPs for gas-surface interactions.

[660]  arXiv:2603.18919 (cross-list from quant-ph) [pdf, ps, other]
Title: Quantum and classical approaches to the optimization of highway platooning: the two-vehicle matching problem
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Applied Physics (physics.app-ph)

Aerodynamic drag reduction on highways through vehicle platooning is a well-known concept, but it has not yet seen systematic uptake, arguably because of significant technological and legislative obstacles. As a low-tech entry point to real multi-vehicle platooning, "Windbreaking-as-a-Service" (WaaS) was introduced recently. Here we use a QUBO formulation to study classical metaheuristics such as simulated annealing and tabu search, together with emerging quantum heuristics including quantum annealing and variants of the Quantum Approximate Optimization Algorithm (QAOA). These heuristic solvers do not guarantee optimality, but they traverse the same higher-order landscape using polynomial memory. They can also be parallelized aggressively, and efficient classical post-processing can be used in hybrid workflows to return only valid schedules. This paper therefore positions QUBO as a common language that allows heterogeneous classical, quantum, and hybrid solvers to address the optimization of highway platooning.

[661]  arXiv:2603.18925 (cross-list from physics.geo-ph) [pdf, ps, other]
Title: Improving moment tensor solutions under Earth structure uncertainty with simulation-based inference
Comments: 19 pages, 12 figures + supporting info
Subjects: Geophysics (physics.geo-ph); Artificial Intelligence (cs.AI)

Bayesian inference represents a principled way to incorporate Earth structure uncertainty in full-waveform moment tensor inversions, but traditional approaches generally require significant approximations that risk biasing the resulting solutions. We introduce a robust method for handling theory errors using simulation-based inference (SBI), a machine learning approach that empirically models their impact on the observations. This framework retains the rigour of Bayesian inference while avoiding restrictive assumptions about the functional form of the uncertainties. We begin by demonstrating that the common Gaussian parametrisation of theory errors breaks down under minor ($1-3 \%$) 1-D Earth model uncertainty. To address this issue, we develop two formalisms for utilising SBI to improve the quality of the moment tensor solutions: one using physics-based insights into the theory errors, and another utilising an end-to-end deep learning algorithm. We then compare the results of moment tensor inversion with the standard Gaussian approach and SBI, and demonstrate that Gaussian assumptions induce bias and significantly under-report moment tensor uncertainties. We also show that these effects are particularly problematic when inverting short period data and for shallow, isotropic events. On the other hand, SBI produces more reliable, better calibrated posteriors of the earthquake source mechanism. Finally, we successfully apply our methodology to two well studied moderate magnitude earthquakes: one from the 1997 Long Valley Caldera volcanic earthquake sequence, and the 2020 Zagreb earthquake.

[662]  arXiv:2603.18929 (cross-list from math.MG) [pdf, ps, other]
Title: On the Duality of Coverings in Hilbert Geometry
Subjects: Metric Geometry (math.MG); Computational Geometry (cs.CG)

We prove polarity duality for covering problems in Hilbert geometry. Let $G$ and $K$ be convex bodies in $\mathbb{R}^d$ where $G \subset \operatorname{int}(K)$ and $\operatorname{int}(G)$ contains the origin. Let $N^H_K(G,\alpha)$ and $S^H_K(G,\alpha)$ denote, respectively, the minimum numbers of radius-$\alpha$ Hilbert balls in the geometry induced by $K$ needed to cover $G$ and $\partial G$. Our main result is a Hilbert-geometric analogue of the K\"{o}nig-Milman covering duality: there exists an absolute constant $c \geq 1$ such that for any $\alpha \in (0,1]$, \[
c^{-d}\,N^H_{G^{\circ}}(K^{\circ},\alpha)
~ \leq ~ N^H_K(G,\alpha)
~ \leq ~ c^{d}\,N^H_{G^{\circ}}(K^{\circ},\alpha), \] and likewise, \[
c^{-d}\,S^H_{G^{\circ}}(K^{\circ},\alpha)
~ \leq ~ S^H_K(G,\alpha)
~ \leq ~ c^{d}\,S^H_{G^{\circ}}(K^{\circ},\alpha). \] We also recover the classical volumetric duality for translative coverings of centered convex bodies, and obtain a new boundary-covering duality in that setting.
The Hilbert setting is subtler than the translative one because the metric is not translation invariant, and the local Finsler unit ball depends on the base point. The proof involves several ideas, including $\alpha$-expansions, a stability lemma that controls the interaction between polarity and expansion, and, in the boundary case, a localized relative isoperimetric argument combined with Holmes--Thompson area estimates. In addition, we provide an alternative proof of Faifman's polarity bounds for Holmes--Thompson volume and area in the Funk and Hilbert geometries.

[663]  arXiv:2603.18938 (cross-list from stat.ML) [pdf, ps, other]
Title: Kernel Single-Index Bandits: Estimation, Inference, and Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We study contextual bandits with finitely many actions in which the reward of each arm follows a single-index model with an arm-specific index parameter and an unknown nonparametric link function. We consider a regime in which arms correspond to stable decision options and covariates evolve adaptively under the bandit policy. This setting creates significant statistical challenges: the sampling distribution depends on the allocation rule, observations are dependent over time, and inverse-propensity weighting induces variance inflation. We propose a kernelized $\varepsilon$-greedy algorithm that combines Stein-based estimation of the index parameters with inverse-propensity-weighted kernel ridge regression for the reward functions. This approach enables flexible semiparametric learning while retaining interpretability. Our analysis develops new tools for inference with adaptively collected data. We establish asymptotic normality for the single-index estimator under adaptive sampling, yielding valid confidence regions, and derive a directional functional central limit theorem for the RKHS estimator, which provides asymptotically valid pointwise confidence intervals. The analysis relies on concentration bounds for inverse-weighted Gram matrices together with martingale central limit theorems. We further obtain finite-time regret guarantees, including $\tilde{O}(\sqrt{T})$ rates under common-link Lipschitz conditions, showing that semiparametric structure can be exploited without sacrificing statistical efficiency. These results provide a unified framework for simultaneous learning and inference in single-index contextual bandits.

[664]  arXiv:2603.18941 (cross-list from stat.ML) [pdf, ps, other]
Title: Unified Taxonomy for Multivariate Time Series Anomaly Detection using Deep Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The topic of Multivariate Time Series Anomaly Detection (MTSAD) has grown rapidly over the past years, with a steady rise in publications and Deep Learning (DL) models becoming the dominant paradigm. To address the lack of systematization in the field, this study introduces a novel and unified taxonomy with eleven dimensions over three parts (Input, Output and Model) for the categorization of DL-based MTSAD methods. The dimensions were established in a two-fold approach. First, they were derived from a comprehensive analysis of methodological studies. Second, insights from review papers were incorporated. Furthermore, the proposed taxonomy was validated using an additional set of recent publications, providing a clear overview of methodological trends in MTSAD. Results reveal a convergence toward Transformer-based and reconstruction and prediction models, setting the foundation for emerging adaptive and generative trends. Building on and complementing existing surveys, this unified taxonomy is designed to accommodate future developments, allowing for new categories or dimensions to be added as the field progresses. This work thus consolidates fragmented knowledge in the field and provides a reference point for future research in MTSAD.

[665]  arXiv:2603.18955 (cross-list from math.LO) [pdf, ps, other]
Title: Foundational Analysis Of The Solvability Complexity Index: The Weihrauch-SCI Intermediate Hierarchy And A Koopman Operator Example
Authors: Christopher Sorg
Comments: 62 pages: 39 pages + Appendix
Subjects: Logic (math.LO); Logic in Computer Science (cs.LO); Spectral Theory (math.SP)

The Solvability Complexity Index (SCI) provides an abstract notion of computing a target map $\Xi$ from finitely many oracle evaluations $\Lambda \subseteq \mathbb{C}$ via finite-height towers of pointwise limits. We first give a foundational analysis of what this extensional framework does and does not determine. We show that the SCI separation consistency is equivalent to a factorization of $\Xi$ through the full evaluation table, and we isolate the minimal logical role of $\Lambda$ as an information interface.
To connect the SCI to Type-2 computability and Weihrauch reducibility, we give an effective enrichment for countable $\Lambda$ by viewing the evaluation table image $I_{\Lambda}\subseteq\mathbb{C}^{\mathbb{N}}$ as a represented space and factoring $\Xi$ as $\widehat{\Xi}$. We then define the Weihrauch-SCI rank of a problem as the least number of iterated limit-oracles needed to compute it in the Weihrauch sense, i.e.\ the least $k$ such that $\widehat{\Xi}\le_{W}\lim^{(k)}$, and prove well-posedness and representation invariance of this rank.
A central negative result is that the unrestricted type-$G$ SCI model (arbitrary post-processing of finite oracle transcripts) is generally not comparable to Weihrauch/Type-2 complexity: finite-query factorizations collapse type-$G$ height, and analytic (non-Borel) decision problems yield examples with $\mathrm{SCI}_{G}=0$ but infinite Weihrauch-SCI rank. To recover a robust bridge, we introduce an intermediate SCI hierarchy by restricting the admissible base-level post-processing to regularity classes (continuous/Borel/Baire) and, optionally, to fixed-query versus adaptive-query policies. We prove that these restrictions form genuine hierarchies, and we establish comparison theorems showing what each restriction logically enforces (e.g.\ Borel towers compute only Borel targets; continuous-base towers yield finite Baire class).

[666]  arXiv:2603.18985 (cross-list from stat.ML) [pdf, ps, other]
Title: Revisiting OmniAnomaly for Anomaly Detection: performance metrics and comparison with PCA-based models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Deep learning models have become the dominant approach for multivariate time series anomaly detection (MTSAD), often reporting substantial performance improvements over classical statistical methods. However, these gains are frequently evaluated under heterogeneous thresholding strategies and evaluation protocols, making fair comparisons difficult. This work revisits OmniAnomaly, a widely used stochastic recurrent model for MTSAD, and systematically compares it with a simple linear baseline based on Principal Component Analysis (PCA) on the Server Machine Dataset (SMD). Both methods are evaluated under identical thresholding and evaluation procedures, with experiments repeated across 100 runs for each of the 28 machines in the dataset. Performance is evaluated using Precision, Recall and F1-score at point-level, with and without point-adjustment, and under different aggregation strategies across machines and runs, with the corresponding standard deviations also reported. The results show large variability across machines and show that PCA can achieve performance comparable to OmniAnomaly, and even outperform it when point-adjustment is not applied. These findings question the added value of more complex architectures under current benchmarking practices and highlight the critical role of evaluation methodology in MTSAD research.
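The abstract contrasts point-level scores with and without point-adjustment. As a rough illustration of why that choice matters, the sketch below implements the point-adjustment convention as it is commonly described in the MTSAD literature: if any point inside a ground-truth anomaly segment is flagged, the whole segment is counted as detected. The function names `point_adjust` and `f1_score` and the toy labels are assumptions made for this example; this is not the evaluation code used in the paper.

```python
import numpy as np

def point_adjust(pred, label):
    """Return a copy of `pred` in which every ground-truth anomaly segment
    containing at least one flagged point is marked as fully detected."""
    pred = np.asarray(pred, dtype=bool).copy()
    label = np.asarray(label, dtype=bool)
    n = len(label)
    i = 0
    while i < n:
        if label[i]:
            # Find the end of the contiguous ground-truth segment [i, j).
            j = i
            while j < n and label[j]:
                j += 1
            if pred[i:j].any():
                pred[i:j] = True
            i = j
        else:
            i += 1
    return pred

def f1_score(pred, label):
    """Point-level F1 from boolean predictions and labels."""
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    tp = np.sum(pred & label)
    fp = np.sum(pred & ~label)
    fn = np.sum(~pred & label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

# Toy example (hypothetical data): two anomaly segments, one partly detected.
label = [0, 1, 1, 1, 0, 0, 1, 1, 0]
pred = [0, 0, 1, 0, 0, 0, 0, 0, 1]
raw_f1 = f1_score(pred, label)
adjusted_f1 = f1_score(point_adjust(pred, label), label)
```

On this toy input a single hit inside the first segment raises the F1 score substantially after adjustment, which is the kind of sensitivity to evaluation protocol the abstract highlights.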

[667]  arXiv:2603.18997 (cross-list from math.CO) [pdf, ps, other]
Title: Product Structure and Treewidth of Hyperbolic Uniform Disk Graphs
Comments: An extended abstract of this paper is published in the Proceedings of the 42nd International Symposium on Computational Geometry (SoCG 2026)
Subjects: Combinatorics (math.CO); Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)

Hyperbolic uniform disk graphs (HUDGs) are intersection graphs of disks with some radius $r$ in the hyperbolic plane, where $r$ may be constant or depend on the number of vertices in a family of HUDGs. We show that HUDGs with constant clique number do not admit \emph{product structure}, i.e., that there is no constant $c$ such that every such graph is a subgraph of $H \boxtimes P$ for some graph $H$ of treewidth at most $c$. This justifies that HUDGs are described as not having a grid-like structure in the literature, and is in contrast to unit disk graphs in the Euclidean plane, whose grid-like structure is evident from the fact that they are subgraphs of the strong product of two paths and a clique of constant size [Dvo\v{r}\'ak et al., '21, MATRIX Annals]. By allowing $H$ to be any graph of constant treewidth instead of a path-like graph, we reject the possibility of a grid-like structure not merely by the maximum degree (which is unbounded for HUDGs) but due to their global structure. We complement this by showing that for every (sub-)constant $r$, HUDGs admit product structure, whereas the typical hyperbolic behavior is observed if $r$ grows with the number of vertices.
Our proof involves a family of $n$-vertex HUDGs with radius $\log n$ that has bounded clique number but unbounded treewidth, and one for which the ratio of treewidth and clique number is $\log n / \log \log n$. Up to a $\log \log n$ factor, this negatively answers a question raised by Bl\"asius et al. [SoCG '25] asking whether balanced separators of HUDGs with radius $\log n$ can be covered by less than $\log n$ cliques. Our results also imply that the local and layered tree-independence number of HUDGs are both unbounded, answering an open question of Dallard et al. [arXiv '25].

[668]  arXiv:2603.19041 (cross-list from stat.ML) [pdf, ps, other]
Title: Fast and Interpretable Autoregressive Estimation with Neural Network Backpropagation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Autoregressive (AR) models remain widely used in time series analysis due to their interpretability, but conventional parameter estimation methods can be computationally expensive and prone to convergence issues. This paper proposes a Neural Network (NN) formulation of AR estimation by embedding the autoregressive structure directly into a feedforward NN, enabling coefficient estimation through backpropagation while preserving interpretability. Simulation experiments on 125,000 synthetic AR(p) time series with short-term dependence (1 <= p <= 5) show that the proposed NN-based method consistently recovers model coefficients for all series, while Conditional Maximum Likelihood (CML) fails to converge in approximately 55% of cases. When both methods converge, estimation accuracy is comparable with negligible differences in relative error, R2, and perplexity/likelihood. However, when CML fails, the NN-based approach still provides reliable estimates. In all cases, the NN estimator achieves substantial computational gains, reaching a median speedup of 12.6x and up to 34.2x for higher model orders. Overall, results demonstrate that gradient-descent NN optimization can provide a fast and efficient alternative for interpretable AR parameter estimation.
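One way to picture the idea of reading AR coefficients off network weights is a single linear layer without bias, whose inputs are the p lagged values and whose weights are fitted by gradient descent on the mean squared one-step prediction error. The sketch below is a minimal illustration under that assumption; the helper names `simulate_ar` and `fit_ar_gd`, the learning rate, and the step count are choices made for this example and are not taken from the paper, whose architecture and training setup may differ.

```python
import numpy as np

def simulate_ar(coeffs, n, rng, burn_in=200):
    """Simulate an AR(p) series x[t] = sum_k coeffs[k-1] * x[t-k] + noise."""
    coeffs = np.asarray(coeffs, dtype=float)
    p = len(coeffs)
    x = np.zeros(n + burn_in)
    noise = rng.standard_normal(n + burn_in)
    for t in range(p, n + burn_in):
        # x[t-p:t][::-1] is (x[t-1], ..., x[t-p]).
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + noise[t]
    return x[burn_in:]

def fit_ar_gd(x, p, lr=0.05, steps=2000):
    """Fit AR(p) coefficients as the weights of a bias-free linear layer,
    trained by full-batch gradient descent on the mean squared error."""
    x = np.asarray(x, dtype=float)
    # Lag matrix: the row for time t holds (x[t-1], ..., x[t-p]).
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    y = x[p:]
    w = np.zeros(p)
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE
        w -= lr * grad
    return w

# Example: recover the coefficients of a stationary AR(2) process.
rng = np.random.default_rng(0)
true_coeffs = np.array([0.6, -0.3])
series = simulate_ar(true_coeffs, 5000, rng)
estimated = fit_ar_gd(series, p=2)
```

Because the model is linear in its weights, the learned vector can be interpreted directly as the AR coefficients, and for this convex loss gradient descent converges to the ordinary least squares solution.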

[669]  arXiv:2603.19071 (cross-list from math.PR) [pdf, ps, other]
Title: Quantifying the effect of noise perturbation for the stochastic Burgers equation with additive trace-class noise
Subjects: Probability (math.PR); Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

We establish upper bounds for the weak and strong error resulting from a perturbation of the noise driving the stochastic Burgers equation, where we assume the noise to be additive and of trace class and the initial value to be sufficiently regular. More specifically, replacing the covariance operator of the driving noise $Q_1 \in \mathcal{L}_1(L^2)$ in the Burgers equation by a covariance operator $Q_2 \in \mathcal{L}_1(L^2)$ results in a weak error of $\mathcal{O}\big(\| (-A)^{-1^{-} } (Q_1-Q_2) \|_{\mathcal{L}_1(L^2)}\big)$ and a strong error of $\mathcal{O}\big(\big\| (-A)^{-1/2^{-}}\big|Q_1^{1/2} -Q_2^{1/2}\big| \big\|_{\mathcal{L}_2(L^2)}\big)$. Here $\|\cdot \|_{\mathcal{L}_1}$ is the trace class norm, $\|\cdot \|_{\mathcal{L}_2}$ is the Hilbert-Schmidt norm, and $A$ is the one-dimensional Dirichlet Laplacian that represents the leading term in the Burgers equation. In particular, our results provide upper bounds for the weak and strong error arising when approximating the trace class noise by finite-dimensional noise; the rates we obtain reflect the general philosophy that the weak convergence rate should be twice the strong rate.

[670]  arXiv:2603.19110 (cross-list from quant-ph) [pdf, ps, other]
Title: Post-Quantum Cryptography from Quantum Stabilizer Decoding
Comments: 49 pages
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

Post-quantum cryptography currently rests on a small number of hardness assumptions, posing significant risks should any one of them be compromised. This vulnerability motivates the search for new and cryptographically versatile assumptions that make a convincing case for quantum hardness. In this work, we argue that decoding random quantum stabilizer codes -- a quantum analog of the well-studied LPN problem -- is an excellent candidate. This task occupies a unique middle ground: it is inherently native to quantum computation, yet admits an equivalent formulation with purely classical input and output, as recently shown by Khesin et al. (STOC '26). We prove that the average-case hardness of quantum stabilizer decoding implies the core primitives of classical Cryptomania, including public-key encryption (PKE) and oblivious transfer (OT), as well as one-way functions. Our constructions are moreover practical: our PKE scheme achieves essentially the same efficiency as state-of-the-art LPN-based PKE, and our OT is round-optimal. We also provide substantial evidence that stabilizer decoding does not reduce to LPN, suggesting that the former problem constitutes a genuinely new post-quantum assumption. Our primary technical contributions are twofold. First, we give a reduction from random quantum stabilizer decoding to an average-case problem closely resembling LPN, but which is equipped with additional symplectic algebraic structure. While this structure is essential to the quantum nature of the problem, it raises significant barriers to cryptographic security reductions. Second, we develop a new suite of scrambling techniques for such structured linear spaces, and use them to produce rigorous security proofs for all of our constructions.

[671]  arXiv:2603.19115 (cross-list from q-bio.MN) [pdf, ps, other]
Title: BSTModelKit.jl: A Julia Package for Constructing, Solving, and Analyzing Biochemical Systems Theory Models
Subjects: Molecular Networks (q-bio.MN); Mathematical Software (cs.MS)

We present BSTModelKit.jl, an open-source Julia package for constructing, solving, and analyzing Biochemical Systems Theory (BST) models of biochemical networks. The package implements S-system representations, a canonical power-law formalism for modeling metabolic and regulatory networks. BSTModelKit.jl provides a declarative model specification format, dynamic simulation via ordinary differential equation (ODE) integration, steady-state computation, and global sensitivity analysis using the Morris and Sobol methods. The package leverages the Julia scientific computing ecosystem, in particular the SciML suite of differential equation solvers, to provide efficient and flexible model analysis tools. We describe the mathematical formulation, software design, and demonstrate the package capabilities with illustrative examples.

[672]  arXiv:2603.19117 (cross-list from quant-ph) [pdf, ps, other]
Title: Variational and Annealing-Based Approaches to Quantum Combinatorial Optimization
Comments: 23 pages, 6 figures
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

In this work, we review quantum approaches to combinatorial optimization, with the aim of bridging theoretical developments and industrial relevance. We first survey the main families of quantum algorithms, including Quantum Annealing, the Quantum Approximate Optimization Algorithm (QAOA), Quantum Reinforcement Learning (QRL), and Quantum Generative Modeling (QGM). We then examine the problem classes where quantum technologies currently show evidence of quantum advantage, drawing on established benchmarking initiatives such as QOBLIB, QUARK, QASMBench, and QED-C. These problem classes are subsequently mapped to representative industrial domains, including logistics, finance, and telecommunications. Our analysis indicates that quantum annealing currently exhibits the highest level of operational maturity, while QAOA shows promising potential on NISQ-era hardware. In contrast, QRL and QGM emerge as longer-term research directions with significant potential for future industrial impact.

[673]  arXiv:2603.19130 (cross-list from quant-ph) [pdf, ps, other]
Title: Quantum block encoding for semiseparable matrices
Subjects: Quantum Physics (quant-ph); Numerical Analysis (math.NA); Quantum Algebra (math.QA)

Quantum block encoding (QBE) is a crucial step in the development of most quantum algorithms, as it provides an embedding of a given matrix into a suitable larger unitary matrix. Historically, the development of efficient techniques for QBE has mostly focused on sparse matrices; less effort has been devoted to data-sparse (e.g., rank-structured) matrices.
In this work we examine a particular case of rank structure, namely, one-pair semiseparable matrices. We present a new block encoding approach that relies on a suitable factorization of the given matrix as the product of triangular and diagonal factors. To encode the matrix, the algorithm needs $2\log(N)+7$ ancillary qubits. This process takes polylogarithmic time and has an error of $\mathcal{O}(N^2)$, where $N$ is the matrix size.

[674]  arXiv:2603.19147 (cross-list from math.OC) [pdf, ps, other]
Title: Fast and Effective Computation of Generalized Symmetric Matrix Factorization
Comments: 41 pages, 2 figures, 1 table
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

In this paper, we study a nonconvex, nonsmooth, and non-Lipschitz generalized symmetric matrix factorization model that unifies a broad class of matrix factorization formulations arising in machine learning, image science, engineering, and related areas. We first establish two exactness properties. On the modeling side, we prove an exact penalty property showing that, under suitable conditions, the symmetry-inducing quadratic penalty enforces symmetry whenever the penalty parameter is sufficiently large but finite, thereby exactly recovering the associated symmetric formulation. On the algorithmic side, we introduce an auxiliary-variable splitting formulation and establish an exact relaxation relationship that rigorously links stationary points of the original objective function to those of a relaxed potential function. Building on these exactness properties, we propose an average-type nonmonotone alternating updating method (A-NAUM) based on the relaxed potential function. At each iteration, A-NAUM alternately updates the two factor blocks by (approximately) minimizing the potential function, while the auxiliary block is updated in closed form. To ensure the convergence and enhance practical performance, we further incorporate an average-type nonmonotone line search and show that it is well-defined under mild conditions. Moreover, based on the Kurdyka-{\L}ojasiewicz property and its associated exponent, we establish global convergence of the entire sequence to a stationary point and derive convergence rate results. Finally, numerical experiments on real datasets demonstrate the efficiency of A-NAUM.

[675]  arXiv:2603.19195 (cross-list from eess.AS) [pdf, ps, other]
Title: How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
Comments: Project website: this https URL
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over text descriptions from an audio captioner; and (3) audio-grounded evaluation, where each LLM is fine-tuned into a Large Audio Language Model (LALM) with an audio encoder. Our findings reveal that auditory knowledge varies substantially across families, and text-only results are strongly correlated with audio performance. Our work provides empirical grounding for a comprehensive understanding of LLMs in audio research.

[676]  arXiv:2603.19198 (cross-list from stat.ML) [pdf, ps, other]
Title: The Exponentially Weighted Signature
Comments: 43 pages, 1 figure
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together with richer memory dynamics including oscillatory, growth, and regime-dependent behaviour, while preserving the algebraic strengths of the classical signature. We show that the EWS is the unique solution to a linear controlled differential equation on the tensor algebra, and that it generalises both state-space models and the Laplace and Fourier transforms of the path. The group-like structure of the EWS enables efficient computation and makes the framework amenable to gradient-based learning, with the full semigroup action parametrised by and learned through its generator. We use this framework to empirically demonstrate the expressivity gap between the EWS and both the signature and EFM on two SDE-based regression tasks.

[677]  arXiv:2603.19215 (cross-list from math.AG) [pdf, ps, other]
Title: $R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal Equivalence
Comments: 23 pages
Subjects: Algebraic Geometry (math.AG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Number Theory (math.NT)

Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also had non-trivial $R$-equivalence, they would contradict Colliot-Th\'el\`ene and Sansuc's conjecture regarding the $k$-rationality of universal torsors for geometrically rational surfaces.
By devising new methods to study $R$-equivalence, we prove that for 2-adic surfaces with all-Eckardt reductions (the third special type, which contains every existing case of non-trivial universal equivalence), $R$-equivalence is trivial or of exponent 2. For the explicit cases, we confirm triviality: the diagonal cubic $X^3+Y^3+Z^3+\zeta_3 T^3=0$ over $\mathbb{Q}_2(\zeta_3)$--answering a long-standing question of Manin's (Cubic Forms, 1972)--and the cubic with universal equivalence of exponent 2 (Kanevsky, 1982).
This is the first in a series of works derived from a year of interactions with generative AI models such as AlphaEvolve and Gemini 3 Deep Think, with the latter proving many of our lemmas. We disclose the timeline and nature of their use towards this paper, and describe our broader AI-assisted research program in a companion report (in preparation).

Replacements for Fri, 20 Mar 26

[678]  arXiv:2101.08301 (replaced) [pdf, ps, other]
Title: Exploring AI in Fashion: A Review of Aesthetics, Personalization, Virtual Try-On, and Forecasting
Journal-ref: Multimedia Systems 32, 167 (2026)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[679]  arXiv:2107.08686 (replaced) [pdf, ps, other]
Title: Improved Learning Rates for Stochastic Optimization
Comments: This version substantially revises and supersedes all previous versions. Earlier versions contained errors and should not be relied upon for the current results or statements. The manuscript has been thoroughly rewritten, with a narrowed scope, a simplified presentation, a revised focus, and corresponding updates to the title and main claims. Please refer to and cite the current version
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[680]  arXiv:2108.00916 (replaced) [pdf, ps, other]
Title: 2-D Directed Formation Control Based on Bipolar Coordinates
Comments: 16 pages, 10 figures; minor typos corrected; no change in results
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Robotics (cs.RO)
[681]  arXiv:2208.00335 (replaced) [pdf, ps, other]
Title: Rule Extraction in Machine Learning: Chat Incremental Pattern Constructor
Comments: 13 pages
Subjects: Machine Learning (cs.LG)
[682]  arXiv:2208.02006 (replaced) [pdf, ps, other]
Title: Funnel Control Under Hard and Soft Output Constraints (extended version)
Comments: 9 pages, 7 figures. Minor revisions: corrected text and mathematical typos, expanded discussion in Section III.A, and added a short appendix on relaxation of an assumption; main results unchanged
Subjects: Systems and Control (eess.SY)
[683]  arXiv:2209.04892 (replaced) [pdf, ps, other]
Title: "Calibeating": Beating Forecasters at Their Own Game
Comments: Corrected Appendix A.7 + new Appendix A.10. Included: Addendum and Errata to the published journal version (Theoretical Economics, 2023) and to arXiv previous version v2 (2022). Web page: this http URL
Journal-ref: Theoretical Economics 18 (2023), 4, 1441-1474
Subjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)
[684]  arXiv:2309.08945 (replaced) [pdf, ps, other]
Title: Inverse classification with logistic and softmax classifiers: efficient optimization
Comments: Appears in Transactions on Machine Learning Research, March 2026
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[685]  arXiv:2311.04095 (replaced) [pdf, ps, other]
Title: Multi-Scale Distillation for RGB-D Anomaly Detection on the PD-REAL Dataset
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[686]  arXiv:2312.03871 (replaced) [pdf, ps, other]
Title: Hidden yet quantifiable: A lower bound for confounding strength using randomized trials
Comments: Accepted for presentation at the International Conference on Artificial Intelligence and Statistics (AISTATS) 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[687]  arXiv:2312.08531 (replaced) [pdf, ps, other]
Title: Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods
Comments: The preliminary version has been accepted at ICLR 2024. For the update history, please refer to the PDF
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[688]  arXiv:2402.01972 (replaced) [pdf, ps, other]
Title: Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[689]  arXiv:2402.02917 (replaced) [pdf, ps, other]
Title: Construction of Optimal Algorithms for Function Approximation in Gaussian Sobolev Spaces
Comments: 19 pages, 2 figures, to appear in BIT Numerical Mathematics
Subjects: Numerical Analysis (math.NA)
[690]  arXiv:2402.15315 (replaced) [pdf, ps, other]
Title: On Minimal Depth in Neural Networks
Authors: Juan L. Valerdi
Comments: 16 pages
Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[691]  arXiv:2403.02482 (replaced) [pdf, ps, other]
Title: Heuristic Multiobjective Discrete Optimization using Restricted Decision Diagrams
Comments: To appear in the proceedings of CPAIOR 2026
Subjects: Artificial Intelligence (cs.AI)
[692]  arXiv:2403.02951 (replaced) [pdf, ps, other]
Title: SQLBench: A Comprehensive Evaluation for Text-to-SQL Capabilities of Large Language Models
Comments: 25 pages, 10 figures, 14 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[693]  arXiv:2403.07189 (replaced) [pdf, ps, other]
Title: A multiscale cavity method for sublinear-rank symmetric matrix factorization
Comments: 65 pages. Filled out proof details, improved multiscale cavity method and its proof. Equation and theorem numbering made consistent with published version
Subjects: Information Theory (cs.IT); Disordered Systems and Neural Networks (cond-mat.dis-nn); Mathematical Physics (math-ph); Statistics Theory (math.ST)
[694]  arXiv:2403.17210 (replaced) [pdf, ps, other]
Title: CADGL: Context-Aware Deep Graph Learning for Predicting Drug-Drug Interactions
Comments: Preliminary version; full version accepted to the IEEE Transactions on Computational Biology and Bioinformatics (IEEE TCBB) (this https URL). Code: this https URL
Journal-ref: IEEE Transactions on Computational Biology and Bioinformatics, 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Biomolecules (q-bio.BM); Molecular Networks (q-bio.MN)
[695]  arXiv:2404.16050 (replaced) [pdf, ps, other]
Title: Implications of computer science theory for the simulation hypothesis
Authors: David H. Wolpert
Comments: 47 pages of text, 5 pages of references, 13 pages of appendices
Subjects: Logic in Computer Science (cs.LO); History and Philosophy of Physics (physics.hist-ph)
[696]  arXiv:2405.11440 (replaced) [pdf, ps, other]
Title: A Model Consistency-Based Countermeasure to GAN-Based Data Poisoning Attack in Federated Learning
Comments: 18 pages, 16 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
[697]  arXiv:2405.19569 (replaced) [pdf, ps, other]
Title: Improved Convex Decomposition with Ensembling and Negative Primitives
Comments: 3DV 2026; 25 pages, 15 figures, 9 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[698]  arXiv:2406.00153 (replaced) [pdf, ps, other]
Title: $μ$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Subjects: Machine Learning (cs.LG)
[699]  arXiv:2407.02134 (replaced) [pdf, ps, other]
Title: Abstract Markov Random Fields
Comments: 56 pages, 9 figures
Subjects: Information Theory (cs.IT)
[700]  arXiv:2407.16740 (replaced) [pdf, ps, other]
Title: PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[701]  arXiv:2407.17454 (replaced) [pdf, ps, other]
Title: Automated Explanation Selection for Scientific Discovery
Authors: Ashlin Iser
Comments: Composite AI Workshop at ECAI 2024 (accepted for publication)
Subjects: Artificial Intelligence (cs.AI)
[702]  arXiv:2407.17869 (replaced) [pdf, ps, other]
Title: Modeling Inverse Ellipsometry Problem via Flow Matching with a Large-Scale Dataset
Subjects: Machine Learning (cs.LG)
[703]  arXiv:2409.05585 (replaced) [pdf, ps, other]
Title: Latent Causal Modeling for 3D Brain MRI Counterfactuals
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[704]  arXiv:2409.16215 (replaced) [pdf, ps, other]
Title: TiROD: Tiny Robotics Dataset and Benchmark for Continual Object Detection
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[705]  arXiv:2409.17833 (replaced) [pdf, ps, other]
Title: ODE-Constrained Generative Modeling of Cardiac Dynamics for 12-Lead ECG Synthesis
Subjects: Machine Learning (cs.LG)
[706]  arXiv:2410.06415 (replaced) [pdf, ps, other]
Title: Biased AI can Influence Political Decision-Making
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[707]  arXiv:2410.13106 (replaced) [pdf, ps, other]
Title: Cliqueformer: Model-Based Optimization with Structured Transformers
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[708]  arXiv:2410.18178 (replaced) [pdf, ps, other]
Title: Quantum linear system algorithm with optimal queries to initial state preparation
Comments: 89 pages, 3 figures. Corrected typos
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)
[709]  arXiv:2411.08794 (replaced) [pdf, ps, other]
Title: LLM-Based World Models Can Make Decisions Solely, But Rigorous Evaluations are Needed
Comments: Accepted to TMLR
Subjects: Artificial Intelligence (cs.AI)
[710]  arXiv:2411.14802 (replaced) [pdf, ps, other]
Title: Enhancing a Hierarchical Graph Rewriting Language based on MELL Cut Elimination
Comments: Extended version of the PADL 2025 paper (LNCS, Springer, 2025). This version incorporates into the main text details omitted from the conference version and includes some additional material
Subjects: Programming Languages (cs.PL)
[711]  arXiv:2411.15060 (replaced) [pdf, ps, other]
Title: Hallucination Detection in Virtually-Stained Histology: A Latent Space Baseline
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[712]  arXiv:2411.17559 (replaced) [pdf, ps, other]
Title: Degrees of Freedom of Cache-Aided Interference Channels Assisted by Active Intelligent Reflecting Surfaces
Subjects: Information Theory (cs.IT)
[713]  arXiv:2412.01113 (replaced) [pdf, ps, other]
Title: LLMs Faithfully and Iteratively Compute Answers During CoT: A Systematic Analysis With Multi-step Arithmetics
Subjects: Computation and Language (cs.CL)
[714]  arXiv:2412.02484 (replaced) [pdf, ps, other]
Title: Vector Optimization with Gaussian Process Bandits
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
[715]  arXiv:2412.04162 (replaced) [pdf, ps, other]
Title: Estimating the persistent homology of $\mathbb{R}^n$-valued functions using function-geometric multifiltrations
Comments: 38 pages; v3; add a corollary, with a proof, showing that the persistence module $H_*(f)$ satisfies a form of tameness
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG)
[716]  arXiv:2412.08973 (replaced) [pdf, ps, other]
Title: Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?
Comments: Accepted to IJCV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[717]  arXiv:2412.09303 (replaced) [pdf, ps, other]
Title: A Comprehensive Survey of Data Reduction Rules for the Maximum Weighted Independent Set Problem
Subjects: Data Structures and Algorithms (cs.DS)
[718]  arXiv:2412.09465 (replaced) [pdf, ps, other]
Title: OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
Comments: ICLR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[719]  arXiv:2412.10488 (replaced) [pdf, ps, other]
Title: SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers
Authors: Zehao Chen, Rong Pan
Comments: Accepted by AAAI 2025. Project: this https URL
Journal-ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(3), 2358-2366
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[720]  arXiv:2412.15411 (replaced) [pdf, ps, other]
Title: Sparse Checkpointing for Fast and Reliable MoE Training
Comments: NSDI'26 | Camera-Ready
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[721]  arXiv:2412.17861 (replaced) [pdf, ps, other]
Title: From Vocal Instructions to Household Tasks: The Inria TIAGo++ in the euROBIN Service Robots Coopetition
Subjects: Robotics (cs.RO)
[722]  arXiv:2501.00204 (replaced) [pdf, ps, other]
Title: MSM-BD: Multimodal Social Media Bot Detection Using Heterogeneous Information
Comments: Springer Nature in Studies in Computational Intelligence
Subjects: Multimedia (cs.MM); Social and Information Networks (cs.SI)
[723]  arXiv:2501.00744 (replaced) [pdf, ps, other]
Title: Assessing the Distributional Fidelity of Synthetic Chest X-rays using the Embedded Characteristic Score
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[724]  arXiv:2501.02364 (replaced) [pdf, ps, other]
Title: Linearly Separable Features in Shallow Nonlinear Networks: Width Scales Polynomially with Intrinsic Data Dimension
Comments: 33 pages, 10 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[725]  arXiv:2501.09749 (replaced) [pdf, ps, other]
Title: Enhancing Lexicon-Based Text Embeddings with Large Language Models
Comments: ACL 2025
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[726]  arXiv:2501.17026 (replaced) [pdf, ps, other]
Title: Mitigating Omitted Variable Bias in Empirical Software Engineering
Subjects: Software Engineering (cs.SE)
[727]  arXiv:2502.00340 (replaced) [pdf, ps, other]
Title: Unlocking Full Efficiency of Token Filtering in Large Language Model Training
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)
[728]  arXiv:2502.03714 (replaced) [pdf, ps, other]
Title: Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[729]  arXiv:2502.08416 (replaced) [pdf, ps, other]
Title: Multifidelity Simulation-based Inference for Computationally Expensive Simulators
Comments: Accepted at ICLR 2026. Available at OpenReview: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[730]  arXiv:2502.09340 (replaced) [pdf, ps, other]
Title: This looks like what? Challenges and Future Research Directions for Part-Prototype Models
Comments: Accepted at the 4th World Conference on eXplainable Artificial Intelligence (XAI-2026)
Subjects: Machine Learning (cs.LG)
[731]  arXiv:2502.10978 (replaced) [pdf, ps, other]
Title: Agentic LLM Framework for Adaptive Decision Discourse
Comments: 24 pages, 4 figures, 1 appendix
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[732]  arXiv:2502.14446 (replaced) [pdf, ps, other]
Title: MOMENTI: Scalable Motif Mining in Multidimensional Time Series
Comments: 14 pages, 7 figures, extended experimental section, change of algorithm name due to a title clash with another paper published in the same issue
Subjects: Data Structures and Algorithms (cs.DS)
[733]  arXiv:2502.15411 (replaced) [pdf, ps, other]
Title: HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[734]  arXiv:2502.16116 (replaced) [pdf, ps, other]
Title: Integrating Weather Station Data and Radar for Precipitation Nowcasting: SmaAt-fUsion and SmaAt-Krige-GNet
Comments: 13 pages, 6 figures
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
[735]  arXiv:2503.01482 (replaced) [pdf, ps, other]
Title: Revisiting Locally Differentially Private Protocols: Towards Better Trade-offs in Privacy, Utility, and Attack Resistance
Comments: Paper accepted at ICDE 2026
Subjects: Cryptography and Security (cs.CR)
[736]  arXiv:2503.07884 (replaced) [pdf, ps, other]
Title: LLMIA: An Out-of-the-Box Index Advisor via In-Context Learning with LLMs
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
[737]  arXiv:2503.08890 (replaced) [pdf, ps, other]
Title: PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization
Authors: Zhiwen You, Yue Guo
Comments: Accepted by Journal of Biomedical Informatics
Subjects: Computation and Language (cs.CL)
[738]  arXiv:2503.09538 (replaced) [pdf, ps, other]
Title: Differentially Private Equilibrium Finding in Polymatrix Games
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[739]  arXiv:2503.16252 (replaced) [pdf, ps, other]
Title: Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
Subjects: Computation and Language (cs.CL)
[740]  arXiv:2503.16426 (replaced) [pdf, ps, other]
Title: DynamicVis: Dynamic Visual Perception for Efficient Remote Sensing Foundation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[741]  arXiv:2503.17867 (replaced) [pdf, ps, other]
Title: Detecting and Mitigating DDoS Attacks with AI: A Survey
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
[742]  arXiv:2503.18163 (replaced) [pdf, ps, other]
Title: A unified convention for achievement positional games
Comments: Compared to the previous version, of which a long abstract has been published at EuroComb'25, this version contains a proof of PSPACE-completeness for intermediate positions of 3-uniform Maker-Maker games
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[743]  arXiv:2503.18253 (replaced) [pdf, ps, other]
Title: Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages
Comments: LREC 2026
Subjects: Computation and Language (cs.CL)
[744]  arXiv:2503.21782 (replaced) [pdf, ps, other]
Title: Mobile-VideoGPT: Fast and Accurate Model for Mobile Video Understanding
Comments: Technical Report. Project: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[745]  arXiv:2503.21800 (replaced) [pdf, ps, other]
Title: ELM: A Hybrid Ensemble of Language Models for Automated Tumor Group Classification in Population-Based Cancer Registries
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[746]  arXiv:2504.00992 (replaced) [pdf, ps, other]
Title: SuperDec: 3D Scene Decomposition with Superquadric Primitives
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[747]  arXiv:2504.05578 (replaced) [pdf, ps, other]
Title: Recent Advances in Near-Field Beam Training and Channel Estimation for XL-MIMO Systems
Comments: Accepted by Advanced Information and Communication Journal; 9 pages; 6 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[748]  arXiv:2504.07537 (replaced) [pdf, ps, other]
Title: Formalizing Representation Theorems for a Logical Framework with Rewriting
Authors: Thomas Traversié (MICS, DEDUCTEAM), Florian Rabe (FAU)
Subjects: Logic in Computer Science (cs.LO)
[749]  arXiv:2504.09022 (replaced) [pdf, ps, other]
Title: Game-Theoretic Coordination for Time-Critical Missions of UAV Systems
Comments: Revised version with improved exposition, expanded introduction, updated abstract, minor corrections and updated author list
Subjects: Multiagent Systems (cs.MA)
[750]  arXiv:2504.12441 (replaced) [pdf, ps, other]
Title: Learning Transferable Friction Models and LuGre Identification Via Physics-Informed Neural Networks
Comments: 7 pages, 8 figures, Accepted to 2026 American Control Conference (ACC)
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
[751]  arXiv:2504.12724 (replaced) [pdf, ps, other]
Title: Faster multivariate integration in D-modules
Comments: Revised version with several improvements, including new examples
Subjects: Symbolic Computation (cs.SC)
[752]  arXiv:2504.14115 (replaced) [pdf, ps, other]
Title: Visualization Tasks for Unlabeled Graphs
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[753]  arXiv:2504.14634 (replaced) [pdf, ps, other]
Title: Latent Representations for Visual Proprioception in Inexpensive Robots
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[754]  arXiv:2504.15995 (replaced) [pdf, ps, other]
Title: OPUS-VFL: Incentivizing Optimal Privacy-Utility Tradeoffs in Vertical Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[755]  arXiv:2505.01174 (replaced) [pdf, ps, other]
Title: Self-moderation in the decentralized era: decoding blocking behavior on Bluesky
Subjects: Social and Information Networks (cs.SI)
[756]  arXiv:2505.02024 (replaced) [pdf, ps, other]
Title: From Mind to Machine: The Rise of Manus AI as a Fully Autonomous Digital Agent
Subjects: Artificial Intelligence (cs.AI)
[757]  arXiv:2505.03530 (replaced) [pdf, ps, other]
Title: Causal Intervention Framework for Variational Auto Encoder Mechanistic Interpretability
Authors: Dip Roy
Subjects: Machine Learning (cs.LG)
[758]  arXiv:2505.06193 (replaced) [pdf, ps, other]
Title: Ohana trees, linear approximation and multi-types for the $λ$I-calculus: No variable gets left behind or forgotten!
Comments: This is the (submitted) long version of the (published) conference version v2, see this https URL
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
[759]  arXiv:2505.08916 (replaced) [pdf, ps, other]
Title: A New Tractable Description Logic under Categorical Semantics
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
[760]  arXiv:2505.10294 (replaced) [pdf, ps, other]
Title: MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models
Comments: Accepted manuscript, 24 pages, 9 figures, 5 tables. Published in Computers in Biology and Medicine (DOI: this https URL)
Journal-ref: Computers in Biology and Medicine, vol. 206, 2026, 111564
Subjects: Computer Vision and Pattern Recognition (cs.CV); Tissues and Organs (q-bio.TO)
[761]  arXiv:2505.21854 (replaced) [pdf, ps, other]
Title: Rethinking Gradient-based Adversarial Attacks on Point Cloud Classification
Comments: ICME 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[762]  arXiv:2505.24503 (replaced) [pdf, ps, other]
Title: Online Fair Division with Additional Information
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)
[763]  arXiv:2506.00030 (replaced) [pdf, ps, other]
Title: Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement
Comments: Accepted by TPAMI
Subjects: Machine Learning (cs.LG)
[764]  arXiv:2506.02009 (replaced) [pdf, ps, other]
Title: STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
Comments: 10 pages for main text
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[765]  arXiv:2506.02535 (replaced) [pdf, ps, other]
Title: Video Anomaly Detection with Semantics-Aware Information Bottleneck
Comments: Accepted by ICME 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[766]  arXiv:2506.02914 (replaced) [pdf, ps, other]
Title: Auto-Annotation with Expert-Crafted Guidelines: A Study through 3D LiDAR Detection Benchmark
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[767]  arXiv:2506.05908 (replaced) [pdf, ps, other]
Title: QualitEye: Public and Privacy-preserving Gaze Data Quality Verification
Subjects: Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[768]  arXiv:2506.06975 (replaced) [pdf, ps, other]
Title: Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[769]  arXiv:2506.08625 (replaced) [pdf, ps, other]
Title: RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval
Subjects: Computation and Language (cs.CL)
[770]  arXiv:2506.08898 (replaced) [pdf, ps, other]
Title: Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation
Comments: 22 pages, 6 figures, under review
Subjects: Artificial Intelligence (cs.AI)
[771]  arXiv:2506.10586 (replaced) [pdf, ps, other]
Title: Size-adaptive Hypothesis Testing for Fairness
Journal-ref: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)
[772]  arXiv:2506.10723 (replaced) [pdf, ps, other]
Title: Semi-discrete moduli of smoothness and their applications in one- and two-sided error estimates
Subjects: Numerical Analysis (math.NA); Functional Analysis (math.FA)
[773]  arXiv:2506.11319 (replaced) [pdf, ps, other]
Title: Hardware-Aware Neural Architecture Search for Encrypted Traffic Classification on Resource-Constrained Devices
Comments: 14 pages, 7 figures. Published in IEEE Transactions on Network and Service Management (2026)
Journal-ref: IEEE Transactions on Network and Service Management, 2026
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
[774]  arXiv:2506.13387 (replaced) [pdf, ps, other]
Title: TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Dual-Level Scale-Oriented Contrast
Comments: CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[775]  arXiv:2506.14662 (replaced) [pdf, ps, other]
Title: PGLib-CO2: A Power Grid Library for Real-Time Computation and Optimization of Carbon Emissions
Subjects: Systems and Control (eess.SY)
[776]  arXiv:2506.16931 (replaced) [pdf, ps, other]
Title: Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning
Comments: 14 pages, 6 figures, under review
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
[777]  arXiv:2506.18694 (replaced) [pdf, ps, other]
Title: Shifted HSS solvers for the indefinite Helmholtz equation
Comments: Revision including new title before submission to SISC
Subjects: Numerical Analysis (math.NA)
[778]  arXiv:2506.19075 (replaced) [pdf, ps, other]
Title: First-Order Sparse Convex Optimization: Better Rates with Sparse Updates
Authors: Dan Garber
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[779]  arXiv:2506.20334 (replaced) [pdf, ps, other]
Title: Recurrent neural network-based robust control systems with regional properties and application to MPC design
Comments: 27 pages, 5 figures
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[780]  arXiv:2507.00465 (replaced) [pdf, ps, other]
Title: Encoding Peano Arithmetic in a Minimal Fragment of Separation Logic
Subjects: Logic in Computer Science (cs.LO)
[781]  arXiv:2507.01062 (replaced) [pdf, ps, other]
Title: Quantifying Student Success with Generative AI: A Monte Carlo Simulation Informed by Systematic Review
Comments: Conference version presented at ICETE 2026
Journal-ref: Kayadibi, S. Y. (2026). Melbourne Institute of Technology ICETE Conference, Sydney, NSW, Australia
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[782]  arXiv:2507.02768 (replaced) [pdf, ps, other]
Title: DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Comments: Published in IEEE Transactions on Audio, Speech and Language Processing (TASLP). Model and code available at: this https URL
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[783]  arXiv:2507.02861 (replaced) [pdf, ps, other]
Title: LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans
Comments: Project Page: this https URL; Video: this https URL. Camera-Ready Version
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[784]  arXiv:2507.05751 (replaced) [pdf, ps, other]
Title: SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[785]  arXiv:2507.06542 (replaced) [pdf, ps, other]
Title: On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Comments: We discover and theoretically explain why and when a single global parameter merging in decentralized learning can recover the performance of federated learning, even in highly heterogeneous and communication-constrained environments
Journal-ref: ICLR 2026 (Oral Presentation)
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[786]  arXiv:2507.07034 (replaced) [pdf, ps, other]
Title: Surrogate Model for Heat Transfer Prediction in Impinging Jet Arrays using Dynamic Inlet/Outlet and Flow Rate Control
Comments: 39 pages, 12 figures
Subjects: Fluid Dynamics (physics.flu-dyn); Artificial Intelligence (cs.AI)
[787]  arXiv:2507.08193 (replaced) [pdf, ps, other]
Title: Entity-Specific Cyber Risk Assessment using InsurTech Empowered Risk Factors
Comments: Variance 19 (February)
Subjects: Risk Management (q-fin.RM); Machine Learning (cs.LG); Machine Learning (stat.ML)
[788]  arXiv:2507.12261 (replaced) [pdf, ps, other]
Title: Infherno: End-to-end Agent-based FHIR Resource Synthesis from Free-form Clinical Notes
Comments: EACL 2026 System Demonstrations | Code: this https URL | Demo: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[789]  arXiv:2507.13313 (replaced) [pdf, ps, other]
Title: A Crowdsensing Intrusion Detection Dataset For Decentralized Federated Learning Models
Subjects: Cryptography and Security (cs.CR)
[790]  arXiv:2507.13323 (replaced) [pdf, ps, other]
Title: GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM
Comments: 10 pages, 9 figures, 4 tables
Subjects: Machine Learning (cs.LG)
[791]  arXiv:2507.16861 (replaced) [pdf, ps, other]
Title: Look Before You Fuse: 2D-Guided Cross-Modal Alignment for Robust 3D Detection
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[792]  arXiv:2507.19530 (replaced) [pdf, ps, other]
Title: When Validation Fails: Cross-Institutional Blood Pressure Prediction and the Limits of Electronic Health Record-Based Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[793]  arXiv:2507.21114 (replaced) [pdf, ps, other]
Title: Page image classification for content-specific data processing
Comments: 69 pages, 68 figures, 30 tables
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[794]  arXiv:2508.00268 (replaced) [pdf, ps, other]
Title: Channel Estimation for Flexible Intelligent Metasurfaces: From Model-Based Approaches to Neural Operators
Journal-ref: IEEE Transactions on Wireless Communications, vol. 25, pp. 10684-10701, 2026
Subjects: Information Theory (cs.IT)
[795]  arXiv:2508.04904 (replaced) [pdf, ps, other]
Title: Toward Scalable Patient Safety Training: A Prototype for Root Cause Analysis Simulation With AI Virtual Avatars
Comments: This work has been accepted at the 2026 IEEE Conference on Artificial Intelligence, where a revised version of this work will be published
Subjects: Human-Computer Interaction (cs.HC)
[796]  arXiv:2508.05321 (replaced) [pdf, ps, other]
Title: Unsupervised Learning for Inverse Problems in Computed Tomography
Comments: 14 pages, 9 Figures
Subjects: Medical Physics (physics.med-ph); Artificial Intelligence (cs.AI)
[797]  arXiv:2508.07473 (replaced) [pdf, ps, other]
Title: Online Convex Optimization with Heavy Tails: Old Algorithms, New Regrets, and Applications
Authors: Zijian Liu
Comments: A short, self-contained version has been accepted at ALT 2026. Update to include the change in the camera-ready version
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[798]  arXiv:2508.11431 (replaced) [pdf, ps, other]
Title: Remove360: Benchmarking Residuals After Object Removal in 3D Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[799]  arXiv:2508.11468 (replaced) [pdf, ps, other]
Title: TRACE: Evaluating Execution Efficiency of LLM-Based Code Translation
Subjects: Software Engineering (cs.SE)
[800]  arXiv:2508.12987 (replaced) [pdf, ps, other]
Title: Transfer Learning for Neutrino Scattering: Domain Adaptation with GANs
Comments: 23 pages, 22 figures, together with supplement, as published in Phys. Rev. D
Journal-ref: Phys.Rev.D 113 (2026) 5, 053001
Subjects: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Nuclear Experiment (nucl-ex); Computational Physics (physics.comp-ph)
[801]  arXiv:2508.15376 (replaced) [pdf, ps, other]
Title: DriveSplat: Unified Neural Gaussian Reconstruction for Dynamic Driving Scenes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[802]  arXiv:2508.17303 (replaced) [pdf, ps, other]
Title: Physics-informed neural network for predicting fatigue life of unirradiated and irradiated austenitic and ferritic/martensitic steels under reactor-relevant conditions
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
[803]  arXiv:2508.18753 (replaced) [pdf, ps, other]
Title: CrossHOI-Bench: A Unified Benchmark for HOI Evaluation across Vision-Language Models and HOI-Specific Methods
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[804]  arXiv:2508.19195 (replaced) [pdf, ps, other]
Title: All-in-One Slider for Attribute Manipulation in Diffusion Models
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[805]  arXiv:2508.20784 (replaced) [pdf, ps, other]
Title: Single Agent Robust Deep Reinforcement Learning for Bus Fleet Control
Authors: Yifan Zhang
Subjects: Artificial Intelligence (cs.AI)
[806]  arXiv:2508.21475 (replaced) [pdf, ps, other]
Title: MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
Comments: Project Page: this https URL
Journal-ref: ICLR 2026
Subjects: Artificial Intelligence (cs.AI)
[807]  arXiv:2509.01019 (replaced) [pdf, ps, other]
Title: AI-driven Dispensing of Coral Reseeding Devices for Broad-scale Restoration of the Great Barrier Reef
Comments: 8 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[808]  arXiv:2509.02437 (replaced) [pdf, ps, other]
Title: U-ARM: Ultra low-cost general teleoperation interface for robot manipulation
Subjects: Robotics (cs.RO)
[809]  arXiv:2509.02460 (replaced) [pdf, ps, other]
Title: GenCompositor: Generative Video Compositing with Diffusion Transformer
Comments: Accepted by ICLR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[810]  arXiv:2509.03636 (replaced) [pdf, ps, other]
Title: CausalARC: Abstract Reasoning with Causal World Models
Comments: Peer-reviewed workshop paper
Journal-ref: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Bridging Language, Agent, and World Models (LAW)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[811]  arXiv:2509.04050 (replaced) [pdf, ps, other]
Title: A Re-ranking Method using K-nearest Weighted Fusion for Person Re-identification
Comments: Published in ICPRAM 2025, ISBN 978-989-758-730-6, ISSN 2184-4313
Journal-ref: Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - ICPRAM (2025) 79-90
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[812]  arXiv:2509.07907 (replaced) [pdf, ps, other]
Title: Congestion Control for Spraying with Congested Paths
Subjects: Networking and Internet Architecture (cs.NI)
[813]  arXiv:2509.08759 (replaced) [pdf, ps, other]
Title: Fourier Learning Machines: Nonharmonic Fourier-Based Neural Networks for Scientific Machine Learning
Comments: Please cite the peer-reviewed, published version available on Transactions on Machine Learning Research at this https URL
Journal-ref: Transactions on Machine Learning Research, December 2025
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[814]  arXiv:2509.09812 (replaced) [pdf, ps, other]
Title: EDMD-Based Robust Observer Synthesis for Nonlinear Systems
Comments: 8 pages, 4 figures. Submitted to the 2026 65th IEEE Conference on Decision and Control (CDC) to be held in Honolulu, HI, USA
Subjects: Systems and Control (eess.SY)
[815]  arXiv:2509.10830 (replaced) [pdf, ps, other]
Title: The Siren Song of LLMs: How Users Perceive and Respond to Dark Patterns in Large Language Models
Comments: 23 pages, 7 figures. Accepted at CHI 2026 (ACM Conference on Human Factors in Computing Systems), Barcelona, Spain. Project website: this https URL
Journal-ref: In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26), April 13-17, 2026, Barcelona, Spain. ACM, New York, NY, USA, 23 pages
Subjects: Human-Computer Interaction (cs.HC)
[816]  arXiv:2509.11839 (replaced) [pdf, ps, other]
Title: TrajBooster: Boosting Humanoid Whole-Body Manipulation via Trajectory-Centric Learning
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[817]  arXiv:2509.12702 (replaced) [pdf, ps, other]
Title: UDON: Uncertainty-weighted Distributed Optimization for Multi-Robot Neural Implicit Mapping under Extreme Communication Constraints
Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA 2026)
Subjects: Robotics (cs.RO)
[818]  arXiv:2509.13093 (replaced) [pdf, ps, other]
Title: GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR
Comments: This paper has been submitted to Interspeech 2026 for review
Subjects: Sound (cs.SD)
[819]  arXiv:2509.14295 (replaced) [pdf, ps, other]
Title: Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
[820]  arXiv:2509.19985 (replaced) [pdf, ps, other]
Title: Pi-transformer: A prior-informed dual-attention model for multivariate time-series anomaly detection
Journal-ref: Applied Soft Computing 195 (2026) 115029
Subjects: Machine Learning (cs.LG)
[821]  arXiv:2509.21105 (replaced) [pdf, ps, other]
Title: UAV-Enabled ISAC with Fluid Antennas for Low-Altitude Wireless Networks
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[822]  arXiv:2509.21181 (replaced) [pdf, ps, other]
Title: Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[823]  arXiv:2509.22363 (replaced) [pdf, ps, other]
Title: Investigating Faithfulness in Large Audio Language Models
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[824]  arXiv:2509.22414 (replaced) [pdf, ps, other]
Title: LucidFlux: Caption-Free Photo-Realistic Image Restoration via a Large-Scale Diffusion Transformer
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[825]  arXiv:2509.22459 (replaced) [pdf, ps, other]
Title: Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[826]  arXiv:2509.22592 (replaced) [pdf, ps, other]
Title: OT-MeanFlow3D: Bridging Optimal Transport and Meanflow for Efficient 3D Point Cloud Generation
Subjects: Machine Learning (cs.LG)
[827]  arXiv:2509.22925 (replaced) [pdf, ps, other]
Title: Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
Comments: ICLR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[828]  arXiv:2509.23098 (replaced) [pdf, ps, other]
Title: Blind to Position, Biased in Language: Probing Mid-Layer Representational Bias in Vision-Language Encoders for Zero-Shot Language-Grounded Spatial Understanding
Comments: 61 pages, 28 Figures, 15 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[829]  arXiv:2509.23366 (replaced) [pdf, ps, other]
Title: Splines-Based Feature Importance in Kolmogorov-Arnold Networks: A Framework for Supervised Tabular Data Dimensionality Reduction
Subjects: Machine Learning (cs.LG)
[830]  arXiv:2509.25722 (replaced) [pdf, ps, other]
Title: Transformer-Based Rate Prediction for Multi-Band Cellular Handsets
Comments: Accepted to IEEE ICC 2026 Workshop on Intelligent Movable and Reconfigurable Antennas for Future Wireless Communication and Sensing (WS02)
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)
[831]  arXiv:2509.26642 (replaced) [pdf, ps, other]
Title: MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Comments: Project page: this https URL
Subjects: Robotics (cs.RO)
[832]  arXiv:2510.00671 (replaced) [pdf, ps, other]
Title: Milco: Learned Sparse Retrieval Across Languages via a Multilingual Connector
Comments: ICLR 2026
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
[833]  arXiv:2510.01242 (replaced) [pdf, ps, other]
Title: Redundancy-as-Masking: Formalizing the Artificial Age Score (AAS) to Model Memory Aging in Generative AI
Comments: 37 pages, 17 figures. Includes theoretical development and mathematical proofs of the Artificial Age Score (AAS), with empirical illustrations via ChatGPT-based memory recall experiments
Journal-ref: Frontiers in Artificial Intelligence 9 (2026), 1732691
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
[834]  arXiv:2510.01643 (replaced) [pdf, ps, other]
Title: Support Basis: Fast Attention Beyond Bounded Entries
Comments: AISTATS 2026 (Spotlight). Our code can be found at: this https URL
Subjects: Machine Learning (cs.LG)
[835]  arXiv:2510.02691 (replaced) [pdf, ps, other]
Title: FSFSplatter: Build Surface and Novel Views with Sparse-Views within 2min
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[836]  arXiv:2510.03182 (replaced) [pdf, ps, other]
Title: Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
Comments: 40 pages, 6 figures, 13 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Symbolic Computation (cs.SC)
[837]  arXiv:2510.04265 (replaced) [pdf, ps, other]
Title: Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
Comments: OpenReview (ICLR 2026): this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Statistics Theory (math.ST); Machine Learning (stat.ML)
[838]  arXiv:2510.04547 (replaced) [pdf, ps, other]
Title: Activation Quantization of Vision Encoders Needs Prefixing Registers
Comments: under review; 28 pages, 9 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[839]  arXiv:2510.04714 (replaced) [pdf, ps, other]
Title: Object-Centric Representation Learning for Enhanced 3D Semantic Scene Graph Prediction
Comments: Accepted by NeurIPS 2025. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[840]  arXiv:2510.05345 (replaced) [pdf, ps, other]
Title: A System Level Approach to LQR Control of the Diffusion Equation
Comments: 8 pages, 2 figures, Submitted to IEEE American Control Conference 2026
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[841]  arXiv:2510.06265 (replaced) [pdf, ps, other]
Title: Large Language Models Hallucination: A Comprehensive Survey
Subjects: Computation and Language (cs.CL)
[842]  arXiv:2510.06296 (replaced) [pdf, ps, other]
Title: VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)
[843]  arXiv:2510.07842 (replaced) [pdf, ps, other]
Title: AdaSwitch: Balancing Exploration and Guidance in Knowledge Distillation via Adaptive Switching
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[844]  arXiv:2510.08316 (replaced) [pdf, ps, other]
Title: Unlocking 3D Affordance Segmentation with 2D Semantic Knowledge
Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[845]  arXiv:2510.08388 (replaced) [pdf, ps, other]
Title: If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models
Comments: EACL 2026 Main, 23 pages, 12 figures
Subjects: Computation and Language (cs.CL)
[846]  arXiv:2510.08581 (replaced) [pdf, ps, other]
Title: Evaluating Hallucinations in Audio-Visual Multimodal LLMs with Spoken Queries under Diverse Acoustic Conditions
Comments: Submitted to Interspeech 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[847]  arXiv:2510.08663 (replaced) [pdf, ps, other]
Title: Augmenting Rating-Scale Measures with Text-Derived Items Using the Information-Determined Scoring (IDS) Framework
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[848]  arXiv:2510.08882 (replaced) [pdf, ps, other]
Title: An Improved Model-Free Decision-Estimation Coefficient with Applications in Adversarial MDPs
Comments: ICLR 2026
Subjects: Machine Learning (cs.LG)
[849]  arXiv:2510.08953 (replaced) [pdf, ps, other]
Title: Direct Data-Driven Predictive Control for a Three-dimensional Cable-Driven Soft Robotic Arm
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[850]  arXiv:2510.09255 (replaced) [pdf, ps, other]
Title: DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning
Subjects: Computation and Language (cs.CL)
[851]  arXiv:2510.10053 (replaced) [pdf, ps, other]
Title: DREAM: A Benchmark Study for Deepfake photoREalism AssessMent
Comments: Accepted by IEEE T-PAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[852]  arXiv:2510.10846 (replaced) [pdf, ps, other]
Title: DUAL-Bench: Measuring Over-Refusal and Robustness in Vision-Language Models
Comments: 26 pages, 13 figures, Preprint
Subjects: Computation and Language (cs.CL)
[853]  arXiv:2510.11173 (replaced) [pdf, ps, other]
Title: CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
Comments: Accepted to ICLR 2026. 20 pages, 8 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[854]  arXiv:2510.11618 (replaced) [pdf, ps, other]
Title: StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models
Comments: Accepted by AAAI 2026. Project: this https URL
Journal-ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2026, 40(36), 30359-30367
Subjects: Computation and Language (cs.CL); Multiagent Systems (cs.MA)
[855]  arXiv:2510.14369 (replaced) [pdf, ps, other]
Title: From Binary to Bilingual: How the National Weather Service is Using Artificial Intelligence to Develop a Comprehensive Translation Program
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
[856]  arXiv:2510.14615 (replaced) [pdf, ps, other]
Title: Accelerated Multi-Modal Motion Planning Using Context-Conditioned Diffusion Models
Comments: Accepted for publication at the 2026 IEEE International Conference on Robotics & Automation (ICRA 2026)
Subjects: Robotics (cs.RO)
[857]  arXiv:2510.15299 (replaced) [pdf, ps, other]
Title: GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework
Comments: Accepted by WWW'26
Subjects: Information Retrieval (cs.IR)
[858]  arXiv:2510.15770 (replaced) [pdf, ps, other]
Title: Towards more holistic interpretability: A lightweight disentangled Concept Bottleneck Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[859]  arXiv:2510.16001 (replaced) [pdf, ps, other]
Title: An Order-Sensitive Conflict Measure for Random Permutation Sets
Subjects: Artificial Intelligence (cs.AI)
[860]  arXiv:2510.16297 (replaced) [pdf, ps, other]
Title: AC Dynamics-aware Trajectory Optimization with Binary Enforcement for Adaptive UFLS Design
Subjects: Systems and Control (eess.SY)
[861]  arXiv:2510.16344 (replaced) [pdf, ps, other]
Title: Manual2Skill++: Connector-Aware General Robotic Assembly from Instruction Manuals via Vision-Language Models
Journal-ref: ICRA2026
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[862]  arXiv:2510.17238 (replaced) [pdf, ps, other]
Title: StreamingThinker: Large Language Models Can Think While Reading
Comments: ICLR 2026
Subjects: Computation and Language (cs.CL)
[863]  arXiv:2510.18391 (replaced) [pdf, ps, other]
Title: MPDR Beamforming for Almost-Cyclostationary Processes
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[864]  arXiv:2510.19731 (replaced) [pdf, ps, other]
Title: Bridging Earth and Space: A Survey on HAPS for Non-Terrestrial Networks
Authors: G. Svistunov (1), A. Akhtarshenas (1), D. López-Pérez (1), M. Giordani (2), G. Geraci (3), H. Yanikomeroglu (4) ((1) Universitat Politècnica de València, (2) University of Padova, (3) Universitat Pompeu Fabra, (4) Carleton University)
Comments: 40 pages. This work has been submitted to IEEE Communications Surveys & Tutorials (under review)
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[865]  arXiv:2510.20012 (replaced) [pdf, ps, other]
Title: AI Pose Analysis and Kinematic Profiling of Range-of-Motion Variations in Resistance Training
Authors: Adam Diamant
Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV)
[866]  arXiv:2510.20558 (replaced) [pdf, ps, other]
Title: From Far and Near: Perceptual Evaluation of Crowd Representations Across Levels of Detail
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC)
[867]  arXiv:2510.20579 (replaced) [pdf, ps, other]
Title: Open-o3-Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[868]  arXiv:2510.22689 (replaced) [pdf, ps, other]
Title: Rule-Based Explanations for Retrieval-Augmented LLM Systems
Subjects: Computation and Language (cs.CL)
[869]  arXiv:2510.26352 (replaced) [pdf, ps, other]
Title: The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration
Comments: Accepted at the AAAI-26 Workshop on LLM-based Multi-Agent Systems: Towards Responsible, Reliable, and Scalable Agentic Systems (LaMAS 2026) as an oral presentation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[870]  arXiv:2511.02434 (replaced) [pdf, ps, other]
Title: Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition
Subjects: Software Engineering (cs.SE)
[871]  arXiv:2511.03117 (replaced) [pdf, ps, other]
Title: Tracing Generative AI in Digital Art: A Longitudinal Study of Chinese Painters' Attitudes, Practices, and Identity Negotiation
Comments: In Submission
Subjects: Human-Computer Interaction (cs.HC)
[872]  arXiv:2511.08462 (replaced) [pdf, ps, other]
Title: QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities
Subjects: Cryptography and Security (cs.CR); Programming Languages (cs.PL); Software Engineering (cs.SE)
[873]  arXiv:2511.08905 (replaced) [pdf, ps, other]
Title: iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
Comments: Accepted by AAAI 2026
Journal-ref: Proc. AAAI Conf. Artif. Intell. 40(42): 23984-23992, 2026
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[874]  arXiv:2511.09731 (replaced) [pdf, ps, other]
Title: FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching
Comments: Accepted to ICLR 2026
Subjects: Machine Learning (cs.LG)
[875]  arXiv:2511.11052 (replaced) [pdf, ps, other]
Title: AdaptPNP: Integrating Prehensile and Non-Prehensile Skills for Adaptive Robotic Manipulation
Journal-ref: ICRA 2026
Subjects: Robotics (cs.RO)
[876]  arXiv:2511.11599 (replaced) [pdf, ps, other]
Title: SynBullying: A Multi LLM Synthetic Conversational Dataset for Cyberbullying Detection
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
[877]  arXiv:2511.14070 (replaced) [pdf, ps, other]
Title: ELiC: Efficient LiDAR Geometry Compression via Cross-Bit-depth Feature Propagation and Bag-of-Encoders
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[878]  arXiv:2511.14702 (replaced) [pdf, ps, other]
Title: Seeing Beyond the Image: ECG and Anatomical Knowledge-Guided Myocardial Scar Segmentation from Late Gadolinium-Enhanced Images
Comments: ISBI 2026 (oral presentation)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[879]  arXiv:2511.14763 (replaced) [pdf, ps, other]
Title: Membership Inference Attack against Large Language Model-based Recommendation Systems: A New Distillation-based Paradigm
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[880]  arXiv:2511.18415 (replaced) [pdf, ps, other]
Title: DuoTeach: Dual Role Self-Teaching for Coarse-to-Fine Decision Coordination in Vision-Language Models
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[881]  arXiv:2511.20636 (replaced) [pdf, ps, other]
Title: Image2Gcode: Image-to-G-code Generation for Additive Manufacturing Using Diffusion-Transformer Model
Subjects: Machine Learning (cs.LG)
[882]  arXiv:2511.20649 (replaced) [pdf, ps, other]
Title: Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout
Comments: CVPR 2026 | Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[883]  arXiv:2511.20909 (replaced) [pdf, ps, other]
Title: Evolved Sample Weights for Bias Mitigation: Effectiveness Depends on the Fairness Objective
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[884]  arXiv:2511.21399 (replaced) [pdf, ps, other]
Title: Steering Awareness: Detecting Activation Steering from Within
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[885]  arXiv:2511.22184 (replaced) [pdf, ps, other]
Title: Shoe Style-Invariant and Ground-Aware Learning for Dense Foot Contact Estimation
Comments: Accepted at CVPR 2026. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[886]  arXiv:2511.22422 (replaced) [pdf, ps, other]
Title: Weyl distributions, spectral properties, and circulant approximation results for quaternion block multilevel Toeplitz matrix sequences
Subjects: Numerical Analysis (math.NA)
[887]  arXiv:2511.23253 (replaced) [pdf, ps, other]
Title: AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture
Subjects: Artificial Intelligence (cs.AI)
[888]  arXiv:2512.00960 (replaced) [pdf, ps, other]
Title: Efficient and Scalable Monocular Human-Object Interaction Motion Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[889]  arXiv:2512.02906 (replaced) [pdf, ps, other]
Title: MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[890]  arXiv:2512.03497 (replaced) [pdf, ps, other]
Title: Cell-cell Communication Inference and Analysis: Biological Mechanisms, Computational Approaches, and Future Opportunities
Comments: Accepted by CSIAM Transactions on Life Sciences (2026)
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Cell Behavior (q-bio.CB)
[891]  arXiv:2512.06174 (replaced) [pdf, ps, other]
Title: Embedding Physical Reasoning into Diffusion-Based Shadow Generation
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[892]  arXiv:2512.06179 (replaced) [pdf, ps, other]
Title: Cast and Attached Shadow Detection via Iterative Light and Geometry Reasoning
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[893]  arXiv:2512.06679 (replaced) [pdf, ps, other]
Title: CMV-Fuse: Cross Modal-View Fusion of AMR, Syntax, and Knowledge Representations for Aspect Based Sentiment Analysis
Subjects: Computation and Language (cs.CL)
[894]  arXiv:2512.07400 (replaced) [pdf, ps, other]
Title: Heads collapse, features stay: Why Replay needs big buffers
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[895]  arXiv:2512.08193 (replaced) [pdf, ps, other]
Title: ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
[896]  arXiv:2512.09162 (replaced) [pdf, ps, other]
Title: GTAvatar: Bridging Gaussian Splatting and Texture Mapping for Relightable and Editable Gaussian Avatars
Comments: Accepted to Eurographics 2026. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[897]  arXiv:2512.10989 (replaced) [pdf, ps, other]
Title: Generalization of Long-Range Machine Learning Potentials in Complex Chemical Spaces
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG)
[898]  arXiv:2512.12752 (replaced) [pdf, ps, other]
Title: Newton Methods for Mean Field Games: A Numerical Study
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
[899]  arXiv:2512.13913 (replaced) [pdf, ps, other]
Title: Capturing reduced-order quantum many-body dynamics out of equilibrium via neural ordinary differential equations
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Quantum Physics (quant-ph)
[900]  arXiv:2512.14640 (replaced) [pdf, ps, other]
Title: A Multicenter Benchmark of Multiple Instance Learning Models for Lymphoma Subtyping from HE-stained Whole Slide Images
Comments: 19 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[901]  arXiv:2512.17466 (replaced) [pdf, ps, other]
Title: Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[902]  arXiv:2512.18561 (replaced) [pdf, ps, other]
Title: Adaptive Accountability in Networked MAS: Tracing and Mitigating Emergent Norms at Scale
Authors: Saad Alqithami
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
[903]  arXiv:2512.19980 (replaced) [pdf, ps, other]
Title: Neuron-Guided Interpretation of Code LLMs: Where, Why, and How?
Comments: Accepted by FSE 2026
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[904]  arXiv:2512.20305 (replaced) [pdf, ps, other]
Title: A Structured Nonparametric Framework for Nonlinear Accelerated Failure Time Models (KAN-AFT)
Comments: A new development in Survival Analysis based on the celebrated Kolmogorov-Arnold Networks (KANs)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[905]  arXiv:2512.20651 (replaced) [pdf, ps, other]
Title: Memory Bear AI A Breakthrough from Memory to Cognition Toward Artificial General Intelligence
Authors: Deliang Wen, Ke Sun
Subjects: Artificial Intelligence (cs.AI)
[906]  arXiv:2512.21276 (replaced) [pdf, ps, other]
Title: GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation
Comments: Transactions on ML Research (TMLR) 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[907]  arXiv:2512.23178 (replaced) [pdf, ps, other]
Title: Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis
Authors: Zijian Liu
Comments: A preliminary conference version is accepted at ICLR 2026. This full version includes the formal statements of lower bounds and their proofs. Moreover, the upper bounds are slightly improved
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[908]  arXiv:2512.24338 (replaced) [pdf, ps, other]
Title: The Mechanics of CNN Filtering with Rectification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[909]  arXiv:2601.01995 (replaced) [pdf, ps, other]
Title: Locally-averaged McCormick relaxations for discretization-regularized inverse problems
Subjects: Numerical Analysis (math.NA)
[910]  arXiv:2601.02957 (replaced) [pdf, ps, other]
Title: LLM-Augmented Changepoint Detection: A Framework for Ensemble Detection and Automated Explanation
Subjects: Computation and Language (cs.CL)
[911]  arXiv:2601.04614 (replaced) [pdf, ps, other]
Title: HyperAlign: Hyperbolic Entailment Cones for Adaptive Text-to-Image Alignment Assessment
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[912]  arXiv:2601.05770 (replaced) [pdf, ps, other]
Title: Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[913]  arXiv:2601.06134 (replaced) [pdf, ps, other]
Title: DeeperBrain: A Neuro-Grounded EEG Foundation Model Towards Universal BCI
Comments: Preprint
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)
[914]  arXiv:2601.07527 (replaced) [pdf, ps, other]
Title: Energy-efficient torque allocation for straight-line driving of electric vehicles based on pseudoconvex polynomials
Comments: 21 pages, 8 figures
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[915]  arXiv:2601.07632 (replaced) [pdf, ps, other]
Title: GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[916]  arXiv:2601.07646 (replaced) [pdf, ps, other]
Title: Studying the Role of Synthetic Data for Machine Learning-based Wireless Networks Traffic Forecasting
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[917]  arXiv:2601.08082 (replaced) [pdf, ps, other]
Title: Hierarchical Precision and Recursion for Accelerating Symmetric Linear Solves on MXUs
Comments: 10 pages, 11 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET); Mathematical Software (cs.MS); Performance (cs.PF)
[918]  arXiv:2601.08709 (replaced) [pdf, ps, other]
Title: Multi-Preconditioned LBFGS for Training Finite-Basis PINNs
Comments: 13 pages
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
[919]  arXiv:2601.09658 (replaced) [pdf, ps, other]
Title: Image2Garment: Simulation-ready Garment Generation from a Single Image
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[920]  arXiv:2601.09735 (replaced) [pdf, ps, other]
Title: Multiverse: Transactional Memory with Dynamic Multiversioning
Subjects: Databases (cs.DB)
[921]  arXiv:2601.10971 (replaced) [pdf, ps, other]
Title: AJAR: Adaptive Jailbreak Architecture for Red-teaming
Authors: Yipu Dou, Wang Yang
Comments: 7 pages, 3 figures. Code and data available at this https URL
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
[922]  arXiv:2601.13590 (replaced) [pdf, ps, other]
Title: Vulnerability of LLMs' Stated Beliefs? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions
Comments: Updated new models and minor revisions
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[923]  arXiv:2601.13751 (replaced) [pdf, ps, other]
Title: Towards Onboard Continuous Change Detection for Floods
Comments: 19 pages, 9 figures, accepted at GISTAM 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[924]  arXiv:2601.14637 (replaced) [pdf, ps, other]
Title: Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis
Comments: 28 pages, 9 figures, 12 tables, Submitted to Ecological Informatics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[925]  arXiv:2601.14758 (replaced) [pdf, ps, other]
Title: Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[926]  arXiv:2601.15165 (replaced) [pdf, ps, other]
Title: The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Comments: Code and pre-trained models: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[927]  arXiv:2601.15644 (replaced) [pdf, ps, other]
Title: SuperOcc: Toward Cohesive Temporal Modeling for Superquadric-based 3D Occupancy Prediction
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[928]  arXiv:2601.18032 (replaced) [pdf, ps, other]
Title: Multimodal Machine Learning for Soft High-k Elastomers under Data Scarcity
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
[929]  arXiv:2601.19146 (replaced) [pdf, ps, other]
Title: The Promise and Reality of Continuous Integration Caching: An Empirical Study of Travis CI Builds
Comments: Accepted at the 30th International Conference on Evaluation and Assessment in Software Engineering (EASE '26)
Subjects: Software Engineering (cs.SE)
[930]  arXiv:2601.19529 (replaced) [pdf, ps, other]
Title: RhoMorph: Rhombus-shaped Deformable Modular Robots for Stable, Medium-Independent Reconfiguration Motion
Subjects: Robotics (cs.RO)
[931]  arXiv:2601.19903 (replaced) [pdf, ps, other]
Title: STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification
Comments: Accepted at the 63rd Design Automation Conference (DAC 2026), Long Beach, CA, USA (July 26-29, 2026). 7 pages, 6 figures
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)
[932]  arXiv:2601.20413 (replaced) [pdf, ps, other]
Title: Schadenfreude in the Digital Public Sphere: A cross-national and decade-long analysis of Facebook news engagement
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
[933]  arXiv:2601.21690 (replaced) [pdf, ps, other]
Title: A Unified Generalization Framework for Model Merging: Trade-offs, Non-Linearity, and Scaling Laws
Subjects: Machine Learning (cs.LG)
[934]  arXiv:2601.21737 (replaced) [pdf, ps, other]
Title: Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators
Comments: PREPRINT - Accepted for publication at the Design, Automation & Test in Europe Conference & Exhibition (DATE), April 20-22, 2026, in Verona, Italy. V2 - fixed typos
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET)
[935]  arXiv:2601.22244 (replaced) [pdf, ps, other]
Title: Is Hierarchical Quantization Essential for Optimal Reconstruction?
Comments: Code available at: this https URL
Journal-ref: Proceedings of ICPRAM 2026; ISBN 978-989-758-797-9; ISSN 2184-4313, SciTePress, pages 671-679
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[936]  arXiv:2602.00114 (replaced) [pdf, ps, other]
Title: 1S-DAug: One-Shot Data Augmentation for Robust Few-Shot Generalization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[937]  arXiv:2602.00159 (replaced) [pdf, ps, other]
Title: Sheaf Neural Networks and biomedical applications
Comments: Bibliography updated
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[938]  arXiv:2602.01537 (replaced) [pdf, ps, other]
Title: LMI Optimization Based Multirate Steady-State Kalman Filter Design
Authors: Hiroshi Okajima
Comments: Revised and resubmitted to IEEE ACCESS
Subjects: Systems and Control (eess.SY)
[939]  arXiv:2602.02290 (replaced) [pdf, ps, other]
Title: Hallucination or Creativity: How to Evaluate AI-Generated Scientific Stories?
Journal-ref: Proceedings of the Text2Story'26 Workshop, Delft (The Netherlands), 29-March-2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[940]  arXiv:2602.02469 (replaced) [pdf, ps, other]
Title: Age-Aware Edge-Blind Federated Learning via Over-the-Air Aggregation
Comments: To appear in IEEE ICC 2026
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
[941]  arXiv:2602.02606 (replaced) [pdf, ps, other]
Title: Gender Dynamics and Homophily in a Social Network of LLM Agents
Comments: Under Review
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[942]  arXiv:2602.02832 (replaced) [pdf, ps, other]
Title: Koopman Autoencoders with Continuous-Time Latent Dynamics for Fluid Dynamics Forecasting
Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
[943]  arXiv:2602.04023 (replaced) [pdf, ps, other]
Title: Exploring Emerging Norms of AI Attribution and Disclosure in Programming Education
Subjects: Human-Computer Interaction (cs.HC)
[944]  arXiv:2602.04831 (replaced) [pdf, ps, other]
Title: Review of Superconducting Qubit Devices and Their Large-Scale Integration
Authors: Hiu Yung Wong
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)
[945]  arXiv:2602.05344 (replaced) [pdf, ps, other]
Title: Wi-Fi Radar via Over-the-Air Referencing: Bridging Wi-Fi Sensing and Bistatic Radar
Authors: Koji Yamamoto
Comments: Currently under review
Subjects: Networking and Internet Architecture (cs.NI)
[946]  arXiv:2602.06023 (replaced) [pdf, ps, other]
Title: Developing a Discrete-Event Simulator of School Shooter Behavior from VR Data
Comments: Accepted for presentation at ANNSIM 2026. Camera-ready version. 13 pages, 4 figures, 4 tables
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
[947]  arXiv:2602.06175 (replaced) [pdf, ps, other]
Title: Optimal rates for density and mode estimation with expand-and-sparsify representations
Comments: Accepted at AISTATS 2026
Subjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[948]  arXiv:2602.06450 (replaced) [pdf, ps, other]
Title: What Is Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution
Comments: Accepted by CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[949]  arXiv:2602.07684 (replaced) [pdf, ps, other]
Title: Quantifying resilience for distribution system customers with SALEDI
Subjects: Systems and Control (eess.SY); Applications (stat.AP)
[950]  arXiv:2602.07975 (replaced) [pdf, ps, other]
Title: Leader-following Consensus over Jointly Connected Switching Networks is Achievable for Exponentially Unstable Linear Systems
Subjects: Optimization and Control (math.OC); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
[951]  arXiv:2602.08199 (replaced) [pdf, ps, other]
Title: Fork, Explore, Commit: OS Primitives for Agentic Exploration
Subjects: Operating Systems (cs.OS); Distributed, Parallel, and Cluster Computing (cs.DC)
[952]  arXiv:2602.09023 (replaced) [pdf, ps, other]
Title: TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation
Subjects: Robotics (cs.RO)
[953]  arXiv:2602.11534 (replaced) [pdf, ps, other]
Title: Krause Synchronization Transformers
Comments: Project page: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[954]  arXiv:2602.13647 (replaced) [pdf, ps, other]
Title: SF-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Question Answering
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[955]  arXiv:2602.14200 (replaced) [pdf, ps, other]
Title: TS-Haystack: A Multi-Scale Retrieval Benchmark for Time Series Language Models
Comments: ICLR TSALM 2026. Benchmark generation code and datasets: this https URL
Subjects: Machine Learning (cs.LG)
[956]  arXiv:2602.16424 (replaced) [pdf, ps, other]
Title: Verifiable Semantics for Agent-to-Agent Communication
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[957]  arXiv:2602.16698 (replaced) [pdf, ps, other]
Title: Causality is Key for Interpretability Claims to Generalise
Subjects: Machine Learning (cs.LG)
[958]  arXiv:2602.18377 (replaced) [pdf, ps, other]
Title: Theory and interpretability of Quantum Extreme Learning Machines: a Pauli-transfer matrix approach
Comments: 36 pages, 14 figures
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
[959]  arXiv:2602.19373 (replaced) [pdf, ps, other]
Title: Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[960]  arXiv:2602.20537 (replaced) [pdf, ps, other]
Title: PFGNet: A Fully Convolutional Frequency-Guided Peripheral Gating Network for Efficient Spatiotemporal Predictive Learning
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[961]  arXiv:2602.20558 (replaced) [pdf, ps, other]
Title: From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation at Industry Scale
Comments: Work in progress
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[962]  arXiv:2602.21415 (replaced) [pdf, ps, other]
Title: Benchmarking State Space Models, Transformers, and Recurrent Networks for US Grid Forecasting
Comments: 11 pages, 2 figures, 8 tables
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[963]  arXiv:2602.21814 (replaced) [pdf, ps, other]
Title: Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
Authors: Heejin Jo
Comments: 9 pages, 4 tables
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[964]  arXiv:2602.21877 (replaced) [pdf, ps, other]
Title: How to Take a Memorable Picture? Empowering Users with Actionable Feedback
Comments: Accepted @ CVPR 2026. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[965]  arXiv:2602.22249 (replaced) [pdf, ps, other]
Title: Improving Spatial Allocation for Energy System Coupling with Graph Neural Networks
Comments: Accepted at XXIV Power Systems Computation Conference (PSCC 2026)
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[966]  arXiv:2602.22911 (replaced) [pdf, ps, other]
Title: CeRA: Breaking the Linear Ceiling of Low-Rank Adaptation via Manifold Expansion
Authors: Hung-Hsuan Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[967]  arXiv:2602.23696 (replaced) [pdf, ps, other]
Title: Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training
Authors: Yongzhong Xu
Comments: 23 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[968]  arXiv:2602.24055 (replaced) [pdf, ps, other]
Title: CIRCLE: A Framework for Evaluating AI from a Real-World Lens
Comments: Accepted at Intelligent Systems Conference (IntelliSys) 2026
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
[969]  arXiv:2602.24149 (replaced) [pdf, ps, other]
Title: What You Read is What You Classify: Highlighting Attributions to Text and Text-Like Inputs
Comments: 15 pages, 8 figures
Subjects: Machine Learning (cs.LG); Genomics (q-bio.GN)
[970]  arXiv:2603.00270 (replaced) [pdf, ps, other]
Title: Transformers Remember First, Forget Last: Dual-Process Interference in LLMs
Comments: 16 pages, 10 figures. Under review
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[971]  arXiv:2603.00283 (replaced) [pdf, ps, other]
Title: Robust Adaptive MPC in the Presence of Nonlinear Time-Varying Uncertainties: An Uncertainty Compensation Approach
Subjects: Systems and Control (eess.SY)
[972]  arXiv:2603.00601 (replaced) [pdf, ps, other]
Title: Theory of Code Space: Do Code Agents Understand Software Architecture?
Authors: Grigory Sapunov
Comments: updated figures and numbers
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[973]  arXiv:2603.01122 (replaced) [pdf, ps, other]
Title: Fast Confidence-Aware Human Prediction via Hardware-accelerated Bayesian Inference for Safe Robot Navigation
Comments: Update the paper
Subjects: Robotics (cs.RO)
[974]  arXiv:2603.01176 (replaced) [pdf, ps, other]
Title: Path Integral Particle Filtering for Hybrid Systems via Saltation Matrices
Subjects: Robotics (cs.RO)
[975]  arXiv:2603.01179 (replaced) [pdf, ps, other]
Title: A402: Binding Cryptocurrency Payments to Service Execution for Agentic Commerce
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
[976]  arXiv:2603.02097 (replaced) [pdf, ps, other]
Title: ClinConsensus: A Consensus-Based Benchmark for Evaluating Chinese Medical LLMs across Difficulty Levels
Comments: 8 pages, 6 figures
Subjects: Computation and Language (cs.CL)
[977]  arXiv:2603.02538 (replaced) [pdf, ps, other]
Title: PathSpace: Rapid continuous map approximation for efficient SLAM using B-Splines in constrained environments
Subjects: Robotics (cs.RO)
[978]  arXiv:2603.03415 (replaced) [pdf, ps, other]
Title: Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[979]  arXiv:2603.03686 (replaced) [pdf, ps, other]
Title: AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment
Authors: Jiangyu Chen
Subjects: Artificial Intelligence (cs.AI)
[980]  arXiv:2603.03740 (replaced) [pdf, ps, other]
Title: Whole-Body Safe Control of Robotic Systems with Koopman Neural Dynamics
Subjects: Robotics (cs.RO)
[981]  arXiv:2603.04172 (replaced) [pdf, ps, other]
Title: The Pivotal Information Criterion
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Methodology (stat.ME)
[982]  arXiv:2603.05560 (replaced) [pdf, ps, other]
Title: Towards Efficient and Stable Ocean State Forecasting: A Continuous-Time Koopman Approach
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Applied Physics (physics.app-ph); Computational Physics (physics.comp-ph); Geophysics (physics.geo-ph)
[983]  arXiv:2603.05789 (replaced) [pdf, ps, other]
Title: The Coordination Gap: Multi-Agent Alternation Metrics for Temporal Fairness in Repeated Games
Comments: 41 pages, 5 figures, 4 tables, 1 supplementary pdf. Submitted to Social Choice & Welfare
Subjects: Multiagent Systems (cs.MA); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
[984]  arXiv:2603.05947 (replaced) [pdf, ps, other]
Title: LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[985]  arXiv:2603.06082 (replaced) [pdf, ps, other]
Title: Offline Materials Optimization with CliqueFlowmer
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
[986]  arXiv:2603.06488 (replaced) [pdf, ps, other]
Title: Score Reversal Is Not Free for Quantum Diffusion Models
Authors: Ammar Fayad
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Mathematical Physics (math-ph)
[987]  arXiv:2603.07131 (replaced) [pdf, ps, other]
Title: Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[988]  arXiv:2603.07300 (replaced) [src]
Title: AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery
Comments: arXiv admin note: This submission has been withdrawn due to violation of arXiv policies for acceptable submissions
Subjects: Machine Learning (cs.LG)
[989]  arXiv:2603.07313 (replaced) [pdf, ps, other]
Title: Adversarial Latent-State Training for Robust Policies in Partially Observable Domains
Comments: 30 pages, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[990]  arXiv:2603.07514 (replaced) [pdf, ps, other]
Title: A Unified View of Drifting and Score-Based Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[991]  arXiv:2603.08033 (replaced) [pdf, ps, other]
Title: The Unit Gap: How Sharing Works in Boolean Circuits
Authors: Kirill Krinkin
Comments: 13 pages, 2 figures, 7 tables. Code and data: this https URL
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO)
[992]  arXiv:2603.08380 (replaced) [pdf, ps, other]
Title: Structure from rank: Rank-order coding as a bridge from sequence to structure
Journal-ref: Neural Networks, Volume 200, August 2026, 108828
Subjects: Neural and Evolutionary Computing (cs.NE)
[993]  arXiv:2603.08762 (replaced) [pdf, ps, other]
Title: Formally Verifying Quantum Phase Estimation Circuits with 1,000+ Qubits
Comments: The work is accepted for presentation as a full research paper in IEEE-DCAS 2026 and the final version will be available via IEEE Xplore after the conference
Subjects: Quantum Physics (quant-ph); Logic in Computer Science (cs.LO)
[994]  arXiv:2603.09022 (replaced) [pdf, ps, other]
Title: MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games
Comments: Code has been released this https URL
Subjects: Artificial Intelligence (cs.AI)
[995]  arXiv:2603.09583 (replaced) [pdf, ps, other]
Title: Nonparametric Variational Differential Privacy via Embedding Parameter Clipping
Comments: 8 pages, 1 figure
Subjects: Machine Learning (cs.LG)
[996]  arXiv:2603.09909 (replaced) [pdf, ps, other]
Title: MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems
Subjects: Artificial Intelligence (cs.AI)
[997]  arXiv:2603.10651 (replaced) [pdf, ps, other]
Title: Interleaving Scheduling and Motion Planning with Incremental Learning of Symbolic Space-Time Motion Abstractions
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[998]  arXiv:2603.10779 (replaced) [pdf, ps, other]
Title: A Control-Theoretic Foundation for Agentic Systems
Subjects: Systems and Control (eess.SY)
[999]  arXiv:2603.11101 (replaced) [pdf, ps, other]
Title: Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
[1000]  arXiv:2603.11103 (replaced) [pdf, ps, other]
Title: Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining
Subjects: Software Engineering (cs.SE)
[1001]  arXiv:2603.11132 (replaced) [pdf, ps, other]
Title: WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[1002]  arXiv:2603.11149 (replaced) [pdf, ps, other]
Title: Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[1003]  arXiv:2603.11201 (replaced) [pdf, ps, other]
Title: Representation Finetuning for Continual Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1004]  arXiv:2603.11211 (replaced) [pdf, ps, other]
Title: A Simple Efficiency Incremental Learning Framework via Vision-Language Model with Nonlinear Multi-Adapters
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1005]  arXiv:2603.11239 (replaced) [pdf, ps, other]
Title: Reversible Lifelong Model Editing via Semantic Routing-Based LoRA
Subjects: Artificial Intelligence (cs.AI)
[1006]  arXiv:2603.11360 (replaced) [pdf, ps, other]
Title: Fair-Gate: Fairness-Aware Interpretable Risk Gating for Sex-Fair Voice Biometrics
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[1007]  arXiv:2603.11667 (replaced) [pdf, ps, other]
Title: A technology-oriented mapping of the language and translation industry: Analysing stakeholder values and their potential implication for translation pedagogy
Comments: Under review
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[1008]  arXiv:2603.11715 (replaced) [pdf, ps, other]
Title: Affect Decoding in Phonated and Silent Speech Production from Surface EMG
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[1009]  arXiv:2603.11717 (replaced) [pdf, ps, other]
Title: COTONET: A custom cotton detection algorithm based on YOLO11 for stage of growth cotton boll detection
Comments: 15 pages, 11 figures. This paper will be submitted to Computers and Electronics in Agriculture, special issue
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1010]  arXiv:2603.11746 (replaced) [pdf, ps, other]
Title: SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1011]  arXiv:2603.12214 (replaced) [pdf, ps, other]
Title: WORKSWORLD: A Domain for Integrated Numeric Planning and Scheduling of Distributed Pipelined Workflows
Comments: To be published in Proceedings of the International Conference on Automated Planning and Scheduling Volume 36 (2026)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
[1012]  arXiv:2603.12372 (replaced) [pdf, ps, other]
Title: Efficient Reasoning with Balanced Thinking
Comments: Accepted by ICLR 2026
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[1013]  arXiv:2603.12564 (replaced) [pdf, ps, other]
Title: AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents
Comments: 50 pages, 31 tables, 15 figures. Under review at COLM 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[1014]  arXiv:2603.12572 (replaced) [pdf, ps, other]
Title: LMEB: Long-horizon Memory Embedding Benchmark
Comments: 35 pages, 9 figures, 23 tables
Subjects: Computation and Language (cs.CL)
[1015]  arXiv:2603.12671 (replaced) [pdf, ps, other]
Title: HyGra: Accelerating Network-State Simulation for LLM Training in DCNs via Adaptive Packet-Flow Granularity
Comments: 14 pages, 7 figures and 5 tables
Subjects: Networking and Internet Architecture (cs.NI)
[1016]  arXiv:2603.12696 (replaced) [pdf, ps, other]
Title: HaltNav: Reactive Visual Halting over Lightweight Topological Priors for Robust Vision-Language Navigation
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[1017]  arXiv:2603.13032 (replaced) [pdf, ps, other]
Title: Multimodal OCR: Parse Anything from Documents
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1018]  arXiv:2603.13275 (replaced) [src]
Title: PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation
Comments: We are withdrawing this version due to issues identified in some experimental results and the need to further upgrade our method. This withdrawal ensures academic rigor and completeness, and a revised version will be submitted after improvements
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1019]  arXiv:2603.13280 (replaced) [pdf, ps, other]
Title: A Stability-Aware Frozen Euler Autoencoder for Physics-Informed Tracking in Continuum Mechanics (SAFE-PIT-CM)
Authors: Emil Hovad
Comments: 16 pages, 8 figures, 8 tables
Subjects: Machine Learning (cs.LG)
[1020]  arXiv:2603.13570 (replaced) [pdf, ps, other]
Title: Privacy-Preserving Machine Learning for IoT: A Cross-Paradigm Survey and Future Roadmap
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[1021]  arXiv:2603.14042 (replaced) [pdf, ps, other]
Title: Block-QAOA-Aware Detection with Parameter Transfer for Large-Scale MIMO
Authors: Shuai Zeng
Comments: 12 pages, 3 figures, 1 table, 1 algorithm
Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[1022]  arXiv:2603.14047 (replaced) [pdf, ps, other]
Title: Distributional Uncertainty and Adaptive Decision-Making in System Co-design
Subjects: Optimization and Control (math.OC); Robotics (cs.RO); Systems and Control (eess.SY)
[1023]  arXiv:2603.14052 (replaced) [pdf, ps, other]
Title: A Multi-Agent Perception-Action Alliance for Efficient Long Video Reasoning
Comments: Accepted by CVPR2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[1024]  arXiv:2603.14163 (replaced) [pdf, ps, other]
Title: Tail Bounds for Queues with Abandonment: Constant, Moderate, Large Deviations, and Efficient Concentration
Subjects: Probability (math.PR); Performance (cs.PF)
[1025]  arXiv:2603.14255 (replaced) [pdf, ps, other]
Title: ITKIT: Feasible CT Image Analysis based on SimpleITK and MMEngine
Subjects: Software Engineering (cs.SE); Computer Vision and Pattern Recognition (cs.CV)
[1026]  arXiv:2603.14324 (replaced) [pdf, ps, other]
Title: Learning-to-Defer with Expert-Conditioned Advice
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[1027]  arXiv:2603.14507 (replaced) [pdf, ps, other]
Title: Expanding mmWave Datasets for Human Pose Estimation with Unlabeled Data and LiDAR Datasets
Comments: Accepted by CVPR2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1028]  arXiv:2603.14601 (replaced) [pdf, ps, other]
Title: $K-$means with learned metrics
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Probability (math.PR)
[1029]  arXiv:2603.14831 (replaced) [pdf, ps, other]
Title: Neural Networks as Local-to-Global Computations
Comments: 43 pages, 21 figures
Subjects: Algebraic Topology (math.AT); Machine Learning (cs.LG); Dynamical Systems (math.DS)
[1030]  arXiv:2603.14899 (replaced) [pdf, ps, other]
Title: A New Lower Bounding Paradigm and Tighter Lower Bounds for Elastic Similarity Measures
Subjects: Databases (cs.DB)
[1031]  arXiv:2603.14994 (replaced) [pdf, ps, other]
Title: DP-S4S: Accurate and Scalable Select-Join-Aggregate Query Processing with User-Level Differential Privacy
Subjects: Databases (cs.DB); Cryptography and Security (cs.CR)
[1032]  arXiv:2603.15023 (replaced) [pdf, ps, other]
Title: SIMD-PAC-DB: Pretty Performant PAC Privacy
Subjects: Databases (cs.DB)
[1033]  arXiv:2603.15030 (replaced) [pdf, ps, other]
Title: VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
Subjects: Artificial Intelligence (cs.AI)
[1034]  arXiv:2603.15159 (replaced) [pdf, ps, other]
Title: To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
Comments: 12 pages
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[1035]  arXiv:2603.15379 (replaced) [pdf, ps, other]
Title: Revisiting the expressiveness of metric temporal logic : A tale of "Je t'aime, moi non plus."
Subjects: Logic in Computer Science (cs.LO)
[1036]  arXiv:2603.15434 (replaced) [pdf, ps, other]
Title: Listening to the Echo: User-Reaction Aware Policy Optimization via Scalar-Verbal Hybrid Reinforcement Learning
Comments: Updated case study figures for clarity
Subjects: Artificial Intelligence (cs.AI)
[1037]  arXiv:2603.15588 (replaced) [pdf, ps, other]
Title: Switching-Reference Voltage Control for Distribution Systems with AI-Training Data Centers
Subjects: Systems and Control (eess.SY)
[1038]  arXiv:2603.15678 (replaced) [pdf, ps, other]
Title: Spectral Edge Dynamics of Training Trajectories: Signal--Noise Geometry Across Scales
Authors: Yongzhong Xu
Comments: 16 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1039]  arXiv:2603.15888 (replaced) [pdf, ps, other]
Title: AsgardBench -- Evaluating Visually Grounded Interactive Planning Under Minimal Feedback
Comments: 19 figures, 6 tables, including appendix
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[1040]  arXiv:2603.15973 (replaced) [pdf, ps, other]
Title: Safety is Non-Compositional: A Formal Framework for Capability-Based AI Systems
Authors: Cosimo Spera
Subjects: Artificial Intelligence (cs.AI)
[1041]  arXiv:2603.15978 (replaced) [pdf, ps, other]
Title: From Workflow Automation to Capability Closure: A Formal Framework for Safe and Revenue-Aware Customer Service AI
Authors: Cosimo Spera
Subjects: Artificial Intelligence (cs.AI)
[1042]  arXiv:2603.16060 (replaced) [pdf, ps, other]
Title: ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI)
[1043]  arXiv:2603.16098 (replaced) [pdf, ps, other]
Title: LICA: Layered Image Composition Annotations for Graphic Design Research
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1044]  arXiv:2603.16128 (replaced) [pdf, ps, other]
Title: Social Simulacra in the Wild: AI Agent Communities on Moltbook
Comments: Preprint: 13 pages, 4 figures, 5 tables
Subjects: Computation and Language (cs.CL)
[1045]  arXiv:2603.16130 (replaced) [pdf, ps, other]
Title: EPOFusion: Exposure aware Progressive Optimization Method for Infrared and Visible Image Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1046]  arXiv:2603.16137 (replaced) [pdf, ps, other]
Title: SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment
Subjects: Computation and Language (cs.CL)
[1047]  arXiv:2603.16249 (replaced) [pdf, ps, other]
Title: Synergizing Deep Learning and Biological Heuristics for Extreme Long-Tail White Blood Cell Classification
Comments: Accepted at IEEE ISBI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1048]  arXiv:2603.16313 (replaced) [pdf, ps, other]
Title: Learning to Predict, Discover, and Reason in High-Dimensional Event Sequences
Authors: Hugo Math
Comments: PhD dissertation, 135 pages of main content, 201 pages in total
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[1049]  arXiv:2603.16340 (replaced) [pdf, ps, other]
Title: Iris: Bringing Real-World Priors into Diffusion Model for Monocular Depth Estimation
Comments: Accepted by CVPR2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1050]  arXiv:2603.16479 (replaced) [src]
Title: TRACE: Evaluating Execution Efficiency of LLM-Based Code Translation
Comments: Submitted in error as a new submission instead of a replacement for arXiv:2508.11468
Subjects: Software Engineering (cs.SE)
[1051]  arXiv:2603.16549 (replaced) [pdf, ps, other]
Title: Bridging the Simulation-to-Reality Gap in Electron Microscope Calibration via VAE-EM Estimation
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[1052]  arXiv:2603.16606 (replaced) [pdf, ps, other]
Title: Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech
Subjects: Computation and Language (cs.CL)
[1053]  arXiv:2603.16629 (replaced) [pdf, ps, other]
Title: MLLM-based Textual Explanations for Face Comparison
Comments: Accepted at 14th International Workshop on Biometrics and Forensics (IWBF)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1054]  arXiv:2603.16649 (replaced) [pdf, ps, other]
Title: Mixture of Style Experts for Diverse Image Stylization
Comments: 24 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1055]  arXiv:2603.16744 (replaced) [pdf, ps, other]
Title: Nonstandard Errors in AI Agents
Comments: 45 pages
Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
[1056]  arXiv:2603.16749 (replaced) [pdf, ps, other]
Title: Probing Cultural Signals in Large Language Models through Author Profiling
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[1057]  arXiv:2603.16952 (replaced) [pdf, ps, other]
Title: Embodied Foundation Models at the Edge: A Survey of Deployment Constraints and Mitigation Strategies
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[1058]  arXiv:2603.17024 (replaced) [pdf, ps, other]
Title: HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
Comments: 28 pages, 8 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[1059]  arXiv:2603.17060 (replaced) [pdf, ps, other]
Title: LLM Use, Cheating, and Academic Integrity in Software Engineering Education
Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE)
[1060]  arXiv:2603.17270 (replaced) [pdf, ps, other]
Title: Allocating Chores with Restricted Additive Costs: Achieving EFX, MMS, and Efficiency Simultaneously
Comments: To appear in WWW 2026
Subjects: Computer Science and Game Theory (cs.GT)
[1061]  arXiv:2603.17272 (replaced) [pdf, ps, other]
Title: Network and Device Level Cyber Deception for Contested Environments Using RL and LLMs
Comments: 10 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET)
[1062]  arXiv:2603.17314 (replaced) [src]
Title: A Proposal-Free Query-Guided Network for Grounded Multimodal Named Entity Recognition
Comments: There is an error in the methods section
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1063]  arXiv:2603.17380 (replaced) [pdf, ps, other]
Title: SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
[1064]  arXiv:2603.17445 (replaced) [pdf, ps, other]
Title: When Only the Final Text Survives: Implicit Execution Tracing for Multi-Agent Attribution
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[1065]  arXiv:2603.17497 (replaced) [pdf, ps, other]
Title: From Optimizable to Interactable: Mixed Digital Twin-Empowered Testing of Vehicle-Infrastructure Cooperation Systems
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[1066]  arXiv:2603.17558 (replaced) [pdf, ps, other]
Title: Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition
Comments: 13 pages, 8 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
[1067]  arXiv:2603.17685 (replaced) [pdf, ps, other]
Title: Flow Matching Policy with Entropy Regularization
Subjects: Machine Learning (cs.LG)
[1068]  arXiv:2603.17759 (replaced) [pdf, ps, other]
Title: Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[1069]  arXiv:2603.17790 (replaced) [pdf, ps, other]
Title: The Convergence Frontier: Integrating Machine Learning and High Performance Quantum Computing for Next-Generation Drug Discovery
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
[1070]  arXiv:2603.17821 (replaced) [pdf, ps, other]
Title: CodeT5-RNN: Reinforcing Contextual Embeddings for Enhanced Code Comprehension
Subjects: Software Engineering (cs.SE)
[1071]  arXiv:2603.17899 (replaced) [pdf, ps, other]
Title: Crisis-induced differences in attention towards Ukraine in Twitter 2008-2023
Comments: Submitted to Humanities and Social Sciences Communications
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
[1072]  arXiv:2603.17927 (replaced) [pdf, ps, other]
Title: RoboForge: Physically Optimized Text-guided Whole-Body Locomotion for Humanoids
Comments: 10 pages, 5 figures
Subjects: Robotics (cs.RO)
[1073]  arXiv:2603.17944 (replaced) [pdf, ps, other]
Title: TransText: Alpha-as-RGB Representation for Transparent Text Animation
Comments: 19 pages, publication review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1074]  arXiv:2603.17973 (replaced) [pdf, ps, other]
Title: TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis
Comments: Toolpaper, 7 pages, 7 tables, 3 figures, 1 algorithm. Submitted to ACM AIWare 2026 (Data and Benchmark Track)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)