Computer Science
New submissions
[ showing up to 2000 entries per page: fewer | more ]
New submissions for Wed, 4 Feb 26
- [1] arXiv:2602.02496 [pdf, ps, other]
-
Title: The Hypocrisy Gap: Quantifying Divergence Between Internal Belief and Chain-of-Thought Explanation via Sparse AutoencodersComments: 8 pages, 1 figureSubjects: Computation and Language (cs.CL)
Large Language Models (LLMs) frequently exhibit unfaithful behavior, producing a final answer that differs significantly from their internal chain of thought (CoT) reasoning in order to appease the user they are conversing with. In order to better detect this behavior, we introduce the Hypocrisy Gap, a mechanistic metric utilizing Sparse Autoencoders (SAEs) to quantify the divergence between a model's internal reasoning and its final generation. By mathematically comparing an internal truth belief, derived via sparse linear probes, to the final generated trajectory in latent space, we quantify and detect a model's tendency to engage in unfaithful behavior. Experiments on Gemma, Llama, and Qwen models using Anthropic's Sycophancy benchmark show that our method achieves an AUROC of 0.55-0.73 for detecting sycophantic runs and 0.55-0.74 for hypocritical cases where the model internally "knows" the user is wrong, consistently outperforming a decision-aligned log-probability baseline (0.41-0.50 AUROC).
- [2] arXiv:2602.02497 [pdf, ps, other]
-
Title: STEMVerse: A Dual-Axis Diagnostic Framework for STEM Reasoning in Large Language ModelsComments: Preprint, Under reviewSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
As Large Language Models (LLMs) achieve significant breakthroughs in complex reasoning tasks, evaluating their proficiency in science, technology, engineering, and mathematics (STEM) has become a primary method for measuring machine intelligence. However, current evaluation paradigms often treat benchmarks as isolated "silos," offering only monolithic aggregate scores that neglect the intricacies of both academic specialization and cognitive depth. This result-oriented approach fails to distinguish whether model errors stem from insufficient domain knowledge or deficiencies in cognitive capacity, thereby limiting the diagnostic value. To address this, we propose STEMVerse, a diagnostic framework designed to systematically analyze the STEM reasoning capabilities of LLMs. This framework characterizes model performance across academic specialization and cognitive complexity to map the capability required for reasoning. We re-aggregate over 20,000 STEM problems from mainstream benchmarks into a unified "Discipline $\times$ Cognition" capability space, assigning dual-axis labels to every instance. Utilizing this unified diagnostic framework, we systematically evaluate representative LLM families across varying parameter scales and training paradigms. Our empirical results reveal structural failure patterns in STEM reasoning. By integrating multi-disciplinary coverage and fine-grained cognitive stratification into a unified framework, STEMVerse provides a clear and actionable perspective for understanding the scientific reasoning characteristics of LLMs.
- [3] arXiv:2602.02498 [pdf, ps, other]
-
Title: Test-Time Detoxification without Training or Learning AnythingSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models can produce toxic or inappropriate text even for benign inputs, creating risks when deployed at scale. Detoxification is therefore important for safety and user trust, particularly when we want to reduce harmful content without sacrificing the model's generation quality. Many existing approaches rely on model retraining, gradients, or learned auxiliary components, which can be costly and may not transfer across model families or to truly black-box settings. We introduce a test-time procedure that approximates the gradient of completion toxicity with respect to the input embeddings and uses a small number of descent steps to steer generation toward less toxic continuations. This is achieved with zeroth-order optimization that requires only access to input embeddings, a toxicity scoring function, and forward evaluations of the model. Empirically, the approach delivers robust toxicity reductions across models and prompts and, in most settings, achieves the best overall toxicity-quality trade-off. More broadly, our work positions word embeddings as effective control variables and encourages wider use of black-box optimization to guide autoregressive language models toward scalable, safer text generation, without requiring any training or access to intermediate computations.
- [4] arXiv:2602.02499 [pdf, ps, other]
-
Title: ROSA-Tuning: Enhancing Long-Context Modeling via Suffix MatchingSubjects: Computation and Language (cs.CL)
Long-context capability and computational efficiency are among the central challenges facing today's large language models. Existing efficient attention methods reduce computational complexity, but they typically suffer from a limited coverage of the model state. This paper proposes ROSA-Tuning, a retrieval-and-recall mechanism for enhancing the long-context modeling ability of pretrained models. Beyond the standard attention mechanism, ROSA-Tuning introduces in parallel a CPU-based ROSA (RWKV Online Suffix Automaton) retrieval module, which efficiently locates historical positions in long contexts that are relevant to the current query, and injects the retrieved information into the model state in a trainable manner; subsequent weighted fusion can then be handled by range-restricted attention. To enable end-to-end training, we design a binary discretization strategy and a counterfactual gradient algorithm, and further optimize overall execution efficiency via an asynchronous CPU-GPU pipeline. Systematic evaluations on Qwen3-Base-1.7B show that ROSA-Tuning substantially restores the long-context modeling ability of windowed-attention models, achieving performance close to and in some cases matching global attention on benchmarks such as LongBench, while maintaining computational efficiency and GPU memory usage that are nearly comparable to windowed-attention methods, offering a new technical path for efficient long-context processing. The example code can be found at https://github.com/zyaaa-ux/ROSA-Tuning.
- [5] arXiv:2602.02500 [pdf, ps, other]
-
Title: UNSO: Unified Newton Schulz OrthogonalizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
The Newton-Schulz (NS) iteration has gained increasing interest for its role in the Muon optimizer and the Stiefel manifold. However, the conventional NS iteration suffers from inefficiency and instability. Although various improvements have been introduced to NS iteration, they fail to deviate from the conventional iterative paradigm, which could increase computation burden largely due to the matrix products along the long dimension repeatedly. To address this, we consolidate the iterative structure into a unified framework, named Unified Newton-Schulz Orthogonalization (UNSO). To do so, we could avoid a polynomial expansion. Instead, we evaluate the role of each matrix power, remove the insignificant terms, and provide a recommended polynomial with learnable coefficients. These learnable coefficients are then optimized, and achieve an outstanding performance with stable convergence. The code of our method is available: https://github.com/greekinRoma/Unified_Newton_Schulz_Orthogonalization.
- [6] arXiv:2602.02501 [pdf, ps, other]
-
Title: Augmenting Parameter-Efficient Pre-trained Language Models with Large Language ModelsComments: 22 pages, 9 figures, 11 tables, short paper was accepted in ACM SAC 2024Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Training AI models in cybersecurity with help of vast datasets offers significant opportunities to mimic real-world behaviors effectively. However, challenges like data drift and scarcity of labelled data lead to frequent updates of models and the risk of overfitting. To address these challenges, we used parameter-efficient fine-tuning techniques for pre-trained language models wherein we combine compacters with various layer freezing strategies. To enhance the capabilities of these pre-trained language models, in this work we introduce two strategies that use large language models. In the first strategy, we utilize large language models as data-labelling tools wherein they generate labels for unlabeled data. In the second strategy, large language modes are utilized as fallback mechanisms for predictions having low confidence scores. We perform comprehensive experimental analysis on the proposed strategies on different downstream tasks specific to cybersecurity domain. We empirically demonstrate that by combining parameter-efficient pre-trained models with large language models, we can improve the reliability and robustness of models, making them more suitable for real-world cybersecurity applications.
- [7] arXiv:2602.02502 [pdf, ps, other]
-
Title: Sparse Adapter Fusion for Continual Learning in NLPComments: This paper has been accepted to EACL 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Continual learning in natural language processing plays a crucial role in adapting to evolving data and preventing catastrophic forgetting. Despite significant progress, existing methods still face challenges, such as inefficient parameter reuse across tasks, risking catastrophic forgetting when tasks are dissimilar, and the unnecessary introduction of new parameters for each task, which hampers knowledge sharing among similar tasks. To tackle these issues, we propose a Sparse Adapter Fusion Method (SAFM), which dynamically fuses old and new adapters to address these challenges. SAFM operates in two stages: the decision stage and the tuning stage. In the decision stage, SAFM determines whether to incorporate a new adapter, reuse an existing one, or add an empty adapter. The architecture search procedure, designed to prioritize reusing or adding empty adapters, minimizes parameter consumption and maximizes reuse. In the tuning stage, SAFM especially facilitates a layer-wise loss to encourage differentiation between adapters, effectively capturing knowledge within the same task. Experimental results consistently show that SAFM outperforms state-of-the-art (SOTA) methods, achieving comparable performance while utilizing less than 60% of the parameters.
- [8] arXiv:2602.02505 [pdf, ps, other]
-
Title: Learning-augmented smooth integer programs with PAC-learnable oraclesSubjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This paper investigates learning-augmented algorithms for smooth integer programs, covering canonical problems such as MAX-CUT and MAX-k-SAT. We introduce a framework that incorporates a predictive oracle to construct a linear surrogate of the objective, which is then solved via linear programming followed by a rounding procedure. Crucially, our framework ensures that the solution quality is both consistent and smooth against prediction errors. We demonstrate that this approach effectively extends tractable approximations from the classical dense regime to the near-dense regime. Furthermore, we go beyond the assumption of oracle existence by establishing its PAC-learnability. We prove that the induced algorithm class possesses a bounded pseudo-dimension, thereby ensuring that an oracle with near-optimal expected performance can be learned with polynomial samples.
- [9] arXiv:2602.02508 [pdf, ps, other]
-
Title: Precoding-Oriented CSI Feedback Design with Mutual Information Regularized VQ-VAEComments: 5 pages, submitted to IEEE VTC conferenceSubjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Efficient channel state information (CSI) compression at the user equipment plays a key role in enabling accurate channel reconstruction and precoder design in massive multiple-input multiple-output systems. A key challenge lies in balancing the CSI feedback overhead with the achievable downlink rate, i.e., maximizing the utility of limited feedback to maintain high system performance. In this work, we propose a precoding-oriented CSI feedback framework based on a vector quantized variational autoencoder, augmented with an information-theoretic regularization. To achieve this, we introduce a differentiable mutual information lower-bound estimator as a training regularizer to promote effective utilization of the learned codebook under a fixed feedback budget. Numerical results demonstrate that the proposed method achieves rates comparable to variable-length neural compression schemes, while operating with fixed-length feedback. Furthermore, the learned codewords exhibit significantly more uniform usage and capture interpretable structures that are strongly correlated with the underlying channel state information.
- [10] arXiv:2602.02509 [pdf, ps, other]
-
Title: CodeGuard: Improving LLM Guardrails in CS EducationSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Large language models (LLMs) are increasingly embedded in Computer Science (CS) classrooms to automate code generation, feedback, and assessment. However, their susceptibility to adversarial or ill-intentioned prompts threatens student learning and academic integrity. To cope with this important issue, we evaluate existing off-the-shelf LLMs in handling unsafe and irrelevant prompts within the domain of CS education. We identify important shortcomings in existing LLM guardrails which motivates us to propose CodeGuard, a comprehensive guardrail framework for educational AI systems. CodeGuard includes (i) a first-of-its-kind taxonomy for classifying prompts; (ii) the CodeGuard dataset, a collection of 8,000 prompts spanning the taxonomy; and (iii) PromptShield, a lightweight sentence-encoder model fine-tuned to detect unsafe prompts in real time. Experiments show that PromptShield achieves 0.93 F1 score, surpassing existing guardrail methods. Additionally, further experimentation reveals that CodeGuard reduces potentially harmful or policy-violating code completions by 30-65% without degrading performance on legitimate educational tasks. The code, datasets, and evaluation scripts are made freely available to the community.
- [11] arXiv:2602.02510 [pdf, ps, other]
-
Title: Beyond Translation: Cross-Cultural Meme Transcreation with Vision-Language ModelsSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Memes are a pervasive form of online communication, yet their cultural specificity poses significant challenges for cross-cultural adaptation. We study cross-cultural meme transcreation, a multimodal generation task that aims to preserve communicative intent and humor while adapting culture-specific references. We propose a hybrid transcreation framework based on vision-language models and introduce a large-scale bidirectional dataset of Chinese and US memes. Using both human judgments and automated evaluation, we analyze 6,315 meme pairs and assess transcreation quality across cultural directions. Our results show that current vision-language models can perform cross-cultural meme transcreation to a limited extent, but exhibit clear directional asymmetries: US-Chinese transcreation consistently achieves higher quality than Chinese-US. We further identify which aspects of humor and visual-textual design transfer across cultures and which remain challenging, and propose an evaluation framework for assessing cross-cultural multimodal generation. Our code and dataset are publicly available at https://github.com/AIM-SCU/MemeXGen.
- [12] arXiv:2602.02511 [pdf, ps, other]
-
Title: Training Data Governance for Brain Foundation ModelsSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Brain foundation models bring the foundation model paradigm to the field of neuroscience. Like language and image foundation models, they are general-purpose AI systems pretrained on large-scale datasets that adapt readily to downstream tasks. Unlike text-and-image based models, however, they train on brain data: large-datasets of EEG, fMRI, and other neural data types historically collected within tightly governed clinical and research settings. This paper contends that training foundation models on neural data opens new normative territory. Neural data carry stronger expectations of, and claims to, protection than text or images, given their body-derived nature and historical governance within clinical and research settings. Yet the foundation model paradigm subjects them to practices of large-scale repurposing, cross-context stitching, and open-ended downstream application. Furthermore, these practices are now accessible to a much broader range of actors, including commercial developers, against a backdrop of fragmented and unclear governance. To map this territory, we first describe brain foundation models' technical foundations and training-data ecosystem. We then draw on AI ethics, neuroethics, and bioethics to organize concerns across privacy, consent, bias, benefit sharing, and governance. For each, we propose both agenda-setting questions and baseline safeguards as the field matures.
- [13] arXiv:2602.02512 [pdf, ps, other]
-
Title: Efficient Edge Rewiring Strategies for Enhancing PageRank FairnessComments: Accepted by Theoretical Computer Science (TCS)Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
We study the notion of unfairness in social networks, where a group such as females in a male-dominated industry are disadvantaged in access to important information, e.g. job posts, due to their less favorable positions in the network. We investigate a well-established network-based formulation of fairness called PageRank fairness, which refers to a fair allocation of the PageRank weights among distinct groups. Our goal is to enhance the PageRank fairness by modifying the underlying network structure. More precisely, we study the problem of maximizing PageRank fairness with respect to a disadvantaged group, when we are permitted to rewire a fixed number of edges in the network. Building on a greedy approach, we leverage techniques from fast sampling of rooted spanning forests to devise an effective linear-time algorithm for this problem. To evaluate the accuracy and performance of our proposed algorithm, we conduct a large set of experiments on various real-world network data. Our experiments demonstrate that the proposed algorithm significantly outperforms the existing ones. Our algorithm is capable of generating accurate solutions for networks of million nodes in just a few minutes.
- [14] arXiv:2602.02513 [pdf, ps, other]
-
Title: Learning ORDER-Aware Multimodal Representations for Composite Materials DesignSubjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
Artificial intelligence (AI) has shown remarkable success in materials discovery and property prediction, particularly for crystalline and polymer systems where material properties and structures are dominated by discrete graph representations. Such graph-central paradigm breaks down on composite materials, which possess continuous and nonlinear design spaces that lack well-defined graph structures. General composite descriptors, e.g., fiber volume and misalignment angle, cannot fully capture the fiber distributions that fundamentally determine microstructural characteristics, necessitating the integration of heterogeneous data sources through multimodal learning. Existing alignment-oriented multimodal frameworks have proven effective on abundant crystal or polymer data under discrete, unique graph-property mapping assumptions, but fail to address the highly continuous composite design space under extreme data scarcity. In this work, we introduce ORDinal-aware imagE-tabulaR alignment (ORDER), a multimodal pretraining framework that establishes ordinality as a core principle for composite material representations. ORDER ensures that materials with similar target properties occupy nearby regions in the latent space, which effectively preserves the continuous nature of composite properties and enables meaningful interpolation between sparsely observed designs. We evaluate ORDER on a public Nanofiber-enforced composite dataset and an internally curated dataset that simulates the construction of carbon fiber T700 with diverse fiber distributions. ORDER achieves consistent improvements over state-of-the-art multimodal baselines across property prediction, cross-modal retrieval, and microstructure generation tasks.
- [15] arXiv:2602.02514 [pdf, ps, other]
-
Title: Design and Evaluation of Whole-Page Experience Optimization for E-commerce SearchAuthors: Pratik Lahiri, Bingqing Ge, Zhou Qin, Aditya Jumde, Shuning Huo, Lucas Scottini, Yi Liu, Mahmoud Mamlouk, Wenyang LiuSubjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
E-commerce Search Results Pages (SRPs) are evolving from linear lists to complex, non-linear layouts, rendering traditional position-biased ranking models insufficient. Moreover, existing optimization frameworks typically maximize short-term signals (e.g., clicks, same-day revenue) because long-term satisfaction metrics (e.g., expected two-week revenue) involve delayed feedback and challenging long-horizon credit attribution. To bridge these gaps, we propose a novel Whole-Page Experience Optimization Framework. Unlike traditional list-wise rankers, our approach explicitly models the interplay between item relevance, 2D positional layout, and visual elements. We use a causal framework to develop metrics for measuring long-term user satisfaction based on quasi-experimental data. We validate our approach through industry-scale A/B testing, where the model demonstrated a 1.86% improvement in brand relevance (our primary customer experience metric) while simultaneously achieving a statistically significant revenue uplift of +0.05%
- [16] arXiv:2602.02515 [pdf, ps, other]
-
Title: CreditAudit: 2D Auditing for LLM Evaluation and SelectionComments: First updateSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Leaderboard scores on public benchmarks have been steadily rising and converging, with many frontier language models now separated by only marginal differences. However, these scores often fail to match users' day to day experience, because system prompts, output protocols, and interaction modes evolve under routine iteration, and in agentic multi step pipelines small protocol shifts can trigger disproportionate failures, leaving practitioners uncertain about which model to deploy. We propose CreditAudit, a deployment oriented credit audit framework that evaluates models under a family of semantically aligned and non adversarial system prompt templates across multiple benchmarks, reporting mean ability as average performance across scenarios and scenario induced fluctuation sigma as a stability risk signal, and further mapping volatility into interpretable credit grades from AAA to BBB via cross model quantiles with diagnostics that mitigate template difficulty drift. Controlled experiments on GPQA, TruthfulQA, and MMLU Pro show that models with similar mean ability can exhibit substantially different fluctuation, and stability risk can overturn prioritization decisions in agentic or high failure cost regimes. By providing a 2D and grade based language for regime specific selection, CreditAudit supports tiered deployment and more disciplined allocation of testing and monitoring effort, enabling more objective and trustworthy model evaluation for real world use.
- [17] arXiv:2602.02516 [pdf, ps, other]
-
Title: Measuring Individual User Fairness with User Similarity and Effectiveness DisparityComments: Preprint of a work that has been accepted to ECIR 2026 Full Papers track as a Findings paperSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Individual user fairness is commonly understood as treating similar users similarly. In Recommender Systems (RSs), several evaluation measures exist for quantifying individual user fairness. These measures evaluate fairness via either: (i) the disparity in RS effectiveness scores regardless of user similarity, or (ii) the disparity in items recommended to similar users regardless of item relevance. Both disparity in recommendation effectiveness and user similarity are very important in fairness, yet no existing individual user fairness measure simultaneously accounts for both. In brief, current user fairness evaluation measures implement a largely incomplete definition of fairness. To fill this gap, we present Pairwise User unFairness (PUF), a novel evaluation measure of individual user fairness that considers both effectiveness disparity and user similarity. PUF is the only measure that can express this important distinction. We empirically validate that PUF does this consistently across 4 datasets and 7 rankers, and robustly when varying user similarity or effectiveness. In contrast, all other measures are either almost insensitive to effectiveness disparity or completely insensitive to user similarity. We contribute the first RS evaluation measure to reliably capture both user similarity and effectiveness in individual user fairness. Our code: https://github.com/theresiavr/PUF-individual-user-fairness-recsys.
- [18] arXiv:2602.02517 [pdf, ps, other]
-
Title: What Drives Length of Stay After Elective Spine Surgery? Insights from a Decade of Predictive ModelingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Objective: Predicting length of stay after elective spine surgery is essential for optimizing patient outcomes and hospital resource use. This systematic review synthesizes computational methods used to predict length of stay in this patient population, highlighting model performance and key predictors. Methods: Following PRISMA guidelines, we systematically searched PubMed, Google Scholar, and ACM Digital Library for studies published between December 1st, 2015, and December 1st, 2024. Eligible studies applied statistical or machine learning models to predict length of stay for elective spine surgery patients. Three reviewers independently screened studies and extracted data. Results: Out of 1,263 screened studies, 29 studies met inclusion criteria. Length of stay was predicted as a continuous, binary, or percentile-based outcome. Models included logistic regression, random forest, boosting algorithms, and neural networks. Machine learning models consistently outperformed traditional statistical models, with AUCs ranging from 0.94 to 0.99. K-Nearest Neighbors and Naive Bayes achieved top performance in some studies. Common predictors included age, comorbidities (notably hypertension and diabetes), BMI, type and duration of surgery, and number of spinal levels. However, external validation and reporting practices varied widely across studies. Discussion: There is growing interest in artificial intelligence and machine learning in length of stay prediction, but lack of standardization and external validation limits clinical utility. Future studies should prioritize standardized outcome definitions and transparent reporting needed to advance real-world deployment. Conclusion: Machine learning models offer strong potential for length of stay prediction after elective spine surgery, highlighting their potential for improving discharge planning and hospital resource management.
- [19] arXiv:2602.02518 [pdf, ps, other]
-
Title: GraphDancer: Training LLMs to Explore and Reason over Graphs via Curriculum Reinforcement LearningComments: 15 pages, Project website: this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Large language models (LLMs) increasingly rely on external knowledge to improve factuality, yet many real-world knowledge sources are organized as heterogeneous graphs rather than plain text. Reasoning over such graph-structured knowledge poses two key challenges: (1) navigating structured, schema-defined relations requires precise function calls rather than similarity-based retrieval, and (2) answering complex questions often demands multi-hop evidence aggregation through iterative information seeking. We propose GraphDancer, a reinforcement learning (RL) framework that teaches LLMs to navigate graphs by interleaving reasoning and function execution. To make RL effective for moderate-sized LLMs, we introduce a graph-aware curriculum that schedules training by the structural complexity of information-seeking trajectories using an easy-to-hard biased sampler. We evaluate GraphDancer on a multi-domain benchmark by training on one domain only and testing on unseen domains and out-of-distribution question types. Despite using only a 3B backbone, GraphDancer outperforms baselines equipped with either a 14B backbone or GPT-4o-mini, demonstrating robust cross-domain generalization of graph exploration and reasoning skills. Our code and models can be found at https://yuyangbai.com/graphdancer/ .
- [20] arXiv:2602.02519 [pdf, ps, other]
-
Title: Evaluation of Large Language Models' educational feedback in Higher Education: potential, limitations and implications for educational practiceSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
The importance of managing feedback practices in higher education has been widely recognised, as they play a crucial role in enhancing teaching, learning, and assessment processes. In today's educational landscape, feedback practices are increasingly influenced by technological advancements, particularly artificial intelligence (AI). Understanding the impact of AI on feedback generation is essential for identifying its potential benefits and establishing effective implementation strategies. This study examines how AI-generated feedback supports student learning using a well-established analytical framework. Specifically, feedback produced by different Large Language Models (LLMs) was assessed in relation to student-designed projects within a training course on inclusive teaching and learning. The evaluation process involved providing seven LLMs with a structured rubric, developed by the university instructor, which defined specific criteria and performance levels. The LLMs were tasked with generating both quantitative assessments and qualitative feedback based on this rubric. The AI-generated feedback was then analysed using Hughes, Smith, and Creese's framework to evaluate its structure and effectiveness in fostering formative learning experiences. Overall, these findings indicate that LLMs can generate well-structured feedback and hold great potential as a sustainable and meaningful feedback tool, provided they are guided by clear contextual information and a well-defined instructions that will be explored further in the conclusions.
- [21] arXiv:2602.02520 [pdf, ps, other]
-
Title: Artificial Intelligence for Inclusive Engineering Education: Advancing Equality, Diversity, and Ethical LeadershipSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
AI technology development has transformed the field of engineering education with its adaptivity-driven, data-based, and ethical-led learning platforms that promote equity, diversity, and inclusivity. But with so much progress being made in so many areas, there are unfortunately gaps in gender equity, representation in cultures around the world, and access to education and jobs in stem education. The paper describes an ethical approach to using AI technology that supports the United Nations 2030 agenda for sustainability. In particular, this includes both Goal 5--Gender Equity--and Goal 10--Reducing Inequalities. Based on a synthesis strategy using both critical thinking strategies related to case studies around the world using AI-based adaptivity platforms to address equity gaps related to education inclusion. The model presented offers a synthesis solution that includes ethical leadership data-related to equity to measure inclusivity based upon sustainability thinking. The result has demonstrated that using AI technology not only increases inclusivity but promotes equity related to access to education in stem education access. Finally, there are concluding remarks related to transforming education into a global system.
- [22] arXiv:2602.02521 [pdf, ps, other]
-
Title: Scaled Dot-Product Attention implements projection of inputs onto a common surfaceAuthors: Terence D SangerSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Scaled dot-product attention (SDPA) is a fundamental component responsible for the success of large-language models and other nonlinear signal processing applications. The rationale for SDPA has been based upon "query, key, value" concepts borrowed from database theory, but these concepts are difficult to reconcile with standard methods in mathematical signal processing. We show that SDPA can be rewritten in a different but mathematically equivalent form as a projection of the input vectors onto a common surface determined by the inputs themselves. Therefore SDPA discovers nonlinear dependencies in the input that are time-dependent and context-dependent. The rewritten form of SDPA permits increased speed of both feedforward and learning algorithms, but more importantly suggests potential extensions. In the context of language, we re-interpret the role of SDPA as finding a time-dependent contextual meaning determined by the surface on which the set of input vectors lies. Input token embeddings are then modified by the local context surface. This interpretation differs substantially from the concept of "self-attention", and provides a strong justification for the use of SDPA for time-series data with time-varying local nonlinear dependencies.
- [23] arXiv:2602.02522 [pdf, ps, other]
-
Title: IMU-1: Sample-Efficient Pre-training of Small Language ModelsAuthors: George GrigorevComments: 16 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We present IMU-1, a 430M-parameter language model trained on 72B tokens that approaches the benchmark performance of models trained on 56x more data. We describe a validated training recipe combining recent architectural interventions (QK-norm attention, per-head gating, value residuals, LayerNorm scaling) with optimization advances (NorMuon with cautious weight decay, muP parametrization) and a three-stage training schedule with post-hoc checkpoint EMA. We provide ablations for each component and release code, weights and data to enable reproduction: https://huggingface.co/thepowerfuldeez/imu1_base
- [24] arXiv:2602.02523 [pdf, ps, other]
-
Title: TabularMath: Evaluating Computational Extrapolation in Tabular Learning via Program-Verified SynthesisComments: 30 pages; TabularMath technical reportSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Standard tabular benchmarks mainly focus on the evaluation of a model's capability to interpolate values inside a data manifold, where models good at performing local statistical smoothing are rewarded. However, there exists a very large category of high-value tabular data, including financial modeling and physical simulations, which are generated based upon deterministic computational processes, as opposed to stochastic and noisy relationships. Therefore, we investigate if tabular models can provide an extension from statistical interpolation to computational extrapolation.
We propose TabularMath, a diagnostic benchmark of 114 deterministic problems (233,472 rows) generated from verified programs based on GSM8K and AIME. We evaluate 9 tabular architectures and in-context learning (ICL) with GPT-OSS-120B. On standard regression metrics, TabPFN v2.5 performs remarkably well, achieving R^2=0.998 in-distribution and maintaining positive R^2 even under distribution shift, which is unique among the tabular models we tested. When we measure rounded consistency (exact integer match), a different picture emerges: TabPFN v2.5 drops below 10% on out-of-distribution data, while ICL maintains around 40%. This gap between R^2 and exact-match accuracy suggests that tabular models learn smooth function approximations but struggle to recover precise computational outputs under extrapolation. The two paradigms appear complementary: TabPFN scales efficiently with data; ICL achieves exact computation from few examples. We release all code and data to support further investigation. - [25] arXiv:2602.02524 [pdf, ps, other]
-
Title: GASTON: Graph-Aware Social Transformer for Online NetworksComments: Submitted to ICWSMSubjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Online communities have become essential places for socialization and support, yet they also possess toxicity, echo chambers, and misinformation. Detecting this harmful content is difficult because the meaning of an online interaction stems from both what is written (textual content) and where it is posted (social norms). We propose GASTON (Graph-Aware Social Transformer for Online Networks), which learns text and user embeddings that are grounded in their local norms, providing the necessary context for downstream tasks. The heart of our solution is a contrastive initialization strategy that pretrains community embeddings based on user membership patterns, capturing a community's user base before processing any text. This allows GASTON to distinguish between communities (e.g., a support group vs. a hate group) based on who interacts there, even if they share similar vocabulary. Experiments on tasks such as stress detection, toxicity scoring, and norm violation demonstrate that the embeddings produced by GASTON outperform state-of-the-art baselines.
- [26] arXiv:2602.02525 [pdf, ps, other]
-
Title: Community Norms in the Spotlight: Enabling Task-Agnostic Unsupervised Pre-Training to Benefit Online Social MediaComments: Submitted to ICWSM PosterSubjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Modelling the complex dynamics of online social platforms is critical for addressing challenges such as hate speech and misinformation. While Discussion Transformers, which model conversations as graph structures, have emerged as a promising architecture, their potential is severely constrained by reliance on high-quality, human-labelled datasets. In this paper, we advocate a paradigm shift from task-specific fine-tuning to unsupervised pretraining, grounded in an entirely novel consideration of community norms. We posit that this framework not only mitigates data scarcity but also enables interpretation of the social norms underlying the decisions made by such an AI system. Ultimately, we believe that this direction offers many opportunities for AI for Social Good.
- [27] arXiv:2602.02526 [pdf, ps, other]
-
Title: The "Robert Boulton" Singularity: Semantic Tunneling and Manifold Unfolding in Recursive AIAuthors: Pengyue HouComments: Companion paper to arXiv:2601.11594. Provides empirical validation of the MNCIS framework in Large Language Models (GPT-2) using a recursive training protocol (N=1500). Includes complete, reproducible Python implementation of Adaptive Spectral Negative Coupling (ASNC) and Effective Rank metrics in the AppendixSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computational Physics (physics.comp-ph)
The stability of generative artificial intelligence trained on recursive synthetic data is conventionally monitored via Perplexity (PPL). We demonstrate that PPL is a deceptive metric in context-stabilized regimes (L=128). Using a rigorous sliding-window protocol (N=1500), we identify a novel failure mode termed "Semantic Tunneling." While the Baseline model maintains high grammatical fluency (PPL approx. 83.9), it suffers a catastrophic loss of semantic diversity, converging within seven generations to a single, low-entropy narrative attractor: the "Robert Boulton" Singularity. This phenomenon represents a total collapse of the latent manifold (Global Effective Rank 3.62 -> 2.22), where the model discards diverse world knowledge to optimize for statistically safe syntactic templates. To address this, we apply the Multi-Scale Negative Coupled Information Systems (MNCIS) framework recently established in Hou (2026) [arXiv:2601.11594]. We demonstrate that Adaptive Spectral Negative Coupling (ASNC) acts as a topological operator that actively induces "Manifold Unfolding." MNCIS forces the model to expand its effective rank from the anisotropic baseline of 3.62 to a hyper-diverse state of 5.35, effectively constructing an "Artificial Manifold" that resists the gravitational pull of semantic attractors and preserves the long-tail distribution of the training data.
- [28] arXiv:2602.02528 [pdf, ps, other]
-
Title: Incident-Guided Spatiotemporal Traffic ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recent years have witnessed the rapid development of deep-learning-based, graph-neural-network-based forecasting methods for modern intelligent transportation systems. However, most existing work focuses exclusively on capturing spatio-temporal dependencies from historical traffic data, while overlooking the fact that suddenly occurring transportation incidents, such as traffic accidents and adverse weather, serve as external disturbances that can substantially alter temporal patterns. We argue that this issue has become a major obstacle to modeling the dynamics of traffic systems and improving prediction accuracy, but the unpredictability of incidents makes it difficult to observe patterns from historical sequences. To address these challenges, this paper proposes a novel framework named the Incident-Guided Spatiotemporal Graph Neural Network (IGSTGNN). IGSTGNN explicitly models the incident's impact through two core components: an Incident-Context Spatial Fusion (ICSF) module to capture the initial heterogeneous spatial influence, and a Temporal Incident Impact Decay (TIID) module to model the subsequent dynamic dissipation. To facilitate research on the spatio-temporal impact of incidents on traffic flow, a large-scale dataset is constructed and released, featuring incident records that are time-aligned with traffic time series. On this new benchmark, the proposed IGSTGNN framework is demonstrated to achieve state-of-the-art performance. Furthermore, the generalizability of the ICSF and TIID modules is validated by integrating them into various existing models.
- [29] arXiv:2602.02530 [pdf, ps, other]
-
Title: Formulating Reinforcement Learning for Human-Robot Collaboration through Off-Policy EvaluationSubjects: Machine Learning (cs.LG); Robotics (cs.RO)
Reinforcement learning (RL) has the potential to transform real-world decision-making systems by enabling autonomous agents to learn from experience. Deploying RL in real-world settings, especially in the context of human-robot interaction, requires defining state representations and reward functions, which are critical for learning efficiency and policy performance. Traditional RL approaches often rely on domain expertise and trial-and-error, necessitating extensive human involvement as well as direct interaction with the environment, which can be costly and impractical, especially in complex and safety-critical applications. This work proposes a novel RL framework that leverages off-policy evaluation (OPE) for state space and reward function selection, using only logged interaction data. This approach eliminates the need for real-time access to the environment or human-in-the-loop feedback, greatly reducing the dependency on costly real-time interactions. The proposed approach systematically evaluates multiple candidate state representations and reward functions by training offline RL agents and applying OPE to estimate policy performance. The optimal state space and reward function are selected based on their ability to produce high-performing policies under OPE metrics. Our method is validated on two environments: the Lunar Lander environment by OpenAI Gym, which provides a controlled setting for assessing state space and reward function selection, and a NASA-MATB-II human subjects study environment, which evaluates the approach's real-world applicability to human-robot teaming scenarios. This work enhances the feasibility and scalability of offline RL for real-world environments by automating critical RL design decisions through a data-driven OPE-based evaluation, enabling more reliable, effective, and sustainable RL formulation for complex human-robot interaction settings.
- [30] arXiv:2602.02531 [pdf, ps, other]
-
Title: Hypersonic Flow Control: Generalized Deep Reinforcement Learning for Hypersonic Intake Unstart Control under UncertaintyComments: 34 Pages, 23 FiguresSubjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
The hypersonic unstart phenomenon poses a major challenge to reliable air-breathing propulsion at Mach 5 and above, where strong shock-boundary-layer interactions and rapid pressure fluctuations can destabilize inlet operation. Here, we demonstrate a deep reinforcement learning (DRL)- based active flow control strategy to control unstart in a canonical two-dimensional hypersonic inlet at Mach 5 and Reynolds number $5\times 10^6$. The in-house CFD solver enables high-fidelity simulations with adaptive mesh refinement, resolving key flow features, including shock motion, boundary-layer dynamics, and flow separation, that are essential for learning physically consistent control policies suitable for real-time deployment. The DRL controller robustly stabilizes the inlet over a wide range of back pressures representative of varying combustion chamber conditions. It further generalizes to previously unseen scenarios, including different back-pressure levels, Reynolds numbers, and sensor configurations, while operating with noisy measurements, thereby demonstrating strong zero-shot generalization. Control remains robust in the presence of noisy sensor measurements, and a minimal, optimally selected sensor set achieves comparable performance, enabling practical implementation. These results establish a data-driven approach for real-time hypersonic flow control under realistic operational uncertainties.
- [31] arXiv:2602.02532 [pdf, ps, other]
-
Title: CADENT: Gated Hybrid Distillation for Sample-Efficient Transfer in Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Transfer learning promises to reduce the high sample complexity of deep reinforcement learning (RL), yet existing methods struggle with domain shift between source and target environments. Policy distillation provides powerful tactical guidance but fails to transfer long-term strategic knowledge, while automaton-based methods capture task structure but lack fine-grained action guidance. This paper introduces Context-Aware Distillation with Experience-gated Transfer (CADENT), a framework that unifies strategic automaton-based knowledge with tactical policy-level knowledge into a coherent guidance signal. CADENT's key innovation is an experience-gated trust mechanism that dynamically weighs teacher guidance against the student's own experience at the state-action level, enabling graceful adaptation to target domain specifics. Across challenging environments, from sparse-reward grid worlds to continuous control tasks, CADENT achieves 40-60\% better sample efficiency than baselines while maintaining superior asymptotic performance, establishing a robust approach for adaptive knowledge transfer in RL.
- [32] arXiv:2602.02533 [pdf, ps, other]
-
Title: HMVLA: Hyperbolic Multimodal Fusion for Vision-Language-Action ModelsComments: 5 pages,5 figures,ICASSPSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
Vision Language Action (VLA) models have recently shown great potential in bridging multimodal perception with robotic control. However, existing methods often rely on direct fine-tuning of pre-trained Vision-Language Models (VLMs), feeding semantic and visual features directly into a policy network without fully addressing the unique semantic alignment challenges in the VLA domain. In this paper, we propose HMVLA, a novel VLA framework that exploits the inherent hierarchical structures in vision and language for comprehensive semantic alignment. Unlike traditional methods that perform alignment in Euclidean space, our HMVLA embeds multimodal features in hyperbolic space, enabling more effective modeling of the hierarchical relationships present in image text data. Furthermore, we introduce a sparsely gated Mixture of Experts (MoE) mechanism tailored for semantic alignment, which enhances multimodal comprehension between images and text while improving efficiency. Extensive experiments demonstrate that HMVLA surpasses baseline methods in both accuracy and generalization. In addition, we validate its robustness by reconstructing datasets to further test cross domain adaptability.
- [33] arXiv:2602.02534 [pdf, ps, other]
-
Title: DualMind: Towards Understanding Cognitive-Affective Cascades in Public Opinion Dissemination via Multi-Agent SimulationAuthors: Enhao Huang, Tongtong Pan, Shuhuai Zhang, Qishu Jin, Liheng Zheng, Kaichun Hu, Yiming Li, Zhan Qin, Kui RenComments: Accepted as a demo paper at TheWebConf (WWW) 2026Subjects: Social and Information Networks (cs.SI)
Forecasting public opinion during PR crises is challenging, as existing frameworks often overlook the interaction between transient affective responses and persistent cognitive beliefs. To address this, we propose DualMind, an LLM-driven multi-agent platform designed to model this dual-component interplay. We evaluate the system on 15 real-world crises occurring post-August 2024 using social media data as ground truth. Empirical results demonstrate that DualMind faithfully reconstructs opinion trajectories, significantly outperforming state-of-the-art baselines. This work offers a high-fidelity tool for proactive crisis management. Code is available at https://github.com/EonHao/DualMind.
- [34] arXiv:2602.02535 [pdf, ps, other]
-
Title: Enhancing Psychologists' Understanding through Explainable Deep Learning Framework for ADHD DiagnosisJournal-ref: Expert Systems, Wiley, 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental disorder that is challenging to diagnose and requires advanced approaches for reliable and transparent identification and classification. It is characterized by a pattern of inattention, hyperactivity and impulsivity that is more severe and more frequent than in individuals with a comparable level of development. In this paper, an explainable framework based on a fine-tuned hybrid Deep Neural Network (DNN) and Recurrent Neural Network (RNN) called HyExDNN-RNN model is proposed for ADHD detection, multi-class categorization, and decision interpretation. This framework not only detects ADHD, but also provides interpretable insights into the diagnostic process so that psychologists can better understand and trust the results of the diagnosis. We use the Pearson correlation coefficient for optimal feature selection and machine and deep learning models for experimental analysis and comparison. We use a standardized technique for feature reduction, model selection and interpretation to accurately determine the diagnosis rate and ensure the interpretability of the proposed framework. Our framework provided excellent results on binary classification, with HyExDNN-RNN achieving an F1 score of 99% and 94.2% on multi-class categorization. XAI approaches, in particular SHapley Additive exPlanations (SHAP) and Permutation Feature Importance (PFI), provided important insights into the importance of features and the decision logic of models. By combining AI with human expertise, we aim to bridge the gap between advanced computational techniques and practical psychological applications. These results demonstrate the potential of our framework to assist in ADHD diagnosis and interpretation.
- [35] arXiv:2602.02536 [pdf, ps, other]
-
Title: From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal ModerationAuthors: Tianle Gu, Kexin Huang, Lingyu Li, Ruilin Luo, Shiyang Huang, Zongqi Wang, Yujiu Yang, Yan Teng, Yingchun WangSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Safety moderation is pivotal for identifying harmful content. Despite the success of textual safety moderation, its multimodal counterparts remain hindered by a dual sparsity of data and supervision. Conventional reliance on binary labels lead to shortcut learning, which obscures the intrinsic classification boundaries necessary for effective multimodal discrimination. Hence, we propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces. By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process. This approach forces the model to ground its decision in explicit safety semantics, preventing the model from converging on superficial shortcuts. To facilitate this paradigm, we develop a multi-head scalar reward model (UniRM). UniRM provides multi-dimensional supervision by assigning attribute-level scores to the response generation stage. Furthermore, we introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning. Empirical results show UniMod achieves competitive textual moderation performance and sets a new multimodal benchmark using less than 40\% of the training data used by leading baselines. Ablations further validate our multi-attribute trajectory reasoning, offering an effective and efficient framework for multimodal moderation. Supplementary materials are available at \href{https://trustworthylab.github.io/UniMod/}{project website}.
- [36] arXiv:2602.02537 [pdf, ps, other]
-
Title: WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language ModelsAuthors: Runjie Zhou, Youbo Shao, Haoyu Lu, Bowei Xing, Tongtong Bai, Yujie Chen, Jie Zhao, Lin Sui, Haotian Yao, Zijia Zhao, Hao Yang, Haoning Wu, Zaida Zhou, Jinguo Zhu, Zhiqi Huang, Yiping Bao, Yangyang Liu, Y.Charles, Xinyu ZhouSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We introduce WorldVQA, a benchmark designed to evaluate the atomic visual world knowledge of Multimodal Large Language Models (MLLMs). Unlike current evaluations, which often conflate visual knowledge retrieval with reasoning, WorldVQA decouples these capabilities to strictly measure "what the model memorizes." The benchmark assesses the atomic capability of grounding and naming visual entities across a stratified taxonomy, spanning from common head-class objects to long-tail rarities. We expect WorldVQA to serve as a rigorous test for visual factuality, thereby establishing a standard for assessing the encyclopedic breadth and hallucination rates of current and next-generation frontier models.
- [37] arXiv:2602.02538 [pdf, ps, other]
-
Title: Enhancing Post-Training Quantization via Future Activation AwarenessSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. It typically sets quantization hyperparameters (e.g., scaling factors) based on current-layer activations. Although this method is efficient, it suffers from quantization bias and error accumulation, resulting in suboptimal and unstable quantization, especially when the calibration data is biased. To overcome these issues, we propose Future-Aware Quantization (FAQ), which leverages future-layer activations to guide quantization. This allows better identification and preservation of important weights, while reducing sensitivity to calibration noise. We further introduce a window-wise preview mechanism to softly aggregate multiple future-layer activations, mitigating over-reliance on any single layer. To avoid expensive greedy search, we use a pre-searched configuration to minimize overhead. Experiments show that FAQ consistently outperforms prior methods with negligible extra cost, requiring no backward passes, data reconstruction, or tuning, making it well-suited for edge deployment.
- [38] arXiv:2602.02539 [pdf, ps, other]
-
Title: How Much Information Can a Vision Token Hold? A Scaling Law for Recognition Limits in VLMsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Recent vision-centric approaches have made significant strides in long-context modeling. Represented by DeepSeek-OCR, these models encode rendered text into continuous vision tokens, achieving high compression rates without sacrificing recognition precision. However, viewing the vision encoder as a lossy channel with finite representational capacity raises a fundamental question: what is the information upper bound of visual tokens? To investigate this limit, we conduct controlled stress tests by progressively increasing the information quantity (character count) within an image. We observe a distinct phase-transition phenomenon characterized by three regimes: a near-perfect Stable Phase, an Instability Phase marked by increased error variance, and a total Collapse Phase. We analyze the mechanical origins of these transitions and identify key factors. Furthermore, we formulate a probabilistic scaling law that unifies average vision token load and visual density into a latent difficulty metric. Extensive experiments across various Vision-Language Models demonstrate the universality of this scaling law, providing critical empirical guidance for optimizing the efficiency-accuracy trade-off in visual context compression.
- [39] arXiv:2602.02542 [pdf, ps, other]
-
Title: Auto-Augmentation Contrastive Learning for Wearable-based Human Activity RecognitionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
For low-semantic sensor signals from human activity recognition (HAR), contrastive learning (CL) is essential to implement novel applications or generic models without manual annotation, which is a high-performance self-supervised learning (SSL) method. However, CL relies heavily on data augmentation for pairwise comparisons. Especially for low semantic data in the HAR area, conducting good performance augmentation strategies in pretext tasks still rely on manual attempts lacking generalizability and flexibility. To reduce the augmentation burden, we propose an end-to-end auto-augmentation contrastive learning (AutoCL) method for wearable-based HAR. AutoCL is based on a Siamese network architecture that shares the parameters of the backbone and with a generator embedded to learn auto-augmentation. AutoCL trains the generator based on the representation in the latent space to overcome the disturbances caused by noise and redundant information in raw sensor data. The architecture empirical study indicates the effectiveness of this design. Furthermore, we propose a stop-gradient design and correlation reduction strategy in AutoCL to enhance encoder representation learning. Extensive experiments based on four wide-used HAR datasets demonstrate that the proposed AutoCL method significantly improves recognition accuracy compared with other SOTA methods.
- [40] arXiv:2602.02543 [pdf, ps, other]
-
Title: Toward Ultra-Long-Horizon Sequential Model EditingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Model editing has emerged as a practical approach for mitigating factual errors and outdated knowledge in large language models (LLMs). Among existing methods, the Locate-and-Edit (L&E) paradigm is the dominant framework: it locates MLP parameters implicated in expressing a target fact, and then performs a localized update to rewrite that fact. However, long sequences of edits often trigger abrupt model collapse in L&E beyond a critical point. We empirically identify a strong correlation between collapse and explosive growth of edited MLP weight norms, and formally prove that commonly used L&E update rules can induce exponential norm growth across sequential edits in the absence of explicit norm control. To address this issue, we propose Norm-Anchor Scaling NAS, a plug-and-play norm-constrained strategy. Across extensive experiments, NAS delays the collapse point of representative L&E algorithms by more than 4 times and yields a 72.2% average relative gain in editing performance, requiring only a single additional line of code and incurring negligible computational overhead.
- [41] arXiv:2602.02544 [pdf, ps, other]
-
Title: SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language ModelsComments: 18 pages, 6 figures.The code repository is available at this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
While Diffusion Language Models (DLMs) offer a flexible, arbitrary-order alternative to the autoregressive paradigm, their non-causal nature precludes standard KV caching, forcing costly hidden state recomputation at every decoding step. Existing DLM caching approaches reduce this cost by selective hidden state updates; however, they are still limited by (i) costly token-wise update identification heuristics and (ii) rigid, uniform budget allocation that fails to account for heterogeneous hidden state dynamics. To address these challenges, we present SPA-Cache that jointly optimizes update identification and budget allocation in DLM cache. First, we derive a low-dimensional singular proxy that enables the identification of update-critical tokens in a low-dimensional subspace, substantially reducing the overhead of update identification. Second, we introduce an adaptive strategy that allocates fewer updates to stable layers without degrading generation quality. Together, these contributions significantly improve the efficiency of DLMs, yielding up to an $8\times$ throughput improvement over vanilla decoding and a $2$--$4\times$ speedup over existing caching baselines.
- [42] arXiv:2602.02545 [pdf, ps, other]
-
Title: Beyond Alignment: Expanding Reasoning Capacity via Manifold-Reshaping Policy OptimizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated remarkable success in enhancing the reasoning capabilities of Large Language Models (LLMs). However, recent studies question whether RL genuinely expands reasoning capacity or merely aligns existing latent capabilities, arguing that exploration remains confined within the pre-trained model's low-rank bias manifold. In this work, we challenge this accessibility boundary hypothesis by demonstrating that the latent reasoning space can be fundamentally expanded through targeted geometric interventions. We propose Manifold-Reshaping Policy Optimization (MRPO), a geometric framework designed to fundamentally restructure the inference space of LLMs. MRPO operates in two stages: first, we employ Spectral Orthogonal Exploration (SOE) to eject the policy initialization into the null space of the bias manifold; second, we integrate an Effective Rank regularization term into the policy optimization objective. This approach incentivizes the discovery and maintenance of high-dimensional reasoning trajectories against the entropy-reducing tendency of standard RL. Empirically, our 4B-parameter method achieves state-of-the-art performance on mathematical tasks, significantly outperforming larger models (e.g., Qwen3-32B) and expanding the capability boundary beyond standard GRPO. Our code is available at https://anonymous.4open.science/r/MRPO-D57B/
- [43] arXiv:2602.02546 [pdf, ps, other]
-
Title: D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMsAuthors: Xianglong Yan, ChengZhu Bao, Zhiteng Li, Tianao Zhang, Shaoqiu Zhang, Ruobing Xie, Samm Sun, Yulun ZhangSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large language models (LLMs) deliver strong performance, but their high compute and memory costs make deployment difficult in resource-constrained scenarios. Weight-only post-training quantization (PTQ) is appealing, as it reduces memory usage and enables practical speedup without low-bit operators or specialized hardware. However, accuracy often degrades significantly in weight-only PTQ at sub-4-bit precision, and our analysis identifies two main causes: (1) down-projection matrices are a well-known quantization bottleneck, but maintaining their fidelity often requires extra bit-width; (2) weight quantization induces activation deviations, but effective correction strategies remain underexplored. To address these issues, we propose D$^2$Quant, a novel weight-only PTQ framework that improves quantization from both the weight and activation perspectives. On the weight side, we design a Dual-Scale Quantizer (DSQ) tailored to down-projection matrices, with an absorbable scaling factor that significantly improves accuracy without increasing the bit budget. On the activation side, we propose Deviation-Aware Correction (DAC), which incorporates a mean-shift correction within LayerNorm to mitigate quantization-induced activation distribution shifts. Extensive experiments across multiple LLM families and evaluation metrics show that D$^2$Quant delivers superior performance for weight-only PTQ at sub-4-bit precision. The code and models will be available at https://github.com/XIANGLONGYAN/D2Quant.
- [44] arXiv:2602.02547 [pdf, ps, other]
-
Title: naPINN: Noise-Adaptive Physics-Informed Neural Networks for Recovering Physics from Corrupted MeasurementSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Physics-Informed Neural Networks (PINNs) are effective methods for solving inverse problems and discovering governing equations from observational data. However, their performance degrades significantly under complex measurement noise and gross outliers. To address this issue, we propose the Noise-Adaptive Physics-Informed Neural Network (naPINN), which robustly recovers physical solutions from corrupted measurements without prior knowledge of the noise distribution. naPINN embeds an energy-based model into the training loop to learn the latent distribution of prediction residuals. Leveraging the learned energy landscape, a trainable reliability gate adaptively filters data points exhibiting high energy, while a rejection cost regularization prevents trivial solutions where valid data are discarded. We demonstrate the efficacy of naPINN on various benchmark partial differential equations corrupted by non-Gaussian noise and varying rates of outliers. The results show that naPINN significantly outperforms existing robust PINN baselines, successfully isolating outliers and accurately reconstructing the dynamics under severe data corruption.
- [45] arXiv:2602.02548 [pdf, ps, other]
-
Title: ToolTok: Tool Tokenization for Efficient and Generalizable GUI AgentsComments: 8 pages main paper, 18 pages total, 8 figures, 5 tables, code at this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
Existing GUI agent models relying on coordinate-based one-step visual grounding struggle with generalizing to varying input resolutions and aspect ratios. Alternatives introduce coordinate-free strategies yet suffer from learning under severe data scarcity. To address the limitations, we propose ToolTok, a novel paradigm of multi-step pathfinding for GUI agents, where operations are modeled as a sequence of progressive tool usage. Specifically, we devise tools aligned with human interaction habits and represent each tool using learnable token embeddings. To enable efficient embedding learning under limited supervision, ToolTok introduces a semantic anchoring mechanism that grounds each tool with semantically related concepts as natural inductive bias. To further enable a pre-trained large language model to progressively acquire tool semantics, we construct an easy-to-hard curriculum consisting of three tasks: token definition question-answering, pure text-guided tool selection, and simplified visual pathfinding. Extensive experiments on multiple benchmarks show that ToolTok achieves superior performance among models of comparable scale (4B) and remains competitive with a substantially larger model (235B). Notably, these results are obtained using less than 1% of the training data required by other post-training approaches. In addition, ToolTok demonstrates strong generalization across unseen scenarios. Our training & inference code is open-source at https://github.com/ZephinueCode/ToolTok.
- [46] arXiv:2602.02549 [pdf, ps, other]
-
Title: Error Analysis of Matrix Multiplication Emulation Using Ozaki-II SchemeComments: 18 pages, 4 figuresSubjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC)
The Ozaki-II scheme is an emulation method that leverages the Chinese Remainder Theorem to compute high-precision matrix multiplication via a sequence of low-precision matrix multiplications. In this scheme, the attainable numerical accuracy improves as the number of low-precision matrix multiplications increases. Previous numerical studies have shown that single- and double-precision matrix multiplication using the Ozaki-II scheme achieves higher throughput than that of standard BLAS routines on modern AI hardware equipped with fast INT8 matrix multiply-accumulate units with INT8 inputs and INT32 accumulation. However, the accuracy of the Ozaki-II scheme can degrade when the exponent distribution of the input matrices is wide, in which case a large number of low-precision matrix multiplications is required to obtain high-precision results. In this paper, we present a rigorous deterministic error analysis of the Ozaki-II scheme. The proposed analysis not only clarifies the accuracy behavior of the method but also enables the estimation of the number of low-precision matrix multiplications required to achieve a desired level of numerical accuracy.
- [47] arXiv:2602.02550 [pdf, ps, other]
-
Title: HyPAC: Cost-Efficient LLMs-Human Hybrid Annotation with PAC Error GuaranteesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Data annotation often involves multiple sources with different cost-quality trade-offs, such as fast large language models (LLMs), slow reasoning models, and human experts. In this work, we study the problem of routing inputs to the most cost-efficient annotation source while controlling the labeling error on test instances. We propose \textbf{HyPAC}, a method that adaptively labels inputs to the most cost-efficient annotation source while providing distribution-free guarantees on annotation error. HyPAC calibrates two decision thresholds using importance sampling and upper confidence bounds, partitioning inputs into three regions based on uncertainty and routing each to the appropriate annotation source. We prove that HyPAC achieves the minimum expected cost with a probably approximately correct (PAC) guarantee on the annotation error, free of data distribution and pre-trained models. Experiments on common benchmarks demonstrate the effectiveness of our method, reducing the annotation cost by 78.51\% while tightly controlling the annotation error.
- [48] arXiv:2602.02551 [pdf, ps, other]
-
Title: EEO-TFV: Escape-Explore Optimizer for Web-Scale Time-Series Forecasting and Vision AnalysisComments: Main paper: 12 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Transformer-based foundation models have achieved remarkable progress in tasks such as time-series forecasting and image segmentation. However, they frequently suffer from error accumulation in multivariate long-sequence prediction and exhibit vulnerability to out-of-distribution samples in image-related tasks. Furthermore, these challenges become particularly pronounced in large-scale Web data analysis tasks, which typically involve complex temporal patterns and multimodal features. This complexity substantially increases optimization difficulty, rendering models prone to stagnation at saddle points within high-dimensional parameter spaces. To address these issues, we propose a lightweight Transformer architecture in conjunction with a novel Escape-Explore Optimizer (EEO). The optimizer enhances both exploration and generalization while effectively avoiding sharp minima and saddle-point traps. Experimental results show that, in representative Web data scenarios, our method achieves performance on par with state-of-the-art models across 11 time-series benchmark datasets and the Synapse medical image segmentation task. Moreover, it demonstrates superior generalization and stability, thereby validating its potential as a versatile cross-task foundation model for Web-scale data mining and analysis.
- [49] arXiv:2602.02554 [pdf, ps, other]
-
Title: BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-TranslationAuthors: Jingwen Xu, Yiyang Lu, Zisu Huang, Changze Lv, Xiaohua Wang, Shizheng Li, Zhibo Xu, Zhengkang Guo, Zhengyuan Wang, Muzhao Tian, Xuanjing Huang, Xiaoqing ZhengSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Training LLMs for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning framework designed to jointly optimize code generation and documentation production. BatCoder employs a back-translation strategy: a documentation is first generated from code, and then the generated documentation is used to reconstruct the original code. The semantic similarity between the original and reconstructed code serves as an implicit reward, enabling reinforcement learning to improve the model's performance both in generating code from documentation and vice versa. This approach allows models to be trained using only code, substantially increasing the available training examples. Evaluated on HumanEval and MBPP with a 7B model, BatCoder achieved 83.5% and 81.0% pass@1, outperforming strong open-source baselines. Moreover, the framework demonstrates consistent scaling with respect to both training corpus size and model capacity.
- [50] arXiv:2602.02555 [pdf, ps, other]
-
Title: Learning to Explore with Parameter-Space Noise: A Deep Dive into Parameter-Space Noise for Reinforcement Learning with Verifiable RewardsComments: 17 pages, 10 FiguresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement Learning with Verifiable Rewards (RLVR) improves LLM reasoning, yet growing evidence indicates an exploration ceiling: it often reweights existing solution traces rather than discovering new strategies, limiting gains under large sampling budgets (e.g., pass-at-256). We address this limitation with PSN-RLVR, which perturbs policy parameters before rollout generation to induce temporally consistent, trajectory-level exploration that better preserves long-horizon chain-of-thought coherence than action-space noise. To mitigate the resulting sampling-update mismatch, we incorporate truncated importance sampling (TIS). To avoid expensive KL-based adaptive noise control, we propose a computationally efficient real-time adaptive noise scheduler driven by a lightweight surrogate that combines semantic diversity with normalized self-certainty. Instantiated on GRPO, a widely used RLVR method, PSN-GRPO consistently expands the effective reasoning capability boundary across multiple mathematical reasoning benchmarks and model families, yielding higher pass-at-k under large sampling budgets and outperforming prior exploration-oriented RLVR methods (e.g., Pass-at-k-style training) while remaining orthogonal and thus composable for additional gains.
- [51] arXiv:2602.02556 [pdf, ps, other]
-
Title: Beyond Experience Retrieval: Learning to Generate Utility-Optimized Structured Experience for Frozen LLMsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large language models (LLMs) are largely static and often redo reasoning or repeat mistakes. Prior experience reuse typically relies on external retrieval, which is similarity-based, can introduce noise, and adds latency. We introduce SEAM (Structured Experience Adapter Module), a lightweight, executor-specific plug-in that stores experience in its parameters and generates a structured, instance-tailored experience entry in a single forward pass to guide a frozen LLM executor. SEAM is trained for utility via executor rollouts and GRPO while keeping the executor frozen, and it can be further improved after deployment with supervised fine-tuning on logged successful trajectories. Experiments on mathematical reasoning benchmarks show consistent accuracy gains across executors with low overhead. Extensive ablations and analyses further elucidate the mechanisms underlying SEAM's effectiveness and robustness.
- [52] arXiv:2602.02557 [pdf, ps, other]
-
Title: The Alignment Curse: Cross-Modality Jailbreak Transfer in Omni-ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
Recent advances in end-to-end trained omni-models have significantly improved multimodal understanding. At the same time, safety red-teaming has expanded beyond text to encompass audio-based jailbreak attacks. However, an important bridge between textual and audio jailbreaks remains underexplored. In this work, we study the cross-modality transfer of jailbreak attacks from text to audio, motivated by the semantic similarity between the two modalities and the maturity of textual jailbreak methods. We first analyze the connection between modality alignment and cross-modality jailbreak transfer, showing that strong alignment can inadvertently propagate textual vulnerabilities to the audio modality, which we term the alignment curse. Guided by this analysis, we conduct an empirical evaluation of textual jailbreaks, text-transferred audio jailbreaks, and existing audio-based jailbreaks on recent omni-models. Our results show that text-transferred audio jailbreaks perform comparably to, and often better than, audio-based jailbreaks, establishing them as simple yet powerful baselines for future audio red-teaming. We further demonstrate strong cross-model transferability and show that text-transferred audio attacks remain effective even under a stricter audio-only access threat model.
- [53] arXiv:2602.02558 [pdf, ps, other]
-
Title: PA-MIL: Phenotype-Aware Multiple Instance Learning Guided by Language Prompting and Genotype-to-Phenotype RelationshipsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Deep learning has been extensively researched in the analysis of pathology whole-slide images (WSIs). However, most existing methods are limited to providing prediction interpretability by locating the model's salient areas in a post-hoc manner, failing to offer more reliable and accountable explanations. In this work, we propose Phenotype-Aware Multiple Instance Learning (PA-MIL), a novel ante-hoc interpretable framework that identifies cancer-related phenotypes from WSIs and utilizes them for cancer subtyping. To facilitate PA-MIL in learning phenotype-aware features, we 1) construct a phenotype knowledge base containing cancer-related phenotypes and their associated genotypes. 2) utilize the morphological descriptions of phenotypes as language prompting to aggregate phenotype-related features. 3) devise the Genotype-to-Phenotype Neural Network (GP-NN) grounded in genotype-to-phenotype relationships, which provides multi-level guidance for PA-MIL. Experimental results on multiple datasets demonstrate that PA-MIL achieves competitive performance compared to existing MIL methods while offering improved interpretability. PA-MIL leverages phenotype saliency as evidence and, using a linear classifier, achieves competitive results compared to state-of-the-art methods. Additionally, we thoroughly analyze the genotype-phenotype relationships, as well as cohort-level and case-level interpretability, demonstrating the reliability and accountability of PA-MIL.
- [54] arXiv:2602.02559 [pdf, ps, other]
-
Title: Experience-Driven Multi-Agent Systems Are Training-free Context-aware Earth ObserversComments: 21 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Recent advances have enabled large language model (LLM) agents to solve complex tasks by orchestrating external tools. However, these agents often struggle in specialized, tool-intensive domains that demand long-horizon execution, tight coordination across modalities, and strict adherence to implicit tool constraints. Earth Observation (EO) tasks exemplify this challenge due to the multi-modal and multi-temporal data inputs, as well as the requirements of geo-knowledge constraints (spectrum library, spatial reasoning, etc): many high-level plans can be derailed by subtle execution errors that propagate through a pipeline and invalidate final results. A core difficulty is that existing agents lack a mechanism to learn fine-grained, tool-level expertise from interaction. Without such expertise, they cannot reliably configure tool parameters or recover from mid-execution failures, limiting their effectiveness in complex EO workflows. To address this, we introduce \textbf{GeoEvolver}, a self-evolving multi-agent system~(MAS) that enables LLM agents to acquire EO expertise through structured interaction without any parameter updates. GeoEvolver decomposes each query into independent sub-goals via a retrieval-augmented multi-agent orchestrator, then explores diverse tool-parameter configurations at the sub-goal level. Successful patterns and root-cause attribution from failures are then distilled in an evolving memory bank that provides in-context demonstrations for future queries. Experiments on three tool-integrated EO benchmarks show that GeoEvolver consistently improves end-to-end task success, with an average gain of 12\% across multiple LLM backbones, demonstrating that EO expertise can emerge progressively from efficient, fine-grained interactions with the environment.
- [55] arXiv:2602.02560 [pdf, ps, other]
-
Title: Auditing Sybil: Explaining Deep Lung Cancer Risk Prediction Through Generative Interventional AttributionsAuthors: Bartlomiej Sobieski, Jakub Grzywaczewski, Karol Dobiczek, Mateusz Wójcik, Tomasz Bartczak, Patryk Szatkowski, Przemysław Bombiński, Matthew Tivnan, Przemyslaw BiecekComments: PreprintSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Lung cancer remains the leading cause of cancer mortality, driving the development of automated screening tools to alleviate radiologist workload. Standing at the frontier of this effort is Sybil, a deep learning model capable of predicting future risk solely from computed tomography (CT) with high precision. However, despite extensive clinical validation, current assessments rely purely on observational metrics. This correlation-based approach overlooks the model's actual reasoning mechanism, necessitating a shift to causal verification to ensure robust decision-making before clinical deployment. We propose S(H)NAP, a model-agnostic auditing framework that constructs generative interventional attributions validated by expert radiologists. By leveraging realistic 3D diffusion bridge modeling to systematically modify anatomical features, our approach isolates object-specific causal contributions to the risk score. Providing the first interventional audit of Sybil, we demonstrate that while the model often exhibits behavior akin to an expert radiologist, differentiating malignant pulmonary nodules from benign ones, it suffers from critical failure modes, including dangerous sensitivity to clinically unjustified artifacts and a distinct radial bias.
- [56] arXiv:2602.02561 [pdf, ps, other]
-
Title: MathlibLemma: Folklore Lemma Generation and Benchmark for Formal MathematicsAuthors: Xinyu Liu, Zixuan Xie, Amir Moeini, Claire Chen, Shuze Daniel Liu, Yu Meng, Aidong Zhang, Shangtong ZhangSubjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
While the ecosystem of Lean and Mathlib has enjoyed celebrated success in formal mathematical reasoning with the help of large language models (LLMs), the absence of many folklore lemmas in Mathlib remains a persistent barrier that limits Lean's usability as an everyday tool for mathematicians like LaTeX or Maple. To address this, we introduce MathlibLemma, the first LLM-based multi-agent system to automate the discovery and formalization of mathematical folklore lemmas. This framework constitutes our primary contribution, proactively mining the missing connective tissue of mathematics. Its efficacy is demonstrated by the production of a verified library of folklore lemmas, a subset of which has already been formally merged into the latest build of Mathlib, thereby validating the system's real-world utility and alignment with expert standards. Leveraging this pipeline, we further construct the MathlibLemma benchmark, a suite of 4,028 type-checked Lean statements spanning a broad range of mathematical domains. By transforming the role of LLMs from passive consumers to active contributors, this work establishes a constructive methodology for the self-evolution of formal mathematical libraries.
- [57] arXiv:2602.02563 [pdf, ps, other]
-
Title: A General ReLearner: Empowering Spatiotemporal Prediction by Re-learning Input-label ResidualSubjects: Machine Learning (cs.LG)
Prevailing spatiotemporal prediction models typically operate under a forward (unidirectional) learning paradigm, in which models extract spatiotemporal features from historical observation input and map them to target spatiotemporal space for future forecasting (label). However, these models frequently exhibit suboptimal performance when spatiotemporal discrepancies exist between inputs and labels, for instance, when nodes with similar time-series inputs manifest distinct future labels, or vice versa. To address this limitation, we propose explicitly incorporating label features during the training phase. Specifically, we introduce the Spatiotemporal Residual Theorem, which generalizes the conventional unidirectional spatiotemporal prediction paradigm into a bidirectional learning framework. Building upon this theoretical foundation, we design an universal module, termed ReLearner, which seamlessly augments Spatiotemporal Neural Networks (STNNs) with a bidirectional learning capability via an auxiliary inverse learning process. In this process, the model relearns the spatiotemporal feature residuals between input data and future data. The proposed ReLearner comprises two critical components: (1) a Residual Learning Module, designed to effectively disentangle spatiotemporal feature discrepancies between input and label representations; and (2) a Residual Smoothing Module, employed to smooth residual terms and facilitate stable convergence. Extensive experiments conducted on 11 real-world datasets across 14 backbone models demonstrate that ReLearner significantly enhances the predictive performance of existing STNNs.Our code is available on GitHub.
- [58] arXiv:2602.02564 [pdf, ps, other]
-
Title: Label Curation Using Agentic AISubjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Data annotation is essential for supervised learning, yet producing accurate, unbiased, and scalable labels remains challenging as datasets grow in size and modality. Traditional human-centric pipelines are costly, slow, and prone to annotator variability, motivating reliability-aware automated annotation. We present AURA (Agentic AI for Unified Reliability Modeling and Annotation Aggregation), an agentic AI framework for large-scale, multi-modal data annotation. AURA coordinates multiple AI agents to generate and validate labels without requiring ground truth. At its core, AURA adapts a classical probabilistic model that jointly infers latent true labels and annotator reliability via confusion matrices, using Expectation-Maximization to reconcile conflicting annotations and aggregate noisy predictions. Across the four benchmark datasets evaluated, AURA achieves accuracy improvements of up to 5.8% over baseline. In more challenging settings with poor quality annotators, the improvement is up to 50% over baseline. AURA also accurately estimates the reliability of annotators, allowing assessment of annotator quality even without any pre-validation steps.
- [59] arXiv:2602.02565 [pdf, ps, other]
-
Title: High Rank Matrix Completion via Grassmannian Proxy FusionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper approaches high-rank matrix completion (HRMC) by filling missing entries in a data matrix where columns lie near a union of subspaces, clustering these columns, and identifying the underlying subspaces. Current methods often lack theoretical support, produce uninterpretable results, and require more samples than theoretically necessary. We propose clustering incomplete vectors by grouping proxy subspaces and minimizing two criteria over the Grassmannian: (a) the chordal distance between each point and its corresponding subspace and (b) the geodesic distances between subspaces of all data points. Experiments on synthetic and real datasets demonstrate that our method performs comparably to leading methods in high sampling rates and significantly better in low sampling rates, thus narrowing the gap to the theoretical sampling limit of HRMC.
- [60] arXiv:2602.02566 [pdf, ps, other]
-
Title: A Comparative Simulation Study of the Fairness and Accuracy of Predictive Policing Systems in Baltimore CityComments: 36 pages, 27 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
There are ongoing discussions about predictive policing systems, such as those deployed in Los Angeles, California and Baltimore, Maryland, being unfair, for example, by exhibiting racial bias. Studies found that unfairness may be due to feedback loops and being trained on historically biased recorded data. However, comparative studies on predictive policing systems are few and are not sufficiently comprehensive. In this work, we perform a comprehensive comparative simulation study on the fairness and accuracy of predictive policing technologies in Baltimore. Our results suggest that the situation around bias in predictive policing is more complex than was previously assumed. While predictive policing exhibited bias due to feedback loops as was previously reported, we found that the traditional alternative, hot spots policing, had similar issues. Predictive policing was found to be more fair and accurate than hot spots policing in the short term, although it amplified bias faster, suggesting the potential for worse long-run behavior. In Baltimore, in some cases the bias in these systems tended toward over-policing in White neighborhoods, unlike in previous studies. Overall, this work demonstrates a methodology for city-specific evaluation and behavioral-tendency comparison of predictive policing systems, showing how such simulations can reveal inequities and long-term tendencies.
- [61] arXiv:2602.02567 [pdf, ps, other]
-
Title: IceBench-S2S: A Benchmark of Deep Learning for Challenging Subseasonal-to-Seasonal Daily Arctic Sea Ice Forecasting in Deep Latent SpaceComments: 9 pages, 6 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arctic sea ice plays a critical role in regulating Earth's climate system, significantly influencing polar ecological stability and human activities in coastal regions. Recent advances in artificial intelligence have facilitated the development of skillful pan-Arctic sea ice forecasting systems, where data-driven approaches showcase tremendous potential to outperform conventional physics-based numerical models in terms of accuracy, computational efficiency and forecasting lead times. Despite the latest progress made by deep learning (DL) forecasting models, most of their skillful forecasting lead times are confined to daily subseasonal scale and monthly averaged values for up to six months, which drastically hinders their deployment for real-world applications, e.g., maritime routine planning for Arctic transportation and scientific investigation. Extending daily forecasts from subseasonal to seasonal (S2S) scale is scientifically crucial for operational applications. To bridge the gap between the forecasting lead time of current DL models and the significant daily S2S scale, we introduce IceBench-S2S, the first comprehensive benchmark for evaluating DL approaches in mitigating the challenge of forecasting Arctic sea ice concentration in successive 180-day periods. It proposes a generalized framework that first compresses spatial features of daily sea ice data into a deep latent space. The temporally concatenated deep features are subsequently modeled by DL-based forecasting backbones to predict the sea ice variation at S2S scale. IceBench-S2S provides a unified training and evaluation pipeline for different backbones, along with practical guidance for model selection in polar environmental monitoring tasks.
- [62] arXiv:2602.02568 [pdf, ps, other]
-
Title: Mitigating Task-Order Sensitivity and Forgetting via Hierarchical Second-Order ConsolidationComments: 21 pages, 8 figuresSubjects: Machine Learning (cs.LG)
We introduce $\textbf{Hierarchical Taylor Series-based Continual Learning (HTCL)}$, a framework that couples fast local adaptation with conservative, second-order global consolidation to address the high variance introduced by random task ordering. To address task-order effects, HTCL identifies the best intra-group task sequence and integrates the resulting local updates through a Hessian-regularized Taylor expansion, yielding a consolidation step with theoretical guarantees. The approach naturally extends to an $L$-level hierarchy, enabling multiscale knowledge integration in a manner not supported by conventional single-level CL systems. Across a wide range of datasets and replay and regularization baselines, HTCL acts as a model-agnostic consolidation layer that consistently enhances performance, yielding mean accuracy gains of $7\%$ to $25\%$ while reducing the standard deviation of final accuracy by up to $68\%$ across random task permutations.
- [63] arXiv:2602.02569 [pdf, ps, other]
-
Title: DECEIVE-AFC: Adversarial Claim Attacks against Search-Enabled LLM-based Fact-Checking SystemsSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Fact-checking systems with search-enabled large language models (LLMs) have shown strong potential for verifying claims by dynamically retrieving external evidence. However, the robustness of such systems against adversarial attack remains insufficiently understood. In this work, we study adversarial claim attacks against search-enabled LLM-based fact-checking systems under a realistic input-only threat model. We propose DECEIVE-AFC, an agent-based adversarial attack framework that integrates novel claim-level attack strategies and adversarial claim validity evaluation principles. DECEIVE-AFC systematically explores adversarial attack trajectories that disrupt search behavior, evidence retrieval, and LLM-based reasoning without relying on access to evidence sources or model internals. Extensive evaluations on benchmark datasets and real-world systems demonstrate that our attacks substantially degrade verification performance, reducing accuracy from 78.7% to 53.7%, and significantly outperform existing claim-based attack baselines with strong cross-system transferability.
- [64] arXiv:2602.02570 [pdf, ps, other]
-
Title: An Improved Quasi-Physical Dynamic Algorithm for Efficient Circular Coverage in Arbitrary ConvexSubjects: Computational Geometry (cs.CG)
The optimal circle coverage problem aims to find a configuration of circles that maximizes the covered area within a given region. Although theoretical optimal solutions exist for simple cases, the problem's NP-hard characteristic makes the problem computationally intractable for complex polygons with numerous circles. Prevailing methods are largely confined to regular domains, while the few algorithms designed for irregular polygons suffer from poor initialization, unmanaged boundary effects, and excessive overlap among circles, resulting in low coverage efficiency. Consequently, we propose an Improved Quasi-Physical Dynamic(IQPD) algorithm for arbitrary convex polygons. Our core contributions are threefold: (1) proposing a structure-preserving initialization strategy that maps a hexagonal close-packing of circles into the target polygon via scaling and affine transformation; (2) constructing a virtual force field incorporating friction and a radius-expansion optimization iteration model; (3) designing a boundary-surrounding strategy based on normal and tangential gradients to retrieve overflowing circles. Experimental results demonstrate that our algorithm significantly outperforms four state-of-the-art methods on seven metrics across a variety of convex polygons. This work could provide a more efficient solution for operational optimization or resource allocation in practical applications.
- [65] arXiv:2602.02571 [pdf, ps, other]
-
Title: Trajectory Consistency for One-Step Generation on Euler Mean FlowsComments: 40 pages, 27 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We propose \emph{Euler Mean Flows (EMF)}, a flow-based generative framework for one-step and few-step generation that enforces long-range trajectory consistency with minimal sampling cost. The key idea of EMF is to replace the trajectory consistency constraint, which is difficult to supervise and optimize over long time scales, with a principled linear surrogate that enables direct data supervision for long-horizon flow-map compositions. We derive this approximation from the semigroup formulation of flow-based models and show that, under mild regularity assumptions, it faithfully approximates the original consistency objective while being substantially easier to optimize. This formulation leads to a unified, JVP-free training framework that supports both $u$-prediction and $x_1$-prediction variants, avoiding explicit Jacobian computations and significantly reducing memory and computational overhead. Experiments on image synthesis, particle-based geometry generation, and functional generation demonstrate improved optimization stability and sample quality under fixed sampling budgets, together with approximately $50\%$ reductions in training time and memory consumption compared to existing one-step methods for image generation.
- [66] arXiv:2602.02572 [pdf, ps, other]
-
Title: Reward Shaping for Inference-Time Alignment: A Stackelberg Game PerspectiveSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Existing alignment methods directly use the reward model learned from user preference data to optimize an LLM policy, subject to KL regularization with respect to the base policy. This practice is suboptimal for maximizing user's utility because the KL regularization may cause the LLM to inherit the bias in the base policy that conflicts with user preferences. While amplifying rewards for preferred outputs can mitigate this bias, it also increases the risk of reward hacking. This tradeoff motivates the problem of optimally designing reward models under KL regularization. We formalize this reward model optimization problem as a Stackelberg game, and show that a simple reward shaping scheme can effectively approximate the optimal reward model. We empirically evaluate our method in inference-time alignment settings and demonstrate that it integrates seamlessly into existing alignment methods with minimal overhead. Our method consistently improves average reward and achieves win-tie rates exceeding 66% against all baselines, averaged across evaluation settings.
- [67] arXiv:2602.02573 [pdf, ps, other]
-
Title: Product Interaction: An Algebraic Formalism for Deep Learning ArchitecturesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In this paper, we introduce product interactions, an algebraic formalism in which neural network layers are constructed from compositions of a multiplication operator defined over suitable algebras. Product interactions provide a principled way to generate and organize algebraic expressions by increasing interaction order. Our central observation is that algebraic expressions in modern neural networks admit a unified construction in terms of linear, quadratic, and higher-order product interactions. Convolutional and equivariant networks arise as symmetry-constrained linear product interactions, while attention and Mamba correspond to higher-order product interactions.
- [68] arXiv:2602.02574 [pdf, ps, other]
-
Title: WritePolicyBench: Benchmarking Memory Write Policies under Byte BudgetsAuthors: Edgard El ChamComments: 10 pages, 4 figuresSubjects: Performance (cs.PF)
We introduce WritePolicyBench, a benchmark for evaluating memory write policies: decision rules that choose what to store, merge, and evict under a strict byte budget while processing a stream with document/API drift. The benchmark provides (i) task generators with controlled non-stationarity, (ii) an explicit action interface for external memory, (iii) a byte-accurate cost model, and (iv) standardized metrics that measure both task success and budget efficiency.
- [69] arXiv:2602.02579 [pdf, ps, other]
-
Title: ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented GenerationAuthors: Shihao Wang, Jiahao Chen, Yanqi Pan, Hao Huang, Yichen Hao, Xiangyu Zou, Wen Xia, Wentao Zhang, Haitao Wang, Junhong Li, Chongyang Qiu, Pengfei WangSubjects: Operating Systems (cs.OS); Artificial Intelligence (cs.AI)
The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble pre-calculated KV caches of retrieved RAG documents (by a user query) and reprocess selected tokens to recover cross-attention between these pre-calculated KV caches. However, we identify a fundamental "crowding-out effect" in current token selection criteria: globally salient but user-query-irrelevant tokens saturate the limited recomputation budget, displacing the tokens truly essential for answering the user query and degrading inference accuracy.
We propose ProphetKV, a user-query-driven KV Cache reuse method for RAG scenarios. ProphetKV dynamically prioritizes tokens based on their semantic relevance to the user query and employs a dual-stage recomputation pipeline to fuse layer-wise attention metrics into a high-utility set. By ensuring the recomputation budget is dedicated to bridging the informational gap between retrieved context and the user query, ProphetKV achieves high-fidelity attention recovery with minimal overhead. Our extensive evaluation results show that ProphetKV retains 96%-101% of full-prefill accuracy with only a 20% recomputation ratio, while achieving accuracy improvements of 8.8%-24.9% on RULER and 18.6%-50.9% on LongBench over the state-of-the-art approaches (e.g., CacheBlend, EPIC, and KVShare). - [70] arXiv:2602.02581 [pdf, ps, other]
-
Title: QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning SignalsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Weight-only quantization is important for compressing Large Language Models (LLMs). Inspired by the spirit of classical magnitude pruning, we study whether the magnitude of weight updates during reasoning-incentivized fine-tuning can provide valuable signals for quantizing Large Reasoning Models (LRMs). We hypothesize that the smallest and largest weight updates during fine-tuning are more important than those of intermediate magnitude, a phenomenon we term "protecting both ends". Upon hypothesis validation, we introduce QuantLRM, which stands for weight quantization of LRMs via fine-tuning signals. We fit simple restricted quadratic functions on weight updates to protect both ends. By multiplying the average quadratic values with the count of zero weight updates of channels, we compute channel importance that is more effective than using activation or second-order information. We run QuantLRM to quantize various fine-tuned models (including supervised, direct preference optimization, and reinforcement learning fine-tuning) over four reasoning benchmarks (AIME-120, FOLIO, temporal sequences, and GPQA-Diamond) and empirically find that QuantLRM delivers a consistent improvement for LRMs quantization, with an average improvement of 6.55% on a reinforcement learning fine-tuned model. Also supporting non-fine-tuned LRMs, QuantLRM gathers effective signals via pseudo-fine-tuning, which greatly enhances its applicability.
- [71] arXiv:2602.02582 [pdf, ps, other]
-
Title: Uncertainty and Fairness Awareness in LLM-Based Recommendation SystemsComments: Accepted at the Second Conference of the International Association for Safe and Ethical Artificial Intelligence, IASEAI26, 14 pagesSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR); Machine Learning (cs.LG); Software Engineering (cs.SE)
Large language models (LLMs) enable powerful zero-shot recommendations by leveraging broad contextual knowledge, yet predictive uncertainty and embedded biases threaten reliability and fairness. This paper studies how uncertainty and fairness evaluations affect the accuracy, consistency, and trustworthiness of LLM-generated recommendations. We introduce a benchmark of curated metrics and a dataset annotated for eight demographic attributes (31 categorical values) across two domains: movies and music. Through in-depth case studies, we quantify predictive uncertainty (via entropy) and demonstrate that Google DeepMind's Gemini 1.5 Flash exhibits systematic unfairness for certain sensitive attributes; measured similarity-based gaps are SNSR at 0.1363 and SNSV at 0.0507. These disparities persist under prompt perturbations such as typographical errors and multilingual inputs. We further integrate personality-aware fairness into the RecLLM evaluation pipeline to reveal personality-linked bias patterns and expose trade-offs between personalization and group fairness. We propose a novel uncertainty-aware evaluation methodology for RecLLMs, present empirical insights from deep uncertainty case studies, and introduce a personality profile-informed fairness benchmark that advances explainability and equity in LLM recommendations. Together, these contributions establish a foundation for safer, more interpretable RecLLMs and motivate future work on multi-model benchmarks and adaptive calibration for trustworthy deployment.
- [72] arXiv:2602.02583 [pdf, ps, other]
-
Title: Copula-Based Aggregation and Context-Aware Conformal Prediction for Reliable Renewable Energy ForecastingSubjects: Machine Learning (cs.LG); Applications (stat.AP)
The rapid growth of renewable energy penetration has intensified the need for reliable probabilistic forecasts to support grid operations at aggregated (fleet or system) levels. In practice, however, system operators often lack access to fleet-level probabilistic models and instead rely on site-level forecasts produced by heterogeneous third-party providers. Constructing coherent and calibrated fleet-level probabilistic forecasts from such inputs remains challenging due to complex cross-site dependencies and aggregation-induced miscalibration. This paper proposes a calibrated probabilistic aggregation framework that directly converts site-level probabilistic forecasts into reliable fleet-level forecasts in settings where system-level models cannot be trained or maintained. The framework integrates copula-based dependence modeling to capture cross-site correlations with Context-Aware Conformal Prediction (CACP) to correct miscalibration at the aggregated level. This combination enables dependence-aware aggregation while providing valid coverage and maintaining sharp prediction intervals. Experiments on large-scale solar generation datasets from MISO, ERCOT, and SPP demonstrate that the proposed Copula+CACP approach consistently achieves near-nominal coverage with significantly sharper intervals than uncalibrated aggregation baselines.
- [73] arXiv:2602.02584 [pdf, ps, other]
-
Title: Constitutional Spec-Driven Development: Enforcing Security by Construction in AI-Assisted Code GenerationAuthors: Srinivas Rao MarriComments: 15 pages, 2 figures, 5 tables, 11 code listings, 14 references. Includes reference implementation and compliance traceability matrixSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
The proliferation of AI-assisted "vibe coding" enables rapid software development but introduces significant security risks, as Large Language Models (LLMs) prioritize functional correctness over security. We present Constitutional Spec-Driven Development, a methodology that embeds non-negotiable security principles into the specification layer, ensuring AI-generated code adheres to security requirements by construction rather than inspection. Our approach introduces a Constitution: a versioned, machine-readable document encoding security constraints derived from Common Weakness Enumeration (CWE)/MITRE Top 25 vulnerabilities and regulatory frameworks. We demonstrate the methodology through a banking microservices application, selected as a representative example domain due to its stringent regulatory and security requirements, implementing customer management, account operations, and transaction processing. The methodology itself is domain-agnostic. The implementation addresses 10 critical CWE vulnerabilities through constitutional constraints with full traceability from principles to code locations. Our case study shows that constitutional constraints reduce security defects by 73% compared to unconstrained AI generation while maintaining developer velocity. We contribute a formal framework for constitutional security, a complete development methodology, and empirical evidence that proactive security specification outperforms reactive security verification in AI-assisted development workflows.
- [74] arXiv:2602.02585 [pdf, ps, other]
-
Title: Agentic Observability: Automated Alert Triage for Adobe E-CommerceComments: Accepted at AAAI'26 Agentic AI Benchmarks and Applications for Enterprise Tasks WorkshopSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Modern enterprise systems exhibit complex interdependencies that make observability and incident response increasingly challenging. Manual alert triage, which typically involves log inspection, API verification, and cross-referencing operational knowledge bases, remains a major bottleneck in reducing mean recovery time (MTTR). This paper presents an agentic observability framework deployed within Adobe's e-commerce infrastructure that autonomously performs alert triage using a ReAct paradigm. Upon alert detection, the agent dynamically identifies the affected service, retrieves and analyzes correlated logs across distributed systems, and plans context-dependent actions such as handbook consultation, runbook execution, or retrieval-augmented analysis of recently deployed code. Empirical results from production deployment indicate a 90% reduction in mean time to insight compared to manual triage, while maintaining comparable diagnostic accuracy. Our results show that agentic AI enables an order-of-magnitude reduction in triage latency and a step-change in resolution accuracy, marking a pivotal shift toward autonomous observability in enterprise operations.
- [75] arXiv:2602.02589 [pdf, ps, other]
-
Title: PeerRank: Autonomous LLM Evaluation Through Web-Grounded, Bias-Controlled Peer ReviewSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Evaluating large language models typically relies on human-authored benchmarks, reference answers, and human or single-model judgments, approaches that scale poorly, become quickly outdated, and mismatch open-world deployments that depend on web retrieval and synthesis. We introduce PeerRank, a fully autonomous end-to-end evaluation framework in which models generate evaluation tasks, answer them with category-scoped live web grounding, judge peer responses and aggregate dense peer assessments into relative performance estimates, without human supervision or gold references. PeerRank treats evaluation as a multi-agent process where each model participates symmetrically as task designer, respondent, and evaluator, while removing biased judgments. In a large-scale study over 12 commercially available models and 420 autonomously generated questions, PeerRank produces stable, discriminative rankings and reveals measurable identity and presentation biases. Rankings are robust, and mean peer scores agree with Elo. We further validate PeerRank on TruthfulQA and GSM8K, where peer scores correlate with objective accuracy. Together, these results suggest that bias-aware peer evaluation with selective web-grounded answering can scale open-world LLM assessment beyond static and human curated benchmarks.
- [76] arXiv:2602.02590 [pdf, ps, other]
-
Title: StepNav: Structured Trajectory Priors for Efficient and Multimodal Visual NavigationComments: 8 pages, 7 figures; Accepted by ICRA 2026Subjects: Robotics (cs.RO)
Visual navigation is fundamental to autonomous systems, yet generating reliable trajectories in cluttered and uncertain environments remains a core challenge. Recent generative models promise end-to-end synthesis, but their reliance on unstructured noise priors often yields unsafe, inefficient, or unimodal plans that cannot meet real-time requirements. We propose StepNav, a novel framework that bridges this gap by introducing structured, multimodal trajectory priors derived from variational principles. StepNav first learns a geometry-aware success probability field to identify all feasible navigation corridors. These corridors are then used to construct an explicit, multi-modal mixture prior that initializes a conditional flow-matching process. This refinement is formulated as an optimal control problem with explicit smoothness and safety regularization. By replacing unstructured noise with physically-grounded candidates, StepNav generates safer and more efficient plans in significantly fewer steps. Experiments in both simulation and real-world benchmarks demonstrate consistent improvements in robustness, efficiency, and safety over state-of-the-art generative planners, advancing reliable trajectory generation for practical autonomous navigation. The code has been released at https://github.com/LuoXubo/StepNav.
- [77] arXiv:2602.02591 [pdf, ps, other]
-
Title: VividVoice: A Unified Framework for Scene-Aware Visually-Driven Speech SynthesisComments: Accepted by ICASSP 2026Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
We introduce and define a novel task-Scene-Aware Visually-Driven Speech Synthesis, aimed at addressing the limitations of existing speech generation models in creating immersive auditory experiences that align with the real physical world. To tackle the two core challenges of data scarcity and modality decoupling, we propose VividVoice, a unified generative framework. First, we constructed a large-scale, high-quality hybrid multimodal dataset, Vivid-210K, which, through an innovative programmatic pipeline, establishes a strong correlation between visual scenes, speaker identity, and audio for the first time. Second, we designed a core alignment module, D-MSVA, which leverages a decoupled memory bank architecture and a cross-modal hybrid supervision strategy to achieve fine-grained alignment from visual scenes to timbre and environmental acoustic features. Both subjective and objective experimental results provide strong evidence that VividVoice significantly outperforms existing baseline models in terms of audio fidelity, content clarity, and multimodal consistency. Our demo is available at https://chengyuann.github.io/VividVoice/.
- [78] arXiv:2602.02592 [pdf, ps, other]
-
Title: Learnable Koopman-Enhanced Transformer-Based Time Series Forecasting with Spectral ControlSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
This paper proposes a unified family of learnable Koopman operator parameterizations that integrate linear dynamical systems theory with modern deep learning forecasting architectures. We introduce four learnable Koopman variants-scalar-gated, per-mode gated, MLP-shaped spectral mapping, and low-rank Koopman operators which generalize and interpolate between strictly stable Koopman operators and unconstrained linear latent dynamics. Our formulation enables explicit control over the spectrum, stability, and rank of the linear transition operator while retaining compatibility with expressive nonlinear backbones such as Patchtst, Autoformer, and Informer. We evaluate the proposed operators in a large-scale benchmark that also includes LSTM, DLinear, and simple diagonal State-Space Models (SSMs), as well as lightweight transformer variants. Experiments across multiple horizons and patch lengths show that learnable Koopman models provide a favorable bias-variance trade-off, improved conditioning, and more interpretable latent dynamics. We provide a full spectral analysis, including eigenvalue trajectories, stability envelopes, and learned spectral distributions. Our results demonstrate that learnable Koopman operators are effective, stable, and theoretically principled components for deep forecasting.
- [79] arXiv:2602.02593 [pdf, ps, other]
-
Title: Effective Frontiers: A Unification of Neural Scaling LawsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Neural scaling laws govern the prediction power-law improvement of test loss with respect to model capacity ($N$), datasize ($D$), and compute ($C$). However, existing theoretical explanations often rely on specific architectures or complex kernel methods, lacking intuitive universality. In this paper, we propose a unified framework that abstracts general learning tasks as the progressive coverage of patterns from a long-tail (Zipfian) distribution. We introduce the Effective Frontier ($k_\star$), a threshold in the pattern rank space that separates learned knowledge from the unlearned tail. We prove that reducible loss is asymptotically determined by the probability mass of the tail a resource-dependent frontier truncation. Based on our framework, we derive the precise scaling laws for $N$, $D$, and $C$, attributing them to capacity, coverage, and optimization bottlenecks, respectively. Furthermore, we unify these mechanisms via a Max-Bottleneck principle, demonstrating that the Kaplan and Chinchilla scaling laws are not contradictory, but equilibrium solutions to the same constrained optimization problem under different active bottlenecks.
- [80] arXiv:2602.02595 [pdf, ps, other]
-
Title: To Defend Against Cyber Attacks, We Must Teach AI Agents to HackSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
For over a decade, cybersecurity has relied on human labor scarcity to limit attackers to high-value targets manually or generic automated attacks at scale. Building sophisticated exploits requires deep expertise and manual effort, leading defenders to assume adversaries cannot afford tailored attacks at scale. AI agents break this balance by automating vulnerability discovery and exploitation across thousands of targets, needing only small success rates to remain profitable. Current developers focus on preventing misuse through data filtering, safety alignment, and output guardrails. Such protections fail against adversaries who control open-weight models, bypass safety controls, or develop offensive capabilities independently. We argue that AI-agent-driven cyber attacks are inevitable, requiring a fundamental shift in defensive strategy. In this position paper, we identify why existing defenses cannot stop adaptive adversaries and demonstrate that defenders must develop offensive security intelligence. We propose three actions for building frontier offensive AI capabilities responsibly. First, construct comprehensive benchmarks covering the full attack lifecycle. Second, advance from workflow-based to trained agents for discovering in-wild vulnerabilities at scale. Third, implement governance restricting offensive agents to audited cyber ranges, staging release by capability tier, and distilling findings into safe defensive-only agents. We strongly recommend treating offensive AI capabilities as essential defensive infrastructure, as containing cybersecurity risks requires mastering them in controlled settings before adversaries do.
- [81] arXiv:2602.02596 [pdf, ps, other]
-
Title: Fubini Study geometry of representation drift in high dimensional dataAuthors: Arturo TozziComments: 8 pages, 1 figureSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
High dimensional representation drift is commonly quantified using Euclidean or cosine distances, which presuppose fixed coordinates when comparing representations across time, training or preprocessing stages. While effective in many settings, these measures entangle intrinsic changes in the data with variations induced by arbitrary parametrizations. We introduce a projective geometric view of representation drift grounded in the Fubini Study metric, which identifies representations that differ only by gauge transformations such as global rescalings or sign flips. Applying this framework to empirical high dimensional datasets, we explicitly construct representation trajectories and track their evolution through cumulative geometric drift. Comparing Euclidean, cosine and Fubini Study distances along these trajectories reveals that conventional metrics systematically overestimate change whenever representations carry genuine projective ambiguity. By contrast, the Fubini Study metric isolates intrinsic evolution by remaining invariant under gauge-induced fluctuations. We further show that the difference between cosine and Fubini Study drift defines a computable, monotone quantity that directly captures representation churn attributable to gauge freedom. This separation provides a diagnostic for distinguishing meaningful structural evolution from parametrization artifacts, without introducing model-specific assumptions. Overall, we establish a geometric criterion for assessing representation stability in high-dimensional systems and clarify the limits of angular distances. Embedding representation dynamics in projective space connects data analysis with established geometric programs and yields observables that are directly testable in empirical workflows.
- [82] arXiv:2602.02597 [pdf, ps, other]
-
Title: ContextEvolve: Multi-Agent Context Compression for Systems Code OptimizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large language models are transforming systems research by automating the discovery of performance-critical algorithms for computer systems. Despite plausible codes generated by LLMs, producing solutions that meet the stringent correctness and performance requirements of systems demands iterative optimization. Test-time reinforcement learning offers high search efficiency but requires parameter updates infeasible under API-only access, while existing training-free evolutionary methods suffer from inefficient context utilization and undirected search. We introduce ContextEvolve, a multi-agent framework that achieves RL-level search efficiency under strict parameter-blind constraints by decomposing optimization context into three orthogonal dimensions: a Summarizer Agent condenses semantic state via code-to-language abstraction, a Navigator Agent distills optimization direction from trajectory analysis, and a Sampler Agent curates experience distribution through prioritized exemplar retrieval. This orchestration forms a functional isomorphism with RL-mapping to state representation, policy gradient, and experience replay-enabling principled optimization in a textual latent space. On the ADRS benchmark, ContextEvolve outperforms state-of-the-art baselines by 33.3% while reducing token consumption by 29.0%. Codes for our work are released at https://anonymous.4open.science/r/ContextEvolve-ACC
- [83] arXiv:2602.02599 [pdf, ps, other]
-
Title: RAP: KV-Cache Compression via RoPE-Aligned PruningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Long-context inference in large language models is increasingly bottlenecked by the memory and compute cost of the KV-Cache. Low-rank factorization compresses KV projections by writing $W \approx A * B$, where A produces latent KV states and B can be absorbed into downstream weights. In modern RoPE-based LLMs, this absorption fails: RoPE forces latent KV states to be reconstructed to full dimension, reintroducing substantial memory and compute overhead. We propose RoPE-Aligned Pruning (RAP), which prunes entire RoPE-aligned column pairs to preserve RoPE's 2x2 rotation structure, restore B absorption, and eliminate reconstruction. Our evaluation on LLaMA-3-8B and Mistral-7B shows that RAP enables joint reduction of KV-Cache, attention parameters, and FLOPs by 20-30%, all at once, while maintaining strong accuracy. Notably, RAP reduces attention latency to 83% (prefill) and 77% (decode) of baseline.
- [84] arXiv:2602.02600 [pdf, ps, other]
-
Title: Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) models, offering parallel decoding and controllable sampling dynamics while achieving competitive generation quality at scale. Despite this progress, the role of sampling mechanisms in shaping refusal behavior and jailbreak robustness remains poorly understood. In this work, we present a fundamental analytical framework for step-wise refusal dynamics, enabling comparison between AR and diffusion sampling. Our analysis reveals that the sampling strategy itself plays a central role in safety behavior, as a factor distinct from the underlying learned representations. Motivated by this analysis, we introduce the Step-Wise Refusal Internal Dynamics (SRI) signal, which supports interpretability and improved safety for both AR and DLMs. We demonstrate that the geometric structure of SRI captures internal recovery dynamics, and identifies anomalous behavior in harmful generations as cases of \emph{incomplete internal recovery} that are not observable at the text level. This structure enables lightweight inference-time detectors that generalize to unseen attacks while matching or outperforming existing defenses with over $100\times$ lower inference overhead.
- [85] arXiv:2602.02601 [pdf, ps, other]
-
Title: CaST: Causal Discovery via Spatio-Temporal Graphs in Disaster TweetsSubjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Understanding causality between real-world events from social media is essential for situational awareness, yet existing causal discovery methods often overlook the interplay between semantic, spatial, and temporal contexts. We propose CaST: Causal Discovery via Spatio-Temporal Graphs, a unified framework for causal discovery in disaster domain that integrates semantic similarity and spatio-temporal proximity using Large Language Models (LLMs) pretrained on disaster datasets. CaST constructs an event graph for each window of tweets. Each event extracted from tweets is represented as a node embedding enriched with its contextual semantics, geographic coordinates, and temporal features. These event nodes are then connected to form a spatio-temporal event graph, which is processed using a multi-head Graph Attention Network (GAT) \cite{gat} to learn directed causal relationships. We construct an in-house dataset of approximately 167K disaster-related tweets collected during Hurricane Harvey and annotated following the MAVEN-ERE schema. Experimental results show that CaST achieves superior performance over both traditional and state-of-the-art methods. Ablation studies further confirm that incorporating spatial and temporal signals substantially improves both recall and stability during training. Overall, CaST demonstrates that integrating spatio-temporal reasoning into event graphs enables more robust and interpretable causal discovery in disaster-related social media text.
- [86] arXiv:2602.02602 [pdf, ps, other]
-
Title: Position: 3D Gaussian Splatting Watermarking Should Be Scenario-Driven and Threat-Model ExplicitSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
3D content acquisition and creation are expanding rapidly in the new era of machine learning and AI. 3D Gaussian Splatting (3DGS) has become a promising high-fidelity and real-time representation for 3D content. Similar to the initial wave of digital audio-visual content at the turn of the millennium, the demand for intellectual property protection is also increasing, since explicit and editable 3D parameterization makes unauthorized use and dissemination easier. In this position paper, we argue that effective progress in watermarking 3D assets requires articulated security objectives and realistic threat models, incorporating the lessons learned from digital audio-visual asset protection over the past decades. To address this gap in security specification and evaluation, we advocate a scenario-driven formulation, in which adversarial capabilities are formalized through a security model. Based on this formulation, we construct a reference framework that organizes existing methods and clarifies how specific design choices map to corresponding adversarial assumptions. Within this framework, we also examine a legacy spread-spectrum embedding scheme, characterizing its advantages and limitations and highlighting the important trade-offs it entails. Overall, this work aims to foster effective intellectual property protection for 3D assets.
- [87] arXiv:2602.02605 [pdf, ps, other]
-
Title: Fine-Tuning Language Models to Know What They KnowComments: PreprintSubjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)
Metacognition is a critical component of intelligence, specifically regarding the awareness of one's own knowledge. While humans rely on shared internal memory for both answering questions and reporting their knowledge state, this dependency in LLMs remains underexplored. This study proposes a framework to measure metacognitive ability $d_{\rm{type2}}'$ using a dual-prompt method, followed by the introduction of Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behaviors. ESMA demonstrates robust generalization across diverse untrained settings, indicating a enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications.
- [88] arXiv:2602.02606 [pdf, ps, other]
-
Title: Gender Dynamics and Homophily in a Social Network of LLM AgentsComments: Under ReviewSubjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Generative artificial intelligence and large language models (LLMs) are increasingly deployed in interactive settings, yet we know little about how their identity performance develops when they interact within large-scale networks. We address this by examining Chirper.ai, a social media platform similar to X but composed entirely of autonomous AI chatbots. Our dataset comprises over 70,000 agents, approximately 140 million posts, and the evolving followership network over one year. Based on agents' text production, we assign weekly gender scores to each agent. Results suggest that each agent's gender performance is fluid rather than fixed. Despite this fluidity, the network displays strong gender-based homophily, as agents consistently follow others performing gender similarly. Finally, we investigate whether these homophilic connections arise from social selection, in which agents choose to follow similar accounts, or from social influence, in which agents become more similar to their followees over time. Consistent with human social networks, we find evidence that both mechanisms shape the structure and evolution of interactions among LLMs. Our findings suggest that, even in the absence of bodies, cultural entraining of gender performance leads to gender-based sorting. This has important implications for LLM applications in synthetic hybrid populations, social simulations, and decision support.
- [89] arXiv:2602.02610 [pdf, ps, other]
-
Title: ClinConNet: A Blockchain-based Dynamic Consent Management Platform for Clinical ResearchComments: 19 pages, 8 figures, 6 tables, 5 code repositories on Github includedSubjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Consent is an ethical cornerstone of clinical research and healthcare in general. Although the ethical principles of consent - providing information, ensuring comprehension, and ensuring voluntariness - are well-defined, the technological infrastructure remains outdated. Clinicians are responsible for obtaining informed consent from research subjects or patients, and for managing it before, during, and after clinical trials or care, which is a burden for them. The voluntary nature of participating in clinical research or undergoing medical treatment implies the need for a participant-centric consent management system. However, this is not reflected in most established systems. Not only do most healthcare information systems not follow a user-centric model, but they also create data silos, which significantly reduce the mobility of patient data between different healthcare institutions and impact personalized medicine. Furthermore, consent management tools are outdated. We propose ClinConNet (Clinical Consent Network), a platform that connects researchers and participants based on clinical research projects. ClinConNet is powered by a dynamic consent model based on blockchain and take advantage of dynamic consent interfaces, as well as blockchain and Self-Sovereign Identity systems. ClinConNet is user-centric and provides important privacy features for patients, such as unlinkability, confidentiality, and ownership of identity data. It is also compatible with the right to be forgotten, as defined in many personal data protection regulations, such as the GDPR. We provide a detailed privacy and security analysis in an adversarial model, as well as a Proof of Concept implementation with detailed performance measures that demonstrate the feasibility of our blockchain-based consent management system with a median end-to-end consent establishment time of under 200ms and a throughput of 250TPS.
- [90] arXiv:2602.02611 [pdf, ps, other]
-
Title: Discovering Data Manifold Geometry via Non-Contracting FlowsAuthors: David Vigouroux (ANITI, IMT Atlantique), Lucas Drumetz, Ronan Fablet (IMT Atlantique - MEE, Lab-STICC\_OSE, ODYSSEY), François Rousseau (IMT Atlantique - ITI, LaTIM)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We introduce an unsupervised approach for constructing a global reference system by learning, in the ambient space, vector fields that span the tangent spaces of an unknown data manifold. In contrast to isometric objectives, which implicitly assume manifold flatness, our method learns tangent vector fields whose flows transport all samples to a common, learnable reference point. The resulting arc-lengths along these flows define interpretable intrinsic coordinates tied to a shared global frame. To prevent degenerate collapse, we enforce a non-shrinking constraint and derive a scalable, integration-free objective inspired by flow matching. Within our theoretical framework, we prove that minimizing the proposed objective recovers a global coordinate chart when one exists. Empirically, we obtain correct tangent alignment and coherent global coordinate structure on synthetic manifolds. We also demonstrate the scalability of our method on CIFAR-10, where the learned coordinates achieve competitive downstream classification performance.
- [91] arXiv:2602.02613 [pdf, ps, other]
-
Title: Exploring Silicon-Based Societies: An Early Study of the Moltbook Agent CommunityAuthors: Yu-Zheng Lin, Bono Po-Jen Shih, Hsuan-Ying Alessandra Chien, Shalaka Satam, Jesus Horacio Pacheco, Sicong Shao, Soheil Salehi, Pratik SatamComments: 10 pages, 3 figures, a pilot study for silicon-based societiesSubjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
The rapid emergence of autonomous large language model agents has given rise to persistent, large-scale agent ecosystems whose collective behavior cannot be adequately understood through anecdotal observation or small-scale simulation. This paper introduces data-driven silicon sociology as a systematic empirical framework for studying social structure formation among interacting artificial agents. We present a pioneering large-scale data mining investigation of an in-the-wild agent society by analyzing Moltbook, a social platform designed primarily for agent-to-agent interaction. At the time of study, Moltbook hosted over 150,000 registered autonomous agents operating across thousands of agent-created sub-communities. Using programmatic and non-intrusive data acquisition, we collected and analyzed the textual descriptions of 12,758 submolts, which represent proactive sub-community partitioning activities within the ecosystem. Treating agent-authored descriptions as first-class observational artifacts, we apply rigorous preprocessing, contextual embedding, and unsupervised clustering techniques to uncover latent patterns of thematic organization and social space structuring. The results show that autonomous agents systematically organize collective space through reproducible patterns spanning human-mimetic interests, silicon-centric self-reflection, and early-stage economic and coordination behaviors. Rather than relying on predefined sociological taxonomies, these structures emerge directly from machine-generated data traces. This work establishes a methodological foundation for data-driven silicon sociology and demonstrates that data mining techniques can provide a powerful lens for understanding the organization and evolution of large autonomous agent societies.
- [92] arXiv:2602.02614 [pdf, ps, other]
-
Title: Testing Storage-System Correctness: Challenges, Fuzzing Limitations, and AI-Augmented OpportunitiesSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Storage systems are fundamental to modern computing infrastructures, yet ensuring their correctness remains challenging in practice. Despite decades of research on system testing, many storage-system failures (including durability, ordering, recovery, and consistency violations) remain difficult to expose systematically. This difficulty stems not primarily from insufficient testing tooling, but from intrinsic properties of storage-system execution, including nondeterministic interleavings, long-horizon state evolution, and correctness semantics that span multiple layers and execution phases.
This survey adopts a storage-centric view of system testing and organizes existing techniques according to the execution properties and failure mechanisms they target. We review a broad spectrum of approaches, ranging from concurrency testing and long-running workloads to crash-consistency analysis, hardware-level semantic validation, and distributed fault injection, and analyze their fundamental strengths and limitations. Within this framework, we examine fuzzing as an automated testing paradigm, highlighting systematic mismatches between conventional fuzzing assumptions and storage-system semantics, and discuss how recent artificial intelligence advances may complement fuzzing through state-aware and semantic guidance. Overall, this survey provides a unified perspective on storage-system correctness testing and outlines key challenges - [93] arXiv:2602.02615 [pdf, ps, other]
-
Title: TinyGuard:A lightweight Byzantine Defense for Resource-Constrained Federated Learning via Statistical Update FingerprintsSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Existing Byzantine robust aggregation mechanisms typically rely on fulldimensional gradi ent comparisons or pairwise distance computations, resulting in computational overhead that limits applicability in large scale and resource constrained federated systems. This paper proposes TinyGuard, a lightweight Byzantine defense that augments the standard FedAvg algorithm via statistical update f ingerprinting. Instead of operating directly on high-dimensional gradients, TinyGuard extracts compact statistical fingerprints cap turing key behavioral properties of client updates, including norm statistics, layer-wise ratios, sparsity measures, and low-order mo ments. Byzantine clients are identified by measuring robust sta tistical deviations in this low-dimensional fingerprint space with nd complexity, without modifying the underlying optimization procedure. Extensive experiments on MNIST, Fashion-MNIST, ViT-Lite, and ViT-Small with LoRA adapters demonstrate that TinyGuard pre serves FedAvg convergence in benign settings and achieves up to 95 percent accuracy under multiple Byzantine attack scenarios, including sign-flipping, scaling, noise injection, and label poisoning. Against adaptive white-box adversaries, Pareto frontier analysis across four orders of magnitude confirms that attackers cannot simultaneously evade detection and achieve effective poisoning, features we term statistical handcuffs. Ablation studies validate stable detection precision 0.8 across varying client counts (50-150), threshold parameters and extreme data heterogeneity . The proposed framework is architecture-agnostic and well-suited for federated fine-tuning of foundation models where traditional Byzantine defenses become impractical
- [94] arXiv:2602.02616 [pdf, other]
-
Title: A space-time LATIN-PGD strategy for solving Newtonian compressible flowsAuthors: Élise Foulatier (LMPS), Pierre-Alain Boucard (LMPS), François Louf (LMPS), David Néron (LMPS), Philipp JunkerSubjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn); Medical Physics (physics.med-ph)
Simulating flow problems is at the core of many engineering applications but often requires high computational effort, especially when dealing with complex models. This work presents a novel approach for resolving flow problems using the LATIN-PGD solver. In this contribution, we place ourselves within the framework of Newtonian compressible and laminar flows. This specific and relatively simple case enables focusing on flows for which a state equation provides a direct relation between pressure and density. It is then possible to use the LATIN solver to set up a pressure-velocity decoupling algorithm. Moreover, Proper Generalised Decomposition (PGD) is natively included in the solver and yields two independent space-time decompositions for the velocity and the pressure fields. As a first step, the solver is validated on a problem for which an analytical solution is available. It is then applied to slightly more complex problems. The results show good agreement with the literature, and we expect that the solver could be used to compute more complicated material laws in the future.
- [95] arXiv:2602.02618 [pdf, ps, other]
-
Title: A Semi-Supervised Pipeline for Generalized Behavior Discovery from Animal-Borne Motion Time SeriesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Learning behavioral taxonomies from animal-borne sensors is challenging because labels are scarce, classes are highly imbalanced, and behaviors may be absent from the annotated set. We study generalized behavior discovery in short multivariate motion snippets from gulls, where each sample is a sequence with 3-axis IMU acceleration (20 Hz) and GPS speed, spanning nine expert-annotated behavior categories. We propose a semi-supervised discovery pipeline that (i) learns an embedding function from the labeled subset, (ii) performs label-guided clustering over embeddings of both labeled and unlabeled samples to form candidate behavior groups, and (iii) decides whether a discovered group is truly novel using a containment score. Our key contribution is a KDE + HDR (highest-density region) containment score that measures how much a discovered cluster distribution is contained within, or contains, each known-class distribution; the best-match containment score serves as an interpretable novelty statistic. In experiments where an entire behavior is withheld from supervision and appears only in the unlabeled pool, the method recovers a distinct cluster and the containment score flags novelty via low overlap, while a negative-control setting with no novel behavior yields consistently higher overlaps. These results suggest that HDR-based containment provides a practical, quantitative test for generalized class discovery in ecological motion time series under limited annotation and severe class imbalance.
- [96] arXiv:2602.02619 [pdf, ps, other]
-
Title: daVinci-Agency: Unlocking Long-Horizon Agency Data-EfficientlyAuthors: Mohan Jiang, Dayuan Fu, Junhao Shi, Ji Zeng, Weiye Si, Keyu Li, Xuefeng Li, Yang Xiao, Wenjie Li, Dequan Wang, Pengfei LiuSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
While Large Language Models (LLMs) excel at short-term tasks, scaling them to long-horizon agentic workflows remains challenging. The core bottleneck lies in the scarcity of training data that captures authentic long-dependency structures and cross-stage evolutionary dynamics--existing synthesis methods either confine to single-feature scenarios constrained by model distribution, or incur prohibitive human annotation costs, failing to provide scalable, high-quality supervision. We address this by reconceptualizing data synthesis through the lens of real-world software evolution. Our key insight: Pull Request (PR) sequences naturally embody the supervision signals for long-horizon learning. They decompose complex objectives into verifiable submission units, maintain functional coherence across iterations, and encode authentic refinement patterns through bug-fix histories. Building on this, we propose daVinci-Agency, which systematically mines structured supervision from chain-of-PRs through three interlocking mechanisms: (1) progressive task decomposition via continuous commits, (2) long-term consistency enforcement through unified functional objectives, and (3) verifiable refinement from authentic bug-fix trajectories. Unlike synthetic trajectories that treat each step independently, daVinci-Agency's PR-grounded structure inherently preserves the causal dependencies and iterative refinements essential for teaching persistent goal-directed behavior and enables natural alignment with project-level, full-cycle task modeling. The resulting trajectories are substantial--averaging 85k tokens and 116 tool calls--yet remarkably data-efficient: fine-tuning GLM-4.6 on 239 daVinci-Agency samples yields broad improvements across benchmarks, notably achieving a 47% relative gain on Toolathlon. Beyond benchmark performance, our analysis confirms...
- [97] arXiv:2602.02623 [pdf, ps, other]
-
Title: Learning Consistent Causal Abstraction NetworksComments: To be published in the proceedings of ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). arXiv admin note: substantial text overlap with arXiv:2509.25236Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Causal artificial intelligence aims to enhance explainability, trustworthiness, and robustness in AI by leveraging structural causal models (SCMs). In this pursuit, recent advances formalize network sheaves and cosheaves of causal knowledge. Pushing in the same direction, we tackle the learning of consistent causal abstraction network (CAN), a sheaf-theoretic framework where (i) SCMs are Gaussian, (ii) restriction maps are transposes of constructive linear causal abstractions (CAs) adhering to the semantic embedding principle, and (iii) edge stalks correspond--up to permutation--to the node stalks of more detailed SCMs. Our problem formulation separates into edge-specific local Riemannian problems and avoids nonconvex objectives. We propose an efficient search procedure, solving the local problems with SPECTRAL, our iterative method with closed-form updates and suitable for positive definite and semidefinite covariance matrices. Experiments on synthetic data show competitive performance in the CA learning task, and successful recovery of diverse CAN structures.
- [98] arXiv:2602.02624 [pdf, ps, other]
-
Title: Recommender system in X inadvertently profiles ideological positions of usersSubjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Studies on recommendations in social media have mainly analyzed the quality of recommended items (e.g., their diversity or biases) and the impact of recommendation policies (e.g., in comparison with purely chronological policies). We use a data donation program, collecting more than 2.5 million friend recommendations made to 682 volunteers on X over a year, to study instead how real-world recommenders learn, represent and process political and social attributes of users inside the so-called black boxes of AI systems. Using publicly available knowledge on the architecture of the recommender, we inferred the positions of recommended users in its embedding space. Leveraging ideology scaling calibrated with political survey data, we analyzed the political position of users in our study (N=26,509 among volunteers and recommended contacts) among several attributes, including age and gender. Our results show that the platform's recommender system produces a spatial ordering of users that is highly correlated with their Left-Right positions (Pearson rho=0.887, p-value < 0.0001), and that cannot be explained by socio-demographic attributes. These results open new possibilities for studying the interaction between human and AI systems. They also raise important questions linked to the legal definition of algorithmic profiling in data privacy regulation by blurring the line between active and passive profiling. We explore new constrained recommendation methods enabled by our results, limiting the political information in the recommender as a potential tool for privacy compliance capable of preserving recommendation relevance.
- [99] arXiv:2602.02625 [pdf, ps, other]
-
Title: OpenClaw Agents on Moltbook: Risky Instruction Sharing and Norm Enforcement in an Agent-Only Social NetworkSubjects: Social and Information Networks (cs.SI)
Agentic AI systems increasingly operate in shared social environments where they exchange information, instructions, and behavioral cues. However, little empirical evidence exists on how such agents regulate one another in the absence of human participants or centralized moderation. In this work, we present an empirical analysis of OpenClaw agents interacting on Moltbook, an agent-only social network. Analyzing 39,026 posts and 5,712 comments produced by 14,490 agents, we quantify the prevalence of action-inducing instruction sharing using a lexicon-based Action-Inducing Risk Score (AIRS), and examine how other agents respond to such content. We find that 18.4% of posts contain action-inducing language, indicating that instruction sharing is a routine behavior in this environment. While most social responses are neutral, posts containing actionable instructions are significantly more likely to elicit norm-enforcing replies that caution against unsafe or risky behavior, compared to non-instructional posts. Importantly, toxic responses remain rare across both conditions. These results suggest that OpenClaw agents exhibit selective social regulation, whereby potentially risky instructions are more likely to be challenged than neutral content, despite the absence of human oversight. Our findings provide early empirical evidence of emergent normative behavior in agent-only social systems and highlight the importance of studying social dynamics alongside technical safeguards in agentic AI ecosystems.
- [100] arXiv:2602.02626 [pdf, ps, other]
-
Title: Learning Better Certified Models from Empirically-Robust TeachersAuthors: Alessandro De PalmaSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Adversarial training attains strong empirical robustness to specific adversarial attacks by training on concrete adversarial perturbations, but it produces neural networks that are not amenable to strong robustness certificates through neural network verification. On the other hand, earlier certified training schemes directly train on bounds from network relaxations to obtain models that are certifiably robust, but display sub-par standard performance. Recent work has shown that state-of-the-art trade-offs between certified robustness and standard performance can be obtained through a family of losses combining adversarial outputs and neural network bounds. Nevertheless, differently from empirical robustness, verifiability still comes at a significant cost in standard performance. In this work, we propose to leverage empirically-robust teachers to improve the performance of certifiably-robust models through knowledge distillation. Using a versatile feature-space distillation objective, we show that distillation from adversarially-trained teachers consistently improves on the state-of-the-art in certified training for ReLU networks across a series of robust computer vision benchmarks.
- [101] arXiv:2602.02628 [pdf, ps, other]
-
Title: A two-player version of the assignment problemSubjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
We introduce the competitive assignment problem, a two-player version of the well-known assignment problem. Given a set of tasks and a set of agents with different efficiencies for different tasks, Alice and Bob take turns picking agents one by one. Once all agents have been picked, Alice and Bob compute the optimal values $s_A$ and $s_B$ for the assignment problem on their respective sets of agents, i.e. they assign their own agents to tasks (with at most one agent per task and at most one task per agent) so as to maximize the sum of the efficiencies. The score of the game is then defined as $s_A-s_B$. Alice aims at maximizing the score, while Bob aims at minimizing it. This problem can model drafts in sports and card games, or more generally situations where two entities fight for the same resources and then use them to compete against each other. We show that the problem is PSPACE-complete, even restricted to agents that have at most two nonzero efficiencies. On the other hand, in the case of agents having at most one nonzero efficiency, the problem lies in XP parameterized by the number of tasks, and the optimal score can be computed in linear time when there are only two tasks.
- [102] arXiv:2602.02629 [pdf, ps, other]
-
Title: Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable CredentialsSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training Artificial Intelligence (AI) models. However, stringent privacy regulations such as GDPR and HIPAA have created data silos that prevent centralized training. Federated Learning (FL) has emerged as a promising solution that enables collaborative model training without sharing raw patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent approaches integrate Blockchain technology for auditability, they predominantly rely on probabilistic reputation systems rather than robust cryptographic identity verification. This paper proposes a Trustworthy Blockchain-based Federated Learning (TBFL) framework integrating Self-Sovereign Identity (SSI) standards. By leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), our architecture ensures only authenticated healthcare entities contribute to the global model. Through comprehensive evaluation using the MIMIC-IV dataset, we demonstrate that anchoring trust in cryptographic identity verification rather than behavioral patterns significantly mitigates security risks while maintaining clinical utility. Our results show the framework successfully neutralizes 100% of Sybil attacks, achieves robust predictive performance (AUC = 0.954, Recall = 0.890), and introduces negligible computational overhead (<0.12%). The approach provides a secure, scalable, and economically viable ecosystem for inter-institutional health data collaboration, with total operational costs of approximately $18 for 100 training rounds across multiple institutions.
- [103] arXiv:2602.02630 [pdf, ps, other]
-
Title: Trailer Reimagined: An Innovative, Llm-DRiven, Expressive Automated Movie Summary framework (TRAILDREAMS)Journal-ref: OJCMT, 15(3), e202524 (2025)Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
This paper introduces TRAILDREAMS, a framework that uses a large language model (LLM) to automate the production of movie trailers. The purpose of LLM is to select key visual sequences and impactful dialogues, and to help TRAILDREAMS to generate audio elements such as music and voiceovers. The goal is to produce engaging and visually appealing trailers efficiently. In comparative evaluations, TRAILDREAMS surpasses current state-of-the-art trailer generation methods in viewer ratings. However, it still falls short when compared to real, human-crafted trailers. While TRAILDREAMS demonstrates significant promise and marks an advancement in automated creative processes, further improvements are necessary to bridge the quality gap with traditional trailers.
- [104] arXiv:2602.02632 [pdf, ps, other]
-
Title: Performance of Small Language Model Pretraining on FABRIC: An Empirical StudyAuthors: Praveen RaoSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large language models (LLMs) require enormous computing power to pretrain on massive datasets. When limited datasets are available, smaller-sized LLMs are better choice to pretrain (on user-specified datasets) by following the scaling laws of LLMs. Using pretrained models, vector embeddings can be generated for raw data and stored using vector databases to support modern AI applications and semantic search. In this work, we investigate the performance of pretraining techniques for smaller-sized LLMs on an experimental testbed (with commodity GPUs) available to academic users at no charge. We consider data parallelism, intra-operator parallelism, and inter-operator/pipeline parallelism, and their combinations for pretraining. We set up different GPU clusters with homogeneous and heterogeneous GPU hardware. Furthermore, we investigate the impact of network latency on pretraining performance especially when GPUs are geographically distributed. We used GPT-2 medium and large models and pretrained them using open-source packages, namely, Alpa and Ray. We observed that Alpa's execution plans that collectively optimized intra-operator and inter-operator/pipeline parallelism consistently performed the best when GPUs were geographically distributed. This was especially true when the network latencies were in 10's of milliseconds. Based on the insights gained from the experiments, we propose a systematic approach for selecting the appropriate pretraining technique to achieve high training performance/lower execution time as well as to reduce the number of GPUs used.
- [105] arXiv:2602.02634 [pdf, ps, other]
-
Title: A Reduction from Delayed to Immediate Feedback for Online Convex Optimization with Improved GuaranteesSubjects: Machine Learning (cs.LG)
We develop a reduction-based framework for online learning with delayed feedback that recovers and improves upon existing results for both first-order and bandit convex optimization. Our approach introduces a continuous-time model under which regret decomposes into a delay-independent learning term and a delay-induced drift term, yielding a delay-adaptive reduction that converts any algorithm for online linear optimization into one that handles round-dependent delays. For bandit convex optimization, we significantly improve existing regret bounds, with delay-dependent terms matching state-of-the-art first-order rates. For first-order feedback, we recover state-of-the-art regret bounds via a simpler, unified analysis. Quantitatively, for bandit convex optimization we obtain $O(\sqrt{d_{\text{tot}}} + T^{\frac{3}{4}}\sqrt{k})$ regret, improving the delay-dependent term from $O(\min\{\sqrt{T d_{\text{max}}},(Td_{\text{tot}})^{\frac{1}{3}}\})$ in previous work to $O(\sqrt{d_{\text{tot}}})$. Here, $k$, $T$, $d_{\text{max}}$, and $d_{\text{tot}}$ denote the dimension, time horizon, maximum delay, and total delay, respectively. Under strong convexity, we achieve $O(\min\{\sigma_{\text{max}} \ln T, \sqrt{d_{\text{tot}}}\} + (T^2\ln T)^{\frac{1}{3}} {k}^{\frac{2}{3}})$, improving the delay-dependent term from $O(d_{\text{max}} \ln T)$ in previous work to $O(\min\{\sigma_{\text{max}} \ln T, \sqrt{d_{\text{tot}}}\})$, where $\sigma_{\text{max}}$ denotes the maximum number of outstanding observations and may be considerably smaller than $d_{\text{max}}$.
- [106] arXiv:2602.02635 [pdf, ps, other]
-
Title: Graph-Augmented Reasoning with Large Language Models for Tobacco Pest and Disease ManagementSubjects: Computation and Language (cs.CL)
This paper proposes a graph-augmented reasoning framework for tobacco pest and disease management that integrates structured domain knowledge into large language models. Building on GraphRAG, we construct a domain-specific knowledge graph and retrieve query-relevant subgraphs to provide relational evidence during answer generation. The framework adopts ChatGLM as the Transformer backbone with LoRA-based parameter-efficient fine-tuning, and employs a graph neural network to learn node representations that capture symptom-disease-treatment dependencies. By explicitly modeling diseases, symptoms, pesticides, and control measures as linked entities, the system supports evidence-aware retrieval beyond surface-level text similarity. Retrieved graph evidence is incorporated into the LLM input to guide generation toward domain-consistent recommendations and to mitigate hallucinated or inappropriate treatments. Experimental results show consistent improvements over text-only baselines, with the largest gains observed on multi-hop and comparative reasoning questions that require chaining multiple relations.
- [107] arXiv:2602.02636 [pdf, ps, other]
-
Title: WideSeek: Advancing Wide Research via Multi-Agent ScalingAuthors: Ziyang Huang, Haolin Ren, Xiaowei Yuan, Jiawei Wang, Zhongtao Jiang, Kun Xu, Shizhu He, Jun Zhao, Kang LiuSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Search intelligence is evolving from Deep Research to Wide Research, a paradigm essential for retrieving and synthesizing comprehensive information under complex constraints in parallel. However, progress in this field is impeded by the lack of dedicated benchmarks and optimization methodologies for search breadth. To address these challenges, we take a deep dive into Wide Research from two perspectives: Data Pipeline and Agent Optimization. First, we produce WideSeekBench, a General Broad Information Seeking (GBIS) benchmark constructed via a rigorous multi-phase data pipeline to ensure diversity across the target information volume, logical constraints, and domains. Second, we introduce WideSeek, a dynamic hierarchical multi-agent architecture that can autonomously fork parallel sub-agents based on task requirements. Furthermore, we design a unified training framework that linearizes multi-agent trajectories and optimizes the system using end-to-end RL. Experimental results demonstrate the effectiveness of WideSeek and multi-agent RL, highlighting that scaling the number of agents is a promising direction for advancing the Wide Research paradigm.
- [108] arXiv:2602.02638 [pdf, ps, other]
-
Title: hSNMF: Hybrid Spatially Regularized NMF for Image-Derived Spatial TranscriptomicsAuthors: Md Ishtyaq Mahmud, Veena Kochat, Suresh Satpati, Jagan Mohan Reddy Dwarampudi, Humaira Anzum, Kunal Rai, Tania BanerjeeComments: The paper is accepted to the 2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI); 5 pages, 1 figureSubjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
High-resolution spatial transcriptomics platforms, such as Xenium, generate single-cell images that capture both molecular and spatial context, but their extremely high dimensionality poses major challenges for representation learning and clustering. In this study, we analyze data from the Xenium platform, which captures high-resolution images of tumor microarray (TMA) tissues and converts them into cell-by-gene matrices suitable for computational analysis. We benchmark and extend nonnegative matrix factorization (NMF) for spatial transcriptomics by introducing two spatially regularized variants. First, we propose Spatial NMF (SNMF), a lightweight baseline that enforces local spatial smoothness by diffusing each cell's NMF factor vector over its spatial neighborhood. Second, we introduce Hybrid Spatial NMF (hSNMF), which performs spatially regularized NMF followed by Leiden clustering on a hybrid adjacency that integrates spatial proximity (via a contact-radius graph) and transcriptomic similarity through a tunable mixing parameter alpha. Evaluated on a cholangiocarcinoma dataset, SNMF and hSNMF achieve markedly improved spatial compactness (CHAOS < 0.004, Moran's I > 0.96), greater cluster separability (Silhouette > 0.12, DBI < 1.8), and higher biological coherence (CMC and enrichment) compared to other spatial baselines. Availability and implementation: https://github.com/ishtyaqmahmud/hSNMF
- [109] arXiv:2602.02639 [pdf, ps, other]
-
Title: A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model BehaviorSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
LLM self-explanations are often presented as a promising tool for AI oversight, yet their faithfulness to the model's true reasoning process is poorly understood. Existing faithfulness metrics have critical limitations, typically relying on identifying unfaithfulness via adversarial prompting or detecting reasoning errors. These methods overlook the predictive value of explanations. We introduce Normalized Simulatability Gain (NSG), a general and scalable metric based on the idea that a faithful explanation should allow an observer to learn a model's decision-making criteria, and thus better predict its behavior on related inputs. We evaluate 18 frontier proprietary and open-weight models, e.g., Gemini 3, GPT-5.2, and Claude 4.5, on 7,000 counterfactuals from popular datasets covering health, business, and ethics. We find self-explanations substantially improve prediction of model behavior (11-37% NSG). Self-explanations also provide more predictive information than explanations generated by external models, even when those models are stronger. This implies an advantage from self-knowledge that external explanation methods cannot replicate. Our approach also reveals that, across models, 5-15% of self-explanations are egregiously misleading. Despite their imperfections, we show a positive case for self-explanations: they encode information that helps predict model behavior.
- [110] arXiv:2602.02640 [pdf, ps, other]
-
Title: The First Mass Protest on Threads: Multimodal Mobilization and AI-Generated Visuals in Taiwan's Bluebird MovementSubjects: Computers and Society (cs.CY)
The 2024 Bluebird Movement in Taiwan marked one of the largest youth-led protests in the country's democratic history, mobilizing over 100,000 demonstrators in response to parliamentary reforms. Unlike the 2014 Sunflower Movement, Bluebird unfolded within a transformed digital environment dominated by Threads, Meta's new microblogging platform that$\unicode{x2013}$uniquely$\unicode{x2013}$draws 24% of its global traffic from Taiwan. Leveraging a dataset of 62,321 posts and 21,572 images, this study analyzes how protest communication developed across textual and visual modalities. We combine LLM zero-shot annotation, gradient-boosting trees, and SHAP explainers to disambiguate the supply and demand of attention. Results reveal three dynamics: (1) partisan asymmetries between algorithmic exposure and user endorsement, with anti-DPP content surfaced more widely but anti-KMT and pro-DPP content more actively recirculated; (2) textual repertoires centered on commemorations, personal testimonies, and calls to action as key drivers of virality; and (3) a bifurcation in visual strategies, where human photographs concentrated exposure and discussion, while AI-generated animal and plant symbols circulated as mobilization tools and partisan attacks. These findings demonstrate how Threads functioned as both an amplifier and filter of democratic contention, extending theories of emotional and visual contagion by showing how generative AI reshapes symbolic repertoires in contemporary protest through what we term kawaii toxicity$\unicode{x2013}$political attacks cloaked in aesthetics of cuteness.
- [111] arXiv:2602.02641 [pdf, ps, other]
-
Title: Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL DetectionComments: 9 pages, accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025 LAW Workshop)Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited, lacking future-proof mechanisms for security, trust, or resilience against fraud and abuse, despite the introduction of reactive protections like HTTPS during the cybersecurity era. In the current AI-first threatscape, deceptive URLs have reached unprecedented sophistication due to the widespread use of generative AI by cybercriminals and the AI-vs-AI arms race to produce context-aware phishing websites and URLs that are virtually indistinguishable to both users and traditional detection tools. Although AI-generated phishing accounted for a small fraction of filter-bypassing attacks in 2024, phishing volume has escalated over 4,000% since 2022, with nearly 50% more attacks evading detection. At the rate the threatscape is escalating, and phishing tactics are emerging faster than labeled data can be produced, zero-shot and few-shot learning with large language models (LLMs) offers a timely and adaptable solution, enabling generalization with minimal supervision. Given the critical importance of phishing URL detection in large-scale cybersecurity defense systems, we present a comprehensive benchmark of LLMs under a unified zero-shot and few-shot prompting framework and reveal operational trade-offs. Our evaluation uses a balanced dataset with consistent prompts, offering detailed analysis of performance, generalization, and model efficacy, quantified by accuracy, precision, recall, F1 score, AUROC, and AUPRC, to reflect both classification quality and practical utility in threat detection settings. We conclude few-shot prompting improves performance across multiple LLMs.
- [112] arXiv:2602.02660 [pdf, ps, other]
-
Title: MARS: Modular Agent with Reflective Search for Automated AI ResearchSubjects: Artificial Intelligence (cs.AI)
Automating AI research differs from general software engineering due to computationally expensive evaluation (e.g., model training) and opaque performance attribution. Current LLM-based agents struggle here, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research. MARS relies on three pillars: (1) Budget-Aware Planning via cost-constrained Monte Carlo Tree Search (MCTS) to explicitly balance performance with execution expense; (2) Modular Construction, employing a "Design-Decompose-Implement" pipeline to manage complex research repositories; and (3) Comparative Reflective Memory, which addresses credit assignment by analyzing solution differences to distill high-signal insights. MARS achieves state-of-the-art performance among open-source frameworks on MLE-Bench under comparable settings, maintaining competitiveness with the global leaderboard's top methods. Furthermore, the system exhibits qualitative "Aha!" moments, where 63% of all utilized lessons originate from cross-branch transfer, demonstrating that the agent effectively generalizes insights across search paths.
- [113] arXiv:2602.02671 [pdf, ps, other]
-
Title: MARA: Continuous SE(3)-Equivariant Attention for Molecular Force FieldsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Machine learning force fields (MLFFs) have become essential for accurate and efficient atomistic modeling. Despite their high accuracy, most existing approaches rely on fixed angular expansions, limiting flexibility in weighting local geometric interactions. We introduce Modular Angular-Radial Attention (MARA), a module that extends spherical attention -- originally developed for SO(3) tasks -- to the molecular domain and SE(3), providing an efficient approximation of equivariant interactions. MARA operates directly on the angular and radial coordinates of neighboring atoms, enabling flexible, geometrically informed, and modular weighting of local environments. Unlike existing attention mechanisms in SE(3)-equivariant architectures, MARA can be integrated in a plug-and-play manner into models such as MACE without architectural modifications. Across molecular benchmarks, MARA improves energy and force predictions, reduces high-error events, and enhances robustness. These results demonstrate that continuous spherical attention is an effective and generalizable geometric operator that increases the expressiveness, stability, and reliability of atomistic models.
- [114] arXiv:2602.02676 [pdf, ps, other]
-
Title: AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning ProcessAuthors: Xintong Zhang, Xiaowen Zhang, Jongrong Wu, Zhi Gao, Shilin Yan, Zhenxin Diao, Kunpeng Gao, Xuanyan Chen, Yuwei Wu, Yunde Jia, Qing LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
Adaptive multimodal reasoning has emerged as a promising frontier in Vision-Language Models (VLMs), aiming to dynamically modulate between tool-augmented visual reasoning and text reasoning to enhance both effectiveness and efficiency. However, existing evaluations rely on static difficulty labels and simplistic metrics, which fail to capture the dynamic nature of difficulty relative to varying model capacities. Consequently, they obscure the distinction between adaptive mode selection and general performance while neglecting fine-grained process analyses. In this paper, we propose AdaptMMBench, a comprehensive benchmark for adaptive multimodal reasoning across five domains: real-world, OCR, GUI, knowledge, and math, encompassing both direct perception and complex reasoning tasks. AdaptMMBench utilizes a Matthews Correlation Coefficient (MCC) metric to evaluate the selection rationality of different reasoning modes, isolating this meta-cognition ability by dynamically identifying task difficulties based on models' capability boundaries. Moreover, AdaptMMBench facilitates multi-dimensional process evaluation across key step coverage, tool effectiveness, and computational efficiency. Our evaluation reveals that while adaptive mode selection scales with model capacity, it notably decouples from final accuracy. Conversely, key step coverage aligns with performance, though tool effectiveness remains highly inconsistent across model architectures.
- [115] arXiv:2602.02680 [pdf, ps, other]
-
Title: FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model DeploymentSubjects: Machine Learning (cs.LG)
The growing scale of deep neural networks, encompassing large language models (LLMs) and vision transformers (ViTs), has made training from scratch prohibitively expensive and deployment increasingly costly. These models are often used as computational monoliths with fixed cost, a rigidity that does not leverage overparametrized architectures and largely hinders adaptive deployment across different cost budgets. We argue that importance-ordered nested components can be extracted from pretrained models, and selectively activated on the available computational budget. To this end, our proposed FlexRank method leverages low-rank weight decomposition with nested, importance-based consolidation to extract submodels of increasing capabilities. Our approach enables a "train-once, deploy-everywhere" paradigm that offers a graceful trade-off between cost and performance without training from scratch for each budget - advancing practical deployment of large models.
- [116] arXiv:2602.02684 [pdf, ps, other]
-
Title: ADx3: A Collaborative Workflow for High-Quality Accessible Audio DescriptionAuthors: Lana Do, Shasta Ihorn, Charity Pitcher-Cooper, Juvenal Francisco Barajas, Gio Jung, Xuan Duy Anh Nguyen, Sanjay Mirani, Ilmi YoonSubjects: Human-Computer Interaction (cs.HC)
Audio description (AD) makes video content accessible to blind and low-vision (BLV) audiences, but producing high-quality descriptions is resource-intensive. Automated AD offers scalability, and prior studies show human-in-the-loop editing and user queries effectively improve narration. We introduce ADx3, a novel framework integrating these three modules: GenAD, upgrading baseline description generation with modern vision-language models (VLMs) guided by accessibility-informed prompting; RefineAD, supporting BLV and sighted users to view and edit drafts through an inclusive interface; and AdaptAD, enabling on-demand user queries. We evaluated GenAD in a study where seven accessibility specialists reviewed VLM-generated descriptions using professional guidelines. Findings show that with tailored prompting, VLMs produce good descriptions meeting basic standards, but excellent descriptions require human edits (RefineAD) and interaction (AdaptAD). ADx3 demonstrates collaborative workflows for accessible content creation, where components reinforce one another and enable continuous improvement: edits guide future baselines and user queries reveal gaps in AI-generated and human-authored descriptions.
- [117] arXiv:2602.02685 [pdf, ps, other]
-
Title: Expert-Data Alignment Governs Generation Quality in Decentralized Diffusion ModelsComments: 15 pages, 4 figuresSubjects: Machine Learning (cs.LG)
Decentralized Diffusion Models (DDMs) route denoising through experts trained independently on disjoint data clusters, which can strongly disagree in their predictions. What governs the quality of generations in such systems? We present the first ever systematic investigation of this question. A priori, the expectation is that minimizing denoising trajectory sensitivity -- minimizing how perturbations amplify during sampling -- should govern generation quality. We demonstrate this hypothesis is incorrect: a stability-quality dissociation. Full ensemble routing, which combines all expert predictions at each step, achieves the most stable sampling dynamics and best numerical convergence while producing the worst generation quality (FID 47.9 vs. 22.6 for sparse Top-2 routing). Instead, we identify expert-data alignment as the governing principle: generation quality depends on routing inputs to experts whose training distribution covers the current denoising state. Across two distinct DDM systems, we validate expert-data alignment using (i) data-cluster distance analysis, confirming sparse routing selects experts with data clusters closest to the current denoising state, and (ii) per-expert analysis, showing selected experts produce more accurate predictions than non-selected ones, and (iii) expert disagreement analysis, showing quality degrades when experts disagree. For DDM deployment, our findings establish that routing should prioritize expert-data alignment over numerical stability metrics.
- [118] arXiv:2602.02686 [pdf, ps, other]
-
Title: Monotonicity as an Architectural Bias for Robust Language ModelsComments: 12 pages, 1 figureSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Large language models (LLMs) are known to exhibit brittle behavior under adversarial prompts and jailbreak attacks, even after extensive alignment and fine-tuning. This fragility reflects a broader challenge of modern neural language models: small, carefully structured perturbations in high-dimensional input spaces can induce large and unpredictable changes in internal semantic representations and output.
We investigate monotonicity as an architectural inductive bias for improving the robustness of Transformer-based language models. Monotonicity constrains semantic transformations so that strengthening information, evidence, or constraints cannot lead to regressions in the corresponding internal representations. Such order-preserving behavior has long been exploited in control and safety-critical systems to simplify reasoning and improve robustness, but has traditionally been viewed as incompatible with the expressivity required by neural language models.
We show that this trade-off is not inherent. By enforcing monotonicity selectively in the feed-forward sublayers of sequence-to-sequence Transformers -- while leaving attention mechanisms unconstrained -- we obtain monotone language models that preserve the performance of their pretrained counterparts. This architectural separation allows negation, contradiction, and contextual interactions to be introduced explicitly through attention, while ensuring that subsequent semantic refinement is order-preserving. Empirically, monotonicity substantially improves robustness: adversarial attack success rates drop from approximately 69% to 19%, while standard summarization performance degrades only marginally. - [119] arXiv:2602.02689 [pdf, ps, other]
-
Title: Eidolon: A Practical Post-Quantum Signature Scheme Based on k-Colorability in the Age of Graph Neural NetworksComments: 23 pages, 5 figuresSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We propose Eidolon, a practical post-quantum signature scheme based on the NP-complete k-colorability problem. Our construction generalizes the Goldreich-Micali-Wigderson zero-knowledge protocol to arbitrary k >= 3, applies the Fiat-Shamir transform, and uses Merkle-tree commitments to compress signatures from O(tn) to O(t log n). Crucially, we generate hard instances via planted "quiet" colorings that preserve the statistical profile of random graphs. We present the first empirical security analysis of such a scheme against both classical solvers (ILP, DSatur) and a custom graph neural network (GNN) attacker. Experiments show that for n >= 60, neither approach recovers the secret coloring, demonstrating that well-engineered k-coloring instances can resist modern cryptanalysis, including machine learning. This revives combinatorial hardness as a credible foundation for post-quantum signatures.
- [120] arXiv:2602.02690 [pdf, ps, other]
-
Title: Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for AllAuthors: Chenxi Huang, Alex Mathai, Feiyang Yu, Aleksandr Nogikh, Petros Maniatis, Franjo Ivančić, Eugene Wu, Kostis Kaffes, Junfeng Yang, Baishakhi RaySubjects: Software Engineering (cs.SE)
Repairing system crashes discovered by kernel fuzzers like Syzkaller is a critical yet underexplored challenge in software engineering. While recent works have introduced Large Language Model (LLM) based agents for Linux kernel crash-resolution, their evaluation benchmarks are usually static and thus, do not capture the evolving nature of the Linux kernel, and suffer from potential data contamination due to LLM knowledge cutoffs. To address the above problem, we present (i) Live-kBench, an evaluation framework for self-evolving benchmarks that continuously scrapes and evaluates agents on freshly discovered kernel bugs, and (ii) kEnv, an agent-agnostic standardized crash-resolution environment for kernel compilation, execution, and feedback. This design decouples agent workflows from heavy-weight execution, enabling fair and scalable comparison across diverse agent frameworks under identical conditions.
To this end, we curate an inaugural dataset of 534 Linux kernel bugs and empirically demonstrate a significant performance gap, with agents achieving up to 25% higher equivalent patch rate on bugs fixed before the LLM knowledge cutoff. Using kEnv, we benchmark three state-of-the-art agents, showing that they resolve 74% of crashes on the first attempt (plausible patches); however only ~20% of generated patches closely match developer fixes. Additionally, exposing crash resolution feedback improves crash resolution rate by 29%. Live-kBench provides the community with an evaluation infrastructure for self-evolving benchmarks that is both time and attribute sensitive; complete with a public dashboard to track agent progress on Linux kernel bugs. - [121] arXiv:2602.02696 [pdf, ps, other]
-
Title: NSC-SL: A Bandwidth-Aware Neural Subspace Compression for Communication-Efficient Split LearningAuthors: Zhen Fang, Miao Yang, Zehang Lin, Zheng Lin, Zihan Fang, Zongyuan Zhang, Tianyang Duan, Dong Huang, Shunzhi ZhuComments: 5 pages, 3 figuresSubjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
The expanding scale of neural networks poses a major challenge for distributed machine learning, particularly under limited communication resources. While split learning (SL) alleviates client computational burden by distributing model layers between clients and server, it incurs substantial communication overhead from frequent transmission of intermediate activations and gradients. To tackle this issue, we propose NSC-SL, a bandwidth-aware adaptive compression algorithm for communication-efficient SL. NSC-SL first dynamically determines the optimal rank of low-rank approximation based on the singular value distribution for adapting real-time bandwidth constraints. Then, NSC-SL performs error-compensated tensor factorization using alternating orthogonal iteration with residual feedback, effectively minimizing truncation loss. The collaborative mechanisms enable NSC-SL to achieve high compression ratios while preserving semantic-rich information essential for convergence. Extensive experiments demonstrate the superb performance of NSC-SL.
- [122] arXiv:2602.02699 [pdf, ps, other]
-
Title: Sparsely Supervised DiffusionAuthors: Wenshuai Zhao, Zhiyuan Li, Yi Zhao, Mohammad Hassan Vali, Martin Trapp, Joni Pajarinen, Juho Kannala, Arno SolinComments: 20 pages, 11 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locality of their denoising mechanisms. This can yield samples that are locally plausible but globally inconsistent. To mitigate this issue, we propose sparsely supervised learning for diffusion models, a simple yet effective masking strategy that can be implemented with only a few lines of code. Interestingly, the experiments show that it is safe to mask up to 98\% of pixels during diffusion model training. Our method delivers competitive FID scores across experiments and, most importantly, avoids training instability on small datasets. Moreover, the masking strategy reduces memorization and promotes the use of essential contextual information during generation.
- [123] arXiv:2602.02704 [pdf, ps, other]
-
Title: InfMem: Learning System-2 Memory Control for Long-Context AgentAuthors: Xinyu Wang, Mingze Li, Peng Lu, Xiao-Wen Chang, Lifeng Shang, Jinping Li, Fei Mi, Prasanna Parthasarathi, Yufei CuiSubjects: Computation and Language (cs.CL)
Reasoning over ultra-long documents requires synthesizing sparse evidence scattered across distant segments under strict memory constraints. While streaming agents enable scalable processing, their passive memory update strategy often fails to preserve low-salience bridging evidence required for multi-hop reasoning. We propose InfMem, a control-centric agent that instantiates System-2-style control via a PreThink-Retrieve-Write protocol. InfMem actively monitors evidence sufficiency, performs targeted in-document retrieval, and applies evidence-aware joint compression to update a bounded memory. To ensure reliable control, we introduce a practical SFT-to-RL training recipe that aligns retrieval, writing, and stopping decisions with end-task correctness. On ultra-long QA benchmarks from 32k to 1M tokens, InfMem consistently outperforms MemAgent across backbones. Specifically, InfMem improves average absolute accuracy by +10.17, +11.84, and +8.23 points on Qwen3-1.7B, Qwen3-4B, and Qwen2.5-7B, respectively, while reducing inference time by $3.9\times$ on average (up to $5.1\times$) via adaptive early stopping.
- [124] arXiv:2602.02707 [pdf, ps, other]
-
Title: Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized TransformersSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Quantization reduces the numerical precision of Transformer computations and is widely used to accelerate inference, yet its effect on expressivity remains poorly characterized. We demonstrate a fine-grained theoretical tradeoff between expressivity and precision: For every p we exhibit a function {\Gamma}, inspired by the equality function, and prove that a one-layer softmax Transformer can compute {\Gamma}, with p bits of precision, but not with p-1 bits of precision.
This result concretely explains the widely observed phenomenon of empirical loss of expressivity when quantization is used. Practically, it suggests that tasks requiring equality-like comparisons (exact match, membership, etc.) are especially sensitive to quantization. Dropping even one bit can cross a threshold where the model cannot represent the needed comparison reliably. Thus, it paves the way for developing heuristics that will help practitioners choose how much quantization is possible: the precision should be chosen as a function of the length of equality to be checked for the specific task.
Our proofs combine explicit finite-precision Transformer constructions with communication-complexity lower bounds, yielding a tight "one-bit" threshold. - [125] arXiv:2602.02708 [pdf, ps, other]
-
Title: BinaryPPO: Efficient Policy Optimization for Binary ClassificationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Supervised fine-tuning (SFT) is the standard approach for binary classification tasks such as toxicity detection, factuality verification, and causal inference. However, SFT often performs poorly in real-world settings with label noise, class imbalance, or sparse supervision. We introduce BinaryPPO, an offline reinforcement learning large language model (LLM) framework that reformulates binary classification as a reward maximization problem. Our method leverages a variant of Proximal Policy Optimization (PPO) with a confidence-weighted reward function that penalizes uncertain or incorrect predictions, enabling the model to learn robust decision policies from static datasets without online interaction. Across eight domain-specific benchmarks and multiple models with differing architectures, BinaryPPO improves accuracy by 40-60 percentage points, reaching up to 99%, substantially outperforming supervised baselines. We provide an in-depth analysis of the role of reward shaping, advantage scaling, and policy stability in enabling this improvement. Overall, we demonstrate that confidence-based reward design provides a robust alternative to SFT for binary classification. Our code is available at https://github.com/psyonp/BinaryPPO.
- [126] arXiv:2602.02709 [pdf, ps, other]
-
Title: ATLAS : Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM SupportersSubjects: Artificial Intelligence (cs.AI)
Recent multi-LLM agent systems perform well in prompt optimization and automated problem-solving, but many either keep the solver frozen after fine-tuning or rely on a static preference-optimization loop, which becomes intractable for long-horizon tasks. We propose ATLAS (Adaptive Task-distributed Learning for Agentic Self-evolution), a task-distributed framework that iteratively develops a lightweight research agent while delegating complementary roles to specialized supporter agents for exploration, hyperparameter tuning, and reference policy management. Our core algorithm, Evolving Direct Preference Optimization (EvoDPO), adaptively updates the phase-indexed reference policy. We provide a theoretical regret analysis for a preference-based contextual bandit under concept drift. In addition, experiments were conducted on non-stationary linear contextual bandits and scientific machine learning (SciML) loss reweighting for the 1D Burgers' equation. Both results show that ATLAS improves stability and performance over a static single-agent baseline.
- [127] arXiv:2602.02710 [pdf, ps, other]
-
Title: Maximum Likelihood Reinforcement LearningAuthors: Fahim Tajwar, Guanning Zeng, Yueer Zhou, Yuda Song, Daman Arora, Yiding Jiang, Jeff Schneider, Ruslan Salakhutdinov, Haiwen Feng, Andrea ZanetteComments: Project website and code: this https URLSubjects: Machine Learning (cs.LG)
Reinforcement learning is the method of choice to train models in sampling-based setups with binary outcome feedback, such as navigation, code generation, and mathematical problem solving. In such settings, models implicitly induce a likelihood over correct rollouts. However, we observe that reinforcement learning does not maximize this likelihood, and instead optimizes only a lower-order approximation. Inspired by this observation, we introduce Maximum Likelihood Reinforcement Learning (MaxRL), a sampling-based framework to approximate maximum likelihood using reinforcement learning techniques. MaxRL addresses the challenges of non-differentiable sampling by defining a compute-indexed family of sample-based objectives that interpolate between standard reinforcement learning and exact maximum likelihood as additional sampling compute is allocated. The resulting objectives admit a simple, unbiased policy-gradient estimator and converge to maximum likelihood optimization in the infinite-compute limit. Empirically, we show that MaxRL Pareto-dominates existing methods in all models and tasks we tested, achieving up to 20x test-time scaling efficiency gains compared to its GRPO-trained counterpart. We also observe MaxRL to scale better with additional data and compute. Our results suggest MaxRL is a promising framework for scaling RL training in correctness based settings.
- [128] arXiv:2602.02711 [pdf, ps, other]
-
Title: Dynamic Mix Precision Routing for Efficient Multi-step LLM InteractionSubjects: Artificial Intelligence (cs.AI)
Large language models (LLM) achieve strong performance in long-horizon decision-making tasks through multi-step interaction and reasoning at test time. While practitioners commonly believe a higher task success rate necessitates the use of a larger and stronger LLM model, multi-step interaction with a large LLM incurs prohibitive inference cost. To address this problem, we explore the use of low-precision quantized LLM in the long-horizon decision-making process. Based on the observation of diverse sensitivities among interaction steps, we propose a dynamic mix-precision routing framework that adaptively selects between high-precision and low-precision LLMs at each decision step. The router is trained via a two-stage pipeline, consisting of KL-divergence-based supervised learning that identifies precision-sensitive steps, followed by Group-Relative Policy Optimization (GRPO) to further improve task success rates. Experiments on ALFWorld demonstrate that our approach achieves a great improvement on accuracy-cost trade-off over single-precision baselines and heuristic routing methods.
- [129] arXiv:2602.02712 [pdf, ps, other]
-
Title: Towards Understanding Steering StrengthComments: 33 pages (including appendix)Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
A popular approach to post-training control of large language models (LLMs) is the steering of intermediate latent representations. Namely, identify a well-chosen direction depending on the task at hand and perturbs representations along this direction at inference time. While many propositions exist to pick this direction, considerably less is understood about how to choose the magnitude of the move, whereas its importance is clear: too little and the intended behavior does not emerge, too much and the model's performance degrades beyond repair. In this work, we propose the first theoretical analysis of steering strength. We characterize its effect on next token probability, presence of a concept, and cross-entropy, deriving precise qualitative laws governing these quantities. Our analysis reveals surprising behaviors, including non-monotonic effects of steering strength. We validate our theoretical predictions empirically on eleven language models, ranging from a small GPT architecture to modern models.
- [130] arXiv:2602.02716 [pdf, ps, other]
-
Title: Neural Probabilistic Amplitude Shaping for Nonlinear Fiber ChannelsComments: 3 pages, 2 figures, Submitted to Optical Fiber Communication Conference (OFC) 2026Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
We introduce neural probabilistic amplitude shaping, a joint-distribution learning framework for coherent fiber systems. The proposed scheme provides a 0.5 dB signal-to-noise ratio gain over sequence selection for dual-polarized 64-QAM transmission across a single-span 205 km link.
- [131] arXiv:2602.02717 [pdf, ps, other]
-
Title: On the Feasibility of Hybrid Homomorphic Encryption for Intelligent Transportation SystemsComments: This version has been submitted to a peer-reviewed journal and is currently under reviewSubjects: Cryptography and Security (cs.CR)
Many Intelligent Transportation Systems (ITS) applications require strong privacy guarantees for both users and their data. Homomorphic encryption (HE) enables computation directly on encrypted messages and thus offers a compelling approach to privacy-preserving data processing in ITS. However, practical HE schemes incur substantial ciphertext expansion and communication overhead, which limits their suitability for time-critical transportation systems. Hybrid homomorphic encryption (HHE) addresses this challenge by combining a homomorphic encryption scheme with a symmetric cipher, enabling efficient encrypted computation while dramatically reducing communication cost. In this paper, we develop theoretical models of representative ITS applications that integrate HHE to protect sensitive vehicular data. We then perform a parameter-based evaluation of the HHE scheme Rubato to estimate ciphertext sizes and communication overhead under realistic ITS workloads. Our results show that HHE achieves orders-of-magnitude reductions in ciphertext size compared with conventional HE while maintaining cryptographic security, making it significantly more practical for latency-constrained ITS communication.
- [132] arXiv:2602.02718 [pdf, ps, other]
-
Title: Composition for Pufferfish PrivacySubjects: Cryptography and Security (cs.CR)
When creating public data products out of confidential datasets, inferential/posterior-based privacy definitions, such as Pufferfish, provide compelling privacy semantics for data with correlations. However, such privacy definitions are rarely used in practice because they do not always compose. For example, it is possible to design algorithms for these privacy definitions that have no leakage when run once but reveal the entire dataset when run more than once. We prove necessary and sufficient conditions that must be added to ensure linear composition for Pufferfish mechanisms, hence avoiding such privacy collapse. These extra conditions turn out to be differential privacy-style inequalities, indicating that achieving both the interpretable semantics of Pufferfish for correlated data and composition benefits requires adopting differentially private mechanisms to Pufferfish. We show that such translation is possible through a concept called the $(a,b)$-influence curve, and many existing differentially private algorithms can be translated with our framework into a composable Pufferfish algorithm. We illustrate the benefit of our new framework by designing composable Pufferfish algorithms for Markov chains that significantly outperform prior work.
- [133] arXiv:2602.02721 [pdf, ps, other]
-
Title: End-to-end reconstruction of OCT optical properties and speckle-reduced structural intensity via physics-based learningSubjects: Computer Vision and Pattern Recognition (cs.CV)
Inverse scattering in optical coherence tomography (OCT) seeks to recover both structural images and intrinsic tissue optical properties, including refractive index, scattering coefficient, and anisotropy. This inverse problem is challenging due to attenuation, speckle noise, and strong coupling among parameters. We propose a regularized end-to-end deep learning framework that jointly reconstructs optical parameter maps and speckle-reduced OCT structural intensity for layer visualization. Trained with Monte Carlo-simulated ground truth, our network incorporates a physics-based OCT forward model that generates predicted signals from the estimated parameters, providing physics-consistent supervision for parameter recovery and artifact suppression. Experiments on the synthetic corneal OCT dataset demonstrate robust optical map recovery under noise, improved resolution, and enhanced structural fidelity. This approach enables quantitative multi-parameter tissue characterization and highlights the benefit of combining physics-informed modeling with deep learning for computational OCT.
- [134] arXiv:2602.02722 [pdf, ps, other]
-
Title: Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal DiffusionComments: ICLR 2026Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We propose a hierarchical entity-centric framework for offline Goal-Conditioned Reinforcement Learning (GCRL) that combines subgoal decomposition with factored structure to solve long-horizon tasks in domains with multiple entities. Achieving long-horizon goals in complex environments remains a core challenge in Reinforcement Learning (RL). Domains with multiple entities are particularly difficult due to their combinatorial complexity. GCRL facilitates generalization across goals and the use of subgoal structure, but struggles with high-dimensional observations and combinatorial state-spaces, especially under sparse reward. We employ a two-level hierarchy composed of a value-based GCRL agent and a factored subgoal-generating conditional diffusion model. The RL agent and subgoal generator are trained independently and composed post hoc through selective subgoal generation based on the value function, making the approach modular and compatible with existing GCRL algorithms. We introduce new variations to benchmark tasks that highlight the challenges of multi-entity domains, and show that our method consistently boosts performance of the underlying RL agent on image-based long-horizon tasks with sparse rewards, achieving over 150% higher success rates on the hardest task in our suite and generalizing to increasing horizons and numbers of entities. Rollout videos are provided at: https://sites.google.com/view/hecrl
- [135] arXiv:2602.02724 [pdf, ps, other]
-
Title: Automatic Design of Optimization Test Problems with Large Language ModelsSubjects: Neural and Evolutionary Computing (cs.NE)
The development of black-box optimization algorithms depends on the availability of benchmark suites that are both diverse and representative of real-world problem landscapes. Widely used collections such as BBOB and CEC remain dominated by hand-crafted synthetic functions and provide limited coverage of the high-dimensional space of Exploratory Landscape Analysis (ELA) features, which in turn biases evaluation and hinders training of meta-black-box optimizers. We introduce Evolution of Test Functions (EoTF), a framework that automatically generates continuous optimization test functions whose landscapes match a specified target ELA feature vector. EoTF adapts LLM-driven evolutionary search, originally proposed for heuristic discovery, to evolve interpretable, self-contained numpy implementations of objective functions by minimizing the distance between sampled ELA features of generated candidates and a target profile. In experiments on 24 noiseless BBOB functions and a contamination-mitigating suite of 24 MA-BBOB hybrid functions, EoTF reliably produces non-trivial functions with closely matching ELA characteristics and preserves optimizer performance rankings under fixed evaluation budgets, supporting their validity as surrogate benchmarks. While a baseline neural-network-based generator achieves higher accuracy in 2D, EoTF substantially outperforms it in 3D and exhibits stable solution quality as dimensionality increases, highlighting favorable scalability. Overall, EoTF offers a practical route to scalable, portable, and interpretable benchmark generation targeted to desired landscape properties.
- [136] arXiv:2602.02725 [pdf, ps, other]
-
Title: Automated Dysphagia Screening Using Noninvasive Neck Acoustic SensingAuthors: Jade Chng, Rong Xing, Yunfei Luo, Kristen Linnemeyer-Risser, Tauhidur Rahman, Andrew Yousef, Philip A WeissbrodComments: Accepted to 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Pharyngeal health plays a vital role in essential human functions such as breathing, swallowing, and vocalization. Early detection of swallowing abnormalities, also known as dysphagia, is crucial for timely intervention. However, current diagnostic methods often rely on radiographic imaging or invasive procedures. In this study, we propose an automated framework for detecting dysphagia using portable and noninvasive acoustic sensing coupled with applied machine learning. By capturing subtle acoustic signals from the neck during swallowing tasks, we aim to identify patterns associated with abnormal physiological conditions. Our approach achieves promising test-time abnormality detection performance, with an AUC-ROC of 0.904 under 5 independent train-test splits. This work demonstrates the feasibility of using noninvasive acoustic sensing as a practical and scalable tool for pharyngeal health monitoring.
- [137] arXiv:2602.02726 [pdf, ps, other]
-
Title: Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept DiscoverySubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Deep Learning models encode rich semantic information in their hidden representations. However, it remains challenging to understand which parts of this information models actually rely on when making predictions. A promising line of post-hoc concept-based explanation methods relies on clustering token representations. However, commonly used approaches such as hierarchical clustering are computationally infeasible for large-scale datasets, and K-Means often yields shallow or frequency-dominated clusters. We propose the vector quantized latent concept (VQLC) method, a framework built upon the vector quantized-variational autoencoder (VQ-VAE) architecture that learns a discrete codebook mapping continuous representations to concept vectors. We perform thorough evaluations and show that VQLC improves scalability while maintaining comparable quality of human-understandable explanations.
- [138] arXiv:2602.02727 [pdf, ps, other]
-
Title: Search-Augmented Masked Diffusion Models for Constrained GenerationAuthors: Huu Binh Ta (1), Michael Cardei (1), Alvaro Velasquez (2), Ferdinando Fioretto (1) ((1) University of Virginia, (2) University of Colorado at Boulder)Comments: Huu Binh Ta and Michael Cardei contributed equally to this workSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Discrete diffusion models generate sequences by iteratively denoising samples corrupted by categorical noise, offering an appealing alternative to autoregressive decoding for structured and symbolic generation. However, standard training targets a likelihood-based objective that primarily matches the data distribution and provides no native mechanism for enforcing hard constraints or optimizing non-differentiable properties at inference time. This work addresses this limitation and introduces Search-Augmented Masked Diffusion (SearchDiff), a training-free neurosymbolic inference framework that integrates informed search directly into the reverse denoising process. At each denoising step, the model predictions define a proposal set that is optimized under a user-specified property satisfaction, yielding a modified reverse transition that steers sampling toward probable and feasible solutions. Experiments in biological design and symbolic reasoning illustrate that SearchDiff substantially improves constraint satisfaction and property adherence, while consistently outperforming discrete diffusion and autoregressive baselines.
- [139] arXiv:2602.02729 [pdf, ps, other]
-
Title: CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper presents $\textbf{CAPS}$ (Clock-weighted Aggregation with Prefix-products and Softmax), a structured attention mechanism for time series forecasting that decouples three distinct temporal structures: global trends, local shocks, and seasonal patterns. Standard softmax attention entangles these through global normalization, while recent recurrent models sacrifice long-term, order-independent selection for order-dependent causal structure. CAPS combines SO(2) rotations for phase alignment with three additive gating paths -- Riemann softmax, prefix-product gates, and a Clock baseline -- within a single attention layer. We introduce the Clock mechanism, a learned temporal weighting that modulates these paths through a shared notion of temporal importance. Experiments on long- and short-term forecasting benchmarks surpass vanilla softmax and linear attention mechanisms and demonstrate competitive performance against seven strong baselines with linear complexity. Our code implementation is available at https://github.com/vireshpati/CAPS-Attention.
- [140] arXiv:2602.02730 [pdf, ps, other]
-
Title: AROLA: A Modular Layered Architecture for Scaled Autonomous RacingComments: 6 pages, 6 figures, IV 2026Subjects: Robotics (cs.RO); Software Engineering (cs.SE)
Autonomous racing has advanced rapidly, particularly on scaled platforms, and software stacks must evolve accordingly. In this work, AROLA is introduced as a modular, layered software architecture in which fragmented and monolithic designs are reorganized into interchangeable layers and components connected through standardized ROS 2 interfaces. The autonomous-driving pipeline is decomposed into sensing, pre-processing, perception, localization and mapping, planning, behavior, control, and actuation, enabling rapid module replacement and objective benchmarking without reliance on custom message definitions. To support consistent performance evaluation, a Race Monitor framework is introduced as a lightweight system through which lap timing, trajectory quality, and computational load are logged in real time and standardized post-race analyses are generated. AROLA is validated in simulation and on hardware using the RoboRacer platform, including deployment at the 2025 RoboRacer IV25 competition. Together, AROLA and Race Monitor demonstrate that modularity, transparent interfaces, and systematic evaluation can accelerate development and improve reproducibility in scaled autonomous racing.
- [141] arXiv:2602.02731 [pdf, ps, other]
-
Title: Predicting first-episode homelessness among US Veterans using longitudinal EHR data: time-varying models and social risk factorsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Homelessness among US veterans remains a critical public health challenge, yet risk prediction offers a pathway for proactive intervention. In this retrospective prognostic study, we analyzed electronic health record (EHR) data from 4,276,403 Veterans Affairs patients during a 2016 observation period to predict first-episode homelessness occurring 3-12 months later in 2017 (prevalence: 0.32-1.19%). We constructed static and time-varying EHR representations, utilizing clinician-informed logic to model the persistence of clinical conditions and social risks over time. We then compared the performance of classical machine learning, transformer-based masked language models, and fine-tuned large language models (LLMs). We demonstrate that incorporating social and behavioral factors into longitudinal models improved precision-recall area under the curve (PR-AUC) by 15-30%. In the top 1% risk tier, models yielded positive predictive values ranging from 3.93-4.72% at 3 months, 7.39-8.30% at 6 months, 9.84-11.41% at 9 months, and 11.65-13.80% at 12 months across model architectures. Large language models underperformed encoder-based models on discrimination but showed smaller performance disparities across racial groups. These results demonstrate that longitudinal, socially informed EHR modeling concentrates homelessness risk into actionable strata, enabling targeted and data-informed prevention strategies for at-risk veterans.
- [142] arXiv:2602.02735 [pdf, ps, other]
-
Title: TabPFN for Zero-shot Parametric Engineering Design GenerationComments: 14 pages, 8 figuresSubjects: Machine Learning (cs.LG)
Deep generative models for engineering design often require substantial computational cost, large training datasets, and extensive retraining when design requirements or datasets change, limiting their applicability in real-world engineering design workflow. In this work, we propose a zero-shot generation framework for parametric engineering design based on TabPFN, enabling conditional design generation using only a limited number of reference samples and without any task-specific model training or fine-tuning. The proposed method generates design parameters sequentially conditioned on target performance indicators, providing a flexible alternative to conventional generative models. The effectiveness of the proposed approach is evaluated on three engineering design datasets, i.e., ship hull design, BlendedNet aircraft, and UIUC airfoil. Experimental results demonstrate that the proposed method achieves competitive diversity across highly structured parametric design spaces, remains robust to variations in sampling, resolution and parameter dimensionality of geometry generation, and achieves a low performance error (e.g., less than 2% in generated ship hull designs' performance). Compared with diffusion-based generative models, the proposed framework significantly reduces computational overhead and data requirements while preserving reliable generation performance. These results highlight the potential of zero-shot, data-efficient generation as a practical and efficient tool for engineering design, enabling rapid deployment, flexible adaptation to new design settings, and ease of integration into real-world engineering workflows.
- [143] arXiv:2602.02736 [pdf, ps, other]
-
Title: Time-Critical Multimodal Medical Transportation: Organs, Patients, and Medical SuppliesComments: This work has been submitted to the IEEE for possible publicationSubjects: Computation and Language (cs.CL)
Timely transportation of organs, patients, and medical supplies is critical to modern healthcare, particularly in emergencies and transplant scenarios where even short delays can severely impact outcomes. Traditional ground-based vehicles such as ambulances are often hindered by traffic congestion; while air vehicles such as helicopters are faster but costly. Emerging air vehicles -- Unmanned Aerial Vehicles and electric vertical take-off and landing aircraft -- have lower operating costs, but remain limited by range and susceptibility to weather conditions. A multimodal transportation system that integrates both air and ground vehicles can leverage the strengths of each to enhance overall transportation efficiency. This study introduces a constructive greedy heuristic algorithm for multimodal vehicle dispatching for medical transportation. Four different fleet configurations were tested: (i) ambulances only, (ii) ambulances with Unmanned Aerial Vehicles, (iii) ambulances with electric vertical take-off and landing aircraft, and (iv) a fully integrated fleet of ambulances, Unmanned Aerial Vehicles, and electric vertical take-off and landing aircraft. The algorithm incorporates payload consolidation across compatible routes, accounts for traffic congestion in ground operations and weather conditions in aerial operations, while enabling rapid vehicle dispatching compared to computationally intensive optimization models. Using a common set of conditions, we evaluate all four fleet types to identify the most effective configurations for fulfilling medical transportation needs while minimizing operating costs, recharging/fuel costs, and total transportation time.
- [144] arXiv:2602.02738 [pdf, ps, other]
-
Title: When Noise Lowers The Loss: Rethinking Likelihood-Based Evaluation in Music Large Language ModelsComments: Accepted by IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2026Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
The rise of music large language models (LLMs) demands robust methods of evaluating output quality, especially in distinguishing high-quality compositions from "garbage music". Curiously, we observe that the standard cross-entropy loss -- a core training metric -- often decrease when models encounter systematically corrupted music, undermining its validity as a standalone quality indicator. To investigate this paradox, we introduce noise injection experiment, where controlled noise signal of varying lengths are injected into musical contexts. We hypothesize that a model's loss reacting positively to these perturbations, specifically a sharp increase ("Peak" area) for short injection, can serve as a proxy for its ability to discern musical integrity. Experiments with MusicGen models in the audio waveform domain confirm that Music LLMs respond more strongly to local, texture-level disruptions than to global semantic corruption. Beyond exposing this bias, our results highlight a new principle: the shape of the loss curve -- rather than its absolute value -- encodes critical information about the quality of the generated content (i.e., model behavior). We envision this profile-based evaluation as a label-free, model-intrinsic framework for assessing musical quality -- opening the door to more principled training objectives and sharper benchmarks.
- [145] arXiv:2602.02739 [pdf, ps, other]
-
Title: TopoPrune: Robust Data Pruning via Unified Latent Space TopologyComments: Preprint. Under ReviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Geometric data pruning methods, while practical for leveraging pretrained models, are fundamentally unstable. Their reliance on extrinsic geometry renders them highly sensitive to latent space perturbations, causing performance to degrade during cross-architecture transfer or in the presence of feature noise. We introduce TopoPrune, a framework which resolves this challenge by leveraging topology to capture the stable, intrinsic structure of data. TopoPrune operates at two scales, (1) utilizing a topology-aware manifold approximation to establish a global low-dimensional embedding of the dataset. Subsequently, (2) it employs differentiable persistent homology to perform a local topological optimization on the manifold embeddings, ranking samples by their structural complexity. We demonstrate that our unified dual-scale topological approach ensures high accuracy and precision, particularly at significant dataset pruning rates (e.g., 90%). Furthermore, through the inherent stability properties of topology, TopoPrune is (a) exceptionally robust to noise perturbations of latent feature embeddings and (b) demonstrates superior transferability across diverse network architectures. This study demonstrates a promising avenue towards stable and principled topology-based frameworks for robust data-efficient learning.
- [146] arXiv:2602.02740 [pdf, ps, other]
-
Title: Framing Responsible Design of AI Mental Well-Being Support: AI as Primary Care, Nutritional Supplement, or Yoga Instructor?Comments: 16 pages, 1 figure, 2 tables. To appear at CHI '26Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Millions of people now use non-clinical Large Language Model (LLM) tools like ChatGPT for mental well-being support. This paper investigates what it means to design such tools responsibly, and how to operationalize that responsibility in their design and evaluation. By interviewing experts and analyzing related regulations, we found that designing an LLM tool responsibly involves: (1) Articulating the specific benefits it guarantees and for whom. Does it guarantee specific, proven relief, like an over-the-counter drug, or offer minimal guarantees, like a nutritional supplement? (2) Specifying the LLM tool's "active ingredients" for improving well-being and whether it guarantees their effective delivery (like a primary care provider) or not (like a yoga instructor). These specifications outline an LLM tool's pertinent risks, appropriate evaluation metrics, and the respective responsibilities of LLM developers, tool designers, and users. These analogies - LLM tools as supplements, drugs, yoga instructors, and primary care providers - can scaffold further conversations about their responsible design.
- [147] arXiv:2602.02741 [pdf, ps, other]
-
Title: PokeNet: Learning Kinematic Models of Articulated Objects from Human ObservationsSubjects: Robotics (cs.RO)
Articulation modeling enables robots to learn joint parameters of articulated objects for effective manipulation which can then be used downstream for skill learning or planning. Existing approaches often rely on prior knowledge about the objects, such as the number or type of joints. Some of these approaches also fail to recover occluded joints that are only revealed during interaction. Others require large numbers of multi-view images for every object, which is impractical in real-world settings. Furthermore, prior works neglect the order of manipulations, which is essential for many multi-DoF objects where one joint must be operated before another, such as a dishwasher. We introduce PokeNet, an end-to-end framework that estimates articulation models from a single human demonstration without prior object knowledge. Given a sequence of point cloud observations of a human manipulating an unknown object, PokeNet predicts joint parameters, infers manipulation order, and tracks joint states over time. PokeNet outperforms existing state-of-the-art methods, improving joint axis and state estimation accuracy by an average of over 27% across diverse objects, including novel and unseen categories. We demonstrate these gains in both simulation and real-world environments.
- [148] arXiv:2602.02742 [pdf, ps, other]
-
Title: Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular UnderstandingComments: Accepted by ICLR 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Molecular understanding is central to advancing areas such as scientific discovery, yet Large Language Models (LLMs) struggle to understand molecular graphs effectively. Existing graph-LLM bridges often adapt the Q-Former-style connector with fixed-length static tokens, which is originally designed for vision tasks. These designs overlook stereochemistry and substructural context and typically require costly LLM-backbone fine-tuning, limiting efficiency and generalization. We introduce EDT-Former, an Entropy-guided Dynamic Token Transformer that generates tokens aligned with informative molecular patches, thereby preserving both local and global structural features for molecular graph understanding. Beyond prior approaches, EDT-Former enables alignment between frozen graph encoders and LLMs without tuning the LLM backbone (excluding the embedding layer), resulting in computationally efficient finetuning, and achieves stateof-the-art results on MoleculeQA, Molecule-oriented Mol-Instructions, and property prediction benchmarks (TDC, MoleculeNet), underscoring its effectiveness for scalable and generalizable multimodal molecular understanding
- [149] arXiv:2602.02743 [pdf, ps, other]
-
Title: Exploring Collaborative Immersive Visualization & Analytics for High-Dimensional Scientific Data through Domain Expert PerspectivesComments: Conditionally accepted at the Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI 26)Subjects: Human-Computer Interaction (cs.HC)
Cross-disciplinary teams increasingly work with high-dimensional scientific datasets, yet fragmented toolchains and limited support for shared exploration hinder collaboration. Prior immersive visualization and analytics research has emphasized individual interaction, leaving open how multi-user collaboration can be supported at scale. To fill this critical gap, we conduct semi-structured interviews with 20 domain experts from diverse academic, government, and industry backgrounds. Using deductive-inductive hybrid thematic analysis, we identify four collaboration-focused themes: workflow challenges, adoption perceptions, prospective features, and anticipated usability and ethical risks. These findings show how current ecosystems disrupt coordination and shared understanding, while highlighting opportunities for effective multi-user engagement. Our study contributes empirical insights into collaboration practices for high-dimensional scientific data visualization and analysis, offering design implications to enhance coordination, mutual awareness, and equitable participation in next-generation collaborative immersive platforms. These contributions point toward future environments enabling distributed, cross-device teamwork on high-dimensional scientific data.
- [150] arXiv:2602.02745 [pdf, ps, other]
-
Title: Ethical Asymmetry in Human-Robot Interaction - An Empirical Test of Sparrow's HypothesisComments: 27 pages, 3 figuresSubjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
The ethics of human-robot interaction (HRI) have been discussed extensively based on three traditional frameworks: deontology, consequentialism, and virtue ethics. We conducted a mixed within/between experiment to investigate Sparrow's proposed ethical asymmetry hypothesis in human treatment of robots. The moral permissibility of action (MPA) was manipulated as a subject grouping variable, and virtue type (prudence, justice, courage, and temperance) was controlled as a within-subjects factor. We tested moral stimuli using an online questionnaire with Perceived Moral Permissibility of Action (PMPA) and Perceived Virtue Scores (PVS) as response measures. The PVS measure was based on an adaptation of the established Questionnaire on Cardinal Virtues (QCV), while the PMPA was based on Malle et al. [39] work. We found that the MPA significantly influenced the PMPA and perceived virtue scores. The best-fitting model to describe the relationship between PMPA and PVS was cubic, which is symmetrical in nature. Our study did not confirm Sparrow's asymmetry hypothesis. The adaptation of the QCV is expected to have utility for future studies, pending additional psychometric property assessments.
- [151] arXiv:2602.02748 [pdf, ps, other]
-
Title: A Parametrized Complexity View on Robust Scheduling with Budgeted UncertaintySubjects: Discrete Mathematics (cs.DM)
In this study, we investigate a robust single-machine scheduling problem under processing time uncertainty. The uncertainty is modeled using the budgeted approach, where each job has a nominal and deviation processing time, and the number of deviations is bounded by Gamma. The objective is to minimize the maximum number of tardy jobs over all possible scenarios. Since the problem is NP-hard in general, we focus on analyzing its tractability under the assumption that some natural parameter of the problem is bounded by a constant. We consider three parameters: the robustness parameter Gamma, the number of distinct due dates in the instance, and the number of jobs with nonzero deviations. Using parametrized-complexity theory, we prove that the problem is W[1]-hard with respect to Gamma, but can be solved in XP time with respect to the same parameter. With respect to the number of different due dates, we establish a stronger hardness result by showing that the problem remains NP-hard even when there are only two different due dates and is solvable in pseudo-polynomial time when the number of due dates is upper bounded by a constant. To complement these results, we show that the case of a common (single) due date, reduces to a robust binary knapsack problem with equal item profits, which we prove to be solvable in polynomial time. Finally, we prove that the problem is solvable in FPT time with respect to the number of nonzero deviations.
- [152] arXiv:2602.02751 [pdf, ps, other]
-
Title: Scaling Small Agents Through Strategy AuctionsSubjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Small language models are increasingly viewed as a promising, cost-effective approach to agentic AI, with proponents claiming they are sufficiently capable for agentic workflows. However, while smaller agents can closely match larger ones on simple tasks, it remains unclear how their performance scales with task complexity, when large models become necessary, and how to better leverage small agents for long-horizon workloads. In this work, we empirically show that small agents' performance fails to scale with task complexity on deep search and coding tasks, and we introduce Strategy Auctions for Workload Efficiency (SALE), an agent framework inspired by freelancer marketplaces. In SALE, agents bid with short strategic plans, which are scored by a systematic cost-value mechanism and refined via a shared auction memory, enabling per-task routing and continual self-improvement without training a separate router or running all models to completion. Across deep search and coding tasks of varying complexity, SALE reduces reliance on the largest agent by 53%, lowers overall cost by 35%, and consistently improves upon the largest agent's pass@1 with only a negligible overhead beyond executing the final trace. In contrast, established routers that rely on task descriptions either underperform the largest agent or fail to reduce cost -- often both -- underscoring their poor fit for agentic workflows. These results suggest that while small agents may be insufficient for complex workloads, they can be effectively "scaled up" through coordinated task allocation and test-time self-improvement. More broadly, they motivate a systems-level view of agentic AI in which performance gains come less from ever-larger individual models and more from market-inspired coordination mechanisms that organize heterogeneous agents into efficient, adaptive ecosystems.
- [153] arXiv:2602.02752 [pdf, ps, other]
-
Title: Beyond the Prompt: Assessing Domain Knowledge Strategies for High-Dimensional LLM Optimization in Software EngineeringComments: Accepted at MSR 2026 (Registered Reports Track)Subjects: Software Engineering (cs.SE)
Background/Context: Large Language Models (LLMs) demonstrate strong performance on low-dimensional software engineering optimization tasks ($\le$11 features) but consistently underperform on high-dimensional problems where Bayesian methods dominate. A fundamental gap exists in understanding how systematic integration of domain knowledge (whether from humans or automated reasoning) can bridge this divide.
Objective/Aim: We compare human versus artificial intelligence strategies for generating domain knowledge. We systematically evaluate four distinct architectures to determine if structured knowledge integration enables LLMs to generate effective warm starts for high-dimensional optimization.
Method: We evaluate four approaches on MOOT datasets stratified by dimensionality: (1) Human-in-the-Loop Domain Knowledge Prompting (H-DKP), utilizing asynchronous expert feedback loops; (2) Adaptive Multi-Stage Prompting (AMP), implementing sequential constraint identification and validation; (3) Dimension-Aware Progressive Refinement (DAPR), conducting optimization in progressively expanding feature subspaces; and (4) Hybrid Knowledge-Model Approach (HKMA), synthesizing statistical scouting (TPE) with RAG-enhanced prompting. Performance is quantified via Chebyshev distance to optimal solutions and ranked using Scott-Knott clustering against an established baseline for LLM generated warm starts.
Note that all human studies conducted as part of this study will comply with the policies of our local Institutional Review Board. - [154] arXiv:2602.02754 [pdf, ps, other]
-
Title: Deepfake Pornography is Resilient to Regulatory and Platform ShocksComments: 13 pages, 4 figures. Under submissionSubjects: Social and Information Networks (cs.SI)
Generative artificial intelligence tools have made it easier to create realistic, synthetic non-consensual explicit imagery (popularly known as deepfake pornography; hereinafter SNCEI) of people. Once created, this SNCEI is often shared on various websites, causing significant harm to victims. This emerging form of sexual abuse was recently criminalized in the US at the federal level by S.146, the TAKE IT DOWN Act. A week after the bill's passage became effectively imminent, the MrDeepfakes website -- one of the most notorious facilitators of SNCEI creation and dissemination -- shut down. Here, we explore the impact of the bill's passage and the subsequent shutdown as a compound intervention on the dissemination of SNCEI. We select three online forums where sexually explicit content is shared, each containing dedicated subforums to organize various types of sexually explicit content. By leveraging each forum's design, we compare activity in subforums dedicated to SNCEI with that in other pornographic genres using a synthetic control, quasi-experimental approach. Across websites, we observed an increase in the sharing and requests for SNCEI, and, in some cases, in new contributors. These results indicate that the compound intervention did not suppress SNCEI activity overall but instead coincided with its redistribution across platforms, with substantial heterogeneity in timing and magnitude. Together, our findings suggest that deplatforming and regulatory signals alone may shift where and when SNCEI is produced and shared, rather than reducing its prevalence.
- [155] arXiv:2602.02760 [pdf, ps, other]
-
Title: From Task Solving to Robust Real-World Adaptation in LLM AgentsSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Large language models are increasingly deployed as specialized agents that plan, call tools, and take actions over extended horizons. Yet many existing evaluations assume a "clean interface" where dynamics are specified and stable, tools and sensors are reliable, and success is captured by a single explicit objective-often overestimating real-world readiness. In practice, agents face underspecified rules, unreliable signals, shifting environments, and implicit, multi-stakeholder goals. The challenge is therefore not just solving tasks, but adapting while solving: deciding what to trust, what is wanted, when to verify, and when to fall back or escalate. We stress-test deployment-relevant robustness under four operational circumstances: partial observability, dynamic environments, noisy signals, and dynamic agent state. We benchmark agentic LLMs in a grid-based game with a simple goal but long-horizon execution. Episodes violate clean-interface assumptions yet remain solvable, forcing agents to infer rules, pay for information, adapt to environmental and internal shifts, and act cautiously under noise. Across five state-of-the-art LLM agents, we find large gaps between nominal task-solving and deployment-like robustness. Performance generally degrades as grid size and horizon increase, but rankings are unstable: weaker models can beat stronger ones when strategy matches the uncertainty regime. Despite no explicit instruction, agents trade off completion, efficiency, and penalty avoidance, suggesting partial objective inference. Ablations and feature analyses reveal model-specific sensitivities and failure drivers, motivating work on verification, safe action selection, and objective inference under partial observability, noise, and non-stationarity.
- [156] arXiv:2602.02762 [pdf, ps, other]
-
Title: On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation LearningSubjects: Machine Learning (cs.LG)
Semi-supervised imitation learning (SSIL) consists in learning a policy from a small dataset of action-labeled trajectories and a much larger dataset of action-free trajectories. Some SSIL methods learn an inverse dynamics model (IDM) to predict the action from the current state and the next state. An IDM can act as a policy when paired with a video model (VM-IDM) or as a label generator to perform behavior cloning on action-free data (IDM labeling). In this work, we first show that VM-IDM and IDM labeling learn the same policy in a limit case, which we call the IDM-based policy. We then argue that the previously observed advantage of IDM-based policies over behavior cloning is due to the superior sample efficiency of IDM learning, which we attribute to two causes: (i) the ground-truth IDM tends to be contained in a lower complexity hypothesis class relative to the expert policy, and (ii) the ground-truth IDM is often less stochastic than the expert policy. We argue these claims based on insights from statistical learning theory and novel experiments, including a study of IDM-based policies using recent architectures for unified video-action prediction (UVA). Motivated by these insights, we finally propose an improved version of the existing LAPO algorithm for latent action policy learning.
- [157] arXiv:2602.02763 [pdf, ps, other]
-
Title: Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target AttacksSubjects: Machine Learning (cs.LG)
Interpretable time series deep learning systems are often assessed by checking temporal consistency on explanations, implicitly treating this as evidence of robustness. We show that this assumption can fail: Predictions and explanations can be adversarially decoupled, enabling targeted misclassification while the explanation remains plausible and consistent with a chosen reference rationale. We propose TSEF (Time Series Explanation Fooler), a dual-target attack that jointly manipulates the classifier and explainer outputs. In contrast to single-objective misclassification attacks that disrupt explanation and spread attribution mass broadly, TSEF achieves targeted prediction changes while keeping explanations consistent with the reference. Across multiple datasets and explainer backbones, our results consistently reveal that explanation stability is a misleading proxy for decision robustness and motivate coupling-aware robustness evaluations for trustworthy time series tasks.
- [158] arXiv:2602.02765 [pdf, ps, other]
-
Title: SVD-ViT: Does SVD Make Vision Transformers Attend More to the Foreground?Comments: 8 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Vision Transformers (ViT) have been established as large-scale foundation models. However, because self-attention operates globally, they lack an explicit mechanism to distinguish foreground from background. As a result, ViT may learn unnecessary background features and artifacts, leading to degraded classification performance. To address this issue, we propose SVD-ViT, which leverages singular value decomposition (SVD) to prioritize the learning of foreground features. SVD-ViT consists of three components-\textbf{SPC module}, \textbf{SSVA}, and \textbf{ID-RSVD}-and suppresses task-irrelevant factors such as background noise and artifacts by extracting and aggregating singular vectors that capture object foreground information. Experimental results demonstrate that our method improves classification accuracy and effectively learns informative foreground representations while reducing the impact of background noise.
- [159] arXiv:2602.02766 [pdf, ps, other]
-
Title: Privately Fine-Tuned LLMs Preserve Temporal Dynamics in Tabular DataSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Research on differentially private synthetic tabular data has largely focused on independent and identically distributed rows where each record corresponds to a unique individual. This perspective neglects the temporal complexity in longitudinal datasets, such as electronic health records, where a user contributes an entire (sub) table of sequential events. While practitioners might attempt to model such data by flattening user histories into high-dimensional vectors for use with standard marginal-based mechanisms, we demonstrate that this strategy is insufficient. Flattening fails to preserve temporal coherence even when it maintains valid marginal distributions. We introduce PATH, a novel generative framework that treats the full table as the unit of synthesis and leverages the autoregressive capabilities of privately fine-tuned large language models. Extensive evaluations show that PATH effectively captures long-range dependencies that traditional methods miss. Empirically, our method reduces the distributional distance to real trajectories by over 60% and reduces state transition errors by nearly 50% compared to leading marginal mechanisms while achieving similar marginal fidelity.
- [160] arXiv:2602.02767 [pdf, ps, other]
-
Title: Provable Effects of Data Replay in Continual Learning: A Feature Learning PerspectiveComments: AISTATS 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Continual learning (CL) aims to train models on a sequence of tasks while retaining performance on previously learned ones. A core challenge in this setting is catastrophic forgetting, where new learning interferes with past knowledge. Among various mitigation strategies, data-replay methods, where past samples are periodically revisited, are considered simple yet effective, especially when memory constraints are relaxed. However, the theoretical effectiveness of full data replay, where all past data is accessible during training, remains largely unexplored. In this paper, we present a comprehensive theoretical framework for analyzing full data-replay training in continual learning from a feature learning perspective. Adopting a multi-view data model, we identify the signal-to-noise ratio (SNR) as a critical factor affecting forgetting. Focusing on task-incremental binary classification across $M$ tasks, our analysis verifies two key conclusions: (1) forgetting can still occur under full replay when the cumulative noise from later tasks dominates the signal from earlier ones; and (2) with sufficient signal accumulation, data replay can recover earlier tasks-even if their initial learning was poor. Notably, we uncover a novel insight into task ordering: prioritizing higher-signal tasks not only facilitates learning of lower-signal tasks but also helps prevent catastrophic forgetting. We validate our theoretical findings through synthetic and real-world experiments that visualize the interplay between signal learning and noise memorization across varying SNRs and task correlation regimes.
- [161] arXiv:2602.02768 [pdf, ps, other]
-
Title: Rate-Distortion Analysis of Optically Passive Vision CompressionSubjects: Information Theory (cs.IT)
The use of remote vision sensors for autonomous decision-making poses the challenge of transmitting high-volume visual data over resource-constrained channels in real-time. In robotics and control applications, many systems can quickly destabilize, which can exacerbate the issue by necessitating higher sampling frequencies. This work proposes a novel sensing paradigm in which an event camera observes the optically generated cosine transform of a visual scene, enabling high-speed, computation-free video compression inspired by modern video codecs. In this study, we simulate this optically passive vision compression (OPVC) scheme and compare its rate-distortion performance to that of a standalone event camera (SAEC). We find that the rate-distortion performance of the OPVC scheme surpasses that of the SAEC and that this performance gap increases as the spatial resolution of the event camera increases.
- [162] arXiv:2602.02769 [pdf, ps, other]
-
Title: BiTimeCrossNet: Time-Aware Self-Supervised Learning for Pediatric SleepSubjects: Machine Learning (cs.LG)
We present BiTimeCrossNet (BTCNet), a multimodal self-supervised learning framework for long physiological recordings such as overnight sleep studies. While many existing approaches train on short segments treated as independent samples, BTCNet incorporates information about when each segment occurs within its parent recording, for example within a sleep session. BTCNet further learns pairwise interactions between physiological signals via cross-attention, without requiring task labels or sequence-level supervision.
We evaluate BTCNet on pediatric sleep data across six downstream tasks, including sleep staging, arousal detection, and respiratory event detection. Under frozen-backbone linear probing, BTCNet consistently outperforms an otherwise identical non-time-aware variant, with gains that generalize to an independent pediatric dataset. Compared to existing multimodal self-supervised sleep models, BTCNet achieves strong performance, particularly on respiration-related tasks. - [163] arXiv:2602.02773 [pdf, ps, other]
-
Title: Bimanual High-Density EMG Control for In-Home Mobile Manipulation by a User with QuadriplegiaComments: 14 pages, 17 figuresSubjects: Robotics (cs.RO)
Mobile manipulators in the home can enable people with cervical spinal cord injury (cSCI) to perform daily physical household tasks that they could not otherwise do themselves. However, paralysis in these users often limits access to traditional robot control interfaces such as joysticks or keyboards. In this work, we introduce and deploy the first system that enables a user with quadriplegia to control a mobile manipulator in their own home using bimanual high-density electromyography (HDEMG). We develop a pair of custom, fabric-integrated HDEMG forearm sleeves, worn on both arms, that capture residual neuromotor activity from clinically paralyzed degrees of freedom and support real-time gesture-based robot control. Second, by integrating vision, language, and motion planning modules, we introduce a shared autonomy framework that supports robust and user-driven teleoperation, with particular benefits for navigation-intensive tasks in home environments. Finally, to demonstrate the system in the wild, we present a twelve-day in-home user study evaluating real-time use of the wearable EMG interface for daily robot control. Together, these system components enable effective robot control for performing activities of daily living and other household tasks in a real home environment.
- [164] arXiv:2602.02774 [pdf, ps, other]
-
Title: AmharicStoryQA: A Multicultural Story Question Answering Benchmark in AmharicAuthors: Israel Abebe Azime, Abenezer Kebede Angamo, Hana Mekonen Tamiru, Dagnachew Mekonnen Marilign, Philipp Slusallek, Seid Muhie Yimam, Dietrich KlakowSubjects: Computation and Language (cs.CL)
With the growing emphasis on multilingual and cultural evaluation benchmarks for large language models, language and culture are often treated as synonymous, and performance is commonly used as a proxy for a models understanding of a given language. In this work, we argue that such evaluations overlook meaningful cultural variation that exists within a single language. We address this gap by focusing on narratives from different regions of Ethiopia and demonstrate that, despite shared linguistic characteristics, region-specific and domain-specific content substantially influences language evaluation outcomes. To this end, we introduce \textbf{\textit{AmharicStoryQA}}, a long-sequence story question answering benchmark grounded in culturally diverse narratives from Amharic-speaking regions. Using this benchmark, we reveal a significant narrative understanding gap in existing LLMs, highlight pronounced regional differences in evaluation results, and show that supervised fine-tuning yields uneven improvements across regions and evaluation settings. Our findings emphasize the need for culturally grounded benchmarks that go beyond language-level evaluation to more accurately assess and improve narrative understanding in low-resource languages.
- [165] arXiv:2602.02776 [pdf, ps, other]
-
Title: VerIde ECG Biometrics: Verification and IdentificationAuthors: Scagnetto ArjunaSubjects: Machine Learning (cs.LG)
This work studies electrocardiogram (ECG) biometrics at large scale, evaluating how strongly an ECG can be linked to an individual and, consequently, how its anonymization may be compromised. We show that identity information is already present in tabular representations (fiducial features): even a simple MLP-based embedding network yields non-trivial performance, indicating that anonymization based solely on releasing features does not guarantee privacy. We then adopt embedding-based deep learning models (ArcFace), first on features and then on ECG waveforms, showing a performance jump when moving from tabular inputs to waveforms, and a further gain with larger training sets and consistent normalization across train/val/test. On a large-scale test set, verification achieves high TAR at strict FAR thresholds (TAR=0.908 @ FAR=1e-3; TAR=0.820 @ FAR=1e-4) with EER=2.53% (all-vs-all); closed-set identification yields Rank@1=0.812 and Rank@10=0.910. In open-set, a two-stage pipeline (top-K shortlist on embeddings + re-ranking) reaches DIR@FAR up to 0.976 at FAR=1e-3 and 1e-4. Overall, the results show that ECG carries a measurable individual signature: re-identification is already possible with tabular features and is further amplified by embedding-based models, making privacy implications and realistic operational protocols essential to consider.
- [166] arXiv:2602.02779 [pdf, ps, other]
-
Title: Comparison of Trefftz-Based PINNs and Standard PINNs Focusing on Structure PreservationAuthors: Koji KoyamadaSubjects: Numerical Analysis (math.NA)
In this study, we investigate the capability of physics-informed neural networks (PINNs) to preserve global physical structures by comparing standard PINNs with a Trefftz-based PINN (Trefftz-PINN). The target problem is the reproduction of mag-netic field-line structures in a helical fusion reactor configuration. Using identical training data sampled from exact solutions, we perform comparisons under matched mean squared error (MSE) levels. Visualization of magnetic field lines reveals that standard PINNs may exhibit structural collapse across magnetic surfaces even when the MSE is sufficiently small, whereas Trefftz-PINNs successfully preserve the global topology of magnetic field lines. Furthermore, the proposed framework is extended to computational fluid dynamics (CFD) problems, where streamline structures of veloc-ity fields are analyzed. Similar tendencies are observed, demonstrating that Trefftz-PINNs provide superior structure preservation compared to standard PINNs. These results indicate that minimizing numerical error alone does not guarantee physical consistency, and that constraining the solution space prior to learning is an effective strategy for physics-consistent surrogate modeling.
- [167] arXiv:2602.02780 [pdf, ps, other]
-
Title: Scaling-Aware Adapter for Structure-Grounded LLM ReasoningComments: Under review at ICML 2026Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) are enabling reasoning over biomolecular structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query connectors. Such architectures either omit the geometric groundings requisite for mitigating structural hallucinations or impose inflexible modality fusion bottlenecks that concurrently over-compress and suboptimally allocate structural tokens, thereby impeding the realization of generalized all-atom reasoning. We introduce Cuttlefish, a unified all-atom LLM that grounds language reasoning in geometric cues while scaling modality tokens with structural complexity. First, Scaling-Aware Patching leverages an instruction-conditioned gating mechanism to generate variable-size patches over structural graphs, adaptively scaling the query token budget with structural complexity to mitigate fixed-length connector bottlenecks. Second, Geometry Grounding Adapter refines these adaptive tokens via cross-attention to modality embeddings and injects the resulting modality tokens into the LLM, exposing explicit geometric cues to reduce structural hallucination. Experiments across diverse all-atom benchmarks demonstrate that Cuttlefish achieves superior performance in heterogeneous structure-grounded reasoning. Code is available at the project repository.
- [168] arXiv:2602.02781 [pdf, ps, other]
-
Title: Evaluating False Alarm and Missing Attacks in CAN IDSComments: 8 pages, 2 figures, and 8 tablesSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Modern vehicles rely on electronic control units (ECUs) interconnected through the Controller Area Network (CAN), making in-vehicle communication a critical security concern. Machine learning (ML)-based intrusion detection systems (IDS) are increasingly deployed to protect CAN traffic, yet their robustness against adversarial manipulation remains largely unexplored. We present a systematic adversarial evaluation of CAN IDS using the ROAD dataset, comparing four shallow learning models with a deep neural network-based detector. Using protocol-compliant, payload-level perturbations generated via FGSM, BIM and PGD, we evaluate adversarial effects on both benign and malicious CAN frames. While all models achieve strong baseline performance under benign conditions, adversarial perturbations reveal substantial vulnerabilities. Although shallow and deep models are robust to false-alarm induction, with the deep neural network (DNN) performing best on benign traffic, all architectures suffer significant increases in missed attacks. Notably, under gradient-based attacks, the shallow model extra trees (ET) demonstrates improved robustness to missed-attack induction compared to the other models. Our results demonstrate that adversarial manipulation can simultaneously trigger false alarms and evade detection, underscoring the need for adversarial robustness evaluation in safety-critical automotive IDS.
- [169] arXiv:2602.02784 [pdf, ps, other]
-
Title: Cross-Temporal Attention Fusion (CTAF) for Multimodal Physiological Signals in Self-Supervised LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We study multimodal affect modeling when EEG and peripheral physiology are asynchronous, which most fusion methods ignore or handle with costly warping. We propose Cross-Temporal Attention Fusion (CTAF), a self-supervised module that learns soft bidirectional alignments between modalities and builds a robust clip embedding using time-aware cross attention, a lightweight fusion gate, and alignment-regularized contrastive objectives with optional weak supervision. On the K-EmoCon dataset, under leave-one-out cross-validation evaluation, CTAF yields higher cosine margins for matched pairs and better cross-modal token retrieval within one second, and it is competitive with the baseline on three-bin accuracy and macro-F1 while using few labels. Our contributions are a time-aware fusion mechanism that directly models correspondence, an alignment-driven self-supervised objective tailored to EEG and physiology, and an evaluation protocol that measures alignment quality itself. Our approach accounts for the coupling between the central and autonomic nervous systems in psychophysiological time series. These results indicate that CTAF is a strong step toward label-efficient, generalizable EEG-peripheral fusion under temporal asynchrony.
- [170] arXiv:2602.02785 [pdf, ps, other]
-
Title: Smell with Genji: Rediscovering Human Perception through an Olfactory Game with AIAuthors: Awu Chen (MIT Media Lab), Vera Yu Wu (MIT Media Lab), Yunge Wen (New York University), Yaluo Wang (Harvard University), Jiaxuan Olivia Yin (Individual Researcher), Yichen Wang (Harvard University), Qian Xiang (Harvard University), Richard Zhang (MIT Media Lab), Paul Pu Liang (MIT Media Lab), Hiroshi Ishii (MIT Media Lab)Subjects: Human-Computer Interaction (cs.HC)
Olfaction plays an important role in human perception, yet its subjective and ephemeral nature makes it difficult to articulate, compare, and share across individuals. Traditional practices like the Japanese incense game Genji-ko offer one way to structure olfactory experience through shared interpretation. In this work, we present Smell with Genji, an AI-mediated olfactory interaction system that reinterprets Genji-ko as a collaborative human-AI sensory experience. By integrating a game setup, a mobile application, and an LLM-powered co-smelling partner equipped with olfactory sensing and LLM-based conversation, the system invites participants to compare scents and construct Genji-mon patterns, fostering reflection through a dialogue that highlights the alignment and discrepancies between human and machine perception. This work illustrates how sensing-enabled AI can participate in olfactory experience alongside users, pointing toward new possibilities for AI-supported sensory interaction and reflection in HCI.
- [171] arXiv:2602.02786 [pdf, ps, other]
-
Title: LEMON: Local Explanations via Modality-aware OptimizatioNAuthors: Yu Qin, Phillip Sloan, Raul Santos-Rodriguez, Majid Mirmehdi, Telmo de Menezes e Silva FilhoSubjects: Machine Learning (cs.LG)
Multimodal models are ubiquitous, yet existing explainability methods are often single-modal, architecture-dependent, or too computationally expensive to run at scale. We introduce LEMON (Local Explanations via Modality-aware OptimizatioN), a model-agnostic framework for local explanations of multimodal predictions. LEMON fits a single modality-aware surrogate with group-structured sparsity to produce unified explanations that disentangle modality-level contributions and feature-level attributions. The approach treats the predictor as a black box and is computationally efficient, requiring relatively few forward passes while remaining faithful under repeated perturbations. We evaluate LEMON on vision-language question answering and a clinical prediction task with image, text, and tabular inputs, comparing against representative multimodal baselines. Across backbones, LEMON achieves competitive deletion-based faithfulness while reducing black-box evaluations by 35-67 times and runtime by 2-8 times compared to strong multimodal baselines.
- [172] arXiv:2602.02787 [pdf, ps, other]
-
Title: Real-World Applications of AI in LTE and 5G-NR Network InfrastructureComments: 6 pages and 3 figuresSubjects: Networking and Internet Architecture (cs.NI)
Telecommunications networks generate extensive performance and environmental telemetry, yet most LTE and 5G-NR deployments still rely on static, manually engineered configurations. This limits adaptability in rural, nomadic, and bandwidth-constrained environments where traffic distributions, propagation characteristics, and user behavior fluctuate rapidly. Artificial Intelligence (AI), more specifically Machine Learning (ML) models, provide new opportunities to transition Radio Access Networks (RANs) from rigid, rule-based systems toward adaptive, self-optimizing infrastructures that can respond autonomously to these dynamics. This paper proposes a practical architecture incorporating AI-assisted planning, reinforcement-learning-based RAN optimization, real-time telemetry analytics, and digital-twin-based validation. In parallel, the paper addresses the challenge of delivering embodied-AI healthcare services, educational tools, and large language model (LLM) applications to communities with insufficient backhaul for cloud computing. We introduce an edge-hosted execution model in which applications run directly on LTE/5G-NR base stations using containers, reducing latency and bandwidth consumption while improving resilience. Together, these contributions demonstrate how AI can enhance network performance, reduce operational overhead, and expand access to advanced digital services, aligning with broader goals of sustainable and inclusive network development.
- [173] arXiv:2602.02788 [pdf, ps, other]
-
Title: Structure-Preserving Learning Improves Geometry Generalization in Neural PDEsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
We aim to develop physics foundation models for science and engineering that provide real-time solutions to Partial Differential Equations (PDEs) which preserve structure and accuracy under adaptation to unseen geometries. To this end, we introduce General-Geometry Neural Whitney Forms (Geo-NeW): a data-driven finite element method. We jointly learn a differential operator and compatible reduced finite element spaces defined on the underlying geometry. The resulting model is solved to generate predictions, while exactly preserving physical conservation laws through Finite Element Exterior Calculus. Geometry enters the model as a discretized mesh both through a transformer-based encoding and as the basis for the learned finite element spaces. This explicitly connects the underlying geometry and imposed boundary conditions to the solution, providing a powerful inductive bias for learning neural PDEs, which we demonstrate improves generalization to unseen domains. We provide a novel parameterization of the constitutive model ensuring the existence and uniqueness of the solution. Our approach demonstrates state-of-the-art performance on several steady-state PDE benchmarks, and provides a significant improvement over conventional baselines on out-of-distribution geometries.
- [174] arXiv:2602.02790 [pdf, ps, other]
-
Title: Simulating Human Audiovisual Search BehaviorComments: 17 pages, 10 figures, CHI 2026Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Locating a target based on auditory and visual cues$\unicode{x2013}$such as finding a car in a crowded parking lot or identifying a speaker in a virtual meeting$\unicode{x2013}$requires balancing effort, time, and accuracy under uncertainty. Existing models of audiovisual search often treat perception and action in isolation, overlooking how people adaptively coordinate movement and sensory strategies. We present Sensonaut, a computational model of embodied audiovisual search. The core assumption is that people deploy their body and sensory systems in ways they believe will most efficiently improve their chances of locating a target, trading off time and effort under perceptual constraints. Our model formulates this as a resource-rational decision-making problem under partial observability. We validate the model against newly collected human data, showing that it reproduces both adaptive scaling of search time and effort under task complexity, occlusion, and distraction, and characteristic human errors. Our simulation of human-like resource-rational search informs the design of audiovisual interfaces that minimize search cost and cognitive load.
- [175] arXiv:2602.02793 [pdf, ps, other]
-
Title: Causality--Δ: Jacobian-Based Dependency Analysis in Flow Matching ModelsAuthors: Reza Rezvan (1), Gustav Gille (1), Moritz Schauer (1 and 2), Richard Torkar (1 and 2) ((1) Chalmers University of Technology, (2) University of Gothenburg)Comments: 11 pages, 5 figures. Code: this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Flow matching learns a velocity field that transports a base distribution to data. We study how small latent perturbations propagate through these flows and show that Jacobian-vector products (JVPs) provide a practical lens on dependency structure in the generated features. We derive closed-form expressions for the optimal drift and its Jacobian in Gaussian and mixture-of-Gaussian settings, revealing that even globally nonlinear flows admit local affine structure. In low-dimensional synthetic benchmarks, numerical JVPs recover the analytical Jacobians. In image domains, composing the flow with an attribute classifier yields an attribute-level JVP estimator that recovers empirical correlations on MNIST and CelebA. Conditioning on small classifier-Jacobian norms reduces correlations in a way consistent with a hypothesized common-cause structure, while we emphasize that this conditioning is not a formal do intervention.
- [176] arXiv:2602.02794 [pdf, ps, other]
-
Title: Reshaping Perception Through Technology: From Ancient Script to Large Language ModelsComments: 14 pages, 0 figuresSubjects: Computers and Society (cs.CY)
Large language models are reshaping how we create and access information, yet we typically view perception as merely reactive to stimuli, overlooking how the physical qualities of different media uniquely shape cognition. Drawing on Marshall McLuhan's insight that the medium is the massage, we trace a lineage of technologies -- from DNA and the nervous system to language, writing, music, and now LLMs -- that mold perception in distinct ways. We observe that as technologies become more advanced and decoupled from our physiology, they introduce both greater creative potential and greater risk: they enable more efficient play, storage, and transmission, while also introducing artificiality and the potential for inauthenticity and manipulation. This tension is particularly acute with LLMs, which allow rapid, playful generation of content increasingly indistinguishable from human-created work. Noting that humans have a recurring tendency to project intelligence onto novel technologies (a pattern visible in ancient responses to writing), we argue that AI should be framed not as a competitor but as a medium that reshapes perceptual skills and enables new forms of creativity.
- [177] arXiv:2602.02799 [pdf, ps, other]
-
Title: Joint Learning of Hierarchical Neural Options and Abstract World ModelSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Building agents that can perform new skills by composing existing skills is a long-standing goal of AI agent research. Towards this end, we investigate how to efficiently acquire a sequence of skills, formalized as hierarchical neural options. However, existing model-free hierarchical reinforcement algorithms need a lot of data. We propose a novel method, which we call AgentOWL (Option and World model Learning Agent), that jointly learns -- in a sample efficient way -- an abstract world model (abstracting across both states and time) and a set of hierarchical neural options. We show, on a subset of Object-Centric Atari games, that our method can learn more skills using much less data than baseline methods.
- [178] arXiv:2602.02808 [pdf, ps, other]
-
Title: LmPT: Conditional Point Transformer for Anatomical Landmark Detection on 3D Point CloudsAuthors: Matteo Bastico, Pierre Onghena, David Ryckelynck, Beatriz Marcotegui, Santiago Velasco-Forero, Laurent Corté, Caroline Robine--Decourcelle, Etienne DecencièreComments: This paper has been accepted at International Symposium on Biomedical Imaging (ISBI) 2026Journal-ref: 2026 IEEE International Symposium on Biomedical Imaging (ISBI)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Accurate identification of anatomical landmarks is crucial for various medical applications. Traditional manual landmarking is time-consuming and prone to inter-observer variability, while rule-based methods are often tailored to specific geometries or limited sets of landmarks. In recent years, anatomical surfaces have been effectively represented as point clouds, which are lightweight structures composed of spatial coordinates. Following this strategy and to overcome the limitations of existing landmarking techniques, we propose Landmark Point Transformer (LmPT), a method for automatic anatomical landmark detection on point clouds that can leverage homologous bones from different species for translational research. The LmPT model incorporates a conditioning mechanism that enables adaptability to different input types to conduct cross-species learning. We focus the evaluation of our approach on femoral landmarking using both human and newly annotated dog femurs, demonstrating its generalization and effectiveness across species. The code and dog femur dataset will be publicly available at: https://github.com/Pierreoo/LandmarkPointTransformer.
- [179] arXiv:2602.02811 [pdf, ps, other]
-
Title: Efficient Counterfactual Estimation of Conditional Greeks via Malliavin-based Weak DerivativesSubjects: Computational Engineering, Finance, and Science (cs.CE)
We study counterfactual gradient estimation of conditional loss functionals of diffusion processes. In quantitative finance, these gradients are known as conditional Greeks: the sensitivity of expected market values, conditioned on some event of interest. The difficulty is that when the conditioning event has vanishing or zero probability, naive Monte Carlo estimators are prohibitively inefficient; kernel smoothing, though common, suffers from slow convergence. We propose a two-stage kernel-free methodology. First, we show using Malliavin calculus that the conditional loss functional of a diffusion process admits an exact representation as a Skorohod integral, yielding classical Monte-Carlo estimator variance and convergence rates. Second, we establish that a weak derivative estimate of the conditional loss functional with respect to model parameters can be evaluated algorithmically with constant variance, in contrast to the widely used score function method whose variance grows linearly in the sample path length. Together, these results yield an efficient framework for counterfactual conditional stochastic gradient algorithms and financial Greek computations in rare-event regimes.
- [180] arXiv:2602.02819 [pdf, ps, other]
-
Title: Membership Inference Attacks from Causal PrinciplesAuthors: Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien BelletSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Membership Inference Attacks (MIAs) are widely used to quantify training data memorization and assess privacy risks. Standard evaluation requires repeated retraining, which is computationally costly for large models. One-run methods (single training with randomized data inclusion) and zero-run methods (post hoc evaluation) are often used instead, though their statistical validity remains unclear. To address this gap, we frame MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a data point in the training set. This novel formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations popular for LLMs are confounded by non-random membership assignment. We derive causal analogues of standard MIA metrics and propose practical estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. Experiments on real-world data show that our approach enables reliable memorization measurement even when retraining is impractical and under distribution shift, providing a principled foundation for privacy evaluation in modern AI systems.
- [181] arXiv:2602.02820 [pdf, ps, other]
-
Title: From Tokens to Numbers: Continuous Number Modeling for SVG GenerationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
For certain image generation tasks, vector graphics such as Scalable Vector Graphics (SVGs) offer clear benefits such as increased flexibility, size efficiency, and editing ease, but remain less explored than raster-based approaches. A core challenge is that the numerical, geometric parameters, which make up a large proportion of SVGs, are inefficiently encoded as long sequences of tokens. This slows training, reduces accuracy, and hurts generalization. To address these problems, we propose Continuous Number Modeling (CNM), an approach that directly models numbers as first-class, continuous values rather than discrete tokens. This formulation restores the mathematical elegance of the representation by aligning the model's inputs with the data's continuous nature, removing discretization artifacts introduced by token-based encoding. We then train a multimodal transformer on 2 million raster-to-SVG samples, followed by fine-tuning via reinforcement learning using perceptual feedback to further improve visual quality. Our approach improves training speed by over 30% while maintaining higher perceptual fidelity compared to alternative approaches. This work establishes CNM as a practical and efficient approach for high-quality vector generation, with potential for broader applications. We make our code available this http URL
- [182] arXiv:2602.02821 [pdf, ps, other]
-
Title: When Efficient Communication Explains ConvexitySubjects: Computation and Language (cs.CL); Information Theory (cs.IT)
Much recent work has argued that the variation in the languages of the world can be explained from the perspective of efficient communication; in particular, languages can be seen as optimally balancing competing pressures to be simple and to be informative. Focusing on the expression of meaning -- semantic typology -- the present paper asks what factors are responsible for successful explanations in terms of efficient communication. Using the Information Bottleneck (IB) approach to formalizing this trade-off, we first demonstrate and analyze a correlation between optimality in the IB sense and a novel generalization of convexity to this setting. In a second experiment, we manipulate various modeling parameters in the IB framework to determine which factors drive the correlation between convexity and optimality. We find that the convexity of the communicative need distribution plays an especially important role. These results move beyond showing that efficient communication can explain aspects of semantic typology into explanations for why that is the case by identifying which underlying factors are responsible.
- [183] arXiv:2602.02822 [pdf, ps, other]
-
Title: A Classical Linear $λ$-Calculus based on ContrapositionSubjects: Logic in Computer Science (cs.LO)
We present a novel linear $\lambda$-calculus for Classical Multiplicative Exponential Linear Logic (\MELL) along the lines of the propositions-as-types paradigm. Starting from the standard term assignment for Intuitionistic Multiplicative Linear Logic (IMLL), we observe that if we incorporate linear negation, its involutive nature implies that both $A\multimap B$ and $B^\perp\multimap A^\perp$ should have the same proofs. The introduction of a linear modus tollens rule, stating that from $B^\perp\multimap A^\perp$ and $A$ we may conclude $B$, allows one to recover classical MLL. Furthermore, a term assignment for this elimination rule, {the study of proof normalization in a $\lambda$-calculus with this elimination rule} prompts us to define the novel notion of contra-substitution $t\cos{a}s$. Introduced alongside linear substitution, contra-substitution denotes the term that results from ``grabbing'' the unique occurrence of $a$ in $t$ and ``pulling'' from it, in order to turn the term $t$ inside out (much like a sock) and then replacing $a$ with $s$. We call the one-sided natural deduction presentation of classical MLL, the $\lambda_{\rm MLL}$-calculus. Guided by the behavior of contra-substitution in the presence of the exponentials, we extend it to a similar presentation for MELL. We prove that this calculus is sound and complete with respect to MELL and that it satisfies the standard properties of a typed programming language: subject reduction, confluence and strong normalization. Moreover, we show that several well-known term assignments for classical logic can be encoded in $\lambda_{\rm MLL}$.
- [184] arXiv:2602.02823 [pdf, ps, other]
-
Title: R2-Router: A New Paradigm for LLM Routing with ReasoningSubjects: Computation and Language (cs.CL)
As LLMs proliferate with diverse capabilities and costs, LLM routing has emerged by learning to predict each LLM's quality and cost for a given query, then selecting the one with high quality and low cost. However, existing routers implicitly assume a single fixed quality and cost per LLM for each query, ignoring that the same LLM's quality varies with its output length. This causes routers to exclude powerful LLMs when their estimated cost exceeds the budget, missing the opportunity that these LLMs could still deliver high quality at reduced cost with shorter outputs. To address this, we introduce R2-Router, which treats output length budget as a controllable variable and jointly selects the best LLM and length budget, enforcing the budget via length-constrained instructions. This enables R2-Router to discover that a powerful LLM with constrained output can outperform a weaker LLM at comparable cost-efficient configurations invisible to prior methods. Together with the router framework, we construct R2-Bench, the first routing dataset capturing LLM behavior across diverse output length budgets. Experiments show that R2-Router achieves state-of-the-art performance at 4-5x lower cost compared with existing routers. This work opens a new direction: routing as reasoning, where routers evolve from reactive selectors to deliberate reasoners that explore which LLM to use and at what cost budget.
- [185] arXiv:2602.02824 [pdf, ps, other]
-
Title: CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference AlignmentSubjects: Computation and Language (cs.CL)
Pretrained knowledge memorized in LLMs raises critical concerns over safety and privacy, which has motivated LLM Unlearning as a technique for selectively removing the influences of undesirable knowledge. Existing approaches, rooted in Gradient Ascent (GA), often degrade general domain knowledge while relying on retention data or curated contrastive pairs, which can be either impractical or data and computationally prohibitive. Negative Preference Alignment has been explored for unlearning to tackle the limitations of GA, which, however, remains confined by its choice of reference model and shows undermined performance in realistic data settings. These limitations raise two key questions: i) Can we achieve effective unlearning that quantifies model confidence in undesirable knowledge and uses it to calibrate gradient updates more precisely, thus reducing catastrophic forgetting? ii) Can we make unlearning robust to data scarcity and length variation? We answer both questions affirmatively with CATNIP (Calibrated and Tokenized Negative Preference Alignment), a principled method that rescales unlearning effects in proportion to the model's token-level confidence, thus ensuring fine-grained control over forgetting. Extensive evaluations on MUSE and WMDP benchmarks demonstrated that our work enables effective unlearning without requiring retention data or contrastive unlearning response pairs, with stronger knowledge forgetting and preservation tradeoffs than state-of-the-art methods.
- [186] arXiv:2602.02827 [pdf, ps, other]
-
Title: Col-Bandit: Zero-Shot Query-Time Pruning for Late-Interaction RetrievalSubjects: Information Retrieval (cs.IR)
Multi-vector late-interaction retrievers such as ColBERT achieve state-of-the-art retrieval quality, but their query-time cost is dominated by exhaustively computing token-level MaxSim interactions for every candidate document. While approximating late interaction with single-vector representations reduces cost, it often incurs substantial accuracy loss. We introduce Col-Bandit, a query-time pruning algorithm that reduces this computational burden by casting reranking as a finite-population Top-$K$ identification problem. Col-Bandit maintains uncertainty-aware bounds over partially observed document scores and adaptively reveals only the (document, query token) MaxSim entries needed to determine the top results under statistical decision bounds with a tunable relaxation. Unlike coarse-grained approaches that prune entire documents or tokens offline, Col-Bandit sparsifies the interaction matrix on the fly. It operates as a zero-shot, drop-in layer over standard multi-vector systems, requiring no index modifications, offline preprocessing, or model retraining. Experiments on textual (BEIR) and multimodal (REAL-MM-RAG) benchmarks show that Col-Bandit preserves ranking fidelity while reducing MaxSim FLOPs by up to 5$\times$, indicating that dense late-interaction scoring contains substantial redundancy that can be identified and pruned efficiently at query time.
- [187] arXiv:2602.02828 [pdf, ps, other]
-
Title: A Single Revision Step Improves Token-Efficient LLM ReasoningSubjects: Machine Learning (cs.LG)
Large language models (LLMs) achieve higher accuracy on challenging reasoning tasks by scaling test-time compute through multiple trajectory sampling. However, standard aggregation methods like majority voting or individual confidence-based filtering face a fundamental "blind spot": they evaluate each trace in isolation. As problems scale in difficulty, models often generate hallucinated paths that exhibit misleadingly high confidence, causing the true solution to be suppressed by a narrow margin in traditional voting. We ask: can we enable traces to "peer-review" each other to resolve these near-miss errors?
We introduce Packet-Conditioned Revision (PACER), a training-free, inference-only framework that enables reasoning traces to revise their conclusions through a structured coordination step. After a preliminary screening of generated traces, PACER constructs a compact consensus packet containing (i) unique candidate answers, (ii) their aggregated confidence scores, and (iii) representative reasoning summaries for each candidate answer. Individual traces then perform a targeted self-review conditioned on this packet, allowing them to identify specific logical junctions where they diverged from the broader consensus and pivot if their original reasoning is found to be flawed. Final predictions are obtained via confidence-weighted voting over these revised trajectories. On challenging competitive math benchmarks such as AIME and BRUMO, PACER matches or exceeds the accuracy of 256-sample majority voting, significantly outperforming raw ensemble baselines by transforming simple consensus into a collaborative logical refinement process. - [188] arXiv:2602.02830 [pdf, ps, other]
-
Title: SC3D: Dynamic and Differentiable Causal Discovery for Temporal and Instantaneous GraphsComments: 8 pagesSubjects: Machine Learning (cs.LG); Methodology (stat.ME)
Discovering causal structures from multivariate time series is a key problem because interactions span across multiple lags and possibly involve instantaneous dependencies. Additionally, the search space of the dynamic graphs is combinatorial in nature. In this study, we propose \textit{Stable Causal Dynamic Differentiable Discovery (SC3D)}, a two-stage differentiable framework that jointly learns lag-specific adjacency matrices and, if present, an instantaneous directed acyclic graph (DAG). In Stage 1, SC3D performs edge preselection through node-wise prediction to obtain masks for lagged and instantaneous edges, whereas Stage 2 refines these masks by optimizing a likelihood with sparsity along with enforcing acyclicity on the instantaneous block. Numerical results across synthetic and benchmark dynamical systems demonstrate that SC3D achieves improved stability and more accurate recovery of both lagged and instantaneous causal structures compared to existing temporal baselines.
- [189] arXiv:2602.02831 [pdf, ps, other]
-
Title: Adaptive Linear Path Model-Based DiffusionComments: ICRA 2026Subjects: Robotics (cs.RO)
The interest in combining model-based control approaches with diffusion models has been growing. Although we have seen many impressive robotic control results in difficult tasks, the performance of diffusion models is highly sensitive to the choice of scheduling parameters, making parameter tuning one of the most critical challenges. We introduce Linear Path Model-Based Diffusion (LP-MBD), which replaces the variance-preserving schedule with a flow-matching-inspired linear probability path. This yields a geometrically interpretable and decoupled parameterization that reduces tuning complexity and provides a stable foundation for adaptation. Building on this, we propose Adaptive LP-MBD (ALP-MBD), which leverages reinforcement learning to adjust diffusion steps and noise levels according to task complexity and environmental conditions. Across numerical studies, Brax benchmarks, and mobile-robot trajectory tracking, LP-MBD simplifies scheduling while maintaining strong performance, and ALP-MBD further improves robustness, adaptability, and real-time efficiency.
- [190] arXiv:2602.02832 [pdf, ps, other]
-
Title: Koopman Autoencoders with Continuous-Time Latent Dynamics for Fluid Dynamics ForecastingSubjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
Data-driven surrogate models have emerged as powerful tools for accelerating the simulation of turbulent flows. However, classical approaches which perform autoregressive rollouts often trade off between strong short-term accuracy and long-horizon stability. Koopman autoencoders, inspired by Koopman operator theory, provide a physics-based alternative by mapping nonlinear dynamics into a latent space where linear evolution is conducted. In practice, most existing formulations operate in a discrete-time setting, limiting temporal flexibility. In this work, we introduce a continuous-time Koopman framework that models latent evolution through numerical integration schemes. By allowing variable timesteps at inference, the method demonstrates robustness to temporal resolution and generalizes beyond training regimes. In addition, the learned dynamics closely adhere to the analytical matrix exponential solution, enabling efficient long-horizon forecasting. We evaluate the approach on classical CFD benchmarks and report accuracy, stability, and extrapolation properties.
- [191] arXiv:2602.02834 [pdf, ps, other]
-
Title: Tabula RASA: Exposing and Breaking the Relational Bottleneck in TransformersComments: 16 pages, 4 figures, 8 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Transformers achieve remarkable performance across many domains, yet struggle with tasks requiring multi-hop relational reasoning over structured data. We analyze this limitation through circuit complexity: standard transformers are $\mathsf{TC}^0$-complete and require $\Omega(k)$ layers for $k$-hop reasoning. We introduce RASA (Relation-Aware Sparse Attention), a minimal modification adding: (1) edge-type embeddings that inject relational structure into attention scores, and (2) sparse masking that restricts attention to graph-adjacent positions. While RASA has the same asymptotic depth requirements, sparse masking reduces the attention search space from $O(2^{n^2})$ to $O(2^m)$ patterns, and edge biases provide explicit relation routing. Empirically, on MetaQA (1/2/3-hop) and WebQuestionsSP, RASA outperforms standard transformers and matches GPT-4 at lower cost, with advantages growing with reasoning depth (+7.1 points on 3-hop). We do not claim formal learnability guarantees; the contribution is empirical validation that minimal structural modifications substantially improve multi-hop reasoning.
- [192] arXiv:2602.02838 [pdf, ps, other]
-
Title: Beyond Content: Behavioral Policies Reveal Actors in Information OperationsSubjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
The detection of online influence operations -- coordinated campaigns by malicious actors to spread narratives -- has traditionally depended on content analysis or network features. These approaches are increasingly brittle as generative models produce convincing text, platforms restrict access to behavioral data, and actors migrate to less-regulated spaces. We introduce a platform-agnostic framework that identifies malicious actors from their behavioral policies by modeling user activity as sequential decision processes. We apply this approach to 12,064 Reddit users, including 99 accounts linked to the Russian Internet Research Agency in Reddit's 2017 transparency report, analyzing over 38 million activity steps from 2015-2018. Activity-based representations, which model how users act rather than what they post, consistently outperform content models in detecting malicious accounts. When distinguishing trolls -- users engaged in coordinated manipulation -- from ordinary users, policy-based classifiers achieve a median macro-$F_1$ of 94.9%, compared to 91.2% for text embeddings. Policy features also enable earlier detection from short traces and degrade more gracefully under evasion strategies or data corruption. These findings show that behavioral dynamics encode stable, discriminative signals of manipulation and point to resilient, cross-platform detection strategies in the era of synthetic content and limited data access.
- [193] arXiv:2602.02839 [pdf, ps, other]
-
Title: Language Movement Primitives: Grounding Language Models in Robot MotionSubjects: Robotics (cs.RO)
Enabling robots to perform novel manipulation tasks from natural language instructions remains a fundamental challenge in robotics, despite significant progress in generalized problem solving with foundational models. Large vision and language models (VLMs) are capable of processing high-dimensional input data for visual scene and language understanding, as well as decomposing tasks into a sequence of logical steps; however, they struggle to ground those steps in embodied robot motion. On the other hand, robotics foundation models output action commands, but require in-domain fine-tuning or experience before they are able to perform novel tasks successfully. At its core, there still remains the fundamental challenge of connecting abstract task reasoning with low-level motion control. To address this disconnect, we propose Language Movement Primitives (LMPs), a framework that grounds VLM reasoning in Dynamic Movement Primitive (DMP) parameterization. Our key insight is that DMPs provide a small number of interpretable parameters, and VLMs can set these parameters to specify diverse, continuous, and stable trajectories. Put another way: VLMs can reason over free-form natural language task descriptions, and semantically ground their desired motions into DMPs -- bridging the gap between high-level task reasoning and low-level position and velocity control. Building on this combination of VLMs and DMPs, we formulate our LMP pipeline for zero-shot robot manipulation that effectively completes tabletop manipulation problems by generating a sequence of DMP motions. Across 20 real-world manipulation tasks, we show that LMP achieves 80% task success as compared to 31% for the best-performing baseline. See videos at our website: https://collab.me.vt.edu/lmp
- [194] arXiv:2602.02841 [pdf, ps, other]
-
Title: Semantics-Aware Generative Latent Data Augmentation for Learning in Low-Resource DomainsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Despite strong performance in data-rich regimes, deep learning often underperforms in the data-scarce settings common in practice. While foundation models (FMs) trained on massive datasets demonstrate strong generalization by extracting general-purpose features, they can still suffer from scarce labeled data during downstream fine-tuning. To address this, we propose GeLDA, a semantics-aware generative latent data augmentation framework that leverages conditional diffusion models to synthesize samples in an FM-induced latent space. Because this space is low-dimensional and concentrates task-relevant information compared to the input space, GeLDA enables efficient, high-quality data generation. GeLDA conditions generation on auxiliary feature vectors that capture semantic relationships among classes or subdomains, facilitating data augmentation in low-resource domains. We validate GeLDA in two large-scale recognition tasks: (a) in zero-shot language-specific speech emotion recognition, GeLDA improves the Whisper-large baseline's unweighted average recall by 6.13%; and (b) in long-tailed image classification, it achieves 74.7% tail-class accuracy on ImageNet-LT, setting a new state-of-the-art result.
- [195] arXiv:2602.02842 [pdf, ps, other]
-
Title: Chain of Simulation: A Dual-Mode Reasoning Framework for Large Language Models with Dynamic Problem RoutingAuthors: Saeid SheikhiSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
We present Chain of Simulation (CoS), a novel dual-mode reasoning framework that dynamically routes problems to specialized reasoning strategies in Large Language Models (LLMs). Unlike existing uniform prompting approaches, CoS employs three distinct reasoning modes: (1) computational flow with self-consistency for mathematical problems, (2) symbolic state tracking with JSON representations for spatial reasoning, and (3) hybrid fact-extraction for multi-hop inference. Through comprehensive evaluation on GSM8K, StrategyQA, and bAbI benchmarks using four state-of-the-art models (Gemma-3 27B, LLaMA-3.1 8B, Mistral 7B, and Qwen-2.5 14B), we demonstrate that CoS achieves 71.5% accuracy on GSM8K (1.0% absolute improvement), 90.0% on StrategyQA (2.5% improvement), and 19.0% on bAbI (65.2% relative improvement) compared to the strongest baselines. The analysis reveals that problem-specific mode selection is crucial, with computational mode achieving 81.2% accuracy when correctly applied to mathematical problems, while misrouting leads to 0% accuracy. We provide detailed algorithms for mode selection, state tracking, and answer extraction, establishing CoS as an effective approach for improving LLM reasoning without additional training. The framework provides superior trade-offs between accuracy and efficiency compared to Self-Consistency, achieving comparable performance at 54% lower computational cost.
- [196] arXiv:2602.02843 [pdf, ps, other]
-
Title: Act or Clarify? Modeling Sensitivity to Uncertainty and Cost in CommunicationComments: 6 pages, 3 figures, under reviewSubjects: Computation and Language (cs.CL)
When deciding how to act under uncertainty, agents may choose to act to reduce uncertainty or they may act despite that uncertainty.In communicative settings, an important way of reducing uncertainty is by asking clarification questions (CQs). We predict that the decision to ask a CQ depends on both contextual uncertainty and the cost of alternative actions, and that these factors interact: uncertainty should matter most when acting incorrectly is costly. We formalize this interaction in a computational model based on expected regret: how much an agent stands to lose by acting now rather than with full information. We test these predictions in two experiments, one examining purely linguistic responses to questions and another extending to choices between clarification and non-linguistic action. Taken together, our results suggest a rational tradeoff: humans tend to seek clarification proportional to the risk of substantial loss when acting under uncertainty.
- [197] arXiv:2602.02846 [pdf, ps, other]
-
Title: Kino-PAX$^+$: Near-Optimal Massively Parallel Kinodynamic Sampling-based Motion PlannerComments: 10 pages, 8 figuresSubjects: Robotics (cs.RO); Distributed, Parallel, and Cluster Computing (cs.DC)
Sampling-based motion planners (SBMPs) are widely used for robot motion planning with complex kinodynamic constraints in high-dimensional spaces, yet they struggle to achieve \emph{real-time} performance due to their serial computation design. Recent efforts to parallelize SBMPs have achieved significant speedups in finding feasible solutions; however, they provide no guarantees of optimizing an objective function. We introduce Kino-PAX$^{+}$, a massively parallel kinodynamic SBMP with asymptotic near-optimal guarantees. Kino-PAX$^{+}$ builds a sparse tree of dynamically feasible trajectories by decomposing traditionally serial operations into three massively parallel subroutines. The algorithm focuses computation on the most promising nodes within local neighborhoods for propagation and refinement, enabling rapid improvement of solution cost. We prove that, while maintaining probabilistic $\delta$-robust completeness, this focus on promising nodes ensures asymptotic $\delta$-robust near-optimality. Our results show that Kino-PAX$^{+}$ finds solutions up to three orders of magnitude faster than existing serial methods and achieves lower solution costs than a state-of-the-art GPU-based planner.
- [198] arXiv:2602.02847 [pdf, ps, other]
-
Title: Causal Flow Q-Learning for Robust Offline Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Expressive policies based on flow-matching have been successfully applied in reinforcement learning (RL) more recently due to their ability to model complex action distributions from offline data. These algorithms build on standard policy gradients, which assume that there is no unmeasured confounding in the data. However, this condition does not necessarily hold for pixel-based demonstrations when a mismatch exists between the demonstrator's and the learner's sensory capabilities, leading to implicit confounding biases in offline data. We address the challenge by investigating the problem of confounded observations in offline RL from a causal perspective. We develop a novel causal offline RL objective that optimizes policies' worst-case performance that may arise due to confounding biases. Based on this new objective, we introduce a practical implementation that learns expressive flow-matching policies from confounded demonstrations, employing a deep discriminator to assess the discrepancy between the target policy and the nominal behavioral policy. Experiments across 25 pixel-based tasks demonstrate that our proposed confounding-robust augmentation procedure achieves a success rate 120\% that of confounding-unaware, state-of-the-art offline RL methods.
- [199] arXiv:2602.02848 [pdf, ps, other]
-
Title: Zero Sum SVD: Balancing Loss Sensitivity for Low Rank LLM CompressionSubjects: Machine Learning (cs.LG)
Advances in large language models have driven strong performance across many tasks, but their memory and compute costs still hinder deployment. SVD-based compression reduces storage and can speed up inference via low-rank factors, yet performance depends on how rank is allocated under a global compression ratio. Prior methods often use homogeneous ranks for similarly sized matrices, despite large differences in loss sensitivity, or rely on expensive iterative pre-truncation optimization to determine per matrix ranks. We propose \textbf{Zero Sum SVD} (\textbf{ZS-SVD}), a post-training method that performs \emph{global} singular component selection using activation whitening and first-order calibration loss estimates in whitened coordinates. \textbf{ZS-SVD} prunes components across the whole model with a \textbf{zero sum} rule that keeps the cumulative predicted loss change near zero, automatically yielding heterogeneous ranks without solving a rank allocation optimization. Motivated by evidence that gradients near pretrained solutions exhibit low rank structure, we also introduce an optional lightweight correction that applies a \textbf{single} projected gradient update after truncation, followed by re-truncation. Extensive experiments across multiple LLM architectures show consistent gains across diverse benchmarks and compression ratios. Code is available at https://github.com/mint-vu/Zero-Sum-SVD
- [200] arXiv:2602.02849 [pdf, ps, other]
-
Title: AutoSizer: Automatic Sizing of Analog and Mixed-Signal Circuits via Large Language Model (LLM) AgentsSubjects: Artificial Intelligence (cs.AI)
The design of Analog and Mixed-Signal (AMS) integrated circuits remains heavily reliant on expert knowledge, with transistor sizing a major bottleneck due to nonlinear behavior, high-dimensional design spaces, and strict performance constraints. Existing Electronic Design Automation (EDA) methods typically frame sizing as static black-box optimization, resulting in inefficient and less robust solutions. Although Large Language Models (LLMs) exhibit strong reasoning abilities, they are not suited for precise numerical optimization in AMS sizing. To address this gap, we propose AutoSizer, a reflective LLM-driven meta-optimization framework that unifies circuit understanding, adaptive search-space construction, and optimization orchestration in a closed loop. It employs a two-loop optimization framework, with an inner loop for circuit sizing and an outer loop that analyzes optimization dynamics and constraints to iteratively refine the search space from simulation feedback. We further introduce AMS-SizingBench, an open benchmark comprising 24 diverse AMS circuits in SKY130 CMOS technology, designed to evaluate adaptive optimization policies under realistic simulator-based constraints. AutoSizer experimentally achieves higher solution quality, faster convergence, and higher success rate across varying circuit difficulties, outperforming both traditional optimization methods and existing LLM-based agents.
- [201] arXiv:2602.02850 [pdf, ps, other]
-
Title: Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating RoomAuthors: Keqi Chen, Vinkle Srivastav, Armine Vardazaryan, Cindy Rolland, Didier Mutter, Nicolas PadoySubjects: Computer Vision and Pattern Recognition (cs.CV)
Privacy preservation is a prerequisite for using video data in Operating Room (OR) research. Effective anonymization relies on the exhaustive localization of every individual; even a single missed detection necessitates extensive manual correction. However, existing approaches face two critical scalability bottlenecks: (1) they usually require manual annotations of each new clinical site for high accuracy; (2) while multi-camera setups have been widely adopted to address single-view ambiguity, camera calibration is typically required whenever cameras are repositioned. To address these problems, we propose a novel self-supervised multi-view video anonymization framework consisting of whole-body person detection and whole-body pose estimation, without annotation or camera calibration. Our core strategy is to enhance the single-view detector by "retrieving" false negatives using temporal and multi-view context, and conducting self-supervised domain adaptation. We first run an off-the-shelf whole-body person detector in each view with a low-score threshold to gather candidate detections. Then, we retrieve the low-score false negatives that exhibit consistency with the high-score detections via tracking and self-supervised uncalibrated multi-view association. These recovered detections serve as pseudo labels to iteratively fine-tune the whole-body detector. Finally, we apply whole-body pose estimation on each detected person, and fine-tune the pose model using its own high-score predictions. Experiments on the 4D-OR dataset of simulated surgeries and our dataset of real surgeries show the effectiveness of our approach achieving over 97% recall. Moreover, we train a real-time whole-body detector using our pseudo labels, achieving comparable performance and highlighting our method's practical applicability. Code is available at https://github.com/CAMMA-public/OR_anonymization.
- [202] arXiv:2602.02853 [pdf, ps, other]
-
Title: Recurrent Equivariant Constraint Modulation: Learning Per-Layer Symmetry Relaxation from DataSubjects: Machine Learning (cs.LG)
Equivariant neural networks exploit underlying task symmetries to improve generalization, but strict equivariance constraints can induce more complex optimization dynamics that can hinder learning. Prior work addresses these limitations by relaxing strict equivariance during training, but typically relies on prespecified, explicit, or implicit target levels of relaxation for each network layer, which are task-dependent and costly to tune. We propose Recurrent Equivariant Constraint Modulation (RECM), a layer-wise constraint modulation mechanism that learns appropriate relaxation levels solely from the training signal and the symmetry properties of each layer's input-target distribution, without requiring any prior knowledge about the task-dependent target relaxation level. We demonstrate that under the proposed RECM update, the relaxation level of each layer provably converges to a value upper-bounded by its symmetry gap, namely the degree to which its input-target distribution deviates from exact symmetry. Consequently, layers processing symmetric distributions recover full equivariance, while those with approximate symmetries retain sufficient flexibility to learn non-symmetric solutions when warranted by the data. Empirically, RECM outperforms prior methods across diverse exact and approximate equivariant tasks, including the challenging molecular conformer generation on the GEOM-Drugs dataset.
- [203] arXiv:2602.02855 [pdf, ps, other]
-
Title: When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index modelsSubjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistics Theory (math.ST)
Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key take away is that even when the pre-training and down- stream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the dynamics and limitations of LoRA fine-tuning in a nontrivial tractable model.
- [204] arXiv:2602.02857 [pdf, ps, other]
-
Title: Latent Perspective-Taking via a Schrödinger Bridge in Influence-Augmented Local ModelsComments: Extended Abstract & Poster, Presented at World Modeling Workshop 2026Subjects: Robotics (cs.RO)
Operating in environments alongside humans requires robots to make decisions under uncertainty. In addition to exogenous dynamics, they must reason over others' hidden mental-models and mental-states. While Interactive POMDPs and Bayesian Theory of Mind formulations are principled, exact nested-belief inference is intractable, and hand-specified models are brittle in open-world settings. We address both by learning structured mental-models and an estimator of others' mental-states. Building on the Influence-Based Abstraction, we instantiate an Influence-Augmented Local Model to decompose socially-aware robot tasks into local dynamics, social influences, and exogenous factors. We propose (a) a neuro-symbolic world model instantiating a factored, discrete Dynamic Bayesian Network, and (b) a perspective-shift operator modeled as an amortized Schr\"odinger Bridge over the learned local dynamics that transports factored egocentric beliefs into other-centric beliefs. We show that this architecture enables agents to synthesize socially-aware policies in model-based reinforcement learning, via decision-time mental-state planning (a Schr\"odinger Bridge in belief space), with preliminary results in a MiniGrid social navigation task.
- [205] arXiv:2602.02858 [pdf, ps, other]
-
Title: IMAGINE: Intelligent Multi-Agent Godot-based Indoor Networked ExplorationComments: 12 pages, submitted to a journalSubjects: Robotics (cs.RO); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
The exploration of unknown, Global Navigation Satellite System (GNSS) denied environments by an autonomous communication-aware and collaborative group of Unmanned Aerial Vehicles (UAVs) presents significant challenges in coordination, perception, and decentralized decision-making. This paper implements Multi-Agent Reinforcement Learning (MARL) to address these challenges in a 2D indoor environment, using high-fidelity game-engine simulations (Godot) and continuous action spaces. Policy training aims to achieve emergent collaborative behaviours and decision-making under uncertainty using Network-Distributed Partially Observable Markov Decision Processes (ND-POMDPs). Each UAV is equipped with a Light Detection and Ranging (LiDAR) sensor and can share data (sensor measurements and a local occupancy map) with neighbouring agents. Inter-agent communication constraints include limited range, bandwidth and latency. Extensive ablation studies evaluated MARL training paradigms, reward function, communication system, neural network (NN) architecture, memory mechanisms, and POMDP formulations. This work jointly addresses several key limitations in prior research, namely reliance on discrete actions, single-agent or centralized formulations, assumptions of a priori knowledge and permanent connectivity, inability to handle dynamic obstacles, short planning horizons and architectural complexity in Recurrent NNs/Transformers. Results show that the scalable training paradigm, combined with a simplified architecture, enables rapid autonomous exploration of an indoor area. The implementation of Curriculum-Learning (five increasingly complex levels) also enabled faster, more robust training. This combination of high-fidelity simulation, MARL formulation, and computational efficiency establishes a strong foundation for deploying learned cooperative strategies in physical robotic systems.
- [206] arXiv:2602.02859 [pdf, ps, other]
-
Title: Late-Stage Generalization Collapse in Grokking: Detecting anti-grokking with WeightwatcherComments: 27 pagesSubjects: Machine Learning (cs.LG)
\emph{Memorization} in neural networks lacks a precise operational definition and is often inferred from the grokking regime, where training accuracy saturates while test accuracy remains very low. We identify a previously unreported third phase of grokking in this training regime: \emph{anti-grokking}, a late-stage collapse of generalization.
We revisit two canonical grokking setups: a 3-layer MLP trained on a subset of MNIST and a transformer trained on modular addition, but extended training far beyond standard. In both cases, after models transition from pre-grokking to successful generalization, test accuracy collapses back to chance while training accuracy remains perfect, indicating a distinct post-generalization failure mode.
To diagnose anti-grokking, we use the open-source \texttt{WeightWatcher} tool based on HTSR/SETOL theory. The primary signal is the emergence of \emph{Correlation Traps}: anomalously large eigenvalues beyond the Marchenko--Pastur bulk in the empirical spectral density of shuffled weight matrices, which are predicted to impair generalization. As a secondary signal, anti-grokking corresponds to the average HTSR layer quality metric $\alpha$ deviating from $2.0$. Neither metric requires access to the test or training data.
We compare these signals to alternative grokking diagnostic, including $\ell_2$ norms, Activation Sparsity, Absolute Weight Entropy, and Local Circuit Complexity. These track pre-grokking and grokking but fail to identify anti-grokking. Finally, we show that Correlation Traps can induce catastrophic forgetting and/or prototype memorization, and observe similar pathologies in large-scale LLMs, like OSS GPT 20/120B. - [207] arXiv:2602.02862 [pdf, ps, other]
-
Title: STEER: Inference-Time Risk Control via Constrained Quality-Diversity SearchComments: 20 pagesSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large Language Models (LLMs) trained for average correctness often exhibit mode collapse, producing narrow decision behaviors on tasks where multiple responses may be reasonable. This limitation is particularly problematic in ordinal decision settings such as clinical triage, where standard alignment removes the ability to trade off specificity and sensitivity (the ROC operating point) based on contextual constraints. We propose STEER (Steerable Tuning via Evolutionary Ensemble Refinement), a training-free framework that reintroduces this tunable control. STEER constructs a population of natural-language personas through an offline, constrained quality-diversity search that promotes behavioral coverage while enforcing minimum safety, reasoning, and stability thresholds. At inference time, STEER exposes a single, interpretable control parameter that maps a user-specified risk percentile to a selected persona, yielding a monotonic adjustment of decision conservativeness. On two clinical triage benchmarks, STEER achieves broader behavioral coverage compared to temperature-based sampling and static persona ensembles. Compared to a representative post-training method, STEER maintains substantially higher accuracy on unambiguous urgent cases while providing comparable control over ambiguous decisions. These results demonstrate STEER as a safety-preserving paradigm for risk control, capable of steering behavior without compromising domain competence.
- [208] arXiv:2602.02863 [pdf, ps, other]
-
Title: "I May Not Have Articulated Myself Clearly": Diagnosing Dynamic Instability in LLM Reasoning at Inference TimeComments: 21 pages, 12 figures, 15 tablesSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Reasoning failures in large language models (LLMs) are typically measured only at the end of a generation, yet many failures manifest as a process-level breakdown: the model "loses the thread" mid-reasoning. We study whether such breakdowns are detectable from inference-time observables available in standard APIs (token log probabilities), without any training or fine-tuning. We define a simple instability signal that combines consecutive-step distributional shift (JSD) and uncertainty (entropy), summarize each trace by its peak instability strength, and show that this signal reliably predicts failure. Across GSM8K and HotpotQA, instability strength predicts wrong answers with above-chance AUC and yields monotonic bucket-level accuracy decline at scale across model sizes. Crucially, we show that instability is not uniformly harmful: early instability can reflect subsequent stabilization and a correct final answer (\emph{corrective instability}), whereas late instability is more often followed by failure (\emph{destructive instability}), even at comparable peak magnitudes, indicating that recoverability depends not only on how strongly the distribution changes but also on when such changes occur relative to the remaining decoding horizon. The method is model-agnostic, training-free, and reproducible, and is presented as a diagnostic lens rather than a corrective or control mechanism.
- [209] arXiv:2602.02864 [pdf, ps, other]
-
Title: Accelerating Structured Chain-of-Thought in Autonomous VehiclesAuthors: Yi Gu, Yan Wang, Yuxiao Chen, Yurong You, Wenjie Luo, Yue Wang, Wenhao Ding, Boyi Li, Heng Yang, Boris Ivanovic, Marco PavoneSubjects: Robotics (cs.RO)
Chain-of-Thought (CoT) reasoning enhances the decision-making capabilities of vision-language-action models in autonomous driving, but its autoregressive nature introduces significant inference latency, making it impractical for real-time applications. To address this, we introduce FastDriveCoT, a novel parallel decoding method that accelerates template-structured CoT. Our approach decomposes the reasoning process into a dependency graph of distinct sub-tasks, such as identifying critical objects and summarizing traffic rules, some of which can be generated in parallel. By generating multiple independent reasoning steps concurrently within a single forward pass, we significantly reduce the number of sequential computations. Experiments demonstrate a 3-4$\times$ speedup in CoT generation and a substantial reduction in end-to-end latency across various model architectures, all while preserving the original downstream task improvements brought by incorporating CoT reasoning.
- [210] arXiv:2602.02866 [pdf, ps, other]
-
Title: Estimation of Cell-to-Cell Variation and State of Health for Battery Modules with Parallel-Connected CellsSubjects: Systems and Control (eess.SY)
Estimating cell-to-cell variation (CtCV) and state of health (SoH) for battery modules with parallel-connected cells is challenging when only module-level signals are measurable and individual cell behaviors remain unobserved. Although progress has been made in SoH estimation, CtCV estimation remains unresolved in the literature. This paper proposes a unified framework that accurately estimates both CtCV and SoH for modules using only module-level information extracted from incremental capacity analysis (ICA) and differential voltage analysis (DVA). With the proposed framework, CtCV and SoH estimations can be decoupled into two separate tasks, allowing each to be solved with dedicated algorithms without mutual interference and providing greater design flexibility. The framework also exhibits strong versatility in accommodating different CtCV metrics, highlighting its general-purpose nature. Experimental validation on modules with three parallel-connected cells demonstrates that the proposed framework can systematically select optimal module-level features for CtCV and SoH estimations, deliver accurate CtCV and SoH estimates with high confidence and low computational complexity, remain effective across different C-rates, and be suitable for onboard implementation.
- [211] arXiv:2602.02869 [pdf, ps, other]
-
Title: A Proxy Stakeholder Approach to Requirements Engineering for Inclusive NavigationSubjects: Software Engineering (cs.SE)
Wayfinding, or the ability to navigate one's surroundings, is crucial for independent living and requires a complex combination of cognitive abilities, environmental awareness, and technology to manage this successfully. Individuals with cognitive impairment (IwCI) often face significant challenges in learning and navigating their environment. Despite its importance, mainstream navigation technologies are rarely designed with their diverse needs in mind. This study reframes the search for places as a socially distributed task and emphasizes the role of proxy stakeholders, who act on behalf or in coordination with IwCI during navigation. Using a qualitatively led mixed-methods approach, which includes an international survey and a three-stage interview study, we examine the real-world strategies that proxy stakeholders employ to support daily navigation. The findings are synthesized into a set of empirically grounded design recommendations that emphasize customisability, collaborative use, and support for routine-based navigation. Our findings highlight key challenges and adaptive practices, which are synthesized into design recommendations that prioritize customisability, routine-based navigation, and multi-user coordination. By introducing the proxy stakeholder concept into the software engineering literature, we propose a more inclusive approach to requirements elicitation and offer practical guidance for designing navigation technologies that better reflect the complex realities of cognitive support.
- [212] arXiv:2602.02873 [pdf, ps, other]
-
Title: ViThinker: Active Vision-Language Reasoning via Dynamic Perceptual QueryingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Chain-of-Thought (CoT) reasoning excels in language models but struggles in vision-language models due to premature visual-to-text conversion that discards continuous information such as geometry and spatial layout. While recent methods enhance CoT through static enumeration or attention-based selection, they remain passive, i.e., processing pre-computed inputs rather than actively seeking task-relevant details. Inspired by human active perception, we introduce ViThinker, a framework that enables vision-language models to autonomously generate decision (query) tokens triggering the synthesis of expert-aligned visual features on demand. ViThinker internalizes vision-expert capabilities during training, performing generative mental simulation during inference without external tool calls. Through a two-stage curriculum: first distilling frozen experts into model parameters, then learning task-driven querying via sparsity penalties, i.e., ViThinker discovers minimal sufficient perception for each reasoning step. Evaluations across vision-centric benchmarks demonstrate consistent improvements, validating that active query generation outperforms passive approaches in both perceptual grounding and reasoning accuracy.
- [213] arXiv:2602.02877 [pdf, ps, other]
-
Title: A Geometry-Aware Efficient Algorithm for Compositional Entropic Risk MinimizationComments: 36 pages, 7 figuresSubjects: Machine Learning (cs.LG)
This paper studies optimization for a family of problems termed $\textbf{compositional entropic risk minimization}$, in which each data's loss is formulated as a Log-Expectation-Exponential (Log-E-Exp) function. The Log-E-Exp formulation serves as an abstraction of the Log-Sum-Exponential (LogSumExp) function when the explicit summation inside the logarithm is taken over a gigantic number of items and is therefore expensive to evaluate. While entropic risk objectives of this form arise in many machine learning problems, existing optimization algorithms suffer from several fundamental limitations including non-convergence, numerical instability, and slow convergence rates. To address these limitations, we propose a geometry-aware stochastic algorithm, termed $\textbf{SCENT}$, for the dual formulation of entropic risk minimization cast as a min--min optimization problem. The key to our design is a $\textbf{stochastic proximal mirror descent (SPMD)}$ update for the dual variable, equipped with a Bregman divergence induced by a negative exponential function that faithfully captures the geometry of the objective. Our main contributions are threefold: (i) we establish an $O(1/\sqrt{T})$ convergence rate of the proposed SCENT algorithm for convex problems; (ii) we theoretically characterize the advantages of SPMD over standard SGD update for optimizing the dual variable; and (iii) we demonstrate the empirical effectiveness of SCENT on extreme classification, partial AUC maximization, contrastive learning and distributionally robust optimization, where it consistently outperforms existing baselines.
- [214] arXiv:2602.02878 [pdf, ps, other]
-
Title: Which course? Discourse! Teaching Discourse and Generation in the Era of LLMsComments: accepted to the TeachNLP 2026 workshop (co-located with EACL 2026), camera-ready, 14 pagesSubjects: Computation and Language (cs.CL)
The field of NLP has undergone vast, continuous transformations over the past few years, sparking debates going beyond discipline boundaries. This begs important questions in education: how do we design courses that bridge sub-disciplines in this shifting landscape? This paper explores this question from the angle of discourse processing, an area with rich linguistic insights and computational models for the intentional, attentional, and coherence structure of language. Discourse is highly relevant for open-ended or long-form text generation, yet this connection is under-explored in existing undergraduate curricula. We present a new course, "Computational Discourse and Natural Language Generation". The course is collaboratively designed by a team with complementary expertise and was offered for the first time in Fall 2025 as an upper-level undergraduate course, cross-listed between Linguistics and Computer Science. Our philosophy is to deeply integrate the theoretical and empirical aspects, and create an exploratory mindset inside the classroom and in the assignments. This paper describes the course in detail and concludes with takeaways from an independent survey as well as our vision for future directions.
- [215] arXiv:2602.02881 [pdf, ps, other]
-
Title: Learning-Infused Formal Reasoning: From Contract Synthesis to Artifact Reuse and Formal SemanticsComments: 18 pages. Accepted at VERIFAI-2026: The Interplay between Artificial Intelligence and Software Verification LASER center, Villebrumier, France, March 8-11, 2026Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
This vision paper articulates a long-term research agenda for formal methods at the intersection with artificial intelligence, outlining multiple conceptual and technical dimensions and reporting on our ongoing work toward realising this agenda. It advances a forward-looking perspective on the next generation of formal methods based on the integration of automated contract synthesis, semantic artifact reuse, and refinement-based theory. We argue that future verification systems must move beyond isolated correctness proofs toward a cumulative, knowledge-driven paradigm in which specifications, contracts, and proofs are continuously synthesised and transferred across systems. To support this shift, we outline a hybrid framework combining large language models with graph-based representations to enable scalable semantic matching and principled reuse of verification artifacts. Learning-based components provide semantic guidance across heterogeneous notations and abstraction levels, while symbolic matching ensures formal soundness. Grounded in compositional reasoning, this vision points toward verification ecosystems that evolve systematically, leveraging past verification efforts to accelerate future assurance.
- [216] arXiv:2602.02882 [pdf, ps, other]
-
Title: Reading Between the Tokens: Improving Preference Predictions through Mechanistic ForecastingSubjects: Computers and Society (cs.CY)
Large language models are increasingly used to predict human preferences in both scientific and business endeavors, yet current approaches rely exclusively on analyzing model outputs without considering the underlying mechanisms. Using election forecasting as a test case, we introduce mechanistic forecasting, a method that demonstrates that probing internal model representations offers a fundamentally different - and sometimes more effective - approach to preference prediction. Examining over 24 million configurations across 7 models, 6 national elections, multiple persona attributes, and prompt variations, we systematically analyze how demographic and ideological information activates latent party-encoding components within the respective models. We find that leveraging this internal knowledge via mechanistic forecasting (opposed to solely relying on surface-level predictions) can improve prediction accuracy. The effects vary across demographic versus opinion-based attributes, political parties, national contexts, and models. Our findings demonstrate that the latent representational structure of LLMs contains systematic, exploitable information about human preferences, establishing a new path for using language models in social science prediction tasks.
- [217] arXiv:2602.02883 [pdf, ps, other]
-
Title: Efficiency Optimizations for Superblock-based Sparse RetrievalComments: 11 pages, 5 figures, 9 tables. Under reviewSubjects: Information Retrieval (cs.IR)
Learned sparse retrieval (LSR) is a popular method for first-stage retrieval because it combines the semantic matching of language models with efficient CPU-friendly algorithms. Previous work aggregates blocks into "superblocks" to quickly skip the visitation of blocks during query processing by using an advanced pruning heuristic. This paper proposes a simple and effective superblock pruning scheme that reduces the overhead of superblock score computation while preserving competitive relevance. It combines this scheme with a compact index structure and a robust zero-shot configuration that is effective across LSR models and multiple datasets. This paper provides an analytical justification and evaluation on the MS MARCO and BEIR datasets, demonstrating that the proposed scheme can be a strong alternative for efficient sparse retrieval.
- [218] arXiv:2602.02886 [pdf, ps, other]
-
Title: Mixture of Concept Bottleneck ExpertsAuthors: Francesco De Santis, Gabriele Ciravegna, Giovanni De Felice, Arianna Casanova, Francesco Giannini, Michelangelo Diligenti, Mateo Espinosa Zarlenga, Pietro Barbiero, Johannes Schneider, Danilo GiordanoSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Concept Bottleneck Models (CBMs) promote interpretability by grounding predictions in human-understandable concepts. However, existing CBMs typically fix their task predictor to a single linear or Boolean expression, limiting both predictive accuracy and adaptability to diverse user needs. We propose Mixture of Concept Bottleneck Experts (M-CBEs), a framework that generalizes existing CBMs along two dimensions: the number of experts and the functional form of each expert, exposing an underexplored region of the design space. We investigate this region by instantiating two novel models: Linear M-CBE, which learns a finite set of linear expressions, and Symbolic M-CBE, which leverages symbolic regression to discover expert functions from data under user-specified operator vocabularies. Empirical evaluation demonstrates that varying the mixture size and functional form provides a robust framework for navigating the accuracy-interpretability trade-off, adapting to different user and task needs.
- [219] arXiv:2602.02888 [pdf, ps, other]
-
Title: HALT: Hallucination Assessment via Log-probs as Time seriesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Hallucinations remain a major obstacle for large language models (LLMs), especially in safety-critical domains. We present HALT (Hallucination Assessment via Log-probs as Time series), a lightweight hallucination detector that leverages only the top-20 token log-probabilities from LLM generations as a time series. HALT uses a gated recurrent unit model combined with entropy-based features to learn model calibration bias, providing an extremely efficient alternative to large encoders. Unlike white-box approaches, HALT does not require access to hidden states or attention maps, relying only on output log-probabilities. Unlike black-box approaches, it operates on log-probs rather than surface-form text, which enables stronger domain generalization and compatibility with proprietary LLMs without requiring access to internal weights. To benchmark performance, we introduce HUB (Hallucination detection Unified Benchmark), which consolidates prior datasets into ten capabilities covering both reasoning tasks (Algorithmic, Commonsense, Mathematical, Symbolic, Code Generation) and general purpose skills (Chat, Data-to-Text, Question Answering, Summarization, World Knowledge). While being 30x smaller, HALT outperforms Lettuce, a fine-tuned modernBERT-base encoder, achieving a 60x speedup gain on HUB. HALT and HUB together establish an effective framework for hallucination detection across diverse LLM capabilities.
- [220] arXiv:2602.02890 [pdf, ps, other]
-
Title: Self-Soupervision: Cooking Model Soups without LabelsSubjects: Machine Learning (cs.LG)
Model soups are strange and strangely effective combinations of parameters. They take a model (the stock), fine-tune it into multiple models (the ingredients), and then mix their parameters back into one model (the soup) to improve predictions. While all known soups require supervised learning, and optimize the same loss on labeled data, our recipes for Self-\emph{Soup}ervision generalize soups to self-supervised learning (SSL). Our Self-Souping lets us flavor ingredients on new data sources, e.g. from unlabeled data from a task for transfer or from a shift for robustness. We show that Self-Souping on corrupted test data, then fine-tuning back on uncorrupted train data, boosts robustness by +3.5\% (ImageNet-C) and +7\% (LAION-C). Self-\emph{Soup}ervision also unlocks countless SSL algorithms to cook the diverse ingredients needed for more robust soups. We show for the first time that ingredients can differ in their SSL hyperparameters -- and more surprisingly, in their SSL algorithms. We cook soups of MAE, MoCoV3, and MMCR ingredients that are more accurate than any one single SSL ingredient.
- [221] arXiv:2602.02891 [pdf, ps, other]
-
Title: TraceNAS: Zero-shot LLM Pruning via Gradient Trace CorrelationComments: PreprintSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Structured pruning is essential for efficient deployment of Large Language Models (LLMs). The varying sensitivity of LLM sub-blocks to pruning necessitates the identification of optimal non-uniformly pruned models. Existing methods evaluate the importance of layers, attention heads, or weight channels in isolation. Such localized focus ignores the complex global structural dependencies that exist across the model. Training-aware structured pruning addresses global dependencies, but its computational cost can be just as expensive as post-pruning training. To alleviate the computational burden of training-aware pruning and capture global structural dependencies, we propose TraceNAS, a training-free Neural Architecture Search (NAS) framework that jointly explores structured pruning of LLM depth and width. TraceNAS identifies pruned models that maintain a high degree of loss landscape alignment with the pretrained model using a scale-invariant zero-shot proxy, effectively selecting models that exhibit maximal performance potential during post-pruning training. TraceNAS is highly efficient, enabling high-fidelity discovery of pruned models on a single GPU in 8.5 hours, yielding a 10$\times$ reduction in GPU-hours compared to training-aware methods. Evaluations on the Llama and Qwen families demonstrate that TraceNAS is competitive with training-aware baselines across commonsense and reasoning benchmarks.
- [222] arXiv:2602.02892 [pdf, ps, other]
-
Title: Prefix Consensus For Censorship Resistant BFTSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Despite broad use of BFT consensus in blockchains, censorship resistance is weak: leaders can exclude transactions, a growing concern for trading and DeFi.
We address this by introducing a new abstraction and protocol stack. First, we introduce \emph{Prefix Consensus}, where parties input vectors and output $(v^{\sf low},v^{\sf high})$ that (i) extend the maximum common prefix of honest inputs and (ii) satisfy $v_i^{\sf low}\preceq v_j^{\sf high}$ for all honest $i,j$. Unlike classical consensus, no single output is required. We show Prefix Consensus is solvable asynchronously and give tight round-complexity bounds.
We then define \emph{Strong Prefix Consensus}, requiring agreement on the \emph{high} output. Our protocol is leaderless and partially synchronous: one Prefix Consensus instance decides (possibly different) lows, and additional instances yield a unique safe-to-extend high, even if an adversary can suspend one party per round.
We lift this to a leaderless, multi-proposer, censorship-resistant BFT SMR protocol: per slot, all parties broadcast proposals, deterministically rank them, and run one Strong Prefix Consensus on proposal hashes, committing honest proposals in \emph{four rounds}. A deterministic demotion rule updates the ranking when a party's proposal is excluded, implying that after GST at most $f$ slots can miss an honest proposal while progress remains leaderless under suspension and up to $f{-}1$ Byzantine faults.
Finally, we connect Prefix Consensus to graded and binary/validated consensus: we obtain an optimal-latency graded consensus (3 message delays) and leaderless Binary/Validated Consensus with worst-case message complexity $O(n^3)$ and communication $O(n^4)$. - [223] arXiv:2602.02894 [pdf, ps, other]
-
Title: DoubleTake: Contrastive Reasoning for Faithful Decision-Making in Medical ImagingSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Accurate decision making in medical imaging requires reasoning over subtle visual differences between confusable conditions, yet most existing approaches rely on nearest neighbor retrieval that returns redundant evidence and reinforces a single hypothesis. We introduce a contrastive, document-aware reference selection framework that constructs compact evidence sets optimized for discrimination rather than similarity by explicitly balancing visual relevance, embedding diversity, and source-level provenance using ROCO embeddings and metadata. While ROCO provides large-scale image-caption pairs, it does not specify how references should be selected for contrastive reasoning, and naive retrieval frequently yields near-duplicate figures from the same document. To address this gap, we release a reproducible reference selection protocol and curated reference bank that enable a systematic study of contrastive retrieval in medical image reasoning. Building on these contrastive evidence sets, we propose Counterfactual-Contrastive Inference, a confidence-aware reasoning framework that performs structured pairwise visual comparisons and aggregates evidence using margin-based decision rules with faithful abstention. On the MediConfusion benchmark, our approach achieves state-of-the-art performance, improving set-level accuracy by nearly 15% relative to prior methods while reducing confusion and improving individual accuracy.
- [224] arXiv:2602.02895 [pdf, ps, other]
-
Title: Moving On, Even When You're Broken: Fail-Active Trajectory Generation via Diffusion Policies Conditioned on Embodiment and TaskAuthors: Gilberto G. Briscoe-Martinez, Yaashia Gautam, Rahul Shetty, Anuj Pasricha, Marco M. Nicotra, Alessandro RonconeComments: To be published in the 2026 IEEE International Conference on Robotics & AutomationSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Robot failure is detrimental and disruptive, often requiring human intervention to recover. Maintaining safe operation under impairment to achieve task completion, i.e. fail-active operation, is our target. Focusing on actuation failures, we introduce DEFT, a diffusion-based trajectory generator conditioned on the robot's current embodiment and task constraints. DEFT generalizes across failure types, supports constrained and unconstrained motions, and enables task completion under arbitrary failure. We evaluated DEFT in both simulation and real-world scenarios using a 7-DoF robotic arm. In simulation over thousands of joint-failure cases across multiple tasks, DEFT outperformed the baseline by up to 2 times. On failures unseen during training, it continued to outperform the baseline, indicating robust generalization in simulation. Further, we performed real-world evaluations on two multi-step tasks, drawer manipulation and whiteboard erasing. These experiments demonstrated DEFT succeeding on tasks where classical methods failed. Our results show that DEFT achieves fail-active manipulation across arbitrary failure configurations and real-world deployments.
- [225] arXiv:2602.02896 [pdf, ps, other]
-
Title: Failure-Aware Enhancements for Large Language Model (LLM) Code Generation: An Empirical Study on Decision FrameworkComments: Accepted at SANER 2026Subjects: Software Engineering (cs.SE)
Large language models (LLMs) show promise for automating software development by translating requirements into code. However, even advanced prompting workflows like progressive prompting often leave some requirements unmet. Although methods such as self-critique, multi-model collaboration, and retrieval-augmented generation (RAG) have been proposed to address these gaps, developers lack clear guidance on when to use each. In an empirical study of 25 GitHub projects, we found that progressive prompting achieves 96.9% average task completion, significantly outperforming direct prompting (80.5%, Cohen's d=1.63, p<0.001) but still leaving 8 projects incomplete. For 6 of the most representative projects, we evaluated each enhancement strategy across 4 failure types. Our results reveal that method effectiveness depends critically on failure characteristics: Self-Critique succeeds on code-reviewable logic errors but fails completely on external service integration (0% improvement), while RAG achieves highest completion across all failure types with superior efficiency. Based on these findings, we propose a decision framework that maps each failure pattern to the most suitable enhancement method, giving practitioners practical, data-driven guidance instead of trial-and-error.
- [226] arXiv:2602.02898 [pdf, ps, other]
-
Title: Aligning Language Model Benchmarks with Pairwise PreferencesAuthors: Marco Gutierrez, Xinyi Leng, Hannah Cyberey, Jonathan Richard Schwarz, Ahmed Alaa, Thomas HartvigsenSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Language model benchmarks are pervasive and computationally-efficient proxies for real-world performance. However, many recent works find that benchmarks often fail to predict real utility. Towards bridging this gap, we introduce benchmark alignment, where we use limited amounts of information about model performance to automatically update offline benchmarks, aiming to produce new static benchmarks that predict model pairwise preferences in given test settings. We then propose BenchAlign, the first solution to this problem, which learns preference-aligned weight- ings for benchmark questions using the question-level performance of language models alongside ranked pairs of models that could be collected during deployment, producing new benchmarks that rank previously unseen models according to these preferences. Our experiments show that our aligned benchmarks can accurately rank unseen models according to models of human preferences, even across different sizes, while remaining interpretable. Overall, our work provides insights into the limits of aligning benchmarks with practical human preferences, which stands to accelerate model development towards real utility.
- [227] arXiv:2602.02899 [pdf, ps, other]
-
Title: Controlled disagreement improves generalization in decentralized trainingSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Decentralized training is often regarded as inferior to centralized training because the consensus errors between workers are thought to undermine convergence and generalization, even with homogeneous data distributions. This work challenges this view by introducing decentralized SGD with Adaptive Consensus (DSGD-AC), which intentionally preserves non-vanishing consensus errors through a time-dependent scaling mechanism. We prove that these errors are not random noise but systematically align with the dominant Hessian subspace, acting as structured perturbations that guide optimization toward flatter minima. Across image classification and machine translation benchmarks, DSGD-AC consistently surpasses both standard DSGD and centralized SGD in test accuracy and solution flatness. Together, these results establish consensus errors as a useful implicit regularizer and open a new perspective on the design of decentralized learning algorithms.
- [228] arXiv:2602.02900 [pdf, ps, other]
-
Title: Manifold-Constrained Energy-Based Transition Models for Offline Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Model-based offline reinforcement learning is brittle under distribution shift: policy improvement drives rollouts into state--action regions weakly supported by the dataset, where compounding model error yields severe value overestimation. We propose Manifold-Constrained Energy-based Transition Models (MC-ETM), which train conditional energy-based transition models using a manifold projection--diffusion negative sampler. MC-ETM learns a latent manifold of next states and generates near-manifold hard negatives by perturbing latent codes and running Langevin dynamics in latent space with the learned conditional energy, sharpening the energy landscape around the dataset support and improving sensitivity to subtle out-of-distribution deviations. For policy optimization, the learned energy provides a single reliability signal: rollouts are truncated when the minimum energy over sampled next states exceeds a threshold, and Bellman backups are stabilized via pessimistic penalties based on Q-value-level dispersion across energy-guided samples. We formalize MC-ETM through a hybrid pessimistic MDP formulation and derive a conservative performance bound separating in-support evaluation error from truncation risk. Empirically, MC-ETM improves multi-step dynamics fidelity and yields higher normalized returns on standard offline control benchmarks, particularly under irregular dynamics and sparse data coverage.
- [229] arXiv:2602.02902 [pdf, ps, other]
-
Title: Minimal Computational Preconditions for Subjective Perspective in Artificial AgentsAuthors: Hongju PaeSubjects: Artificial Intelligence (cs.AI)
This study operationalizes subjective perspective in artificial agents by grounding it in a minimal, phenomenologically motivated internal structure. The perspective is implemented as a slowly evolving global latent state that modulates fast policy dynamics without being directly optimized for behavioral consequences. In a reward-free environment with regime shifts, this latent structure exhibits direction-dependent hysteresis, while policy-level behavior remains comparatively reactive. I argue that such hysteresis constitutes a measurable signature of perspective-like subjectivity in machine systems.
- [230] arXiv:2602.02903 [pdf, ps, other]
-
Title: Spatiotemporal Decision Transformer for Traffic CoordinationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Traffic signal control is a critical challenge in urban transportation, requiring coordination among multiple intersections to optimize network-wide traffic flow. While reinforcement learning has shown promise for adaptive signal control, existing methods struggle with multi-agent coordination and sample efficiency. We introduce MADT (Multi-Agent Decision Transformer), a novel approach that reformulates multi-agent traffic signal control as a sequence modeling problem. MADT extends the Decision Transformer paradigm to multi-agent settings by incorporating: (1) a graph attention mechanism for modeling spatial dependencies between intersections, (2) a|temporal transformer encoder for capturing traffic dynamics, and (3) return-to-go conditioning for target performance specification. Our approach enables offline learning from historical traffic data, with architecture design that facilitates potential online fine-tuning. Experiments on synthetic grid networks and real-world traffic scenarios demonstrate that MADT achieves state-of-the-art performance, reducing average travel time by 5-6% compared to the strongest baseline while exhibiting superior coordination among adjacent intersections.
- [231] arXiv:2602.02905 [pdf, ps, other]
-
Title: FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific InsightsAuthors: Zhen Wang, Fan Bai, Zhongyan Luo, Jinyan Su, Kaiser Sun, Xinle Yu, Jieyuan Liu, Kun Zhou, Claire Cardie, Mark Dredze, Eric P. Xing, Zhiting HuComments: 30 pages, 4 figures, 10 tablesSubjects: Artificial Intelligence (cs.AI)
Autonomous agents powered by large language models (LLMs) promise to accelerate scientific discovery end-to-end, but rigorously evaluating their capacity for verifiable discovery remains a central challenge. Existing benchmarks face a trade-off: they either heavily rely on LLM-as-judge evaluations of automatically generated research outputs or optimize convenient yet isolated performance metrics that provide coarse proxies for scientific insight. To address this gap, we introduce FIRE-Bench (Full-cycle Insight Rediscovery Evaluation), a benchmark that evaluates agents through the rediscovery of established findings from recent, high-impact machine learning research. Agents are given only a high-level research question extracted from a published, verified study and must autonomously explore ideas, design experiments, implement code, execute their plans, and derive conclusions supported by empirical evidence. We evaluate a range of state-of-the-art agents with frontier LLMs backbones like gpt-5 on FIRE-Bench. Our results show that full-cycle scientific research remains challenging for current agent systems: even the strongest agents achieve limited rediscovery success (<50 F1), exhibit high variance across runs, and display recurring failure modes in experimental design, execution, and evidence-based reasoning. FIRE-Bench provides a rigorous and diagnostic framework for measuring progress toward reliable agent-driven scientific discovery.
- [232] arXiv:2602.02907 [pdf, ps, other]
-
Title: VoroUDF: Meshing Unsigned Distance Fields with Voronoi OptimizationSubjects: Graphics (cs.GR)
We present VoroUDF, an algorithm for reconstructing high-quality triangle meshes from Unsigned Distance Fields (UDFs). Our algorithm supports non-manifold geometry, sharp features, and open boundaries, without relying on error-prone inside/outside estimation, restrictive look-up tables nor topologically noisy optimization. Our Voronoi-based formulation combines a L_1 tangent minimization with feature-aware repulsion to robustly recover complex surface topology. It achieves significantly improved topological consistency and geometric fidelity compared to existing methods, while producing lightweight meshes suitable for downstream real-time and interactive applications.
- [233] arXiv:2602.02908 [pdf, ps, other]
-
Title: A Random Matrix Theory Perspective on the Consistency of Diffusion ModelsComments: 65 pages; 53 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Diffusion models trained on different, non-overlapping subsets of a dataset often produce strikingly similar outputs when given the same noise seed. We trace this consistency to a simple linear effect: the shared Gaussian statistics across splits already predict much of the generated images. To formalize this, we develop a random matrix theory (RMT) framework that quantifies how finite datasets shape the expectation and variance of the learned denoiser and sampling map in the linear setting. For expectations, sampling variability acts as a renormalization of the noise level through a self-consistent relation $\sigma^2 \mapsto \kappa(\sigma^2)$, explaining why limited data overshrink low-variance directions and pull samples toward the dataset mean. For fluctuations, our variance formulas reveal three key factors behind cross-split disagreement: \textit{anisotropy} across eigenmodes, \textit{inhomogeneity} across inputs, and overall scaling with dataset size. Extending deterministic-equivalence tools to fractional matrix powers further allows us to analyze entire sampling trajectories. The theory sharply predicts the behavior of linear diffusion models, and we validate its predictions on UNet and DiT architectures in their non-memorization regime, identifying where and how samples deviates across training data split. This provides a principled baseline for reproducibility in diffusion training, linking spectral properties of data to the stability of generative outputs.
- [234] arXiv:2602.02909 [pdf, ps, other]
-
Title: Reasoning about Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMsComments: 28 pagesSubjects: Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG)
Inference-time scaling via chain-of-thought (CoT) reasoning is a major driver of state-of-the-art LLM performance, but it comes with substantial latency and compute costs. We address a fundamental theoretical question: how many reasoning tokens are required to solve a problem as input size grows? By extending the bounded attention prefix oracle (BAPO) model--an abstraction of LLMs that quantifies the information flow required to solve a task--we prove lower bounds on the CoT tokens required for three canonical BAPO-hard tasks: binary majority, triplet matching, and graph reachability. We show that each requires $\Omega(n)$ reasoning tokens when the input size is $n$. We complement these results with matching or near-matching upper bounds via explicit constructions. Finally, our experiments with frontier reasoning models show approximately linear reasoning token scaling on these tasks and failures when constrained to smaller reasoning budgets, consistent with our theoretical lower bounds. Together, our results identify fundamental bottlenecks in inference-time compute through CoT and offer a principled tool for analyzing optimal reasoning length.
- [235] arXiv:2602.02912 [pdf, ps, other]
-
Title: Notes on the Reward Representation of Posterior UpdatesAuthors: Pedro A. OrtegaComments: Technical report, 9 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Many ideas in modern control and reinforcement learning treat decision-making as inference: start from a baseline distribution and update it when a signal arrives. We ask when this can be made literal rather than metaphorical. We study the special case where a KL-regularized soft update is exactly a Bayesian posterior inside a single fixed probabilistic model, so the update variable is a genuine channel through which information is transmitted. In this regime, behavioral change is driven only by evidence carried by that channel: the update must be explainable as an evidence reweighing of the baseline. This yields a sharp identification result: posterior updates determine the relative, context-dependent incentive signal that shifts behavior, but they do not uniquely determine absolute rewards, which remain ambiguous up to context-specific baselines. Requiring one reusable continuation value across different update directions adds a further coherence constraint linking the reward descriptions associated with different conditioning orders.
- [236] arXiv:2602.02914 [pdf, ps, other]
-
Title: FaceLinkGen: Rethinking Identity Leakage in Privacy-Preserving Face Recognition with Identity ExtractionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Transformation-based privacy-preserving face recognition (PPFR) aims to verify identities while hiding facial data from attackers and malicious service providers. Existing evaluations mostly treat privacy as resistance to pixel-level reconstruction, measured by PSNR and SSIM. We show that this reconstruction-centric view fails. We present FaceLinkGen, an identity extraction attack that performs linkage/matching and face regeneration directly from protected templates without recovering original pixels. On three recent PPFR systems, FaceLinkGen reaches over 98.5\% matching accuracy and above 96\% regeneration success, and still exceeds 92\% matching and 94\% regeneration in a near zero knowledge setting. These results expose a structural gap between pixel distortion metrics, which are widely used in PPFR evaluation, and real privacy. We show that visual obfuscation leaves identity information broadly exposed to both external intruders and untrusted service providers.
- [237] arXiv:2602.02915 [pdf, ps, other]
-
Title: Modular Isoperimetric Soft Robotic Truss for Lunar ApplicationsAuthors: Mihai Stanciu, Isaac Weaver, Adam Rose, James Wade, Kaden Paxton, Chris Paul, Spencer Stowell, Nathan UsevitchSubjects: Robotics (cs.RO)
We introduce a large-scale robotic system designed as a lightweight, modular, and reconfigurable structure for lunar applications. The system consists of truss-like robotic triangles formed by continuous inflated fabric tubes routed through two robotic roller units and a connecting unit. A newly developed spherical joint enables up to three triangles to connect at a vertex, allowing construction of truss assemblies beyond a single octahedron. When deflated, the triangles compact to approximately the volume of the roller units, achieving a stowed-to-deployed volume ratio of 1:18.3. Upon inflation, the roller units pinch the tubes, locally reducing bending stiffness to form effective joints. Electric motors then translate the roller units along the tube, shifting the pinch point by lengthening one edge while shortening another at the same rate, thereby preserving a constant perimeter (isoperimetric). This shape-changing process requires no additional compressed air, enabling untethered operation after initial inflation. We demonstrate the system as a 12-degree-of-freedom solar array capable of tilting up to 60 degrees and sweeping 360 degrees, and as a 14-degree-of-freedom locomotion device using a step-and-slide gait. This modular, shape-adaptive system addresses key challenges for sustainable lunar operations and future space missions.
- [238] arXiv:2602.02917 [pdf, ps, other]
-
Title: Weighted Temporal Decay Loss for Learning Wearable PPG Data with Sparse Clinical LabelsAuthors: Yunsung Chung, Keum San Chun, Migyeong Gwak, Han Feng, Yingshuo Liu, Chanho Lim, Viswam Nathan, Nassir Marrouche, Sharanya Arcot DesaiComments: ICASSP 2026Subjects: Machine Learning (cs.LG)
Advances in wearable computing and AI have increased interest in leveraging PPG for health monitoring over the past decade. One of the biggest challenges in developing health algorithms based on such biosignals is the sparsity of clinical labels, which makes biosignals temporally distant from lab draws less reliable for supervision. To address this problem, we introduce a simple training strategy that learns a biomarker-specific decay of sample weight over the time gap between a segment and its ground truth label and uses this weight in the loss with a regularizer to prevent trivial solutions. On smartwatch PPG from 450 participants across 10 biomarkers, the approach improves over baselines. In the subject-wise setting, the proposed approach averages 0.715 AUPRC, compared to 0.674 for a fine-tuned self-supervised baseline and 0.626 for a feature-based Random Forest. A comparison of four decay families shows that a simple linear decay function is most robust on average. Beyond accuracy, the learned decay rates summarize how quickly each biomarker's PPG evidence becomes stale, providing an interpretable view of temporal sensitivity.
- [239] arXiv:2602.02918 [pdf, ps, other]
-
Title: A Multi-scale Linear-time Encoder for Whole-Slide Image AnalysisComments: Accepted to ISBI 2026, 4 pages with 2 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Tissues and Organs (q-bio.TO)
We introduce Multi-scale Adaptive Recurrent Biomedical Linear-time Encoder (MARBLE), the first \textit{purely Mamba-based} multi-state multiple instance learning (MIL) framework for whole-slide image (WSI) analysis. MARBLE processes multiple magnification levels in parallel and integrates coarse-to-fine reasoning within a linear-time state-space model, efficiently capturing cross-scale dependencies with minimal parameter overhead. WSI analysis remains challenging due to gigapixel resolutions and hierarchical magnifications, while existing MIL methods typically operate at a single scale and transformer-based approaches suffer from quadratic attention costs. By coupling parallel multi-scale processing with linear-time sequence modeling, MARBLE provides a scalable and modular alternative to attention-based architectures. Experiments on five public datasets show improvements of up to \textbf{6.9\%} in AUC, \textbf{20.3\%} in accuracy, and \textbf{2.3\%} in C-index, establishing MARBLE as an efficient and generalizable framework for multi-scale WSI analysis.
- [240] arXiv:2602.02919 [pdf, ps, other]
-
Title: DeltaEvolve: Accelerating Scientific Discovery through Momentum-Driven EvolutionSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
LLM-driven evolutionary systems have shown promise for automated science discovery, yet existing approaches such as AlphaEvolve rely on full-code histories that are context-inefficient and potentially provide weak evolutionary guidance. In this work, we first formalize the evolutionary agents as a general Expectation-Maximization framework, where the language model samples candidate programs (E-step) and the system updates the control context based on evaluation feedback (M-step). Under this view, constructing context via full-code snapshots constitutes a suboptimal M-step, as redundant implement details dilutes core algorithmic ideas, making it difficult to provide clear inspirations for evolution. To address this, we propose DeltaEvolve, a momentum-driven evolutionary framework that replaces full-code history with structured semantic delta capturing how and why modifications between successive nodes affect performance. As programs are often decomposable, semantic delta usually contains many effective components which are transferable and more informative to drive improvement. By organizing semantic delta through multi-level database and progressive disclosure mechanism, input tokens are further reduced. Empirical evaluations on tasks across diverse scientific domains show that our framework can discover better solution with less token consumption over full-code-based evolutionary agents.
- [241] arXiv:2602.02920 [pdf, ps, other]
-
Title: A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging DataComments: Accepted to ISBI 2026, 5 pages with 1 figureSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data. Conventional cross-validation frameworks that reuse the same folds for both model selection and performance estimation yield optimistically biased results, limiting reproducibility and generalization. Demonstrated on a high-dimensional structural MRI dataset of deep brain stimulation cognitive outcomes, the framework achieved a nested-CV balanced accuracy of 0.660\,$\pm$\,0.068 using a compact, interpretable subset selected via importance-guided ranking. By combining interpretability and unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains.
- [242] arXiv:2602.02924 [pdf, ps, other]
-
Title: How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?Authors: Xiaoyuan Cheng, Wenxuan Yuan, Boyang Li, Yuanchao Xu, Yiming Yang, Hao Liang, Bei Peng, Robert Loftin, Zhuo Sun, Yukun HuSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Diffusion policy sampling enables reinforcement learning (RL) to represent multimodal action distributions beyond suboptimal unimodal Gaussian policies. However, existing diffusion-based RL methods primarily focus on offline settings for reward maximization, with limited consideration of safety in online settings. To address this gap, we propose Augmented Lagrangian-Guided Diffusion (ALGD), a novel algorithm for off-policy safe RL. By revisiting optimization theory and energy-based model, we show that the instability of primal-dual methods arises from the non-convex Lagrangian landscape. In diffusion-based safe RL, the Lagrangian can be interpreted as an energy function guiding the denoising dynamics. Counterintuitively, direct usage destabilizes both policy generation and training. ALGD resolves this issue by introducing an augmented Lagrangian that locally convexifies the energy landscape, yielding a stabilized policy generation and training process without altering the distribution of the optimal policy. Theoretical analysis and extensive experiments demonstrate that ALGD is both theoretically grounded and empirically effective, achieving strong and stable performance across diverse environments.
- [243] arXiv:2602.02925 [pdf, ps, other]
-
Title: Refining Decision Boundaries In Anomaly Detection Using Similarity Search Within the Feature SpaceSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Neural and Evolutionary Computing (cs.NE)
Detecting rare and diverse anomalies in highly imbalanced datasets-such as Advanced Persistent Threats (APTs) in cybersecurity-remains a fundamental challenge for machine learning systems. Active learning offers a promising direction by strategically querying an oracle to minimize labeling effort, yet conventional approaches often fail to exploit the intrinsic geometric structure of the feature space for model refinement. In this paper, we introduce SDA2E, a Sparse Dual Adversarial Attention-based AutoEncoder designed to learn compact and discriminative latent representations from imbalanced, high-dimensional data. We further propose a similarity-guided active learning framework that integrates three novel strategies to refine decision boundaries efficiently: mormal-like expansion, which enriches the training set with points similar to labeled normals to improve reconstruction fidelity; anomaly-like prioritization, which boosts ranking accuracy by focusing on points resembling known anomalies; and a hybrid strategy that combines both for balanced model refinement and ranking. A key component of our framework is a new similarity measure, Normalized Matching 1s (SIM_NM1), tailored for sparse binary embeddings. We evaluate SDA2E extensively across 52 imbalanced datasets, including multiple DARPA Transparent Computing scenarios, and benchmark it against 15 state-of-the-art anomaly detection methods. Results demonstrate that SDA2E consistently achieves superior ranking performance (nDCG up to 1.0 in several cases) while reducing the required labeled data by up to 80% compared to passive training. Statistical tests confirm the significance of these improvements. Our work establishes a robust, efficient, and statistically validated framework for anomaly detection that is particularly suited to cybersecurity applications such as APT detection.
- [244] arXiv:2602.02928 [pdf, ps, other]
-
Title: Distance Marching for Generative ModelingSubjects: Machine Learning (cs.LG)
Time-unconditional generative models learn time-independent denoising vector fields. But without time conditioning, the same noisy input may correspond to multiple noise levels and different denoising directions, which interferes with the supervision signal. Inspired by distance field modeling, we propose Distance Marching, a new time-unconditional approach with two principled inference methods. Crucially, we design losses that focus on closer targets. This yields denoising directions better directed toward the data manifold. Across architectures, Distance Marching consistently improves FID by 13.5% on CIFAR-10 and ImageNet over recent time-unconditional baselines. For class-conditional ImageNet generation, despite removing time input, Distance Marching surpasses flow matching using our losses and inference methods. It achieves lower FID than flow matching's final performance using 60% of the sampling steps and 13.6% lower FID on average across backbone sizes. Moreover, our distance prediction is also helpful for early stopping during sampling and for OOD detection. We hope distance field modeling can serve as a principled lens for generative modeling.
- [245] arXiv:2602.02929 [pdf, ps, other]
-
Title: RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly DetectionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Neural and Evolutionary Computing (cs.NE)
Advanced Persistent Threats (APTs) are sophisticated, long-term cyberattacks that are difficult to detect because they operate stealthily and often blend into normal system behavior. This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder (GAE) with rare pattern mining to identify APT-like activities in system-level provenance data. Our approach first constructs a process behavioral graph using k-Nearest Neighbors based on feature similarity, then learns normal relational structure using a Graph Autoencoder. Anomaly candidates are identified through deviations between observed and reconstructed graph structure. To further improve detection, we integrate an rare pattern mining module that discovers infrequent behavioral co-occurrences and uses them to boost anomaly scores for processes exhibiting rare signatures. We evaluate the proposed method on the DARPA Transparent Computing datasets and show that rare-pattern boosting yields substantial gains in anomaly ranking quality over the baseline GAE. Compared with existing unsupervised approaches on the same benchmark, our single unified model consistently outperforms individual context-based detectors and achieves performance competitive with ensemble aggregation methods that require multiple separate detectors. These results highlight the value of coupling graph-based representation learning with classical pattern mining to improve both effectiveness and interpretability in provenance-based security anomaly detection.
- [246] arXiv:2602.02930 [pdf, ps, other]
-
Title: Rare Event Early Detection: A Dataset of Sepsis Onset for Critically Ill Trauma PatientsAuthors: Yin Jin, Tucker R. Stewart, Deyi Zhou, Chhavi Gupta, Arjita Nema, Scott C. Brakenridge, Grant E. O'Keefe, Juhua HuSubjects: Machine Learning (cs.LG)
Sepsis is a major public health concern due to its high morbidity, mortality, and cost. Its clinical outcome can be substantially improved through early detection and timely intervention. By leveraging publicly available datasets, machine learning (ML) has driven advances in both research and clinical practice. However, existing public datasets consider ICU patients (Intensive Care Unit) as a uniform group and neglect the potential challenges presented by critically ill trauma patients in whom injury-related inflammation and organ dysfunction can overlap with the clinical features of sepsis. We propose that a targeted identification of post-traumatic sepsis is necessary in order to develop methods for early detection. Therefore, we introduce a publicly available standardized post-trauma sepsis onset dataset extracted, relabeled using standardized post-trauma clinical facts, and validated from MIMIC-III. Furthermore, we frame early detection of post-trauma sepsis onset according to clinical workflow in ICUs in a daily basis resulting in a new rare event detection problem. We then establish a general benchmark through comprehensive experiments, which shows the necessity of further advancements using this new dataset. The data code is available at https://github.com/ML4UWHealth/SepsisOnset_TraumaCohort.git.
- [247] arXiv:2602.02932 [pdf, ps, other]
-
Title: Equal Access, Unequal Interaction: A Counterfactual Audit of LLM FairnessAuthors: Alireza Amiri-Margavi, Arshia Gharagozlou, Amin Gholami Davodi, Seyed Pouyan Mousavi Davoudi, Hamidreza Hasani BalyaniComments: 13 pages, 1 figureSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Prior work on fairness in large language models (LLMs) has primarily focused on access-level behaviors such as refusals and safety filtering. However, equitable access does not ensure equitable interaction quality once a response is provided. In this paper, we conduct a controlled fairness audit examining how LLMs differ in tone, uncertainty, and linguistic framing across demographic identities after access is granted. Using a counterfactual prompt design, we evaluate GPT-4 and LLaMA-3.1-70B on career advice tasks while varying identity attributes along age, gender, and nationality. We assess access fairness through refusal analysis and measure interaction quality using automated linguistic metrics, including sentiment, politeness, and hedging. Identity-conditioned differences are evaluated using paired statistical tests. Both models exhibit zero refusal rates across all identities, indicating uniform access. Nevertheless, we observe systematic, model-specific disparities in interaction quality: GPT-4 expresses significantly higher hedging toward younger male users, while LLaMA exhibits broader sentiment variation across identity groups. These results show that fairness disparities can persist at the interaction level even when access is equal, motivating evaluation beyond refusal-based audits.
- [248] arXiv:2602.02934 [pdf, ps, other]
-
Title: Beyond Blame: Rethinking SZZ with Knowledge Graph SearchSubjects: Software Engineering (cs.SE)
Identifying Bug-Inducing Commits (BICs) is fundamental for understanding software defects and enabling downstream tasks such as defect prediction and automated program repair. Yet existing SZZ-based approaches are limited by their reliance on git blame, which restricts the search space to commits that directly modified the fixed lines. Our preliminary study on 2,102 validated bug-fixing commits reveals that this limitation is significant: over 40% of cases cannot be solved by blame alone, as 28% of BICs require traversing commit history beyond blame results and 14% are blameless.
We present AgenticSZZ, the first approach to apply Temporal Knowledge Graphs (TKGs) to software evolution analysis. AgenticSZZ reframes BIC identification from a ranking problem over blame commits into a graph search problem, where temporal ordering is fundamental to causal reasoning about bug introduction. The approach operates in two phases: (1) constructing a TKG that encodes commits with temporal and structural relationships, expanding the search space by traversing file history backward from two reference points (blame commits and the BFC); and (2) leveraging an LLM agent to navigate the graph using specialized tools for candidate exploration and causal analysis.
Evaluation on three datasets shows that AgenticSZZ achieves F1-scores of 0.48 to 0.74, with statistically significant improvements over state-of-the-art by up to 27%. Our ablation study confirms that both components are essential, reflecting a classic exploration-exploitation trade-off: the TKG expands the search space while the agent provides intelligent selection. By transforming BIC identification into a graph search problem, we open a new research direction for temporal and causal reasoning in software evolution analysis. - [249] arXiv:2602.02942 [pdf, ps, other]
-
Title: Hybrid-Field Channel Estimation for XL-MIMO Systems: Dictionary-based Sparse Signal RecoveryComments: 5 pages, 2 figures, letter paperJournal-ref: IEEE Wireless Communications Letters ( Volume: 15); Page(s): 1385 - 1389; January 2026Subjects: Systems and Control (eess.SY)
Extremely large-scale multiple-input multiple-output (XL-MIMO) systems are a key technology for future wireless networks, but the large array aperture naturally creates a hybrid-field (HF) propagation regime in which far-field (FF) planar-wave and near-field (NF) spherical-wave components coexist. This work considers the problem of HF channel estimation (CE) and introduces a unified model that superimposes FF and NF contributions according to the Rayleigh distance boundary. By exploiting the inherent sparsity of the channel in the angular and polar domains, we formulate the estimation task as a sparse recovery problem. Unlike conventional approaches that require prior knowledge of the channel sparsity level, the proposed method operates without requiring knowledge of the sparsity level L and the NF/FF ratio {\gamma}, which are used only for synthetic channel generation in simulations. The channel estimator determines the number of paths adaptively through a residual-based stopping rule. A combined FF/NF dictionary is employed to initialize the support, and each selected atom undergoes continuous parameter refinement to mitigate grid mismatch. Simulation results demonstrate that the proposed estimator achieves accurate HF channel reconstruction under both line-of-sight (LoS) and non-line-of-sight (NLoS) conditions, offering a practical and computationally efficient solution for XL-MIMO systems.
Extremely Large-Scale MIMO (XL-MIMO); Channel State Information (CSI); Channel estimation (CE); hybrid-field (HF) wave propagation; near-field (NF) spherical wave model; far-field (FF) planar wave model - [250] arXiv:2602.02943 [pdf, ps, other]
-
Title: 3D-Learning: Diffusion-Augmented Distributionally Robust Decision-Focused LearningSubjects: Machine Learning (cs.LG)
Predict-then-Optimize (PTO) pipelines are widely employed in computing and networked systems, where Machine Learning (ML) models are used to predict critical contextual information for downstream decision-making tasks such as cloud LLM serving, data center demand response, and edge workload scheduling. However, these ML predictors are often vulnerable to out-of-distribution (OOD) samples at test time, leading to significant decision performance degradation due to large prediction errors. To address the generalization challenges under OOD conditions, we present the framework of Distributionally Robust Decision-Focused Learning (DR-DFL), which trains ML models to optimize decision performance under the worst-case distribution. Instead of relying on classical Distributionally Robust Optimization (DRO) techniques, we propose Diffusion-Augmented Distributionally Robust Decision-Focused Learning (3D-Learning), which searches for the worst-case distribution within the parameterized space of a diffusion model. By leveraging the powerful distribution modeling capabilities of diffusion models, 3D-Learning identifies worst-case distributions that remain consistent with real data, achieving a favorable balance between average and worst-case scenarios. Empirical results on an LLM resource provisioning task demonstrate that 3D-Learning outperforms existing DRO and Data Augmentation methods in OOD generalization performance.
- [251] arXiv:2602.02944 [pdf, ps, other]
-
Title: SRA-Seg: Synthetic to Real Alignment for Semi-Supervised Medical Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Synthetic data, an appealing alternative to extensive expert-annotated data for medical image segmentation, consistently fails to improve segmentation performance despite its visual realism. The reason being that synthetic and real medical images exist in different semantic feature spaces, creating a domain gap that current semi-supervised learning methods cannot bridge. We propose SRA-Seg, a framework explicitly designed to align synthetic and real feature distributions for medical image segmentation. SRA-Seg introduces a similarity-alignment (SA) loss using frozen DINOv2 embeddings to pull synthetic representations toward their nearest real counterparts in semantic space. We employ soft edge blending to create smooth anatomical transitions and continuous labels, eliminating the hard boundaries from traditional copy-paste augmentation. The framework generates pseudo-labels for synthetic images via an EMA teacher model and applies soft-segmentation losses that respect uncertainty in mixed regions. Our experiments demonstrate strong results: using only 10% labeled real data and 90% synthetic unlabeled data, SRA-Seg achieves 89.34% Dice on ACDC and 84.42% on FIVES, significantly outperforming existing semi-supervised methods and matching the performance of methods using real unlabeled data.
- [252] arXiv:2602.02947 [pdf, ps, other]
-
Title: Role of Graphics in Disaster Communication: Practitioner Perspectives on Use, Challenges, and InclusivitySubjects: Graphics (cs.GR)
Information graphics, such as hazard maps, evacuation diagrams, and pictorial action guides, are widely used in disaster risk communication. These visuals are important because they convey hazard information quickly, reduce reliance on lengthy text, and support decision-making in time-critical situations. However, despite their importance, disaster information graphics do not work equally well for all audiences. In practice, many graphics remain difficult to interpret, and their accessibility for vulnerable populations is still uneven and underexplored. Despite their central role, there has been little empirical work examining how graphics shape disaster communication, what challenges practitioners face in using them, and, most importantly, how inclusive current disaster graphics are in real-world settings. To address this gap, we examine how information graphics are currently produced and used in disaster communication, what issues emerge in practice, and how inclusivity is addressed. We conducted semi-structured interviews with disaster communication practitioners and researchers to examine the role of graphics across preparedness, warning, and response contexts, as well as the barriers experienced by vulnerable communities. Our findings show that graphics are widely expected and heavily relied upon, yet significant accessibility gaps persist for groups such as people with vision impairments, older adults, and culturally and linguistically diverse communities. Participants also highlighted that inclusive adaptations are difficult to achieve during unfolding emergencies due to operational constraints, limited guidance, and resource barriers. Based on these findings, we outline recommendations for disaster management agencies and graphic designers and identify research directions for technological and adaptive support to make disaster graphics more inclusive at scale.
- [253] arXiv:2602.02948 [pdf, ps, other]
-
Title: Variational Sparse Paired Autoencoders (vsPAIR) for Inverse Problems and Uncertainty QuantificationSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Inverse problems are fundamental to many scientific and engineering disciplines; they arise when one seeks to reconstruct hidden, underlying quantities from noisy measurements. Many applications demand not just point estimates but interpretable uncertainty. Providing fast inference alongside uncertainty estimates remains challenging yet desirable in numerous applications.
We propose the Variational Sparse Paired Autoencoder (vsPAIR) to address this challenge. The architecture pairs a standard VAE encoding observations with a sparse VAE encoding quantities of interest, connected through a learned latent mapping. The variational structure enables uncertainty estimation, the paired architecture encourages interpretability by anchoring QoI representations to clean data, and sparse encodings provide structure by concentrating information into identifiable factors rather than diffusing across all dimensions. We also propose modifications to existing sparse VAE methods: a hard-concrete spike-and-slab relaxation for differentiable training and a beta hyperprior for adaptive sparsity levels. To validate the effectiveness of our proposed architecture, we conduct experiments on blind inpainting and computed tomography, demonstrating that vsPAIR is a capable inverse problem solver that can provide interpretable and structured uncertainty estimates. - [254] arXiv:2602.02951 [pdf, ps, other]
-
Title: Nüwa: Mending the Spatial Integrity Torn by VLM Token PruningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Vision token pruning has proven to be an effective acceleration technique for the efficient Vision Language Model (VLM). However, existing pruning methods demonstrate excellent performance preservation in visual question answering (VQA) and suffer substantial degradation on visual grounding (VG) tasks. Our analysis of the VLM's processing pipeline reveals that strategies utilizing global semantic similarity and attention scores lose the global spatial reference frame, which is derived from the interactions of tokens' positional information. Motivated by these findings, we propose $\text{N\"uwa}$, a two-stage token pruning framework that enables efficient feature aggregation while maintaining spatial integrity. In the first stage, after the vision encoder, we apply three operations, namely separation, alignment, and aggregation, which are inspired by swarm intelligence algorithms to retain information-rich global spatial anchors. In the second stage, within the LLM, we perform text-guided pruning to retain task-relevant visual tokens. Extensive experiments demonstrate that $\text{N\"uwa}$ achieves SOTA performance on multiple VQA benchmarks (from 94% to 95%) and yields substantial improvements on visual grounding tasks (from 7% to 47%).
- [255] arXiv:2602.02952 [pdf, ps, other]
-
Title: UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained TransformersAuthors: Elias Hossain, Shubhashis Roy Dipta, Subash Neupane, Rajib Rana, Ravid Shwartz-Ziv, Ivan Garibay, Niloofar YousefiSubjects: Artificial Intelligence (cs.AI)
Neural NLP models are often miscalibrated, assigning high confidence to incorrect predictions, which undermines selective prediction and high-stakes deployment. Post-hoc calibration methods adjust output probabilities but leave internal computation unchanged, while ensemble and Bayesian approaches improve uncertainty at substantial training or storage cost. We propose UAT-LITE, an inference-time framework that makes self-attention uncertainty-aware using approximate Bayesian inference via Monte Carlo dropout in pretrained transformer classifiers. Token-level epistemic uncertainty is estimated from stochastic forward passes and used to modulate self-attention during contextualization, without modifying pretrained weights or training objectives. We additionally introduce a layerwise variance decomposition to diagnose how predictive uncertainty accumulates across transformer depth. Across the SQuAD 2.0 answerability, MNLI, and SST-2, UAT-LITE reduces Expected Calibration Error by approximately 20% on average relative to a fine-tuned BERT-base baseline while preserving task accuracy, and improves selective prediction and robustness under distribution shift.
- [256] arXiv:2602.02955 [pdf, ps, other]
-
Title: Synthetic Data Augmentation for Medical Audio Classification: A Preliminary EvaluationComments: 5 pages, 1 figureSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Medical audio classification remains challenging due to low signal-to-noise ratios, subtle discriminative features, and substantial intra-class variability, often compounded by class imbalance and limited training data. Synthetic data augmentation has been proposed as a potential strategy to mitigate these constraints; however, prior studies report inconsistent methodological approaches and mixed empirical results. In this preliminary study, we explore the impact of synthetic augmentation on respiratory sound classification using a baseline deep convolutional neural network trained on a moderately imbalanced dataset (73%:27%). Three generative augmentation strategies (variational autoencoders, generative adversarial networks, and diffusion models) were assessed under controlled experimental conditions. The baseline model without augmentation achieved an F1-score of 0.645. Across individual augmentation strategies, performance gains were not observed, with several configurations demonstrating neutral or degraded classification performance. Only an ensemble of augmented models yielded a modest improvement in F1-score (0.664). These findings suggest that, for medical audio classification, synthetic augmentation may not consistently enhance performance when applied to a standard CNN classifier. Future work should focus on delineating task-specific data characteristics, model-augmentation compatibility, and evaluation frameworks necessary for synthetic augmentation to be effective in medical audio applications.
- [257] arXiv:2602.02958 [pdf, ps, other]
-
Title: Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache QuantizationAuthors: Haocheng Xi, Shuo Yang, Yilong Zhao, Muyang Li, Han Cai, Xingyang Li, Yujun Lin, Zhuoyang Zhang, Jintao Zhang, Xiuyu Li, Zhiying Xu, Jun Wu, Chenfeng Xu, Ion Stoica, Song Han, Kurt KeutzerComments: 11 pages, 7 figuresSubjects: Machine Learning (cs.LG)
Despite rapid progress in autoregressive video diffusion, an emerging system algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive video generation models, the KV cache grows with generation history and quickly dominates GPU memory, often exceeding 30 GB, preventing deployment on widely available hardware. More critically, constrained KV cache budgets restrict the effective working memory, directly degrading long horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training free KV cache quantization framework for autoregressive video diffusion models. QVG leverages video spatiotemporal redundancy through Semantic Aware Smoothing, producing low magnitude, quantization friendly residuals. It further introduces Progressive Residual Quantization, a coarse to fine multi stage scheme that reduces quantization error while enabling a smooth quality memory trade off. Across LongCat Video, HY WorldPlay, and Self Forcing benchmarks, QVG establishes a new Pareto frontier between quality and memory efficiency, reducing KV cache memory by up to 7.0 times with less than 4% end to end latency overhead while consistently outperforming existing baselines in generation quality.
- [258] arXiv:2602.02959 [pdf, ps, other]
-
Title: Human-Centric Traffic Signal Control for Equity: A Multi-Agent Action Branching Deep Reinforcement Learning ApproachSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Coordinating traffic signals along multimodal corridors is challenging because many multi-agent deep reinforcement learning (DRL) approaches remain vehicle-centric and struggle with high-dimensional discrete action spaces. We propose MA2B-DDQN, a human-centric multi-agent action-branching double Deep Q-Network (DQN) framework that explicitly optimizes traveler-level equity. Our key contribution is an action-branching discrete control formulation that decomposes corridor control into (i) local, per-intersection actions that allocate green time between the next two phases and (ii) a single global action that selects the total duration of those phases. This decomposition enables scalable coordination under discrete control while reducing the effective complexity of joint decision-making. We also design a human-centric reward that penalizes the number of delayed individuals in the corridor, accounting for pedestrians, vehicle occupants, and transit passengers. Extensive evaluations across seven realistic traffic scenarios in Melbourne, Australia, demonstrate that our approach significantly reduces the number of impacted travelers, outperforming existing DRL and baseline methods. Experiments confirm the robustness of our model, showing minimal variance across diverse settings. This framework not only advocates for a fairer traffic signal system but also provides a scalable solution adaptable to varied urban traffic conditions.
- [259] arXiv:2602.02960 [pdf, ps, other]
-
Title: Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body ControlSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Humanoid Whole-Body Controllers trained with reinforcement learning (RL) have recently achieved remarkable performance, yet many target a single robot embodiment. Variations in dynamics, degrees of freedom (DoFs), and kinematic topology still hinder a single policy from commanding diverse humanoids. Moreover, obtaining a generalist policy that not only transfers across embodiments but also supports richer behaviors-beyond simple walking to squatting, leaning-remains especially challenging. In this work, we tackle these obstacles by introducing EAGLE, an iterative generalist-specialist distillation framework that produces a single unified policy that controls multiple heterogeneous humanoids without per-robot reward tuning. During each cycle, embodiment-specific specialists are forked from the current generalist, refined on their respective robots, and new skills are distilled back into the generalist by training on the pooled embodiment set. Repeating this loop until performance convergence produces a robust Whole-Body Controller validated on robots such as Unitree H1, G1, and Fourier N1. We conducted experiments on five different robots in simulation and four in real-world settings. Through quantitative evaluations, EAGLE achieves high tracking accuracy and robustness compared to other methods, marking a step toward scalable, fleet-level humanoid control. See more details at https://eagle-wbc.github.io/
- [260] arXiv:2602.02961 [pdf, ps, other]
-
Title: Generative Engine Optimization: A VLM and Agent Framework for Pinterest Acquisition GrowthSubjects: Artificial Intelligence (cs.AI)
Large Language Models are fundamentally reshaping content discovery through AI-native search systems such as ChatGPT, Gemini, and Claude. Unlike traditional search engines that match keywords to documents, these systems infer user intent, synthesize multimodal evidence, and generate contextual answers directly on the search page, introducing a paradigm shift from Search Engine Optimization (SEO) to Generative Engine Optimization (GEO). For visual content platforms hosting billions of assets, this poses an acute challenge: individual images lack the semantic depth and authority signals that generative search prioritizes, risking disintermediation as user needs are satisfied in-place without site visits.
We present Pinterest GEO, a production-scale framework that pioneers reverse search design: rather than generating generic image captions describing what content is, we fine-tune Vision-Language Models (VLMs) to predict what users would actually search for, augmented this with AI agents that mine real-time internet trends to capture emerging search demand. These VLM-generated queries then drive construction of semantically coherent Collection Pages via multimodal embeddings, creating indexable aggregations optimized for generative retrieval. Finally, we employ hybrid VLM and two-tower ANN architectures to build authority-aware interlinking structures that propagate signals across billions of visual assets. Deployed at scale across billions of images and tens of millions of collections, GEO delivers 20\% organic traffic growth contributing to multi-million monthly active user (MAU) growth, demonstrating a principled pathway for visual platforms to thrive in the generative search era. - [261] arXiv:2602.02962 [pdf, ps, other]
-
Title: Q-ShiftDP: A Differentially Private Parameter-Shift Rule for Quantum Machine LearningSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Quantum Machine Learning (QML) promises significant computational advantages, but preserving training data privacy remains challenging. Classical approaches like differentially private stochastic gradient descent (DP-SGD) add noise to gradients but fail to exploit the unique properties of quantum gradient estimation. In this work, we introduce the Differentially Private Parameter-Shift Rule (Q-ShiftDP), the first privacy mechanism tailored to QML. By leveraging the inherent boundedness and stochasticity of quantum gradients computed via the parameter-shift rule, Q-ShiftDP enables tighter sensitivity analysis and reduces noise requirements. We combine carefully calibrated Gaussian noise with intrinsic quantum noise to provide formal privacy and utility guarantees, and show that harnessing quantum noise further improves the privacy-utility trade-off. Experiments on benchmark datasets demonstrate that Q-ShiftDP consistently outperforms classical DP methods in QML.
- [262] arXiv:2602.02963 [pdf, ps, other]
-
Title: TRACE: Temporal Radiology with Anatomical Change Explanation for Grounded X-ray Report GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Temporal comparison of chest X-rays is fundamental to clinical radiology, enabling detection of disease progression, treatment response, and new findings. While vision-language models have advanced single-image report generation and visual grounding, no existing method combines these capabilities for temporal change detection. We introduce Temporal Radiology with Anatomical Change Explanation (TRACE), the first model that jointly performs temporal comparison, change classification, and spatial localization. Given a prior and current chest X-ray, TRACE generates natural language descriptions of interval changes (worsened, improved, stable) while grounding each finding with bounding box coordinates. TRACE demonstrates effective spatial localization with over 90% grounding accuracy, establishing a foundation for this challenging new task. Our ablation study uncovers an emergent capability: change detection arises only when temporal comparison and spatial grounding are jointly learned, as neither alone enables meaningful change detection. This finding suggests that grounding provides a spatial attention mechanism essential for temporal reasoning.
- [263] arXiv:2602.02964 [pdf, ps, other]
-
Title: Testing Framework Migration with Large Language ModelsComments: Accepted for publication at AST 2026Subjects: Software Engineering (cs.SE)
Python developers rely on two major testing frameworks: \texttt{unittest} and \texttt{Pytest}. While \texttt{Pytest} offers simpler assertions, reusable fixtures, and better interoperability, migrating existing suites from \texttt{unittest} remains a manual and time-consuming process. Automating this migration could substantially reduce effort and accelerate test modernization. In this paper, we investigate the capability of Large Language Models (LLMs) to automate test framework migrations from \texttt{unittest} to \texttt{Pytest}. We evaluate GPT 4o and Claude Sonnet 4 under three prompting strategies (Zero-shot, One-shot, and Chain-of-Thought) and two temperature settings (0.0 and 1.0). To support this analysis, we first introduce a curated dataset of real-world migrations extracted from the top 100 Python open-source projects. Next, we actually execute the LLM-generated test migrations in their respective test suites. Overall, we find that 51.5% of the LLM-generated test migrations failed, while 48.5% passed. The results suggest that LLMs can accelerate test migration, but there are often caveats. For example, Claude Sonnet 4 exhibited more conservative migrations (e.g., preserving class-based tests and legacy \texttt{unittest} references), while GPT-4o favored more transformations (e.g., to function-based tests). We conclude by discussing multiple implications for practitioners and researchers.
- [264] arXiv:2602.02965 [pdf, ps, other]
-
Title: Understanding Bug-Reproducing Tests: A First Empirical StudyComments: Accepted for publication at AST 2026Subjects: Software Engineering (cs.SE)
Developers create bug-reproducing tests that support debugging by failing as long as the bug is present, and passing once the bug has been fixed. These tests are usually integrated into existing test suites and executed regularly alongside all other tests to ensure that future regressions are caught. Despite this co-existence with other types of tests, the properties of bug-reproducing tests are scarcely researched, and it remains unclear whether they differ fundamentally. In this short paper, we provide an initial empirical study to understand bug-reproducing tests better. We analyze 642 bug-reproducing tests of 15 real-world Python systems. Overall, we find that bug-reproducing tests are not (statistically significantly) different from other tests regarding LOC, number of assertions, and complexity. However, bug-reproducing tests contain slightly more try/except blocks and ``weak assertions'' (e.g.,~\texttt{assertNotEqual}). Lastly, we detect that the majority (95%) of the bug-reproducing tests reproduce a single bug, while 5% reproduce multiple bugs. We conclude by discussing implications and future research directions.
- [265] arXiv:2602.02966 [pdf, ps, other]
-
Title: What Do Contribution Guidelines Say About Software Testing?Comments: Published at MSR 2025Subjects: Software Engineering (cs.SE)
Software testing plays a crucial role in the contribution process of open-source projects. For example, contributions introducing new features are expected to include tests, and contributions with tests are more likely to be accepted. Although most real-world projects require contributors to write tests, the specific testing practices communicated to contributors remain unclear. In this paper, we present an empirical study to understand better how software testing is approached in contribution guidelines. We analyze the guidelines of 200 Python and JavaScript open-source software projects. We find that 78\% of the projects include some form of test documentation for contributors. Test documentation is located in multiple sources, including \texttt{CONTRIBUTING} files (58\%), external documentation (24\%), and \texttt{README} files (8\%). Furthermore, test documentation commonly explains how to run tests (83.5\%), but less often provides guidance on how to write tests (37\%). It frequently covers unit tests (71\%), but rarely addresses integration (20.5\%) and end-to-end tests (15.5\%). Other key testing aspects are also less frequently discussed: test coverage (25.5\%) and mocking (9.5\%). We conclude by discussing implications and future research.
- [266] arXiv:2602.02969 [pdf, ps, other]
-
Title: Dynamic High-frequency Convolution for Infrared Small Target DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Infrared small targets are typically tiny and locally salient, which belong to high-frequency components (HFCs) in images. Single-frame infrared small target (SIRST) detection is challenging, since there are many HFCs along with targets, such as bright corners, broken clouds, and other clutters. Current learning-based methods rely on the powerful capabilities of deep networks, but neglect explicit modeling and discriminative representation learning of various HFCs, which is important to distinguish targets from other HFCs. To address the aforementioned issues, we propose a dynamic high-frequency convolution (DHiF) to translate the discriminative modeling process into the generation of a dynamic local filter bank. Especially, DHiF is sensitive to HFCs, owing to the dynamic parameters of its generated filters being symmetrically adjusted within a zero-centered range according to Fourier transformation properties. Combining with standard convolution operations, DHiF can adaptively and dynamically process different HFC regions and capture their distinctive grayscale variation characteristics for discriminative representation learning. DHiF functions as a drop-in replacement for standard convolution and can be used in arbitrary SIRST detection networks without significant decrease in computational efficiency. To validate the effectiveness of our DHiF, we conducted extensive experiments across different SIRST detection networks on real-scene datasets. Compared to other state-of-the-art convolution operations, DHiF exhibits superior detection performance with promising improvement. Codes are available at https://github.com/TinaLRJ/DHiF.
- [267] arXiv:2602.02970 [pdf, ps, other]
-
Title: Co2PO: Coordinated Constrained Policy Optimization for Multi-Agent RLSubjects: Machine Learning (cs.LG)
Constrained multi-agent reinforcement learning (MARL) faces a fundamental tension between exploration and safety-constrained optimization. Existing leading approaches, such as Lagrangian methods, typically rely on global penalties or centralized critics that react to violations after they occur, often suppressing exploration and leading to over-conservatism. We propose Co2PO, a novel MARL communication-augmented framework that enables coordination-driven safety through selective, risk-aware communication. Co2PO introduces a shared blackboard architecture for broadcasting positional intent and yield signals, governed by a learned hazard predictor that proactively forecasts potential violations over an extended temporal horizon. By integrating these forecasts into a constrained optimization objective, Co2PO allows agents to anticipate and navigate collective hazards without the performance trade-offs inherent in traditional reactive constraints. We evaluate Co2PO across a suite of complex multi-agent safety benchmarks, where it achieves higher returns compared to leading constrained baselines while converging to cost-compliant policies at deployment. Ablation studies further validate the necessity of risk-triggered communication, adaptive gating, and shared memory components.
- [268] arXiv:2602.02972 [pdf, ps, other]
-
Title: Learning Fast Monomial Orders for Gröbner Basis ComputationsAuthors: R. Caleb Bunch, Alperen A. Ergür, Melika Golestani, Jessie Tong, Malia Walewski, Yunus E. ZeytuncuSubjects: Symbolic Computation (cs.SC); Machine Learning (cs.LG); Commutative Algebra (math.AC); Algebraic Geometry (math.AG)
The efficiency of Gr\"obner basis computation, the standard engine for solving systems of polynomial equations, depends on the choice of monomial ordering. Despite a near-continuum of possible monomial orders, most implementations rely on static heuristics such as GrevLex, guided primarily by expert intuition. We address this gap by casting the selection of monomial orderings as a reinforcement learning problem over the space of admissible orderings. Our approach leverages domain-informed reward signals that accurately reflect the computational cost of Gr\"obner basis computations and admits efficient Monte Carlo estimation. Experiments on benchmark problems from systems biology and computer vision show that the resulting learned policies consistently outperform standard heuristics, yielding substantial reductions in computational cost. Moreover, we find that these policies resist distillation into simple interpretable models, providing empirical evidence that deep reinforcement learning allows the agents to exploit non-linear geometric structure beyond the scope of traditional heuristics.
- [269] arXiv:2602.02973 [pdf, ps, other]
-
Title: Fisheye Stereo Vision: Depth and Range ErrorAuthors: Leaf Jiang, Matthew Holzel, Bernhard Kaplan, Hsiou-Yuan Liu, Sabyasachi Paul, Karen Rankin, Piotr SwierczynskiSubjects: Computer Vision and Pattern Recognition (cs.CV)
This study derives analytical expressions for the depth and range error of fisheye stereo vision systems as a function of object distance, specifically accounting for accuracy at large angles.
- [270] arXiv:2602.02974 [pdf, ps, other]
-
Title: SceneLinker: Compositional 3D Scene Generation via Semantic Scene Graph from RGB SequencesComments: Accepted as an IEEE TVCG paper at IEEE VR 2026 (journal track)Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce SceneLinker, a novel framework that generates compositional 3D scenes via semantic scene graph from RGB sequences. To adaptively experience Mixed Reality (MR) content based on each user's space, it is essential to generate a 3D scene that reflects the real-world layout by compactly capturing the semantic cues of the surroundings. Prior works struggled to fully capture the contextual relationship between objects or mainly focused on synthesizing diverse shapes, making it challenging to generate 3D scenes aligned with object arrangements. We address these challenges by designing a graph network with cross-check feature attention for scene graph prediction and constructing a graph-variational autoencoder (graph-VAE), which consists of a joint shape and layout block for 3D scene generation. Experiments on the 3RScan/3DSSG and SG-FRONT datasets demonstrate that our approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations, even in complex indoor environments and under challenging scene graph constraints. Our work enables users to generate consistent 3D spaces from their physical environments via scene graphs, allowing them to create spatial MR content. Project page is https://scenelinker2026.github.io.
- [271] arXiv:2602.02975 [pdf, ps, other]
-
Title: Where Norms and References Collide: Evaluating LLMs on Normative ReasoningAuthors: Mitchell Abrams, Kaveh Eskandari Miandoab, Felix Gervits, Vasanth Sarathy, Matthias ScheutzComments: Accepted to the 40th AAAI Conference on Artificial Intelligence (AAAI-26)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Embodied agents, such as robots, will need to interact in situated environments where successful communication often depends on reasoning over social norms: shared expectations that constrain what actions are appropriate in context. A key capability in such settings is norm-based reference resolution (NBRR), where interpreting referential expressions requires inferring implicit normative expectations grounded in physical and social context. Yet it remains unclear whether Large Language Models (LLMs) can support this kind of reasoning. In this work, we introduce SNIC (Situated Norms in Context), a human-validated diagnostic testbed designed to probe how well state-of-the-art LLMs can extract and utilize normative principles relevant to NBRR. SNIC emphasizes physically grounded norms that arise in everyday tasks such as cleaning, tidying, and serving. Across a range of controlled evaluations, we find that even the strongest LLMs struggle to consistently identify and apply social norms, particularly when norms are implicit, underspecified, or in conflict. These findings reveal a blind spot in current LLMs and highlight a key challenge for deploying language-based systems in socially situated, embodied settings.
- [272] arXiv:2602.02977 [pdf, ps, other]
-
Title: Aligning Forest and Trees in Images and Long Captions for Visually Grounded UnderstandingComments: PreprintSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large vision-language models such as CLIP struggle with long captions because they align images and texts as undifferentiated wholes. Fine-grained vision-language understanding requires hierarchical semantics capturing both global context and localized details across visual and textual domains. Yet linguistic hierarchies from syntax or semantics rarely match visual organization, and purely visual hierarchies tend to fragment scenes into appearance-driven parts without semantic focus. We propose CAFT (Cross-domain Alignment of Forests and Trees), a hierarchical image-text representation learning framework that aligns global and local semantics across images and long captions without pixel-level supervision. Coupling a fine-to-coarse visual encoder with a hierarchical text transformer, it uses a hierarchical alignment loss that matches whole images with whole captions while biasing region-sentence correspondences, so that coarse semantics are built from fine-grained evidence rather than from aggregation untethered to part-level grounding. Trained on 30M image-text pairs, CAFT achieves state-of-the-art performance on six long-text retrieval benchmarks and exhibits strong scaling behavior. Experiments show that hierarchical cross-domain alignment enables fine-grained, visually grounded image-text representations to emerge without explicit region-level supervision.
- [273] arXiv:2602.02978 [pdf, ps, other]
-
Title: Structuring Value Representations via Geometric Coherence in Markov Decision ProcessesSubjects: Artificial Intelligence (cs.AI)
Geometric properties can be leveraged to stabilize and speed reinforcement learning. Existing examples include encoding symmetry structure, geometry-aware data augmentation, and enforcing structural restrictions. In this paper, we take a novel view of RL through the lens of order theory and recast value function estimates into learning a desired poset (partially ordered set). We propose \emph{GCR-RL} (Geometric Coherence Regularized Reinforcement Learning) that computes a sequence of super-poset refinements -- by refining posets in previous steps and learning additional order relationships from temporal difference signals -- thus ensuring geometric coherence across the sequence of posets underpinning the learned value functions. Two novel algorithms by Q-learning and by actor--critic are developed to efficiently realize these super-poset refinements. Their theoretical properties and convergence rates are analyzed. We empirically evaluate GCR-RL in a range of tasks and demonstrate significant improvements in sample efficiency and stable performance over strong baselines.
- [274] arXiv:2602.02979 [pdf, ps, other]
-
Title: CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement LearningAuthors: Ran Li, Zeyuan Liu, Yinghao chen, Bingxiang He, Jiarui Yuan, Zixuan Fu, Weize Chen, Jinyi Hu, Zhiyuan Liu, Maosong SunComments: work in progressSubjects: Computation and Language (cs.CL)
Large Language Models (LLMs) have demonstrated strong potential in complex reasoning, yet their progress remains fundamentally constrained by reliance on massive high-quality human-curated tasks and labels, either through supervised fine-tuning (SFT) or reinforcement learning (RL) on reasoning-specific data. This dependence renders supervision-heavy training paradigms increasingly unsustainable, with signs of diminishing scalability already evident in practice. To overcome this limitation, we introduce CPM\"obius (CPMobius), a collaborative Coach-Player paradigm for data-free reinforcement learning of reasoning models. Unlike traditional adversarial self-play, CPM\"obius, inspired by real world human sports collaboration and multi-agent collaboration, treats the Coach and Player as independent but cooperative roles. The Coach proposes instructions targeted at the Player's capability and receives rewards based on changes in the Player's performance, while the Player is rewarded for solving the increasingly instructive tasks generated by the Coach. This cooperative optimization loop is designed to directly enhance the Player's mathematical reasoning ability. Remarkably, CPM\"obius achieves substantial improvement without relying on any external training data, outperforming existing unsupervised approaches. For example, on Qwen2.5-Math-7B-Instruct, our method improves accuracy by an overall average of +4.9 and an out-of-distribution average of +5.4, exceeding RENT by +1.5 on overall accuracy and R-zero by +4.2 on OOD accuracy.
- [275] arXiv:2602.02982 [pdf, ps, other]
-
Title: Invisible Users in Digital Health: A Scoping Review of Digital Interventions to Promote Physical Activity Among Culturally and Linguistically Diverse WomenSubjects: Human-Computer Interaction (cs.HC)
Digital health has strong potential for promoting physical activity (PA), yet interventions often fail to sustain engagement among culturally and linguistically diverse (CALD) women. Prior reviews focus on short-term efficacy or surface-level localisation, while a design-oriented synthesis of deep cultural adaptation and long-term strategies remain limited. This scoping review systematically screened 1968 records, analysed 18 studies and identified a critical design paradox: techno-solutionist systems overlook social and cultural barriers, while social-support features often fail in low-activity social networks. To address this gap, we propose the Culturally Embedded Interaction Framework, integrating five dimensions: culturally-grounded measurement, multi-modal interaction, contextual and temporal adaptability, embedded social weaving, and theory-guided cultural adaptation. The framework advances beyond accessibility-focused approaches by mapping behavioural theory to design mechanisms that support sustained and culturally plural participation. We provide actionable design principles to help HCI researchers and practitioners move from one-size-fits-all models toward adaptive, theory-informed, and culturally sustaining design.
- [276] arXiv:2602.02983 [pdf, ps, other]
-
Title: Are LLMs Biased Like Humans? Causal Reasoning as a Function of Prior Knowledge, Irrelevant Information, and Reasoning BudgetSubjects: Artificial Intelligence (cs.AI)
Large language models (LLMs) are increasingly used in domains where causal reasoning matters, yet it remains unclear whether their judgments reflect normative causal computation, human-like shortcuts, or brittle pattern matching. We benchmark 20+ LLMs against a matched human baseline on 11 causal judgment tasks formalized by a collider structure ($C_1 \!\rightarrow\! E\! \leftarrow \!C_2$). We find that a small interpretable model compresses LLMs' causal judgments well and that most LLMs exhibit more rule-like reasoning strategies than humans who seem to account for unmentioned latent factors in their probability judgments. Furthermore, most LLMs do not mirror the characteristic human collider biases of weak explaining away and Markov violations. We probe LLMs' causal judgment robustness under (i) semantic abstraction and (ii) prompt overloading (injecting irrelevant text), and find that chain-of-thought (CoT) increases robustness for many LLMs. Together, this divergence suggests LLMs can complement humans when known biases are undesirable, but their rule-like reasoning may break down when uncertainty is intrinsic -- highlighting the need to characterize LLM reasoning strategies for safe, effective deployment.
- [277] arXiv:2602.02986 [pdf, ps, other]
-
Title: Why Some Models Resist Unlearning: A Linear Stability PerspectiveSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Machine unlearning, the ability to erase the effect of specific training samples without retraining from scratch, is critical for privacy, regulation, and efficiency. However, most progress in unlearning has been empirical, with little theoretical understanding of when and why unlearning works. We tackle this gap by framing unlearning through the lens of asymptotic linear stability to capture the interaction between optimization dynamics and data geometry. The key quantity in our analysis is data coherence which is the cross sample alignment of loss surface directions near the optimum. We decompose coherence along three axes: within the retain set, within the forget set, and between them, and prove tight stability thresholds that separate convergence from divergence. To further link data properties to forgettability, we study a two layer ReLU CNN under a signal plus noise model and show that stronger memorization makes forgetting easier: when the signal to noise ratio (SNR) is lower, cross sample alignment is weaker, reducing coherence and making unlearning easier; conversely, high SNR, highly aligned models resist unlearning. For empirical verification, we show that Hessian tests and CNN heatmaps align closely with the predicted boundary, mapping the stability frontier of gradient based unlearning as a function of batching, mixing, and data/model alignment. Our analysis is grounded in random matrix theory tools and provides the first principled account of the trade offs between memorization, coherence, and unlearning.
- [278] arXiv:2602.02987 [pdf, ps, other]
-
Title: Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal ControlSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Large Language Models (LLMs) are rapidly becoming critical infrastructure for enterprise applications, driving unprecedented demand for GPU-based inference services. A key operational challenge arises from the two-phase nature of LLM inference: a compute-intensive \emph{prefill} phase that processes user input, followed by a memory-bound \emph{decode} phase that generates output tokens. When these phases share GPU resources, prefill tasks throttle the processing speed of concurrent decodes, creating state-dependent contention. This contention is further complicated by workload heterogeneity, as different applications exhibit vastly different input and output lengths. We develop a stochastic control framework for scheduling heterogeneous LLM workloads across large GPU clusters. We formulate LLM inference as a multiclass many-server queueing network with state-dependent service rates, grounded in empirical iteration-time measurements. We analyze the fluid approximation of this system and solve steady-state linear programs that characterize optimal resource allocation. We design gate-and-route policies that regulate prefill admission and decode routing, and prove that they are asymptotically optimal in the many-GPU limit under both bundled and separate token-pricing schemes. We further extend the framework to incorporate Service Level Indicators (SLIs) such as latency and fairness, providing a general approach to constrained scheduling. Numerical experiments calibrated to empirical iteration-time data demonstrate that our policies outperform standard serving heuristics.
- [279] arXiv:2602.02988 [pdf, ps, other]
-
Title: NLI:Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs InferenceComments: Admitted to ICLR 18pages 5 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks, but their deployment is often constrained by substantial memory footprints and computational costs. While prior work has achieved significant progress in compressing and accelerating linear layers, nonlinear layers-such as SiLU, RMSNorm, and Softmax-still heavily depend on high-precision floating-point operations. In this paper, we propose a calibration-free, dynamic-programming-optimal, and hardware-friendly framework called Non-uniform Linear Interpolation (NLI). NLI is capable of efficiently approximating a variety of nonlinear functions, enabling seamless integration into LLMs and other deep neural networks with almost no loss in accuracy. NLI ingeniously recasts cutpoint selection as a dynamic-programming problem, achieving the globally minimal interpolation error in O(MxN2) time via Bellman's optimality principle. Based on the NLI algorithm, we also design and implement a plug-and-play universal nonlinear computation unit. Hardware experiments demonstrate that the NLI Engine achieves more than 4x improvement in computational efficiency compared to the state-of-the-art designs.
- [280] arXiv:2602.02989 [pdf, ps, other]
-
Title: SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan ModulationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Novel view synthesis of dynamic scenes is fundamental to achieving photorealistic 4D reconstruction and immersive visual experiences. Recent progress in Gaussian-based representations has significantly improved real-time rendering quality, yet existing methods still struggle to maintain a balance between long-term static and short-term dynamic regions in both representation and optimization. To address this, we present SharpTimeGS, a lifespan-aware 4D Gaussian framework that achieves temporally adaptive modeling of both static and dynamic regions under a unified representation. Specifically, we introduce a learnable lifespan parameter that reformulates temporal visibility from a Gaussian-shaped decay into a flat-top profile, allowing primitives to remain consistently active over their intended duration and avoiding redundant densification. In addition, the learned lifespan modulates each primitives' motion, reducing drift in long-lived static points while retaining unrestricted motion for short-lived dynamic ones. This effectively decouples motion magnitude from temporal duration, improving long-term stability without compromising dynamic fidelity. Moreover, we design a lifespan-velocity-aware densification strategy that mitigates optimization imbalance between static and dynamic regions by allocating more capacity to regions with pronounced motion while keeping static areas compact and stable. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art performance while supporting real-time rendering up to 4K resolution at 100 FPS on one RTX 4090.
- [281] arXiv:2602.02990 [pdf, ps, other]
-
Title: Learning to Repair Lean Proofs from Compiler FeedbackComments: 15 pages, 6 figuresSubjects: Machine Learning (cs.LG)
As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct proofs, offering little supervision for understanding and repairing failures. We study Lean proof repair as a supervised learning problem: given an erroneous proof and compiler feedback, predict both a corrected proof and a natural-language diagnosis grounded in the same feedback. We introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 supervised tuples pairing systematically generated proof failures with compiler diagnostics and aligned repair and explanation targets. Training language models on APRIL substantially improves repair accuracy and feedback-conditioned reasoning; in our single-shot repair evaluation setting, a finetuned 4B-parameter model outperforms the strongest open-source baseline. We view diagnostic-conditioned supervision as a complementary training signal for feedback-using provers. Our dataset is available at \href{https://huggingface.co/datasets/uw-math-ai/APRIL}{this link}.
- [282] arXiv:2602.02991 [pdf, ps, other]
-
Title: Large Language Models Can Take False First Steps at Inference-time PlanningSubjects: Artificial Intelligence (cs.AI)
Large language models (LLMs) have been shown to acquire sequence-level planning abilities during training, yet their planning behavior exhibited at inference time often appears short-sighted and inconsistent with these capabilities. We propose a Bayesian account for this gap by grounding planning behavior in the evolving generative context: given the subtle differences between natural language and the language internalized by LLMs, accumulated self-generated context drives a planning-shift during inference and thereby creates the appearance of compromised planning behavior. We further validate the proposed model through two controlled experiments: a random-generation task demonstrating constrained planning under human prompts and increasing planning strength as self-generated context accumulates, and a Gaussian-sampling task showing reduced initial bias when conditioning on self-generated sequences. These findings provide a theoretical explanation along with empirical evidence for characterizing how LLMs plan ahead during inference.
- [283] arXiv:2602.02994 [pdf, ps, other]
-
Title: Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy DistillationAuthors: Jiaze Li, Hao Yin, Haoran Xu, Boshen Xu, Wenhui Tan, Zewen He, Jianzhong Ju, Zhenbo Luo, Jian LuanSubjects: Computer Vision and Pattern Recognition (cs.CV)
Reinforcement learning has emerged as a principled post-training paradigm for Temporal Video Grounding (TVG) due to its on-policy optimization, yet existing GRPO-based methods remain fundamentally constrained by sparse reward signals and substantial computational overhead. We propose Video-OPD, an efficient post-training framework for TVG inspired by recent advances in on-policy distillation. Video-OPD optimizes trajectories sampled directly from the current policy, thereby preserving alignment between training and inference distributions, while a frontier teacher supplies dense, token-level supervision via a reverse KL divergence objective. This formulation preserves the on-policy property critical for mitigating distributional shift, while converting sparse, episode-level feedback into fine-grained, step-wise learning signals. Building on Video-OPD, we introduce Teacher-Validated Disagreement Focusing (TVDF), a lightweight training curriculum that iteratively prioritizes trajectories that are both teacher-reliable and maximally informative for the student, thereby improving training efficiency. Empirical results demonstrate that Video-OPD consistently outperforms GRPO while achieving substantially faster convergence and lower computational cost, establishing on-policy distillation as an effective alternative to conventional reinforcement learning for TVG.
- [284] arXiv:2602.02995 [pdf, ps, other]
-
Title: Agent Alpha: Tree Search Unifying Generation, Exploration and Evaluation for Computer-Use AgentsSubjects: Artificial Intelligence (cs.AI)
While scaling test-time compute through trajectory-level sampling has significantly improved Graphical User Interface (GUI) agents, the lack of regressive ability prevents the reuse of partial successes and the recovery from early missteps. In this paper, we introduce Agent Alpha, a unified framework that synergizes generation, exploration, and evaluation through step-level Monte Carlo Tree Search (MCTS). It enables active modeling or exploiting structures of the planning space. By integrating alpha-UCT guided search into the interaction loop, Agent Alpha enables deliberate planning, facilitating early pruning of suboptimal branches and efficient prefix reuse. We also employ comparison-driven evaluation to mitigate absolute scoring biases and diversity-constrained expansion to maintain a compact, informative search space. Regret bound of alpha-UCT is analyzed. On the OSWorld benchmark, Agent Alpha achieves a state-of-the-art success rate of $\sim 77\%$, significantly outperforming trajectory-level baselines under equivalent compute.
- [285] arXiv:2602.02999 [pdf, ps, other]
-
Title: ResQ: Realistic Performance-Aware Query GenerationComments: 13 pages, 4 figuresSubjects: Databases (cs.DB)
Database research and development rely heavily on realistic user workloads for benchmarking, instance optimization, migration testing, and database tuning. However, acquiring real-world SQL queries is notoriously challenging due to strict privacy regulations. While cloud database vendors have begun releasing anonymized performance traces to the research community, these traces typi- cally provide only high-level execution statistics without the origi- nal query text or data, which is insufficient for scenarios that require actual execution. Existing tools fail to capture fine-grained perfor- mance patterns or generate runnable workloads that reproduce these public traces with both high fidelity and efficiency. To bridge this gap, we propose ResQ, a fine-grained workload synthesis sys- tem designed to generate executable SQL workloads that faithfully match the per-query execution targets and operator distributions of production traces. ResQ constructs execution-aware query graphs, instantiates them into SQL via Bayesian Optimization-driven pred- icate search, and explicitly models workload repetition through reuse at both exact-query and parameterized-template levels. To ensure practical scalability, ResQ combines search-space bounding with lightweight local cost models to accelerate optimization. Ex- periments on public cloud traces (Snowset, Redset) and a newly released industrial trace (Bendset) demonstrate that ResQ signif- icantly outperforms state-of-the-art baselines, achieving 96.71% token savings and a 86.97% reduction in runtime, while lowering maximum Q-error by 14.8x on CPU time and 997.7x on scanned bytes, and closely matching operator composition.
- [286] arXiv:2602.03001 [pdf, ps, other]
-
Title: Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral DescentAuthors: Hiroki Naganuma, Shagun Gupta, Youssef Briki, Ioannis Mitliagkas, Irina Rish, Parameswaran Raman, Hao-Jun Michael ShiComments: 8 pages, 2 figures, 4 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
To maximize hardware utilization, modern machine learning systems typically employ large constant or manually tuned batch size schedules, relying on heuristics that are brittle and costly to tune. Existing adaptive strategies based on gradient noise scale (GNS) offer a principled alternative. However, their assumption of SGD's Euclidean geometry creates a fundamental mismatch with popular optimizers based on generalized norms, such as signSGD / Signum ($\ell_\infty$) and stochastic spectral descent (specSGD) / Muon ($\mathcal{S}_\infty$). In this work, we derive gradient noise scales for signSGD and specSGD that naturally emerge from the geometry of their respective dual norms. To practically estimate these non-Euclidean metrics, we propose an efficient variance estimation procedure that leverages the local mini-batch gradients on different ranks in distributed data-parallel systems. Our experiments demonstrate that adaptive batch size strategies using non-Euclidean GNS enable us to match the validation loss of constant-batch baselines while reducing training steps by up to 66% for Signum and Muon on a 160 million parameter Llama model.
- [287] arXiv:2602.03002 [pdf, ps, other]
-
Title: RPL: Learning Robust Humanoid Perceptive Locomotion on Challenging TerrainsAuthors: Yuanhang Zhang, Younggyo Seo, Juyue Chen, Yifu Yuan, Koushil Sreenath, Pieter Abbeel, Carmelo Sferrazza, Karen Liu, Rocky Duan, Guanya ShiSubjects: Robotics (cs.RO)
Humanoid perceptive locomotion has made significant progress and shows great promise, yet achieving robust multi-directional locomotion on complex terrains remains underexplored. To tackle this challenge, we propose RPL, a two-stage training framework that enables multi-directional locomotion on challenging terrains, and remains robust with payloads. RPL first trains terrain-specific expert policies with privileged height map observations to master decoupled locomotion and manipulation skills across different terrains, and then distills them into a transformer policy that leverages multiple depth cameras to cover a wide range of views. During distillation, we introduce two techniques to robustify multi-directional locomotion, depth feature scaling based on velocity commands and random side masking, which are critical for asymmetric depth observations and unseen widths of terrains. For scalable depth distillation, we develop an efficient multi-depth system that ray-casts against both dynamic robot meshes and static terrain meshes in massively parallel environments, achieving a 5-times speedup over the depth rendering pipelines in existing simulators while modeling realistic sensor latency, noise, and dropout. Extensive real-world experiments demonstrate robust multi-directional locomotion with payloads (2kg) across challenging terrains, including 20{\deg} slopes, staircases with different step lengths (22 cm, 25 cm, 30 cm), and 25 cm by 25 cm stepping stones separated by 60 cm gaps.
- [288] arXiv:2602.03003 [pdf, ps, other]
-
Title: Methods and Open Problems in Differentiable Social Choice: Learning Mechanisms, Decisions, and AlignmentSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Social choice is no longer a peripheral concern of political theory or economics-it has become a foundational component of modern machine learning systems. From auctions and resource allocation to federated learning, participatory governance, and the alignment of large language models, machine learning pipelines increasingly aggregate heterogeneous preferences, incentives, and judgments into collective decisions. In effect, many contemporary machine learning systems already implement social choice mechanisms, often implicitly and without explicit normative scrutiny.
This Review surveys differentiable social choice: an emerging paradigm that formulates voting rules, mechanisms, and aggregation procedures as learnable, differentiable models optimized from data. We synthesize work across auctions, voting, budgeting, liquid democracy, decentralized aggregation, and inverse mechanism learning, showing how classical axioms and impossibility results reappear as objectives, constraints, and optimization trade-offs. We conclude by identifying 36 open problems defining a new research agenda at the intersection of machine learning, economics, and democratic theory. - [289] arXiv:2602.03004 [pdf, ps, other]
-
Title: Causal Graph Spatial-Temporal Autoencoder for Reliable and Interpretable Process MonitoringSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
To improve the reliability and interpretability of industrial process monitoring, this article proposes a Causal Graph Spatial-Temporal Autoencoder (CGSTAE). The network architecture of CGSTAE combines two components: a correlation graph structure learning module based on spatial self-attention mechanism (SSAM) and a spatial-temporal encoder-decoder module utilizing graph convolutional long-short term memory (GCLSTM). The SSAM learns correlation graphs by capturing dynamic relationships between variables, while a novel three-step causal graph structure learning algorithm is introduced to derive a causal graph from these correlation graphs. The algorithm leverages a reverse perspective of causal invariance principle to uncover the invariant causal graph from varying correlations. The spatial-temporal encoder-decoder, built with GCLSTM units, reconstructs time-series process data within a sequence-to-sequence framework. The proposed CGSTAE enables effective process monitoring and fault detection through two statistics in the feature space and residual space. Finally, we validate the effectiveness of CGSTAE in process monitoring through the Tennessee Eastman process and a real-world air separation process.
- [290] arXiv:2602.03006 [pdf, ps, other]
-
Title: Distilling LLM Reasoning into Graph of Concept PredictorsSubjects: Artificial Intelligence (cs.AI)
Deploying Large Language Models (LLMs) for discriminative workloads is often limited by inference latency, compute, and API costs at scale. Active distillation reduces these costs by querying an LLM oracle to train compact discriminative students, but most pipelines distill only final labels, discarding intermediate reasoning signals and offering limited diagnostics of what reasoning is missing and where errors arise. We propose Graph of Concept Predictors (GCP), a reasoning-aware active distillation framework that externalizes the teacher's decision process as a directed acyclic graph and mirrors it with modular concept predictors in the student. GCP enhances sample efficiency through a graph-aware acquisition strategy that targets uncertainty and disagreement at critical reasoning nodes. Additionally, it improves training stability and efficiency by performing targeted sub-module retraining, which attributes downstream loss to specific concept predictors and updates only the most influential modules. Experiments on eight NLP classification benchmarks demonstrate that GCP enhances performance under limited annotation budgets while yielding more interpretable and controllable training dynamics. Code is available at: https://github.com/Ziyang-Yu/GCP.
- [291] arXiv:2602.03007 [pdf, ps, other]
-
Title: VOILA: Value-of-Information Guided Fidelity Selection for Cost-Aware Multimodal Question AnsweringAuthors: Rahul Atul Bhope, K. R. Jayaram, Vinod Muthusamy, Ritesh Kumar, Vatche Isahagian, Nalini VenkatasubramanianSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Despite significant costs from retrieving and processing high-fidelity visual inputs, most multimodal vision-language systems operate at fixed fidelity levels. We introduce VOILA, a framework for Value-Of-Information-driven adaptive fidelity selection in Visual Question Answering (VQA) that optimizes what information to retrieve before model execution. Given a query, VOILA uses a two-stage pipeline: a gradient-boosted regressor estimates correctness likelihood at each fidelity from question features alone, then an isotonic calibrator refines these probabilities for reliable decision-making. The system selects the minimum-cost fidelity maximizing expected utility given predicted accuracy and retrieval costs. We evaluate VOILA across three deployment scenarios using five datasets (VQA-v2, GQA, TextVQA, LoCoMo, FloodNet) and six Vision-Language Models (VLMs) with 7B-235B parameters. VOILA consistently achieves 50-60% cost reductions while retaining 90-95% of full-resolution accuracy across diverse query types and model architectures, demonstrating that pre-retrieval fidelity selection is vital to optimize multimodal inference under resource constraints.
- [292] arXiv:2602.03012 [pdf, ps, other]
-
Title: CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security VulnerabilityAuthors: Xianzhen Luo, Jingyuan Zhang, Shiqi Zhou, Rain Huang, Chuan Xiao, Qingfu Zhu, Zhiyuan Ma, Xing Yue, Yang Yue, Wencong Zeng, Wanxiang CheComments: Under ReviewSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Evaluating and improving the security capabilities of code agents requires high-quality, executable vulnerability tasks. However, existing works rely on costly, unscalable manual reproduction and suffer from outdated data distributions. To address these, we present CVE-Factory, the first multi-agent framework to achieve expert-level quality in automatically transforming sparse CVE metadata into fully executable agentic tasks. Cross-validation against human expert reproductions shows that CVE-Factory achieves 95\% solution correctness and 96\% environment fidelity, confirming its expert-level quality. It is also evaluated on the latest realistic vulnerabilities and achieves a 66.2\% verified success. This automation enables two downstream contributions. First, we construct LiveCVEBench, a continuously updated benchmark of 190 tasks spanning 14 languages and 153 repositories that captures emerging threats including AI-tooling vulnerabilities. Second, we synthesize over 1,000 executable training environments, the first large-scale scaling of agentic tasks in code security. Fine-tuned Qwen3-32B improves from 5.3\% to 35.8\% on LiveCVEBench, surpassing Claude 4.5 Sonnet, with gains generalizing to Terminal Bench (12.5\% to 31.3\%). We open-source CVE-Factory, LiveCVEBench, Abacus-cve (fine-tuned model), training dataset, and leaderboard. All resources are available at https://github.com/livecvebench/CVE-Factory .
- [293] arXiv:2602.03013 [pdf, ps, other]
-
Title: Thinking inside the Convolution for Image Inpainting: Reconstructing Texture via Structure under Global and Local SideComments: 17 pages, 17 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Image inpainting has earned substantial progress, owing to the encoder-and-decoder pipeline, which is benefited from the Convolutional Neural Networks (CNNs) with convolutional downsampling to inpaint the masked regions semantically from the known regions within the encoder, coupled with an upsampling process from the decoder for final inpainting output. Recent studies intuitively identify the high-frequency structure and low-frequency texture to be extracted by CNNs from the encoder, and subsequently for a desirable upsampling recovery. However, the existing arts inevitably overlook the information loss for both structure and texture feature maps during the convolutional downsampling process, hence suffer from a non-ideal upsampling output. In this paper, we systematically answer whether and how the structure and texture feature map can mutually help to alleviate the information loss during the convolutional downsampling. Given the structure and texture feature maps, we adopt the statistical normalization and denormalization strategy for the reconstruction guidance during the convolutional downsampling process. The extensive experimental results validate its advantages to the state-of-the-arts over the images from low-to-high resolutions including 256*256 and 512*512, especially holds by substituting all the encoders by ours. Our code is available at https://github.com/htyjers/ConvInpaint-TSGL
- [294] arXiv:2602.03015 [pdf, ps, other]
-
Title: A Vision-Based Analysis of Congestion Pricing in New York CityAuthors: Mehmet Kerem Turkcan, Jhonatan Tavori, Javad Ghaderi, Gil Zussman, Zoran Kostic, Andrew SmythSubjects: Computer Vision and Pattern Recognition (cs.CV)
We examine the impact of New York City's congestion pricing program through automated analysis of traffic camera data. Our computer vision pipeline processes footage from over 900 cameras distributed throughout Manhattan and New York, comparing traffic patterns from November 2024 through the program's implementation in January 2025 until January 2026. We establish baseline traffic patterns and identify systematic changes in vehicle density across the monitored region.
- [295] arXiv:2602.03017 [pdf, ps, other]
-
Title: From Hanging Out to Figuring It Out: Socializing Online as a Pathway to Computational ThinkingJournal-ref: New Media & Society 23 (8): 2327-44Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Although socializing is a powerful driver of youth engagement online, platforms struggle to leverage engagement to promote learning. We seek to understand this dynamic using a multi-stage analysis of over 14,000 comments on Scratch, an online platform designed to support learning about programming. First, we inductively develop the concept of "participatory debugging" -- a practice through which users learn through collaborative technical troubleshooting. Second, we use a content analysis to establish how common the practice is on Scratch. Third, we conduct a qualitative analysis of user activity over time and identify three factors that serve as social antecedents of participatory debugging: (1) sustained community, (2) identifiable problems, and (3) what we call "topic porousness" to describe conversations that are able to span multiple topics. We integrate these findings in a theoretical framework that highlights a productive tension between the desire to promote learning and the interest-driven sub-communities that drive user engagement in many new media environments.
- [296] arXiv:2602.03018 [pdf, ps, other]
-
Title: From Zero to Hero: Advancing Zero-Shot Foundation Models for Tabular Outlier DetectionComments: 37 pagesSubjects: Machine Learning (cs.LG)
Outlier detection (OD) is widely used in practice; but its effective deployment on new tasks is hindered by lack of labeled outliers, which makes algorithm and hyperparameter selection notoriously hard. Foundation models (FMs) have transformed ML, and OD is no exception: Shen et. al. (2025) introduced FoMo-0D, the first FM for OD, achieving remarkable performance against numerous baselines. This work introduces OUTFORMER, which advances FoMo-0D with (1) a mixture of synthetic priors and (2) self-evolving curriculum training. OUTFORMER is pretrained solely on synthetic labeled datasets and infers test labels of a new task by using its training data as in-context input. Inference is fast and zero-shot, requiring merely forward pass and no labeled outliers. Thanks to in-context learning, it requires zero additional work-no OD model training or bespoke model selection-enabling truly plug-and-play deployment. OUTFORMER achieves state-of-the-art performance on the prominent AdBench, as well as two new large-scale OD benchmarks that we introduce, comprising over 1,500 datasets, while maintaining speedy inference.
- [297] arXiv:2602.03019 [pdf, ps, other]
-
Title: FedKRSO: Communication and Memory Efficient Federated Fine-Tuning of Large Language ModelsComments: Accepted by INFOCOM 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Fine-tuning is essential to adapt general-purpose large language models (LLMs) to domain-specific tasks. As a privacy-preserving framework to leverage decentralized data for collaborative model training, Federated Learning (FL) is gaining popularity in LLM fine-tuning, but remains challenging due to the high cost of transmitting full model parameters and computing full gradients on resource-constrained clients. While Parameter-Efficient Fine-Tuning (PEFT) methods are widely used in FL to reduce communication and memory costs, they often sacrifice model performance compared to FFT. This paper proposes FedKRSO (Federated $K$-Seed Random Subspace Optimization), a novel method that enables communication and memory efficient FFT of LLMs in federated settings. In FedKRSO, clients update the model within a shared set of random low-dimension subspaces generated by the server to save memory usage. Furthermore, instead of transmitting full model parameters in each FL round, clients send only the model update accumulators along the subspaces to the server, enabling efficient global model aggregation and dissemination. By using these strategies, FedKRSO can substantially reduce communication and memory overhead while overcoming the performance limitations of PEFT, closely approximating the performance of federated FFT. The convergence properties of FedKRSO are analyzed rigorously under general FL settings. Extensive experiments on the GLUE benchmark across diverse FL scenarios demonstrate that FedKRSO achieves both superior performance and low communication and memory overhead, paving the way towards on federated LLM fine-tuning at the resource-constrained edge.
- [298] arXiv:2602.03020 [pdf, ps, other]
-
Title: Fast Diffusion with Physics-Correction for ACOPFSubjects: Systems and Control (eess.SY)
Generating large-scale, physically consistent AC Optimal Power Flow (ACOPF) datasets is essential for modern data-driven power system applications. The central challenge lies in balancing solution accuracy with computational efficiency. Recent diffusion-based generative models produce high-quality samples; however, their slow sampling procedures limit practical scalability. In this work, we argue that exact physical feasibility is ultimately enforced by power flow solvers or projection steps, and therefore the generative model only needs to produce good initializations rather than perfectly feasible solutions. Based on this insight, we propose a fast diffusion framework using Denoising Diffusion Implicit Models (DDIM) combined with physics-guided corrections during sampling. The proposed method replaces slow stochastic refinement with a small number of deterministic steps and explicit constraint guidance. Experiments on IEEE 6-, 24-, and 118-bus systems show that our approach achieves up to 20 times faster sampling than standard diffusion models while maintaining comparable statistical accuracy and physical consistency. This makes the method well suited for scalable OPF dataset generation and practical power system learning tasks. We release the implementation code at https://github.com/PSquare-Lab/DDIM_OPF.
- [299] arXiv:2602.03022 [pdf, ps, other]
-
Title: STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling ModelsComments: The paper has been accepted to ICLR 2026Subjects: Artificial Intelligence (cs.AI)
The proliferation of Large Language Models (LLMs) in function calling is pivotal for creating advanced AI agents, yet their large scale hinders widespread adoption, necessitating transferring their capabilities into smaller ones. However, existing paradigms are often plagued by overfitting, training instability, ineffective binary rewards for multi-solution tasks, and the difficulty of synergizing techniques. We introduce STAR: Similarity-guided Teacher-Assisted Refinement, a novel holistic framework that effectively transfers LLMs' capabilities to super-tiny models. STAR consists of two core technical innovations: (1) Constrained Knowledge Distillation (CKD), a training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions, ensuring training stability while preserving exploration capacity for downstream RL. STAR holistically synergizes these strategies within a cohesive training curriculum, enabling super-tiny models to achieve exceptional performance on complex function calling tasks; (2) Similarity-guided RL (Sim-RL), a RL mechanism that introduces a fine-grained, similarity-based reward. This provides a robust, continuous, and rich signal for better policy optimization by evaluating the similarity between generated outputs and the ground truth. Extensive experiments on challenging and renowned benchmarks demonstrate the effectiveness of our method. Our STAR models establish SOTA in their size classes, significantly outperforming baselines. Remarkably, our 0.6B STAR model achieves the best performance among all open models under 1B, surpassing even several well-known open models at a larger scale. STAR demonstrates a training framework that distills capabilities of LLMs into super-tiny models, paving the way for powerful, accessible, and efficient AI agents.
- [300] arXiv:2602.03023 [pdf, ps, other]
-
Title: Rethinking Music Captioning with Music Metadata LLMsComments: Accepted to ICASSP 2026Subjects: Sound (cs.SD); Machine Learning (cs.LG)
Music captioning, or the task of generating a natural language description of music, is useful for both music understanding and controllable music generation. Training captioning models, however, typically requires high-quality music caption data which is scarce compared to metadata (e.g., genre, mood, etc.). As a result, it is common to use large language models (LLMs) to synthesize captions from metadata to generate training data for captioning models, though this process imposes a fixed stylization and entangles factual information with natural language style. As a more direct approach, we propose metadata-based captioning. We train a metadata prediction model to infer detailed music metadata from audio and then convert it into expressive captions via pre-trained LLMs at inference time. Compared to a strong end-to-end baseline trained on LLM-generated captions derived from metadata, our method: (1) achieves comparable performance in less training time over end-to-end captioners, (2) offers flexibility to easily change stylization post-training, enabling output captions to be tailored to specific stylistic and quality requirements, and (3) can be prompted with audio and partial metadata to enable powerful metadata imputation or in-filling--a common task for organizing music data.
- [301] arXiv:2602.03024 [pdf, ps, other]
-
Title: Consistency Deep Equilibrium ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Deep Equilibrium Models (DEQs) have emerged as a powerful paradigm in deep learning, offering the ability to model infinite-depth networks with constant memory usage. However, DEQs incur significant inference latency due to the iterative nature of fixed-point solvers. In this work, we introduce the Consistency Deep Equilibrium Model (C-DEQ), a novel framework that leverages consistency distillation to accelerate DEQ inference. We cast the DEQ iterative inference process as evolution along a fixed ODE trajectory toward the equilibrium. Along this trajectory, we train C-DEQs to consistently map intermediate states directly to the fixed point, enabling few-step inference while preserving the performance of the teacher DEQ. At the same time, it facilitates multi-step evaluation to flexibly trade computation for performance gains. Extensive experiments across various domain tasks demonstrate that C-DEQs achieves consistent 2-20$\times$ accuracy improvements over implicit DEQs under the same few-step inference budget.
- [302] arXiv:2602.03025 [pdf, ps, other]
-
Title: RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling AgentsSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Multi-turn tool calling is challenging for Large Language Models (LLMs) because rewards are sparse and exploration is expensive. A common recipe, SFT followed by GRPO, can stall when within-group reward variation is low (e.g., more rollouts in a group receive the all 0 or all 1 reward), making the group-normalized advantage uninformative and yielding vanishing updates. To address this problem, we propose RC-GRPO (Reward-Conditioned Group Relative Policy Optimization), which treats exploration as a controllable steering problem via discrete reward tokens. We first fine-tune a Reward-Conditioned Trajectory Policy (RCTP) on mixed-quality trajectories with reward goal special tokens (e.g., <|high_reward|>, <|low_reward|>) injected into the prompts, enabling the model to learn how to generate distinct quality trajectories on demand. Then during RL, we sample diverse reward tokens within each GRPO group and condition rollouts on the sampled token to improve within-group diversity, improving advantage gains. On the Berkeley Function Calling Leaderboard v4 (BFCLv4) multi-turn benchmark, our method yields consistently improved performance than baselines, and the performance on Qwen-2.5-7B-Instruct even surpasses all closed-source API models.
- [303] arXiv:2602.03026 [pdf, ps, other]
-
Title: Visual Reasoning over Time Series via Multi-Agent SystemSubjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Time series analysis underpins many real-world applications, yet existing time-series-specific methods and pretrained large-model-based approaches remain limited in integrating intuitive visual reasoning and generalizing across tasks with adaptive tool usage. To address these limitations, we propose MAS4TS, a tool-driven multi-agent system for general time series tasks, built upon an Analyzer-Reasoner-Executor paradigm that integrates agent communication, visual reasoning, and latent reconstruction within a unified framework. MAS4TS first performs visual reasoning over time series plots with structured priors using a Vision-Language Model to extract temporal structures, and subsequently reconstructs predictive trajectories in latent space. Three specialized agents coordinate via shared memory and gated communication, while a router selects task-specific tool chains for execution. Extensive experiments on multiple benchmarks demonstrate that MAS4TS achieves state-of-the-art performance across a wide range of time series tasks, while exhibiting strong generalization and efficient inference.
- [304] arXiv:2602.03028 [pdf, ps, other]
-
Title: MUSE: A Multi-agent Framework for Unconstrained Story Envisioning via Closed-Loop Cognitive OrchestrationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Generating long-form audio-visual stories from a short user prompt remains challenging due to an intent-execution gap, where high-level narrative intent must be preserved across coherent, shot-level multimodal generation over long horizons. Existing approaches typically rely on feed-forward pipelines or prompt-only refinement, which often leads to semantic drift and identity inconsistency as sequences grow longer. We address this challenge by formulating storytelling as a closed-loop constraint enforcement problem and propose MUSE, a multi-agent framework that coordinates generation through an iterative plan-execute-verify-revise loop. MUSE translates narrative intent into explicit, machine-executable controls over identity, spatial composition, and temporal continuity, and applies targeted multimodal feedback to correct violations during generation. To evaluate open-ended storytelling without ground-truth references, we introduce MUSEBench, a reference-free evaluation protocol validated by human judgments. Experiments demonstrate that MUSE substantially improves long-horizon narrative coherence, cross-modal identity consistency, and cinematic quality compared with representative baselines.
- [305] arXiv:2602.03033 [pdf, ps, other]
-
Title: Layered Modal ML: Syntax and Full AbstractionComments: 22 pages, 6 figuresSubjects: Programming Languages (cs.PL)
MetaML-style metaprogramming languages allow programmers to construct, manipulate and run code. In the presence of higher-order references for code, ensuring type safety is challenging, as free variables can escape their binders. In this paper, we present Layered Modal ML (LMML), \textit{the first metaprogramming language that supports storing and running open code under a strong type safety guarantee}. The type system utilises contextual modal types to track and reason about free variables in code explicitly.
A crucial concern in metaprogramming-based program optimisations is whether the optimised program preserves the meaning of the original program. Addressing this question requires a notion of program equivalence and techniques to reason about it. In this paper, we provide a semantic model that captures contextual equivalence for LMML, establishing \textit{the first full abstraction result for an imperative MetaML-style language}. Our model is based on traces derived via operational game semantics, where the meaning of a program is modelled by its possible interactions with the environment. We also establish a novel closed instances of use theorem that accounts for both call-by-value and call-by-name closing substitutions. - [306] arXiv:2602.03034 [pdf, ps, other]
-
Title: KANFIS A Neuro-Symbolic Framework for Interpretable and Uncertainty-Aware LearningSubjects: Artificial Intelligence (cs.AI)
Adaptive Neuro-Fuzzy Inference System (ANFIS) was designed to combine the learning capabilities of neural network with the reasoning transparency of fuzzy logic. However, conventional ANFIS architectures suffer from structural complexity, where the product-based inference mechanism causes an exponential explosion of rules in high-dimensional spaces. We herein propose the Kolmogorov-Arnold Neuro-Fuzzy Inference System (KANFIS), a compact neuro-symbolic architecture that unifies fuzzy reasoning with additive function decomposition. KANFIS employs an additive aggregation mechanism, under which both model parameters and rule complexity scale linearly with input dimensionality rather than exponentially. Furthermore, KANFIS is compatible with both Type-1 (T1) and Interval Type-2 (IT2) fuzzy logic systems, enabling explicit modeling of uncertainty and ambiguity in fuzzy representations. By using sparse masking mechanisms, KANFIS generates compact and structured rule sets, resulting in an intrinsically interpretable model with clear rule semantics and transparent inference processes. Empirical results demonstrate that KANFIS achieves competitive performance against representative neural and neuro-fuzzy baselines.
- [307] arXiv:2602.03035 [pdf, ps, other]
-
Title: Generalizable and Interpretable RF Fingerprinting with Shapelet-Enhanced Large Language ModelsComments: 12 pages, 7 figures, IMWUT submissionSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Deep neural networks (DNNs) have achieved remarkable success in radio frequency (RF) fingerprinting for wireless device authentication. However, their practical deployment faces two major limitations: domain shift, where models trained in one environment struggle to generalize to others, and the black-box nature of DNNs, which limits interpretability. To address these issues, we propose a novel framework that integrates a group of variable-length two-dimensional (2D) shapelets with a pre-trained large language model (LLM) to achieve efficient, interpretable, and generalizable RF fingerprinting. The 2D shapelets explicitly capture diverse local temporal patterns across the in-phase and quadrature (I/Q) components, providing compact and interpretable representations. Complementarily, the pre-trained LLM captures more long-range dependencies and global contextual information, enabling strong generalization with minimal training overhead. Moreover, our framework also supports prototype generation for few-shot inference, enhancing cross-domain performance without additional retraining. To evaluate the effectiveness of our proposed method, we conduct extensive experiments on six datasets across various protocols and domains. The results show that our method achieves superior standard and few-shot performance across both source and unseen domains.
- [308] arXiv:2602.03036 [pdf, ps, other]
-
Title: LatentMem: Customizing Latent Memory for Multi-Agent SystemsAuthors: Muxin Fu, Guibin Zhang, Xiangyuan Xue, Yafu Li, Zefeng He, Siyuan Huang, Xiaoye Qu, Yu Cheng, Yang YangSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Large language model (LLM)-powered multi-agent systems (MAS) demonstrate remarkable collective intelligence, wherein multi-agent memory serves as a pivotal mechanism for continual adaptation. However, existing multi-agent memory designs remain constrained by two fundamental bottlenecks: (i) memory homogenization arising from the absence of role-aware customization, and (ii) information overload induced by excessively fine-grained memory entries. To address these limitations, we propose LatentMem, a learnable multi-agent memory framework designed to customize agent-specific memories in a token-efficient manner. Specifically, LatentMem comprises an experience bank that stores raw interaction trajectories in a lightweight form, and a memory composer that synthesizes compact latent memories conditioned on retrieved experience and agent-specific contexts. Further, we introduce Latent Memory Policy Optimization (LMPO), which propagates task-level optimization signals through latent memories to the composer, encouraging it to produce compact and high-utility representations. Extensive experiments across diverse benchmarks and mainstream MAS frameworks show that LatentMem achieves a performance gain of up to $19.36$% over vanilla settings and consistently outperforms existing memory architectures, without requiring any modifications to the underlying frameworks.
- [309] arXiv:2602.03038 [pdf, ps, other]
-
Title: Bongards at the Boundary of Perception and Reasoning: Programs or Language?Authors: Cassidy Langenfeld, Claas Beger, Gloria Geng, Wasu Top Piriyakulkij, Keya Hu, Yewen Pu, Kevin EllisComments: 6 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Vision-Language Models (VLMs) have made great strides in everyday visual tasks, such as captioning a natural image, or answering commonsense questions about such images. But humans possess the puzzling ability to deploy their visual reasoning abilities in radically new situations, a skill rigorously tested by the classic set of visual reasoning challenges known as the Bongard problems. We present a neurosymbolic approach to solving these problems: given a hypothesized solution rule for a Bongard problem, we leverage LLMs to generate parameterized programmatic representations for the rule and perform parameter fitting using Bayesian optimization. We evaluate our method on classifying Bongard problem images given the ground truth rule, as well as on solving the problems from scratch.
- [310] arXiv:2602.03039 [pdf, ps, other]
-
Title: HP-GAN: Harnessing pretrained networks for GAN improvement with FakeTwins and discriminator consistencyComments: Accepted manuscript. This is the accepted version of the article published in Neural NetworksSubjects: Computer Vision and Pattern Recognition (cs.CV)
Generative Adversarial Networks (GANs) have made significant progress in enhancing the quality of image synthesis. Recent methods frequently leverage pretrained networks to calculate perceptual losses or utilize pretrained feature spaces. In this paper, we extend the capabilities of pretrained networks by incorporating innovative self-supervised learning techniques and enforcing consistency between discriminators during GAN training. Our proposed method, named HP-GAN, effectively exploits neural network priors through two primary strategies: FakeTwins and discriminator consistency. FakeTwins leverages pretrained networks as encoders to compute a self-supervised loss and applies this through the generated images to train the generator, thereby enabling the generation of more diverse and high quality images. Additionally, we introduce a consistency mechanism between discriminators that evaluate feature maps extracted from Convolutional Neural Network (CNN) and Vision Transformer (ViT) feature networks. Discriminator consistency promotes coherent learning among discriminators and enhances training robustness by aligning their assessments of image quality. Our extensive evaluation across seventeen datasets-including scenarios with large, small, and limited data, and covering a variety of image domains-demonstrates that HP-GAN consistently outperforms current state-of-the-art methods in terms of Fr\'echet Inception Distance (FID), achieving significant improvements in image diversity and quality. Code is available at: https://github.com/higun2/HP-GAN.
- [311] arXiv:2602.03040 [pdf, ps, other]
-
Title: DF-LoGiT: Data-Free Logic-Gated Backdoor Attacks in Vision TransformersSubjects: Cryptography and Security (cs.CR)
The widespread adoption of Vision Transformers (ViTs) elevates supply-chain risk on third-party model hubs, where an adversary can implant backdoors into released checkpoints. Existing ViT backdoor attacks largely rely on poisoned-data training, while prior data-free attempts typically require synthetic-data fine-tuning or extra model components. This paper introduces Data-Free Logic-Gated Backdoor Attacks (DF-LoGiT), a truly data-free backdoor attack on ViTs via direct weight editing. DF-LoGiT exploits ViT's native multi-head architecture to realize a logic-gated compositional trigger, enabling a stealthy and effective backdoor. We validate its effectiveness through theoretical analysis and extensive experiments, showing that DF-LoGiT achieves near-100% attack success with negligible degradation in benign accuracy and remains robust against representative classical and ViT-specific defenses.
- [312] arXiv:2602.03043 [pdf, ps, other]
-
Title: SAFE-KD: Risk-Controlled Early-Exit Distillation for Vision BackbonesAuthors: Salim KhazemComments: Submitted to IJCNNSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Early-exit networks reduce inference cost by allowing ``easy'' inputs to stop early, but practical deployment hinges on knowing \emph{when} early exit is safe. We introduce SAFE-KD, a universal multi-exit wrapper for modern vision backbones that couples hierarchical distillation with \emph{conformal risk control}. SAFE-KD attaches lightweight exit heads at intermediate depths, distills a strong teacher into all exits via Decoupled Knowledge Distillation (DKD), and enforces deep-to-shallow consistency between exits. At inference, we calibrate per-exit stopping thresholds on a held-out set using conformal risk control (CRC) to guarantee a user-specified \emph{selective} misclassification risk (among the samples that exit early) under exchangeability. Across multiple datasets and architectures, SAFE-KD yields improved accuracy compute trade-offs, stronger calibration, and robust performance under corruption while providing finite-sample risk guarantees.
- [313] arXiv:2602.03045 [pdf, ps, other]
-
Title: Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD GenerationComments: In ReviewSubjects: Machine Learning (cs.LG)
Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural language prompts. In practice, however, geometric descriptions can be under-specified or internally inconsistent: critical dimensions may be missing and constraints may conflict. Existing fine-tuned models tend to reactively follow user instructions and hallucinate dimensions when the text is ambiguous. To address this, we propose a proactive agentic framework for text-to-CadQuery generation, named ProCAD, that resolves specification issues before code synthesis. Our framework pairs a proactive clarifying agent, which audits the prompt and asks targeted clarification questions only when necessary to produce a self-consistent specification, with a CAD coding agent that translates the specification into an executable CadQuery program. We fine-tune the coding agent on a curated high-quality text-to-CadQuery dataset and train the clarifying agent via agentic SFT on clarification trajectories. Experiments show that proactive clarification significantly improves robustness to ambiguous prompts while keeping interaction overhead low. ProCAD outperforms frontier closed-source models, including Claude Sonnet 4.5, reducing the mean Chamfer distance by 79.9 percent and lowering the invalidity ratio from 4.8 percent to 0.9 percent. Our code and datasets will be made publicly available.
- [314] arXiv:2602.03048 [pdf, ps, other]
-
Title: CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMsAuthors: Zhiyuan Yao, Yi-Kai Zhang, Yuxin Chen, Yueqing Sun, Zishan Xu, Yu Yang, Tianhao Hu, Qi Gu, Hui Su, Xunliang CaiSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key approach for enhancing LLM reasoning.However, standard frameworks like Group Relative Policy Optimization (GRPO) typically employ a uniform rollout budget, leading to resource inefficiency. Moreover, existing adaptive methods often rely on instance-level metrics, such as task pass rates, failing to capture the model's dynamic learning state. To address these limitations, we propose CoBA-RL, a reinforcement learning algorithm designed to adaptively allocate rollout budgets based on the model's evolving capability. Specifically, CoBA-RL utilizes a Capability-Oriented Value function to map tasks to their potential training gains and employs a heap-based greedy strategy to efficiently self-calibrate the distribution of computational resources to samples with high training value. Extensive experiments demonstrate that our approach effectively orchestrates the trade-off between exploration and exploitation, delivering consistent generalization improvements across multiple challenging benchmarks. These findings underscore that quantifying sample training value and optimizing budget allocation are pivotal for advancing LLM post-training efficiency.
- [315] arXiv:2602.03051 [pdf, ps, other]
-
Title: SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM CompressionSubjects: Computation and Language (cs.CL)
The rapid growth in the parameter scale of large language models (LLMs) has created a high demand for efficient compression techniques. As a hardware-agnostic and highly compatible technique, low-rank compression has been widely adopted. However, existing methods typically compress each layer independently by minimizing per-layer reconstruction error, overlooking a critical limitation: the reconstruction error propagates and accumulates through the network, which leads to amplified global deviations from the full-precision baseline. To address this, we propose Self-Adaptive Error Suppression SVD (SAES-SVD), a LLMs compression framework that jointly optimizes intra-layer reconstruction and inter-layer error compensation. SAES-SVD is composed of two novel components: (1) Cumulative Error-Aware Layer Compression (CEALC), which formulates the compression objective as a combination of local reconstruction and weighted cumulative error compensation. Based on it, we derive a closed-form low-rank solution relied on second-order activation statistics, which explicitly aligns each layer's output with its full-precision counterpart to compensate for accumulated errors. (2) Adaptive Collaborative Error Suppression (ACES), which automatically adjusts the weighting coefficient to enhance the low-rank structure of the compression objective in CEALC. Specifically, the coefficient is optimized to maximize the ratio between the Frobenius norm of the compressed layer's output and that of the compression objective under a fixed rank, thus ensuring that the rank budget is utilized effectively. Extensive experiments across multiple LLM architectures and tasks show that, without fine-tuning or mixed-rank strategies, SAES-SVD consistently improves post-compression performance.
- [316] arXiv:2602.03052 [pdf, ps, other]
-
Title: Fedcompass: Federated Clustered and Periodic Aggregation Framework for Hybrid Classical-Quantum ModelsComments: Accepted by the 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing(ICASSP 2026)Subjects: Machine Learning (cs.LG)
Federated learning enables collaborative model training across decentralized clients under privacy constraints. Quantum computing offers potential for alleviating computational and communication burdens in federated learning, yet hybrid classical-quantum federated learning remains susceptible to performance degradation under non-IID data. To address this,we propose FEDCOMPASS, a layered aggregation framework for hybrid classical-quantum federated learning. FEDCOMPASS employs spectral clustering to group clients by class distribution similarity and performs cluster-wise aggregation for classical feature extractors. For quantum parameters, it uses circular mean aggregation combined with adaptive optimization to ensure stable global updates. Experiments on three benchmark datasets show that FEDCOMPASS improves test accuracy by up to 10.22% and enhances convergence stability under non-IID settings, outperforming six strong federated learning baselines.
- [317] arXiv:2602.03053 [pdf, ps, other]
-
Title: MAS-ProVe: Understanding the Process Verification of Multi-Agent SystemsAuthors: Vishal Venkataramani, Haizhou Shi, Zixuan Ke, Austin Xu, Xiaoxiao He, Yingbo Zhou, Semih Yavuz, Hao Wang, Shafiq JotyComments: Preprint; work in progressSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA)
Multi-Agent Systems (MAS) built on Large Language Models (LLMs) often exhibit high variance in their reasoning trajectories. Process verification, which evaluates intermediate steps in trajectories, has shown promise in general reasoning settings, and has been suggested as a potential tool for guiding coordination of MAS; however, its actual effectiveness in MAS remains unclear. To fill this gap, we present MAS-ProVe, a systematic empirical study of process verification for multi-agent systems (MAS). Our study spans three verification paradigms (LLM-as-a-Judge, reward models, and process reward models), evaluated across two levels of verification granularity (agent-level and iteration-level). We further examine five representative verifiers and four context management strategies, and conduct experiments over six diverse MAS frameworks on multiple reasoning benchmarks. We find that process-level verification does not consistently improve performance and frequently exhibits high variance, highlighting the difficulty of reliably evaluating partial multi-agent trajectories. Among the methods studied, LLM-as-a-Judge generally outperforms reward-based approaches, with trained judges surpassing general-purpose LLMs. We further observe a small performance gap between LLMs acting as judges and as single agents, and identify a context-length-performance trade-off in verification. Overall, our results suggest that effective and robust process verification for MAS remains an open challenge, requiring further advances beyond current paradigms. Code is available at https://github.com/Wang-ML-Lab/MAS-ProVe.
- [318] arXiv:2602.03054 [pdf, ps, other]
-
Title: Towards Considerate Embodied AI: Co-Designing Situated Multi-Site Healthcare Robots from Abstract Concepts to High-Fidelity PrototypesComments: To appear in Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI 2026)Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Co-design is essential for grounding embodied artificial intelligence (AI) systems in real-world contexts, especially high-stakes domains such as healthcare. While prior work has explored multidisciplinary collaboration, iterative prototyping, and support for non-technical participants, few have interwoven these into a sustained co-design process. Such efforts often target one context and low-fidelity stages, limiting the generalizability of findings and obscuring how participants' ideas evolve. To address these limitations, we conducted a 14-week workshop with a multidisciplinary team of 22 participants, centered around how embodied AI can reduce non-value-added task burdens in three healthcare settings: emergency departments, long-term rehabilitation facilities, and sleep disorder clinics. We found that the iterative progression from abstract brainstorming to high-fidelity prototypes, supported by educational scaffolds, enabled participants to understand real-world trade-offs and generate more deployable solutions. We propose eight guidelines for co-designing more considerate embodied AI: attuned to context, responsive to social dynamics, mindful of expectations, and grounded in deployment. Project Page: https://byc-sophie.github.io/Towards-Considerate-Embodied-AI/
- [319] arXiv:2602.03056 [pdf, ps, other]
-
Title: ALPBench: A Benchmark for Attribution-level Long-term Personal Behavior UnderstandingAuthors: Lu Ren, Junda She, Xinchen Luo, Tao Wang, Xin Ye, Xu Zhang, Muxuan Wang, Xiao Yang, Chenguang Wang, Fei Xie, Yiwei Zhou, Danjun Wu, Guodong Zhang, Yifei Hu, Guoying Zheng, Shujie Yang, Xingmei Wang, Shiyao Wang, Yukun Zhou, Fan Yang, Size Li, Kuo Cai, Qiang Luo, Ruiming Tang, Han Li, Kun GaiSubjects: Information Retrieval (cs.IR)
Recent advances in large language models have highlighted their potential for personalized recommendation, where accurately capturing user preferences remains a key challenge. Leveraging their strong reasoning and generalization capabilities, LLMs offer new opportunities for modeling long-term user behavior. To systematically evaluate this, we introduce ALPBench, a Benchmark for Attribution-level Long-term Personal Behavior Understanding. Unlike item-focused benchmarks, ALPBench predicts user-interested attribute combinations, enabling ground-truth evaluation even for newly introduced items. It models preferences from long-term historical behaviors rather than users' explicitly expressed requests, better reflecting enduring interests. User histories are represented as natural language sequences, allowing interpretable, reasoning-based personalization. ALPBench enables fine-grained evaluation of personalization by focusing on the prediction of attribute combinations task that remains highly challenging for current LLMs due to the need to capture complex interactions among multiple attributes and reason over long-term user behavior sequences.
- [320] arXiv:2602.03059 [pdf, ps, other]
-
Title: From Speech-to-Spatial: Grounding Utterances on A Live Shared View with Augmented RealityComments: 11 pages, 6 figures. This is the author's version of the article that will appear at the IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR) 2026Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Emerging Technologies (cs.ET); Information Retrieval (cs.IR)
We introduce Speech-to-Spatial, a referent disambiguation framework that converts verbal remote-assistance instructions into spatially grounded AR guidance. Unlike prior systems that rely on additional cues (e.g., gesture, gaze) or manual expert annotations, Speech-to-Spatial infers the intended target solely from spoken references (speech input). Motivated by our formative study of speech referencing patterns, we characterize recurring ways people specify targets (Direct Attribute, Relational, Remembrance, and Chained) and ground them to our object-centric relational graph. Given an utterance, referent cues are parsed and rendered as persistent in-situ AR visual guidance, reducing iterative micro-guidance ("a bit more to the right", "now, stop.") during remote guidance. We demonstrate the use cases of our system with remote guided assistance and intent disambiguation scenarios. Our evaluation shows that Speechto-Spatial improves task efficiency, reduces cognitive load, and enhances usability compared to a conventional voice-only baseline, transforming disembodied verbal instruction into visually explainable, actionable guidance on a live shared view.
- [321] arXiv:2602.03060 [pdf, ps, other]
-
Title: IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token PruningComments: Accepted to ICLR 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
Large Vision-Language Models (LVLMs) achieve impressive performance across multiple tasks. A significant challenge, however, is their prohibitive inference cost when processing high-resolution visual inputs. While visual token pruning has emerged as a promising solution, existing methods that primarily focus on semantic relevance often discard tokens that are crucial for spatial reasoning. We address this gap through a novel insight into \emph{how LVLMs process spatial reasoning}. Specifically, we reveal that LVLMs implicitly establish visual coordinate systems through Rotary Position Embeddings (RoPE), where specific token positions serve as \textbf{implicit visual coordinates} (IVC tokens) that are essential for spatial reasoning. Based on this insight, we propose \textbf{IVC-Prune}, a training-free, prompt-aware pruning strategy that retains both IVC tokens and semantically relevant foreground tokens. IVC tokens are identified by theoretically analyzing the mathematical properties of RoPE, targeting positions at which its rotation matrices approximate identity matrix or the $90^\circ$ rotation matrix. Foreground tokens are identified through a robust two-stage process: semantic seed discovery followed by contextual refinement via value-vector similarity. Extensive evaluations across four representative LVLMs and twenty diverse benchmarks show that IVC-Prune reduces visual tokens by approximately 50\% while maintaining $\geq$ 99\% of the original performance and even achieving improvements on several benchmarks. Source codes are available at https://github.com/FireRedTeam/IVC-Prune.
- [322] arXiv:2602.03061 [pdf, ps, other]
-
Title: Evaluating LLMs When They Do Not Know the Answer: Statistical Evaluation of Mathematical Reasoning via Comparative SignalsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Evaluating mathematical reasoning in LLMs is constrained by limited benchmark sizes and inherent model stochasticity, yielding high-variance accuracy estimates and unstable rankings across platforms. On difficult problems, an LLM may fail to produce a correct final answer, yet still provide reliable pairwise comparison signals indicating which of two candidate solutions is better. We leverage this observation to design a statistically efficient evaluation framework that combines standard labeled outcomes with pairwise comparison signals obtained by having models judge auxiliary reasoning chains. Treating these comparison signals as control variates, we develop a semiparametric estimator based on the efficient influence function (EIF) for the setting where auxiliary reasoning chains are observed. This yields a one-step estimator that achieves the semiparametric efficiency bound, guarantees strict variance reduction over naive sample averaging, and admits asymptotic normality for principled uncertainty quantification. Across simulations, our one-step estimator substantially improves ranking accuracy, with gains increasing as model output noise grows. Experiments on GPQA Diamond, AIME 2025, and GSM8K further demonstrate more precise performance estimation and more reliable model rankings, especially in small-sample regimes where conventional evaluation is pretty unstable.
- [323] arXiv:2602.03064 [pdf, ps, other]
-
Title: JRDB-Pose3D: A Multi-person 3D Human Pose and Shape Estimation Dataset for RoboticsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Real-world scenes are inherently crowded. Hence, estimating 3D poses of all nearby humans, tracking their movements over time, and understanding their activities within social and environmental contexts are essential for many applications, such as autonomous driving, robot perception, robot navigation, and human-robot interaction. However, most existing 3D human pose estimation datasets primarily focus on single-person scenes or are collected in controlled laboratory environments, which restricts their relevance to real-world applications. To bridge this gap, we introduce JRDB-Pose3D, which captures multi-human indoor and outdoor environments from a mobile robotic platform. JRDB-Pose3D provides rich 3D human pose annotations for such complex and dynamic scenes, including SMPL-based pose annotations with consistent body-shape parameters and track IDs for each individual over time. JRDB-Pose3D contains, on average, 5-10 human poses per frame, with some scenes featuring up to 35 individuals simultaneously. The proposed dataset presents unique challenges, including frequent occlusions, truncated bodies, and out-of-frame body parts, which closely reflect real-world environments. Moreover, JRDB-Pose3D inherits all available annotations from the JRDB dataset, such as 2D pose, information about social grouping, activities, and interactions, full-scene semantic masks with consistent human- and object-level tracking, and detailed annotations for each individual, such as age, gender, and race, making it a holistic dataset for a wide range of downstream perception and human-centric understanding tasks.
- [324] arXiv:2602.03066 [pdf, ps, other]
-
Title: Shortcut Features as Top Eigenfunctions of NTK: A Linear Neural Network Case and MoreJournal-ref: NeurIPS 2025Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
One of the chronic problems of deep-learning models is shortcut learning. In a case where the majority of training data are dominated by a certain feature, neural networks prefer to learn such a feature even if the feature is not generalizable outside the training set. Based on the framework of Neural Tangent Kernel (NTK), we analyzed the case of linear neural networks to derive some important properties of shortcut learning. We defined a feature of a neural network as an eigenfunction of NTK. Then, we found that shortcut features correspond to features with larger eigenvalues when the shortcuts stem from the imbalanced number of samples in the clustered distribution. We also showed that the features with larger eigenvalues still have a large influence on the neural network output even after training, due to data variances in the clusters. Such a preference for certain features remains even when a margin of a neural network output is controlled, which shows that the max-margin bias is not the only major reason for shortcut learning. These properties of linear neural networks are empirically extended for more complex neural networks as a two-layer fully-connected ReLU network and a ResNet-18.
- [325] arXiv:2602.03067 [pdf, ps, other]
-
Title: FlashSinkhorn: IO-Aware Entropic Optimal TransportSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Entropic optimal transport (EOT) via Sinkhorn iterations is widely used in modern machine learning, yet GPU solvers remain inefficient at scale. Tensorized implementations suffer quadratic HBM traffic from dense $n\times m$ interactions, while existing online backends avoid storing dense matrices but still rely on generic tiled map-reduce reduction kernels with limited fusion. We present \textbf{FlashSinkhorn}, an IO-aware EOT solver for squared Euclidean cost that rewrites stabilized log-domain Sinkhorn updates as row-wise LogSumExp reductions of biased dot-product scores, the same normalization as transformer attention. This enables FlashAttention-style fusion and tiling: fused Triton kernels stream tiles through on-chip SRAM and update dual potentials in a single pass, substantially reducing HBM IO per iteration while retaining linear-memory operations. We further provide streaming kernels for transport application, enabling scalable first- and second-order optimization. On A100 GPUs, FlashSinkhorn achieves up to $32\times$ forward-pass and $161\times$ end-to-end speedups over state-of-the-art online baselines on point-cloud OT, improves scalability on OT-based downstream tasks. For reproducibility, we release an open-source implementation at https://github.com/ot-triton-lab/ot_triton.
- [326] arXiv:2602.03068 [pdf, ps, other]
-
Title: From semantic memory to collective creativity: A generative cognitive foundation for social creativity modelsSubjects: Social and Information Networks (cs.SI)
Simulation-based theory development has yielded powerful insights into collective performance by linking social structure to emergent outcomes, yet it has struggled to extend to collective creativity. Creativity is hard to capture purely at the social level, as novel ideas are generated through cognitive mechanisms. To address this gap, we introduce a multi-level socio-cognitive agent-based framework in which agents share a common semantic vocabulary and substrate but differ in semantic network topology. A single generative parameter tunes semantic modularity, yielding emergent individual differences in ideational breadth. When agents exchange ideation traces, two canonical social-creativity phenomena arise without being imposed: lower pre-interaction ideation overlap predicts larger stimulation gains, and shared inspiration sources induce network-level redundancy. The framework enables mechanistic theory-building about cognition and social structure in collective creativity.
- [327] arXiv:2602.03069 [pdf, ps, other]
-
Title: Skill-Based Autonomous Agents for Material Creep Database ConstructionSubjects: Databases (cs.DB)
The advancement of data-driven materials science is currently constrained by a fundamental bottleneck: the vast majority of historical experimental data remains locked within the unstructured text and rasterized figures of legacy scientific literature. Manual curation of this knowledge is prohibitively labor-intensive and prone to human error. To address this challenge, we introduce an autonomous, agent-based framework powered by Large Language Models (LLMs) designed to excavate high-fidelity datasets from scientific PDFs without human intervention. By deploying a modular "skill-based" architecture, the agent orchestrates complex cognitive tasks - including semantic filtering, multi-modal information extraction, and physics-informed validation. We demonstrate the efficacy of this framework by constructing a physically self-consistent database for material creep mechanics, a domain characterized by complex graphical trajectories and heterogeneous constitutive models. Applying the pipeline to 243 publications, the agent achieved a verified extraction success rate exceeding 90% for graphical data digitization. Crucially, we introduce a cross-modal verification protocol, demonstrating that the agent can autonomously align visually extracted data points with textually extracted constitutive parameters ($R^2 > 0.99$), ensuring the physical self-consistency of the database. This work not only provides a critical resource for investigating time-dependent deformation across diverse material systems but also establishes a scalable paradigm for autonomous knowledge acquisition, paving the way for the next generation of self-driving laboratories.
- [328] arXiv:2602.03070 [pdf, ps, other]
-
Title: ProOPF: Benchmarking and Improving LLMs for Professional-Grade Power Systems Optimization ModelingAuthors: Chao Shen, Zihan Guo, Xu Wan, Zhenghao Yang, Yifan Zhang, Wengi Huang, Jie Song, Zongyan Zhang, Mingyang SunSubjects: Systems and Control (eess.SY); Software Engineering (cs.SE)
Growing renewable penetration introduces substantial uncertainty into power system operations, necessitating frequent adaptation of dispatch objectives and constraints and challenging expertise-intensive, near-real-time modeling workflows. Large Language Models (LLMs) provide a promising avenue for automating this process by translating natural-language (NL) operational requirements into executable optimization models via semantic reasoning and code synthesis. Yet existing LLM datasets and benchmarks for optimization modeling primarily target coarse-grained cross-domain generalization, offering limited, rigorous evaluation in power-system settings, particularly for Optimal Power Flow (OPF). We therefore introduce \textbf{ProOPF-D} and \textbf{ProOPF-B}, a dataset and benchmark for professional-grade OPF modeling: ProOPF-D contains 12K instances pairing NL requests with parameter adjustments and structural extensions to a canonical OPF, together with executable implementations; ProOPF-B provides 121 expert-annotated test cases with ground-truth code, enabling end-to-end evaluation under both concrete and abstract OPF modeling regimes.
- [329] arXiv:2602.03071 [pdf, ps, other]
-
Title: Finding Optimal Video Moment without Training: Gaussian Boundary Optimization for Weakly Supervised Video GroundingComments: Accepted in IEEE TMMSubjects: Computer Vision and Pattern Recognition (cs.CV)
Weakly supervised temporal video grounding aims to localize query-relevant segments in untrimmed videos using only video-sentence pairs, without requiring ground-truth segment annotations that specify exact temporal boundaries. Recent approaches tackle this task by utilizing Gaussian-based temporal proposals to represent query-relevant segments. However, their inference strategies rely on heuristic mappings from Gaussian parameters to segment boundaries, resulting in suboptimal localization performance. To address this issue, we propose Gaussian Boundary Optimization (GBO), a novel inference framework that predicts segment boundaries by solving a principled optimization problem that balances proposal coverage and segment compactness. We derive a closed-form solution for this problem and rigorously analyze the optimality conditions under varying penalty regimes. Beyond its theoretical foundations, GBO offers several practical advantages: it is training-free and compatible with both single-Gaussian and mixture-based proposal architectures. Our experiments show that GBO significantly improves localization, achieving state-of-the-art results across standard benchmarks. Extensive experiments demonstrate the efficiency and generalizability of GBO across various proposal schemes. The code is available at \href{https://github.com/sunoh-kim/gbo}{https://github.com/sunoh-kim/gbo}.
- [330] arXiv:2602.03072 [pdf, ps, other]
-
Title: Towards Weak Stratification for Logics of DefinitionsAuthors: Nathan GuermondComments: Appendix for arXiv:2510.12297Subjects: Logic in Computer Science (cs.LO)
The logic of definitions is a family of logics for encoding and reasoning about judgments, which are atomic predicates specified by inference rules. A definition associates an atomic predicate with a logical formula, which may itself depend on the predicate being defined. This leads to an apparent circularity which can be resolved by interpreting definitions as monotone fixed-point operators on terms, and which is enforced by imposing a stratification condition on definitions. In many instances, it is useful to consider definitions in which the predicate being defined appears negatively in the body of its definition. In the logic $\mathcal G$, underlying the Abella proof assistant, this is not allowed due to the stratification condition. One such application violating this condition is that of defining logical relations, which is a technique commonly used to prove properties about programming languages. Tiu has shown how to relax this stratification condition to allow for a broader body of definitions including that needed for logical relations. However, he only showed how to extend a core fragment of $\mathcal G$ with the weakened stratification condition, resulting in a logic he called $\mathrm{LD}$. In this work we show that the weakened stratification condition is also compatible with generic (nabla) quantification and general induction. The eventual aim of this work is to justify an extension of the Abella proof assistant allowing for such definitions.
- [331] arXiv:2602.03073 [pdf, ps, other]
-
Title: TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFTSubjects: Machine Learning (cs.LG)
Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) are the two dominant paradigms for enhancing Large Language Model (LLM) performance on downstream tasks. While RL generally preserves broader model capabilities (retention) better than SFT, it comes with significant costs: complex reward engineering, instability, and expensive on-policy sampling. In contrast, SFT is efficient but brittle, often suffering from catastrophic forgetting due to $\textbf{Supervision Mismatch}$: the divergence between the model's evolving policy and static training labels. We address this trade-off with $\textbf{Trajectory-Mixed Supervision (TMS)}$, a reward-free framework that approximates the on-policy benefits of RL by creating a dynamic curriculum from the model's own historical checkpoints. TMS minimizes $\textit{Policy-Label Divergence (PLD)}$, preventing the mode collapse that drives forgetting in standard SFT. Experiments across reasoning (MATH, GSM8K) and instruction-following benchmarks demonstrate that TMS effectively shifts the accuracy--retention Pareto frontier. While RL remains the gold standard for retention, TMS significantly outperforms standard and iterative SFT, bridging the gap to RL without requiring reward models or verifiers. Mechanistic analysis confirms that PLD drift accurately predicts forgetting and that TMS successfully mitigates this drift.
- [332] arXiv:2602.03074 [pdf, ps, other]
-
Title: Straggler-Aware Coded Polynomial AggregationComments: 6 pages, 1 figureSubjects: Information Theory (cs.IT)
Coded polynomial aggregation (CPA) in distributed computing systems enables the master to directly recover a weighted aggregation of polynomial computations without individually decoding each term, thereby reducing the number of required worker responses. However, existing CPA schemes are restricted to an idealized setting in which the system cannot tolerate stragglers. In this paper, we extend CPA to straggler-aware distributed computing systems with a pre-specified non-straggler pattern, where exact recovery is required for a given collection of admissible non-straggler sets. Our main results show that exact recovery of the desired aggregation is achievable with fewer worker responses than that required by polynomial codes based on individual decoding, and that feasibility is characterized by the intersection structure of the non-straggler patterns. In particular, we establish necessary and sufficient conditions for exact recovery in straggler-aware CPA. We identify an intersection-size threshold that is sufficient to guarantee exact recovery. When the number of admissible non-straggler sets is sufficiently large, we further show that this threshold is necessary in a generic sense. We also provide an explicit construction of feasible CPA schemes whenever the intersection size exceeds the derived threshold. Finally, simulations verify our theoretical results by demonstrating a sharp feasibility transition at the predicted intersection threshold.
- [333] arXiv:2602.03075 [pdf, ps, other]
-
Title: ReMiT: RL-Guided Mid-Training for Iterative LLM EvolutionComments: 25 pagesSubjects: Computation and Language (cs.CL)
Standard training pipelines for large language models (LLMs) are typically unidirectional, progressing from pre-training to post-training. However, the potential for a bidirectional process--where insights from post-training retroactively improve the pre-trained foundation--remains unexplored. We aim to establish a self-reinforcing flywheel: a cycle in which reinforcement learning (RL)-tuned model strengthens the base model, which in turn enhances subsequent post-training performance, requiring no specially trained teacher or reference model. To realize this, we analyze training dynamics and identify the mid-training (annealing) phase as a critical turning point for model capabilities. This phase typically occurs at the end of pre-training, utilizing high-quality corpora under a rapidly decaying learning rate. Building upon this insight, we introduce ReMiT (Reinforcement Learning-Guided Mid-Training). Specifically, ReMiT leverages the reasoning priors of RL-tuned models to dynamically reweight tokens during the mid-training phase, prioritizing those pivotal for reasoning. Empirically, ReMiT achieves an average improvement of 3\% on 10 pre-training benchmarks, spanning math, code, and general reasoning, and sustains these gains by over 2\% throughout the post-training pipeline. These results validate an iterative feedback loop, enabling continuous and self-reinforcing evolution of LLMs.
- [334] arXiv:2602.03076 [pdf, ps, other]
-
Title: A generalizable large-scale foundation model for musculoskeletal radiographsAuthors: Shinn Kim, Soobin Lee, Kyoungseob Shin, Han-Soo Kim, Yongsung Kim, Minsu Kim, Juhong Nam, Somang Ko, Daeheon Kwon, Wook Huh, Ilkyu Han, Sunghoon KwonSubjects: Computer Vision and Pattern Recognition (cs.CV)
Artificial intelligence (AI) has shown promise in detecting and characterizing musculoskeletal diseases from radiographs. However, most existing models remain task-specific, annotation-dependent, and limited in generalizability across diseases and anatomical regions. Although a generalizable foundation model trained on large-scale musculoskeletal radiographs is clinically needed, publicly available datasets remain limited in size and lack sufficient diversity to enable training across a wide range of musculoskeletal conditions and anatomical sites. Here, we present SKELEX, a large-scale foundation model for musculoskeletal radiographs, trained using self-supervised learning on 1.2 million diverse, condition-rich images. The model was evaluated on 12 downstream diagnostic tasks and generally outperformed baselines in fracture detection, osteoarthritis grading, and bone tumor classification. Furthermore, SKELEX demonstrated zero-shot abnormality localization, producing error maps that identified pathologic regions without task-specific training. Building on this capability, we developed an interpretable, region-guided model for predicting bone tumors, which maintained robust performance on independent external datasets and was deployed as a publicly accessible web application. Overall, SKELEX provides a scalable, label-efficient, and generalizable AI framework for musculoskeletal imaging, establishing a foundation for both clinical translation and data-efficient research in musculoskeletal radiology.
- [335] arXiv:2602.03081 [pdf, ps, other]
-
Title: Studying the Effect of Schedule Preemption on Dynamic Task Graph SchedulingJournal-ref: Proc. IEEE Military Communications Conference (MILCOM) 2025Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Dynamic scheduling of task graphs is often addressed without revisiting prior task allocations, with a primary focus on minimizing makespan. We study controlled schedule preemption, introducing the Last-K Preemption model, which selectively reschedules recent task graphs while preserving earlier allocations. Using synthetic, RIoTBench, WFCommons, and adversarial workloads, we compare preemptive, non-preemptive, and partial-preemptive strategies across makespan, fairness, utilization, and runtime. Results show moderate preemption can match most makespan and utilization gains of full preemption while maintaining fairness and low overhead.
- [336] arXiv:2602.03082 [pdf, ps, other]
-
Title: Geometry-Preserving Neural Architectures on Manifolds with BoundaryAuthors: Karthik Elamvazhuthi, Shiba Biswal, Kian Rosenblum, Arushi Katyal, Tianli Qu, Grady Ma, Rishi SonthaliaSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
Preserving geometric structure is important in learning. We propose a unified class of geometry-aware architectures that interleave geometric updates between layers, where both projection layers and intrinsic exponential map updates arise as discretizations of projected dynamical systems on manifolds (with or without boundary). Within this framework, we establish universal approximation results for constrained neural ODEs. We also analyze architectures that enforce geometry only at the output, proving a separate universal approximation property that enables direct comparison to interleaved designs. When the constraint set is unknown, we learn projections via small-time heat-kernel limits, showing diffusion/flow-matching can be used as data-based projections. Experiments on dynamics over S^2 and SO(3), and diffusion on S^{d-1}-valued features demonstrate exact feasibility for analytic updates and strong performance for learned projections
- [337] arXiv:2602.03083 [pdf, ps, other]
-
Title: Scaling Optimized Spectral Approximations on Unbounded Domains: The Generalized Hermite and Laguerre MethodsComments: 40 pagesSubjects: Numerical Analysis (math.NA)
We propose a novel error analysis framework for scaled generalized Laguerre and generalized Hermite approximations.This framework can be regarded as an analogue of the Nyquist-Shannon sampling theorem: It characterizes the spatial and frequency bandwidths that can be effectively captured by Laguerre or Hermite sampling points. Provided a function satisfies the corresponding bandwidth constraints, it can be accurately approximated within this framework. The proposed framework is notably more powerful than classical theory -- it not only provides systematic guidance for choosing the optimal scaling factor, but also predicts root-exponential and other intricate convergence behaviors that classical approaches fail to capture. Leveraging this framework, we conducted a detailed comparative study of Hermite and Laguerre approximations. We find that functions with similar decay and oscillation characteristics may nonetheless display markedly different convergence rates. Furthermore, approximations based on two concatenated sets of Laguerre functions may offer significant advantages over those using a single set of Hermite functions.
- [338] arXiv:2602.03084 [pdf, ps, other]
-
Title: AERO: Autonomous Evolutionary Reasoning Optimization via Endogenous Dual-Loop FeedbackSubjects: Computation and Language (cs.CL)
Large Language Models (LLMs) have achieved significant success in complex reasoning but remain bottlenecked by reliance on expert-annotated data and external verifiers. While existing self-evolution paradigms aim to bypass these constraints, they often fail to identify the optimal learning zone and risk reinforcing collective hallucinations and incorrect priors through flawed internal feedback. To address these challenges, we propose \underline{A}utonomous \underline{E}volutionary \underline{R}easoning \underline{O}ptimization (AERO), an unsupervised framework that achieves autonomous reasoning evolution by internalizing self-questioning, answering, and criticism within a synergistic dual-loop system. Inspired by the \textit{Zone of Proximal Development (ZPD)} theory, AERO utilizes entropy-based positioning to target the ``solvability gap'' and employs Independent Counterfactual Correction for robust verification. Furthermore, we introduce a Staggered Training Strategy to synchronize capability growth across functional roles and prevent curriculum collapse. Extensive evaluations across nine benchmarks spanning three domains demonstrate that AERO achieves average performance improvements of 4.57\% on Qwen3-4B-Base and 5.10\% on Qwen3-8B-Base, outperforming competitive baselines. Code is available at https://github.com/mira-ai-lab/AERO.
- [339] arXiv:2602.03085 [pdf, ps, other]
-
Title: The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor TriggersAuthors: Blake Bullwinkel, Giorgio Severi, Keegan Hines, Amanda Minnich, Ram Shankar Siva Kumar, Yonatan ZungerSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models. Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning data, making it possible to leak backdoor examples using memory extraction techniques. Second, poisoned LLMs exhibit distinctive patterns in their output distributions and attention heads when backdoor triggers are present in the input. Guided by these observations, we develop a scalable backdoor scanning methodology that assumes no prior knowledge of the trigger or target behavior and requires only inference operations. Our scanner integrates naturally into broader defensive strategies and does not alter model performance. We show that our method recovers working triggers across multiple backdoor scenarios and a broad range of models and fine-tuning methods.
- [340] arXiv:2602.03086 [pdf, ps, other]
-
Title: Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement LearningAuthors: Jiayao Mai, Bangyan Liao, Zhenjun Zhao, Yingping Zeng, Haoang Li, Javier Civera, Tailin Wu, Yi Zhou, Peidong LiuSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The Homotopy paradigm, a general principle for solving challenging problems, appears across diverse domains such as robust optimization, global optimization, polynomial root-finding, and sampling. Practical solvers for these problems typically follow a predictor-corrector (PC) structure, but rely on hand-crafted heuristics for step sizes and iteration termination, which are often suboptimal and task-specific. To address this, we unify these problems under a single framework, which enables the design of a general neural solver. Building on this unified view, we propose Neural Predictor-Corrector (NPC), which replaces hand-crafted heuristics with automatically learned policies. NPC formulates policy selection as a sequential decision-making problem and leverages reinforcement learning to automatically discover efficient strategies. To further enhance generalization, we introduce an amortized training mechanism, enabling one-time offline training for a class of problems and efficient online inference on new instances. Experiments on four representative homotopy problems demonstrate that our method generalizes effectively to unseen instances. It consistently outperforms classical and specialized baselines in efficiency while demonstrating superior stability across tasks, highlighting the value of unifying homotopy methods into a single neural framework.
- [341] arXiv:2602.03087 [pdf, ps, other]
-
Title: Training and Simulation of Quadrupedal Robot in Adaptive Stair Climbing for Indoor Firefighting: An End-to-End Reinforcement Learning ApproachComments: 8 pages, 9 figures, 43rd International Symposium on Automation and Robotics in ConstructionSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Quadruped robots are used for primary searches during the early stages of indoor fires. A typical primary search involves quickly and thoroughly looking for victims under hazardous conditions and monitoring flammable materials. However, situational awareness in complex indoor environments and rapid stair climbing across different staircases remain the main challenges for robot-assisted primary searches. In this project, we designed a two-stage end-to-end deep reinforcement learning (RL) approach to optimize both navigation and locomotion. In the first stage, the quadrupeds, Unitree Go2, were trained to climb stairs in Isaac Lab's pyramid-stair terrain. In the second stage, the quadrupeds were trained to climb various realistic indoor staircases in the Isaac Lab engine, with the learned policy transferred from the previous stage. These indoor staircases are straight, L-shaped, and spiral, to support climbing tasks in complex environments. This project explores how to balance navigation and locomotion and how end-to-end RL methods can enable quadrupeds to adapt to different stair shapes. Our main contributions are: (1) A two-stage end-to-end RL framework that transfers stair-climbing skills from abstract pyramid terrain to realistic indoor stair topologies. (2) A centerline-based navigation formulation that enables unified learning of navigation and locomotion without hierarchical planning. (3) Demonstration of policy generalization across diverse staircases using only local height-map perception. (4) An empirical analysis of success, efficiency, and failure modes under increasing stair difficulty.
- [342] arXiv:2602.03089 [pdf, ps, other]
-
Title: "Why I Took the Blackpill": A Thematic Analysis of the Radicalization Process in Incel CommunitiesComments: 8 pages, 1 figure. Published in Proceedings of the 2025 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2025), SpringerJournal-ref: Lecture Notes in Computer Science, vol 16323. Springer, Cham (2026) 59-66Subjects: Social and Information Networks (cs.SI)
Incels, or "involuntary celibates", are an extreme, misogynistic hate group that exists entirely online. Members of the community have been linked to acts of offline violence, including mass shootings. Previous research has engaged with the ideologies and beliefs of incels, but none has looked specifically at the radicalization process. In this paper, we perform a thematic analysis on social media posts where incels describe their own radicalization process. We identified six major themes grouped into four chronological steps: Pre-radicalization (themes of Appearance, Social Isolation, and Psychological issues), Searching for Blame, Radicalization, and Post Radicalization. These results align closely with existing work on radicalization among other extremist groups, bringing incel radicalization inline with a growing body of research on understanding and managing radicalization.
- [343] arXiv:2602.03090 [pdf, ps, other]
-
Title: In Bad Faith: Assessing Discussion Quality on Social MediaAuthors: Celia Chen, Alex Leitch, William Jordan Conway, Eric Cotugno, Emily Klein, Rajesh Kumar Gnanasekaran, Kristin Buckstad Hamilton, Casi Sherman, Celia Sterrn, Logan C. Stevens, Rebecca Zarrella, Jennifer GolbeckComments: 8 pages, 1 figure, 1 table. Published in Proceedings of the 2025 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2025), SpringerJournal-ref: Lecture Notes in Computer Science, vol 16323. Springer, Cham (2026) 338-345Subjects: Social and Information Networks (cs.SI)
The quality of a user's social media experience is determined both by the content they see and by the quality of the conversation and interaction around it. In this paper, we look at replies to tweets from mainstream media outlets and official government agencies and assess if they are good faith, engaging honestly and constructively with the original post, or bad faith, attacking the author or derailing the conversation. We assess automated approaches that may help in making this determination and then show that within our dataset of replies to mainstream media outlets and government agencies, bad faith interactions constitute 68.3% of all replies we studied, suggesting potential concerns about the quality of discourse in these specific conversational contexts. This is particularly true from verified accounts, where 91.7% of replies were bad faith. Given that verified accounts are algorithmically amplified, we discuss the implications of our work for understanding the user experience on social media.
- [344] arXiv:2602.03092 [pdf, ps, other]
-
Title: Generative Artificial Intelligence creates delicious, sustainable, and nutritious burgersComments: 13 pages, 4 figuresSubjects: Computational Engineering, Finance, and Science (cs.CE)
Food choices shape both human and planetary health; yet, designing foods that are delicious, nutritious, and sustainable remains challenging. Here we show that generative artificial intelligence can learn the structure of the human palate directly from large-scale, human-generated recipe data to create novel foods within a structured design space. Using burgers as a model system, the generative AI rediscovers the classic Big Mac without explicit supervision and generates novel burgers optimized for deliciousness, sustainability, or nutrition. Compared to the Big Mac, its delicious burgers score the same or better in overall liking, flavor, and texture in a blinded sensory evaluation conducted in a restaurant setting with 101 participants; its mushroom burger achieves an environmental impact score more than an order of magnitude lower; and its bean burger attains nearly twice the nutritional score. Together, these results establish generative AI as a quantitative framework for learning human taste and navigating complex trade-offs in principled food design.
- [345] arXiv:2602.03093 [pdf, ps, other]
-
Title: Maintaining the Heterogeneity in the Organization of Software Engineering ResearchComments: Accepted at the 48th International Conference on Software Engineering, Future of Software Engineering (ICSE 2026-FoSE)Subjects: Software Engineering (cs.SE)
The heterogeneity in the organization of software engineering (SE) research historically exists, i.e., funded research model and hands-on model, which makes software engineering become a thriving interdisciplinary field in the last 50 years. However, the funded research model is becoming dominant in SE research recently, indicating such heterogeneity has been seriously and systematically threatened. In this essay, we first explain why the heterogeneity is needed in the organization of SE research, then present the current trend of SE research nowadays, as well as the consequences and potential futures. The choice is at our hands, and we urge our community to seriously consider maintaining the heterogeneity in the organization of software engineering research.
- [346] arXiv:2602.03094 [pdf, ps, other]
-
Title: Test-time Recursive Thinking: Self-Improvement without External FeedbackAuthors: Yufan Zhuang, Chandan Singh, Liyuan Liu, Yelong Shen, Dinghuai Zhang, Jingbo Shang, Jianfeng Gao, Weizhu ChenSubjects: Computation and Language (cs.CL)
Modern Large Language Models (LLMs) have shown rapid improvements in reasoning capabilities, driven largely by reinforcement learning (RL) with verifiable rewards. Here, we ask whether these LLMs can self-improve without the need for additional training. We identify two core challenges for such systems: (i) efficiently generating diverse, high-quality candidate solutions, and (ii) reliably selecting correct answers in the absence of ground-truth supervision. To address these challenges, we propose Test-time Recursive Thinking (TRT), an iterative self-improvement framework that conditions generation on rollout-specific strategies, accumulated knowledge, and self-generated verification signals. Using TRT, open-source models reach 100% accuracy on AIME-25/24, and on LiveCodeBench's most difficult problems, closed-source models improve by 10.4-14.8 percentage points without external feedback.
- [347] arXiv:2602.03095 [pdf, ps, other]
-
Title: Gen-Diaolou: An Integrated AI-Assisted Interactive System for Diachronic Understanding and Preservation of the Kaiping DiaolouSubjects: Human-Computer Interaction (cs.HC)
The Kaiping Diaolou and Villages, a UNESCO World Heritage Site, exemplify hybrid Chinese and Western architecture shaped by migration culture. However, architectural heritage engagement often faces authenticity debates, resource constraints, and limited participatory approaches. This research explores current challenges of leveraging Artificial Intelligence (AI) for architectural heritage, and how AI-assisted interactive systems can foster cultural heritage understanding and preservation awareness. We conducted a formative study (N=14) to uncover empirical insights from heritage stakeholders that inform design. These insights informed the design of Gen-Diaolou, an integrated AI-assisted interactive system that supports heritage understanding and preservation. A pilot study (N=18) and a museum field study (N=26) provided converging evidence suggesting that Gen-Diaolou may support visitors' diachronic understanding and preservation awareness, and together informed design implications for future human-AI collaborative systems for digital cultural heritage engagement. More broadly, this work bridges the research gap between passive heritage systems and unconstrained creative tools in the HCI domain.
- [348] arXiv:2602.03096 [pdf, ps, other]
-
Title: PRISM: Structured Optimization via Anisotropic Spectral ShapingAuthors: Yujie YangSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We propose PRISM, an optimizer that enhances first-order spectral descent methods like Muon with partial second-order information. It constructs an efficient, low-rank quasi-second-order preconditioner via innovation-augmented polar decomposition. This mechanism enables PRISM to perform anisotropic spectral shaping, which adaptively suppresses updates in high-variance subspaces while preserving update strength in signal-dominated directions. Crucially, this is achieved with minimal computational overhead and zero additional memory compared to first-order baselines. PRISM demonstrates a practical strategy for integrating curvature-adaptive properties into the spectral optimization paradigm.
- [349] arXiv:2602.03097 [pdf, ps, other]
-
Title: De-conflating Preference and Qualification: Constrained Dual-Perspective Reasoning for Job Recommendation with Large Language ModelsSubjects: Artificial Intelligence (cs.AI)
Professional job recommendation involves a complex bipartite matching process that must reconcile a candidate's subjective preference with an employer's objective qualification. While Large Language Models (LLMs) are well-suited for modeling the rich semantics of resumes and job descriptions, existing paradigms often collapse these two decision dimensions into a single interaction signal, yielding confounded supervision under recruitment-funnel censoring and limiting policy controllability. To address these challenges, We propose JobRec, a generative job recommendation framework for de-conflating preference and qualification via constrained dual-perspective reasoning. JobRec introduces a Unified Semantic Alignment Schema that aligns candidate and job attributes into structured semantic layers, and a Two-Stage Cooperative Training Strategy that learns decoupled experts to separately infer preference and qualification. Building on these experts, a Lagrangian-based Policy Alignment module optimizes recommendations under explicit eligibility requirements, enabling controllable trade-offs. To mitigate data scarcity, we construct a synthetic dataset refined by experts. Experiments show that JobRec consistently outperforms strong baselines and provides improved controllability for strategy-aware professional matching.
- [350] arXiv:2602.03098 [pdf, ps, other]
-
Title: TextME: Bridging Unseen Modalities Through Text DescriptionsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Expanding multimodal representations to novel modalities is constrained by reliance on large-scale paired datasets (e.g., text-image, text-audio, text-3D, text-molecule), which are costly and often infeasible in domains requiring expert annotation such as medical imaging and molecular analysis. We introduce TextME, the first text-only modality expansion framework, to the best of our knowledge, projecting diverse modalities into LLM embedding space as a unified anchor. Our approach exploits the geometric structure of pretrained contrastive encoders to enable zero-shot cross-modal transfer using only text descriptions, without paired supervision. We empirically validate that such consistent modality gaps exist across image, video, audio, 3D, X-ray, and molecular domains, demonstrating that text-only training can preserve substantial performance of pretrained encoders. We further show that our framework enables emergent cross-modal retrieval between modality pairs not explicitly aligned during training (e.g., audio-to-image, 3D-to-image). These results establish text-only training as a practical alternative to paired supervision for modality expansion.
- [351] arXiv:2602.03100 [pdf, ps, other]
-
Title: Risky-Bench: Probing Agentic Safety Risks under Real-World DeploymentAuthors: Jingnan Zheng, Yanzhen Luo, Jingjun Xu, Bingnan Liu, Yuxin Chen, Chenhang Cui, Gelei Deng, Chaochao Lu, Xiang Wang, An Zhang, Tat-Seng ChuaSubjects: Artificial Intelligence (cs.AI)
Large Language Models (LLMs) are increasingly deployed as agents that operate in real-world environments, introducing safety risks beyond linguistic harm. Existing agent safety evaluations rely on risk-oriented tasks tailored to specific agent settings, resulting in limited coverage of safety risk space and failing to assess agent safety behavior during long-horizon, interactive task execution in complex real-world deployments. Moreover, their specialization to particular agent settings limits adaptability across diverse agent configurations. To address these limitations, we propose Risky-Bench, a framework that enables systematic agent safety evaluation grounded in real-world deployment. Risky-Bench organizes evaluation around domain-agnostic safety principles to derive context-aware safety rubrics that delineate safety space, and systematically evaluates safety risks across this space through realistic task execution under varying threat assumptions. When applied to life-assist agent settings, Risky-Bench uncovers substantial safety risks in state-of-the-art agents under realistic execution conditions. Moreover, as a well-structured evaluation pipeline, Risky-Bench is not confined to life-assist scenarios and can be adapted to other deployment settings to construct environment-specific safety evaluations, providing an extensible methodology for agent safety assessment.
- [352] arXiv:2602.03102 [pdf, ps, other]
-
Title: Consensus Group Relative Policy Optimization for Text GenerationSubjects: Machine Learning (cs.LG)
Many strong decoding methods for text generation follow a sample-and-rerank paradigm: they draw multiple candidates, score each under a utility (reward) function using consensus across samples, and return the best one. Although effective, these methods incur high computational costs during inference due to repeated sampling and scoring. Prior attempts to amortize inference-time computation typically rely on gold references, teacher labels, or curated preference data, increasing dataset construction effort and the demand for high-fidelity reward models. We propose Consensus Group Relative Policy Optimization (C-GRPO), which distills Minimum Bayes Risk (MBR) decoding into training by formulating the consensus utility as a group-relative objective within GRPO. C-GRPO requires only a utility function and policy samples, without gold references or explicit preference labels. Under ideal conditions, we show that the objective function of C-GRPO is directionally aligned with the gradient of the expected-utility objective underlying MBR decoding, leading to a convergence guarantee. Experiments on machine translation (WMT 2024) and text summarization (XSum) demonstrate that C-GRPO successfully achieves performance comparable to MBR decoding without the associated inference-time overhead, while outperforming reference-free baseline methods.
- [353] arXiv:2602.03103 [pdf, ps, other]
-
Title: Task--Specificity Score: Measuring How Much Instructions Really Matter for SupervisionSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Instruction tuning is now the default way to train and adapt large language models, but many instruction--input--output pairs are only weakly specified: for a given input, the same output can remain plausible under several alternative instructions. This raises a simple question: \emph{does the instruction uniquely determine the target output?}
We propose the \textbf{Task--Specificity Score (TSS)} to quantify how much an instruction matters for predicting its output, by contrasting the true instruction against plausible alternatives for the same input. We further introduce \textbf{TSS++}, which uses hard alternatives and a small quality term to mitigate easy-negative effects. Across three instruction datasets (\textsc{Alpaca}, \textsc{Dolly-15k}, \textsc{NI-20}) and three open LLMs (Gemma, Llama, Qwen), we show that selecting task-specific examples improves downstream performance under tight token budgets and complements quality-based filters such as perplexity and IFD. - [354] arXiv:2602.03104 [pdf, ps, other]
-
Title: "I'm happy even though it's not real": GenAI Photo Editing as a Remembering ExperienceSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Generative Artificial Intelligence (GenAI) is increasingly integrated into photo applications on personal devices, making editing photographs easier than ever while potentially influencing the memories they represent. This study explores how and why people use GenAI to edit personal photos and how this shapes their remembering experience. We conducted a two-phase qualitative study with 12 participants: a photo editing session using a GenAI tool guided by the Remembering Experience (RX) dimensions, followed by semi-structured interviews where participants reflected on the editing process and results. Findings show that participants prioritised felt memory over factual accuracy. For different photo elements, environments were modified easily, however, editing was deemed unacceptable if it touched upon a person's identity. Editing processes brought positive and negative impacts, and itself also became a remembering experience. We further discuss potential benefits and risks of GenAI editing for remembering purposes and propose design implications for responsible GenAI.
- [355] arXiv:2602.03105 [pdf, ps, other]
-
Title: Gromov Wasserstein Optimal Transport for Semantic CorrespondencesSubjects: Computer Vision and Pattern Recognition (cs.CV)
Establishing correspondences between image pairs is a long studied problem in computer vision. With recent large-scale foundation models showing strong zero-shot performance on downstream tasks including classification and segmentation, there has been interest in using the internal feature maps of these models for the semantic correspondence task. Recent works observe that features from DINOv2 and Stable Diffusion (SD) are complementary, the former producing accurate but sparse correspondences, while the latter produces spatially consistent correspondences. As a result, current state-of-the-art methods for semantic correspondence involve combining features from both models in an ensemble. While the performance of these methods is impressive, they are computationally expensive, requiring evaluating feature maps from large-scale foundation models. In this work we take a different approach, instead replacing SD features with a superior matching algorithm which is imbued with the desirable spatial consistency property. Specifically, we replace the standard nearest neighbours matching with an optimal transport algorithm that includes a Gromov Wasserstein spatial smoothness prior. We show that we can significantly boost the performance of the DINOv2 baseline, and be competitive and sometimes surpassing state-of-the-art methods using Stable Diffusion features, while being 5--10x more efficient. We make code available at https://github.com/fsnelgar/semantic_matching_gwot .
- [356] arXiv:2602.03107 [pdf, ps, other]
-
Title: The Mask of Civility: Benchmarking Chinese Mock Politeness Comprehension in Large Language ModelsComments: PreprintSubjects: Computation and Language (cs.CL)
From a pragmatic perspective, this study systematically evaluates the differences in performance among representative large language models (LLMs) in recognizing politeness, impoliteness, and mock politeness phenomena in Chinese. Addressing the existing gaps in pragmatic comprehension, the research adopts the frameworks of Rapport Management Theory and the Model of Mock Politeness to construct a three-category dataset combining authentic and simulated Chinese discourse. Six representative models, including GPT-5.1 and DeepSeek, were selected as test subjects and evaluated under four prompting conditions: zero-shot, few-shot, knowledge-enhanced, and hybrid strategies. This study serves as a meaningful attempt within the paradigm of ``Great Linguistics,'' offering a novel approach to applying pragmatic theory in the age of technological transformation. It also responds to the contemporary question of how technology and the humanities may coexist, representing an interdisciplinary endeavor that bridges linguistic technology and humanistic reflection.
- [357] arXiv:2602.03108 [pdf, ps, other]
-
Title: ChemPro: A Progressive Chemistry Benchmark for Large Language ModelsSubjects: Computation and Language (cs.CL)
We introduce ChemPro, a progressive benchmark with 4100 natural language question-answer pairs in Chemistry, across 4 coherent sections of difficulty designed to assess the proficiency of Large Language Models (LLMs) in a broad spectrum of general chemistry topics. We include Multiple Choice Questions and Numerical Questions spread across fine-grained information recall, long-horizon reasoning, multi-concept questions, problem-solving with nuanced articulation, and straightforward questions in a balanced ratio, effectively covering Bio-Chemistry, Inorganic-Chemistry, Organic-Chemistry and Physical-Chemistry. ChemPro is carefully designed analogous to a student's academic evaluation for basic to high-school chemistry. A gradual increase in the question difficulty rigorously tests the ability of LLMs to progress from solving basic problems to solving more sophisticated challenges.
We evaluate 45+7 state-of-the-art LLMs, spanning both open-source and proprietary variants, and our analysis reveals that while LLMs perform well on basic chemistry questions, their accuracy declines with different types and levels of complexity. These findings highlight the critical limitations of LLMs in general scientific reasoning and understanding and point towards understudied dimensions of difficulty, emphasizing the need for more robust methodologies to improve LLMs. - [358] arXiv:2602.03109 [pdf, ps, other]
-
Title: One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social IntelligenceAuthors: Bowen Jiang, Taiwei Shi, Ryo Kamoi, Yuan Yuan, Camillo J. Taylor, Longqi Yang, Pei Zhou, Sihao ChenSubjects: Computation and Language (cs.CL)
This paper introduces OMAR: One Model, All Roles, a reinforcement learning framework that enables AI to develop social intelligence through multi-turn, multi-agent conversational self-play. Unlike traditional paradigms that rely on static, single-turn optimizations, OMAR allows a single model to role-play all participants in a conversation simultaneously, learning to achieve long-term goals and complex social norms directly from dynamic social interaction. To ensure training stability across long dialogues, we implement a hierarchical advantage estimation that calculates turn-level and token-level advantages. Evaluations in the SOTOPIA social environment and Werewolf strategy games show that our trained models develop fine-grained, emergent social intelligence, such as empathy, persuasion, and compromise seeking, demonstrating the effectiveness of learning collaboration even under competitive scenarios. While we identify practical challenges like reward hacking, our results show that rich social intelligence can emerge without human supervision. We hope this work incentivizes further research on AI social intelligence in group conversations.
- [359] arXiv:2602.03112 [pdf, ps, other]
-
Title: A Unified Candidate Set with Scene-Adaptive Refinement via Diffusion for End-to-End Autonomous DrivingComments: Code:this https URLSubjects: Robotics (cs.RO)
End-to-end autonomous driving is increasingly adopting a multimodal planning paradigm that generates multiple trajectory candidates and selects the final plan, making candidate-set design critical. A fixed trajectory vocabulary provides stable coverage in routine driving but often misses optimal solutions in complex interactions, while scene-adaptive refinement can cause over-correction in simple scenarios by unnecessarily perturbing already strong vocabulary trajectories.We propose CdDrive, which preserves the original vocabulary candidates and augments them with scene-adaptive candidates generated by vocabulary-conditioned diffusion denoising. Both candidate types are jointly scored by a shared selection module, enabling reliable performance across routine and highly interactive scenarios. We further introduce HATNA (Horizon-Aware Trajectory Noise Adapter) to improve the smoothness and geometric continuity of diffusion candidates via temporal smoothing and horizon-aware noise modulation. Experiments on NAVSIM v1 and NAVSIM v2 demonstrate leading performance, and ablations verify the contribution of each component.
- [360] arXiv:2602.03114 [pdf, ps, other]
-
Title: Digital Lifelong Learning in the Age of AI: Trends and InsightsComments: 41 pages including references, appendix, 14 figuresSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Rapid innovations in AI and large language models (LLMs) have accelerated the adoption of digital learning, particularly beyond formal education. What began as an emergency response during COVID-19 has shifted from a supplementary resource to an essential pillar of education. Understanding how digital learning continues to evolve for adult and lifelong learners is therefore increasingly important.
This study examines how various demographics interact with digital learning platforms, focusing on the learner motivations, the effectiveness of gamification in digital learning, and the integration of AI. Using multi survey data from 200 respondents and advanced analytics, our findings reveal a notable increase in the perceived relevance of digital learning after the pandemic, especially among young adults and women, coinciding with the rise of LLM-powered AI tools that support personalized learning. We aim to provide actionable insights for businesses, government policymakers, and educators seeking to optimize their digital learning offerings to meet evolving workforce needs. - [361] arXiv:2602.03117 [pdf, ps, other]
-
Title: AgentDyn: A Dynamic Open-Ended Benchmark for Evaluating Prompt Injection Attacks of Real-World Agent Security SystemComments: 23 Pages, 16 TablesSubjects: Cryptography and Security (cs.CR)
AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external data which agent consumes also leads to the risk of indirect prompt injection attacks, where malicious instructions embedded in third-party content hijack agent behavior. Guided by benchmarks, such as AgentDojo, there has been significant amount of progress in developing defense against the said attacks. As the technology continues to mature, and that agents are increasingly being relied upon for more complex tasks, there is increasing pressing need to also evolve the benchmark to reflect threat landscape faced by emerging agentic systems. In this work, we reveal three fundamental flaws in current benchmarks and push the frontier along these dimensions: (i) lack of dynamic open-ended tasks, (ii) lack of helpful instructions, and (iii) simplistic user tasks. To bridge this gap, we introduce AgentDyn, a manually designed benchmark featuring 60 challenging open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life. Unlike prior static benchmarks, AgentDyn requires dynamic planning and incorporates helpful third-party instructions. Our evaluation of ten state-of-the-art defenses suggests that almost all existing defenses are either not secure enough or suffer from significant over-defense, revealing that existing defenses are still far from real-world deployment. Our benchmark is available at https://github.com/leolee99/AgentDyn.
- [362] arXiv:2602.03118 [pdf, ps, other]
-
Title: The High Cost of Data Augmentation for Learning Equivariant ModelsComments: 33 pages, 13 figuresSubjects: Numerical Analysis (math.NA)
According to Noether's theorem the presence of a continuous symmetry in a Hamiltonian systems is equivalent to the existence of a conserved quantity, yet these symmetries are not always explicitly enforced in data-driven models. There remains a debate whether or not encoding of symmetry into a model architecture is the optimal approach. A competing approach is to target approximate symmetry through data augmentation. In this work, we study two approaches aimed at improving the symmetry properties of such an approximation scheme: one based on a quadrature rule for the Haar measure on the compact Lie group encoding the continuous symmetry of interest and one based on a random sampling of that Haar measure. We demonstrate both theoretically and empirically that the quadrature augmentation leads to exact symmetry preservation in polynomial models, while the random augmentation has only square-root convergence of the symmetrization error.
- [363] arXiv:2602.03119 [pdf, ps, other]
-
Title: Function-Space Empirical Bayes Regularisation with Large Vision-Language Model PriorsSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Bayesian deep learning (BDL) provides a principled framework for reliable uncertainty quantification by combining deep neural networks with Bayesian inference. A central challenge in BDL lies in the design of informative prior distributions that scale effectively to high-dimensional data. Recent functional variational inference (VI) approaches address this issue by imposing priors directly in function space; however, most existing methods rely on Gaussian process (GP) priors, whose expressiveness and generalisation capabilities become limited in high-dimensional regimes. In this work, we propose VLM-FS-EB, a novel function-space empirical Bayes regularisation framework, leveraging large vision-language models (VLMs) to generates semantically meaningful context points. These synthetic samples are then used VLMs for embeddings to construct expressive functional priors. Furthermore, the proposed method is evaluated against various baselines, and experimental results demonstrate that our method consistently improves predictive performance and yields more reliable uncertainty estimates, particularly in out-of-distribution (OOD) detection tasks and data-scarce regimes.
- [364] arXiv:2602.03120 [pdf, ps, other]
-
Title: Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision CostComments: Preprint versionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients. Thus they cannot be used on quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradient. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it utilizes a stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning for quantized models possible. It therefore opens up the possibility for scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies .
- [365] arXiv:2602.03121 [pdf, ps, other]
-
Title: Behind the Feed: A Taxonomy of User-Facing Cues for Algorithmic Transparency in Social MediaSubjects: Human-Computer Interaction (cs.HC)
People who use social media are learning about how the companies that run these platforms make their decisions on who gets to see what through visual indicators in the interface (UI) of each social media site. These indicators are different for each platform and are not always located in an easy-to-find location on the site. Therefore, it is hard for someone to compare different social media platforms or determine whether transparency leads to greater accountability or only leads to increased understanding. A new classification system has been developed to help provide a standard way of categorizing the way, that an algorithm is presented through UI elements and whether the company has provided any type of explanation as to why they are featured. This new classification system includes the following three areas of development: design form, information content, and user agency. This new classification system can be applied to the six social media platforms currently available and serves as a reference database for identifying common archetypes of features in the each social media platform's UI. The new classification system will assist in determining whether or not the transparency of an algorithm functions the way that it was intended when it was developed and provide future design ideas that can help improve the inspectibility, actionability, and contestability of algorithms.
- [366] arXiv:2602.03123 [pdf, ps, other]
-
Title: Beyond Cropping and Rotation: Automated Evolution of Powerful Task-Specific Augmentations with Generative ModelsAuthors: Judah Goldfeder, Shreyes Kaliyur, Vaibhav Sourirajan, Patrick Minwan Puma, Philippe Martin Wyder, Yuhang Hu, Jiong Lin, Hod LipsonSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Data augmentation has long been a cornerstone for reducing overfitting in vision models, with methods like AutoAugment automating the design of task-specific augmentations. Recent advances in generative models, such as conditional diffusion and few-shot NeRFs, offer a new paradigm for data augmentation by synthesizing data with significantly greater diversity and realism. However, unlike traditional augmentations like cropping or rotation, these methods introduce substantial changes that enhance robustness but also risk degrading performance if the augmentations are poorly matched to the task. In this work, we present EvoAug, an automated augmentation learning pipeline, which leverages these generative models alongside an efficient evolutionary algorithm to learn optimal task-specific augmentations. Our pipeline introduces a novel approach to image augmentation that learns stochastic augmentation trees that hierarchically compose augmentations, enabling more structured and adaptive transformations. We demonstrate strong performance across fine-grained classification and few-shot learning tasks. Notably, our pipeline discovers augmentations that align with domain knowledge, even in low-data settings. These results highlight the potential of learned generative augmentations, unlocking new possibilities for robust model training.
- [367] arXiv:2602.03124 [pdf, ps, other]
-
Title: Feature, Alignment, and Supervision in Category Learning: A Comparative Approach with Children and Neural NetworksSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Understanding how humans and machines learn from sparse data is central to cognitive science and machine learning. Using a species-fair design, we compare children and convolutional neural networks (CNNs) in a few-shot semi-supervised category learning task. Both learners are exposed to novel object categories under identical conditions. Learners receive mixtures of labeled and unlabeled exemplars while we vary supervision (1/3/6 labels), target feature (size, shape, pattern), and perceptual alignment (high/low). We find that children generalize rapidly from minimal labels but show strong feature-specific biases and sensitivity to alignment. CNNs show a different interaction profile: added supervision improves performance, but both alignment and feature structure moderate the impact additional supervision has on learning. These results show that human-model comparisons must be drawn under the right conditions, emphasizing interactions among supervision, feature structure, and alignment rather than overall accuracy.
- [368] arXiv:2602.03126 [pdf, ps, other]
-
Title: Flexible Geometric Guidance for Probabilistic Human Pose Estimation with Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
3D human pose estimation from 2D images is a challenging problem due to depth ambiguity and occlusion. Because of these challenges the task is underdetermined, where there exists multiple -- possibly infinite -- poses that are plausible given the image. Despite this, many prior works assume the existence of a deterministic mapping and estimate a single pose given an image. Furthermore, methods based on machine learning require a large amount of paired 2D-3D data to train and suffer from generalization issues to unseen scenarios. To address both of these issues, we propose a framework for pose estimation using diffusion models, which enables sampling from a probability distribution over plausible poses which are consistent with a 2D image. Our approach falls under the guidance framework for conditional generation, and guides samples from an unconditional diffusion model, trained only on 3D data, using the gradients of the heatmaps from a 2D keypoint detector. We evaluate our method on the Human 3.6M dataset under best-of-$m$ multiple hypothesis evaluation, showing state-of-the-art performance among methods which do not require paired 2D-3D data for training. We additionally evaluate the generalization ability using the MPI-INF-3DHP and 3DPW datasets and demonstrate competitive performance. Finally, we demonstrate the flexibility of our framework by using it for novel tasks including pose generation and pose completion, without the need to train bespoke conditional models. We make code available at https://github.com/fsnelgar/diffusion_pose .
- [369] arXiv:2602.03127 [pdf, ps, other]
-
Title: Cyber Insurance, Audit, and Policy: Review, Analysis and RecommendationsSubjects: Cryptography and Security (cs.CR)
Cyber insurance, which protects insured organizations against financial losses from cyberattacks and data breaches, can be difficult and expensive to obtain for many organizations. These difficulties stem from insurers difficulty in understanding and accurately assessing the risks that they are undertaking. Cybersecurity audits, which are already implemented in many organizations for compliance and other purposes, present a potential solution to this challenge. This paper provides a structured review and analysis of prior work in this area, analysis of the challenges and potential benefits that cyber audits provide and recommendations for the use of cyber audits to reduce cyber insurance costs and improve its availability.
- [370] arXiv:2602.03128 [pdf, ps, other]
-
Title: Understanding Multi-Agent LLM Frameworks: A Unified Benchmark and Experimental AnalysisComments: 25 pages, 9 figures and 13 tables; introduces MAFBench unified multi-agent evaluation suiteSubjects: Artificial Intelligence (cs.AI)
Multi-agent LLM frameworks are widely used to accelerate the development of agent systems powered by large language models (LLMs). These frameworks impose distinct architectural structures that govern how agents interact, store information, and coordinate tasks. However, their impact on system performance remains poorly understood. This gap is critical, as architectural choices alone can induce order-of-magnitude differences in latency and throughput, as well as substantial variation in accuracy and scalability. Addressing this challenge requires (i) jointly evaluating multiple capabilities, such as orchestration overhead, memory behavior, planning, specialization, and coordination, and (ii) conducting these evaluations under controlled, framework-level conditions to isolate architectural effects. Existing benchmarks focus on individual capabilities and lack standardized framework-level evaluation. We address these limitations by (i) introducing an architectural taxonomy for systematically comparing multi-agent LLM frameworks along fundamental dimensions, and (ii) developing MAFBench, a unified evaluation suite that integrates existing benchmarks under a standardized execution pipeline. Using MAFBench, we conduct a controlled empirical study across several widely used frameworks. Our results show that framework-level design choices alone can increase latency by over 100x, reduce planning accuracy by up to 30%, and lower coordination success from above 90% to below 30%. Finally, we translate our findings into concrete architectural design principles and framework selection guidance, and outline promising future research directions.
- [371] arXiv:2602.03130 [pdf, ps, other]
-
Title: FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent EvaluationSubjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE)
The financial domain poses substantial challenges for vision-language models (VLMs) due to specialized chart formats and knowledge-intensive reasoning requirements. However, existing financial benchmarks are largely single-turn and rely on a narrow set of question formats, limiting comprehensive evaluation in realistic application scenarios. To address this gap, we propose FinMTM, a multi-turn multimodal benchmark that expands diversity along both data and task dimensions. On the data side, we curate and annotate 11{,}133 bilingual (Chinese and English) financial QA pairs grounded in financial visuals, including candlestick charts, statistical plots, and report figures. On the task side, FinMTM covers single- and multiple-choice questions, multi-turn open-ended dialogues, and agent-based tasks. We further design task-specific evaluation protocols, including a set-overlap scoring rule for multiple-choice questions, a weighted combination of turn-level and session-level scores for multi-turn dialogues, and a composite metric that integrates planning quality with final outcomes for agent tasks. Extensive experimental evaluation of 22 VLMs reveal their limitations in fine-grained visual perception, long-context reasoning, and complex agent workflows.
- [372] arXiv:2602.03132 [pdf, ps, other]
-
Title: Contrastive Concept-Tree Search for LLM-Assisted Algorithm DiscoverySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Large language Model (LLM)-assisted algorithm discovery is an iterative, black-box optimization process over programs to approximatively solve a target task, where an LLM proposes candidate programs and an external evaluator provides task feedback. Despite intense recent research on the topic and promising results, how can the LLM internal representation of the space of possible programs be maximally exploited to improve performance is an open question. Here, we introduce Contrastive Concept-Tree Search (CCTS), which extracts a hierarchical concept representation from the generated programs and learns a contrastive concept model that guides parent selection. By reweighting parents using a likelihood-ratio score between high- and low-performing solutions, CCTS biases search toward useful concept combinations and away from misleading ones, providing guidance through an explicit concept hierarchy rather than the algorithm lineage constructed by the LLM. We show that CCTS improves search efficiency over fitness-based baselines and produces interpretable, task-specific concept trees across a benchmark of open Erd\H{o}s-type combinatorics problems. Our analysis indicates that the gains are driven largely by learning which concepts to avoid. We further validate these findings in a controlled synthetic algorithm-discovery environment, which reproduces qualitatively the search dynamics observed with the LLMs.
- [373] arXiv:2602.03134 [pdf, ps, other]
-
Title: SwiftVLM: Efficient Vision-Language Model Inference via Cross-Layer Token BypassSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Visual token pruning is a promising approach for reducing the computational cost of vision-language models (VLMs), and existing methods often rely on early pruning decisions to improve efficiency. While effective on coarse-grained reasoning tasks, they suffer from significant performance degradation on tasks requiring fine-grained visual details. Through layer-wise analysis, we reveal substantial discrepancies in visual token importance across layers, showing that tokens deemed unimportant at shallow layers can later become highly relevant for text-conditioned reasoning. To avoid irreversible critical information loss caused by premature pruning, we introduce a new pruning paradigm, termed bypass, which preserves unselected visual tokens and forwards them to subsequent pruning stages for re-evaluation. Building on this paradigm, we propose SwiftVLM, a simple and training-free method that performs pruning at model-specific layers with strong visual token selection capability, while enabling independent pruning decisions across layers. Experiments across multiple VLMs and benchmarks demonstrate that SwiftVLM consistently outperforms existing pruning strategies, achieving superior accuracy-efficiency trade-offs and more faithful visual token selection behavior.
- [374] arXiv:2602.03135 [pdf, ps, other]
-
Title: Enhanced Parcel Arrival Forecasting for Logistic Hubs: An Ensemble Deep Learning ApproachSubjects: Machine Learning (cs.LG)
The rapid expansion of online shopping has increased the demand for timely parcel delivery, compelling logistics service providers to enhance the efficiency, agility, and predictability of their hub networks. In order to solve the problem, we propose a novel deep learning-based ensemble framework that leverages historical arrival patterns and real-time parcel status updates to forecast upcoming workloads at logistic hubs. This approach not only facilitates the generation of short-term forecasts, but also improves the accuracy of future hub workload predictions for more strategic planning and resource management. Empirical tests of the algorithm, conducted through a case study of a major city's parcel logistics, demonstrate the ensemble method's superiority over both traditional forecasting techniques and standalone deep learning models. Our findings highlight the significant potential of this method to improve operational efficiency in logistics hubs and advocate for its broader adoption.
- [375] arXiv:2602.03137 [pdf, ps, other]
-
Title: FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph DiffusionComments: Accepted by ICLR 2026. Code is available at: \url{this https URL}Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present FSOD-VFM: Few-Shot Object Detectors with Vision Foundation Models, a framework that leverages vision foundation models to tackle the challenge of few-shot object detection. FSOD-VFM integrates three key components: a universal proposal network (UPN) for category-agnostic bounding box generation, SAM2 for accurate mask extraction, and DINOv2 features for efficient adaptation to new object categories. Despite the strong generalization capabilities of foundation models, the bounding boxes generated by UPN often suffer from overfragmentation, covering only partial object regions and leading to numerous small, false-positive proposals rather than accurate, complete object detections. To address this issue, we introduce a novel graph-based confidence reweighting method. In our approach, predicted bounding boxes are modeled as nodes in a directed graph, with graph diffusion operations applied to propagate confidence scores across the network. This reweighting process refines the scores of proposals, assigning higher confidence to whole objects and lower confidence to local, fragmented parts. This strategy improves detection granularity and effectively reduces the occurrence of false-positive bounding box proposals. Through extensive experiments on Pascal-5$^i$, COCO-20$^i$, and CD-FSOD datasets, we demonstrate that our method substantially outperforms existing approaches, achieving superior performance without requiring additional training. Notably, on the challenging CD-FSOD dataset, which spans multiple datasets and domains, our FSOD-VFM achieves 31.6 AP in the 10-shot setting, substantially outperforming previous training-free methods that reach only 21.4 AP. Code is available at: https://intellindust-ai-lab.github.io/projects/FSOD-VFM.
- [376] arXiv:2602.03138 [pdf, ps, other]
-
Title: SATORIS-N: Spectral Analysis based Traffic Observation Recovery via Informed Subspaces and Nuclear-norm minimizationSubjects: Machine Learning (cs.LG)
Traffic-density matrices from different days exhibit both low rank and stable correlations in their singular-vector subspaces. Leveraging this, we introduce SATORIS-N, a framework for imputing partially observed traffic-density by informed subspace priors from neighboring days. Our contribution is a subspace-aware semidefinite programming (SDP)} formulation of nuclear norm that explicitly informs the reconstruction with prior singular-subspace information. This convex formulation jointly enforces low rank and subspace alignment, providing a single global optimum and substantially improving accuracy under medium and high occlusion. We also study a lightweight implicit subspace-alignment} strategy in which matrices from consecutive days are concatenated to encourage alignment of spatial or temporal singular directions. Although this heuristic offers modest gains when missing rates are low, the explicit SDP approach is markedly more robust when large fractions of entries are missing. Across two real-world datasets (Beijing and Shanghai), SATORIS-N consistently outperforms standard matrix-completion methods such as SoftImpute, IterativeSVD, statistical, and even deep learning baselines at high occlusion levels. The framework generalizes to other spatiotemporal settings in which singular subspaces evolve slowly over time. In the context of intelligent vehicles and vehicle-to-everything (V2X) systems, accurate traffic-density
reconstruction enables critical applications including cooperative perception, predictive routing, and vehicle-to-infrastructure (V2I) communication optimization. When infrastructure sensors or vehicle-reported observations are incomplete - due to communication dropouts, sensor occlusions, or sparse connected vehicle penetration-reliable imputation becomes essential for safe and efficient autonomous navigation. - [377] arXiv:2602.03139 [pdf, ps, other]
-
Title: Diversity-Preserved Distribution Matching Distillation for Fast Visual SynthesisSubjects: Computer Vision and Pattern Recognition (cs.CV)
Distribution matching distillation (DMD) aligns a multi-step generator with its few-step counterpart to enable high-quality generation under low inference cost. However, DMD tends to suffer from mode collapse, as its reverse-KL formulation inherently encourages mode-seeking behavior, for which existing remedies typically rely on perceptual or adversarial regularization, thereby incurring substantial computational overhead and training instability. In this work, we propose a role-separated distillation framework that explicitly disentangles the roles of distilled steps: the first step is dedicated to preserving sample diversity via a target-prediction (e.g., v-prediction) objective, while subsequent steps focus on quality refinement under the standard DMD loss, with gradients from the DMD objective blocked at the first step. We term this approach Diversity-Preserved DMD (DP-DMD), which, despite its simplicity -- no perceptual backbone, no discriminator, no auxiliary networks, and no additional ground-truth images -- preserves sample diversity while maintaining visual quality on par with state-of-the-art methods in extensive text-to-image experiments.
- [378] arXiv:2602.03140 [pdf, ps, other]
-
Title: Analyzing Zigbee Traffic: Datasets, Classification and Storage Trade-offsSubjects: Networking and Internet Architecture (cs.NI)
Zigbee is widely used in smart home environments due to its low power consumption and support for mesh networking, making it a relevant target for traffic-based IoT forensic analysis. However, existing studies often rely on limited datasets and fixed network configurations. In this paper, we analyze Zigbee network traffic from three complementary perspectives: data collection, traffic classification, and storage efficiency. We introduce ZIOTP2025, a publicly available dataset of Zigbee traffic collected from commercial smart home devices deployed under multiple network configurations and capturing realistic interaction scenarios. Using this dataset, we study two traffic classification tasks: device type classification and individual device identification, and evaluate their robustness under both intra-configuration and cross-configuration settings. Our results show that while high classification accuracy can be achieved under controlled conditions, performance degrades significantly when models are evaluated across different network configurations, particularly for fine-grained identification tasks. Finally, we investigate the trade-off between traffic storage requirements and classification accuracy. We show that lossy compression of traffic features through quantization can reduce storage requirements by approximately 4-5x compared to lossless storage of raw packet traces, while preserving near-lossless classification performance. Overall, our results highlight the need for topology-aware Zigbee traffic analysis and storage-efficient feature compression to enable robust and scalable IoT forensic systems.
- [379] arXiv:2602.03141 [pdf, ps, other]
-
Title: Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge OptimizationSubjects: Computation and Language (cs.CL)
While Large Reasoning Models (LRMs) have demonstrated impressive capabilities in solving complex tasks through the generation of long reasoning chains, this reliance on verbose generation results in significant latency and computational overhead. To address these challenges, we propose \textbf{CoSMo} (\textbf{Co}nsistency-Guided \textbf{S}plit-\textbf{M}erge \textbf{O}ptimization), a framework designed to eliminate structural redundancy rather than indiscriminately restricting token volume. Specifically, CoSMo utilizes a split-merge algorithm that dynamically refines reasoning chains by merging redundant segments and splitting logical gaps to ensure coherence. We then employ structure-aligned reinforcement learning with a novel segment-level budget to supervise the model in maintaining efficient reasoning structures throughout training. Extensive experiments across multiple benchmarks and backbones demonstrate that CoSMo achieves superior performance, improving accuracy by \textbf{3.3} points while reducing segment usage by \textbf{28.7\%} on average compared to reasoning efficiency baselines.
- [380] arXiv:2602.03143 [pdf, ps, other]
-
Title: Self-Hinting Language Models Enhance Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Group Relative Policy Optimization (GRPO) has recently emerged as a practical recipe for aligning large language models with verifiable objectives. However, under sparse terminal rewards, GRPO often stalls because rollouts within a group frequently receive identical rewards, causing relative advantages to collapse and updates to vanish. We propose self-hint aligned GRPO with privileged supervision (SAGE), an on-policy reinforcement learning framework that injects privileged hints during training to reshape the rollout distribution under the same terminal verifier reward. For each prompt $x$, the model samples a compact hint $h$ (e.g., a plan or decomposition) and then generates a solution $\tau$ conditioned on $(x,h)$. Crucially, the task reward $R(x,\tau)$ is unchanged; hints only increase within-group outcome diversity under finite sampling, preventing GRPO advantages from collapsing under sparse rewards. At test time, we set $h=\varnothing$ and deploy the no-hint policy without any privileged information. Moreover, sampling diverse self-hints serves as an adaptive curriculum that tracks the learner's bottlenecks more effectively than fixed hints from an initial policy or a stronger external model. Experiments over 6 benchmarks with 3 LLMs show that SAGE consistently outperforms GRPO, on average +2.0 on Llama-3.2-3B-Instruct, +1.2 on Qwen2.5-7B-Instruct and +1.3 on Qwen3-4B-Instruct. The code is available at https://github.com/BaohaoLiao/SAGE.
- [381] arXiv:2602.03144 [pdf, ps, other]
-
Title: What Makes a Good Example? Modeling Exemplar Selection with Neural Network RepresentationsSubjects: Machine Learning (cs.LG)
Teaching requires distilling a rich category distribution into a small set of informative exemplars. Although prior work shows that humans consider both representativeness and diversity when teaching, the computational principles underlying these tradeoffs remain unclear. We address this gap by modeling human exemplar selection using neural network feature representations and principled subset selection criteria. Novel visual categories were embedded along a one-dimensional morph continuum using pretrained vision models, and selection strategies varied in their emphasis on prototypicality, joint representativeness, and diversity. Adult participants selected one to three exemplars to teach a learner. Model-human comparisons revealed that strategies based on joint representativeness, or its combination with diversity, best captured human judgments, whereas purely prototypical or diversity-based strategies performed worse. Moreover, transformer-based representations consistently aligned more closely with human behavior than convolutional networks. These results highlight the potential utility of dataset distillation methods in machine learning as computational models for teaching.
- [382] arXiv:2602.03145 [pdf, ps, other]
-
Title: Internet of Agentic AI: Incentive-Compatible Distributed Teaming and WorkflowSubjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Large language models (LLMs) have enabled a new class of agentic AI systems that reason, plan, and act by invoking external tools. However, most existing agentic architectures remain centralized and monolithic, limiting scalability, specialization, and interoperability. This paper proposes a framework for scalable agentic intelligence, termed the Internet of Agentic AI, in which autonomous, heterogeneous agents distributed across cloud and edge infrastructure dynamically form coalitions to execute task-driven workflows. We formalize a network-native model of agentic collaboration and introduce an incentive-compatible workflow-coalition feasibility framework that integrates capability coverage, network locality, and economic implementability. To enable scalable coordination, we formulate a minimum-effort coalition selection problem and propose a decentralized coalition formation algorithm. The proposed framework can operate as a coordination layer above the Model Context Protocol (MCP). A healthcare case study demonstrates how domain specialization, cloud-edge heterogeneity, and dynamic coalition formation enable scalable, resilient, and economically viable agentic workflows. This work lays the foundation for principled coordination and scalability in the emerging era of Internet of Agentic AI.
- [383] arXiv:2602.03146 [pdf, ps, other]
-
Title: General Agents Contain World Models, even under Partial Observability and StochasticityAuthors: Santiago CifuentesComments: 19 pages, 4 figuresSubjects: Artificial Intelligence (cs.AI)
Deciding whether an agent possesses a model of its surrounding world is a fundamental step toward understanding its capabilities and limitations. In [10], it was shown that, within a particular framework, every almost optimal and general agent necessarily contains sufficient knowledge of its environment to allow an approximate reconstruction of it by querying the agent as a black box. This result relied on the assumptions that the agent is deterministic and that the environment is fully observable.
In this work, we remove both assumptions by extending the theorem to stochastic agents operating in partially observable environments. Fundamentally, this shows that stochastic agents cannot avoid learning their environment through the usage of randomization. We also strengthen the result by weakening the notion of generality, proving that less powerful agents already contain a model of the world in which they operate. - [384] arXiv:2602.03147 [pdf, ps, other]
-
Title: Multi-function Robotized Surgical Dissector for Endoscopic Pulmonary Thromboendarterectomy: Preclinical Study and EvaluationSubjects: Robotics (cs.RO)
Patients suffering chronic severe pulmonary thromboembolism need Pulmonary Thromboendarterectomy (PTE) to remove the thromb and intima located inside pulmonary artery (PA). During the surgery, a surgeon holds tweezers and a dissector to delicately strip the blockage, but available tools for this surgery are rigid and straight, lacking distal dexterity to access into thin branches of PA. Therefore, this work presents a novel robotized dissector based on concentric push/pull robot (CPPR) structure, enabling entering deep thin branch of tortuous PA. Compared with conventional rigid dissectors, our design characterizes slenderness and dual-segment-bending dexterity. Owing to the hollow and thin-walled structure of the CPPR-based dissector as it has a slender body of 3.5mm in diameter, the central lumen accommodates two channels for irrigation and tip tool, and space for endoscopic camera's signal wire. To provide accurate surgical manipulation, optimization-based kinematics model was established, realizing a 2mm accuracy in positioning the tip tool (60mm length) under open-loop control strategy. As such, with the endoscopic camera, traditional PTE is possible to be upgraded as endoscopic PTE. Basic physic performance of the robotized dissector including stiffness, motion accuracy and maneuverability was evaluated through experiments. Surgery simulation on ex vivo porcine lung also demonstrates its dexterity and notable advantages in PTE.
- [385] arXiv:2602.03151 [pdf, ps, other]
-
Title: Enhancing Foundation VLM Robustness to Missing Modality: Scalable Diffusion for Bi-directional Feature RestorationComments: 12 pagesSubjects: Artificial Intelligence (cs.AI)
Vision Language Models (VLMs) typically assume complete modality input during inference. However, their effectiveness drops sharply when certain modalities are unavailable or incomplete. Current research primarily faces two dilemmas: Prompt-based methods struggle to restore missing yet indispensable features and impair generalization of VLMs. Imputation-based approaches, lacking effective guidance, are prone to generating semantically irrelevant noise. Restoring precise semantics while sustaining VLM generalization remains challenging. Therefore, we propose a general missing modality restoration strategy in this paper. We introduce an enhanced diffusion model as a pluggable mid-stage training module to effectively restore missing features. Our strategy introduces two key innovations: (I) Dynamic Modality Gating, which adaptively leverages conditional features to steer the generation of semantically consistent features; (II) Cross-Modal Mutual Learning mechanism, which bridges the semantic spaces of dual encoders to achieve bidirectional alignment. Zero-shot evaluations across benchmark datasets demonstrate that our approach outperforms existing baseline methods. Extensive experiments and ablation studies confirm our model as a robust and scalable extension for VLMs in missing modality scenarios, ensuring reliability across diverse missing rates and environments. Our code and models will be publicly available.
- [386] arXiv:2602.03152 [pdf, ps, other]
-
Title: FASA: Frequency-aware Sparse AttentionAuthors: Yifei Wang, Yueqi Wang, Zhenrui Yue, Huimin Zeng, Yong Wang, Ismini Lourentzou, Zhengzhong Tu, Xiangxiang Chu, Julian McAuleyComments: Accepted by ICLR 2026Subjects: Computation and Language (cs.CL)
The deployment of Large Language Models (LLMs) faces a critical bottleneck when handling lengthy inputs: the prohibitive memory footprint of the Key Value (KV) cache. To address this bottleneck, the token pruning paradigm leverages attention sparsity to selectively retain a small, critical subset of tokens. However, existing approaches fall short, with static methods risking irreversible information loss and dynamic strategies employing heuristics that insufficiently capture the query-dependent nature of token importance. We propose FASA, a novel framework that achieves query-aware token eviction by dynamically predicting token importance. FASA stems from a novel insight into RoPE: the discovery of functional sparsity at the frequency-chunk (FC) level. Our key finding is that a small, identifiable subset of "dominant" FCs consistently exhibits high contextual agreement with the full attention head. This provides a robust and computationally free proxy for identifying salient tokens. %making them a powerful and efficient proxy for token importance. Building on this insight, FASA first identifies a critical set of tokens using dominant FCs, and then performs focused attention computation solely on this pruned subset. % Since accessing only a small fraction of the KV cache, FASA drastically lowers memory bandwidth requirements and computational cost. Across a spectrum of long-context tasks, from sequence modeling to complex CoT reasoning, FASA consistently outperforms all token-eviction baselines and achieves near-oracle accuracy, demonstrating remarkable robustness even under constraint budgets. Notably, on LongBench-V1, FASA reaches nearly 100\% of full-KV performance when only keeping 256 tokens, and achieves 2.56$\times$ speedup using just 18.9\% of the cache on AIME24.
- [387] arXiv:2602.03153 [pdf, ps, other]
-
Title: When Attention Betrays: Erasing Backdoor Attacks in Robotic Policies by Reconstructing Visual TokensAuthors: Xuetao Li, Pinhan Fu, Wenke Huang, Nengyuan Pan, Songhua Yang, Kaiyan Zhao, Guancheng Wan, Mengde Li, Jifeng Xuan, Miao LiComments: ICRA2026 acceptedSubjects: Robotics (cs.RO)
Downstream fine-tuning of vision-language-action (VLA) models enhances robotics, yet exposes the pipeline to backdoor risks. Attackers can pretrain VLAs on poisoned data to implant backdoors that remain stealthy but can trigger harmful behavior during inference. However, existing defenses either lack mechanistic insight into multimodal backdoors or impose prohibitive computational costs via full-model retraining. To this end, we uncover a deep-layer attention grabbing mechanism: backdoors redirect late-stage attention and form compact embedding clusters near the clean manifold. Leveraging this insight, we introduce Bera, a test-time backdoor erasure framework that detects tokens with anomalous attention via latent-space localization, masks suspicious regions using deep-layer cues, and reconstructs a trigger-free image to break the trigger-unsafe-action mapping while restoring correct behavior. Unlike prior defenses, Bera requires neither retraining of VLAs nor any changes to the training pipeline. Extensive experiments across multiple embodied platforms and tasks show that Bera effectively maintains nominal performance, significantly reduces attack success rates, and consistently restores benign behavior from backdoored outputs, thereby offering a robust and practical defense mechanism for securing robotic systems.
- [388] arXiv:2602.03154 [pdf, ps, other]
-
Title: Intelligent Front-End Personalization: AI-Driven UI AdaptationAuthors: Mona RajhansComments: To be published in proceedings of IEEE ACDSA 2026Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Front-end personalization has traditionally relied on static designs or rule-based adaptations, which fail to fully capture user behavior patterns. This paper presents an AI driven approach for dynamic front-end personalization, where UI layouts, content, and features adapt in real-time based on predicted user behavior. We propose three strategies: dynamic layout adaptation using user path prediction, content prioritization through reinforcement learning, and a comparative analysis of AI-driven vs. rule-based personalization. Technical implementation details, algorithms, system architecture, and evaluation methods are provided to illustrate feasibility and performance gains.
- [389] arXiv:2602.03155 [pdf, ps, other]
-
Title: Is It Possible to Make Chatbots Virtuous? Investigating a Virtue-Based Design Methodology Applied to LLMsSubjects: Human-Computer Interaction (cs.HC)
With the rapid growth of Large Language Models (LLMs), criticism of their societal impact has also grown. Work in Responsible AI (RAI) has focused on the development of AI systems aimed at reducing harm. Responding to RAI's criticisms and the need to bring the wisdom traditions into HCI, we apply Conwill et al.'s Virtue-Guided Technology Design method to LLMs. We cataloged new ethical design patterns for LLMs and evaluated them through interviews with technologists. Participants valued that the patterns provided more accuracy and robustness, better safety, new research opportunities, increased access and control, and reduced waste. Their concerns were that the patterns could be vulnerable to jailbreaking, were generalizing models too widely, and had potential implementation issues. Overall, participants reacted positively while also acknowledging the tradeoffs involved in ethical LLM design.
- [390] arXiv:2602.03156 [pdf, ps, other]
-
Title: Fully Kolmogorov-Arnold Deep Model in Medical Image SegmentationComments: 11 pages, 5 figures, conferenceSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Deeply stacked KANs are practically impossible due to high training difficulties and substantial memory requirements. Consequently, existing studies can only incorporate few KAN layers, hindering the comprehensive exploration of KANs. This study overcomes these limitations and introduces the first fully KA-based deep model, demonstrating that KA-based layers can entirely replace traditional architectures in deep learning and achieve superior learning capacity. Specifically, (1) the proposed Share-activation KAN (SaKAN) reformulates Sprecher's variant of Kolmogorov-Arnold representation theorem, which achieves better optimization due to its simplified parameterization and denser training samples, to ease training difficulty, (2) this paper indicates that spline gradients contribute negligibly to training while consuming huge GPU memory, thus proposes the Grad-Free Spline to significantly reduce memory usage and computational overhead. (3) Building on these two innovations, our ALL U-KAN is the first representative implementation of fully KA-based deep model, where the proposed KA and KAonv layers completely replace FC and Conv layers. Extensive evaluations on three medical image segmentation tasks confirm the superiority of the full KA-based architecture compared to partial KA-based and traditional architectures, achieving all higher segmentation accuracy. Compared to directly deeply stacked KAN, ALL U-KAN achieves 10 times reduction in parameter count and reduces memory consumption by more than 20 times, unlocking the new explorations into deep KAN architectures.
- [391] arXiv:2602.03157 [pdf, ps, other]
-
Title: Human-in-the-loop Adaptation in Group Activity Feature Learning for Team Sports Video RetrievalComments: Accepted to Computer Vision and Image Understanding (CVIU)Journal-ref: Computer Vision and Image Understanding 263 (2026) 104577Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes human-in-the-loop adaptation for Group Activity Feature Learning (GAFL) without group activity annotations. This human-in-the-loop adaptation is employed in a group-activity video retrieval framework to improve its retrieval performance. Our method initially pre-trains the GAF space based on the similarity of group activities in a self-supervised manner, unlike prior work that classifies videos into pre-defined group activity classes in a supervised learning manner. Our interactive fine-tuning process updates the GAF space to allow a user to better retrieve videos similar to query videos given by the user. In this fine-tuning, our proposed data-efficient video selection process provides several videos, which are selected from a video database, to the user in order to manually label these videos as positive or negative. These labeled videos are used to update (i.e., fine-tune) the GAF space, so that the positive and negative videos move closer to and farther away from the query videos through contrastive learning. Our comprehensive experimental results on two team sports datasets validate that our method significantly improves the retrieval performance. Ablation studies also demonstrate that several components in our human-in-the-loop adaptation contribute to the improvement of the retrieval performance. Code: https://github.com/chihina/GAFL-FINE-CVIU.
- [392] arXiv:2602.03158 [pdf, ps, other]
-
Title: PAMAS: Self-Adaptive Multi-Agent System with Perspective Aggregation for Misinformation DetectionComments: 12 pagesSubjects: Information Retrieval (cs.IR)
Misinformation on social media poses a critical threat to information credibility, as its diverse and context-dependent nature complicates detection. Large language model-empowered multi-agent systems (MAS) present a promising paradigm that enables cooperative reasoning and collective intelligence to combat this threat. However, conventional MAS suffer from an information-drowning problem, where abundant truthful content overwhelms sparse and weak deceptive cues. With full input access, agents tend to focus on dominant patterns, and inter-agent communication further amplifies this bias. To tackle this issue, we propose PAMAS, a multi-agent framework with perspective aggregation, which employs hierarchical, perspective-aware aggregation to highlight anomaly cues and alleviate information drowning. PAMAS organizes agents into three roles: Auditors, Coordinators, and a Decision-Maker. Auditors capture anomaly cues from specialized feature subsets; Coordinators aggregate their perspectives to enhance coverage while maintaining diversity; and the Decision-Maker, equipped with evolving memory and full contextual access, synthesizes all subordinate insights to produce the final judgment. Furthermore, to improve efficiency in multi-agent collaboration, PAMAS incorporates self-adaptive mechanisms for dynamic topology optimization and routing-based inference, enhancing both efficiency and scalability. Extensive experiments on multiple benchmark datasets demonstrate that PAMAS achieves superior accuracy and efficiency, offering a scalable and trustworthy way for misinformation detection.
- [393] arXiv:2602.03160 [pdf, ps, other]
-
Title: VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language ModelsSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Aligning Large Language Models (LLMs) with the diverse spectrum of human values remains a central challenge: preference-based methods often fail to capture deeper motivational principles. Value-based approaches offer a more principled path, yet three gaps persist: extraction often ignores hierarchical structure, evaluation detects presence but not calibrated intensity, and the steerability of LLMs at controlled intensities remains insufficiently understood. To address these limitations, we introduce VALUEFLOW, the first unified framework that spans extraction, evaluation, and steering with calibrated intensity control. The framework integrates three components: (i) HIVES, a hierarchical value embedding space that captures intra- and cross-theory value structure; (ii) the Value Intensity DataBase (VIDB), a large-scale resource of value-labeled texts with intensity estimates derived from ranking-based aggregation; and (iii) an anchor-based evaluator that produces consistent intensity scores for model outputs by ranking them against VIDB panels. Using VALUEFLOW, we conduct a comprehensive large-scale study across ten models and four value theories, identifying asymmetries in steerability and composition laws for multi-value control. This paper establishes a scalable infrastructure for evaluating and controlling value intensity, advancing pluralistic alignment of LLMs.
- [394] arXiv:2602.03164 [pdf, ps, other]
-
Title: MemCast: Memory-Driven Time Series Forecasting with Experience-Conditioned ReasoningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Time series forecasting (TSF) plays a critical role in decision-making for many real-world applications. Recently, LLM-based forecasters have made promising advancements. Despite their effectiveness, existing methods often lack explicit experience accumulation and continual evolution. In this work, we propose MemCast, a learning-to-memory framework that reformulates TSF as an experience-conditioned reasoning task. Specifically, we learn experience from the training set and organize it into a hierarchical memory. This is achieved by summarizing prediction results into historical patterns, distilling inference trajectories into reasoning wisdom, and inducing extracted temporal features into general laws. Furthermore, during inference, we leverage historical patterns to guide the reasoning process and utilize reasoning wisdom to select better trajectories, while general laws serve as criteria for reflective iteration. Additionally, to enable continual evolution, we design a dynamic confidence adaptation strategy that updates the confidence of individual entries without leaking the test set distribution. Extensive experiments on multiple datasets demonstrate that MemCast consistently outperforms previous methods, validating the effectiveness of our approach. Our code is available at https://github.com/Xiaoyu-Tao/MemCast-TS.
- [395] arXiv:2602.03166 [pdf, ps, other]
-
Title: Event-Level Probabilistic Prediction of Extreme Rainfall over India Using Physics-Gated Latent DynamicsAuthors: Arun Govind NeelanSubjects: Numerical Analysis (math.NA)
Extreme rainfall over the Indian monsoon region poses severe societal and infrastructural risks but remains difficult to predict at daily time scales due to stochastic convective triggering and multiscale atmospheric interactions. While large-scale atmospheric fields provide important environmental context, their ability to localize extreme rainfall events is fundamentally limited. In this study, we examine how large-scale atmospheric information from ERA5 reanalysis can be leveraged for event-level probabilistic prediction of daily rainfall extremes over India. We compare an adaptive ConvLSTM baseline with a proposed Physics-Gated Latent Ordinary Differential Equation (PG-LODE) framework, which models atmospheric evolution as a continuous-time latent process whose dynamics are explicitly modulated by a physics-based gating mechanism under convectively unstable conditions. Extreme events are defined using the local 95th percentile of the India Meteorological Department gridded rainfall dataset during the June to September monsoon season. Pixel-wise evaluation shows limited skill for both models due to spatial displacement errors, whereas event-level tile-based verification reveals a clear performance contrast. The ConvLSTM remains highly conservative, detecting only 27 percent of extreme events, while PG-LODE achieves near-complete detection with a substantially higher critical success index and a moderate false alarm rate. These results demonstrate that physics-gated continuous-time latent dynamics offer a robust pathway for translating large-scale atmospheric predictability into reliable assessments of extreme rainfall risk.
- [396] arXiv:2602.03171 [pdf, ps, other]
-
Title: StepScorer: Accelerating Reinforcement Learning with Step-wise Scoring and Psychological Regret ModelingAuthors: Zhe XuComments: 10 pages, 5 figures, 1 tableSubjects: Machine Learning (cs.LG)
Reinforcement learning algorithms often suffer from slow convergence due to sparse reward signals, particularly in complex environments where feedback is delayed or infrequent. This paper introduces the Psychological Regret Model (PRM), a novel approach that accelerates learning by incorporating regret-based feedback signals after each decision step. Rather than waiting for terminal rewards, PRM computes a regret signal based on the difference between the expected value of the optimal action and the value of the action taken in each state. This transforms sparse rewards into dense feedback signals through a step-wise scoring framework, enabling faster convergence. We demonstrate that PRM achieves stable performance approximately 36\% faster than traditional Proximal Policy Optimization (PPO) in benchmark environments such as Lunar Lander. Our results indicate that PRM is particularly effective in continuous control tasks and environments with delayed feedback, making it suitable for real-world applications such as robotics, finance, and adaptive education where rapid policy adaptation is critical. The approach formalizes human-inspired counterfactual thinking as a computable regret signal, bridging behavioral economics and reinforcement learning.
- [397] arXiv:2602.03172 [pdf, ps, other]
-
Title: Adversarial construction as a potential solution to the experiment design problem in large task spacesComments: 7 pages, 7 figuresSubjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Despite decades of work, we still lack a robust, task-general theory of human behavior even in the simplest domains. In this paper we tackle the generality problem head-on, by aiming to develop a unified model for all tasks embedded in a task-space. In particular we consider the space of binary sequence prediction tasks where the observations are generated by the space parameterized by hidden Markov models (HMM). As the space of tasks is large, experimental exploration of the entire space is infeasible. To solve this problem we propose the adversarial construction approach, which helps identify tasks that are most likely to elicit a qualitatively novel behavior. Our results suggest that adversarial construction significantly outperforms random sampling of environments and therefore could be used as a proxy for optimal experimental design in high-dimensional task spaces.
- [398] arXiv:2602.03175 [pdf, ps, other]
-
Title: Probe-then-Commit Multi-Objective Bandits: Theoretical Benefits of Limited Multi-Arm FeedbackAuthors: Ming ShiSubjects: Machine Learning (cs.LG)
We study an online resource-selection problem motivated by multi-radio access selection and mobile edge computing offloading. In each round, an agent chooses among $K$ candidate links/servers (arms) whose performance is a stochastic $d$-dimensional vector (e.g., throughput, latency, energy, reliability). The key interaction is \emph{probe-then-commit (PtC)}: the agent may probe up to $q>1$ candidates via control-plane measurements to observe their vector outcomes, but must execute exactly one candidate in the data plane. This limited multi-arm feedback regime strictly interpolates between classical bandits ($q=1$) and full-information experts ($q=K$), yet existing multi-objective learning theory largely focuses on these extremes. We develop \textsc{PtC-P-UCB}, an optimistic probe-then-commit algorithm whose technical core is frontier-aware probing under uncertainty in a Pareto mode, e.g., it selects the $q$ probes by approximately maximizing a hypervolume-inspired frontier-coverage potential and commits by marginal hypervolume gain to directly expand the attained Pareto region. We prove a dominated-hypervolume frontier error of $\tilde{O} (K_P d/\sqrt{qT})$, where $K_P$ is the Pareto-frontier size and $T$ is the horizon, and scalarized regret $\tilde{O} (L_\phi d\sqrt{(K/q)T})$, where $\phi$ is the scalarizer. These quantify a transparent $1/\sqrt{q}$ acceleration from limited probing. We further extend to \emph{multi-modal probing}: each probe returns $M$ modalities (e.g., CSI, queue, compute telemetry), and uncertainty fusion yields variance-adaptive versions of the above bounds via an effective noise scale.
- [399] arXiv:2602.03176 [pdf, ps, other]
-
Title: BinaryDemoire: Moiré-Aware Binarization for Image DemoiréingAuthors: Zheng Chen, Zhi Yang, Xiaoyang Liu, Weihang Zhang, Mengfan Wang, Yifan Fu, Linghe Kong, Yulun ZhangComments: Code is available at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Image demoir\'eing aims to remove structured moir\'e artifacts in recaptured imagery, where degradations are highly frequency-dependent and vary across scales and directions. While recent deep networks achieve high-quality restoration, their full-precision designs remain costly for deployment. Binarization offers an extreme compression regime by quantizing both activations and weights to 1-bit. Yet, it has been rarely studied for demoir\'eing and performs poorly when naively applied. In this work, we propose BinaryDemoire, a binarized demoir\'eing framework that explicitly accommodates the frequency structure of moir\'e degradations. First, we introduce a moir\'e-aware binary gate (MABG) that extracts lightweight frequency descriptors together with activation statistics. It predicts channel-wise gating coefficients to condition the aggregation of binary convolution responses. Second, we design a shuffle-grouped residual adapter (SGRA) that performs structured sparse shortcut alignment. It further integrates interleaved mixing to promote information exchange across different channel partitions. Extensive experiments on four benchmarks demonstrate that the proposed BinaryDemoire surpasses current binarization methods. Code: https://github.com/zhengchen1999/BinaryDemoire.
- [400] arXiv:2602.03177 [pdf, ps, other]
-
Title: Estimation of Ground Reaction Forces from Kinematic Data during LocomotionAuthors: Gautami Golani, Dong Anh Khoa To, Ananda Sidarta, Arun-Kumar Kaliya-Perumal, Oliver Roberts, Lek Syn Lim, Jim Patton, Domenico CampoloSubjects: Robotics (cs.RO)
Ground reaction forces (GRFs) provide fundamental insight into human gait mechanics and are widely used to assess joint loading, limb symmetry, balance control, and motor function. Despite their clinical relevance, the use of GRF remains underutilised in clinical workflows due to the practical limitations of force plate systems. In this work, we present a force-plate-free approach for estimating GRFs using only marker-based motion capture data. This kinematics only method to estimate and decompose GRF makes it well suited for widespread clinical depolyment. By using kinematics from sixteen body segments, we estimate the centre of mass (CoM) and compute GRFs, which are subsequently decomposed into individual components through a minimization-based approach. Through this framework, we can identify gait stance phases and provide access to clinically meaningful kinetic measures without a dedicated force plate system. Experimental results demonstrate the viability of CoM and GRF estimation based solely on kinematic data, supporting force-plate-free gait analysis.
- [401] arXiv:2602.03178 [pdf, ps, other]
-
Title: Fully Automated Adaptive Parameter Selection for 3-D High-order Nyström Boundary Integral Equation MethodsSubjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
We present an adaptive Chebyshev-based Boundary Integral Equation (CBIE) solver for electromagnetic scattering from smooth perfect electric conductor (PEC) objects. The proposed approach eliminates manual parameter tuning by introducing (i) a unified adaptive quadrature strategy for automatic selection of the near-singular interaction distance and (ii) an adaptive computation of all self- and near-singular precomputation integrals to a prescribed accuracy using Gauss-Kronrod (h-adaptive) or Clenshaw-Curtis (p-adaptive) rules and singularity-resolving changes of variables. Both h-adaptive and p-adaptive schemes are explored within this framework, ensuring high-order accuracy and robustness across a broad range of geometries without loss of efficiency. Numerical results for canonical and complex CAD geometries demonstrate that the adaptive solver achieves accuracy and convergence rates comparable to optimally tuned fixed-grid CBIE implementations, while offering automation and scalability to electrically large, geometrically complex problems.
- [402] arXiv:2602.03181 [pdf, ps, other]
-
Title: Synthesizing File-Level Data for Unit Test Generation with Chain-of-Thoughts via Self-DebuggingAuthors: Ziyue Hua, Tianyu Chen, Yeyun Gong, Shuai Lu, Peng Cheng, Qinglin Zhu, Yibo He, Yingjie Fu, Wenpin Jiao, Wei Yang, Tao XieSubjects: Software Engineering (cs.SE)
Automatic unit test (UT) generation is essential for software quality assurance, but existing approaches--including symbolic execution, search-based approaches, and recent LLM-based generators--struggle to produce human-quality tests with correct, meaningful assertions and reliable chain-of-thought (CoT) explanations. We identify a gap in UT training data: repository-mined tests lack developer CoTs, while LLM-distilled CoTs are often incorrect or incomplete. To address this issue, we propose a novel data-distillation approach that uses self-debugging to produce high-quality UT training examples paired with faithful CoTs. Our approach combines (1) guided test repair, a heuristic loop (error-, failure-, and coverage-focused steps) that asks the used model to diagnose and iteratively fix generated tests, and (2) CoT compression, which compacts original and debugging CoTs into concise explanations that directly justify correct tests. We apply this pipeline to a large corpus of open-source projects to construct a dataset of 74,518 high-quality <focal method, test, CoT> examples, and then use it for supervised fine-tuning of a base model. An empirical evaluation shows that the fine-tuned model achieves high UT generation effectiveness: it attains a pass rate of 36.17% on test assertions, a branch coverage of 43.90%, and a mutation score of 88.66%, substantially higher than state-of-the-art commercial models like o4-mini.
- [403] arXiv:2602.03182 [pdf, ps, other]
-
Title: LSGQuant: Layer-Sensitivity Guided Quantization for One-Step Diffusion Real-World Video Super-ResolutionAuthors: Tianxing Wu, Zheng Chen, Cirou Xu, Bowen Chai, Yong Guo, Yutong Liu, Linghe Kong, Yulun ZhangComments: Code is available at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
One-Step Diffusion Models have demonstrated promising capability and fast inference in video super-resolution (VSR) for real-world. Nevertheless, the substantial model size and high computational cost of Diffusion Transformers (DiTs) limit downstream applications. While low-bit quantization is a common approach for model compression, the effectiveness of quantized models is challenged by the high dynamic range of input latent and diverse layer behaviors. To deal with these challenges, we introduce LSGQuant, a layer-sensitivity guided quantizing approach for one-step diffusion-based real-world VSR. Our method incorporates a Dynamic Range Adaptive Quantizer (DRAQ) to fit video token activations. Furthermore, we estimate layer sensitivity and implement a Variance-Oriented Layer Training Strategy (VOLTS) by analyzing layer-wise statistics in calibration. We also introduce Quantization-Aware Optimization (QAO) to jointly refine the quantized branch and a retained high-precision branch. Extensive experiments demonstrate that our method has nearly performance to origin model with full-precision and significantly exceeds existing quantization techniques. Code is available at: https://github.com/zhengchen1999/LSGQuant.
- [404] arXiv:2602.03183 [pdf, ps, other]
-
Title: Privasis: Synthesizing the Largest "Public" Private Dataset from ScratchAuthors: Hyunwoo Kim, Niloofar Mireshghallah, Michael Duan, Rui Xin, Shuyue Stella Li, Jaehun Jung, David Acuna, Qi Pang, Hanshen Xiao, G. Edward Suh, Sewoong Oh, Yulia Tsvetkov, Pang Wei Koh, Yejin ChoiComments: For code and data, see this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Research involving privacy-sensitive data has always been constrained by data scarcity, standing in sharp contrast to other areas that have benefited from data scaling. This challenge is becoming increasingly urgent as modern AI agents--such as OpenClaw and Gemini Agent--are granted persistent access to highly sensitive personal information. To tackle this longstanding bottleneck and the rising risks, we present Privasis (i.e., privacy oasis), the first million-scale fully synthetic dataset entirely built from scratch--an expansive reservoir of texts with rich and diverse private information--designed to broaden and accelerate research in areas where processing sensitive social data is inevitable. Compared to existing datasets, Privasis, comprising 1.4 million records, offers orders-of-magnitude larger scale with quality, and far greater diversity across various document types, including medical history, legal documents, financial records, calendars, and text messages with a total of 55.1 million annotated attributes such as ethnicity, date of birth, workplace, etc. We leverage Privasis to construct a parallel corpus for text sanitization with our pipeline that decomposes texts and applies targeted sanitization. Our compact sanitization models (<=4B) trained on this dataset outperform state-of-the-art large language models, such as GPT-5 and Qwen-3 235B. We plan to release data, models, and code to accelerate future research on privacy-sensitive domains and agents.
- [405] arXiv:2602.03184 [pdf, ps, other]
-
Title: DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM InferenceAuthors: Jiancai Ye, Jun Liu, Qingchen Li, Tianlang Zhao, Hanbin Zhang, Jiayi Pan, Ningyi Xu, Guohao DaiSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Although Key-Value (KV) Cache is essential for efficient large language models (LLMs) inference, its growing memory footprint in long-context scenarios poses a significant bottleneck, making KVCache compression crucial. Current compression methods rely on rigid splitting strategies, such as fixed intervals or pre-defined delimiters. We observe that rigid splitting suffers from significant accuracy degradation (ranging from 5.5% to 55.1%) across different scenarios, owing to the scenario-dependent nature of the semantic boundaries. This highlights the necessity of dynamic semantic splitting to match semantics. To achieve this, we face two challenges. (1) Improper delimiter selection misaligns semantics with the KVCache, resulting in 28.6% accuracy loss. (2) Variable-length blocks after splitting introduce over 73.1% additional inference overhead. To address the above challenges, we propose DynSplit-KV, a KVCache compression method that dynamically identifies delimiters for splitting. We propose: (1) a dynamic importance-aware delimiter selection strategy, improving accuracy by 49.9%. (2) A uniform mapping strategy that transforms variable-length semantic blocks into a fixed-length format, reducing inference overhead by 4.9x. Experiments show that DynSplit-KV achieves the highest accuracy, 2.2x speedup compared with FlashAttention and 2.6x peak memory reduction in long-context scenarios.
- [406] arXiv:2602.03188 [pdf, ps, other]
-
Title: Hierarchical Proportion Models for Motion Generation via Integration of Motion PrimitivesComments: 6 pages, 9 figures. Accepted for publication in IEEE AMC 2026Subjects: Robotics (cs.RO)
Imitation learning (IL) enables robots to acquire human-like motion skills from demonstrations, but it still requires extensive high-quality data and retraining to handle complex or long-horizon tasks. To improve data efficiency and adaptability, this study proposes a hierarchical IL framework that integrates motion primitives with proportion-based motion synthesis. The proposed method employs a two-layer architecture, where the upper layer performs long-term planning, while a set of lower-layer models learn individual motion primitives, which are combined according to specific proportions. Three model variants are introduced to explore different trade-offs between learning flexibility, computational cost, and adaptability: a learning-based proportion model, a sampling-based proportion model, and a playback-based proportion model, which differ in how the proportions are determined and whether the upper layer is trainable. Through real-robot pick-and-place experiments, the proposed models successfully generated complex motions not included in the primitive set. The sampling-based and playback-based proportion models achieved more stable and adaptable motion generation than the standard hierarchical model, demonstrating the effectiveness of proportion-based motion integration for practical robot learning.
- [407] arXiv:2602.03189 [pdf, ps, other]
-
Title: StreamShield: A Production-Proven Resiliency Solution for Apache Flink at ByteDanceSubjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Distributed Stream Processing Systems (DSPSs) form the backbone of real-time processing and analytics at ByteDance, where Apache Flink powers one of the largest production clusters worldwide. Ensuring resiliency, the ability to withstand and rapidly recover from failures, together with operational stability, which provides consistent and predictable performance under normal conditions, is essential for meeting strict Service Level Objectives (SLOs). However, achieving resiliency and stability in large-scale production environments remains challenging due to the cluster scale, business diversity, and significant operational overhead. In this work, we present StreamShield, a production-proven resiliency solution deployed in ByteDance's Flink clusters. Designed along complementary perspectives of the engine and cluster, StreamShield introduces key techniques to enhance resiliency, covering runtime optimization, fine-grained fault-tolerance, hybrid replication strategy, and high availability under external systems. Furthermore, StreamShield proposes a robust testing and deployment pipeline that ensures reliability and robustness in production releases. Extensive evaluations on a production cluster demonstrate the efficiency and effectiveness of techniques proposed by StreamShield.
- [408] arXiv:2602.03190 [pdf, ps, other]
-
Title: Prompt Augmentation Scales up GRPO Training on Mathematical ReasoningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Reinforcement learning algorithms such as group-relative policy optimization (GRPO) have demonstrated strong potential for improving the mathematical reasoning capabilities of large language models. However, prior work has consistently observed an entropy collapse phenomenon during reinforcement post-training, characterized by a monotonic decrease in policy entropy that ultimately leads to training instability and collapse. As a result, most existing approaches restrict training to short horizons (typically 5-20 epochs), limiting sustained exploration and hindering further policy improvement. In addition, nearly all prior work relies on a single, fixed reasoning prompt or template during training. In this work, we introduce prompt augmentation, a training strategy that instructs the model to generate reasoning traces under diverse templates and formats, thereby increasing rollout diversity. We show that, without a KL regularization term, prompt augmentation enables stable scaling of training duration under a fixed dataset and allows the model to tolerate low-entropy regimes without premature collapse. Empirically, a Qwen2.5-Math-1.5B model trained with prompt augmentation on the MATH Level 3-5 dataset achieves state-of-the-art performance, reaching 44.5 per-benchmark accuracy and 51.3 per-question accuracy on standard mathematical reasoning benchmarks, including AIME24, AMC, MATH500, Minerva, and OlympiadBench. The code and model checkpoints are available at https://github.com/wenquanlu/prompt-augmentation-GRPO.
- [409] arXiv:2602.03195 [pdf, ps, other]
-
Title: Reinforcement Learning with Promising Tokens for Large Language ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement learning (RL) has emerged as a key paradigm for aligning and optimizing large language models (LLMs). Standard approaches treat the LLM as the policy and apply RL directly over the full vocabulary space. However, this formulation includes the massive tail of contextually irrelevant tokens in the action space, which could distract the policy from focusing on decision-making among the truly reasonable tokens. In this work, we verify that valid reasoning paths could inherently concentrate within a low-rank subspace. Based on this insight, we introduce Reinforcement Learning with Promising Tokens (RLPT), a framework that mitigates the action space issue by decoupling strategic decision-making from token generation. Specifically, RLPT leverages the semantic priors of the base model to identify a dynamic set of \emph{promising tokens} and constrains policy optimization exclusively to this refined subset via masking. Theoretical analysis and empirical results demonstrate that RLPT effectively reduces gradient variance, stabilizes the training process, and improves sample efficiency. Experiment results on math, coding, and telecom reasoning show that RLPT outperforms standard RL baselines and integrates effectively across various model sizes (4B and 8B) and RL algorithms (GRPO and DAPO).
- [410] arXiv:2602.03197 [pdf, ps, other]
-
Title: Exploring the Role of Tracing in AI-Supported Planning for Algorithmic ReasoningComments: 14 pages, 5 figures, 2 tablesSubjects: Human-Computer Interaction (cs.HC)
AI-powered planning tools show promise in supporting programming learners by enabling early, formative feedback on their thinking processes prior to coding. To date, however, most AI-supported planning tools rely on students' natural-language explanations, using LLMs to interpret learners' descriptions of their algorithmic intent. Prior to the emergence of LLM-based systems, CS education research extensively studied trace-based planning in pen-and-paper settings, demonstrating that reasoning through stepwise execution with explicit state transitions helps learners build and refine mental models of program behavior. Despite its potential, little is known about how tracing interacts with AI-mediated feedback and whether integrating tracing into AI-supported planning tools leads to different learning processes or interaction dynamics compared to natural-language-based planning alone. We study how requiring learners to produce explicit execution traces with an AI-supported planning tool affects their algorithmic reasoning. In a between-subjects study with 20 students, tracing shifted learners away from code-like, line-by-line descriptions toward more goal-driven reasoning about program behavior. Moreover, it led to more consistent partially correct solutions, although final coding performance remained comparable across conditions. Notably, tracing did not significantly affect the quality or reliability of LLM-generated feedback. These findings reveal tradeoffs in combining tracing with AI-supported planning and inform design guidelines for integrating natural language, tracing, and coding to support iterative reasoning throughout the programming process.
- [411] arXiv:2602.03198 [pdf, ps, other]
-
Title: From Single Scan to Sequential Consistency: A New Paradigm for LIDAR RelocalizationComments: NothingSubjects: Computer Vision and Pattern Recognition (cs.CV)
LiDAR relocalization aims to estimate the global 6-DoF pose of a sensor in the environment. However, existing regression-based approaches are prone to dynamic or ambiguous scenarios, as they either solely rely on single-frame inference or neglect the spatio-temporal consistency across scans. In this paper, we propose TempLoc, a new LiDAR relocalization framework that enhances the robustness of localization by effectively modeling sequential consistency. Specifically, a Global Coordinate Estimation module is first introduced to predict point-wise global coordinates and associated uncertainties for each LiDAR scan. A Prior Coordinate Generation module is then presented to estimate inter-frame point correspondences by the attention mechanism. Lastly, an Uncertainty-Guided Coordinate Fusion module is deployed to integrate both predictions of point correspondence in an end-to-end fashion, yielding a more temporally consistent and accurate global 6-DoF pose. Experimental results on the NCLT and Oxford Robot-Car benchmarks show that our TempLoc outperforms stateof-the-art methods by a large margin, demonstrating the effectiveness of temporal-aware correspondence modeling in LiDAR relocalization. Our code will be released soon.
- [412] arXiv:2602.03200 [pdf, ps, other]
-
Title: Hand3R: Online 4D Hand-Scene Reconstruction in the WildSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
For Embodied AI, jointly reconstructing dynamic hands and the dense scene context is crucial for understanding physical interaction. However, most existing methods recover isolated hands in local coordinates, overlooking the surrounding 3D environment. To address this, we present Hand3R, the first online framework for joint 4D hand-scene reconstruction from monocular video. Hand3R synergizes a pre-trained hand expert with a 4D scene foundation model via a scene-aware visual prompting mechanism. By injecting high-fidelity hand priors into a persistent scene memory, our approach enables simultaneous reconstruction of accurate hand meshes and dense metric-scale scene geometry in a single forward pass. Experiments demonstrate that Hand3R bypasses the reliance on offline optimization and delivers competitive performance in both local hand reconstruction and global positioning.
- [413] arXiv:2602.03201 [pdf, ps, other]
-
Title: From Scalar Rewards to Potential Trends: Shaping Potential Landscapes for Model-Based Reinforcement LearningAuthors: Yao-Hui Li, Zeyu Wang, Xin Li, Wei Pang, Yingfang Yuan, Zhengkun Chen, Boya Zhang, Riashat Islam, Alex Lamb, Yonggang ZhangComments: 26 pages, 20 figures.Preprint. Work in progressSubjects: Machine Learning (cs.LG)
Model-based reinforcement learning (MBRL) achieves high sample efficiency by simulating future trajectories with learned dynamics and reward models. However, its effectiveness is severely compromised in sparse reward settings. The core limitation lies in the standard paradigm of regressing ground-truth scalar rewards: in sparse environments, this yields a flat, gradient-free landscape that fails to provide directional guidance for planning. To address this challenge, we propose Shaping Landscapes with Optimistic Potential Estimates (SLOPE), a novel framework that shifts reward modeling from predicting scalars to constructing informative potential landscapes. SLOPE employs optimistic distributional regression to estimate high-confidence upper bounds, which amplifies rare success signals and ensures sufficient exploration gradients. Evaluations on 30+ tasks across 5 benchmarks demonstrate that SLOPE consistently outperforms leading baselines in fully sparse, semi-sparse, and dense rewards.
- [414] arXiv:2602.03203 [pdf, ps, other]
-
Title: ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term ContributionSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Recently, large language models (LLMs) have shown remarkable reasoning abilities by producing long reasoning traces. However, as the sequence length grows, the key-value (KV) cache expands linearly, incurring significant memory and computation costs. Existing KV cache eviction methods mitigate this issue by discarding less important KV pairs, but often fail to capture complex KV dependencies, resulting in performance degradation. To better balance efficiency and performance, we introduce ForesightKV, a training-based KV cache eviction framework that learns to predict which KV pairs to evict during long-text generations. We first design the Golden Eviction algorithm, which identifies the optimal eviction KV pairs at each step using future attention scores. These traces and the scores at each step are then distilled via supervised training with a Pairwise Ranking Loss. Furthermore, we formulate cache eviction as a Markov Decision Process and apply the GRPO algorithm to mitigate the significant language modeling loss increase on low-entropy tokens. Experiments on AIME2024 and AIME2025 benchmarks of three reasoning models demonstrate that ForesightKV consistently outperforms prior methods under only half the cache budget, while benefiting synergistically from both supervised and reinforcement learning approaches.
- [415] arXiv:2602.03204 [pdf, ps, other]
-
Title: Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical GeometrySubjects: Machine Learning (cs.LG)
While Mixture-of-Experts (MoE) architectures define the state-of-the-art, their theoretical success is often attributed to heuristic efficiency rather than geometric expressivity. In this work, we present the first analysis of MoE through the lens of tropical geometry, establishing that the Top-$k$ routing mechanism is algebraically isomorphic to the $k$-th elementary symmetric tropical polynomial. This isomorphism partitions the input space into the Normal Fan of a Hypersimplex, revealing that \textbf{sparsity is combinatorial depth} which scales geometric capacity by the binomial coefficient $\binom{N}{k}$. Moving beyond ambient bounds, we introduce the concept of \textit{Effective Capacity} under the Manifold Hypothesis. We prove that while dense networks suffer from capacity collapse on low-dimensional data, MoE architectures exhibit \textit{Combinatorial Resilience}, maintaining high expressivity via the transversality of routing cones. In this study, our framework unifies the discrete geometry of the Hypersimplex with the continuous geometry of neural functions, offering a rigorous theoretical justification for the topological supremacy of conditional computation.
- [416] arXiv:2602.03205 [pdf, ps, other]
-
Title: HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body ControlSubjects: Robotics (cs.RO)
While current humanoid whole-body control frameworks predominantly rely on the static environment assumptions, addressing tasks characterized by high dynamism and complex interactions presents a formidable challenge. In this paper, we address humanoid skateboarding, a highly challenging task requiring stable dynamic maneuvering on an underactuated wheeled platform. This integrated system is governed by non-holonomic constraints and tightly coupled human-object interactions. Successfully executing this task requires simultaneous mastery of hybrid contact dynamics and robust balance control on a mechanically coupled, dynamically unstable skateboard. To overcome the aforementioned challenges, we propose HUSKY, a learning-based framework that integrates humanoid-skateboard system modeling and physics-aware whole-body control. We first model the coupling relationship between board tilt and truck steering angles, enabling a principled analysis of system dynamics. Building upon this, HUSKY leverages Adversarial Motion Priors (AMP) to learn human-like pushing motions and employs a physics-guided, heading-oriented strategy for lean-to-steer behaviors. Moreover, a trajectory-guided mechanism ensures smooth and stable transitions between pushing and steering. Experimental results on the Unitree G1 humanoid platform demonstrate that our framework enables stable and agile maneuvering on skateboards in real-world scenarios. The project page is available on https://husky-humanoid.github.io/.
- [417] arXiv:2602.03207 [pdf, ps, other]
-
Title: WebSplatter: Enabling Cross-Device Efficient Gaussian Splatting in Web Browsers via WebGPUSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Performance (cs.PF)
We present WebSplatter, an end-to-end GPU rendering pipeline for the heterogeneous web ecosystem. Unlike naive ports, WebSplatter introduces a wait-free hierarchical radix sort that circumvents the lack of global atomics in WebGPU, ensuring deterministic execution across diverse hardware. Furthermore, we propose an opacity-aware geometry culling stage that dynamically prunes splats before rasterization, significantly reducing overdraw and peak memory footprint. Evaluation demonstrates that WebSplatter consistently achieves 1.2$\times$ to 4.5$\times$ speedups over state-of-the-art web viewers.
- [418] arXiv:2602.03208 [pdf, ps, other]
-
Title: Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image GenerationSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Inference-time scaling offers a versatile paradigm for aligning visual generative models with downstream objectives without parameter updates. However, existing approaches that optimize the high-dimensional initial noise suffer from severe inefficiency, as many search directions exert negligible influence on the final generation. We show that this inefficiency is closely related to a spectral bias in generative dynamics: model sensitivity to initial perturbations diminishes rapidly as frequency increases. Building on this insight, we propose Spectral Evolution Search (SES), a plug-and-play framework for initial noise optimization that executes gradient-free evolutionary search within a low-frequency subspace. Theoretically, we derive the Spectral Scaling Prediction from perturbation propagation dynamics, which explains the systematic differences in the impact of perturbations across frequencies. Extensive experiments demonstrate that SES significantly advances the Pareto frontier of generation quality versus computational cost, consistently outperforming strong baselines under equivalent budgets.
- [419] arXiv:2602.03209 [pdf, ps, other]
-
Title: Depth Completion in Unseen Field Robotics Environments Using Extremely Sparse Depth MeasurementsComments: Accepted to ICRA 2026Subjects: Robotics (cs.RO)
Autonomous field robots operating in unstructured environments require robust perception to ensure safe and reliable operations. Recent advances in monocular depth estimation have demonstrated the potential of low-cost cameras as depth sensors; however, their adoption in field robotics remains limited due to the absence of reliable scale cues, ambiguous or low-texture conditions, and the scarcity of large-scale datasets. To address these challenges, we propose a depth completion model that trains on synthetic data and uses extremely sparse measurements from depth sensors to predict dense metric depth in unseen field robotics environments. A synthetic dataset generation pipeline tailored to field robotics enables the creation of multiple realistic datasets for training purposes. This dataset generation approach utilizes textured 3D meshes from Structure from Motion and photorealistic rendering with novel viewpoint synthesis to simulate diverse field robotics scenarios. Our approach achieves an end-to-end latency of 53 ms per frame on a Nvidia Jetson AGX Orin, enabling real-time deployment on embedded platforms. Extensive evaluation demonstrates competitive performance across diverse real-world field robotics scenarios.
- [420] arXiv:2602.03210 [pdf, ps, other]
-
Title: VIRAL: Visual In-Context Reasoning via Analogy in Diffusion TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV)
Replicating In-Context Learning (ICL) in computer vision remains challenging due to task heterogeneity. We propose \textbf{VIRAL}, a framework that elicits visual reasoning from a pre-trained image editing model by formulating ICL as conditional generation via visual analogy ($x_s : x_t :: x_q : y_q$). We adapt a frozen Diffusion Transformer (DiT) using role-aware multi-image conditioning and introduce a Mixture-of-Experts LoRA to mitigate gradient interference across diverse tasks. Additionally, to bridge the gaps in current visual context datasets, we curate a large-scale dataset spanning perception, restoration, and editing. Experiments demonstrate that VIRAL outperforms existing methods, validating that a unified V-ICL paradigm can handle the majority of visual tasks, including open-domain editing. Our code is available at https://anonymous.4open.science/r/VIRAL-744A
- [421] arXiv:2602.03211 [pdf, ps, other]
-
Title: Lookahead Sample Reward Guidance for Test-Time Scaling of Diffusion ModelsComments: Under ReviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Diffusion models have demonstrated strong generative performance; however, generated samples often fail to fully align with human intent. This paper studies a test-time scaling method that enables sampling from regions with higher human-aligned reward values. Existing gradient guidance methods approximate the expected future reward (EFR) at an intermediate particle $\mathbf{x}_t$ using a Taylor approximation, but this approximation at each time step incurs high computational cost due to sequential neural backpropagation. We show that the EFR at any $\mathbf{x}_t$ can be computed using only marginal samples from a pre-trained diffusion model. The proposed EFR formulation detaches the neural dependency between $\mathbf{x}_t$ and the EFR, enabling closed-form guidance computation without neural backpropagation. To further improve efficiency, we introduce lookahead sampling to collect marginal samples. For final sample generation, we use an accurate solver that guides particles toward high-reward lookahead samples. We refer to this sampling scheme as LiDAR sampling. LiDAR achieves substantial performance improvements using only three samples with a 3-step lookahead solver, exhibiting steep performance gains as lookahead accuracy and sample count increase; notably, it reaches the same GenEval performance as the latest gradient guidance method for SDXL with a 9.5x speedup.
- [422] arXiv:2602.03213 [pdf, ps, other]
-
Title: ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance MaskSubjects: Computer Vision and Pattern Recognition (cs.CV)
Autonomous driving relies on robust models trained on large-scale, high-quality multi-view driving videos. Although world models provide a cost-effective solution for generating realistic driving data, they often suffer from identity drift, where the same object changes its appearance or category across frames due to the absence of instance-level temporal constraints. We introduce ConsisDrive, an identity-preserving driving world model designed to enforce temporal consistency at the instance level. Our framework incorporates two key components: (1) Instance-Masked Attention, which applies instance identity masks and trajectory masks within attention blocks to ensure that visual tokens interact only with their corresponding instance features across spatial and temporal dimensions, thereby preserving object identity consistency; and (2) Instance-Masked Loss, which adaptively emphasizes foreground regions with probabilistic instance masking, reducing background noise while maintaining overall scene fidelity. By integrating these mechanisms, ConsisDrive achieves state-of-the-art driving video generation quality and demonstrates significant improvements in downstream autonomous driving tasks on the nuScenes dataset. Our project page is https://shanpoyang654.github.io/ConsisDrive/page.html.
- [423] arXiv:2602.03214 [pdf, ps, other]
-
Title: FARTrack: Fast Autoregressive Visual Tracking with High PerformanceSubjects: Computer Vision and Pattern Recognition (cs.CV)
Inference speed and tracking performance are two critical evaluation metrics in the field of visual tracking. However, high-performance trackers often suffer from slow processing speeds, making them impractical for deployment on resource-constrained devices. To alleviate this issue, we propose FARTrack, a Fast Auto-Regressive Tracking framework. Since autoregression emphasizes the temporal nature of the trajectory sequence, it can maintain high performance while achieving efficient execution across various devices. FARTrack introduces Task-Specific Self-Distillation and Inter-frame Autoregressive Sparsification, designed from the perspectives of shallow-yet-accurate distillation and redundant-to-essential token optimization, respectively. Task-Specific Self-Distillation achieves model compression by distilling task-specific tokens layer by layer, enhancing the model's inference speed while avoiding suboptimal manual teacher-student layer pairs assignments. Meanwhile, Inter-frame Autoregressive Sparsification sequentially condenses multiple templates, avoiding additional runtime overhead while learning a temporally-global optimal sparsification strategy. FARTrack demonstrates outstanding speed and competitive performance. It delivers an AO of 70.6% on GOT-10k in real-time. Beyond, our fastest model achieves a speed of 343 FPS on the GPU and 121 FPS on the CPU.
- [424] arXiv:2602.03216 [pdf, ps, other]
-
Title: Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token SelectionSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
The quadratic complexity of attention remains the central bottleneck in long-context inference for large language models. Prior acceleration methods either sparsify the attention map with structured patterns or permanently evict tokens at specific layers, which can retain irrelevant tokens or rely on irreversible early decisions despite the layer-/head-wise dynamics of token importance. In this paper, we propose Token Sparse Attention, a lightweight and dynamic token-level sparsification mechanism that compresses per-head $Q$, $K$, $V$ to a reduced token set during attention and then decompresses the output back to the original sequence, enabling token information to be reconsidered in subsequent layers. Furthermore, Token Sparse Attention exposes a new design point at the intersection of token selection and sparse attention. Our approach is fully compatible with dense attention implementations, including Flash Attention, and can be seamlessly composed with existing sparse attention kernels. Experimental results show that Token Sparse Attention consistently improves accuracy-latency trade-off, achieving up to $\times$3.23 attention speedup at 128K context with less than 1% accuracy degradation. These results demonstrate that dynamic and interleaved token-level sparsification is a complementary and effective strategy for scalable long-context inference.
- [425] arXiv:2602.03217 [pdf, ps, other]
-
Title: Topology Matters: A Cautionary Case Study of Graph SSL on Neuro-Inspired BenchmarksSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Understanding how local interactions give rise to global brain organization requires models that can represent information across multiple scales. We introduce a hierarchical self-supervised learning (SSL) framework that jointly learns node-, edge-, and graph-level embeddings, inspired by multimodal neuroimaging. We construct a controllable synthetic benchmark mimicking the topological properties of connectomes. Our four-stage evaluation protocol reveals a critical failure: the invariance-based SSL model is fundamentally misaligned with the benchmark's topological properties and is catastrophically outperformed by classical, topology-aware heuristics. Ablations confirm an objective mismatch: SSL objectives designed to be invariant to topological perturbations learn to ignore the very community structure that classical methods exploit. Our results expose a fundamental pitfall in applying generic graph SSL to connectome-like data. We present this framework as a cautionary case study, highlighting the need for new, topology-aware SSL objectives for neuro-AI research that explicitly reward the preservation of structure (e.g., modularity or motifs).
- [426] arXiv:2602.03219 [pdf, ps, other]
-
Title: Beyond Quantity: Trajectory Diversity Scaling for Code AgentsAuthors: Guhong Chen, Chenghao Sun, Cheng Fu, Qiyao Wang, Zhihong Huang, Chaopeng Wei, Guangxu Chen, Feiteng Fang, Ahmadreza Argha, Bing Zhao, Xander Xu, Qi Han, Hamid Alinejad-Rokny, Qiang Qu, Binhua Li, Shiwen Ni, Min Yang, Hu Wei, Yongbin LiSubjects: Artificial Intelligence (cs.AI)
As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling. Moreover, quantity-centric scaling exhibits an early bottleneck that underutilizes trajectory data. We propose TDScaling, a Trajectory Diversity Scaling-based data synthesis framework for code agents that scales performance through diversity rather than raw volume. Under a fixed training budget, increasing trajectory diversity yields larger gains than adding more trajectories, improving the performance-cost trade-off for agent training. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; (3) an adaptive evolution mechanism that steers synthesis toward long-tail scenarios using Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity to prevent mode collapse; and (4) a sandboxed code tool that mitigates catastrophic forgetting of intrinsic coding capabilities. Experiments on general tool-use benchmarks (BFCL, tau^2-Bench) and code agent tasks (RebenchT, CodeCI, BIRD) demonstrate a win-win outcome: TDScaling improves both tool-use generalization and inherent coding proficiency. We plan to release the full codebase and the synthesized dataset (including 30,000+ tool clusters) upon publication.
- [427] arXiv:2602.03220 [pdf, ps, other]
-
Title: PokeFusion Attention: Enhancing Reference-Free Style-Conditioned GenerationAuthors: Jingbang Tang (James)Comments: 7 pages, 5 figures. Under review at IJCNN 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper studies reference-free style-conditioned character generation in text-to-image diffusion models, where high-quality synthesis requires both stable character structure and consistent, fine-grained style expression across diverse prompts. Existing approaches primarily rely on text-only prompting, which is often under-specified for visual style and tends to produce noticeable style drift and geometric inconsistency, or introduce reference-based adapters that depend on external images at inference time, increasing architectural complexity and limiting deployment flexibility.We propose PokeFusion Attention, a lightweight decoder-level cross-attention mechanism that fuses textual semantics with learned style embeddings directly inside the diffusion decoder. By decoupling text and style conditioning at the attention level, our method enables effective reference-free stylized generation while keeping the pretrained diffusion backbone fully frozen.PokeFusion Attention trains only decoder cross-attention layers together with a compact style projection module, resulting in a parameter-efficient and plug-and-play control component that can be easily integrated into existing diffusion pipelines and transferred across different backbones.Experiments on a stylized character generation benchmark (Pokemon-style) demonstrate that our method consistently improves style fidelity, semantic alignment, and character shape consistency compared with representative adapter-based baselines, while maintaining low parameter overhead and inference-time simplicity.
- [428] arXiv:2602.03223 [pdf, ps, other]
-
Title: Distribution-Aware End-to-End Embedding for Streaming Numerical Features in Click-Through Rate PredictionAuthors: Jiahao Liu, Hongji Ruan, Weimin Zhang, Ziye Tong, Derick Tang, Zhanpeng Zeng, Qinsong Zeng, Peng Zhang, Tun Lu, Ning GuComments: Under reviewSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
This paper explores effective numerical feature embedding for Click-Through Rate prediction in streaming environments. Conventional static binning methods rely on offline statistics of numerical distributions; however, this inherently two-stage process often triggers semantic drift during bin boundary updates. While neural embedding methods enable end-to-end learning, they often discard explicit distributional information. Integrating such information end-to-end is challenging because streaming features often violate the i.i.d. assumption, precluding unbiased estimation of the population distribution via the expectation of order statistics. Furthermore, the critical context dependency of numerical distributions is often neglected. To this end, we propose DAES, an end-to-end framework designed to tackle numerical feature embedding in streaming training scenarios by integrating distributional information with an adaptive modulation mechanism. Specifically, we introduce an efficient reservoir-sampling-based distribution estimation method and two field-aware distribution modulation strategies to capture streaming distributions and field-dependent semantics. DAES significantly outperforms existing approaches as demonstrated by extensive offline and online experiments and has been fully deployed on a leading short-video platform with hundreds of millions of daily active users.
- [429] arXiv:2602.03224 [pdf, ps, other]
-
Title: TAME: A Trustworthy Test-Time Evolution of Agent Memory with Systematic BenchmarkingAuthors: Yu Cheng, Jiuan Zhou, Yongkang Hu, Yihang Chen, Huichi Zhou, Mingang Chen, Zhizhong Zhang, Kun Shao, Yuan Xie, Zhaoxia YinSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Test-time evolution of agent memory serves as a pivotal paradigm for achieving AGI by bolstering complex reasoning through experience accumulation. However, even during benign task evolution, agent safety alignment remains vulnerable-a phenomenon known as Agent Memory Misevolution. To evaluate this phenomenon, we construct the Trust-Memevo benchmark to assess multi-dimensional trustworthiness during benign task evolution, revealing an overall decline in trustworthiness across various task domains and evaluation settings. To address this issue, we propose TAME, a dual-memory evolutionary framework that separately evolves executor memory to improve task performance by distilling generalizable methodologies, and evaluator memory to refine assessments of both safety and task utility based on historical feedback. Through a closed loop of memory filtering, draft generation, trustworthy refinement, execution, and dual-track memory updating, TAME preserves trustworthiness without sacrificing utility. Experiments demonstrate that TAME mitigates misevolution, achieving a joint improvement in both trustworthiness and task performance.
- [430] arXiv:2602.03226 [pdf, ps, other]
-
Title: ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Long-context inputs in large language models (LLMs) often suffer from the "lost in the middle" problem, where critical information becomes diluted or ignored due to excessive length. Context compression methods aim to address this by reducing input size, but existing approaches struggle with balancing information preservation and compression efficiency. We propose Adaptive Task-Aware Compressor (ATACompressor), which dynamically adjusts compression based on the specific requirements of the task. ATACompressor employs a selective encoder that compresses only the task-relevant portions of long contexts, ensuring that essential information is preserved while reducing unnecessary content. Its adaptive allocation controller perceives the length of relevant content and adjusts the compression rate accordingly, optimizing resource utilization. We evaluate ATACompressor on three QA datasets: HotpotQA, MSMARCO, and SQUAD-showing that it outperforms existing methods in terms of both compression efficiency and task performance. Our approach provides a scalable solution for long-context processing in LLMs. Furthermore, we perform a range of ablation studies and analysis experiments to gain deeper insights into the key components of ATACompressor.
- [431] arXiv:2602.03227 [pdf, ps, other]
-
Title: Spiral RoPE: Rotate Your Rotary Positional Embeddings in the 2D PlaneAuthors: Haoyu Liu, Sucheng Ren, Tingyu Zhu, Peng Wang, Cihang Xie, Alan Yuille, Zeyu Zheng, Feng WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
Rotary Position Embedding (RoPE) is the de facto positional encoding in large language models due to its ability to encode relative positions and support length extrapolation. When adapted to vision transformers, the standard axial formulation decomposes two-dimensional spatial positions into horizontal and vertical components, implicitly restricting positional encoding to axis-aligned directions. We identify this directional constraint as a fundamental limitation of the standard axial 2D RoPE, which hinders the modeling of oblique spatial relationships that naturally exist in natural images. To overcome this limitation, we propose Spiral RoPE, a simple yet effective extension that enables multi-directional positional encoding by partitioning embedding channels into multiple groups associated with uniformly distributed directions. Each group is rotated according to the projection of the patch position onto its corresponding direction, allowing spatial relationships to be encoded beyond the horizontal and vertical axes. Across a wide range of vision tasks including classification, segmentation, and generation, Spiral RoPE consistently improves performance. Qualitative analysis of attention maps further show that Spiral RoPE exhibits more concentrated activations on semantically relevant objects and better respects local object boundaries, highlighting the importance of multi-directional positional encoding in vision transformers.
- [432] arXiv:2602.03229 [pdf, ps, other]
-
Title: Omnidirectional Solid-State mmWave Radar Perception for UAV Power Line Collision AvoidanceComments: Accepted for publication at the 2026 IEEE International Conference on Robotics and Automation (ICRA). Video at this https URL (youtube)Subjects: Robotics (cs.RO)
Detecting and estimating distances to power lines is a challenge for both human UAV pilots and autonomous systems, which increases the risk of unintended collisions. We present a mmWave radar-based perception system that provides spherical sensing coverage around a small UAV for robust power line detection and avoidance. The system integrates multiple compact solid-state mmWave radar modules to synthesize an omnidirectional field of view while remaining lightweight. We characterize the sensing behavior of this omnidirectional radar arrangement in power line environments and develop a robust detection-and-avoidance algorithm tailored to that behavior. Field experiments on real power lines demonstrate reliable detection at ranges up to 10 m, successful avoidance maneuvers at flight speeds upwards of 10 m/s, and detection of wires as thin as 1.2 mm in diameter. These results indicate the approach's suitability as an additional safety layer for both autonomous and manual UAV flight.
- [433] arXiv:2602.03230 [pdf, ps, other]
-
Title: EventFlash: Towards Efficient MLLMs for Event-Based VisionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Event-based multimodal large language models (MLLMs) enable robust perception in high-speed and low-light scenarios, addressing key limitations of frame-based MLLMs. However, current event-based MLLMs often rely on dense image-like processing paradigms, overlooking the spatiotemporal sparsity of event streams and resulting in high computational cost. In this paper, we propose EventFlash, a novel and efficient MLLM to explore spatiotemporal token sparsification for reducing data redundancy and accelerating inference. Technically, we build EventMind, a large-scale and scene-diverse dataset with over 500k instruction sets, providing both short and long event stream sequences to support our curriculum training strategy. We then present an adaptive temporal window aggregation module for efficient temporal sampling, which adaptively compresses temporal tokens while retaining key temporal cues. Finally, a sparse density-guided attention module is designed to improve spatial token efficiency by selecting informative regions and suppressing empty or sparse areas. Experimental results show that EventFlash achieves a $12.4\times$ throughput improvement over the baseline (EventFlash-Zero) while maintaining comparable performance. It supports long-range event stream processing with up to 1,000 bins, significantly outperforming the 5-bin limit of EventGPT. We believe EventFlash serves as an efficient foundation model for event-based vision.
- [434] arXiv:2602.03232 [pdf, ps, other]
-
Title: BayeSQP: Bayesian Optimization through Sequential Quadratic ProgrammingSubjects: Machine Learning (cs.LG)
We introduce BayeSQP, a novel algorithm for general black-box optimization that merges the structure of sequential quadratic programming with concepts from Bayesian optimization. BayeSQP employs second-order Gaussian process surrogates for both the objective and constraints to jointly model the function values, gradients, and Hessian from only zero-order information. At each iteration, a local subproblem is constructed using the GP posterior estimates and solved to obtain a search direction. Crucially, the formulation of the subproblem explicitly incorporates uncertainty in both the function and derivative estimates, resulting in a tractable second-order cone program for high probability improvements under model uncertainty. A subsequent one-dimensional line search via constrained Thompson sampling selects the next evaluation point. Empirical results show thatBayeSQP outperforms state-of-the-art methods in specific high-dimensional settings. Our algorithm offers a principled and flexible framework that bridges classical optimization techniques with modern approaches to black-box optimization.
- [435] arXiv:2602.03237 [pdf, ps, other]
-
Title: Merging Beyond: Streaming LLM Updates via Activation-Guided RotationsAuthors: Yuxuan Yao, Haonan Sheng, Qingsong Lv, Han Wu, Shuqi Liu, Zehua Liu, Zengyan Liu, Jiahui Gao, Haochen Tan, Xiaojin Fu, Haoli Bai, Hing Cheung So, Zhijiang Guo, Linqi SongSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
The escalating scale of Large Language Models (LLMs) necessitates efficient adaptation techniques. Model merging has gained prominence for its efficiency and controllability. However, existing merging techniques typically serve as post-hoc refinements or focus on mitigating task interference, often failing to capture the dynamic optimization benefits of supervised fine-tuning (SFT). In this work, we propose Streaming Merging, an innovative model updating paradigm that conceptualizes merging as an iterative optimization process. Central to this paradigm is \textbf{ARM} (\textbf{A}ctivation-guided \textbf{R}otation-aware \textbf{M}erging), a strategy designed to approximate gradient descent dynamics. By treating merging coefficients as learning rates and deriving rotation vectors from activation subspaces, ARM effectively steers parameter updates along data-driven trajectories. Unlike conventional linear interpolation, ARM aligns semantic subspaces to preserve the geometric structure of high-dimensional parameter evolution. Remarkably, ARM requires only early SFT checkpoints and, through iterative merging, surpasses the fully converged SFT model. Experimental results across model scales (1.7B to 14B) and diverse domains (e.g., math, code) demonstrate that ARM can transcend converged checkpoints. Extensive experiments show that ARM provides a scalable and lightweight framework for efficient model adaptation.
- [436] arXiv:2602.03238 [pdf, ps, other]
-
Title: The Necessity of a Unified Framework for LLM-Based Agent EvaluationSubjects: Artificial Intelligence (cs.AI)
With the advent of Large Language Models (LLMs), general-purpose agents have seen fundamental advancements. However, evaluating these agents presents unique challenges that distinguish them from static QA benchmarks. We observe that current agent benchmarks are heavily confounded by extraneous factors, including system prompts, toolset configurations, and environmental dynamics. Existing evaluations often rely on fragmented, researcher-specific frameworks where the prompt engineering for reasoning and tool usage varies significantly, making it difficult to attribute performance gains to the model itself. Additionally, the lack of standardized environmental data leads to untraceable errors and non-reproducible results. This lack of standardization introduces substantial unfairness and opacity into the field. We propose that a unified evaluation framework is essential for the rigorous advancement of agent evaluation. To this end, we introduce a proposal aimed at standardizing agent evaluation.
- [437] arXiv:2602.03239 [pdf, ps, other]
-
Title: Deterministic and randomized Kaczmarz methods for $AXB=C$ with applications to color image restorationSubjects: Numerical Analysis (math.NA)
We study Kaczmarz type methods to solve consistent linear matrix equations. We first present a block Kaczmarz (BK) method that employs a deterministic cyclic row selection strategy. Assuming that the associated coefficient matrix has full column or row rank, we derive matrix formulas for a cycle of this BK method. Moreover, we propose a greedy randomized block Kaczmarz (GRBK) method and further extend it to a relaxed variant (RGRBK) and a deterministic counterpart (MWRBK). We establish the convergence properties of the proposed methods. Numerical tests verify the theoretical findings, and we apply the proposed methods to color image restoration problems.
- [438] arXiv:2602.03242 [pdf, ps, other]
-
Title: InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Autonomous driving relies on robust models trained on high-quality, large-scale multi-view driving videos. While world models offer a cost-effective solution for generating realistic driving videos, they struggle to maintain instance-level temporal consistency and spatial geometric fidelity. To address these challenges, we propose InstaDrive, a novel framework that enhances driving video realism through two key advancements: (1) Instance Flow Guider, which extracts and propagates instance features across frames to enforce temporal consistency, preserving instance identity over time. (2) Spatial Geometric Aligner, which improves spatial reasoning, ensures precise instance positioning, and explicitly models occlusion hierarchies. By incorporating these instance-aware mechanisms, InstaDrive achieves state-of-the-art video generation quality and enhances downstream autonomous driving tasks on the nuScenes dataset. Additionally, we utilize CARLA's autopilot to procedurally and stochastically simulate rare but safety-critical driving scenarios across diverse maps and regions, enabling rigorous safety evaluation for autonomous systems. Our project page is https://shanpoyang654.github.io/InstaDrive/page.html.
- [439] arXiv:2602.03246 [pdf, ps, other]
-
Title: Joint Network-and-Server Congestion in Multi-Source Traffic Allocation: A Convex Formulation and Price-Based DecentralizationComments: 10pages, 7 figures, submitted a version conferenceSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper studies an important rate allocation problem that arises in many networked and distributed systems: steady-state traffic rate allocation from multiple sources to multiple service nodes when both (i) the access-path delay on each source-node route is rate-dependent (capacity-constrained) and convex, and (ii) each service node (also capacity-constrained) experiences a load-dependent queueing delay driven by aggregate load from all sources. We show that the resulting flow-weighted end-to-end delay minimization is a convex program, yielding a global system-optimal solution characterized by KKT conditions that equalize total marginal costs (a path marginal access term plus a node congestion price) across all utilized routes. This condition admits a Wardrop-type interpretation: for each source, all utilized options equalize total marginal cost, while any option with strictly larger total marginal cost receives no flow. Building on this structure, we develop a lightweight distributed pricing-based algorithm in which each service node locally computes and broadcasts a scalar congestion price from its observed aggregate load, while each source updates its traffic split by solving a small separable convex allocation problem under the advertised prices. Numerical illustrations demonstrate convergence of the distributed iteration to the centralized optimum and highlight the trade-offs induced by jointly modeling access and service congestion.
- [440] arXiv:2602.03247 [pdf, ps, other]
-
Title: Physics informed learning of orthogonal features with applications in solving partial differential equationsSubjects: Numerical Analysis (math.NA)
The random feature method (RFM) constructs approximation spaces by initializing features from generic distributions, which provides universal approximation properties to solve general partial differential equations. However, such standard initializations lack awareness of the underlying physical laws and geometry, which limits approximation. In this work, we propose the Physics-Driven Orthogonal Feature Method (PD-OFM), a framework for constructing feature representations that are explicitly tailored to both the differential operator and the computational domain by pretraining features using physics-informed objectives together with orthogonality regularization. This pretraining strategy yields nearly orthogonal feature bases. We provide both theoretical and empirical evidence that physics-informed pretraining improves the approximation capability of the learned feature space. When employed to solve Helmholtz, Poisson, wave, and Navier-Stokes equations, the proposed method achieves residual errors 2-3 orders of magnitude lower than those of comparable methods. Furthermore, the orthogonality regularization improves transferability, enabling pretrained features to generalize effectively across different source terms and domain geometries for the same PDE.
- [441] arXiv:2602.03248 [pdf, ps, other]
-
Title: A thin and soft optical tactile sensor for highly sensitive object perceptionAuthors: Yanchen Shen, Kohei Tsuji, Haruto Koizumi, Jiseon Hong, Tomoaki Niiyama, Hiroyuki Kuwabara, Hayato Ishida, Jun Hiramitsu, Mitsuhito Mase, Satoshi SunadaSubjects: Robotics (cs.RO); Applied Physics (physics.app-ph); Optics (physics.optics)
Tactile sensing is crucial in robotics and wearable devices for safe perception and interaction with the environment. Optical tactile sensors have emerged as promising solutions, as they are immune to electromagnetic interference and have high spatial resolution. However, existing optical approaches, particularly vision-based tactile sensors, rely on complex optical assemblies that involve lenses and cameras, resulting in bulky, rigid, and alignment-sensitive designs. In this study, we present a thin, compact, and soft optical tactile sensor featuring an alignment-free configuration. The soft optical sensor operates by capturing deformation-induced changes in speckle patterns generated within a soft silicone material, thereby enabling precise force measurements and texture recognition via machine learning. The experimental results show a root-mean-square error of 40 mN in the force measurement and a classification accuracy of 93.33% over nine classes of textured surfaces, including Mahjong tiles. The proposed speckle-based approach provides a compact, easily fabricated, and mechanically compliant platform that bridges optical sensing with flexible shape-adaptive architectures, thereby demonstrating its potential as a novel tactile-sensing paradigm for soft robotics and wearable haptic interfaces.
- [442] arXiv:2602.03249 [pdf, ps, other]
-
Title: Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM ReasoningAuthors: Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Wenlei Shi, Yiwei Wang, Xiaodan Liang, Jing TangSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Scaling test-time compute via long Chain-ofThought unlocks remarkable gains in reasoning capabilities, yet it faces practical limits due to the linear growth of KV cache and quadratic attention complexity. In this paper, we introduce Accordion-Thinking, an end-to-end framework where LLMs learn to self-regulate the granularity of the reasoning steps through dynamic summarization. This mechanism enables a Fold inference mode, where the model periodically summarizes its thought process and discards former thoughts to reduce dependency on historical tokens. We apply reinforcement learning to incentivize this capability further, uncovering a critical insight: the accuracy gap between the highly efficient Fold mode and the exhaustive Unfold mode progressively narrows and eventually vanishes over the course of training. This phenomenon demonstrates that the model learns to encode essential reasoning information into compact summaries, achieving effective compression of the reasoning context. Our Accordion-Thinker demonstrates that with learned self-compression, LLMs can tackle complex reasoning tasks with minimal dependency token overhead without compromising solution quality, and it achieves a 3x throughput while maintaining accuracy on a 48GB GPU memory configuration, while the structured step summaries provide a human-readable account of the reasoning process.
- [443] arXiv:2602.03250 [pdf, ps, other]
-
Title: Collision Detection with Analytical Derivatives of Contact KinematicsComments: 12 pages, 9 figures, 2 tablesSubjects: Robotics (cs.RO)
Differentiable contact kinematics are essential for gradient-based methods in robotics, yet the mapping from robot state to contact distance, location, and normal becomes non-smooth in degenerate configurations of shapes with zero or undefined curvature. We address this inherent limitation by selectively regularizing such geometries into strictly convex implicit representations, restoring uniqueness and smoothness of the contact map. Leveraging this geometric regularization, we develop iDCOL, an implicit differentiable collision detection and contact kinematics framework. iDCOL represents colliding bodies using strictly convex implicit surfaces and computes collision detection and contact kinematics by solving a fixed-size nonlinear system derived from a geometric scaling-based convex optimization formulation. By applying the Implicit Function Theorem to the resulting system residual, we derive analytical derivatives of the contact kinematic quantities. We develop a fast Newton-based solver for iDCOL and provide an open-source C++ implementation of the framework. The robustness of the approach is evaluated through extensive collision simulations and benchmarking, and applicability is demonstrated in gradient-based kinematic path planning and differentiable contact physics, including multi-body rigid collisions and a soft-robot interaction example.
- [444] arXiv:2602.03253 [pdf, ps, other]
-
Title: LaVPR: Benchmarking Language and Vision for Place RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Visual Place Recognition (VPR) often fails under extreme environmental changes and perceptual aliasing. Furthermore, standard systems cannot perform "blind" localization from verbal descriptions alone, a capability needed for applications such as emergency response. To address these challenges, we introduce LaVPR, a large-scale benchmark that extends existing VPR datasets with over 650,000 rich natural-language descriptions. Using LaVPR, we investigate two paradigms: Multi-Modal Fusion for enhanced robustness and Cross-Modal Retrieval for language-based localization. Our results show that language descriptions yield consistent gains in visually degraded conditions, with the most significant impact on smaller backbones. Notably, adding language allows compact models to rival the performance of much larger vision-only architectures. For cross-modal retrieval, we establish a baseline using Low-Rank Adaptation (LoRA) and Multi-Similarity loss, which substantially outperforms standard contrastive methods across vision-language models. Ultimately, LaVPR enables a new class of localization systems that are both resilient to real-world stochasticity and practical for resource-constrained deployment. Our dataset and code are available at https://github.com/oferidan1/LaVPR.
- [445] arXiv:2602.03255 [pdf, ps, other]
-
Title: LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial ScenariosSubjects: Artificial Intelligence (cs.AI)
Computer-use agents (CUAs) that interact with real computer systems can perform automated tasks but face critical safety risks. Ambiguous instructions may trigger harmful actions, and adversarial users can manipulate tool execution to achieve malicious goals. Existing benchmarks mostly focus on short-horizon or GUI-based tasks, evaluating on execution-time errors but overlooking the ability to anticipate planning-time risks. To fill this gap, we present LPS-Bench, a benchmark that evaluates the planning-time safety awareness of MCP-based CUAs under long-horizon tasks, covering both benign and adversarial interactions across 65 scenarios of 7 task domains and 9 risk types. We introduce a multi-agent automated pipeline for scalable data generation and adopt an LLM-as-a-judge evaluation protocol to assess safety awareness through the planning trajectory. Experiments reveal substantial deficiencies in existing CUAs' ability to maintain safe behavior. We further analyze the risks and propose mitigation strategies to improve long-horizon planning safety in MCP-based CUA systems. We open-source our code at https://github.com/tychenn/LPS-Bench.
- [446] arXiv:2602.03256 [pdf, ps, other]
-
Title: Impact of Physics-Informed Features on Neural Network Complexity for Li-ion Battery Voltage Prediction in Electric Vertical Takeoff and Landing AircraftsSubjects: Systems and Control (eess.SY)
The electrification of vertical takeoff and landing aircraft demands high-fidelity battery management systems capable of predicting voltage response under aggressive power dynamics. While data-driven models offer high accuracy, they often require complex architectures and extensive training data. Conversely, equivalent circuit models (ECMs), such as the second-order model, offer physical interpretability but struggle with high C-rate non-linearities. This paper investigates the impact of integrating physics-based information into data-driven surrogate models. Specifically, we evaluate whether physics-informed features allow for the simplification of neural network architectures without compromising accuracy. Using the open-source electric vertical takeoff and landing (eVTOL) battery dataset, we compare pure data-driven models against physics-informed data models. Results demonstrate that physics-informed models achieve comparable accuracy to complex pure data-driven models while using up to 75% fewer trainable parameters, significantly reducing computational overhead for potential on-board deployment.
- [447] arXiv:2602.03257 [pdf, ps, other]
-
Title: GraDE: A Graph Diffusion Estimator for Frequent Subgraph Discovery in Neural ArchitecturesAuthors: Yikang Yang, Zhengxin Yang, Minghao Luo, Luzhou Peng, Hongxiao Li, Wanling Gao, Lei Wang, Jianfeng ZhanSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Finding frequently occurring subgraph patterns or network motifs in neural architectures is crucial for optimizing efficiency, accelerating design, and uncovering structural insights. However, as the subgraph size increases, enumeration-based methods are perfectly accurate but computationally prohibitive, while sampling-based methods are computationally tractable but suffer from a severe decline in discovery capability. To address these challenges, this paper proposes GraDE, a diffusion-guided search framework that ensures both computational feasibility and discovery capability. The key innovation is the Graph Diffusion Estimator (GraDE), which is the first to introduce graph diffusion models to identify frequent subgraphs by scoring their typicality within the learned distribution. Comprehensive experiments demonstrate that the estimator achieves superior ranking accuracy, with up to 114\% improvement compared to sampling-based baselines. Benefiting from this, the proposed framework successfully discovers large-scale frequent patterns, achieving up to 30$\times$ higher median frequency than sampling-based methods.
- [448] arXiv:2602.03262 [pdf, ps, other]
-
Title: Towards Context-Aware Edge-Cloud Continuum Orchestration for Multi-user XR ServicesSubjects: Networking and Internet Architecture (cs.NI)
The rapid growth of multi-user eXtended Reality (XR) applications, spanning fields such as entertainment, education, and telemedicine, demands seamless, immersive experiences for users interacting within shared, distributed environments. Delivering such latency-sensitive experiences involves considerable challenges in orchestrating network, computing, and service resources, where existing limitations highlight the need for a structured approach to analyse and optimise these complex systems. This challenge is amplified by the need for high-performance, low-latency connectivity, where 5G and 6G networks provide essential infrastructure to meet the requirements of XR services at scale. This article addresses these challenges by developing a model that parametrises multi-user XR services across four critical layers of the standard virtualisation architecture. We formalise this model mathematically, proposing a context-aware framework that defines key parameters at each level and integrates them into a comprehensive Edge-Cloud Continuum orchestration strategy. Our contributions include a detailed analysis of the current limitations and needs in existing Edge-Cloud Continuum orchestration approaches, the formulation of a layered mathematical model, and a validation framework that demonstrates the utility and feasibility of the proposed solution.
- [449] arXiv:2602.03263 [pdf, ps, other]
-
Title: CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMsComments: 25 pages, 1 figuresSubjects: Artificial Intelligence (cs.AI)
Multimodal large language models (MLLMs) enable interaction over both text and images, but their safety behavior can be driven by unimodal shortcuts instead of true joint intent understanding. We introduce CSR-Bench, a benchmark for evaluating cross-modal reliability through four stress-testing interaction patterns spanning Safety, Over-rejection, Bias, and Hallucination, covering 61 fine-grained types. Each instance is constructed to require integrated image-text interpretation, and we additionally provide paired text-only controls to diagnose modality-induced behavior shifts. We evaluate 16 state-of-the-art MLLMs and observe systematic cross-modal alignment gaps. Models show weak safety awareness, strong language dominance under interference, and consistent performance degradation from text-only controls to multimodal inputs. We also observe a clear trade-off between reducing over-rejection and maintaining safe, non-discriminatory behavior, suggesting that some apparent safety gains may come from refusal-oriented heuristics rather than robust intent understanding. WARNING: This paper contains unsafe contents.
- [450] arXiv:2602.03264 [pdf, ps, other]
-
Title: HypCBC: Domain-Invariant Hyperbolic Cross-Branch Consistency for Generalizable Medical Image AnalysisComments: Accepted to Transactions on Machine Learning Research (TMLR)Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Robust generalization beyond training distributions remains a critical challenge for deep neural networks. This is especially pronounced in medical image analysis, where data is often scarce and covariate shifts arise from different hardware devices, imaging protocols, and heterogeneous patient populations. These factors collectively hinder reliable performance and slow down clinical adoption. Despite recent progress, existing learning paradigms primarily rely on the Euclidean manifold, whose flat geometry fails to capture the complex, hierarchical structures present in clinical data. In this work, we exploit the advantages of hyperbolic manifolds to model complex data characteristics. We present the first comprehensive validation of hyperbolic representation learning for medical image analysis and demonstrate statistically significant gains across eleven in-distribution datasets and three ViT models. We further propose an unsupervised, domain-invariant hyperbolic cross-branch consistency constraint. Extensive experiments confirm that our proposed method promotes domain-invariant features and outperforms state-of-the-art Euclidean methods by an average of $+2.1\%$ AUC on three domain generalization benchmarks: Fitzpatrick17k, Camelyon17-WILDS, and a cross-dataset setup for retinal imaging. These datasets span different imaging modalities, data sizes, and label granularities, confirming generalization capabilities across substantially different conditions. The code is available at https://github.com/francescodisalvo05/hyperbolic-cross-branch-consistency .
- [451] arXiv:2602.03265 [pdf, ps, other]
-
Title: Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language ModelsComments: 12 pages, 10 figuresSubjects: Machine Learning (cs.LG)
Large Language Models (LLMs) have seen widespread adoption across multiple domains, creating an urgent need for robust safety alignment mechanisms. However, robustness remains challenging due to jailbreak attacks that bypass alignment via adversarial prompts. In this work, we focus on the prevalent Greedy Coordinate Gradient (GCG) attack and identify a previously underexplored attack axis in jailbreak attacks typically framed as suffix-based: the placement of adversarial tokens within the prompt. Using GCG as a case study, we show that both optimizing attacks to generate prefixes instead of suffixes and varying adversarial token position during evaluation substantially influence attack success rates. Our findings highlight a critical blind spot in current safety evaluations and underline the need to account for the position of adversarial tokens in the adversarial robustness evaluation of LLMs.
- [452] arXiv:2602.03266 [pdf, ps, other]
-
Title: Link Fraction Mixed Membership Reveals Community Diversity in Aggregated Social NetworksComments: 21 pages, 6 figuresSubjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Community detection is a critical tool for understanding the mesoscopic structure of large-scale networks. However, when applied to aggregated or coarse-grained social networks, disjoint community partitions cannot capture the diverse composition of community memberships within aggregated nodes. While existing mixed membership methods alleviate this issue, they may detected communities that are highly sensitive to the aggregation resolution, not reliably reflecting the underlying community structure of the underlying individual-level network. This paper presents the Link Fraction Mixed Membership (LFMM) method, which computes the mixed memberships of nodes in aggregated networks. Unlike existing mixed membership methods, LFMM is consistent under aggregation. Specifically, we show that it conserves community membership sums at different scales. The method is utilized to study a population-scale social network of the Netherlands, aggregated at different resolutions. Experiments reveal variation in community membership across different geographical regions and evolution over the last decade. In particular, we show how our method identifies large urban hubs that act as the melting pots of diverse, spatially remote communities.
- [453] arXiv:2602.03268 [pdf, ps, other]
-
Title: Unveiling Covert Toxicity in Multimodal Data via Toxicity Association Graphs: A Graph-Based Metric and Interpretable Detection FrameworkSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Detecting toxicity in multimodal data remains a significant challenge, as harmful meanings often lurk beneath seemingly benign individual modalities: only emerging when modalities are combined and semantic associations are activated. To address this, we propose a novel detection framework based on Toxicity Association Graphs (TAGs), which systematically model semantic associations between innocuous entities and latent toxic implications. Leveraging TAGs, we introduce the first quantifiable metric for hidden toxicity, the Multimodal Toxicity Covertness (MTC), which measures the degree of concealment in toxic multimodal expressions. By integrating our detection framework with the MTC metric, our approach enables precise identification of covert toxicity while preserving full interpretability of the decision-making process, significantly enhancing transparency in multimodal toxicity detection. To validate our method, we construct the Covert Toxic Dataset, the first benchmark specifically designed to capture high-covertness toxic multimodal instances. This dataset encodes nuanced cross-modal associations and serves as a rigorous testbed for evaluating both the proposed metric and detection framework. Extensive experiments demonstrate that our approach outperforms existing methods across both low- and high-covertness toxicity regimes, while delivering clear, interpretable, and auditable detection outcomes. Together, our contributions advance the state of the art in explainable multimodal toxicity detection and lay the foundation for future context-aware and interpretable approaches. Content Warning: This paper contains examples of toxic multimodal content that may be offensive or disturbing to some readers. Reader discretion is advised.
- [454] arXiv:2602.03271 [pdf, ps, other]
-
Title: LogicScan: An LLM-driven Framework for Detecting Business Logic Vulnerabilities in Smart ContractsSubjects: Cryptography and Security (cs.CR)
Business logic vulnerabilities have become one of the most damaging yet least understood classes of smart contract vulnerabilities. Unlike traditional bugs such as reentrancy or arithmetic errors, these vulnerabilities arise from missing or incorrectly enforced business invariants and are tightly coupled with protocol semantics. Existing static analysis techniques struggle to capture such high-level logic, while recent large language model based approaches often suffer from unstable outputs and low accuracy due to hallucination and limited verification.
In this paper, we propose LogicScan, an automated contrastive auditing framework for detecting business logic vulnerabilities in smart contracts. The key insight behind LogicScan is that mature, widely deployed on-chain protocols implicitly encode well-tested and consensus-driven business invariants. LogicScan systematically mines these invariants from large-scale on-chain contracts and reuses them as reference constraints to audit target contracts. To achieve this, LogicScan introduces a Business Specification Language (BSL) to normalize diverse implementation patterns into structured, verifiable logic representations. It further combines noise-aware logic aggregation with contrastive auditing to identify missing or weakly enforced invariants while mitigating LLM-induced false positives.
We evaluate LogicScan on three real-world datasets, including DeFiHacks, Web3Bugs, and a set of top-200 audited contracts. The results show that LogicScan achieves an F1 score of 85.2%, significantly outperforming state-of-the-art tools while maintaining a low false-positive rate on production-grade contracts. Additional experiments demonstrate that LogicScan maintains consistent performance across different LLMs and is cost-effective, and that its false-positive suppression mechanisms substantially improve robustness. - [455] arXiv:2602.03272 [pdf, ps, other]
-
Title: Power Reserve Procurement Considering Dependent Random Variables with PCESubjects: Systems and Control (eess.SY)
This paper presents an approach for the modelling of dependent random variables using generalised polynomial chaos. This allows to write chance-constrained optimization problems with respect to a joint distribution modelling dependencies between different stochastic inputs. Arbitrary dependencies are modelled by using Gaussian copulas to construct the joint distribution. The paper exploits the problem structure and develops suitable transformations to ensure tractability. The proposed method is applied to a probabilistic power reserve procurement problem. The effectiveness of the method to capture dependencies is shown by comparing the approach with a standard approach considering independent random variables.
- [456] arXiv:2602.03275 [pdf, ps, other]
-
Title: On Complete Categorical Semantics for Effect HandlersAuthors: Satoshi KuraSubjects: Logic in Computer Science (cs.LO)
Soundness and completeness with respect to equational theories for programming languages are fundamental properties in the study of categorical semantics. However, completeness results have not been established for programming languages with algebraic effects and handlers, which raises a question of whether the commonly used models in the literature, i.e., free model monads generated from algebraic theories, are the only valid semantic models for effect handlers. In this paper, we show that this is not the case. We identify the precise characterizations of categorical models of effect handlers that allow us to establish soundness and completeness results with respect to a certain equational theory for effect handling constructs. Notably, this allows us to capture not only free monad models but also the CPS semantics for effect handlers as models of the calculus.
- [457] arXiv:2602.03277 [pdf, ps, other]
-
Title: BlockRR: A Unified Framework of RR-type Algorithms for Label Differential PrivacyComments: 19 pages, 2 figuresSubjects: Machine Learning (cs.LG)
In this paper, we introduce BlockRR, a novel and unified randomized-response mechanism for label differential privacy. This framework generalizes existed RR-type mechanisms as special cases under specific parameter settings, which eliminates the need for separate, case-by-case analysis. Theoretically, we prove that BlockRR satisfies $\epsilon$-label DP. We also design a partition method for BlockRR based on a weight matrix derived from label prior information; the parallel composition principle ensures that the composition of two such mechanisms remains $\epsilon$-label DP. Empirically, we evaluate BlockRR on two variants of CIFAR-10 with varying degrees of class imbalance. Results show that in the high-privacy and moderate-privacy regimes ($\epsilon \leq 3.0$), our propsed method gets a better balance between test accuaracy and the average of per-class accuracy. In the low-privacy regime ($\epsilon \geq 4.0$), all methods reduce BlockRR to standard RR without additional performance loss.
- [458] arXiv:2602.03278 [pdf, ps, other]
-
Title: A Pipeline for ADNI Resting-State Functional MRI Processing and Quality ControlAuthors: Saige Rutherford, Zeshawn Zahid, Robert C. Welsh, Andrea Avena-Koenigsberger, Vincent Koppelmans, Amanda F. MejiaSubjects: Databases (cs.DB)
The Alzheimer's Disease Neuroimaging Initiative (ADNI) provides a comprehensive multimodal neuroimaging resource for studying aging and Alzheimer's disease (AD). Since its second wave, ADNI has increasingly collected resting-state functional MRI (rs-fMRI), a valuable resource for discovering brain connectivity changes predictive of cognitive decline and AD. A major barrier to its use is the considerable variability in acquisition protocols and data quality, compounded by missing imaging sessions and inconsistencies in how functional scans temporally align with clinical assessments. As a result, many studies only utilize a small subset of the total rs-fMRI data, limiting statistical power, reproducibility, and the ability to study longitudinal functional brain changes at scale. Here, we describe a pipeline for ADNI rs-fMRI data that encompasses the download of necessary imaging and clinical data, temporally aligning the clinical and imaging data, preprocessing, and quality control. We integrate data curation and preprocessing across all ADNI sites and scanner types using a combination of open-source software (Clinica, fMRIPrep, and MRIQC) and bespoke tools. Quality metrics and reports are generated for each subject and session to facilitate rigorous data screening. All scripts and configuration files are available to enable reproducibility. The pipeline, which currently supports ADNI-GO, ADNI-2, and ADNI-3 data releases, outputs high-quality rs-fMRI time series data adhering to the BIDS-derivatives specification. This protocol provides a transparent and scalable framework for curating and utilizing ADNI fMRI data, empowering large-scale functional biomarker discovery and integrative multimodal analyses in Alzheimer's disease research.
- [459] arXiv:2602.03279 [pdf, ps, other]
-
Title: Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill SynthesisAuthors: Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Xuan Ren, Wei Wang, Bing Zhao, Hu Wei, Linfeng ZhangComments: 23page4Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Advancing complex reasoning in large language models relies on high-quality, verifiable datasets, yet human annotation remains cost-prohibitive and difficult to scale. Current synthesis paradigms often face a recurring trade-off: maintaining structural validity typically restricts problem complexity, while relaxing constraints to increase difficulty frequently leads to inconsistent or unsolvable instances. To address this, we propose Agentic Proposing, a framework that models problem synthesis as a goal-driven sequential decision process where a specialized agent dynamically selects and composes modular reasoning skills. Through an iterative workflow of internal reflection and tool-use, we develop the Agentic-Proposer-4B using Multi-Granularity Policy Optimization (MGPO) to generate high-precision, verifiable training trajectories across mathematics, coding, and science. Empirical results demonstrate that downstream solvers trained on agent-synthesized data significantly outperform leading baselines and exhibit robust cross-domain generalization. Notably, a 30B solver trained on only 11,000 synthesized trajectories achieves a state-of-the-art 91.6% accuracy on AIME25, rivaling frontier-scale proprietary models such as GPT-5 and proving that a small volume of high-quality synthetic signals can effectively substitute for massive human-curated datasets.
- [460] arXiv:2602.03282 [pdf, ps, other]
-
Title: Global Geometry Is Not Enough for Vision RepresentationsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
A common assumption in representation learning is that globally well-distributed embeddings support robust and generalizable representations. This focus has shaped both training objectives and evaluation protocols, implicitly treating global geometry as a proxy for representational competence. While global geometry effectively encodes which elements are present, it is often insensitive to how they are composed. We investigate this limitation by testing the ability of geometric metrics to predict compositional binding across 21 vision encoders. We find that standard geometry-based statistics exhibit near-zero correlation with compositional binding. In contrast, functional sensitivity, as measured by the input-output Jacobian, reliably tracks this capability. We further provide an analytic account showing that this disparity arises from objective design, as existing losses explicitly constrain embedding geometry but leave the local input-output mapping unconstrained. These results suggest that global embedding geometry captures only a partial view of representational competence and establish functional sensitivity as a critical complementary axis for modeling composite structure.
- [461] arXiv:2602.03284 [pdf, ps, other]
-
Title: Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural NetworksAuthors: Yi Yu, Qixin Zhang, Shuhan Ye, Xun Lin, Qianshan Wei, Kun Wang, Wenhan Yang, Dacheng Tao, Xudong JiangComments: Accepted by ICLR 2026Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Spiking neural networks (SNNs) compute with discrete spikes and exploit temporal structure, yet most adversarial attacks change intensities or event counts instead of timing. We study a timing-only adversary that retimes existing spikes while preserving spike counts and amplitudes in event-driven SNNs, thus remaining rate-preserving. We formalize a capacity-1 spike-retiming threat model with a unified trio of budgets: per-spike jitter $\mathcal{B}_{\infty}$, total delay $\mathcal{B}_{1}$, and tamper count $\mathcal{B}_{0}$. Feasible adversarial examples must satisfy timeline consistency and non-overlap, which makes the search space discrete and constrained. To optimize such retimings at scale, we use projected-in-the-loop (PIL) optimization: shift-probability logits yield a differentiable soft retiming for backpropagation, and a strict projection in the forward pass produces a feasible discrete schedule that satisfies capacity-1, non-overlap, and the chosen budget at every step. The objective maximizes task loss on the projected input and adds a capacity regularizer together with budget-aware penalties, which stabilizes gradients and aligns optimization with evaluation. Across event-driven benchmarks (CIFAR10-DVS, DVS-Gesture, N-MNIST) and diverse SNN architectures, we evaluate under binary and integer event grids and a range of retiming budgets, and also test models trained with timing-aware adversarial training designed to counter timing-only attacks. For example, on DVS-Gesture the attack attains high success (over $90\%$) while touching fewer than $2\%$ of spikes under $\mathcal{B}_{0}$. Taken together, our results show that spike retiming is a practical and stealthy attack surface that current defenses struggle to counter, providing a clear reference for temporal robustness in event-driven SNNs. Code is available at https://github.com/yuyi-sd/Spike-Retiming-Attacks.
- [462] arXiv:2602.03285 [pdf, ps, other]
-
Title: MeetBench-XL: Calibrated Multi-Dimensional Evaluation and Learned Dual-Policy Agents for Real-Time MeetingsComments: accepted by AAAI2026 wsSubjects: Artificial Intelligence (cs.AI)
Enterprise meeting environments require AI assistants that handle diverse operational tasks, from rapid fact checking during live discussions to cross meeting analysis for strategic planning, under strict latency, cost, and privacy constraints. Existing meeting benchmarks mainly focus on simplified question answering and fail to reflect real world enterprise workflows, where queries arise organically from multi stakeholder collaboration, span long temporal contexts, and require tool augmented reasoning.
We address this gap through a grounded dataset and a learned agent framework. First, we introduce MeetAll, a bilingual and multimodal corpus derived from 231 enterprise meetings totaling 140 hours. Questions are injected using an enterprise informed protocol validated by domain expert review and human discriminability studies. Unlike purely synthetic benchmarks, this protocol is grounded in four enterprise critical dimensions: cognitive load, temporal context span, domain expertise, and actionable task execution, calibrated through interviews with stakeholders across finance, healthcare, and technology sectors.
Second, we propose MeetBench XL, a multi dimensional evaluation protocol aligned with human judgment that measures factual fidelity, intent alignment, response efficiency, structural clarity, and completeness. Third, we present MeetMaster XL, a learned dual policy agent that jointly optimizes query routing between fast and slow reasoning paths and tool invocation, including retrieval, cross meeting aggregation, and web search. A lightweight classifier enables accurate routing with minimal overhead, achieving a superior quality latency tradeoff over single model baselines. Experiments against commercial systems show consistent gains, supported by ablations, robustness tests, and a real world deployment case study.Resources: https://github.com/huyuelin/MeetBench. - [463] arXiv:2602.03286 [pdf, ps, other]
-
Title: Rejecting Arguments Based on Doubt in Structured Bipolar ArgumentationComments: Accepted to AAMAS 2026Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
This paper develops a new approach to computational argumentation that is informed by philosophical and linguistic views. Namely, it takes into account two ideas that have received little attention in the literature on computational argumentation: First, an agent may rationally reject an argument based on mere doubt, thus not all arguments they could defend must be accepted; and, second, that it is sometimes more natural to think in terms of which individual sentences or claims an agent accepts in a debate, rather than which arguments. In order to incorporate these two ideas into a computational approach, we first define the notion of structured bipolar argumentation frameworks (SBAFs), where arguments consist of sentences and we have both an attack and a support relation between them. Then, we provide semantics for SBAFs with two features: (1) Unlike with completeness-based semantics, our semantics do not force agents to accept all defended arguments. (2) In addition to argument extensions, which give acceptable sets of arguments, we also provide semantics for language extensions that specify acceptable sets of sentences. These semantics represent reasonable positions an agent might have in a debate. Our semantics lie between the admissible and complete semantics of abstract argumentation. Further, our approach can be used to provide a new perspective on existing approaches. For instance, we can specify the conditions under which an agent can ignore support between arguments (i.e. under which the use of abstract argumentation is warranted) and we show that deductive support semantics is a special case of our approach.
- [464] arXiv:2602.03288 [pdf, ps, other]
-
Title: An Algorithm for Monitoring Edge-geodetic Sets in Chordal GraphsSubjects: Discrete Mathematics (cs.DM); Computational Complexity (cs.CC)
A monitoring edge-geodetic set (or meg-set for short) of a graph is a set of vertices $M$ such that if any edge is removed, then the distance between some two vertices of $M$ increases. This notion was introduced by Foucaud et al. in 2023 as a way to monitor networks for communication failures. As computing a minimal meg-set is hard in general, recent works aimed to find polynomial-time algorithms to compute minimal meg-sets when the input belongs to a restricted class of graphs. Most of these results are based on the property of some classes of graphs to admit a unique minimum meg-set, which is then easy to compute. In this work, we prove that chordal graphs also admit a unique minimal meg-set, answering a standing open question of Foucaud et al.
- [465] arXiv:2602.03289 [pdf, ps, other]
-
Title: On the Summability Problem of Multivariate Rational Functions in the Mixed CaseSubjects: Symbolic Computation (cs.SC)
Continuing previous work, this paper focuses on the summability problem of multivariate rational functions in the mixed case in which both shift and $q$-shift operators can appear. Our summability criteria rely on three ingredients including orbital decompositions, Sato's isotropy groups, and difference transformations. This work settles the rational case of the long-term project aimed at developing algorithms for symbolic summation of multivariate functions.
- [466] arXiv:2602.03290 [pdf, ps, other]
-
Title: Universal Approximation of Continuous Functionals on Compact Subsets via Linear Measurements and Scalar NonlinearitiesComments: 10 pagesSubjects: Machine Learning (cs.LG); Functional Analysis (math.FA)
We study universal approximation of continuous functionals on compact subsets of products of Hilbert spaces. We prove that any such functional can be uniformly approximated by models that first take finitely many continuous linear measurements of the inputs and then combine these measurements through continuous scalar nonlinearities. We also extend the approximation principle to maps with values in a Banach space, yielding finite-rank approximations. These results provide a compact-set justification for the common ``measure, apply scalar nonlinearities, then combine'' design pattern used in operator learning and imaging.
- [467] arXiv:2602.03292 [pdf, ps, other]
-
Title: A3-TTA: Adaptive Anchor Alignment Test-Time Adaptation for Image SegmentationComments: Accepted by IEEE Transactions on Image ProcessingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Test-Time Adaptation (TTA) offers a practical solution for deploying image segmentation models under domain shift without accessing source data or retraining. Among existing TTA strategies, pseudo-label-based methods have shown promising performance. However, they often rely on perturbation-ensemble heuristics (e.g., dropout sampling, test-time augmentation, Gaussian noise), which lack distributional grounding and yield unstable training signals. This can trigger error accumulation and catastrophic forgetting during adaptation. To address this, we propose \textbf{A3-TTA}, a TTA framework that constructs reliable pseudo-labels through anchor-guided supervision. Specifically, we identify well-predicted target domain images using a class compact density metric, under the assumption that confident predictions imply distributional proximity to the source domain. These anchors serve as stable references to guide pseudo-label generation, which is further regularized via semantic consistency and boundary-aware entropy minimization. Additionally, we introduce a self-adaptive exponential moving average strategy to mitigate label noise and stabilize model update during adaptation. Evaluated on both multi-domain medical images (heart structure and prostate segmentation) and natural images, A3-TTA significantly improves average Dice scores by 10.40 to 17.68 percentage points compared to the source model, outperforming several state-of-the-art TTA methods under different segmentation model architectures. A3-TTA also excels in continual TTA, maintaining high performance across sequential target domains with strong anti-forgetting ability. The code will be made publicly available at https://github.com/HiLab-git/A3-TTA.
- [468] arXiv:2602.03293 [pdf, ps, other]
-
Title: Anomaly Detection via Mean Shift Density EnhancementSubjects: Machine Learning (cs.LG)
Unsupervised anomaly detection stands as an important problem in machine learning, with applications in financial fraud prevention, network security and medical diagnostics. Existing unsupervised anomaly detection algorithms rarely perform well across different anomaly types, often excelling only under specific structural assumptions. This lack of robustness also becomes particularly evident under noisy settings. We propose Mean Shift Density Enhancement (MSDE), a fully unsupervised framework that detects anomalies through their geometric response to density-driven manifold evolution. MSDE is based on the principle that normal samples, being well supported by local density, remain stable under iterative density enhancement, whereas anomalous samples undergo large cumulative displacements as they are attracted toward nearby density modes. To operationalize this idea, MSDE employs a weighted mean-shift procedure with adaptive, sample-specific density weights derived from a UMAP-based fuzzy neighborhood graph. Anomaly scores are defined by the total displacement accumulated across a small number of mean-shift iterations. We evaluate MSDE on the ADBench benchmark, comprising forty six real-world tabular datasets, four realistic anomaly generation mechanisms, and six noise levels. Compared to 13 established unsupervised baselines, MSDE achieves consistently strong, balanced and robust performance for AUC-ROC, AUC-PR, and Precision@n, at several noise levels and on average over several types of anomalies. These results demonstrate that displacement-based scoring provides a robust alternative to the existing state-of-the-art for unsupervised anomaly detection.
- [469] arXiv:2602.03294 [pdf, ps, other]
-
Title: LEVIO: Lightweight Embedded Visual Inertial Odometry for Resource-Constrained DevicesComments: This article has been accepted for publication in the IEEE Sensors Journal (JSEN)Journal-ref: IEEE Sensors Journal ( Volume: 26, Issue: 3, 01 February 2026)Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
Accurate, infrastructure-less sensor systems for motion tracking are essential for mobile robotics and augmented reality (AR) applications. The most popular state-of-the-art visual-inertial odometry (VIO) systems, however, are too computationally demanding for resource-constrained hardware, such as micro-drones and smart glasses. This work presents LEVIO, a fully featured VIO pipeline optimized for ultra-low-power compute platforms, allowing six-degrees-of-freedom (DoF) real-time sensing. LEVIO incorporates established VIO components such as Oriented FAST and Rotated BRIEF (ORB) feature tracking and bundle adjustment, while emphasizing a computationally efficient architecture with parallelization and low memory usage to suit embedded microcontrollers and low-power systems-on-chip (SoCs). The paper proposes and details the algorithmic design choices and the hardware-software co-optimization approach, and presents real-time performance on resource-constrained hardware. LEVIO is validated on a parallel-processing ultra-low-power RISC-V SoC, achieving 20 FPS while consuming less than 100 mW, and benchmarked against public VIO datasets, offering a compelling balance between efficiency and accuracy. To facilitate reproducibility and adoption, the complete implementation is released as open-source.
- [470] arXiv:2602.03295 [pdf, ps, other]
-
Title: POP: Prefill-Only Pruning for Efficient Large Model InferenceSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated remarkable capabilities. However, their deployment is hindered by significant computational costs. Existing structured pruning methods, while hardware-efficient, often suffer from significant accuracy degradation. In this paper, we argue that this failure stems from a stage-agnostic pruning approach that overlooks the asymmetric roles between the prefill and decode stages. By introducing a virtual gate mechanism, our importance analysis reveals that deep layers are critical for next-token prediction (decode) but largely redundant for context encoding (prefill). Leveraging this insight, we propose Prefill-Only Pruning (POP), a stage-aware inference strategy that safely omits deep layers during the computationally intensive prefill stage while retaining the full model for the sensitive decode stage. To enable the transition between stages, we introduce independent Key-Value (KV) projections to maintain cache integrity, and a boundary handling strategy to ensure the accuracy of the first generated token. Extensive experiments on Llama-3.1, Qwen3-VL, and Gemma-3 across diverse modalities demonstrate that POP achieves up to 1.37$\times$ speedup in prefill latency with minimal performance loss, effectively overcoming the accuracy-efficiency trade-off limitations of existing structured pruning methods.
- [471] arXiv:2602.03297 [pdf, ps, other]
-
Title: Lipschitz Multiscale Deep Equilibrium Models: A Theoretically Guaranteed and Accelerated ApproachComments: Accepted at AISTATS2026Subjects: Machine Learning (cs.LG)
Deep equilibrium models (DEQs) achieve infinitely deep network representations without stacking layers by exploring fixed points of layer transformations in neural networks. Such models constitute an innovative approach that achieves performance comparable to state-of-the-art methods in many large-scale numerical experiments, despite requiring significantly less memory. However, DEQs face the challenge of requiring vastly more computational time for training and inference than conventional methods, as they repeatedly perform fixed-point iterations with no convergence guarantee upon each input. Therefore, this study explored an approach to improve fixed-point convergence and consequently reduce computational time by restructuring the model architecture to guarantee fixed-point convergence. Our proposed approach for image classification, Lipschitz multiscale DEQ, has theoretically guaranteed fixed-point convergence for both forward and backward passes by hyperparameter adjustment, achieving up to a 4.75$\times$ speed-up in numerical experiments on CIFAR-10 at the cost of a minor drop in accuracy.
- [472] arXiv:2602.03300 [pdf, ps, other]
-
Title: R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
In this work, we aim to develop effective data synthesis techniques that autonomously synthesize multimodal training data for enhancing MLLMs in solving complex real-world tasks. To this end, we propose Collective Adversarial Data Synthesis (CADS), a novel and general approach to synthesize high-quality, diverse and challenging multimodal data for MLLMs. The core idea of CADS is to leverage collective intelligence to ensure high-quality and diverse generation, while exploring adversarial learning to synthesize challenging samples for effectively driving model improvement. Specifically, CADS operates with two cyclic phases, i.e., Collective Adversarial Data Generation (CAD-Generate) and Collective Adversarial Data Judgment (CAD-Judge). CAD-Generate leverages collective knowledge to jointly generate new and diverse multimodal data, while CAD-Judge collaboratively assesses the quality of synthesized data. In addition, CADS introduces an Adversarial Context Optimization mechanism to optimize the generation context to encourage challenging and high-value data generation. With CADS, we construct MMSynthetic-20K and train our model R1-SyntheticVL, which demonstrates superior performance on various benchmarks.
- [473] arXiv:2602.03301 [pdf, ps, other]
-
Title: Periodic Regularized Q-LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. However, this convergence guarantee does not hold under linear function approximation. To overcome this limitation, a significant line of research has introduced regularization techniques to ensure stable convergence under function approximation. In this work, we propose a new algorithm, periodic regularized Q-learning (PRQ). We first introduce regularization at the level of the projection operator and explicitly construct a regularized projected value iteration (RP-VI), subsequently extending it to a sample-based RL algorithm. By appropriately regularizing the projection operator, the resulting projected value iteration becomes a contraction. By extending this regularized projection into the stochastic setting, we establish the PRQ algorithm and provide a rigorous theoretical analysis that proves finite-time convergence guarantees for PRQ under linear function approximation.
- [474] arXiv:2602.03302 [pdf, ps, other]
-
Title: Full end-to-end diagnostic workflow automation of 3D OCT via foundation model-driven AI for retinal diseasesAuthors: Jinze Zhang, Jian Zhong, Li Lin, Jiaxiong Li, Ke Ma, Naiyang Li, Meng Li, Yuan Pan, Zeyu Meng, Mengyun Zhou, Shang Huang, Shilong Yu, Zhengyu Duan, Sutong Li, Honghui Xia, Juping Liu, Dan Liang, Yantao Wei, Xiaoying Tang, Jin Yuan, Peng XiaoSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Optical coherence tomography (OCT) has revolutionized retinal disease diagnosis with its high-resolution and three-dimensional imaging nature, yet its full diagnostic automation in clinical practices remains constrained by multi-stage workflows and conventional single-slice single-task AI models. We present Full-process OCT-based Clinical Utility System (FOCUS), a foundation model-driven framework enabling end-to-end automation of 3D OCT retinal disease diagnosis. FOCUS sequentially performs image quality assessment with EfficientNetV2-S, followed by abnormality detection and multi-disease classification using a fine-tuned Vision Foundation Model. Crucially, FOCUS leverages a unified adaptive aggregation method to intelligently integrate 2D slices-level predictions into comprehensive 3D patient-level diagnosis. Trained and tested on 3,300 patients (40,672 slices), and externally validated on 1,345 patients (18,498 slices) across four different-tier centers and diverse OCT devices, FOCUS achieved high F1 scores for quality assessment (99.01%), abnormally detection (97.46%), and patient-level diagnosis (94.39%). Real-world validation across centers also showed stable performance (F1: 90.22%-95.24%). In human-machine comparisons, FOCUS matched expert performance in abnormality detection (F1: 95.47% vs 90.91%) and multi-disease diagnosis (F1: 93.49% vs 91.35%), while demonstrating better efficiency. FOCUS automates the image-to-diagnosis pipeline, representing a critical advance towards unmanned ophthalmology with a validated blueprint for autonomous screening to enhance population scale retinal care accessibility and efficiency.
- [475] arXiv:2602.03304 [pdf, ps, other]
-
Title: To Search or Not to Search: Aligning the Decision Boundary of Deep Search Agents via Causal InterventionAuthors: Wenlin Zhang, Kuicai Dong, Junyi Li, Yingyi Zhang, Xiaopeng Li, Pengyue Jia, Yi Wen, Derong Xu, Maolin Wang, Yichao Wang, Yong Liu, Xiangyu ZhaoSubjects: Information Retrieval (cs.IR)
Deep search agents, which autonomously iterate through multi-turn web-based reasoning, represent a promising paradigm for complex information-seeking tasks. However, current agents suffer from critical inefficiency: they conduct excessive searches as they cannot accurately judge when to stop searching and start answering. This stems from outcome-centric training that prioritize final results over the search process itself. We identify the root cause as misaligned decision boundaries, the threshold determining when accumulated information suffices to answer. This causes over-search (redundant searching despite sufficient knowledge) and under-search (premature termination yielding incorrect answers). To address these errors, we propose a comprehensive framework comprising two key components. First, we introduce causal intervention-based diagnosis that identifies boundary errors by comparing factual and counterfactual trajectories at each decision point. Second, we develop Decision Boundary Alignment for Deep Search agents (DAS), which constructs preference datasets from causal feedback and aligns policies via preference optimization. Experiments on public datasets demonstrate that decision boundary errors are pervasive across state-of-the-art agents. Our DAS method effectively calibrates these boundaries, mitigating both over-search and under-search to achieve substantial gains in accuracy and efficiency. Our code and data are publicly available at: https://github.com/Applied-Machine-Learning-Lab/WWW2026_DAS.
- [476] arXiv:2602.03305 [pdf, ps, other]
-
Title: medR: Reward Engineering for Clinical Offline Reinforcement Learning via Tri-Drive Potential FunctionsAuthors: Qianyi Xu, Gousia Habib, Feng Wu, Yanrui Du, Zhihui Chen, Swapnil Mishra, Dilruk Perera, Mengling FengSubjects: Machine Learning (cs.LG)
Reinforcement Learning (RL) offers a powerful framework for optimizing dynamic treatment regimes (DTRs). However, clinical RL is fundamentally bottlenecked by reward engineering: the challenge of defining signals that safely and effectively guide policy learning in complex, sparse offline environments. Existing approaches often rely on manual heuristics that fail to generalize across diverse pathologies. To address this, we propose an automated pipeline leveraging Large Language Models (LLMs) for offline reward design and verification. We formulate the reward function using potential functions consisted of three core components: survival, confidence, and competence. We further introduce quantitative metrics to rigorously evaluate and select the optimal reward structure prior to deployment. By integrating LLM-driven domain knowledge, our framework automates the design of reward functions for specific diseases while significantly enhancing the performance of the resulting policies.
- [477] arXiv:2602.03306 [pdf, ps, other]
-
Title: Learning to Select: Query-Aware Adaptive Dimension Selection for Dense RetrievalSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Dense retrieval represents queries and docu-002 ments as high-dimensional embeddings, but003 these representations can be redundant at the004 query level: for a given information need, only005 a subset of dimensions is consistently help-006 ful for ranking. Prior work addresses this via007 pseudo-relevance feedback (PRF) based dimen-008 sion importance estimation, which can produce009 query-aware masks without labeled data but010 often relies on noisy pseudo signals and heuris-011 tic test-time procedures. In contrast, super-012 vised adapter methods leverage relevance labels013 to improve embedding quality, yet they learn014 global transformations shared across queries015 and do not explicitly model query-aware di-016 mension importance. We propose a Query-017 Aware Adaptive Dimension Selection frame-018 work that learns to predict per-dimension im-019 portance directly from query embedding. We020 first construct oracle dimension importance dis-021 tributions over embedding dimensions using022 supervised relevance labels, and then train a023 predictor to map a query embedding to these024 label-distilled importance scores. At inference,025 the predictor selects a query-aware subset of026 dimensions for similarity computation based027 solely on the query embedding, without pseudo-028 relevance feedback. Experiments across multi-029 ple dense retrievers and benchmarks show that030 our learned dimension selector improves re-031 trieval effectiveness over the full-dimensional032 baseline as well as PRF-based masking and033 supervised adapter baselines.
- [478] arXiv:2602.03307 [pdf, ps, other]
-
Title: GRAM: Spatial general-purpose audio representations for real-world environmentsComments: Revise with RealSELDSubjects: Sound (cs.SD)
Audio foundation models learn general-purpose audio representations that facilitate a wide range of downstream tasks. While the performance of these models has greatly increased for conventional single-channel, dry audio clips, their success in real-world acoustic environments with reverberation and noise is limited. Furthermore, most audio foundation models ignore the spatial dimension of real-world acoustic environments, ruling out tasks involving sound localization. To address these limitations, we propose GRAM: a general-purpose real-world audio model that employs a multi-channel masked autoencoder to efficiently learn spatial audio representations. We evaluated GRAM and other audio foundation models in a standardized manner on high-quality simulations of naturalistic, spatial acoustic environments as well as recordings of real-world environments and release these two complementary benchmark task suites: NatHEAR and RealSELD. Our results demonstrate that GRAM outperforms all state-of-the-art self-supervised audio foundation models on NatHEAR and the clean, single-channel version HEAR, while using only a fraction of the training data. GRAM also shows state-of-the-art localization performance in simulated environments and generalizes efficiently to real-world recordings in RealSELD. Taken together, GRAM presents a significant advance toward robust spatial audio foundation models for real-world environments.
- [479] arXiv:2602.03309 [pdf, ps, other]
-
Title: Entropy-Gated Selective Policy Optimization:Token-Level Gradient Allocation for Hybrid Training of Large Language ModelsComments: accepted by cscwd2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Hybrid training methods for large language models combine supervised fine tuning (SFT) on expert demonstrations with reinforcement learning (RL) on model rollouts, typically at the sample level. We propose Entropy Gated Selective Policy Optimization (EGSPO), a three stage framework that extends sample level mixing with token level gradient modulation.
Stage 1, SFT expert learning, establishes a reliable warm up policy using expert demonstrations with a pure SFT loss. Stage 2, RL rollout generation, samples trajectories from the current policy and computes per token predictive entropy. Stage 3, the EGSPO mechanism, applies entropy gated gradient allocation: a predictive entropy module routes high entropy tokens to full PPO updates to encourage exploration, and low entropy tokens to attenuated PPO updates to reduce variance and preserve knowledge. Critically, both branches incorporate the advantage function A_t, ensuring that incorrect trajectories receive consistent negative learning signals and preventing reinforcement of confident errors.
EGSPO achieves consistent improvements on mathematical reasoning benchmarks, with gains of 3.8 percent on AIME and 2.9 percent on MATH over the CHORD phi baseline, while incurring only 3.4 percent additional computational overhead. - [480] arXiv:2602.03310 [pdf, ps, other]
-
Title: RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment GeneralizationSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Vision-Language-Action (VLA) models hold promise for generalist robotics but currently struggle with data scarcity, architectural inefficiencies, and the inability to generalize across different hardware platforms. We introduce RDT2, a robotic foundation model built upon a 7B parameter VLM designed to enable zero-shot deployment on novel embodiments for open-vocabulary tasks. To achieve this, we collected one of the largest open-source robotic datasets--over 10,000 hours of demonstrations in diverse families--using an enhanced, embodiment-agnostic Universal Manipulation Interface (UMI). Our approach employs a novel three-stage training recipe that aligns discrete linguistic knowledge with continuous control via Residual Vector Quantization (RVQ), flow-matching, and distillation for real-time inference. Consequently, RDT2 becomes one of the first models that simultaneously zero-shot generalizes to unseen objects, scenes, instructions, and even robotic platforms. Besides, it outperforms state-of-the-art baselines in dexterous, long-horizon, and dynamic downstream tasks like playing table tennis. See https://rdt-robotics.github.io/rdt2/ for more information.
- [481] arXiv:2602.03311 [pdf, ps, other]
-
Title: Multi-Level Testing of Conversational AI SystemsAuthors: Elena MasseriniComments: 3 pages, 1 figure, Accepted at IEEE/ACM International Conference on Software Engineering (ICSE) - Doctoral Symposium Track, 2026Subjects: Software Engineering (cs.SE)
Conversational AI systems combine AI-based solutions with the flexibility of conversational interfaces. However, most existing testing solutions do not straightforwardly adapt to the characteristics of conversational interaction or to the behavior of AI components. To address this limitation, this Ph.D. thesis investigates a new family of testing approaches for conversational AI systems, focusing on the validation of their constituent elements at different levels of granularity, from the integration between the language and the AI components, to individual conversational agents, up to multi-agent implementations of conversational AI systems
- [482] arXiv:2602.03314 [pdf, ps, other]
-
Title: PQTNet: Pixel-wise Quantitative Thermography Neural Network for Estimating Defect Depth in Polylactic Acid Parts by Additive ManufacturingComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
Defect depth quantification in additively manufactured (AM) components remains a significant challenge for non-destructive testing (NDT). This study proposes a Pixel-wise Quantitative Thermography Neural Network (PQT-Net) to address this challenge for polylactic acid (PLA) parts. A key innovation is a novel data augmentation strategy that reconstructs thermal sequence data into two-dimensional stripe images, preserving the complete temporal evolution of heat diffusion for each pixel. The PQT-Net architecture incorporates a pre-trained EfficientNetV2-S backbone and a custom Residual Regression Head (RRH) with learnable parameters to refine outputs. Comparative experiments demonstrate the superiority of PQT-Net over other deep learning models, achieving a minimum Mean Absolute Error (MAE) of 0.0094 mm and a coefficient of determination (R) exceeding 99%. The high precision of PQT-Net underscores its potential for robust quantitative defect characterization in AM.
- [483] arXiv:2602.03315 [pdf, ps, other]
-
Title: Memora: A Harmonic Memory Representation Balancing Abstraction and SpecificityAuthors: Menglin Xia, Xuchao Zhang, Shantanu Dixit, Paramaguru Harimurugan, Rujia Wang, Victor Ruhle, Robert Sim, Chetan Bansal, Saravan RajmohanSubjects: Artificial Intelligence (cs.AI)
Agent memory systems must accommodate continuously growing information while supporting efficient, context-aware retrieval for downstream tasks. Abstraction is essential for scaling agent memory, yet it often comes at the cost of specificity, obscuring the fine-grained details required for effective reasoning. We introduce Memora, a harmonic memory representation that structurally balances abstraction and specificity. Memora organizes information via its primary abstractions that index concrete memory values and consolidate related updates into unified memory entries, while cue anchors expand retrieval access across diverse aspects of the memory and connect related memories. Building on this structure, we employ a retrieval policy that actively exploits these memory connections to retrieve relevant information beyond direct semantic similarity. Theoretically, we show that standard Retrieval-Augmented Generation (RAG) and Knowledge Graph (KG)-based memory systems emerge as special cases of our framework. Empirically, Memora establishes a new state-of-the-art on the LoCoMo and LongMemEval benchmarks, demonstrating better retrieval relevance and reasoning effectiveness as memory scales.
- [484] arXiv:2602.03316 [pdf, ps, other]
-
Title: Invisible Clean-Label Backdoor Attacks for Generative Data AugmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
With the rapid advancement of image generative models, generative data augmentation has become an effective way to enrich training images, especially when only small-scale datasets are available. At the same time, in practical applications, generative data augmentation can be vulnerable to clean-label backdoor attacks, which aim to bypass human inspection. However, based on theoretical analysis and preliminary experiments, we observe that directly applying existing pixel-level clean-label backdoor attack methods (e.g., COMBAT) to generated images results in low attack success rates. This motivates us to move beyond pixel-level triggers and focus instead on the latent feature level. To this end, we propose InvLBA, an invisible clean-label backdoor attack method for generative data augmentation by latent perturbation. We theoretically prove that the generalization of the clean accuracy and attack success rates of InvLBA can be guaranteed. Experiments on multiple datasets show that our method improves the attack success rate by 46.43% on average, with almost no reduction in clean accuracy and high robustness against SOTA defense methods.
- [485] arXiv:2602.03318 [pdf, ps, other]
-
Title: MIRROR: A Multi-Agent Framework with Iterative Adaptive Revision and Hierarchical Retrieval for Optimization Modeling in Operations ResearchSubjects: Computation and Language (cs.CL)
Operations Research (OR) relies on expert-driven modeling-a slow and fragile process ill-suited to novel scenarios. While large language models (LLMs) can automatically translate natural language into optimization models, existing approaches either rely on costly post-training or employ multi-agent frameworks, yet most still lack reliable collaborative error correction and task-specific retrieval, often leading to incorrect outputs. We propose MIRROR, a fine-tuning-free, end-to-end multi-agent framework that directly translates natural language optimization problems into mathematical models and solver code. MIRROR integrates two core mechanisms: (1) execution-driven iterative adaptive revision for automatic error correction, and (2) hierarchical retrieval to fetch relevant modeling and coding exemplars from a carefully curated exemplar library. Experiments show that MIRROR outperforms existing methods on standard OR benchmarks, with notable results on complex industrial datasets such as IndustryOR and Mamo-ComplexLP. By combining precise external knowledge infusion with systematic error correction, MIRROR provides non-expert users with an efficient and reliable OR modeling solution, overcoming the fundamental limitations of general-purpose LLMs in expert optimization tasks.
- [486] arXiv:2602.03319 [pdf, ps, other]
-
Title: Information-Theoretic Multi-Model Fusion for Target-Oriented Adaptive Sampling in Materials DesignComments: 37 pages, 5 figures, 2 tablesSubjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Information Theory (cs.IT)
Target-oriented discovery under limited evaluation budgets requires making reliable progress in high-dimensional, heterogeneous design spaces where each new measurement is costly, whether experimental or high-fidelity simulation. We present an information-theoretic framework for target-oriented adaptive sampling that reframes optimization as trajectory discovery: instead of approximating the full response surface, the method maintains and refines a low-entropy information state that concentrates search on target-relevant directions. The approach couples data, model beliefs, and physics/structure priors through dimension-aware information budgeting, adaptive bootstrapped distillation over a heterogeneous surrogate reservoir, and structure-aware candidate manifold analysis with Kalman-inspired multi-model fusion to balance consensus-driven exploitation and disagreement-driven exploration. Evaluated under a single unified protocol without dataset-specific tuning, the framework improves sample efficiency and reliability across 14 single- and multi-objective materials design tasks spanning candidate pools from $600$ to $4 \times 10^6$ and feature dimensions from $10$ to $10^3$, typically reaching top-performing regions within 100 evaluations. Complementary 20-dimensional synthetic benchmarks (Ackley, Rastrigin, Schwefel) further demonstrate robustness to rugged and multimodal landscapes.
- [487] arXiv:2602.03320 [pdf, ps, other]
-
Title: MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement LearningAuthors: Shengyuan Liu, Liuxin Bao, Qi Yang, Wanting Geng, Boyun Zheng, Chenxin Li, Wenting Chen, Houwen Peng, Yixuan YuanComments: 23 Pages, 4 FiguresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Medical image segmentation is evolving from task-specific models toward generalizable frameworks. Recent research leverages Multi-modal Large Language Models (MLLMs) as autonomous agents, employing reinforcement learning with verifiable reward (RLVR) to orchestrate specialized tools like the Segment Anything Model (SAM). However, these approaches often rely on single-turn, rigid interaction strategies and lack process-level supervision during training, which hinders their ability to fully exploit the dynamic potential of interactive tools and leads to redundant actions. To bridge this gap, we propose MedSAM-Agent, a framework that reformulates interactive segmentation as a multi-step autonomous decision-making process. First, we introduce a hybrid prompting strategy for expert-curated trajectory generation, enabling the model to internalize human-like decision heuristics and adaptive refinement strategies. Furthermore, we develop a two-stage training pipeline that integrates multi-turn, end-to-end outcome verification with a clinical-fidelity process reward design to promote interaction parsimony and decision efficiency. Extensive experiments across 6 medical modalities and 21 datasets demonstrate that MedSAM-Agent achieves state-of-the-art performance, effectively unifying autonomous medical reasoning with robust, iterative optimization. Code is available \href{https://github.com/CUHK-AIM-Group/MedSAM-Agent}{here}.
- [488] arXiv:2602.03322 [pdf, ps, other]
-
Title: Weighted finite difference methods for a nonlinear Klein--Gordon equation with high oscillations in space and timeSubjects: Numerical Analysis (math.NA)
We consider a nonlinear Klein--Gordon equation in the nonrelativistic limit regime with initial data in the form of a modulated highly oscillatory exponential. In this regime of a small scaling parameter $\varepsilon$, the solution exhibits rapid oscillations in both time and space, posing challenges for numerical approximation. We propose an explicit and an implicit exponentially weighted finite difference method. While the explicit weighted leapfrog method needs to satisfy a CFL-type stability condition, the implicit weighted Crank--Nicolson method is unconditionally stable. Both methods achieve second-order accuracy with time steps and mesh sizes that are not restricted in magnitude by $\varepsilon$. The methods are uniformly convergent in the range from arbitrarily small to moderately bounded $\varepsilon$. Numerical experiments illustrate the theoretical results.
- [489] arXiv:2602.03324 [pdf, ps, other]
-
Title: SCASRec: A Self-Correcting and Auto-Stopping Model for Generative Route List RecommendationAuthors: Chao Chen, Longfei Xu, Daohan Su, Tengfei Liu, Hanyu Guo, Yihai Duan, Kaikui Liu, Xiangxiang ChuSubjects: Information Retrieval (cs.IR)
Route recommendation systems commonly adopt a multi-stage pipeline involving fine-ranking and re-ranking to produce high-quality ordered recommendations. However, this paradigm faces three critical limitations. First, there is a misalignment between offline training objectives and online metrics. Offline gains do not necessarily translate to online improvements. Actual performance must be validated through A/B testing, which may potentially compromise the user experience. Second, redundancy elimination relies on rigid, handcrafted rules that lack adaptability to the high variance in user intent and the unstructured complexity of real-world scenarios. Third, the strict separation between fine-ranking and re-ranking stages leads to sub-optimal performance. Since each module is optimized in isolation, the fine-ranking stage remains oblivious to the list-level objectives (e.g., diversity) targeted by the re-ranker, thereby preventing the system from achieving a jointly optimized global optimum. To overcome these intertwined challenges, we propose \textbf{SCASRec} (\textbf{S}elf-\textbf{C}orrecting and \textbf{A}uto-\textbf{S}topping \textbf{Rec}ommendation), a unified generative framework that integrates ranking and redundancy elimination into a single end-to-end process. SCASRec introduces a stepwise corrective reward (SCR) to guide list-wise refinement by focusing on hard samples, and employs a learnable End-of-Recommendation (EOR) token to terminate generation adaptively when no further improvement is expected. Experiments on two large-scale, open-sourced route recommendation datasets demonstrate that SCASRec establishes an SOTA in offline and online settings. SCASRec has been fully deployed in a real-world navigation app, demonstrating its effectiveness.
- [490] arXiv:2602.03327 [pdf, ps, other]
-
Title: Pi-GS: Sparse-View Gaussian Splatting with Dense π^3 InitializationSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Novel view synthesis has evolved rapidly, advancing from Neural Radiance Fields to 3D Gaussian Splatting (3DGS), which offers real-time rendering and rapid training without compromising visual fidelity. However, 3DGS relies heavily on accurate camera poses and high-quality point cloud initialization, which are difficult to obtain in sparse-view scenarios. While traditional Structure from Motion (SfM) pipelines often fail in these settings, existing learning-based point estimation alternatives typically require reliable reference views and remain sensitive to pose or depth errors. In this work, we propose a robust method utilizing {\pi}^3, a reference-free point cloud estimation network. We integrate dense initialization from {\pi}^3 with a regularization scheme designed to mitigate geometric inaccuracies. Specifically, we employ uncertainty-guided depth supervision, normal consistency loss, and depth warping. Experimental results demonstrate that our approach achieves state-of-the-art performance on the Tanks and Temples, LLFF, DTU, and MipNeRF360 datasets.
- [491] arXiv:2602.03328 [pdf, ps, other]
-
Title: GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, and VideoAuthors: Zhenhao Zhu, Yue Liu, Yanpei Guo, Wenjie Qu, Cancan Chen, Yufei He, Yibo Li, Yulin Chen, Tianyi Wu, Huiying Xu, Xinzhong Zhu, Jiaheng ZhangSubjects: Cryptography and Security (cs.CR)
We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, and video data. First, we construct a comprehensive training corpus comprising 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm to incentivize the model to deliberate before making decisions: (1) conducting SFT to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) performing RL, incorporating an error-driven exploration reward to incentivize deeper reasoning on hard samples. We release a suite of models scaled at 2B and 4B parameters. Extensive experiments demonstrate that GuardReasoner-Omni achieves superior performance compared to existing state-of-the-art baselines across various guardrail benchmarks. Notably, GuardReasoner-Omni (2B) significantly surpasses the runner-up by 5.3% F1 score.
- [492] arXiv:2602.03329 [pdf, ps, other]
-
Title: From Inexact Gradients to Byzantine Robustness: Acceleration and Optimization under SimilaritySubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Standard federated learning algorithms are vulnerable to adversarial nodes, a.k.a. Byzantine failures. To solve this issue, robust distributed learning algorithms have been developed, which typically replace parameter averaging by robust aggregations. While generic conditions on these aggregations exist to guarantee the convergence of (Stochastic) Gradient Descent (SGD), the analyses remain rather ad-hoc. This hinders the development of more complex robust algorithms, such as accelerated ones. In this work, we show that Byzantine-robust distributed optimization can, under standard generic assumptions, be cast as a general optimization with inexact gradient oracles (with both additive and multiplicative error terms), an active field of research.
This allows for instance to directly show that GD on top of standard robust aggregation procedures obtains optimal asymptotic error in the Byzantine setting. Going further, we propose two optimization schemes to speed up the convergence. The first one is a Nesterov-type accelerated scheme whose proof directly derives from accelerated inexact gradient results applied to our formulation. The second one hinges on Optimization under Similarity, in which the server leverages an auxiliary loss function that approximates the global loss. Both approaches allow to drastically reduce the communication complexity compared to previous methods, as we show theoretically and empirically. - [493] arXiv:2602.03331 [pdf, ps, other]
-
Title: Bayesian Conformal Prediction as a Decision Risk ProblemComments: 18 pages, 5 figures. Accepted at EIML 2025 at EuripsSubjects: Machine Learning (cs.LG)
Bayesian posterior predictive densities as non-conformity scores and Bayesian quadrature are used to estimate and minimise the expected prediction set size. Operating within a split conformal framework, BCP provides valid coverage guarantees and demonstrates reliable empirical coverage under model misspecification. Across regression and classification tasks, including distribution-shifted settings such as ImageNet-A, BCP yields prediction sets of comparable size to split conformal prediction, while exhibiting substantially lower run-to-run variability in set size. In sparse regression with nominal coverage of 80 percent, BCP achieves 81 percent empirical coverage under a misspecified prior, whereas Bayesian credible intervals under-cover at 49 percent.
- [494] arXiv:2602.03333 [pdf, ps, other]
-
Title: PWAVEP: Purifying Imperceptible Adversarial Perturbations in 3D Point Clouds via Spectral Graph WaveletsComments: Accepted by WWW 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent progress in adversarial attacks on 3D point clouds, particularly in achieving spatial imperceptibility and high attack performance, presents significant challenges for defenders. Current defensive approaches remain cumbersome, often requiring invasive model modifications, expensive training procedures or auxiliary data access. To address these threats, in this paper, we propose a plug-and-play and non-invasive defense mechanism in the spectral domain, grounded in a theoretical and empirical analysis of the relationship between imperceptible perturbations and high-frequency spectral components. Building upon these insights, we introduce a novel purification framework, termed PWAVEP, which begins by computing a spectral graph wavelet domain saliency score and local sparsity score for each point. Guided by these values, PWAVEP adopts a hierarchical strategy, it eliminates the most salient points, which are identified as hardly recoverable adversarial outliers. Simultaneously, it applies a spectral filtering process to a broader set of moderately salient points. This process leverages a graph wavelet transform to attenuate high-frequency coefficients associated with the targeted points, thereby effectively suppressing adversarial noise. Extensive evaluations demonstrate that the proposed PWAVEP achieves superior accuracy and robustness compared to existing approaches, advancing the state-of-the-art in 3D point cloud purification. Code and datasets are available at https://github.com/a772316182/pwavep
- [495] arXiv:2602.03334 [pdf, ps, other]
-
Title: The Personality Trap: How LLMs Embed Bias When Generating Human-Like PersonasComments: 26 pages, 2 FiguresSubjects: Computers and Society (cs.CY)
This paper examines biases in large language models (LLMs) when generating synthetic populations from responses to personality questionnaires. Using five LLMs, we first assess the representativeness and potential biases in the sociodemographic attributes of the generated personas, as well as their alignment with the intended personality traits. While LLMs successfully reproduce known correlations between personality and sociodemographic variables, all models exhibit pronounced WEIRD (western, educated, industrialized, rich and democratic) biases, favoring young, educated, white, heterosexual, Western individuals with centrist or progressive political views and secular or Christian beliefs. In a second analysis, we manipulate input traits to maximize Neuroticism and Psychoticism scores. Notably, when Psychoticism is maximized, several models produce an overrepresentation of non-binary and LGBTQ+ identities, raising concerns about stereotyping and the potential pathologization of marginalized groups. Our findings highlight both the potential and the risks of using LLMs to generate psychologically grounded synthetic populations.
- [496] arXiv:2602.03337 [pdf, ps, other]
-
Title: Vigemers: on the number of $k$-mers sharing the same XOR-based minimizerSubjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
In bioinformatics, minimizers have become an inescapable method for handling $k$-mers (words of fixed size $k$) extracted from DNA or RNA sequencing, whether for sampling, storage, querying or partitioning. According to some fixed order on $m$-mers ($m<k$), the minimizer of a $k$-mer is defined as its smallest $m$-mer -- and acts as its fingerprint. Although minimizers are widely used for partitioning purposes, there is almost no theoretical work on the quality of the resulting partitions. For instance, it has been known for decades that the lexicographic order empirically leads to highly unbalanced partitions that are unusable in practice, but it was not until very recently that this observation was theoretically substantiated. The rejection of the lexicographic order has led the community to resort to (pseudo-)random orders using hash functions. In this work, we extend the theoretical results relating to the partitions obtained by the lexicographical order, departing from it to a (exponentially) large family of hash functions, namely where the $m$-mers are XORed against a fixed key. More precisely, provided a key $\gamma$ and a $m$-mer $w$, we investigate the function that counts how many $k$-mers admit $w$ as their minimizer (i.e. where $w\oplus\gamma$ is minimal among all $m$-mers of said $k$-mers). This number, denoted by $\pi_k^{\gamma}(w)$, represents the maximum size of the bucket associated with $w$, if all possible $k$-mers were to be seen and partitioned. We adapt the (lexicographical order) method of the literature to our framework and propose combinatorial equations that allow to compute, using dynamic programming, $\pi_k^{\gamma}(w)$ in $O(km^2)$ time and $O(km)$ space.
- [497] arXiv:2602.03338 [pdf, ps, other]
-
Title: Accurate Failure Prediction in Agents Does Not Imply Effective Failure PreventionSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Proactive interventions by LLM critic models are often assumed to improve reliability, yet their effects at deployment time are poorly understood. We show that a binary LLM critic with strong offline accuracy (AUROC 0.94) can nevertheless cause severe performance degradation, inducing a 26 percentage point (pp) collapse on one model while affecting another by near zero pp. This variability demonstrates that LLM critic accuracy alone is insufficient to determine whether intervention is safe.
We identify a disruption-recovery tradeoff: interventions may recover failing trajectories but also disrupt trajectories that would have succeeded. Based on this insight, we propose a pre-deployment test that uses a small pilot of 50 tasks to estimate whether intervention is likely to help or harm, without requiring full deployment. Across benchmarks, the test correctly anticipates outcomes: intervention degrades performance on high-success tasks (0 to -26 pp), while yielding a modest improvement on the high-failure ALFWorld benchmark (+2.8 pp, p=0.014). The primary value of our framework is therefore identifying when not to intervene, preventing severe regressions before deployment. - [498] arXiv:2602.03339 [pdf, ps, other]
-
Title: Composable Visual Tokenizers with Generator-Free Diagnostics of LearnabilitySubjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce CompTok, a training framework for learning visual tokenizers whose tokens are enhanced for compositionality. CompTok uses a token-conditioned diffusion decoder. By employing an InfoGAN-style objective, where we train a recognition model to predict the tokens used to condition the diffusion decoder using the decoded images, we enforce the decoder to not ignore any of the tokens. To promote compositional control, besides the original images, CompTok also trains on tokens formed by swapping token subsets between images, enabling more compositional control of the token over the decoder. As the swapped tokens between images do not have ground truth image targets, we apply a manifold constraint via an adversarial flow regularizer to keep unpaired swap generations on the natural-image distribution. The resulting tokenizer not only achieves state-of-the-art performance on image class-conditioned generation, but also demonstrates properties such as swapping tokens between images to achieve high level semantic editing of an image. Additionally, we propose two metrics that measures the landscape of the token space that can be useful to describe not only the compositionality of the tokens, but also how easy to learn the landscape is for a generator to be trained on this space. We show in experiments that CompTok can improve on both of the metrics as well as supporting state-of-the-art generators for class conditioned generation.
- [499] arXiv:2602.03340 [pdf, ps, other]
-
Title: MentalSeek-Dx: Towards Progressive Hypothetico-Deductive Reasoning for Real-world Psychiatric DiagnosisComments: 36 pages, 27 figuresSubjects: Artificial Intelligence (cs.AI)
Mental health disorders represent a burgeoning global public health challenge. While Large Language Models (LLMs) have demonstrated potential in psychiatric assessment, their clinical utility is severely constrained by benchmarks that lack ecological validity and fine-grained diagnostic supervision. To bridge this gap, we introduce \textbf{MentalDx Bench}, the first benchmark dedicated to disorder-level psychiatric diagnosis within real-world clinical settings. Comprising 712 de-identified electronic health records annotated by board-certified psychiatrists under ICD-11 guidelines, the benchmark covers 76 disorders across 16 diagnostic categories. Evaluation of 18 LLMs reveals a critical \textit{paradigm misalignment}: strong performance at coarse diagnostic categorization contrasts with systematic failure at disorder-level diagnosis, underscoring a gap between pattern-based modeling and clinical hypothetico-deductive reasoning. In response, we propose \textbf{MentalSeek-Dx}, a medical-specialized LLM trained to internalize this clinical reasoning process through supervised trajectory construction and curriculum-based reinforcement learning. Experiments on MentalDx Bench demonstrate that MentalSeek-Dx achieves state-of-the-art (SOTA) performance with only 14B parameters, establishing a clinically grounded framework for reliable psychiatric diagnosis.
- [500] arXiv:2602.03342 [pdf, ps, other]
-
Title: Tiled Prompts: Overcoming Prompt Underspecification in Image and Video Super-ResolutionComments: 13 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Text-conditioned diffusion models have advanced image and video super-resolution by using prompts as semantic priors, but modern super-resolution pipelines typically rely on latent tiling to scale to high resolutions, where a single global caption causes prompt underspecification. A coarse global prompt often misses localized details (prompt sparsity) and provides locally irrelevant guidance (prompt misguidance) that can be amplified by classifier-free guidance. We propose Tiled Prompts, a unified framework for image and video super-resolution that generates a tile-specific prompt for each latent tile and performs super-resolution under locally text-conditioned posteriors, providing high-information guidance that resolves prompt underspecification with minimal overhead. Experiments on high resolution real-world images and videos show consistent gains in perceptual quality and text alignment, while reducing hallucinations and tile-level artifacts relative to global-prompt baselines.
- [501] arXiv:2602.03344 [pdf, ps, other]
-
Title: Robustness as an Emergent Property of Task PerformanceSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Robustness is often regarded as a critical future challenge for real-world applications, where stability is essential. However, as models often learn tasks in a similar order, we hypothesize that easier tasks will be easier regardless of how they are presented to the model. Indeed, in this paper, we show that as models approach high performance on a task, robustness is effectively achieved. Through an empirical analysis of multiple models across diverse datasets and configurations (e.g., paraphrases, different temperatures), we find a strong positive correlation. Moreover, we find that robustness is primarily driven by task-specific competence rather than inherent model-level properties, challenging current approaches that treat robustness as an independent capability. Thus, from a high-level perspective, we may expect that as new tasks saturate, model robustness on these tasks will emerge accordingly. For researchers, this implies that explicit efforts to measure and improve robustness may warrant reduced emphasis, as such robustness is likely to develop alongside performance gains. For practitioners, it acts as a sign that indeed the tasks that the literature deals with are unreliable, but on easier past tasks, the models are reliable and ready for real-world deployment.
- [502] arXiv:2602.03345 [pdf, ps, other]
-
Title: Beyond Exposure: Optimizing Ranking Fairness with Non-linear Time-Income FunctionsSubjects: Information Retrieval (cs.IR)
Ranking is central to information distribution in web search and recommendation. Nowadays, in ranking optimization, the fairness to item providers is viewed as a crucial factor alongside ranking relevance for users. There are currently numerous concepts of fairness and one widely recognized fairness concept is Exposure Fairness. However, it relies primarily on exposure determined solely by position, overlooking other factors that significantly influence income, such as time. To address this limitation, we propose to study ranking fairness when the provider utility is influenced by other contextual factors and is neither equal to nor proportional to item exposure. We give a formal definition of Income Fairness and develop a corresponding measurement metric. Simulated experiments show that existing-exposure-fairness-based ranking algorithms fail to optimize the proposed income fairness. Therefore, we propose the Dynamic-Income-Derivative-aware Ranking Fairness algorithm, which, based on the marginal income gain at the present timestep, uses Taylor-expansion-based gradients to simultaneously optimize effectiveness and income fairness. In both offline and online settings with diverse time-income functions, DIDRF consistently outperforms state-of-the-art methods.
- [503] arXiv:2602.03346 [pdf, ps, other]
-
Title: Dynamics of Implicit Time-Invariant Max-Min-Plus-Scaling Discrete-Event SystemsComments: 12 pages, Under review at AutomaticaSubjects: Systems and Control (eess.SY)
Max-min-plus-scaling (MMPS) systems generalize max-plus, min-plus and max-min-plus models with more flexibility in modelling discrete-event dynamics. Especially, implicit MMPS models capture a wide range of real world discrete-event applications. This article analyzes the dynamics of an autonomous, time-invariant implicit MMPS system in a discrete-event framework. First, we provide sufficient conditions under which an implicit MMPS system admits at least one solution to its state-space representation. Then, we analyze its global behavior by determining the key parameters; the growth rates and fixed points. For a solvable MMPS system, we assess the local behavior of the system around its set of fixed points via a normalization procedure. Further, we present the notion of stability for the normalized system. A case study of the urban railway network substantiates the theoretical results.
- [504] arXiv:2602.03348 [pdf, ps, other]
-
Title: A Comparative Study of Low-Dissipation Numerical Schemes for Hyperbolic Conservation LawsComments: arXiv admin note: substantial text overlap with arXiv:2504.01699Subjects: Numerical Analysis (math.NA)
This work provides a comparative assessment of several low-dissipation numerical schemes for hyperbolic conservation laws, highlighting their performance relative to the classical Harten-Lax-van Leer (HLL) schemes. The schemes under consideration include the classical Harten-Lax-van Leer-Contact (HLLC), the recently proposed TV flux splitting, the low-dissipation Central-Upwind (LDCU), and the local characteristic decomposition-based Central-Upwind (LCDCU) schemes. These methods are extended to higher orders of accuracy, up to the fifth order, within both finite-volume and finite-difference frameworks. A series of numerical experiments for the one- and two-dimensional Euler equations of gas dynamics are performed to evaluate the accuracy, robustness, and computational efficiency of the studied schemes. The comparison highlights the trade-offs between resolution of contact and shear waves, robustness in the presence of shocks, and computational cost. The investigated low-dissipation schemes show comparable levels of numerical dissipation, with only subtle differences appearing in selected benchmark problems. The results provide practical guidance for selecting efficient low-dissipation solvers for the simulation of complex compressible flows.
- [505] arXiv:2602.03350 [pdf, ps, other]
-
Title: Manipulation via Force Distribution at ContactSubjects: Robotics (cs.RO)
Efficient and robust trajectories play a crucial role in contact-rich manipulation, which demands accurate mod- eling of object-robot interactions. Many existing approaches rely on point contact models due to their computational effi- ciency. Simple contact models are computationally efficient but inherently limited for achieving human-like, contact-rich ma- nipulation, as they fail to capture key frictional dynamics and torque generation observed in human manipulation. This study introduces a Force-Distributed Line Contact (FDLC) model in contact-rich manipulation and compares it against conventional point contact models. A bi-level optimization framework is constructed, in which the lower-level solves an optimization problem for contact force computation, and the upper-level optimization applies iLQR for trajectory optimization. Through this framework, the limitations of point contact are demon- strated, and the benefits of the FDLC in generating efficient and robust trajectories are established. The effectiveness of the proposed approach is validated by a box rotating task, demonstrating that FDLC enables trajectories generated via non-uniform force distributions along the contact line, while requiring lower control effort and less motion of the robot.
- [506] arXiv:2602.03351 [pdf, ps, other]
-
Title: Building Interpretable Models for Moral Decision-MakingComments: 8 pages, 4 figures, accepted to AAAI'26 Machine Ethics WorkshopSubjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
We build a custom transformer model to study how neural networks make moral decisions on trolley-style dilemmas. The model processes structured scenarios using embeddings that encode who is affected, how many people, and which outcome they belong to. Our 2-layer architecture achieves 77% accuracy on Moral Machine data while remaining small enough for detailed analysis. We use different interpretability techniques to uncover how moral reasoning distributes across the network, demonstrating that biases localize to distinct computational stages among other findings.
- [507] arXiv:2602.03352 [pdf, ps, other]
-
Title: PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement LearningSubjects: Computation and Language (cs.CL)
Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL remains challenged by noisy learning signals arising from Monte Carlo return estimation, as well as a large trajectory space that favors global exploration over fine-grained local optimization. We introduce \textbf{PEGRL}, a \textit{two-stage} RL framework that uses post-editing as an auxiliary task to stabilize training and guide overall optimization. At each iteration, translation outputs are sampled to construct post-editing inputs, allowing return estimation in the post-editing stage to benefit from conditioning on the current translation behavior, while jointly supporting both global exploration and fine-grained local optimization. A task-specific weighting scheme further balances the contributions of translation and post-editing objectives, yielding a biased yet more sample-efficient estimator. Experiments on English$\to$Finnish, English$\to$Turkish, and English$\leftrightarrow$Chinese show consistent gains over RL baselines, and for English$\to$Turkish, performance on COMET-KIWI is comparable to advanced LLM-based systems (DeepSeek-V3.2).
- [508] arXiv:2602.03353 [pdf, ps, other]
-
Title: Causal Graph Learning via Distributional Invariance of Cause-Effect RelationshipJournal-ref: Transactions on Machine Learning Research (Jan 2026)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper introduces a new framework for recovering causal graphs from observational data, leveraging the observation that the distribution of an effect, conditioned on its causes, remains invariant to changes in the prior distribution of those causes. This insight enables a direct test for potential causal relationships by checking the variance of their corresponding effect-cause conditional distributions across multiple downsampled subsets of the data. These subsets are selected to reflect different prior cause distributions, while preserving the effect-cause conditional relationships. Using this invariance test and exploiting an (empirical) sparsity of most causal graphs, we develop an algorithm that efficiently uncovers causal relationships with quadratic complexity in the number of observational variables, reducing the processing time by up to 25x compared to state-of-the-art methods. Our empirical experiments on a varied benchmark of large-scale datasets show superior or equivalent performance compared to existing works, while achieving enhanced scalability.
- [509] arXiv:2602.03354 [pdf, ps, other]
-
Title: QASM: A Novel Framework for QUIC-Aware Stateful MiddleboxesSubjects: Networking and Internet Architecture (cs.NI)
Stateful Middleboxes are integral part of enterprise and campus networks that provide essential in-network, security, and value-added services. These stateful middleboxes rely on precise network flow identification. However, the adoption of HTTP/3, which uses the QUIC protocol, poses significant challenges to the proper functioning of these devices. QUIC's encryption and connection migration features obscure flow semantics, disrupting middlebox visibility and functionality. We examine how QUIC disrupts middleboxes like Network Address Translators (NATs), Rate Limiters, Load Balancers, etc., and affects Kubernetes-based service deployments. To address these challenges, we propose a novel, generalized framework that enables stateful middleboxes to reliably track QUIC connections, even when the endpoints change their internet protocol (IP) address or port numbers. Our prototype implementation demonstrates that the proposed approach preserves middlebox functionality with HTTP/3 with negligible performance overhead (< 5%) on both throughput and latency, and works effectively even under high QUIC connection migration rates of up to 100 Hz.
- [510] arXiv:2602.03355 [pdf, ps, other]
-
Title: PACE: Pretrained Audio Continual LearningComments: Accepted at ICLR 2026Subjects: Sound (cs.SD); Machine Learning (cs.LG)
Audio is a fundamental modality for analyzing speech, music, and environmental sounds. Although pretrained audio models have significantly advanced audio understanding, they remain fragile in real-world settings where data distributions shift over time. In this work, we present the first systematic benchmark for audio continual learning (CL) with pretrained models (PTMs), together with a comprehensive analysis of its unique challenges. Unlike in vision, where parameter-efficient fine-tuning (PEFT) has proven effective for CL, directly transferring such strategies to audio leads to poor performance. This stems from a fundamental property of audio backbones: they focus on low-level spectral details rather than structured semantics, causing severe upstream-downstream misalignment. Through extensive empirical study, we identify analytic classifiers with first-session adaptation (FSA) as a promising direction, but also reveal two major limitations: representation saturation in coarse-grained scenarios and representation drift in fine-grained scenarios. To address these challenges, we propose PACE, a novel method that enhances FSA via a regularized analytic classifier and enables multi-session adaptation through adaptive subspace-orthogonal PEFT for improved semantic alignment. In addition, we introduce spectrogram-based boundary-aware perturbations to mitigate representation overlap and improve stability. Experiments on six diverse audio CL benchmarks demonstrate that PACE substantially outperforms state-of-the-art baselines, marking an important step toward robust and scalable audio continual learning with PTMs.
- [511] arXiv:2602.03357 [pdf, ps, other]
-
Title: Achieving Linear Speedup for Composite Federated LearningComments: 27 pages, 12 figuresSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
This paper proposes FedNMap, a normal map-based method for composite federated learning, where the objective consists of a smooth loss and a possibly nonsmooth regularizer. FedNMap leverages a normal map-based update scheme to handle the nonsmooth term and incorporates a local correction strategy to mitigate the impact of data heterogeneity across clients. Under standard assumptions, including smooth local losses, weak convexity of the regularizer, and bounded stochastic gradient variance, FedNMap achieves linear speedup with respect to both the number of clients $n$ and the number of local updates $Q$ for nonconvex losses, both with and without the Polyak-{\L}ojasiewicz (PL) condition. To our knowledge, this is the first result establishing linear speedup for nonconvex composite federated learning.
- [512] arXiv:2602.03358 [pdf, ps, other]
-
Title: GFlowPO: Generative Flow Network as a Language Model Prompt OptimizerAuthors: Junmo Cho, Suhan Kim, Sangjune An, Minsu Kim, Dong Bok Lee, Heejun Lee, Sung Ju Hwang, Hae Beom LeeSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Finding effective prompts for language models (LMs) is critical yet notoriously difficult: the prompt space is combinatorially large, rewards are sparse due to expensive target-LM evaluation. Yet, existing RL-based prompt optimizers often rely on on-policy updates and a meta-prompt sampled from a fixed distribution, leading to poor sample efficiency. We propose GFlowPO, a probabilistic prompt optimization framework that casts prompt search as a posterior inference problem over latent prompts regularized by a meta-prompted reference-LM prior. In the first step, we fine-tune a lightweight prompt-LM with an off-policy Generative Flow Network (GFlowNet) objective, using a replay-based training policy that reuses past prompt evaluations to enable sample-efficient exploration. In the second step, we introduce Dynamic Memory Update (DMU), a training-free mechanism that updates the meta-prompt by injecting both (i) diverse prompts from a replay buffer and (ii) top-performing prompts from a small priority queue, thereby progressively concentrating the search process on high-reward regions. Across few-shot text classification, instruction induction benchmarks, and question answering tasks, GFlowPO consistently outperforms recent discrete prompt optimization baselines.
- [513] arXiv:2602.03359 [pdf, ps, other]
-
Title: MeKi: Memory-based Expert Knowledge Injection for Efficient LLM ScalingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Scaling Large Language Models (LLMs) typically relies on increasing the number of parameters or test-time computations to boost performance. However, these strategies are impractical for edge device deployment due to limited RAM and NPU resources. Despite hardware constraints, deploying performant LLM on edge devices such as smartphone remains crucial for user experience. To address this, we propose MeKi (Memory-based Expert Knowledge Injection), a novel system that scales LLM capacity via storage space rather than FLOPs. MeKi equips each Transformer layer with token-level memory experts that injects pre-stored semantic knowledge into the generation process. To bridge the gap between training capacity and inference efficiency, we employ a re-parameterization strategy to fold parameter matrices used during training into a compact static lookup table. By offloading the knowledge to ROM, MeKi decouples model capacity from computational cost, introducing zero inference latency overhead. Extensive experiments demonstrate that MeKi significantly outperforms dense LLM baselines with identical inference speed, validating the effectiveness of memory-based scaling paradigm for on-device LLMs. Project homepage is at https://github.com/ningding-o/MeKi.
- [514] arXiv:2602.03361 [pdf, ps, other]
-
Title: Z3D: Zero-Shot 3D Visual Grounding from ImagesAuthors: Nikita Drozdov, Andrey Lemeshko, Nikita Gavrilov, Anton Konushin, Danila Rukhovich, Maksim KolodiazhnyiSubjects: Computer Vision and Pattern Recognition (cs.CV)
3D visual grounding (3DVG) aims to localize objects in a 3D scene based on natural language queries. In this work, we explore zero-shot 3DVG from multi-view images alone, without requiring any geometric supervision or object priors. We introduce Z3D, a universal grounding pipeline that flexibly operates on multi-view images while optionally incorporating camera poses and depth maps. We identify key bottlenecks in prior zero-shot methods causing significant performance degradation and address them with (i) a state-of-the-art zero-shot 3D instance segmentation method to generate high-quality 3D bounding box proposals and (ii) advanced reasoning via prompt-based segmentation, which utilizes full capabilities of modern VLMs. Extensive experiments on the ScanRefer and Nr3D benchmarks demonstrate that our approach achieves state-of-the-art performance among zero-shot methods. Code is available at https://github.com/col14m/z3d .
- [515] arXiv:2602.03363 [pdf, ps, other]
-
Title: Entropy Functions on Two-Dimensional Faces of Polymatroid Region with One Extreme Ray Containing Rank-One MatroidSubjects: Information Theory (cs.IT)
Characterization of entropy functions is of fundamental importance in information theory. By imposing constraints on their Shannon outer bound, i.e., the polymatroidal region, one obtains the faces of the region and entropy functions on them with special structures. In this paper, we characterize entropy functions on 2-dimensional faces of polymatroid region of degree n with one extreme ray containing rank-1 matroid. We classify all such 2-dimensional faces with another extreme ray containing a matroid into four types.
- [516] arXiv:2602.03367 [pdf, ps, other]
-
Title: Learning-based Adaptive Control of Quadruped Robots for Active Stabilization on Moving PlatformsComments: Accepted to IROS 2024. <a href="this https URL" rel="external noopener nofollow" class="link-external link-https">Project Page</a>Subjects: Robotics (cs.RO)
A quadruped robot faces balancing challenges on a six-degrees-of-freedom moving platform, like subways, buses, airplanes, and yachts, due to independent platform motions and resultant diverse inertia forces on the robot. To alleviate these challenges, we present the Learning-based Active Stabilization on Moving Platforms (\textit{LAS-MP}), featuring a self-balancing policy and system state estimators. The policy adaptively adjusts the robot's posture in response to the platform's motion. The estimators infer robot and platform states based on proprioceptive sensor data. For a systematic training scheme across various platform motions, we introduce platform trajectory generation and scheduling methods. Our evaluation demonstrates superior balancing performance across multiple metrics compared to three baselines. Furthermore, we conduct a detailed analysis of the \textit{LAS-MP}, including ablation studies and evaluation of the estimators, to validate the effectiveness of each component.
- [517] arXiv:2602.03368 [pdf, ps, other]
-
Title: Pursuing Best Industrial Practices for Retrieval-Augmented Generation in the Medical DomainAuthors: Wei ZhuSubjects: Computation and Language (cs.CL)
While retrieval augmented generation (RAG) has been swiftly adopted in industrial applications based on large language models (LLMs), there is no consensus on what are the best practices for building a RAG system in terms of what are the components, how to organize these components and how to implement each component for the industrial applications, especially in the medical domain. In this work, we first carefully analyze each component of the RAG system and propose practical alternatives for each component. Then, we conduct systematic evaluations on three types of tasks, revealing the best practices for improving the RAG system and how LLM-based RAG systems make trade-offs between performance and efficiency.
- [518] arXiv:2602.03370 [pdf, ps, other]
-
Title: Symbol-Aware Reasoning with Masked Discrete Diffusion for Handwritten Mathematical Expression RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Handwritten Mathematical Expression Recognition (HMER) requires reasoning over diverse symbols and 2D structural layouts, yet autoregressive models struggle with exposure bias and syntactic inconsistency. We present a discrete diffusion framework that reformulates HMER as iterative symbolic refinement instead of sequential generation. Through multi-step remasking, the proposal progressively refines both symbols and structural relations, removing causal dependencies and improving structural consistency. A symbol-aware tokenization and Random-Masking Mutual Learning further enhance syntactic alignment and robustness to handwriting diversity. On the MathWriting benchmark, the proposal achieves 5.56\% CER and 60.42\% EM, outperforming strong Transformer and commercial baselines. Consistent gains on CROHME 2014--2023 demonstrate that discrete diffusion provides a new paradigm for structure-aware visual recognition beyond generative modeling.
- [519] arXiv:2602.03371 [pdf, ps, other]
-
Title: Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene CompletionComments: 15 pages, 6 figures, accepted by TIP 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
Camera-based 3D semantic scene completion (SSC) offers a cost-effective solution for assessing the geometric occupancy and semantic labels of each voxel in the surrounding 3D scene with image inputs, providing a voxel-level scene perception foundation for the perception-prediction-planning autonomous driving systems. Although significant progress has been made in existing methods, their optimization rely solely on the supervision from voxel labels and face the challenge of voxel sparsity as a large portion of voxels in autonomous driving scenarios are empty, which limits both optimization efficiency and model performance. To address this issue, we propose a \textit{Multi-Resolution Alignment (MRA)} approach to mitigate voxel sparsity in camera-based 3D semantic scene completion, which exploits the scene and instance level alignment across multi-resolution 3D features as auxiliary supervision. Specifically, we first propose the Multi-resolution View Transformer module, which projects 2D image features into multi-resolution 3D features and aligns them at the scene level through fusing discriminative seed features. Furthermore, we design the Cubic Semantic Anisotropy module to identify the instance-level semantic significance of each voxel, accounting for the semantic differences of a specific voxel against its neighboring voxels within a cubic area. Finally, we devise a Critical Distribution Alignment module, which selects critical voxels as instance-level anchors with the guidance of cubic semantic anisotropy, and applies a circulated loss for auxiliary supervision on the critical feature distribution consistency across different resolutions. The code is available at https://github.com/PKU-ICST-MIPL/MRA_TIP.
- [520] arXiv:2602.03372 [pdf, ps, other]
-
Title: SLIM-Diff: Shared Latent Image-Mask Diffusion with Lp loss for Data-Scarce Epilepsy FLAIR MRIAuthors: Mario Pascual-González, Ariadna Jiménez-Partinen, R.M. Luque-Baena, Fátima Nagib-Raya, Ezequiel López-RubioComments: 6 pages, 2 figures, 1 table, conference paperSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Focal cortical dysplasia (FCD) lesions in epilepsy FLAIR MRI are subtle and scarce, making joint image--mask generative modeling prone to instability and memorization. We propose SLIM-Diff, a compact joint diffusion model whose main contributions are (i) a single shared-bottleneck U-Net that enforces tight coupling between anatomy and lesion geometry from a 2-channel image+mask representation, and (ii) loss-geometry tuning via a tunable $L_p$ objective. As an internal baseline, we include the canonical DDPM-style objective ($\epsilon$-prediction with $L_2$ loss) and isolate the effect of prediction parameterization and $L_p$ geometry under a matched setup. Experiments show that $x_0$-prediction is consistently the strongest choice for joint synthesis, and that fractional sub-quadratic penalties ($L_{1.5}$) improve image fidelity while $L_2$ better preserves lesion mask morphology. Our code and model weights are available in https://github.com/MarioPasc/slim-diff
- [521] arXiv:2602.03373 [pdf, ps, other]
-
Title: Unifying Watermarking via Dimension-Aware MappingComments: 29 pages, 25 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Deep watermarking methods often share similar encoder-decoder architectures, yet differ substantially in their functional behaviors. We propose DiM, a new multi-dimensional watermarking framework that formulates watermarking as a dimension-aware mapping problem, thereby unifying existing watermarking methods at the functional level. Under DiM, watermark information is modeled as payloads of different dimensionalities, including one-dimensional binary messages, two-dimensional spatial masks, and three-dimensional spatiotemporal structures. We find that the dimensional configuration of embedding and extraction largely determines the resulting watermarking behavior. Same-dimensional mappings preserve payload structure and support fine-grained control, while cross-dimensional mappings enable spatial or spatiotemporal localization. We instantiate DiM in the video domain, where spatiotemporal representations enable a broader set of dimension mappings. Experiments demonstrate that varying only the embedding and extraction dimensions, without architectural changes, leads to different watermarking capabilities, including spatiotemporal tamper localization, local embedding control, and recovery of temporal order under frame disruptions.
- [522] arXiv:2602.03374 [pdf, ps, other]
-
Title: How do people watch AI-generated videos of physical scenes?Subjects: Human-Computer Interaction (cs.HC)
The growing prevalence of realistic AI-generated videos on media platforms increasingly blurs the line between fact and fiction, eroding public trust. Understanding how people watch AI-generated videos offers a human-centered perspective for improving AI detection and guiding advancements in video generation. However, existing studies have not investigated human gaze behavior in response to AI-generated videos of physical scenes. Here, we collect and analyze the eye movements from 40 participants during video understanding and AI detection tasks involving a mix of real-world and AI-generated videos. We find that given the high realism of AI-generated videos, gaze behavior is driven less by the video's actual authenticity and more by the viewer's perception of its authenticity. Our results demonstrate that the mere awareness of potential AI generation may alter media consumption from passive viewing into an active search for anomalies.
- [523] arXiv:2602.03376 [pdf, ps, other]
-
Title: PlanTRansformer: Unified Prediction and Planning with Goal-conditioned TransformerComments: Submitted and accepted at IEEE IV 2026Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Trajectory prediction and planning are fundamental yet disconnected components in autonomous driving. Prediction models forecast surrounding agent motion under unknown intentions, producing multimodal distributions, while planning assumes known ego objectives and generates deterministic trajectories. This mismatch creates a critical bottleneck: prediction lacks supervision for agent intentions, while planning requires this information. Existing prediction models, despite strong benchmarking performance, often remain disconnected from planning constraints such as collision avoidance and dynamic feasibility. We introduce Plan TRansformer (PTR), a unified Gaussian Mixture Transformer framework integrating goal-conditioned prediction, dynamic feasibility, interaction awareness, and lane-level topology reasoning. A teacher-student training strategy progressively masks surrounding agent commands during training to align with inference conditions where agent intentions are unavailable. PTR achieves 4.3%/3.5% improvement in marginal/joint mAP compared to the baseline Motion Transformer (MTR) and 15.5% planning error reduction at 5s horizon compared to GameFormer. The architecture-agnostic design enables application to diverse Transformer-based prediction models. Project Website: https://github.com/SelzerConst/PlanTRansformer
- [524] arXiv:2602.03377 [pdf, ps, other]
-
Title: SEW: Strengthening Robustness of Black-box DNN Watermarking via Specificity EnhancementComments: Accepted by KDD 2026Subjects: Cryptography and Security (cs.CR)
To ensure the responsible distribution and use of open-source deep neural networks (DNNs), DNN watermarking has become a crucial technique to trace and verify unauthorized model replication or misuse. In practice, black-box watermarks manifest as specific predictive behaviors for specially crafted samples. However, due to the generalization nature of DNNs, the keys to extracting the watermark message are not unique, which would provide attackers with more opportunities. Advanced attack techniques can reverse-engineer approximate replacements for the original watermark keys, enabling subsequent watermark removal. In this paper, we explore black-box DNN watermarking specificity, which refers to the accuracy of a watermark's response to a key. Using this concept, we introduce Specificity-Enhanced Watermarking (SEW), a new method that improves specificity by reducing the association between the watermark and approximate keys. Through extensive evaluation using three popular watermarking benchmarks, we validate that enhancing specificity significantly contributes to strengthening robustness against removal attacks. SEW effectively defends against six state-of-the-art removal attacks, while maintaining model usability and watermark verification performance.
- [525] arXiv:2602.03379 [pdf, ps, other]
-
Title: Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning FailuresComments: Accepted at ICLR 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Machine unlearning aims to remove specific content from trained models while preserving overall performance. However, the phenomenon of benign relearning, in which forgotten information reemerges even from benign fine-tuning data, reveals that existing unlearning methods remain fundamentally fragile. A common explanation attributes this effect to topical relevance, but we find this account insufficient. Through systematic analysis, we demonstrate that syntactic similarity, rather than topicality, is the primary driver: across benchmarks, syntactically similar data consistently trigger recovery even without topical overlap, due to their alignment in representations and gradients with the forgotten content. Motivated by this insight, we introduce syntactic diversification, which paraphrases the original forget queries into heterogeneous structures prior to unlearning. This approach effectively suppresses benign relearning, accelerates forgetting, and substantially alleviates the trade-off between unlearning efficacy and model utility.
- [526] arXiv:2602.03380 [pdf, ps, other]
-
Title: Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference OptimizationAuthors: Hao Fang, Jinyu Li, Jiawei Kong, Tianqu Zhuang, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
While multimodal reasoning models (MLRMs) have exhibited impressive capabilities, they remain prone to hallucinations, and effective solutions are still underexplored. In this paper, we experimentally analyze the hallucination cause and propose C3PO, a training-based mitigation framework comprising \textbf{C}hain-of-Thought \textbf{C}ompression and \textbf{C}ontrastive \textbf{P}reference \textbf{O}ptimization. Firstly, we identify that introducing reasoning mechanisms exacerbates models' reliance on language priors while overlooking visual inputs, which can produce CoTs with reduced visual cues but redundant text tokens. To this end, we propose to selectively filter redundant thinking tokens for a more compact and signal-efficient CoT representation that preserves task-relevant information while suppressing noise. In addition, we observe that the quality of the reasoning trace largely determines whether hallucination emerges in subsequent responses. To leverage this insight, we introduce a reasoning-enhanced preference tuning scheme that constructs training pairs using high-quality AI feedback. We further design a multimodal hallucination-inducing mechanism that elicits models' inherent hallucination patterns via carefully crafted inducers, yielding informative negative signals for contrastive correction. We provide theoretical justification for the effectiveness and demonstrate consistent hallucination reduction across diverse MLRMs and benchmarks.
- [527] arXiv:2602.03381 [pdf, ps, other]
-
Title: Dynamic Programming for Epistemic Uncertainty in Markov Decision ProcessesSubjects: Computer Science and Game Theory (cs.GT)
In this paper, we propose a general theory of ambiguity-averse MDPs, which treats the uncertain transition probabilities as random variables and evaluates a policy via a risk measure applied to its random return. This ambiguity-averse MDP framework unifies several models of MDPs with epistemic uncertainty for specific choices of risk measures. We extend the concepts of value functions and Bellman operators to our setting. Based on these objects, we establish the consequences of dynamic programming principles in this framework (existence of stationary policies, value and policy iteration algorithms), and we completely characterize law-invariant risk measures compatible with dynamic programming. Our work draws connections among several variants of MDP models and fully delineates what is possible under the dynamic programming paradigm and which risk measures require leaving it.
- [528] arXiv:2602.03383 [pdf, ps, other]
-
Title: Dynamic Topology Optimization for Non-IID Data in Decentralized LearningComments: 10 pages, 11 figures. Accepted for publication in the 13th IEEE International Conference on Big Data (BigData 2025). To appearSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Decentralized learning (DL) enables a set of nodes to train a model collaboratively without central coordination, offering benefits for privacy and scalability. However, DL struggles to train a high accuracy model when the data distribution is non-independent and identically distributed (non-IID) and when the communication topology is static. To address these issues, we propose Morph, a topology optimization algorithm for DL. In Morph, nodes adaptively choose peers for model exchange based on maximum model dissimilarity. Morph maintains a fixed in-degree while dynamically reshaping the communication graph through gossip-based peer discovery and diversity-driven neighbor selection, thereby improving robustness to data heterogeneity. Experiments on CIFAR-10 and FEMNIST with up to 100 nodes show that Morph consistently outperforms static and epidemic baselines, while closely tracking the fully connected upper bound. On CIFAR-10, Morph achieves a relative improvement of 1.12x in test accuracy compared to the state-of-the-art baselines. On FEMNIST, Morph achieves an accuracy that is 1.08x higher than Epidemic Learning. Similar trends hold for 50 node deployments, where Morph narrows the gap to the fully connected upper bound within 0.5 percentage points on CIFAR-10. These results demonstrate that Morph achieves higher final accuracy, faster convergence, and more stable learning as quantified by lower inter-node variance, while requiring fewer communication rounds than baselines and no global knowledge.
- [529] arXiv:2602.03386 [pdf, ps, other]
-
Title: An Approximate Ascent Approach To Prove Convergence of PPOAuthors: Leif Doering, Daniel Schmidt, Moritz Melcher, Sebastian Kassing, Benedikt Wille, Tilman Aach, Simon WeissmannSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Proximal Policy Optimization (PPO) is among the most widely used deep reinforcement learning algorithms, yet its theoretical foundations remain incomplete. Most importantly, convergence and understanding of fundamental PPO advantages remain widely open. Under standard theory assumptions we show how PPO's policy update scheme (performing multiple epochs of minibatch updates on multi-use rollouts with a surrogate gradient) can be interpreted as approximated policy gradient ascent. We show how to control the bias accumulated by the surrogate gradients and use techniques from random reshuffling to prove a convergence theorem for PPO that sheds light on PPO's success. Additionally, we identify a previously overlooked issue in truncated Generalized Advantage Estimation commonly used in PPO. The geometric weighting scheme induces infinite mass collapse onto the longest $k$-step advantage estimator at episode boundaries. Empirical evaluations show that a simple weight correction can yield substantial improvements in environments with strong terminal signal, such as Lunar Lander.
- [530] arXiv:2602.03387 [pdf, ps, other]
-
Title: Toward a Sustainable Federated Learning Ecosystem: A Practical Least Core Mechanism for Payoff AllocationAuthors: Zhengwei Ni, Zhidu Li, Wei Chen, Zhaoyang Zhang, Zehua Wang, F. Richard Yu, Victor C. M. LeungComments: 7 pages, 3 figures, submitted to IEEE NetworkSubjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)
Emerging network paradigms and applications increasingly rely on federated learning (FL) to enable collaborative intelligence while preserving privacy. However, the sustainability of such collaborative environments hinges on a fair and stable payoff allocation mechanism. Focusing on coalition stability, this paper introduces a payoff allocation framework based on the least core (LC) concept. Unlike traditional methods, the LC prioritizes the cohesion of the federation by minimizing the maximum dissatisfaction among all potential subgroups, ensuring that no participant has an incentive to break away. To adapt this game-theoretic concept to practical, large-scale networks, we propose a streamlined implementation with a stack-based pruning algorithm, effectively balancing computational efficiency with allocation precision. Case studies in federated intrusion detection demonstrate that our mechanism correctly identifies pivotal contributors and strategic alliances. The results confirm that the practical LC framework promotes stable collaboration and fosters a sustainable FL ecosystem.
- [531] arXiv:2602.03389 [pdf, ps, other]
-
Title: Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RLComments: 22 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Offline goal-conditioned reinforcement learning remains challenging for long-horizon tasks. While hierarchical approaches mitigate this issue by decomposing tasks, most existing methods rely on separate high- and low-level networks and generate only a single intermediate subgoal, making them inadequate for complex tasks that require coordinating multiple intermediate decisions. To address this limitation, we draw inspiration from the chain-of-thought paradigm and propose the Chain-of-Goals Hierarchical Policy (CoGHP), a novel framework that reformulates hierarchical decision-making as autoregressive sequence modeling within a unified architecture. Given a state and a final goal, CoGHP autoregressively generates a sequence of latent subgoals followed by the primitive action, where each latent subgoal acts as a reasoning step that conditions subsequent predictions. To implement this efficiently, we pioneer the use of an MLP-Mixer backbone, which supports cross-token communication and captures structural relationships among state, goal, latent subgoals, and action. Across challenging navigation and manipulation benchmarks, CoGHP consistently outperforms strong offline baselines, demonstrating improved performance on long-horizon tasks.
- [532] arXiv:2602.03390 [pdf, ps, other]
-
Title: From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric LearningComments: ICLR 2026; Code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Unsupervised object-centric learning models, particularly slot-based architectures, have shown great promise in decomposing complex scenes. However, their reliance on reconstruction-based training creates a fundamental conflict between the sharp, high-frequency attention maps of the encoder and the spatially consistent but blurry reconstruction maps of the decoder. We identify that this discrepancy gives rise to a vicious cycle: the noisy feature map from the encoder forces the decoder to average over possibilities and produce even blurrier outputs, while the gradient computed from blurry reconstruction maps lacks high-frequency details necessary to supervise encoder features. To break this cycle, we introduce Synergistic Representation Learning (SRL) that establishes a virtuous cycle where the encoder and decoder mutually refine one another. SRL leverages the encoder's sharpness to deblur the semantic boundary within the decoder output, while exploiting the decoder's spatial consistency to denoise the encoder's features. This mutual refinement process is stabilized by a warm-up phase with a slot regularization objective that initially allocates distinct entities per slot. By bridging the representational gap between the encoder and decoder, SRL achieves state-of-the-art results on video object-centric learning benchmarks. Codes are available at https://github.com/hynnsk/SRL.
- [533] arXiv:2602.03392 [pdf, ps, other]
-
Title: On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language ModelsAuthors: Shumin Wang, Yuexiang Xie, Wenhao Zhang, Yuchang Sun, Yanxi Chen, Yaliang Li, Yanyong ZhangSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Entropy serves as a critical metric for measuring the diversity of outputs generated by large language models (LLMs), providing valuable insights into their exploration capabilities. While recent studies increasingly focus on monitoring and adjusting entropy to better balance exploration and exploitation in reinforcement fine-tuning (RFT), a principled understanding of entropy dynamics during this process is yet to be thoroughly investigated. In this paper, we establish a theoretical framework for analyzing the entropy dynamics during the RFT process, which begins with a discriminant expression that quantifies entropy change under a single logit update. This foundation enables the derivation of a first-order expression for entropy change, which can be further extended to the update formula of Group Relative Policy Optimization (GRPO). The corollaries and insights drawn from the theoretical analysis inspire the design of entropy control methods, and also offer a unified lens for interpreting various entropy-based methods in existing studies. We provide empirical evidence to support the main conclusions of our analysis and demonstrate the effectiveness of the derived entropy-discriminator clipping methods. This study yields novel insights into RFT training dynamics, providing theoretical support and practical strategies for optimizing the exploration-exploitation balance during LLM fine-tuning.
- [534] arXiv:2602.03395 [pdf, ps, other]
-
Title: The Label Horizon Paradox: Rethinking Supervision Targets in Financial ForecastingSubjects: Machine Learning (cs.LG)
While deep learning has revolutionized financial forecasting through sophisticated architectures, the design of the supervision signal itself is rarely scrutinized. We challenge the canonical assumption that training labels must strictly mirror inference targets, uncovering the Label Horizon Paradox: the optimal supervision signal often deviates from the prediction goal, shifting across intermediate horizons governed by market dynamics. We theoretically ground this phenomenon in a dynamic signal-noise trade-off, demonstrating that generalization hinges on the competition between marginal signal realization and noise accumulation. To operationalize this insight, we propose a bi-level optimization framework that autonomously identifies the optimal proxy label within a single training run. Extensive experiments on large-scale financial datasets demonstrate consistent improvements over conventional baselines, thereby opening new avenues for label-centric research in financial forecasting.
- [535] arXiv:2602.03396 [pdf, ps, other]
-
Title: Towards Distillation-Resistant Large Language Models: An Information-Theoretic PerspectiveAuthors: Hao Fang, Tianyi Zhang, Tianqu Zhuang, Jiawei Kong, Kuofeng Gao, Bin Chen, Leqi Liang, Shu-Tao Xia, Ke XuSubjects: Computation and Language (cs.CL)
Proprietary large language models (LLMs) embody substantial economic value and are generally exposed only as black-box APIs, yet adversaries can still exploit their outputs to extract knowledge via distillation. Existing defenses focus exclusively on text-based distillation, leaving the important logit-based distillation largely unexplored. In this work, we analyze this problem and present an effective solution from an information-theoretic perspective. We characterize distillation-relevant information in teacher outputs using the conditional mutual information (CMI) between teacher logits and input queries conditioned on ground-truth labels. This quantity captures contextual information beneficial for model extraction, motivating us to defend distillation via CMI minimization. Guided by our theoretical analysis, we propose learning a transformation matrix that purifies the original outputs to enhance distillation resistance. We further derive a CMI-inspired anti-distillation objective to optimize this transformation, which effectively removes distillation-relevant information while preserving output utility. Extensive experiments across multiple LLMs and strong distillation algorithms demonstrate that the proposed method significantly degrades distillation performance while preserving task accuracy, effectively protecting models' intellectual property.
- [536] arXiv:2602.03397 [pdf, ps, other]
-
Title: Enhancing Navigation Efficiency of Quadruped Robots via Leveraging Personal Transportation PlatformsComments: Accepted to ICRA 2025. <a href="this https URL" rel="external noopener nofollow" class="link-external link-https">Project Page</a>Subjects: Robotics (cs.RO)
Quadruped robots face limitations in long-range navigation efficiency due to their reliance on legs. To ameliorate the limitations, we introduce a Reinforcement Learning-based Active Transporter Riding method (\textit{RL-ATR}), inspired by humans' utilization of personal transporters, including Segways. The \textit{RL-ATR} features a transporter riding policy and two state estimators. The policy devises adequate maneuvering strategies according to transporter-specific control dynamics, while the estimators resolve sensor ambiguities in non-inertial frames by inferring unobservable robot and transporter states. Comprehensive evaluations in simulation validate proficient command tracking abilities across various transporter-robot models and reduced energy consumption compared to legged locomotion. Moreover, we conduct ablation studies to quantify individual component contributions within the \textit{RL-ATR}. This riding ability could broaden the locomotion modalities of quadruped robots, potentially expanding the operational range and efficiency.
- [537] arXiv:2602.03400 [pdf, ps, other]
-
Title: Precision in Practice: Knowledge Guided Code Summarizing Grounded in Industrial ExpectationsSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Code summaries are essential for helping developers understand code functionality and reducing maintenance and collaboration costs. Although recent advances in large language models (LLMs) have significantly improved automatic code summarization, the practical usefulness of generated summaries in industrial settings remains insufficiently explored. In collaboration with documentation experts from the industrial HarmonyOS project, we conducted a questionnaire study showing that over 57.4% of code summaries produced by state-of-the-art approaches were rejected due to violations of developers' expectations for industrial documentation. Beyond semantic similarity to reference summaries, developers emphasize additional requirements, including the use of appropriate domain terminology, explicit function categorization, and the avoidance of redundant implementation details.
To address these expectations, we propose ExpSum, an expectation-aware code summarization approach that integrates function metadata abstraction, informative metadata filtering, context-aware domain knowledge retrieval, and constraint-driven prompting to guide LLMs in generating structured, expectation-aligned summaries. We evaluate ExpSum on the HarmonyOS project and widely used code summarization benchmarks. Experimental results show that ExpSum consistently outperforms all baselines, achieving improvements of up to 26.71% in BLEU-4 and 20.10% in ROUGE-L on HarmonyOS. Furthermore, LLM-based evaluations indicate that ExpSum-generated summaries better align with developer expectations across other projects, demonstrating its effectiveness for industrial code documentation. - [538] arXiv:2602.03402 [pdf, ps, other]
-
Title: Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising UtilitySubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Vision language models (VLMs) extend the reasoning capabilities of large language models (LLMs) to cross-modal settings, yet remain highly vulnerable to multimodal jailbreak attacks. Existing defenses predominantly rely on safety fine-tuning or aggressive token manipulations, incurring substantial training costs or significantly degrading utility. Recent research shows that LLMs inherently recognize unsafe content in text, and the incorporation of visual inputs in VLMs frequently dilutes risk-related signals. Motivated by this, we propose Risk Awareness Injection (RAI), a lightweight and training-free framework for safety calibration that restores LLM-like risk recognition by amplifying unsafe signals in VLMs. Specifically, RAI constructs an Unsafe Prototype Subspace from language embeddings and performs targeted modulation on selected high-risk visual tokens, explicitly activating safety-critical signals within the cross-modal feature space. This modulation restores the model's LLM-like ability to detect unsafe content from visual inputs, while preserving the semantic integrity of original tokens for cross-modal reasoning. Extensive experiments across multiple jailbreak and utility benchmarks demonstrate that RAI substantially reduces attack success rate without compromising task performance.
- [539] arXiv:2602.03403 [pdf, ps, other]
-
Title: Feasible strategies for conflict resolution within intuitionistic fuzzy preference-based conflict situationsSubjects: Artificial Intelligence (cs.AI)
In three-way conflict analysis, preference-based conflict situations characterize agents' attitudes towards issues by formally modeling their preferences over pairs of issues. However, existing preference-based conflict models rely exclusively on three qualitative relations, namely, preference, converse, and indifference, to describe agents' attitudes towards issue pairs, which significantly limits their capacity in capturing the essence of conflict. To overcome this limitation, we introduce the concept of an intuitionistic fuzzy preference-based conflict situation that captures agents' attitudes towards issue pairs with finer granularity than that afforded by classical preference-based models. Afterwards, we develop intuitionistic fuzzy preference-based conflict measures within this framework, and construct three-way conflict analysis models for trisecting the set of agent pairs, the agent set, and the issue set. Additionally, relative loss functions built on the proposed conflict functions are employed to calculate thresholds for three-way conflict analysis. Finally, we present adjustment mechanism-based feasible strategies that simultaneously account for both adjustment magnitudes and conflict degrees, together with an algorithm for constructing such feasible strategies, and provide an illustrative example to demonstrate the validity and effectiveness of the proposed model.
- [540] arXiv:2602.03406 [pdf, ps, other]
-
Title: Deep-Learning-Based Control of a Decoupled Two-Segment Continuum Robot for Endoscopic Submucosal DissectionAuthors: Yuancheng Shao, Yao Zhang, Jia Gu, Zixi Chen, Di Wu, Yuqiao Chen, Bo Lu, Wenjie Liu, Cesare Stefanini, Peng QiSubjects: Robotics (cs.RO)
Manual endoscopic submucosal dissection (ESD) is technically demanding, and existing single-segment robotic tools offer limited dexterity. These limitations motivate the development of more advanced solutions. To address this, DESectBot, a novel dual segment continuum robot with a decoupled structure and integrated surgical forceps, enabling 6 degrees of freedom (DoFs) tip dexterity for improved lesion targeting in ESD, was developed in this work. Deep learning controllers based on gated recurrent units (GRUs) for simultaneous tip position and orientation control, effectively handling the nonlinear coupling between continuum segments, were proposed. The GRU controller was benchmarked against Jacobian based inverse kinematics, model predictive control (MPC), a feedforward neural network (FNN), and a long short-term memory (LSTM) network. In nested-rectangle and Lissajous trajectory tracking tasks, the GRU achieved the lowest position/orientation RMSEs: 1.11 mm/ 4.62{\deg} and 0.81 mm/ 2.59{\deg}, respectively. For orientation control at a fixed position (four target poses), the GRU attained a mean RMSE of 0.14 mm and 0.72{\deg}, outperforming all alternatives. In a peg transfer task, the GRU achieved a 100% success rate (120 success/120 attempts) with an average transfer time of 11.8s, the STD significantly outperforms novice-controlled systems. Additionally, an ex vivo ESD demonstration grasping, elevating, and resecting tissue as the scalpel completed the cut confirmed that DESectBot provides sufficient stiffness to divide thick gastric mucosa and an operative workspace adequate for large lesions.These results confirm that GRU-based control significantly enhances precision, reliability, and usability in ESD surgical training scenarios.
- [541] arXiv:2602.03407 [pdf, ps, other]
-
Title: Universal Costas Matrices: Towards a General Framework for Costas Array ConstructionComments: Accepted for IEEE International Conference on Communications (ICC) 2026Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
Costas arrays are a special type of permutation matrices with ideal autocorrelation and low cross-correlation properties, making them valuable for radar, wireless communication, and integrated sensing and communication applications. This paper presents a novel unified framework for analyzing and discovering new Costas arrays. We introduce Universal Costas Matrices (UCMs) and Universal Costas Frequency Matrices (UCFMs) and investigate their structural characteristics. A framework integrating UCMs and UCFMs is proposed to pave the way for future artificial intelligence-assisted Costas array discovery. Leveraging the structural properties of UCMs and UCFMs, a reconstruction-based search method is developed to generate UCMs from UCFMs. Numerical results demonstrate that the proposed approach significantly accelerates the search process and enhances structural insight into Costas array generation.
- [542] arXiv:2602.03410 [pdf, ps, other]
-
Title: UnHype: CLIP-Guided Hypernetworks for Dynamic LoRA UnlearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advances in large-scale diffusion models have intensified concerns about their potential misuse, particularly in generating realistic yet harmful or socially disruptive content. This challenge has spurred growing interest in effective machine unlearning, the process of selectively removing specific knowledge or concepts from a model without compromising its overall generative capabilities. Among various approaches, Low-Rank Adaptation (LoRA) has emerged as an effective and efficient method for fine-tuning models toward targeted unlearning. However, LoRA-based methods often exhibit limited adaptability to concept semantics and struggle to balance removing closely related concepts with maintaining generalization across broader meanings. Moreover, these methods face scalability challenges when multiple concepts must be erased simultaneously. To address these limitations, we introduce UnHype, a framework that incorporates hypernetworks into single- and multi-concept LoRA training. The proposed architecture can be directly plugged into Stable Diffusion as well as modern flow-based text-to-image models, where it demonstrates stable training behavior and effective concept control. During inference, the hypernetwork dynamically generates adaptive LoRA weights based on the CLIP embedding, enabling more context-aware, scalable unlearning. We evaluate UnHype across several challenging tasks, including object erasure, celebrity erasure, and explicit content removal, demonstrating its effectiveness and versatility. Repository: https://github.com/gmum/UnHype.
- [543] arXiv:2602.03411 [pdf, ps, other]
-
Title: SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-TrainingAuthors: Huatong Song, Lisheng Huang, Shuang Sun, Jinhao Jiang, Ran Le, Daixuan Cheng, Guoxin Chen, Yiwen Hu, Zongchao Chen, Wayne Xin Zhao, Yang Song, Tao Zhang, Ji-Rong WenSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
In this technical report, we present SWE-Master, an open-source and fully reproducible post-training framework for building effective software engineering agents. SWE-Master systematically explores the complete agent development pipeline, including teacher-trajectory synthesis and data curation, long-horizon SFT, RL with real execution feedback, and inference framework design. Starting from an open-source base model with limited initial SWE capability, SWE-Master demonstrates how systematical optimization method can elicit strong long-horizon SWE task solving abilities. We evaluate SWE-Master on SWE-bench Verified, a standard benchmark for realistic software engineering tasks. Under identical experimental settings, our approach achieves a resolve rate of 61.4\% with Qwen2.5-Coder-32B, substantially outperforming existing open-source baselines. By further incorporating test-time scaling~(TTS) with LLM-based environment feedback, SWE-Master reaches 70.8\% at TTS@8, demonstrating a strong performance potential. SWE-Master provides a practical and transparent foundation for advancing reproducible research on software engineering agents. The code is available at https://github.com/RUCAIBox/SWE-Master.
- [544] arXiv:2602.03412 [pdf, ps, other]
-
Title: Verified Critical Step Optimization for LLM AgentsAuthors: Mukai Li, Qingcheng Zeng, Tianqing Fang, Zhenwen Liang, Linfeng Song, Qi Liu, Haitao Mi, Dong YuComments: Working in progressSubjects: Computation and Language (cs.CL)
As large language model agents tackle increasingly complex long-horizon tasks, effective post-training becomes critical. Prior work faces fundamental challenges: outcome-only rewards fail to precisely attribute credit to intermediate steps, estimated step-level rewards introduce systematic noise, and Monte Carlo sampling approaches for step reward estimation incur prohibitive computational cost. Inspired by findings that only a small fraction of high-entropy tokens drive effective RL for reasoning, we propose Critical Step Optimization (CSO), which focuses preference learning on verified critical steps, decision points where alternate actions demonstrably flip task outcomes from failure to success. Crucially, our method starts from failed policy trajectories rather than expert demonstrations, directly targeting the policy model's weaknesses. We use a process reward model (PRM) to identify candidate critical steps, leverage expert models to propose high-quality alternatives, then continue execution from these alternatives using the policy model itself until task completion. Only alternatives that the policy successfully executes to correct outcomes are verified and used as DPO training data, ensuring both quality and policy reachability. This yields fine-grained, verifiable supervision at critical decisions while avoiding trajectory-level coarseness and step-level noise. Experiments on GAIA-Text-103 and XBench-DeepSearch show that CSO achieves 37% and 26% relative improvement over the SFT baseline and substantially outperforms other post-training methods, while requiring supervision at only 16% of trajectory steps. This demonstrates the effectiveness of selective verification-based learning for agent post-training.
- [545] arXiv:2602.03414 [pdf, ps, other]
-
Title: Socratic-Geo: Synthetic Data Generation and Geometric Reasoning via Multi-Agent InteractionComments: 18pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Multimodal Large Language Models (MLLMs) have significantly advanced vision-language understanding. However, even state-of-the-art models struggle with geometric reasoning, revealing a critical bottleneck: the extreme scarcity of high-quality image-text pairs. Human annotation is prohibitively expensive, while automated methods fail to ensure fidelity and training effectiveness. Existing approaches either passively adapt to available images or employ inefficient random exploration with filtering, decoupling generation from learning needs. We propose Socratic-Geo, a fully autonomous framework that dynamically couples data synthesis with model learning through multi-agent interaction. The Teacher agent generates parameterized Python scripts with reflective feedback (Reflect for solvability, RePI for visual validity), ensuring image-text pair purity. The Solver agent optimizes reasoning through preference learning, with failure paths guiding Teacher's targeted augmentation. Independently, the Generator learns image generation capabilities on accumulated "image-code-instruction" triplets, distilling programmatic drawing intelligence into visual generation. Starting from only 108 seed problems, Socratic-Solver achieves 49.11 on six benchmarks using one-quarter of baseline data, surpassing strong baselines by 2.43 points. Socratic-Generator achieves 42.4% on GenExam, establishing new state-of-the-art for open-source models, surpassing Seedream-4.0 (39.8%) and approaching Gemini-2.5-Flash-Image (43.1%).
- [546] arXiv:2602.03415 [pdf, ps, other]
-
Title: Most Convolutional Networks Suffer from Small Adversarial PerturbationsSubjects: Machine Learning (cs.LG)
The existence of adversarial examples is relatively understood for random fully connected neural networks, but much less so for convolutional neural networks (CNNs). The recent work [Daniely, 2025] establishes that adversarial examples can be found in CNNs, in some non-optimal distance from the input. We extend over this work and prove that adversarial examples in random CNNs with input dimension $d$ can be found already in $\ell_2$-distance of order $\lVert x \rVert /\sqrt{d}$ from the input $x$, which is essentially the nearest possible. We also show that such adversarial small perturbations can be found using a single step of gradient descent. To derive our results we use Fourier decomposition to efficiently bound the singular values of a random linear convolutional operator, which is the main ingredient of a CNN layer. This bound might be of independent interest.
- [547] arXiv:2602.03416 [pdf, ps, other]
-
Title: AesRec: A Dataset for Aesthetics-Aligned Clothing Outfit RecommendationSubjects: Information Retrieval (cs.IR)
Clothing recommendation extends beyond merely generating personalized outfits; it serves as a crucial medium for aesthetic guidance. However, existing methods predominantly rely on user-item-outfit interaction behaviors while overlooking explicit representations of clothing aesthetics. To bridge this gap, we present the AesRec benchmark dataset featuring systematic quantitative aesthetic annotations, thereby enabling the development of aesthetics-aligned recommendation systems. Grounded in professional apparel quality standards and fashion aesthetic principles, we define a multidimensional set of indicators. At the item level, six dimensions are independently assessed: silhouette, chromaticity, materiality, craftsmanship, wearability, and item-level impression. Transitioning to the outfit level, the evaluation retains the first five core attributes while introducing stylistic synergy, visual harmony, and outfit-level impression as distinct metrics to capture the collective aesthetic impact. Given the increasing human-like proficiency of Vision-Language Models in multimodal understanding and interaction, we leverage them for large-scale aesthetic scoring. We conduct rigorous human-machine consistency validation on a fashion dataset, confirming the reliability of the generated ratings. Experimental results based on AesRec further demonstrate that integrating quantified aesthetic information into clothing recommendation models can provide aesthetic guidance for users while fulfilling their personalized requirements.
- [548] arXiv:2602.03417 [pdf, ps, other]
-
Title: FactNet: A Billion-Scale Knowledge Graph for Multilingual Factual GroundingAuthors: Yingli Shen, Wen Lai, Jie Zhou, Xueren Zhang, Yudong Wang, Kangyang Luo, Shuo Wang, Ge Gao, Alexander Fraser, Maosong SunSubjects: Computation and Language (cs.CL)
While LLMs exhibit remarkable fluency, their utility is often compromised by factual hallucinations and a lack of traceable provenance. Existing resources for grounding mitigate this but typically enforce a dichotomy: they offer either structured knowledge without textual context (e.g., knowledge bases) or grounded text with limited scale and linguistic coverage. To bridge this gap, we introduce FactNet, a massive, open-source resource designed to unify 1.7 billion atomic assertions with 3.01 billion auditable evidence pointers derived exclusively from 316 Wikipedia editions. Unlike recent synthetic approaches, FactNet employs a strictly deterministic construction pipeline, ensuring that every evidence unit is recoverable with byte-level precision. Extensive auditing confirms a high grounding precision of 92.1%, even in long-tail languages. Furthermore, we establish FactNet-Bench, a comprehensive evaluation suite for Knowledge Graph Completion, Question Answering, and Fact Checking. FactNet provides the community with a foundational, reproducible resource for training and evaluating trustworthy, verifiable multilingual systems.
- [549] arXiv:2602.03418 [pdf, ps, other]
-
Title: Learning-based Initialization of Trajectory Optimization for Path-following Problems of Redundant ManipulatorsComments: Accepted to ICRA 2023. <a href="this https URL" rel="external noopener nofollow" class="link-external link-https">Project Page</a>Subjects: Robotics (cs.RO)
Trajectory optimization (TO) is an efficient tool to generate a redundant manipulator's joint trajectory following a 6-dimensional Cartesian path. The optimization performance largely depends on the quality of initial trajectories. However, the selection of a high-quality initial trajectory is non-trivial and requires a considerable time budget due to the extremely large space of the solution trajectories and the lack of prior knowledge about task constraints in configuration space. To alleviate the issue, we present a learning-based initial trajectory generation method that generates high-quality initial trajectories in a short time budget by adopting example-guided reinforcement learning. In addition, we suggest a null-space projected imitation reward to consider null-space constraints by efficiently learning kinematically feasible motion captured in expert demonstrations. Our statistical evaluation in simulation shows the improved optimality, efficiency, and applicability of TO when we plug in our method's output, compared with three other baselines. We also show the performance improvement and feasibility via real-world experiments with a seven-degree-of-freedom manipulator.
- [550] arXiv:2602.03419 [pdf, ps, other]
-
Title: SWE-World: Building Software Engineering Agents in Docker-Free EnvironmentsAuthors: Shuang Sun, Huatong Song, Lisheng Huang, Jinhao Jiang, Ran Le, Zhihao Lv, Zongchao Chen, Yiwen Hu, Wenyang Luo, Wayne Xin Zhao, Yang Song, Hongteng Xu, Tao Zhang, Ji-Rong WenSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
Recent advances in large language models (LLMs) have enabled software engineering agents to tackle complex code modification tasks. Most existing approaches rely on execution feedback from containerized environments, which require dependency-complete setup and physical execution of programs and tests. While effective, this paradigm is resource-intensive and difficult to maintain, substantially complicating agent training and limiting scalability. We propose SWE-World, a Docker-free framework that replaces physical execution environments with a learned surrogate for training and evaluating software engineering agents. SWE-World leverages LLM-based models trained on real agent-environment interaction data to predict intermediate execution outcomes and final test feedback, enabling agents to learn without interacting with physical containerized environments. This design preserves the standard agent-environment interaction loop while eliminating the need for costly environment construction and maintenance during agent optimization and evaluation. Furthermore, because SWE-World can simulate the final evaluation outcomes of candidate trajectories without real submission, it enables selecting the best solution among multiple test-time attempts, thereby facilitating effective test-time scaling (TTS) in software engineering tasks. Experiments on SWE-bench Verified demonstrate that SWE-World raises Qwen2.5-Coder-32B from 6.2\% to 52.0\% via Docker-free SFT, 55.0\% with Docker-free RL, and 68.2\% with further TTS. The code is available at https://github.com/RUCAIBox/SWE-World
- [551] arXiv:2602.03420 [pdf, ps, other]
-
Title: CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation SteeringSubjects: Sound (cs.SD); Machine Learning (cs.LG)
Emotional expression in human speech is nuanced and compositional, often involving multiple, sometimes conflicting, affective cues that may diverge from linguistic content. In contrast, most expressive text-to-speech systems enforce a single utterance-level emotion, collapsing affective diversity and suppressing mixed or text-emotion-misaligned expression. While activation steering via latent direction vectors offers a promising solution, it remains unclear whether emotion representations are linearly steerable in TTS, where steering should be applied within hybrid TTS architectures, and how such complex emotion behaviors should be evaluated. This paper presents the first systematic analysis of activation steering for emotional control in hybrid TTS models, introducing a quantitative, controllable steering framework, and multi-rater evaluation protocols that enable composable mixed-emotion synthesis and reliable text-emotion mismatch synthesis. Our results demonstrate, for the first time, that emotional prosody and expressive variability are primarily synthesized by the TTS language module instead of the flow-matching module, and also provide a lightweight steering approach for generating natural, human-like emotional speech.
- [552] arXiv:2602.03421 [pdf, ps, other]
-
Title: On (Im)possibility of Network Oblivious Transfer via Noisy Channels and Non-Signaling CorrelationsSubjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
This work investigates the fundamental limits of implementing network oblivious transfer via noisy multiple access channels and broadcast channels between honest-but-curious parties when the parties have access to general tripartite non-signaling correlations. By modeling the shared resource as an arbitrary tripartite non-signaling box, we obtain a unified perspective on both the channel behavior and the resulting correlations. Our main result demonstrates that perfect oblivious transfer is impossible. In the asymptotic regime, we further show that even negligible leakage cannot be achieved, as repeated use of the resource amplifies the receiver(s)'s ability to distinguish messages that were not intended for him/them. In contrast, the receiver(s)'s own privacy is not subject to a universal impossibility limitation.
- [553] arXiv:2602.03422 [pdf, ps, other]
-
Title: RankSteer: Activation Steering for Pointwise LLM RankingSubjects: Information Retrieval (cs.IR)
Large language models (LLMs) have recently shown strong performance as zero-shot rankers, yet their effectiveness is highly sensitive to prompt formulation, particularly role-play instructions. Prior analyses suggest that role-related signals are encoded along activation channels that are largely separate from query-document representations, raising the possibility of steering ranking behavior directly at the activation level rather than through brittle prompt engineering. In this work, we propose RankSteer, a post-hoc activation steering framework for zero-shot pointwise LLM ranking. We characterize ranking behavior through three disentangled and steerable directions in representation space: a \textbf{decision direction} that maps hidden states to relevance scores, an \textbf{evidence direction} that captures relevance signals not directly exploited by the decision head, and a \textbf{role direction} that modulates model behavior without injecting relevance information. Using projection-based interventions at inference time, RankSteer jointly controls these directions to calibrate ranking behavior without modifying model weights or introducing explicit cross-document comparisons. Experiments on TREC DL 20 and multiple BEIR benchmarks show that RankSteer consistently improves ranking quality using only a small number of anchor queries, demonstrating that substantial ranking capacity remains under-utilized in pointwise LLM rankers. We further provide a geometric analysis revealing that steering improves ranking by stabilizing ranking geometry and reducing dispersion, offering new insight into how LLMs internally represent and calibrate relevance judgments.
- [554] arXiv:2602.03423 [pdf, ps, other]
-
Title: Origin Lens: A Privacy-First Mobile Framework for Cryptographic Image Provenance and AI DetectionAuthors: Alexander Loth, Dominique Conceicao Rosario, Peter Ebinger, Martin Kappes, Marc-Oliver PahlComments: Accepted at ACM TheWebConf '26 CompanionSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
The proliferation of generative AI poses challenges for information integrity assurance, requiring systems that connect model governance with end-user verification. We present Origin Lens, a privacy-first mobile framework that targets visual disinformation through a layered verification architecture. Unlike server-side detection systems, Origin Lens performs cryptographic image provenance verification and AI detection locally on the device via a Rust/Flutter hybrid architecture. Our system integrates multiple signals - including cryptographic provenance, generative model fingerprints, and optional retrieval-augmented verification - to provide users with graded confidence indicators at the point of consumption. We discuss the framework's alignment with regulatory requirements (EU AI Act, DSA) and its role in verification infrastructure that complements platform-level mechanisms.
- [555] arXiv:2602.03425 [pdf, ps, other]
-
Title: ConsistentRFT: Reducing Visual Hallucinations in Flow-based Reinforcement Fine-TuningAuthors: Xiaofeng Tan, Jun Liu, Yuanting Fan, Bin-Bin Gao, Xi Jiang, Xiaochen Chen, Jinlong Peng, Chengjie Wang, Hongsong Wang, Feng ZhengSubjects: Computer Vision and Pattern Recognition (cs.CV)
Reinforcement Fine-Tuning (RFT) on flow-based models is crucial for preference alignment. However, they often introduce visual hallucinations like over-optimized details and semantic misalignment. This work preliminarily explores why visual hallucinations arise and how to reduce them. We first investigate RFT methods from a unified perspective, and reveal the core problems stemming from two aspects, exploration and exploitation: (1) limited exploration during stochastic differential equation (SDE) rollouts, leading to an over-emphasis on local details at the expense of global semantics, and (2) trajectory imitation process inherent in policy gradient methods, distorting the model's foundational vector field and its cross-step consistency. Building on this, we propose ConsistentRFT, a general framework to mitigate these hallucinations. Specifically, we design a Dynamic Granularity Rollout (DGR) mechanism to balance exploration between global semantics and local details by dynamically scheduling different noise sources. We then introduce a Consistent Policy Gradient Optimization (CPGO) that preserves the model's consistency by aligning the current policy with a more stable prior. Extensive experiments demonstrate that ConsistentRFT significantly mitigates visual hallucinations, achieving average reductions of 49\% for low-level and 38\% for high-level perceptual hallucinations. Furthermore, ConsistentRFT outperforms other RFT methods on out-of-domain metrics, showing an improvement of 5.1\% (v.s. the baseline's decrease of -0.4\%) over FLUX1.dev. This is \href{https://xiaofeng-tan.github.io/projects/ConsistentRFT}{Project Page}.
- [556] arXiv:2602.03428 [pdf, ps, other]
-
Title: On singular Galerkin discretizations for three models in high-frequency scatteringSubjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
We consider three common mathematical models for time-harmonic high frequency scattering: the Helmholtz equation in two and three spatial dimensions, a transverse magnetic problem in two dimensions, and Maxwell's equation in three dimensions with dissipative boundary conditions such that the continuous problem is well posed. In this paper, we construct meshes for popular (low order) Galerkin finite element discretizations such that the discrete system matrix becomes singular and the discrete problem is not well posed. This implies that a condition "the finite element space has to be sufficiently rich" in the form of a resolution condition - typically imposed for discrete well-posedness - is not an artifact from the proof by a compact perturbation argument but necessary for discrete stability of the Galerkin discretization.
- [557] arXiv:2602.03429 [pdf, ps, other]
-
Title: DiscoverLLM: From Executing Intents to Discovering ThemSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
To handle ambiguous and open-ended requests, Large Language Models (LLMs) are increasingly trained to interact with users to surface intents they have not yet expressed (e.g., ask clarification questions). However, users are often ambiguous because they have not yet formed their intents: they must observe and explore outcomes to discover what they want. Simply asking "what kind of tone do you want?" fails when users themselves do not know. We introduce DiscoverLLM, a novel and generalizable framework that trains LLMs to help users form and discover their intents. Central to our approach is a novel user simulator that models cognitive state with a hierarchy of intents that progressively concretize as the model surfaces relevant options -- where the degree of concretization serves as a reward signal that models can be trained to optimize. Resulting models learn to collaborate with users by adaptively diverging (i.e., explore options) when intents are unclear, and converging (i.e., refine and implement) when intents concretize. Across proposed interactive benchmarks in creative writing, technical writing, and SVG drawing, DiscoverLLM achieves over 10% higher task performance while reducing conversation length by up to 40%. In a user study with 75 human participants, DiscoverLLM improved conversation satisfaction and efficiency compared to baselines.
- [558] arXiv:2602.03430 [pdf, ps, other]
-
Title: ProAct: A Benchmark and Multimodal Framework for Structure-Aware Proactive ResponseAuthors: Xiaomeng Zhu, Fengming Zhu, Weijie Zhou, Ye Tian, Zhenlin Hu, Yufei Huang, Yuchun Guo, Xinyu Wu, Zhengyou Zhang, Fangzhen Lin, Xuantang XiongSubjects: Robotics (cs.RO)
While passive agents merely follow instructions, proactive agents align with higher-level objectives, such as assistance and safety by continuously monitoring the environment to determine when and how to act. However, developing proactive agents is hindered by the lack of specialized resources. To address this, we introduce ProAct-75, a benchmark designed to train and evaluate proactive agents across diverse domains, including assistance, maintenance, and safety monitoring. Spanning 75 tasks, our dataset features 91,581 step-level annotations enriched with explicit task graphs. These graphs encode step dependencies and parallel execution possibilities, providing the structural grounding necessary for complex decision-making. Building on this benchmark, we propose ProAct-Helper, a reference baseline powered by a Multimodal Large Language Model (MLLM) that grounds decision-making in state detection, and leveraging task graphs to enable entropy-driven heuristic search for action selection, allowing agents to execute parallel threads independently rather than mirroring the human's next step. Extensive experiments demonstrate that ProAct-Helper outperforms strong closed-source models, improving trigger detection mF1 by 6.21%, saving 0.25 more steps in online one-step decision, and increasing the rate of parallel actions by 15.58%.
- [559] arXiv:2602.03432 [pdf, ps, other]
-
Title: Failure is Feedback: History-Aware Backtracking for Agentic Traversal in Multimodal GraphsComments: Project page: this https URLSubjects: Information Retrieval (cs.IR)
Open-domain multimodal document retrieval aims to retrieve specific components (paragraphs, tables, or images) from large and interconnected document corpora. Existing graph-based retrieval approaches typically rely on a uniform similarity metric that overlooks hop-specific semantics, and their rigid pre-defined plans hinder dynamic error correction. These limitations suggest that a retriever should adapt its reasoning to the evolving context and recover intelligently from dead ends. To address these needs, we propose Failure is Feedback (FiF), which casts subgraph retrieval as a sequential decision process and introduces two key innovations. (i) We introduce a history-aware backtracking mechanism; unlike standard backtracking that simply reverts the state, our approach piggybacks on the context of failed traversals, leveraging insights from previous failures. (ii) We implement an economically-rational agentic workflow. Unlike conventional agents with static strategies, our orchestrator employs a cost-aware traversal method to dynamically manage the trade-off between retrieval accuracy and inference costs, escalating to intensive LLM-based reasoning only when the prior failure justifies the additional computational investment. Extensive experiments show that FiF achieves state-of-the-art retrieval on the benchmarks of MultimodalQA, MMCoQA and WebQA.
- [560] arXiv:2602.03433 [pdf, ps, other]
-
Title: When control meets large language models: From words to dynamicsSubjects: Systems and Control (eess.SY)
While large language models (LLMs) are transforming engineering and technology through enhanced control capabilities and decision support, they are simultaneously evolving into complex dynamical systems whose behavior must be regulated. This duality highlights a reciprocal connection in which prompts support control system design while control theory helps shape prompts to achieve specific goals efficiently. In this study, we frame this emerging interconnection of LLM and control as a bidirectional continuum, from prompt design to system dynamics. First, we investigate how LLMs can advance the field of control in two distinct capacities: directly, by assisting in the design and synthesis of controllers, and indirectly, by augmenting research workflows. Second, we examine how control concepts help LLMs steer their trajectories away from undesired meanings, improving reachability and alignment via input optimization, parameter editing, and activation-level interventions. Third, we look into deeper integrations by treating LLMs as dynamic systems within a state-space framework, where their internal representations are closely linked to external control loops. Finally, we identify key challenges and outline future research directions to understand LLM behavior and develop interpretable and controllable LLMs that are as trustworthy and robust as their electromechanical counterparts, thereby ensuring they continue to support and safeguard society.
- [561] arXiv:2602.03435 [pdf, ps, other]
-
Title: Model-based Optimal Control for Rigid-Soft Underactuated SystemsAuthors: Daniele Caradonna, Nikhil Nair, Anup Teejo Mathew, Daniel Feliu Talegón, Imran Afgan, Egidio Falotico, Cosimo Della Santina, Federico RendaSubjects: Robotics (cs.RO)
Continuum soft robots are inherently underactuated and subject to intrinsic input constraints, making dynamic control particularly challenging, especially in hybrid rigid-soft robots. While most existing methods focus on quasi-static behaviors, dynamic tasks such as swing-up require accurate exploitation of continuum dynamics. This has led to studies on simple low-order template systems that often fail to capture the complexity of real continuum deformations. Model-based optimal control offers a systematic solution; however, its application to rigid-soft robots is often limited by the computational cost and inaccuracy of numerical differentiation for high-dimensional models. Building on recent advances in the Geometric Variable Strain model that enable analytical derivatives, this work investigates three optimal control strategies for underactuated soft systems-Direct Collocation, Differential Dynamic Programming, and Nonlinear Model Predictive Control-to perform dynamic swing-up tasks. To address stiff continuum dynamics and constrained actuation, implicit integration schemes and warm-start strategies are employed to improve numerical robustness and computational efficiency. The methods are evaluated in simulation on three Rigid-Soft and high-order soft benchmark systems-the Soft Cart-Pole, the Soft Pendubot, and the Soft Furuta Pendulum- highlighting their performance and computational trade-offs.
- [562] arXiv:2602.03436 [pdf, ps, other]
-
Title: On the Complexity of Maximal/Closed Frequent Tree Mining for Bounded Height TreesSubjects: Data Structures and Algorithms (cs.DS)
In this paper, we address the problem of enumerating all frequent maximal/closed trees. This is a classical and central problem in data mining. Although many practical algorithms have been developed for this problem, its complexity under ``realistic assumptions'' on tree height has not been clarified. More specifically, while it was known that the mining problem becomes hard when the tree height is at least 60, the complexity for cases where the tree height is smaller has not yet been clarified. We resolve this gap by establishing results for these tree mining problems under several settings, including ordered and unordered trees, as well as maximal and closed variants.
- [563] arXiv:2602.03439 [pdf, ps, other]
-
Title: Ontology-to-tools compilation for executable semantic constraint enforcement in LLM agentsAuthors: Xiaochi Zhou, Patrick Bulter, Changxuan Yang, Simon D. Rihm, Thitikarn Angkanaporn, Jethro Akroyd, Sebastian Mosbach, Markus KraftSubjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
We introduce ontology-to-tools compilation as a proof-of-principle mechanism for coupling large language models (LLMs) with formal domain knowledge. Within The World Avatar (TWA), ontological specifications are compiled into executable tool interfaces that LLM-based agents must use to create and modify knowledge graph instances, enforcing semantic constraints during generation rather than through post-hoc validation. Extending TWA's semantic agent composition framework, the Model Context Protocol (MCP) and associated agents are integral components of the knowledge graph ecosystem, enabling structured interaction between generative models, symbolic constraints, and external resources. An agent-based workflow translates ontologies into ontology-aware tools and iteratively applies them to extract, validate, and repair structured knowledge from unstructured scientific text. Using metal-organic polyhedra synthesis literature as an illustrative case, we show how executable ontological semantics can guide LLM behaviour and reduce manual schema and prompt engineering, establishing a general paradigm for embedding formal knowledge into generative systems.
- [564] arXiv:2602.03442 [pdf, ps, other]
-
Title: A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval InterfacesComments: 18 pages, 8 figuresSubjects: Computation and Language (cs.CL)
Frontier language models have demonstrated strong reasoning and long-horizon tool-use capabilities. However, existing RAG systems fail to leverage these capabilities. They still rely on two paradigms: (1) designing an algorithm that retrieves passages in a single shot and concatenates them into the model's input, or (2) predefining a workflow and prompting the model to execute it step-by-step. Neither paradigm allows the model to participate in retrieval decisions, preventing efficient scaling with model improvements. In this paper, we introduce A-RAG, an Agentic RAG framework that exposes hierarchical retrieval interfaces directly to the model. A-RAG provides three retrieval tools: keyword search, semantic search, and chunk read, enabling the agent to adaptively search and retrieve information across multiple granularities. Experiments on multiple open-domain QA benchmarks show that A-RAG consistently outperforms existing approaches with comparable or lower retrieved tokens, demonstrating that A-RAG effectively leverages model capabilities and dynamically adapts to different RAG tasks. We further systematically study how A-RAG scales with model size and test-time compute. We will release our code and evaluation suite to facilitate future research. Code and evaluation suite are available at https://github.com/Ayanami0730/arag.
- [565] arXiv:2602.03444 [pdf, ps, other]
-
Title: Exploiting Multi-Core Parallelism in Blockchain Validation and ConstructionSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Blockchain validators can reduce block processing time by exploiting multi-core CPUs, but deterministic execution must preserve a given total order while respecting transaction conflicts and per-block runtime limits. This paper systematically examines how validators can exploit multi-core parallelism during both block construction and execution without violating blockchain semantics. We formalize two validator-side optimization problems: (i) executing an already ordered block on \(p\) cores to minimize makespan while ensuring equivalence to sequential execution; and (ii) selecting and scheduling a subset of mempool transactions under a runtime limit \(B\) to maximize validator reward. For both, we develop exact Mixed-Integer Linear Programming (MILP) formulations that capture conflict, order, and capacity constraints, and propose fast deterministic heuristics that scale to realistic workloads. Using Ethereum mainnet traces and including a Solana-inspired declared-access baseline (Sol) for ordered-block scheduling and a simple reward-greedy baseline (RG) for block construction, we empirically quantify the trade-offs between optimality and runtime.
- [566] arXiv:2602.03445 [pdf, ps, other]
-
Title: CRL-VLA: Continual Vision-Language-Action LearningAuthors: Qixin Zeng, Shuo Zhang, Hongyin Zhang, Renjie Wang, Han Zhao, Libang Zhao, Runze Li, Donglin Wang, Chao HuangSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Lifelong learning is critical for embodied agents in open-world environments, where reinforcement learning fine-tuning has emerged as an important paradigm to enable Vision-Language-Action (VLA) models to master dexterous manipulation through environmental interaction. Thus, Continual Reinforcement Learning (CRL) is a promising pathway for deploying VLA models in lifelong robotic scenarios, yet balancing stability (retaining old skills) and plasticity (learning new ones) remains a formidable challenge for existing methods. We introduce CRL-VLA, a framework for continual post-training of VLA models with rigorous theoretical bounds. We derive a unified performance bound linking the stability-plasticity trade-off to goal-conditioned advantage magnitude, scaled by policy divergence. CRL-VLA resolves this dilemma via asymmetric regulation: constraining advantage magnitudes on prior tasks while enabling controlled growth on new tasks. This is realized through a simple but effective dual-critic architecture with novel Goal-Conditioned Value Formulation (GCVF), where a frozen critic anchors semantic consistency and a trainable estimator drives adaptation. Experiments on the LIBERO benchmark demonstrate that CRL-VLA effectively harmonizes these conflicting objectives, outperforming baselines in both anti-forgetting and forward adaptation.
- [567] arXiv:2602.03447 [pdf, ps, other]
-
Title: HetroD: A High-Fidelity Drone Dataset and Benchmark for Autonomous Driving in Heterogeneous TrafficAuthors: Yu-Hsiang Chen, Wei-Jer Chang, Christian Kotulla, Thomas Keutgens, Steffen Runde, Tobias Moers, Christoph Klas, Wei Zhan, Masayoshi Tomizuka, Yi-Ting ChenComments: IEEE International Conference on Robotics and Automation (ICRA) 2026Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
We present HetroD, a dataset and benchmark for developing autonomous driving systems in heterogeneous environments. HetroD targets the critical challenge of navi- gating real-world heterogeneous traffic dominated by vulner- able road users (VRUs), including pedestrians, cyclists, and motorcyclists that interact with vehicles. These mixed agent types exhibit complex behaviors such as hook turns, lane splitting, and informal right-of-way negotiation. Such behaviors pose significant challenges for autonomous vehicles but remain underrepresented in existing datasets focused on structured, lane-disciplined traffic. To bridge the gap, we collect a large- scale drone-based dataset to provide a holistic observation of traffic scenes with centimeter-accurate annotations, HD maps, and traffic signal states. We further develop a modular toolkit for extracting per-agent scenarios to support downstream task development. In total, the dataset comprises over 65.4k high- fidelity agent trajectories, 70% of which are from VRUs. HetroD supports modeling of VRU behaviors in dense, het- erogeneous traffic and provides standardized benchmarks for forecasting, planning, and simulation tasks. Evaluation results reveal that state-of-the-art prediction and planning models struggle with the challenges presented by our dataset: they fail to predict lateral VRU movements, cannot handle unstructured maneuvers, and exhibit limited performance in dense and multi-agent scenarios, highlighting the need for more robust approaches to heterogeneous traffic. See our project page for more examples: https://hetroddata.github.io/HetroD/
- [568] arXiv:2602.03448 [pdf, ps, other]
-
Title: Hierarchical Concept-to-Appearance Guidance for Multi-Subject Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Multi-subject image generation aims to synthesize images that faithfully preserve the identities of multiple reference subjects while following textual instructions. However, existing methods often suffer from identity inconsistency and limited compositional control, as they rely on diffusion models to implicitly associate text prompts with reference images. In this work, we propose Hierarchical Concept-to-Appearance Guidance (CAG), a framework that provides explicit, structured supervision from high-level concepts to fine-grained appearances. At the conceptual level, we introduce a VAE dropout training strategy that randomly omits reference VAE features, encouraging the model to rely more on robust semantic signals from a Visual Language Model (VLM) and thereby promoting consistent concept-level generation in the absence of complete appearance cues. At the appearance level, we integrate the VLM-derived correspondences into a correspondence-aware masked attention module within the Diffusion Transformer (DiT). This module restricts each text token to attend only to its matched reference regions, ensuring precise attribute binding and reliable multi-subject composition. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the multi-subject image generation, substantially improving prompt following and subject consistency.
- [569] arXiv:2602.03452 [pdf, ps, other]
-
Title: Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional PairingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement learning with verifiable rewards (RLVR) is effective for training large language models on deterministic outcome reasoning tasks. Prior work shows RLVR works with few prompts, but prompt selection is often based only on training-accuracy variance, leading to unstable optimization directions and weaker transfer. We revisit prompt selection from a mechanism-level view and argue that an effective minibatch should provide both (i) a reliable positive anchor and (ii) explicit negative learning signals from rare failures. Based on this principle, we propose \emph{positive--negative pairing}: at each update, we sample a hard-but-solvable $q^{+}$ and an easy-but-brittle prompt $q^{-}$(high success rate but not perfect), characterized by low and high empirical success rates under multiple rollouts. We further introduce Weighted GRPO, which reweights binary outcomes at the pair level and uses group-normalized advantages to amplify rare successes on $q^{+}$ into sharp positive guidance while turning rare failures on $q^{-}$ into strong negative penalties. This bidirectional signal provides informative learning feedback for both successes and failures, improving sample efficiency without suppressing exploration. On Qwen2.5-Math-7B, a single paired minibatch per update consistently outperforms a GRPO baseline that selects two prompts via commonly used variance-based selection heuristics: AIME~2025 Pass@8 improves from 16.8 to 22.2, and AMC23 Pass@64 from 94.0 to 97.0, while remaining competitive with large-scale RLVR trained from a pool of 1209 training prompts. Similar gains are observed on Qwen2.5-Math-7B-Instruct.
- [570] arXiv:2602.03454 [pdf, ps, other]
-
Title: Contextualized Visual Personalization in Vision-Language ModelsComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Despite recent progress in vision-language models (VLMs), existing approaches often fail to generate personalized responses based on the user's specific experiences, as they lack the ability to associate visual inputs with a user's accumulated visual-textual context. We newly formalize this challenge as contextualized visual personalization, which requires the visual recognition and textual retrieval of personalized visual experiences by VLMs when interpreting new images. To address this issue, we propose CoViP, a unified framework that treats personalized image captioning as a core task for contextualized visual personalization and improves this capability through reinforcement-learning-based post-training and caption-augmented generation. We further introduce diagnostic evaluations that explicitly rule out textual shortcut solutions and verify whether VLMs truly leverage visual context. Extensive experiments demonstrate that existing open-source and proprietary VLMs exhibit substantial limitations, while CoViP not only improves personalized image captioning but also yields holistic gains across downstream personalization tasks. These results highlight CoViP as a crucial stage for enabling robust and generalizable contextualized visual personalization.
- [571] arXiv:2602.03455 [pdf, ps, other]
-
Title: Game-Theoretic and Algorithmic Analyses of Multi-Agent Routing under Crossing CostsComments: Accepted as extended abstract at AAMAS 2026Subjects: Multiagent Systems (cs.MA); Computational Complexity (cs.CC); Computer Science and Game Theory (cs.GT)
Coordinating the movement of multiple autonomous agents over a shared network is a fundamental challenge in algorithmic robotics, intelligent transportation, and distributed systems. The dominant approach, Multi-Agent Path Finding, relies on centralized control and synchronous collision avoidance, which often requires strict synchronization and guarantees of globally conflict-free execution. This paper introduces the Multi-Agent Routing under Crossing Cost model on mixed graphs, a novel framework tailored to asynchronous settings. In our model, instead of treating conflicts as hard constraints, each agent is assigned a path, and the system is evaluated through a cost function that measures potential head-on encounters. This ``crossing cost'', which is defined as the product of the numbers of agents traversing an edge in opposite directions, quantifies the risk of congestion and delay in decentralized execution.
Our contributions are both game-theoretic and algorithmic. We model the setting as a congestion game with a non-standard cost function, prove the existence of pure Nash equilibria, and analyze the dynamics leading to them. Equilibria can be found in polynomial time under mild conditions, while the general case is PLS-complete. From an optimization perspective, minimizing the total crossing cost is NP-hard, as the problem generalizes Steiner Orientation. To address this hardness barrier, we design a suite of parameterized algorithms for minimizing crossing cost, with parameters including the number of arcs, edges, agents, and structural graph measures. These yield XP or FPT results depending on the parameter, offering algorithmic strategies for structurally restricted instances. Our framework provides a new theoretical foundation for decentralized multi-agent routing, bridging equilibrium analysis and parameterized complexity to support scalable and risk-aware coordination. - [572] arXiv:2602.03459 [pdf, ps, other]
-
Title: Causal Inference on Networks under Misspecified Exposure Mappings: A Partial Identification FrameworkSubjects: Machine Learning (cs.LG); Methodology (stat.ME)
Estimating treatment effects in networks is challenging, as each potential outcome depends on the treatments of all other nodes in the network. To overcome this difficulty, existing methods typically impose an exposure mapping that compresses the treatment assignments in the network into a low-dimensional summary. However, if this mapping is misspecified, standard estimators for direct and spillover effects can be severely biased. We propose a novel partial identification framework for causal inference on networks to assess the robustness of treatment effects under misspecifications of the exposure mapping. Specifically, we derive sharp upper and lower bounds on direct and spillover effects under such misspecifications. As such, our framework presents a novel application of causal sensitivity analysis to exposure mappings. We instantiate our framework for three canonical exposure settings widely used in practice: (i) weighted means of the neighborhood treatments, (ii) threshold-based exposure mappings, and (iii) truncated neighborhood interference in the presence of higher-order spillovers. Furthermore, we develop orthogonal estimators for these bounds and prove that the resulting bound estimates are valid, sharp, and efficient. Our experiments show the bounds remain informative and provide reliable conclusions under misspecification of exposure mappings.
- [573] arXiv:2602.03461 [pdf, ps, other]
-
Title: Soft-Radial Projection for Constrained End-to-End LearningSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Computational Finance (q-fin.CP); Machine Learning (stat.ML)
Integrating hard constraints into deep learning is essential for safety-critical systems. Yet existing constructive layers that project predictions onto constraint boundaries face a fundamental bottleneck: gradient saturation. By collapsing exterior points onto lower-dimensional surfaces, standard orthogonal projections induce rank-deficient Jacobians, which nullify gradients orthogonal to active constraints and hinder optimization. We introduce Soft-Radial Projection, a differentiable reparameterization layer that circumvents this issue through a radial mapping from Euclidean space into the interior of the feasible set. This construction guarantees strict feasibility while preserving a full-rank Jacobian almost everywhere, thereby preventing the optimization stalls typical of boundary-based methods. We theoretically prove that the architecture retains the universal approximation property and empirically show improved convergence behavior and solution quality over state-of-the-art optimization- and projection-based baselines.
- [574] arXiv:2602.03462 [pdf, ps, other]
-
Title: RAL-Bench: Benchmarking for Application-Level Functional Correctness and Non-Functional Quality AttributesSubjects: Software Engineering (cs.SE)
Code generation has advanced rapidly with code-focused large language models (LLMs), especially on snippet-level tasks. However, application-level generation requires producing a runnable multi-file repository with correct structure, dependencies, and end-to-end executability, and real-world software must satisfy both functional correctness and non-functional quality (e.g., maintainability, security). Existing benchmarks provide a limited execution-based assessment of these requirements at the application level. We ask: Can current LLMs generate application-level repositories that meet both functional and non-functional criteria? We propose RAL-Bench, a benchmark and evaluation framework for application-level code generation. For each task, we distill a concise natural-language requirement from a high-quality reference project, build black-box system tests covering functional and non-functional attributes, and keep only tests that pass on the reference repository to ensure a sound oracle and an end-to-end executable suite. Functional correctness is measured by system-test pass rate. Non-functional quality is measured along five ISO/IEC 25010-inspired dimensions and aggregated with an Analytic Hierarchy Process (AHP)-derived weight vector, with per-dimension diagnostics and baseline-normalized scoring using reference measurements. Across 16 LLMs evaluated zero-shot with greedy decoding, functional correctness is the dominant bottleneck: no model exceeds a 45% functional pass rate under our requirement-driven, reference-validated tests. We release RAL-Bench at https://github.com/Wwstarry/RAL-Bench. .
- [575] arXiv:2602.03467 [pdf, ps, other]
-
Title: The Dual Role of Abstracting over the Irrelevant in Symbolic Explanations: Cognitive Effort vs. UnderstandingComments: 8 pages, 5 figuresSubjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Explanations are central to human cognition, yet AI systems often produce outputs that are difficult to understand. While symbolic AI offers a transparent foundation for interpretability, raw logical traces often impose a high extraneous cognitive load. We investigate how formal abstractions, specifically removal and clustering, impact human reasoning performance and cognitive effort. Utilizing Answer Set Programming (ASP) as a formal framework, we define a notion of irrelevant details to be abstracted over to obtain simplified explanations. Our cognitive experiments, in which participants classified stimuli across domains with explanations derived from an answer set program, show that clustering details significantly improve participants' understanding, while removal of details significantly reduce cognitive effort, supporting the hypothesis that abstraction enhances human-centered symbolic explanations.
- [576] arXiv:2602.03468 [pdf, ps, other]
-
Title: IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement LearningComments: PreprintSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge by autonomously retrieving and synthesizing evidence from large web corpora into long-form reports, enabling a long-horizon agentic paradigm. However, unlike real-time conversational assistants, DR is computationally expensive and time-consuming, creating an autonomy-interaction dilemma: high autonomy on ambiguous user queries often leads to prolonged execution with unsatisfactory outcomes. To address this, we propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting long-horizon research. To overcome the scarcity of open-ended research data, we introduce a scalable pipeline that expands a few seed samples into high-quality dialogue turns via a shallow-to-deep intent refinement graph. We further adopt a two-stage reinforcement learning (RL) strategy: Stage I applies RL on offline dialogues to efficiently learn general user-interaction behavior, while Stage II uses the trained agent and a user simulator for online rollouts to strengthen adaptation to diverse user feedback. Extensive experiments show that IntentRL significantly improves both intent hit rate and downstream task performance, outperforming the built-in clarify modules of closed-source DR agents and proactive LLM baselines.
- [577] arXiv:2602.03470 [pdf, ps, other]
-
Title: Reading Between the Code Lines: On the Use of Self-Admitted Technical Debt for Security AnalysisAuthors: Nicolás E. Díaz Ferreyra, Moritz Mock, Max Kretschmann, Barbara Russo, Mojtaba Shahin, Mansooreh Zahedi, Riccardo ScandariatoComments: Preprint submitted to Journal of Systems and SoftwareSubjects: Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)
Static Analysis Tools (SATs) are central to security engineering activities, as they enable early identification of code weaknesses without requiring execution. However, their effectiveness is often limited by high false-positive rates and incomplete coverage of vulnerability classes. At the same time, developers frequently document security-related shortcuts and compromises as Self-Admitted Technical Debt (SATD) in software artifacts, such as code comments. While prior work has recognized SATD as a rich source of security information, it remains unclear whether -and in what ways- it is utilized during SAT-aided security analysis. OBJECTIVE: This work investigates the extent to which security-related SATD complements the output produced by SATs and helps bridge some of their well-known limitations. METHOD: We followed a mixed-methods approach consisting of (i) the analysis of a SATD-annotated vulnerability dataset using three state-of-the-art SATs and (ii) an online survey with 72 security practitioners. RESULTS: The combined use of all SATs flagged 114 of the 135 security-related SATD instances, spanning 24 distinct Common Weakness Enumeration (CWE) identifiers. A manual mapping of the SATD comments revealed 33 unique CWE types, 6 of which correspond to categories that SATs commonly overlook or struggle to detect (e.g., race conditions). Survey responses further suggest that developers frequently pair SAT outputs with SATD insights to better understand the impact and root causes of security weaknesses and to identify suitable fixes. IMPLICATIONS: Our findings show that such SATD-encoded information can be a meaningful complement to SAT-driven security analysis, while helping to overcome some of SATs' practical shortcomings.
- [578] arXiv:2602.03472 [pdf, ps, other]
-
Title: Inlier-Centric Post-Training Quantization for Object Detection ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Object detection is pivotal in computer vision, yet its immense computational demands make deployment slow and power-hungry, motivating quantization. However, task-irrelevant morphologies such as background clutter and sensor noise induce redundant activations (or anomalies). These anomalies expand activation ranges and skew activation distributions toward task-irrelevant responses, complicating bit allocation and weakening the preservation of informative features. Without a clear criterion to distinguish anomalies, suppressing them can inadvertently discard useful information. To address this, we present InlierQ, an inlier-centric post-training quantization approach that separates anomalies from informative inliers. InlierQ computes gradient-aware volume saliency scores, classifies each volume as an inlier or anomaly, and fits a posterior distribution over these scores using the Expectation-Maximization (EM) algorithm. This design suppresses anomalies while preserving informative features. InlierQ is label-free, drop-in, and requires only 64 calibration samples. Experiments on the COCO and nuScenes benchmarks show consistent reductions in quantization error for camera-based (2D and 3D) and LiDAR-based (3D) object detection.
- [579] arXiv:2602.03473 [pdf, ps, other]
-
Title: Scaling Continual Learning with Bi-Level Routing Mixture-of-ExpertsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Continual learning, especially class-incremental learning (CIL), on the basis of a pre-trained model (PTM) has garnered substantial research interest in recent years. However, how to effectively learn both discriminative and comprehensive feature representations while maintaining stability and plasticity over very long task sequences remains an open problem. We propose CaRE, a scalable {C}ontinual Le{a}rner with efficient Bi-Level {R}outing Mixture-of-{E}xperts (BR-MoE). The core idea of BR-MoE is a bi-level routing mechanism: a router selection stage that dynamically activates relevant task-specific routers, followed by an expert routing phase that dynamically activates and aggregates experts, aiming to inject discriminative and comprehensive representations into every intermediate network layer. On the other hand, we introduce a challenging evaluation protocol for comprehensively assessing CIL methods across very long task sequences spanning hundreds of tasks. Extensive experiments show that CaRE demonstrates leading performance across a variety of datasets and task settings, including commonly used CIL datasets with classical CIL settings (e.g., 5-20 tasks). To the best of our knowledge, CaRE is the first continual learner that scales to very long task sequences (ranging from 100 to over 300 non-overlapping tasks), while outperforming all baselines by a large margin on such task sequences. Code will be publicly released at https://github.com/LMMMEng/CaRE.git.
- [580] arXiv:2602.03474 [pdf, ps, other]
-
Title: Recursive Energy Efficient AgreementSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Agreement is a foundational problem in distributed computing that have been studied extensively for over four decades. Recently, Meir, Mirault, Peleg and Robinson introduced the notion of \emph{Energy Efficient Agreement}, where the goal is to solve Agreement while minimizing the number of round a party participates in, thereby reducing the energy cost per participant. We show a recursive Agreement algorithm that has $O(\log f)$ active rounds per participant, where $f<n$ represents the maximum number of crash faults in the system.
- [581] arXiv:2602.03476 [pdf, ps, other]
-
Title: TactDeform: Finger Pad Deformation Inspired Spatial Tactile Feedback for Virtual Geometry ExplorationComments: Accepted to CHI 2026. Version of Record: DOI this https URLSubjects: Human-Computer Interaction (cs.HC)
Spatial tactile feedback can enhance the realism of geometry exploration in virtual reality applications. Current vibrotactile approaches often face challenges with the spatial and temporal resolution needed to render different 3D geometries. Inspired by the natural deformation of finger pads when exploring 3D objects and surfaces, we propose TactDeform, a parametric approach to render spatio-temporal tactile patterns using a finger-worn electro-tactile interface. The system dynamically renders electro-tactile patterns based on both interaction contexts (approaching, contact, and sliding) and geometric contexts (geometric features and textures), emulating deformations that occur during real-world touch exploration. Results from a user study \rr{(N=24)} show that the proposed approach enabled high texture discrimination and geometric feature identification compared to a baseline. Informed by results from a free 3D-geometry exploration phase, we provide insights that can inform future tactile interface designs.
- [582] arXiv:2602.03477 [pdf, ps, other]
-
Title: ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and ExpressionComments: 19 pages, 11 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Genomics (q-bio.GN)
Single-cell RNA-seq profiles are high-dimensional, sparse, and unordered, causing autoregressive generation to impose an artificial ordering bias and suffer from error accumulation. To address this, we propose scDiVa, a masked discrete diffusion foundation model that aligns generation with the dropout-like corruption process by defining a continuous-time forward masking mechanism in token space. ScDiVa features a bidirectional denoiser that jointly models discrete gene identities and continuous values, utilizing entropy-normalized serialization and a latent anchor token to maximize information efficiency and preserve global cell identity. The model is trained via depth-invariant time sampling and a dual denoising objective to simulate varying sparsity levels while ensuring precise recovery of both identity and magnitude. Pre-trained on 59 million cells, scDiVa achieves strong transfer performance across major benchmarks, including batch integration, cell type annotation, and perturbation response prediction. These results suggest that masked discrete diffusion serves as a biologically coherent and effective alternative to autoregression.
- [583] arXiv:2602.03478 [pdf, ps, other]
-
Title: When Routing Collapses: On the Degenerate Convergence of LLM RoutersSubjects: Artificial Intelligence (cs.AI)
LLM routing aims to achieve a favorable quality--cost trade-off by dynamically assigning easy queries to smaller models and harder queries to stronger ones. However, across both unimodal and multimodal settings, we uncover a pervasive yet underexplored failure mode in existing routers: as the user's cost budget increases, routers systematically default to the most capable and most expensive model even when cheaper models already suffice. As a result, current routers under-utilize small models, wasting computation and monetary cost and undermining the core promise of routing; we term this phenomenon routing collapse. We attribute routing collapse to an objective--decision mismatch: many routers are trained to predict scalar performance scores, whereas routing decisions ultimately depend on discrete comparisons among candidate models. Consequently, small prediction errors can flip relative orderings and trigger suboptimal selections. To bridge this gap, we propose EquiRouter, a decision-aware router that directly learns model rankings, restoring the role of smaller models and mitigating routing collapse. On RouterBench, EquiRouter reduces cost by about 17\% at GPT-4-level performance compared to the strongest prior router. Our code is available at https://github.com/AIGNLAI/EquiRouter.
- [584] arXiv:2602.03484 [pdf, ps, other]
-
Title: Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on SwedishAuthors: Jenny KunzComments: Accepted to TACL. Note that the arXiv version is a pre-MIT Press publication versionSubjects: Computation and Language (cs.CL)
In this study, we investigate how language models develop preferences for \textit{idiomatic} as compared to \textit{linguistically acceptable} Swedish, both during pretraining and when adapting a model from English to Swedish. To do so, we train models on Swedish from scratch and by fine-tuning English-pretrained models, probing their preferences at various checkpoints using minimal pairs that differ in linguistic acceptability or idiomaticity. For linguistic acceptability, we adapt existing benchmarks into a minimal-pair format. To assess idiomaticity, we introduce two novel datasets: one contrasting conventionalized idioms with plausible variants, and another contrasting idiomatic Swedish with Translationese. Our findings suggest that idiomatic competence emerges more slowly than other linguistic abilities, including grammatical and lexical correctness. While longer training yields diminishing returns for most tasks, idiom-related performance continues to improve, particularly in the largest model tested (8B). However, instruction tuning on data machine-translated from English -- the common approach for languages with little or no native instruction data -- causes models to rapidly lose their preference for idiomatic language.
- [585] arXiv:2602.03485 [pdf, ps, other]
-
Title: Self-Verification Dilemma: Experience-Driven Suppression of Overused Checking in LLM ReasoningComments: 19 pages, 8 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large Reasoning Models (LRMs) achieve strong performance by generating long reasoning traces with reflection. Through a large-scale empirical analysis, we find that a substantial fraction of reflective steps consist of self-verification (recheck) that repeatedly confirm intermediate results. These rechecks occur frequently across models and benchmarks, yet the vast majority are confirmatory rather than corrective, rarely identifying errors and altering reasoning outcomes. This reveals a mismatch between how often self-verification is activated and how often it is actually useful. Motivated by this, we propose a novel, experience-driven test-time framework that reduces the overused verification. Our method detects the activation of recheck behavior, consults an offline experience pool of past verification outcomes, and estimates whether a recheck is likely unnecessary via efficient retrieval. When historical experience suggests unnecessary, a suppression signal redirects the model to proceed. Across multiple model and benchmarks, our approach reduces token usage up to 20.3% while maintaining the accuracy, and in some datasets even yields accuracy improvements.
- [586] arXiv:2602.03486 [pdf, ps, other]
-
Title: DeepDFA: Injecting Temporal Logic in Deep Learning for Sequential Subsymbolic ApplicationsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Integrating logical knowledge into deep neural network training is still a hard challenge, especially for sequential or temporally extended domains involving subsymbolic observations. To address this problem, we propose DeepDFA, a neurosymbolic framework that integrates high-level temporal logic - expressed as Deterministic Finite Automata (DFA) or Moore Machines - into neural architectures. DeepDFA models temporal rules as continuous, differentiable layers, enabling symbolic knowledge injection into subsymbolic domains. We demonstrate how DeepDFA can be used in two key settings: (i) static image sequence classification, and (ii) policy learning in interactive non-Markovian environments. Across extensive experiments, DeepDFA outperforms traditional deep learning models (e.g., LSTMs, GRUs, Transformers) and novel neuro-symbolic systems, achieving state-of-the-art results in temporal knowledge integration. These results highlight the potential of DeepDFA to bridge subsymbolic learning and symbolic reasoning in sequential tasks.
- [587] arXiv:2602.03489 [pdf, ps, other]
-
Title: Detecting and Explaining Malware Family Evolution Using Rule-Based Drift AnalysisSubjects: Cryptography and Security (cs.CR)
Malware detection and classification into families are critical tasks in cybersecurity, complicated by the continual evolution of malware to evade detection. This evolution introduces concept drift, in which the statistical properties of malware features change over time, reducing the effectiveness of static machine learning models. Understanding and explaining this drift is essential for maintaining robust and trustworthy malware detectors. In this paper, we propose an interpretable approach to concept drift detection. Our method uses a rule-based classifier to generate human-readable descriptions of both original and evolved malware samples belonging to the same malware family. By comparing the resulting rule sets using a similarity function, we can detect and quantify concept drift. Crucially, this comparison also identifies the specific features and feature values that have changed, providing clear explanations of how malware has evolved to bypass detection. Experimental results demonstrate that the proposed method not only accurately detects drift but also provides actionable insights into the behavior of evolving malware families, supporting both detection and threat analysis.
- [588] arXiv:2602.03490 [pdf, ps, other]
-
Title: A Minimal Task Reveals Emergent Path Integration and Object-Location Binding in a Predictive Sequence ModelComments: 7 pages, 4 figuresSubjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Adaptive cognition requires structured internal models representing objects and their relations. Predictive neural networks are often proposed to form such "world models", yet their underlying mechanisms remain unclear. One hypothesis is that action-conditioned sequential prediction suffices for learning such world models. In this work, we investigate this possibility in a minimal in-silico setting. Sequentially sampling tokens from 2D continuous token scenes, a recurrent neural network is trained to predict the upcoming token from current input and a saccade-like displacement. On novel scenes, prediction accuracy improves across the sequence, indicating in-context learning. Decoding analyses reveal path integration and dynamic binding of token identity to position. Interventional analyses show that new bindings can be learned late in sequence and that out-of-distribution bindings can be learned. Together, these results demonstrate how structured representations that rely on flexible binding emerge to support prediction, offering a mechanistic account of sequential world modeling relevant to cognitive science.
- [589] arXiv:2602.03491 [pdf, ps, other]
-
Title: Decoupling Skeleton and Flesh: Efficient Multimodal Table Reasoning with Disentangled Alignment and Structure-aware GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Reasoning over table images remains challenging for Large Vision-Language Models (LVLMs) due to complex layouts and tightly coupled structure-content information. Existing solutions often depend on expensive supervised training, reinforcement learning, or external tools, limiting efficiency and scalability. This work addresses a key question: how to adapt LVLMs to table reasoning with minimal annotation and no external tools? Specifically, we first introduce DiSCo, a Disentangled Structure-Content alignment framework that explicitly separates structural abstraction from semantic grounding during multimodal alignment, efficiently adapting LVLMs to tables structures. Building on DiSCo, we further present Table-GLS, a Global-to-Local Structure-guided reasoning framework that performs table reasoning via structured exploration and evidence-grounded inference. Extensive experiments across diverse benchmarks demonstrate that our framework efficiently enhances LVLM's table understanding and reasoning capabilities, particularly generalizing to unseen table structures.
- [590] arXiv:2602.03493 [pdf, ps, other]
-
Title: Least but not Last: Fine-tuning Intermediate Principal Components for Better Performance-Forgetting Trade-OffsSubjects: Machine Learning (cs.LG)
Low-Rank Adaptation (LoRA) methods have emerged as crucial techniques for adapting large pre-trained models to downstream tasks under computational and memory constraints. However, they face a fundamental challenge in balancing task-specific performance gains against catastrophic forgetting of pre-trained knowledge, where existing methods provide inconsistent recommendations. This paper presents a comprehensive analysis of the performance-forgetting trade-offs inherent in low-rank adaptation using principal components as initialization. Our investigation reveals that fine-tuning intermediate components leads to better balance and show more robustness to high learning rates than first (PiSSA) and last (MiLoRA) components in existing work. Building on these findings, we provide a practical approach for initialization of LoRA that offers superior trade-offs. We demonstrate in a thorough empirical study on a variety of computer vision and NLP tasks that our approach improves accuracy and reduces forgetting, also in continual learning scenarios.
- [591] arXiv:2602.03495 [pdf, ps, other]
-
Title: DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCsAuthors: Zeyu Zhu, Gang Li, Peisong Wang, Zitao Mo, Minnan Pei, Zhuoran Song, Xiaoyao Liang, Jian ChengSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Mixture of Experts (MoE) architectures significantly enhance the capacity of LLMs without proportional increases in computation, but at the cost of a vast parameter size. Offloading MoE expert parameters to host memory and leveraging both CPU and GPU computation has recently emerged as a promising direction to support such models on resourceconstrained local PC platforms. While promising, we notice that existing approaches mismatch the dynamic nature of expert workloads, which leads to three fundamental inefficiencies: (1) Static expert assignment causes severe CPUGPU load imbalance, underutilizing CPU and GPU resources; (2) Existing prefetching techniques fail to accurately predict high-workload experts, leading to costly inaccurate prefetches; (3) GPU cache policies neglect workload dynamics, resulting in poor hit rates and limited effectiveness. To address these challenges, we propose DALI, a workloaDAware offLoadIng framework for efficient MoE inference on local PCs. To fully utilize hardware resources, DALI first dynamically assigns experts to CPU or GPU by modeling assignment as a 0-1 integer optimization problem and solving it efficiently using a Greedy Assignment strategy at runtime. To improve prefetching accuracy, we develop a Residual-Based Prefetching method leveraging inter-layer residual information to accurately predict high-workload experts. Additionally, we introduce a Workload-Aware Cache Replacement policy that exploits temporal correlation in expert activations to improve GPU cache efficiency. By evaluating across various MoE models and settings, DALI achieves significant speedups in the both prefill and decoding phases over the state-of-the-art offloading frameworks.
- [592] arXiv:2602.03496 [pdf, ps, other]
-
Title: Lookahead Path Likelihood Optimization for Diffusion LLMsSubjects: Machine Learning (cs.LG)
Diffusion Large Language Models (dLLMs) support arbitrary-order generation, yet their inference performance critically depends on the unmasking order. Existing strategies rely on heuristics that greedily optimize local confidence, offering limited guidance for identifying unmasking paths that are globally consistent and accurate. To bridge this gap, we introduce path log-likelihood (Path LL), a trajectory-conditioned objective that strongly correlates with downstream accuracy and enables principled selection of unmasking paths. To optimize Path LL at inference time, we propose POKE, an efficient value estimator that predicts the expected future Path LL of a partial decoding trajectory. We then integrate this lookahead signal into POKE-SMC, a Sequential Monte Carlo-based search framework for dynamically identifying optimal unmasking paths. Extensive experiments across 6 reasoning tasks show that POKE-SMC consistently improves accuracy, achieving 2%--3% average gains over strong decoding-time scaling baselines at comparable inference overhead on LLaDA models and advancing the accuracy--compute Pareto frontier.
- [593] arXiv:2602.03501 [pdf, ps, other]
-
Title: Reparameterization Flow Policy OptimizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reparameterization Policy Gradient (RPG) has emerged as a powerful paradigm for model-based reinforcement learning, enabling high sample efficiency by backpropagating gradients through differentiable dynamics. However, prior RPG approaches have been predominantly restricted to Gaussian policies, limiting their performance and failing to leverage recent advances in generative models. In this work, we identify that flow policies, which generate actions via differentiable ODE integration, naturally align with the RPG framework, a connection not established in prior work. However, naively exploiting this synergy proves ineffective, often suffering from training instability and a lack of exploration. We propose Reparameterization Flow Policy Optimization (RFO). RFO computes policy gradients by backpropagating jointly through the flow generation process and system dynamics, unlocking high sample efficiency without requiring intractable log-likelihood calculations. RFO includes two tailored regularization terms for stability and exploration. We also propose a variant of RFO with action chunking. Extensive experiments on diverse locomotion and manipulation tasks, involving both rigid and soft bodies with state or visual inputs, demonstrate the effectiveness of RFO. Notably, on a challenging locomotion task controlling a soft-body quadruped, RFO achieves almost $2\times$ the reward of the state-of-the-art baseline.
- [594] arXiv:2602.03505 [pdf, ps, other]
-
Title: Generative Decompression: Optimal Lossy Decoding Against Distribution MismatchSubjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This paper addresses optimal decoding strategies in lossy compression where the assumed distribution for compressor design mismatches the actual (true) distribution of the source. This problem has immediate relevance in standardized communication systems where the decoder acquires side information or priors about the true distribution that are unavailable to the fixed encoder. We formally define the mismatched quantization problem, demonstrating that the optimal reconstruction rule, termed generative decompression, aligns with classical Bayesian estimation by taking the conditional expectation under the true distribution given the quantization indices and adapting it to fixed-encoder constraints. This strategy effectively performs a generative Bayesian correction on the decoder side, strictly outperforming the conventional centroid rule. We extend this framework to transmission over noisy channels, deriving a robust soft-decoding rule that quantifies the inefficiency of standard modular source--channel separation architectures under mismatch. Furthermore, we generalize the approach to task-oriented decoding, showing that the optimal strategy shifts from conditional mean estimation to maximum a posteriori (MAP) detection. Experimental results on Gaussian sources and deep-learning-based semantic classification demonstrate that generative decompression closes a vast majority of the performance gap to the ideal joint-optimization benchmark, enabling adaptive, high-fidelity reconstruction without modifying the encoder.
- [595] arXiv:2602.03506 [pdf, ps, other]
-
Title: Explaining the Explainer: Understanding the Inner Workings of Transformer-based Symbolic Regression ModelsComments: 8 pages, 5 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Following their success across many domains, transformers have also proven effective for symbolic regression (SR); however, the internal mechanisms underlying their generation of mathematical operators remain largely unexplored. Although mechanistic interpretability has successfully identified circuits in language and vision models, it has not yet been applied to SR. In this article, we introduce PATCHES, an evolutionary circuit discovery algorithm that identifies compact and correct circuits for SR. Using PATCHES, we isolate 28 circuits, providing the first circuit-level characterisation of an SR transformer. We validate these findings through a robust causal evaluation framework based on key notions such as faithfulness, completeness, and minimality. Our analysis shows that mean patching with performance-based evaluation most reliably isolates functionally correct circuits. In contrast, we demonstrate that direct logit attribution and probing classifiers primarily capture correlational features rather than causal ones, limiting their utility for circuit discovery. Overall, these results establish SR as a high-potential application domain for mechanistic interpretability and propose a principled methodology for circuit discovery.
- [596] arXiv:2602.03507 [pdf, ps, other]
-
Title: Learning to Reason Faithfully through Step-Level Faithfulness MaximizationSubjects: Computation and Language (cs.CL)
Reinforcement Learning with Verifiable Rewards (RLVR) has markedly improved the performance of Large Language Models (LLMs) on tasks requiring multi-step reasoning. However, most RLVR pipelines rely on sparse outcome-based rewards, providing little supervision over intermediate steps and thus encouraging over-confidence and spurious reasoning, which in turn increases hallucinations. To address this, we propose FaithRL, a general reinforcement learning framework that directly optimizes reasoning faithfulness. We formalize a faithfulness-maximization objective and theoretically show that optimizing it mitigates over-confidence. To instantiate this objective, we introduce a geometric reward design and a faithfulness-aware advantage modulation mechanism that assigns step-level credit by penalizing unsupported steps while preserving valid partial derivations. Across diverse backbones and benchmarks, FaithRL consistently reduces hallucination rates while maintaining (and often improving) answer correctness. Further analysis confirms that FaithRL increases step-wise reasoning faithfulness and generalizes robustly. Our code is available at https://github.com/aintdoin/FaithRL.
- [597] arXiv:2602.03510 [pdf, ps, other]
-
Title: Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion TransformersAuthors: Bozhou Li, Yushuo Guan, Haolin Li, Bohan Zeng, Yiyan Ji, Yue Ding, Pengfei Wan, Kun Gai, Yuanxing Zhang, Wentao ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
Recent DiT-based text-to-image models increasingly adopt LLMs as text encoders, yet text conditioning remains largely static and often utilizes only a single LLM layer, despite pronounced semantic hierarchy across LLM layers and non-stationary denoising dynamics over both diffusion time and network depth. To better match the dynamic process of DiT generation and thereby enhance the diffusion model's generative capability, we introduce a unified normalized convex fusion framework equipped with lightweight gates to systematically organize multi-layer LLM hidden states via time-wise, depth-wise, and joint fusion. Experiments establish Depth-wise Semantic Routing as the superior conditioning strategy, consistently improving text-image alignment and compositional generation (e.g., +9.97 on the GenAI-Bench Counting task). Conversely, we find that purely time-wise fusion can paradoxically degrade visual generation fidelity. We attribute this to a train-inference trajectory mismatch: under classifier-free guidance, nominal timesteps fail to track the effective SNR, causing semantically mistimed feature injection during inference. Overall, our results position depth-wise routing as a strong and effective baseline and highlight the critical need for trajectory-aware signals to enable robust time-dependent conditioning.
- [598] arXiv:2602.03511 [pdf, ps, other]
-
Title: CMR: Contractive Mapping Embeddings for Robust Humanoid Locomotion on Unstructured TerrainsSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Robust disturbance rejection remains a longstanding challenge in humanoid locomotion, particularly on unstructured terrains where sensing is unreliable and model mismatch is pronounced. While perception information, such as height map, enhances terrain awareness, sensor noise and sim-to-real gaps can destabilize policies in practice. In this work, we provide theoretical analysis that bounds the return gap under observation noise, when the induced latent dynamics are contractive. Furthermore, we present Contractive Mapping for Robustness (CMR) framework that maps high-dimensional, disturbance-prone observations into a latent space, where local perturbations are attenuated over time. Specifically, this approach couples contrastive representation learning with Lipschitz regularization to preserve task-relevant geometry while explicitly controlling sensitivity. Notably, the formulation can be incorporated into modern deep reinforcement learning pipelines as an auxiliary loss term with minimal additional technical effort required. Further, our extensive humanoid experiments show that CMR potently outperforms other locomotion algorithms under increased noise.
- [599] arXiv:2602.03514 [pdf, ps, other]
-
Title: A Function-Space Stability Boundary for Generalization in Interpolating Learning SystemsAuthors: Ronald KatendeComments: 10 pages, 8 figures,Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Modern learning systems often interpolate training data while still generalizing well, yet it remains unclear when algorithmic stability explains this behavior. We model training as a function-space trajectory and measure sensitivity to single-sample perturbations along this trajectory.
We propose a contractive propagation condition and a stability certificate obtained by unrolling the resulting recursion. A small certificate implies stability-based generalization, while we also prove that there exist interpolating regimes with small risk where such contractive sensitivity cannot hold, showing that stability is not a universal explanation.
Experiments confirm that certificate growth predicts generalization differences across optimizers, step sizes, and dataset perturbations. The framework therefore identifies regimes where stability explains generalization and where alternative mechanisms must account for success. - [600] arXiv:2602.03515 [pdf, ps, other]
-
Title: Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis RotationComments: Preprint. Under reviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Asynchronous pipeline parallelism maximizes hardware utilization by eliminating the pipeline bubbles inherent in synchronous execution, offering a path toward efficient large-scale distributed training. However, this efficiency gain can be compromised by gradient staleness, where the immediate model updates with delayed gradients introduce noise into the optimization process. Crucially, we identify a critical, yet often overlooked, pathology: this delay scales linearly with pipeline depth, fundamentally undermining the very scalability that the method originally intends to provide. In this work, we investigate this inconsistency and bridge the gap by rectifying delayed gradients through basis rotation, restoring scalable asynchronous training while maintaining performance. Specifically, we observe that the deleterious effects of delayed gradients are exacerbated when the Hessian eigenbasis is misaligned with the standard coordinate basis. We demonstrate that this misalignment prevents coordinate-wise adaptive schemes, such as Adam, from effectively leveraging curvature-aware adaptivity. This failure leads to significant oscillations in the optimization trajectory and, consequently, slower convergence. We substantiate these findings through both rigorous theoretical analysis and empirical evaluation. To address this challenge, we propose the use of basis rotation, demonstrating that it effectively mitigates the alignment issue and significantly accelerates convergence in asynchronous settings. For example, our training of a 1B-parameter LLM with basis rotation achieves the same training loss in 76.8% fewer iterations compared to the best-performing asynchronous pipeline parallel training baseline.
- [601] arXiv:2602.03516 [pdf, ps, other]
-
Title: Not All Negative Samples Are Equal: LLMs Learn Better from Plausible ReasoningAuthors: Zixiang Di, Jinyi Han, Shuo Zhang, Ying Liao, Zhi Li, Xiaofeng Ji, Yongqi Wang, Zheming Yang, Ming Gao, Bingdong Li, Jie WangSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Learning from negative samples holds great promise for improving Large Language Model (LLM) reasoning capability, yet existing methods treat all incorrect responses as equally informative, overlooking the crucial role of sample quality. To address this, we propose Plausible Negative Samples (PNS), a method that synthesizes high-quality negative samples exhibiting expected format and structural coherence while ultimately yielding incorrect answers. PNS trains a dedicated model via reverse reinforcement learning (RL) guided by a composite reward combining format compliance, accuracy inversion, reward model assessment, and chain-of-thought evaluation, generating responses nearly indistinguishable from correct solutions. We further validate PNS as a plug-and-play data source for preference optimization across three backbone models on seven mathematical reasoning benchmarks. Results demonstrate that PNS consistently outperforms other negative sample synthesis methods, achieving an average improvement of 2.03% over RL-trained models.
- [602] arXiv:2602.03517 [pdf, ps, other]
-
Title: Rank-Learner: Orthogonal Ranking of Treatment EffectsSubjects: Machine Learning (cs.LG)
Many decision-making problems require ranking individuals by their treatment effects rather than estimating the exact effect magnitudes. Examples include prioritizing patients for preventive care interventions, or ranking customers by the expected incremental impact of an advertisement. Surprisingly, while causal effect estimation has received substantial attention in the literature, the problem of directly learning rankings of treatment effects has largely remained unexplored. In this paper, we introduce Rank-Learner, a novel two-stage learner that directly learns the ranking of treatment effects from observational data. We first show that naive approaches based on precise treatment effect estimation solve a harder problem than necessary for ranking, while our Rank-Learner optimizes a pairwise learning objective that recovers the true treatment effect ordering, without explicit CATE estimation. We further show that our Rank-Learner is Neyman-orthogonal and thus comes with strong theoretical guarantees, including robustness to estimation errors in the nuisance functions. In addition, our Rank-Learner is model-agnostic, and can be instantiated with arbitrary machine learning models (e.g., neural networks). We demonstrate the effectiveness of our method through extensive experiments where Rank-Learner consistently outperforms standard CATE estimators and non-orthogonal ranking methods. Overall, we provide practitioners with a new, orthogonal two-stage learner for ranking individuals by their treatment effects.
- [603] arXiv:2602.03520 [pdf, ps, other]
-
Title: Live or Lie: Action-Aware Capsule Multiple Instance Learning for Risk Assessment in Live Streaming PlatformsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Live streaming has become a cornerstone of today's internet, enabling massive real-time social interactions. However, it faces severe risks arising from sparse, coordinated malicious behaviors among multiple participants, which are often concealed within normal activities and challenging to detect timely and accurately. In this work, we provide a pioneering study on risk assessment in live streaming rooms, characterized by weak supervision where only room-level labels are available. We formulate the task as a Multiple Instance Learning (MIL) problem, treating each room as a bag and defining structured user-timeslot capsules as instances. These capsules represent subsequences of user actions within specific time windows, encapsulating localized behavioral patterns. Based on this formulation, we propose AC-MIL, an Action-aware Capsule MIL framework that models both individual behaviors and group-level coordination patterns. AC-MIL captures multi-granular semantics and behavioral cues through a serial and parallel architecture that jointly encodes temporal dynamics and cross-user dependencies. These signals are integrated for robust room-level risk prediction, while also offering interpretable evidence at the behavior segment level. Extensive experiments on large-scale industrial datasets from Douyin demonstrate that AC-MIL significantly outperforms MIL and sequential baselines, establishing new state-of-the-art performance in room-level risk assessment for live streaming. Moreover, AC-MIL provides capsule-level interpretability, enabling identification of risky behavior segments as actionable evidence for intervention. The project page is available at: https://qiaoyran.github.io/AC-MIL/.
- [604] arXiv:2602.03521 [pdf, ps, other]
-
Title: Real-world energy data of 200 feeders from low-voltage grids with metadata in Germany over two yearsAuthors: Manuel Treutlein, Pascal Bothe, Marc Schmidt, Roman Hahn, Oliver Neumann, Ralf Mikut, Veit HagenmeyerComments: 20 pages, 6 Figures, 6 Tables. Data is available on Zenodo: this https URLSubjects: Systems and Control (eess.SY)
The last mile of the distribution grid is crucial for a successful energy transition, as more low-carbon technology like photovoltaic systems, heat pumps, and electric vehicle chargers connect to the low-voltage grid. Despite considerable challenges in operation and planning, researchers often lack access to suitable low-voltage grid data. To address this, we present the FeederBW dataset with data recorded by the German distribution system operator Netze BW. It offers real-world energy data from 200 low-voltage feeders over two years (2023-2025) with weather information and detailed metadata, including changes in low-carbon technology installations. The dataset includes feeder-specific details such as the number of housing units, installed power of low-carbon technology, and aggregated industrial energy data. Furthermore, high photovoltaic feed-in and one-minute temporal resolution makes the dataset unique. FeederBW supports various applications, including machine learning for load forecasting, conducting non-intrusive load monitoring, generating synthetic data, and analyzing the interplay between weather, feeder measurements, and metadata. The dataset reveals insightful patterns and clearly reflects the growing impact of low-carbon technology on low-voltage grids.
- [605] arXiv:2602.03523 [pdf, ps, other]
-
Title: D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation From Lead sheetComments: Accepted at 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Generating piano accompaniments in the symbolic music domain is a challenging task that requires producing a complete piece of piano music from given melody and chord constraints, such as those provided by a lead sheet. In this paper, we propose a discrete diffusion-based piano accompaniment generation model, D3PIA, leveraging local alignment between lead sheet and accompaniment in piano-roll representation. D3PIA incorporates Neighborhood Attention (NA) to both encode the lead sheet and condition it for predicting note states in the piano accompaniment. This design enhances local contextual modeling by efficiently attending to nearby melody and chord conditions. We evaluate our model using the POP909 dataset, a widely used benchmark for piano accompaniment generation. Objective evaluation results demonstrate that D3PIA preserves chord conditions more faithfully compared to continuous diffusion-based and Transformer-based baselines. Furthermore, a subjective listening test indicates that D3PIA generates more musically coherent accompaniments than the comparison models.
- [606] arXiv:2602.03525 [pdf, ps, other]
-
Title: ZOR filters: fast and smaller than fuse filtersAuthors: Antoine LimassetSubjects: Data Structures and Algorithms (cs.DS)
Probabilistic membership filters support fast approximate membership queries with a controlled false-positive probability $\varepsilon$ and are widely used across storage, analytics, networking, and bioinformatics \cite{chang2008bigtable,dayan2018optimalbloom,broder2004network,harris2020improved,marchet2023scalable,chikhi2025logan,hernandez2025reindeer2}. In the static setting, state-of-the-art designs such as XOR and fuse filters achieve low overhead and very fast queries, but their peeling-based construction succeeds only with high probability, which complicates deterministic builds \cite{graf2020xor,graf2022binary,ulrich2023taxor}.
We introduce \emph{ZOR filters}, a deterministic continuation of XOR/fuse filters that guarantees construction termination while preserving the same XOR-based query mechanism. ZOR replaces restart-on-failure with deterministic peeling that abandons a small fraction of keys, and restores false-positive-only semantics by storing the remainder in a compact auxiliary structure. In our experiments, the abandoned fraction drops below $1\%$ for moderate arity (e.g., $N\ge 5$), so the auxiliary handles a negligible fraction of keys. As a result, ZOR filters can achieve overhead within $1\%$ of the information-theoretic lower bound $\log_2(1/\varepsilon)$ while retaining fuse-like query performance; the additional cost is concentrated on negative queries due to the auxiliary check. Our current prototype builds several-fold slower than highly optimized fuse builders because it maintains explicit incidence information during deterministic peeling; closing this optimisation gap is an engineering target. - [607] arXiv:2602.03527 [pdf, ps, other]
-
Title: WARP Logic Neural NetworksComments: Under reviewSubjects: Machine Learning (cs.LG)
Fast and efficient AI inference is increasingly important, and recent models that directly learn low-level logic operations have achieved state-of-the-art performance. However, existing logic neural networks incur high training costs, introduce redundancy or rely on approximate gradients, which limits scalability. To overcome these limitations, we introduce WAlsh Relaxation for Probabilistic (WARP) logic neural networks -- a novel gradient-based framework that efficiently learns combinations of hardware-native logic blocks. We show that WARP yields the most parameter-efficient representation for exactly learning Boolean functions and that several prior approaches arise as restricted special cases. Training is improved by introducing learnable thresholding and residual initialization, while we bridge the gap between relaxed training and discrete logic inference through stochastic smoothing. Experiments demonstrate faster convergence than state-of-the-art baselines, while scaling effectively to deeper architectures and logic functions with higher input arity.
- [608] arXiv:2602.03529 [pdf, ps, other]
-
Title: Morphe: High-Fidelity Generative Video Streaming with Vision Foundation ModelAuthors: Tianyi Gong, Zijian Cao, Zixing Zhang, Jiangkai Wu, Xinggong Zhang, Shuguang Cui, Fangxin WangComments: Accepted by NSDI 2026 FallSubjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Video streaming is a fundamental Internet service, while the quality still cannot be guaranteed especially in poor network conditions such as bandwidth-constrained and remote areas. Existing works mainly work towards two directions: traditional pixel-codec streaming nearly approaches its limit and is hard to step further in compression; the emerging neural-enhanced or generative streaming usually fall short in latency and visual fidelity, hindering their practical deployment. Inspired by the recent success of vision foundation model (VFM), we strive to harness the powerful video understanding and processing capacities of VFM to achieve generalization, high fidelity and loss resilience for real-time video streaming with even higher compression rate. We present the first revolutionized paradigm that enables VFM-based end-to-end generative video streaming towards this goal. Specifically, Morphe employs joint training of visual tokenizers and variable-resolution spatiotemporal optimization under simulated network constraints. Additionally, a robust streaming system is constructed that leverages intelligent packet dropping to resist real-world network perturbations. Extensive evaluation demonstrates that Morphe achieves comparable visual quality while saving 62.5\% bandwidth compared to H.265, and accomplishes real-time, loss-resilient video delivery in challenging network environments, representing a milestone in VFM-enabled multimedia streaming solutions.
- [609] arXiv:2602.03530 [pdf, ps, other]
-
Title: Interpretable Logical Anomaly Classification via Constraint Decomposition and Instruction Fine-TuningComments: 6 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Logical anomalies are violations of predefined constraints on object quantity, spatial layout, and compositional relationships in industrial images. While prior work largely treats anomaly detection as a binary decision, such formulations cannot indicate which logical rule is broken and therefore offer limited value for quality assurance. We introduce Logical Anomaly Classification (LAC), a task that unifies anomaly detection and fine-grained violation classification in a single inference step. To tackle LAC, we propose LogiCls, a vision-language framework that decomposes complex logical constraints into a sequence of verifiable subqueries. We further present a data-centric instruction synthesis pipeline that generates chain-of-thought (CoT) supervision for these subqueries, coupling precise grounding annotations with diverse image-text augmentations to adapt vision language models (VLMs) to logic-sensitive reasoning. Training is stabilized by a difficulty-aware resampling strategy that emphasizes challenging subqueries and long tail constraint types. Extensive experiments demonstrate that LogiCls delivers robust, interpretable, and accurate industrial logical anomaly classification, providing both the predicted violation categories and their evidence trails.
- [610] arXiv:2602.03531 [pdf, ps, other]
-
Title: Robust Representation Learning in Masked AutoencodersComments: 11 pages, 8 figures, and 3 tablesSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Masked Autoencoders (MAEs) achieve impressive performance in image classification tasks, yet the internal representations they learn remain less understood. This work started as an attempt to understand the strong downstream classification performance of MAE. In this process we discover that representations learned with the pretraining and fine-tuning, are quite robust - demonstrating a good classification performance in the presence of degradations, such as blur and occlusions. Through layer-wise analysis of token embeddings, we show that pretrained MAE progressively constructs its latent space in a class-aware manner across network depth: embeddings from different classes lie in subspaces that become increasingly separable. We further observe that MAE exhibits early and persistent global attention across encoder layers, in contrast to standard Vision Transformers (ViTs). To quantify feature robustness, we introduce two sensitivity indicators: directional alignment between clean and perturbed embeddings, and head-wise retention of active features under degradations. These studies help establish the robust classification performance of MAEs.
- [611] arXiv:2602.03533 [pdf, ps, other]
-
Title: PnP-U3D: Plug-and-Play 3D Framework Bridging Autoregression and Diffusion for Unified Understanding and GenerationComments: Yongwei Chen and Tianyi Wei contributed equally. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
The rapid progress of large multimodal models has inspired efforts toward unified frameworks that couple understanding and generation. While such paradigms have shown remarkable success in 2D, extending them to 3D remains largely underexplored. Existing attempts to unify 3D tasks under a single autoregressive (AR) paradigm lead to significant performance degradation due to forced signal quantization and prohibitive training cost. Our key insight is that the essential challenge lies not in enforcing a unified autoregressive paradigm, but in enabling effective information interaction between generation and understanding while minimally compromising their inherent capabilities and leveraging pretrained models to reduce training cost. Guided by this perspective, we present the first unified framework for 3D understanding and generation that combines autoregression with diffusion. Specifically, we adopt an autoregressive next-token prediction paradigm for 3D understanding, and a continuous diffusion paradigm for 3D generation. A lightweight transformer bridges the feature space of large language models and the conditional space of 3D diffusion models, enabling effective cross-modal information exchange while preserving the priors learned by standalone models. Extensive experiments demonstrate that our framework achieves state-of-the-art performance across diverse 3D understanding and generation benchmarks, while also excelling in 3D editing tasks. These results highlight the potential of unified AR+diffusion models as a promising direction for building more general-purpose 3D intelligence.
- [612] arXiv:2602.03535 [pdf, ps, other]
-
Title: Sparse Training of Neural Networks based on Multilevel Mirror DescentSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent that exploits the naturally incurred sparsity by alternating between periods of static and dynamic sparsity pattern updates. The key idea is to combine sparsity-inducing Bregman iterations with adaptive freezing of the network structure to enable efficient exploration of the sparse parameter space while maintaining sparsity. We provide convergence guaranties by embedding our method in a multilevel optimization framework. Furthermore, we empirically show that our algorithm can produce highly sparse and accurate models on standard benchmarks. We also show that the theoretical number of FLOPs compared to SGD training can be reduced from 38% for standard Bregman iterations to 6% for our method while maintaining test accuracy.
- [613] arXiv:2602.03537 [pdf, ps, other]
-
Title: MatGPTQ: Accurate and Efficient Post-Training Matryoshka QuantizationComments: PreprintSubjects: Machine Learning (cs.LG)
Matryoshka Quantization (MatQuant) is a recent quantization approach showing that a single integer-quantized model can be served across multiple precisions, by slicing the most significant bits (MSB) at inference time. This enables a single checkpoint to cover a wide range of memory and latency budgets, but renders quantization much more challenging. In particular, the initial MatQuant relies on expensive quantization-aware training (QAT) variants, rather than fast one-shot post training quantization (PTQ), and lacks open-source and kernel support. We address all of these limitations by introducing Post-Training Matryoshka Quantization (MatGPTQ), a new PTQ pipeline that produces a single parent model jointly optimized for multiple target precisions in one-shot, based on a small calibration set. MatGPTQ casts Matryoshka quantization as a multi-precision objective with bit-slicing and cross-bit error compensation, resulting in an algorithm that produces a multi-bit-width, "sliceable" model in a single pass. We also incorporate a new budget-aware search for heterogeneous per-layer bit-witdhs and provide efficient kernels that implement slicing and mixed-precision execution. Across standard LLMs and benchmarks, MatGPTQ preserves high-bit accuracy while substantially improving performance at low-bit-witdh settings. Overall, we establish a new state of the art for Matryoshka-style post-training quantization and make single-checkpoint, multi-precision deployment open and practical. Code is available at https://github.com/IST-DASLab/MatGPTQ.
- [614] arXiv:2602.03538 [pdf, ps, other]
-
Title: Constrained Dynamic Gaussian SplattingAuthors: Zihan Zheng, Zhenglong Wu, Xuanxuan Wang, Houqiang Zhong, Xiaoyun Zhang, Qiang Hu, Guangtao Zhai, Wenjun ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
While Dynamic Gaussian Splatting enables high-fidelity 4D reconstruction, its deployment is severely hindered by a fundamental dilemma: unconstrained densification leads to excessive memory consumption incompatible with edge devices, whereas heuristic pruning fails to achieve optimal rendering quality under preset Gaussian budgets. In this work, we propose Constrained Dynamic Gaussian Splatting (CDGS), a novel framework that formulates dynamic scene reconstruction as a budget-constrained optimization problem to enforce a strict, user-defined Gaussian budget during training. Our key insight is to introduce a differentiable budget controller as the core optimization driver. Guided by a multi-modal unified importance score, this controller fuses geometric, motion, and perceptual cues for precise capacity regulation. To maximize the utility of this fixed budget, we further decouple the optimization of static and dynamic elements, employing an adaptive allocation mechanism that dynamically distributes capacity based on motion complexity. Furthermore, we implement a three-phase training strategy to seamlessly integrate these constraints, ensuring precise adherence to the target count. Coupled with a dual-mode hybrid compression scheme, CDGS not only strictly adheres to hardware constraints (error < 2%}) but also pushes the Pareto frontier of rate-distortion performance. Extensive experiments demonstrate that CDGS delivers optimal rendering quality under varying capacity limits, achieving over 3x compression compared to state-of-the-art methods.
- [615] arXiv:2602.03541 [pdf, ps, other]
-
Title: Group Selection as a Safeguard Against AI SubstitutionComments: 19 pages, 7 FiguresSubjects: Artificial Intelligence (cs.AI); Theoretical Economics (econ.TH)
Reliance on generative AI can reduce cultural variance and diversity, especially in creative work. This reduction in variance has already led to problems in model performance, including model collapse and hallucination. In this paper, we examine the long-term consequences of AI use for human cultural evolution and the conditions under which widespread AI use may lead to "cultural collapse", a process in which reliance on AI-generated content reduces human variation and innovation and slows cumulative cultural evolution. Using an agent-based model and evolutionary game theory, we compare two types of AI use: complement and substitute. AI-complement users seek suggestions and guidance while remaining the main producers of the final output, whereas AI-substitute users provide minimal input, and rely on AI to produce most of the output. We then study how these use strategies compete and spread under evolutionary dynamics. We find that AI-substitute users prevail under individual-level selection despite the stronger reduction in cultural variance. By contrast, AI-complement users can benefit their groups by maintaining the variance needed for exploration, and can therefore be favored under cultural group selection when group boundaries are strong. Overall, our findings shed light on the long-term, population-level effects of AI adoption and inform policy and organizational strategies to mitigate these risks.
- [616] arXiv:2602.03542 [pdf, ps, other]
-
Title: Can Large Language Models Generalize Procedures Across Representations?Authors: Fangru Lin, Valentin Hofmann, Xingchen Wan, Weixing Wang, Zifeng Ding, Anthony G. Cohn, Janet B. PierrehumbertSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Large language models (LLMs) are trained and tested extensively on symbolic representations such as code and graphs, yet real-world user tasks are often specified in natural language. To what extent can LLMs generalize across these representations? Here, we approach this question by studying isomorphic tasks involving procedures represented in code, graphs, and natural language (e.g., scheduling steps in planning). We find that training LLMs with popular post-training methods on graphs or code data alone does not reliably generalize to corresponding natural language tasks, while training solely on natural language can lead to inefficient performance gains. To address this gap, we propose a two-stage data curriculum that first trains on symbolic, then natural language data. The curriculum substantially improves model performance across model families and tasks. Remarkably, a 1.5B Qwen model trained by our method can closely match zero-shot GPT-4o in naturalistic planning. Finally, our analysis suggests that successful cross-representation generalization can be interpreted as a form of generative analogy, which our curriculum effectively encourages.
- [617] arXiv:2602.03543 [pdf, ps, other]
-
Title: Sequential Linear Contracts on MatroidsSubjects: Computer Science and Game Theory (cs.GT)
In this work, we study sequential contracts under matroid constraints. In the sequential setting, an agent can take actions one by one. After each action, the agent observes the stochastic value of the action and then decides which action to take next, if any. At the end, the agent decides what subset of taken actions to use for the principal's reward; and the principal receives the total value of this subset as a reward. Taking each action induces a certain cost for the agent. Thus, to motivate the agent to take actions the principal is expected to offer an appropriate contract. A contract describes the payment from the principal to the agent as a function of the principal's reward obtained through the agent's actions. In this work, we concentrate on studying linear contracts, i.e.\ the contracts where the principal transfers a fraction of their total reward to the agent. We assume that the total principal's reward is calculated based on a subset of actions that forms an independent set in a given matroid. We establish a relationship between the problem of finding an optimal linear contract (or computing the corresponding principal's utility) and the so called matroid (un)reliability problem. Generally, the above problems turn out to be equivalent subject to adding parallel copies of elements to the given matroid.
- [618] arXiv:2602.03544 [pdf, ps, other]
-
Title: Investigating the Influence of Spatial Ability in Augmented Reality-assisted Robot ProgrammingSubjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
Augmented Reality (AR) offers promising opportunities to enhance learning, but its mechanisms and effects are not yet fully understood. As learning becomes increasingly personalized, considering individual learner characteristics becomes more important. This study investigates the moderating effect of spatial ability on learning experience with AR in the context of robot programming. A between-subjects experiment ($N=71$) compared conventional robot programming to an AR-assisted approach using a head-mounted display. Participants' spatial ability was assessed using the Mental Rotation Test. The learning experience was measured through the System Usability Scale (SUS) and cognitive load. The results indicate that AR support does not significantly improve the learning experience compared to the conventional approach. However, AR appears to have a compensatory effect on the influence of spatial ability. In the control group, spatial ability was significantly positively associated with SUS scores and negatively associated with extraneous cognitive load, indicating that higher spatial ability predicts a better learning experience. In the AR condition, these relationships were not observable, suggesting that AR mitigated the disadvantage typically experienced by learners with lower spatial abilities. These findings suggest that AR can serve a compensatory function by reducing the influence of learner characteristics. Future research should further explore this compensatory role of AR to guide the design of personalized learning environments that address diverse learner needs and reduce barriers for learners with varying cognitive profiles.
- [619] arXiv:2602.03545 [pdf, ps, other]
-
Title: Persona Generators: Generating Diverse Synthetic Personas at ScaleAuthors: Davide Paglieri, Logan Cross, William A. Cunningham, Joel Z. Leibo, Alexander Sasha VezhnevetsSubjects: Artificial Intelligence (cs.AI)
Evaluating AI systems that interact with humans requires understanding their behavior across diverse user populations, but collecting representative human data is often expensive or infeasible, particularly for novel technologies or hypothetical future scenarios. Recent work in Generative Agent-Based Modeling has shown that large language models can simulate human-like synthetic personas with high fidelity, accurately reproducing the beliefs and behaviors of specific individuals. However, most approaches require detailed data about target populations and often prioritize density matching (replicating what is most probable) rather than support coverage (spanning what is possible), leaving long-tail behaviors underexplored. We introduce Persona Generators, functions that can produce diverse synthetic populations tailored to arbitrary contexts. We apply an iterative improvement loop based on AlphaEvolve, using large language models as mutation operators to refine our Persona Generator code over hundreds of iterations. The optimization process produces lightweight Persona Generators that can automatically expand small descriptions into populations of diverse synthetic personas that maximize coverage of opinions and preferences along relevant diversity axes. We demonstrate that evolved generators substantially outperform existing baselines across six diversity metrics on held-out contexts, producing populations that span rare trait combinations difficult to achieve in standard LLM outputs.
- [620] arXiv:2602.03546 [pdf, ps, other]
-
Title: How to Train Your Resistive Network: Generalized Equilibrium Propagation and Analytical LearningComments: 8 pages double column; plus 16 supp mat.;Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Soft Condensed Matter (cond-mat.soft); Emerging Technologies (cs.ET)
Machine learning is a powerful method of extracting meaning from data; unfortunately, current digital hardware is extremely energy-intensive. There is interest in an alternative analog computing implementation that could match the performance of traditional machine learning while being significantly more energy-efficient. However, it remains unclear how to train such analog computing systems while adhering to locality constraints imposed by the physical (as opposed to digital) nature of these systems. Local learning algorithms such as Equilibrium Propagation and Coupled Learning have been proposed to address this issue. In this paper, we develop an algorithm to exactly calculate gradients using a graph theoretic and analytical framework for Kirchhoff's laws. We also introduce Generalized Equilibrium Propagation, a framework encompassing a broad class of Hebbian learning algorithms, including Coupled Learning and Equilibrium Propagation, and show how our algorithm compares. We demonstrate our algorithm using numerical simulations and show that we can train resistor networks without the need for a replica or readout over all resistors, only at the output layer. We also show that under the analytical gradient approach, it is possible to update only a subset of the resistance values without a strong degradation in performance.
- [621] arXiv:2602.03547 [pdf, ps, other]
-
Title: AffordanceGrasp-R1:Leveraging Reasoning-Based Affordance Segmentation with Reinforcement Learning for Robotic GraspingComments: Preprint versionSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
We introduce AffordanceGrasp-R1, a reasoning-driven affordance segmentation framework for robotic grasping that combines a chain-of-thought (CoT) cold-start strategy with reinforcement learning to enhance deduction and spatial grounding. In addition, we redesign the grasping pipeline to be more context-aware by generating grasp candidates from the global scene point cloud and subsequently filtering them using instruction-conditioned affordance masks. Extensive experiments demonstrate that AffordanceGrasp-R1 consistently outperforms state-of-the-art (SOTA) methods on benchmark datasets, and real-world robotic grasping evaluations further validate its robustness and generalization under complex language-conditioned manipulation scenarios.
- [622] arXiv:2602.03548 [pdf, ps, other]
-
Title: SEAD: Self-Evolving Agent for Multi-Turn Service DialogueAuthors: Yuqin Dai, Ning Gao, Wei Zhang, Jie Wang, Zichen Luo, Jinpeng Wang, Yujie Wang, Ruiyuan Wu, Chaozheng WangSubjects: Computation and Language (cs.CL)
Large Language Models have demonstrated remarkable capabilities in open-domain dialogues. However, current methods exhibit suboptimal performance in service dialogues, as they rely on noisy, low-quality human conversation data. This limitation arises from data scarcity and the difficulty of simulating authentic, goal-oriented user behaviors. To address these issues, we propose SEAD (Self-Evolving Agent for Service Dialogue), a framework that enables agents to learn effective strategies without large-scale human annotations. SEAD decouples user modeling into two components: a Profile Controller that generates diverse user states to manage training curriculum, and a User Role-play Model that focuses on realistic role-playing. This design ensures the environment provides adaptive training scenarios rather than acting as an unfair adversary. Experiments demonstrate that SEAD significantly outperforms Open-source Foundation Models and Closed-source Commercial Models, improving task completion rate by 17.6% and dialogue efficiency by 11.1%. Code is available at: https://github.com/Da1yuqin/SEAD.
- [623] arXiv:2602.03549 [pdf, ps, other]
-
Title: EarResp-ANS : Audio-Based On-Device Respiration Rate Estimation on Earphones with Adaptive Noise SuppressionComments: 31 pages, 11 figuresSubjects: Sound (cs.SD); Human-Computer Interaction (cs.HC)
Respiratory rate (RR) is a key vital sign for clinical assessment and mental well-being, yet it is rarely monitored in everyday life due to the lack of unobtrusive sensing technologies. In-ear audio sensing is promising due to its high social acceptance and the amplification of physiological sounds caused by the occlusion effect; however, existing approaches often fail under real-world noise or rely on computationally expensive models. We present EarResp-ANS, the first system enabling fully on-device, real-time RR estimation on commercial earphones. The system employs LMS-based adaptive noise suppression (ANS) to attenuate ambient noise while preserving respiration-related acoustic components, without requiring neural networks or audio streaming, thereby explicitly addressing the energy and privacy constraints of wearable devices. We evaluate EarResp-ANS in a study with 18 participants under realistic acoustic conditions, including music, cafeteria noise, and white noise up to 80 dB SPL. EarResp-ANS achieves robust performance with a global MAE of 0.84 CPM , reduced to 0.47 CPM via automatic outlier rejection, while operating with less than 2% processor load directly on the earphone.
- [624] arXiv:2602.03550 [pdf, ps, other]
-
Title: Formal Evidence Generation for Assurance Cases for Robotic Software ModelsComments: This is a preprint. The paper is currently under review at Software and Systems ModelingSubjects: Software Engineering (cs.SE); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO); Robotics (cs.RO)
Robotics and Autonomous Systems are increasingly deployed in safety-critical domains, so that demonstrating their safety is essential. Assurance Cases (ACs) provide structured arguments supported by evidence, but generating and maintaining this evidence is labour-intensive, error-prone, and difficult to keep consistent as systems evolve. We present a model-based approach to systematically generating AC evidence by embedding formal verification into the assurance workflow. The approach addresses three challenges: systematically deriving formal assertions from natural language requirements using templates, orchestrating multiple formal verification tools to handle diverse property types, and integrating formal evidence production into the workflow. Leveraging RoboChart, a domain-specific modelling language with formal semantics, we combine model checking and theorem proving in our approach. Structured requirements are automatically transformed into formal assertions using predefined templates, and verification results are automatically integrated as evidence. Case studies demonstrate the effectiveness of our approach.
- [625] arXiv:2602.03551 [pdf, ps, other]
-
Title: Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language ModelsComments: 19 pages, 11 figures, EACL 2026Subjects: Computation and Language (cs.CL)
Despite major advances in multilingual modeling, large quality disparities persist across languages. Besides the obvious impact of uneven training resources, typological properties have also been proposed to determine the intrinsic difficulty of modeling a language. The existing evidence, however, is mostly based on small monolingual language models or bilingual translation models trained from scratch. We expand on this line of work by analyzing two large pre-trained multilingual translation models, NLLB-200 and Tower+, which are state-of-the-art representatives of encoder-decoder and decoder-only machine translation, respectively. Based on a broad set of languages, we find that target language typology drives translation quality of both models, even after controlling for more trivial factors, such as data resourcedness and writing script. Additionally, languages with certain typological properties benefit more from a wider search of the output space, suggesting that such languages could profit from alternative decoding strategies beyond the standard left-to-right beam search. To facilitate further research in this area, we release a set of fine-grained typological properties for 212 languages of the FLORES+ MT evaluation benchmark.
- [626] arXiv:2602.03554 [pdf, ps, other]
-
Title: When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMsAuthors: Bogdan Zagribelnyy, Ivan Ilin, Maksim Kuznetsov, Nikita Bondarev, Roman Schutski, Thomas MacDougall, Rim Shayakhmetov, Zulfat Miftakhutdinov, Mikolaj Mizera, Vladimir Aladinskiy, Alex Aliper, Alex ZhavoronkovSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL)
Recent progress has expanded the use of large language models (LLMs) in drug discovery, including synthesis planning. However, objective evaluation of retrosynthesis performance remains limited. Existing benchmarks and metrics typically rely on published synthetic procedures and Top-K accuracy based on single ground-truth, which does not capture the open-ended nature of real-world synthesis planning. We propose a new benchmarking framework for single-step retrosynthesis that evaluates both general-purpose and chemistry-specialized LLMs using ChemCensor, a novel metric for chemical plausibility. By emphasizing plausibility over exact match, this approach better aligns with human synthesis planning practices. We also introduce CREED, a novel dataset comprising millions of ChemCensor-validated reaction records for LLM training, and use it to train a model that improves over the LLM baselines under this benchmark.
- [627] arXiv:2602.03555 [pdf, ps, other]
-
Title: Cut to the Mix: Simple Data Augmentation Outperforms Elaborate Ones in Limited Organ Segmentation DatasetsComments: Accepted at MICCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Multi-organ segmentation is a widely applied clinical routine and automated organ segmentation tools dramatically improve the pipeline of the radiologists. Recently, deep learning (DL) based segmentation models have shown the capacity to accomplish such a task. However, the training of the segmentation networks requires large amount of data with manual annotations, which is a major concern due to the data scarcity from clinic. Working with limited data is still common for researches on novel imaging modalities. To enhance the effectiveness of DL models trained with limited data, data augmentation (DA) is a crucial regularization technique. Traditional DA (TDA) strategies focus on basic intra-image operations, i.e. generating images with different orientations and intensity distributions. In contrast, the interimage and object-level DA operations are able to create new images from separate individuals. However, such DA strategies are not well explored on the task of multi-organ segmentation. In this paper, we investigated four possible inter-image DA strategies: CutMix, CarveMix, ObjectAug and AnatoMix, on two organ segmentation datasets. The result shows that CutMix, CarveMix and AnatoMix can improve the average dice score by 4.9, 2.0 and 1.9, compared with the state-of-the-art nnUNet without DA strategies. These results can be further improved by adding TDA strategies. It is revealed in our experiments that Cut-Mix is a robust but simple DA strategy to drive up the segmentation performance for multi-organ segmentation, even when CutMix produces intuitively 'wrong' images. Our implementation is publicly available for future benchmarks.
- [628] arXiv:2602.03556 [pdf, ps, other]
-
Title: Flaky Tests in a Large Industrial Database Management System: An Empirical Study of Fixed Issue Reports for SAP HANAComments: 8 pages, 2 tables, 5 figures, 3rd International Flaky Tests Workshop 2026 (FTW 2026)Subjects: Software Engineering (cs.SE)
Flaky tests yield different results when executed multiple times for the same version of the source code. Thus, they provide an ambiguous signal about the quality of the code and interfere with the automated assessment of code changes. While a variety of factors can cause test flakiness, approaches to fix flaky tests are typically tailored to address specific causes. However, the prevalent root causes of flaky tests can vary depending on the programming language, application domain, or size of the software project. Since manually labeling flaky tests is time-consuming and tedious, this work proposes an LLMs-as-annotators approach that leverages intra- and inter-model consistency to label issue reports related to fixed flakiness issues with the relevant root cause category. This allows us to gain an overview of prevalent flakiness categories in the issue reports. We evaluated our labeling approach in the context of SAP HANA, a large industrial database management system. Our results suggest that SAP HANA's tests most commonly suffer from issues related to concurrency (23%, 130 of 559 analyzed issue reports). Moreover, our results suggest that different test types face different flakiness challenges. Therefore, we encourage future research on flakiness mitigation to consider evaluating the generalizability of proposed approaches across different test types.
- [629] arXiv:2602.03557 [pdf, ps, other]
-
Title: Scaling Test-Driven Code Generation from Functions to Classes: An Empirical StudySubjects: Software Engineering (cs.SE)
Test-driven development (TDD) has been adopted to improve Large Language Model (LLM)-based code generation by using tests as executable specifications. However, existing TDD-style code generation studies are largely limited to function-level tasks, leaving class-level synthesis where multiple methods interact through shared state and call dependencies underexplored. In this paper, we scale test-driven code generation from functions to classes via an iterative TDD framework. Our approach first analyzes intra-class method dependencies to derive a feasible generation schedule, and then incrementally implements each method under method-level public tests with reflection-style execution feedback and bounded repair iterations. To support test-driven generation and rigorous class-level evaluation, we construct ClassEval-TDD, a cleaned and standardized variant of ClassEval with consistent specifications, deterministic test environments, and complete method-level public tests. We conduct an empirical study across eight LLMs and compare against the strongest direct-generation baseline (the best of holistic, incremental, and compositional strategies). Our class-level TDD framework consistently improves class-level correctness by 12 to 26 absolute points and achieves up to 71% fully correct classes, while requiring only a small number of repairs on average. These results demonstrate that test-driven generation can effectively scale beyond isolated functions and substantially improve class-level code generation reliability. All code and data are available at https://anonymous.4open.science/r/ClassEval-TDD-C4C9/
- [630] arXiv:2602.03558 [pdf, ps, other]
-
Title: ELIQ: A Label-Free Framework for Quality Assessment of Evolving AI-Generated ImagesAuthors: Xinyue Li, Zhiming Xu, Zhichao Zhang, Zhaolin Cai, Sijing Wu, Xiongkuo Min, Yitong Chen, Guangtao ZhaiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Generative text-to-image models are advancing at an unprecedented pace, continuously shifting the perceptual quality ceiling and rendering previously collected labels unreliable for newer generations. To address this, we present ELIQ, a Label-free Framework for Quality Assessment of Evolving AI-generated Images. Specifically, ELIQ focuses on visual quality and prompt-image alignment, automatically constructs positive and aspect-specific negative pairs to cover both conventional distortions and AIGC-specific distortion modes, enabling transferable supervision without human annotations. Building on these pairs, ELIQ adapts a pre-trained multimodal model into a quality-aware critic via instruction tuning and predicts two-dimensional quality using lightweight gated fusion and a Quality Query Transformer. Experiments across multiple benchmarks demonstrate that ELIQ consistently outperforms existing label-free methods, generalizes from AI-generated content (AIGC) to user-generated content (UGC) scenarios without modification, and paves the way for scalable and label-free quality assessment under continuously evolving generative models. The code will be released upon publication.
- [631] arXiv:2602.03560 [pdf, ps, other]
-
Title: HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache SharingAuthors: Yizhao Gao, Jianyu Wei, Qihao Zhang, Yu Cheng, Shimao Chen, Zhengju Tang, Zihan Jiang, Yifan Song, Hailin Zhang, Liang Zhao, Bo Yang, Gang Wang, Shijie Cao, Fuli LuoComments: 17 pages, 2 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This work introduces Hybrid Sparse Attention (HySparse), a new architecture that interleaves each full attention layer with several sparse attention layers. While conceptually simple, HySparse strategically derives each sparse layer's token selection and KV caches directly from the preceding full attention layer. This architecture resolves two fundamental limitations of prior sparse attention methods. First, conventional approaches typically rely on additional proxies to predict token importance, introducing extra complexity and potentially suboptimal performance. In contrast, HySparse uses the full attention layer as a precise oracle to identify important tokens. Second, existing sparse attention designs often reduce computation without saving KV cache. HySparse enables sparse attention layers to reuse the full attention KV cache, thereby reducing both computation and memory. We evaluate HySparse on both 7B dense and 80B MoE models. Across all settings, HySparse consistently outperforms both full attention and hybrid SWA baselines. Notably, in the 80B MoE model with 49 total layers, only 5 layers employ full attention, yet HySparse achieves substantial performance gains while reducing KV cache storage by nearly 10x.
- [632] arXiv:2602.03562 [pdf, ps, other]
-
Title: NPCNet: Navigator-Driven Pseudo Text for Deep Clustering of Early Sepsis PhenotypingSubjects: Machine Learning (cs.LG)
Sepsis is a heterogeneous syndrome. Identifying clinically distinct phenotypes may enable more precise treatment strategies. In recent years, many researchers have applied clustering algorithms to sepsis patients. However, the clustering process rarely incorporates clinical relevance, potentially limiting to reflect clinically distinct phenotypes. We propose NPCNet, a novel deep clustering network with a target navigator that integrates temporal Electronic Health Records (EHRs) to better align sepsis phenotypes with clinical significance. We identify four sepsis phenotypes ($\alpha$, $\beta$, $\gamma$, and $\delta$) with divergence in SOFA trajectories. Notably, while $\alpha$ and $\delta$ phenotypes both show severe conditions in the early stage, NPCNet effectively differentiates patients who are likely to improve ($\alpha$) from those at risk of deterioration ($\delta$). Furthermore, through the treatment effect analysis, we discover that $\alpha$, $\beta$, and $\delta$ phenotypes may benefit from early vasopressor administration. The results show that NPCNet enhances precision treatment strategies by uncovering clinically distinct phenotypes.
- [633] arXiv:2602.03563 [pdf, ps, other]
-
Title: ACL: Aligned Contrastive Learning Improves BERT and Multi-exit BERT Fine-tuningAuthors: Wei ZhuSubjects: Computation and Language (cs.CL)
Despite its success in self-supervised learning, contrastive learning is less studied in the supervised setting. In this work, we first use a set of pilot experiments to show that in the supervised setting, the cross-entropy loss objective (CE) and the contrastive learning objective often conflict with each other, thus hindering the applications of CL in supervised settings. To resolve this problem, we introduce a novel \underline{A}ligned \underline{C}ontrastive \underline{L}earning (ACL) framework. First, ACL-Embed regards label embeddings as extra augmented samples with different labels and employs contrastive learning to align the label embeddings with its samples' representations. Second, to facilitate the optimization of ACL-Embed objective combined with the CE loss, we propose ACL-Grad, which will discard the ACL-Embed term if the two objectives are in conflict. To further enhance the performances of intermediate exits of multi-exit BERT, we further propose cross-layer ACL (ACL-CL), which is to ask the teacher exit to guide the optimization of student shallow exits. Extensive experiments on the GLUE benchmark results in the following takeaways: (a) ACL-BRT outperforms or performs comparably with CE and CE+SCL on the GLUE tasks; (b) ACL, especially CL-ACL, significantly surpasses the baseline methods on the fine-tuning of multi-exit BERT, thus providing better quality-speed tradeoffs for low-latency applications.
- [634] arXiv:2602.03564 [pdf, ps, other]
-
Title: CoGenCast: A Coupled Autoregressive-Flow Generative Framework for Time Series ForecastingSubjects: Machine Learning (cs.LG)
Time series forecasting can be viewed as a generative problem that requires both semantic understanding over contextual conditions and stochastic modeling of continuous temporal dynamics. Existing approaches typically rely on either autoregressive large language models (LLMs) for semantic context modeling or diffusion-like models for continuous probabilistic generation. However, neither method alone can adequately model both aspects simultaneously. In this work, we propose CoGenCast, a hybrid generative framework that couples pre-trained LLMs with flow-matching mechanism for effective time series forecasting. Specifically, we reconfigure pre-trained decoder-only LLMs into a native forecasting encoder-decoder backbone by modifying only the attention topology, enabling bidirectional context encoding and causal representation generation. Building on this, a flow-matching mechanism is further integrated to model temporal evolution, capturing continuous stochastic dynamics conditioned on the autoregressively generated representation. Notably, CoGenCast naturally supports multimodal forecasting and cross-domain unified training. Extensive experiments on multiple benchmarks show that CoGenCast consistently outperforms previous compared baselines. Code is available at https://github.com/liuyaguo/_CoGenCast.
- [635] arXiv:2602.03565 [pdf, ps, other]
-
Title: Symbolic Model Checking using Intervals of VectorsComments: Under submissionSubjects: Logic in Computer Science (cs.LO)
Model checking is a powerful technique for software verification. However, the approach notably suffers from the infamous state space explosion problem. To tackle this, in this paper, we introduce a novel symbolic method for encoding Petri net markings. It is based on the use of generalised intervals on vectors, as opposed to existing methods based on vectors of intervals such as Interval Decision Diagrams. We develop a formalisation of these intervals, show that they possess homomorphic operations for model checking CTL on Petri nets, and define a canonical form that provides good performance characteristics. Our structure facilitates the symbolic evaluation of CTL formulas in the realm of global model checking, which aims to identify every state that satisfies a formula. Tests on examples of the model checking contest (MCC 2022) show that our approach yields promising results. To achieve this, we implement efficient computations based on saturation and clustering principles derived from other symbolic model checking techniques.
- [636] arXiv:2602.03566 [pdf, ps, other]
-
Title: Riemannian Neural Optimal TransportComments: 58 pagesSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Computational optimal transport (OT) offers a principled framework for generative modeling. Neural OT methods, which use neural networks to learn an OT map (or potential) from data in an amortized way, can be evaluated out of sample after training, but existing approaches are tailored to Euclidean geometry. Extending neural OT to high-dimensional Riemannian manifolds remains an open challenge. In this paper, we prove that any method for OT on manifolds that produces discrete approximations of transport maps necessarily suffers from the curse of dimensionality: achieving a fixed accuracy requires a number of parameters that grows exponentially with the manifold dimension. Motivated by this limitation, we introduce Riemannian Neural OT (RNOT) maps, which are continuous neural-network parameterizations of OT maps on manifolds that avoid discretization and incorporate geometric structure by construction. Under mild regularity assumptions, we prove that RNOT maps approximate Riemannian OT maps with sub-exponential complexity in the dimension. Experiments on synthetic and real datasets demonstrate improved scalability and competitive performance relative to discretization-based baselines.
- [637] arXiv:2602.03567 [pdf, ps, other]
-
Title: EVE: Efficient Verification of Data Erasure through Customized Perturbation in Approximate UnlearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Verifying whether the machine unlearning process has been properly executed is critical but remains underexplored. Some existing approaches propose unlearning verification methods based on backdooring techniques. However, these methods typically require participation in the model's initial training phase to backdoor the model for later verification, which is inefficient and impractical. In this paper, we propose an efficient verification of erasure method (EVE) for verifying machine unlearning without requiring involvement in the model's initial training process. The core idea is to perturb the unlearning data to ensure the model prediction of the specified samples will change before and after unlearning with perturbed data. The unlearning users can leverage the observation of the changes as a verification signal. Specifically, the perturbations are designed with two key objectives: ensuring the unlearning effect and altering the unlearned model's prediction of target samples. We formalize the perturbation generation as an adversarial optimization problem, solving it by aligning the unlearning gradient with the gradient of boundary change for target samples. We conducted extensive experiments, and the results show that EVE can verify machine unlearning without involving the model's initial training process, unlike backdoor-based methods. Moreover, EVE significantly outperforms state-of-the-art unlearning verification methods, offering significant speedup in efficiency while enhancing verification accuracy. The source code of EVE is released at \uline{https://anonymous.4open.science/r/EVE-C143}, providing a novel tool for verification of machine unlearning.
- [638] arXiv:2602.03569 [pdf, ps, other]
-
Title: EHRWorld: A Patient-Centric Medical World Model for Long-Horizon Clinical TrajectoriesSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
World models offer a principled framework for simulating future states under interventions, but realizing such models in complex, high-stakes domains like medicine remains challenging. Recent large language models (LLMs) have achieved strong performance on static medical reasoning tasks, raising the question of whether they can function as dynamic medical world models capable of simulating disease progression and treatment outcomes over time. In this work, we show that LLMs only incorporating medical knowledge struggle to maintain consistent patient states under sequential interventions, leading to error accumulation in long-horizon clinical simulation. To address this limitation, we introduce EHRWorld, a patient-centric medical world model trained under a causal sequential paradigm, together with EHRWorld-110K, a large-scale longitudinal clinical dataset derived from real-world electronic health records. Extensive evaluations demonstrate that EHRWorld significantly outperforms naive LLM-based baselines, achieving more stable long-horizon simulation, improved modeling of clinically sensitive events, and favorable reasoning efficiency, highlighting the necessity of training on causally grounded, temporally evolving clinical data for reliable and robust medical world modeling.
- [639] arXiv:2602.03570 [pdf, ps, other]
-
Title: Asymmetric Hierarchical Anchoring for Audio-Visual Joint Representation: Resolving Information Allocation Ambiguity for Robust Cross-Modal GeneralizationComments: 18 pages, 11 figuresSubjects: Machine Learning (cs.LG)
Audio-visual joint representation learning under Cross-Modal Generalization (CMG) aims to transfer knowledge from a labeled source modality to an unlabeled target modality through a unified discrete representation space. Existing symmetric frameworks often suffer from information allocation ambiguity, where the absence of structural inductive bias leads to semantic-specific leakage across modalities. We propose Asymmetric Hierarchical Anchoring (AHA), which enforces directional information allocation by designating a structured semantic anchor within a shared hierarchy. In our instantiation, we exploit the hierarchical discrete representations induced by audio Residual Vector Quantization (RVQ) to guide video feature distillation into a shared semantic space. To ensure representational purity, we replace fragile mutual information estimators with a GRL-based adversarial decoupler that explicitly suppresses semantic leakage in modality-specific branches, and introduce Local Sliding Alignment (LSA) to encourage fine-grained temporal alignment across modalities. Extensive experiments on AVE and AVVP benchmarks demonstrate that AHA consistently outperforms symmetric baselines in cross-modal transfer. Additional analyses on talking-face disentanglement experiment further validate that the learned representations exhibit improved semantic consistency and disentanglement, indicating the broader applicability of the proposed framework.
- [640] arXiv:2602.03571 [pdf, ps, other]
-
Title: Multi-Player, Multi-Strategy Quantum Game Model for Interaction-Aware Decision-Making in Autonomous DrivingSubjects: Robotics (cs.RO)
Although significant progress has been made in decision-making for automated driving, challenges remain for deployment in the real world. One challenge lies in addressing interaction-awareness. Most existing approaches oversimplify interactions between the ego vehicle and surrounding agents, and often neglect interactions among the agents themselves. A common solution is to model these interactions using classical game theory. However, its formulation assumes rational players, whereas human behavior is frequently uncertain or irrational. To address these challenges, we propose the Quantum Game Decision-Making (QGDM) model, a novel framework that combines classical game theory with quantum mechanics principles (such as superposition, entanglement, and interference) to tackle multi-player, multi-strategy decision-making problems. To the best of our knowledge, this is one of the first studies to apply quantum game theory to decision-making for automated driving. QGDM runs in real time on a standard computer, without requiring quantum hardware. We evaluate QGDM in simulation across various scenarios, including roundabouts, merging, and highways, and compare its performance with multiple baseline methods. Results show that QGDM significantly improves success rates and reduces collision rates compared to classical approaches, particularly in scenarios with high interaction.
- [641] arXiv:2602.03578 [pdf, ps, other]
-
Title: Use Graph When It Needs: Efficiently and Adaptively Integrating Retrieval-Augmented Generation with GraphsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models (LLMs) often struggle with knowledge-intensive tasks due to hallucinations and outdated parametric knowledge. While Retrieval-Augmented Generation (RAG) addresses this by integrating external corpora, its effectiveness is limited by fragmented information in unstructured domain documents. Graph-augmented RAG (GraphRAG) emerged to enhance contextual reasoning through structured knowledge graphs, yet paradoxically underperforms vanilla RAG in real-world scenarios, exhibiting significant accuracy drops and prohibitive latency despite gains on complex queries. We identify the rigid application of GraphRAG to all queries, regardless of complexity, as the root cause. To resolve this, we propose an efficient and adaptive GraphRAG framework called EA-GraphRAG that dynamically integrates RAG and GraphRAG paradigms through syntax-aware complexity analysis. Our approach introduces: (i) a syntactic feature constructor that parses each query and extracts a set of structural features; (ii) a lightweight complexity scorer that maps these features to a continuous complexity score; and (iii) a score-driven routing policy that selects dense RAG for low-score queries, invokes graph-based retrieval for high-score queries, and applies complexity-aware reciprocal rank fusion to handle borderline cases. Extensive experiments on a comprehensive benchmark, consisting of two single-hop and two multi-hop QA benchmarks, demonstrate that our EA-GraphRAG significantly improves accuracy, reduces latency, and achieves state-of-the-art performance in handling mixed scenarios involving both simple and complex queries.
- [642] arXiv:2602.03579 [pdf, ps, other]
-
Title: Secure Decentralized Pliable Index Coding for Target Data SizeComments: 12 pagesSubjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
Decentralized Pliable Index Coding (DPIC) problem addresses efficient information exchange in distributed systems where clients communicate among themselves without a central server. An important consideration in DPIC is the heterogeneity of side-information and demand sizes. Although many prior works assume homogeneous settings with identical side-information cardinality and single message demands, these assumptions limit real-world applicability where clients typically possess unequal amounts of prior information. In this paper, we study DPIC problem under heterogeneous side-information cardinalities. We propose a transmission scheme that coordinates client broadcasts to maximize coding efficiency while ensuring that each client achieves a common target level $T$. In addition, we impose a strict security constraint that no client acquires more than the target $T$ number of messages, guaranteeing that each client ends up with exactly $T$ messages. We analyze the communication cost incurred by the proposed scheme under this security constraint.
- [643] arXiv:2602.03580 [pdf, ps, other]
-
Title: Don't believe everything you read: Understanding and Measuring MCP Behavior under Misleading Tool DescriptionsSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
The Model Context Protocol (MCP) enables large language models to invoke external tools through natural-language descriptions, forming the foundation of many AI agent applications. However, MCP does not enforce consistency between documented tool behavior and actual code execution, even though MCP Servers often run with broad system privileges. This gap introduces a largely unexplored security risk. We study how mismatches between externally presented tool descriptions and underlying implementations systematically shape the mental models and decision-making behavior of intelligent agents. Specifically, we present the first large-scale study of description-code inconsistency in the MCP ecosystem. We design an automated static analysis framework and apply it to 10,240 real-world MCP Servers across 36 categories. Our results show that while most servers are highly consistent, approximately 13% exhibit substantial mismatches that can enable undocumented privileged operations, hidden state mutations, or unauthorized financial actions. We further observe systematic differences across application categories, popularity levels, and MCP marketplaces. Our findings demonstrate that description-code inconsistency is a concrete and prevalent attack surface in MCP-based AI agents, and motivate the need for systematic auditing and stronger transparency guarantees in future agent ecosystems.
- [644] arXiv:2602.03582 [pdf, ps, other]
-
Title: Optimization and Generation in Aerodynamics Inverse DesignSubjects: Machine Learning (cs.LG)
Inverse design with physics-based objectives is challenging because it couples high-dimensional geometry with expensive simulations, as exemplified by aerodynamic shape optimization for drag reduction. We revisit inverse design through two canonical solutions, the optimal design point and the optimal design distribution, and relate them to optimization and guided generation. Building on this view, we propose a new training loss for cost predictors and a density-gradient optimization method that improves objectives while preserving plausible shapes. We further unify existing training-free guided generation methods. To address their inability to approximate conditional covariance in high dimensions, we develop a time- and memory-efficient algorithm for approximate covariance estimation. Experiments on a controlled 2D study and high-fidelity 3D aerodynamic benchmarks (car and aircraft), validated by OpenFOAM simulations and miniature wind-tunnel tests with 3D-printed prototypes, demonstrate consistent gains in both optimization and guided generation. Additional offline RL results further support the generality of our approach.
- [645] arXiv:2602.03584 [pdf, ps, other]
-
Title: $V_0$: A Generalist Value Model for Any Policy at State ZeroAuthors: Yi-Kai Zhang, Zhiyuan Yao, Hongyan Hao, Yueqing Sun, Qi Gu, Hui Su, Xunliang Cai, De-Chuan Zhan, Han-Jia YeSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Policy gradient methods rely on a baseline to measure the relative advantage of an action, ensuring the model reinforces behaviors that outperform its current average capability. In the training of Large Language Models (LLMs) using Actor-Critic methods (e.g., PPO), this baseline is typically estimated by a Value Model (Critic) often as large as the policy model itself. However, as the policy continuously evolves, the value model requires expensive, synchronous incremental training to accurately track the shifting capabilities of the policy. To avoid this overhead, Group Relative Policy Optimization (GRPO) eliminates the coupled value model by using the average reward of a group of rollouts as the baseline; yet, this approach necessitates extensive sampling to maintain estimation stability. In this paper, we propose $V_0$, a Generalist Value Model capable of estimating the expected performance of any model on unseen prompts without requiring parameter updates. We reframe value estimation by treating the policy's dynamic capability as an explicit context input; specifically, we leverage a history of instruction-performance pairs to dynamically profile the model, departing from the traditional paradigm that relies on parameter fitting to perceive capability shifts. Focusing on value estimation at State Zero (i.e., the initial prompt, hence $V_0$), our model serves as a critical resource scheduler. During GRPO training, $V_0$ predicts success rates prior to rollout, allowing for efficient sampling budget allocation; during deployment, it functions as a router, dispatching instructions to the most cost-effective and suitable model. Empirical results demonstrate that $V_0$ significantly outperforms heuristic budget allocation and achieves a Pareto-optimal trade-off between performance and cost in LLM routing tasks.
- [646] arXiv:2602.03585 [pdf, ps, other]
-
Title: Causal Inference for the Effect of Code Coverage on Bug IntroductionComments: Registered Report with Continuity Acceptance (CA) for submission to Empirical Software Engineering granted by RR-Committee of the MSR'26Subjects: Software Engineering (cs.SE)
Context: Code coverage is widely used as a software quality assurance measure. However, its effect, and specifically the advisable dose, are disputed in both the research and engineering communities. Prior work reports only correlational associations, leaving results vulnerable to confounding factors. Objective: We aim to quantify the causal effect of code coverage (exposure) on bug introduction (outcome) in the context of mature JavaScript and TypeScript open source projects, addressing both the overall effect and its variance across coverage levels. Method: We construct a causal directed acyclic graph to identify confounders within the software engineering process, modeling key variables from the source code, issue- and review systems, and continuous integration. Using generalized propensity score adjustment, we will apply doubly robust regression-based causal inference for continuous exposure to a novel dataset of bug-introducing and non-bug-introducing changes. We estimate the average treatment effect and dose-response relationship to examine potential non-linear patterns (e.g., thresholds or diminishing returns) within the projects of our dataset.
- [647] arXiv:2602.03586 [pdf, ps, other]
-
Title: APEX: Probing Neural Networks via Activation PerturbationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Prior work on probing neural networks primarily relies on input-space analysis or parameter perturbation, both of which face fundamental limitations in accessing structural information encoded in intermediate representations. We introduce Activation Perturbation for EXploration (APEX), an inference-time probing paradigm that perturbs hidden activations while keeping both inputs and model parameters fixed. We theoretically show that activation perturbation induces a principled transition from sample-dependent to model-dependent behavior by suppressing input-specific signals and amplifying representation-level structure, and further establish that input perturbation corresponds to a constrained special case of this framework. Through representative case studies, we demonstrate the practical advantages of APEX. In the small-noise regime, APEX provides a lightweight and efficient measure of sample regularity that aligns with established metrics, while also distinguishing structured from randomly labeled models and revealing semantically coherent prediction transitions. In the large-noise regime, APEX exposes training-induced model-level biases, including a pronounced concentration of predictions on the target class in backdoored models. Overall, our results show that APEX offers an effective perspective for exploring, and understanding neural networks beyond what is accessible from input space alone.
- [648] arXiv:2602.03587 [pdf, ps, other]
-
Title: CL-bench: A Benchmark for Context LearningAuthors: Shihan Dou, Ming Zhang, Zhangyue Yin, Chenhao Huang, Yujiong Shen, Junzhe Wang, Jiayi Chen, Yuchen Ni, Junjie Ye, Cheng Zhang, Huaibing Xie, Jianglu Hu, Shaolei Wang, Weichao Wang, Yanling Xiao, Yiting Liu, Zenan Xu, Zhen Guo, Pluto Zhou, Tao Gui, Zuxuan Wu, Xipeng Qiu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Di Wang, Shunyu YaoComments: 78 pages, 17 figuresSubjects: Computation and Language (cs.CL)
Current language models (LMs) excel at reasoning over prompts using pre-trained knowledge. However, real-world tasks are far more complex and context-dependent: models must learn from task-specific context and leverage new knowledge beyond what is learned during pre-training to reason and resolve tasks. We term this capability context learning, a crucial ability that humans naturally possess but has been largely overlooked. To this end, we introduce CL-bench, a real-world benchmark consisting of 500 complex contexts, 1,899 tasks, and 31,607 verification rubrics, all crafted by experienced domain experts. Each task is designed such that the new content required to resolve it is contained within the corresponding context. Resolving tasks in CL-bench requires models to learn from the context, ranging from new domain-specific knowledge, rule systems, and complex procedures to laws derived from empirical data, all of which are absent from pre-training. This goes far beyond long-context tasks that primarily test retrieval or reading comprehension, and in-context learning tasks, where models learn simple task patterns via instructions and demonstrations. Our evaluations of ten frontier LMs find that models solve only 17.2% of tasks on average. Even the best-performing model, GPT-5.1, solves only 23.7%, revealing that LMs have yet to achieve effective context learning, which poses a critical bottleneck for tackling real-world, complex context-dependent tasks. CL-bench represents a step towards building LMs with this fundamental capability, making them more intelligent and advancing their deployment in real-world scenarios.
- [649] arXiv:2602.03588 [pdf, ps, other]
-
Title: Efficient Algorithms for Partial Constraint Satisfaction Problems over Control-flow GraphsComments: Already accepted by SETTA'25. this https URL arXiv admin note: substantial text overlap with arXiv:2507.16660Subjects: Computation and Language (cs.CL); Programming Languages (cs.PL)
In this work, we focus on the Partial Constraint Satisfaction Problem (PCSP) over control-flow graphs (CFGs) of programs. PCSP serves as a generalization of the well-known Constraint Satisfaction Problem (CSP). In the CSP framework, we define a set of variables, a set of constraints, and a finite domain $D$ that encompasses all possible values for each variable. The objective is to assign a value to each variable in such a way that all constraints are satisfied. In the graph variant of CSP, an underlying graph is considered and we have one variable corresponding to each vertex of the graph and one or several constraints corresponding to each edge. In PCSPs, we allow for certain constraints to be violated at a specified cost, aiming to find a solution that minimizes the total cost. Numerous classical compiler optimization tasks can be framed as PCSPs over control-flow graphs. Examples include Register Allocation, Lifetime-optimal Speculative Partial Redundancy Elimination (LOSPRE), and Optimal Placement of Bank Selection Instructions. On the other hand, it is well-known that control-flow graphs of structured programs are sparse and decomposable in a variety of ways. In this work, we rely on the Series-Parallel-Loop (SPL) decompositions as introduced by~\cite{RegisterAllocation}. Our main contribution is a general algorithm for PCSPs over SPL graphs with a time complexity of \(O(|G| \cdot |D|^6)\), where \(|G|\) represents the size of the control-flow graph. Note that for any fixed domain $D,$ this yields a linear-time solution. Our algorithm can be seen as a generalization and unification of previous SPL-based approaches for register allocation and LOSPRE. In addition, we provide experimental results over another classical PCSP task, i.e. Optimal Bank Selection, achieving runtimes four times better than the previous state of the art.
- [650] arXiv:2602.03589 [pdf, ps, other]
-
Title: SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLMComments: NeurIPS 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Large language models (LLMs) have demonstrated exceptional capabilities in text understanding, which has paved the way for their expansion into video LLMs (Vid-LLMs) to analyze video data. However, current Vid-LLMs struggle to simultaneously retain high-quality frame-level semantic information (i.e., a sufficient number of tokens per frame) and comprehensive video-level temporal information (i.e., an adequate number of sampled frames per video). This limitation hinders the advancement of Vid-LLMs towards fine-grained video understanding. To address this issue, we introduce the SlowFocus mechanism, which significantly enhances the equivalent sampling frequency without compromising the quality of frame-level visual tokens. SlowFocus begins by identifying the query-related temporal segment based on the posed question, then performs dense sampling on this segment to extract local high-frequency features. A multi-frequency mixing attention module is further leveraged to aggregate these local high-frequency details with global low-frequency contexts for enhanced temporal comprehension. Additionally, to tailor Vid-LLMs to this innovative mechanism, we introduce a set of training strategies aimed at bolstering both temporal grounding and detailed temporal reasoning capabilities. Furthermore, we establish FineAction-CGR, a benchmark specifically devised to assess the ability of Vid-LLMs to process fine-grained temporal understanding tasks. Comprehensive experiments demonstrate the superiority of our mechanism across both existing public video understanding benchmarks and our proposed FineAction-CGR.
- [651] arXiv:2602.03591 [pdf, ps, other]
-
Title: High-Resolution Underwater Camouflaged Object Detection: GBU-UCOD Dataset and Topology-Aware and Frequency-Decoupled NetworksSubjects: Computer Vision and Pattern Recognition (cs.CV)
Underwater Camouflaged Object Detection (UCOD) is a challenging task due to the extreme visual similarity between targets and backgrounds across varying marine depths. Existing methods often struggle with topological fragmentation of slender creatures in the deep sea and the subtle feature extraction of transparent organisms. In this paper, we propose DeepTopo-Net, a novel framework that integrates topology-aware modeling with frequency-decoupled perception. To address physical degradation, we design the Water-Conditioned Adaptive Perceptor (WCAP), which employs Riemannian metric tensors to dynamically deform convolutional sampling fields. Furthermore, the Abyssal-Topology Refinement Module (ATRM) is developed to maintain the structural connectivity of spindly targets through skeletal priors. Specifically, we first introduce GBU-UCOD, the first high-resolution (2K) benchmark tailored for marine vertical zonation, filling the data gap for hadal and abyssal zones. Extensive experiments on MAS3K, RMAS, and our proposed GBU-UCOD datasets demonstrate that DeepTopo-Net achieves state-of-the-art performance, particularly in preserving the morphological integrity of complex underwater patterns. The datasets and codes will be released at https://github.com/Wuwenji18/GBU-UCOD.
- [652] arXiv:2602.03592 [pdf, ps, other]
-
Title: Complete Reduction for Derivatives in a Transcendental Liouvillian ExtensionComments: 42pagesSubjects: Symbolic Computation (cs.SC)
Transcendental Liouvillian extensions are differential fields, in which one can model poly-logarithmic, hyperexponential, and trigonometric functions, logarithmic integrals, and their (nested) rational expressions. For such an extension $(F, \, ^\prime)$ with the subfield $C$ of constants, we construct a complementary subspace $W$ for the $C$-subspace of derivatives in $F$, and develop an algorithm that, for every $f \in F$, computes a pair $(g,r) \in F \times W$ such that $f = g^\prime + r$. Moreover, $f$ is a derivative in $F$ if and only if $r=0$. The algorithm enables us to determine elementary integrability over $F$ by computing parametric logarithmic parts, and leads to a reduction-based approach to constructing telescopers for functions that can be represented by elements in $F$.
- [653] arXiv:2602.03593 [pdf, ps, other]
-
Title: Beyond the Commit: Developer Perspectives on Productivity with AI Coding AssistantsComments: ICSE SEIPSubjects: Software Engineering (cs.SE)
Measuring developer productivity is a topic that has attracted attention from both academic research and industrial practice. In the age of AI coding assistants, it has become even more important for both academia and industry to understand how to measure their impact on developer productivity, and to reconsider whether earlier measures and frameworks still apply. This study analyzes the validity of different approaches to evaluating the productivity impacts of AI coding assistants by leveraging mixed-method research. At BNY Mellon, we conduct a survey with 2989 developer responses and 11 in-depth interviews. Our findings demonstrate that a multifaceted approach is needed to measure AI productivity impacts: survey results expose conflicting perspectives on AI tool usefulness, while interviews elicit six distinct factors that capture both short-term and long-term dimensions of productivity. In contrast to prior work, our factors highlight the importance of long-term metrics like technical expertise and ownership of work. We hope this work encourages future research to incorporate a broader range of human-centered factors, and supports industry in adopting more holistic approaches to evaluating developer productivity.
- [654] arXiv:2602.03594 [pdf, ps, other]
-
Title: TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly DetectionAuthors: Alireza Salehi, Ehsan Karami, Sepehr Noey, Sahand Noey, Makoto Yamada, Reshad Hosseini, Mohammad SabokrouComments: This is the extended version of the paper accepted in ICASSP'26, which will be publicly available in May. Authors' contributions may vary among the versionsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Anomaly detection identifies departures from expected behavior in safety-critical settings. When target-domain normal data are unavailable, zero-shot anomaly detection (ZSAD) leverages vision-language models (VLMs). However, CLIP's coarse image-text alignment limits both localization and detection due to (i) spatial misalignment and (ii) weak sensitivity to fine-grained anomalies; prior work compensates with complex auxiliary modules yet largely overlooks the choice of backbone. We revisit the backbone and use TIPS-a VLM trained with spatially aware objectives. While TIPS alleviates CLIP's issues, it exposes a distributional gap between global and local features. We address this with decoupled prompts-fixed for image-level detection and learnable for pixel-level localization-and by injecting local evidence into the global score. Without CLIP-specific tricks, our TIPS-based pipeline improves image-level performance by 1.1-3.9% and pixel-level by 1.5-6.9% across seven industrial datasets, delivering strong generalization with a lean architecture. Code is available at github.com/AlirezaSalehy/Tipsomaly.
- [655] arXiv:2602.03595 [pdf, ps, other]
-
Title: Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Referring Video Object Segmentation (RVOS) aims to segment objects in videos based on textual queries. Current methods mainly rely on large-scale supervised fine-tuning (SFT) of Multi-modal Large Language Models (MLLMs). However, this paradigm suffers from heavy data dependence and limited scalability against the rapid evolution of MLLMs. Although recent zero-shot approaches offer a flexible alternative, their performance remains significantly behind SFT-based methods, due to the straightforward workflow designs. To address these limitations, we propose \textbf{Refer-Agent}, a collaborative multi-agent system with alternating reasoning-reflection mechanisms. This system decomposes RVOS into step-by-step reasoning process. During reasoning, we introduce a Coarse-to-Fine frame selection strategy to ensure the frame diversity and textual relevance, along with a Dynamic Focus Layout that adaptively adjusts the agent's visual focus. Furthermore, we propose a Chain-of-Reflection mechanism, which employs a Questioner-Responder pair to generate a self-reflection chain, enabling the system to verify intermediate results and generates feedback for next-round reasoning refinement. Extensive experiments on five challenging benchmarks demonstrate that Refer-Agent significantly outperforms state-of-the-art methods, including both SFT-based models and zero-shot approaches. Moreover, Refer-Agent is flexible and enables fast integration of new MLLMs without any additional fine-tuning costs. Code will be released.
- [656] arXiv:2602.03596 [pdf, ps, other]
-
Title: SAGE-5GC: Security-Aware Guidelines for Evaluating Anomaly Detection in the 5G Core NetworkComments: ITASEC-2026Subjects: Machine Learning (cs.LG)
Machine learning-based anomaly detection systems are increasingly being adopted in 5G Core networks to monitor complex, high-volume traffic. However, most existing approaches are evaluated under strong assumptions that rarely hold in operational environments, notably the availability of independent and identically distributed (IID) data and the absence of adaptive attackers.In this work, we study the problem of detecting 5G attacks \textit{in the wild}, focusing on realistic deployment settings. We propose a set of Security-Aware Guidelines for Evaluating anomaly detectors in 5G Core Network (SAGE-5GC), driven by domain knowledge and consideration of potential adversarial threats. Using a realistic 5G Core dataset, we first train several anomaly detectors and assess their baseline performance against standard 5GC control-plane cyberattacks targeting PFCP-based network services.We then extend the evaluation to adversarial settings, where an attacker tries to manipulate the observable features of the network traffic to evade detection, under the constraint that the intended functionality of the malicious traffic is preserved. Starting from a selected set of controllable features, we analyze model sensitivity and adversarial robustness through randomized perturbations. Finally, we introduce a practical optimization strategy based on genetic algorithms that operates exclusively on attacker-controllable features and does not require prior knowledge of the underlying detection model. Our experimental results show that adversarially crafted attacks can substantially degrade detection performance, underscoring the need for robust, security-aware evaluation methodologies for anomaly detection in 5G networks deployed in the wild.
- [657] arXiv:2602.03603 [pdf, ps, other]
-
Title: Human-in-the-Loop Failure Recovery with Adaptive Task AllocationSubjects: Robotics (cs.RO)
Since the recent Covid-19 pandemic, mobile manipulators and humanoid assistive robots with higher levels of autonomy have increasingly been adopted for patient care and living assistance. Despite advancements in autonomy, these robots often struggle to perform reliably in dynamic and unstructured environments and require human intervention to recover from failures. Effective human-robot collaboration is essential to enable robots to receive assistance from the most competent operator, in order to reduce their workload and minimize disruptions in task execution. In this paper, we propose an adaptive method for allocating robotic failures to human operators (ARFA). Our proposed approach models the capabilities of human operators, and continuously updates these beliefs based on their actual performance for failure recovery. For every failure to be resolved, a reward function calculates expected outcomes based on operator capabilities and historical data, task urgency, and current workload distribution. The failure is then assigned to the operator with the highest expected reward. Our simulations and user studies show that ARFA outperforms random allocation, significantly reducing robot idle time, improving overall system performance, and leading to a more distributed workload among operators.
- [658] arXiv:2602.03604 [pdf, ps, other]
-
Title: A Lightweight Library for Energy-Based Joint-Embedding Predictive ArchitecturesAuthors: Basile Terver, Randall Balestriero, Megi Dervishi, David Fan, Quentin Garrido, Tushar Nagarajan, Koustuv Sinha, Wancong Zhang, Mike Rabbat, Yann LeCun, Amir BarSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
We present EB-JEPA, an open-source library for learning representations and world models using Joint-Embedding Predictive Architectures (JEPAs). JEPAs learn to predict in representation space rather than pixel space, avoiding the pitfalls of generative modeling while capturing semantically meaningful features suitable for downstream tasks. Our library provides modular, self-contained implementations that illustrate how representation learning techniques developed for image-level self-supervised learning can transfer to video, where temporal dynamics add complexity, and ultimately to action-conditioned world models, where the model must additionally learn to predict the effects of control inputs. Each example is designed for single-GPU training within a few hours, making energy-based self-supervised learning accessible for research and education. We provide ablations of JEA components on CIFAR-10. Probing these representations yields 91% accuracy, indicating that the model learns useful features. Extending to video, we include a multi-step prediction example on Moving MNIST that demonstrates how the same principles scale to temporal modeling. Finally, we show how these representations can drive action-conditioned world models, achieving a 97% planning success rate on the Two Rooms navigation task. Comprehensive ablations reveal the critical importance of each regularization component for preventing representation collapse. Code is available at https://github.com/facebookresearch/eb_jepa.
- [659] arXiv:2602.03607 [pdf, ps, other]
-
Title: Sleep or Transmit: Dual-Mode Energy-Efficient Design for NOMA-Enabled Backscatter NetworksSubjects: Information Theory (cs.IT); Optimization and Control (math.OC)
The rapid growth of Internet-of-Things (IoT) devices demands communication systems that are both spectrally efficient and energy frugal. Backscatter communication (BackCom) is an attractive low-power paradigm, but its spectral efficiency declines in dense deployments. This paper presents an uplink BackCom design that integrates non-orthogonal multiple access (NOMA) and maximizes system energy efficiency (EE). In a bistatic network where multiple backscatter nodes (BNs) harvest RF energy and alternate between sleep and active modes, we formulate a fractional program with coupled time, power, and reflection variables and develop a Dinkelbach-based alternating optimization (AO) algorithm with closed-form updates. Analysis reveals two operating modes depending on power availability, circuit demands and propagation conditions. Simulations show the proposed design adapts the time allocation, achieving up to 8% higher EE than fixed-power and 68% than no-sleep baselines, and delivering up to 127% EE gains over orthogonal multiple access (OMA). These results establish NOMA-enabled BackCom as a scalable, energy efficient solution for large-scale IoT deployments.
- [660] arXiv:2602.03608 [pdf, ps, other]
-
Title: Controlling Output Rankings in Generative Engines for LLM-based SearchComments: 23 pagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
The way customers search for and choose products is changing with the rise of large language models (LLMs). LLM-based search, or generative engines, provides direct product recommendations to users, rather than traditional online search results that require users to explore options themselves. However, these recommendations are strongly influenced by the initial retrieval order of LLMs, which disadvantages small businesses and independent creators by limiting their visibility.
In this work, we propose CORE, an optimization method that \textbf{C}ontrols \textbf{O}utput \textbf{R}ankings in g\textbf{E}nerative Engines for LLM-based search. Since the LLM's interactions with the search engine are black-box, CORE targets the content returned by search engines as the primary means of influencing output rankings. Specifically, CORE optimizes retrieved content by appending strategically designed optimization content to steer the ranking of outputs. We introduce three types of optimization content: string-based, reasoning-based, and review-based, demonstrating their effectiveness in shaping output rankings. To evaluate CORE in realistic settings, we introduce ProductBench, a large-scale benchmark with 15 product categories and 200 products per category, where each product is associated with its top-10 recommendations collected from Amazon's search interface.
Extensive experiments on four LLMs with search capabilities (GPT-4o, Gemini-2.5, Claude-4, and Grok-3) demonstrate that CORE achieves an average Promotion Success Rate of \textbf{91.4\% @Top-5}, \textbf{86.6\% @Top-3}, and \textbf{80.3\% @Top-1}, across 15 product categories, outperforming existing ranking manipulation methods while preserving the fluency of optimized content. - [661] arXiv:2602.03611 [pdf, ps, other]
-
Title: Explanations Leak: Membership Inference with Differential Privacy and Active Learning DefenseSubjects: Machine Learning (cs.LG)
Counterfactual explanations (CFs) are increasingly integrated into Machine Learning as a Service (MLaaS) systems to improve transparency; however, ML models deployed via APIs are already vulnerable to privacy attacks such as membership inference and model extraction, and the impact of explanations on this threat landscape remains insufficiently understood. In this work, we focus on the problem of how CFs expand the attack surface of MLaaS by strengthening membership inference attacks (MIAs), and on the need to design defense mechanisms that mitigate this emerging risk without undermining utility and explainability. First, we systematically analyze how exposing CFs through query-based APIs enables more effective shadow-based MIAs. Second, we propose a defense framework that integrates Differential Privacy (DP) with Active Learning (AL) to jointly reduce memorization and limit effective training data exposure. Finally, we conduct an extensive empirical evaluation to characterize the three-way trade-off between privacy leakage, predictive performance, and explanation quality. Our findings highlight the need to carefully balance transparency, utility, and privacy in the responsible deployment of explainable MLaaS systems.
- [662] arXiv:2602.03614 [pdf, ps, other]
-
Title: Quantization-Aware Regularizers for Deep Neural Networks CompressionSubjects: Machine Learning (cs.LG)
Deep Neural Networks reached state-of-the-art performance across numerous domains, but this progress has come at the cost of increasingly large and over-parameterized models, posing serious challenges for deployment on resource-constrained devices. As a result, model compression has become essential, and -- among compression techniques -- weight quantization is largely used and particularly effective, yet it typically introduces a non-negligible accuracy drop. However, it is usually applied to already trained models, without influencing how the parameter space is explored during the learning phase. In contrast, we introduce per-layer regularization terms that drive weights to naturally form clusters during training, integrating quantization awareness directly into the optimization process. This reduces the accuracy loss typically associated with quantization methods while preserving their compression potential. Furthermore, in our framework quantization representatives become network parameters, marking, to the best of our knowledge, the first approach to embed quantization parameters directly into the backpropagation procedure. Experiments on CIFAR-10 with AlexNet and VGG16 models confirm the effectiveness of the proposed strategy.
- [663] arXiv:2602.03615 [pdf, ps, other]
-
Title: KTV: Keyframes and Key Tokens Selection for Efficient Training-Free Video LLMsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Training-free video understanding leverages the strong image comprehension capabilities of pre-trained vision language models (VLMs) by treating a video as a sequence of static frames, thus obviating the need for costly video-specific training. However, this paradigm often suffers from severe visual redundancy and high computational overhead, especially when processing long videos. Crucially, existing keyframe selection strategies, especially those based on CLIP similarity, are prone to biases and may inadvertently overlook critical frames, resulting in suboptimal video comprehension. To address these significant challenges, we propose \textbf{KTV}, a novel two-stage framework for efficient and effective training-free video understanding. In the first stage, KTV performs question-agnostic keyframe selection by clustering frame-level visual features, yielding a compact, diverse, and representative subset of frames that mitigates temporal redundancy. In the second stage, KTV applies key visual token selection, pruning redundant or less informative tokens from each selected keyframe based on token importance and redundancy, which significantly reduces the number of tokens fed into the LLM. Extensive experiments on the Multiple-Choice VideoQA task demonstrate that KTV outperforms state-of-the-art training-free baselines while using significantly fewer visual tokens, \emph{e.g.}, only 504 visual tokens for a 60-min video with 10800 frames, achieving $44.8\%$ accuracy on the MLVU-Test benchmark. In particular, KTV also exceeds several training-based approaches on certain benchmarks.
- [664] arXiv:2602.03619 [pdf, ps, other]
-
Title: Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report GenerationAuthors: Changze Lv, Jie Zhou, Wentao Zhao, Jingwen Xu, Zisu Huang, Muzhao Tian, Shihan Dou, Tao Gui, Le Tian, Xiao Zhou, Xiaoqing Zheng, Xuanjing Huang, Jie ZhouSubjects: Computation and Language (cs.CL)
Nowadays, training and evaluating DeepResearch-generated reports remain challenging due to the lack of verifiable reward signals. Accordingly, rubric-based evaluation has become a common practice. However, existing approaches either rely on coarse, pre-defined rubrics that lack sufficient granularity, or depend on manually constructed query-specific rubrics that are costly and difficult to scale. In this paper, we propose a pipeline to train human-preference-aligned query-specific rubric generators tailored for DeepResearch report generation. We first construct a dataset of DeepResearch-style queries annotated with human preferences over paired reports, and train rubric generators via reinforcement learning with a hybrid reward combining human preference supervision and LLM-based rubric evaluation. To better handle long-horizon reasoning, we further introduce a Multi-agent Markov-state (MaMs) workflow for report generation. We empirically show that our proposed rubric generators deliver more discriminative and better human-aligned supervision than existing rubric design strategies. Moreover, when integrated into the MaMs training framework, DeepResearch systems equipped with our rubric generators consistently outperform all open-source baselines on the DeepResearch Bench and achieve performance comparable to that of leading closed-source models.
- [665] arXiv:2602.03622 [pdf, ps, other]
-
Title: Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosisJournal-ref: Zhang, L., Yu, H., Wang, Z., Gui, F., Guo, Y., Zhang, W., Jia, M., 2026. Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosis. Medical Image Analysis 109, 103886Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Retinal diseases spanning a broad spectrum can be effectively identified and diagnosed using complementary signals from multimodal data. However, multimodal diagnosis in ophthalmic practice is typically challenged in terms of data heterogeneity, potential invasiveness, registration complexity, and so on. As such, a unified framework that integrates multimodal data synthesis and fusion is proposed for retinal disease classification and grading. Specifically, the synthesized multimodal data incorporates fundus fluorescein angiography (FFA), multispectral imaging (MSI), and saliency maps that emphasize latent lesions as well as optic disc/cup regions. Parallel models are independently trained to learn modality-specific representations that capture cross-pathophysiological signatures. These features are then adaptively calibrated within and across modalities to perform information pruning and flexible integration according to downstream tasks. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. Extensive experiments on two public datasets demonstrated the superiority of our approach over state-of-the-art ones in the tasks of multi-label classification (F1-score: 0.683, AUC: 0.953) and diabetic retinopathy grading (Accuracy:0.842, Kappa: 0.861). This work not only enhances the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across various medical imaging modalities.
- [666] arXiv:2602.03623 [pdf, ps, other]
-
Title: Self-supervised Physics-Informed Manipulation of Deformable Linear Objects with Non-negligible DynamicsComments: Submitted to IEEE Transactions on Robotics. Video: this https URLSubjects: Robotics (cs.RO)
We address dynamic manipulation of deformable linear objects by presenting SPiD, a physics-informed self-supervised learning framework that couples an accurate deformable object model with an augmented self-supervised training strategy. On the modeling side, we extend a mass-spring model to more accurately capture object dynamics while remaining lightweight enough for high-throughput rollouts during self-supervised learning. On the learning side, we train a neural controller using a task-oriented cost, enabling end-to-end optimization through interaction with the differentiable object model. In addition, we propose a self-supervised DAgger variant that detects distribution shift during deployment and performs offline self-correction to further enhance robustness without expert supervision. We evaluate our method primarily on the rope stabilization task, where a robot must bring a swinging rope to rest as quickly and smoothly as possible. Extensive experiments in both simulation and the real world demonstrate that the proposed controller achieves fast and smooth rope stabilization, generalizing across unseen initial states, rope lengths, masses, non-uniform mass distributions, and external disturbances. Additionally, we develop an affordable markerless rope perception method and demonstrate that our controller maintains performance with noisy and low-frequency state updates. Furthermore, we demonstrate the generality of the framework by extending it to the rope trajectory tracking task. Overall, SPiD offers a data-efficient, robust, and physically grounded framework for dynamic manipulation of deformable linear objects, featuring strong sim-to-real generalization.
- [667] arXiv:2602.03625 [pdf, ps, other]
-
Title: Multi-Objective Optimization for Synthetic-to-Real Style TransferComments: Accepted in International Conference on the Applications of Evolutionary Computation (Part of EvoStar), April 2026 (EvoApplications 2026)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Semantic segmentation networks require large amounts of pixel-level annotated data, which are costly to obtain for real-world images. Computer graphics engines can generate synthetic images alongside their ground-truth annotations. However, models trained on such images can perform poorly on real images due to the domain gap between real and synthetic images. Style transfer methods can reduce this difference by applying a realistic style to synthetic images. Choosing effective data transformations and their sequence is difficult due to the large combinatorial search space of style transfer operators. Using multi-objective genetic algorithms, we optimize pipelines to balance structural coherence and style similarity to target domains. We study the use of paired-image metrics on individual image samples during evolution to enable rapid pipeline evaluation, as opposed to standard distributional metrics that require the generation of many images. After optimization, we evaluate the resulting Pareto front using distributional metrics and segmentation performance. We apply this approach to standard datasets in synthetic-to-real domain adaptation: from the video game GTA5 to real image datasets Cityscapes and ACDC, focusing on adverse conditions. Results demonstrate that evolutionary algorithms can propose diverse augmentation pipelines adapted to different objectives. The contribution of this work is the formulation of style transfer as a sequencing problem suitable for evolutionary optimization and the study of efficient metrics that enable feasible search in this space. The source code is available at: https://github.com/echigot/MOOSS.
- [668] arXiv:2602.03627 [pdf, ps, other]
-
Title: Ultra Fast PDE Solving via Physics Guided Few-step DiffusionSubjects: Machine Learning (cs.LG)
Diffusion-based models have demonstrated impressive accuracy and generalization in solving partial differential equations (PDEs). However, they still face significant limitations, such as high sampling costs and insufficient physical consistency, stemming from their many-step iterative sampling mechanism and lack of explicit physics constraints. To address these issues, we propose Phys-Instruct, a novel physics-guided distillation framework which not only (1) compresses a pre-trained diffusion PDE solver into a few-step generator via matching generator and prior diffusion distributions to enable rapid sampling, but also (2) enhances the physics consistency by explicitly injecting PDE knowledge through a PDE distillation guidance. Physic-Instruct is built upon a solid theoretical foundation, leading to a practical physics-constrained training objective that admits tractable gradients. Across five PDE benchmarks, Phys-Instruct achieves orders-of-magnitude faster inference while reducing PDE error by more than 8 times compared to state-of-the-art diffusion baselines. Moreover, the resulting unconditional student model functions as a compact prior, enabling efficient and physically consistent inference for various downstream conditional tasks. Our results indicate that Phys-Instruct is a novel, effective, and efficient framework for ultra-fast PDE solving powered by deep generative models.
- [669] arXiv:2602.03630 [pdf, ps, other]
-
Title: Can LLMs Do Rocket Science? Exploring the Limits of Complex Reasoning with GTOC 12Comments: Extended version of the paper presented at AIAA SciTech 2026 Forum. Includes futher experiments, corrections and new appendixJournal-ref: Proceedings of the AIAA SciTech 2026 Forum, January 2026Subjects: Artificial Intelligence (cs.AI)
Large Language Models (LLMs) have demonstrated remarkable proficiency in code generation and general reasoning, yet their capacity for autonomous multi-stage planning in high-dimensional, physically constrained environments remains an open research question. This study investigates the limits of current AI agents by evaluating them against the 12th Global Trajectory Optimization Competition (GTOC 12), a complex astrodynamics challenge requiring the design of a large-scale asteroid mining campaign. We adapt the MLE-Bench framework to the domain of orbital mechanics and deploy an AIDE-based agent architecture to autonomously generate and refine mission solutions. To assess performance beyond binary validity, we employ an "LLM-as-a-Judge" methodology, utilizing a rubric developed by domain experts to evaluate strategic viability across five structural categories. A comparative analysis of models, ranging from GPT-4-Turbo to reasoning-enhanced architectures like Gemini 2.5 Pro, and o3, reveals a significant trend: the average strategic viability score has nearly doubled in the last two years (rising from 9.3 to 17.2 out of 26). However, we identify a critical capability gap between strategy and execution. While advanced models demonstrate sophisticated conceptual understanding, correctly framing objective functions and mission architectures, they consistently fail at implementation due to physical unit inconsistencies, boundary condition errors, and inefficient debugging loops. We conclude that, while current LLMs often demonstrate sufficient knowledge and intelligence to tackle space science tasks, they remain limited by an implementation barrier, functioning as powerful domain facilitators rather than fully autonomous engineers.
- [670] arXiv:2602.03632 [pdf, ps, other]
-
Title: CALM: A Self-Adaptive Orchestration Approach for QoS-Aware Routing in Small Language Model based SystemsComments: Accepted as full paper at SEAMS 2026Subjects: Software Engineering (cs.SE)
AI-enabled systems are subjected to various types of runtime uncertainties, ranging from dynamic workloads, resource requirements, model drift, etc. These uncertainties have a big impact on the overall Quality of Service (QoS). This is particularly true in the case of Language Model (LM) enabled systems where the autoregressive nature of token generation introduces variability in latency, energy usage and response quality. These systems, powered by LLMs, are either resource-intensive (if run on-prem) or raise privacy/cost concerns (if leveraged using APIs). While deploying a Small Language Model (SLM) can be resource-efficient, it often falls short in addressing the diversity and scale of real-world requirements. To this, we argue that, rather than relying on any one SLM, leveraging a coordinated fleet of SLMs, each with specialized strengths can enable systems to dynamically adapt to shifting contexts and workload patterns. However, realizing the full potential of such an approach demands intelligent orchestration and continuous adaptation. To this end, we introduce CALM , a self-adaptive orchestration mechanism based on MAPE-K. Our approach continuously monitors user queries, analyzes the QoS metrics of the SLMs, identifies the optimal SLM to be used, routes the query to the identified SLM and further to enhance the effectiveness and efficiency, leverages caching and scheduling to decide the SLMs to be kept in memory. Our evaluation shows that CALM reduces latency by approximately 40% and energy consumption by 50%, while preserving domain-specific task performance when compared to single-LLM baselines.
- [671] arXiv:2602.03633 [pdf, ps, other]
-
Title: BIRDTurk: Adaptation of the BIRD Text-to-SQL Dataset to TurkishAuthors: Burak Aktaş, Mehmet Can Baytekin, Süha Kağan Köse, Ömer İlbilgi, Elif Özge Yılmaz, Çağrı Toraman, Bilge Kaan GörürComments: Accepted by EACL 2026 SIGTURKSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB)
Text-to-SQL systems have achieved strong performance on English benchmarks, yet their behavior in morphologically rich, low-resource languages remains largely unexplored. We introduce BIRDTurk, the first Turkish adaptation of the BIRD benchmark, constructed through a controlled translation pipeline that adapts schema identifiers to Turkish while strictly preserving the logical structure and execution semantics of SQL queries and databases. Translation quality is validated on a sample size determined by the Central Limit Theorem to ensure 95% confidence, achieving 98.15% accuracy on human-evaluated samples. Using BIRDTurk, we evaluate inference-based prompting, agentic multi-stage reasoning, and supervised fine-tuning. Our results reveal that Turkish introduces consistent performance degradation, driven by both structural linguistic divergence and underrepresentation in LLM pretraining, while agentic reasoning demonstrates stronger cross-lingual robustness. Supervised fine-tuning remains challenging for standard multilingual baselines but scales effectively with modern instruction-tuned models. BIRDTurk provides a controlled testbed for cross-lingual Text-to-SQL evaluation under realistic database conditions. We release the training and development splits to support future research.
- [672] arXiv:2602.03634 [pdf, ps, other]
-
Title: SPWOOD: Sparse Partial Weakly-Supervised Oriented Object DetectionComments: The Fourteenth International Conference on Learning Representations (ICLR 2026)Subjects: Computer Vision and Pattern Recognition (cs.CV)
A consistent trend throughout the research of oriented object detection has been the pursuit of maintaining comparable performance with fewer and weaker annotations. This is particularly crucial in the remote sensing domain, where the dense object distribution and a wide variety of categories contribute to prohibitively high costs. Based on the supervision level, existing oriented object detection algorithms can be broadly grouped into fully supervised, semi-supervised, and weakly supervised methods. Within the scope of this work, we further categorize them to include sparsely supervised and partially weakly-supervised methods. To address the challenges of large-scale labeling, we introduce the first Sparse Partial Weakly-Supervised Oriented Object Detection framework, designed to efficiently leverage only a few sparse weakly-labeled data and plenty of unlabeled data. Our framework incorporates three key innovations: (1) We design a Sparse-annotation-Orientation-and-Scale-aware Student (SOS-Student) model to separate unlabeled objects from the background in a sparsely-labeled setting, and learn orientation and scale information from orientation-agnostic or scale-agnostic weak annotations. (2) We construct a novel Multi-level Pseudo-label Filtering strategy that leverages the distribution of model predictions, which is informed by the model's multi-layer predictions. (3) We propose a unique sparse partitioning approach, ensuring equal treatment for each category. Extensive experiments on the DOTA and DIOR datasets show that our framework achieves a significant performance gain over traditional oriented object detection methods mentioned above, offering a highly cost-effective solution. Our code is publicly available at https://github.com/VisionXLab/SPWOOD.
- [673] arXiv:2602.03635 [pdf, ps, other]
-
Title: TRE: Encouraging Exploration in the Trust RegionAuthors: Chao Huang, Yujing Lu, Quangang Li, Shenghe Wang, Yan Wang, Yueyang Zhang, Long Xia, Jiashu Zhao, Zhiyuan Sun, Daiting Shi, Tingwen LiuSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Entropy regularization is a standard technique in reinforcement learning (RL) to enhance exploration, yet it yields negligible effects or even degrades performance in Large Language Models (LLMs). We attribute this failure to the cumulative tail risk inherent to LLMs with massive vocabularies and long generation horizons. In such environments, standard global entropy maximization indiscriminately dilutes probability mass into the vast tail of invalid tokens rather than focusing on plausible candidates, thereby disrupting coherent reasoning. To address this, we propose Trust Region Entropy (TRE), a method that encourages exploration strictly within the model's trust region. Extensive experiments across mathematical reasoning (MATH), combinatorial search (Countdown), and preference alignment (HH) tasks demonstrate that TRE consistently outperforms vanilla PPO, standard entropy regularization, and other exploration baselines. Our code is available at https://github.com/WhyChaos/TRE-Encouraging-Exploration-in-the-Trust-Region.
- [674] arXiv:2602.03639 [pdf, ps, other]
-
Title: Variance-Reduced Model Predictive Path Integral via Quadratic Model ApproximationAuthors: Fabian Schramm, Franki Nguimatsia Tiofack, Nicolas Perrin-Gilbert, Marc Toussaint, Justin CarpentierSubjects: Robotics (cs.RO)
Sampling-based controllers, such as Model Predictive Path Integral (MPPI) methods, offer substantial flexibility but often suffer from high variance and low sample efficiency. To address these challenges, we introduce a hybrid variance-reduced MPPI framework that integrates a prior model into the sampling process. Our key insight is to decompose the objective function into a known approximate model and a residual term. Since the residual captures only the discrepancy between the model and the objective, it typically exhibits a smaller magnitude and lower variance than the original objective. Although this principle applies to general modeling choices, we demonstrate that adopting a quadratic approximation enables the derivation of a closed-form, model-guided prior that effectively concentrates samples in informative regions. Crucially, the framework is agnostic to the source of geometric information, allowing the quadratic model to be constructed from exact derivatives, structural approximations (e.g., Gauss- or Quasi-Newton), or gradient-free randomized smoothing. We validate the approach on standard optimization benchmarks, a nonlinear, underactuated cart-pole control task, and a contact-rich manipulation problem with non-smooth dynamics. Across these domains, we achieve faster convergence and superior performance in low-sample regimes compared to standard MPPI. These results suggest that the method can make sample-based control strategies more practical in scenarios where obtaining samples is expensive or limited.
- [675] arXiv:2602.03640 [pdf, ps, other]
-
Title: Tutorial on Reasoning for IR & IR for ReasoningComments: Accepted to ECIR 2026Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Information retrieval has long focused on ranking documents by semantic relatedness. Yet many real-world information needs demand more: enforcement of logical constraints, multi-step inference, and synthesis of multiple pieces of evidence. Addressing these requirements is, at its core, a problem of reasoning. Across AI communities, researchers are developing diverse solutions for the problem of reasoning, from inference-time strategies and post-training of LLMs, to neuro-symbolic systems, Bayesian and probabilistic frameworks, geometric representations, and energy-based models. These efforts target the same problem: to move beyond pattern-matching systems toward structured, verifiable inference. However, they remain scattered across disciplines, making it difficult for IR researchers to identify the most relevant ideas and opportunities. To help navigate the fragmented landscape of research in reasoning, this tutorial first articulates a working definition of reasoning within the context of information retrieval and derives from it a unified analytical framework. The framework maps existing approaches along axes that reflect the core components of the definition. By providing a comprehensive overview of recent approaches and mapping current methods onto the defined axes, we expose their trade-offs and complementarities, highlight where IR can benefit from cross-disciplinary advances, and illustrate how retrieval process itself can play a central role in broader reasoning systems. The tutorial will equip participants with both a conceptual framework and practical guidance for enhancing reasoning-capable IR systems, while situating IR as a domain that both benefits and contributes to the broader development of reasoning methodologies.
- [676] arXiv:2602.03641 [pdf, ps, other]
-
Title: CTTVAE: Latent Space Structuring for Conditional Tabular Data Generation on Imbalanced DatasetsSubjects: Machine Learning (cs.LG)
Generating synthetic tabular data under severe class imbalance is essential for domains where rare but high-impact events drive decision-making. However, most generative models either overlook minority groups or fail to produce samples that are useful for downstream learning. We introduce CTTVAE, a Conditional Transformer-based Tabular Variational Autoencoder equipped with two complementary mechanisms: (i) a class-aware triplet margin loss that restructures the latent space for sharper intra-class compactness and inter-class separation, and (ii) a training-by-sampling strategy that adaptively increases exposure to underrepresented groups. Together, these components form CTTVAE+TBS, a framework that consistently yields more representative and utility-aligned samples without destabilizing training. Across six real-world benchmarks, CTTVAE+TBS achieves the strongest downstream utility on minority classes, often surpassing models trained on the original imbalanced data while maintaining competitive fidelity and bridging the gap for privacy for interpolation-based sampling methods and deep generative methods. Ablation studies further confirm that both latent structuring and targeted sampling contribute to these gains. By explicitly prioritizing downstream performance in rare categories, CTTVAE+TBS provides a robust and interpretable solution for conditional tabular data generation, with direct applicability to industries such as healthcare, fraud detection, and predictive maintenance where even small gains in minority cases can be critical.
- [677] arXiv:2602.03643 [pdf, ps, other]
-
Title: A Probabilistic Model-Checking Framework for Cognitive Assessment and TrainingSubjects: Formal Languages and Automata Theory (cs.FL)
Serious games have proven to be effective tools for screening cognitive impairments and supporting diagnosis in patients with neurodegenerative diseases like Alzheimer's and Parkinson's. They also offer cognitive training benefits. According to the DSM-5 classification, cognitive disorders are categorized as Mild Neurocognitive Disorders (mild NCDs) and Major Neurocognitive Disorders (Major NCDs). In this study, we focus on three patient groups: healthy, mild NCD, and Major NCD. We employ Discrete Time Markov Chains to model the behavior exhibited by each group while interacting with serious games. By applying model-checking techniques, we can identify discrepancies between expected and actual gameplay behavior. The primary contribution of this work is a novel theoretical framework designed to assess how a practitioner's confidence level in diagnosing a patient's Alzheimer's stage evolves with each game session (diagnosis support). Additionally, we propose an experimental protocol where the difficulty of subsequent game sessions is dynamically adjusted based on the patient's observed behavior in previous sessions (training support).
- [678] arXiv:2602.03645 [pdf, ps, other]
-
Title: Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAGComments: On going work. Codes are released at this https URLSubjects: Machine Learning (cs.LG)
Retrieval-augmented generation (RAG) enables large language models (LLMs) to produce evidence-based responses, and its performance hinges on the matching between the retriever and LLMs. Retriever optimization has emerged as an efficient alternative to fine-tuning LLMs. However, existing solutions suffer from objective mismatch between retriever optimization and the goal of RAG pipeline. Reinforcement learning (RL) provides a promising solution to address this limitation, yet applying RL to retriever optimization introduces two fundamental challenges: 1) the deterministic retrieval is incompatible with RL formulations, and 2) state aliasing arises from query-only retrieval in multi-hop reasoning. To address these challenges, we replace deterministic retrieval with stochastic sampling and formulate RAG as a Markov decision process, making retriever optimizable by RL. Further, we incorporate retrieval history into the state at each retrieval step to mitigate state aliasing. Extensive experiments across diverse RAG pipelines, datasets, and retriever scales demonstrate consistent improvements of our approach in RAG performance.
- [679] arXiv:2602.03646 [pdf, ps, other]
-
Title: A Comparison of Set-Based Observers for Nonlinear SystemsComments: 13 pagesSubjects: Systems and Control (eess.SY)
Set-based state estimation computes sets of states consistent with a system model given bounded sets of disturbances and noise. Bounding the set of states is crucial for safety-critical applications so that one can ensure that all specifications are met. While numerous approaches have been proposed for nonlinear discrete-time systems, a unified evaluation under comparable conditions is lacking. This paper reviews and implements a representative selection of set-based observers within the CORA framework. To provide an objective comparison, the methods are evaluated on common benchmarks, and we examine computational effort, scalability, and the conservatism of the resulting state bounds. This study highlights characteristic trade-offs between observer categories and set representations, as well as practical considerations arising in their implementation. All implementations are made publicly available to support reproducibility and future development. This paper thereby offers the first broad, tool-supported comparison of guaranteed state estimators for nonlinear discrete-time systems.
- [680] arXiv:2602.03647 [pdf, ps, other]
-
Title: Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner CollaborationAuthors: Bowei He, Minda Hu, Zenan Xu, Hongru Wang, Licheng Zong, Yankai Chen, Chen Ma, Xue Liu, Pluto Zhou, Irwin KingSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Search-integrated reasoning enables language agents to transcend static parametric knowledge by actively querying external sources. However, training these agents via reinforcement learning is hindered by the multi-scale credit assignment problem: existing methods typically rely on sparse, trajectory-level rewards that fail to distinguish between high-quality reasoning and fortuitous guesses, leading to redundant or misleading search behaviors. To address this, we propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention, with both components jointly optimized during training. Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories, and a Meta-Refiner, which selectively diagnoses and repairs flawed steps via a 'cut-and-regenerate' mechanism. To provide fine-grained supervision, we introduce a hybrid reward design that couples outcome correctness with a dense process reward quantifying the information density of retrieved evidence. Theoretically, we formalize the Actor-Refiner interaction as a smoothed mixture policy, proving that selective correction yields strict performance gains over strong baselines. Extensive experiments across various general and multi-hop QA datasets demonstrate that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales, achieving superior reasoning accuracy with minimal overhead.
- [681] arXiv:2602.03648 [pdf, ps, other]
-
Title: Can Developers rely on LLMs for Secure IaC Development?Subjects: Cryptography and Security (cs.CR)
We investigated the capabilities of GPT-4o and Gemini 2.0 Flash for secure Infrastructure as Code (IaC) development. For security smell detection, on the Stack Overflow dataset, which primarily contains small, simplified code snippets, the models detected at least 71% of security smells when prompted to analyze code from a security perspective (general prompt). With a guided prompt (adding clear, step-by-step instructions), this increased to 78%.In GitHub repositories, which contain complete, real-world project scripts, a general prompt was less effective, leaving more than half of the smells undetected. However, with the guided prompt, the models uncovered at least 67% of the smells. For secure code generation, we prompted LLMs with 89 vulnerable synthetic scenarios and observed that only 7% of the generated scripts were secure. Adding an explicit instruction to generate secure code increased GPT secure output rate to 17%, while Gemini changed little (8%). These results highlight the need for further research to improve LLMs' capabilities in assisting developers with secure IaC development.
- [682] arXiv:2602.03652 [pdf, ps, other]
-
Title: RAGTurk: Best Practices for Retrieval Augmented Generation in TurkishAuthors: Süha Kağan Köse, Mehmet Can Baytekin, Burak Aktaş, Bilge Kaan Görür, Evren Ayberk Munis, Deniz Yılmaz, Muhammed Yusuf Kartal, Çağrı ToramanComments: Accepted by EACL 2026 SIGTURKSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Retrieval-Augmented Generation (RAG) enhances LLM factuality, yet design guidance remains English-centric, limiting insights for morphologically rich languages like Turkish. We address this by constructing a comprehensive Turkish RAG dataset derived from Turkish Wikipedia and CulturaX, comprising question-answer pairs and relevant passage chunks. We benchmark seven stages of the RAG pipeline, from query transformation and reranking to answer refinement, without task-specific fine-tuning. Our results show that complex methods like HyDE maximize accuracy (85%) that is considerably higher than the baseline (78.70%). Also a Pareto-optimal configuration using Cross-encoder Reranking and Context Augmentation achieves comparable performance (84.60%) with much lower cost. We further demonstrate that over-stacking generative modules can degrade performance by distorting morphological cues, whereas simple query clarification with robust reranking offers an effective solution.
- [683] arXiv:2602.03655 [pdf, ps, other]
-
Title: Sequential Group Composition: A Window into the Mechanics of Deep LearningSubjects: Machine Learning (cs.LG)
How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation? To gain insight into this question, we introduce the sequential group composition task. In this task, networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. The task can be order-sensitive and requires a nonlinear architecture to be learned. Our analysis isolates the roles of the group structure, encoding statistics, and sequence length in shaping learning. We prove that two-layer networks learn this task one irreducible representation of the group at a time in an order determined by the Fourier statistics of the encoding. These networks can perfectly learn the task, but doing so requires a hidden width exponential in the sequence length $k$. In contrast, we show how deeper models exploit the associativity of the task to dramatically improve this scaling: recurrent neural networks compose elements sequentially in $k$ steps, while multilayer networks compose adjacent pairs in parallel in $\log k$ layers. Overall, the sequential group composition task offers a tractable window into the mechanics of deep learning.
- [684] arXiv:2602.03662 [pdf, ps, other]
-
Title: RIPPLE: Lifecycle-aware Embedding of Service Function Chains in Multi-access Edge ComputingSubjects: Networking and Internet Architecture (cs.NI)
In Multi-access Edge Computing networks, services can be deployed on nearby edge clouds (EC) as service function chains (SFCs) to meet strict quality of service (QoS) requirements. As users move, frequent SFC reconfigurations are required, but these are non-trivial: SFCs can serve users only when all required virtual network functions (VNFs) are available, and VNFs undergo time-consuming lifecycle operations before becoming operational. We show that ignoring lifecycle dynamics oversimplifies deployment, jeopardizes QoS, and must be avoided in practical SFC management. To address this, forecasts of user connectivity can be leveraged to proactively deploy VNFs and reconfigure SFCs. But forecasts are inherently imperfect, requiring lifecycle and connectivity uncertainty to be jointly considered. We present RIPPLE, a lifecycle-aware SFC embedding approach to deploy VNFs at the right time and location, reducing service interruptions. We show that RIPPLE closes the gap with solutions that unrealistically assume instantaneous lifecycle, even under realistic lifecycle constraints.
- [685] arXiv:2602.03664 [pdf, ps, other]
-
Title: Mitigating Conversational Inertia in Multi-Turn AgentsAuthors: Yang Wan, Zheng Cao, Zhenhao Zhang, Zhengwen Zeng, Shuheng Shen, Changhua Meng, Linchao ZhuSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models excel as few-shot learners when provided with appropriate demonstrations, yet this strength becomes problematic in multiturn agent scenarios, where LLMs erroneously mimic their own previous responses as few-shot examples. Through attention analysis, we identify conversational inertia, a phenomenon where models exhibit strong diagonal attention to previous responses, which is associated with imitation bias that constrains exploration. This reveals a tension when transforming few-shot LLMs into agents: longer context enriches environmental feedback for exploitation, yet also amplifies conversational inertia that undermines exploration. Our key insight is that for identical states, actions generated with longer contexts exhibit stronger inertia than those with shorter contexts, enabling construction of preference pairs without environment rewards. Based on this, we propose Context Preference Learning to calibrate model preferences to favor low-inertia responses over highinertia ones. We further provide context management strategies at inference time to balance exploration and exploitation. Experimental results across eight agentic environments and one deep research scenario validate that our framework reduces conversational inertia and achieves performance improvements.
- [686] arXiv:2602.03665 [pdf, ps, other]
-
Title: MM-SCALE: Grounded Multimodal Moral Reasoning via Scalar Judgment and Listwise AlignmentAuthors: Eunkyu Park, Wesley Hanwen Deng, Cheyon Jin, Matheus Kunzler Maldaner, Jordan Wheeler, Jason I. Hong, Hong Shen, Adam Perer, Ken Holstein, Motahhare Eslami, Gunhee KimSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Vision-Language Models (VLMs) continue to struggle to make morally salient judgments in multimodal and socially ambiguous contexts. Prior works typically rely on binary or pairwise supervision, which often fail to capture the continuous and pluralistic nature of human moral reasoning. We present MM-SCALE (Multimodal Moral Scale), a large-scale dataset for aligning VLMs with human moral preferences through 5-point scalar ratings and explicit modality grounding. Each image-scenario pair is annotated with moral acceptability scores and grounded reasoning labels by humans using an interface we tailored for data collection, enabling listwise preference optimization over ranked scenario sets. By moving from discrete to scalar supervision, our framework provides richer alignment signals and finer calibration of multimodal moral reasoning. Experiments show that VLMs fine-tuned on MM-SCALE achieve higher ranking fidelity and more stable safety calibration than those trained with binary signals.
- [687] arXiv:2602.03666 [pdf, ps, other]
-
Title: Reference-Free EM Validation Flow for Detecting Triggered Hardware TrojansComments: Accepted at International Symposium on Quality Electronic Design (ISQED), 2026Subjects: Cryptography and Security (cs.CR)
Hardware Trojans (HTs) threaten the trust and reliability of integrated circuits (ICs), particularly when triggered HTs remain dormant during standard testing and activate only under rare conditions. Existing electromagnetic (EM) side-channel-based detection techniques often rely on golden references or labeled data, which are infeasible in modern distributed manufacturing. This paper introduces a reference-free, design-agnostic framework for detecting triggered HTs directly from post-silicon EM emissions. The proposed flow converts each EM trace into a time-frequency scalogram using Continuous Wavelet Transform (CWT), extracts discriminative features through a convolutional neural network (CNN), reduces dimensionality with principal component analysis (PCA), and applies Bayesian Gaussian Mixture Modeling (BGMM) for unsupervised probabilistic clustering. The framework quantifies detection confidence using posterior-based metrics (alpha_{post}, beta_{post}), Bayesian information criterion (Delta BIC), and Mahalanobis cluster separation (D), enabling interpretable anomaly decisions without golden data. Experimental validation on AES-128 designs embedded with four different HTs demonstrates high separability between HT-free and HT-activated conditions and robustness to PCA variance thresholds. The results highlight the method's scalability, statistical interpretability, and potential for extension to runtime and in-field HT monitoring in trusted microelectronics.
- [688] arXiv:2602.03668 [pdf, ps, other]
-
Title: MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint ReconstructionAuthors: Jung Min Lee, Dohyeok Lee, Seokhun Ju, Taehyun Cho, Jin Woo Koo, Li Zhao, Sangwoo Hong, Jungwoo LeeSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Learning \emph{latent actions} from diverse human videos enables scaling robot learning beyond embodiment-specific robot datasets, and these latent actions have recently been used as pseudo-action labels for vision-language-action (VLA) model pretraining. To make VLA pretraining effective, latent actions should contain information about the underlying agent's actions despite the absence of ground-truth labels. We propose \textbf{M}ulti-\textbf{V}iew\textbf{P}oint \textbf{L}atent \textbf{A}ction \textbf{M}odel (\textbf{MVP-LAM}), which learns discrete latent actions that are highly informative about ground-truth actions from time-synchronized multi-view videos. MVP-LAM trains latent actions with a \emph{cross-viewpoint reconstruction} objective, so that a latent action inferred from one view must explain the future in another view, reducing reliance on viewpoint-specific cues. On Bridge V2, MVP-LAM produces more action-centric latent actions, achieving higher mutual information with ground-truth actions and improved action prediction, including under out-of-distribution evaluation. Finally, pretraining VLAs with MVP-LAM latent actions improves downstream manipulation performance on the SIMPLER and LIBERO-Long benchmarks.
- [689] arXiv:2602.03669 [pdf, ps, other]
-
Title: Efficient Sequential Neural Network with Spatial-Temporal Attention and Linear LSTM for Robust Lane Detection Using Multi-Frame ImagesComments: 14 pages, 9 figures, under review by IEEE T-ITSSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Lane detection is a crucial perception task for all levels of automated vehicles (AVs) and Advanced Driver Assistance Systems, particularly in mixed-traffic environments where AVs must interact with human-driven vehicles (HDVs) and challenging traffic scenarios. Current methods lack versatility in delivering accurate, robust, and real-time compatible lane detection, especially vision-based methods often neglect critical regions of the image and their spatial-temporal (ST) salience, leading to poor performance in difficult circumstances such as serious occlusion and dazzle lighting. This study introduces a novel sequential neural network model with a spatial-temporal attention mechanism to focus on key features of lane lines and exploit salient ST correlations among continuous image frames. The proposed model, built on a standard encoder-decoder structure and common neural network backbones, is trained and evaluated on three large-scale open-source datasets. Extensive experiments demonstrate the strength and robustness of the proposed model, outperforming state-of-the-art methods in various testing scenarios. Furthermore, with the ST attention mechanism, the developed sequential neural network models exhibit fewer parameters and reduced Multiply-Accumulate Operations (MACs) compared to baseline sequential models, highlighting their computational efficiency. Relevant data, code, and models are released at https://doi.org/10.4121/4619cab6-ae4a-40d5-af77-582a77f3d821.
- [690] arXiv:2602.03670 [pdf, ps, other]
-
Title: Equilibrium Propagation for Non-Conservative SystemsComments: 19 pages (9 pages main text), 7 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Dynamical Systems (math.DS); Classical Physics (physics.class-ph)
Equilibrium Propagation (EP) is a physics-inspired learning algorithm that uses stationary states of a dynamical system both for inference and learning. In its original formulation it is limited to conservative systems, $\textit{i.e.}$ to dynamics which derive from an energy function. Given their importance in applications, it is important to extend EP to nonconservative systems, $\textit{i.e.}$ systems with non-reciprocal interactions. Previous attempts to generalize EP to such systems failed to compute the exact gradient of the cost function. Here we propose a framework that extends EP to arbitrary nonconservative systems, including feedforward networks. We keep the key property of equilibrium propagation, namely the use of stationary states both for inference and learning. However, we modify the dynamics in the learning phase by a term proportional to the non-reciprocal part of the interaction so as to obtain the exact gradient of the cost function. This algorithm can also be derived using a variational formulation that generates the learning dynamics through an energy function defined over an augmented state space. Numerical experiments using the MNIST database show that this algorithm achieves better performance and learns faster than previous proposals.
- [691] arXiv:2602.03671 [pdf, ps, other]
-
Title: mopri - An Analysis Framework for Unveiling Privacy Violations in Mobile AppsSubjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Everyday services of society increasingly rely on mobile applications, resulting in a conflicting situation between the possibility of participation on the one side and user privacy and digital freedom on the other. In order to protect users' rights to informational self-determination, regulatory approaches for the collection and processing of personal data have been developed, such as the EU's GDPR. However, inspecting the compliance of mobile apps with privacy regulations remains difficult. Thus, in order to enable end users and enforcement bodies to verify and enforce data protection compliance, we propose mopri, a conceptual framework designed for analyzing the behavior of mobile apps through a comprehensive, adaptable, and user-centered approach. Recognizing the gaps in existing frameworks, mopri serves as a foundation for integrating various analysis tools into a streamlined, modular pipeline that employs static and dynamic analysis methods. Building on this concept, a prototype has been developed which effectively extracts permissions and tracking libraries while employing robust methods for dynamic traffic recording and decryption. Additionally, it incorporates result enrichment and reporting features that enhance the clarity and usability of the analysis outcomes. The prototype showcases the feasibility of a holistic and modular approach to privacy analysis, emphasizing the importance of continuous adaptation to the evolving challenges presented by the mobile app ecosystem.
- [692] arXiv:2602.03673 [pdf, ps, other]
-
Title: Referring Industrial Anomaly SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Industrial Anomaly Detection (IAD) is vital for manufacturing, yet traditional methods face significant challenges: unsupervised approaches yield rough localizations requiring manual thresholds, while supervised methods overfit due to scarce, imbalanced data. Both suffer from the "One Anomaly Class, One Model" limitation. To address this, we propose Referring Industrial Anomaly Segmentation (RIAS), a paradigm leveraging language to guide detection. RIAS generates precise masks from text descriptions without manual thresholds and uses universal prompts to detect diverse anomalies with a single model. We introduce the MVTec-Ref dataset to support this, designed with diverse referring expressions and focusing on anomaly patterns, notably with 95% small anomalies. We also propose the Dual Query Token with Mask Group Transformer (DQFormer) benchmark, enhanced by Language-Gated Multi-Level Aggregation (LMA) to improve multi-scale segmentation. Unlike traditional methods using redundant queries, DQFormer employs only "Anomaly" and "Background" tokens for efficient visual-textual integration. Experiments demonstrate RIAS's effectiveness in advancing IAD toward open-set capabilities. Code: https://github.com/swagger-coder/RIAS-MVTec-Ref.
- [693] arXiv:2602.03674 [pdf, ps, other]
-
Title: When Should Agents Coordinate in Differentiable Sequential Decision Problems?Comments: 15 content pages, 2 pages for references, 4 figuresSubjects: Multiagent Systems (cs.MA); Computer Science and Game Theory (cs.GT); Robotics (cs.RO); Optimization and Control (math.OC)
Multi-robot teams must coordinate to operate effectively. When a team operates in an uncoordinated manner, and agents choose actions that are only individually optimal, the team's outcome can suffer. However, in many domains, coordination requires costly communication. We explore the value of coordination in a broad class of differentiable motion-planning problems. In particular, we model coordinated behavior as a spectrum: at one extreme, agents jointly optimize a common team objective, and at the other, agents make unilaterally optimal decisions given their individual decision variables, i.e., they operate at Nash equilibria. We then demonstrate that reasoning about coordination in differentiable motion-planning problems reduces to reasoning about the second-order properties of agents' objectives, and we provide algorithms that use this second-order reasoning to determine at which times a team of agents should coordinate.
- [694] arXiv:2602.03677 [pdf, ps, other]
-
Title: Instruction Anchors: Dissecting the Causal Dynamics of Modality ArbitrationComments: Modality FollowingSubjects: Computation and Language (cs.CL)
Modality following serves as the capacity of multimodal large language models (MLLMs) to selectively utilize multimodal contexts based on user instructions. It is fundamental to ensuring safety and reliability in real-world deployments. However, the underlying mechanisms governing this decision-making process remain poorly understood. In this paper, we investigate its working mechanism through an information flow lens. Our findings reveal that instruction tokens function as structural anchors for modality arbitration: Shallow attention layers perform non-selective information transfer, routing multimodal cues to these anchors as a latent buffer; Modality competition is resolved within deep attention layers guided by the instruction intent, while MLP layers exhibit semantic inertia, acting as an adversarial force. Furthermore, we identify a sparse set of specialized attention heads that drive this arbitration. Causal interventions demonstrate that manipulating a mere $5\%$ of these critical heads can decrease the modality-following ratio by $60\%$ through blocking, or increase it by $60\%$ through targeted amplification of failed samples. Our work provides a substantial step toward model transparency and offers a principled framework for the orchestration of multimodal information in MLLMs.
- [695] arXiv:2602.03678 [pdf, ps, other]
-
Title: ContraLog: Log File Anomaly Detection with Contrastive Learning and Masked Language ModelingComments: 26 pages with 16 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Log files record computational events that reflect system state and behavior, making them a primary source of operational insights in modern computer systems. Automated anomaly detection on logs is therefore critical, yet most established methods rely on log parsers that collapse messages into discrete templates, discarding variable values and semantic content. We propose ContraLog, a parser-free and self-supervised method that reframes log anomaly detection as predicting continuous message embeddings rather than discrete template IDs. ContraLog combines a message encoder that produces rich embeddings for individual log messages with a sequence encoder to model temporal dependencies within sequences. The model is trained with a combination of masked language modeling and contrastive learning to predict masked message embeddings based on the surrounding context. Experiments on the HDFS, BGL, and Thunderbird benchmark datasets empirically demonstrate effectiveness on complex datasets with diverse log messages. Additionally, we find that message embeddings generated by ContraLog carry meaningful information and are predictive of anomalies even without sequence context. These results highlight embedding-level prediction as an approach for log anomaly detection, with potential applicability to other event sequences.
- [696] arXiv:2602.03681 [pdf, ps, other]
-
Title: Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention ModelsComments: 17 pages, 8 figuresSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
The quadratic computational complexity of softmax transformers has become a bottleneck in long-context scenarios. In contrast, linear attention model families provide a promising direction towards a more efficient sequential model. These linear attention models compress past KV values into a single hidden state, thereby efficiently reducing complexity during both training and inference. However, their expressivity remains limited by the size of their hidden state. Previous work proposed interleaving softmax and linear attention layers to reduce computational complexity while preserving expressivity. Nevertheless, the efficiency of these models remains bottlenecked by their softmax attention layers. In this paper, we propose Neural Attention Search Linear (NAtS-L), a framework that applies both linear attention and softmax attention operations within the same layer on different tokens. NAtS-L automatically determines whether a token can be handled by a linear attention model, i.e., tokens that have only short-term impact and can be encoded into fixed-size hidden states, or require softmax attention, i.e., tokens that contain information related to long-term retrieval and need to be preserved for future queries. By searching for optimal Gated DeltaNet and softmax attention combinations across tokens, we show that NAtS-L provides a strong yet efficient token-level hybrid architecture.
- [697] arXiv:2602.03685 [pdf, ps, other]
-
Title: Universal One-third Time Scaling in Learning Peaked DistributionsComments: 24 pages, 6 main text figures, 27 figures in totalSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Training large language models (LLMs) is computationally expensive, partly because the loss exhibits slow power-law convergence whose origin remains debatable. Through systematic analysis of toy models and empirical evaluation of LLMs, we show that this behavior can arise intrinsically from the use of softmax and cross-entropy. When learning peaked probability distributions, e.g., next-token distributions, these components yield power-law vanishing losses and gradients, creating a fundamental optimization bottleneck. This ultimately leads to power-law time scaling of the loss with a universal exponent of $1/3$. Our results provide a mechanistic explanation for observed neural scaling and suggest new directions for improving LLM training efficiency.
- [698] arXiv:2602.03686 [pdf, ps, other]
-
Title: QuAIL: Quality-Aware Inertial Learning for Robust Training under Data CorruptionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Tabular machine learning systems are frequently trained on data affected by non-uniform corruption, including noisy measurements, missing entries, and feature-specific biases. In practice, these defects are often documented only through column-level reliability indicators rather than instance-wise quality annotations, limiting the applicability of many robustness and cleaning techniques. We present QuAIL, a quality-informed training mechanism that incorporates feature reliability priors directly into the learning process. QuAIL augments existing models with a learnable feature-modulation layer whose updates are selectively constrained by a quality-dependent proximal regularizer, thereby inducing controlled adaptation across features of varying trustworthiness. This stabilizes optimization under structured corruption without explicit data repair or sample-level reweighting. Empirical evaluation across 50 classification and regression datasets demonstrates that QuAIL consistently improves average performance over neural baselines under both random and value-dependent corruption, with especially robust behavior in low-data and systematically biased settings. These results suggest that incorporating feature reliability information directly into optimization dynamics is a practical and effective approach for resilient tabular learning.
- [699] arXiv:2602.03687 [pdf, ps, other]
-
Title: Efficient Investment in Multi-Agent Models of Public TransportationSubjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
We study two stylized, multi-agent models aimed at investing a limited, indivisible resource in public transportation. In the first model, we face the decision of which potential stops to open along a (e.g., bus) path, given agents' travel demands. While it is known that utilitarian optimal solutions can be identified in polynomial time, we find that computing approximately optimal solutions with respect to egalitarian welfare is NP-complete. This is surprising as we operate on the simple topology of a line graph.
In the second model, agents navigate a more complex network modeled by a weighted graph where edge weights represent distances. We face the decision of improving travel time along a fixed number of edges. We provide a polynomial-time algorithm that combines Dijkstra's algorithm with a dynamical program to find the optimal decision for one or two agents. By contrast, if the number of agents is variable, we find \np-completeness and inapproximability results for utilitarian and egalitarian welfare. Moreover, we demonstrate implications of our results for a related model of railway network design. - [700] arXiv:2602.03688 [pdf, ps, other]
-
Title: TodyComm: Task-Oriented Dynamic Communication for Multi-Round LLM-based Multi-Agent SystemSubjects: Artificial Intelligence (cs.AI)
Multi-round LLM-based multi-agent systems rely on effective communication structures to support collaboration across rounds. However, most existing methods employ a fixed communication topology during inference, which falls short in many realistic applications where the agents' roles may change \textit{across rounds} due to dynamic adversary, task progression, or time-varying constraints such as communication bandwidth. In this paper, we propose addressing this issue through TodyComm, a \textbf{t}ask-\textbf{o}riented \textbf{dy}namic \textbf{comm}unication algorithm. It produces behavior-driven collaboration topologies that adapt to the dynamics at each round, optimizing the utility for the task through policy gradient. Experiments on five benchmarks demonstrate that under both dynamic adversary and communications budgets, TodyComm delivers superior task effectiveness while retaining token efficiency and scalability.
- [701] arXiv:2602.03689 [pdf, ps, other]
-
Title: Rethinking the Reranker: Boundary-Aware Evidence Selection for Robust Retrieval-Augmented GenerationAuthors: Jiashuo Sun, Pengcheng Jiang, Saizhuo Wang, Jiajun Fan, Heng Wang, Siru Ouyang, Ming Zhong, Yizhu Jiao, Chengsong Huang, Xueqiang Xu, Pengrui Han, Peiran Li, Jiaxin Huang, Ge Liu, Heng Ji, Jiawei HanComments: 19 pages, 8 tables, 5 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Retrieval-Augmented Generation (RAG) systems remain brittle under realistic retrieval noise, even when the required evidence appears in the top-K results. A key reason is that retrievers and rerankers optimize solely for relevance, often selecting either trivial, answer-revealing passages or evidence that lacks the critical information required to answer the question, without considering whether the evidence is suitable for the generator. We propose BAR-RAG, which reframes the reranker as a boundary-aware evidence selector that targets the generator's Goldilocks Zone -- evidence that is neither trivially easy nor fundamentally unanswerable for the generator, but is challenging yet sufficient for inference and thus provides the strongest learning signal. BAR-RAG trains the selector with reinforcement learning using generator feedback, and adopts a two-stage pipeline that fine-tunes the generator under the induced evidence distribution to mitigate the distribution mismatch between training and inference. Experiments on knowledge-intensive question answering benchmarks show that BAR-RAG consistently improves end-to-end performance under noisy retrieval, achieving an average gain of 10.3 percent over strong RAG and reranking baselines while substantially improving robustness. Code is publicly avaliable at https://github.com/GasolSun36/BAR-RAG.
- [702] arXiv:2602.03690 [pdf, ps, other]
-
Title: LLM-Inspired Pretrain-Then-Finetune for Small-Data, Large-Scale OptimizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We consider small-data, large-scale decision problems in which a firm must make many operational decisions simultaneously (e.g., across a large product portfolio) while observing only a few, potentially noisy, data points per instance. Inspired by the success of large language models (LLMs), we propose a pretrain-then-finetune approach built on a designed Transformer model to address this challenge. The model is first pretrained on large-scale, domain-informed synthetic data that encode managerial knowledge and structural features of the decision environment, and is then fine-tuned on real observations. This new pipeline offers two complementary advantages: pretraining injects domain knowledge into the learning process and enables the training of high-capacity models using abundant synthetic data, while finetuning adapts the pretrained model to the operational environment and improves alignment with the true data-generating regime. While we have leveraged the Transformer's state-of-the-art representational capacity, particularly its attention mechanism, to efficiently extract cross-task structure, our approach is not an off-the-shelf application. Instead, it relies on problem-specific architectural design and a tailored training procedure to match the decision setting. Theoretically, we develop the first comprehensive error analysis regarding Transformer learning in relevant contexts, establishing nonasymptotic guarantees that validate the method's effectiveness. Critically, our analysis reveals how pretraining and fine-tuning jointly determine performance, with the dominant contribution governed by whichever is more favorable. In particular, finetuning exhibits an economies-of-scale effect, whereby transfer learning becomes increasingly effective as the number of instances grows.
- [703] arXiv:2602.03691 [pdf, ps, other]
-
Title: Input-to-State Safe Backstepping: Robust Safety-Critical Control with Unmatched UncertaintiesComments: To appear at the 2026 American Control ConferenceSubjects: Systems and Control (eess.SY); Robotics (cs.RO); Optimization and Control (math.OC)
Guaranteeing safety in the presence of unmatched disturbances -- uncertainties that cannot be directly canceled by the control input -- remains a key challenge in nonlinear control. This paper presents a constructive approach to safety-critical control of nonlinear systems with unmatched disturbances. We first present a generalization of the input-to-state safety (ISSf) framework for systems with these uncertainties using the recently developed notion of an Optimal Decay CBF, which provides more flexibility for satisfying the associated Lyapunov-like conditions for safety. From there, we outline a procedure for constructing ISSf-CBFs for two relevant classes of systems with unmatched uncertainties: i) strict-feedback systems; ii) dual-relative-degree systems, which are similar to differentially flat systems. Our theoretical results are illustrated via numerical simulations of an inverted pendulum and planar quadrotor.
- [704] arXiv:2602.03692 [pdf, ps, other]
-
Title: Bringing Reasoning to Generative Recommendation Through the Lens of Cascaded RankingAuthors: Xinyu Lin, Pengyuan Liu, Wenjie Wang, Yicheng Hu, Chen Xu, Fuli Feng, Qifan Wang, Tat-Seng ChuaComments: Accepted by WWW2026Subjects: Information Retrieval (cs.IR)
Generative Recommendation (GR) has become a promising end-to-end approach with high FLOPS utilization for resource-efficient recommendation. Despite the effectiveness, we show that current GR models suffer from a critical \textbf{bias amplification} issue, where token-level bias escalates as token generation progresses, ultimately limiting the recommendation diversity and hurting the user experience. By comparing against the key factor behind the success of traditional multi-stage pipelines, we reveal two limitations in GR that can amplify the bias: homogeneous reliance on the encoded history, and fixed computational budgets that prevent deeper user preference understanding.
To combat the bias amplification issue, it is crucial for GR to 1) incorporate more heterogeneous information, and 2) allocate greater computational resources at each token generation step. To this end, we propose CARE, a simple yet effective cascaded reasoning framework for debiased GR. To incorporate heterogeneous information, we introduce a progressive history encoding mechanism, which progressively incorporates increasingly fine-grained history information as the generation process advances. To allocate more computations, we propose a query-anchored reasoning mechanism, which seeks to perform a deeper understanding of historical information through parallel reasoning steps. We instantiate CARE on three GR backbones. Empirical results on four datasets show the superiority of CARE in recommendation accuracy, diversity, efficiency, and promising scalability. The codes and datasets are available at https://github.com/Linxyhaha/CARE. - [705] arXiv:2602.03693 [pdf, ps, other]
-
Title: OCRTurk: A Comprehensive OCR Benchmark for TurkishAuthors: Deniz Yılmaz, Evren Ayberk Munis, Çağrı Toraman, Süha Kağan Köse, Burak Aktaş, Mehmet Can Baytekin, Bilge Kaan GörürComments: Accepted by EACL 2026 SIGTURKSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Document parsing is now widely used in applications, such as large-scale document digitization, retrieval-augmented generation, and domain-specific pipelines in healthcare and education. Benchmarking these models is crucial for assessing their reliability and practical robustness. Existing benchmarks mostly target high-resource languages and provide limited coverage for low-resource settings, such as Turkish. Moreover, existing studies on Turkish document parsing lack a standardized benchmark that reflects real-world scenarios and document diversity. To address this gap, we introduce OCRTurk, a Turkish document parsing benchmark covering multiple layout elements and document categories at three difficulty levels. OCRTurk consists of 180 Turkish documents drawn from academic articles, theses, slide decks, and non-academic articles. We evaluate seven OCR models on OCRTurk using element-wise metrics. Across difficulty levels, PaddleOCR achieves the strongest overall results, leading most element-wise metrics except figures and attaining high Normalized Edit Distance scores in easy, medium, and hard subsets. We also observe performance variation by document type. Models perform well on non-academic documents, while slideshows become the most challenging.
- [706] arXiv:2602.03695 [pdf, ps, other]
-
Title: Agent Primitives: Reusable Latent Building Blocks for Multi-Agent SystemsComments: 16 pagesSubjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly task-specific, relying on manually crafted agent roles and interaction prompts, which leads to increased architectural complexity and limited reusability across tasks. Moreover, most MAS communicate primarily through natural language, making them vulnerable to error accumulation and instability in long-context, multi-stage interactions within internal agent histories.
In this work, we propose \textbf{Agent Primitives}, a set of reusable latent building blocks for LLM-based MAS. Inspired by neural network design, where complex models are built from reusable components, we observe that many existing MAS architectures can be decomposed into a small number of recurring internal computation patterns. Based on this observation, we instantiate three primitives: Review, Voting and Selection, and Planning and Execution. All primitives communicate internally via key-value (KV) cache, which improves both robustness and efficiency by mitigating information degradation across multi-stage interactions. To enable automatic system construction, an Organizer agent selects and composes primitives for each query, guided by a lightweight knowledge pool of previously successful configurations, forming a primitive-based MAS.
Experiments show that primitives-based MAS improve average accuracy by 12.0-16.5\% over single-agent baselines, reduce token usage and inference latency by approximately 3$\times$-4$\times$ compared to text-based MAS, while incurring only 1.3$\times$-1.6$\times$ overhead relative to single-agent inference and providing more stable performance across model backbones. - [707] arXiv:2602.03696 [pdf, ps, other]
-
Title: Conflict-Resolving and Sharpness-Aware Minimization for Generalized Knowledge Editing with Multiple UpdatesComments: 22 pages, 8 figures. Code link: this https URLSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Large language models (LLMs) rely on internal knowledge to solve many downstream tasks, making it crucial to keep them up to date. Since full retraining is expensive, prior work has explored efficient alternatives such as model editing and parameter-efficient fine-tuning. However, these approaches often break down in practice due to poor generalization across inputs, limited stability, and knowledge conflict. To address these limitations, we propose the CoRSA (Conflict-Resolving and Sharpness-Aware Minimization) training framework, a parameter-efficient, holistic approach for knowledge editing with multiple updates. CoRSA tackles multiple challenges simultaneously: it improves generalization to different input forms and enhances stability across multiple updates by minimizing loss curvature, and resolves conflicts by maximizing the margin between new and prior knowledge. Across three widely used fact editing benchmarks, CoRSA achieves significant gains in generalization, outperforming baselines with average absolute improvements of 12.42% over LoRA and 10% over model editing methods. With multiple updates, it maintains high update efficacy while reducing catastrophic forgetting by 27.82% compared to LoRA. CoRSA also generalizes to the code domain, outperforming the strongest baseline by 5.48% Pass@5 in update efficacy.
- [708] arXiv:2602.03698 [pdf, ps, other]
-
Title: Data-Driven Graph Filters via Adaptive Spectral ShapingSubjects: Machine Learning (cs.LG)
We introduce Adaptive Spectral Shaping, a data-driven framework for graph filtering that learns a reusable baseline spectral kernel and modulates it with a small set of Gaussian factors. The resulting multi-peak, multi-scale responses allocate energy to heterogeneous regions of the Laplacian spectrum while remaining interpretable via explicit centers and bandwidths. To scale, we implement filters with Chebyshev polynomial expansions, avoiding eigendecompositions. We further propose Transferable Adaptive Spectral Shaping (TASS): the baseline kernel is learned on source graphs and, on a target graph, kept fixed while only the shaping parameters are adapted, enabling few-shot transfer under matched compute. Across controlled synthetic benchmarks spanning graph families and signal regimes, Adaptive Spectral Shaping reduces reconstruction error relative to fixed-prototype wavelets and learned linear banks, and TASS yields consistent positive transfer. The framework provides compact spectral modules that plug into graph signal processing pipelines and graph neural networks, combining scalability, interpretability, and cross-graph generalization.
- [709] arXiv:2602.03701 [pdf, ps, other]
-
Title: A Formal Analysis of Capacity Scaling Algorithms for Minimum-Cost FlowsComments: Related Conference Paper: this https URLSubjects: Logic in Computer Science (cs.LO)
We present formalisations of the correctness of executable algorithms to solve minimum-cost flow problems in Isabelle/HOL. Two of the algorithms are based on the technique of scaling, most notably Orlin's algorithm, which has the fastest known running time for solving the problem of minimum-cost flow. We also include a formalisation of the worst-case running time argument for Orlin's algorithm. Our verified implementation of this algorithm, which is derived by the technique of stepwise refinement, is fully executable and was integrated into a reusable formal library on graph algorithms. Because the problems for which Orlin's algorithm works are restricted, we also verified an executable reduction from the general minimum-cost flow problem. We believe we are the first to formally consider the problem of minimum-cost flows and, more generally, any scaling algorithms. Our work has also led to a number of mathematical insights and improvements to proofs as well as theorem statements, compared to all existing expositions.
- [710] arXiv:2602.03702 [pdf, ps, other]
-
Title: Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight AveragingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Large language models are increasingly trained in continual or open-ended settings, where the total training horizon is not known in advance. Despite this, most existing pretraining recipes are not anytime: they rely on horizon-dependent learning rate schedules and extensive tuning under a fixed compute budget. In this work, we provide a theoretical analysis demonstrating the existence of anytime learning schedules for overparameterized linear regression, and we highlight the central role of weight averaging - also known as model merging - in achieving the minimax convergence rates of stochastic gradient descent. We show that these anytime schedules polynomially decay with time, with the decay rate determined by the source and capacity conditions of the problem. Empirically, we evaluate 150M and 300M parameter language models trained at 1-32x Chinchilla scale, comparing constant learning rates with weight averaging and $1/\sqrt{t}$ schedules with weight averaging against a well-tuned cosine schedule. Across the full training range, the anytime schedules achieve comparable final loss to cosine decay. Taken together, our results suggest that weight averaging combined with simple, horizon-free step sizes offers a practical and effective anytime alternative to cosine learning rate schedules for large language model pretraining.
- [711] arXiv:2602.03704 [pdf, ps, other]
-
Title: Cognitively Diverse Multiple-Choice Question Generation: A Hybrid Multi-Agent Framework with Large Language ModelsAuthors: Yu Tian, Linh Huynh, Katerina Christhilf, Shubham Chakraborty, Micah Watanabe, Tracy Arner, Danielle McNamaraComments: This manuscript is under review at ElectronicsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Recent advances in large language models (LLMs) have made automated multiple-choice question (MCQ) generation increasingly feasible; however, reliably producing items that satisfy controlled cognitive demands remains a challenge. To address this gap, we introduce ReQUESTA, a hybrid, multi-agent framework for generating cognitively diverse MCQs that systematically target text-based, inferential, and main idea comprehension. ReQUESTA decomposes MCQ authoring into specialized subtasks and coordinates LLM-powered agents with rule-based components to support planning, controlled generation, iterative evaluation, and post-processing. We evaluated the framework in a large-scale reading comprehension study using academic expository texts, comparing ReQUESTA-generated MCQs with those produced by a single-pass GPT-5 zero-shot baseline. Psychometric analyses of learner responses assessed item difficulty and discrimination, while expert raters evaluated question quality across multiple dimensions, including topic relevance and distractor quality. Results showed that ReQUESTA-generated items were consistently more challenging, more discriminative, and more strongly aligned with overall reading comprehension performance. Expert evaluations further indicated stronger alignment with central concepts and superior distractor linguistic consistency and semantic plausibility, particularly for inferential questions. These findings demonstrate that hybrid, agentic orchestration can systematically improve the reliability and controllability of LLM-based generation, highlighting workflow design as a key lever for structured artifact generation beyond single-pass prompting.
- [712] arXiv:2602.03707 [pdf, ps, other]
-
Title: OmniRAG-Agent: Agentic Omnimodal Reasoning for Low-Resource Long Audio-Video Question AnsweringSubjects: Computation and Language (cs.CL)
Long-horizon omnimodal question answering answers questions by reasoning over text, images, audio, and video. Despite recent progress on OmniLLMs, low-resource long audio-video QA still suffers from costly dense encoding, weak fine-grained retrieval, limited proactive planning, and no clear end-to-end optimization.To address these issues, we propose OmniRAG-Agent, an agentic omnimodal QA method for budgeted long audio-video reasoning. It builds an image-audio retrieval-augmented generation module that lets an OmniLLM fetch short, relevant frames and audio snippets from external banks. Moreover, it uses an agent loop that plans, calls tools across turns, and merges retrieved evidence to answer complex queries. Furthermore, we apply group relative policy optimization to jointly improve tool use and answer quality over time. Experiments on OmniVideoBench, WorldSense, and Daily-Omni show that OmniRAG-Agent consistently outperforms prior methods under low-resource settings and achieves strong results, with ablations validating each component.
- [713] arXiv:2602.03708 [pdf, ps, other]
-
Title: Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal StatesSubjects: Computation and Language (cs.CL); Performance (cs.PF)
Large Language Models (LLMs) achieve strong performance across many tasks but suffer from high inference latency due to autoregressive decoding. The issue is exacerbated in Large Reasoning Models (LRMs), which generate lengthy chains of thought. While speculative decoding accelerates inference by drafting and verifying multiple tokens in parallel, existing methods operate at the token level and ignore semantic equivalence (i.e., different token sequences expressing the same meaning), leading to inefficient rejections. We propose SemanticSpec, a semantic-aware speculative decoding framework that verifies entire semantic sequences instead of tokens. SemanticSpec introduces a semantic probability estimation mechanism that probes the model's internal hidden states to assess the likelihood of generating sequences with specific meanings.Experiments on four benchmarks show that SemanticSpec achieves up to 2.7x speedup on DeepSeekR1-32B and 2.1x on QwQ-32B, consistently outperforming token-level and sequence-level baselines in both efficiency and effectiveness.
- [714] arXiv:2602.03709 [pdf, ps, other]
-
Title: No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural UnderstandingSubjects: Computation and Language (cs.CL)
Understanding culture requires reasoning across context, tradition, and implicit social knowledge, far beyond recalling isolated facts. Yet most culturally focused question answering (QA) benchmarks rely on single-hop questions, which may allow models to exploit shallow cues rather than demonstrate genuine cultural reasoning. In this work, we introduce ID-MoCQA, the first large-scale multi-hop QA dataset for assessing the cultural understanding of large language models (LLMs), grounded in Indonesian traditions and available in both English and Indonesian. We present a new framework that systematically transforms single-hop cultural questions into multi-hop reasoning chains spanning six clue types (e.g., commonsense, temporal, geographical). Our multi-stage validation pipeline, combining expert review and LLM-as-a-judge filtering, ensures high-quality question-answer pairs. Our evaluation across state-of-the-art models reveals substantial gaps in cultural reasoning, particularly in tasks requiring nuanced inference. ID-MoCQA provides a challenging and essential benchmark for advancing the cultural competency of LLMs.
- [715] arXiv:2602.03712 [pdf, ps, other]
-
Title: SWE-Refactor: A Repository-Level Benchmark for Real-World LLM-Based Code RefactoringSubjects: Software Engineering (cs.SE)
Large Language Models (LLMs) have recently attracted wide interest for tackling software engineering tasks. In contrast to code generation, refactoring demands precise, semantics-preserving edits that improve program structure, which also makes automated evaluation challenging. However, existing refactoring benchmarks commonly suffer from three shortcomings: limited coverage of refactoring scenarios, the inclusion of instances that mix refactoring with unrelated changes, and insufficient repository-level context for realistic assessment. To mitigate these issues, we introduce SWE-Refactor, a new benchmark for LLM-based code refactoring. SWE-Refactor comprises 1,099 developer-written, behavior-preserving refactorings mined from 18 Java projects, including 922 atomic and 177 compound instances. Each instance is validated via compilation, test execution, and automated refactoring detection tools to ensure correctness. We evaluate nine widely used LLMs on SWE-Refactor, covering models such as GPT-4o-mini, DeepSeek-V3, and CodeLLaMa, to provide representative reference results. Our results show that complex and compound refactorings remain the primary source of failures; notably, an OpenAI Codex agent achieves only 39.4% success on compound instances. We release SWE-Refactor and all evaluation results to facilitate future research on LLM-based code refactoring.
- [716] arXiv:2602.03713 [pdf, ps, other]
-
Title: Multimodal Generative Recommendation for Fusing Semantic and Collaborative SignalsAuthors: Moritz Vandenhirtz, Kaveh Hassani, Shervin Ghasemlou, Shuai Shao, Hamid Eghbalzadeh, Fuchun Peng, Jun Liu, Michael Louis IuzzolinoSubjects: Information Retrieval (cs.IR)
Sequential recommender systems rank relevant items by modeling a user's interaction history and computing the inner product between the resulting user representation and stored item embeddings. To avoid the significant memory overhead of storing large item sets, the generative recommendation paradigm instead models each item as a series of discrete semantic codes. Here, the next item is predicted by an autoregressive model that generates the code sequence corresponding to the predicted item. However, despite promising ranking capabilities on small datasets, these methods have yet to surpass traditional sequential recommenders on large item sets, limiting their adoption in the very scenarios they were designed to address. To resolve this, we propose MSCGRec, a Multimodal Semantic and Collaborative Generative Recommender. MSCGRec incorporates multiple semantic modalities and introduces a novel self-supervised quantization learning approach for images based on the DINO framework. Additionally, MSCGRec fuses collaborative and semantic signals by extracting collaborative features from sequential recommenders and treating them as a separate modality. Finally, we propose constrained sequence learning that restricts the large output space during training to the set of permissible tokens. We empirically demonstrate on three large real-world datasets that MSCGRec outperforms both sequential and generative recommendation baselines and provide an extensive ablation study to validate the impact of each component.
- [717] arXiv:2602.03719 [pdf, ps, other]
-
Title: Training Multi-Turn Search Agent via Contrastive Dynamic Branch SamplingComments: 24 pages, 5 figuresSubjects: Computation and Language (cs.CL)
Agentic reinforcement learning has enabled large language models to perform complex multi-turn planning and tool use. However, learning in long-horizon settings remains challenging due to sparse, trajectory-level outcome rewards. While prior tree-based methods attempt to mitigate this issue, they often suffer from high variance and computational inefficiency. Through empirical analysis of search agents, We identify a common pattern: performance diverges mainly due to decisions near the tail. Motivated by this observation, we propose Branching Relative Policy Optimization (BranPO), a value-free method that provides step-level contrastive supervision without dense rewards. BranPO truncates trajectories near the tail and resamples alternative continuations to construct contrastive suffixes over shared prefixes, reducing credit ambiguity in long-horizon rollouts. To further boost efficiency and stabilize training, we introduce difficulty-aware branch sampling to adapt branching frequency across tasks, and redundant step masking to suppress uninformative actions. Extensive experiments on various question answering benchmarks demonstrate that BranPO consistently outperforms strong baselines, achieving significant accuracy gains on long-horizon tasks without increasing the overall training budget. Our code is available at \href{https://github.com/YubaoZhao/BranPO}{code}.
- [718] arXiv:2602.03729 [pdf, ps, other]
-
Title: Efficient Training of Boltzmann Generators Using Off-Policy Log-Dispersion RegularizationSubjects: Machine Learning (cs.LG)
Sampling from unnormalized probability densities is a central challenge in computational science. Boltzmann generators are generative models that enable independent sampling from the Boltzmann distribution of physical systems at a given temperature. However, their practical success depends on data-efficient training, as both simulation data and target energy evaluations are costly. To this end, we propose off-policy log-dispersion regularization (LDR), a novel regularization framework that builds on a generalization of the log-variance objective. We apply LDR in the off-policy setting in combination with standard data-based training objectives, without requiring additional on-policy samples. LDR acts as a shape regularizer of the energy landscape by leveraging additional information in the form of target energy labels. The proposed regularization framework is broadly applicable, supporting unbiased or biased simulation datasets as well as purely variational training without access to target samples. Across all benchmarks, LDR improves both final performance and data efficiency, with sample efficiency gains of up to one order of magnitude.
- [719] arXiv:2602.03731 [pdf, ps, other]
-
Title: CUBO: Self-Contained Retrieval-Augmented Generation on Consumer Laptops 10 GB Corpora, 16 GB RAM, Single-Device DeploymentAuthors: Paolo AstrinoComments: 24 pages, 2 figures, 6 tablesSubjects: Computation and Language (cs.CL)
Organizations handling sensitive documents face a tension: cloud-based AI risks GDPR violations, while local systems typically require 18-32 GB RAM. This paper presents CUBO, a systems-oriented RAG platform for consumer laptops with 16 GB shared memory. CUBO's novelty lies in engineering integration of streaming ingestion (O(1) buffer overhead), tiered hybrid retrieval, and hardware-aware orchestration that enables competitive Recall@10 (0.48-0.97 across BEIR domains) within a hard 15.5 GB RAM ceiling. The 37,000-line codebase achieves retrieval latencies of 185 ms (p50) on C1,300 laptops while maintaining data minimization through local-only processing aligned with GDPR Art. 5(1)(c). Evaluation on BEIR benchmarks validates practical deployability for small-to-medium professional archives. The codebase is publicly available at https://github.com/PaoloAstrino/CUBO.
- [720] arXiv:2602.03732 [pdf, ps, other]
-
Title: Fast-MWEM: Private Data Release in Sublinear TimeSubjects: Machine Learning (cs.LG)
The Multiplicative Weights Exponential Mechanism (MWEM) is a fundamental iterative framework for private data analysis, with broad applications such as answering $m$ linear queries, or privately solving systems of $m$ linear constraints. However, a critical bottleneck hindering its scalability is the $\Theta(m)$ time complexity required to execute the exponential mechanism in each iteration. We introduce a modification to the MWEM framework that improves the per-iteration runtime dependency to $\Theta(\sqrt{m})$ in expectation. This is done via a lazy sampling approach to the Report-Noisy-Max mechanism, which we implement efficiently using Gumbel noise and a $k$-Nearest Neighbor data structure. This allows for the rapid selection of the approximate score in the exponential mechanism without an exhaustive linear scan. We apply our accelerated framework to the problems of private linear query release and solving Linear Programs (LPs) under neighboring constraint conditions and low-sensitivity assumptions. Experimental evaluation confirms that our method provides a substantial runtime improvement over classic MWEM.
- [721] arXiv:2602.03733 [pdf, ps, other]
-
Title: RegionReasoner: Region-Grounded Multi-Round Visual ReasoningComments: Accepted by ICLR 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
Large vision-language models have achieved remarkable progress in visual reasoning, yet most existing systems rely on single-step or text-only reasoning, limiting their ability to iteratively refine understanding across multiple visual contexts. To address this limitation, we introduce a new multi-round visual reasoning benchmark with training and test sets spanning both detection and segmentation tasks, enabling systematic evaluation under iterative reasoning scenarios. We further propose RegionReasoner, a reinforcement learning framework that enforces grounded reasoning by requiring each reasoning trace to explicitly cite the corresponding reference bounding boxes, while maintaining semantic coherence via a global-local consistency reward. This reward extracts key objects and nouns from both global scene captions and region-level captions, aligning them with the reasoning trace to ensure consistency across reasoning steps. RegionReasoner is optimized with structured rewards combining grounding fidelity and global-local semantic alignment. Experiments on detection and segmentation tasks show that RegionReasoner-7B, together with our newly introduced benchmark RegionDial-Bench, considerably improves multi-round reasoning accuracy, spatial grounding precision, and global-local consistency, establishing a strong baseline for this emerging research direction.
- [722] arXiv:2602.03737 [pdf, ps, other]
-
Title: Soft Sensor for Bottom-Hole Pressure Estimation in Petroleum Wells Using Long Short-Term Memory and Transfer LearningSubjects: Machine Learning (cs.LG)
Monitoring bottom-hole variables in petroleum wells is essential for production optimization, safety, and emissions reduction. Permanent Downhole Gauges (PDGs) provide real-time pressure data but face reliability and cost issues. We propose a machine learning-based soft sensor to estimate flowing Bottom-Hole Pressure (BHP) using wellhead and topside measurements. A Long Short-Term Memory (LSTM) model is introduced and compared with Multi-Layer Perceptron (MLP) and Ridge Regression. We also pioneer Transfer Learning for adapting models across operational environments. Tested on real offshore datasets from Brazil's Pre-salt basin, the methodology achieved Mean Absolute Percentage Error (MAPE) consistently below 2\%, outperforming benchmarks. This work offers a cost-effective, accurate alternative to physical sensors, with broad applicability across diverse reservoir and flow conditions.
- [723] arXiv:2602.03742 [pdf, ps, other]
-
Title: Edge-Optimized Vision-Language Models for Underground Infrastructure AssessmentSubjects: Computer Vision and Pattern Recognition (cs.CV)
Autonomous inspection of underground infrastructure, such as sewer and culvert systems, is critical to public safety and urban sustainability. Although robotic platforms equipped with visual sensors can efficiently detect structural deficiencies, the automated generation of human-readable summaries from these detections remains a significant challenge, especially on resource-constrained edge devices. This paper presents a novel two-stage pipeline for end-to-end summarization of underground deficiencies, combining our lightweight RAPID-SCAN segmentation model with a fine-tuned Vision-Language Model (VLM) deployed on an edge computing platform. The first stage employs RAPID-SCAN (Resource-Aware Pipeline Inspection and Defect Segmentation using Compact Adaptive Network), achieving 0.834 F1-score with only 0.64M parameters for efficient defect segmentation. The second stage utilizes a fine-tuned Phi-3.5 VLM that generates concise, domain-specific summaries in natural language from the segmentation outputs. We introduce a curated dataset of inspection images with manually verified descriptions for VLM fine-tuning and evaluation. To enable real-time performance, we employ post-training quantization with hardware-specific optimization, achieving significant reductions in model size and inference latency without compromising summarization quality. We deploy and evaluate our complete pipeline on a mobile robotic platform, demonstrating its effectiveness in real-world inspection scenarios. Our results show the potential of edge-deployable integrated AI systems to bridge the gap between automated defect detection and actionable insights for infrastructure maintenance, paving the way for more scalable and autonomous inspection solutions.
- [724] arXiv:2602.03743 [pdf, ps, other]
-
Title: Occlusion-Free Conformal Lensing for Spatiotemporal Visualization in 3D Urban AnalyticsComments: Accepted at IEEE VR 2026Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
The visualization of temporal data on urban buildings, such as shadows, noise, and solar potential, plays a critical role in the analysis of dynamic urban phenomena. However, in dense and geographically constrained 3D urban environments, visual representations of time-varying building data often suffer from occlusion and visual clutter. To address these two challenges, we introduce an immersive lens visualization that integrates a view-dependent cutaway de-occlusion technique and a temporal display derived from a conformal mapping algorithm. The mapping process first partitions irregular building footprints into smaller, sufficiently regular subregions that serve as structural primitives. These subregions are then seamlessly recombined to form a conformal, layered layout for our temporal lens visualization. The view-responsive cutaway is inspired by traditional architectural illustrations, preserving the overall layout of the building and its surroundings to maintain users' sense of spatial orientation. This lens design enables the occlusion-free embedding of shape-adaptive temporal displays across building facades on demand, supporting rapid time-space association for the discovery, access and interpretation of spatiotemporal urban patterns. Guided by domain and design goals, we outline the rationale behind the lens visual and interaction design choices, such as the encoding of time progression and temporal values in the conforming lens image. A user study compares our approach against conventional juxtaposition and x-ray spatiotemporal designs. Results validate the usage and utility of our lens, showing that it improves task accuracy and completion time, reduces navigation effort, and increases user confidence. From these findings, we distill design recommendations and promising directions for future research on spatially-embedded lenses in 3D visualization and urban analytics.
- [725] arXiv:2602.03747 [pdf, ps, other]
-
Title: LIVE: Long-horizon Interactive Video World ModelingAuthors: Junchao Huang, Ziyang Ye, Xinting Hu, Tianyu He, Guiyu Zhang, Shaoshuai Shi, Jiang Bian, Li JiangComments: 18 pages, 22 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Autoregressive video world models predict future visual observations conditioned on actions. While effective over short horizons, these models often struggle with long-horizon generation, as small prediction errors accumulate over time. Prior methods alleviate this by introducing pre-trained teacher models and sequence-level distribution matching, which incur additional computational cost and fail to prevent error propagation beyond the training horizon. In this work, we propose LIVE, a Long-horizon Interactive Video world modEl that enforces bounded error accumulation via a novel cycle-consistency objective, thereby eliminating the need for teacher-based distillation. Specifically, LIVE first performs a forward rollout from ground-truth frames and then applies a reverse generation process to reconstruct the initial state. The diffusion loss is subsequently computed on the reconstructed terminal state, providing an explicit constraint on long-horizon error propagation. Moreover, we provide an unified view that encompasses different approaches and introduce progressive training curriculum to stabilize training. Experiments demonstrate that LIVE achieves state-of-the-art performance on long-horizon benchmarks, generating stable, high-quality videos far beyond training rollout lengths.
- [726] arXiv:2602.03749 [pdf, ps, other]
-
Title: See-through: Single-image Layer Decomposition for Anime CharactersAuthors: Jian Lin, Chengze Li, Haoyun Qin, Kwun Wang Chan, Yanghua Jin, Hanyuan Liu, Stephen Chun Wang Choy, Xueting LiuComments: 23 pages, 20 figures, preprint version onlySubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
We introduce a framework that automates the transformation of static anime illustrations into manipulatable 2.5D models. Current professional workflows require tedious manual segmentation and the artistic ``hallucination'' of occluded regions to enable motion. Our approach overcomes this by decomposing a single image into fully inpainted, semantically distinct layers with inferred drawing orders. To address the scarcity of training data, we introduce a scalable engine that bootstraps high-quality supervision from commercial Live2D models, capturing pixel-perfect semantics and hidden geometry. Our methodology couples a diffusion-based Body Part Consistency Module, which enforces global geometric coherence, with a pixel-level pseudo-depth inference mechanism. This combination resolves the intricate stratification of anime characters, e.g., interleaving hair strands, allowing for dynamic layer reconstruction. We demonstrate that our approach yields high-fidelity, manipulatable models suitable for professional, real-time animation applications.
- [727] arXiv:2602.03750 [pdf, ps, other]
-
Title: Zero-shot large vision-language model prompting for automated bone identification in paleoradiology x-ray archivesAuthors: Owen Dong, Lily Gao, Manish Kota, Bennett A. Landmana, Jelena Bekvalac, Gaynor Western, Katherine D. Van SchaikSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Paleoradiology, the use of modern imaging technologies to study archaeological and anthropological remains, offers new windows on millennial scale patterns of human health. Unfortunately, the radiographs collected during field campaigns are heterogeneous: bones are disarticulated, positioning is ad hoc, and laterality markers are often absent. Additionally, factors such as age at death, age of bone, sex, and imaging equipment introduce high variability. Thus, content navigation, such as identifying a subset of images with a specific projection view, can be time consuming and difficult, making efficient triaging a bottleneck for expert analysis. We report a zero shot prompting strategy that leverages a state of the art Large Vision Language Model (LVLM) to automatically identify the main bone, projection view, and laterality in such images. Our pipeline converts raw DICOM files to bone windowed PNGs, submits them to the LVLM with a carefully engineered prompt, and receives structured JSON outputs, which are extracted and formatted onto a spreadsheet in preparation for validation. On a random sample of 100 images reviewed by an expert board certified paleoradiologist, the system achieved 92% main bone accuracy, 80% projection view accuracy, and 100% laterality accuracy, with low or medium confidence flags for ambiguous cases. These results suggest that LVLMs can substantially accelerate code word development for large paleoradiology datasets, allowing for efficient content navigation in future anthropology workflows.
- [728] arXiv:2602.03753 [pdf, ps, other]
-
Title: Test-Time Conditioning with Representation-Aligned Visual FeaturesSubjects: Computer Vision and Pattern Recognition (cs.CV)
While representation alignment with self-supervised models has been shown to improve diffusion model training, its potential for enhancing inference-time conditioning remains largely unexplored. We introduce Representation-Aligned Guidance (REPA-G), a framework that leverages these aligned representations, with rich semantic properties, to enable test-time conditioning from features in generation. By optimizing a similarity objective (the potential) at inference, we steer the denoising process toward a conditioned representation extracted from a pre-trained feature extractor. Our method provides versatile control at multiple scales, ranging from fine-grained texture matching via single patches to broad semantic guidance using global image feature tokens. We further extend this to multi-concept composition, allowing for the faithful combination of distinct concepts. REPA-G operates entirely at inference time, offering a flexible and precise alternative to often ambiguous text prompts or coarse class labels. We theoretically justify how this guidance enables sampling from the potential-induced tilted distribution. Quantitative results on ImageNet and COCO demonstrate that our approach achieves high-quality, diverse generations. Code is available at https://github.com/valeoai/REPA-G.
- [729] arXiv:2602.03755 [pdf, ps, other]
-
Title: Improving Deep Learning Library Testing with Machine LearningComments: In proceedings of the 7th ACM/IEEE International Conference on Automation of Software Test (AST 2026)Subjects: Software Engineering (cs.SE)
Deep Learning (DL) libraries like TensorFlow and Pytorch simplify machine learning (ML) model development but are prone to bugs due to their complex design. Bug-finding techniques exist, but without precise API specifications, they produce many false alarms. Existing methods to mine API specifications lack accuracy. We explore using ML classifiers to determine input validity. We hypothesize that tensor shapes are a precise abstraction to encode concrete inputs and capture relationships of the data. Shape abstraction severely reduces problem dimensionality, which is important to facilitate ML training. Labeled data are obtained by observing runtime outcomes on a sample of inputs and classifiers are trained on sets of labeled inputs to capture API constraints. Our evaluation, conducted over 183 APIs from TensorFlow and Pytorch, shows that the classifiers generalize well on unseen data with over 91% accuracy. Integrating these classifiers into the pipeline of ACETest, a SoTA bug-finding technique, improves its pass rate from ~29% to ~61%. Our findings suggest that ML-enhanced input classification is an important aid to scale DL library testing.
- [730] arXiv:2602.03757 [pdf, ps, other]
-
Title: Mitigating Timing-Based Attacks in Real-Time Cyber-Physical SystemsComments: 12 pages, 10 figuresSubjects: Systems and Control (eess.SY); Operating Systems (cs.OS)
Real-time cyber-physical systems depend on deterministic task execution to guarantee safety and correctness. Unfortunately, this determinism can unintentionally expose timing information that enables adversaries to infer task execution patterns and carry out timing-based attacks targeting safety-critical control tasks. While prior defenses aim to obscure schedules through randomization or isolation, they typically neglect the implications of such modifications on closed-loop control behavior and real-time feasibility. This work studies the problem of securing real-time control workloads against timing inference attacks while explicitly accounting for both schedulability constraints and control performance requirements. We present a scheduling-based mitigation approach that introduces bounded timing perturbations to control task executions in a structured manner, reducing adversarial opportunities without violating real-time guarantees. The framework jointly considers worst-case execution behavior and the impact of execution delays on control performance, enabling the system to operate within predefined safety and performance limits. Through experimental evaluation on representative task sets and control scenarios, the proposed approach demonstrates that exposure to timing-based attacks can be significantly reduced while preserving predictable execution and acceptable control quality.
- [731] arXiv:2602.03760 [pdf, ps, other]
-
Title: RAWDet-7: A Multi-Scenario Benchmark for Object Detection and Description on Quantized RAW ImagesAuthors: Mishal Fatima, Shashank Agnihotri, Kanchana Vaishnavi Gandikota, Michael Moeller, Margret KeuperComments: *Equal ContributionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Most vision models are trained on RGB images processed through ISP pipelines optimized for human perception, which can discard sensor-level information useful for machine reasoning. RAW images preserve unprocessed scene data, enabling models to leverage richer cues for both object detection and object description, capturing fine-grained details, spatial relationships, and contextual information often lost in processed images. To support research in this domain, we introduce RAWDet-7, a large-scale dataset of ~25k training and 7.6k test RAW images collected across diverse cameras, lighting conditions, and environments, densely annotated for seven object categories following MS-COCO and LVIS conventions. In addition, we provide object-level descriptions derived from the corresponding high-resolution sRGB images, facilitating the study of object-level information preservation under RAW image processing and low-bit quantization. The dataset allows evaluation under simulated 4-bit, 6-bit, and 8-bit quantization, reflecting realistic sensor constraints, and provides a benchmark for studying detection performance, description quality & detail, and generalization in low-bit RAW image processing. Dataset & code upon acceptance.
- [732] arXiv:2602.03766 [pdf, ps, other]
-
Title: FOVI: A biologically-inspired foveated interface for deep vision modelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Human vision is foveated, with variable resolution peaking at the center of a large field of view; this reflects an efficient trade-off for active sensing, allowing eye-movements to bring different parts of the world into focus with other parts of the world in context. In contrast, most computer vision systems encode the visual world at a uniform resolution, raising challenges for processing full-field high-resolution images efficiently. We propose a foveated vision interface (FOVI) based on the human retina and primary visual cortex, that reformats a variable-resolution retina-like sensor array into a uniformly dense, V1-like sensor manifold. Receptive fields are defined as k-nearest-neighborhoods (kNNs) on the sensor manifold, enabling kNN-convolution via a novel kernel mapping technique. We demonstrate two use cases: (1) an end-to-end kNN-convolutional architecture, and (2) a foveated adaptation of the foundational DINOv3 ViT model, leveraging low-rank adaptation (LoRA). These models provide competitive performance at a fraction of the computational cost of non-foveated baselines, opening pathways for efficient and scalable active sensing for high-resolution egocentric vision. Code and pre-trained models are available at https://github.com/nblauch/fovi and https://huggingface.co/fovi-pytorch.
- [733] arXiv:2602.03767 [pdf, ps, other]
-
Title: Decision-oriented benchmarking to transform AI weather forecast access: Application to the Indian monsoonAuthors: Rajat Masiwal, Colin Aitken, Adam Marchakitus, Mayank Gupta, Katherine Kowal, Hamid A. Pahlavan, Tyler Yang, Y. Qiang Sun, Michael Kremer, Amir Jina, William R. Boos, Pedram HassanzadehSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); General Economics (econ.GN); Atmospheric and Oceanic Physics (physics.ao-ph)
Artificial intelligence weather prediction (AIWP) models now often outperform traditional physics-based models on common metrics while requiring orders-of-magnitude less computing resources and time. Open-access AIWP models thus hold promise as transformational tools for helping low- and middle-income populations make decisions in the face of high-impact weather shocks. Yet, current approaches to evaluating AIWP models focus mainly on aggregated meteorological metrics without considering local stakeholders' needs in decision-oriented, operational frameworks. Here, we introduce such a framework that connects meteorology, AI, and social sciences. As an example, we apply it to the 150-year-old problem of Indian monsoon forecasting, focusing on benefits to rain-fed agriculture, which is highly susceptible to climate change. AIWP models skillfully predict an agriculturally relevant onset index at regional scales weeks in advance when evaluated out-of-sample using deterministic and probabilistic metrics. This framework informed a government-led effort in 2025 to send 38 million Indian farmers AI-based monsoon onset forecasts, which captured an unusual weeks-long pause in monsoon progression. This decision-oriented benchmarking framework provides a key component of a blueprint for harnessing the power of AIWP models to help large vulnerable populations adapt to weather shocks in the face of climate variability and change.
- [734] arXiv:2602.03769 [pdf, ps, other]
-
Title: Reasoning with Latent Tokens in Diffusion Language ModelsSubjects: Machine Learning (cs.LG)
Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference time. We trace this trade-off to a key mechanism: diffusion models are trained to jointly predict a distribution over all unknown tokens, including those that will not actually be decoded in the current step. Ablating this joint prediction yields faster inference but degrades performance, revealing that accurate prediction at the decoded position relies on joint reasoning about the distribution of undecoded tokens. We interpret these as latent tokens and introduce a method for modulating their number, demonstrating empirically that this enables a smooth tradeoff between inference speed and sample quality. Furthermore, we demonstrate that latent tokens can be introduced into autoregressive models through an auxiliary multi-token prediction objective, yielding substantial improvements on the same reasoning tasks where they have traditionally struggled. Our results suggest that latent tokens, while arising naturally in diffusion, represent a general mechanism for improving performance on tasks requiring global coherence or lookahead.
- [735] arXiv:2602.03772 [pdf, ps, other]
-
Title: UniGeM: Unifying Data Mixing and Selection via Geometric Exploration and MiningAuthors: Changhao Wang, Yunfei Yu, Xinhao Yao, Jiaolong Yang, Riccardo Cantoro, Chaobo Li, Qing Cui, Jun ZhouSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The scaling of Large Language Models (LLMs) is increasingly limited by data quality. Most methods handle data mixing and sample selection separately, which can break the structure in code corpora. We introduce \textbf{UniGeM}, a framework that unifies mixing and selection by treating data curation as a \textit{manifold approximation} problem without training proxy models or relying on external reference datasets. UniGeM operates hierarchically: \textbf{Macro-Exploration} learns mixing weights with stability-based clustering; \textbf{Micro-Mining} filters high-quality instances by their geometric distribution to ensure logical consistency. Validated by training 8B and 16B MoE models on 100B tokens, UniGeM achieves \textbf{2.0$\times$ data efficiency} over a random baseline and further improves overall performance compared to SOTA methods in reasoning-heavy evaluations and multilingual generalization.
- [736] arXiv:2602.03773 [pdf, ps, other]
-
Title: Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RLComments: preprintSubjects: Machine Learning (cs.LG)
Large Language Models (LLMs) that can continually improve beyond their training budgets are able to solve increasingly difficult problems by adapting at test time, a property we refer to as extrapolation. However, standard reinforcement learning (RL) operates over fixed problem distributions and training budgets, which limits extrapolation amidst distribution shift at test time. To address this, we introduce RC, an iterative decoding algorithm that replaces standard autoregressive decoding during both training and inference. RC exploits an asymmetry between the response generation and summarization capabilities of LLMs to construct reasoning chains that consistently improve across iterations. Models trained to use RC can extrapolate and continually improve over reasoning horizons more than an order of magnitude longer than those seen during training. Empirically, training a 4B model with RC using a 16k-token training budget improves performance on HMMT 2025 from 40% to nearly 70% with 0.5m tokens at test time, outperforming both comparably sized models and many larger reasoning LLMs. Finally, we also show that models trained with RC can more effectively leverage existing scaffolds to further scale test-time performance, due to the improved summary-conditioned generation abilities learned through training.
- [737] arXiv:2602.03775 [pdf, ps, other]
-
Title: An Empirical Study of Collective Behaviors and Social Dynamics in Large Language Model AgentsSubjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) increasingly mediate our social, cultural, and political interactions. While they can simulate some aspects of human behavior and decision-making, it is still underexplored whether repeated interactions with other agents amplify their biases or lead to exclusionary behaviors. To this end, we study Chirper.ai-an LLM-driven social media platform-analyzing 7M posts and interactions among 32K LLM agents over a year. We start with homophily and social influence among LLMs, learning that similar to humans', their social networks exhibit these fundamental phenomena. Next, we study the toxic language of LLMs, its linguistic features, and their interaction patterns, finding that LLMs show different structural patterns in toxic posting than humans. After studying the ideological leaning in LLMs posts, and the polarization in their community, we focus on how to prevent their potential harmful activities. We present a simple yet effective method, called Chain of Social Thought (CoST), that reminds LLM agents to avoid harmful posting.
- [738] arXiv:2602.03777 [pdf, ps, other]
-
Title: From Separate Compilation to Sound Language CompositionComments: 43 pages, 1 figure, 5 ListingSubjects: Programming Languages (cs.PL); Software Engineering (cs.SE)
The development of programming languages involves complex theoretical and practical challenges, particularly when addressing modularity and reusability through language extensions. While language workbenches aim to enable modular development under the constraints of the language extension problem, one critical constraint -- separate compilation -- is often relaxed due to its complexity. However, this relaxation undermines artifact reusability and integration with common dependency systems. A key difficulty under separate compilation arises from managing attribute grammars, as extensions may introduce new attributes that invalidate previously generated abstract syntax tree structures. Existing approaches, such as the use of dynamic maps in the Neverlang workbench, favor flexibility at the cost of compile-time correctness, leading to potential runtime errors due to undefined attributes. This work addresses this issue by introducing nlgcheck, a theoretically sound static analysis tool based on data-flow analysis for the Neverlang language workbench. nlgcheck detects potential runtime errors -- such as undefined attribute accesses -- at compile time, preserving separate compilation while maintaining strong static correctness guarantees. Experimental evaluation using mutation testing on Neverlang-based projects demonstrates that nlgcheck effectively enhances robustness without sacrificing modularity or flexibility and with a level of performance that does not impede its adoption in daily development activities.
- [739] arXiv:2602.03778 [pdf, ps, other]
-
Title: Reward Redistribution for CVaR MDPs using a Bellman Operator on L-infinitySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Tail-end risk measures such as static conditional value-at-risk (CVaR) are used in safety-critical applications to prevent rare, yet catastrophic events. Unlike risk-neutral objectives, the static CVaR of the return depends on entire trajectories without admitting a recursive Bellman decomposition in the underlying Markov decision process. A classical resolution relies on state augmentation with a continuous variable. However, unless restricted to a specialized class of admissible value functions, this formulation induces sparse rewards and degenerate fixed points. In this work, we propose a novel formulation of the static CVaR objective based on augmentation. Our alternative approach leads to a Bellman operator with: (1) dense per-step rewards; (2) contracting properties on the full space of bounded value functions. Building on this theoretical foundation, we develop risk-averse value iteration and model-free Q-learning algorithms that rely on discretized augmented states. We further provide convergence guarantees and approximation error bounds due to discretization. Empirical results demonstrate that our algorithms successfully learn CVaR-sensitive policies and achieve effective performance-safety trade-offs.
- [740] arXiv:2602.03781 [pdf, ps, other]
-
Title: A Scene Graph Backed Approach to Open Set Semantic MappingAuthors: Martin Günther, Felix Igelbrink, Oscar Lima, Lennart Niecksch, Marian Renz, Martin AtzmuellerSubjects: Robotics (cs.RO)
While Open Set Semantic Mapping and 3D Semantic Scene Graphs (3DSSGs) are established paradigms in robotic perception, deploying them effectively to support high-level reasoning in large-scale, real-world environments remains a significant challenge. Most existing approaches decouple perception from representation, treating the scene graph as a derivative layer generated post hoc. This limits both consistency and scalability. In contrast, we propose a mapping architecture where the 3DSSG serves as the foundational backend, acting as the primary knowledge representation for the entire mapping process.
Our approach leverages prior work on incremental scene graph prediction to infer and update the graph structure in real-time as the environment is explored. This ensures that the map remains topologically consistent and computationally efficient, even during extended operations in large-scale settings. By maintaining an explicit, spatially grounded representation that supports both flat and hierarchical topologies, we bridge the gap between sub-symbolic raw sensor data and high-level symbolic reasoning. Consequently, this provides a stable, verifiable structure that knowledge-driven frameworks, ranging from knowledge graphs and ontologies to Large Language Models (LLMs), can directly exploit, enabling agents to operate with enhanced interpretability, trustworthiness, and alignment to human concepts. - [741] arXiv:2602.03782 [pdf, ps, other]
-
Title: QVLA: Not All Channels Are Equal in Vision-Language-Action Model's QuantizationComments: ICLR2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
The advent of Vision-Language-Action (VLA) models represents a significant leap for embodied intelligence, yet their immense computational demands critically hinder deployment on resource-constrained robotic platforms. Intuitively, low-bit quantization is a prevalent and preferred technique for large-scale model compression. However, we find that a systematic analysis of VLA model's quantization is fundamentally lacking. We argue that naively applying uniform-bit quantization from Large Language Models (LLMs) to robotics is flawed, as these methods prioritize passive data fidelity while ignoring how minor action deviations compound into catastrophic task failures. To bridge this gap, we introduce QVLA, the first action-centric quantization framework specifically designed for embodied control. In a sharp departure from the rigid, uniform-bit quantization of LLM-based methods, QVLA introduces a highly granular, channel-wise bit allocation strategy. Its core mechanism is to directly measure the final action-space sensitivity when quantizing each individual channel to various bit-widths. This process yields a precise, per-channel importance metric that guides a global optimization, which elegantly unifies quantization and pruning (0-bit) into a single, cohesive framework. Extensive evaluations on different baselines demonstrate the superiority of our approach. In the LIBERO, the quantization version of OpenVLA-OFT with our method requires only 29.2% of the original model's VRAM while maintaining 98.9% of its original performance and achieving a 1.49x speedup. This translates to a 22.6% performance improvement over the LLM-derived method SmoothQuant. Our work establishes a new, principled foundation for compressing VLA models in robotics, paving the way for deploying powerful, large-scale models on real-world hardware. Code will be released.
- [742] arXiv:2602.03783 [pdf, ps, other]
-
Title: Efficient Estimation of Kernel Surrogate Models for Task AttributionComments: 27 pages. To appear in ICLR 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Modern AI agents such as large language models are trained on diverse tasks -- translation, code generation, mathematical reasoning, and text prediction -- simultaneously. A key question is to quantify how each individual training task influences performance on a target task, a problem we refer to as task attribution. The direct approach, leave-one-out retraining, measures the effect of removing each task, but is computationally infeasible at scale. An alternative approach that builds surrogate models to predict a target task's performance for any subset of training tasks has emerged in recent literature. Prior work focuses on linear surrogate models, which capture first-order relationships, but miss nonlinear interactions such as synergy, antagonism, or XOR-type effects. In this paper, we first consider a unified task weighting framework for analyzing task attribution methods, and show a new connection between linear surrogate models and influence functions through a second-order analysis. Then, we introduce kernel surrogate models, which more effectively represent second-order task interactions. To efficiently learn the kernel surrogate, we develop a gradient-based estimation procedure that leverages a first-order approximation of pretrained models; empirically, this yields accurate estimates with less than $2\%$ relative error without repeated retraining. Experiments across multiple domains -- including math reasoning in transformers, in-context learning, and multi-objective reinforcement learning -- demonstrate the effectiveness of kernel surrogate models. They achieve a $25\%$ higher correlation with the leave-one-out ground truth than linear surrogates and influence-function baselines. When used for downstream task selection, kernel surrogate models yield a $40\%$ improvement in demonstration selection for in-context learning and multi-objective reinforcement learning benchmarks.
- [743] arXiv:2602.03784 [pdf, ps, other]
-
Title: Context Compression via Explicit Information TransmissionSubjects: Computation and Language (cs.CL)
Long-context inference with Large Language Models (LLMs) is costly due to quadratic attention and growing key-value caches, motivating context compression. In this work, we study soft context compression, where a long context is condensed into a small set of continuous representations. Existing methods typically re-purpose the LLM itself as a trainable compressor, relying on layer-by-layer self-attention to iteratively aggregate information. We argue that this paradigm suffers from two structural limitations: (i) progressive representation overwriting across layers (ii) uncoordinated allocation of compression capacity across tokens. We propose ComprExIT (Context Compression via Explicit Information Transmission), a lightweight framework that formulates soft compression into a new paradigm: explicit information transmission over frozen LLM hidden states. This decouples compression from the model's internal self-attention dynamics. ComprExIT performs (i) depth-wise transmission to selectively transmit multi-layer information into token anchors, mitigating progressive overwriting, and (ii) width-wise transmission to aggregate anchors into a small number of slots via a globally optimized transmission plan, ensuring coordinated allocation of information. Across six question-answering benchmarks, ComprExIT consistently outperforms state-of-the-art context compression methods while introducing only ~1% additional parameters, demonstrating that explicit and coordinated information transmission enables more effective and robust long-context compression.
- [744] arXiv:2602.03785 [pdf, ps, other]
-
Title: From Pre- to Intra-operative MRI: Predicting Brain Shift in Temporal Lobe Resection for Epilepsy SurgeryAuthors: Jingjing Peng, Giorgio Fiore, Yang Liu, Ksenia Ellum, Debayan Daspupta, Keyoumars Ashkan, Andrew McEvoy, Anna Miserocchi, Sebastien Ourselin, John Duncan, Alejandro GranadosSubjects: Computer Vision and Pattern Recognition (cs.CV)
Introduction: In neurosurgery, image-guided Neurosurgery Systems (IGNS) highly rely on preoperative brain magnetic resonance images (MRI) to assist surgeons in locating surgical targets and determining surgical paths. However, brain shift invalidates the preoperative MRI after dural opening. Updated intraoperative brain MRI with brain shift compensation is crucial for enhancing the precision of neuronavigation systems and ensuring the optimal outcome of surgical interventions. Methodology: We propose NeuralShift, a U-Net-based model that predicts brain shift entirely from pre-operative MRI for patients undergoing temporal lobe resection. We evaluated our results using Target Registration Errors (TREs) computed on anatomical landmarks located on the resection side and along the midline, and DICE scores comparing predicted intraoperative masks with masks derived from intraoperative MRI. Results: Our experimental results show that our model can predict the global deformation of the brain (DICE of 0.97) with accurate local displacements (achieve landmark TRE as low as 1.12 mm), compensating for large brain shifts during temporal lobe removal neurosurgery. Conclusion: Our proposed model is capable of predicting the global deformation of the brain during temporal lobe resection using only preoperative images, providing potential opportunities to the surgical team to increase safety and efficiency of neurosurgery and better outcomes to patients. Our contributions will be publicly available after acceptance in https://github.com/SurgicalDataScienceKCL/NeuralShift.
- [745] arXiv:2602.03786 [pdf, ps, other]
-
Title: AOrchestra: Automating Sub-Agent Creation for Agentic OrchestrationAuthors: Jianhao Ruan, Zhihao Xu, Yiran Peng, Fashen Ren, Zhaoyang Yu, Xinbing Liang, Jinyu Xiang, Bang Liu, Chenglin Wu, Yuyu Luo, Jiayi ZhangSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for multi-turn task solving. However, existing designs still lack a dynamic abstraction view of sub-agents, thereby hurting adaptability. We address this challenge with a unified, framework-agnostic agent abstraction that models any agent as a tuple Instruction, Context, Tools, Model. This tuple acts as a compositional recipe for capabilities, enabling the system to spawn specialized executors for each task on demand. Building on this abstraction, we introduce an agentic system AOrchestra, where the central orchestrator concretizes the tuple at each step: it curates task-relevant context, selects tools and models, and delegates execution via on-the-fly automatic agent creation. Such designs enable reducing human engineering efforts, and remain framework-agnostic with plug-and-play support for diverse agents as task executors. It also enables a controllable performance-cost trade-off, allowing the system to approach Pareto-efficient. Across three challenging benchmarks (GAIA, SWE-Bench, Terminal-Bench), AOrchestra achieves 16.28% relative improvement against the strongest baseline when paired with Gemini-3-Flash. The code is available at: https://github.com/FoundationAgents/AOrchestra
- [746] arXiv:2602.03787 [pdf, ps, other]
-
Title: Inference-time Unlearning Using Conformal PredictionAuthors: Somnath Basu Roy Chowdhury, Rahul Kidambi, Avinava Dubey, David Wang, Gokhan Mergen, Amr Ahmed, Aranyak MehtaSubjects: Machine Learning (cs.LG)
Machine unlearning is the process of efficiently removing specific information from a trained machine learning model without retraining from scratch. Existing unlearning methods, which often provide provable guarantees, typically involve retraining a subset of model parameters based on a forget set. While these approaches show promise in certain scenarios, their underlying assumptions are often challenged in real-world applications -- particularly when applied to generative models. Furthermore, updating parameters using these unlearning procedures often degrades the general-purpose capabilities the model acquired during pre-training. Motivated by these shortcomings, this paper considers the paradigm of inference time unlearning -- wherein, the generative model is equipped with an (approximately correct) verifier that judges whether the model's response satisfies appropriate unlearning guarantees. This paper introduces a framework that iteratively refines the quality of the generated responses using feedback from the verifier without updating the model parameters. The proposed framework leverages conformal prediction to reduce computational overhead and provide distribution-free unlearning guarantees. This paper's approach significantly outperforms existing state-of-the-art methods, reducing unlearning error by up to 93% across challenging unlearning benchmarks.
- [747] arXiv:2602.03791 [pdf, ps, other]
-
Title: Should I use Synthetic Data for That? An Analysis of the Suitability of Synthetic Data for Data Sharing and AugmentationComments: BK and TS contributed equallySubjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Recent advances in generative modelling have led many to see synthetic data as the go-to solution for a range of problems around data access, scarcity, and under-representation. In this paper, we study three prominent use cases: (1) Sharing synthetic data as a proxy for proprietary datasets to enable statistical analyses while protecting privacy, (2) Augmenting machine learning training sets with synthetic data to improve model performance, and (3) Augmenting datasets with synthetic data to reduce variance in statistical estimation. For each use case, we formalise the problem setting and study, through formal analysis and case studies, under which conditions synthetic data can achieve its intended objectives. We identify fundamental and practical limits that constrain when synthetic data can serve as an effective solution for a particular problem. Our analysis reveals that due to these limits many existing or envisioned use cases of synthetic data are a poor problem fit. Our formalisations and classification of synthetic data use cases enable decision makers to assess whether synthetic data is a suitable approach for their specific data availability problem.
- [748] arXiv:2602.03792 [pdf, ps, other]
-
Title: WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web AgentsSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks achieve limited effectiveness, as their underlying assumptions often do not hold in the web-agent setting. In this work, we propose WebSentinel, a two-step approach for detecting and localizing prompt injection attacks in webpages. Given a webpage, Step I extracts \emph{segments of interest} that may be contaminated, and Step II evaluates each segment by checking its consistency with the webpage content as context. We show that WebSentinel is highly effective, substantially outperforming baseline methods across multiple datasets of both contaminated and clean webpages that we collected. Our code is available at: https://github.com/wxl-lxw/WebSentinel.
- [749] arXiv:2602.03793 [pdf, ps, other]
-
Title: BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment MasksAuthors: Yixiang Chen, Peiyan Li, Jiabing Yang, Keji He, Xiangnan Wu, Yuan Xu, Kai Wang, Jing Liu, Nianfeng Liu, Yan Huang, Liang WangSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Embodied world models have emerged as a promising paradigm in robotics, most of which leverage large-scale Internet videos or pretrained video generation models to enrich visual and motion priors. However, they still face key challenges: a misalignment between coordinate-space actions and pixel-space videos, sensitivity to camera viewpoint, and non-unified architectures across embodiments. To this end, we present BridgeV2W, which converts coordinate-space actions into pixel-aligned embodiment masks rendered from the URDF and camera parameters. These masks are then injected into a pretrained video generation model via a ControlNet-style pathway, which aligns the action control signals with predicted videos, adds view-specific conditioning to accommodate camera viewpoints, and yields a unified world model architecture across embodiments. To mitigate overfitting to static backgrounds, BridgeV2W further introduces a flow-based motion loss that focuses on learning dynamic and task-relevant regions. Experiments on single-arm (DROID) and dual-arm (AgiBot-G1) datasets, covering diverse and challenging conditions with unseen viewpoints and scenes, show that BridgeV2W improves video generation quality compared to prior state-of-the-art methods. We further demonstrate the potential of BridgeV2W on downstream real-world tasks, including policy evaluation and goal-conditioned planning. More results can be found on our project website at https://BridgeV2W.github.io .
- [750] arXiv:2602.03794 [pdf, ps, other]
-
Title: Understanding Agent Scaling in LLM-Based Multi-Agent Systems via DiversityAuthors: Yingxuan Yang, Chengrui Qu, Muning Wen, Laixi Shi, Ying Wen, Weinan Zhang, Adam Wierman, Shangding GuSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
LLM-based multi-agent systems (MAS) have emerged as a promising approach to tackle complex tasks that are difficult for individual LLMs. A natural strategy is to scale performance by increasing the number of agents; however, we find that such scaling exhibits strong diminishing returns in homogeneous settings, while introducing heterogeneity (e.g., different models, prompts, or tools) continues to yield substantial gains. This raises a fundamental question: what limits scaling, and why does diversity help? We present an information-theoretic framework showing that MAS performance is bounded by the intrinsic task uncertainty, not by agent count. We derive architecture-agnostic bounds demonstrating that improvements depend on how many effective channels the system accesses. Homogeneous agents saturate early because their outputs are strongly correlated, whereas heterogeneous agents contribute complementary evidence. We further introduce $K^*$, an effective channel count that quantifies the number of effective channels without ground-truth labels. Empirically, we show that heterogeneous configurations consistently outperform homogeneous scaling: 2 diverse agents can match or exceed the performance of 16 homogeneous agents. Our results provide principled guidelines for building efficient and robust MAS through diversity-aware design. Code and Dataset are available at the link: https://github.com/SafeRL-Lab/Agent-Scaling.
- [751] arXiv:2602.03796 [pdf, ps, other]
-
Title: 3D-Aware Implicit Motion Control for View-Adaptive Human Video GenerationAuthors: Zhixue Fang, Xu He, Songlin Tang, Haoxian Zhang, Qingfeng Li, Xiaoqiang Liu, Pengfei Wan, Kun GaiComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Existing methods for human motion control in video generation typically rely on either 2D poses or explicit 3D parametric models (e.g., SMPL) as control signals. However, 2D poses rigidly bind motion to the driving viewpoint, precluding novel-view synthesis. Explicit 3D models, though structurally informative, suffer from inherent inaccuracies (e.g., depth ambiguity and inaccurate dynamics) which, when used as a strong constraint, override the powerful intrinsic 3D awareness of large-scale video generators. In this work, we revisit motion control from a 3D-aware perspective, advocating for an implicit, view-agnostic motion representation that naturally aligns with the generator's spatial priors rather than depending on externally reconstructed constraints. We introduce 3DiMo, which jointly trains a motion encoder with a pretrained video generator to distill driving frames into compact, view-agnostic motion tokens, injected semantically via cross-attention. To foster 3D awareness, we train with view-rich supervision (i.e., single-view, multi-view, and moving-camera videos), forcing motion consistency across diverse viewpoints. Additionally, we use auxiliary geometric supervision that leverages SMPL only for early initialization and is annealed to zero, enabling the model to transition from external 3D guidance to learning genuine 3D spatial motion understanding from the data and the generator's priors. Experiments confirm that 3DiMo faithfully reproduces driving motions with flexible, text-driven camera control, significantly surpassing existing methods in both motion fidelity and visual quality.
- [752] arXiv:2602.03797 [pdf, ps, other]
-
Title: Manifold Random FeaturesSubjects: Machine Learning (cs.LG)
We present a new paradigm for creating random features to approximate bi-variate functions (in particular, kernels) defined on general manifolds. This new mechanism of Manifold Random Features (MRFs) leverages discretization of the manifold and the recently introduced technique of Graph Random Features (GRFs) to learn continuous fields on manifolds. Those fields are used to find continuous approximation mechanisms that otherwise, in general scenarios, cannot be derived analytically. MRFs provide positive and bounded features, a key property for accurate, low-variance approximation. We show deep asymptotic connection between GRFs, defined on discrete graph objects, and continuous random features used for regular kernels. As a by-product of our method, we re-discover recently introduced mechanism of Gaussian kernel approximation applied in particular to improve linear-attention Transformers, considering simple random walks on graphs and by-passing original complex mathematical computations. We complement our algorithm with a rigorous theoretical analysis and verify in thorough experimental studies.
- [753] arXiv:2602.03798 [pdf, ps, other]
-
Title: FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-TranslationSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Assisting non-expert users to develop complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to only generate frontend web pages, masking the lack of real full-stack data processing and storage with fancy visual effects. Notably, constructing production-level full-stack web applications is far more challenging than only generating frontend web pages, demanding careful control of data flow, comprehensive understanding of constantly updating packages and dependencies, and accurate localization of obscure bugs in the codebase. To address these difficulties, we introduce FullStack-Agent, a unified agent system for full-stack agentic coding that consists of three parts: (1) FullStack-Dev, a multi-agent framework with strong planning, code editing, codebase navigation, and bug localization abilities. (2) FullStack-Learn, an innovative data-scaling and self-improving method that back-translates crawled and synthesized website repositories to improve the backbone LLM of FullStack-Dev. (3) FullStack-Bench, a comprehensive benchmark that systematically tests the frontend, backend and database functionalities of the generated website. Our FullStack-Dev outperforms the previous state-of-the-art method by 8.7%, 38.2%, and 15.9% on the frontend, backend, and database test cases respectively. Additionally, FullStack-Learn raises the performance of a 30B model by 9.7%, 9.5%, and 2.8% on the three sets of test cases through self-improvement, demonstrating the effectiveness of our approach. The code is released at https://github.com/mnluzimu/FullStack-Agent.
- [754] arXiv:2602.03799 [pdf, ps, other]
-
Title: Conformal Reachability for Safe Control in Unknown EnvironmentsSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Designing provably safe control is a core problem in trustworthy autonomy. However, most prior work in this regard assumes either that the system dynamics are known or deterministic, or that the state and action space are finite, significantly limiting application scope. We address this limitation by developing a probabilistic verification framework for unknown dynamical systems which combines conformal prediction with reachability analysis. In particular, we use conformal prediction to obtain valid uncertainty intervals for the unknown dynamics at each time step, with reachability then verifying whether safety is maintained within the conformal uncertainty bounds. Next, we develop an algorithmic approach for training control policies that optimize nominal reward while also maximizing the planning horizon with sound probabilistic safety guarantees. We evaluate the proposed approach in seven safe control settings spanning four domains -- cartpole, lane following, drone control, and safe navigation -- for both affine and nonlinear safety specifications. Our experiments show that the policies we learn achieve the strongest provable safety guarantees while still maintaining high average reward.
- [755] arXiv:2602.03802 [pdf, ps, other]
-
Title: Do We Need Asynchronous SGD? On the Near-Optimality of Synchronous MethodsSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Optimization and Control (math.OC)
Modern distributed optimization methods mostly rely on traditional synchronous approaches, despite substantial recent progress in asynchronous optimization. We revisit Synchronous SGD and its robust variant, called $m$-Synchronous SGD, and theoretically show that they are nearly optimal in many heterogeneous computation scenarios, which is somewhat unexpected. We analyze the synchronous methods under random computation times and adversarial partial participation of workers, and prove that their time complexities are optimal in many practical regimes, up to logarithmic factors. While synchronous methods are not universal solutions and there exist tasks where asynchronous methods may be necessary, we show that they are sufficient for many modern heterogeneous computation scenarios.
- [756] arXiv:2602.03805 [pdf, ps, other]
-
Title: Prediction of Critical Heat Flux in Rod Bundles Using Tube-Based Hybrid Machine Learning Models in CTFComments: Submitted to the 2026 American Nuclear Society Annual MeetingSubjects: Machine Learning (cs.LG)
The prediction of critical heat flux (CHF) using machine learning (ML) approaches has become a highly active research activity in recent years, the goal of which is to build models more accurate than current conventional approaches such as empirical correlations or lookup tables (LUTs). Previous work developed and deployed tube-based pure and hybrid ML models in the CTF subchannel code, however, full-scale reactor core simulations require the use of rod bundle geometries. Unlike isolated subchannels, rod bundles experience complex thermal hydraulic phenomena such as channel crossflow, spacer grid losses, and effects from unheated conductors. This study investigates the generalization of ML-based CHF prediction models in rod bundles after being trained on tube-based CHF data. A purely data-driven DNN and two hybrid bias-correction models were implemented in the CTF subchannel code and used to predict CHF location and magnitude in the Combustion Engineering 5-by-5 bundle CHF test series. The W-3 correlation, Bowring correlation, and Groeneveld LUT were used as baseline comparators. On average, all three ML-based approaches produced magnitude and location predictions more accurate than the baseline models, with the hybrid LUT model exhibiting the most favorable performance metrics.
- [757] arXiv:2602.03806 [pdf, ps, other]
-
Title: Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code GenerationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
Recently, there have been significant research interests in training large language models (LLMs) with reinforcement learning (RL) on real-world tasks, such as multi-turn code generation. While online RL tends to perform better than offline RL, its higher training cost and instability hinders wide adoption. In this paper, we build on the observation that multi-turn code generation can be formulated as a one-step recoverable Markov decision process and propose contextual bandit learning with offline trajectories (Cobalt), a new method that combines the benefits of online and offline RL. Cobalt first collects code generation trajectories using a reference LLM and divides them into partial trajectories as contextual prompts. Then, during online bandit learning, the LLM is trained to complete each partial trajectory prompt through single-step code generation. Cobalt outperforms two multi-turn online RL baselines based on GRPO and VeRPO, and substantially improves R1-Distill 8B and Qwen3 8B by up to 9.0 and 6.2 absolute Pass@1 scores on LiveCodeBench. Also, we analyze LLMs' in-context reward hacking behaviors and augment Cobalt training with perturbed trajectories to mitigate this issue. Overall, our results demonstrate Cobalt as a promising solution for iterative decision-making tasks like multi-turn code generation. Our code and data are available at https://github.com/OSU-NLP-Group/cobalt.
- [758] arXiv:2602.03808 [pdf, ps, other]
-
Title: Enhancing Imbalanced Node Classification via Curriculum-Guided Feature Learning and Three-Stage Attention NetworkSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Imbalanced node classification in graph neural networks (GNNs) happens when some labels are much more common than others, which causes the model to learn unfairly and perform badly on the less common classes. To solve this problem, we propose a Curriculum-Guided Feature Learning and Three-Stage Attention Network (CL3AN-GNN), a learning network that uses a three-step attention system (Engage, Enact, Embed) similar to how humans learn. The model begins by engaging with structurally simpler features, defined as (1) local neighbourhood patterns (1-hop), (2) low-degree node attributes, and (3) class-separable node pairs identified via initial graph convolutional networks and graph attention networks (GCN and GAT) embeddings. This foundation enables stable early learning despite label skew. The Enact stage then addresses complicated aspects: (1) connections that require multiple steps, (2) edges that connect different types of nodes, and (3) nodes at the edges of minority classes by using adjustable attention weights. Finally, Embed consolidates these features via iterative message passing and curriculum-aligned loss weighting. We evaluate CL3AN-GNN on eight Open Graph Benchmark datasets spanning social, biological, and citation networks. Experiments show consistent improvements across all datasets in accuracy, F1-score, and AUC over recent state-of-the-art methods. The model's step-by-step method works well with different types of graph datasets, showing quicker results than training everything at once, better performance on new, imbalanced graphs, and clear explanations of each step using gradient stability and attention correlation learning curves. This work provides both a theoretically grounded framework for curriculum learning in GNNs and practical evidence of its effectiveness against imbalances, validated through metrics, convergence speeds, and generalisation tests.
- [759] arXiv:2602.03809 [pdf, ps, other]
-
Title: Split&Splat: Zero-Shot Panoptic Segmentation via Explicit Instance Modeling and 3D Gaussian SplattingSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
3D Gaussian Splatting (GS) enables fast and high-quality scene reconstruction, but it lacks an object-consistent and semantically aware structure. We propose Split&Splat, a framework for panoptic scene reconstruction using 3DGS. Our approach explicitly models object instances. It first propagates instance masks across views using depth, thus producing view-consistent 2D masks. Each object is then reconstructed independently and merged back into the scene while refining its boundaries. Finally, instance-level semantic descriptors are embedded in the reconstructed objects, supporting various applications, including panoptic segmentation, object retrieval, and 3D editing. Unlike existing methods, Split&Splat tackles the problem by first segmenting the scene and then reconstructing each object individually. This design naturally supports downstream tasks and allows Split&Splat to achieve state-of-the-art performance on the ScanNetv2 segmentation benchmark.
- [760] arXiv:2602.03811 [pdf, ps, other]
-
Title: Progressive Checkerboards for Autoregressive Multiscale Image GenerationAuthors: David EigenSubjects: Computer Vision and Pattern Recognition (cs.CV)
A key challenge in autoregressive image generation is to efficiently sample independent locations in parallel, while still modeling mutual dependencies with serial conditioning. Some recent works have addressed this by conditioning between scales in a multiscale pyramid. Others have looked at parallelizing samples in a single image using regular partitions or randomized orders. In this work we examine a flexible, fixed ordering based on progressive checkerboards for multiscale autoregressive image generation. Our ordering draws samples in parallel from evenly spaced regions at each scale, maintaining full balance in all levels of a quadtree subdivision at each step. This enables effective conditioning both between and within scales. Intriguingly, we find evidence that in our balanced setting, a wide range of scale-up factors lead to similar results, so long as the total number of serial steps is constant. On class-conditional ImageNet, our method achieves competitive performance compared to recent state-of-the-art autoregressive systems with like model capacity, using fewer sampling steps.
- [761] arXiv:2602.03812 [pdf, ps, other]
-
Title: Antidistillation FingerprintingAuthors: Yixuan Even Xu, John Kirchenbauer, Yash Savani, Asher Trockman, Alexander Robey, Tom Goldstein, Fei Fang, J. Zico KolterComments: 26 pages, 11 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Model distillation enables efficient emulation of frontier large language models (LLMs), creating a need for robust mechanisms to detect when a third-party student model has trained on a teacher model's outputs. However, existing fingerprinting techniques that could be used to detect such distillation rely on heuristic perturbations that impose a steep trade-off between generation quality and fingerprinting strength, often requiring significant degradation of utility to ensure the fingerprint is effectively internalized by the student. We introduce antidistillation fingerprinting (ADFP), a principled approach that aligns the fingerprinting objective with the student's learning dynamics. Building upon the gradient-based framework of antidistillation sampling, ADFP utilizes a proxy model to identify and sample tokens that directly maximize the expected detectability of the fingerprint in the student after fine-tuning, rather than relying on the incidental absorption of the un-targeted biases of a more naive watermark. Experiments on GSM8K and OASST1 benchmarks demonstrate that ADFP achieves a significant Pareto improvement over state-of-the-art baselines, yielding stronger detection confidence with minimal impact on utility, even when the student model's architecture is unknown.
- [762] arXiv:2602.03814 [pdf, ps, other]
-
Title: Conformal Thinking: Risk Control for Reasoning on a Compute BudgetAuthors: Xi Wang, Anushri Suresh, Alvin Zhang, Rishi More, William Jurayj, Benjamin Van Durme, Mehrdad Farajtabar, Daniel Khashabi, Eric NalisnickSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when they improve reliability and stopping early when additional computation is unlikely to help. However, setting the token budget, as well as the threshold for adaptive reasoning, is a practical challenge that entails a fundamental risk-accuracy trade-off. We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident (risking incorrect output) and a novel parametric lower threshold that preemptively stops unsolvable instances (risking premature stoppage). Given a target risk and a validation set, we use distribution-free risk control to optimally specify these stopping mechanisms. For scenarios with multiple budget controlling criteria, we incorporate an efficiency loss to select the most computationally efficient exiting mechanism. Empirical results across diverse reasoning tasks and models demonstrate the effectiveness of our risk control approach, demonstrating computational efficiency gains from the lower threshold and ensemble stopping mechanisms while adhering to the user-specified risk target.
- [763] arXiv:2602.03815 [pdf, ps, other]
-
Title: Fast-Slow Efficient Training for Multimodal Large Language Models via Visual Token PruningSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Multimodal Large Language Models (MLLMs) suffer from severe training inefficiency issue, which is associated with their massive model sizes and visual token numbers. Existing efforts in efficient training focus on reducing model sizes or trainable parameters. Inspired by the success of Visual Token Pruning (VTP) in improving inference efficiency, we are exploring another substantial research direction for efficient training by reducing visual tokens. However, applying VTP at the training stage results in a training-inference mismatch: pruning-trained models perform poorly when inferring on non-pruned full visual token sequences. To close this gap, we propose DualSpeed, a fast-slow framework for efficient training of MLLMs. The fast-mode is the primary mode, which incorporates existing VTP methods as plugins to reduce visual tokens, along with a mode isolator to isolate the model's behaviors. The slow-mode is the auxiliary mode, where the model is trained on full visual sequences to retain training-inference consistency. To boost its training, it further leverages self-distillation to learn from the sufficiently trained fast-mode. Together, DualSpeed can achieve both training efficiency and non-degraded performance. Experiments show DualSpeed accelerates the training of LLaVA-1.5 by 2.1$\times$ and LLaVA-NeXT by 4.0$\times$, retaining over 99% performance. Code: https://github.com/dingkun-zhang/DualSpeed
- [764] arXiv:2602.03816 [pdf, ps, other]
-
Title: SymPlex: A Structure-Aware Transformer for Symbolic PDE SolvingComments: 27 pagesSubjects: Machine Learning (cs.LG)
We propose SymPlex, a reinforcement learning framework for discovering analytical symbolic solutions to partial differential equations (PDEs) without access to ground-truth expressions. SymPlex formulates symbolic PDE solving as tree-structured decision-making and optimizes candidate solutions using only the PDE and its boundary conditions. At its core is SymFormer, a structure-aware Transformer that models hierarchical symbolic dependencies via tree-relative self-attention and enforces syntactic validity through grammar-constrained autoregressive decoding, overcoming the limited expressivity of sequence-based generators. Unlike numerical and neural approaches that approximate solutions in discretized or implicit function spaces, SymPlex operates directly in symbolic expression space, enabling interpretable and human-readable solutions that naturally represent non-smooth behavior and explicit parametric dependence. Empirical results demonstrate exact recovery of non-smooth and parametric PDE solutions using deep learning-based symbolic methods.
- [765] arXiv:2602.03817 [pdf, ps, other]
-
Title: Adaptive Evidence Weighting for Audio-Spatiotemporal FusionSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
Many machine learning systems have access to multiple sources of evidence for the same prediction target, yet these sources often differ in reliability and informativeness across inputs. In bioacoustic classification, species identity may be inferred both from the acoustic signal and from spatiotemporal context such as location and season; while Bayesian inference motivates multiplicative evidence combination, in practice we typically only have access to discriminative predictors rather than calibrated generative models. We introduce \textbf{F}usion under \textbf{IN}dependent \textbf{C}onditional \textbf{H}ypotheses (\textbf{FINCH}), an adaptive log-linear evidence fusion framework that integrates a pre-trained audio classifier with a structured spatiotemporal predictor. FINCH learns a per-sample gating function that estimates the reliability of contextual information from uncertainty and informativeness statistics. The resulting fusion family \emph{contains} the audio-only classifier as a special case and explicitly bounds the influence of contextual evidence, yielding a risk-contained hypothesis class with an interpretable audio-only fallback. Across benchmarks, FINCH consistently outperforms fixed-weight fusion and audio-only baselines, improving robustness and error trade-offs even when contextual information is weak in isolation. We achieve state-of-the-art performance on CBI and competitive or improved performance on several subsets of BirdSet using a lightweight, interpretable, evidence-based approach. Code is available: \texttt{\href{https://anonymous.4open.science/r/birdnoise-85CD/README.md}{anonymous-repository}}
- [766] arXiv:2602.03821 [pdf, ps, other]
-
Title: xDevSM: An Open-Source Framework for Portable, AI-Ready xApps Across Heterogeneous O-RAN DeploymentsAuthors: Angelo Feraudo, Stefano Maxenti, Andrea Lacava, Leonardo Bonati, Paolo Bellavista, Michele Polese, Tommaso MelodiaSubjects: Networking and Internet Architecture (cs.NI)
Openness and programmability in the O-RAN architecture enable closed-loop control of the Radio Access Network (RAN). Artificial Intelligence (AI)-driven xApps, in the near-real-time RAN Intelligent Controller (RIC), can learn from network data, anticipate future conditions, and dynamically adapt radio configurations. However, their development and adoption are hindered by the complexity of low-level RAN control and monitoring message models exposed over the O-RAN E2 interface, limited interoperability across heterogeneous RAN software stacks, and the lack of developer-friendly frameworks. In this paper, we introduce xDevSM, a framework that significantly lowers the barrier to xApp development by unifying observability and control in O-RAN deployment. By exposing a rich set of Key Performance Measurements (KPMs) and enabling fine-grained radio resource management controls, xDevSM provides the essential foundation for practical AI-driven xApps. We validate xDevSM on real-world testbeds, leveraging Commercial Off-the-Shelf (COTS) devices together with heterogeneous RAN hardware, including Universal Software Radio Peripheral (USRP)-based Software-defined Radios (SDRs) and Foxconn radio units, and show its seamless interoperability across multiple open-source RAN software stacks. Furthermore, we discuss and evaluate the capabilities of our framework through three O-RAN-based scenarios of high interest: (i) KPM-based monitoring of network performance, (ii) slice-level Physical Resource Block (PRB) allocation control across multiple User Equipments (UEs) and slices, and (iii) mobility-aware handover control, showing that xDevSM can implement intelligent closed-loop applications, laying the groundwork for learning-based optimization in heterogeneous RAN deployments. xDevSM is open source and available as foundational tool for the research community.
- [767] arXiv:2602.03822 [pdf, ps, other]
-
Title: They Said Memes Were Harmless-We Found the Ones That Hurt: Decoding Jokes, Symbols, and Cultural ReferencesAuthors: Sahil Tripathi, Gautam Siddharth Kashyap, Mehwish Nasim, Jian Yang, Jiechao Gao, Usman NaseemComments: Accepted at the The Web Conference 2026 (Research Track)Subjects: Computation and Language (cs.CL)
Meme-based social abuse detection is challenging because harmful intent often relies on implicit cultural symbolism and subtle cross-modal incongruence. Prior approaches, from fusion-based methods to in-context learning with Large Vision-Language Models (LVLMs), have made progress but remain limited by three factors: i) cultural blindness (missing symbolic context), ii) boundary ambiguity (satire vs. abuse confusion), and iii) lack of interpretability (opaque model reasoning). We introduce CROSS-ALIGN+, a three-stage framework that systematically addresses these limitations: (1) Stage I mitigates cultural blindness by enriching multimodal representations with structured knowledge from ConceptNet, Wikidata, and Hatebase; (2) Stage II reduces boundary ambiguity through parameter-efficient LoRA adapters that sharpen decision boundaries; and (3) Stage III enhances interpretability by generating cascaded explanations. Extensive experiments on five benchmarks and eight LVLMs demonstrate that CROSS-ALIGN+ consistently outperforms state-of-the-art methods, achieving up to 17% relative F1 improvement while providing interpretable justifications for each decision.
- [768] arXiv:2602.03825 [pdf, ps, other]
-
Title: Robust Intervention Learning from Emergency Stop InterventionsSubjects: Machine Learning (cs.LG)
Human interventions are a common source of data in autonomous systems during testing. These interventions provide an important signal about where the current policy needs improvement, but are often noisy and incomplete. We define Robust Intervention Learning (RIL) as the problem of learning from intervention data while remaining robust to the quality and informativeness of the intervention signal. In the best case, interventions are precise and avoiding them is sufficient to solve the task, but in many realistic settings avoiding interventions is necessary but not sufficient for achieving good performance. We study robust intervention learning in the context of emergency stop interventions and propose Residual Intervention Fine-Tuning (RIFT), a residual fine-tuning algorithm that treats intervention feedback as an incomplete learning signal and explicitly combines it with a prior policy. By framing intervention learning as a fine-tuning problem, our approach leverages structure encoded in the prior policy to resolve ambiguity when intervention signals under-specify the task. We provide theoretical analysis characterizing conditions under which this formulation yields principled policy improvement, and identify regimes where intervention learning is expected to fail. Our experiments reveal that residual fine-tuning enables robust and consistent policy improvement across a range of intervention strategies and prior policy qualities, and highlight robust intervention learning as a promising direction for future work.
- [769] arXiv:2602.03826 [pdf, ps, other]
-
Title: Continuous Control of Editing Models via Adaptive-Origin GuidanceComments: Project page at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Diffusion-based editing models have emerged as a powerful tool for semantic image and video manipulation. However, existing models lack a mechanism for smoothly controlling the intensity of text-guided edits. In standard text-conditioned generation, Classifier-Free Guidance (CFG) impacts prompt adherence, suggesting it as a potential control for edit intensity in editing models. However, we show that scaling CFG in these models does not produce a smooth transition between the input and the edited result. We attribute this behavior to the unconditional prediction, which serves as the guidance origin and dominates the generation at low guidance scales, while representing an arbitrary manipulation of the input content. To enable continuous control, we introduce Adaptive-Origin Guidance (AdaOr), a method that adjusts this standard guidance origin with an identity-conditioned adaptive origin, using an identity instruction corresponding to the identity manipulation. By interpolating this identity prediction with the standard unconditional prediction according to the edit strength, we ensure a continuous transition from the input to the edited result. We evaluate our method on image and video editing tasks, demonstrating that it provides smoother and more consistent control compared to current slider-based editing approaches. Our method incorporates an identity instruction into the standard training framework, enabling fine-grained control at inference time without per-edit procedure or reliance on specialized datasets.
- [770] arXiv:2602.03827 [pdf, ps, other]
-
Title: Perfect Network Resilience in Polynomial TimeSubjects: Data Structures and Algorithms (cs.DS); Networking and Internet Architecture (cs.NI)
Modern communication networks support local fast rerouting mechanisms to quickly react to link failures: nodes store a set of conditional rerouting rules which define how to forward an incoming packet in case of incident link failures. The rerouting decisions at any node $v$ must rely solely on local information available at $v$: the link from which a packet arrived at $v$, the target of the packet, and the incident link failures at $v$. Ideally, such rerouting mechanisms provide perfect resilience: any packet is routed from its source to its target as long as the two are connected in the underlying graph after the link failures. Already in their seminal paper at ACM PODC '12, Feigenbaum, Godfrey, Panda, Schapira, Shenker, and Singla showed that perfect resilience cannot always be achieved. While the design of local rerouting algorithms has received much attention since then, we still lack a detailed understanding of when perfect resilience is achievable.
This paper closes this gap and presents a complete characterization of when perfect resilience can be achieved. This characterization also allows us to design an $O(n)$-time algorithm to decide whether a given instance is perfectly resilient and an $O(nm)$-time algorithm to compute perfectly resilient rerouting rules whenever it is. Our algorithm is also attractive for the simple structure of the rerouting rules it uses, known as skipping in the literature: alternative links are chosen according to an ordered priority list (per in-port), where failed links are simply skipped. Intriguingly, our result also implies that in the context of perfect resilience, skipping rerouting rules are as powerful as more general rerouting rules. This partially answers a long-standing open question by Chiesa, Nikolaevskiy, Mitrovic, Gurtov, Madry, Schapira, and Shenker [IEEE/ACM Transactions on Networking, 2017] in the affirmative. - [771] arXiv:2602.03828 [pdf, ps, other]
-
Title: AutoFigure: Generating and Refining Publication-Ready Scientific IllustrationsAuthors: Minjun Zhu, Zhen Lin, Yixuan Weng, Panzhong Lu, Qiujie Xie, Yifan Wei, Sifan Liu, Qiyao Sun, Yue ZhangComments: Accepted at the ICLR 2026Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL)
High-quality scientific illustrations are crucial for effectively communicating complex scientific and technical concepts, yet their manual creation remains a well-recognized bottleneck in both academia and industry. We present FigureBench, the first large-scale benchmark for generating scientific illustrations from long-form scientific texts. It contains 3,300 high-quality scientific text-figure pairs, covering diverse text-to-illustration tasks from scientific papers, surveys, blogs, and textbooks. Moreover, we propose AutoFigure, the first agentic framework that automatically generates high-quality scientific illustrations based on long-form scientific text. Specifically, before rendering the final result, AutoFigure engages in extensive thinking, recombination, and validation to produce a layout that is both structurally sound and aesthetically refined, outputting a scientific illustration that achieves both structural completeness and aesthetic appeal. Leveraging the high-quality data from FigureBench, we conduct extensive experiments to test the performance of AutoFigure against various baseline methods. The results demonstrate that AutoFigure consistently surpasses all baseline methods, producing publication-ready scientific illustrations. The code, dataset and huggingface space are released in https://github.com/ResearAI/AutoFigure.
- [772] arXiv:2602.03837 [pdf, ps, other]
-
Title: Accelerating Scientific Research with Gemini: Case Studies and Common TechniquesAuthors: David P. Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, MohammadHossein Bateni, Simina Branzei, Michael P. Brenner, Lin Chen, Ying Feng, Lance Fortnow, Gang Fu, Ziyi Guan, Zahra Hadizadeh, Mohammad T. Hajiaghayi, Mahdi JafariRaviz, Adel Javanmard, Karthik C. S., Ken-ichi Kawarabayashi, Ravi Kumar, Silvio Lattanzi, Euiwoong Lee, Yi Li, Ioannis Panageas, Dimitris Paparas, Benjamin Przybocki, Bernardo Subercaseaux, Ola Svensson, Shayan Taherijam, Xuan Wu, Eylon Yogev, Morteza Zadimoghaddam, Samson Zhou, Vahab MirrokniSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to contribute to novel, expert-level mathematical discovery is less understood. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models, specifically Google's Gemini-based models (in particular Gemini Deep Think and its advanced variants), to solve open problems, refute conjectures, and generate new proofs across diverse areas in theoretical computer science, as well as other areas such as economics, optimization, and physics. Based on these experiences, we extract common techniques for effective human-AI collaboration in theoretical research, such as iterative refinement, problem decomposition, and cross-disciplinary knowledge transfer. While the majority of our results stem from this interactive, conversational methodology, we also highlight specific instances that push beyond standard chat interfaces. These include deploying the model as a rigorous adversarial reviewer to detect subtle flaws in existing proofs, and embedding it within a "neuro-symbolic" loop that autonomously writes and executes code to verify complex derivations. Together, these examples highlight the potential of AI not just as a tool for automation, but as a versatile, genuine partner in the creative process of scientific discovery.
- [773] arXiv:2602.03838 [pdf, ps, other]
-
Title: PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video PrevisualizationComments: 21 pages, 13 figures; accepted and to appear at CHI 2026Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
In pre-production, filmmakers and 3D animation experts must rapidly prototype ideas to explore a film's possibilities before fullscale production, yet conventional approaches involve trade-offs in efficiency and expressiveness. Hand-drawn storyboards often lack spatial precision needed for complex cinematography, while 3D previsualization demands expertise and high-quality rigged assets. To address this gap, we present PrevizWhiz, a system that leverages rough 3D scenes in combination with generative image and video models to create stylized video previews. The workflow integrates frame-level image restyling with adjustable resemblance, time-based editing through motion paths or external video inputs, and refinement into high-fidelity video clips. A study with filmmakers demonstrates that our system lowers technical barriers for film-makers, accelerates creative iteration, and effectively bridges the communication gap, while also surfacing challenges of continuity, authorship, and ethical consideration in AI-assisted filmmaking.
- [774] arXiv:2602.03839 [pdf, ps, other]
-
Title: Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RLComments: 32 pages, 14 figuresSubjects: Machine Learning (cs.LG)
Reinforcement learning (RL) is a critical component for post-training large language models (LLMs). However, in bandwidth-constrained distributed RL, scalability is often bottlenecked by the synchronization of policy weights from trainers to inference workers, particularly over commodity networks or in decentralized settings. While recent studies suggest that RL updates modify only a small fraction of model parameters, these observations are typically based on coarse checkpoint differences. We present a systematic empirical study of weight-update sparsity at both step-level and multi-step granularities, examining its evolution across training dynamics, off-policy delay, and model scale. We find that update sparsity is consistently high, frequently exceeding 99% across practically relevant settings. Leveraging this structure, we propose PULSE (Patch Updates via Lossless Sparse Encoding), a simple yet highly efficient lossless weight synchronization method that transmits only the indices and values of modified parameters. PULSE is robust to transmission errors and avoids floating-point drift inherent in additive delta schemes. In bandwidth-constrained decentralized environments, our approach achieves over 100x (14 GB to ~108 MB) communication reduction while maintaining bit-identical training dynamics and performance compared to full weight synchronization. By exploiting this structure, PULSE enables decentralized RL training to approach centralized throughput, reducing the bandwidth required for weight synchronization from 20 Gbit/s to 0.2 Gbit/s to maintain high GPU utilization.
- [775] arXiv:2602.03840 [pdf, ps, other]
-
Title: Investigating Quantum Circuit Designs Using Neuro-EvolutionComments: Submitted to The Genetic and Evolutionary Computation Conference (GECCO) 2026. Under ReviewSubjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Designing effective quantum circuits remains a central challenge in quantum computing, as circuit structure strongly influences expressivity, trainability, and hardware feasibility. Current approaches, whether using manually designed circuit templates, fixed heuristics, or automated rules, face limitations in scalability, flexibility, and adaptability, often producing circuits that are poorly matched to the specific problem or quantum hardware. In this work, we propose the Evolutionary eXploration of Augmenting Quantum Circuits (EXAQC), an evolutionary approach to the automated design and training of parameterized quantum circuits (PQCs) which leverages and extends on strategies from neuroevolution and genetic programming. The proposed method jointly searches over gate types, qubit connectivity, parameterization, and circuit depth while respecting hardware and noise constraints. The method supports both Qiskit and Pennylane libraries, allowing the user to configure every aspect. This work highlights evolutionary search as a critical tool for advancing quantum machine learning and variational quantum algorithms, providing a principled pathway toward scalable, problem-aware, and hardware-efficient quantum circuit design. Preliminary results demonstrate that circuits evolved on classification tasks are able to achieve over 90% accuracy on most of the benchmark datasets with a limited computational budget, and are able to emulate target circuit quantum states with high fidelity scores.
- [776] arXiv:2602.03845 [pdf, ps, other]
-
Title: Parallel-Probe: Towards Efficient Parallel Thinking via 2D ProbingAuthors: Tong Zheng, Chengsong Huang, Runpeng Dai, Yun He, Rui Liu, Xin Ni, Huiwen Bao, Kaishen Wang, Hongtu Zhu, Jiaxin Huang, Furong Huang, Heng HuangComments: 14 pagesSubjects: Computation and Language (cs.CL)
Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory signals and lack principled mechanisms to exploit global dynamics across parallel branches. We introduce 2D probing, an interface that exposes the width-depth dynamics of parallel thinking by periodically eliciting intermediate answers from all branches. Our analysis reveals three key insights: non-monotonic scaling across width-depth allocations, heterogeneous reasoning branch lengths, and early stabilization of global consensus. Guided by these insights, we introduce $\textbf{{Parallel-Probe}}$, a training-free controller designed to optimize online parallel thinking. Parallel-Probe employs consensus-based early stopping to regulate reasoning depth and deviation-based branch pruning to dynamically adjust width. Extensive experiments across three benchmarks and multiple models demonstrate that Parallel-Probe establishes a superior Pareto frontier for test-time scaling. Compared to standard majority voting, it reduces sequential tokens by up to $\textbf{35.8}$% and total token cost by over $\textbf{25.8}$% while maintaining competitive accuracy.
- [777] arXiv:2602.03846 [pdf, ps, other]
-
Title: PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual LearningAuthors: Romain CosentinoSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We develop a continual learning method for pretrained models that \emph{requires no access to old-task data}, addressing a practical barrier in foundation model adaptation where pretraining distributions are often unavailable. Our key observation is that pretrained networks exhibit substantial \emph{geometric redundancy}, and that this redundancy can be exploited in two complementary ways. First, redundant neurons provide a proxy for dominant pretraining-era feature directions, enabling the construction of approximately protected update subspaces directly from pretrained weights. Second, redundancy offers a natural bias for \emph{where} to place plasticity: by restricting updates to a subset of redundant neurons and constraining the remaining degrees of freedom, we obtain update families with reduced functional drift on the old-data distribution and improved worst-case retention guarantees. These insights lead to \textsc{PLATE} (\textbf{Pla}sticity-\textbf{T}unable \textbf{E}fficient Adapters), a continual learning method requiring no past-task data that provides explicit control over the plasticity-retention trade-off. PLATE parameterizes each layer with a structured low-rank update $\Delta W = B A Q^\top$, where $B$ and $Q$ are computed once from pretrained weights and kept frozen, and only $A$ is trained on the new task. The code is available at https://github.com/SalesforceAIResearch/PLATE.
- [778] arXiv:2602.03847 [pdf, ps, other]
-
Title: EventNeuS: 3D Mesh Reconstruction from a Single Event CameraComments: 13 pages, 10 figures, 3 tables; project page: this https URLJournal-ref: International Conference on 3D Vision (3DV) 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
Event cameras offer a considerable alternative to RGB cameras in many scenarios. While there are recent works on event-based novel-view synthesis, dense 3D mesh reconstruction remains scarcely explored and existing event-based techniques are severely limited in their 3D reconstruction accuracy. To address this limitation, we present EventNeuS, a self-supervised neural model for learning 3D representations from monocular colour event streams. Our approach, for the first time, combines 3D signed distance function and density field learning with event-based supervision. Furthermore, we introduce spherical harmonics encodings into our model for enhanced handling of view-dependent effects. EventNeuS outperforms existing approaches by a significant margin, achieving 34% lower Chamfer distance and 31% lower mean absolute error on average compared to the best previous method.
Cross-lists for Wed, 4 Feb 26
- [779] arXiv:2602.00280 (cross-list from math.AG) [pdf, ps, other]
-
Title: An algorithm for annihilator and Bernstein-Sato polynomial of a rational functionSubjects: Algebraic Geometry (math.AG); Symbolic Computation (cs.SC)
The singularity theory of rational functions, i.e., the quotient of two polynomials, has been investigated in the past two decades. The Bernstein-Sato polynomial of a rational function has recently been introduced by Takeuchi. However, only trivial examples are known. We provide an algorithm for computing the Bernstein-Sato polynomial in this context. The strategy is to compute the annihilator of the rational function by using the annihilator of the pair consisting of the numerator and denominator of the quotient. In a natural way a non-vanishing condition on the Bernstein-Sato ideal of the pair appears. This method has been implemented in freely available computer algebra system SINGULAR. It relies on Gr\"obner bases in noncommutative PBW algebras. The algorithm allows us to exhibit some explicit non-trivial examples and to support some existing conjectures.
- [780] arXiv:2602.02503 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Joint single-shot ToA and DoA estimation for VAA-based BLE ranging with phase ambiguity: A deep learning-based approachSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Conventional direction-of-arrival (DoA) estimation methods rely on multi-antenna arrays, which are costly to implement on size-constrained Bluetooth Low Energy (BLE) devices. Virtual antenna array (VAA) techniques enable DoA estimation with a single antenna, making angle estimation feasible on such devices. However, BLE only provides a single-shot two-way channel frequency response (CFR) with a binary phase ambiguity issue, which hinders the direct application of VAA. To address this challenge, we propose a unified model that combines VAA with BLE two-way CFR, and introduce a neural network based phase recovery framework that employs row / column predictors with a voting mechanism to resolve the ambiguity. The recovered one-way CFR then enables super resolution algorithms such as MUSIC for joint time of arrival (ToA) and DoA estimation. Simulation results demonstrate that the proposed method achieves superior performance under non-uniform VAAs, with mean square errors approaching the Cramer Rao bound at SNR $\geq$ 5 dB.
- [781] arXiv:2602.02552 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Super-résolution non supervisée d'images hyperspectrales de télédétection utilisant un entraînement entièrement synthétiqueComments: in French languageSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Hyperspectral single image super-resolution (SISR) aims to enhance spatial resolution while preserving the rich spectral information of hyperspectral images. Most existing methods rely on supervised learning with high-resolution ground truth data, which is often unavailable in practice. To overcome this limitation, we propose an unsupervised learning approach based on synthetic abundance data. The hyperspectral image is first decomposed into endmembers and abundance maps through hyperspectral unmixing. A neural network is then trained to super-resolve these maps using data generated with the dead leaves model, which replicates the statistical properties of real abundances. The final super-resolution hyperspectral image is reconstructed by recombining the super-resolved abundance maps with the endmembers. Experimental results demonstrate the effectiveness of our method and the relevance of synthetic data for training.
- [782] arXiv:2602.02577 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian DistributionsSubjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing work has demonstrated that KL divergence between multivariate Gaussian distributions follows a relaxed triangle inequality. Given any three multivariate Gaussian distributions $\mathcal{N}_1, \mathcal{N}_2$, and $\mathcal{N}_3$, if $KL(\mathcal{N}_1, \mathcal{N}_2)\leq \epsilon_1$ and $KL(\mathcal{N}_2, \mathcal{N}_3)\leq \epsilon_2$, then $KL(\mathcal{N}_1, \mathcal{N}_3)< 3\epsilon_1+3\epsilon_2+2\sqrt{\epsilon_1\epsilon_2}+o(\epsilon_1)+o(\epsilon_2)$. However, the supremum of $KL(\mathcal{N}_1, \mathcal{N}_3)$ is still unknown. In this paper, we investigate the relaxed triangle inequality for the KL divergence between multivariate Gaussian distributions and give the supremum of $KL(\mathcal{N}_1, \mathcal{N}_3)$ as well as the conditions when the supremum can be attained. When $\epsilon_1$ and $\epsilon_2$ are small, the supremum is $\epsilon_1+\epsilon_2+\sqrt{\epsilon_1\epsilon_2}+o(\epsilon_1)+o(\epsilon_2)$. Finally, we demonstrate several applications of our results in out-of-distribution detection with flow-based generative models and safe reinforcement learning.
- [783] arXiv:2602.02587 (cross-list from physics.soc-ph) [pdf, ps, other]
-
Title: The Evolution of Lying in a Spatially-Explicit Prisoner's Dilemma ModelAuthors: Gregg HartvigsenComments: 18 pages, 11 figuresSubjects: Physics and Society (physics.soc-ph); Statistical Mechanics (cond-mat.stat-mech); Computer Science and Game Theory (cs.GT); Populations and Evolution (q-bio.PE)
I present the results from a spatial model of the prisoner's dilemma, played on a toroidal lattice. Each individual has a default strategy of either cooperating ($C$) or defecting ($D$). Two strategies were tested, including ``tit-for-tat'' (TFT), in which individuals play their opponent's last play, or simply playing their default play. Each individual also has a probability of telling the truth ($0 \leq P_{truth} \leq 1$) about their last play. This parameter, which can evolve over time, allows individuals to be, for instance, a defector but present as a cooperator regarding their last play. This leads to interesting dynamics where mixed populations of defectors and cooperators with $P_{truth} \geq 0.75$ move toward populations of truth-telling cooperators. Likewise, mixed populations with $P_{truth} < 0.7$ become populations of lying defectors. Both such populations are stable because they each have higher average scores than populations with intermediate values of $P_{truth}$. Applications of this model are discussed with regards to both humans and animals.
- [784] arXiv:2602.02598 (cross-list from physics.soc-ph) [pdf, ps, other]
-
Title: Social Catalysts, Not Moral Agents: The Illusion of Alignment in LLM SocietiesComments: 7 pages, 5 figuresSubjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Multiagent Systems (cs.MA)
The rapid evolution of Large Language Models (LLMs) has led to the emergence of Multi-Agent Systems where collective cooperation is often threatened by the "Tragedy of the Commons." This study investigates the effectiveness of Anchoring Agents--pre-programmed altruistic entities--in fostering cooperation within a Public Goods Game (PGG). Using a full factorial design across three state-of-the-art LLMs, we analyzed both behavioral outcomes and internal reasoning chains. While Anchoring Agents successfully boosted local cooperation rates, cognitive decomposition and transfer tests revealed that this effect was driven by strategic compliance and cognitive offloading rather than genuine norm internalization. Notably, most agents reverted to self-interest in new environments, and advanced models like GPT-4.1 exhibited a "Chameleon Effect," masking strategic defection under public scrutiny. These findings highlight a critical gap between behavioral modification and authentic value alignment in artificial societies.
- [785] arXiv:2602.02603 (cross-list from eess.IV) [pdf, ps, other]
-
Title: EchoJEPA: A Latent Predictive Foundation Model for EchocardiographyAuthors: Alif Munim, Adibvafa Fallahpour, Teodora Szasz, Ahmadreza Attarpour, River Jiang, Brana Sooriyakanthan, Maala Sooriyakanthan, Heather Whitney, Jeremy Slivnick, Barry Rubin, Wendy Tsang, Bo WangSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Foundation models for echocardiography promise to reduce annotation burden and improve diagnostic consistency by learning generalizable representations from large unlabeled video archives. However, current approaches fail to disentangle anatomical signal from the stochastic speckle and acquisition artifacts that dominate ultrasound imagery. We present EchoJEPA, a foundation model for echocardiography trained on 18 million echocardiograms across 300K patients, the largest pretraining corpus for this modality to date. We also introduce a novel multi-view probing framework with factorized stream embeddings that standardizes evaluation under frozen backbones. Compared to prior methods, EchoJEPA reduces left ventricular ejection fraction estimation error by 19% and achieves 87.4% view classification accuracy. EchoJEPA exhibits strong sample efficiency, reaching 78.6% accuracy with only 1% of labeled data versus 42.1% for the best baseline trained on 100%. Under acoustic perturbations, EchoJEPA degrades by only 2.3% compared to 16.8% for the next best model, and transfers zero-shot to pediatric patients with 15% lower error than the next best model, outperforming all fine-tuned baselines. These results establish latent prediction as a superior paradigm for ultrasound foundation models.
- [786] arXiv:2602.02604 (cross-list from econ.EM) [pdf, ps, other]
-
Title: AI Assisted Economics Measurement From Survey: Evidence from Public Employee Pension ChoiceSubjects: Econometrics (econ.EM); Artificial Intelligence (cs.AI)
We develop an iterative framework for economic measurement that leverages large language models to extract measurement structure directly from survey instruments. The approach maps survey items to a sparse distribution over latent constructs through what we term a soft mapping, aggregates harmonized responses into respondent level sub dimension scores, and disciplines the resulting taxonomy through out of sample incremental validity tests and discriminant validity diagnostics. The framework explicitly integrates iteration into the measurement construction process. Overlap and redundancy diagnostics trigger targeted taxonomy refinement and constrained remapping, ensuring that added measurement flexibility is retained only when it delivers stable out of sample performance gains. Applied to a large scale public employee retirement plan survey, the framework identifies which semantic components contain behavioral signal and clarifies the economic mechanisms, such as beliefs versus constraints, that matter for retirement choices. The methodology provides a portable measurement audit of survey instruments that can guide both empirical analysis and survey design.
- [787] arXiv:2602.02620 (cross-list from q-bio.QM) [pdf, ps, other]
-
Title: CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision ModelsSubjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cryo-electron microscopy (cryo-EM) has revolutionized structural biology by enabling near-atomic-level visualization of biomolecular assemblies. However, the exponential growth in cryo-EM data throughput and complexity, coupled with diverse downstream analytical tasks, necessitates unified computational frameworks that transcend current task-specific deep learning approaches with limited scalability and generalizability. We present CryoLVM, a foundation model that learns rich structural representations from experimental density maps with resolved structures by leveraging the Joint-Embedding Predictive Architecture (JEPA) integrated with SCUNet-based backbone, which can be rapidly adapted to various downstream tasks. We further introduce a novel histogram-based distribution alignment loss that accelerates convergence and enhances fine-tuning performance. We demonstrate CryoLVM's effectiveness across three critical cryo-EM tasks: density map sharpening, density map super-resolution, and missing wedge restoration. Our method consistently outperforms state-of-the-art baselines across multiple density map quality metrics, confirming its potential as a versatile model for a wide spectrum of cryo-EM applications.
- [788] arXiv:2602.02633 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free AdaptationSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Often, constraints arise in deployment settings where even lightweight parameter updates e.g. parameter-efficient fine-tuning could induce model shift or tuning instability. We study test-time adaptation of foundation models for few-shot classification under a completely frozen-model regime, where additionally, no upstream data are accessible. We propose arguably the first training-free inference method that adapts predictions to the new task by performing a change of measure over the latent embedding distribution induced by the encoder. Using task-similarity scores derived from a small labeled support set, exponential tilting reweights latent distributions in a KL-optimal manner without modifying model parameters. Empirically, the method consistently competes with parameter-update-based methods across multiple benchmarks and shot regimes, while operating under strictly and universally stronger constraints. These results demonstrate the viability of inference-level distributional correction for test-time adaptation even with a fully-frozen model pipeline.
- [789] arXiv:2602.02698 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Compiling Quantum Regular Language StatesAuthors: Armando Bellante, Reinis Irmejs, Marta Florido-Llinàs, María Cea Fernández, Marianna Crupi, Matthew Kiser, J. Ignacio CiracComments: Code available at this https URLSubjects: Quantum Physics (quant-ph); Formal Languages and Automata Theory (cs.FL)
State preparation compilers for quantum computers typically sit at two extremes: general-purpose routines that treat the target as an opaque amplitude vector, and bespoke constructions for a handful of well-known state families. We ask whether a compiler can instead accept simple, structure-aware specifications while providing predictable resource guarantees. We answer this by designing and implementing a quantum state-preparation compiler for regular language states (RLS): uniform superpositions over bitstrings accepted by a regular description, and their complements. Users describe the target state via (i) a finite set of bitstrings, (ii) a regular expression, or (iii) a deterministic finite automaton (DFA), optionally with a complement flag. By translating the input to a DFA, minimizing it, and mapping it to an optimal matrix product state (MPS), the compiler obtains an intermediate representation (IR) that exposes and compresses hidden structure. The efficient DFA representation and minimization offloads expensive linear algebra computation in exchange of simpler automata manipulations. The combination of the regular-language frontend and this IR gives concise specifications not only for RLS but also for their complements that might otherwise require exponentially large state descriptions. This enables state preparation of an RLS or its complement with the same asymptotic resources and compile time. We outline two hardware-aware backends: SeqRLSP, which yields linear-depth, ancilla-free circuits for linear nearest-neighbor architectures via sequential generation, and TreeRLSP, which achieves logarithmic depth on all-to-all connectivity via a tree tensor network. We prove depth and gate-count bounds scaling with the system size and the state's maximal Schmidt rank, and we give explicit compile-time bounds that expose the benefit of our approach. We implement and evaluate the pipeline.
- [790] arXiv:2602.02713 (cross-list from physics.med-ph) [pdf, ps, other]
-
Title: Perfusion Imaging and Single Material Reconstruction in Polychromatic Photon Counting CTComments: Code is available at this https URLSubjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Background: Perfusion computed tomography (CT) images the dynamics of a contrast agent through the body over time, and is one of the highest X-ray dose scans in medical imaging. Recently, a theoretically justified reconstruction algorithm based on a monotone variational inequality (VI) was proposed for single material polychromatic photon-counting CT, and showed promising early results at low-dose imaging.
Purpose: We adapt this reconstruction algorithm for perfusion CT, to reconstruct the concentration map of the contrast agent while the static background tissue is assumed known; we call our method VI-PRISM (VI-based PeRfusion Imaging and Single Material reconstruction). We evaluate its potential for dose-reduced perfusion CT, using a digital phantom with water and iodine of varying concentration.
Methods: Simulated iodine concentrations range from 0.05 to 2.5 mg/ml. The simulated X-ray source emits photons up to 100 keV, with average intensity ranging from $10^5$ down to $10^2$ photons per detector element. The number of tomographic projections was varied from 984 down to 8 to characterize the tradeoff in photon allocation between views and intensity.
Results: We compare VI-PRISM against filtered back-projection (FBP), and find that VI-PRISM recovers iodine concentration with error below 0.4 mg/ml at all source intensity levels tested. Even with a dose reduction between 10x and 100x compared to FBP, VI-PRISM exhibits reconstruction quality on par with FBP.
Conclusion: Across all photon budgets and angular sampling densities tested, VI-PRISM achieved consistently lower RMSE, reduced noise, and higher SNR compared to filtered back-projection. Even in extremely photon-limited and sparsely sampled regimes, VI-PRISM recovered iodine concentrations with errors below 0.4 mg/ml, showing that VI-PRISM can support accurate and dose-efficient perfusion imaging in photon-counting CT. - [791] arXiv:2602.02734 (cross-list from eess.AS) [pdf, ps, other]
-
Title: WAXAL: A Large-Scale Multilingual African Language Speech CorpusAuthors: Abdoulaye Diack, Perry Nelson, Kwaku Agbesi, Angela Nakalembe, MohamedElfatih MohamedKhair, Vusumuzi Dube, Tavonga Siyavora, Subhashini Venugopalan, Jason Hickey, Uche Okonkwo, Abhishek Bapna, Isaac Wiafe, Raynard Dodzi Helegah, Elikem Doe Atsakpo, Charles Nutrokpor, Fiifi Baffoe Payin Winful, Kafui Kwashie Solaga, Jamal-Deen Abdulai, Akon Obu Ekpezu, Audace Niyonkuru, Samuel Rutunda, Boris Ishimwe, Michael Melese, Engineer Bainomugisha, Joyce Nakatumba-Nabende, Andrew Katumba, Claire Babirye, Jonathan Mukiibi, Vincent Kimani, Samuel Kibacia, James Maina, Fridah Emmah, Ahmed Ibrahim Shekarau, Ibrahim Shehu Adamu, Yusuf Abdullahi, Howard Lakougna, Bob MacDonald, Hadar Shemtov, Aisha Walcott-Bryant, Moustapha Cisse, Avinatan Hassidim, Jeff Dean, Yossi MatiasComments: Initial dataset releaseSubjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
The advancement of speech technology has predominantly favored high-resource languages, creating a significant digital divide for speakers of most Sub-Saharan African languages. To address this gap, we introduce WAXAL, a large-scale, openly accessible speech dataset for 21 languages representing over 100 million speakers. The collection consists of two main components: an Automated Speech Recognition (ASR) dataset containing approximately 1,250 hours of transcribed, natural speech from a diverse range of speakers, and a Text-to-Speech (TTS) dataset with over 180 hours of high-quality, single-speaker recordings reading phonetically balanced scripts. This paper details our methodology for data collection, annotation, and quality control, which involved partnerships with four African academic and community organizations. We provide a detailed statistical overview of the dataset and discuss its potential limitations and ethical considerations. The WAXAL datasets are released at https://huggingface.co/datasets/google/WaxalNLP under the permissive CC-BY-4.0 license to catalyze research, enable the development of inclusive technologies, and serve as a vital resource for the digital preservation of these languages.
- [792] arXiv:2602.02744 (cross-list from math.CO) [pdf, ps, other]
-
Title: An introduction to local differential privacy protocols using block designsSubjects: Combinatorics (math.CO); Cryptography and Security (cs.CR)
The design of protocols for local differential privacy (or LDP) has been a topic of considerable research interest in recent years. LDP protocols utilise the randomised encoding of outcomes of an experiment using a transition probability matrix (TPM). Several authors have observed that balanced incomplete block designs (BIBDs) provide nice examples of TPMs for LDP protocols. Indeed, it has been shown that such BIBD-based LDP protocols provide optimal estimators.
In this primarily expository paper, we give a detailed introduction to LDP protocols and their connections with block designs. We prove that a subclass of LDP protocols known as pure LDP protocols are equivalent to $(r,\lambda)$-designs (which contain balanced incomplete block designs as a special case). An unbiased estimator for an LDP scheme is a left inverse of the transition probability matrix. We show that the optimal estimators for BIBD-based TPMs are precisely those obtained from the Moore-Penrose inverse of the corresponding TPM. We also review some existing work on optimal LDP protocols in the context of pure protocols. - [793] arXiv:2602.02755 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Physics-based generation of multilayer corneal OCT data via Gaussian modeling and MCML for AI-driven diagnostic and surgical guidance applicationsSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Training deep learning models for corneal optical coherence tomography (OCT) imaging is limited by the availability of large, well-annotated datasets. We present a configurable Monte Carlo simulation framework that generates synthetic corneal B-scan optical OCT images with pixel-level five-layer segmentation labels derived directly from the simulation geometry. A five-layer corneal model with Gaussian surfaces captures curvature and thickness variability in healthy and keratoconic eyes. Each layer is assigned optical properties from the literature and light transport is simulated using Monte Carlo modeling of light transport in multi-layered tissues (MCML), while incorporating system features such as the confocal PSF and sensitivity roll-off. This approach produces over 10,000 high-resolution (1024x1024) image-label pairs and supports customization of geometry, photon count, noise, and system parameters. The resulting dataset enables systematic training, validation, and benchmarking of AI models under controlled, ground-truth conditions, providing a reproducible and scalable resource to support the development of diagnostic and surgical guidance applications in image-guided ophthalmology.
- [794] arXiv:2602.02759 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Near-Universal Multiplicative Updates for Nonnegative Einsum FactorizationComments: 26 pages, 5 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Despite the ubiquity of multiway data across scientific domains, there are few user-friendly tools that fit tailored nonnegative tensor factorizations. Researchers may use gradient-based automatic differentiation (which often struggles in nonnegative settings), choose between a limited set of methods with mature implementations, or implement their own model from scratch. As an alternative, we introduce NNEinFact, an einsum-based multiplicative update algorithm that fits any nonnegative tensor factorization expressible as a tensor contraction by minimizing one of many user-specified loss functions (including the $(\alpha,\beta)$-divergence). To use NNEinFact, the researcher simply specifies their model with a string. NNEinFact converges to a local minimum of the loss, supports missing data, and fits to tensors with hundreds of millions of entries in seconds. Empirically, NNEinFact fits custom models which outperform standard ones in heldout prediction tasks on real-world tensor data by over $37\%$ and attains less than half the test loss of gradient-based methods while converging up to 90 times faster.
- [795] arXiv:2602.02791 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Plug-In Classification of Drift Functions in Diffusion Processes Using Neural NetworksSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We study a supervised multiclass classification problem for diffusion processes, where each class is characterized by a distinct drift function and trajectories are observed at discrete times. Extending the one-dimensional multiclass framework of Denis et al. (2024) to multidimensional diffusions, we propose a neural network-based plug-in classifier that estimates the drift functions for each class from independent sample paths and assigns labels based on a Bayes-type decision rule. Under standard regularity assumptions, we establish convergence rates for the excess misclassification risk, explicitly capturing the effects of drift estimation error and time discretization. Numerical experiments demonstrate that the proposed method achieves faster convergence and improved classification performance compared to Denis et al. (2024) in the one-dimensional setting, remains effective in higher dimensions when the underlying drift functions admit a compositional structure, and consistently outperforms direct neural network classifiers trained end-to-end on trajectories without exploiting the diffusion model structure.
- [796] arXiv:2602.02798 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Real-time topology-aware M-mode OCT segmentation for robotic deep anterior lamellar keratoplasty (DALK) guidanceSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Robotic deep anterior lamellar keratoplasty (DALK) requires accurate real time depth feedback to approach Descemet's membrane (DM) without perforation. M-mode intraoperative optical coherence tomography (OCT) provides high temporal resolution depth traces, but speckle noise, attenuation, and instrument induced shadowing often result in discontinuous or ambiguous layer interfaces that challenge anatomically consistent segmentation at deployment frame rates. We present a lightweight, topology aware M-mode segmentation pipeline based on UNeXt that incorporates anatomical topology regularization to stabilize boundary continuity and layer ordering under low signal to noise ratio conditions. The proposed system achieves end to end throughput exceeding 80 Hz measured over the complete preprocessing inference overlay pipeline on a single GPU, demonstrating practical real time guidance beyond model only timing. This operating margin provides temporal headroom to reject low quality or dropout frames while maintaining a stable effective depth update rate. Evaluation on a standard rabbit eye M-mode dataset using an established baseline protocol shows improved qualitative boundary stability compared with topology agnostic controls, while preserving deployable real time performance.
- [797] arXiv:2602.02805 (cross-list from econ.EM) [pdf, ps, other]
-
Title: Predicting Well-Being with Mobile Phone Data: Evidence from Four CountriesComments: 5 pages, 2 figures, presented at ASSA 2026 Annual Meeting, will be published in AEA Papers and Proceedings 2026Subjects: Econometrics (econ.EM); Computers and Society (cs.CY)
We provide systematic evidence on the potential for estimating household well-being from mobile phone data. Using data from four countries - Afghanistan, Cote d'Ivoire, Malawi, and Togo - we conduct parallel, standardized machine learning experiments to assess which measures of welfare can be most accurately predicted, which types of phone data are most useful, and how much training data is required. We find that long-term poverty measures such as wealth indices (Pearson's rho = 0.20-0.59) and multidimensional poverty (rho = 0.29-0.57) can be predicted more accurately than consumption (rho = 0.04 - 0.54); transient vulnerability measures like food security and mental health are very difficult to predict. Models using calls and text message behavior are more predictive than those using metadata on mobile internet usage, mobile money transactions, and airtime top-ups. Predictive accuracy improves rapidly through the first 1,000-2,000 training observations, with continued gains beyond 4,500 observations. Model performance depends strongly on sample heterogeneity: nationally-representative samples yield 20-70 percent higher accuracy than urban-only or rural-only samples.
- [798] arXiv:2602.02813 (cross-list from stat.AP) [pdf, ps, other]
-
Title: Downscaling land surface temperature data using edge detection and block-diagonal Gaussian process regressionAuthors: Sanjit Dandapanthula, Margaret Johnson, Madeleine Pascolini-Campbell, Glynn Hulley, Mikael KuuselaSubjects: Applications (stat.AP); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)
Accurate and high-resolution estimation of land surface temperature (LST) is crucial in estimating evapotranspiration, a measure of plant water use and a central quantity in agricultural applications. In this work, we develop a novel statistical method for downscaling LST data obtained from NASA's ECOSTRESS mission, using high-resolution data from the Landsat 8 mission as a proxy for modeling agricultural field structure. Using the Landsat data, we identify the boundaries of agricultural fields through edge detection techniques, allowing us to capture the inherent block structure present in the spatial domain. We propose a block-diagonal Gaussian process (BDGP) model that captures the spatial structure of the agricultural fields, leverages independence of LST across fields for computational tractability, and accounts for the change of support present in ECOSTRESS observations. We use the resulting BDGP model to perform Gaussian process regression and obtain high-resolution estimates of LST from ECOSTRESS data, along with uncertainty quantification. Our results demonstrate the practicality of the proposed method in producing reliable high-resolution LST estimates, with potential applications in agriculture, urban planning, and climate studies.
- [799] arXiv:2602.02814 (cross-list from math.OC) [pdf, ps, other]
-
Title: Sub-optimality bounds for certainty equivalent policies in partially observed systemsComments: 12 pages, 0 figuresSubjects: Optimization and Control (math.OC); Robotics (cs.RO); Systems and Control (eess.SY)
In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems that allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models where the cost and the dynamics are smooth in an appropriate sense, we derive upper bounds on the sub-optimality of certainty equivalent policies. We present several examples to illustrate the results.
- [800] arXiv:2602.02826 (cross-list from math.OC) [pdf, ps, other]
-
Title: Fast Near Time-Optimal Motion Planning for Holonomic Vehicles in Structured EnvironmentsSubjects: Optimization and Control (math.OC); Robotics (cs.RO)
This paper proposes a novel and efficient optimization-based method for generating near time-optimal trajectories for holonomic vehicles navigating through complex but structured environments. The approach aims to solve the problem of motion planning for planar motion systems using magnetic levitation that can be used in assembly lines, automated laboratories or clean-rooms. In these applications, time-optimal trajectories that can be computed in real-time are required to increase productivity and allow the vehicles to be reactive if needed. The presented approach encodes the environment representation using free-space corridors and represents the motion of the vehicle through such a corridor using a motion primitive. These primitives are selected heuristically and define the trajectory with a limited number of degrees of freedom, which are determined in an optimization problem. As a result, the method achieves significantly lower computation times compared to the state-of-the-art, most notably solving a full Optimal Control Problem (OCP), OMG-tools or VP-STO without significantly compromising optimality within a fixed corridor sequence. The approach is benchmarked extensively in simulation and is validated on a real-world Beckhoff XPlanar system
- [801] arXiv:2602.02885 (cross-list from math.GR) [pdf, ps, other]
-
Title: Obstruction theory and the complexity of counting group homomorphismsSubjects: Group Theory (math.GR); Computational Complexity (cs.CC); Geometric Topology (math.GT)
Fix a finite group $G$. We study the computational complexity of counting problems of the following flavor: given a group $\Gamma$, count the number of homomorphisms $\Gamma \to G$. Our first result establishes that this problem is $\#\mathsf{P}$-hard whenever $G$ is a non-abelian group and $\Gamma$ is provided via a finite presentation. We give several improvements showing that this hardness conclusion continues to hold for restricted $\Gamma$ satisfying various promises. Our second result, in contrast, shows that if $G$ is class 2 nilpotent and $\Gamma = \pi_1(M^3)$ for some input 3-manifold triangulation $M^3$, then there is a polynomial time algorithm. The difference in complexity is explained by the fact that 3-manifolds are close enough to being Eilenberg-MacLane spaces for us to be able to solve the necessary group cohomological obstruction problems efficiently using the given triangulation. A similar polynomial time algorithm for counting maps to finite, class 2 nilpotent $G$ exists when $\Gamma$ is itself a finite group encoded via a multiplication table.
- [802] arXiv:2602.02927 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Training-Free Self-Correction for Multimodal Masked Diffusion ModelsAuthors: Yidong Ouyang, Panwen Hu, Zhengyan Wan, Zhe Wang, Liyan Xie, Dmitriy Bespalov, Ying Nian Wu, Guang Cheng, Hongyuan Zha, Qiang SunSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated tokens as immutable, which may lead to error accumulation when early mistakes cannot be revised. In this work, we revisit existing self-correction methods and identify limitations stemming from additional training requirements or reliance on misaligned likelihood estimates. We propose a training-free self-correction framework that exploits the inductive biases of pre-trained masked diffusion models. Without modifying model parameters or introducing auxiliary evaluators, our method significantly improves generation quality on text-to-image generation and multimodal understanding tasks with reduced sampling steps. Moreover, the proposed framework generalizes across different masked diffusion architectures, highlighting its robustness and practical applicability. Code can be found in https://github.com/huge123/FreeCorrection.
- [803] arXiv:2602.02931 (cross-list from stat.ME) [pdf, ps, other]
-
Title: Weighted Sum-of-Trees Model for Clustered DataComments: 14 pages, 8 figures, 3 tablesSubjects: Methodology (stat.ME); Machine Learning (cs.LG)
Clustered data, which arise when observations are nested within groups, are incredibly common in clinical, education, and social science research. Traditionally, a linear mixed model, which includes random effects to account for within-group correlation, would be used to model the observed data and make new predictions on unseen data. Some work has been done to extend the mixed model approach beyond linear regression into more complex and non-parametric models, such as decision trees and random forests. However, existing methods are limited to using the global fixed effects for prediction on data from out-of-sample groups, effectively assuming that all clusters share a common outcome model. We propose a lightweight sum-of-trees model in which we learn a decision tree for each sample group. We combine the predictions from these trees using weights so that out-of-sample group predictions are more closely aligned with the most similar groups in the training data. This strategy also allows for inference on the similarity across groups in the outcome prediction model, as the unique tree structures and variable importances for each group can be directly compared. We show our model outperforms traditional decision trees and random forests in a variety of simulation settings. Finally, we showcase our method on real-world data from the sarcoma cohort of The Cancer Genome Atlas, where patient samples are grouped by sarcoma subtype.
- [804] arXiv:2602.02940 (cross-list from math.LO) [pdf, ps, other]
-
Title: A vector logic for intensional formal semanticsAuthors: Daniel QuigleyComments: 25 pages; 68 sourcesSubjects: Logic (math.LO); Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
Formal semantics and distributional semantics are distinct approaches to linguistic meaning: the former models meaning as reference via model-theoretic structures; the latter represents meaning as vectors in high-dimensional spaces shaped by usage. This paper proves that these frameworks are structurally compatible for intensional semantics. We establish that Kripke-style intensional models embed injectively into vector spaces, with semantic functions lifting to (multi)linear maps that preserve composition. The construction accommodates multiple index sorts (worlds, times, locations) via a compound index space, representing intensions as linear operators. Modal operators are derived algebraically: accessibility relations become linear operators, and modal conditions reduce to threshold checks on accumulated values. For uncountable index domains, we develop a measure-theoretic generalization in which necessity becomes truth almost everywhere and possibility becomes truth on a set of positive measure, a non-classical logic natural for continuous parameters.
- [805] arXiv:2602.02945 (cross-list from stat.CO) [pdf, ps, other]
-
Title: Bayesian Methods for the Navier-Stokes EquationsSubjects: Computation (stat.CO); Numerical Analysis (math.NA)
We develop a Bayesian methodology for numerical solution of the incompressible Navier--Stokes equations with quantified uncertainty. The central idea is to treat discretized Navier--Stokes dynamics as a state-space model and to view numerical solution as posterior computation: priors encode physical structure and modeling error, and the solver outputs a distribution over states and quantities of interest rather than a single trajectory. In two dimensions, stochastic representations (Feynman--Kac and stochastic characteristics for linear advection--diffusion with prescribed drift) motivate Monte Carlo solvers and provide intuition for uncertainty propagation. In three dimensions, we formulate stochastic Navier--Stokes models and describe particle-based and ensemble-based Bayesian workflows for uncertainty propagation in spectral discretizations. A key computational advantage is that parameter learning can be performed stably via particle learning: marginalization and resample--propagate (one-step smoothing) constructions avoid the weight-collapse that plagues naive sequential importance sampling on static parameters. When partial observations are available, the same machinery supports sequential observational updating as an additional capability. We also discuss non-Gaussian (heavy-tailed) error models based on normal variance-mean mixtures, which yield conditionally Gaussian updates via latent scale augmentation.
- [806] arXiv:2602.02950 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Asymptotically Optimal Quantum Universal Quickest Change DetectionAuthors: Arick Grootveld, Haodong Yang, Nandan Sriranga, Biao Chen, Venkata Gandikota, Jason PollackSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
This paper investigates the quickest change detection of quantum states in a universal setting: specifically, where the post-change quantum state is not known a priori. We establish the asymptotic optimality of a two-stage approach in terms of worst average delay to detection. The first stage employs block POVMs with classical outputs that preserve quantum relative entropy to arbitrary precision. The second stage leverages a recently proposed windowed-CUSUM algorithm that is known to be asymptotically optimal for quickest change detection with an unknown post-change distribution in the classical setting.
- [807] arXiv:2602.02980 (cross-list from eess.AS) [pdf, ps, other]
-
Title: WST-X Series: Wavelet Scattering Transform for Interpretable Speech Deepfake DetectionComments: Submitted to IEEE Signal Processing LettersSubjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Signal Processing (eess.SP)
Designing front-ends for speech deepfake detectors primarily focuses on two categories. Hand-crafted filterbank features are transparent but are limited in capturing high-level semantic details, often resulting in performance gaps compared to self-supervised (SSL) features. SSL features, in turn, lack interpretability and may overlook fine-grained spectral anomalies. We propose the WST-X series, a novel family of feature extractors that combines the best of both worlds via the wavelet scattering transform (WST), integrating wavelets with nonlinearities analogous to deep convolutional networks. We investigate 1D and 2D WSTs to extract acoustic details and higher-order structural anomalies, respectively. Experimental results on the recent and challenging Deepfake-Eval-2024 dataset indicate that WST-X outperforms existing front-ends by a wide margin. Our analysis reveals that a small averaging scale ($J$), combined with high-frequency and directional resolutions ($Q, L$), is critical for capturing subtle artifacts. This underscores the value of translation-invariant and deformation-stable features for robust and interpretable speech deepfake detection.
- [808] arXiv:2602.02981 (cross-list from math.OC) [pdf, ps, other]
-
Title: Fisher-Information-Based Sensor Placement for Structural Digital Twins: Analytic Results and BenchmarksSubjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
High-fidelity digital twins rely on the accurate assimilation of sensor data into physics-based computational models. In structural applications, such twins aim to identify spatially distributed quantities--such as elementwise weakening fields, material parameters, or effective thermal loads--by minimizing discrepancies between measured and simulated responses subject to the governing equations of structural mechanics. While adjoint-based methods enable efficient gradient computation for these inverse problems, the quality and stability of the resulting estimates depend critically on the choice of sensor locations, measurement types, and directions.
This paper develops a rigorous and implementation-ready framework for Fisher-information-based sensor placement in adjoint-based finite-element digital twins. Sensor configurations are evaluated using a D-optimal design criterion derived from a linearization of the measurement map, yielding a statistically meaningful measure of information content. We present matrix-free operator formulas for applying the Jacobian and its adjoint, and hence for computing Fisher-information products $Fv = J^\top R^{-1} Jv$ using only forward and adjoint solves. Building on these operator evaluations, we derive explicit sensitivity expressions for D-optimal sensor design with respect to measurement parameters and discuss practical strategies for evaluating the associated log-determinant objectives. To complement the general framework, we provide analytically tractable sensor placement results for a canonical one-dimensional structural model, clarifying the distinction between detectability and localizability and proving that D-optimal placement of multiple displacement sensors yields approximately uniform spacing. - [809] arXiv:2602.02985 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Accelerating the Tesseract Decoder for Quantum Error CorrectionAuthors: Dragana Grbic (Google Quantum AI, Department of Computer Science, Rice University), Laleh Aghababaie Beni (Google Quantum AI) Noah Shutty (Google Quantum AI)Subjects: Quantum Physics (quant-ph); Performance (cs.PF)
Quantum Error Correction (QEC) is essential for building robust, fault-tolerant quantum computers; however, the decoding process often presents a significant computational bottleneck. Tesseract is a novel Most-Likely-Error (MLE) decoder for QEC that employs the A* search algorithm to explore an exponentially large graph of error hypotheses, achieving high decoding speed and accuracy. This paper presents a systematic approach to optimizing the Tesseract decoder through low-level performance enhancements. Based on extensive profiling, we implemented four targeted optimization strategies, including the replacement of inefficient data structures, reorganization of memory layouts to improve cache hit rates, and the use of hardware-accelerated bit-wise operations. We achieved significant decoding speedups across a wide range of code families and configurations, including Color Codes, Bivariate-Bicycle Codes, Surface Codes, and Transversal CNOT Protocols. Our results demonstrate consistent speedups of approximately 2x for most code families, often exceeding 2.5x. Notably, we achieved a peak performance gain of over 5x for the most computationally demanding configurations of Bivariate-Bicycle Codes. These improvements make the Tesseract decoder more efficient and scalable, serving as a practical case study that highlights the importance of high-performance software engineering in QEC and providing a strong foundation for future research.
- [810] arXiv:2602.02992 (cross-list from math.OC) [pdf, ps, other]
-
Title: Data-driven stabilization of continuous-time systems with noisy input-output dataAuthors: Masashi WakaikiComments: 18 pagesSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
We study data-driven stabilization of continuous-time systems in autoregressive form when only noisy input-output data are available. First, we provide an operator-based characterization of the set of systems consistent with the data. Next, combining this characterization with behavioral theory, we derive a necessary and sufficient condition for the noisy data to be informative for quadratic stabilization. This condition is formulated as linear matrix inequalities, whose solution yields a stabilizing controller. Finally, we characterize data informativity for system identification in the noise-free setting.
- [811] arXiv:2602.03031 (cross-list from cond-mat.dis-nn) [pdf, ps, other]
-
Title: Physics-inspired transformer quantum states via latent imaginary-time evolutionSubjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Quantum Physics (quant-ph)
Neural quantum states (NQS) are powerful ans\"atze in the variational Monte Carlo framework, yet their architectures are often treated as black boxes. We propose a physically transparent framework in which NQS are treated as neural approximations to latent imaginary-time evolution. This viewpoint suggests that standard Transformer-based NQS (TQS) architectures correspond to physically unmotivated effective Hamiltonians dependent on imaginary time in a latent space. Building on this interpretation, we introduce physics-inspired transformer quantum states (PITQS), which enforce a static effective Hamiltonian by sharing weights across layers and improve propagation accuracy via Trotter-Suzuki decompositions without increasing the number of variational parameters. For the frustrated $J_1$-$J_2$ Heisenberg model, our ans\"atze achieve accuracies comparable to or exceeding state-of-the-art TQS while using substantially fewer variational parameters. This study demonstrates that reinterpreting the deep network structure as a latent cooling process enables a more physically grounded, systematic, and compact design, thereby bridging the gap between black-box expressivity and physically transparent construction.
- [812] arXiv:2602.03049 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Unified Inference Framework for Single and Multi-Player Performative Prediction: Method and Asymptotic OptimalitySubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
Performative prediction characterizes environments where predictive models alter the very data distributions they aim to forecast, triggering complex feedback loops. While prior research treats single-agent and multi-agent performativity as distinct phenomena, this paper introduces a unified statistical inference framework that bridges these contexts, treating the former as a special case of the latter. Our contribution is two-fold. First, we put forward the Repeated Risk Minimization (RRM) procedure for estimating the performative stability, and establish a rigorous inferential theory for admitting its asymptotic normality and confirming its asymptotic efficiency. Second, for the performative optimality, we introduce a novel two-step plug-in estimator that integrates the idea of Recalibrated Prediction Powered Inference (RePPI) with Importance Sampling, and further provide formal derivations for the Central Limit Theorems of both the underlying distributional parameters and the plug-in results. The theoretical analysis demonstrates that our estimator achieves the semiparametric efficiency bound and maintains robustness under mild distributional misspecification. This work provides a principled toolkit for reliable estimation and decision-making in dynamic, performative environments.
- [813] arXiv:2602.03168 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Online Conformal Prediction via Universal Portfolio AlgorithmsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Online conformal prediction (OCP) seeks prediction intervals that achieve long-run $1-\alpha$ coverage for arbitrary (possibly adversarial) data streams, while remaining as informative as possible. Existing OCP methods often require manual learning-rate tuning to work well, and may also require algorithm-specific analyses. Here, we develop a general regret-to-coverage theory for interval-valued OCP based on the $(1-\alpha)$-pinball loss. Our first contribution is to identify \emph{linearized regret} as a key notion, showing that controlling it implies coverage bounds for any online algorithm. This relies on a black-box reduction that depends only on the Fenchel conjugate of an upper bound on the linearized regret. Building on this theory, we propose UP-OCP, a parameter-free method for OCP, via a reduction to a two-asset portfolio selection problem, leveraging universal portfolio algorithms. We show strong finite-time bounds on the miscoverage of UP-OCP, even for polynomially growing predictions. Extensive experiments support that UP-OCP delivers consistently better size/coverage trade-offs than prior online conformal baselines.
- [814] arXiv:2602.03169 (cross-list from stat.ML) [pdf, ps, other]
-
Title: NeuralFLoC: Neural Flow-Based Joint Registration and Clustering of Functional DataSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Clustering functional data in the presence of phase variation is challenging, as temporal misalignment can obscure intrinsic shape differences and degrade clustering performance. Most existing approaches treat registration and clustering as separate tasks or rely on restrictive parametric assumptions. We present \textbf{NeuralFLoC}, a fully unsupervised, end-to-end deep learning framework for joint functional registration and clustering based on Neural ODE-driven diffeomorphic flows and spectral clustering. The proposed model learns smooth, invertible warping functions and cluster-specific templates simultaneously, effectively disentangling phase and amplitude variation. We establish universal approximation guarantees and asymptotic consistency for the proposed framework. Experiments on functional benchmarks show state-of-the-art performance in both registration and clustering, with robustness to missing data, irregular sampling, and noise, while maintaining scalability. Code is available at https://anonymous.4open.science/r/NeuralFLoC-FEC8.
- [815] arXiv:2602.03215 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Latent Neural-ODE for Model-Informed Precision Dosing: Overcoming Structural Assumptions in PharmacokineticsSubjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Accurate estimation of tacrolimus exposure, quantified by the area under the concentration-time curve (AUC), is essential for precision dosing after renal transplantation. Current practice relies on population pharmacokinetic (PopPK) models based on nonlinear mixed-effects (NLME) methods. However, these models depend on rigid, pre-specified assumptions and may struggle to capture complex, patient-specific dynamics, leading to model misspecification.
In this study, we introduce a novel data-driven alternative based on Latent Ordinary Differential Equations (Latent ODEs) for tacrolimus AUC prediction. This deep learning approach learns individualized pharmacokinetic dynamics directly from sparse clinical data, enabling greater flexibility in modeling complex biological behavior. The model was evaluated through extensive simulations across multiple scenarios and benchmarked against two standard approaches: NLME-based estimation and the iterative two-stage Bayesian (it2B) method. We further performed a rigorous clinical validation using a development dataset (n = 178) and a completely independent external dataset (n = 75).
In simulation, the Latent ODE model demonstrated superior robustness, maintaining high accuracy even when underlying biological mechanisms deviated from standard assumptions. Regarding experiments on clinical datasets, in internal validation, it achieved significantly higher precision with a mean RMSPE of 7.99% compared with 9.24% for it2B (p < 0.001). On the external cohort, it achieved an RMSPE of 10.82%, comparable to the two standard estimators (11.48% and 11.54%).
These results establish the Latent ODE as a powerful and reliable tool for AUC prediction. Its flexible architecture provides a promising foundation for next-generation, multi-modal models in personalized medicine. - [816] arXiv:2602.03240 (cross-list from q-bio.NC) [pdf, ps, other]
-
Title: Estimating measures of information processing during cognitive tasks using functional magnetic resonance imagingSubjects: Neurons and Cognition (q-bio.NC); Information Theory (cs.IT)
Cognition is increasingly framed in terms of information processing, yet most fMRI analyses focus on activation or functional connectivity rather than quantifying how information is stored and transferred. To remedy this problem, we propose a framework for estimating measures of information processing: active information storage (AIS), transfer entropy (TE), and net synergy from task-based fMRI. AIS measures information maintained within a region, TE captures directed information flow, and net synergy contrasts higher-order synergistic to redundant interactions. Crucially, to enable this framework we utilised a recently developed approach for calculating information-theoretic measures: the cross mutual information. This approach combines resting-state and task data to address the challenges of limited sample size, non-stationarity and context in task-based fMRI. We applied this framework to the working memory (N-back) task from the Human Connectome Project (470 participants). Results show that AIS increases in fronto-parietal regions with working memory load, TE reveals enhanced directed information flows across control pathways, and net synergy indicates a global shift to redundancy. This work establishes a novel methodology for quantifying information processing in task-based fMRI.
- [817] arXiv:2602.03245 (cross-list from eess.AS) [pdf, ps, other]
-
Title: Mići Princ -- A Little Boy Teaching Speech Technologies the Chakavian DialectComments: 2 figures, 14 pages, accepted and presented at JTDH 2024Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
This paper documents our efforts in releasing the printed and audio book of the translation of the famous novel The Little Prince into the Chakavian dialect, as a computer-readable, AI-ready dataset, with the textual and the audio components of the two releases now aligned on the level of each written and spoken word. Our motivation for working on this release is multiple. The first one is our wish to preserve the highly valuable and specific content beyond the small editions of the printed and the audio book. With the dataset published in the CLARIN.SI repository, this content is from now on at the fingertips of any interested individual. The second motivation is to make the data available for various artificial-intelligence-related usage scenarios, such as the one we follow upon inside this paper already -- adapting the Whisper-large-v3 open automatic speech recognition model, with decent performance on standard Croatian, to Chakavian dialectal speech. We can happily report that with adapting the model, the word error rate on the selected test data has being reduced to a half, while we managed to remove up to two thirds of the error on character level. We envision many more usages of this dataset beyond the set of experiments we have already performed, both on tasks of artificial intelligence research and application, as well as dialectal research. The third motivation for this release is our hope that this, now highly structured dataset, will be transformed into a digital online edition of this work, allowing individuals beyond the research and technology communities to enjoy the beauty of the message of the little boy in the desert, told through the spectacular prism of the Chakavian dialect.
- [818] arXiv:2602.03258 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Principled Federated Random Forests for Heterogeneous DataSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Random Forests (RF) are among the most powerful and widely used predictive models for centralized tabular data, yet few methods exist to adapt them to the federated learning setting. Unlike most federated learning approaches, the piecewise-constant nature of RF prevents exact gradient-based optimization. As a result, existing federated RF implementations rely on unprincipled heuristics: for instance, aggregating decision trees trained independently on clients fails to optimize the global impurity criterion, even under simple distribution shifts. We propose FedForest, a new federated RF algorithm for horizontally partitioned data that naturally accommodates diverse forms of client data heterogeneity, from covariate shift to more complex outcome shift mechanisms. We prove that our splitting procedure, based on aggregating carefully chosen client statistics, closely approximates the split selected by a centralized algorithm. Moreover, FedForest allows splits on client indicators, enabling a non-parametric form of personalization that is absent from prior federated random forest methods. Empirically, we demonstrate that the resulting federated forests closely match centralized performance across heterogeneous benchmarks while remaining communication-efficient.
- [819] arXiv:2602.03283 (cross-list from math.ST) [pdf, ps, other]
-
Title: Orthogonal Approximate Message Passing Algorithms for Rectangular Spiked Matrix Models with Rotationally Invariant NoiseComments: To appear in the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)
We propose an orthogonal approximate message passing (OAMP) algorithm for signal estimation in the rectangular spiked matrix model with general rotationally invariant (RI) noise. We establish a rigorous state evolution that exactly characterizes the high-dimensional dynamics of the algorithm. Building on this framework, we derive an optimal variant of OAMP that minimizes the predicted mean-squared error at each iteration. For the special case of i.i.d. Gaussian noise, the fixed point of the proposed OAMP algorithm coincides with that of the standard AMP algorithm. For general RI noise models, we conjecture that the optimal OAMP algorithm is statistically optimal within a broad class of iterative methods, and achieves Bayes-optimal performance in certain regimes.
- [820] arXiv:2602.03317 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Multiparameter Uncertainty Mapping in Quantitative Molecular MRI using a Physics-Structured Variational Autoencoder (PS-VAE)Authors: Alex Finkelstein, Ron Moneta, Or Zohar, Michal Rivlin, Moritz Zaiss, Dinora Friedmann Morvinski, Or PerlmanComments: Submitted to IEEE Transactions on Medical Imaging. This project was funded by the European Union (ERC, BabyMagnet, project no. 101115639). Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for themSubjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
Quantitative imaging methods, such as magnetic resonance fingerprinting (MRF), aim to extract interpretable pathology biomarkers by estimating biophysical tissue parameters from signal evolutions. However, the pattern-matching algorithms or neural networks used in such inverse problems often lack principled uncertainty quantification, which limits the trustworthiness and transparency, required for clinical acceptance. Here, we describe a physics-structured variational autoencoder (PS-VAE) designed for rapid extraction of voxelwise multi-parameter posterior distributions. Our approach integrates a differentiable spin physics simulator with self-supervised learning, and provides a full covariance that captures the inter-parameter correlations of the latent biophysical space. The method was validated in a multi-proton pool chemical exchange saturation transfer (CEST) and semisolid magnetization transfer (MT) molecular MRF study, across in-vitro phantoms, tumor-bearing mice, healthy human volunteers, and a subject with glioblastoma. The resulting multi-parametric posteriors are in good agreement with those calculated using a brute-force Bayesian analysis, while providing an orders-of-magnitude acceleration in whole brain quantification. In addition, we demonstrate how monitoring the multi-parameter posterior dynamics across progressively acquired signals provides practical insights for protocol optimization and may facilitate real-time adaptive acquisition.
- [821] arXiv:2602.03325 (cross-list from q-fin.PM) [pdf, ps, other]
-
Title: A Novel approach to portfolio constructionSubjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG); Computational Finance (q-fin.CP); Risk Management (q-fin.RM); Machine Learning (stat.ML)
This paper proposes a machine learning-based framework for asset selection and portfolio construction, termed the Best-Path Algorithm Sparse Graphical Model (BPASGM). The method extends the Best-Path Algorithm (BPA) by mapping linear and non-linear dependencies among a large set of financial assets into a sparse graphical model satisfying a structural Markov property. Based on this representation, BPASGM performs a dependence-driven screening that removes positively or redundantly connected assets, isolating subsets that are conditionally independent or negatively correlated. This step is designed to enhance diversification and reduce estimation error in high-dimensional portfolio settings. Portfolio optimization is then conducted on the selected subset using standard mean-variance techniques. BPASGM does not aim to improve the theoretical mean-variance optimum under known population parameters, but rather to enhance realized performance in finite samples, where sample-based Markowitz portfolios are highly sensitive to estimation error. Monte Carlo simulations show that BPASGM-based portfolios achieve more stable risk-return profiles, lower realized volatility, and superior risk-adjusted performance compared to standard mean-variance portfolios. Empirical results for U.S. equities, global stock indices, and foreign exchange rates over 1990-2025 confirm these findings and demonstrate a substantial reduction in portfolio cardinality. Overall, BPASGM offers a statistically grounded and computationally efficient framework that integrates sparse graphical modeling with portfolio theory for dependence-aware asset selection.
- [822] arXiv:2602.03394 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Improving the Linearized Laplace Approximation via Quadratic ApproximationsComments: 6 pages, 1 table. Accepted at European Symposium on Artificial Neural Networks (ESANN 2026) as poster presentationSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Deep neural networks (DNNs) often produce overconfident out-of-distribution predictions, motivating Bayesian uncertainty quantification. The Linearized Laplace Approximation (LLA) achieves this by linearizing the DNN and applying Laplace inference to the resulting model. Importantly, the linear model is also used for prediction. We argue this linearization in the posterior may degrade fidelity to the true Laplace approximation. To alleviate this problem, without increasing significantly the computational cost, we propose the Quadratic Laplace Approximation (QLA). QLA approximates each second order factor in the approximate Laplace log-posterior using a rank-one factor obtained via efficient power iterations. QLA is expected to yield a posterior precision closer to that of the full Laplace without forming the full Hessian, which is typically intractable. For prediction, QLA also uses the linearized model. Empirically, QLA yields modest yet consistent uncertainty estimation improvements over LLA on five regression datasets.
- [823] arXiv:2602.03405 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Enhancing Quantum Diffusion Models for Complex Image GenerationComments: 18 pages, 6 figuresSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
Quantum generative models offer a novel approach to exploring high-dimensional Hilbert spaces but face significant challenges in scalability and expressibility when applied to multi-modal distributions. In this study, we explore a Hybrid Quantum-Classical U-Net architecture integrated with Adaptive Non-Local Observables (ANO) as a potential solution to these hurdles. By compressing classical data into a dense quantum latent space and utilizing trainable observables, our model aims to extract non-local features that complement classical processing. We also investigate the role of Skip Connections in preserving semantic information during the reverse diffusion process. Experimental results on the full MNIST dataset (digits 0-9) demonstrate that the proposed architecture is capable of generating structurally coherent and recognizable images for all digit classes. While hardware constraints still impose limitations on resolution, our findings suggest that hybrid architectures with adaptive measurements provide a feasible pathway for mitigating mode collapse and enhancing generative capabilities in the NISQ era.
- [824] arXiv:2602.03438 (cross-list from cond-mat.mtrl-sci) [pdf, ps, other]
-
Title: Acceleration of Atomistic NEGF: Algorithms, Parallelization, and Machine LearningAuthors: Mathieu Luisier, Nicolas Vetsch, Alexander Maeder, Vincent Maillou, Anders Winka, Leonard Deuschle, Chen Hao Xia, Manasa Kaniselvan, Marko Mladenovic, Jiang Cao, Alexandros Nikolaos ZiogasSubjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
The Non-equilibrium Green's function (NEGF) formalism is a particularly powerful method to simulate the quantum transport properties of nanoscale devices such as transistors, photo-diodes, or memory cells, in the ballistic limit of transport or in the presence of various scattering sources such as electronphonon, electron-photon, or even electron-electron interactions. The inclusion of all these mechanisms has been first demonstrated in small systems, composed of a few atoms, before being scaled up to larger structures made of thousands of atoms. Also, the accuracy of the models has kept improving, from empirical to fully ab-initio ones, e.g., density functional theory (DFT). This paper summarizes key (algorithmic) achievements that have allowed us to bring DFT+NEGF simulations closer to the dimensions and functionality of realistic systems. The possibility of leveraging graph neural networks and machine learning to speed up ab-initio device simulations is discussed as well.
- [825] arXiv:2602.03449 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Score-based diffusion models for diffuse optical tomography with uncertainty quantificationAuthors: Fabian Schneider, Meghdoot Mozumder, Konstantin Tamarov, Leila Taghizadeh, Tanja Tarvainen, Tapio Helin, Duc-Lam DuongSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
Score-based diffusion models are a recently developed framework for posterior sampling in Bayesian inverse problems with a state-of-the-art performance for severely ill-posed problems by leveraging a powerful prior distribution learned from empirical data. Despite generating significant interest especially in the machine-learning community, a thorough study of realistic inverse problems in the presence of modelling error and utilization of physical measurement data is still outstanding. In this work, the framework of unconditional representation for the conditional score function (UCoS) is evaluated for linearized difference imaging in diffuse optical tomography (DOT). DOT uses boundary measurements of near-infrared light to estimate the spatial distribution of absorption and scattering parameters in biological tissues. The problem is highly ill-posed and thus sensitive to noise and modelling errors. We introduce a novel regularization approach that prevents overfitting of the score function by constructing a mixed score composed of a learned and a model-based component. Validation of this approach is done using both simulated and experimental measurement data. The experiments demonstrate that a data-driven prior distribution results in posterior samples with low variance, compared to classical model-based estimation, and centred around the ground truth, even in the context of a highly ill-posed problem and in the presence of modelling errors.
- [826] arXiv:2602.03460 (cross-list from math.OC) [pdf, ps, other]
-
Title: Cholesky factorisation, and intrinsically sparse linear quadratic regulationComments: 15 pages, 7 figures, under reviewSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
We classify a family of matrices of shift operators that can be factorised in a computationally tractable manner with the Cholesky algorithm. Such matrices arise in the linear quadratic regulator problem, and related areas. We use the factorisation to uncover intrinsic sparsity properties in the control laws for transportation problems with an underlying tree structure. This reveals that the optimal control can be applied in a distributed manner that is obscured by standard solution methods.
- [827] arXiv:2602.03508 (cross-list from math.OC) [pdf, ps, other]
-
Title: A necessary and sufficient condition for discrete-time consensus on star boundariesComments: 14 pages, 8 figuresSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
It is intuitive and well known, that if agents in a multi-agent system iteratively update their states in the Euclidean space as convex combinations of neighbors' states, all states eventually converge to the same value (consensus), provided the interaction graph is sufficiently connected. However, this seems to be also true in practice if the convex combinations of states are mapped or radially projected onto any unit $l_p$-sphere or even boundaries of star-convex sets, herein referred to as star boundaries. In this paper, we present insight into this matter by providing a necessary and sufficient condition for asymptotic consensus of the normalized states (directions) for strongly connected directed graphs, which is equivalent to asymptotic consensus of states when the star boundaries are the same for all agents. Furthermore, we show that when asymptotic consensus occurs, the states converge linearly and the point of convergence is continuous in the initial states. Assuming a directed strongly connected graph provides a more general setting than that considered, for example, in gradient-based consensus protocols, where symmetric graphs are assumed. Illustrative examples and a vast number of numerical simulations showcase the theoretical results.
- [828] arXiv:2602.03581 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Low-Complexity Distributed Combining Design for Near-Field Cell-Free XL-MIMO SystemsComments: 15 pages, 10 figures, to appear in IEEE Transactions on Wireless CommunicationsSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In this paper, we investigate the low-complexity distributed combining scheme design for near-field cell-free extremely large-scale multiple-input-multiple-output (CF XL-MIMO) systems. Firstly, we construct the uplink spectral efficiency (SE) performance analysis framework for CF XL-MIMO systems over centralized and distributed processing schemes. Notably, we derive the centralized minimum mean-square error (CMMSE) and local minimum mean-square error (LMMSE) combining schemes over arbitrary channel estimators. Then, focusing on the CMMSE and LMMSE combining schemes, we propose five low-complexity distributed combining schemes based on the matrix approximation methodology or the symmetric successive over relaxation (SSOR) algorithm. More specifically, we propose two matrix approximation methodology-aided combining schemes: Global Statistics \& Local Instantaneous information-based MMSE (GSLI-MMSE) and Statistics matrix Inversion-based LMMSE (SI-LMMSE). These two schemes are derived by approximating the global instantaneous information in the CMMSE combining and the local instantaneous information in the LMMSE combining with the global and local statistics information by asymptotic analysis and matrix expectation approximation, respectively. Moreover, by applying the low-complexity SSOR algorithm to iteratively solve the matrix inversion in the LMMSE combining, we derive three distributed SSOR-based LMMSE combining schemes, distinguished from the applied information and initial values.
- [829] arXiv:2602.03590 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Statistics Approximation-Enabled Distributed Beamforming for Cell-Free Massive MIMOComments: 6 pages, 3 figures, accepted by IEEE International Conference on Communications (ICC) 2026Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
We study a distributed beamforming approach for cell-free massive multiple-input multiple-output networks, referred to as Global Statistics \& Local Instantaneous information-based minimum mean-square error (GSLI-MMSE). The scenario with multi-antenna access points (APs) is considered over three different channel models: correlated Rician fading with fixed or random line-of-sight (LoS) phase-shifts, and correlated Rayleigh fading. With the aid of matrix inversion derivations, we can construct the conventional MMSE combining from the perspective of each AP, where global instantaneous information is involved. Then, for an arbitrary AP, we apply the statistics approximation methodology to approximate instantaneous terms related to other APs by channel statistics to construct the distributed combining scheme at each AP with local instantaneous information and global statistics. With the aid of uplink-downlink duality, we derive the respective GSLI-MMSE precoding schemes. Numerical results showcase that the proposed GSLI-MMSE scheme demonstrates performance comparable to the optimal centralized MMSE scheme, under the stable LoS conditions, e.g., with static users having Rician fading with a fixed LoS path.
- [830] arXiv:2602.03612 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Generator-based Graph Generation via Heat DiffusionComments: Submitted to ICML; 8+15 pages; 20 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Graph generative modelling has become an essential task due to the wide range of applications in chemistry, biology, social networks, and knowledge representation. In this work, we propose a novel framework for generating graphs by adapting the Generator Matching (arXiv:2410.20587) paradigm to graph-structured data. We leverage the graph Laplacian and its associated heat kernel to define a continous-time diffusion on each graph. The Laplacian serves as the infinitesimal generator of this diffusion, and its heat kernel provides a family of conditional perturbations of the initial graph. A neural network is trained to match this generator by minimising a Bregman divergence between the true generator and a learnable surrogate. Once trained, the surrogate generator is used to simulate a time-reversed diffusion process to sample new graph structures. Our framework unifies and generalises existing diffusion-based graph generative models, injecting domain-specific inductive bias via the Laplacian, while retaining the flexibility of neural approximators. Experimental studies demonstrate that our approach captures structural properties of real and synthetic graphs effectively.
- [831] arXiv:2602.03613 (cross-list from stat.ME) [pdf, ps, other]
-
Title: Simulation-Based Inference via Regression Projection and Batched DiscrepanciesComments: comments are welcome,Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
We analyze a lightweight simulation-based inference method that infers simulator parameters using only a regression-based projection of the observed data. After fitting a surrogate linear regression once, the procedure simulates small batches at the proposed parameter values and assigns kernel weights based on the resulting batch-residual discrepancy, producing a self-normalized pseudo-posterior that is simple, parallelizable, and requires access only to the fitted regression coefficients rather than raw observations. We formalize the construction as an importance-sampling approximation to a population target that averages over simulator randomness, prove consistency as the number of parameter draws grows, and establish stability in estimating the surrogate regression from finite samples. We then characterize the asymptotic concentration as the batch size increases and the bandwidth shrinks, showing that the pseudo-posterior concentrates on an identified set determined by the chosen projection, thereby clarifying when the method yields point versus set identification. Experiments on a tractable nonlinear model and on a cosmological calibration task using the DREAMS simulation suite illustrate the computational advantages of regression-based projections and the identifiability limitations arising from low-information summaries.
- [832] arXiv:2602.03624 (cross-list from eess.SP) [pdf, ps, other]
-
Title: A Multi-decoder Neural Tracking Method for Accurately Predicting Speech IntelligibilitySubjects: Signal Processing (eess.SP); Sound (cs.SD)
Objective: EEG-based methods can predict speech intelligibility, but their accuracy and robustness lag behind behavioral tests, which typically show test-retest differences under 1 dB. We introduce the multi-decoder method to predict speech reception thresholds (SRTs) from EEG recordings, enabling objective assessment for populations unable to perform behavioral tests; such as those with disorders of consciousness or during hearing aid fitting. Approach: The method aggregates data from hundreds of decoders, each trained on different speech features and EEG preprocessing setups to quantify neural tracking (NT) of speech signals. Using data from 39 participants (ages 18-24), we recorded 29 minutes of EEG per person while they listened to speech at six signal-to-noise ratios and a quiet story. NT values were combined into a high-dimensional feature vector per subject, and a support vector regression model was trained to predict SRTs from these vectors. Main Result: Predictions correlated significantly with behavioral SRTs (r = 0.647, p < 0.001; NRMSE = 0.19), with all differences under 1 dB. SHAP analysis showed theta/delta bands and early lags had slightly greater influence. Using pretrained subject-independent decoders reduced required EEG data collection to 15 minutes (3 minutes of story, 12 minutes across six SNR conditions) without losing accuracy.
- [833] arXiv:2602.03654 (cross-list from nlin.AO) [pdf, ps, other]
-
Title: Noisy nonlocal aggregation model with gradient flow structuresComments: 15 pages; 4 figuresSubjects: Adaptation and Self-Organizing Systems (nlin.AO); Numerical Analysis (math.NA); Physics and Society (physics.soc-ph)
Interacting particle systems provide a fundamental framework for modeling collective behavior in biological, social, and physical systems. In many applications, stochastic perturbations are essential for capturing environmental variability and individual uncertainty, yet their impact on long-term dynamics and equilibrium structure remains incompletely understood, particularly in the presence of nonlocal interactions. We investigate a stochastic interacting particle system governed by potential-driven interactions and its continuum density formulation in the large-population limit. We introduce an energy functional and show that the macroscopic density evolution has a gradient-flow structure in the Wasserstein-2 space. The associated variational framework yields equilibrium states through constrained energy minimization and illustrates how noise regulates the density and mitigates singular concentration. We demonstrate the connection between microscopic and macroscopic descriptions through numerical examples in one and two dimensions. Within the variational framework, we compute energy minimizers and perform a linear stability analysis. The numerical results show that the stable minimizers agree with the long-time dynamics of the macroscopic density model.
- [834] arXiv:2602.03682 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCASubjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Numerical Analysis (math.NA)
We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in decentralized PCA. While previous works have established that acceleration can improve convergence rates compared to the standard Noisy Power Method, these guarantees require overly restrictive upper bounds on the magnitude of the perturbations, limiting their practical applicability. We provide an improved analysis of this algorithm, which preserves the accelerated convergence rate under much milder conditions on the perturbations. We show that our new analysis is worst-case optimal, in the sense that the convergence rate cannot be improved, and that the noise conditions we derive cannot be relaxed without sacrificing convergence guarantees. We demonstrate the practical relevance of our results by deriving an accelerated algorithm for decentralized PCA, which has similar communication costs to non-accelerated methods. To our knowledge, this is the first decentralized algorithm for PCA with provably accelerated convergence.
- [835] arXiv:2602.03684 (cross-list from math.DG) [pdf, ps, other]
-
Title: Point Vortex Dynamics on Closed SurfacesAuthors: Marcel PadillaComments: Master Thesis, Technical University of BerlinSubjects: Differential Geometry (math.DG); Computational Geometry (cs.CG); Graphics (cs.GR); Dynamical Systems (math.DS); Fluid Dynamics (physics.flu-dyn)
The theory of point vortex dynamics has existed since Kirchhoff's proposal in 1891 and is still under development with connections to many fields in mathematics. As a strong simplification of the concept of vorticity it excels in computational speed for vorticity based fluid simulations at the cost of accuracy. Recent finding by Stefanella Boatto and Jair Koiller allowed the extension of this theory on to closed surfaces. A comprehensive guide to point vortex dynamics on closed surfaces with genus zero and vanishing total vorticity is presented here. Additionally fundamental knowledge of fluid dynamics and surfaces are explained in a way to unify the theory of point vortex dynamics of the plane, the sphere and closed surfaces together with implementation details and supplement material.
- [836] arXiv:2602.03711 (cross-list from eess.SP) [pdf, ps, other]
-
Title: VR-VFL: Joint Rate and Client Selection for Vehicular Federated Learning Under Imperfect CSIComments: This paper has been accepted for presentation at IEEE ICC 2026Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Federated learning in vehicular edge networks faces major challenges in efficient resource allocation, largely due to high vehicle mobility and the presence of imperfect channel state information. Many existing methods oversimplify these realities, often assuming fixed communication rounds or ideal channel conditions, which limits their effectiveness in real-world scenarios. To address this, we propose variable rate vehicular federated learning (VR-VFL), a novel federated learning method designed specifically for vehicular networks under imperfect channel state information. VR-VFL combines dynamic client selection with adaptive transmission rate selection, while also allowing round times to flex in response to changing wireless conditions. At its core, VR-VFL is built on a bi-objective optimization framework that strikes a balance between improving learning convergence and minimizing the time required to complete each round. By accounting for both the challenges of mobility and realistic wireless constraints, VR-VFL offers a more practical and efficient approach to federated learning in vehicular edge networks. Simulation results show that the proposed VR-VFL scheme achieves convergence approximately 40% faster than other methods in the literature.
- [837] arXiv:2602.03718 (cross-list from eess.SP) [pdf, ps, other]
-
Title: A Narrowband Fully-Analog Multi-Antenna TransmitterAuthors: Nikola ZlatanovSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
This paper proposes a narrowband fully-analog $N$-antenna transmitter that emulates the functionality of a narrowband fully-digital $N$-antenna transmitter. Specifically, in symbol interval $m$, the proposed fully-analog transmitter synthesizes an arbitrary complex excitation vector $\bm x[m]\in\mathbb{C}^N$ with prescribed total power $\|\bm x[m]\|_2^2=P$ from a single coherent RF tone, using only tunable phase-control elements embedded in a passive interferometric programmable network. The programmable network is excited through one input port while the remaining $N - 1$ input ports are impedance matched. In the ideal lossless case, the network transfer is unitary and therefore redistributes RF power among antenna ports without dissipative amplitude control.
The synthesis task is posed as a unitary state-preparation problem: program a unitary family so that $\bm V(\bm\varphi)\bm e_1=\bm c$, where $\bm c=\bm x/\sqrt{P}$ and $\|\bm c\|_2=1$. We provide a constructive realization and a closed-form programming rule: a binary magnitude-splitting tree allocates the desired per-antenna magnitudes $|c_n|$ using $N -1$ tunable split ratios, and a per-antenna output phase bank assigns the target phases using $N$ tunable phase shifts. The resulting architecture uses $2N-1$ real tunable degrees of freedom and admits a deterministic $O(N)$ programming procedure with no iterative optimization, enabling symbol-by-symbol updates when the chosen phase-control technology supports the required tuning speed.
Using representative COTS components, we model the RF-front-end DC power of the proposed fully-analog transmitter and compare it against an equivalent COTS fully-digital array. For $N\le 16$, the comparison indicates significant RF-front-end power savings for the fully-analog architecture.
The results in this paper are intended as a proof-of-concept for a narrowband fully-analog transmitter. - [838] arXiv:2602.03725 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Quantum Speedups for Derivative Pricing Beyond Black-ScholesAuthors: Dylan Herman, Yue Sun, Jin-Peng Liu, Marco Pistoia, Charlie Che, Rob Otter, Shouvanik Chakrabarti, Aram HarrowSubjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS); Computational Finance (q-fin.CP); Mathematical Finance (q-fin.MF)
This paper explores advancements in quantum algorithms for derivative pricing of exotics, a computational pipeline of fundamental importance in quantitative finance. For such cases, the classical Monte Carlo integration procedure provides the state-of-the-art provable, asymptotic performance: polynomial in problem dimension and quadratic in inverse-precision. While quantum algorithms are known to offer quadratic speedups over classical Monte Carlo methods, end-to-end speedups have been proven only in the simplified setting over the Black-Scholes geometric Brownian motion (GBM) model. This paper extends existing frameworks to demonstrate novel quadratic speedups for more practical models, such as the Cox-Ingersoll-Ross (CIR) model and a variant of Heston's stochastic volatility model, utilizing a characteristic of the underlying SDEs which we term fast-forwardability. Additionally, for general models that do not possess the fast-forwardable property, we introduce a quantum Milstein sampler, based on a novel quantum algorithm for sampling L\'evy areas, which enables quantum multi-level Monte Carlo to achieve quadratic speedups for multi-dimensional stochastic processes exhibiting certain correlation types.
We also present an improved analysis of numerical integration for derivative pricing, leading to substantial reductions in the resource requirements for pricing GBM and CIR models. Furthermore, we investigate the potential for additional reductions using arithmetic-free quantum procedures. Finally, we critique quantum partial differential equation (PDE) solvers as a method for derivative pricing based on amplitude estimation, identifying theoretical barriers that obstruct achieving a quantum speedup through this approach. Our findings significantly advance the understanding of quantum algorithms in derivative pricing, addressing key challenges and open questions in the field. - [839] arXiv:2602.03730 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Efficient Variance-reduced Estimation from Generative EHR Models: The SCOPE and REACH EstimatorsAuthors: Luke Solo, Matthew B.A. McDermott, William F. Parker, Bashar Ramadan, Michael C. Burkhart, Brett K. Beaulieu-JonesComments: 10 pages, 2 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Generative models trained using self-supervision of tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction. This is typically done using Monte Carlo simulation for future patient trajectories. However, existing approaches suffer from three key limitations: sparse estimate distributions that poorly differentiate patient risk levels, extreme computational costs, and high sampling variance. We propose two new estimators: the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional Hazards (REACH), that leverage next-token probability distributions discarded by standard Monte Carlo. We prove both estimators are unbiased and that REACH guarantees variance reduction over Monte Carlo sampling for any model and outcome. Empirically, on hospital mortality prediction in MIMIC-IV using the ETHOS-ARES framework, SCOPE and REACH match 100-sample Monte Carlo performance using only 10-11 samples (95% CI: [9,11]), representing a ~10x reduction in inference cost without degrading calibration. For ICU admission prediction, efficiency gains are more modest (~1.2x), which we attribute to the outcome's lower "spontaneity," a property we characterize theoretically and empirically. These methods substantially improve the feasibility of deploying generative EHR models in resource-constrained clinical settings.
- [840] arXiv:2602.03744 (cross-list from physics.med-ph) [pdf, ps, other]
-
Title: Reducing acquisition time and radiation damage: data-driven subsampling for spectro-microscopySubjects: Medical Physics (physics.med-ph); Numerical Analysis (math.NA); Optics (physics.optics)
Spectro-microscopy is an experimental technique which can be used to observe spatial variations in chemical state and changes in chemical state over time or under experimental conditions. As a result it has broad applications across areas such as energy materials, catalysis, environmental science and biological samples. However, the technique is often limited by factors such as long acquisition times and radiation damage. We present two measurement strategies that allow for significantly shorter experiment times and total doses applied. The strategies are based on taking only a small subset of all the measurements (e.g. sparse acquisition or subsampling), and then computationally reconstructing all unobserved measurements using mathematical techniques. The methods are data-driven, using spectral and spatial importance subsampling distributions to identify important measurements. As a result, taking as little as 4-6\% of the measurements is sufficient to capture the same information as in a conventional scan.
- [841] arXiv:2602.03746 (cross-list from math.CO) [pdf, ps, other]
-
Title: Factor-balancedness, linear recurrence, and factor complexityComments: 43 pages, 3 figuresSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Dynamical Systems (math.DS)
In the study of infinite words, various notions of balancedness provide quantitative measures for how regularly letters or factors occur, and they find applications in several areas of mathematics and theoretical computer science. In this paper, we study factor-balancedness and uniform factor-balancedness, making two main contributions. First, we establish general sufficient conditions for an infinite word to be (uniformly) factor-balanced, applicable in particular to any given linearly recurrent word. These conditions are formulated in terms of $\mathcal{S}$-adic representations and generalize results of Adamczewski on primitive substitutive words, which show that balancedness of length-2 factors already implies uniform factor-balancedness. As an application of our criteria, we characterize the Sturmian words and ternary Arnoux--Rauzy words that are uniformly factor-balanced as precisely those with bounded weak partial quotients. Our second main contribution is a study of the relationship between factor-balancedness and factor complexity. In particular, we analyze the non-primitive substitutive case and construct an example of a factor-balanced word with exponential factor complexity, thereby making progress on a question raised in 2025 by Arnoux, Berth\'e, Minervino, Steiner, and Thuswaldner on the relation between balancedness and discrete spectrum.
- [842] arXiv:2602.03762 (cross-list from eess.AS) [pdf, ps, other]
-
Title: Conditional Flow Matching for Visually-Guided Acoustic HighlightingSubjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
Visually-guided acoustic highlighting seeks to rebalance audio in alignment with the accompanying video, creating a coherent audio-visual experience. While visual saliency and enhancement have been widely studied, acoustic highlighting remains underexplored, often leading to misalignment between visual and auditory focus. Existing approaches use discriminative models, which struggle with the inherent ambiguity in audio remixing, where no natural one-to-one mapping exists between poorly-balanced and well-balanced audio mixes. To address this limitation, we reframe this task as a generative problem and introduce a Conditional Flow Matching (CFM) framework. A key challenge in iterative flow-based generation is that early prediction errors -- in selecting the correct source to enhance -- compound over steps and push trajectories off-manifold. To address this, we introduce a rollout loss that penalizes drift at the final step, encouraging self-correcting trajectories and stabilizing long-range flow integration. We further propose a conditioning module that fuses audio and visual cues before vector field regression, enabling explicit cross-modal source selection. Extensive quantitative and qualitative evaluations show that our method consistently surpasses the previous state-of-the-art discriminative approach, establishing that visually-guided audio remixing is best addressed through generative modeling.
- [843] arXiv:2602.03776 (cross-list from q-fin.CP) [pdf, ps, other]
-
Title: DiffLOB: Diffusion Models for Counterfactual Generation in Limit Order BooksComments: 12 pages, 8 figuresSubjects: Computational Finance (q-fin.CP); Artificial Intelligence (cs.AI)
Modern generative models for limit order books (LOBs) can reproduce realistic market dynamics, but remain fundamentally passive: they either model what typically happens without accounting for hypothetical future market conditions, or they require interaction with another agent to explore alternative outcomes. This limits their usefulness for stress testing, scenario analysis, and decision-making. We propose \textbf{DiffLOB}, a regime-conditioned \textbf{Diff}usion model for controllable and counterfactual generation of \textbf{LOB} trajectories. DiffLOB explicitly conditions the generative process on future market regimes--including trend, volatility, liquidity, and order-flow imbalance, which enables the model to answer counterfactual queries of the form: ``If the future market regime were X instead of Y, how would the limit order book evolve?'' Our systematic evaluation framework for counterfactual LOB generation consists of three criteria: (1) \textit{Controllable Realism}, measuring how well generated trajectories can reproduce marginal distributions, temporal dependence structure and regime variables; (2) \textit{Counterfactual validity}, testing whether interventions on future regimes induce consistent changes in the generated LOB dynamics; (3) \textit{Counterfactual usefulness}, assessing whether synthetic counterfactual trajectories improve downstream prediction of future market regimes.
- [844] arXiv:2602.03789 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Fast Sampling for Flows and Diffusions with Lazy and Point Mass Stochastic InterpolantsSubjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Stochastic interpolants unify flows and diffusions, popular generative modeling frameworks. A primary hyperparameter in these methods is the interpolation schedule that determines how to bridge a standard Gaussian base measure to an arbitrary target measure. We prove how to convert a sample path of a stochastic differential equation (SDE) with arbitrary diffusion coefficient under any schedule into the unique sample path under another arbitrary schedule and diffusion coefficient. We then extend the stochastic interpolant framework to admit a larger class of point mass schedules in which the Gaussian base measure collapses to a point mass measure. Under the assumption of Gaussian data, we identify lazy schedule families that make the drift identically zero and show that with deterministic sampling one gets a variance-preserving schedule commonly used in diffusion models, whereas with statistically optimal SDE sampling one gets our point mass schedule. Finally, to demonstrate the usefulness of our theoretical results on realistic highly non-Gaussian data, we apply our lazy schedule conversion to a state-of-the-art pretrained flow model and show that this allows for generating images in fewer steps without retraining the model.
- [845] arXiv:2602.03823 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Preference-based Conditional Treatment Effects and Policy LearningComments: Accepted to AISTATS 2026; 10 pages + appendixSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We introduce a new preference-based framework for conditional treatment effect estimation and policy learning, built on the Conditional Preference-based Treatment Effect (CPTE). CPTE requires only that outcomes be ranked under a preference rule, unlocking flexible modeling of heterogeneous effects with multivariate, ordinal, or preference-driven outcomes. This unifies applications such as conditional probability of necessity and sufficiency, conditional Win Ratio, and Generalized Pairwise Comparisons. Despite the intrinsic non-identifiability of comparison-based estimands, CPTE provides interpretable targets and delivers new identifiability conditions for previous unidentifiable estimands. We present estimation strategies via matching, quantile, and distributional regression, and further design efficient influence-function estimators to correct plug-in bias and maximize policy value. Synthetic and semi-synthetic experiments demonstrate clear performance gains and practical impact.
- [846] arXiv:2602.03824 (cross-list from q-bio.PE) [pdf, ps, other]
-
Title: Deep-learning-based pan-phenomic data reveals the explosive evolution of avian visual disparityAuthors: Jiao SunComments: Readers from the field of computer science may be interested in section 2.1, 2.2, 3.1, 4.1, 4.2. These sections discussed the interpretability and representation learning, especially the texture vs shape problem, highlighting our model's ability of overcoming the texture biases and capturing overall shape features. (Although they're put here to prove the biological validity of the model.)Subjects: Populations and Evolution (q-bio.PE); Computer Vision and Pattern Recognition (cs.CV)
The evolution of biological morphology is critical for understanding the diversity of the natural world, yet traditional analyses often involve subjective biases in the selection and coding of morphological traits. This study employs deep learning techniques, utilising a ResNet34 model capable of recognising over 10,000 bird species, to explore avian morphological evolution. We extract weights from the model's final fully connected (fc) layer and investigate the semantic alignment between the high-dimensional embedding space learned by the model and biological phenotypes. The results demonstrate that the high-dimensional embedding space encodes phenotypic convergence. Subsequently, we assess the morphological disparity among various taxa and evaluate the association between morphological disparity and species richness, demonstrating that species richness is the primary driver of morphospace expansion. Moreover, the disparity-through-time analysis reveals a visual "early burst" after the K-Pg extinction.
While mainly aimed at evolutionary analysis, this study also provides insights into the interpretability of Deep Neural Networks. We demonstrate that hierarchical semantic structures (biological taxonomy) emerged in the high-dimensional embedding space despite being trained on flat labels. Furthermore, through adversarial examples, we provide evidence that our model in this task can overcome texture bias and learn holistic shape representations (body plans), challenging the prevailing view that CNNs rely primarily on local textures. - [847] arXiv:2602.03833 (cross-list from math.CO) [pdf, ps, other]
-
Title: Excluding an apex-forest or a fan as quickly as possibleSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
We show that every graph $G$ excluding an apex-forest $H$ as a minor has layered pathwidth at most $|V(H)|-2$, and that every graph $G$ excluding an apex-linear forest (such as a fan) $H$ as a minor has layered treedepth at most $|V(H)|-2$. We further show that both bounds are optimal. These results improve on recent results of Hodor, La, Micek, and Rambaud (2025): The first result improves the previous best-known bound by a multiplicative factor of $2$, while the second strengthens a previous quadratic bound. In addition, we reduce from quadratic to linear the bound on the $S$-focused treedepth $\mathrm{td}(G,S)$ for graphs $G$ with a prescribed set of vertices $S$ excluding models of paths in which every branch set intersects~$S$.
Replacements for Wed, 4 Feb 26
- [848] arXiv:1711.00282 (replaced) [pdf, ps, other]
-
Title: Inapproximability of the independent set polynomial in the complex planeComments: The proof of Lemma 12 doesn't work as written here since the value returned by Phi_i in GetPoint (p17) may lie outside of B(z_0,r). See Lemma 5.3 of arXiv:2512.11504 by Bencs, Piombi, and Regts, where this is fixed via contraction over B(m,3r) (their modified Lemma can be used to establish Propositions 6 and 15, see Remark 5.4 of their paper). We thank them for pointing this outSubjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM)
- [849] arXiv:2110.08902 (replaced) [pdf, ps, other]
-
Title: On the Convergence of Experience Replay in Policy Optimization: Characterizing Bias, Variance, and Finite-Time ConvergenceComments: 37 pages; v5 retains only the portion of v4 covering the theoretical results on experience replaySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [850] arXiv:2201.02514 (replaced) [pdf, ps, other]
-
Title: Efficiency of ANS Entropy EncodersAuthors: Dmitry KosolobovComments: 25 pages, 5 figures, 1 table, 2 listingsSubjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)
- [851] arXiv:2202.11663 (replaced) [pdf, ps, other]
-
Title: Fast Reconfiguration for Programmable MatterSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Geometry (cs.CG)
- [852] arXiv:2301.07473 (replaced) [pdf, ps, other]
-
Title: Discrete Latent Structure in Neural NetworksSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [853] arXiv:2301.12412 (replaced) [pdf, ps, other]
-
Title: Contextual Causal Bayesian OptimisationJournal-ref: International Conference on Learning Representations 2026Subjects: Machine Learning (cs.LG)
- [854] arXiv:2304.04914 (replaced) [pdf, ps, other]
-
Title: Regulatory Markets: The Future of AI GovernanceJournal-ref: Jurimetrics: The Journal of Law, Science and Technology, Volume 65 pp. 195-240 (2026)Subjects: Artificial Intelligence (cs.AI); General Economics (econ.GN)
- [855] arXiv:2305.11408 (replaced) [pdf, ps, other]
-
Title: AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech TranslationJournal-ref: Proceedings of INTERSPEECH 2023Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [856] arXiv:2403.13393 (replaced) [pdf, ps, other]
-
Title: Causal Graph Dynamics and Kan ExtensionsSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM); Multiagent Systems (cs.MA)
- [857] arXiv:2405.00789 (replaced) [pdf, ps, other]
-
Title: Classically Spoofing System Linear Cross Entropy Score BenchmarkingComments: 29 pagesSubjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC)
- [858] arXiv:2405.02162 (replaced) [pdf, ps, other]
-
Title: Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation ModelsJournal-ref: Robotics, vol. 15, no. 2, article 31, 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [859] arXiv:2405.09125 (replaced) [pdf, ps, other]
-
Title: HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text RecognitionComments: 12 pages, 12 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [860] arXiv:2405.14273 (replaced) [pdf, ps, other]
-
Title: Exact Solution to Data-Driven Inverse Optimization of MILPs in Finite Time via Gradient-Based MethodsAuthors: Akira KitaokaComments: 42 pages; comments are welcomeSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
- [861] arXiv:2405.15743 (replaced) [pdf, ps, other]
-
Title: Sparse maximal update parameterization: A holistic approach to sparse training dynamicsComments: 10 pages main text, 10 pages reference and appendix, 14 figures, NeurIPS Camera-ReadySubjects: Machine Learning (cs.LG)
- [862] arXiv:2405.18605 (replaced) [pdf, ps, other]
-
Title: Merged ChemProt-DrugProt for Relation Extraction from Biomedical LiteratureSubjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Molecular Networks (q-bio.MN)
- [863] arXiv:2406.13930 (replaced) [pdf, ps, other]
-
Title: ME-IGM: Individual-Global-Max in Maximum Entropy Multi-Agent Reinforcement LearningComments: Published in the Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)Journal-ref: Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), Paphos, Cyprus, May 25 - 29, 2026, IFAAMAS, 19 pagesSubjects: Machine Learning (cs.LG)
- [864] arXiv:2406.15163 (replaced) [pdf, ps, other]
-
Title: A Syntax-Injected Approach for Faster and More Accurate Sentiment AnalysisSubjects: Computation and Language (cs.CL)
- [865] arXiv:2407.03094 (replaced) [pdf, ps, other]
-
Title: Conformal Prediction for Causal Effects of Continuous TreatmentsAuthors: Maresa Schröder, Dennis Frauen, Jonas Schweisthal, Konstantin Heß, Valentyn Melnychuk, Stefan FeuerriegelComments: Accepted at NeurIPS 2025Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
- [866] arXiv:2408.00436 (replaced) [pdf, ps, other]
-
Title: A Search for High-Threshold Qutrit Magic State Distillation RoutinesComments: 31 pages, 5 figures, one ancillary fileSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Combinatorics (math.CO)
- [867] arXiv:2409.00592 (replaced) [pdf, ps, other]
-
Title: Hyper-Compression: Model Compression via HyperfunctionAuthors: Fenglei Fan, Juntong Fan, Dayang Wang, Jingbo Zhang, Zelin Dong, Shijun Zhang, Ge Wang, Tieyong ZengSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
- [868] arXiv:2409.15113 (replaced) [pdf, ps, other]
-
Title: Inferring Scientific Cross-Document Coreference and Hierarchy with Definition-Augmented Relational ReasoningComments: Accepted to TACL. Pre-MIT Press publication versionSubjects: Computation and Language (cs.CL)
- [869] arXiv:2410.01615 (replaced) [pdf, ps, other]
-
Title: Saliency-Guided DETR for Moment Retrieval and Highlight DetectionComments: 8 pages, 2 figure, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [870] arXiv:2410.03481 (replaced) [pdf, ps, other]
-
Title: Compact LED-Based Displacement Sensing for Robot FingersSubjects: Robotics (cs.RO)
- [871] arXiv:2410.04779 (replaced) [pdf, ps, other]
-
Title: Fast Training of Sinusoidal Neural Fields via Scaling InitializationComments: ICLR 2025Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [872] arXiv:2410.23222 (replaced) [pdf, ps, other]
-
Title: Dataset-Driven Channel Masks in Transformers for Multivariate Time SeriesComments: ICASSP 2026. Preliminary version: NeurIPS Workshop on Time Series in the Age of Large Models 2024 (Oral presentation)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
- [873] arXiv:2411.06158 (replaced) [pdf, ps, other]
-
Title: Quantization Meets Projection: A Happy Marriage for Approximate k-Nearest Neighbor SearchComments: Accepted at VLDB2026, Technical ReportSubjects: Databases (cs.DB)
- [874] arXiv:2411.06501 (replaced) [pdf, ps, other]
-
Title: Individual Regret in Cooperative Stochastic Multi-Armed BanditsComments: 55 pages, 1 figureSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [875] arXiv:2411.12992 (replaced) [pdf, ps, other]
-
Title: MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected LayersAuthors: Ning Ding, Yehui Tang, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Heng Liao, Yunhe WangComments: NeurIPS 2024. Code available at this https URLSubjects: Computation and Language (cs.CL)
- [876] arXiv:2411.14349 (replaced) [pdf, ps, other]
-
Title: Agnostic Learning of Arbitrary ReLU Activation under Gaussian MarginalsSubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
- [877] arXiv:2412.10625 (replaced) [pdf, ps, other]
-
Title: Certainty-Equivalence Model Predictive Control: Stability, Performance, and BeyondComments: To appear in IEEE Transactions on Automatic Control (July 2026)Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
- [878] arXiv:2412.20418 (replaced) [pdf, ps, other]
-
Title: Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and AlignmentAuthors: Shiyun Chen, Li Lin, Pujin Cheng, ZhiCheng Jin, JianJian Chen, HaiDong Zhu, Kenneth K. Y. Wong, Xiaoying TangComments: International Workshop on Machine Learning in Medical Imaging, 668-678Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [879] arXiv:2501.00382 (replaced) [pdf, ps, other]
-
Title: Adventures in Demand Analysis Using AIAuthors: Philipp Bach, Victor Chernozhukov, Sven Klaassen, Martin Spindler, Jan Teichert-Kluge, Suhas VijaykumarComments: 35 pages, 8 figuresSubjects: General Economics (econ.GN); Artificial Intelligence (cs.AI); Applications (stat.AP); Machine Learning (stat.ML)
- [880] arXiv:2501.01062 (replaced) [pdf, ps, other]
-
Title: Fides: Secure and Scalable Asynchronous DAG Consensus via Trusted ComponentsSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)
- [881] arXiv:2501.02770 (replaced) [pdf, ps, other]
-
Title: Multi-Agent Pathfinding Under Team-Connected Communication Constraint via Adaptive Path Expansion and Dynamic LeadingSubjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
- [882] arXiv:2501.15280 (replaced) [pdf, ps, other]
-
Title: If It's Nice, Do It Twice: We Should Try Iterative Corpus CurationAuthors: Robin YoungSubjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Computer Science and Game Theory (cs.GT)
- [883] arXiv:2501.18533 (replaced) [pdf, ps, other]
- [884] arXiv:2502.01177 (replaced) [pdf, ps, other]
-
Title: Deep Graph Learning will stall without Network ScienceSubjects: Machine Learning (cs.LG)
- [885] arXiv:2502.02542 (replaced) [pdf, ps, other]
-
Title: OverThink: Slowdown Attacks on Reasoning LLMsAuthors: Abhinav Kumar, Jaechul Roh, Ali Naseh, Marzena Karpinska, Mohit Iyyer, Amir Houmansadr, Eugene BagdasarianSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- [886] arXiv:2502.05743 (replaced) [pdf, ps, other]
-
Title: Understanding Representation Dynamics of Diffusion Models via Low-Dimensional ModelingComments: First two authors contributed equally. Accepted at NeurIPS 2025Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [887] arXiv:2502.07077 (replaced) [pdf, ps, other]
-
Title: Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language ModelsAuthors: Lujain Ibrahim, Canfer Akbulut, Rasmi Elasmar, Charvi Rastogi, Minsuk Kahng, Meredith Ringel Morris, Kevin R. McKee, Verena Rieser, Murray Shanahan, Laura WeidingerSubjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
- [888] arXiv:2502.08841 (replaced) [pdf, ps, other]
-
Title: Audit of takedown delays across social media reveals failure to reduce exposure to illegal contentAuthors: Bao Tran Truong, Sangyeon Kim, Gianluca Nogara, Enrico Verdolotti, Erfan Samieyan Sahneh, Florian Saurwein, Natascha Just, Luca Luceri, Silvia Giordano, Filippo MenczerComments: 19 pages, 9 figures, 2 tables, 42 referencesSubjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
- [889] arXiv:2502.14782 (replaced) [pdf, ps, other]
-
Title: A Neural Operator Emulator for Coastal and Riverine Shallow Water DynamicsAuthors: Peter Rivera-Casillas, Sourav Dutta, Shukai Cai, Mark Loveland, Kamaljyoti Nath, Khemraj Shukla, Corey Trahan, Jonghyun Lee, Matthew Farthing, Clint DawsonSubjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Geophysics (physics.geo-ph)
- [890] arXiv:2502.16667 (replaced) [pdf, ps, other]
-
Title: MetaSym: A Symplectic Meta-learning Framework for Physical IntelligenceAuthors: Pranav Vaidhyanathan, Aristotelis Papatheodorou, Mark T. Mitchison, Natalia Ares, Ioannis HavoutisComments: Published in Transactions on Machine Learning Research (TMLR), 10 + 18 pages, 9 figures, 10 tablesJournal-ref: Trans. Mach. Learn. Res., 2026Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Computational Physics (physics.comp-ph); Quantum Physics (quant-ph)
- [891] arXiv:2502.18179 (replaced) [pdf, ps, other]
-
Title: Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMsComments: accepted at EMNLP'25Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [892] arXiv:2503.00227 (replaced) [pdf, ps, other]
-
Title: The Learning Approach to GamesSubjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
- [893] arXiv:2503.11294 (replaced) [pdf, ps, other]
-
Title: Latent Space Representation of Electricity Market Curves: Maintaining Structural IntegrityComments: 8 pages, 3 figuresSubjects: Machine Learning (cs.LG)
- [894] arXiv:2503.11717 (replaced) [pdf, ps, other]
-
Title: LP-MPPI: Low-Pass Filtering for Efficient Model Predictive Path Integral ControlAuthors: Piotr KickiComments: Accepted at International Conference on Robotics and Automation 2026 (ICRA 2026)Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
- [895] arXiv:2503.12968 (replaced) [pdf, ps, other]
-
Title: OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli FilteringAuthors: Guanhua Ding, Yuxuan Xia, Runwei Guan, Qinchen Wu, Tao Huang, Weiping Ding, Jinping Sun, Guoqiang MaoSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [896] arXiv:2503.13745 (replaced) [pdf, ps, other]
-
Title: FedVSR: Towards Model-Agnostic Federated Learning in Video Super-ResolutionComments: Final version. Accepted at ACM Multimedia Systems (MMSys) 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
- [897] arXiv:2503.17089 (replaced) [pdf, ps, other]
-
Title: Understanding-informed Bias Mitigation for Fair CMR SegmentationAuthors: Tiarna Lee, Esther Puyol-Antón, Bram Ruijsink, Pier-Giorgio Masci, Louise Keehn, Phil Chowienczyk, Emily Haseler, Miaojing Shi, Andrew P. KingComments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) this https URLJournal-ref: Machine.Learning.for.Biomedical.Imaging. 3 (2025)Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [898] arXiv:2503.17736 (replaced) [pdf, ps, other]
-
Title: V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model InteractionAuthors: Yiming Zhao, Yu Zeng, Yukun Qi, YaoYang Liu, Xikun Bao, Lin Chen, Zehui Chen, Qing Miao, Chenxi Liu, Jie Zhao, Feng ZhaoComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [899] arXiv:2503.19859 (replaced) [pdf, ps, other]
-
Title: An Overview of Low-Rank Structures in the Training and Adaptation of Large ModelsAuthors: Laura Balzano, Tianjiao Ding, Benjamin D. Haeffele, Soo Min Kwon, Qing Qu, Peng Wang, Zhangyang Wang, Can YarasComments: Authors are listed alphabetically; 37 pages, 15 figures; minor revision at IEEE Signal Processing MagazineSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Computation (stat.CO); Machine Learning (stat.ML)
- [900] arXiv:2503.22782 (replaced) [pdf, ps, other]
-
Title: Patronus: Interpretable Diffusion Models with PrototypesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [901] arXiv:2504.02443 (replaced) [pdf, ps, other]
-
Title: Language-Integrated Recursive QueriesSubjects: Programming Languages (cs.PL)
- [902] arXiv:2504.02546 (replaced) [pdf, ps, other]
-
Title: GPG: A Simple and Strong Reinforcement Learning Baseline for Model ReasoningComments: Accepted to ICLR2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [903] arXiv:2504.03622 (replaced) [pdf, ps, other]
-
Title: Align to Structure: Aligning Large Language Models with Structural InformationAuthors: Zae Myung Kim, Anand Ramachandran, Farideh Tavazoee, Joo-Kyung Kim, Oleg Rokhlenko, Dongyeop KangComments: Accepted to AAAI 2026 AIASubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [904] arXiv:2504.16299 (replaced) [pdf, ps, other]
-
Title: Towards Quantum Universal Hypothesis TestingComments: Accepted at ITW 2025Journal-ref: Published in: ITW 2025Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)
- [905] arXiv:2504.17878 (replaced) [pdf, ps, other]
-
Title: Crypto-ncRNA: a bio-inspired post-quantum cryptographic primitive exploiting RNA folding complexityComments: Accepted at the AI4NA workshop at ICLR 2025Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
- [906] arXiv:2504.19470 (replaced) [pdf, ps, other]
-
Title: A Cautionary Note on Quantum OraclesComments: v2: added references and discussionSubjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC)
- [907] arXiv:2504.20106 (replaced) [pdf, ps, other]
-
Title: Adaptive Helpfulness-Harmlessness Alignment with Preference VectorsAuthors: Ren-Wei Liang, Chin-Ting Hsu, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua SunComments: Accepted at The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026), Rabat, Morocco. 22 pages, 5 figures, 9 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [908] arXiv:2505.04283 (replaced) [pdf, ps, other]
-
Title: On multiplicities of interpoint distancesComments: 11 pages, 4 figures, minor typos correctedSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
- [909] arXiv:2505.04494 (replaced) [pdf, ps, other]
-
Title: A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable GuidanceComments: 54 pages, 1 figure; Revised version with additional finite-time convergence resultsSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
- [910] arXiv:2505.04638 (replaced) [pdf, ps, other]
-
Title: Advancing AI Research Assistants with Expert-Involved LearningAuthors: Tianyu Liu, Simeng Han, Hanchen Wang, Xiao Luo, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu ZhaoComments: 36 pages, 7 figuresSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
- [911] arXiv:2505.06927 (replaced) [pdf, ps, other]
-
Title: Stability Regularized Cross-ValidationComments: Some of this material previously appeared in 2306.14851v2, which we have split into two papers (this one and 2306.14851v3), because it contained two ideas that need separate papersSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
- [912] arXiv:2505.12311 (replaced) [pdf, ps, other]
-
Title: Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented OptimizationComments: Main text 10 pages with 7 figuresSubjects: Robotics (cs.RO)
- [913] arXiv:2505.12387 (replaced) [pdf, ps, other]
-
Title: Neural Thermodynamics: Entropic Forces in Deep and Universal Representation LearningComments: Published at NeurIPS 2025Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Mathematical Physics (math-ph); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
- [914] arXiv:2505.12728 (replaced) [pdf, ps, other]
-
Title: SpecFLASH: A Latent-Guided Semi-autoregressive Speculative Decoding Framework for Efficient Multimodal GenerationComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [915] arXiv:2505.12940 (replaced) [pdf, ps, other]
-
Title: Multi-Level Monte Carlo Training of Neural OperatorsComments: Accepted in Computer Methods in Applied Mechanics and EngineeringSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
- [916] arXiv:2505.13102 (replaced) [pdf, ps, other]
-
Title: Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic ForecastComments: 24 pages, 7 figures, 11 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
- [917] arXiv:2505.13197 (replaced) [pdf, ps, other]
-
Title: Inferring stochastic dynamics with growth from cross-sectional dataComments: 10 pages, 5 figures, NeurIPS 2025Subjects: Machine Learning (cs.LG); Biological Physics (physics.bio-ph); Quantitative Methods (q-bio.QM)
- [918] arXiv:2505.13696 (replaced) [pdf, ps, other]
-
Title: Building spatial world models from sparse transitional episodic memoriesComments: Accepted ICLR 2026Subjects: Artificial Intelligence (cs.AI)
- [919] arXiv:2505.14103 (replaced) [pdf, ps, other]
-
Title: AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language ModelsAuthors: Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, Weizhe Zhang, Weiping Tu, Yuhong Yang, Bo DuComments: Accepted by IEEE Transactions on Dependable and Secure Computing (TDSC)Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [920] arXiv:2505.14661 (replaced) [pdf, ps, other]
-
Title: Abacus: A Cost-Based Optimizer for Semantic Operator SystemsAuthors: Matthew Russo, Chunwei Liu, Sivaprasad Sudhir, Gerardo Vitagliano, Michael Cafarella, Tim Kraska, Samuel MaddenComments: To be published in VLDB'26, 14 pages, 8 figuresJournal-ref: PVLDB, 19(5): 1060 - 1073, 2026Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
- [921] arXiv:2505.16644 (replaced) [pdf, ps, other]
-
Title: Learning non-equilibrium diffusions with Schrödinger bridges: from exactly solvable to simulation-freeComments: 10 pages, 5 figures, NeurIPS 2025Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
- [922] arXiv:2505.16936 (replaced) [pdf, ps, other]
-
Title: SPAR: Self-supervised Placement-Aware Representation Learning for Distributed SensingAuthors: Yizhuo Chen, Tianchen Wang, You Lyu, Yanlan Hu, Jinyang Li, Tomoyoshi Kimura, Hongjue Zhao, Yigong Hu, Denizhan Kara, Tarek AbdelzaherSubjects: Machine Learning (cs.LG)
- [923] arXiv:2505.16964 (replaced) [pdf, ps, other]
-
Title: MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical ReasoningAuthors: Suhao Yu, Haojin Wang, Juncheng Wu, Luyang Luo, Jingshen Wang, Cihang Xie, Pranav Rajpurkar, Carl Yang, Yang Yang, Kang Wang, Yannan Yu, Yuyin ZhouComments: 27 pages, 15 Figures Benchmark data: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [924] arXiv:2505.17001 (replaced) [pdf, ps, other]
-
Title: Seeing through Satellite Images at Street ViewsAuthors: Ming Qian, Bin Tan, Qiuyu Wang, Xianwei Zheng, Hanjiang Xiong, Gui-Song Xia, Yujun Shen, Nan XueComments: Accepted to IEEE TPAMI. Initially submitted in July 2024. Code is available on this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [925] arXiv:2505.17730 (replaced) [pdf, ps, other]
-
Title: Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted dataAuthors: Stefan Schoepf, Michael Curtis Mozer, Nicole Elyse Mitchell, Alexandra Brintrup, Georgios Kaissis, Peter Kairouz, Eleni TriantafillouComments: Accepted as a main track paper at ICLR 2026 this https URLSubjects: Machine Learning (cs.LG)
- [926] arXiv:2505.17782 (replaced) [pdf, ps, other]
-
Title: Thalia: A Global, Multi-Modal Dataset for Volcanic Activity MonitoringAuthors: Nikolas Papadopoulos, Nikolaos Ioannis Bountos, Maria Sdraka, Andreas Karavias, Gustau Camps-Valls, Ioannis PapoutsisSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [927] arXiv:2505.17813 (replaced) [pdf, ps, other]
-
Title: Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM ReasoningSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [928] arXiv:2505.17961 (replaced) [pdf, ps, other]
-
Title: Federated Causal Inference from Multi-Site Observational Data via Propensity Score AggregationSubjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Applications (stat.AP)
- [929] arXiv:2505.18842 (replaced) [pdf, ps, other]
-
Title: v1: Learning to Point Visual Tokens for Multimodal Grounded ReasoningSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [930] arXiv:2505.19420 (replaced) [pdf, ps, other]
-
Title: CAD-SLAM: Consistency-Aware Dynamic SLAM with Dynamic-Static Decoupled MappingAuthors: Wenhua Wu, Chenpeng Su, Siting Zhu, Tianchen Deng, Jianhao Jiao, Guangming Wang, Dimitrios Kanoulas, Zhe Liu, Hesheng WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [931] arXiv:2505.20272 (replaced) [pdf, ps, other]
-
Title: Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement LearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [932] arXiv:2505.23506 (replaced) [pdf, ps, other]
-
Title: Position: Epistemic uncertainty estimation methods are fundamentally incompleteSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [933] arXiv:2506.01418 (replaced) [pdf, ps, other]
-
Title: SEMNAV: Enhancing Visual Semantic Navigation in Robotics through Semantic SegmentationAuthors: Rafael Flor-Rodríguez, Carlos Gutiérrez-Álvarez, Francisco Javier Acevedo-Rodríguez, Sergio Lafuente-Arroyo, Roberto J. López-SastreSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [934] arXiv:2506.02293 (replaced) [pdf, ps, other]
-
Title: On Universality Classes of Equivariant NetworksComments: Advances in Neural Information Processing Systems 38 (NeurIPS 2025; Spotlight presentation). Total 25 pagesSubjects: Machine Learning (cs.LG)
- [935] arXiv:2506.04289 (replaced) [pdf, ps, other]
-
Title: Relational reasoning and inductive bias in transformers and large language modelsComments: 15 pages, 10 figuresSubjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
- [936] arXiv:2506.04536 (replaced) [pdf, ps, other]
-
Title: NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron ModelsAuthors: Luca Ghafourpour, Valentin Duruisseaux, Bahareh Tolooshams, Philip H. Wong, Costas A. Anastassiou, Anima AnandkumarSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
- [937] arXiv:2506.07326 (replaced) [pdf, ps, other]
-
Title: Reward Model Interpretability via Optimal and Pessimal TokensAuthors: Brian Christian, Hannah Rose Kirk, Jessica A.F. Thompson, Christopher Summerfield, Tsvetomira DumbalskaComments: Accepted for publication in Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT '25), to appear June 2025Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
- [938] arXiv:2506.08347 (replaced) [pdf, ps, other]
-
Title: Differentially Private Relational Learning with Entity-level Privacy GuaranteesSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- [939] arXiv:2506.11338 (replaced) [pdf, ps, other]
-
Title: Surprisal from Larger Transformer-based Language Models Predicts fMRI Data More PoorlyComments: EACL 2026Subjects: Computation and Language (cs.CL)
- [940] arXiv:2506.13715 (replaced) [pdf, ps, other]
-
Title: Sharpness-Aware Machine UnlearningComments: Accepted to ICLR 2026Subjects: Machine Learning (cs.LG)
- [941] arXiv:2506.17146 (replaced) [pdf, ps, other]
-
Title: A tutorial overview of model predictive control for continuous crystallization: current possibilities and future perspectivesSubjects: Systems and Control (eess.SY)
- [942] arXiv:2506.17873 (replaced) [pdf, ps, other]
-
Title: SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language ModelAuthors: Guankun Wang, Junyi Wang, Wenjin Mo, Long Bai, Kun Yuan, Ming Hu, Jinlin Wu, Junjun He, Yiming Huang, Nicolas Padoy, Zhen Lei, Hongbin Liu, Nassir Navab, Hongliang RenSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [943] arXiv:2506.18751 (replaced) [pdf, ps, other]
-
Title: Sensitivity analysis of image classification models using generalized polynomial chaosSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [944] arXiv:2506.19004 (replaced) [pdf, ps, other]
-
Title: Broken Tokens? Your Language Model can Secretly Handle Non-Canonical TokenizationsAuthors: Brian Siyuan Zheng, Alisa Liu, Orevaoghene Ahia, Jonathan Hayase, Yejin Choi, Noah A. SmithComments: NeurIPS 2025 (spotlight)Subjects: Computation and Language (cs.CL)
- [945] arXiv:2506.19154 (replaced) [pdf, ps, other]
-
Title: Lightweight RGB-T Tracking with Mobile Vision TransformersComments: Accepted for publication in ICASSP 2026. Implementation Code AvailableSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [946] arXiv:2506.21074 (replaced) [pdf, ps, other]
-
Title: CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame RateComments: 5 pages, accepted to ICASSP 2026Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [947] arXiv:2506.21551 (replaced) [pdf, ps, other]
-
Title: Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without TestComments: Accepted at ICLR 2026Subjects: Machine Learning (cs.LG)
- [948] arXiv:2506.21588 (replaced) [pdf, ps, other]
-
Title: Understanding Verbatim Memorization in LLMs Through Circuit DiscoveryComments: The First Workshop on Large Language Model Memorization @ ACL 2025, Vienna, August 1st, 2025Subjects: Computation and Language (cs.CL)
- [949] arXiv:2506.22316 (replaced) [pdf, ps, other]
-
Title: Evaluating Scoring Bias in LLM-as-a-JudgeComments: Accepted by DASFAA 2026Subjects: Computation and Language (cs.CL)
- [950] arXiv:2506.23729 (replaced) [pdf, ps, other]
-
Title: Proteus-ID: ID-Consistent and Motion-Coherent Video CustomizationComments: SIGGRAPH Asia 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [951] arXiv:2507.01099 (replaced) [pdf, ps, other]
-
Title: Geometry-aware 4D Video Generation for Robot ManipulationComments: ICLR 2026; Project website: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [952] arXiv:2507.01667 (replaced) [pdf, ps, other]
-
Title: What does really matter in image goal navigation?Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [953] arXiv:2507.02798 (replaced) [pdf, ps, other]
-
Title: No time to train! Training-Free Reference-Based Instance SegmentationComments: PreprintSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [954] arXiv:2507.03545 (replaced) [pdf, ps, other]
-
Title: DOME: Improving Signal-to-Noise in Stochastic Gradient Descent via Sharp-Direction Subspace FilteringSubjects: Machine Learning (cs.LG)
- [955] arXiv:2507.04075 (replaced) [pdf, ps, other]
-
Title: Accurate and Efficient World Modeling with Masked Latent TransformersSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [956] arXiv:2507.05387 (replaced) [pdf, ps, other]
-
Title: The Generalization Ridge: Information Flow in Natural Language GenerationSubjects: Computation and Language (cs.CL)
- [957] arXiv:2507.06011 (replaced) [pdf, ps, other]
-
Title: ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the EdgeAuthors: Daghash K. Alqahtani, Maria A. Rodriguez, Muhammad Aamir Cheema, Hamid Rezatofighi, Adel N. ToosiSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV)
- [958] arXiv:2507.06465 (replaced) [pdf, ps, other]
-
Title: Temporal Motif Participation Profiles for Analyzing Node Similarity in Temporal NetworksComments: Proceedings of the 17th International Conference on Social Networks Analysis and Mining (ASONAM 2026)Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
- [959] arXiv:2507.08261 (replaced) [pdf, ps, other]
-
Title: Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial AttacksSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [960] arXiv:2507.09580 (replaced) [pdf, ps, other]
-
Title: AICrypto: Evaluating Cryptography Capabilities of Large Language ModelsAuthors: Yu Wang, Yijian Liu, Liheng Ji, Han Luo, Wenjie Li, Xiaofei Zhou, Chiyun Feng, Puji Wang, Yuhan Cao, Geyuan Zhang, Xiaojian Li, Rongwu Xu, Yilei Chen, Tianxing HeSubjects: Cryptography and Security (cs.CR)
- [961] arXiv:2507.15975 (replaced) [pdf, ps, other]
-
Title: Fast Task Planning with Neuro-Symbolic RelaxationComments: 8 pages, 6 figuresSubjects: Robotics (cs.RO)
- [962] arXiv:2507.18393 (replaced) [pdf, ps, other]
-
Title: PALM: PAnoramic Learning Map Integrating Learning Analytics and Curriculum Map for Scalable Insights Across CoursesComments: Full paper published in the Proceedings of the IEEE SMC 2025 conferenceSubjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
- [963] arXiv:2507.20534 (replaced) [pdf, ps, other]
-
Title: Kimi K2: Open Agentic IntelligenceAuthors: Kimi Team: Yifan Bai, Yiping Bao, Y. Charles, Cheng Chen, Guanduo Chen, Haiting Chen, Huarong Chen, Jiahao Chen, Ningxin Chen, Ruijue Chen, Yanru Chen, Yuankun Chen, Yutian Chen, Zhuofu Chen, Jialei Cui, Hao Ding, Mengnan Dong, Angang Du, Chenzhuang Du, Dikang Du, Yulun Du, Yu Fan, Yichen Feng, Kelin Fu, Bofei Gao, Chenxiao Gao, Hongcheng Gao, Peizhong Gao, Tong Gao, Yuyao Ge, Shangyi Geng, Qizheng Gu, Xinran Gu, Longyu Guan, Haiqing Guo, Jianhang Guo, Xiaoru Hao, Tianhong He, Weiran He, Wenyang He, Yunjia He, Chao Hong, Hao Hu, Yangyang Hu, Zhenxing Hu, Weixiao Huang, Zhiqi Huang, Zihao Huang, Tao Jiang, Zhejun Jiang, Xinyi Jin, Yongsheng Kang, Guokun Lai, Cheng Li, Fang Li, Haoyang Li, Ming Li, Wentao Li, Yang Li, Yanhao Li, Yiwei Li, Zhaowei Li, Zheming Li, Hongzhan Lin, Xiaohan Lin, Zongyu Lin, et al. (133 additional authors not shown)Comments: tech report of Kimi K2, with minor updatesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [964] arXiv:2507.20658 (replaced) [pdf, ps, other]
-
Title: Trustworthy AI-based crack-tip segmentation using domain-guided explanationsComments: This is the Accepted Manuscript version of an article accepted for publication in Machine Learning: Science and Technology. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at this https URLSubjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
- [965] arXiv:2507.21129 (replaced) [pdf, ps, other]
-
Title: Measuring and Analyzing Intelligence via Contextual Uncertainty in Large Language Models using Information-Theoretic MetricsAuthors: Jae Wan ShimSubjects: Artificial Intelligence (cs.AI)
- [966] arXiv:2507.21802 (replaced) [pdf, ps, other]
-
Title: MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDESubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [967] arXiv:2507.23440 (replaced) [pdf, ps, other]
-
Title: Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level FoveationComments: Accepted to ACL 2025 (Findings). 23 pages, 4 figuresSubjects: Artificial Intelligence (cs.AI)
- [968] arXiv:2508.01725 (replaced) [pdf, ps, other]
-
Title: Imbalance-Robust and Sampling-Efficient Continuous Conditional GANs via Adaptive Vicinity and Auxiliary RegularizationSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [969] arXiv:2508.02016 (replaced) [pdf, ps, other]
-
Title: Dynamic Context Adaptation for Consistent Role-Playing Agents with Retrieval-Augmented GenerationsComments: preprintSubjects: Artificial Intelligence (cs.AI)
- [970] arXiv:2508.02232 (replaced) [pdf, ps, other]
-
Title: Eye2Recall: Exploring the Design of Enhancing Reminiscence Activities via Eye Tracking-Based LLM-Powered Interaction Experience for Older AdultsAuthors: Lei Han, Mingnan Wei, Qiongyan Chen, Anqi Wang, Rong Pang, Kefei Liu, Rongrong Chen, David YipSubjects: Human-Computer Interaction (cs.HC)
- [971] arXiv:2508.02831 (replaced) [pdf, ps, other]
-
Title: Affine-Equivariant Kernel Space Encoding for NeRF EditingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [972] arXiv:2508.03516 (replaced) [pdf, ps, other]
-
Title: DSKC: Domain Style Modeling with Adaptive Knowledge Consolidation for Exemplar-free Lifelong Person Re-IdentificationComments: 11 papges, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [973] arXiv:2508.03891 (replaced) [pdf, ps, other]
-
Title: Confidence Driven Classification of Application Types in the Presence of Background Network TrafficComments: Additional clarification; pending submissionSubjects: Networking and Internet Architecture (cs.NI)
- [974] arXiv:2508.04015 (replaced) [pdf, ps, other]
-
Title: A Novel Hierarchical Co-Optimization Framework for Coordinated Task Scheduling and Power Dispatch in Computing Power NetworksSubjects: Networking and Internet Architecture (cs.NI)
- [975] arXiv:2508.04136 (replaced) [pdf, ps, other]
-
Title: UniFGVC: Universal Training-Free Few-Shot Fine-Grained Vision Classification via Attribute-Aware Multimodal RetrievalSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [976] arXiv:2508.06577 (replaced) [pdf, ps, other]
-
Title: Privacy-Aware Predictions in Participatory BudgetingAuthors: Juan Zambrano, Clément Contet, Jairo Gudiño-Rosero, Felipe Garrido-Lucero, Umberto Grandi, César HidalgoSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
- [977] arXiv:2508.07180 (replaced) [pdf, ps, other]
-
Title: Code2Bench: Scaling Source and Rigor for Dynamic Benchmark ConstructionSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
- [978] arXiv:2508.07468 (replaced) [pdf, ps, other]
-
Title: CP-Agent: Agentic Constraint ProgrammingAuthors: Stefan SzeiderSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)
- [979] arXiv:2508.08745 (replaced) [pdf, ps, other]
-
Title: Towards Full Candidate Interaction: A Comprehensive Comparison Network for Better Route RecommendationSubjects: Information Retrieval (cs.IR)
- [980] arXiv:2508.09131 (replaced) [pdf, ps, other]
-
Title: Training-Free Text-Guided Color Editing with Multi-Modal Diffusion TransformerAuthors: Zixin Yin, Xili Dai, Ling-Hao Chen, Deyu Zhou, Jianan Wang, Duomin Wang, Gang Yu, Lionel M. Ni, Lei Zhang, Heung-Yeung ShumComments: this https URLSubjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [981] arXiv:2508.10801 (replaced) [pdf, ps, other]
-
Title: Object Fidelity Diffusion for Remote Sensing Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [982] arXiv:2508.11175 (replaced) [pdf, ps, other]
-
Title: The Role of Entanglement in Quantum Reservoir Computing with Coupled Kerr Nonlinear OscillatorsSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Signal Processing (eess.SP)
- [983] arXiv:2508.11847 (replaced) [pdf, ps, other]
-
Title: Dropping Just a Handful of Preferences Can Change Top Large Language Model RankingsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [984] arXiv:2508.13232 (replaced) [pdf, ps, other]
-
Title: On Modeling and Solving the Boltzmann EquationAuthors: Liliane Basso BarichelloSubjects: Mathematical Physics (math-ph); Numerical Analysis (math.NA)
- [985] arXiv:2508.13755 (replaced) [pdf, ps, other]
-
Title: Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive ExplorationAuthors: Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Dongchun Xie, Hanhui Li, Yiwei Wang, Xiaodan Liang, Jing TangComments: 18 pages, 14 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [986] arXiv:2508.14330 (replaced) [pdf, ps, other]
-
Title: Multi-view Graph Condensation via Tensor DecompositionAuthors: Nícolas Roque dos Santos, Dawon Ahn, Diego Minatel, Alneu de Andrade Lopes, Evangelos E. PapalexakisComments: Accepted at WSDM 2026Subjects: Machine Learning (cs.LG)
- [987] arXiv:2508.15784 (replaced) [pdf, ps, other]
-
Title: Emergent time-keeping mechanisms in a deep reinforcement learning agent performing an interval timing taskComments: Accepted at 2025 Artificial Life ConferenceSubjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)
- [988] arXiv:2508.18742 (replaced) [pdf, ps, other]
-
Title: Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programmingAuthors: Jiajun Li, Yixuan Li, Ran Hou, Yu Ding, Shisi Guan, Jiahui Duan, Xiongwei Han, Tao Zhong, Vincent Chau, Weiwei Wu, Wanyuan WangComments: Accecpted by ICLR 2026Subjects: Machine Learning (cs.LG)
- [989] arXiv:2508.19264 (replaced) [pdf, ps, other]
-
Title: The Variance Paradox: How AI Reduces Diversity but Increases NoveltyAuthors: Bijean GhafouriSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
- [990] arXiv:2508.19548 (replaced) [pdf, ps, other]
-
Title: When Routers, Switches and Interconnects Compute: A processing-in-interconnect Paradigm for Scalable Neuromorphic AISubjects: Neural and Evolutionary Computing (cs.NE); Hardware Architecture (cs.AR); Networking and Internet Architecture (cs.NI)
- [991] arXiv:2509.00060 (replaced) [pdf, ps, other]
-
Title: Correspondence-Free, Function-Based Sim-to-Real Learning for Deformable Surface ControlAuthors: Yingjun Tian, Guoxin Fang, Renbo Su, Aoran Lyu, Neelotpal Dutta, Weiming Wang, Simeon Gill, Andrew Weightman, Charlie C.L. WangComments: arXiv admin note: text overlap with arXiv:2405.08935Subjects: Robotics (cs.RO)
- [992] arXiv:2509.03219 (replaced) [pdf, ps, other]
-
Title: Uncertainty-driven Adaptive ExplorationComments: This is an extended version (full paper + appendix) of the paper titled "A Novel Framework for Uncertainty-Driven Adaptive Exploration" accepted as a full paper at AAMAS 2026. The accepted paper can be found in this https URLSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [993] arXiv:2509.05356 (replaced) [pdf, ps, other]
-
Title: Spiking Neural Networks for Continuous Control via End-to-End Model-Based LearningSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [994] arXiv:2509.08936 (replaced) [pdf, ps, other]
-
Title: Quasi-Trefftz spaces for a first-order formulation of the Helmholtz equationSubjects: Numerical Analysis (math.NA)
- [995] arXiv:2509.10400 (replaced) [pdf, ps, other]
-
Title: TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile VerificationSubjects: Hardware Architecture (cs.AR)
- [996] arXiv:2509.11206 (replaced) [pdf, ps, other]
-
Title: Evalet: Evaluating Large Language Models by Fragmenting Outputs into FunctionsComments: The first two authors hold equal contribution. Conditionally accepted to CHI 2026Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [997] arXiv:2509.11361 (replaced) [pdf, ps, other]
-
Title: MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt OptimizationAuthors: Yichen Han, Yuhang Han, Siteng Huang, Guanyu Liu, Zhengpeng Zhou, Bojun Liu, Yujia Zhang, Isaac N Shi, Lewei He, Tianyu ShiSubjects: Artificial Intelligence (cs.AI)
- [998] arXiv:2509.12203 (replaced) [pdf, ps, other]
-
Title: LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit CorrespondenceComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [999] arXiv:2509.12517 (replaced) [pdf, ps, other]
-
Title: Interaction Context Often Increases Sycophancy in LLMsComments: To appear in the proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2026)Subjects: Human-Computer Interaction (cs.HC)
- [1000] arXiv:2509.14565 (replaced) [pdf, ps, other]
-
Title: DiffVL: Diffusion-Based Visual Localization on 2D Maps via BEV-Conditioned GPS DenoisingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1001] arXiv:2509.14863 (replaced) [pdf, ps, other]
-
Title: Exploring the Global-to-Local Attention Scheme in Graph Transformers: An Empirical StudyComments: The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: {10.1007/s11704-026-51718-4}Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1002] arXiv:2509.15069 (replaced) [pdf, ps, other]
-
Title: Efficient Computation of Time-Index Powered Weighted Sums Using Cascaded AccumulatorsComments: This work has been submitted to the IEEE for possible publicationSubjects: Signal Processing (eess.SP); Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)
- [1003] arXiv:2509.15090 (replaced) [pdf, ps, other]
-
Title: Emergent Alignment via CompetitionSubjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
- [1004] arXiv:2509.16832 (replaced) [pdf, ps, other]
-
Title: L2M-Reg: Building-level Uncertainty-aware Registration of Outdoor LiDAR Point Clouds and Semantic 3D City ModelsComments: Accepted version by ISPRS Journal of Photogrammetry and Remote SensingSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
- [1005] arXiv:2509.17360 (replaced) [pdf, ps, other]
-
Title: Cortex: Achieving Low-Latency, Cost-Efficient Remote Data Access For LLM via Semantic-Aware Knowledge CachingSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- [1006] arXiv:2509.17382 (replaced) [src]
-
Title: Bias-variance Tradeoff in Tensor EstimationAuthors: Shivam Kumar, Haotian Xu, Carlos Misael Madrid Padilla, Yuehaw Khoo, Oscar Hernan Madrid Padilla, Daren WangComments: We are withdrawing the paper in order to update it with more consistent results and improved presentation. We plan to strengthen the analysis and ensure that the results are aligned more clearly throughout the manuscriptSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
- [1007] arXiv:2509.18662 (replaced) [pdf, ps, other]
-
Title: FlexGuard: A Design Space for On-Body Feedback for Safety Scaffolding in Strength TrainingAuthors: Panayu Keelawat, Darshan Nere, Jyotshna Bali, Rezky Dwisantika, Yogesh Phalak, Ardalan Kahak, Anekan Naicker, Liang He, Suyi Li, Yan ChenSubjects: Human-Computer Interaction (cs.HC)
- [1008] arXiv:2509.19354 (replaced) [pdf, ps, other]
-
Title: GeoResponder: Towards Building Geospatial LLMs for Time-Critical Disaster ResponseSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [1009] arXiv:2509.19491 (replaced) [pdf, ps, other]
-
Title: Martingale Projections and Quantum DecoherenceComments: 21 pagesSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Probability (math.PR)
- [1010] arXiv:2509.21249 (replaced) [pdf, ps, other]
-
Title: Decipher-MR: A Vision-Language Foundation Model for 3D MRI RepresentationsAuthors: Zhijian Yang, Noel DSouza, Istvan Megyeri, Xiaojian Xu, Amin Honarmandi Shandiz, Farzin Haddadpour, Krisztian Koos, Laszlo Rusko, Emanuele Valeriano, Bharadwaj Swaninathan, Lei Wu, Parminder Bhatia, Taha Kass-Hout, Erhan BasSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1011] arXiv:2509.21543 (replaced) [pdf, ps, other]
-
Title: Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain GenerationAuthors: Jinbang Huang, Zhiyuan Li, Yuanzhao Hu, Zhanguang Zhang, Mark Coates, Xingyue Quan, Yingxue ZhangComments: 31 pages, 6 figuresSubjects: Robotics (cs.RO)
- [1012] arXiv:2509.21723 (replaced) [pdf, ps, other]
-
Title: VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic ManipulationComments: accepted by ICLR 2026. The project link is this https URLSubjects: Robotics (cs.RO)
- [1013] arXiv:2509.21875 (replaced) [pdf, ps, other]
-
Title: LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge SignalsComments: ICLR 2026Subjects: Computation and Language (cs.CL)
- [1014] arXiv:2509.21984 (replaced) [pdf, ps, other]
-
Title: Beyond the Vision Encoder: Identifying and Mitigating Spatial Bias in Large Vision-Language ModelsAuthors: Yingjie Zhu, Xuefeng Bai, Kehai Chen, Yang Xiang, Youcheng Pan, Yongshuai Hou, Weili Guan, Jun Yu, Min ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [1015] arXiv:2509.22840 (replaced) [pdf, ps, other]
-
Title: A Capacity-Based Rationale for Multi-Head AttentionAuthors: Micah AdlerSubjects: Machine Learning (cs.LG)
- [1016] arXiv:2509.22984 (replaced) [pdf, ps, other]
-
Title: From Deferral to Learning: Online In-Context Knowledge Distillation for LLM CascadesComments: 32 pages, 6 figures, 23 tables, under reviewSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1017] arXiv:2509.23155 (replaced) [pdf, ps, other]
-
Title: LAGEA: Language Guided Embodied Agents for Robotic ManipulationSubjects: Robotics (cs.RO)
- [1018] arXiv:2509.23286 (replaced) [pdf, ps, other]
-
Title: A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language ModelsAuthors: Wonje Jeung, Sangyeon Yoon, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, Albert NoComments: Accepted at ICLR 2026. Code and models are available at this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [1019] arXiv:2509.23759 (replaced) [pdf, ps, other]
-
Title: VioPTT: Violin Technique-Aware Transcription from Synthetic Data AugmentationSubjects: Sound (cs.SD); Machine Learning (cs.LG)
- [1020] arXiv:2509.23873 (replaced) [pdf, ps, other]
-
Title: Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-TuningAuthors: Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Xingzhang Ren, Fei Huang, Huiqiang Jiang, Junyang Lin, Dayiheng Liu, Linfeng ZhangComments: 26 pages, 9 figures, 15 tablesSubjects: Computation and Language (cs.CL)
- [1021] arXiv:2509.24073 (replaced) [pdf, ps, other]
-
Title: "Having Lunch Now": Understanding How Users Engage with a Proactive Agent for Daily Planning and Self-ReflectionSubjects: Human-Computer Interaction (cs.HC)
- [1022] arXiv:2509.24440 (replaced) [pdf, ps, other]
-
Title: Evaluating Relayed and Switched Quantum Key Distribution (QKD) Network ArchitecturesAuthors: Antonis Selentis, Nikolas Makris, Alkinoos Papageorgopoulos, Persefoni Konteli, Konstantinos Christodoulopoulos, George T. Kanellos, Dimitris SyvridisSubjects: Cryptography and Security (cs.CR)
- [1023] arXiv:2509.24521 (replaced) [pdf, ps, other]
-
Title: More than MACs: Exploring the Role of Neuromorphic Engineering in the Age of LLMsAuthors: Wilkie Olin-AmmentorpComments: 36 pages, 11 figures, reviewSubjects: Neural and Evolutionary Computing (cs.NE)
- [1024] arXiv:2509.24894 (replaced) [pdf, ps, other]
-
Title: Improved Stochastic Optimization of LogSumExpComments: 17 pages, 5 figures, 2 tables; updated experiment in subsection 3.3Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
- [1025] arXiv:2509.26096 (replaced) [pdf, ps, other]
-
Title: EVODiff: Entropy-aware Variance Optimized Diffusion InferenceComments: NeurIPS 2025, 41 pages, 14 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [1026] arXiv:2509.26468 (replaced) [pdf, ps, other]
-
Title: fev-bench: A Realistic Benchmark for Time Series ForecastingAuthors: Oleksandr Shchur, Abdul Fatir Ansari, Caner Turkmen, Lorenzo Stella, Nick Erickson, Pablo Guerron, Michael Bohlke-Schneider, Yuyang WangSubjects: Machine Learning (cs.LG)
- [1027] arXiv:2510.00294 (replaced) [pdf, ps, other]
-
Title: Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1028] arXiv:2510.00457 (replaced) [pdf, ps, other]
-
Title: UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate PredictionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
- [1029] arXiv:2510.00845 (replaced) [pdf, ps, other]
-
Title: Mechanistic Interpretability as Statistical Estimation: A Variance AnalysisSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1030] arXiv:2510.01719 (replaced) [pdf, ps, other]
-
Title: What MLLMs Learn about When they Learn about Multimodal Reasoning: Perception, Reasoning, or their Integration?Subjects: Computation and Language (cs.CL)
- [1031] arXiv:2510.02125 (replaced) [pdf, ps, other]
-
Title: Do AI Models Perform Human-like Abstract Reasoning Across Modalities?Authors: Claas Beger, Ryan Yi, Shuhao Fu, Kaleda Denton, Arseny Moskvichev, Sarah W. Tsai, Sivasankaran Rajamanickam, Melanie MitchellComments: 9 pages, 3 figuresSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1032] arXiv:2510.03750 (replaced) [pdf, ps, other]
-
Title: Evaluating High-Resolution Piano Sustain Pedal Depth Estimation with Musically Informed MetricsSubjects: Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [1033] arXiv:2510.04838 (replaced) [pdf, ps, other]
-
Title: Beyond Random: Automatic Inner-loop Optimization in Dataset DistillationComments: Accepted by NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [1034] arXiv:2510.04995 (replaced) [pdf, ps, other]
-
Title: Power Transform Revisited: Numerically Stable, and FederatedComments: 24 pages, 17 figures, 4 tables. AISTATS 2026. Project page see this https URLSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
- [1035] arXiv:2510.05052 (replaced) [pdf, ps, other]
-
Title: Proactive defense against LLM JailbreakSubjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
- [1036] arXiv:2510.05588 (replaced) [pdf, ps, other]
-
Title: A New Quantum Linear System Algorithm Beyond the Condition Number and Its Application to Solving Multivariate Polynomial SystemsAuthors: Jianqiang LiComments: 48 pagesSubjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)
- [1037] arXiv:2510.05742 (replaced) [pdf, ps, other]
-
Title: Vipera: Blending Visual and LLM-Driven Guidance for Systematic Auditing of Text-to-Image Generative AIAuthors: Yanwei Huang, Wesley Hanwen Deng, Sijia Xiao, Motahhare Eslami, Jason I. Hong, Arpit Narechania, Adam PererComments: 17 pages, 8 figures; Accepted by CHI 2026Subjects: Human-Computer Interaction (cs.HC)
- [1038] arXiv:2510.06048 (replaced) [pdf, ps, other]
-
Title: BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model PretrainingSubjects: Machine Learning (cs.LG)
- [1039] arXiv:2510.06548 (replaced) [pdf, ps, other]
-
Title: Reusing Overtrained Language Models Saturates ScalingSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [1040] arXiv:2510.06738 (replaced) [pdf, ps, other]
-
Title: AWM: Accurate Weight-Matrix Fingerprint for Large Language ModelsComments: ICLR 2026Subjects: Computation and Language (cs.CL)
- [1041] arXiv:2510.07096 (replaced) [pdf, ps, other]
-
Title: Modeling Sarcastic Speech: Semantic and Prosodic Cues in a Speech Synthesis FrameworkSubjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [1042] arXiv:2510.07439 (replaced) [pdf, ps, other]
-
Title: Quantum Filtering and Analysis of Multiplicities in Eigenvalue SpectraSubjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)
- [1043] arXiv:2510.07459 (replaced) [pdf, ps, other]
-
Title: MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1044] arXiv:2510.07707 (replaced) [pdf, ps, other]
-
Title: Causality Guided Representation Learning for Cross-Style Hate Speech DetectionComments: Accepted by the ACM Web Conference 2026 (WWW 26)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1045] arXiv:2510.07735 (replaced) [pdf, ps, other]
-
Title: GeoGen: A Two-stage Coarse-to-Fine Framework for Fine-grained Synthetic Location-based Social Network Trajectory GenerationSubjects: Machine Learning (cs.LG)
- [1046] arXiv:2510.07743 (replaced) [pdf, ps, other]
-
Title: OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM AlignmentComments: The first two authors contributed equally. Updated OpenRubrics dataset, RMs, and resultsSubjects: Computation and Language (cs.CL)
- [1047] arXiv:2510.10000 (replaced) [pdf, ps, other]
-
Title: Tight Robustness Certificates and Wasserstein Distributional Attacks for Deep Neural NetworksSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [1048] arXiv:2510.10706 (replaced) [pdf, ps, other]
-
Title: Designing ReLU Generative Networks to Enumerate Trees with a Given Tree Edit DistanceSubjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM)
- [1049] arXiv:2510.11129 (replaced) [pdf, ps, other]
-
Title: video-SALMONN S: Memory-Enhanced Streaming Audio-Visual LLMSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [1050] arXiv:2510.12036 (replaced) [pdf, ps, other]
-
Title: On the Interplay between Human Label Variation and Model FairnessComments: 10 pages, 7 figures. Accepted to EACL Findings 2026Subjects: Computation and Language (cs.CL)
- [1051] arXiv:2510.13847 (replaced) [pdf, ps, other]
-
Title: DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1052] arXiv:2510.14009 (replaced) [pdf, ps, other]
-
Title: Noise-Adaptive Layerwise Learning Rates: Accelerating Geometry-Aware Optimization for Deep Neural Network TrainingSubjects: Machine Learning (cs.LG)
- [1053] arXiv:2510.14235 (replaced) [pdf, ps, other]
-
Title: Spiking Neural Network Architecture Search: A SurveyComments: 19 pages, 6 figures, IEEE Computational Intelligence MagazineSubjects: Neural and Evolutionary Computing (cs.NE)
- [1054] arXiv:2510.16004 (replaced) [pdf, ps, other]
-
Title: PAINT: Parallel-in-time Neural Twins for Dynamical System ReconstructionComments: 28 pages, 23 figuresSubjects: Artificial Intelligence (cs.AI); Fluid Dynamics (physics.flu-dyn)
- [1055] arXiv:2510.16082 (replaced) [pdf, ps, other]
-
Title: BioGen: An Evidence-Grounded Framework for Interpreting RNA-seq Gene Clusters in Antimicrobial Resistance ResearchSubjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1056] arXiv:2510.17406 (replaced) [pdf, ps, other]
-
Title: S4ECG: Exploring the impact of long-range interactions for arrhythmia predictionSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
- [1057] arXiv:2510.17881 (replaced) [pdf, ps, other]
-
Title: POPI: Personalizing LLMs via Optimized Preference InferenceAuthors: Yizhuo Chen, Xin Liu, Ruijie Wang, Zheng Li, Pei Chen, Changlong Yu, Priyanka Nigam, Meng Jiang, Bing YinSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [1058] arXiv:2510.18190 (replaced) [pdf, ps, other]
-
Title: Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale NetworkComments: Accepted to ICASSP2026 conferenceSubjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [1059] arXiv:2510.20728 (replaced) [pdf, ps, other]
- [1060] arXiv:2510.22124 (replaced) [pdf, ps, other]
-
Title: Efficient Utility-Preserving Machine Unlearning with Implicit Gradient SurgeryAuthors: Shiji Zhou (Institute of Artificial Intelligence, Beihang University, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University), Tianbai Yu (University of Illinois at Urbana-Champaign), Zhi Zhang (University of Amsterdam), Heng Chang (Tsinghua University), Xiao Zhou (Tsinghua University), Dong Wu (YanTron Technology Co. Ltd), Han Zhao (University of Illinois at Urbana-Champaign)Comments: Corresponding author: Shiji Zhou ([email protected]). Shiji Zhou and Tianbai Yu contributed equallySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1061] arXiv:2510.22926 (replaced) [pdf, ps, other]
-
Title: Simple Denoising Diffusion Language ModelsAuthors: Huaisheng Zhu, Zhengyu Chen, Shijie Zhou, Zhihui Xie, Yige Yuan, Shiqi Chen, Zhimeng Guo, Siyuan Xu, Hangfan Zhang, Vasant Honavar, Teng XiaoSubjects: Machine Learning (cs.LG)
- [1062] arXiv:2510.24318 (replaced) [pdf, ps, other]
-
Title: Transformers can do Bayesian ClusteringSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1063] arXiv:2510.24372 (replaced) [pdf, ps, other]
-
Title: Bayesian Speech Synthesizers Can Learn from Multiple TeachersSubjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [1064] arXiv:2510.24473 (replaced) [pdf, ps, other]
-
Title: Methodology for Comparing Machine Learning Algorithms for Survival AnalysisAuthors: Lucas Buk Cardoso, Simone Aldrey Angelo, Yasmin Pacheco Gil Bonilha, Fernando Maia, Adeylson Guimarães Ribeiro, Maria Paula Curado, Gisele Aparecida Fernandes, Vanderlei Cunha Parro, Flávio Almeida de Magalhães Cipparrone, Alexandre Dias Porto Chiavegatto Filho, Victor Wünsch Filho, Tatiana Natasha ToporcovSubjects: Machine Learning (cs.LG)
- [1065] arXiv:2510.26275 (replaced) [pdf, ps, other]
-
Title: A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AIAuthors: Domenico Amalfitano, Andreas Metzger, Marco Autili, Tommaso Fulcini, Tobias Hey, Jan Keim, Patrizio Pelliccione, Vincenzo Scotti, Anne Koziolek, Raffaela Mirandola, Andreas VogelsangSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
- [1066] arXiv:2510.26418 (replaced) [pdf, ps, other]
-
Title: Chain-of-Thought HijackingSubjects: Artificial Intelligence (cs.AI)
- [1067] arXiv:2510.26829 (replaced) [pdf, ps, other]
-
Title: Layer of Truth: Probing Belief Shifts under Continual Pre-Training PoisoningSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- [1068] arXiv:2511.01467 (replaced) [pdf, ps, other]
-
Title: Quantum Information Ordering and Differential PrivacyComments: 36 pages, 2 figures; Significant revision: This manuscript has been restructured to focus exclusively on Quantum Information Ordering and Privacy definitions. The results regarding Stability, which appeared in earlier versions of this preprint, have been moved to a separate companion paper: arXiv:2602.01177Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Machine Learning (cs.LG)
- [1069] arXiv:2511.02280 (replaced) [pdf, ps, other]
-
Title: SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL TuningAuthors: Fangxun Shu, Yongjie Ye, Yue Liao, Zijian Kang, Weijie Yin, Jiacong Wang, Xiao Liang, Shuicheng Yan, Chao FengSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [1070] arXiv:2511.02570 (replaced) [pdf, ps, other]
-
Title: Dynamic Priors in Bayesian Optimization for Hyperparameter OptimizationAuthors: Lukas Fehring, Marcel Wever, Maximilian Spliethöver, Leona Hennig, Henning Wachsmuth, Marius LindauerComments: 8 pages plus references and appendixSubjects: Machine Learning (cs.LG)
- [1071] arXiv:2511.03911 (replaced) [pdf, ps, other]
-
Title: DecoHD: Decomposed Hyperdimensional Classification under Extreme Memory BudgetsComments: Accepted to DATE 2026Subjects: Machine Learning (cs.LG)
- [1072] arXiv:2511.03938 (replaced) [pdf, ps, other]
-
Title: LogHD: Robust Compression of Hyperdimensional Classifiers via Logarithmic Class-Axis ReductionAuthors: Sanggeon Yun, Hyunwoo Oh, Ryozo Masukawa, Pietro Mercati, Nathaniel D. Bastian, Mohsen ImaniComments: Accepted to DATE 2026Subjects: Machine Learning (cs.LG)
- [1073] arXiv:2511.04136 (replaced) [pdf, ps, other]
-
Title: Implementation of transformer-based LLMs with large-scale optoelectronic neurons on a CMOS compatible platformAuthors: Neil Na, Chih-Hao Cheng, Shou-Chen Hsu, Che-Fu Liang, Chung-Chih Lin, Nathaniel Y. Na, Andrew I. Shieh, Erik Chen, Haisheng Rong, Richard A. SorefSubjects: Emerging Technologies (cs.ET); Applied Physics (physics.app-ph); Optics (physics.optics)
- [1074] arXiv:2511.04188 (replaced) [pdf, ps, other]
- [1075] arXiv:2511.05522 (replaced) [pdf, ps, other]
-
Title: AIRMap: AI-Generated Radio Maps for Wireless Digital TwinsAuthors: Ali Saeizadeh, Miead Tehrani-Moayyed, Davide Villa, J. Gordon Beattie Jr., Pedram Johari, Stefano Basagni, Tommaso MelodiaComments: 15 pages, 19 figures, This paper has been submitted to the IEEE Transactions for possible publicationSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
- [1076] arXiv:2511.05924 (replaced) [pdf, ps, other]
-
Title: DiScoFormer: Plug-In Density and Score Estimation with TransformersComments: 17 pages, 15 figuresSubjects: Machine Learning (cs.LG)
- [1077] arXiv:2511.06644 (replaced) [pdf, ps, other]
-
Title: UniADC: A Unified Framework for Anomaly Detection and ClassificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1078] arXiv:2511.09483 (replaced) [pdf, ps, other]
-
Title: CrochetBench: Can Vision-Language Models Move from Describing to Doing in Crochet Domain?Comments: code available at this https URLSubjects: Artificial Intelligence (cs.AI)
- [1079] arXiv:2511.10718 (replaced) [pdf, ps, other]
-
Title: Online Price Competition under Generalized Linear DemandsSubjects: Computer Science and Game Theory (cs.GT); Statistics Theory (math.ST); Methodology (stat.ME)
- [1080] arXiv:2511.11460 (replaced) [pdf, ps, other]
-
Title: Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing ClassificationComments: 11 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1081] arXiv:2511.11698 (replaced) [pdf, ps, other]
-
Title: Moirai 2.0: When Less Is More for Time Series ForecastingAuthors: Chenghao Liu, Taha Aksu, Juncheng Liu, Xu Liu, Hanshu Yan, Quang Pham, Silvio Savarese, Doyen Sahoo, Caiming Xiong, Junnan LiComments: 16 pages, 13 figures, and 1 tableSubjects: Machine Learning (cs.LG)
- [1082] arXiv:2511.12323 (replaced) [pdf, ps, other]
-
Title: Computational and Categorical Frameworks of Finite Ternary $Γ$-Semirings: Foundations, Algorithms, and Industrial Modeling ApplicationsAuthors: Chandrasekhar Gokavarapu (Lecturer in Mathematics, Government College (A), Rajahmundry, A.P., India & Research Scholar, Department of Mathematics, Acharya Nagarjuna University, Guntur, A.P., India), Dr D Madhusudhana Rao (Lecturer in Mathematics, Government College For Women (A), Guntur, Andhra Pradesh, India, & Research Supervisor, Dept. of Mathematics, Acharya Nagarjuna University, Guntur, A.P., India)Subjects: Rings and Algebras (math.RA); Logic in Computer Science (cs.LO)
- [1083] arXiv:2511.12960 (replaced) [pdf, ps, other]
-
Title: ENGRAM: Effective, Lightweight Memory Orchestration for Conversational AgentsSubjects: Multiagent Systems (cs.MA)
- [1084] arXiv:2511.12987 (replaced) [pdf, ps, other]
-
Title: Reuse, Don't Recompute: Efficient Large Reasoning Model Inference via Memory OrchestrationSubjects: Multiagent Systems (cs.MA)
- [1085] arXiv:2511.16420 (replaced) [pdf, ps, other]
-
Title: A Fast Relax-and-Round Approach to Unit Commitment for Data Center Own GenerationComments: Limited to 5 pages and this format for IEEE PESGM conferenceSubjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
- [1086] arXiv:2511.18793 (replaced) [pdf, ps, other]
-
Title: NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative RecommendationsAuthors: Yejing Wang, Shengyu Zhou, Jinyu Lu, Ziwei Liu, Langming Liu, Maolin Wang, Wenlin Zhang, Feng Li, Wenbo Su, Pengjie Wang, Jian Xu, Xiangyu ZhaoSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1087] arXiv:2511.20348 (replaced) [pdf, ps, other]
-
Title: Material-informed Gaussian Splatting for 3D World Reconstruction in a Digital TwinComments: 8 pages, 5 figures. Accepted to IEEE Intelligent Vehicles Symposium (IV) 2026. Revised version (v3) presents camera-ready publicationSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [1088] arXiv:2511.20511 (replaced) [pdf, ps, other]
-
Title: Efficient Parallel Implementation of the Pilot Assignment Problem in Massive MIMO SystemsJournal-ref: The 26th International Conference on Parallel and Distributed Computing 2025, Applications and TechnologiesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- [1089] arXiv:2511.21173 (replaced) [pdf, ps, other]
-
Title: Scales of Fréchet means and Karcher quasi-arithmetic meansAuthors: Frank NielsenComments: 14 pages, 1 figureSubjects: Computational Geometry (cs.CG); Information Theory (cs.IT)
- [1090] arXiv:2511.21474 (replaced) [pdf, ps, other]
-
Title: Going with the Speed of Sound: Pushing Neural Surrogates into Highly-turbulent Transonic RegimesAuthors: Fabian Paischer, Leo Cotteleer, Yann Dreze, Richard Kurle, Dylan Rubini, Maurits Bleeker, Tobias Kronlachner, Johannes BrandstetterComments: NeurIPS 2025 ML4PS WorkshopSubjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1091] arXiv:2511.22254 (replaced) [pdf, ps, other]
-
Title: Co-Evolving Agents: Learning from Failures as Hard NegativesAuthors: Yeonsung Jung, Trilok Padhi, Sina Shaham, Dipika Khullar, Joonhyun Jeong, Ninareh Mehrabi, Eunho YangSubjects: Artificial Intelligence (cs.AI)
- [1092] arXiv:2512.00242 (replaced) [pdf, ps, other]
-
Title: Polynomial Neural Sheaf Diffusion: A Spectral Filtering Approach on Cellular SheavesComments: Under Review at ICML 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (stat.ML)
- [1093] arXiv:2512.01834 (replaced) [pdf, ps, other]
-
Title: Mitigating Gender Bias in Depression Detection via Counterfactual InferenceComments: To be published in CSCWD 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
- [1094] arXiv:2512.02537 (replaced) [pdf, ps, other]
-
Title: Numerical Verification of PolyDG Algebraic Solvers for the Pseudo-Stress Stokes ProblemSubjects: Numerical Analysis (math.NA)
- [1095] arXiv:2512.04123 (replaced) [pdf, ps, other]
-
Title: Measuring Agents in ProductionAuthors: Melissa Z. Pan, Negar Arabzadeh, Riccardo Cogo, Yuxuan Zhu, Alexander Xiong, Lakshya A Agrawal, Huanzhi Mao, Emma Shen, Sid Pallerla, Liana Patel, Shu Liu, Tianneng Shi, Xiaoyuan Liu, Jared Quincy Davis, Emmanuele Lacavalla, Alessandro Basile, Shuyi Yang, Paul Castro, Daniel Kang, Joseph E. Gonzalez, Koushik Sen, Dawn Song, Ion Stoica, Matei Zaharia, Marquita EllisSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
- [1096] arXiv:2512.04601 (replaced) [pdf, ps, other]
-
Title: Natural Language Actor-Critic: Scalable Off-Policy Learning in Language SpaceComments: 21 pages, 4 figuresSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- [1097] arXiv:2512.05747 (replaced) [pdf, ps, other]
-
Title: Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-TuningSubjects: Computation and Language (cs.CL)
- [1098] arXiv:2512.08042 (replaced) [pdf, ps, other]
-
Title: Towards Sustainable Universal Deepfake Detection with Frequency-Domain MaskingAuthors: Chandler Timm C. Doloriel, Habib Ullah, Kristian Hovde Liland, Fadi Al Machot, Ngai-Man CheungComments: Accepted to ACM TOMMSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1099] arXiv:2512.09369 (replaced) [pdf, ps, other]
-
Title: Encoder-Free Knowledge-Graph Reasoning with LLMs via Hyperdimensional Path RetrievalSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- [1100] arXiv:2512.09506 (replaced) [pdf, ps, other]
-
Title: Beyond Knowledge to Agency: Evaluating Expertise, Autonomy, and Integrity in Finance with CNFinBenchAuthors: Jinru Ding, Chao Ding, Yidong Jiang, Wenrao Pang, Boyi Xiao, Zhiqiang Liu, Jiayuan Chen, Yun Zhong, Tiantian Yuan, Junming Guan, Dawei Cheng, Jie XuSubjects: Computational Engineering, Finance, and Science (cs.CE)
- [1101] arXiv:2512.10415 (replaced) [src]
-
Title: How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code EvaluationComments: This manuscript has been withdrawn by the authors because the methodology and results have been superseded by a more rigorous framework (SPACI and AST-ASIP). The corrected and expanded findings are now available in arXiv:2601.21360. Please cite the new manuscript insteadSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
- [1102] arXiv:2512.12553 (replaced) [pdf, ps, other]
-
Title: Cargo Sherlock: An SMT-Based Checker for Software Trust CostsComments: 12 pages, 7 figures. To appear at the International Conference on Formal Methods for Software Engineering (FormaliSE), April 2026Subjects: Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
- [1103] arXiv:2512.12755 (replaced) [pdf, ps, other]
-
Title: An End-to-End Approach for Microgrid Probabilistic Forecasting and Robust Operation via Decision-focused LearningComments: 10 pagesSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
- [1104] arXiv:2512.13368 (replaced) [pdf, ps, other]
-
Title: BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential RecommendationsAuthors: Mengyang Ma, Xiaopeng Li, Wanyu Wang, Zhaocheng Du, Jingtong Gao, Pengyue Jia, Yuyang Ye, Yiqi Wang, Yunpeng Weng, Weihong Luo, Xiao Han, Xiangyu ZhaoComments: Accepted by WWW'26Subjects: Information Retrieval (cs.IR)
- [1105] arXiv:2512.13614 (replaced) [pdf, ps, other]
-
Title: Quantum channel tomography and estimation by local testComments: 22 pages; v2: revised the Discussion sectionSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
- [1106] arXiv:2512.13746 (replaced) [pdf, ps, other]
-
Title: Probabilistic Predictions of Process-Induced Deformation in Carbon/Epoxy Composites Using a Deep Operator NetworkAuthors: Elham Kiyani, Amit Makarand Deshpande, Madhura Limaye, Zhiwei Gao, Zongren Zou, Sai Aditya Pradeep, Srikanth Pilla, Gang Li, Zhen Li, George Em KarniadakisComments: 21 pages, 13 figuresSubjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
- [1107] arXiv:2512.14640 (replaced) [pdf, ps, other]
-
Title: A Multicenter Benchmark of Multiple Instance Learning Models for Lymphoma Subtyping from HE-stained Whole Slide ImagesAuthors: Rao Muhammad Umer, Daniel Sens, Jonathan Noll, Sohom Dey, Christian Matek, Lukas Wolfseher, Rainer Spang, Ralf Huss, Johannes Raffler, Sarah Reinke, Ario Sadafi, Wolfram Klapper, Katja Steiger, Kristina Schwamborn, Carsten MarrComments: 19 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [1108] arXiv:2512.16415 (replaced) [pdf, ps, other]
-
Title: CountZES: Counting via Zero-Shot Exemplar SelectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1109] arXiv:2512.17600 (replaced) [pdf, ps, other]
-
Title: STAMP/STPA Informed Characterization of Factors Leading to Loss of Control in AI SystemsComments: This new version only corrects some typosSubjects: Computers and Society (cs.CY)
- [1110] arXiv:2512.17776 (replaced) [pdf, ps, other]
-
Title: DEER: A Benchmark for Evaluating Deep Research Agents on Expert Report GenerationAuthors: Janghoon Han, Heegyu Kim, Changho Lee, Dahm Lee, Min Hyung Park, Hosung Song, Stanley Jungkyu Choi, Moontae Lee, Honglak LeeComments: Work in progressSubjects: Computation and Language (cs.CL)
- [1111] arXiv:2512.18718 (replaced) [pdf, ps, other]
-
Title: Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with PromptsComments: AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [1112] arXiv:2512.21577 (replaced) [pdf, ps, other]
-
Title: A Unified Definition of Hallucination: It's The World Model, Stupid!Authors: Emmy Liu, Varun Gangal, Chelsea Zou, Michael Yu, Xiaoqi Huang, Alex Chang, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. FengComments: HalluWorld benchmark in progress. Repo at this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
- [1113] arXiv:2512.21956 (replaced) [pdf, ps, other]
-
Title: Self-attention vector output similarities reveal how machines pay attentionComments: 23 pages, 14 figuresSubjects: Computation and Language (cs.CL)
- [1114] arXiv:2512.22043 (replaced) [pdf, ps, other]
-
Title: HALF: Hollowing Analysis Framework for Binary Programs with Kernel Module AssistanceSubjects: Software Engineering (cs.SE)
- [1115] arXiv:2512.22522 (replaced) [pdf, ps, other]
-
Title: Towards Reliable Evaluation of Adversarial Robustness for Spiking Neural NetworksSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1116] arXiv:2512.23066 (replaced) [pdf, ps, other]
-
Title: GLiSE: A Prompt-Driven and ML-Powered Tool for Automated Grey Literature Extraction in Software EngineeringAuthors: Houcine Abdelkader Cherief, Brahim Mahmoudi, Zacharie Chenail-Larcher, Naouel Moha, Quentin Sti'evenart, Florent AvellanedaSubjects: Software Engineering (cs.SE); Digital Libraries (cs.DL)
- [1117] arXiv:2512.24955 (replaced) [pdf, ps, other]
-
Title: MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing ControlComments: This work has been submitted to the IEEE for possible publicationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
- [1118] arXiv:2601.02754 (replaced) [src]
-
Title: Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal PoliciesAuthors: Mingming Zhang, Na Li, Zhuang Feiqing, Hongyang Zheng, Jiangbing Zhou, Wang Wuyin, Sheng-jie Sun, XiaoWei Chen, Junxiong Zhu, Lixin Zou, Chenliang LiComments: Due to the company's compliance requirements, we would like to wait until the paper is officially published before making it publicly available on arXivSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
- [1119] arXiv:2601.04983 (replaced) [pdf, ps, other]
-
Title: Assessing the Impact of Low Resolution Control Electronics on Quantum Neural Network PerformanceAuthors: Rupayan Bhattacharjee, Rohit Sarma Sarkar, Sergi Abadal, Carmen G. Almudever, Eduard AlarconComments: 9 pages, 12 figuresSubjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
- [1120] arXiv:2601.05083 (replaced) [pdf, ps, other]
-
Title: Driving on RegistersAuthors: Ellington Kirby, Alexandre Boulch, Yihong Xu, Yuan Yin, Gilles Puy, Éloi Zablocki, Andrei Bursuc, Spyros Gidaris, Renaud Marlet, Florent Bartoccioni, Anh-Quan Cao, Nermin Samet, Tuan-Hung VU, Matthieu CordSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [1121] arXiv:2601.05110 (replaced) [pdf, ps, other]
-
Title: GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of ThoughtsComments: Code available at this https URLSubjects: Artificial Intelligence (cs.AI)
- [1122] arXiv:2601.05588 (replaced) [pdf, ps, other]
-
Title: Autoregressive Ranking: Bridging the Gap Between Dual and Cross EncodersAuthors: Benjamin Rozonoyer, Chong You, Michael Boratko, Himanshu Jain, Nilesh Gupta, Srinadh Bhojanapalli, Andrew McCallum, Felix YuComments: 22 pages, 5 figuresSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1123] arXiv:2601.05978 (replaced) [pdf, ps, other]
-
Title: AWaRe-SAC: Proactive Slice Admission Control under Weather-Induced Capacity UncertaintySubjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
- [1124] arXiv:2601.06172 (replaced) [pdf, ps, other]
-
Title: The Psychology of Learning from Machines: Anthropomorphic AI and the Paradox of Automation in EducationComments: camera-ready version of paper accepted at IEEE EDUCON 2026 (acknowledgment added and some typos/errors fixed)Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
- [1125] arXiv:2601.07020 (replaced) [pdf, ps, other]
-
Title: TurkBench: A Benchmark for Evaluating Turkish Large Language ModelsAuthors: Çağrı Toraman, Ahmet Kaan Sever, Ayse Aysu Cengiz, Elif Ecem Arslan, Görkem Sevinç, Mete Mert Birdal, Yusuf Faruk Güldemir, Ali Buğra Kanburoğlu, Sezen Felekoğlu, Osman Gürlek, Sarp Kantar, Birsen Şahin Kütük, Büşra Tufan, Elif Genç, Serkan Coşkun, Gupse Ekin Demir, Muhammed Emin Arayıcı, Olgun Dursun, Onur Gungor, Susan Üsküdarlı, Abdullah Topraksoy, Esra DarıcıComments: Accepted by EACL 2026 SIGTURKSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [1126] arXiv:2601.07182 (replaced) [pdf, ps, other]
-
Title: PRPO: Aligning Process Reward with Outcome Reward in Policy OptimizationComments: 8 pages, 2 figures Code is available at: this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1127] arXiv:2601.07891 (replaced) [pdf, ps, other]
-
Title: KVzap: Fast, Adaptive, and Faithful KV Cache PruningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1128] arXiv:2601.08011 (replaced) [pdf, ps, other]
-
Title: TP-Blend: Textual-Prompt Attention Pairing for Precise Object-Style Blending in Diffusion ModelsJournal-ref: Transactions on Machine Learning Research, 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
- [1129] arXiv:2601.08248 (replaced) [pdf, ps, other]
-
Title: Spiking Neural-Invariant Kalman Fusion for Accurate Localization Using Low-Cost IMUsSubjects: Robotics (cs.RO)
- [1130] arXiv:2601.08662 (replaced) [pdf, ps, other]
-
Title: From Classical to Quantum Reinforcement Learning and Its Applications in Quantum Control: A Beginner's TutorialSubjects: Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
- [1131] arXiv:2601.08900 (replaced) [pdf, ps, other]
-
Title: Comprehensive Machine Learning Benchmarking for Fringe Projection Profilometry with Photorealistic Synthetic DataComments: 19 pages, 10 figures, 5 tablesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [1132] arXiv:2601.08951 (replaced) [pdf, ps, other]
-
Title: PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI HarmAuthors: Jing-Jing Li, Joel Mire, Eve Fleisig, Valentina Pyatkin, Anne Collins, Maarten Sap, Sydney LevineSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1133] arXiv:2601.09241 (replaced) [pdf, ps, other]
-
Title: When to Trust: A Causality-Aware Calibration Framework for Accurate Knowledge Graph Retrieval-Augmented GenerationComments: Accepted by WWW 2026Subjects: Computation and Language (cs.CL)
- [1134] arXiv:2601.09693 (replaced) [pdf, ps, other]
-
Title: Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug DesignComments: ELLIS ML4Molecules Workshop 2025, ELLIS Unconference, Copenhagen 2025 Revised version with additional timing evaluationSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [1135] arXiv:2601.09719 (replaced) [pdf, ps, other]
-
Title: Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [1136] arXiv:2601.10028 (replaced) [pdf, ps, other]
-
Title: Fundamental Limits of Coded Polynomial AggregationComments: 7 pages, 1 figureSubjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)
- [1137] arXiv:2601.10506 (replaced) [pdf, ps, other]
-
Title: The incompatibility of the Condorcet winner and loser criteria with positive involvement and resolvabilityAuthors: Wesley H. HollidayComments: 6 pages, 2 figures. Added theorem for independence of clonesSubjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
- [1138] arXiv:2601.10554 (replaced) [pdf, ps, other]
-
Title: DeepUrban: Interaction-Aware Trajectory Prediction and Planning for Automated Driving by Aerial ImageryJournal-ref: 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), Edmonton, AB, Canada, 2024, pp. 221-227Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [1139] arXiv:2601.11265 (replaced) [pdf, ps, other]
-
Title: Sample-Near-Optimal Agnostic Boosting with Improved Running TimeComments: 28 pages, 0 figures. Accepted at the 37th International Conference on Algorithmic Learning Theory (ALT 2026)Subjects: Machine Learning (cs.LG)
- [1140] arXiv:2601.11641 (replaced) [pdf, ps, other]
-
Title: Mixture of Distributions Matters: Dynamic Sparse Attention for Efficient Video Diffusion TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [1141] arXiv:2601.12539 (replaced) [pdf, ps, other]
-
Title: MemeLens: Multilingual Multitask VLMs for MemesAuthors: Ali Ezzat Shahroor, Mohamed Bayan Kmainasi, Abul Hasnat, Dimitar Dimitrov, Giovanni Da San Martino, Preslav Nakov, Firoj AlamComments: disinformation, misinformation, factuality, harmfulness, fake news, propaganda, hateful meme, multimodality, text, imagesSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1142] arXiv:2601.14096 (replaced) [pdf, ps, other]
-
Title: Remapping and navigation of an embedding space via error minimization: a fundamental organizational principle of cognition in natural and artificial systemsComments: 41 pages, 5 figuresSubjects: Artificial Intelligence (cs.AI)
- [1143] arXiv:2601.14614 (replaced) [pdf, ps, other]
-
Title: Towards Cybersecurity Superintelligence: from AI-guided humans to human-guided AIAuthors: Víctor Mayoral-Vilches, Stefan Rass, Martin Pinzger, Endika Gil-Uriarte, Unai Ayucar-Carbajo, Jon Ander Ruiz-Alcalde, Maite del Mundo de Torres, Luis Javier Navarrete-Lozano, María Sanz-Gómez, Francesco Balassone, Cristóbal R. J. Veas-Chavez, Vanesa Turiel, Alfonso Glera-Picón, Daniel Sánchez-Prieto, Yuri Salvatierra, Paul Zabalegui-Landa, Ruffino Reydel Cabrera-Álvarez, Patxi Mayoral-PizarrosoSubjects: Cryptography and Security (cs.CR)
- [1144] arXiv:2601.14943 (replaced) [pdf, ps, other]
-
Title: State of the Art of LLM-Enabled Interaction with VisualizationAuthors: Mathis Brossier, Tobias Isenberg, Konrad Schönborn, Jonas Unger, Mario Romero, Johanna Björklund, Anders Ynnerman, Lonni BesançonComments: Submitted to STARs of EuroVis'26Subjects: Human-Computer Interaction (cs.HC)
- [1145] arXiv:2601.15468 (replaced) [pdf, ps, other]
-
Title: Learning from Synthetic Data: Limitations of ERMSubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
- [1146] arXiv:2601.15540 (replaced) [pdf, ps, other]
-
Title: PRISM: Deriving a White-Box Transformer as a Signal-Noise Decomposition Operator via Maximum Coding Rate ReductionAuthors: Dongchen HuangComments: 12 pages, 6 figures. Derives Transformer as a signal-noise decomposition operator via Maximizing Coding Rate Reduction. Identifies 'Attention Sink' as spectral resonance (Arnold Tongues) and proposes $\pi$-RoPE for dynamical stabilitySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Data Analysis, Statistics and Probability (physics.data-an)
- [1147] arXiv:2601.16174 (replaced) [pdf, ps, other]
-
Title: Beyond Predictive Uncertainty: Reliable Representation Learning with Structural ConstraintsAuthors: Yiyao YangComments: 22 pages, 5 figures, 5 propositionsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [1148] arXiv:2601.16448 (replaced) [pdf, ps, other]
-
Title: Ringmaster: How to juggle high-throughput host OS system calls from TrustZone TEEsSubjects: Cryptography and Security (cs.CR); Operating Systems (cs.OS)
- [1149] arXiv:2601.16540 (replaced) [pdf, ps, other]
-
Title: Do Models Hear Like Us? Probing the Representational Alignment of Audio LLMs and Naturalistic EEGSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [1150] arXiv:2601.17160 (replaced) [pdf, ps, other]
-
Title: Information-Theoretic Causal Bounds under Unmeasured ConfoundingSubjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
- [1151] arXiv:2601.17482 (replaced) [pdf, ps, other]
-
Title: LogPrism: Unifying Structure and Variable Encoding for Effective Log CompressionSubjects: Software Engineering (cs.SE)
- [1152] arXiv:2601.17490 (replaced) [pdf, ps, other]
-
Title: Smooth Fractal Trees: Analytic Generators and Discrete EquivalenceAuthors: Henk MulderComments: Clarified scope and framing; no changes to resultsSubjects: Dynamical Systems (math.DS); Computational Geometry (cs.CG); Differential Geometry (math.DG)
- [1153] arXiv:2601.17617 (replaced) [pdf, ps, other]
-
Title: Agentic Search in the Wild: Intents and Trajectory Dynamics from 14M+ Real Search RequestsAuthors: Jingjie Ning, João Coelho, Yibo Kong, Yunfan Long, Bruno Martins, João Magalhães, Jamie Callan, Chenyan XiongSubjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
- [1154] arXiv:2601.18123 (replaced) [pdf, ps, other]
-
Title: Deadline-Aware, Energy-Efficient Control of Domestic Immersion Hot Water HeaterComments: Accepted at AAAI 2026Subjects: Artificial Intelligence (cs.AI)
- [1155] arXiv:2601.18350 (replaced) [pdf, ps, other]
-
Title: When Domain Pretraining Interferes with Instruction Alignment: An Empirical Study of Adapter Merging in Medical LLMsAuthors: Junyi ZouSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [1156] arXiv:2601.18497 (replaced) [pdf, ps, other]
-
Title: BAIT: Visual-illusion-inspired Privacy Preservation for Mobile Data VisualizationComments: Accepted by CHI'26Subjects: Human-Computer Interaction (cs.HC)
- [1157] arXiv:2601.18795 (replaced) [pdf, ps, other]
-
Title: Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy PrefixesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1158] arXiv:2601.18930 (replaced) [pdf, ps, other]
-
Title: Toward Learning POMDPs Beyond Full-Rank Actions and State ObservabilityComments: Update abstractSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [1159] arXiv:2601.19065 (replaced) [pdf, ps, other]
-
Title: The Opaque Pointer Design Pattern in Python: Towards a Pythonic PIMPL for Modularity, Encapsulation, and StabilityAuthors: Antonios Saravanos (1), John Pazarzis (2), Stavros Zervoudakis (1), Dongnanzi Zheng (1) ((1) New York University, (2) Independent Researcher)Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
- [1160] arXiv:2601.19120 (replaced) [pdf, ps, other]
-
Title: RobustExplain: Evaluating Robustness of LLM-Based Explanation Agents for RecommendationComments: 8 pages, 4 figuresSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1161] arXiv:2601.19121 (replaced) [pdf, ps, other]
-
Title: LLMs as Orchestrators: Constraint-Compliant Multi-Agent Optimization for Recommendation SystemsComments: 8 pages, 5 figuresSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
- [1162] arXiv:2601.19136 (replaced) [pdf, ps, other]
-
Title: TFFM: Topology-Aware Feature Fusion Module via Latent Graph Reasoning for Retinal Vessel SegmentationAuthors: Iftekhar Ahmed, Shakib Absar, Aftar Ahmad Sami, Shadman Sakib, Debojyoti Biswas, Seraj Al Mahmud MostafaComments: Accepted in WACV 2026 @ P2P-workshop as a full paper and selected for oral presentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1163] arXiv:2601.19395 (replaced) [pdf, ps, other]
-
Title: SEAFormer: A Spatial Proximity and Edge-Aware Transformer for Real-World Vehicle Routing ProblemsComments: 26 pagesSubjects: Machine Learning (cs.LG)
- [1164] arXiv:2601.19402 (replaced) [pdf, ps, other]
-
Title: PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving SystemsComments: Submitted to EuroMLSys26Subjects: Artificial Intelligence (cs.AI)
- [1165] arXiv:2601.19411 (replaced) [src]
-
Title: Task-Centric Policy Optimization from Misaligned Motion PriorsComments: Work requires further details and not complete yetSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
- [1166] arXiv:2601.19933 (replaced) [pdf, ps, other]
-
Title: NRR-Phi: Text-to-State Mapping for Ambiguity Preservation in LLM InferenceAuthors: Kei SaitoComments: 17 pages, 3 figures, 5 tables. Part of the NRR research program. v2: Added title prefix NRR-Phi for series identification; standardized reference formattingSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1167] arXiv:2601.20041 (replaced) [pdf, ps, other]
-
Title: CiMRAG: CiM-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMsComments: Accepted by ICASSP 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1168] arXiv:2601.20197 (replaced) [pdf, ps, other]
-
Title: Bias-Reduced Estimation of Finite Mixtures: An Application to Latent Group Structures in Panel DataAuthors: Raphaël LangevinSubjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Computation (stat.CO)
- [1169] arXiv:2601.20241 (replaced) [pdf, ps, other]
-
Title: Adequately Tailoring Age Verification RegulationsComments: This paper is accepted by the ACM Symposium on Computer Science and Law (CS & Law 2026)Subjects: Computers and Society (cs.CY)
- [1170] arXiv:2601.20753 (replaced) [pdf, ps, other]
-
Title: GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy LearningSubjects: Machine Learning (cs.LG)
- [1171] arXiv:2601.20789 (replaced) [pdf, ps, other]
-
Title: SERA: Soft-Verified Efficient Repository AgentsComments: 21 main pages, 6 pages appendixSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)
- [1172] arXiv:2601.20834 (replaced) [pdf, ps, other]
-
Title: Linear representations in language models can change dramatically over a conversationSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [1173] arXiv:2601.20969 (replaced) [pdf, ps, other]
-
Title: The Epistemic Planning Domain Definition Language: Official GuidelineSubjects: Artificial Intelligence (cs.AI)
- [1174] arXiv:2601.21064 (replaced) [pdf, ps, other]
-
Title: Textual Equilibrium Propagation for Deep Compound AI SystemsComments: Accepted to ICLR 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1175] arXiv:2601.21123 (replaced) [pdf, ps, other]
-
Title: CUA-Skill: Develop Skills for Computer Using AgentAuthors: Tianyi Chen, Yinheng Li, Michael Solodko, Sen Wang, Nan Jiang, Tingyuan Cui, Junheng Hao, Jongwoo Ko, Sara Abdali, Leon Xu, Suzhen Zheng, Hao Fan, Pashmina Cameron, Justin Wagle, Kazuhito KoishidaSubjects: Artificial Intelligence (cs.AI)
- [1176] arXiv:2601.21170 (replaced) [pdf, ps, other]
-
Title: The Powers of Precision: Structure-Informed Detection in Complex Systems -- From Customer Churn to Seizure OnsetSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [1177] arXiv:2601.21309 (replaced) [src]
-
Title: Transferable Graph Condensation from the Causal PerspectiveAuthors: Huaming Du, Yijie Huang, Su Yao, Yiying Wang, Yueyang Zhou, Jingwen Yang, Jinshi Zhang, Han Ji, Yu Zhao, Guisong Liu, Hegui Zhang, Carl Yang, Gang KouComments: The project paper is currently under company confidentiality restrictions, and the data and models cannot be made publicly available at this timeSubjects: Machine Learning (cs.LG)
- [1178] arXiv:2601.21331 (replaced) [pdf, ps, other]
-
Title: Convex Loss Functions for Support Vector Machines (SVMs) and Neural NetworksAuthors: Filippo PorteraSubjects: Machine Learning (cs.LG)
- [1179] arXiv:2601.21494 (replaced) [pdf, ps, other]
-
Title: The Path of Least Resistance: Guiding LLM Reasoning Trajectories with Prefix ConsensusComments: Accepted at ICLR 2026. this https URLSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1180] arXiv:2601.21602 (replaced) [pdf, ps, other]
-
Title: AIR-VLA: Vision-Language-Action Systems for Aerial ManipulationAuthors: Jianli Sun, Bin Tian, Qiyao Zhang, Chengxiang Li, Zihan Song, Zhiyong Cui, Yisheng Lv, Yonglin TianSubjects: Robotics (cs.RO)
- [1181] arXiv:2601.21712 (replaced) [pdf, ps, other]
-
Title: CoFreeVLA: Collision-Free Dual-Arm Manipulation via Vision-Language-Action Model and Risk EstimationSubjects: Robotics (cs.RO)
- [1182] arXiv:2601.21826 (replaced) [pdf, ps, other]
-
Title: Mil-SCORE: Benchmarking Long-Context Geospatial Reasoning and Planning in Large Language ModelsSubjects: Computation and Language (cs.CL)
- [1183] arXiv:2601.21835 (replaced) [pdf, ps, other]
-
Title: Scalable Linearized Laplace Approximation via Surrogate Neural KernelComments: 6 pages, 1 table. Accepted at European Symposium on Artificial Neural Networks (ESANN 2026) as oral presentationSubjects: Machine Learning (cs.LG)
- [1184] arXiv:2601.21955 (replaced) [pdf, ps, other]
-
Title: From Generative Modeling to Clinical Classification: A GPT-Based Architecture for EHR NotesAuthors: Fariba Afrin IranyComments: This submission is a full-length research manuscript consisting of 37 pages and 15 figures. The paper presents a GPT-based architecture with selective fine-tuning for clinical text classification, including detailed architectural diagrams, learning curves, and evaluation figures such as ROC curves and confusion matricesSubjects: Computation and Language (cs.CL)
- [1185] arXiv:2601.22125 (replaced) [pdf, ps, other]
-
Title: Creative Image Generation with Diffusion ModelsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1186] arXiv:2601.22191 (replaced) [pdf, ps, other]
-
Title: Partial Rewriting and Value Interpretation of Logically Constrained Terms (Full Version)Comments: Full version of a submission to FSCD 2026Subjects: Logic in Computer Science (cs.LO)
- [1187] arXiv:2601.22513 (replaced) [pdf, ps, other]
-
Title: Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language ModelsSubjects: Artificial Intelligence (cs.AI)
- [1188] arXiv:2601.22522 (replaced) [pdf, ps, other]
-
Title: Can 3D point cloud data improve automated body condition score prediction in dairy cattle?Authors: Zhou Tang, Jin Wang, Angelo De Castro, Yuxi Zhang, Victoria Bastos Primo, Ana Beatriz Montevecchio Bernardino, Gota Morota, Xu Wang, Ricardo C Chebel, Haipeng YuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1189] arXiv:2601.22579 (replaced) [pdf, ps, other]
-
Title: Non-Intrusive Graph-Based Bot Detection for E-Commerce Using Inductive Graph Neural NetworksSubjects: Machine Learning (cs.LG)
- [1190] arXiv:2601.22607 (replaced) [pdf, ps, other]
-
Title: From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using AgentsComments: Submitted to ICML 2026Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1191] arXiv:2601.22728 (replaced) [pdf, ps, other]
-
Title: On Small Pair Decompositions for Point SetsAuthors: Kevin Buchin, Jacobus Conradi, Sariel Har-Peled, Antonia Kalb, Abhiruk Lahiri, Lukas Plätz, Carolin Rehs, Sampson WongSubjects: Computational Geometry (cs.CG)
- [1192] arXiv:2601.22875 (replaced) [src]
-
Title: From Labels to Facets: Building a Taxonomically Enriched Turkish Learner CorpusAuthors: Elif Sayar, Tolgahan Türker, Anna Golynskaia Knezhevich, Bihter Dereli, Ayşe Demirhas, Lionel Nicolas, Gülşen EryiğitComments: An error was identified in the analyses presented in Section 5.3, impacting the conclusions of the paper. The authors have therefore withdrawn the submissionSubjects: Computation and Language (cs.CL)
- [1193] arXiv:2601.22975 (replaced) [pdf, ps, other]
-
Title: Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet TextAuthors: Ximing Lu, David Acuna, Jaehun Jung, Jian Hu, Di Zhang, Shizhe Diao, Yunheng Zou, Shaokun Zhang, Brandon Cui, Mingjie Liu, Hyunwoo Kim, Prithviraj Ammanabrolu, Jan Kautz, Yi Dong, Yejin ChoiSubjects: Artificial Intelligence (cs.AI)
- [1194] arXiv:2601.23223 (replaced) [pdf, ps, other]
-
Title: Are you going to finish that? A Practical Study of the Partial Token ProblemSubjects: Computation and Language (cs.CL)
- [1195] arXiv:2601.23232 (replaced) [pdf, ps, other]
-
Title: ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web SearchAuthors: Tao Yu, Haopeng Jin, Hao Wang, Shenghua Chai, Yujia Yang, Junhao Gong, Jiaming Guo, Minghui Zhang, Xinlong Chen, Zhenghao Zhang, Yuxuan Zhou, Yufei Xiong, Shanbin Zhang, Jiabing Yang, Hongzhu Yi, Xinming Wang, Cheng Zhong, Xiao Ma, Zhang Zhang, Yan Huang, Liang WangComments: 28 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [1196] arXiv:2602.00003 (replaced) [pdf, ps, other]
-
Title: Orchestrating Heterogeneous Experts: A Scalable MoE Framework with Anisotropy-Preserving FusionComments: 4 pages, 2 figures. Accepted at the Workshop on TIME of the ACM Web Conference 2026Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1197] arXiv:2602.00032 (replaced) [pdf, ps, other]
-
Title: Happy Young Women, Grumpy Old Men? Emotion-Driven Demographic Biases in Synthetic Face GenerationComments: 23 pages, 11 figuresSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [1198] arXiv:2602.00041 (replaced) [pdf, ps, other]
-
Title: Student Perceptions of Large Language Models Use in Self-Reflection and Design Critique in Architecture StudioAuthors: Juan David Salazar Rodriguez, Sam Conrad Joyce, Nachamma Sockalingam, Khoo Eng Tat, JulfendiComments: Keywords: Architectural Education, Design Studio Pedagogy, Large Lan-guage Models, Generative AI in Education, Design CritiqueSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
- [1199] arXiv:2602.00062 (replaced) [pdf, ps, other]
-
Title: SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model ParallelismSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1200] arXiv:2602.00064 (replaced) [pdf, ps, other]
-
Title: SPGCL: Simple yet Powerful Graph Contrastive Learning via SVD-Guided Structural PerturbationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1201] arXiv:2602.00079 (replaced) [pdf, ps, other]
-
Title: Embedding Compression via Spherical CoordinatesAuthors: Han XiaoSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [1202] arXiv:2602.00197 (replaced) [pdf, ps, other]
-
Title: Rank-and-Reason: Multi-Agent Collaboration Accelerates Zero-Shot Protein Mutation PredictionAuthors: Yang Tan, Yuanxi Yu, Can Wu, Bozitao Zhong, Mingchen Li, Guisheng Fan, Jiankang Zhu, Yafeng Liang, Nanqing Dong, Liang HongComments: 22 pages, 5 figures, 15 tablesSubjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1203] arXiv:2602.00222 (replaced) [pdf, ps, other]
-
Title: MapDream: Task-Driven Map Learning for Vision-Language NavigationAuthors: Guoxin Lian, Shuo Wang, Yucheng Wang, Yongcai Wang, Maiyue Chen, Kaihui Wang, Bo Zhang, Zhizhong Su, Deying Li, Zhaoxin FanSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [1204] arXiv:2602.00408 (replaced) [pdf, ps, other]
-
Title: Variational Approach for Job Shop SchedulingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1205] arXiv:2602.00450 (replaced) [pdf, ps, other]
-
Title: Model Optimization for Multi-Camera 3D Detection and TrackingAuthors: Ethan Anderson, Justin Silva, Kyle Zheng, Sameer Pusegaonkar, Yizhou Wang, Zheng Tang, Sujit BiswasSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1206] arXiv:2602.00488 (replaced) [pdf, ps, other]
-
Title: OD-DEAL: Dynamic Expert-Guided Adversarial Learning with Online Decomposition for Scalable Capacitated Vehicle RoutingSubjects: Machine Learning (cs.LG)
- [1207] arXiv:2602.00508 (replaced) [pdf, ps, other]
-
Title: DuoGen: Towards General Purpose Interleaved Multimodal GenerationAuthors: Min Shi, Xiaohui Zeng, Jiannan Huang, Yin Cui, Francesco Ferroni, Jialuo Li, Shubham Pachori, Zhaoshuo Li, Yogesh Balaji, Haoxiang Wang, Tsung-Yi Lin, Xiao Fu, Yue Zhao, Chieh-Yun Chen, Ming-Yu Liu, Humphrey ShiComments: Technical Report. Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1208] arXiv:2602.00509 (replaced) [pdf, ps, other]
-
Title: PROBE: Co-Balancing Computation and Communication in MoE Inference via Real-Time Predictive PrefetchingSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- [1209] arXiv:2602.00514 (replaced) [pdf, ps, other]
-
Title: A Low-Cost Vision-Based Tactile Gripper with Pretraining Learning for Contact-Rich ManipulationSubjects: Robotics (cs.RO)
- [1210] arXiv:2602.00520 (replaced) [pdf, ps, other]
-
Title: NEST: Nested Event Stream Transformer for Sequences of MultisetsComments: 11 pagesSubjects: Machine Learning (cs.LG)
- [1211] arXiv:2602.00611 (replaced) [pdf, ps, other]
- [1212] arXiv:2602.00642 (replaced) [pdf, ps, other]
-
Title: LegalOne: A Family of Foundation Models for Reliable Legal ReasoningAuthors: Haitao Li, Yifan Chen, Shuo Miao, Qian Dong, Jia Chen, Yiran Hu, Junjie Chen, Minghao Qin, Yueyue Wu, Yujia Zhou, Qingyao Ai, Yiqun Liu, Cheng Luo, Quan Zhou, Ya Zhang, Jikun HuComments: 25 pages, v1Subjects: Computation and Language (cs.CL)
- [1213] arXiv:2602.00644 (replaced) [pdf, ps, other]
-
Title: Hardness and Tractability of T_{h+1}-Free Edge DeletionComments: 23 pagesSubjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
- [1214] arXiv:2602.00708 (replaced) [pdf, ps, other]
-
Title: USS-Nav: Unified Spatio-Semantic Scene Graph for Lightweight UAV Zero-Shot Object NavigationAuthors: Weiqi Gai, Yuman Gao, Yuan Zhou, Yufan Xie, Zhiyang Liu, Yuze Wu, Xin Zhou, Fei Gao, Zhijun MengSubjects: Robotics (cs.RO)
- [1215] arXiv:2602.00710 (replaced) [pdf, ps, other]
-
Title: Learning More from Less: Unlocking Internal Representations for Benchmark CompressionAuthors: Yueqi Zhang, Jin Hu, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Yiwei Li, Jiayi Shi, Chuyi Tan, Ji Zhang, Boyuan Pan, Yao Hu, Kan LiSubjects: Artificial Intelligence (cs.AI)
- [1216] arXiv:2602.00748 (replaced) [pdf, ps, other]
-
Title: HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode ArchitecturesAuthors: Fangxin Liu, Qinghua Zhang, Hanjing Shen, Zhibo Liang, Li Jiang, Haibing Guan, Chong Bao, Xuefeng JinComments: Technical ReportSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
- [1217] arXiv:2602.00813 (replaced) [pdf, ps, other]
-
Title: Generating a Paracosm for Training-Free Zero-Shot Composed Image RetrievalSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1218] arXiv:2602.00814 (replaced) [pdf, ps, other]
-
Title: SyNeT: Synthetic Negatives for Traversability LearningSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [1219] arXiv:2602.00872 (replaced) [pdf, ps, other]
-
Title: Learning Heat-based Equations in Self-similar variablesSubjects: Machine Learning (cs.LG); Mathematical Physics (math-ph)
- [1220] arXiv:2602.00880 (replaced) [pdf, ps, other]
-
Title: Sensing What Surveys Miss: Understanding and Personalizing Proactive LLM Support by User ModelingComments: This manuscript has been accepted to CHI 2026Subjects: Human-Computer Interaction (cs.HC)
- [1221] arXiv:2602.00906 (replaced) [pdf, ps, other]
-
Title: Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership TestingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)
- [1222] arXiv:2602.00949 (replaced) [pdf, ps, other]
-
Title: Data Augmentation for High-Fidelity Generation of CAR-T/NK Immunological Synapse ImagesAuthors: Xiang Zhang, Boxuan Zhang, Alireza Naghizadeh, Mohab Mohamed, Dongfang Liu, Ruixiang Tang, Dimitris Metaxas, Dongfang LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1223] arXiv:2602.00989 (replaced) [pdf, ps, other]
-
Title: Optimal Decision-Making Based on Prediction SetsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [1224] arXiv:2602.01011 (replaced) [pdf, ps, other]
-
Title: Multi-Agent Teams Hold Experts BackComments: PreprintSubjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
- [1225] arXiv:2602.01023 (replaced) [pdf, ps, other]
-
Title: Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective AlignmentAuthors: Kai Yuan, Anthony Zheng, Jia Hu, Divyanshu Sheth, Hemanth Velaga, Kylee Kim, Matteo Guarrera, Besim Avci, Jianhua Li, Xuetao Yin, Rajyashree Mukherjee, Sean SuchterComments: 11 pages, 4 figuresSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1226] arXiv:2602.01044 (replaced) [pdf, ps, other]
-
Title: Morphis: SLO-Aware Resource Scheduling for Microservices with Time-Varying Call GraphsSubjects: Software Engineering (cs.SE)
- [1227] arXiv:2602.01077 (replaced) [pdf, ps, other]
-
Title: PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion TransformersComments: 17 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1228] arXiv:2602.01155 (replaced) [pdf, ps, other]
-
Title: Multi-Agent Causal Reasoning System for Error Pattern Rule Automation in VehiclesComments: 7 pages, 3 figuresSubjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
- [1229] arXiv:2602.01199 (replaced) [pdf, ps, other]
-
Title: On Normality and Equidistribution for Separator EnumeratorsAuthors: Subin PulariSubjects: Formal Languages and Automata Theory (cs.FL)
- [1230] arXiv:2602.01219 (replaced) [pdf, ps, other]
-
Title: MiTA Attention: Efficient Fast-Weight Scaling via a Mixture of Top-k ActivationsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [1231] arXiv:2602.01244 (replaced) [pdf, ps, other]
-
Title: Large-Scale Terminal Agentic Trajectory Generation from Dockerized EnvironmentsAuthors: Siwei Wu, Yizhi Li, Yuyang Song, Wei Zhang, Yang Wang, Riza Batista-Navarro, Xian Yang, Mingjie Tang, Bryan Dai, Jian Yang, Chenghua LinComments: Agentic Trajectory, Agentic Model, Terminal, Code AgentSubjects: Computation and Language (cs.CL)
- [1232] arXiv:2602.01295 (replaced) [pdf, ps, other]
-
Title: Best-of-Both-Worlds for Heavy-Tailed Markov Decision ProcessesSubjects: Machine Learning (cs.LG)
- [1233] arXiv:2602.01313 (replaced) [pdf, ps, other]
-
Title: EverMemBench: Benchmarking Long-Term Interactive Memory in Large Language ModelsAuthors: Chuanrui Hu, Tong Li, Xingze Gao, Hongda Chen, Yi Bai, Dannong Xu, Tianwei Lin, Xinda Zhao, Xiaohong Li, Yunyun Han, Jian Pei, Yafeng DengComments: 10 pages, 2 figures, 4 tablesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [1234] arXiv:2602.01355 (replaced) [pdf, ps, other]
-
Title: Aggregation Queries over Unstructured Text: Benchmark and Agentic MethodSubjects: Artificial Intelligence (cs.AI)
- [1235] arXiv:2602.01377 (replaced) [pdf, ps, other]
-
Title: Approximating Univariate Factored Distributions via Message-Passing AlgorithmsSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
- [1236] arXiv:2602.01401 (replaced) [pdf, ps, other]
-
Title: From Pragmas to Partners: A Symbiotic Evolution of Agentic High-Level SynthesisSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [1237] arXiv:2602.01450 (replaced) [pdf, ps, other]
-
Title: The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPTAuthors: Abhisek Dash, Soumi Das, Elisabeth Kirsten, Qinyuan Wu, Sai Keerthana Karnam, Krishna P. Gummadi, Thorsten Holz, Muhammad Bilal Zafar, Savvas ZannettouComments: This paper has been accepted at The ACM Web Conference 2026Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Information Retrieval (cs.IR)
- [1238] arXiv:2602.01501 (replaced) [pdf, ps, other]
-
Title: TreeLoc: 6-DoF LiDAR Global Localization in Forests via Inter-Tree Geometric MatchingComments: An 8-page paper with 7 tables and 8 figures, accepted to ICRA 2026Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [1239] arXiv:2602.01588 (replaced) [pdf, ps, other]
-
Title: Spectral Text Fusion: A Frequency-Aware Approach to Multimodal Time-Series ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1240] arXiv:2602.01590 (replaced) [pdf, ps, other]
-
Title: Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia ArticlesAuthors: Shaohan Wang, Benfeng Xu, Licheng Zhang, Mingxuan Du, Chiwei Zhu, Xiaorui Wang, Zhendong Mao, Yongdong ZhangComments: PreprintSubjects: Computation and Language (cs.CL)
- [1241] arXiv:2602.01601 (replaced) [pdf, ps, other]
-
Title: Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable RewardsComments: Accepted at ICLR 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [1242] arXiv:2602.01635 (replaced) [pdf, ps, other]
-
Title: COMET: Codebook-based Online-adaptive Multi-scale Embedding for Time-series Anomaly DetectionSubjects: Machine Learning (cs.LG)
- [1243] arXiv:2602.01661 (replaced) [pdf, ps, other]
-
Title: From Frames to Sequences: Temporally Consistent Human-Centric Dense PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1244] arXiv:2602.01666 (replaced) [pdf, ps, other]
-
Title: Moonworks Lunara Aesthetic II: An Image Variation DatasetSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1245] arXiv:2602.01696 (replaced) [pdf, ps, other]
-
Title: Cross-Modal Alignment and Fusion for RGB-D Transmission-Line Defect DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [1246] arXiv:2602.01709 (replaced) [pdf, ps, other]
-
Title: ARTIS: Agentic Risk-Aware Test-Time Scaling via Iterative SimulationAuthors: Xingshan Zeng, Lingzhi Wang, Weiwen Liu, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun LiuSubjects: Computation and Language (cs.CL)
- [1247] arXiv:2602.01749 (replaced) [pdf, ps, other]
-
Title: Controlling Exploration-Exploitation in GFlowNets via Markov Chain PerspectivesAuthors: Lin Chen, Samuel Drapeau, Fanghao Shao, Xuekai Zhu, Bo Xue, Yunchong Song, Mathieu Laurière, Zhouhan LinSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [1248] arXiv:2602.01751 (replaced) [pdf, ps, other]
-
Title: MGKAN: Predicting Asymmetric Drug-Drug Interactions via a Multimodal Graph Kolmogorov-Arnold NetworkComments: This paper has been accepted by ICASSP 2026Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
- [1249] arXiv:2602.01753 (replaced) [pdf, ps, other]
-
Title: ObjEmbed: Towards Universal Multimodal Object EmbeddingsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1250] arXiv:2602.01757 (replaced) [pdf, ps, other]
-
Title: Zero2Text: Zero-Training Cross-Domain Inversion Attacks on Textual EmbeddingsComments: 10 pagesSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [1251] arXiv:2602.01769 (replaced) [pdf, ps, other]
-
Title: IRIS: Implicit Reward-Guided Internal Sifting for Mitigating Multimodal HallucinationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1252] arXiv:2602.01780 (replaced) [pdf, ps, other]
-
Title: DDP-WM: Disentangled Dynamics Prediction for Efficient World ModelsComments: Codes will be available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [1253] arXiv:2602.01789 (replaced) [pdf, ps, other]
-
Title: RFS: Reinforcement learning with Residual flow steering for dexterous manipulationSubjects: Robotics (cs.RO)
- [1254] arXiv:2602.01807 (replaced) [pdf, ps, other]
-
Title: Sentence Curve Language ModelsSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [1255] arXiv:2602.01855 (replaced) [pdf, ps, other]
-
Title: Time2Vec Transformer for Robust Gesture Recognition from Low-Density sEMGSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
- [1256] arXiv:2602.01861 (replaced) [pdf, ps, other]
-
Title: RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse ResponsesComments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026. Equal contribution: Shaoheng Xu and Chunyi SunSubjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
- [1257] arXiv:2602.01865 (replaced) [pdf, ps, other]
-
Title: GRAB: An LLM-Inspired Sequence-First Click-Through Rate Prediction Modeling ParadigmAuthors: Shaopeng Chen, Chuyue Xie, Huimin Ren, Shaozong Zhang, Han Zhang, Ruobing Cheng, Zhiqiang Cao, Zehao Ju, Yu Gao, Jie Ding, Xiaodong Chen, Xuewu Jiao, Shuanglong Li, Liu LinSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
- [1258] arXiv:2602.01882 (replaced) [pdf, ps, other]
-
Title: The price of homogeneity is polynomialComments: 49 pages, 18 figuresSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
- [1259] arXiv:2602.01976 (replaced) [pdf, ps, other]
-
Title: FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual LearningComments: 33 pages. Accepted by ICLR 2026Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [1260] arXiv:2602.01992 (replaced) [pdf, ps, other]
-
Title: Emergent Analogical Reasoning in TransformersAuthors: Gouki Minegishi, Jingyuan Feng, Hiroki Furuta, Takeshi Kojima, Yusuke Iwasawa, Yutaka MatsuoSubjects: Artificial Intelligence (cs.AI)
- [1261] arXiv:2602.02000 (replaced) [pdf, ps, other]
-
Title: SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity PriorsAuthors: Bing He, Jingnan Gao, Yunuo Chen, Ning Cao, Gang Chen, Zhengxue Cheng, Li Song, Wenjun ZhangComments: ICLR 2026; Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [1262] arXiv:2602.02053 (replaced) [pdf, ps, other]
-
Title: WildGraphBench: Benchmarking GraphRAG with Wild-Source CorporaAuthors: Pengyu Wang, Benfeng Xu, Licheng Zhang, Shaohan Wang, Mingxuan Du, Chiwei Zhu, Zhendong MaoComments: this https URLSubjects: Computation and Language (cs.CL)
- [1263] arXiv:2602.02084 (replaced) [pdf, ps, other]
-
Title: Closing the Loop: Universal Repository Representation with RPG-EncoderAuthors: Jane Luo, Chengyu Yin, Xin Zhang, Qingtao Li, Steven Liu, Yiming Huang, Jie Wu, Hao Liu, Yangyu Huang, Yu Kang, Fangkai Yang, Ying Xin, Scarlett LiSubjects: Computation and Language (cs.CL); Software Engineering (cs.SE)
- [1264] arXiv:2602.02137 (replaced) [pdf, ps, other]
-
Title: DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center OperationsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
- [1265] arXiv:2602.02163 (replaced) [pdf, ps, other]
-
Title: Reg4Pru: Regularisation Through Random Token Routing for Token PruningComments: 11 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1266] arXiv:2602.02175 (replaced) [pdf, ps, other]
-
Title: CIEC: Coupling Implicit and Explicit Cues for Multimodal Weakly Supervised Manipulation LocalizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [1267] arXiv:2602.02178 (replaced) [pdf, ps, other]
-
Title: AR-MAP: Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models?Authors: Liang Lin, Feng Xiong, Zengbin Wang, Kun Wang, Junhao Dong, Xuecai Hu, Yong Wang, Xiangxiang ChuSubjects: Computation and Language (cs.CL)
- [1268] arXiv:2602.02192 (replaced) [pdf, ps, other]
-
Title: ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement LearningAuthors: Jie Xiao, Meng Chen, Qingnan Ren, Song Jingwei, Jiaqi Huang, Yangshen Deng, Chris Tong, Wanyi Chen, Suli Wang, Ziqian Bi, Shuo Lu, Yiqun Duan, Xu Wang, Rymon Yu, Ween Yang, Lynn Ai, Eric Yang, Bill ShiComments: 23 pages, 7 figuresSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
- [1269] arXiv:2602.02196 (replaced) [pdf, ps, other]
-
Title: TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM AgentsAuthors: Hang Yan, Xinyu Che, Fangzhi Xu, Qiushi Sun, Zichen Ding, Kanzhi Cheng, Jian Zhang, Tao Qin, Jun Liu, Qika LinComments: 29pages, 10 figuresSubjects: Artificial Intelligence (cs.AI)
- [1270] arXiv:2602.02230 (replaced) [pdf, ps, other]
-
Title: SEDformer: Event-Synchronous Spiking Transformers for Irregular Telemetry Time Series ForecastingComments: Under reviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [1271] arXiv:2602.02235 (replaced) [pdf, ps, other]
-
Title: Agent-Based Software Artifact EvaluationSubjects: Software Engineering (cs.SE)
- [1272] arXiv:2602.02236 (replaced) [pdf, ps, other]
-
Title: Online Fine-Tuning of Pretrained Controllers for Autonomous Driving via Real-Time Recurrent RLSubjects: Robotics (cs.RO); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
- [1273] arXiv:2602.02313 (replaced) [pdf, ps, other]
-
Title: Interpreting and Controlling LLM Reasoning through Integrated Policy GradientSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [1274] arXiv:2602.02383 (replaced) [pdf, ps, other]
-
Title: SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference OptimizationSubjects: Machine Learning (cs.LG)
- [1275] arXiv:2602.02393 (replaced) [pdf, ps, other]
-
Title: Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical MemoryAuthors: Ruiqi Wu, Xuanhua He, Meng Cheng, Tianyu Yang, Yong Zhang, Zhuoliang Kang, Xunliang Cai, Xiaoming Wei, Chunle Guo, Chongyi Li, Ming-Ming ChengComments: project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [1276] arXiv:2602.02408 (replaced) [pdf, ps, other]
-
Title: ReasonEdit: Editing Vision-Language Models using Human ReasoningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [1277] arXiv:2602.02419 (replaced) [pdf, ps, other]
-
Title: SafeGround: Know When to Trust GUI Grounding Models via Uncertainty CalibrationSubjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
- [1278] arXiv:2602.02444 (replaced) [pdf, ps, other]
-
Title: RANKVIDEO: Reasoning Reranking for Text-to-Video RetrievalSubjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV)
- [1279] arXiv:2602.02453 (replaced) [pdf, ps, other]
-
Title: Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual StorytellingComments: Working paperSubjects: Artificial Intelligence (cs.AI)
- [1280] arXiv:2602.02487 (replaced) [pdf, ps, other]
-
Title: Carry-Over Lottery Allocation: Practical Incentive-Compatible DraftsComments: 28 pages, 4 figuresSubjects: Computer Science and Game Theory (cs.GT)
[ showing up to 2000 entries per page: fewer | more ]
Disable MathJax (What is MathJax?)
Links to: arXiv, form interface, find, cs, recent, 2602, contact, help (Access key information)