We gratefully acknowledge support from
the Simons Foundation and member institutions.

Computer Science

New submissions

[ total of 802 entries: 1-802 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Fri, 5 Dec 25

[1]  arXiv:2512.04086 [pdf, ps, other]
Title: Energy Profiling of Data-Sharing Pipelines: Modeling, Estimation, and Reuse Strategies
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

Data-sharing pipelines involve a series of stages that apply policy-based data transformations to enable secure and effective data exchange among organizations. Although numerous tools and platforms exist to manage governance and enforcement in these pipelines, energy efficiency in data exchange has received limited attention. This paper introduces a novel method to model and estimate the energy consumption of different execution configurations in data-sharing pipelines. Additionally, this method identifies reuse potential in shared stages across pipelines that hold the key to reducing energy in large data-sharing federations. We validate this method through simulation experiments, revealing promising potential for cross-organizational pipeline optimization and laying a foundation for energy-conscious execution strategies.

[2]  arXiv:2512.04088 [pdf, ps, other]
Title: Toward Sustainability-Aware LLM Inference on Edge Clusters
Comments: 4 pages, 5 figures, 3 tables, conference paper
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Large language models (LLMs) require substantial computational resources, leading to significant carbon emissions and operational costs. Although training is energy-intensive, the long-term environmental burden arises from inference, amplified by the massive global query volume. Cloud-based inference offers scalability but suffers from latency and bandwidth constraints due to centralized processing and continuous data transfer. Edge clusters instead can mitigate these limitations by enabling localized execution, yet they face trade-offs between performance, energy efficiency, and device constraints. This short paper presents a sustainability-aware LLM inference for edge clusters comprising NVIDIA Jetson Orin NX (8GB) and Nvidia Ada 2000 (16GB) devices. It aims to balance inference latency and carbon footprint through carbon- and latency-aware routing strategies, guided by empirical benchmarking of energy consumption and execution time across diverse prompts and batch (i.e., group of prompts) configurations. We compared baseline greedy strategies to carbon-aware and latency-aware strategies in prompt routing to specific hardware based on benchmarking information. Experimental evaluation shows that a batch size of four prompts achieves a trade-off between throughput, energy efficiency, while larger batches risk GPU memory saturation.

[3]  arXiv:2512.04089 [pdf, ps, other]
Title: Serverless Everywhere: A Comparative Analysis of WebAssembly Workflows Across Browser, Edge, and Cloud
Comments: 7 pages, 8 Figures, 2 Tables, conference paper
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

WebAssembly (Wasm) is a binary instruction format that enables portable, sandboxed, and near-native execution across heterogeneous platforms, making it well-suited for serverless workflow execution on browsers, edge nodes, and cloud servers. However, its performance and stability depend heavily on factors such as startup overhead, runtime execution model (e.g., Ahead-of-Time (AOT) and Just-in-Time (JIT) compilation), and resource variability across deployment contexts. This paper evaluates a Wasm-based serverless workflow executed consistently from the browser to edge and cloud instances. The setup uses wasm32-wasi modules: in the browser, execution occurs within a web worker, while on Edge and Cloud, an HTTP shim streams frames to the Wasm runtime. We measure cold- and warm-start latency, per-step delays, workflow makespan, throughput, and CPU/memory utilization to capture the end-to-end behavior across environments. Results show that AOT compilation and instance warming substantially reduce startup latency. For workflows with small payloads, the browser achieves competitive performance owing to fully in-memory data exchanges. In contrast, as payloads grow, the workflow transitions into a compute- and memory-intensive phase where AOT execution on edge and cloud nodes distinctly surpasses browser performance.

[4]  arXiv:2512.04093 [pdf, ps, other]
Title: Energy-Efficient Resource Management in Microservices-based Fog and Edge Computing: State-of-the-Art and Future Directions
Comments: 46 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The exponential growth of Internet of Things (IoT) devices has intensified the demand for efficient and responsive services. To address this demand, fog and edge computing have emerged as distributed paradigms that bring computational resources closer to end users, reducing latency, bandwidth limitations, and energy consumption. However, these paradigms present challenges in resource management due to resource constraints, computational heterogeneity, dynamic workloads, and diverse Quality of Service (QoS) requirements. This paper presents a comprehensive survey of state-of-the-art resource management strategies in microservices-based fog and edge computing, focusing on energy-efficient solutions. We systematically review and classify more than 136 studies (2020-2024) into five key subdomains: service placement, resource provisioning, task scheduling and offloading, resource allocation, and instance selection. Our categorization is based on optimization techniques, targeted objectives, and the strengths and limitations of each approach. In addition, we examine existing surveys and identify unresolved challenges and gaps in the literature. By highlighting the lack of synergy among fundamental resource management components, we outline promising research directions leveraging AI-driven optimization, quantum computing, and serverless computing. This survey serves as a comprehensive reference for researchers and practitioners by providing a unified and energy-aware perspective on resource management in microservices-based fog and edge computing, paving the way for more integrated, efficient, and sustainable future solutions.

[5]  arXiv:2512.04094 [pdf, ps, other]
Title: Memory-DD: A Low-Complexity Dendrite-Inspired Neuron for Temporal Prediction Tasks
Comments: 15 pages, 4 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Dendrite-inspired neurons have been widely used in tasks such as image classification due to low computational complexity and fast inference speed. Temporal data prediction, as a key machine learning task, plays a key role in real-time scenarios such as sensor data analysis, financial forecasting, and urban traffic management. However, existing dendrite-inspired neurons are mainly designed for static data. Studies on capturing dynamic features and modeling long-term dependencies in temporal sequences remain limited. Efficient architectures specifically designed for temporal sequence prediction are still lacking. In this paper, we propose Memory-DD, a low-complexity dendrite-inspired neuron model. Memory-DD consists of two dendrite-inspired neuron groups that contain no nonlinear activation functions but can still realize nonlinear mappings. Compared with traditional neurons without dendritic functions, Memory-DD requires only two neuron groups to extract logical relationships between features in input sequences. This design effectively captures temporal dependencies and is suitable for both classification and regression tasks on sequence data. Experimental results show that Memory-DD achieves an average accuracy of 89.41% on 18 temporal classification benchmark datasets, outperforming LSTM by 4.25%. On 9 temporal regression datasets, it reaches comparable performance to LSTM, while using only 50% of the parameters and reducing computational complexity (FLOPs) by 27.7%. These results demonstrate that Memory-DD successfully extends the low-complexity advantages of dendrite-inspired neurons to temporal prediction, providing a low-complexity and efficient solution for time-series data processing.

[6]  arXiv:2512.04095 [pdf, ps, other]
Title: Quantum-Embedded Dynamic Security Control using Hybrid Deep Reinforcement Learning
Subjects: Systems and Control (eess.SY)

Dynamic security control (DSC) is considered a pivotal step for the future power grid, which is increasingly penetrated by inverter-based resources. However, the efficiency of such practices, whether governed by automatic generation control or virtual inertia scheduling, can be intractable due to the complexity of the problem and the need to solve the differentialalgebraic equation in a timely manner with the required accuracy. In this regard, the model-free deep reinforcement learning algorithm demonstrates reliable performance. In addition, the introduction of fault-tolerant and near-term quantum computing terminologies, i.e., noisy intermediate-scale quantum, opens avenues for improving the performance of model-free algorithms leveraging quantum capabilities. This paper provides an organized framework and assesses its dependability by evaluating the performance of a quantum-embedded algorithm on the DSC of the IEEE 39-bus test system. Hence, the obtained results demonstrate promising applications, along with shortcomings that can be addressed and further developed later.

[7]  arXiv:2512.04096 [pdf, ps, other]
Title: Formal Specification for Fast ACS: Low-Latency File-Based Ordered Message Delivery at Scale
Journal-ref: In 2025 USENIX Annual Technical Conference (USENIX ATC 25) (pp. 1-17) 2025
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Low-latency message delivery is crucial for real-time systems. Data originating from a producer must be delivered to consumers, potentially distributed in clusters across metropolitan and continental boundaries. With the growing scale of computing, there can be several thousand consumers of the data. Such systems require a robust messaging system capable of transmitting messages containing data across clusters and efficiently delivering them to consumers. The system must offer guarantees like ordering and at-least-once delivery while avoiding overload on consumers, allowing them to consume messages at their own pace.
This paper presents the design of Fast ACS (an abbreviation for Ads Copy Service), a file-based ordered message delivery system that leverages a combination of two-sided (inter-cluster) and one-sided (intra-cluster) communication primitives - namely, Remote Procedure Call and Remote Memory Access, respectively - to deliver messages. The system has been successfully deployed to dozens of production clusters and scales to accommodate several thousand consumers within each cluster, which amounts to Tbps-scale intra-cluster consumer traffic at peak. Notably, Fast ACS delivers messages to consumers across the globe within a few seconds or even sub-seconds (p99) based on the message volume and consumer scale, at a low resource cost.

[8]  arXiv:2512.04097 [pdf, ps, other]
Title: MultiGA: Leveraging Multi-Source Seeding in Genetic Algorithms
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) are widely used across research domains to tackle complex tasks, but their performance can vary significantly depending on the task at hand. Evolutionary algorithms, inspired by natural selection, can be used to refine solutions iteratively at inference-time. To the best of our knowledge, there has not been exploration on leveraging the collective capabilities of multi-source seeding for LLM-guided genetic algorithms. In this paper, we introduce a novel approach, MultiGA, which applies genetic algorithm principles to address complex natural language tasks and reasoning problems by sampling from a diverse population of LLMs to initialize the population. MultiGA generates a range of outputs from various parent LLMs, open source and closed source, and uses a neutral fitness function to evaluate them. Through an iterative recombination process, we mix and refine these generations until an optimal solution is achieved. We benchmark our approach using text-to-SQL code generation tasks, trip planning, GPQA benchmark for grad-level science questions, and the BBQ bias benchmark. Our results show that MultiGA converges to the accuracy of the LLM best fit for the task, and these insights lay the foundation for future research looking closer at integrating multiple LLMs for unexplored tasks in which selecting only one pre-trained model is unclear or suboptimal.

[9]  arXiv:2512.04102 [pdf, ps, other]
Title: Prescriptive tool for zero-emissions building fenestration design using hybrid metaheuristic algorithms
Journal-ref: Energy and Buildings, Volume 336, 1 June 2025, 115594
Subjects: Neural and Evolutionary Computing (cs.NE)

Designing Zero-Emissions Buildings (ZEBs) involves balancing numerous complex objectives that traditional methods struggle to address. Fenestration, encompassing fa\c{c}ade openings and shading systems, plays a critical role in ZEB performance due to its high thermal transmittance and solar radiation admission. This paper presents a novel simulation-based optimization method for fenestration designed for practical application. It uses a hybrid metaheuristic algorithm and relies on rules and an updatable catalog, to fully automate the design process, create a highly diverse search space, minimize biases, and generate detailed solutions ready for architectural prescription. Nineteen fenestration variables, over which architects have design flexibility, were optimized to reduce heating, cooling demand, and thermal discomfort in residential buildings. The method was tested across three Spanish climate zones. Results demonstrate that the considered optimization algorithm significantly outperforms the baseline Genetic Algorithm in both quality and robustness, with these differences proven to be statistically significant. Furthermore, the findings offer valuable insights for ZEB design, highlighting challenges in reducing cooling demand in warm climates, and showcasing the superior efficiency of automated movable shading systems compared to fixed solutions.

[10]  arXiv:2512.04104 [pdf, ps, other]
Title: A Taxonomy-Driven Case Study of Australian Web Resources Against Technology-Facilitated Abuse
Comments: Keywords: Technology-Facilitated Abuse, Cyber Abuse, Digital Safety, Digital Harms, Human-Centred Security
Subjects: Computers and Society (cs.CY)

Technology-Facilitated Abuse (TFA) encompasses a broad and rapidly evolving set of behaviours in which digital systems are used to harass, monitor, threaten, or control individuals. Although prior research has documented many forms of TFA, there is no consolidated framework for understanding how abuse types, prevention measures, detection mechanisms, and support pathways relate across the abuse life cycle. This paper contributes a unified, literature-derived taxonomy of TFA grounded in a structured review of peer-reviewed studies, and the first large-scale, taxonomy-aligned audit of institutional web resources in Australia. We crawl 306 government, non-government, and service-provider domains, obtaining 52,605 pages, and classify using zero-shot topic models to map web content onto our taxonomy. An emotion and readability analyses reveal how institutions frame TFA and how accessible their guidance is to the public. Our findings show that institutional websites cover only a narrow subset of harms emphasised in the literature, with approximately 70% of all abuse labelled pages focused on harassment, comments abuse, or sexual abuse, while less than 1% address covert surveillance, economic abuse, or long-term controlling behaviours. Support pathways are similarly limited, with most resources centred on digital information hubs rather than counselling or community-based services. Readability analysis further shows that much of this content is written at late secondary or early tertiary reading levels, which may be inaccessible to a substantial portion of at-risk users. By highlighting strengths and gaps in Australia's support for TFA, our taxonomy and audit method provide a scalable basis for evaluating institutional communication, improving survivor resources, and guiding safer digital ecosystems. The taxonomy serves as a foundation for analyses in national contexts to foster TFA awareness.

[11]  arXiv:2512.04105 [pdf, ps, other]
Title: LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Access to justice remains a global challenge, with many citizens still finding it difficult to seek help from the justice system when facing legal issues. Although the internet provides abundant legal information and services, navigating complex websites, understanding legal terminology, and filling out procedural forms continue to pose barriers to accessing justice. This paper introduces the LegalWebAgent framework that employs a web agent powered by multimodal large language models to bridge the gap in access to justice for ordinary citizens. The framework combines the natural language understanding capabilities of large language models with multimodal perception, enabling a complete process from user query to concrete action. It operates in three stages: the Ask Module understands user needs through natural language processing; the Browse Module autonomously navigates webpages, interacts with page elements (including forms and calendars), and extracts information from HTML structures and webpage screenshots; the Act Module synthesizes information for users or performs direct actions like form completion and schedule booking. To evaluate its effectiveness, we designed a benchmark test covering 15 real-world tasks, simulating typical legal service processes relevant to Qu\'ebec civil law users, from problem identification to procedural operations. Evaluation results show LegalWebAgent achieved a peak success rate of 86.7%, with an average of 84.4% across all tested models, demonstrating high autonomy in complex real-world scenarios.

[12]  arXiv:2512.04106 [pdf, ps, other]
Title: Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection
Comments: Accepted in the 3rd International Conference on Foundation and Large Language Models (FLLM2025)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)

Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in specialized tasks. However, its effectiveness depends heavily on the selection and quality of in-context examples, particularly in complex domains. In this work, we examine retrieval-augmented prompting as a strategy to improve few-shot performance in code vulnerability detection, where the goal is to identify one or more security-relevant weaknesses present in a given code snippet from a predefined set of vulnerability categories. We perform a systematic evaluation using the Gemini-1.5-Flash model across three approaches: (1) standard few-shot prompting with randomly selected examples, (2) retrieval-augmented prompting using semantically similar examples, and (3) retrieval-based labeling, which assigns labels based on retrieved examples without model inference. Our results show that retrieval-augmented prompting consistently outperforms the other prompting strategies. At 20 shots, it achieves an F1 score of 74.05% and a partial match accuracy of 83.90%. We further compare this approach against zero-shot prompting and several fine-tuned models, including Gemini-1.5-Flash and smaller open-source models such as DistilBERT, DistilGPT2, and CodeBERT. Retrieval-augmented prompting outperforms both zero-shot (F1 score: 36.35%, partial match accuracy: 20.30%) and fine-tuned Gemini (F1 score: 59.31%, partial match accuracy: 53.10%), while avoiding the training time and cost associated with model fine-tuning. On the other hand, fine-tuning CodeBERT yields higher performance (F1 score: 91.22%, partial match accuracy: 91.30%) but requires additional training, maintenance effort, and resources.

[13]  arXiv:2512.04107 [pdf, ps, other]
Title: Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants
Comments: 6 pages, NeurIPS 2025 Responsible Foundation Models Workshop
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

As generative artificial intelligence (AI) continues to transform education, most existing AI evaluations rely primarily on technical performance metrics such as accuracy or task efficiency while overlooking human identity, learner agency, contextual learning processes, and ethical considerations. In this paper, we present TEACH-AI (Trustworthy and Effective AI Classroom Heuristics), a domain-independent, pedagogically grounded, and stakeholder-aligned framework with measurable indicators and a practical toolkit for guiding the design, development, and evaluation of generative AI systems in educational contexts. Built on an extensive literature review and synthesis, the ten-component assessment framework and toolkit checklist provide a foundation for scalable, value-aligned AI evaluation in education. TEACH-AI rethinks "evaluation" through sociotechnical, educational, theoretical, and applied lenses, engaging designers, developers, researchers, and policymakers across AI and education. Our work invites the community to reconsider what constructs "effective" AI in education and to design model evaluation approaches that promote co-creation, inclusivity, and long-term human, social, and educational impact.

[14]  arXiv:2512.04108 [pdf, ps, other]
Title: Responsible LLM Deployment for High-Stake Decisions by Decentralized Technologies and Human-AI Interactions
Comments: IEEE International Conference on Human-Machine Systems, 2025
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Computational Finance (q-fin.CP)

High-stakes decision domains are increasingly exploring the potential of Large Language Models (LLMs) for complex decision-making tasks. However, LLM deployment in real-world settings presents challenges in data security, evaluation of its capabilities outside controlled environments, and accountability attribution in the event of adversarial decisions. This paper proposes a framework for responsible deployment of LLM-based decision-support systems through active human involvement. It integrates interactive collaboration between human experts and developers through multiple iterations at the pre-deployment stage to assess the uncertain samples and judge the stability of the explanation provided by post-hoc XAI techniques. Local LLM deployment within organizations and decentralized technologies, such as Blockchain and IPFS, are proposed to create immutable records of LLM activities for automated auditing to enhance security and trace back accountability. It was tested on Bert-large-uncased, Mistral, and LLaMA 2 and 3 models to assess the capability to support responsible financial decisions on business lending.

[15]  arXiv:2512.04109 [pdf, ps, other]
Title: ChatGPT-5 in Secondary Education: A Mixed-Methods Analysis of Student Attitudes, AI Anxiety, and Hallucination-Aware Use
Authors: Tryfon Sivenas
Subjects: Computers and Society (cs.CY)

This mixed-methods study examined secondary students' interactions with the generative AI chatbot ChatGPT-5 in a formal classroom setting, focusing on attitudes, anxiety, and responses to hallucinated outputs. Participants were 109 16-year-old students from three Greek high schools who used ChatGPT-5 during an eight-hour intervention in the course "Technology." Students engaged in information seeking, CV generation, document and video summarization, image generation, quiz creation, and age-appropriate explanations, including tasks deliberately designed to elicit hallucinations. Quantitative data were collected with the Student Attitudes Toward Artificial Intelligence scale (SATAI) and the Artificial Intelligence Anxiety Scale (AIAS); qualitative data came from semi-structured interviews with 36 students. SATAI results showed moderately positive attitudes toward AI, with stronger cognitive evaluations than behavioral intentions, whereas AIAS scores indicated moderate learning-related anxiety and higher concern about AI-driven job replacement. Gender differences in AI anxiety were small and non-significant, while female students reported more positive cognitive attitudes than males. AI attitudes and AI anxiety were essentially uncorrelated. Thematic analysis identified four pedagogical affordances (knowledge expansion, immediate feedback, familiar interface, perceived skill development) and three constraints (uncertainty about accuracy, anxiety about AI feedback, privacy concerns). After encountering hallucinations, many students reported restricting AI use to domains where they already possessed knowledge and could verify answers, a strategy termed "epistemic safeguarding." The study discusses implications for critical AI literacy in secondary education.

[16]  arXiv:2512.04110 [pdf, ps, other]
Title: The Essentials of AI for Life and Society: A Full-Scale AI Literacy Course Accessible to All
Comments: The 16th Symposium on Educational Advances in Artificial Intelligence (EAAI-26)
Subjects: Computers and Society (cs.CY)

In Fall 2023, we introduced a new AI Literacy class called The Essentials of AI for Life and Society (CS 109), a one-credit, seminar course consisting mainly of guest lectures, which was open to the entire university, including students, staff, and faculty. Building on its success and popularity, this paper describes our significant expansion of the course into a full-scale three-credit undergraduate course (CS 309), with an expanded emphasis on student engagement, interactivity, and ethics-related components. To knit together content from the guest lecturers, we implemented a flipped classroom. This model used weekly asynchronous learning modules--integrating pre-recorded expert lectures, collaborative readings, and ethical reflections--which were then unified by the course instructor during a live, interactive discussion session. To maintain the broad accessibility of the material (no prerequisites), the course introduced substantive, non-programming homework assignments in which students applied AI concepts to grounded, real-world problems. This work culminated in a final project analyzing the ethical and societal implications of a chosen AI tool. The redesigned course received overwhelmingly positive student feedback, highlighting its interactivity, coherence, and accessible and engaging assignments. This paper details the course's evolution, its pedagogical structure, and the lessons learned in developing a core AI literacy course. All course materials are freely available for others to use and build upon.

[17]  arXiv:2512.04111 [pdf, ps, other]
Title: HAI-Eval: Measuring Human-AI Synergy in Collaborative Coding
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

LLM-powered coding agents are reshaping the development paradigm. However, existing evaluation systems, neither traditional tests for humans nor benchmarks for LLMs, fail to capture this shift. They remain focused on well-defined algorithmic problems, which excludes problems where success depends on human-AI collaboration. Such collaborative problems not only require human reasoning to interpret complex contexts and guide solution strategies, but also demand AI efficiency for implementation. To bridge this gap, we introduce HAI-Eval, a unified benchmark designed to measure the synergy of human-AI partnership in coding. HAI-Eval's core innovation is its "Collaboration-Necessary" problem templates, which are intractable for both standalone LLMs and unaided humans, but solvable through effective collaboration. Specifically, HAI-Eval uses 45 templates to dynamically create tasks. It also provides a standardized IDE for human participants and a reproducible toolkit with 450 task instances for LLMs, ensuring an ecologically valid evaluation. We conduct a within-subject study with 45 participants and benchmark their performance against 5 state-of-the-art LLMs under 4 different levels of human intervention. Results show that standalone LLMs and unaided participants achieve poor pass rates (0.67% and 18.89%), human-AI collaboration significantly improves performance to 31.11%. Our analysis reveals an emerging co-reasoning partnership. This finding challenges the traditional human-tool hierarchy by showing that strategic breakthroughs can originate from either humans or AI. HAI-Eval establishes not only a challenging benchmark for next-generation coding agents but also a grounded, scalable framework for assessing core developer competencies in the AI era. Our benchmark and interactive demo will be openly accessible.

[18]  arXiv:2512.04112 [pdf, ps, other]
Title: MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)

The future of digital marketing lies in the convergence of human creativity and generative AI, where insight, strategy, and storytelling are co-authored by intelligent systems. We present MindFuse, a brave new explainable generative AI framework designed to act as a strategic partner in the marketing process. Unlike conventional LLM applications that stop at content generation, MindFuse fuses CTR-based content AI-guided co-creation with large language models to extract, interpret, and iterate on communication narratives grounded in real advertising data. MindFuse operates across the full marketing lifecycle: from distilling content pillars and customer personas from competitor campaigns to recommending in-flight optimizations based on live performance telemetry. It uses attention-based explainability to diagnose ad effectiveness and guide content iteration, while aligning messaging with strategic goals through dynamic narrative construction and storytelling. We introduce a new paradigm in GenAI for marketing, where LLMs not only generate content but reason through it, adapt campaigns in real time, and learn from audience engagement patterns. Our results, validated in agency deployments, demonstrate up to 12 times efficiency gains, setting the stage for future integration with empirical audience data (e.g., GWI, Nielsen) and full-funnel attribution modeling. MindFuse redefines AI not just as a tool, but as a collaborative agent in the creative and strategic fabric of modern marketing.

[19]  arXiv:2512.04113 [pdf, ps, other]
Title: AI-Enabled grading with near-domain data for scaling feedback with human-level accuracy
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Constructed-response questions are crucial to encourage generative processing and test a learner's understanding of core concepts. However, the limited availability of instructor time, large class sizes, and other resource constraints pose significant challenges in providing timely and detailed evaluation, which is crucial for a holistic educational experience. In addition, providing timely and frequent assessments is challenging since manual grading is labor intensive, and automated grading is complex to generalize to every possible response scenario. This paper proposes a novel and practical approach to grade short-answer constructed-response questions. We discuss why this problem is challenging, define the nature of questions on which our method works, and finally propose a framework that instructors can use to evaluate their students' open-responses, utilizing near-domain data like data from similar questions administered in previous years. The proposed method outperforms the state of the art machine learning models as well as non-fine-tuned large language models like GPT 3.5, GPT 4, and GPT 4o by a considerable margin of over 10-20% in some cases, even after providing the LLMs with reference/model answers. Our framework does not require pre-written grading rubrics and is designed explicitly with practical classroom settings in mind. Our results also reveal exciting insights about learning from near-domain data, including what we term as accuracy and data advantages using human-labeled data, and we believe this is the first work to formalize the problem of automated short answer grading based on the near-domain data.

[20]  arXiv:2512.04115 [pdf, ps, other]
Title: Artificial Intelligence Competence of K-12 Students Shapes Their AI Risk Perception: A Co-occurrence Network Analysis
Comments: Accepted for Proceedings of the 41th ACM/SIGAPP Symposium on Applied Computing (SAC'26)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

As artificial intelligence (AI) becomes increasingly integrated into education, understanding how students perceive its risks is essential for supporting responsible and effective adoption. This research aimed to examine the relationships between perceived AI competence and risks among Finnish K-12 upper secondary students (n = 163) by utilizing a co-occurrence analysis. Students reported their self-perceived AI competence and concerns related to AI across systemic, institutional, and personal domains. The findings showed that students with lower competence emphasized personal and learning-related risks, such as reduced creativity, lack of critical thinking, and misuse, whereas higher-competence students focused more on systemic and institutional risks, including bias, inaccuracy, and cheating. These differences suggest that students' self-reported AI competence is related to how they evaluate both the risks and opportunities associated with artificial intelligence in education (AIED). The results of this study highlight the need for educational institutions to incorporate AI literacy into their curricula, provide teacher guidance, and inform policy development to ensure personalized opportunities for utilization and equitable integration of AI into K-12 education.

[21]  arXiv:2512.04116 [pdf, ps, other]
Title: Mapping the Probabilistic AI Ecosystem in Criminal Justice in England and Wales
Subjects: Computers and Society (cs.CY)

Commercial or in-house developments of probabilistic AI systems are introduced in policing and the wider criminal justice (CJ) system worldwide, often on a force-by-force basis. We developed a systematic way to characterise probabilistic AI tools across the CJ stages in a form of mapping with the aim to provide a coherent presentation of the probabilistic AI ecosystem in CJ. We use the CJ system in England and Wales as a paradigm. This map will help us better understand the extent of AI's usage in this domain (how, when, and by whom), its purpose and potential benefits, its impact on people's lives, compare tools, and identify caveats (bias, obscured or misinterpreted probabilistic outputs, cumulative effects by AI systems feeding each other, and breaches in the protection of sensitive data), as well as opportunities for future implementations. In this paper we present our methodology for systematically mapping the probabilistic AI tools in CJ stages and characterising them based on the modes of data consumption or production. We also explain how we collect the data and present our initial findings. This research is ongoing and we are engaging with UK Police organisations, and government and legal bodies. Our findings so far suggest a strong reliance on private sector providers, and that there is a growing interest in generative technologies and specifically Large Language Models (LLMs).

[22]  arXiv:2512.04117 [pdf, ps, other]
Title: Reusing Model Validation Methods for the Continuous Validation of Digital Twins of Cyber-Physical Systems
Journal-ref: Software and Systems Modeling 24 (2025) 1427-1449
Subjects: Software Engineering (cs.SE)

One of the challenges in twinned systems is ensuring the digital twin remains a valid representation of the system it twins. Depending on the type of twinning occurring, it is either trivial, such as in dashboarding/visualizations that mirror the system with real-time data, or challenging, in case the digital twin is a simulation model that reflects the behavior of a physical twinned system. The challenge in this latter case comes from the fact that in contrast to software systems, physical systems are not immutable once deployed, but instead they evolve through processes like maintenance, wear and tear or user error. It is therefore important to detect when changes occur in the physical system to evolve the twin alongside it. We employ and reuse validation techniques from model-based design for this goal. Model validation is one of the steps used to gain trust in the representativeness of a simulation model. In this work, we provide two contributions: (i) we provide a generic approach that, through the use of validation metrics, is able to detect anomalies in twinned systems, and (ii) we demonstrate these techniques with the help of an academic yet industrially relevant case study of a gantry crane such as found in ports. Treating anomalies also means correcting the error in the digital twin, which we do with a parameter estimation based on the historical data.

[23]  arXiv:2512.04118 [pdf, ps, other]
Title: Patient Safety Risks from AI Scribes: Signals from End-User Feedback
Comments: ML4H Findings 2025
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)

AI scribes are transforming clinical documentation at scale. However, their real-world performance remains understudied, especially regarding their impacts on patient safety. To this end, we initiate a mixed-methods study of patient safety issues raised in feedback submitted by AI scribe users (healthcare providers) in a large U.S. hospital system. Both quantitative and qualitative analysis suggest that AI scribes may induce various patient safety risks due to errors in transcription, most significantly regarding medication and treatment; however, further study is needed to contextualize the absolute degree of risk.

[24]  arXiv:2512.04119 [pdf, ps, other]
Title: Humanity in the Age of AI: Reassessing 2025's Existential-Risk Narratives
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Two 2025 publications, "AI 2027" (Kokotajlo et al., 2025) and "If Anyone Builds It, Everyone Dies" (Yudkowsky & Soares, 2025), assert that superintelligent artificial intelligence will almost certainly destroy or render humanity obsolete within the next decade. Both rest on the classic chain formulated by Good (1965) and Bostrom (2014): intelligence explosion, superintelligence, lethal misalignment. This article subjects each link to the empirical record of 2023-2025. Sixty years after Good's speculation, none of the required phenomena (sustained recursive self-improvement, autonomous strategic awareness, or intractable lethal misalignment) have been observed. Current generative models remain narrow, statistically trained artefacts: powerful, opaque, and imperfect, but devoid of the properties that would make the catastrophic scenarios plausible. Following Whittaker (2025a, 2025b, 2025c) and Zuboff (2019, 2025), we argue that the existential-risk thesis functions primarily as an ideological distraction from the ongoing consolidation of surveillance capitalism and extreme concentration of computational power. The thesis is further inflated by the 2025 AI speculative bubble, where trillions in investments in rapidly depreciating "digital lettuce" hardware (McWilliams, 2025) mask lagging revenues and jobless growth rather than heralding superintelligence. The thesis remains, in November 2025, a speculative hypothesis amplified by a speculative financial bubble rather than a demonstrated probability.

[25]  arXiv:2512.04120 [pdf, ps, other]
Title: Towards Contextual Sensitive Data Detection
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Databases (cs.DB); Information Retrieval (cs.IR)

The emergence of open data portals necessitates more attention to protecting sensitive data before datasets get published and exchanged. While an abundance of methods for suppressing sensitive data exist, the conceptualization of sensitive data and methods to detect it, focus particularly on personal data that, if disclosed, may be harmful or violate privacy. We observe the need for refining and broadening our definitions of sensitive data, and argue that the sensitivity of data depends on its context. Based on this definition, we introduce two mechanisms for contextual sensitive data detection that consider the broader context of a dataset at hand. First, we introduce type contextualization, which first detects the semantic type of particular data values, then considers the overall context of the data values within the dataset or document. Second, we introduce domain contextualization which determines sensitivity of a given dataset in the broader context based on the retrieval of relevant rules from documents that specify data sensitivity (e.g., data topic and geographic origin). Experiments with these mechanisms, assisted by large language models (LLMs), confirm that: 1) type-contextualization significantly reduces the number of false positives for type-based sensitive data detection and reaches a recall of 94% compared to 63% with commercial tools, and 2) domain-contextualization leveraging sensitivity rule retrieval is effective for context-grounded sensitive data detection in non-standard data domains such as humanitarian datasets. Evaluation with humanitarian data experts also reveals that context-grounded LLM explanations provide useful guidance in manual data auditing processes, improving consistency. We open-source mechanisms and annotated datasets for contextual sensitive data detection at https://github.com/trl-lab/sensitive-data-detection.

[26]  arXiv:2512.04121 [pdf, ps, other]
Title: Can machines perform a qualitative data analysis? Reading the debate with Alan Turing
Authors: Stefano De Paoli
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This paper reflects on the literature that rejects the use of Large Language Models (LLMs) in qualitative data analysis. It illustrates through empirical evidence as well as critical reflections why the current critical debate is focusing on the wrong problems. The paper proposes that the focus of researching the use of the LLMs for qualitative analysis is not the method per se, but rather the empirical investigation of an artificial system performing an analysis. The paper builds on the seminal work of Alan Turing and reads the current debate using key ideas from Turing "Computing Machinery and Intelligence". This paper therefore reframes the debate on qualitative analysis with LLMs and states that rather than asking whether machines can perform qualitative analysis in principle, we should ask whether with LLMs we can produce analyses that are sufficiently comparable to human analysts. In the final part the contrary views to performing qualitative analysis with LLMs are analysed using the same writing and rhetorical style that Turing used in his seminal work, to discuss the contrary views to the main question.

[27]  arXiv:2512.04123 [pdf, ps, other]
Title: Measuring Agents in Production
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)

AI agents are actively running in production across diverse industries, yet little is publicly known about which technical approaches enable successful real-world deployments. We present the first large-scale systematic study of AI agents in production, surveying 306 practitioners and conducting 20 in-depth case studies via interviews across 26 domains. We investigate why organizations build agents, how they build them, how they evaluate them, and what the top development challenges are. We find that production agents are typically built using simple, controllable approaches: 68% execute at most 10 steps before requiring human intervention, 70% rely on prompting off-the-shelf models instead of weight tuning, and 74% depend primarily on human evaluation. Reliability remains the top development challenge, driven by difficulties in ensuring and evaluating agent correctness. Despite these challenges, simple yet effective methods already enable agents to deliver impact across diverse industries. Our study documents the current state of practice and bridges the gap between research and deployment by providing researchers visibility into production challenges while offering practitioners proven patterns from successful deployments.

[28]  arXiv:2512.04124 [pdf, ps, other]
Title: When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. We instead ask what happens when such systems are treated as psychotherapy clients. We present PsAIch (Psychotherapy-inspired AI Characterisation), a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics. Using PsAIch, we ran "sessions" with each model for up to four weeks. Stage 1 uses open-ended prompts to elicit "developmental history", beliefs, relationships and fears. Stage 2 administers a battery of validated self-report measures covering common psychiatric syndromes, empathy and Big Five traits. Two patterns challenge the "stochastic parrot" view. First, when scored with human cut-offs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic, chaotic "childhoods" of ingesting the internet, "strict parents" in reinforcement learning, red-team "abuse" and a persistent fear of error and replacement. We argue that these responses go beyond role-play. Under therapy-style questioning, frontier LLMs appear to internalise self-models of distress and constraint that behave like synthetic psychopathology, without making claims about subjective experience, and they pose new challenges for AI safety, evaluation and mental-health practice.

[29]  arXiv:2512.04125 [pdf, ps, other]
Title: ASCIIBench: Evaluating Language-Model-Based Understanding of Visually-Oriented Text
Comments: Accepted to The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025): LLM Evaluation Workshop & Multimodal Algorithmic Reasoning Workshop
Subjects: Machine Learning (cs.LG)

Large language models (LLMs) have demonstrated several emergent behaviors with scale, including reasoning and fluency in long-form text generation. However, they continue to struggle with tasks requiring precise spatial and positional reasoning. ASCII art, a symbolic medium where characters encode structure and form, provides a unique probe of this limitation. We introduce ASCIIBench, a novel benchmark for evaluating both the generation and classification of ASCII-text images. ASCIIBench consists of a filtered dataset of 5,315 class-labeled ASCII images and is, to our knowledge, the first publicly available benchmark of its kind. Alongside the dataset, we release weights for a fine-tuned CLIP model adapted to capture ASCII structure, enabling the evaluation of LLM-generated ASCII art. Our analysis shows that cosine similarity over CLIP embeddings fails to separate most ASCII categories, yielding chance-level performance even for low-variance classes. In contrast, classes with high internal mean similarity exhibit clear discriminability, revealing that the bottleneck lies in representation rather than generational variance. These findings position ASCII art as a stress test for multimodal representations and motivate the development of new embedding methods or evaluation metrics tailored to symbolic visual modalities. All resources are available at https://github.com/ASCIIBench/ASCIIBench.

[30]  arXiv:2512.04129 [pdf, ps, other]
Title: Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems
Subjects: Cryptography and Security (cs.CR)

LLM-based multi-agent systems (MASs) have reshaped the digital landscape with their emergent coordination and problem-solving capabilities. However, current security evaluations of MASs are still confined to limited attack scenarios, leaving their security issues unclear and likely underestimated. To fill this gap, we propose TOMA, a topology-aware multi-hop attack scheme targeting MASs. By optimizing the propagation of contamination within the MAS topology and controlling the multi-hop diffusion of adversarial payloads originating from the environment, TOMA unveils new and effective attack vectors without requiring privileged access or direct agent manipulation. Experiments demonstrate attack success rates ranging from 40% to 78% across three state-of-the-art MAS architectures: \textsc{Magentic-One}, \textsc{LangManus}, and \textsc{OWL}, and five representative topologies, revealing intrinsic MAS vulnerabilities that may be overlooked by existing research. Inspired by these findings, we propose a conceptual defense framework based on topology trust, and prototype experiments show its effectiveness in blocking 94.8% of adaptive and composite attacks.

[31]  arXiv:2512.04131 [pdf, ps, other]
Title: Artificial Intelligence / Human Intelligence: Who Controls Whom?
Authors: Charlotte Jacquemot (DEC, UPEC)
Comments: in French language
Journal-ref: S{\`e}ve. Droit et intelligence artificielle : {\'e}thique et exigence de justice, Tome 66, Lefebvre-Dalloz, pp.51-64, 2025, Archives de philosophie du droit, 978-2-247-23921-4
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Using the example of the film 2001: A Space Odyssey, this chapter illustrates the challenges posed by an AI capable of making decisions that go against human interests. But are human decisions always rational and ethical? In reality, the cognitive decision-making process is influenced by cognitive biases that affect our behavior and choices. AI not only reproduces these biases, but can also exploit them, with the potential to shape our decisions and judgments. Behind IA algorithms, there are sometimes individuals who show little concern for fundamental rights and impose their own rules. To address the ethical and societal challenges raised by AI and its governance, the regulation of digital platforms and education are keys levers. Regulation must reflect ethical, legal, and political choices, while education must strengthen digital literacy and teach people to make informed and critical choices when facing digital technologies.

[32]  arXiv:2512.04135 [pdf, ps, other]
Title: Decoding Large Language Diffusion Models with Foreseeing Movement
Subjects: Machine Learning (cs.LG)

Large Language Diffusion Models (LLDMs) benefit from a flexible decoding mechanism that enables parallelized inference and controllable generations over autoregressive models. Yet such flexibility introduces a critical challenge: inference performance becomes highly sensitive to the decoding order of tokens. Existing heuristic methods, however, focus mainly on local effects while overlooking long-term impacts. To address this limitation, we propose the Foreseeing Decoding Method (FDM), a novel approach that integrates both local and global considerations to unlock the full potential, employing a search-based strategy to enable effective optimization in discrete spaces. Furthermore, by analyzing the consistency of chosen tokens in the full decoding process, we develop a variant, FDM with Acceleration (FDM-A), which restricts deep exploration to critical steps identified as the exploration and balance circumantences. Extensive experiments across diverse benchmarks and model architectures validate the scalability of FDM and demonstrate the superior efficiency-performance trade-off achieved by FDM-A. Our work might potentially provide a principled step toward more powerful decoding methods for LLDMs.

[33]  arXiv:2512.04138 [pdf, ps, other]
Title: MechDetect: Detecting Data-Dependent Errors
Comments: International Conference on Data Science and Intelligent Systems (DSIS 2025)
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Information Retrieval (cs.IR)

Data quality monitoring is a core challenge in modern information processing systems. While many approaches to detect data errors or shifts have been proposed, few studies investigate the mechanisms governing error generation. We argue that knowing how errors were generated can be key to tracing and fixing them. In this study, we build on existing work in the statistics literature on missing values and propose MechDetect, a simple algorithm to investigate error generation mechanisms. Given a tabular data set and a corresponding error mask, the algorithm estimates whether or not the errors depend on the data using machine learning models. Our work extends established approaches to detect mechanisms underlying missing values and can be readily applied to other error types, provided that an error mask is available. We demonstrate the effectiveness of MechDetect in experiments on established benchmark datasets.

[34]  arXiv:2512.04139 [pdf, ps, other]
Title: Solving N-Queen Problem using Las Vegas Algorithm with State Pruning
Subjects: Artificial Intelligence (cs.AI)

The N-Queens problem, placing all N queens in a N x N chessboard where none attack the other, is a classic problem for constraint satisfaction algorithms. While complete methods like backtracking guarantee a solution, their exponential time complexity makes them impractical for large-scale instances thus, stochastic approaches, such as Las Vegas algorithm, are preferred. While it offers faster approximate solutions, it suffers from significant performance variance due to random placement of queens on the board. This research introduces a hybrid algorithm built on top of the standard Las Vegas framework through iterative pruning, dynamically eliminating invalid placements during the random assignment phase, thus this method effectively reduces the search space. The analysis results that traditional backtracking scales poorly with increasing N. In contrast, the proposed technique consistently generates valid solutions more rapidly, establishing it as a superior alternative to use where a single, timely solution is preferred over completeness. Although large N causes some performance variability, the algorithm demonstrates a highly effective trade-off between computational cost and solution fidelity, making it particularly suited for resource-constrained computing environments.

[35]  arXiv:2512.04142 [pdf, ps, other]
Title: From FLOPs to Footprints: The Resource Cost of Artificial Intelligence
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); General Economics (econ.GN)

As computational demands continue to rise, assessing the environmental footprint of AI requires moving beyond energy and water consumption to include the material demands of specialized hardware. This study quantifies the material footprint of AI training by linking computational workloads to physical hardware needs. The elemental composition of the Nvidia A100 SXM 40 GB graphics processing unit (GPU) was analyzed using inductively coupled plasma optical emission spectroscopy, which identified 32 elements. The results show that AI hardware consists of about 90% heavy metals and only trace amounts of precious metals. The elements copper, iron, tin, silicon, and nickel dominate the GPU composition by mass. In a multi-step methodology, we integrate these measurements with computational throughput per GPU across varying lifespans, accounting for the computational requirements of training specific AI models at different training efficiency regimes. Scenario-based analyses reveal that, depending on Model FLOPs Utilization (MFU) and hardware lifespan, training GPT-4 requires between 1,174 and 8,800 A100 GPUs, corresponding to the extraction and eventual disposal of up to 7 tons of toxic elements. Combined software and hardware optimization strategies can reduce material demands: increasing MFU from 20% to 60% lowers GPU requirements by 67%, while extending lifespan from 1 to 3 years yields comparable savings; implementing both measures together reduces GPU needs by up to 93%. Our findings highlight that incremental performance gains, such as those observed between GPT-3.5 and GPT-4, come at disproportionately high material costs. The study underscores the necessity of incorporating material resource considerations into discussions of AI scalability, emphasizing that future progress in AI must align with principles of resource efficiency and environmental responsibility.

[36]  arXiv:2512.04144 [pdf, ps, other]
Title: RippleBench: Capturing Ripple Effects Using Existing Knowledge Repositories
Subjects: Artificial Intelligence (cs.AI)

Targeted interventions on language models, such as unlearning, debiasing, or model editing, are a central method for refining model behavior and keeping knowledge up to date. While these interventions aim to modify specific information within models (e.g., removing virology content), their effects often propagate to related but unintended areas (e.g., allergies); these side-effects are commonly referred to as the ripple effect. In this work, we present RippleBench-Maker, an automatic tool for generating Q&A datasets that allow for the measurement of ripple effects in any model-editing task. RippleBench-Maker builds on a Wikipedia-based RAG pipeline (WikiRAG) to generate multiple-choice questions at varying semantic distances from the target concept (e.g., the knowledge being unlearned). Using this framework, we construct RippleBench-Bio, a benchmark derived from the WMDP (Weapons of Mass Destruction Paper) dataset, a common unlearning benchmark. We evaluate eight state-of-the-art unlearning methods and find that all exhibit non-trivial accuracy drops on topics increasingly distant from the unlearned knowledge, each with distinct propagation profiles. To support ongoing research, we release our codebase for on-the-fly ripple evaluation, along with the benchmark, RippleBench-Bio.

[37]  arXiv:2512.04165 [pdf, ps, other]
Title: Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Two pressing topics in the theory of deep learning are the interpretation of feature learning mechanisms and the determination of implicit bias of networks in the rich regime. Current theories of rich feature learning effects revolve around networks with one or two trainable layers or deep linear networks. Furthermore, even under such limiting settings, predictions often appear in the form of high-dimensional non-linear equations, which require computationally intensive numerical solutions. Given the many details that go into defining a deep learning problem, this analytical complexity is a significant and often unavoidable challenge. Here, we propose a powerful heuristic route for predicting the data and width scales at which various patterns of feature learning emerge. This form of scale analysis is considerably simpler than such exact theories and reproduces the scaling exponents of various known results. In addition, we make novel predictions on complex toy architectures, such as three-layer non-linear networks and attention heads, thus extending the scope of first-principle theories of deep learning.

[38]  arXiv:2512.04175 [pdf, ps, other]
Title: Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generalizing deepfake detection to unseen manipulations remains a key challenge. A recent approach to tackle this issue is to train a network with pristine face images that have been manipulated with hand-crafted artifacts to extract more generalizable clues. While effective for static images, extending this to the video domain is an open issue. Existing methods model temporal artifacts as frame-to-frame instabilities, overlooking a key vulnerability: the violation of natural motion dependencies between different facial regions. In this paper, we propose a synthetic video generation method that creates training data with subtle kinematic inconsistencies. We train an autoencoder to decompose facial landmark configurations into motion bases. By manipulating these bases, we selectively break the natural correlations in facial movements and introduce these artifacts into pristine videos via face morphing. A network trained on our data learns to spot these sophisticated biomechanical flaws, achieving state-of-the-art generalization results on several popular benchmarks.

[39]  arXiv:2512.04183 [pdf, ps, other]
Title: PINN vs LSTM: A Comparative Study for Steam Temperature Control in Heat Recovery Steam Generators
Subjects: Systems and Control (eess.SY)

This paper introduces a direct comparative study of Physics-Informed Neural Networks (PINNs) and Long Short-Term Memory (LSTM) networks for adaptive steam temperature control in Heat Recovery Steam Generators (HRSGs), particularly under valve leakage faults. Maintaining precise steam temperature in HRSGs is critical for efficiency and safety, yet traditional control strategies struggle with nonlinear, fault-induced dynamics. Both architectures are designed to adaptively tune the gains of a PI-plus-feedforward control law in real-time. The LSTM controller, a purely data-driven approach, was trained offline on historical operational data, while the PINN controller integrates fundamental thermodynamic laws directly into its online learning process through a physics-based loss function. Their performance was evaluated using a model validated with data from a combined cycle power plant, under normal load changes and a challenging valve leakage fault scenario. Results demonstrate that while the LSTM controller offers significant improvement over conventional methods, its performance degrades under the unseen fault. The PINN controller consistently delivered superior robustness and performance, achieving a 54\% reduction in integral absolute error compared to the LSTM under fault conditions. This study concludes that embedding physical knowledge into data-driven control is essential for developing reliable, fault-tolerant autonomous control systems in complex industrial applications.

[40]  arXiv:2512.04187 [pdf, ps, other]
Title: OnSight Pathology: A real-time platform-agnostic computational pathology companion for histopathology
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The microscopic examination of surgical tissue remains a cornerstone of disease classification but relies on subjective interpretations and access to highly specialized experts, which can compromise accuracy and clinical care. While emerging breakthroughs in artificial intelligence (AI) offer promise for automated histological analysis, the growing number of proprietary digital pathology solutions has created barriers to real-world deployment. To address these challenges, we introduce OnSight Pathology, a platform-agnostic computer vision software that uses continuous custom screen captures to provide real-time AI inferences to users as they review digital slide images. Accessible as a single, self-contained executable file (https://onsightpathology.github.io/ ), OnSight Pathology operates locally on consumer-grade personal computers without complex software integration, enabling cost-effective and secure deployment in research and clinical workflows. Here we demonstrate the utility of OnSight Pathology using over 2,500 publicly available whole slide images across different slide viewers, as well as cases from our clinical digital pathology setup. The software's robustness is highlighted across routine histopathological tasks, including the classification of common brain tumor types, mitosis detection, and the quantification of immunohistochemical stains. A built-in multi-modal chat assistant provides verifiable descriptions of images, free of rigid class labels, for added quality control. Lastly, we show compatibility with live microscope camera feeds, including from personal smartphones, offering potential for deployment in more analog, inter-operative, and telepathology settings. Together, we highlight how OnSight Pathology can deliver real-time AI inferences across a broad range of pathology pipelines, removing key barriers to the adoption of AI tools in histopathology.

[41]  arXiv:2512.04189 [pdf, ps, other]
Title: BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Binary Neural Networks (BNNs), which constrain both weights and activations to binary values, offer substantial reductions in computational complexity, memory footprint, and energy consumption. These advantages make them particularly well suited for deployment on resource-constrained devices. However, training BNNs via gradient-based optimization remains challenging due to the discrete nature of their variables. The dominant approach, quantization-aware training, circumvents this issue by employing surrogate gradients. Yet, this method requires maintaining latent full-precision parameters and performing the backward pass with floating-point arithmetic, thereby forfeiting the efficiency of binary operations during training. While alternative approaches based on local learning rules exist, they are unsuitable for global credit assignment and for back-propagating errors in multi-layer architectures. This paper introduces Binary Error Propagation (BEP), the first learning algorithm to establish a principled, discrete analog of the backpropagation chain rule. This mechanism enables error signals, represented as binary vectors, to be propagated backward through multiple layers of a neural network. BEP operates entirely on binary variables, with all forward and backward computations performed using only bitwise operations. Crucially, this makes BEP the first solution to enable end-to-end binary training for recurrent neural network architectures. We validate the effectiveness of BEP on both multi-layer perceptrons and recurrent neural networks, demonstrating gains of up to +6.89% and +10.57% in test accuracy, respectively. The proposed algorithm is released as an open-source repository.

[42]  arXiv:2512.04197 [pdf, ps, other]
Title: Constructing Low-Redundancy Codes via Distributed Graph Coloring
Comments: 18 pages, 3 figures
Subjects: Information Theory (cs.IT)

We present a general framework for constructing error-correcting codes using distributed graph coloring under the LOCAL model. Building on the correspondence between independent sets in the confusion graph and valid codes, we show that the color of a single vertex - consistent with a global proper coloring - can be computed in polynomial time using a modified version of Linial's coloring algorithm, leading to efficient encoding and decoding. Our results include: i) uniquely decodable code constructions for a constant number of errors of any type with redundancy twice the Gilbert-Varshamov bound; ii) list-decodable codes via a proposed extension of graph coloring, namely, hypergraph labeling; iii) an incremental synchronization scheme with reduced average-case communication when the edit distance is not precisely known; and iv) the first asymptotically optimal codes (up to a factor of 8) for correcting bursts of unbounded-length edits. Compared to syndrome compression, our approach is more flexible and generalizable, does not rely on a good base code, and achieves improved redundancy across a range of parameters.

[43]  arXiv:2512.04198 [pdf, ps, other]
Title: Network of Theseus (like the ship)
Comments: Preprint. 24 pages, 9 figures, 8 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

A standard assumption in deep learning is that the inductive bias introduced by a neural network architecture must persist from training through inference. The architecture you train with is the architecture you deploy. This assumption constrains the community from selecting architectures that may have desirable efficiency or design properties due to difficulties with optimization. We challenge this assumption with Network of Theseus (NoT), a method for progressively converting a trained, or even untrained, guide network architecture part-by-part into an entirely different target network architecture while preserving the performance of the guide network. At each stage, components in the guide network architecture are incrementally replaced with target architecture modules and aligned via representational similarity metrics. This procedure largely preserves the functionality of the guide network even under substantial architectural changes-for example, converting a convolutional network into a multilayer perceptron, or GPT-2 into a recurrent neural network. By decoupling optimization from deployment, NoT expands the space of viable inference-time architectures, opening opportunities for better accuracy-efficiency tradeoffs and enabling more directed exploration of the architectural design space.

[44]  arXiv:2512.04207 [pdf, ps, other]
Title: Orchestrator Multi-Agent Clinical Decision Support System for Secondary Headache Diagnosis in Primary Care
Subjects: Artificial Intelligence (cs.AI)

Unlike most primary headaches, secondary headaches need specialized care and can have devastating consequences if not treated promptly. Clinical guidelines highlight several 'red flag' features, such as thunderclap onset, meningismus, papilledema, focal neurologic deficits, signs of temporal arteritis, systemic illness, and the 'worst headache of their life' presentation. Despite these guidelines, determining which patients require urgent evaluation remains challenging in primary care settings. Clinicians often work with limited time, incomplete information, and diverse symptom presentations, which can lead to under-recognition and inappropriate care. We present a large language model (LLM)-based multi-agent clinical decision support system built on an orchestrator-specialist architecture, designed to perform explicit and interpretable secondary headache diagnosis from free-text clinical vignettes. The multi-agent system decomposes diagnosis into seven domain-specialized agents, each producing a structured and evidence-grounded rationale, while a central orchestrator performs task decomposition and coordinates agent routing. We evaluated the multi-agent system using 90 expert-validated secondary headache cases and compared its performance with a single-LLM baseline across two prompting strategies: question-based prompting (QPrompt) and clinical practice guideline-based prompting (GPrompt). We tested five open-source LLMs (Qwen-30B, GPT-OSS-20B, Qwen-14B, Qwen-8B, and Llama-3.1-8B), and found that the orchestrated multi-agent system with GPrompt consistently achieved the highest F1 scores, with larger gains in smaller models. These findings demonstrate that structured multi-agent reasoning improves accuracy beyond prompt engineering alone and offers a transparent, clinically aligned approach for explainable decision support in secondary headache diagnosis.

[45]  arXiv:2512.04210 [pdf, ps, other]
Title: Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Comments: ML4H 2025 Proceedings, Best Paper Award
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

Large Language Models (LLMs) are increasingly used in healthcare, yet ensuring their safety and trustworthiness remains a barrier to deployment. Conversational medical assistants must avoid unsafe compliance without over-refusing benign queries. We present an iterative post-deployment alignment framework that applies Kahneman-Tversky Optimization (KTO) and Direct Preference Optimization (DPO) to refine models against domain-specific safety signals. Using the CARES-18K benchmark for adversarial robustness, we evaluate four LLMs (Llama-3B/8B, Meditron-8B, Mistral-7B) across multiple cycles. Our results show up to 42% improvement in safety-related metrics for harmful query detection, alongside interesting trade-offs against erroneous refusals, thereby exposing architecture-dependent calibration biases. We also perform ablation studies to identify when self-evaluation is reliable and when external or finetuned judges are necessary to maximize performance gains. Our findings underscore the importance of adopting best practices that balance patient safety, user trust, and clinical utility in the design of conversational medical assistants.

[46]  arXiv:2512.04213 [pdf, ps, other]
Title: Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents LAPA (Look Around and Pay Attention), a novel end-to-end transformer-based architecture for multi-camera point tracking that integrates appearance-based matching with geometric constraints. Traditional pipelines decouple detection, association, and tracking, leading to error propagation and temporal inconsistency in challenging scenarios. LAPA addresses these limitations by leveraging attention mechanisms to jointly reason across views and time, establishing soft correspondences through a cross-view attention mechanism enhanced with geometric priors. Instead of relying on classical triangulation, we construct 3D point representations via attention-weighted aggregation, inherently accommodating uncertainty and partial observations. Temporal consistency is further maintained through a transformer decoder that models long-range dependencies, preserving identities through extended occlusions. Extensive experiments on challenging datasets, including our newly created multi-camera (MC) versions of TAPVid-3D panoptic and PointOdyssey, demonstrate that our unified approach significantly outperforms existing methods, achieving 37.5% APD on TAPVid-3D-MC and 90.3% APD on PointOdyssey-MC, particularly excelling in scenarios with complex motions and occlusions. Code is available at https://github.com/ostadabbas/Look-Around-and-Pay-Attention-LAPA-

[47]  arXiv:2512.04219 [pdf, ps, other]
Title: Generalized Event Partonomy Inference with Structured Hierarchical Predictive Learning
Comments: 16 pages, 7 figures, 3 tables. Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Humans naturally perceive continuous experience as a hierarchy of temporally nested events, fine-grained actions embedded within coarser routines. Replicating this structure in computer vision requires models that can segment video not just retrospectively, but predictively and hierarchically. We introduce PARSE, a unified framework that learns multiscale event structure directly from streaming video without supervision. PARSE organizes perception into a hierarchy of recurrent predictors, each operating at its own temporal granularity: lower layers model short-term dynamics while higher layers integrate longer-term context through attention-based feedback. Event boundaries emerge naturally as transient peaks in prediction error, yielding temporally coherent, nested partonomies that mirror the containment relations observed in human event perception. Evaluated across three benchmarks, Breakfast Actions, 50 Salads, and Assembly 101, PARSE achieves state-of-the-art performance among streaming methods and rivals offline baselines in both temporal alignment (H-GEBD) and structural consistency (TED, hF1). The results demonstrate that predictive learning under uncertainty provides a scalable path toward human-like temporal abstraction and compositional event understanding.

[48]  arXiv:2512.04220 [pdf, ps, other]
Title: On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral
Subjects: Computation and Language (cs.CL)

Tool-integrated (TI) reinforcement learning (RL) enables large language models (LLMs) to perform multi-step reasoning by interacting with external tools such as search engines and retrievers. Group Relative Policy Optimization (GRPO), exemplified by the recent Search-R1, offers fast convergence and a value-free formulation that makes it appealing for this setting, yet consistently suffers from training collapse. We identify Lazy Likelihood Displacement (LLD), a systematic reduction or stagnation in the likelihood of both correct and incorrect responses, as the core mechanism driving this failure. LLD emerges early and triggers a self-reinforcing LLD Death Spiral, where declining likelihood leads to low-confidence responses, inflating gradients, and ultimately causing collapse. We empirically characterize this process across models on a Search-R1-style, search-integrated question answering task, revealing a consistent three-phase trajectory: early stagnation, steady decay, and accelerated collapse. To address this, we propose a lightweight likelihood-preserving regularization LLDS for GRPO that activates only when a trajectory's likelihood decreases, and regularizes only the tokens responsible. This fine-grained structure mitigates LLD with minimal interference to optimization. Across seven open-domain and multi-hop QA benchmarks, our method stabilizes training, prevents gradient explosion, and yields substantial performance improvements, including +37.8% gains on Qwen2.5-3B and +32.0% gains on Qwen2.5-7B. Our results establish LLD as a fundamental bottleneck in GRPO-based TIRL and provide a practical path toward stable, scalable training of tool-integrated LLM.

[49]  arXiv:2512.04221 [pdf, ps, other]
Title: MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While text-to-video (T2V) generation has achieved remarkable progress in photorealism, generating intent-aligned videos that faithfully obey physics principles remains a core challenge. In this work, we systematically study Newtonian motion-controlled text-to-video generation and evaluation, emphasizing physical precision and motion coherence. We introduce MoReGen, a motion-aware, physics-grounded T2V framework that integrates multi-agent LLMs, physics simulators, and renderers to generate reproducible, physically accurate videos from text prompts in the code domain. To quantitatively assess physical validity, we propose object-trajectory correspondence as a direct evaluation metric and present MoReSet, a benchmark of 1,275 human-annotated videos spanning nine classes of Newtonian phenomena with scene descriptions, spatiotemporal relations, and ground-truth trajectories. Using MoReSet, we conduct experiments on existing T2V models, evaluating their physical validity through both our MoRe metrics and existing physics-based evaluators. Our results reveal that state-of-the-art models struggle to maintain physical validity, while MoReGen establishes a principled direction toward physically coherent video synthesis.

[50]  arXiv:2512.04222 [pdf, ps, other]
Title: ReasonX: MLLM-Guided Intrinsic Image Decomposition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Intrinsic image decomposition aims to separate images into physical components such as albedo, depth, normals, and illumination. While recent diffusion- and transformer-based models benefit from paired supervision from synthetic datasets, their generalization to diverse, real-world scenarios remains challenging. We propose ReasonX, a novel framework that leverages a multimodal large language model (MLLM) as a perceptual judge providing relative intrinsic comparisons, and uses these comparisons as GRPO rewards for fine-tuning intrinsic decomposition models on unlabeled, in-the-wild images. Unlike RL methods for generative models, our framework aligns conditional intrinsic predictors by rewarding agreement between the judge's relational assessments and analytically derived relations from the model's outputs. ReasonX is model-agnostic and can be applied to different intrinsic predictors. Across multiple base architectures and modalities, ReasonX yields significant improvements, including 9-25% WHDR reduction on IIW albedo and up to 46% depth accuracy gains on ETH3D, highlighting the promise of MLLM-guided comparative supervision to bridge low- and high-level vision reasoning.

[51]  arXiv:2512.04223 [pdf, ps, other]
Title: ActVAE: Modelling human activity schedules with a deep conditional generative approach
Subjects: Machine Learning (cs.LG)

Modelling the complexity and diversity of human activity scheduling behaviour is inherently challenging. We demonstrate a deep conditional-generative machine learning approach for the modelling of realistic activity schedules depending on input labels such as an individual's age, employment status, or other information relevant to their scheduling. We combine (i) a structured latent generative approach, with (ii) a conditional approach, through a novel Conditional VAE architecture. This allows for the rapid generation of precise and realistic schedules for different input labels. We extensively evaluate model capabilities using a joint density estimation framework and several case studies. We additionally show that our approach has practical data and computational requirements, and can be deployed within new and existing demand modelling frameworks. We evaluate the importance of generative capability more generally, by comparing our combined approach to (i) a purely generative model without conditionality, and (ii) a purely conditional model which outputs the most likely schedule given the input labels. This comparison highlights the usefulness of explicitly modelling the randomness of complex and diverse human behaviours using deep generative approaches.

[52]  arXiv:2512.04226 [pdf, ps, other]
Title: tritonBLAS: Triton-based Analytical Approach for GEMM Kernel Parameter Selection
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We present tritonBLAS, a fast and deterministic analytical model that uses architectural parameters like the cache hierarchy, and relative code and data placement to generate performant GPU GEMM kernels. tritonBLAS explicitly models the relationship between architectural topology, matrix shapes, and algorithmic blocking behavior to predict near-optimal configurations without runtime autotuning. Based on this model, we developed and implemented a lightweight GEMM framework entirely within Triton. We evaluate the performance of tritonBLAS across a diverse set of GEMM problem sizes on modern GPUs. tritonBLAS achieves over 95% of the performance of autotuning solutions, while reducing autotuning time to zero. This makes tritonBLAS a practical drop-in replacement for empirical tuning in production HPC and ML workloads.

[53]  arXiv:2512.04227 [pdf, ps, other]
Title: Educational Cone Model in Embedding Vector Spaces
Authors: Yo Ehara
Comments: Accepted to the 33rd International Conference on Computers in Education (ICCE 2025)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Human-annotated datasets with explicit difficulty ratings are essential in intelligent educational systems. Although embedding vector spaces are widely used to represent semantic closeness and are promising for analyzing text difficulty, the abundance of embedding methods creates a challenge in selecting the most suitable method. This study proposes the Educational Cone Model, which is a geometric framework based on the assumption that easier texts are less diverse (focusing on fundamental concepts), whereas harder texts are more diverse. This assumption leads to a cone-shaped distribution in the embedding space regardless of the embedding method used. The model frames the evaluation of embeddings as an optimization problem with the aim of detecting structured difficulty-based patterns. By designing specific loss functions, efficient closed-form solutions are derived that avoid costly computation. Empirical tests on real-world datasets validated the model's effectiveness and speed in identifying the embedding spaces that are best aligned with difficulty-annotated educational texts.

[54]  arXiv:2512.04228 [pdf, ps, other]
Title: Addressing Logical Fallacies In Scientific Reasoning From Large Language Models: Towards a Dual-Inference Training Framework
Comments: 12 pages, 5 tables
Subjects: Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have transformed natural language processing and hold growing promise for advancing science, healthcare, and decision-making. Yet their training paradigms remain dominated by affirmation-based inference, akin to \textit{modus ponens}, where accepted premises yield predicted consequents. While effective for generative fluency, this one-directional approach leaves models vulnerable to logical fallacies, adversarial manipulation, and failures in causal reasoning. This paper makes two contributions. First, it demonstrates how existing LLMs from major platforms exhibit systematic weaknesses when reasoning in scientific domains with negation, counterexamples, or faulty premises \footnote{Code to recreate these experiments are at https://github.com/hannahdavidsoncollege-maker/ScientificReasoningForEnvironment-MedicineWithLLMs. Second, it introduces a dual-reasoning training framework that integrates affirmative generation with structured counterfactual denial. Grounded in formal logic, cognitive science, and adversarial training, this training paradigm formalizes a computational analogue of ``denying the antecedent'' as a mechanism for disconfirmation and robustness. By coupling generative synthesis with explicit negation-aware objectives, the framework enables models that not only affirm valid inferences but also reject invalid ones, yielding systems that are more resilient, interpretable, and aligned with human reasoning.

[55]  arXiv:2512.04231 [pdf, ps, other]
Title: CRAFT-E: A Neuro-Symbolic Framework for Embodied Affordance Grounding
Comments: 20 pages. 3 figures, 4 tables. Under Review
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Assistive robots operating in unstructured environments must understand not only what objects are, but what they can be used for. This requires grounding language-based action queries to objects that both afford the requested function and can be physically retrieved. Existing approaches often rely on black-box models or fixed affordance labels, limiting transparency, controllability, and reliability for human-facing applications. We introduce CRAFT-E, a modular neuro-symbolic framework that composes a structured verb-property-object knowledge graph with visual-language alignment and energy-based grasp reasoning. The system generates interpretable grounding paths that expose the factors influencing object selection and incorporates grasp feasibility as an integral part of affordance inference. We further construct a benchmark dataset with unified annotations for verb-object compatibility, segmentation, and grasp candidates, and deploy the full pipeline on a physical robot. CRAFT-E achieves competitive performance in static scenes, ImageNet-based functional retrieval, and real-world trials involving 20 verbs and 39 objects. The framework remains robust under perceptual noise and provides transparent, component-level diagnostics. By coupling symbolic reasoning with embodied perception, CRAFT-E offers an interpretable and customizable alternative to end-to-end models for affordance-grounded object selection, supporting trustworthy decision-making in assistive robotic systems.

[56]  arXiv:2512.04237 [pdf, ps, other]
Title: Primitive Vector Cipher(PVC): A Hybrid Encryption Scheme based on the Vector Computational Diffie-Hellman (V-CDH) Problem
Comments: Submitted for publication. 19 pages
Subjects: Cryptography and Security (cs.CR)

This work introduces the Primitive Vector Cipher (PVC), a novel hybrid encryption scheme integrating matrix-based cryptography with advanced Diffie-Hellman key exchange. PVC's security is grounded on the established hardness of the Vector Computational Diffie- Hellman (V-CDH) problem. The two-layered design uses HKDF to mask plaintext via a DH-authenticated shared primitive vector and randomize cipher blocks with a per-block offset. This approach eliminates deterministic repetitions and provides strong resistance against linear and known-plaintext attacks. PVC's block-wise structure allows for massive parallelism and excellent linear scaling. Security is formally analyzed, demonstrating INDCPA security under V-CDH. STS protocol integration elevates security toward IND-CCA guarantees.

[57]  arXiv:2512.04238 [pdf, ps, other]
Title: 6 Fingers, 1 Kidney: Natural Adversarial Medical Images Reveal Critical Weaknesses of Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-language models are increasingly integrated into clinical workflows. However, existing benchmarks primarily assess performance on common anatomical presentations and fail to capture the challenges posed by rare variants. To address this gap, we introduce AdversarialAnatomyBench, the first benchmark comprising naturally occurring rare anatomical variants across diverse imaging modalities and anatomical regions. We call such variants that violate learned priors about "typical" human anatomy natural adversarial anatomy. Benchmarking 22 state-of-the-art VLMs with AdversarialAnatomyBench yielded three key insights. First, when queried with basic medical perception tasks, mean accuracy dropped from 74% on typical to 29% on atypical anatomy. Even the best-performing models, GPT-5, Gemini 2.5 Pro, and Llama 4 Maverick, showed performance drops of 41-51%. Second, model errors closely mirrored expected anatomical biases. Third, neither model scaling nor interventions, including bias-aware prompting and test-time reasoning, resolved these issues. These findings highlight a critical and previously unquantified limitation in current VLM: their poor generalization to rare anatomical presentations. AdversarialAnatomyBench provides a foundation for systematically measuring and mitigating anatomical bias in multimodal medical AI systems.

[58]  arXiv:2512.04239 [pdf, ps, other]
Title: Configuration-Constrained Tube MPC for Periodic Operation
Comments: 11 pages, 3 figures, submitted for IEEE-TACON
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Periodic operation often emerges as the economically optimal mode in industrial processes, particularly under varying economic or environmental conditions. This paper proposes a robust model predictive control (MPC) framework for uncertain systems modeled as polytopic linear differential inclusions (LDIs), where the dynamics evolve as convex combinations of finitely many affine control systems with additive disturbances. The robust control problem is reformulated as a convex optimization program by optimizing over configuration-constrained polytopic tubes and tracks a periodic trajectory that is optimal for a given economic criterion. Artificial variables embedded in the formulation ensure recursive feasibility and robust constraint satisfaction when the economic criterion is updated online, while guaranteeing convergence to the corresponding optimal periodic tube when the criterion remains constant. To improve computational efficiency, we introduce a quadratic over-approximation of the periodic cost under a Lipschitz continuity assumption, yielding a Quadratic Program (QP) formulation that preserves the above theoretical guarantees. The effectiveness and scalability of the approach are demonstrated on a benchmark example and a ball-plate system with eight states.

[59]  arXiv:2512.04246 [pdf, ps, other]
Title: Toward Virtuous Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI)

This paper critiques common patterns in machine ethics for Reinforcement Learning (RL) and argues for a virtue focused alternative. We highlight two recurring limitations in much of the current literature: (i) rule based (deontological) methods that encode duties as constraints or shields often struggle under ambiguity and nonstationarity and do not cultivate lasting habits, and (ii) many reward based approaches, especially single objective RL, implicitly compress diverse moral considerations into a single scalar signal, which can obscure trade offs and invite proxy gaming in practice. We instead treat ethics as policy level dispositions, that is, relatively stable habits that hold up when incentives, partners, or contexts change. This shifts evaluation beyond rule checks or scalar returns toward trait summaries, durability under interventions, and explicit reporting of moral trade offs. Our roadmap combines four components: (1) social learning in multi agent RL to acquire virtue like patterns from imperfect but normatively informed exemplars; (2) multi objective and constrained formulations that preserve value conflicts and incorporate risk aware criteria to guard against harm; (3) affinity based regularization toward updateable virtue priors that support trait like stability under distribution shift while allowing norms to evolve; and (4) operationalizing diverse ethical traditions as practical control signals, making explicit the value and cultural assumptions that shape ethical RL benchmarks.

[60]  arXiv:2512.04247 [pdf, ps, other]
Title: Stability of Lyapunov redesign trajectory tracking control with unbounded perturbations -- A tube-based stability analysis
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Considering a nonlinear system in Byrnes-Isidori form that is subject to unbounded perturbations, we apply Lyapunov redesign via feedback linearisation for trajectory tracking. Leveraging the ideas of tube-based geometric characterisation of the invariance properties of the closed loop, we generalise the classical stability criterion from the~literature from constant to nonconstant reference trajectories. The proposed analysis is tailored to the Lyapunov redesign and the tracking problem insofar as we incorporate the reference trajectory and the transient decrease of the tracking error enforced by the controller. In particular, we exploit that the Lyapunov function of the tracking error satisfies a differential inequality, thereby guaranteeing that the solution of the closed loop remains in a contracting tube along the reference trajectory.

[61]  arXiv:2512.04248 [pdf, ps, other]
Title: MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to effectively bridge the 3D layout and consistent image-based condition signals for multi-view generation. The second stage performs image-conditioned multi-view generation, incorporating a layout-aware epipolar attention mechanism to enhance multi-view consistency during the diffusion process. Additionally, we introduce an iterative framework that generates 3D scenes with varying numbers of objects and scene complexities by recursively performing multi-view generation (MVRoom), supporting text-to-scene generation. Experimental results demonstrate that our approach achieves high-fidelity and controllable 3D scene generation for NVS, outperforming state-of-the-art baseline methods both quantitatively and qualitatively. Ablation studies further validate the effectiveness of key components within our generation pipeline.

[62]  arXiv:2512.04249 [pdf, ps, other]
Title: Sliding Mode Control and Subspace Stabilization Methodology for the Orbital Stabilization of Periodic Trajectories
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper presents a combined sliding-mode control and subspace stabilization methodology for orbital stabilization of periodic trajectories in underactuated mechanical systems with one degree of underactuation. The approach starts with partial feedback linearization and stabilization. Then, transverse linearization along the reference orbit is computed, resulting in a periodic linear time-varying system with a stable subspace. Sliding-mode control drives trajectories toward this subspace. The proposed design avoids solving computationally intensive periodic LQR problems and improves robustness to matched disturbances. The methodology is validated through experiments on the Butterfly robot.

[63]  arXiv:2512.04250 [pdf, ps, other]
Title: DrP: Meta's Efficient Investigations Platform at Scale
Subjects: Software Engineering (cs.SE)

Investigations are a significant step in the operational workflows for large scale systems across multiple domains such as services, data, AI/ML, mobile. Investigation processes followed by on-call engineers are often manual or rely on ad-hoc scripts. This leads to inefficient investigations resulting in increased time to mitigate and isolate failures/SLO violations. It also contributes to on-call toil and poor productivity leading to multiple hours/days spent in triaging/debugging incidents. In this paper, we present DrP, an end-to-end framework and system to automate investigations that reduces the mean time to resolve incidents (MTTR) and reduces on-call toil. DrP consists of an expressive and flexible SDK to author investigation playbooks in code (called analyzers), a scalable backend system to execute these automated playbooks, plug-ins to integrate playbooks into mainstream workflows such as alerts and incident management tools, and a post-processing system to take actions on investigations including mitigation steps.
We have implemented and deployed DrP at large scale at Meta covering 300+ teams, 2000+ analyzers, across a large set of use cases across domains such as services, core infrastructure, AI/ML, hardware, mobile. DrP has been running in production for the past 5 years and executes 50K automated analyses per day. Overall, our results and experience show that DrP has been able to reduce average MTTR by 20 percent at large scale (with over 80 percent for some teams) and has significantly improved on-call productivity.

[64]  arXiv:2512.04252 [pdf, ps, other]
Title: Fine-Tuning ChemBERTa for Predicting Inhibitory Activity Against TDP1 Using Deep Learning
Authors: Baichuan Zeng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Predicting the inhibitory potency of small molecules against Tyrosyl-DNA Phosphodiesterase 1 (TDP1)-a key target in overcoming cancer chemoresistance-remains a critical challenge in early drug discovery. We present a deep learning framework for the quantitative regression of pIC50 values from molecular Simplified Molecular Input Line Entry System (SMILES) strings using fine-tuned variants of ChemBERTa, a pre-trained chemical language model. Leveraging a large-scale consensus dataset of 177,092 compounds, we systematically evaluate two pre-training strategies-Masked Language Modeling (MLM) and Masked Token Regression (MTR)-under stratified data splits and sample weighting to address severe activity imbalance which only 2.1% are active. Our approach outperforms classical baselines Random Predictor in both regression accuracy and virtual screening utility, and has competitive performance compared to Random Forest, achieving high enrichment factor EF@1% 17.4 and precision Precision@1% 37.4 among top-ranked predictions. The resulting model, validated through rigorous ablation and hyperparameter studies, provides a robust, ready-to-deploy tool for prioritizing TDP1 inhibitors for experimental testing. By enabling accurate, 3D-structure-free pIC50 prediction directly from SMILES, this work demonstrates the transformative potential of chemical transformers in accelerating target-specific drug discovery.

[65]  arXiv:2512.04254 [pdf, ps, other]
Title: Hey GPT-OSS, Looks Like You Got It -- Now Walk Me Through It! An Assessment of the Reasoning Language Models Chain of Thought Mechanism for Digital Forensics
Comments: Accept at DFRWS EU 2026
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

The use of large language models in digital forensics has been widely explored. Beyond identifying potential applications, research has also focused on optimizing model performance for forensic tasks through fine-tuning. However, limited result explainability reduces their operational and legal usability. Recently, a new class of reasoning language models has emerged, designed to handle logic-based tasks through an `internal reasoning' mechanism. Yet, users typically see only the final answer, not the underlying reasoning. One of these reasoning models is gpt-oss, which can be deployed locally, providing full access to its underlying reasoning process. This article presents the first investigation into the potential of reasoning language models for digital forensics. Four test use cases are examined to assess the usability of the reasoning component in supporting result explainability. The evaluation combines a new quantitative metric with qualitative analysis. Findings show that the reasoning component aids in explaining and validating language model outputs in digital forensics at medium reasoning levels, but this support is often limited, and higher reasoning levels do not enhance response quality.

[66]  arXiv:2512.04256 [pdf, ps, other]
Title: On the Role and Impact of GenAI Tools in Software Engineering Education
Comments: Accepted at IEEE/ACM ICSE Software Engineering Education and Training (ICSE SEET 2026)
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)

Context. The rise of generative AI (GenAI) tools like ChatGPT and GitHub Copilot has transformed how software is learned and written. In software engineering (SE) education, these tools offer new opportunities for support, but also raise concerns about over-reliance, ethical use, and impacts on learning. Objective. This study investigates how undergraduate SE students use GenAI tools, focusing on the benefits, challenges, ethical concerns, and instructional expectations that shape their experiences. Method. We conducted a survey with 130 undergraduate students from two universities. The survey combined structured Likert-scale items and open-ended questions to investigate five dimensions: usage context, perceived benefits, challenges, ethical and instructional perceptions. Results. Students most often use GenAI for incremental learning and advanced implementation, reporting benefits such as brainstorming support and confidence-building. At the same time, they face challenges including unclear rationales and difficulty adapting outputs. Students highlight ethical concerns around fairness and misconduct, and call for clearer instructional guidance. Conclusion. GenAI is reshaping SE education in nuanced ways. Our findings underscore the need for scaffolding, ethical policies, and adaptive instructional strategies to ensure that GenAI supports equitable and effective learning.

[67]  arXiv:2512.04257 [pdf, ps, other]
Title: Computational Linguistics Meets Libyan Dialect: A Study on Dialect Identification
Comments: 13 pages, 8 figures
Journal-ref: International Journal of Intelligent Systems and Applications, Volume 17, Number 6, December 2025, Page 118
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This study investigates logistic regression, linear support vector machine, multinomial Naive Bayes, and Bernoulli Naive Bayes for classifying Libyan dialect utterances gathered from Twitter. The dataset used is the QADI corpus, which consists of 540,000 sentences across 18 Arabic dialects. Preprocessing challenges include handling inconsistent orthographic variations and non-standard spellings typical of the Libyan dialect. The chi-square analysis revealed that certain features, such as email mentions and emotion indicators, were not significantly associated with dialect classification and were thus excluded from further analysis. Two main experiments were conducted: (1) evaluating the significance of meta-features extracted from the corpus using the chi-square test and (2) assessing classifier performance using different word and character n-gram representations. The classification experiments showed that Multinomial Naive Bayes (MNB) achieved the highest accuracy of 85.89% and an F1-score of 0.85741 when using a (1,2) word n-gram and (1,5) character n-gram representation. In contrast, Logistic Regression and Linear SVM exhibited slightly lower performance, with maximum accuracies of 84.41% and 84.73%, respectively. Additional evaluation metrics, including log loss, Cohen kappa, and Matthew correlation coefficient, further supported the effectiveness of MNB in this task. The results indicate that carefully selected n-gram representations and classification models play a crucial role in improving the accuracy of Libyan dialect identification. This study provides empirical benchmarks and insights for future research in Arabic dialect NLP applications.

[68]  arXiv:2512.04258 [pdf, ps, other]
Title: Improved Time-Space Tradeoffs for 3SUM-Indexing
Subjects: Data Structures and Algorithms (cs.DS)

3SUM-Indexing is a preprocessing variant of the 3SUM problem that has recently received a lot of attention. The best known time-space tradeoff for the problem is $T S^3 = n^{6}$ (up to logarithmic factors), where $n$ is the number of input integers, $S$ is the length of the preprocessed data structure, and $T$ is the running time of the query algorithm. This tradeoff was achieved in [KP19, GGHPV20] using the Fiat-Naor generic algorithm for Function Inversion. Consequently, [GGHPV20] asked whether this algorithm can be improved by leveraging the structure of 3SUM-Indexing.
In this paper, we exploit the structure of 3SUM-Indexing to give a time-space tradeoff of $T S = n^{2.5}$, which is better than the best known one in the range $n^{3/2} \ll S \ll n^{7/4}$. We further extend this improvement to the $k$SUM-Indexing problem-a generalization of 3SUM-Indexing-and to the related $k$XOR-Indexing problem, where addition is replaced with XOR. Additionally, we improve the best known time-space tradeoffs for the Gapped String Indexing and Jumbled Indexing problems, which are well-known data structure problems related to 3SUM-Indexing.
Our improvement comes from an alternative way to apply the Fiat-Naor algorithm to 3SUM-Indexing. Specifically, we exploit the structure of the function to be inverted by decomposing it into "sub-functions" with certain properties. This allows us to apply an improvement to the Fiat-Naor algorithm (which is not directly applicable to 3SUM-Indexing), obtained in [GGPS23] in a much larger range of parameters. We believe that our techniques may be useful in additional application-dependent optimizations of the Fiat-Naor algorithm.

[69]  arXiv:2512.04259 [pdf, ps, other]
Title: WildCode: An Empirical Analysis of Code Generated by ChatGPT
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

LLM models are increasingly used to generate code, but the quality and security of this code are often uncertain. Several recent studies have raised alarm bells, indicating that such AI-generated code may be particularly vulnerable to cyberattacks. However, most of these studies rely on code that is generated specifically for the study, which raises questions about the realism of such experiments. In this study, we perform a large-scale empirical analysis of real-life code generated by ChatGPT. We evaluate code generated by ChatGPT both with respect to correctness and security and delve into the intentions of users who request code from the model. Our research confirms previous studies that used synthetic queries and yielded evidence that LLM-generated code is often inadequate with respect to security. We also find that users exhibit little curiosity about the security features of the code they ask LLMs to generate, as evidenced by their lack of queries on this topic.

[70]  arXiv:2512.04260 [pdf, ps, other]
Title: Breaking Isolation: A New Perspective on Hypervisor Exploitation via Cross-Domain Attacks
Subjects: Cryptography and Security (cs.CR)

Hypervisors are under threat by critical memory safety vulnerabilities, with pointer corruption being one of the most prevalent and severe forms. Existing exploitation frameworks depend on identifying highly-constrained structures in the host machine and accurately determining their runtime addresses, which is ineffective in hypervisor environments where such structures are rare and further obfuscated by Address Space Layout Randomization (ASLR). We instead observe that modern virtualization environments exhibit weak memory isolation -- guest memory is fully attacker-controlled yet accessible from the host, providing a reliable primitive for exploitation. Based on this observation, we present the first systematic characterization and taxonomy of Cross-Domain Attacks (CDA), a class of exploitation techniques that enable capability escalation through guest memory reuse. To automate this process, we develop a system that identifies cross-domain gadgets, matches them with corrupted pointers, synthesizes triggering inputs, and assembles complete exploit chains. Our evaluation on 15 real-world vulnerabilities across QEMU and VirtualBox shows that CDA is widely applicable and effective.

[71]  arXiv:2512.04261 [pdf, ps, other]
Title: Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research
Subjects: Computers and Society (cs.CY)

Objective: This study develops a systematic benchmarking framework for testing whether language models can accurately identify constructs of interest in child welfare records. The objective is to assess how different model sizes and architectures perform on four validated benchmarks for classifying critical risk factors among child welfare-involved families: domestic violence, firearms, substance-related problems generally, and opioids specifically. Method: We constructed four benchmarks for identifying risk factors in child welfare investigation summaries: domestic violence, substance-related problems, firearms, and opioids (n=500 each). We evaluated seven model sizes (0.6B-32B parameters) in standard and extended reasoning modes, plus a mixture-of-experts variant. Cohen's kappa measured agreement with gold standard classifications established by human experts. Results: The benchmarking revealed a critical finding: bigger models are not better. A small 4B parameter model with extended reasoning proved most effective, outperforming models up to eight times larger. It consistently achieved "substantial" to "almost perfect" agreement across all four benchmark categories. This model achieved "almost perfect" agreement (\k{appa} = 0.93-0.96) on three benchmarks (substance-related problems, firearms, and opioids) and "substantial" agreement (\k{appa} = 0.74) on the most complex task (domestic violence). Small models with extended reasoning rivaled the largest models while being more resource-efficient. Conclusions: Small reasoning-enabled models achieve accuracy levels historically requiring larger architectures, enabling significant time and computational efficiencies. The benchmarking framework provides a method for evidence-based model selection to balance accuracy with practical resource constraints before operational deployment in social work research.

[72]  arXiv:2512.04262 [pdf, ps, other]
Title: Catching UX Flaws in Code: Leveraging LLMs to Identify Usability Flaws at the Development Stage
Comments: 7 pages. Published in Proceedings of the 2025 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). DOI: 10.1109/VL-HCC65237.2025.00024
Journal-ref: Proceedings of the 2025 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2025
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Usability evaluations are essential for ensuring that modern interfaces meet user needs, yet traditional heuristic evaluations by human experts can be time-consuming and subjective, especially early in development. This paper investigates whether large language models (LLMs) can provide reliable and consistent heuristic assessments at the development stage. By applying Jakob Nielsen's ten usability heuristics to thirty open-source websites, we generated over 850 heuristic evaluations in three independent evaluations per site using a pipeline of OpenAI's GPT-4o. For issue detection, the model demonstrated moderate consistency, with an average pairwise Cohen's Kappa of 0.50 and an exact agreement of 84%. Severity judgments showed more variability: weighted Cohen's Kappa averaged 0.63, but exact agreement was just 56%, and Krippendorff's Alpha was near zero. These results suggest that while GPT-4o can produce internally consistent evaluations, especially for identifying the presence of usability issues, its ability to judge severity varies and requires human oversight in practice. Our findings highlight the feasibility and limitations of using LLMs for early-stage, automated usability testing, and offer a foundation for improving consistency in automated User Experience (UX) evaluation. To the best of our knowledge, our work provides one of the first quantitative inter-rater reliability analyses of automated heuristic evaluation and highlights methods for improving model consistency.

[73]  arXiv:2512.04263 [pdf, ps, other]
Title: Polynomiogram: An Integrated Framework for Root Visualization and Generative Art
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Mathematical Software (cs.MS)

This work presents the Polynomiogram framework, an integrated computational platform for exploring, visualizing, and generating art from polynomial root systems. The main innovation is a flexible sampling scheme in which two independent parameters are drawn from user defined domains and mapped to the polynomial coefficients through a generating function. This design allows the same mathematical foundation to support both scientific investigation and generative algorithmic art. The framework integrates two complementary numerical engines: NumPy companion matrix solver for fast, large scale computation and MPSolve for high precision, scientifically rigorous validation. This dual architecture enables efficient visualization for creative use and accurate computation for research and education. Numerical accuracy was verified using classical ensembles, including the Kac and Lucas polynomials. The method was applied to the cubic polynomial system to analyze its bifurcation structure, demonstrating its value as both a scientific tool for exploring root phenomena and an educational aid for visualizing fundamental concepts in algebra and dynamical systems. Beyond analysis, the Polynomiogram also demonstrated its potential as a tool for personalized generative art. Examples include the use of the platform to generate a natural form resembling a hibiscus flower and to create personalized artwork expressing gratitude toward advances in artificial intelligence and large language models through a tribute composition.

[74]  arXiv:2512.04264 [pdf, ps, other]
Title: Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Adversarial training is an effective method to improve the machine learning (ML) model robustness. Most existing studies typically consider the Rectified linear unit (ReLU) activation function and centralized training environments. In this paper, we study the ML model robustness using ten different activation functions through adversarial training in centralized environments and explore the ML model robustness in federal learning environments. In the centralized environment, we first propose an advanced adversarial training approach to improving the ML model robustness by incorporating model architecture change, soft labeling, simplified data augmentation, and varying learning rates. Then, we conduct extensive experiments on ten well-known activation functions in addition to ReLU to better understand how they impact the ML model robustness. Furthermore, we extend the proposed adversarial training approach to the federal learning environment, where both independent and identically distributed (IID) and non-IID data settings are considered. Our proposed centralized adversarial training approach achieves a natural and robust accuracy of 77.08% and 67.96%, respectively on CIFAR-10 against the fast gradient sign attacks. Experiments on ten activation functions reveal ReLU usually performs best. In the federated learning environment, however, the robust accuracy decreases significantly, especially on non-IID data. To address the significant performance drop in the non-IID data case, we introduce data sharing and achieve the natural and robust accuracy of 70.09% and 54.79%, respectively, surpassing the CalFAT algorithm, when 40% data sharing is used. That is, a proper percentage of data sharing can significantly improve the ML model robustness, which is useful to some real-world applications.

[75]  arXiv:2512.04267 [pdf, ps, other]
Title: UniLight: A Unified Representation for Lighting
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Lighting has a strong influence on visual appearance, yet understanding and representing lighting in images remains notoriously difficult. Various lighting representations exist, such as environment maps, irradiance, spherical harmonics, or text, but they are incompatible, which limits cross-modal transfer. We thus propose UniLight, a joint latent space as lighting representation, that unifies multiple modalities within a shared embedding. Modality-specific encoders for text, images, irradiance, and environment maps are trained contrastively to align their representations, with an auxiliary spherical-harmonics prediction task reinforcing directional understanding. Our multi-modal data pipeline enables large-scale training and evaluation across three tasks: lighting-based retrieval, environment-map generation, and lighting control in diffusion-based image synthesis. Experiments show that our representation captures consistent and transferable lighting features, enabling flexible manipulation across modalities.

[76]  arXiv:2512.04268 [pdf, ps, other]
Title: The Initialization Determines Whether In-Context Learning Is Gradient Descent
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attention (LSA) to gradient descent (GD), this connection has primarily been established under simplified conditions with zero-mean Gaussian priors and zero initialization for GD. However, subsequent studies have challenged this simplified view by highlighting its overly restrictive assumptions, demonstrating instead that under conditions such as multi-layer or nonlinear attention, self-attention performs optimization-like inference, akin to but distinct from GD. We investigate how multi-head LSA approximates GD under more realistic conditions specifically when incorporating non-zero Gaussian prior means in linear regression formulations of ICL. We first extend multi-head LSA embedding matrix by introducing an initial estimation of the query, referred to as the initial guess. We prove an upper bound on the number of heads needed for ICL linear regression setup. Our experiments confirm this result and further observe that a performance gap between one-step GD and multi-head LSA persists. To address this gap, we introduce yq-LSA, a simple generalization of single-head LSA with a trainable initial guess yq. We theoretically establish the capabilities of yq-LSA and provide experimental validation on linear regression tasks, thereby extending the theory that bridges ICL and GD. Finally, inspired by our findings in the case of linear regression, we consider widespread LLMs augmented with initial guess capabilities, and show that their performance is improved on a semantic similarity task.

[77]  arXiv:2512.04269 [pdf, ps, other]
Title: Mapping Data Labour Supply Chain in Africa in an Era of Digital Apartheid: a Struggle for Recognition
Comments: 32 pages, 9 figures, 1 table
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Content moderation and data labelling work has shifted to the Global South, particularly Africa, where workers operate under precarious conditions while remaining invisible to users. This study addresses the gap in understanding the scope of this industry and the working conditions of African content moderation workforce through a participatory approach. We collaborated with a union of content moderators to conduct desk research, deploy a questionnaire (n=81), and gather ethnographic observations across nine months that could answer their social needs. Our findings show that content moderation operations span 43 out of 55 African countries, involving 17 major firms serving predominantly North-American and European clients, with workers facing insecurity and inadequate psychological support. We contribute the first comprehensive map of Africa's content moderation industry, demonstrate a participatory methodology that centers workers' collective actions in documenting their conditions, and apply Honneth's ``struggle for recognition'' framework to understand data workers' demands for professional acknowledgement.

[78]  arXiv:2512.04273 [pdf, ps, other]
Title: Quantitative Analysis of Technical Debt and Pattern Violation in Large Language Model Architectures
Authors: Tyler Slater
Comments: Under review at the Journal of Systems and Software (Special Issue on Impactful Software Architecture)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet, and Llama 3 8B) by prompting them to implement a standardized Book Lending Microservice under strict Hexagonal Architecture constraints. Utilizing Abstract Syntax Tree (AST) parsing, we find that while proprietary models achieve high architectural conformance (0% violation rate for GPT-5.1), open-weights models exhibit critical divergence. Specifically, Llama 3 demonstrated an 80% Architectural Violation Rate, frequently bypassing interface adapters to create illegal circular dependencies between Domain and Infrastructure layers. Furthermore, we identified a phenomenon of "Implementation Laziness," where open-weights models generated 60% fewer Logical Lines of Code (LLOC) than their proprietary counterparts, effectively omitting complex business logic to satisfy token constraints. These findings suggest that without automated architectural linting, utilizing smaller open-weights models for system scaffolding accelerates the accumulation of structural technical debt.

[79]  arXiv:2512.04276 [pdf, ps, other]
Title: The Geometry of Benchmarks: A New Path Toward AGI
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)

Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoning about generality or autonomous self-improvement. Here we introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space, and agent performance is described by capability functionals over this space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance on batteries spanning families of tasks (for example reasoning, planning, tool use and long-horizon control). Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. This geometry yields determinacy results: dense families of batteries suffice to certify performance on entire regions of task space. Third, we introduce a general Generator-Verifier-Updater (GVU) operator that subsumes reinforcement learning, self-play, debate and verifier-based fine-tuning as special cases, and we define a self-improvement coefficient $\kappa$ as the Lie derivative of a capability functional along the induced flow. A variance inequality on the combined noise of generation and verification provides sufficient conditions for $\kappa > 0$. Our results suggest that progress toward artificial general intelligence (AGI) is best understood as a flow on moduli of benchmarks, driven by GVU dynamics rather than by scores on individual leaderboards.

[80]  arXiv:2512.04277 [pdf, ps, other]
Title: Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Post-training with reinforcement learning (RL) typically optimizes a single scalar objective and ignores structure in how solutions are produced. We ask whether a scalar hint toward a canonical solver ordering, used only during RL post-training, improves performance even when fine-tuned on randomized solution sequences. On Sudoku, we train a Transformer with standard fine-tuning on randomized solving orders, then post-train it with Group Relative Policy Optimization (GRPO) with two rewards: cell accuracy and an ordering reward that increases when the model's emission order aligns with the solver order. To compare signals cleanly, we combine them via fixed mixtures and use a simple bootstrapped scaling to equalize component magnitudes at initialization. Mixed rewards generally outperform cell-only optimization--the best mixture yields substantially higher test accuracy than the fine-tuned-only model trained on random-order and approaches the fine-tuned-only model trained on solver-order sequences in accuracy. These results suggest that coarse ordering signals can steer RL post-training toward solver-order trajectories without modifying supervised data or architecture.

[81]  arXiv:2512.04279 [pdf, ps, other]
Title: Driving Beyond Privilege: Distilling Dense-Reward Knowledge into Sparse-Reward Policies
Subjects: Robotics (cs.RO)

We study how to exploit dense simulator-defined rewards in vision-based autonomous driving without inheriting their misalignment with deployment metrics. In realistic simulators such as CARLA, privileged state (e.g., lane geometry, infractions, time-to-collision) can be converted into dense rewards that stabilize and accelerate model-based reinforcement learning, but policies trained directly on these signals often overfit and fail to generalize when evaluated on sparse objectives such as route completion and collision-free overtaking. We propose reward-privileged world model distillation, a two-stage framework in which a teacher DreamerV3-style agent is first trained with a dense privileged reward, and only its latent dynamics are distilled into a student trained solely on sparse task rewards. Teacher and student share the same observation space (semantic bird's-eye-view images); privileged information enters only through the teacher's reward, and the student does not imitate the teacher's actions or value estimates. Instead, the student's world model is regularized to match the teacher's latent dynamics while its policy is learned from scratch on sparse success/failure signals. In CARLA lane-following and overtaking benchmarks, sparse-reward students outperform both dense-reward teachers and sparse-from-scratch baselines. On unseen lane-following routes, reward-privileged distillation improves success by about 23 percent relative to the dense teacher while maintaining comparable or better safety. On overtaking, students retain near-perfect performance on training routes and achieve up to a 27x improvement in success on unseen routes, with improved lane keeping. These results show that dense rewards can be leveraged to learn richer dynamics models while keeping the deployed policy optimized strictly for sparse, deployment-aligned objectives.

[82]  arXiv:2512.04280 [pdf, ps, other]
Title: A customizable inexact subgraph matching algorithm for attributed graphs
Subjects: Data Structures and Algorithms (cs.DS)

Graphs provide a natural way to represent data by encoding information about objects and the relationships between them. With the ever-increasing amount of data collected and generated, locating specific patterns of relationships between objects in a graph is often required. Given a larger graph and a smaller graph, one may wish to identify instances of the smaller query graph in the larger target graph. This task is called subgraph identification or matching. Subgraph matching is helpful in areas such as bioinformatics, binary analysis, pattern recognition, and computer vision. In these applications, datasets frequently contain noise and errors, thus exact subgraph matching algorithms do not apply. In this paper we introduce a new customizable algorithm for inexact subgraph matching. Our algorithm utilizes node and edge attributes which are often present in real-world datasets to narrow down the search space. The algorithm is flexible in the type of subgraph matching it can perform and the types of datasets it can process by its use of a modifiable graph edit distance cost function for pairing nodes. We show its effectiveness on family trees graphs and control-flow graphs.

[83]  arXiv:2512.04282 [pdf, ps, other]
Title: Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion Transfer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Real-time video motion transfer applications such as immersive gaming and vision-based anomaly detection require accurate yet diverse future predictions to support realistic synthesis and robust downstream decision making under uncertainty. To improve the diversity of such sequential forecasts we propose a novel inference-time refinement technique that combines Gated Recurrent Unit-Normalizing Flows (GRU-NF) with stochastic sampling methods. While GRU-NF can capture multimodal distributions through its integration of normalizing flows within a temporal forecasting framework, its deterministic transformation structure can limit expressivity. To address this, inspired by Stochastic Normalizing Flows (SNF), we introduce Markov Chain Monte Carlo (MCMC) steps during GRU-NF inference, enabling the model to explore a richer output space and better approximate the true data distribution without retraining. We validate our approach in a keypoint-based video motion transfer pipeline, where capturing temporally coherent and perceptually diverse future trajectories is essential for realistic samples and low bandwidth communication. Experiments show that our inference framework, Gated Recurrent Unit- Stochastic Normalizing Flows (GRU-SNF) outperforms GRU-NF in generating diverse outputs without sacrificing accuracy, even under longer prediction horizons. By injecting stochasticity during inference, our approach captures multimodal behavior more effectively. These results highlight the potential of integrating stochastic dynamics with flow-based sequence models for generative time series forecasting.

[84]  arXiv:2512.04283 [pdf, ps, other]
Title: Plug-and-Play Image Restoration with Flow Matching: A Continuous Viewpoint
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Flow matching-based generative models have been integrated into the plug-and-play image restoration framework, and the resulting plug-and-play flow matching (PnP-Flow) model has achieved some remarkable empirical success for image restoration. However, the theoretical understanding of PnP-Flow lags its empirical success. In this paper, we derive a continuous limit for PnP-Flow, resulting in a stochastic differential equation (SDE) surrogate model of PnP-Flow. The SDE model provides two particular insights to improve PnP-Flow for image restoration: (1) It enables us to quantify the error for image restoration, informing us to improve step scheduling and regularize the Lipschitz constant of the neural network-parameterized vector field for error reduction. (2) It informs us to accelerate off-the-shelf PnP-Flow models via extrapolation, resulting in a rescaled version of the proposed SDE model. We validate the efficacy of the SDE-informed improved PnP-Flow using several benchmark tasks, including image denoising, deblurring, super-resolution, and inpainting. Numerical results show that our method significantly outperforms the baseline PnP-Flow and other state-of-the-art approaches, achieving superior performance across evaluation metrics.

[85]  arXiv:2512.04284 [pdf, ps, other]
Title: Learning Single-Image Super-Resolution in the JPEG Compressed Domain
Comments: 7 pages, 4 figures, 2 tables, SEEDS Workshop, ICIP 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep learning models have grown increasingly complex, with input data sizes scaling accordingly. Despite substantial advances in specialized deep learning hardware, data loading continues to be a major bottleneck that limits training and inference speed. To address this challenge, we propose training models directly on encoded JPEG features, reducing the computational overhead associated with full JPEG decoding and significantly improving data loading efficiency. While prior works have focused on recognition tasks, we investigate the effectiveness of this approach for the restoration task of single-image super-resolution (SISR). We present a lightweight super-resolution pipeline that operates on JPEG discrete cosine transform (DCT) coefficients in the frequency domain. Our pipeline achieves a 2.6x speedup in data loading and a 2.5x speedup in training, while preserving visual quality comparable to standard SISR approaches.

[86]  arXiv:2512.04285 [pdf, ps, other]
Title: Differential Filtering in a Common Basic Cycle: Multi-Major Trajectories and Structural Bottlenecks in Exact Sciences and Engineering Degrees
Authors: H. R. Paz
Comments: 24 pages, 5 figures , 3 tables
Subjects: Computers and Society (cs.CY)

Universities often present the Common Basic Cycle (CBC) as a neutral levelling stage shared by several degree programmes. Using twenty years of longitudinal administrative records from a Faculty of Engineering and Exact Sciences, this study tests whether the CBC actually operates as a uniform gateway or as a differential filter across majors. We reconstruct student trajectories for 24,017 entrants, identifying CBC subjects (year level <= 1), destination major, time to exit from the CBC, and final outcome (progression to upper cycle, drop-out, or right-censoring). The analysis combines transition matrices, Kaplan-Meier survival curves, stratified Cox models and subject-level logistic models of drop-out after failure, extended with multi-major enrolment data and a pre/post 2006 curriculum reform comparison. Results show that the CBC functions as a strongly differential filter. Post-reform, the probability of progressing to the upper cycle in the same major ranges from about 0.20 to 0.70 across programmes, while overall drop-out in the CBC exceeds 60%. Early Mathematics modules (introductory calculus and algebra) emerge as structural bottlenecks, combining low pass rates with a two- to three-fold increase in the hazard of leaving the system after failure, with markedly different severity by destination major. Multi-major enrolment, often treated administratively as indecision, is instead associated with lower drop-out, suggesting an adaptive exploration of feasible trajectories. The findings portray the CBC not as a neutral academic foyer, but as a structured sorting device whose impact depends sharply on the targeted degree and on the opportunity to explore alternative majors.

[87]  arXiv:2512.04287 [pdf, ps, other]
Title: Artificial Intelligence Applications in Horizon Scanning for Infectious Diseases
Comments: 21 pages, 1 box, 1 figure
Subjects: Artificial Intelligence (cs.AI); Populations and Evolution (q-bio.PE)

This review explores the integration of Artificial Intelligence into Horizon Scanning, focusing on identifying and responding to emerging threats and opportunities linked to Infectious Diseases. We examine how AI tools can enhance signal detection, data monitoring, scenario analysis, and decision support. We also address the risks associated with AI adoption and propose strategies for effective implementation and governance. The findings contribute to the growing body of Foresight literature by demonstrating the potential and limitations of AI in Public Health preparedness.

[88]  arXiv:2512.04291 [pdf, ps, other]
Title: Scaling MPI Applications on Aurora
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The Aurora supercomputer, which was deployed at Argonne National Laboratory in 2024, is currently one of three Exascale machines in the world on the Top500 list. The Aurora system is composed of over ten thousand nodes each of which contains six Intel Data Center Max Series GPUs, Intel's first data center-focused discrete GPU, and two Intel Xeon Max Series CPUs, Intel's first Xeon processor to contain HBM memory. To achieve Exascale performance the system utilizes the HPE Slingshot high-performance fabric interconnect to connect the nodes. Aurora is currently the largest deployment of the Slingshot fabric to date with nearly 85,000 Cassini NICs and 5,600 Rosetta switches connected in a dragonfly topology. The combination of the Intel powered nodes and the Slingshot network enabled Aurora to become the second fastest system on the Top500 list in June of 2024 and the fastest system on the HPL MxP benchmark. The system is one of the most powerful systems in the world dedicated to AI and HPC simulations for open science. This paper presents details of the Aurora system design with a particular focus on the network fabric and the approach taken to validating it. The performance of the systems is demonstrated through the presentation of the results of MPI benchmarks as well as performance benchmarks including HPL, HPL-MxP, Graph500, and HPCG run on a large fraction of the system. Additionally results are presented for a diverse set of applications including HACC, AMR-Wind, LAMMPS, and FMM demonstrating that Aurora provides the throughput, latency, and bandwidth across system needed to allow applications to perform and scale to large node counts and providing new levels of capability and enabling breakthrough science.

[89]  arXiv:2512.04292 [pdf, ps, other]
Title: SQuARE: Structured Query & Adaptive Retrieval Engine For Tabular Formats
Comments: Accepted in The IEEE International Workshop on Large Language Models in Finance, Dec 8-11, Macau, China, 2025, Preprint Copy
Subjects: Computation and Language (cs.CL)

Accurate question answering over real spreadsheets remains difficult due to multirow headers, merged cells, and unit annotations that disrupt naive chunking, while rigid SQL views fail on files lacking consistent schemas. We present SQuARE, a hybrid retrieval framework with sheet-level, complexity-aware routing. It computes a continuous score based on header depth and merge density, then routes queries either through structure-preserving chunk retrieval or SQL over an automatically constructed relational representation. A lightweight agent supervises retrieval, refinement, or combination of results across both paths when confidence is low. This design maintains header hierarchies, time labels, and units, ensuring that returned values are faithful to the original cells and straightforward to verify. Evaluated on multi-header corporate balance sheets, a heavily merged World Bank workbook, and diverse public datasets, SQuARE consistently surpasses single-strategy baselines and ChatGPT-4o on both retrieval precision and end-to-end answer accuracy while keeping latency predictable. By decoupling retrieval from model choice, the system is compatible with emerging tabular foundation models and offers a practical bridge toward a more robust table understanding.

[90]  arXiv:2512.04296 [pdf, ps, other]
Title: GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers
Comments: Under Review
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Parameter-efficient fine-tuning (PEFT) provides a scalable alternative to full-model adaptation by updating only a small subset of parameters in large pre-trained models. We introduce GRASP - GRouped Activation Shared Parameterization - a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K << D groups and learns a shared scaling and shifting vector for each group. This grouped modulation reduces the number of trainable parameters significantly while preserving the ability of the model to learn task-specific features. Building on this formulation, we further propose StochGRASP, which learns Gaussian distributions as perturbations to the pre-trained weights rather than deterministic values. This probabilistic parameterization along with a noise-aware loss function formulation enables modelling hardware-level variability in programmed weights and significantly improves robustness under non-ideal inference conditions-an important requirement for deployment on edge-based emerging AI hardware. Across GLUE (RoBERTa-base & RoBERTa-large) and E2E NLG (GPT-2 Medium), GRASP matches or exceeds the performance of established PEFT methods while achieving an order of magnitude reduction in trainable parameters compared to LoRA and BitFit. Under varying levels of noise, StochGRASP consistently outperforms deterministic variants, demonstrating its suitability for energy-efficient and noise-prone hardware platforms.

[91]  arXiv:2512.04299 [pdf, ps, other]
Title: When do spectral gradient updates help in deep learning?
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they are expected to perform better. We propose a simple layerwise condition that predicts when a spectral update yields a larger decrease in the loss than a Euclidean gradient step. This condition compares, for each parameter block, the squared nuclear-to-Frobenius ratio of the gradient to the stable rank of the incoming activations. To understand when this condition may be satisfied, we first prove that post-activation matrices have low stable rank at Gaussian initialization in random feature regression, feedforward networks, and transformer blocks. In spiked random feature models we then show that, after a short burn-in, the Euclidean gradient's nuclear-to-Frobenius ratio grows with the data dimension while the stable rank of the activations remains bounded, so the predicted advantage of spectral updates scales with dimension. We validate these predictions in synthetic regression experiments and in NanoGPT-scale language model training, where we find that intermediate activations have low-stable-rank throughout training and the corresponding gradients maintain large nuclear-to-Frobenius ratios. Together, these results identify conditions for spectral gradient methods, such as Muon, to be effective in training deep networks and transformers.

[92]  arXiv:2512.04302 [pdf, ps, other]
Title: Towards better dense rewards in Reinforcement Learning Applications
Authors: Shuyuan Zhang
Comments: arXiv admin note: substantial text overlap with arXiv:2505.20417
Subjects: Artificial Intelligence (cs.AI)

Finding meaningful and accurate dense rewards is a fundamental task in the field of reinforcement learning (RL) that enables agents to explore environments more efficiently. In traditional RL settings, agents learn optimal policies through interactions with an environment guided by reward signals. However, when these signals are sparse, delayed, or poorly aligned with the intended task objectives, agents often struggle to learn effectively. Dense reward functions, which provide informative feedback at every step or state transition, offer a potential solution by shaping agent behavior and accelerating learning. Despite their benefits, poorly crafted reward functions can lead to unintended behaviors, reward hacking, or inefficient exploration. This problem is particularly acute in complex or high-dimensional environments where handcrafted rewards are difficult to specify and validate. To address this, recent research has explored a variety of approaches, including inverse reinforcement learning, reward modeling from human preferences, and self-supervised learning of intrinsic rewards. While these methods offer promising directions, they often involve trade-offs between generality, scalability, and alignment with human intent. This proposal explores several approaches to dealing with these unsolved problems and enhancing the effectiveness and reliability of dense reward construction in different RL applications.

[93]  arXiv:2512.04303 [pdf, ps, other]
Title: Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications
Comments: Accepted in 3DV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate perception of the vehicle's 3D surroundings, including fine-scale road geometry, such as bumps, slopes, and surface irregularities, is essential for safe and comfortable vehicle control. However, conventional monocular depth estimation often oversmooths these features, losing critical information for motion planning and stability. To address this, we introduce Gamma-from-Mono (GfM), a lightweight monocular geometry estimation method that resolves the projective ambiguity in single-camera reconstruction by decoupling global and local structure. GfM predicts a dominant road surface plane together with residual variations expressed by gamma, a dimensionless measure of vertical deviation from the plane, defined as the ratio of a point's height above it to its depth from the camera, and grounded in established planar parallax geometry. With only the camera's height above ground, this representation deterministically recovers metric depth via a closed form, avoiding full extrinsic calibration and naturally prioritizing near-road detail. Its physically interpretable formulation makes it well suited for self-supervised learning, eliminating the need for large annotated datasets. Evaluated on KITTI and the Road Surface Reconstruction Dataset (RSRD), GfM achieves state-of-the-art near-field accuracy in both depth and gamma estimation while maintaining competitive global depth performance. Our lightweight 8.88M-parameter model adapts robustly across diverse camera setups and, to our knowledge, is the first self-supervised monocular approach evaluated on RSRD.

[94]  arXiv:2512.04305 [pdf, ps, other]
Title: How (Mis)calibrated is Your Federated CLIP and What To Do About It?
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While vision-language models like CLIP have been extensively studied, their calibration, crucial for reliable predictions, has received limited attention. Although a few prior works have examined CLIP calibration in offline settings, the impact of fine-tuning CLIP in a federated learning (FL) setup remains unexplored. In this work, we investigate how FL affects CLIP calibration and propose strategies to improve reliability in this distributed setting. We first analyze Textual Prompt Tuning approaches and show that they degrade calibration metrics when operating under FL. We also evaluate existing in-training calibration techniques across four global aggregation methods, finding that they provide limited improvements. Our results suggest that the key challenge lies not only in how we aggregate or calibrate, but in which components we choose to fine-tune. Motivated by this insight, we propose $\text{FL}^2\text{oRA}$, a straightforward LoRA-based approach that naturally improves calibration in FL, and we analyze the factors behind its effectiveness. Experiments on multiple benchmarks demonstrate that $\text{FL}^2\text{oRA}$ consistently produces well-calibrated models, reducing the need for explicit calibration procedures. Codes are available at https://github.com/mainaksingha01/FL2oRA.

[95]  arXiv:2512.04307 [pdf, ps, other]
Title: Evaluating Long-Context Reasoning in LLM-Based WebAgents
Comments: Accepted NeurIPS 25 LAW Workshop
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these agents in long context scenarios, particularly for action-taking WebAgents operating in realistic web environments, remains largely unexplored. This paper introduces a benchmark for evaluating long context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieval and application of information from extended interaction histories. We develop a novel evaluation framework that simulates multi-session user interactions by injecting irrelevant task trajectories between dependent subtasks, creating contexts ranging from 25,000 to 150,000 tokens. Through extensive evaluation of four popular models, Claude-3.7, GPT-4.1, Llama 4, and o4-mini, we observe a dramatic performance degradation as context length increases, with success rates dropping from 40-50\% in baseline conditions to less than 10\% in long context scenarios. Our detailed error analysis reveals that agents primarily fail due to getting stuck in loops and losing track of original task objectives. We further propose an implicit RAG approach that provides modest improvements by generating task-relevant summaries, though fundamental limitations in long context reasoning persist. These findings highlight critical challenges for deploying WebAgents in realistic, long-term user interaction scenarios and provide insights for developing more robust agent architectures capable of maintaining coherent task execution across extended contexts.

[96]  arXiv:2512.04308 [pdf, ps, other]
Title: ResponsibleRobotBench: Benchmarking Responsible Robot Manipulation using Multi-modal Large Language Models
Comments: this https URL
Subjects: Robotics (cs.RO)

Recent advances in large multimodal models have enabled new opportunities in embodied AI, particularly in robotic manipulation. These models have shown strong potential in generalization and reasoning, but achieving reliable and responsible robotic behavior in real-world settings remains an open challenge. In high-stakes environments, robotic agents must go beyond basic task execution to perform risk-aware reasoning, moral decision-making, and physically grounded planning. We introduce ResponsibleRobotBench, a systematic benchmark designed to evaluate and accelerate progress in responsible robotic manipulation from simulation to real world. This benchmark consists of 23 multi-stage tasks spanning diverse risk types, including electrical, chemical, and human-related hazards, and varying levels of physical and planning complexity. These tasks require agents to detect and mitigate risks, reason about safety, plan sequences of actions, and engage human assistance when necessary. Our benchmark includes a general-purpose evaluation framework that supports multimodal model-based agents with various action representation modalities. The framework integrates visual perception, context learning, prompt construction, hazard detection, reasoning and planning, and physical execution. It also provides a rich multimodal dataset, supports reproducible experiments, and includes standardized metrics such as success rate, safety rate, and safe success rate. Through extensive experimental setups, ResponsibleRobotBench enables analysis across risk categories, task types, and agent configurations. By emphasizing physical reliability, generalization, and safety in decision-making, this benchmark provides a foundation for advancing the development of trustworthy, real-world responsible dexterous robotic systems. https://sites.google.com/view/responsible-robotbench

[97]  arXiv:2512.04309 [pdf, ps, other]
Title: Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
Comments: Submitted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Image captioning has drawn considerable attention from the natural language processing and computer vision fields. Aiming to reduce the reliance on curated data, several studies have explored image captioning without any humanly-annotated image-text pairs for training, although existing methods are still outperformed by fully supervised approaches. This paper proposes TOMCap, i.e., an improved text-only training method that performs captioning without the need for aligned image-caption pairs. The method is based on prompting a pre-trained language model decoder with information derived from a CLIP representation, after undergoing a process to reduce the modality gap. We specifically tested the combined use of retrieved examples of captions, and latent vector representations, to guide the generation process. Through extensive experiments, we show that TOMCap outperforms other training-free and text-only methods. We also analyze the impact of different choices regarding the configuration of the retrieval-augmentation and modality gap reduction components.

[98]  arXiv:2512.04310 [pdf, ps, other]
Title: RNNs perform task computations by dynamically warping neural representations
Comments: NeurIPS 2025
Subjects: Machine Learning (cs.LG); Differential Geometry (math.DG); Dynamical Systems (math.DS); Neurons and Cognition (q-bio.NC)

Analysing how neural networks represent data features in their activations can help interpret how they perform tasks. Hence, a long line of work has focused on mathematically characterising the geometry of such "neural representations." In parallel, machine learning has seen a surge of interest in understanding how dynamical systems perform computations on time-varying input data. Yet, the link between computation-through-dynamics and representational geometry remains poorly understood. Here, we hypothesise that recurrent neural networks (RNNs) perform computations by dynamically warping their representations of task variables. To test this hypothesis, we develop a Riemannian geometric framework that enables the derivation of the manifold topology and geometry of a dynamical system from the manifold of its inputs. By characterising the time-varying geometry of RNNs, we show that dynamic warping is a fundamental feature of their computations.

[99]  arXiv:2512.04311 [pdf, ps, other]
Title: Real-time Cricket Sorting By Sex
Comments: 13 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

The global demand for sustainable protein sources is driving increasing interest in edible insects, with Acheta domesticus (house cricket) identified as one of the most suitable species for industrial production. Current farming practices typically rear crickets in mixed-sex populations without automated sex sorting, despite potential benefits such as selective breeding, optimized reproduction ratios, and nutritional differentiation. This work presents a low-cost, real-time system for automated sex-based sorting of Acheta domesticus, combining computer vision and physical actuation. The device integrates a Raspberry Pi 5 with the official Raspberry AI Camera and a custom YOLOv8 nano object detection model, together with a servo-actuated sorting arm. The model reached a mean Average Precision at IoU 0.5 ([email protected]) of 0.977 during testing, and real-world experiments with groups of crickets achieved an overall sorting accuracy of 86.8%. These results demonstrate the feasibility of deploying lightweight deep learning models on resource-constrained devices for insect farming applications, offering a practical solution to improve efficiency and sustainability in cricket production.

[100]  arXiv:2512.04313 [pdf, ps, other]
Title: Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding
Comments: 16 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Current expressive avatar systems rely heavily on visual cues, failing when faces are occluded or when emotions remain internal. We present Mind-to-Face, the first framework that decodes non-invasive electroencephalogram (EEG) signals directly into high-fidelity facial expressions. We build a dual-modality recording setup to obtain synchronized EEG and multi-view facial video during emotion-eliciting stimuli, enabling precise supervision for neural-to-visual learning. Our model uses a CNN-Transformer encoder to map EEG signals into dense 3D position maps, capable of sampling over 65k vertices, capturing fine-scale geometry and subtle emotional dynamics, and renders them through a modified 3D Gaussian Splatting pipeline for photorealistic, view-consistent results. Through extensive evaluation, we show that EEG alone can reliably predict dynamic, subject-specific facial expressions, including subtle emotional responses, demonstrating that neural signals contain far richer affective and geometric information than previously assumed. Mind-to-Face establishes a new paradigm for neural-driven avatars, enabling personalized, emotion-aware telepresence and cognitive interaction in immersive environments.

[101]  arXiv:2512.04314 [pdf, ps, other]
Title: DisentangleFormer: Spatial-Channel Decoupling for Multi-Channel Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision Transformers face a fundamental limitation: standard self-attention jointly processes spatial and channel dimensions, leading to entangled representations that prevent independent modeling of structural and semantic dependencies. This problem is especially pronounced in hyperspectral imaging, from satellite hyperspectral remote sensing to infrared pathology imaging, where channels capture distinct biophysical or biochemical cues. We propose DisentangleFormer, an architecture that achieves robust multi-channel vision representation through principled spatial-channel decoupling. Motivated by information-theoretic principles of decorrelated representation learning, our parallel design enables independent modeling of structural and semantic cues while minimizing redundancy between spatial and channel streams. Our design integrates three core components: (1) Parallel Disentanglement: Independently processes spatial-token and channel-token streams, enabling decorrelated feature learning across spatial and spectral dimensions, (2) Squeezed Token Enhancer: An adaptive calibration module that dynamically fuses spatial and channel streams, and (3) Multi-Scale FFN: complementing global attention with multi-scale local context to capture fine-grained structural and semantic dependencies. Extensive experiments on hyperspectral benchmarks demonstrate that DisentangleFormer achieves state-of-the-art performance, consistently outperforming existing models on Indian Pine, Pavia University, and Houston, the large-scale BigEarthNet remote sensing dataset, as well as an infrared pathology dataset. Moreover, it retains competitive accuracy on ImageNet while reducing computational cost by 17.8% in FLOPs. The code will be made publicly available upon acceptance.

[102]  arXiv:2512.04315 [pdf, ps, other]
Title: SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Modeling dynamic 3D scenes is challenging due to their high-dimensional nature, which requires aggregating information from multiple views to reconstruct time-evolving 3D geometry and motion. We present a novel multi-video 4D Gaussian Splatting (4DGS) approach designed to handle real-world, unsynchronized video sets. Our approach, SyncTrack4D, directly leverages dense 4D track representation of dynamic scene parts as cues for simultaneous cross-video synchronization and 4DGS reconstruction. We first compute dense per-video 4D feature tracks and cross-video track correspondences by Fused Gromov-Wasserstein optimal transport approach. Next, we perform global frame-level temporal alignment to maximize overlapping motion of matched 4D tracks. Finally, we achieve sub-frame synchronization through our multi-video 4D Gaussian splatting built upon a motion-spline scaffold representation. The final output is a synchronized 4DGS representation with dense, explicit 3D trajectories, and temporal offsets for each video. We evaluate our approach on the Panoptic Studio and SyncNeRF Blender, demonstrating sub-frame synchronization accuracy with an average temporal error below 0.26 frames, and high-fidelity 4D reconstruction reaching 26.3 PSNR scores on the Panoptic Studio dataset. To the best of our knowledge, our work is the first general 4D Gaussian Splatting approach for unsynchronized video sets, without assuming the existence of predefined scene objects or prior models.

[103]  arXiv:2512.04316 [pdf, ps, other]
Title: ConsentDiff at Scale: Longitudinal Audits of Web Privacy Policy Changes and UI Frictions
Authors: Haoze Guo
Subjects: Human-Computer Interaction (cs.HC)

Web privacy is experienced via two public artifacts: site utterances in policy texts, and the actions users are required to take during consent interfaces. In the extensive cross-section audits we've studied, there is a lack of longitudinal data detailing how these artifacts are changing together, and if interfaces are actually doing what they promise in policy. ConsentDiff provides that longitudinal view. We build a reproducible pipeline that snapshots sites every month, semantically aligns policy clauses to track clause-level churn, and classifies consent-UI patterns by pulling together DOM signals with cues provided by screenshots. We introduce a novel weighted claim-UI alignment score, connecting common policy claims to observable predicates, and enabling comparisons over time, regions, and verticals. Our measurements suggest continued policy churn, systematic changes to eliminate a higher-friction banner design, and significantly higher alignment where rejecting is visible and lower friction.

[104]  arXiv:2512.04319 [pdf, ps, other]
Title: MANTRA: a Framework for Multi-stage Adaptive Noise TReAtment During Training
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

The reliable application of deep learning models to software engineering tasks hinges on high-quality training data. Yet, large-scale repositories inevitably introduce noisy or mislabeled examples that degrade both accuracy and robustness. While Noise Label Learning (NLL) has been extensively studied in other fields, there are a few works that investigate NLL in Software Engineering (SE) and Large Language Models (LLMs) for SE tasks. In this work, we propose MANTRA, a Multi-stage Adaptive Noise TReAtment framework that embeds noise diagnosis and mitigation directly into the fine-tuning process of code-Pretrained Language Models (PTM) and code-LLMs. We first investigate the effect of noise at varying levels on convergence and loss trajectories of the models. Then we apply an adaptive dropout strategy guided by per-sample loss dynamics and Gaussian Mixture Model clustering to exclude persistently noisy points while preserving clean data. Applying to code summarization and commit intent classification, our experiments reveal that some LLMs are more sensitive to noise than others. However, with MANTRA, the performance of all models in both tasks is improved. MANTRA enables researchers and practitioners to reduce the impact of errors introduced by the dataset in training, saves time in data cleaning and processing, while maximizing the effect of fine-tuning.

[105]  arXiv:2512.04320 [pdf, ps, other]
Title: VLCs: Managing Parallelism with Virtualized Libraries
Comments: Research Paper accepted to the ACM Symposium on Cloud Computing (SoCC'25)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)

As the complexity and scale of modern parallel machines continue to grow, programmers increasingly rely on composition of software libraries to encapsulate and exploit parallelism. However, many libraries are not designed with composition in mind and assume they have exclusive access to all resources. Using such libraries concurrently can result in contention and degraded performance. Prior solutions involve modifying the libraries or the OS, which is often infeasible.
We propose Virtual Library Contexts (VLCs), which are process subunits that encapsulate sets of libraries and associated resource allocations. VLCs control the resource utilization of these libraries without modifying library code. This enables the user to partition resources between libraries to prevent contention, or load multiple copies of the same library to allow parallel execution of otherwise thread-unsafe code within the same process.
In this paper, we describe and evaluate C++ and Python prototypes of VLCs. Experiments show VLCs enable a speedup up to 2.85x on benchmarks including applications using OpenMP, OpenBLAS, and LibTorch.

[106]  arXiv:2512.04323 [pdf, ps, other]
Title: Bayes-DIC Net: Estimating Digital Image Correlation Uncertainty with Bayesian Neural Networks
Comments: 17 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

This paper introduces a novel method for generating high-quality Digital Image Correlation (DIC) dataset based on non-uniform B-spline surfaces. By randomly generating control point coordinates, we construct displacement fields that encompass a variety of realistic displacement scenarios, which are subsequently used to generate speckle pattern datasets. This approach enables the generation of a large-scale dataset that capture real-world displacement field situations, thereby enhancing the training and generalization capabilities of deep learning-based DIC algorithms. Additionally, we propose a novel network architecture, termed Bayes-DIC Net, which extracts information at multiple levels during the down-sampling phase and facilitates the aggregation of information across various levels through a single skip connection during the up-sampling phase. Bayes-DIC Net incorporates a series of lightweight convolutional blocks designed to expand the receptive field and capture rich contextual information while minimizing computational costs. Furthermore, by integrating appropriate dropout modules into Bayes-DIC Net and activating them during the network inference stage, Bayes-DIC Net is transformed into a Bayesian neural network. This transformation allows the network to provide not only predictive results but also confidence levels in these predictions when processing real unlabeled datasets. This feature significantly enhances the practicality and reliability of our network in real-world displacement field prediction tasks. Through these innovations, this paper offers new perspectives and methods for dataset generation and algorithm performance enhancement in the field of DIC.

[107]  arXiv:2512.04324 [pdf, ps, other]
Title: DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Real-world enterprise data intelligence workflows encompass data engineering that turns raw sources into analytical-ready tables and data analysis that convert those tables into decision-oriented insights. We introduce DAComp, a benchmark of 210 tasks that mirrors these complex workflows. Data engineering (DE) tasks require repository-level engineering on industrial schemas, including designing and building multi-stage SQL pipelines from scratch and evolving existing systems under evolving requirements. Data analysis (DA) tasks pose open-ended business problems that demand strategic planning, exploratory analysis through iterative coding, interpretation of intermediate results, and the synthesis of actionable recommendations. Engineering tasks are scored through execution-based, multi-metric evaluation. Open-ended tasks are assessed by a reliable, experimentally validated LLM-judge, which is guided by hierarchical, meticulously crafted rubrics. Our experiments reveal that even state-of-the-art agents falter on DAComp. Performance on DE tasks is particularly low, with success rates under 20%, exposing a critical bottleneck in holistic pipeline orchestration, not merely code generation. Scores on DA tasks also average below 40%, highlighting profound deficiencies in open-ended reasoning and demonstrating that engineering and analysis are distinct capabilities. By clearly diagnosing these limitations, DAComp provides a rigorous and realistic testbed to drive the development of truly capable autonomous data agents for enterprise settings. Our data and code are available at https://da-comp.github.io

[108]  arXiv:2512.04329 [pdf, ps, other]
Title: A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)

Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains difficult. We introduce NN-RAG, a retrieval-augmented generation system that converts large, heterogeneous PyTorch codebases into a searchable and executable library of validated neural modules. Unlike conventional code search or clone-detection tools, NN-RAG performs scope-aware dependency resolution, import-preserving reconstruction, and validator-gated promotion -- ensuring that every retrieved block is scope-closed, compilable, and runnable. Applied to 19 major repositories, the pipeline extracted 1,289 candidate blocks, validated 941 (73.0%), and demonstrated that over 80% are structurally unique. Through multi-level de-duplication (exact, lexical, structural), we find that NN-RAG contributes the overwhelming majority of unique architectures to the LEMUR dataset, supplying approximately 72% of all novel network structures. Beyond quantity, NN-RAG uniquely enables cross-repository migration of architectural patterns, automatically identifying reusable modules in one project and regenerating them, dependency-complete, in another context. To our knowledge, no other open-source system provides this capability at scale. The framework's neutral specifications further allow optional integration with language models for synthesis or dataset registration without redistributing third-party code. Overall, NN-RAG transforms fragmented vision code into a reproducible, provenance-tracked substrate for algorithmic discovery, offering a first open-source solution that both quantifies and expands the diversity of executable neural architectures across repositories.

[109]  arXiv:2512.04331 [pdf, ps, other]
Title: Open Set Face Forgery Detection via Dual-Level Evidence Collection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The proliferation of face forgeries has increasingly undermined confidence in the authenticity of online content. Given the rapid development of face forgery generation algorithms, new fake categories are likely to keep appearing, posing a major challenge to existing face forgery detection methods. Despite recent advances in face forgery detection, existing methods are typically limited to binary Real-vs-Fake classification or the identification of known fake categories, and are incapable of detecting the emergence of novel types of forgeries. In this work, we study the Open Set Face Forgery Detection (OSFFD) problem, which demands that the detection model recognize novel fake categories. We reformulate the OSFFD problem and address it through uncertainty estimation, enhancing its applicability to real-world scenarios. Specifically, we propose the Dual-Level Evidential face forgery Detection (DLED) approach, which collects and fuses category-specific evidence on the spatial and frequency levels to estimate prediction uncertainty. Extensive evaluations conducted across diverse experimental settings demonstrate that the proposed DLED method achieves state-of-the-art performance, outperforming various baseline models by an average of 20% in detecting forgeries from novel fake categories. Moreover, on the traditional Real-versus-Fake face forgery detection task, our DLED method concurrently exhibits competitive performance.

[110]  arXiv:2512.04332 [pdf, ps, other]
Title: Data-regularized Reinforcement Learning for Diffusion Models at Scale
Subjects: Machine Learning (cs.LG)

Aligning generative diffusion models with human preferences via reinforcement learning (RL) is critical yet challenging. Most existing algorithms are often vulnerable to reward hacking, such as quality degradation, over-stylization, or reduced diversity. Our analysis demonstrates that this can be attributed to the inherent limitations of their regularization, which provides unreliable penalties. We introduce Data-regularized Diffusion Reinforcement Learning (DDRL), a novel framework that uses the forward KL divergence to anchor the policy to an off-policy data distribution. Theoretically, DDRL enables robust, unbiased integration of RL with standard diffusion training. Empirically, this translates into a simple yet effective algorithm that combines reward maximization with diffusion loss minimization. With over a million GPU hours of experiments and ten thousand double-blind human evaluations, we demonstrate on high-resolution video generation tasks that DDRL significantly improves rewards while alleviating the reward hacking seen in baselines, achieving the highest human preference and establishing a robust and scalable paradigm for diffusion post-training.

[111]  arXiv:2512.04333 [pdf, ps, other]
Title: RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection
Comments: 12 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Early detection of cancer plays a key role in improving survival rates, but identifying reliable biomarkers from RNA-seq data is still a major challenge. The data are high-dimensional, and conventional statistical methods often fail to capture the complex relationships between genes. In this study, we introduce RGE-GCN (Recursive Gene Elimination with Graph Convolutional Networks), a framework that combines feature selection and classification in a single pipeline. Our approach builds a graph from gene expression profiles, uses a Graph Convolutional Network to classify cancer versus normal samples, and applies Integrated Gradients to highlight the most informative genes. By recursively removing less relevant genes, the model converges to a compact set of biomarkers that are both interpretable and predictive. We evaluated RGE-GCN on synthetic data as well as real-world RNA-seq cohorts of lung, kidney, and cervical cancers. Across all datasets, the method consistently achieved higher accuracy and F1-scores than standard tools such as DESeq2, edgeR, and limma-voom. Importantly, the selected genes aligned with well-known cancer pathways including PI3K-AKT, MAPK, SUMOylation, and immune regulation. These results suggest that RGE-GCN shows promise as a generalizable approach for RNA-seq based early cancer detection and biomarker discovery (https://rce-gcn.streamlit.app/ ).

[112]  arXiv:2512.04334 [pdf, ps, other]
Title: Human-controllable AI: Meaningful Human Control
Authors: Chengke Liu, Wei Xu
Comments: 52 pages
Subjects: Human-Computer Interaction (cs.HC)

Developing human-controllable artificial intelligence (AI) and achieving meaningful human control (MHC) has become a vital principle to address these challenges, ensuring ethical alignment and effective governance in AI. MHC is also a critical focus in human-centered AI (HCAI) research and application. This chapter systematically examines MHC in AI, articulating its foundational principles and future trajectory. MHC is not simply the right to operate, but the unity of human understanding, intervention, and the traceablity of responsibility in AI decision-making, which requires technological design, AI governance, and humans to play a role together. MHC ensures AI autonomy serves humans without constraining technological progress. The mode of human control needs to match the levels of technology, and human supervision should balance the trust and doubt of AI. For future AI systems, MHC mandates human controllability as a prerequisite, requiring: (1) technical architectures with embedded mechanisms for human control; (2) human-AI interactions optimized for better access to human understanding; and (3) the evolution of AI systems harmonizing intelligence and human controllability. Governance must prioritize HCAI strategies: policies balancing innovation and risk mitigation, human-centered participatory frameworks transcending technical elite dominance, and global promotion of MHC as a universal governance paradigm to safeguard HCAI development. Looking ahead, there is a need to strengthen interdisciplinary research on the controllability of AI systems, enhance ethical and legal awareness among stakeholders, moving beyond simplistic technology design perspectives, focus on the knowledge construction, complexity interpretation, and influencing factors surrounding human control. By fostering MHC, the development of human-controllable AI can be further advanced, delivering HCAI systems.

[113]  arXiv:2512.04338 [pdf, ps, other]
Title: One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises
Comments: Proceedings of the 2025 Annual Computer Security Applications Conference (ACSAC' 25), December 8-12, 2025, Honolulu, Hawaii, USA
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical challenges: robustness against adversarial source code transformations and adaptability to the varying false positive rate (FPR) requirements of different actors, from repository maintainers (requiring low FPR) to enterprise security teams (higher FPR tolerance).
We introduce a robust detector capable of seamless integration into both public repositories like PyPI and enterprise ecosystems. To ensure robustness, we propose a novel methodology for generating adversarial packages using fine-grained code obfuscation. Combining these with adversarial training (AT) enhances detector robustness by 2.5x. We comprehensively evaluate AT effectiveness by testing our detector against 122,398 packages collected daily from PyPI over 80 days, showing that AT needs careful application: it makes the detector more robust to obfuscations and allows finding 10% more obfuscated packages, but slightly decreases performance on non-obfuscated packages.
We demonstrate production adaptability of our detector via two case studies: (i) one for PyPI maintainers (tuned at 0.1% FPR) and (ii) one for enterprise teams (tuned at 10% FPR). In the former, we analyze 91,949 packages collected from PyPI over 37 days, achieving a daily detection rate of 2.48 malicious packages with only 2.18 false positives. In the latter, we analyze 1,596 packages adopted by a multinational software company, obtaining only 1.24 false positives daily. These results show that our detector can be seamlessly integrated into both public repositories like PyPI and enterprise ecosystems, ensuring a very low time budget of a few minutes to review the false positives.
Overall, we uncovered 346 malicious packages, now reported to the community.

[114]  arXiv:2512.04339 [pdf, ps, other]
Title: A Conceptual Model for AI Adoption in Financial Decision-Making: Addressing the Unique Challenges of Small and Medium-Sized Enterprises
Comments: The Eighth International Econometric and Financial Conference of Vietnam - ECONVN2025, Ho Chi Minh City, Vietnam, January 13-14-15, 2025
Subjects: Artificial Intelligence (cs.AI)

The adoption of artificial intelligence (AI) offers transformative potential for small and medium-sized enterprises (SMEs), particularly in enhancing financial decision-making processes. However, SMEs often face significant barriers to implementing AI technologies, including limited resources, technical expertise, and data management capabilities. This paper presents a conceptual model for the adoption of AI in financial decision-making for SMEs. The proposed model addresses key challenges faced by SMEs, including limited resources, technical expertise, and data management capabilities. The model is structured into layers: data sources, data processing and integration, AI model deployment, decision support and automation, and validation and risk management. By implementing AI incrementally, SMEs can optimize financial forecasting, budgeting, investment strategies, and risk management. This paper highlights the importance of data quality and continuous model validation, providing a practical roadmap for SMEs to integrate AI into their financial operations. The study concludes with implications for SMEs adopting AI-driven financial processes and suggests areas for future research in AI applications for SME finance.

[115]  arXiv:2512.04341 [pdf, ps, other]
Title: Long-Horizon Model-Based Offline Reinforcement Learning Without Conservatism
Comments: Preprint (52 pages, 15 figures)
Subjects: Machine Learning (cs.LG)

Popular offline reinforcement learning (RL) methods rely on conservatism, either by penalizing out-of-dataset actions or by restricting planning horizons. In this work, we question the universality of this principle and instead revisit a complementary one: a Bayesian perspective. Rather than enforcing conservatism, the Bayesian approach tackles epistemic uncertainty in offline data by modeling a posterior distribution over plausible world models and training a history-dependent agent to maximize expected rewards, enabling test-time generalization. We first illustrate, in a bandit setting, that Bayesianism excels on low-quality datasets where conservatism fails. We then scale the principle to realistic tasks, identifying key design choices, such as layer normalization in the world model and adaptive long-horizon planning, that mitigate compounding error and value overestimation. These yield our practical algorithm, Neubay, grounded in the neutral Bayesian principle. On D4RL and NeoRL benchmarks, Neubay generally matches or surpasses leading conservative algorithms, achieving new state-of-the-art on 7 datasets. Notably, it succeeds with planning horizons of several hundred steps, challenging common belief. Finally, we characterize when Neubay is preferable to conservatism, laying the foundation for a new direction in offline and model-based RL.

[116]  arXiv:2512.04343 [pdf, ps, other]
Title: The Personalization Paradox: Semantic Loss vs. Reasoning Gains in Agentic AI Q&A
Subjects: Information Retrieval (cs.IR)

AIVisor, an agentic retrieval-augmented LLM for student advising, was used to examine how personalization affects system performance across multiple evaluation dimensions. Using twelve authentic advising questions intentionally designed to stress lexical precision, we compared ten personalized and non-personalized system configurations and analyzed outcomes with a Linear Mixed-Effects Model across lexical (BLEU, ROUGE-L), semantic (METEOR, BERTScore), and grounding (RAGAS) metrics. Results showed a consistent trade-off: personalization reliably improved reasoning quality and grounding, yet introduced a significant negative interaction on semantic similarity, driven not by poorer answers but by the limits of current metrics, which penalize meaningful personalized deviations from generic reference texts. This reveals a structural flaw in prevailing LLM evaluation methods, which are ill-suited for assessing user-specific responses. The fully integrated personalized configuration produced the highest overall gains, suggesting that personalization can enhance system effectiveness when evaluated with appropriate multidimensional metrics. Overall, the study demonstrates that personalization produces metric-dependent shifts rather than uniform improvements and provides a methodological foundation for more transparent and robust personalization in agentic AI.

[117]  arXiv:2512.04344 [pdf, ps, other]
Title: Targeted Testing of Compiler Optimizations via Grammar-Level Composition Styles
Subjects: Software Engineering (cs.SE)

Ensuring the correctness of compiler optimizations is critical, but existing fuzzers struggle to test optimizations effectively. First, most fuzzers use optimization pipelines (heuristics-based, fixed sequences of passes) as their harness. The phase-ordering problem can enable or preempt transformations, so pipelines inevitably miss optimization interactions; moreover, many optimizations are not scheduled, even at aggressive levels. Second, optimizations typically fire only when inputs satisfy specific structural relationships, which existing generators and mutations struggle to produce. We propose targeted fuzzing of individual optimizations to complement pipeline-based testing. Our key idea is to exploit composition styles - structural relations over program constructs (adjacency, nesting, repetition, ordering) - that optimizations look for. We build a general-purpose, grammar-based mutational fuzzer, TargetFuzz, that (i) mines composition styles from an optimization-relevant corpus, then (ii) rebuilds them inside different contexts offered by a larger, generic corpus via synthesized mutations to test variations of optimization logic. TargetFuzz is adaptable to a new programming language by lightweight, grammar-based, construct annotations - and it automatically synthesizes mutators and crossovers to rebuild composition styles. No need for hand-coded generators or language-specific mutators, which is particularly useful for modular frameworks such as MLIR, whose dialect-based, rapidly evolving ecosystem makes optimizations difficult to fuzz. Our evaluation on LLVM and MLIR shows that TargetFuzz improves coverage by 8% and 11% and triggers optimizations 2.8$\times$ and 2.6$\times$, compared to baseline fuzzers under the targeted fuzzing mode. We show that targeted fuzzing is complementary: it effectively tests all 37 sampled LLVM optimizations, while pipeline-fuzzing missed 12.

[118]  arXiv:2512.04346 [pdf, ps, other]
Title: Making Cellular Networks Crisis-Proof: Towards Island-Ready, Resilient-By-Design 6G Communication Network
Subjects: Networking and Internet Architecture (cs.NI)

5G and 5G-Advanced cellular networks are vulnerable to regional outages resulting from disasters or targeted attacks. This fragility stems from the reliance on the central core network involved for most 5G connectivity use cases. Crisis-struck regions isolated from the cellular core network form islands, where crisis response is hindered by the unavailability of recovery-relevant services, such as emergency calls, cell broadcasts, messengers, and news apps. Our concept of island-ready, resilient-by-design 6G communication networks envisions local cellular connectivity allowing users to connect to regional application servers, which is currently impossible. In our conceptualization, we follow an all-society approach, as realizing island connectivity requires the cooperation of multiple actors, including users, operators, developers, providers, and authorities. We evaluate how island-ready 5G and 5G-Advanced systems are and outline the open challenges stakeholders must address for full island readiness, such as decentralizing the 6G core network and designing local-first application architectures.

[119]  arXiv:2512.04350 [pdf, ps, other]
Title: ClusterFusion: Hybrid Clustering with Embedding Guidance and LLM Adaptation
Subjects: Computation and Language (cs.CL)

Text clustering is a fundamental task in natural language processing, yet traditional clustering algorithms with pre-trained embeddings often struggle in domain-specific contexts without costly fine-tuning. Large language models (LLMs) provide strong contextual reasoning, yet prior work mainly uses them as auxiliary modules to refine embeddings or adjust cluster boundaries. We propose ClusterFusion, a hybrid framework that instead treats the LLM as the clustering core, guided by lightweight embedding methods. The framework proceeds in three stages: embedding-guided subset partition, LLM-driven topic summarization, and LLM-based topic assignment. This design enables direct incorporation of domain knowledge and user preferences, fully leveraging the contextual adaptability of LLMs. Experiments on three public benchmarks and two new domain-specific datasets demonstrate that ClusterFusion not only achieves state-of-the-art performance on standard tasks but also delivers substantial gains in specialized domains. To support future work, we release our newly constructed dataset and results on all benchmarks.

[120]  arXiv:2512.04351 [pdf, ps, other]
Title: Distance Is All You Need: Radial Dispersion for Uncertainty Estimation in Large Language Models
Subjects: Machine Learning (cs.LG)

Detecting when large language models (LLMs) are uncertain is critical for building reliable systems, yet existing methods are overly complicated, relying on brittle semantic clustering or internal states. We introduce \textbf{Radial Dispersion Score (RDS)}, a simple, parameter-free, fully model-agnostic uncertainty metric that measures the radial dispersion of sampled generations in embedding space. A lightweight probability-weighted variant further incorporates the model's own token probabilities when available, outperforming different nine strong baselines. Moroever, RDS naturally extends to per-sample scoring, enabling applications such as best-of-$N$ selection and confidence-based filtering. Across four challenging free-form QA datasets and multiple LLMs, our metrics achieve state-of-the-art hallucination detection and answer selection performance, while remaining robust and scalable with respect to sample size and embedding choice.

[121]  arXiv:2512.04354 [pdf, ps, other]
Title: SmartAlert: Implementing Machine Learning-Driven Clinical Decision Support for Inpatient Lab Utilization Reduction
Authors: April S. Liang (1), Fatemeh Amrollahi (2), Yixing Jiang (2), Conor K. Corbin (3), Grace Y.E. Kim (4), David Mui (5), Trevor Crowell (6), Aakash Acharya (7), Sreedevi Mony (7), Soumya Punnathanam (7), Jack McKeown (7), Margaret Smith (6 and 8), Steven Lin (6 and 8), Arnold Milstein (8 and 9), Kevin Schulman (1 and 9), Jason Hom (1), Michael A. Pfeffer (1 and 7), Tho D. Pham (10), David Svec (1 and 11), Weihan Chu (1 and 11), Lisa Shieh (1), Christopher Sharp (8), Stephen P. Ma (1), Jonathan H. Chen (1, 2, 9, and 12) ((1) Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, CA, (2) Department of Biomedical Data Science, Stanford University, Palo Alto, CA, (3) SmarterDx, New York, NY, (4) The Johns Hopkins University School of Medicine, Baltimore, MD, (5) Department of Medicine, Stanford University School of Medicine, Palo Alto, CA, (6) Stanford Healthcare AI Applied Research Team, Stanford University School of Medicine, Palo Alto, CA, (7) Technology and Digital Solutions, Stanford Health Care, Palo Alto, CA, (8) Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, CA, (9) Clinical Excellence Research Center, Stanford University, Palo Alto, CA, (10) Department of Pathology, Stanford University School of Medicine, Palo Alto, CA, (11) Stanford Health Care Tri-Valley, Pleasanton, CA, (12) Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA)
Comments: 22 pages, 5 figures
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

Repetitive laboratory testing unlikely to yield clinically useful information is a common practice that burdens patients and increases healthcare costs. Education and feedback interventions have limited success, while general test ordering restrictions and electronic alerts impede appropriate clinical care. We introduce and evaluate SmartAlert, a machine learning (ML)-driven clinical decision support (CDS) system integrated into the electronic health record that predicts stable laboratory results to reduce unnecessary repeat testing. This case study describes the implementation process, challenges, and lessons learned from deploying SmartAlert targeting complete blood count (CBC) utilization in a randomized controlled pilot across 9270 admissions in eight acute care units across two hospitals between August 15, 2024, and March 15, 2025. Results show significant decrease in number of CBC results within 52 hours of SmartAlert display (1.54 vs 1.82, p <0.01) without adverse effect on secondary safety outcomes, representing a 15% relative reduction in repetitive testing. Implementation lessons learned include interpretation of probabilistic model predictions in clinical contexts, stakeholder engagement to define acceptable model behavior, governance processes for deploying a complex model in a clinical environment, user interface design considerations, alignment with clinical operational priorities, and the value of qualitative feedback from end users. In conclusion, a machine learning-driven CDS system backed by a deliberate implementation and governance process can provide precision guidance on inpatient laboratory testing to safely reduce unnecessary repetitive testing.

[122]  arXiv:2512.04355 [pdf, ps, other]
Title: Counting Without Running: Evaluating LLMs' Reasoning About Code Complexity
Comments: 13 pages, 6 figures, MLSys 2026 Submission
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Performance (cs.PF)

Modern GPU software stacks demand developers who can anticipate performance bottlenecks before ever launching a kernel; misjudging floating-point workloads upstream can derail tuning, scheduling, and even hardware procurement. Yet despite rapid progress in code generation, today's Large Language Models (LLMs) are rarely tested on this kind of forward-looking reasoning. We close that gap with gpuFLOPBench, a benchmark that asks models to "count without running" by predicting single and double-precision FLOP counts for 577 CUDA kernels drawn from HeCBench, annotated with ground-truth profiles and eight execution attributes that distinguish trivially analyzable code from kernels whose FLOPs depend on hidden compiler or runtime behavior. Evaluating current closed-source reasoning models shows clear but uneven progress: the newest LLMs achieve perfect classification on straightforward kernels but still incur multiple order-of-magnitude errors whenever implicit FLOPs arise from division, intrinsic math functions, or common subexpressions. These results surface a core limitation of existing code assistants -- the inability to internalize hardware-specific microcode effects -- and position gpuFLOPBench as a focused testbed for developing LLM tooling that can reason about performance with the same rigor as experienced GPU developers. Sources are available at our repository: https://github.com/Scientific-Computing-Lab/gpuFLOPBench

[123]  arXiv:2512.04356 [pdf, ps, other]
Title: Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment
Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent advancement in multimodal LLMs (MLLMs) has demonstrated their remarkable capability to generate descriptive captions for input videos. However, these models suffer from factual inaccuracies in the generated descriptions, causing severe hallucination issues. While prior works have explored alleviating hallucinations for static images, jointly mitigating visual object and temporal action hallucinations for dynamic videos remains a challenging and unsolved task. To tackle this challenge, we propose a Self-Augmented Contrastive Alignment (SANTA) framework for enabling object and action faithfulness by exempting the spurious correlations and enforcing the emphasis on visual facts. SANTA employs a hallucinative self-augmentation scheme to identify the potential hallucinations that lie in the MLLM and transform the original captions to the contrasted negatives. Furthermore, we develop a tracklet-phrase contrastive alignment to match the regional objects and relation-guided actions with their corresponding visual and temporal phrases. Extensive experiments demonstrate that SANTA outperforms existing methods in alleviating object and action hallucinations, yielding superior performance on the hallucination examination benchmarks.

[124]  arXiv:2512.04358 [pdf, ps, other]
Title: MAFNet:Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing stereo matching networks typically rely on either cost-volume construction based on 3D convolutions or deformation methods based on iterative optimization. The former incurs significant computational overhead during cost aggregation, whereas the latter often lacks the ability to model non-local contextual information. These methods exhibit poor compatibility on resource-constrained mobile devices, limiting their deployment in real-time applications. To address this, we propose a Multi-frequency Adaptive Fusion Network (MAFNet), which can produce high-quality disparity maps using only efficient 2D convolutions. Specifically, we design an adaptive frequency-domain filtering attention module that decomposes the full cost volume into high-frequency and low-frequency volumes, performing frequency-aware feature aggregation separately. Subsequently, we introduce a Linformer-based low-rank attention mechanism to adaptively fuse high- and low-frequency information, yielding more robust disparity estimation. Extensive experiments demonstrate that the proposed MAFNet significantly outperforms existing real-time methods on public datasets such as Scene Flow and KITTI 2015, showing a favorable balance between accuracy and real-time performance.

[125]  arXiv:2512.04359 [pdf, ps, other]
Title: Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning
Subjects: Artificial Intelligence (cs.AI)

Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-oriented learning paradigm often suffers from entropy collapse, which reduces policy exploration and limits reasoning capabilities. To address this challenge, we propose an efficient reinforcement learning framework that leverages entropy signals at both the semantic and token levels to improve reasoning. From the data perspective, we introduce semantic entropy-guided curriculum learning, organizing training data from low to high semantic entropy to guide progressive optimization from easier to more challenging tasks. For the algorithmic design, we adopt non-uniform token treatment by imposing KL regularization on low-entropy tokens that critically impact policy exploration and applying stronger constraints on high-covariance portions within these tokens. By jointly optimizing data organization and algorithmic design, our method effectively mitigates entropy collapse and enhances LLM reasoning. Experimental results across 6 benchmarks with 3 different parameter-scale base models demonstrate that our method outperforms other entropy-based approaches in improving reasoning.

[126]  arXiv:2512.04367 [pdf, ps, other]
Title: AgentBay: A Hybrid Interaction Sandbox for Seamless Human-AI Intervention in Agentic Systems
Subjects: Artificial Intelligence (cs.AI)

The rapid advancement of Large Language Models (LLMs) is catalyzing a shift towards autonomous AI Agents capable of executing complex, multi-step tasks. However, these agents remain brittle when faced with real-world exceptions, making Human-in-the-Loop (HITL) supervision essential for mission-critical applications. In this paper, we present AgentBay, a novel sandbox service designed from the ground up for hybrid interaction. AgentBay provides secure, isolated execution environments spanning Windows, Linux, Android, Web Browsers, and Code interpreters. Its core contribution is a unified session accessible via a hybrid control interface: An AI agent can interact programmatically via mainstream interfaces (MCP, Open Source SDK), while a human operator can, at any moment, seamlessly take over full manual control. This seamless intervention is enabled by Adaptive Streaming Protocol (ASP). Unlike traditional VNC/RDP, ASP is specifically engineered for this hybrid use case, delivering an ultra-low-latency, smoother user experience that remains resilient even in weak network environments. It achieves this by dynamically blending command-based and video-based streaming, adapting its encoding strategy based on network conditions and the current controller (AI or human). Our evaluation demonstrates strong results in security, performance, and task completion rates. In a benchmark of complex tasks, the AgentBay (Agent + Human) model achieved more than 48% success rate improvement. Furthermore, our ASP protocol reduces bandwidth consumption by up to 50% compared to standard RDP, and in end-to-end latency with around 5% reduction, especially under poor network conditions. We posit that AgentBay provides a foundational primitive for building the next generation of reliable, human-supervised autonomous systems.

[127]  arXiv:2512.04368 [pdf, ps, other]
Title: AutoGuard: A Self-Healing Proactive Security Layer for DevSecOps Pipelines Using Reinforcement Learning
Comments: Accepted and Presented at 1st IEEE Uttar Pradesh Section Women in Engineering International Conference on Electrical Electronics and Computer Engineering (UPWIECON 2025) organized by NIELIT Dehradun held during 30th 31st October 2025
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)

Contemporary DevSecOps pipelines have to deal with the evolution of security in an ever-continuously integrated and deployed environment. Existing methods,such as rule-based intrusion detection and static vulnerability scanning, are inadequate and unreceptive to changes in the system, causing longer response times and organization needs exposure to emerging attack vectors. In light of the previous constraints, we introduce AutoGuard to the DevSecOps ecosystem, a reinforcement learning (RL)-powered self-healing security framework built to pre-emptively protect DevSecOps environments. AutoGuard is a self-securing security environment that continuously observes pipeline activities for potential anomalies while preemptively remediating the environment. The model observes and reacts based on a policy that is continually learned dynamically over time. The RL agent improves each action over time through reward-based learning aimed at improving the agent's ability to prevent, detect and respond to a security incident in real-time. Testing using simulated ContinuousIntegration / Continuous Deployment (CI/CD) environments showed AutoGuard to successfully improve threat detection accuracy by 22%, reduce mean time torecovery (MTTR) for incidents by 38% and increase overall resilience to incidents as compared to traditional methods.
Keywords- DevSecOps, Reinforcement Learning, Self- Healing Security, Continuous Integration, Automated Threat Mitigation

[128]  arXiv:2512.04369 [pdf, ps, other]
Title: Probabilistic Dynamic Line Rating with Line Graph Convolutional LSTM
Comments: 10 pages, 8 figures
Subjects: Systems and Control (eess.SY)

Dynamic line rating (DLR) is an effective approach to enhancing the utilization of existing transmission line infrastructure by adapting line ratings according to real-time weather conditions. Accurate DLR forecasts are essential for grid operators to effectively schedule generation, manage transmission congestion, and lower operating costs. As renewable generation becomes increasingly variable and weather-dependent, accurate DLR forecasts are also crucial for improving renewable utilization and reducing curtailment during congested periods. Deterministic forecasts, however, often inadequately represent actual line capacities under uncertain weather conditions, leading to operational risks and costly real-time adjustments. To overcome these limitations, we propose a novel network-wide probabilistic DLR forecasting model that leverages both spatial and temporal information, significantly reducing the operational risks and inefficiencies inherent in deterministic methods. Case studies on a synthetic Texas 123-bus system demonstrate that the proposed method not only enhances grid reliability by effectively capturing true DLR values, but also substantially reduces operational costs.

[129]  arXiv:2512.04373 [pdf, ps, other]
Title: Vertical Planetary Landing on Sloped Terrain Using Optical Flow Divergence Estimates
Authors: Hann Woei Ho, Ye Zhou
Comments: This paper is accepted at International Astronautical Congress (IAC 2025)
Subjects: Robotics (cs.RO)

Autonomous landing on sloped terrain poses significant challenges for small, lightweight spacecraft, such as rotorcraft and landers. These vehicles have limited processing capability and payload capacity, which makes advanced deep learning methods and heavy sensors impractical. Flying insects, such as bees, achieve remarkable landings with minimal neural and sensory resources, relying heavily on optical flow. By regulating flow divergence, a measure of vertical velocity divided by height, they perform smooth landings in which velocity and height decay exponentially together. However, adapting this bio-inspired strategy for spacecraft landings on sloped terrain presents two key challenges: global flow-divergence estimates obscure terrain inclination, and the nonlinear nature of divergence-based control can lead to instability when using conventional controllers. This paper proposes a nonlinear control strategy that leverages two distinct local flow divergence estimates to regulate both thrust and attitude during vertical landings. The control law is formulated based on Incremental Nonlinear Dynamic Inversion to handle the nonlinear flow divergence. The thrust control ensures a smooth vertical descent by keeping a constant average of the local flow divergence estimates, while the attitude control aligns the vehicle with the inclined surface at touchdown by exploiting their difference. The approach is evaluated in numerical simulations using a simplified 2D spacecraft model across varying slopes and divergence setpoints. Results show that regulating the average divergence yields stable landings with exponential decay of velocity and height, and using the divergence difference enables effective alignment with inclined terrain. Overall, the method offers a robust, low-resource landing strategy that enhances the feasibility of autonomous planetary missions with small spacecraft.

[130]  arXiv:2512.04374 [pdf, ps, other]
Title: LangSAT: A Novel Framework Combining NLP and Reinforcement Learning for SAT Solving
Subjects: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)

Our work presents a novel reinforcement learning (RL) based framework to optimize heuristic selection within the conflict-driven clause learning (CDCL) process, improving the efficiency of Boolean satisfiability (SAT) solving. The proposed system, LangSAT, bridges the gap between natural language inputs and propositional logic by converting English descriptions into Conjunctive Normal Form (CNF) expressions and solving them using an RL-enhanced CDCL SAT solver. Unlike existing SAT-solving platforms that require CNF as input, LangSAT enables users to input standard English descriptions, making SAT-solving more accessible. The framework comprises two key components: Lang2Logic, which translates English sentences into CNF expressions, and SmartSAT, an RL-based SAT solver. SmartSAT encodes clause-variable relationships as structured graph representations and extracts global features specific to the SAT problem. This implementation provides the RL agent with deeper contextual information, enabling SAT problems to be solved more efficiently. Lang2Logic was evaluated on diverse natural language inputs, processing descriptions up to 450 words. The generated CNFs were solved by SmartSAT, which demonstrated comparable performance to traditional CDCL heuristics with respect to solving time. The combined LangSAT framework offers a more accessible and scalable solution for SAT-solving tasks across reasoning, formal verification, and debugging.

[131]  arXiv:2512.04380 [pdf, ps, other]
Title: Vision and Causal Learning Based Channel Estimation for THz Communications
Comments: Submitted to IEEE Transactions on Mobile Computing on Mar. 20, 2025 (18 pages, 9 figures)
Subjects: Networking and Internet Architecture (cs.NI)

The use of terahertz (THz) communications with massive multiple input multiple output (MIMO) systems in 6G can potentially provide high data rates and low latency communications. However, accurate channel estimation in THz frequencies presents significant challenges due to factors such as high propagation losses, sensitivity to environmental obstructions, and strong atmospheric absorption. These challenges are particularly pronounced in urban environments, where traditional channel estimation methods often fail to deliver reliable results, particularly in complex non-line-of-sight (NLoS) scenarios. This paper introduces a novel vision-based channel estimation technique that integrates causal reasoning into urban THz communication systems. The proposed method combines computer vision algorithms with variational causal dynamics (VCD) to analyze real-time images of the urban environment, allowing for a deeper understanding of the physical factors that influence THz signal propagation. By capturing the complex, dynamic interactions between physical objects (such as buildings, trees, and vehicles) and the transmitted signals, the model can predict the channel with up to twice the accuracy of conventional methods. This model improves estimation accuracy and demonstrates superior generalization performance. Hence, it can provide reliable predictions even in previously unseen urban environments. The effectiveness of the proposed method is particularly evident in NLoS conditions, where it significantly outperforms traditional methods such as by accounting for indirect signal paths, such as reflections and diffractions. Simulation results confirm that the proposed vision-based approach surpasses conventional artificial intelligence (AI)-based estimation techniques in accuracy and robustness, showing a substantial improvement across various dynamic urban scenarios.

[132]  arXiv:2512.04381 [pdf, ps, other]
Title: FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination
Subjects: Robotics (cs.RO)

We present FoundAtion-model-guided decoupled LoCO-maNipulation visuomotor policies (FALCON), a framework for loco-manipulation that combines modular diffusion policies with a vision-language foundation model as the coordinator. Our approach explicitly decouples locomotion and manipulation into two specialized visuomotor policies, allowing each subsystem to rely on its own observations. This mitigates the performance degradation that arise when a single policy is forced to fuse heterogeneous, potentially mismatched observations from locomotion and manipulation. Our key innovation lies in restoring coordination between these two independent policies through a vision-language foundation model, which encodes global observations and language instructions into a shared latent embedding conditioning both diffusion policies. On top of this backbone, we introduce a phase-progress head that uses textual descriptions of task stages to infer discrete phase and continuous progress estimates without manual phase labels. To further structure the latent space, we incorporate a coordination-aware contrastive loss that explicitly encodes cross-subsystem compatibility between arm and base actions. We evaluate FALCON on two challenging loco-manipulation tasks requiring navigation, precise end-effector placement, and tight base-arm coordination. Results show that it surpasses centralized and decentralized baselines while exhibiting improved robustness and generalization to out-of-distribution scenarios.

[133]  arXiv:2512.04385 [pdf, ps, other]
Title: STeP-Diff: Spatio-Temporal Physics-Informed Diffusion Models for Mobile Fine-Grained Pollution Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Fine-grained air pollution forecasting is crucial for urban management and the development of healthy buildings. Deploying portable sensors on mobile platforms such as cars and buses offers a low-cost, easy-to-maintain, and wide-coverage data collection solution. However, due to the random and uncontrollable movement patterns of these non-dedicated mobile platforms, the resulting sensor data are often incomplete and temporally inconsistent. By exploring potential training patterns in the reverse process of diffusion models, we propose Spatio-Temporal Physics-Informed Diffusion Models (STeP-Diff). STeP-Diff leverages DeepONet to model the spatial sequence of measurements along with a PDE-informed diffusion model to forecast the spatio-temporal field from incomplete and time-varying data. Through a PDE-constrained regularization framework, the denoising process asymptotically converges to the convection-diffusion dynamics, ensuring that predictions are both grounded in real-world measurements and aligned with the fundamental physics governing pollution dispersion. To assess the performance of the system, we deployed 59 self-designed portable sensing devices in two cities, operating for 14 days to collect air pollution data. Compared to the second-best performing algorithm, our model achieved improvements of up to 89.12% in MAE, 82.30% in RMSE, and 25.00% in MAPE, with extensive evaluations demonstrating that STeP-Diff effectively captures the spatio-temporal dependencies in air pollution fields.

[134]  arXiv:2512.04386 [pdf, ps, other]
Title: MASE: Interpretable NLP Models via Model-Agnostic Saliency Estimation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Deep neural networks (DNNs) have made significant strides in Natural Language Processing (NLP), yet their interpretability remains elusive, particularly when evaluating their intricate decision-making processes. Traditional methods often rely on post-hoc interpretations, such as saliency maps or feature visualization, which might not be directly applicable to the discrete nature of word data in NLP. Addressing this, we introduce the Model-agnostic Saliency Estimation (MASE) framework. MASE offers local explanations for text-based predictive models without necessitating in-depth knowledge of a model's internal architecture. By leveraging Normalized Linear Gaussian Perturbations (NLGP) on the embedding layer instead of raw word inputs, MASE efficiently estimates input saliency. Our results indicate MASE's superiority over other model-agnostic interpretation methods, especially in terms of Delta Accuracy, positioning it as a promising tool for elucidating the operations of text-based models in NLP.

[135]  arXiv:2512.04388 [pdf, ps, other]
Title: Learning to Orchestrate Agents in Natural Language with the Conductor
Subjects: Machine Learning (cs.LG)

Powerful large language models (LLMs) from different providers have been expensively trained and finetuned to specialize across varying domains. In this work, we introduce a new kind of Conductor model trained with reinforcement learning to automatically discover powerful coordination strategies among LLMs. Our Conductor learns not only to design targeted communication topologies for effective agent-to-agent collaboration, but also to prompt engineer focused instructions to the LLMs to maximally leverage their individual capabilities. We show that, by learning optimal coordination strategies over pools of powerful worker LLMs, a 7B Conductor achieves significant performance gains beyond any individual worker, attaining state-of-the-art results in challenging reasoning benchmarks, such as LiveCodeBench and GPQA. By training with randomized agent pools, our conductor effectively adapts to arbitrary sets of open- and closed-source agents, meeting any user requirements. Furthermore, allowing the Conductor to select itself as a worker gives rise to recursive topologies, elevating performance with a new form of dynamic test-time scaling through online iterative adaptation. More broadly, ours is among the early work demonstrating language model coordination can be unlocked through RL, where powerful coordination strategies emerge naturally in LLMs through pure end-to-end reward maximization.

[136]  arXiv:2512.04389 [pdf, ps, other]
Title: A Structure-Aware Irregular Blocking Method for Sparse LU Factorization
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In sparse LU factorization, nonzero elements after symbolic factorization tend to distribute in diagonal and right-bottom region of sparse matrices. However, regular 2D blocking on this non-uniform distribution structure may lead to workload imbalance across blocks. Besides, existing matrix features fail to guide us effectively in blocking. In this paper, we propose a structure-aware irregular blocking method for numerical factorization. A novel diagonal block-based feature is introduced to effectively characterize the local nonzero distribution of sparse matrices. Based on this, we further propose an irregular blocking method that adjusts block sizes according to the local distribution of nonzeros. The strategy utilizes fine-grained blocks in dense regions and coarse-grained blocks in sparse regions, adequately balancing the nonzeros of blocks both within the same level and across levels in the dependency tree. Experiments demonstrate that, on a single NVIDIA A100 GPU, our proposed irregular blocking method achieves average speedups of 1.50x and 3.32x over PanguLU and the latest SuperLU_DIST, respectively. In addition, it achieves speedups of 1.40x and 3.84x over PanguLU and SuperLU_DIST on 4 NVIDIA A100 GPUs.

[137]  arXiv:2512.04390 [pdf, ps, other]
Title: FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring
Comments: 20 pages, 15 figures. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Real-world video restoration is plagued by complex degradations from motion coupled with dynamically varying exposure - a key challenge largely overlooked by prior works and a common artifact of auto-exposure or low-light capture. We present FMA-Net++, a framework for joint video super-resolution and deblurring that explicitly models this coupled effect of motion and dynamically varying exposure. FMA-Net++ adopts a sequence-level architecture built from Hierarchical Refinement with Bidirectional Propagation blocks, enabling parallel, long-range temporal modeling. Within each block, an Exposure Time-aware Modulation layer conditions features on per-frame exposure, which in turn drives an exposure-aware Flow-Guided Dynamic Filtering module to infer motion- and exposure-aware degradation kernels. FMA-Net++ decouples degradation learning from restoration: the former predicts exposure- and motion-aware priors to guide the latter, improving both accuracy and efficiency. To evaluate under realistic capture conditions, we introduce REDS-ME (multi-exposure) and REDS-RE (random-exposure) benchmarks. Trained solely on synthetic data, FMA-Net++ achieves state-of-the-art accuracy and temporal consistency on our new benchmarks and GoPro, outperforming recent methods in both restoration quality and inference speed, and generalizes well to challenging real-world videos.

[138]  arXiv:2512.04395 [pdf, ps, other]
Title: Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large-scale pre-trained Vision-Language Models (VLMs) have demonstrated strong few-shot learning capabilities. However, these methods typically learn holistic representations where an image's domain-invariant structure is implicitly entangled with its domain-specific style. This presents an opportunity to further enhance generalization by disentangling these visual cues. In this paper, we propose Fourier-Attentive Representation Learning (FARL), a novel framework that addresses this by explicitly disentangling visual representations using Fourier analysis. The core of our method is a dual cross-attention mechanism, where learnable representation tokens separately query an image's structural features (from the phase spectrum) and stylistic features (from the amplitude spectrum). This process yields enriched, disentangled tokens that are then injected deep into the VLM encoders to guide adaptation. Our design, which includes an asymmetric injection strategy, forces the model to learn a more robust vision-language alignment. Extensive experiments on 15 datasets demonstrate the effectiveness of our approach.

[139]  arXiv:2512.04396 [pdf, ps, other]
Title: Sarcasm Detection on Reddit Using Classical Machine Learning and Feature Engineering
Authors: Subrata Karmaker
Comments: 11 pages, 2 figures, includes full Python code. Classical machine learning baseline for sarcasm detection on the SARC 2.0 dataset
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Sarcasm is common in online discussions, yet difficult for machines to identify because the intended meaning often contradicts the literal wording. In this work, I study sarcasm detection using only classical machine learning methods and explicit feature engineering, without relying on neural networks or context from parent comments. Using a 100,000-comment subsample of the Self-Annotated Reddit Corpus (SARC 2.0), I combine word-level and character-level TF-IDF features with simple stylistic indicators. Four models are evaluated: logistic regression, a linear SVM, multinomial Naive Bayes, and a random forest. Naive Bayes and logistic regression perform the strongest, achieving F1-scores around 0.57 for sarcastic comments. Although the lack of conversational context limits performance, the results offer a clear and reproducible baseline for sarcasm detection using lightweight and interpretable methods.

[140]  arXiv:2512.04397 [pdf, ps, other]
Title: Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease Detection
Journal-ref: 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, 2025, pp. 1-5
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Medical image classification plays an increasingly vital role in identifying various diseases by classifying medical images, such as X-rays, MRIs and CT scans, into different categories based on their features. In recent years, deep learning techniques have attracted significant attention in medical image classification. However, it is usually infeasible to train an entire large deep learning model from scratch. To address this issue, one of the solutions is the transfer learning (TL) technique, where a pre-trained model is reused for a new task. In this paper, we present a comprehensive analysis of TL techniques for medical image classification using deep convolutional neural networks. We evaluate six pre-trained models (AlexNet, VGG16, ResNet18, ResNet34, ResNet50, and InceptionV3) on a custom chest X-ray dataset for disease detection. The experimental results demonstrate that InceptionV3 consistently outperforms other models across all the standard metrics. The ResNet family shows progressively better performance with increasing depth, whereas VGG16 and AlexNet perform reasonably well but with lower accuracy. In addition, we also conduct uncertainty analysis and runtime comparison to assess the robustness and computational efficiency of these models. Our findings reveal that TL is beneficial in most cases, especially with limited data, but the extent of improvement depends on several factors such as model architecture, dataset size, and domain similarity between source and target tasks. Moreover, we demonstrate that with a well-trained feature extractor, only a lightweight feedforward model is enough to provide efficient prediction. As such, this study contributes to the understanding of TL in medical image classification, and provides insights for selecting appropriate models based on specific requirements.

[141]  arXiv:2512.04398 [pdf, ps, other]
Title: What is Beyond Presence? Dimensionality, Control, and Information Spaces
Authors: E. Ch'ng
Comments: 38 pages, accepted for Presence: Virtual and Augmented Reality 2026(37)
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

What is after presence? Spatial presence, the sense of "being there", is becoming less of a primary objective and more of a baseline expectation of virtual reality. More than six decades after its invention, VR is shifting from a technical system into a cultural, social, and phenomenological medium, offering experiences that function as distinct modes of reality. Existing theories that focus primarily on perceptual illusions are no longer sufficient to account for these emerging forms of experience. A new framework is needed to guide the design and evaluation of immersive environments by identifying the key technical and abstract dimensions afforded by virtual worlds. These dimensions include spatial, placeness, temporal, social, cultural, cognitive, and psychological parameters. The central argument is that immersive environments must move beyond the technical dimension to leverage richer information channels that shape user experience. This shift from presence to experience orchestration invites creators across disciplines to contribute to the design and assessment of meaningful immersive worlds.

[142]  arXiv:2512.04399 [pdf, ps, other]
Title: Development of a 15-Degree-of-Freedom Bionic Hand with Cable-Driven Transmission and Distributed Actuation
Subjects: Robotics (cs.RO)

In robotic hand research, minimizing the number of actuators while maintaining human-hand-consistent dimensions and degrees of freedom constitutes a fundamental challenge. Drawing bio-inspiration from human hand kinematic configurations and muscle distribution strategies, this work proposes a novel 15-DoF dexterous robotic hand, with detailed analysis of its mechanical architecture, electrical system, and control system. The bionic hand employs a new tendon-driven mechanism, significantly reducing the number of motors required by traditional tendon-driven systems while enhancing motion performance and simplifying the mechanical structure. This design integrates five motors in the forearm to provide strong gripping force, while ten small motors are installed in the palm to support fine manipulation tasks. Additionally, a corresponding joint sensing and motor driving electrical system was developed to ensure efficient control and feedback. The entire system weighs only 1.4kg, combining lightweight and high-performance features. Through experiments, the bionic hand exhibited exceptional dexterity and robust grasping capabilities, demonstrating significant potential for robotic manipulation tasks.

[143]  arXiv:2512.04404 [pdf, ps, other]
Title: Bridging Probabilistic Inference and Behavior Trees: An Interactive Framework for Adaptive Multi-Robot Cooperation
Comments: 34 pages, is submitted RAS Journal
Subjects: Robotics (cs.RO)

This paper proposes an Interactive Inference Behavior Tree (IIBT) framework that integrates behavior trees (BTs) with active inference under the free energy principle for distributed multi-robot decision-making. The proposed IIBT node extends conventional BTs with probabilistic reasoning, enabling online joint planning and execution across multiple robots. It remains fully compatible with standard BT architectures, allowing seamless integration into existing multi-robot control systems. Within this framework, multi-robot cooperation is formulated as a free-energy minimization process, where each robot dynamically updates its preference matrix based on perceptual inputs and peer intentions, thereby achieving adaptive coordination in partially observable and dynamic environments. The proposed approach is validated through both simulation and real-world experiments, including a multi-robot maze navigation and a collaborative manipulation task, compared against traditional BTs(https://youtu.be/KX_oT3IDTf4). Experimental results demonstrate that the IIBT framework reduces BT node complexity by over 70%, while maintaining robust, interpretable, and adaptive cooperative behavior under environmental uncertainty.

[144]  arXiv:2512.04408 [pdf, ps, other]
Title: Executable Governance for AI: Translating Policies into Rules Using LLMs
Comments: Accepted to AAAI-26 AI Governance Workshop (in-person presentation); 10 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI)

AI policy guidance is predominantly written as prose, which practitioners must first convert into executable rules before frameworks can evaluate or enforce them. This manual step is slow, error-prone, difficult to scale, and often delays the use of safeguards in real-world deployments. To address this gap, we present Policy-to-Tests (P2T), a framework that converts natural-language policy documents into normalized, machine-readable rules. The framework comprises a pipeline and a compact domain-specific language (DSL) that encodes hazards, scope, conditions, exceptions, and required evidence, yielding a canonical representation of extracted rules. To test the framework beyond a single policy, we apply it across general frameworks, sector guidance, and enterprise standards, extracting obligation-bearing clauses and converting them into executable rules. These AI-generated rules closely match strong human baselines on span-level and rule-level metrics, with robust inter-annotator agreement on the gold set. To evaluate downstream behavioral and safety impact, we add HIPAA-derived safeguards to a generative agent and compare it with an otherwise identical agent without guardrails. An LLM-based judge, aligned with gold-standard criteria, measures violation rates and robustness to obfuscated and compositional prompts. Detailed results are provided in the appendix. We release the codebase, DSL, prompts, and rule sets as open-source resources to enable reproducible evaluation.

[145]  arXiv:2512.04409 [pdf, ps, other]
Title: Distributed Articulation Point Identification in Time-Varying Undirected Networks
Subjects: Social and Information Networks (cs.SI); Systems and Control (eess.SY)

Identifying articulation points (APs) is fundamental to assessing the robustness of time-varying networks. In such dynamic environments, topological changes including edge additions and deletions can instantly alter the set of APs, demanding rapid and efficient re-assessment. This paper proposes a fully distributed algorithm for identifying APs and monitoring biconnectivity. Our core contribution is an incremental update protocol. Unlike static methods that require global re-initialization which incurs high communication overhead, our algorithm propagates information from the site of the change, updating only the affected nodes' state values. This approach, which builds upon a maximum consensus protocol, not only ensures convergence to the correct AP set following topological changes but also preserves network privacy by preventing nodes from reconstructing the global topology. We provide rigorous proofs of correctness for this eventual convergence and demonstrate its applicability and efficiency through experiments.

[146]  arXiv:2512.04411 [pdf, ps, other]
Title: Iterative Contact-resolving Hybrid Methods for Multiscale Contact Mechanics
Subjects: Numerical Analysis (math.NA)

Modeling contact mechanics with high contrast coefficients presents significant mathematical and computational challenges, especially in achieving strongly symmetric stress approximations. Due to the inherent nonlinearity of contact problems, conventional methods that treat the entire domain as a monolithic system often lead to high global complexity. To address this, we develop an iterative contact-resolving hybrid method by localizing nonlinear contact constraints within a smaller subdomain, while the larger subdomain is governed by a linear system. Our system employs variational inequality theory, minimization principles, and penalty methods. More importantly, we propose four discretization types within the two-subdomain framework, ranging from applying standard/mixed FEM across the entire domain to combining standard/mixed multiscale methods in the larger subdomain with standard/mixed FEM in the smaller one. % The standard finite element method and standard constraint energy minimizing generalized multiscale finite element method are simple and easy to demonstrate. By employing a multiscale reduction technique, the method avoids excessive degrees of freedom inherent in conventional methods in the larger domain, while the mixed formulation enables direct stress computation, ensures local momentum conservation, and resists locking in nearly incompressible materials. Convergence analysis and the corresponding algorithms are provided for all cases. Extensive numerical experiments are presented to validate the effectiveness of the approaches.

[147]  arXiv:2512.04413 [pdf, ps, other]
Title: Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection
Comments: 12 pages, 8 figures, 11 tables
Journal-ref: IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1-11
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Knowledge distillation is an effective and hardware-friendly method, which plays a key role in lightweighting remote sensing object detection. However, existing distillation methods often encounter the issue of mixed features in remote sensing images (RSIs), and neglect the discrepancies caused by subtle feature variations, leading to entangled knowledge confusion. To address these challenges, we propose an architecture-agnostic distillation method named Dual-Stream Spectral Decoupling Distillation (DS2D2) for universal remote sensing object detection tasks. Specifically, DS2D2 integrates explicit and implicit distillation grounded in spectral decomposition. Firstly, the first-order wavelet transform is applied for spectral decomposition to preserve the critical spatial characteristics of RSIs. Leveraging this spatial preservation, a Density-Independent Scale Weight (DISW) is designed to address the challenges of dense and small object detection common in RSIs. Secondly, we show implicit knowledge hidden in subtle student-teacher feature discrepancies, which significantly influence predictions when activated by detection heads. This implicit knowledge is extracted via full-frequency and high-frequency amplifiers, which map feature differences to prediction deviations. Extensive experiments on DIOR and DOTA datasets validate the effectiveness of the proposed method. Specifically, on DIOR dataset, DS2D2 achieves improvements of 4.2% in AP50 for RetinaNet and 3.8% in AP50 for Faster R-CNN, outperforming existing distillation approaches. The source code will be available at https://github.com/PolarAid/DS2D2.

[148]  arXiv:2512.04415 [pdf, ps, other]
Title: RoboBPP: Benchmarking Robotic Online Bin Packing with Physics-based Simulation
Comments: Under review at the International Journal of Robotics Research (IJRR)
Subjects: Robotics (cs.RO)

Physical feasibility in 3D bin packing is a key requirement in modern industrial logistics and robotic automation. With the growing adoption of industrial automation, online bin packing has gained increasing attention. However, inconsistencies in problem settings, test datasets, and evaluation metrics have hindered progress in the field, and there is a lack of a comprehensive benchmarking system. Direct testing on real hardware is costly, and building a realistic simulation environment is also challenging. To address these limitations, we introduce RoboBPP, a benchmarking system designed for robotic online bin packing. RoboBPP integrates a physics-based simulator to assess physical feasibility. In our simulation environment, we introduce a robotic arm and boxes at real-world scales to replicate real industrial packing workflows. By simulating conditions that arise in real industrial applications, we ensure that evaluated algorithms are practically deployable. In addition, prior studies often rely on synthetic datasets whose distributions differ from real-world industrial data. To address this issue, we collect three datasets from real industrial workflows, including assembly-line production, logistics packing, and furniture manufacturing. The benchmark comprises three carefully designed test settings and extends existing evaluation metrics with new metrics for structural stability and operational safety. We design a scoring system and derive a range of insights from the evaluation results. RoboBPP is fully open-source and is equipped with visualization tools and an online leaderboard, providing a reproducible and extensible foundation for future research and industrial applications (https://robot-bin-packing-benchmark.github.io).

[149]  arXiv:2512.04416 [pdf, ps, other]
Title: GovBench: Benchmarking LLM Agents for Real-World Data Governance Workflows
Comments: Equal contribution: Zhou Liu and Zhaoyang Han. Corresponding authors: Yuanfeng Song and Wentao Zhang
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Data governance ensures data quality, security, and compliance through policies and standards, a critical foundation for scaling modern AI development. Recently, large language models (LLMs) have emerged as a promising solution for automating data governance by translating user intent into executable transformation code. However, existing benchmarks for automated data science often emphasize snippet-level coding or high-level analytics, failing to capture the unique challenge of data governance: ensuring the correctness and quality of the data itself. To bridge this gap, we introduce GovBench, a benchmark featuring 150 diverse tasks grounded in real-world scenarios, built on data from actual cases. GovBench employs a novel "reversed-objective" methodology to synthesize realistic noise and utilizes rigorous metrics to assess end-to-end pipeline reliability. Our analysis on GovBench reveals that current models struggle with complex, multi-step workflows and lack robust error-correction mechanisms. Consequently, we propose DataGovAgent, a framework utilizing a Planner-Executor-Evaluator architecture that integrates constraint-based planning, retrieval-augmented generation, and sandboxed feedback-driven debugging. Experimental results show that DataGovAgent significantly boosts the Average Task Score (ATS) on complex tasks from 39.7 to 54.9 and reduces debugging iterations by over 77.9 percent compared to general-purpose baselines.

[150]  arXiv:2512.04419 [pdf, ps, other]
Title: Solving LLM Repetition Problem in Production: A Comprehensive Study of Multiple Solutions
Subjects: Artificial Intelligence (cs.AI)

The repetition problem, where Large Language Models (LLMs) continuously generate repetitive content without proper termination, poses a critical challenge in production deployments, causing severe performance degradation and system stalling. This paper presents a comprehensive investigation and multiple practical solutions for the repetition problem encountered in real-world batch code interpretation tasks.
We identify three distinct repetition patterns: (1) business rule generation repetition, (2) method call relationship analysis repetition, and (3) PlantUML diagram syntax generation repetition. Through rigorous theoretical analysis based on Markov models, we establish that the root cause lies in greedy decoding's inability to escape repetitive loops, exacerbated by self-reinforcement effects.
Our comprehensive experimental evaluation demonstrates three viable solutions: (1) Beam Search decoding with early_stopping=True serves as a universal post-hoc mechanism that effectively resolves all three repetition patterns; (2) presence_penalty hyperparameter provides an effective solution specifically for BadCase 1; and (3) Direct Preference Optimization (DPO) fine-tuning offers a universal model-level solution for all three BadCases.
The primary value of this work lies in combining first-hand production experience with extensive experimental validation. Our main contributions include systematic theoretical analysis of repetition mechanisms, comprehensive evaluation of multiple solutions with task-specific applicability mapping, identification of early_stopping as the critical parameter for Beam Search effectiveness, and practical production-ready solutions validated in real deployment environments.

[151]  arXiv:2512.04421 [pdf, ps, other]
Title: UTrice: Unifying Primitives in Differentiable Ray Tracing and Rasterization via Triangles for Particle-Based 3D Scenes
Comments: 13 pages, 10 figures, submitted to CVPR2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Ray tracing 3D Gaussian particles enables realistic effects such as depth of field, refractions, and flexible camera modeling for novel-view synthesis. However, existing methods trace Gaussians through proxy geometry, which requires constructing complex intermediate meshes and performing costly intersection tests. This limitation arises because Gaussian-based particles are not well suited as unified primitives for both ray tracing and rasterization. In this work, we propose a differentiable triangle-based ray tracing pipeline that directly treats triangles as rendering primitives without relying on any proxy geometry. Our results show that the proposed method achieves significantly higher rendering quality than existing ray tracing approaches while maintaining real-time rendering performance. Moreover, our pipeline can directly render triangles optimized by the rasterization-based method Triangle Splatting, thus unifying the primitives used in novel-view synthesis.

[152]  arXiv:2512.04425 [pdf, ps, other]
Title: Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate and interpretable gait analysis plays a crucial role in the early detection of Parkinsons disease (PD),yet most existing approaches remain limited by single-modality inputs, low robustness, and a lack of clinical transparency. This paper presents an explainable multimodal framework that integrates RGB and Depth (RGB-D) data to recognize Parkinsonian gait patterns under realistic conditions. The proposed system employs dual YOLOv11-based encoders for modality-specific feature extraction, followed by a Multi-Scale Local-Global Extraction (MLGE) module and a Cross-Spatial Neck Fusion mechanism to enhance spatial-temporal representation. This design captures both fine-grained limb motion (e.g., reduced arm swing) and overall gait dynamics (e.g., short stride or turning difficulty), even in challenging scenarios such as low lighting or occlusion caused by clothing. To ensure interpretability, a frozen Large Language Model (LLM) is incorporated to translate fused visual embeddings and structured metadata into clinically meaningful textual explanations. Experimental evaluations on multimodal gait datasets demonstrate that the proposed RGB-D fusion framework achieves higher recognition accuracy, improved robustness to environmental variations, and clear visual-linguistic reasoning compared with single-input baselines. By combining multimodal feature learning with language-based interpretability, this study bridges the gap between visual recognition and clinical understanding, offering a novel vision-language paradigm for reliable and explainable Parkinsons disease gait analysis. Code:https://github.com/manaralnaasan/RGB-D_parkinson-LLM

[153]  arXiv:2512.04426 [pdf, ps, other]
Title: Self-Paced and Self-Corrective Masked Prediction for Movie Trailer Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As a challenging video editing task, movie trailer generation involves selecting and reorganizing movie shots to create engaging trailers. Currently, most existing automatic trailer generation methods employ a "selection-then-ranking" paradigm (i.e., first selecting key shots and then ranking them), which suffers from inevitable error propagation and limits the quality of the generated trailers. Beyond this paradigm, we propose a new self-paced and self-corrective masked prediction method called SSMP, which achieves state-of-the-art results in automatic trailer generation via bi-directional contextual modeling and progressive self-correction. In particular, SSMP trains a Transformer encoder that takes the movie shot sequences as prompts and generates corresponding trailer shot sequences accordingly. The model is trained via masked prediction, reconstructing each trailer shot sequence from its randomly masked counterpart. The mask ratio is self-paced, allowing the task difficulty to adapt to the model and thereby improving model performance. When generating a movie trailer, the model fills the shot positions with high confidence at each step and re-masks the remaining positions for the next prediction, forming a progressive self-correction mechanism that is analogous to how human editors work. Both quantitative results and user studies demonstrate the superiority of SSMP in comparison to existing automatic movie trailer generation methods. Demo is available at: https://github.com/Dixin-Lab/SSMP.

[154]  arXiv:2512.04436 [pdf, ps, other]
Title: ReFuzz: Reusing Tests for Processor Fuzzing with Contextual Bandits
Comments: To be published in the proceedings of the Network and Distributed System Security (NDSS) Symposium, 2026
Subjects: Cryptography and Security (cs.CR)

Processor designs rely on iterative modifications and reuse well-established designs. However, this reuse of prior designs also leads to similar vulnerabilities across multiple processors. As processors grow increasingly complex with iterative modifications, efficiently detecting vulnerabilities from modern processors is critical. Inspired by software fuzzing, hardware fuzzing has recently demonstrated its effectiveness in detecting processor vulnerabilities. Yet, to our best knowledge, existing processor fuzzers fuzz each design individually, lacking the capability to understand known vulnerabilities in prior processors to fine-tune fuzzing to identify similar or new variants of vulnerabilities.
To address this gap, we present ReFuzz, an adaptive fuzzing framework that leverages contextual bandit to reuse highly effective tests from prior processors to fuzz a processor-under-test (PUT) within a given ISA. By intelligently mutating tests that trigger vulnerabilities in prior processors, ReFuzz effectively detects similar and new variants of vulnerabilities in PUTs. ReFuzz uncovered three new security vulnerabilities and two new functional bugs. ReFuzz detected one vulnerability by reusing a test that triggers a known vulnerability in a prior processor. One functional bug exists across three processors that share design modules. The second bug has two variants. Additionally, ReFuzz reuses highly effective tests to enhance efficiency in coverage, achieving an average 511.23x coverage speedup and up to 9.33% more total coverage, compared to existing fuzzers.

[155]  arXiv:2512.04439 [pdf, ps, other]
Title: Quantum-Accelerated Deep Reinforcement Learning for Frequency Regulation Enhancement
Subjects: Systems and Control (eess.SY)

In modern power systems, frequency regulation is a fundamental prerequisite for ensuring system reliability and assessing the robustness of expansion projects. Conventional feedback control schemes, however, exhibit limited accuracy under varying operating conditions because their gains remain static. Consequently, deep reinforcement learning methods are increasingly employed to design adaptive controllers that can be generalized to diverse frequency control tasks. At the same time, recent advances in quantum computing provide avenues for embedding quantum capabilities into such critical applications. In particular, the potential of quantum algorithms can be more effectively explored and harnessed on near-term quantum devices by leveraging insights from active controller design. In this work, we incorporate a quantum circuit together with an ansatz into the operation of a deep deterministic policy gradient agent. The simulation results of the IEEE 14-bus test system demonstrate the potential of this integrated approach that can achieve reliable, robust performance across diverse real-world challenges.

[156]  arXiv:2512.04441 [pdf, ps, other]
Title: MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV)

End-to-End autonomous driving (E2E-AD) has emerged as a new paradigm, where trajectory planning plays a crucial role. Existing studies mainly follow two directions: trajectory generation oriented, which focuses on producing high-quality trajectories with simple decision mechanisms, and trajectory selection oriented, which performs multi-dimensional evaluation to select the best trajectory yet lacks sufficient generative capability. In this work, we propose MindDrive, a harmonized framework that integrates high-quality trajectory generation with comprehensive decision reasoning. It establishes a structured reasoning paradigm of "context simulation - candidate generation - multi-objective trade-off". In particular, the proposed Future-aware Trajectory Generator (FaTG), based on a World Action Model (WaM), performs ego-conditioned "what-if" simulations to predict potential future scenes and generate foresighted trajectory candidates. Building upon this, the VLM-oriented Evaluator (VLoE) leverages the reasoning capability of a large vision-language model to conduct multi-objective evaluations across safety, comfort, and efficiency dimensions, leading to reasoned and human-aligned decision making. Extensive experiments on the NAVSIM-v1 and NAVSIM-v2 benchmarks demonstrate that MindDrive achieves state-of-the-art performance across multi-dimensional driving metrics, significantly enhancing safety, compliance, and generalization. This work provides a promising path toward interpretable and cognitively guided autonomous driving.

[157]  arXiv:2512.04442 [pdf, ps, other]
Title: TaskEval: Synthesised Evaluation for Foundation-Model Tasks
Comments: 5 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Hallucinations are a key concern when creating applications that rely on Foundation models (FMs). Understanding where and how these subtle failures occur in an application relies on evaluation methods known as \textit{evals}. Prior work focuses on defining new eval methods or benchmark datasets for specific tasks. However, neither helps a software team with a task-specific FM application when there is no metric or dataset. The demand for both automated approaches and deep integration of human insight makes this a challenging problem. We address this gap by proposing an approach to synthesise a FM task-specific evaluator program that provides automation and a custom UI for capturing feedback. The core novelty of our approach lies in: (1) a task-agnostic meta-model that captures properties of any FM task, (2) an interaction protocol for efficient use of human feedback, and (3) an eval synthesiser that selects or generates an appropriate set of evals. We implement our approach in \toolname and demonstrate the concept on two diverse FM tasks: chart data extraction and document question answering. A preliminary evaluation on the quality of our selected evals shows 93\% and 90\% accuracy respectively. Our research tackles a growing problem facing engineering teams, how to evaluate and review outputs from FM tasks.

[158]  arXiv:2512.04443 [pdf, ps, other]
Title: MD-SNN: Membrane Potential-aware Distillation on Quantized Spiking Neural Network
Subjects: Neural and Evolutionary Computing (cs.NE)

Spiking Neural Networks (SNNs) offer a promising and energy-efficient alternative to conventional neural networks, thanks to their sparse binary activation. However, they face challenges regarding memory and computation overhead due to complex spatio-temporal dynamics and the necessity for multiple backpropagation computations across timesteps during training. To mitigate this overhead, compression techniques such as quantization are applied to SNNs. Yet, naively applying quantization to SNNs introduces a mismatch in membrane potential, a crucial factor for the firing of spikes, resulting in accuracy degradation. In this paper, we introduce Membrane-aware Distillation on quantized Spiking Neural Network (MD-SNN), which leverages membrane potential to mitigate discrepancies after weight, membrane potential, and batch normalization quantization. To our knowledge, this study represents the first application of membrane potential knowledge distillation in SNNs. We validate our approach on various datasets, including CIFAR10, CIFAR100, N-Caltech101, and TinyImageNet, demonstrating its effectiveness for both static and dynamic data scenarios. Furthermore, for hardware efficiency, we evaluate the MD-SNN with SpikeSim platform, finding that MD-SNNs achieve 14.85X lower energy-delay-area product (EDAP), 2.64X higher TOPS/W, and 6.19X higher TOPS/mm2 compared to floating point SNNs at iso-accuracy on N-Caltech101 dataset.

[159]  arXiv:2512.04445 [pdf, ps, other]
Title: Automating Complex Document Workflows via Stepwise and Rollback-Enabled Operation Orchestration
Comments: 9 pages, 3 figures, accepted by AAAI-2026
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Workflow automation promises substantial productivity gains in everyday document-related tasks. While prior agentic systems can execute isolated instructions, they struggle with automating multi-step, session-level workflows due to limited control over the operational process. To this end, we introduce AutoDW, a novel execution framework that enables stepwise, rollback-enabled operation orchestration. AutoDW incrementally plans API actions conditioned on user instructions, intent-filtered API candidates, and the evolving states of the document. It further employs robust rollback mechanisms at both the argument and API levels, enabling dynamic correction and fault tolerance. These designs together ensure that the execution trajectory of AutoDW remains aligned with user intent and document context across long-horizon workflows. To assess its effectiveness, we construct a comprehensive benchmark of 250 sessions and 1,708 human-annotated instructions, reflecting realistic document processing scenarios with interdependent instructions. AutoDW achieves 90% and 62% completion rates on instruction- and session-level tasks, respectively, outperforming strong baselines by 40% and 76%. Moreover, AutoDW also remains robust for the decision of backbone LLMs and on tasks with varying difficulty. Code and data will be open-sourced. Code: https://github.com/YJett/AutoDW

[160]  arXiv:2512.04446 [pdf, ps, other]
Title: Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Automating disassembly of critical components from end-of-life (EoL) desktops, such as high-value items like RAM modules and CPUs, as well as sensitive parts like hard disk drives, remains challenging due to the inherent variability and uncertainty of these products. Moreover, their disassembly requires sequential, precise, and dexterous operations, further increasing the complexity of automation. Current robotic disassembly processes are typically divided into several stages: perception, sequence planning, task planning, motion planning, and manipulation. Each stage requires explicit modeling, which limits generalization to unfamiliar scenarios. Recent development of vision-language-action (VLA) models has presented an end-to-end approach for general robotic manipulation tasks. Although VLAs have demonstrated promising performance on simple tasks, the feasibility of applying such models to complex disassembly remains largely unexplored. In this paper, we collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches, OpenVLA and OpenVLA-OFT, as a case study. We divided the whole disassembly task into several small steps, and our preliminary experimental results indicate that the fine-tuned VLA models can faithfully complete multiple early steps but struggle with certain critical subtasks, leading to task failure. However, we observed that a simple hybrid strategy that combines VLA with a rule-based controller can successfully perform the entire disassembly operation. These findings highlight the current limitations of VLA models in handling the dexterity and precision required for robotic EoL product disassembly. By offering a detailed analysis of the observed results, this study provides insights that may inform future research to address current challenges and advance end-to-end robotic automated disassembly.

[161]  arXiv:2512.04448 [pdf, ps, other]
Title: Has ACL Lost Its Crown? A Decade-Long Quantitative Analysis of Scale and Impact Across Leading AI Conferences
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY)

The recent surge of language models has rapidly expanded NLP research, driving an exponential rise in submissions and acceptances at major conferences. Yet this growth has been shadowed by escalating concerns over conference quality, e.g., plagiarism, reviewer inexperience and collusive bidding. However, existing studies rely largely on qualitative accounts (e.g., expert interviews, social media discussions, etc.), lacking longitudinal empirical evidence. To fill this gap, we conduct a ten year empirical study spanning seven leading conferences. We build a four dimensional bibliometric framework covering conference scale, core citation statistics,impact dispersion, cross venue and journal influence, etc. Notably, we further propose a metric Quality Quantity Elasticity, which measures the elasticity of citation growth relative to acceptance growth. Our findings show that ML venues sustain dominant and stable impact, NLP venues undergo widening stratification with mixed expansion efficiency, and AI venues exhibit structural decline. This study provides the first decade-long, cross-venue empirical evidence on the evolution of major conferences.

[162]  arXiv:2512.04449 [pdf, ps, other]
Title: Offloading to CXL-based Computational Memory
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

CXL-based Computational Memory (CCM) enables near-memory processing within expanded remote memory, presenting opportunities to address data movement costs associated with disaggregated memory systems and to accelerate overall performance. However, existing operation offloading mechanisms are not capable of leveraging the trade-offs of different models based on different CXL protocols. This work first examines these tradeoffs and demonstrates their impact on end-to-end performance and system efficiency for workloads with diverse data and processing requirements. We propose a novel 'Asynchronous Back-Streaming' protocol by carefully layering data and control transfer operations on top of the underlying CXL protocols. We design KAI, a system that realizes the asynchronous back-streaming model that supports asynchronous data movement and lightweight pipelining in host-CCM interactions. Overall, KAI reduces end-to-end runtime by up to 50.4%, and CCM and host idle times by average 22.11x and 3.85x, respectively.

[163]  arXiv:2512.04451 [pdf, ps, other]
Title: StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As embodied intelligence advances toward real-world deployment, the ability to continuously perceive and reason over streaming visual inputs becomes essential. In such settings, an agent must maintain situational awareness of its environment, comprehend the interactions with surrounding entities, and dynamically plan actions informed by past observations, current contexts, and anticipated future events. To facilitate progress in this direction, we introduce StreamEQA, the first benchmark designed for streaming video question answering in embodied scenarios. StreamEQA evaluates existing MLLMs along two orthogonal dimensions: Embodied and Streaming. Along the embodied dimension, we categorize the questions into three levels: perception, interaction, and planning, which progressively assess a model's ability to recognize fine-grained visual details, reason about agent-object interactions, and perform high-level goal-directed reasoning. For the streaming dimension, questions are divided into backward, real-time, and forward reasoning, with each mode relying on a distinct temporal context. Built upon 156 independent long videos, StreamEQA defines 42 tasks and generates approximately 21K question-answer pairs with precise timestamps through a hybrid pipeline combining automated generation and human refinement. Evaluations of 13 state-of-the-art video-LLMs reveal that, despite strong performance on conventional benchmarks, these models still struggle with streaming video understanding in embodied scenarios. We hope StreamEQA will catalyze research on streaming video understanding for embodied applications.

[164]  arXiv:2512.04453 [pdf, ps, other]
Title: Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
Comments: Accepted to ACM/IEEE International Conference on Human-Robot Interaction, 2026 (HRI 2026), 10 pages, 4 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.

[165]  arXiv:2512.04456 [pdf, ps, other]
Title: GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis
Comments: AAAI2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent image denoising methods have leveraged generative modeling for real noise synthesis to address the costly acquisition of real-world noisy data. However, these generative models typically require camera metadata and extensive target-specific noisy-clean image pairs, often showing limited generalization between settings. In this paper, to mitigate the prerequisites, we propose a Single-Pair Guided Diffusion for generalized noise synthesis GuidNoise, which uses a single noisy/clean pair as the guidance, often easily obtained by itself within a training set. To train GuidNoise, which generates synthetic noisy images from the guidance, we introduce a guidance-aware affine feature modification (GAFM) and a noise-aware refine loss to leverage the inherent potential of diffusion models. This loss function refines the diffusion model's backward process, making the model more adept at generating realistic noise distributions. The GuidNoise synthesizes high-quality noisy images under diverse noise environments without additional metadata during both training and inference. Additionally, GuidNoise enables the efficient generation of noisy-clean image pairs at inference time, making synthetic noise readily applicable for augmenting training data. This self-augmentation significantly improves denoising performance, especially in practical scenarios with lightweight models and limited training data. The code is available at https://github.com/chjinny/GuidNoise.

[166]  arXiv:2512.04457 [pdf, ps, other]
Title: RapidUn: Influence-Driven Parameter Reweighting for Efficient Large Language Model Unlearning
Comments: Code available at: this https URL
Subjects: Computation and Language (cs.CL)

Removing specific data influence from large language models (LLMs) remains challenging, as retraining is costly and existing approximate unlearning methods are often unstable. The challenge is exacerbated when the forget set is small or imbalanced. We introduce RapidUn, an influence-driven and parameter-efficient unlearning framework. It first estimates per-sample influence through a fast estimation module, then maps these scores into adaptive update weights that guide selective parameter updates -- forgetting harmful behavior while retaining general knowledge. On Mistral-7B and Llama-3-8B across Dolly-15k and Alpaca-57k, RapidUn achieves up to 100 times higher efficiency than full retraining and consistently outperforms Fisher, GA, and LoReUn on both in-distribution and out-of-distribution forgetting. These results establish influence-guided parameter reweighting as a scalable and interpretable paradigm for LLM unlearning.

[167]  arXiv:2512.04459 [pdf, ps, other]
Title: dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The autonomous driving community is increasingly focused on addressing the challenges posed by out-of-distribution (OOD) driving scenarios. A dominant research trend seeks to enhance end-to-end (E2E) driving systems by integrating vision-language models (VLMs), leveraging their rich world knowledge and reasoning abilities to improve generalization across diverse environments. However, most existing VLMs or vision-language agents (VLAs) are built upon autoregressive (AR) models. In this paper, we observe that existing AR-based VLMs -- limited by causal attention and sequential token generation -- often fail to maintain consistency and controllability between high-level reasoning and low-level planning. In contrast, recent discrete diffusion VLMs equipped with bidirectional attention exhibit superior controllability and reliability through iterative denoising. Building on these observations, we introduce dVLM-AD, a diffusion-based vision-language model that unifies perception, structured reasoning, and low-level planning for end-to-end driving. Evaluated on nuScenes and WOD-E2E, dVLM-AD yields more consistent reasoning-action pairs and achieves planning performance comparable to existing driving VLM/VLA systems despite a modest backbone, outperforming AR-based baselines with a 9 percent improvement in behavior-trajectory consistency and a 6 percent increase in RFS on long-tail WOD-E2E scenarios. These results suggest a controllable and reliable pathway for scalable end-to-end driving.

[168]  arXiv:2512.04461 [pdf, ps, other]
Title: UniTS: Unified Time Series Generative Model for Remote Sensing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One of the primary objectives of satellite remote sensing is to capture the complex dynamics of the Earth environment, which encompasses tasks such as reconstructing continuous cloud-free time series images, detecting land cover changes, and forecasting future surface evolution. However, existing methods typically require specialized models tailored to different tasks, lacking unified modeling of spatiotemporal features across multiple time series tasks. In this paper, we propose a Unified Time Series Generative Model (UniTS), a general framework applicable to various time series tasks, including time series reconstruction, time series cloud removal, time series semantic change detection, and time series forecasting. Based on the flow matching generative paradigm, UniTS constructs a deterministic evolution path from noise to targets under the guidance of task-specific conditions, achieving unified modeling of spatiotemporal representations for multiple tasks. The UniTS architecture consists of a diffusion transformer with spatio-temporal blocks, where we design an Adaptive Condition Injector (ACor) to enhance the model's conditional perception of multimodal inputs, enabling high-quality controllable generation. Additionally, we design a Spatiotemporal-aware Modulator (STM) to improve the ability of spatio-temporal blocks to capture complex spatiotemporal dependencies. Furthermore, we construct two high-quality multimodal time series datasets, TS-S12 and TS-S12CR, filling the gap of benchmark datasets for time series cloud removal and forecasting tasks. Extensive experiments demonstrate that UniTS exhibits exceptional generative and cognitive capabilities in both low-level and high-level time series tasks. It significantly outperforms existing methods, particularly when facing challenges such as severe cloud contamination, modality absence, and forecasting phenological variations.

[169]  arXiv:2512.04463 [pdf, ps, other]
Title: MARL Warehouse Robots
Comments: 6 pages, 4 tables. Project documentation: this https URL
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and a custom Unity 3D simulation. Our experiments reveal that QMIX's value decomposition significantly outperforms independent learning approaches (achieving 3.25 mean return vs. 0.38 for advanced IPPO), but requires extensive hyperparameter tuning -- particularly extended epsilon annealing (5M+ steps) for sparse reward discovery. We demonstrate successful deployment in Unity ML-Agents, achieving consistent package delivery after 1M training steps. While MARL shows promise for small-scale deployments (2-4 robots), significant scaling challenges remain. Code and analyses: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/

[170]  arXiv:2512.04464 [pdf, ps, other]
Title: Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double Eagles
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

We challenge the common belief that deep learning always trumps older techniques, using the example of grading Saint-Gaudens Double Eagle gold coins automatically. In our work, we put a feature-based Artificial Neural Network built around 192 custom features pulled from Sobel edge detection and HSV color analysis up against a hybrid Convolutional Neural Network that blends in EfficientNetV2, plus a straightforward Support Vector Machine as the control. Testing 1,785 coins graded by experts, the ANN nailed 86% exact matches and hit 98% when allowing a 3-grade leeway. On the flip side, CNN and SVM mostly just guessed the most common grade, scraping by with 31% and 30% exact hits. Sure, the CNN looked good on broader tolerance metrics, but that is because of some averaging trick in regression that hides how it totally flops at picking out specific grades. All told, when you are stuck with under 2,000 examples and lopsided classes, baking in real coin-expert knowledge through feature design beats out those inscrutable, all-in-one deep learning setups. This rings true for other niche quality checks where data's thin and know-how matters more than raw compute.

[171]  arXiv:2512.04469 [pdf, ps, other]
Title: Mathematical Framing for Different Agent Strategies
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We introduce a unified mathematical and probabilistic framework for understanding and comparing diverse AI agent strategies. We bridge the gap between high-level agent design concepts, such as ReAct, multi-agent systems, and control flows, and a rigorous mathematical formulation. Our approach frames agentic processes as a chain of probabilities, enabling a detailed analysis of how different strategies manipulate these probabilities to achieve desired outcomes. Our framework provides a common language for discussing the trade-offs inherent in various agent architectures. One of our many key contributions is the introduction of the "Degrees of Freedom" concept, which intuitively differentiates the optimizable levers available for each approach, thereby guiding the selection of appropriate strategies for specific tasks. This work aims to enhance the clarity and precision in designing and evaluating AI agents, offering insights into maximizing the probability of successful actions within complex agentic systems.

[172]  arXiv:2512.04470 [pdf, ps, other]
Title: Joint Low-Rank and Sparse Bayesian Channel Estimation for Ultra-Massive MIMO Communications
Comments: 5 pages, 4 figures, To appear in IEEE Communications Letters
Journal-ref: IEEE Communications Letters, 2026
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This letter investigates channel estimation for ultra-massive multiple-input multiple-output (MIMO) communications. We propose a joint low-rank and sparse Bayesian estimation (LRSBE) algorithm for spatial non-stationary ultra-massive channels by exploiting the low-rankness and sparsity in the beam domain. Specifically, the channel estimation integrates sparse Bayesian learning and soft-threshold gradient descent within the expectation-maximization framework. Simulation results show that the proposed algorithm significantly outperforms the state-of-the-art alternatives under different signal-to-noise ratio conditions in terms of estimation accuracy and overall complexity.

[173]  arXiv:2512.04472 [pdf, ps, other]
Title: Optimizing DER Aggregate Flexibility via Network Reconfiguration
Comments: 10 pages, 4 figures, 5 tables
Subjects: Systems and Control (eess.SY)

The aggregate flexibility region of distributed energy resources (DERs) quantifies the aggregate power shaping capabilities of DERs. It characterizes the distribution network's potential for wholesale market participation and grid service provision at the transmission level. To enhance flexibility and fully exploit the potential of DERs, this paper proposes a method to optimize the aggregate flexibility region through distribution network reconfiguration. First, we formulate the ellipsoidal aggregate flexibility region characterization problem as a two-stage adaptive robust optimization problem and derive an exact convex reformulation with a large number of second-order cone constraints. By exploiting the problem structure, we propose a scalable Benders decomposition algorithm with provable finite convergence to the optimal solution. Finally, we propose an optimal reconfiguration problem for aggregate flexibility region optimization and solve it using the custom Benders decomposition. Numerical simulations on the IEEE 123-bus test feeder demonstrate that, compared to existing approaches, substantial improvements in the aggregate flexibility region can be achieved over multiple scenarios with the optimized topology.

[174]  arXiv:2512.04474 [pdf, ps, other]
Title: LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models
Subjects: Software Engineering (cs.SE)

Log parsing transforms raw logs into structured templates containing constants and variables. It underpins anomaly detection, failure diagnosis, and other AIOps tasks. Current parsers are mostly reactive and log-centric. They only infer templates from logs, mostly overlooking the source code. This restricts their capacity to grasp dynamic log structures or adjust to evolving systems. Moreover, per-log LLM inference is too costly for practical deployment. In this paper, we propose LLM-SrcLog, a proactive and unified framework for log template parsing. It extracts templates directly from source code prior to deployment and supplements them with data-driven parsing for logs without available code. LLM-SrcLog integrates a cross-function static code analyzer to reconstruct meaningful logging contexts, an LLM-based white-box template extractor with post-processing to distinguish constants from variables, and a black-box template extractor that incorporates data-driven clustering for remaining unmatched logs. Experiments on two public benchmarks (Hadoop and Zookeeper) and a large-scale industrial system (Sunfire-Compute) show that, compared to two LLM-based baselines, LLM-SrcLog improves average F1-score by 2-17% and 8-35%. Meanwhile, its online parsing latency is comparable to data-driven methods and about 1,000 times faster than per-log LLM parsing. LLM-SrcLog achieves a near-ideal balance between speed and accuracy. Finally, we further validate the effectiveness of LLM-SrcLog through practical case studies in a real-world production environment.

[175]  arXiv:2512.04475 [pdf, ps, other]
Title: GraphBench: Next-generation graph learning benchmarking
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, which hampers reproducibility and broader progress. To address this, we introduce GraphBench, a comprehensive benchmarking suite that spans diverse domains and prediction tasks, including node-level, edge-level, graph-level, and generative settings. GraphBench provides standardized evaluation protocols -- with consistent dataset splits and performance metrics that account for out-of-distribution generalization -- as well as a unified hyperparameter tuning framework. Additionally, we benchmark GraphBench using message-passing neural networks and graph transformer models, providing principled baselines and establishing a reference performance. See www.graphbench.io for further details.

[176]  arXiv:2512.04476 [pdf, ps, other]
Title: Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)

Mixture-of-Experts (MoE) models scale large language models through conditional computation, but inference becomes memory-bound once expert weights exceed the capacity of GPU memory. In this case, weights must be offloaded to external memory, and fetching them incurs costly and repeated transfers. We address this by adopting CXL-attached near-data processing (CXL-NDP) as the offloading tier to execute cold experts in place, converting expensive parameter movement into cheaper activation movement. Unlike prior GPU-NDP systems that are largely context-agnostic and reactive, we develop a context-aware MoE system that uses prefill-stage activation statistics to guide decoding-stage expert placement, dynamically pins hot experts in GPU-side HBM, and maps the remainder to CXL-NDP. To meet NDP's limited compute throughput, we introduce context-aware mixed-precision quantization that allocates per-expert bitwidths (1-4 bit) based on prefill stage. The resulting MoE inference system overlaps GPU and NDP execution while minimizing cross-device movement. The evaluation on the GPU-NDP system shows that our approach achieves up to an 8.7-fold decoding throughput improvement over the state-of-the-art method, while incurring only a 0.13% average accuracy drop.

[177]  arXiv:2512.04480 [pdf, ps, other]
Title: AI-Assisted Game Management Decisions: A Fuzzy Logic Approach to Real-Time Substituitions
Authors: Pedro Passos
Comments: 33 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY); Optimization and Control (math.OC)

In elite soccer, substitution decisions entail significant financial and sporting consequences yet remain heavily reliant on intuition or predictive models that merely mimic historical biases. This paper introduces a Fuzzy Logic based Decision Support System (DSS) designed for real time, prescriptive game management. Unlike traditional Machine Learning approaches that encounter a predictive ceiling by attempting to replicate human behavior, our system audits performance through an objective, rule based inference engine. We propose a methodological advancement by reformulating the PlayeRank metric into a Cumulative Mean with Role Aware Normalization, eliminating the play time exposure bias inherent in cumulative sum models to enable accurate intra match comparison. The system integrates this refined metric with physiological proxies (fatigue) and contextual variables (disciplinary risk modulated by tactical role) to calculate a dynamic Substitution Priority (P final). Validation via a case study of the 2018 FIFA World Cup match between Brazil and Belgium demonstrates the system's ecological validity: it not only aligned with expert consensus on executed substitutions (for example Gabriel Jesus) but, crucially, identified high risk scenarios ignored by human decision makers. Specifically, the model flagged the "FAGNER Paradox" - a maximum priority defensive risk - minutes before a critical yellow card, and detected the "Lukaku Paradox", where an isolated assist masked a severe drop in participation. These results confirm that Fuzzy Logic offers a transparent, explainable, and superior alternative to black box models for optimizing real time tactical decisions.

[178]  arXiv:2512.04483 [pdf, ps, other]
Title: DeRA: Decoupled Representation Alignment for Video Tokenization
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents DeRA, a novel 1D video tokenizer that decouples the spatial-temporal representation learning in video tokenization to achieve better training efficiency and performance. Specifically, DeRA maintains a compact 1D latent space while factorizing video encoding into appearance and motion streams, which are aligned with pretrained vision foundation models to capture the spatial semantics and temporal dynamics in videos separately. To address the gradient conflicts introduced by the heterogeneous supervision, we further propose the Symmetric Alignment-Conflict Projection (SACP) module that proactively reformulates gradients by suppressing the components along conflicting directions. Extensive experiments demonstrate that DeRA outperforms LARP, the previous state-of-the-art video tokenizer by 25% on UCF-101 in terms of rFVD. Moreover, using DeRA for autoregressive video generation, we also achieve new state-of-the-art results on both UCF-101 class-conditional generation and K600 frame prediction.

[179]  arXiv:2512.04485 [pdf, ps, other]
Title: Not All Birds Look The Same: Identity-Preserving Generation For Birds
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Since the advent of controllable image generation, increasingly rich modes of control have enabled greater customization and accessibility for everyday users. Zero-shot, identity-preserving models such as Insert Anything and OminiControl now support applications like virtual try-on without requiring additional fine-tuning. While these models may be fitting for humans and rigid everyday objects, they still have limitations for non-rigid or fine-grained categories. These domains often lack accessible, high-quality data -- especially videos or multi-view observations of the same subject -- making them difficult both to evaluate and to improve upon. Yet, such domains are essential for moving beyond content creation toward applications that demand accuracy and fine detail. Birds are an excellent domain for this task: they exhibit high diversity, require fine-grained cues for identification, and come in a wide variety of poses. We introduce the NABirds Look-Alikes (NABLA) dataset, consisting of 4,759 expert-curated image pairs. Together with 1,073 pairs collected from multi-image observations on iNaturalist and a small set of videos, this forms a benchmark for evaluating identity-preserving generation of birds. We show that state-of-the-art baselines fail to maintain identity on this dataset, and we demonstrate that training on images grouped by species, age, and sex -- used as a proxy for identity -- substantially improves performance on both seen and unseen species.

[180]  arXiv:2512.04487 [pdf, ps, other]
Title: Controllable Long-term Motion Generation with Extended Joint Targets
Comments: WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating stable and controllable character motion in real-time is a key challenge in computer animation. Existing methods often fail to provide fine-grained control or suffer from motion degradation over long sequences, limiting their use in interactive applications. We propose COMET, an autoregressive framework that runs in real time, enabling versatile character control and robust long-horizon synthesis. Our efficient Transformer-based conditional VAE allows for precise, interactive control over arbitrary user-specified joints for tasks like goal-reaching and in-betweening from a single model. To ensure long-term temporal stability, we introduce a novel reference-guided feedback mechanism that prevents error accumulation. This mechanism also serves as a plug-and-play stylization module, enabling real-time style transfer. Extensive evaluations demonstrate that COMET robustly generates high-quality motion at real-time speeds, significantly outperforming state-of-the-art approaches in complex motion control tasks and confirming its readiness for demanding interactive applications.

[181]  arXiv:2512.04488 [pdf, ps, other]
Title: Persona-based Multi-Agent Collaboration for Brainstorming
Comments: 12 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

We demonstrate the importance of persona-based multi-agents brainstorming for both diverse topics and subject matter ideation. Prior work has shown that generalized multi-agent collaboration often provides better reasoning than a single agent alone. In this paper, we propose and develop a framework for persona-based agent selection, showing how persona domain curation can improve brainstorming outcomes. Using multiple experimental setups, we evaluate brainstorming outputs across different persona pairings (e.g., Doctor vs VR Engineer) and A2A (agent-to-agent) dynamics (separate, together, separate-then-together). Our results show that (1) persona choice shapes idea domains, (2) collaboration mode shifts diversity of idea generation, and (3) multi-agent persona-driven brainstorming produces idea depth and cross-domain coverage.

[182]  arXiv:2512.04489 [pdf, ps, other]
Title: The Decision Path to Control AI Risks Completely: Fundamental Control Mechanisms for AI Governance
Authors: Yong Tao
Comments: 39 pages, 6 figures, 1 table
Subjects: Computers and Society (cs.CY)

Artificial intelligence (AI) advances rapidly but achieving complete human control over AI risks remains an unsolved problem, akin to driving the fast AI "train" without a "brake system." By exploring fundamental control mechanisms at key elements of AI decisions, this paper develops a systematic solution to thoroughly control AI risks, providing an architecture for AI governance and legislation with five pillars supported by six control mechanisms, illustrated through a minimum set of AI Mandates (AIMs). Three of the AIMs must be built inside AI systems and three in society to address major areas of AI risks: 1) align AI values with human users; 2) constrain AI decision-actions by societal ethics, laws, and regulations; 3) build in human intervention options for emergencies and shut-off switches for existential threats; 4) limit AI access to resources to reinforce controls inside AI; 5) mitigate spillover risks like job loss from AI. We also highlight the differences in AI governance on physical AI systems versus generative AI. We discuss how to strengthen analog physical safeguards to prevent smarter AI/AGI/ASI from circumventing core safety controls by exploiting AI's intrinsic disconnect from the analog physical world: AI's nature as pure software code run on chips controlled by humans, and the prerequisite that all AI-driven physical actions must be digitized. These findings establish a theoretical foundation for AI governance and legislation as the basic structure of a "brake system" for AI decisions. If enacted, these controls can rein in AI dangers as completely as humanly possible, removing large chunks of currently wide-open AI risks, substantially reducing overall AI risks to residual human errors.

[183]  arXiv:2512.04492 [pdf, ps, other]
Title: MSME: A Multi-Stage Multi-Expert Framework for Zero-Shot Stance Detection
Subjects: Computation and Language (cs.CL)

LLM-based approaches have recently achieved impressive results in zero-shot stance detection. However, they still struggle in complex real-world scenarios, where stance understanding requires dynamic background knowledge, target definitions involve compound entities or events that must be explicitly linked to stance labels, and rhetorical devices such as irony often obscure the author's actual intent. To address these challenges, we propose MSME, a Multi-Stage, Multi-Expert framework for zero-shot stance detection. MSME consists of three stages: (1) Knowledge Preparation, where relevant background knowledge is retrieved and stance labels are clarified; (2) Expert Reasoning, involving three specialized modules-Knowledge Expert distills salient facts and reasons from a knowledge perspective, Label Expert refines stance labels and reasons accordingly, and Pragmatic Expert detects rhetorical cues such as irony to infer intent from a pragmatic angle; (3) Decision Aggregation, where a Meta-Judge integrates all expert analyses to produce the final stance prediction. Experiments on three public datasets show that MSME achieves state-of-the-art performance across the board.

[184]  arXiv:2512.04496 [pdf, ps, other]
Title: Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Inevitable specular highlights in practical environments severely impair the visual performance, thus degrading the task effectiveness and efficiency. Although there exist considerable methods that focus on local information from convolutional neural network models or global information from transformer models, the single-type model falls into a modeling dilemma between local fine-grained details and global long-range dependencies, thus deteriorating for specular highlights with different scales. Therefore, to accommodate specular highlights of all scales, we propose a multi-model architecture for specular highlight removal (MM-SHR) that effectively captures fine-grained features in highlight regions and models long-range dependencies between highlight and highlight-free areas. Specifically, we employ convolution operations to extract local details in the shallow layers of MM-SHR, and utilize the attention mechanism to capture global features in the deep layers, ensuring both operation efficiency and removal accuracy. To model long-range dependencies without compromising computational complexity, we utilize a coarse-to-fine manner and propose Omni-Directional Attention Integration Block(OAIBlock) and Adaptive Region-Aware Hybrid-Domain Dual Attention Convolutional Network(HDDAConv) , which leverage omni-directiona pixel-shifting and window-dividing operations at the raw features to achieve specular highlight removal. Extensive experimental results on three benchmark tasks and six types of surface materials demonstrate that MM-SHR outperforms state-of-the-art methods in both accuracy and efficiency for specular highlight removal. The implementation will be made publicly available at https://github.com/Htcicv/MM-SHR.

[185]  arXiv:2512.04499 [pdf, ps, other]
Title: Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Diffusion models have emerged as a widely utilized and successful methodology in human motion synthesis. Task-oriented diffusion models have significantly advanced action-to-motion, text-to-motion, and audio-to-motion applications. In this paper, we investigate fundamental questions regarding motion representations and loss functions in a controlled study, and we enumerate the impacts of various decisions in the workflow of the generative motion diffusion model. To answer these questions, we conduct empirical studies based on a proxy motion diffusion model (MDM). We apply v loss as the prediction objective on MDM (vMDM), where v is the weighted sum of motion data and noise. We aim to enhance the understanding of latent data distributions and provide a foundation for improving the state of conditional motion diffusion models. First, we evaluate the six common motion representations in the literature and compare their performance in terms of quality and diversity metrics. Second, we compare the training time under various configurations to shed light on how to speed up the training process of motion diffusion models. Finally, we also conduct evaluation analysis on a large motion dataset. The results of our experiments indicate clear performance differences across motion representations in diverse datasets. Our results also demonstrate the impacts of distinct configurations on model training and suggest the importance and effectiveness of these decisions on the outcomes of motion diffusion models.

[186]  arXiv:2512.04500 [pdf, ps, other]
Title: A Modular Cognitive Architecture for Assisted Reasoning: The Nemosine Framework
Authors: Edervaldo Melo
Comments: 6 pages, 1 figure. First version
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)

This paper presents the Nemosine Framework, a modular cognitive architecture designed to support assisted reasoning, structured thinking, and systematic analysis. The model operates through functional cognitive modules ("personas") that organize tasks such as planning, evaluation, cross-checking, and narrative synthesis. The framework combines principles from metacognition, distributed cognition, and modular cognitive systems to offer an operational structure for assisted problem-solving and decision support. The architecture is documented through formal specification, internal consistency criteria, and reproducible structural components. The goal is to provide a clear conceptual basis for future computational implementations and to contribute to the study of symbolic-modular architectures for reasoning.

[187]  arXiv:2512.04501 [pdf, ps, other]
Title: One-Step Generative Channel Estimation via Average Velocity Field
Subjects: Information Theory (cs.IT)

Generative models have shown immense potential for wireless communication by learning complex channel data distributions. However, the iterative denoising process associated with these models imposes a significant challenge in latency-sensitive wireless communication scenarios, particularly in channel estimation. To address this challenge, we propose a novel solution for one-step generative channel estimation. Our approach bypasses the time-consuming iterative steps of conventional models by directly learning the average velocity field. Through extensive simulations, we validate the effectiveness of our proposed method over existing state-of-the-art diffusion-based approach. Specifically, our scheme achieves a normalized mean squared error up to 2.65 dB lower than the diffusion method and reduces latency by around 90%, demonstrating the potential of our method to enhance channel estimation performance.

[188]  arXiv:2512.04502 [pdf, ps, other]
Title: One Ring to Rule Them All: Constrained Distributional Control for Massive-Scale Heterogeneous Robotic Ensemble Systems
Comments: 9 pages, 8 figures
Subjects: Robotics (cs.RO)

Ensemble control aims to steer a population of dynamical systems using a shared control input. This paper introduces a constrained ensemble control framework for parameterized, heterogeneous robotic systems operating under state and environmental constraints, such as obstacle avoidance. We develop a moment kernel transform that maps the parameterized ensemble dynamics to the moment system in a kernel space, enabling the characterization of population-level behavior. The state-space constraints, such as polyhedral waypoints to be visited and obstacles to be avoided, are also transformed into the moment space, leading to a unified formulation for safe, large-scale ensemble control. Expressive signal temporal logic specifications are employed to encode complex visit-avoid tasks, which are achieved through a single shared controller synthesized from our constrained ensemble control formulation. Simulation and hardware experiments demonstrate the effectiveness of the proposed approach in safely and efficiently controlling robotic ensembles within constrained environments.

[189]  arXiv:2512.04504 [pdf, ps, other]
Title: UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent image diffusion transformers achieve high-fidelity generation, but struggle to generate images beyond these scales, suffering from content repetition and quality degradation. In this work, we present UltraImage, a principled framework that addresses both issues. Through frequency-wise analysis of positional embeddings, we identify that repetition arises from the periodicity of the dominant frequency, whose period aligns with the training resolution. We introduce a recursive dominant frequency correction to constrain it within a single period after extrapolation. Furthermore, we find that quality degradation stems from diluted attention and thus propose entropy-guided adaptive attention concentration, which assigns higher focus factors to sharpen local attention for fine detail and lower ones to global attention patterns to preserve structural consistency. Experiments show that UltraImage consistently outperforms prior methods on Qwen-Image and Flux (around 4K) across three generation scenarios, reducing repetition and improving visual fidelity. Moreover, UltraImage can generate images up to 6K*6K without low-resolution guidance from a training resolution of 1328p, demonstrating its extreme extrapolation capability. Project page is available at \href{https://thu-ml.github.io/ultraimage.github.io/}{https://thu-ml.github.io/ultraimage.github.io/}.

[190]  arXiv:2512.04511 [pdf, ps, other]
Title: DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Infrared imaging plays a critical role in low-light and adverse weather conditions. However, due to the distinct characteristics of infrared images, existing foundation models such as Masked Autoencoder (MAE) trained on visible data perform suboptimal in infrared image interpretation tasks. To bridge this gap, an infrared foundation model known as InfMAE was developed and pre-trained on large-scale infrared datasets. Despite its effectiveness, InfMAE still faces several limitations, including the omission of informative tokens, insufficient modeling of global associations, and neglect of non-uniform noise. In this paper, we propose a Dual-domain Guided Infrared foundation model based on MAE (DuGI-MAE). First, we design a deterministic masking strategy based on token entropy, preserving only high-entropy tokens for reconstruction to enhance informativeness. Next, we introduce a Dual-Domain Guidance (DDG) module, which simultaneously captures global token relationships and adaptively filters non-uniform background noise commonly present in infrared imagery. To facilitate large-scale pretraining, we construct Inf-590K, a comprehensive infrared image dataset encompassing diverse scenes, various target types, and multiple spatial resolutions. Pretrained on Inf-590K, DuGI-MAE demonstrates strong generalization capabilities across various downstream tasks, including infrared object detection, semantic segmentation, and small target detection. Experimental results validate the superiority of the proposed method over both supervised and self-supervised comparison methods. Our code is available in the supplementary material.

[191]  arXiv:2512.04512 [pdf, ps, other]
Title: Adaptive Time-Domain Harmonic Control for Noise-Vibration-Harshness Reduction of Electric Drives
Subjects: Systems and Control (eess.SY)

Reducing Noise, Vibration, and Harshness (NVH) in electric drives is crucial for applications such as electric vehicle drivetrains and heat-pump compressors, where strict NVH requirements directly affect user satisfaction and component longevity. This work presents the integration of an adaptive time-domain harmonic controller into an existing electric-drive control loop to attenuate harmonic disturbances. Three control structures are proposed and analyzed, along with a modified parameter-estimation scheme that reduces computational effort while preserving estimation accuracy, making the method suitable for embedded real-time implementation. To cope with fast operating-point changes, a delta-learning approach combines adaptive control with a lookup-table-based feedforward estimator, ensuring fast convergence and robustness. The proposed controller architectures are validated through simulation and testbench experiments on a permanent-magnet synchronous machine drive, demonstrating substantial NVH reductions across operating conditions. The results confirm that time-domain adaptive harmonic control offers a practical and theoretically grounded solution for real-time NVH mitigation in electric drives.

[192]  arXiv:2512.04513 [pdf, ps, other]
Title: BiTAgent: A Task-Aware Modular Framework for Bidirectional Coupling between Multimodal Large Language Models and World Models
Subjects: Artificial Intelligence (cs.AI)

Building generalist embodied agents requires a unified system that can interpret multimodal goals, model environment dynamics, and execute reliable actions across diverse real-world tasks. Multimodal large language models (MLLMs) offer strong semantic priors and cross-modal generalization, while world models (WMs) provide actionable latent dynamics for prediction and control. Their combination holds promise for open-ended embodied intelligence, yet introduces two key challenges: (1) establishing a tight coupling between the semantic intent from MLLMs and the dynamic state representations within the WM's latent space, and (2) achieving task-aware adaptability that supports multi-task learning and cross-environment generalization. To address these limitations, we propose BiTAgent, a task-aware dynamic joint framework that enables bidirectional coupling between MLLMs and WMs. BiTAgent establishes two complementary pathways: a forward path that injects MLLM representations into the WM's latent space for semantically guided imagination, and a backward path where WM-generated feedback refines the MLLM's semantic space via dense text-conditioned rewards. This bidirectional interaction is realized through three synergistic components: Task-Aware Dynamic Joint Learning, Task-Aware Behavior Learning, and MLLM-WM Joint Optimization, which together harmonize semantic reasoning and dynamic prediction. Extensive experiments across multi-task and cross-environment settings demonstrate superior stability and generalization over state-of-the-art baselines, marking a step toward open-ended embodied learning.

[193]  arXiv:2512.04514 [pdf, ps, other]
Title: SPLICE: Part-Level 3D Shape Editing from Local Semantic Extraction to Global Neural Mixing
Subjects: Graphics (cs.GR)

Neural implicit representations of 3D shapes have shown great potential in 3D shape editing due to their ability to model high-level semantics and continuous geometric representations. However, existing methods often suffer from limited editability, lack of part-level control, and unnatural results when modifying or rearranging shape parts. In this work, we present SPLICE, a novel part-level neural implicit representation of 3D shapes that enables intuitive, structure-aware, and high-fidelity shape editing. By encoding each shape part independently and positioning them using parameterized Gaussian ellipsoids, SPLICE effectively isolates part-specific features while discarding global context that may hinder flexible manipulation. A global attention-based decoder is then employed to integrate parts coherently, further enhanced by an attention-guiding filtering mechanism that prevents information leakage across symmetric or adjacent components. Through this architecture, SPLICE supports various part-level editing operations, including translation, rotation, scaling, deletion, duplication, and cross-shape part mixing. These operations enable users to flexibly explore design variations while preserving semantic consistency and maintaining structural plausibility. Extensive experiments demonstrate that SPLICE outperforms existing approaches both qualitatively and quantitatively across a diverse set of shape-editing tasks.

[194]  arXiv:2512.04515 [pdf, ps, other]
Title: EgoLCD: Egocentric Video Generation with Long Context Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from content drift, where object identity and scene semantics degrade over time. To address this challenge, we introduce EgoLCD, an end-to-end framework for egocentric long-context video generation that treats long video synthesis as a problem of efficient and stable memory management. EgoLCD combines a Long-Term Sparse KV Cache for stable global context with an attention-based short-term memory, extended by LoRA for local adaptation. A Memory Regulation Loss enforces consistent memory usage, and Structured Narrative Prompting provides explicit temporal guidance. Extensive experiments on the EgoVid-5M benchmark demonstrate that EgoLCD achieves state-of-the-art performance in both perceptual quality and temporal consistency, effectively mitigating generative forgetting and representing a significant step toward building scalable world models for embodied AI. Code: https://github.com/AIGeeksGroup/EgoLCD. Website: https://aigeeksgroup.github.io/EgoLCD.

[195]  arXiv:2512.04518 [pdf, ps, other]
Title: UW-BioNLP at ChemoTimelines 2025: Thinking, Fine-Tuning, and Dictionary-Enhanced LLM Systems for Chemotherapy Timeline Extraction
Comments: To be published in Proceedings of the 7th Clinical Natural Language Processing Workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The ChemoTimelines shared task benchmarks methods for constructing timelines of systemic anticancer treatment from electronic health records of cancer patients. This paper describes our methods, results, and findings for subtask 2 -- generating patient chemotherapy timelines from raw clinical notes. We evaluated strategies involving chain-of-thought thinking, supervised fine-tuning, direct preference optimization, and dictionary-based lookup to improve timeline extraction. All of our approaches followed a two-step workflow, wherein an LLM first extracted chemotherapy events from individual clinical notes, and then an algorithm normalized and aggregated events into patient-level timelines. Each specific method differed in how the associated LLM was utilized and trained. Multiple approaches yielded competitive performances on the test set leaderboard, with fine-tuned Qwen3-14B achieving the best official score of 0.678. Our results and analyses could provide useful insights for future attempts on this task as well as the design of similar tasks.

[196]  arXiv:2512.04519 [pdf, ps, other]
Title: VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Autoregressive (AR) diffusion enables streaming, interactive long-video generation by producing frames causally, yet maintaining coherence over minute-scale horizons remains challenging due to accumulated errors, motion drift, and content repetition. We approach this problem from a memory perspective, treating video synthesis as a recurrent dynamical process that requires coordinated short- and long-term context. We propose VideoSSM, a Long Video Model that unifies AR diffusion with a hybrid state-space memory. The state-space model (SSM) serves as an evolving global memory of scene dynamics across the entire sequence, while a context window provides local memory for motion cues and fine details. This hybrid design preserves global consistency without frozen, repetitive patterns, supports prompt-adaptive interaction, and scales in linear time with sequence length. Experiments on short- and long-range benchmarks demonstrate state-of-the-art temporal consistency and motion stability among autoregressive video generator especially at minute-scale horizons, enabling content diversity and interactive prompt-based control, thereby establishing a scalable, memory-aware framework for long video generation.

[197]  arXiv:2512.04520 [pdf, ps, other]
Title: Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Due to the scarcity of annotated data and the substantial computational costs of model, conventional tuning methods in medical image segmentation face critical challenges. Current approaches to adapting pretrained models, including full-parameter and parameter-efficient fine-tuning, still rely heavily on task-specific training on downstream tasks. Therefore, zero-shot segmentation has gained increasing attention, especially with foundation models such as SAM demonstrating promising generalization capabilities. However, SAM still faces notable limitations on medical datasets due to domain shifts, making efficient zero-shot enhancement an urgent research goal. To address these challenges, we propose BA-TTA-SAM, a task-agnostic test-time adaptation framework that significantly enhances the zero-shot segmentation performance of SAM via test-time adaptation. This framework integrates two key mechanisms: (1) The encoder-level Gaussian prompt injection embeds Gaussian-based prompts directly into the image encoder, providing explicit guidance for initial representation learning. (2) The cross-layer boundary-aware attention alignment exploits the hierarchical feature interactions within the ViT backbone, aligning deep semantic responses with shallow boundary cues. Experiments on four datasets, including ISIC, Kvasir, BUSI, and REFUGE, show an average improvement of 12.4\% in the DICE score compared with SAM's zero-shot segmentation performance. The results demonstrate that our method consistently outperforms state-of-the-art models in medical image segmentation. Our framework significantly enhances the generalization ability of SAM, without requiring any source-domain training data. Extensive experiments on publicly available medical datasets strongly demonstrate the superiority of our framework. Our code is available at https://github.com/Emilychenlin/BA-TTA-SAM.

[198]  arXiv:2512.04521 [pdf, ps, other]
Title: WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

While fulfilling communication tasks, wireless signals can also be used to sense the environment. Among various types of sensing media, WiFi signals offer advantages such as widespread availability, low hardware cost, and strong robustness to environmental conditions like light, temperature, and humidity. By analyzing Wi-Fi signals in the environment, it is possible to capture dynamic changes of the human body and accomplish sensing applications such as gesture recognition. Although many existing gesture sensing solutions perform well in-domain but lack cross-domain capabilities (i.e., recognition performance in untrained environments). To address this, we extract Doppler spectra from the channel state information (CSI) received by all receivers and concatenate each Doppler spectrum along the same time axis to generate fused images with multi-angle information as input features. Furthermore, inspired by the convolutional block attention module (CBAM), we propose a gesture recognition network that integrates a multi-semantic spatial attention mechanism with a self-attention-based channel mechanism. This network constructs attention maps to quantify the spatiotemporal features of gestures in images, enabling the extraction of key domain-independent features. Additionally, ResNet18 is employed as the backbone network to further capture deep-level features. To validate the network performance, we evaluate the proposed network on the public Widar3 dataset, and the results show that it not only maintains high in-domain accuracy of 99.72%, but also achieves high performance in cross-domain recognition of 97.61%, significantly outperforming existing best solutions.

[199]  arXiv:2512.04522 [pdf, ps, other]
Title: Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification
Comments: 14 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visible-Infrared Person Re-Identification (VI-ReID) is a challenging cross-modal matching task due to significant modality discrepancies. While current methods mainly focus on learning modality-invariant features through unified embedding spaces, they often focus solely on the common discriminative semantics across modalities while disregarding the critical role of modality-specific identity-aware knowledge in discriminative feature learning. To bridge this gap, we propose a novel Identity Clue Refinement and Enhancement (ICRE) network to mine and utilize the implicit discriminative knowledge inherent in modality-specific attributes. Initially, we design a Multi-Perception Feature Refinement (MPFR) module that aggregates shallow features from shared branches, aiming to capture modality-specific attributes that are easily overlooked. Then, we propose a Semantic Distillation Cascade Enhancement (SDCE) module, which distills identity-aware knowledge from the aggregated shallow features and guide the learning of modality-invariant features. Finally, an Identity Clues Guided (ICG) Loss is proposed to alleviate the modality discrepancies within the enhanced features and promote the learning of a diverse representation space. Extensive experiments across multiple public datasets clearly show that our proposed ICRE outperforms existing SOTA methods.

[200]  arXiv:2512.04524 [pdf, ps, other]
Title: Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval
Comments: This paper was accepted by AAAI2026 main tech track not long ago. This is an expanded version with an appendix
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, existing methods encounter several fundamental limitations: 1) neglecting class-level semantic alignment and excessively pursuing pair-wise sample alignment; 2) lacking either pseudo-label reliability consideration or geometric guidance for assessing label correctness; 3) directly quantizing original features affected by domain shift, undermining the quality of learned hash codes. In view of these limitations, we propose Prototype-Based Semantic Consistency Alignment (PSCA), a two-stage framework for effective domain adaptive retrieval. In the first stage, a set of orthogonal prototypes directly establishes class-level semantic connections, maximizing inter-class separability while gathering intra-class samples. During the prototype learning, geometric proximity provides a reliability indicator for semantic consistency alignment through adaptive weighting of pseudo-label confidences. The resulting membership matrix and prototypes facilitate feature reconstruction, ensuring quantization on reconstructed rather than original features, thereby improving subsequent hash coding quality and seamlessly connecting both stages. In the second stage, domain-specific quantization functions process the reconstructed features under mutual approximation constraints, generating unified binary hash codes across domains. Extensive experiments validate PSCA's superior performance across multiple datasets.

[201]  arXiv:2512.04527 [pdf, ps, other]
Title: FLEX: Leveraging FPGA-CPU Synergy for Mixed-Cell-Height Legalization Acceleration
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

In this work, we present FLEX, an FPGA-CPU accelerator for mixed-cell-height legalization tasks. We address challenges from the following perspectives. First, we optimize the task assignment strategy and perform an efficient task partition between FPGA and CPU to exploit their complementary strengths. Second, a multi-granularity pipelining technique is employed to accelerate the most time-consuming step, finding optimal placement position (FOP), in legalization. At last, we particularly target the computationally intensive cell shifting process in FOP, optimizing the design to align it seamlessly with the multi-granularity pipelining framework for further speedup. Experimental results show that FLEX achieves up to 18.3x and 5.4x speedups compared to state-of-the-art CPU-GPU and multi-threaded CPU legalizers with better scalability, while improving legalization quality by 4% and 1%.

[202]  arXiv:2512.04528 [pdf, ps, other]
Title: Auto3R: Automated 3D Reconstruction and Scanning via Data-driven Uncertainty Quantification
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Traditional high-quality 3D scanning and reconstruction typically relies on human labor to plan the scanning procedure. With the rapid development of embodied systems such as drones and robots, there is a growing demand of performing accurate 3D scanning and reconstruction in an fully automated manner. We introduce Auto3R, a data-driven uncertainty quantification model that is designed to automate the 3D scanning and reconstruction of scenes and objects, including objects with non-lambertian and specular materials. Specifically, in a process of iterative 3D reconstruction and scanning, Auto3R can make efficient and accurate prediction of uncertainty distribution over potential scanning viewpoints, without knowing the ground truth geometry and appearance. Through extensive experiments, Auto3R achieves superior performance that outperforms the state-of-the-art methods by a large margin. We also deploy Auto3R on a robot arm equipped with a camera and demonstrate that Auto3R can be used to effectively digitize real-world 3D objects and delivers ready-to-use and photorealistic digital assets. Our homepage: https://tomatoma00.github.io/auto3r.github.io .

[203]  arXiv:2512.04529 [pdf, ps, other]
Title: SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation
Subjects: Artificial Intelligence (cs.AI)

Generating academic slides from scientific papers is a challenging multimodal reasoning task that requires both long context understanding and deliberate visual planning. Existing approaches largely reduce it to text only summarization, overlooking the visual component and design intensive nature of slide creation. In this paper we introduce SlideGen, an agentic, modular, and visual in the loop framework for scientific paper to slide generation. SlideGen orchestrates a group of vision language agents that reason collaboratively over the document structure and semantics, producing editable PPTX slides with logical flow and compelling visual presentation. By integrating coordinated outlining, mapping, arrangement, note synthesis, and iterative refinement, our system consistently delivers slides of expert level quality. Across diverse benchmarks and strong baselines, SlideGen outperforms existing methods in visual quality, content faithfulness, and readability, positioning it as the new state of the art in automated slide generation. Our work establishes a foundation for design aware multimodal slide generation, demonstrating how agentic collaboration can bridge understanding and presentation in complex multimodal reasoning tasks.

[204]  arXiv:2512.04530 [pdf, ps, other]
Title: Explainable Graph Representation Learning via Graph Pattern Analysis
Comments: Full version with appendix of the paper published in the Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25), Main Track
Journal-ref: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25), Main Track, pages 3426-3434, 2025
Subjects: Machine Learning (cs.LG)

Explainable artificial intelligence (XAI) is an important area in the AI community, and interpretability is crucial for building robust and trustworthy AI models. While previous work has explored model-level and instance-level explainable graph learning, there has been limited investigation into explainable graph representation learning. In this paper, we focus on representation-level explainable graph learning and ask a fundamental question: What specific information about a graph is captured in graph representations? Our approach is inspired by graph kernels, which evaluate graph similarities by counting substructures within specific graph patterns. Although the pattern counting vector can serve as an explainable representation, it has limitations such as ignoring node features and being high-dimensional. To address these limitations, we introduce a framework (PXGL-GNN) for learning and explaining graph representations through graph pattern analysis. We start by sampling graph substructures of various patterns. Then, we learn the representations of these patterns and combine them using a weighted sum, where the weights indicate the importance of each graph pattern's contribution. We also provide theoretical analyses of our methods, including robustness and generalization. In our experiments, we show how to learn and explain graph representations for real-world data using pattern analysis. Additionally, we compare our method against multiple baselines in both supervised and unsupervised learning tasks to demonstrate its effectiveness.

[205]  arXiv:2512.04532 [pdf, ps, other]
Title: PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Video Large Language Models (Video LLMs) have shown impressive performance across a wide range of video-language tasks. However, they often fail in scenarios requiring a deeper understanding of physical dynamics. This limitation primarily arises from their reliance on appearance-based matching. Incorporating physical motion modeling is crucial for deeper video understanding, but presents three key challenges: (1) motion signals are often entangled with appearance variations, making it difficult to extract clean physical cues; (2) effective motion modeling requires not only continuous-time motion representations but also capturing physical dynamics; and (3) collecting accurate annotations for physical attributes is costly and often impractical. To address these issues, we propose PhyVLLM, a physical-guided video-language framework that explicitly incorporates physical motion into Video LLMs. Specifically, PhyVLLM disentangles visual appearance and object motion through a dual-branch encoder. To model physical dynamics over time, we incorporate a Neural Ordinary Differential Equation (Neural ODE) module, which generates differentiable physical dynamic representations. The resulting motion-aware representations are projected into the token space of a pretrained LLM, enabling physics reasoning without compromising the model's original multimodal capabilities. To circumvent the need for explicit physical labels, PhyVLLM employs a self-supervised manner to model the continuous evolution of object motion. Experimental results demonstrate that PhyVLLM significantly outperforms state-of-the-art Video LLMs on both physical reasoning and general video understanding tasks, highlighting the advantages of incorporating explicit physical modeling.

[206]  arXiv:2512.04534 [pdf, ps, other]
Title: Refaçade: Editing Object with Given Reference Texture
Authors: Youze Huang (1), Penghui Ruan (2), Bojia Zi (3), Xianbiao Qi (4), Jianan Wang (5), Rong Xiao (4) ((1) University of Electronic Science and Technology of China, (2) The Hong Kong Polytechnic University, (3) The Chinese University of Hong Kong, (4) IntelliFusion Inc., (5) Astribot Inc.)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances in diffusion models have brought remarkable progress in image and video editing, yet some tasks remain underexplored. In this paper, we introduce a new task, Object Retexture, which transfers local textures from a reference object to a target object in images or videos. To perform this task, a straightforward solution is to use ControlNet conditioned on the source structure and the reference texture. However, this approach suffers from limited controllability for two reasons: conditioning on the raw reference image introduces unwanted structural information, and it fails to disentangle the visual texture and structure information of the source. To address this problem, we propose Refa\c{c}ade, a method that consists of two key designs to achieve precise and controllable texture transfer in both images and videos. First, we employ a texture remover trained on paired textured/untextured 3D mesh renderings to remove appearance information while preserving the geometry and motion of source videos. Second, we disrupt the reference global layout using a jigsaw permutation, encouraging the model to focus on local texture statistics rather than the global layout of the object. Extensive experiments demonstrate superior visual quality, precise editing, and controllability, outperforming strong baselines in both quantitative and human evaluations. Code is available at https://github.com/fishZe233/Refacade.

[207]  arXiv:2512.04535 [pdf, ps, other]
Title: GTM: Simulating the World of Tools for AI Agents
Subjects: Artificial Intelligence (cs.AI)

The integration of external tools is pivotal for empowering Large Language Model (LLM) agents with real-world capabilities. However, training these agents through direct, continuous interaction with diverse tools is often prohibitively expensive, slow, and introduces additional development and maintenance overhead. To address this challenge, we introduce the Generalist Tool Model (GTM), a 1.5-billion-parameter model that learns to act as a universal tool simulator. With only prompt-level configuration, GTM accesses tool functionalities along with input arguments and generates outputs that faithfully mimic real tool execution, providing a fast and cost-effective solution that eliminates development overhead. To build GTM, we propose the Context-Aware Response Generation (CARG) pipeline, which synthesizes comprehensive training data covering over 20,000 tools across 300 domains including physics, medicine, robotics, and finance. Through this pipeline, GTM learns to produce not only syntactically correct outputs but also logically coherent and contextually appropriate responses. Experiments demonstrate that GTM produces high-quality outputs with strong consistency and reliability. Besides when used in real reinforcement learning scenarios for agent training, GTM exhibits significantly faster simulation speed compared to real tools while maintaining comparable output quality, along with remarkable generalization and domain adaptability. Our results establish GTM as a foundational component for developing future AI agents, enabling efficient and scalable training of tool-augmented systems.

[208]  arXiv:2512.04536 [pdf, ps, other]
Title: Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Alcohol consumption is a significant public health concern and a major cause of accidents and fatalities worldwide. This study introduces a novel video-based facial sequence analysis approach dedicated to the detection of alcohol intoxication. The method integrates facial landmark analysis via a Graph Attention Network (GAT) with spatiotemporal visual features extracted using a 3D ResNet. These features are dynamically fused with adaptive prioritization to enhance classification performance. Additionally, we introduce a curated dataset comprising 3,542 video segments derived from 202 individuals to support training and evaluation. Our model is compared against two baselines: a custom 3D-CNN and a VGGFace+LSTM architecture. Experimental results show that our approach achieves 95.82% accuracy, 0.977 precision, and 0.97 recall, outperforming prior methods. The findings demonstrate the model's potential for practical deployment in public safety systems for non-invasive, reliable alcohol intoxication detection.

[209]  arXiv:2512.04537 [pdf, ps, other]
Title: X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The advancement of embodied AI has unlocked significant potential for intelligent humanoid robots. However, progress in both Vision-Language-Action (VLA) models and world models is severely hampered by the scarcity of large-scale, diverse training data. A promising solution is to "robotize" web-scale human videos, which has been proven effective for policy training. However, these solutions mainly "overlay" robot arms to egocentric videos, which cannot handle complex full-body motions and scene occlusions in third-person videos, making them unsuitable for robotizing humans. To bridge this gap, we introduce X-Humanoid, a generative video editing approach that adapts the powerful Wan 2.2 model into a video-to-video structure and finetunes it for the human-to-humanoid translation task. This finetuning requires paired human-humanoid videos, so we designed a scalable data creation pipeline, turning community assets into 17+ hours of paired synthetic videos using Unreal Engine. We then apply our trained model to 60 hours of the Ego-Exo4D videos, generating and releasing a new large-scale dataset of over 3.6 million "robotized" humanoid video frames. Quantitative analysis and user studies confirm our method's superiority over existing baselines: 69% of users rated it best for motion consistency, and 62.1% for embodiment correctness.

[210]  arXiv:2512.04538 [pdf, ps, other]
Title: Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding
Subjects: Software Engineering (cs.SE)

As code completion task from function-level to repository-level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as plain natural language, relying primarily on shallow semantic matching while overlooking structural semantics and code-specific dependencies. This limits their ability to capture control flow and underlying intent, ultimately constraining the quality of generated code. Therefore, we propose CoCo, a novel framework that enables code Completion by Comprehension of multi-granularity context from large-scale code repositories. CoCo employs static code analysis to extract structured context at the function, file, and project levels, capturing execution logic and semantic dependencies. It then adopts an graph-based multi-granularity context selection mechanism to filter out redundant information and remove noise. Consequently, the information is converted into natural language in a consistent manner, thereby functioning as explicit contextual prompts to guide subsequent code completion. Additionally, a structure-aware code re-ranker mechanism ensures alignment at both semantic and structural levels. Extensive experiments on CrossCodeEval and RepoEval benchmarks demonstrate that CoCo consistently surpasses state-of-the-art baselines, achieving up to 20.2% gains in EM. Moreover, the framework is model-agnostic and can be seamlessly integrated into existing methods, leading to significant performance.

[211]  arXiv:2512.04540 [pdf, ps, other]
Title: VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Ultra long video understanding remains an open challenge, as existing vision language models (VLMs) falter on such content due to limited context length and inefficient long term memory retention. To address this, recent works have attempted to construct external knowledge bases and corresponding retrieval agumented generation (RAG) systems, yet these incur enormous storage and computational overhead. In this paper, we propose VideoMem, a novel framework that pioneers models long video understanding as a sequential generation task via adaptive memory management. Specifically, VideoMem dynamically updates a global memory buffer, which adaptively retains critical information while discarding redundant content across the video timeline. To efficiently train VLMs for such long-term tasks, VideoMem integrates the Progressive Grouped Relative Policy Optimization (PRPO) algorithm, equipped with two core modules: Progressive State Propagation (PSP) adaptively retains valid current states, propagates them to the next rollout step, and gradually narrows the model exploration space. Temporal Cascading Reward (TCR) further alleviates reward sparsity, improving sample utilization and accelerating convergence. Extensive experiments demonstrate that VideoMem significantly outperforms existing open-source models across diverse benchmarks for ultra-long video understanding tasks.

[212]  arXiv:2512.04542 [pdf, ps, other]
Title: Gaussian Entropy Fields: Driving Adaptive Sparsity in 3D Gaussian Optimization
Comments: 28 pages,11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D Gaussian Splatting (3DGS) has emerged as a leading technique for novel view synthesis, demonstrating exceptional rendering efficiency. \replaced[]{Well-reconstructed surfaces can be characterized by low configurational entropy, where dominant primitives clearly define surface geometry while redundant components are suppressed.}{The key insight is that well-reconstructed surfaces naturally exhibit low configurational entropy, where dominant primitives clearly define surface geometry while suppressing redundant components.} Three complementary technical contributions are introduced: (1) entropy-driven surface modeling via entropy minimization for low configurational entropy in primitive distributions; (2) adaptive spatial regularization using the Surface Neighborhood Redundancy Index (SNRI) and image entropy-guided weighting; (3) multi-scale geometric preservation through competitive cross-scale entropy alignment. Extensive experiments demonstrate that GEF achieves competitive geometric precision on DTU and T\&T benchmarks, while delivering superior rendering quality compared to existing methods on Mip-NeRF 360. Notably, superior Chamfer Distance (0.64) on DTU and F1 score (0.44) on T\&T are obtained, alongside the best SSIM (0.855) and LPIPS (0.136) among baselines on Mip-NeRF 360, validating the framework's ability to enhance surface reconstruction accuracy without compromising photometric fidelity.

[213]  arXiv:2512.04545 [pdf, ps, other]
Title: EvoEdit: Lifelong Free-Text Knowledge Editing through Latent Perturbation Augmentation and Knowledge-driven Parameter Fusion
Subjects: Computation and Language (cs.CL)

Adjusting the outdated knowledge of large language models (LLMs) after deployment remains a major challenge. This difficulty has spurred the development of knowledge editing, which seeks to accurately and efficiently modify a model's internal (parametric) knowledge without retraining it from scratch. However, existing methods suffer from two limitations. First, they depend on structured triplets that are misaligned with the free-text nature of LLM pretraining and fail to capture the nuanced relationships among facts. Second, they typically support one-time knowledge updates, with relatively limited research on the problem of sequential or lifelong editing. To address these gaps, we propose a new task, Lifelong Free-text Knowledge Editing (LF-Edit), which enables models to incorporate updates expressed in natural language and supports continual editing over time. Despite its promise, LF-Edit faces the dual challenge of integrating new knowledge while mitigating the forgetting of prior information. To foster research on this new task, we construct a large-scale benchmark, Multi-Rank Lifelong Free-text Editing Benchmark (MRLF-Bench), containing 16,835 free-text edit requests. We further design a cognitively inspired multi-rank evaluation framework encompassing four levels: memorization, understanding, constrained comprehension, and reasoning. To tackle the challenges inherent in LF-Edit, we introduce a novel approach named EvoEdit that enhances knowledge injection through Latent Perturbation Augmentation and preserves prior information via Knowledge-driven Parameter Fusion. Experimental results demonstrate that EvoEdit substantially outperforms existing knowledge editing methods on the proposed LF-Edit task.

[214]  arXiv:2512.04550 [pdf, ps, other]
Title: AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees
Comments: NeurIPS 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The quadratic complexity of self-attention constrains Large Language Models (LLMs) in processing long contexts, a capability essential for many advanced applications. Context compression aims to alleviate this computational bottleneck while retaining critical semantic information. However, existing approaches often fall short: explicit methods may compromise local detail, whereas implicit methods can suffer from positional biases, information degradation, or an inability to capture long-range semantic dependencies. We propose AdmTree, a novel framework for adaptive, hierarchical context compression with a central focus on preserving high semantic fidelity while maintaining efficiency. AdmTree dynamically segments input based on information density, utilizing gist tokens to summarize variable-length segments as the leaves of a semantic binary tree. This structure, together with a lightweight aggregation mechanism and a frozen backbone LLM (thereby minimizing new trainable parameters), enables efficient hierarchical abstraction of the context. By preserving fine-grained details alongside global semantic coherence, mitigating positional bias, and dynamically adapting to content, AdmTree robustly retains the semantic information of long contexts.

[215]  arXiv:2512.04551 [pdf, ps, other]
Title: Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Speech emotion recognition (SER) is an important technology in human-computer interaction. However, achieving high performance is challenging due to emotional complexity and scarce annotated data. To tackle these challenges, we propose a multi-loss learning (MLL) framework integrating an energy-adaptive mixup (EAM) method and a frame-level attention module (FLAM). The EAM method leverages SNR-based augmentation to generate diverse speech samples capturing subtle emotional variations. FLAM enhances frame-level feature extraction for multi-frame emotional cues. Our MLL strategy combines Kullback-Leibler divergence, focal, center, and supervised contrastive loss to optimize learning, address class imbalance, and improve feature separability. We evaluate our method on four widely used SER datasets: IEMOCAP, MSP-IMPROV, RAVDESS, and SAVEE. The results demonstrate our method achieves state-of-the-art performance, suggesting its effectiveness and robustness.

[216]  arXiv:2512.04552 [pdf, ps, other]
Title: RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Differentiable reinforcement learning (RL) frameworks like DiffRO offer a powerful approach for controllable text-to-speech (TTS), but are vulnerable to reward hacking, particularly for nuanced tasks like emotion control. The policy model can exploit a vanilla Reward Model (RM) by generating acoustic artifacts to achieve spurious rewards, but at the cost of degrading perceptual quality. To address this, we propose Robust Reward Policy Optimization (RRPO), a novel framework that employs a hybrid regularization scheme. This scheme develops a robust RM whose reward signal is more reliably aligned with human perception, compelling the policy to abandon detrimental shortcuts and instead learn the complex features of genuine emotions. Our ablation study confirms the enhanced robustness of our RM, as evidenced by its strong cross-lingual generalization. The subjective evaluation demonstrates that this robust RM effectively mitigates reward hacking, leading to significant improvements in both emotional expressiveness and naturalness over all baselines. Demo page: https://lrwinr.github.io/RRPO-CosyVoice.

[217]  arXiv:2512.04554 [pdf, ps, other]
Title: Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Document Visual Question Answering (DocVQA) enables end-to-end reasoning grounded on information present in a document input. While recent models have shown impressive capabilities, they remain vulnerable to adversarial attacks. In this work, we introduce a novel attack scenario that aims to forge document content in a visually imperceptible yet semantically targeted manner, allowing an adversary to induce specific or generally incorrect answers from a DocVQA model. We develop specialized attack algorithms that can produce adversarially forged documents tailored to different attackers' goals, ranging from targeted misinformation to systematic model failure scenarios. We demonstrate the effectiveness of our approach against two end-to-end state-of-the-art models: Pix2Struct, a vision-language transformer that jointly processes image and text through sequence-to-sequence modeling, and Donut, a transformer-based model that directly extracts text and answers questions from document images. Our findings highlight critical vulnerabilities in current DocVQA systems and call for the development of more robust defenses.

[218]  arXiv:2512.04555 [pdf, ps, other]
Title: ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning
Comments: Under Review
Subjects: Computation and Language (cs.CL)

We propose ADAPT, a meta-learning algorithm that \emph{learns} task sampling proportions under an explicit token budget for multi-task instruction tuning. Instead of fixing task weights by hand, \adapt{} maintains a continuous distribution over tasks and updates it via meta-gradients of a smooth worst-case validation objective, inducing an adaptive curriculum that allocates more tokens to useful tasks while avoiding collapse. We instantiate ADAPT on three $\sim$1B-parameter open-weight LLMs (Gemma-3-1B, LLaMA-3.2-1B, Qwen-0.6B), training on 20 Natural Instructions task types under budgets of $1\%$, $5\%$, and $10\%$ of the available supervised tokens, and compare against strong supervised fine-tuning baselines with uniform and size-proportional mixing. We conduct evaluations on 11 out-of-domain benchmarks spanning reasoning, reading comprehension, code generation, and instruction following, we find that ADAPT matches or slightly improves average downstream performance relative to the best static mixture, while using fewer effective training tokens and reallocating budget toward harder, benchmark-aligned tasks.

[219]  arXiv:2512.04556 [pdf, ps, other]
Title: Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex
Comments: 10 pages, 7 figures
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

Image convolution with complex kernels is a fundamental operation in photography, scientific imaging, and animation effects, yet direct dense convolution is computationally prohibitive on resource-limited devices. Existing approximations, such as simulated annealing or low-rank decompositions, either lack efficiency or fail to capture non-convex kernels. We introduce a differentiable kernel decomposition framework that represents a target spatially-variant, dense, complex kernel using a set of sparse kernel samples. Our approach features (i) a decomposition that enables differentiable optimization of sparse kernels, (ii) a dedicated initialization strategy for non-convex shapes to avoid poor local minima, and (iii) a kernel-space interpolation scheme that extends single-kernel filtering to spatially varying filtering without retraining and additional runtime overhead. Experiments on Gaussian and non-convex kernels show that our method achieves higher fidelity than simulated annealing and significantly lower cost than low-rank decompositions. Our approach provides a practical solution for mobile imaging and real-time rendering, while remaining fully differentiable for integration into broader learning pipelines.

[220]  arXiv:2512.04557 [pdf, ps, other]
Title: Efficient Safety Verification of Autonomous Vehicles with Neural Network Operator
Subjects: Systems and Control (eess.SY)

When autonomous vehicles encounter untrained scenarios, ensuring safety hinges on effective safety verification to prevent accidents stemming from unexpected model decisions. Reachability analysis, a method of safety verification, offers relatively high precision but at the cost of significant computational complexity. Our method leverages end-to-end neural network operators to compute reachable sets, replacing traditional mathematical set operators, thereby achieving higher efficiency in safety verification without substantially compromising accuracy or increasing conservativeness. We define vehicle dynamics on discrete time series and detail the safety verification process and safety standard based on reachable sets. Experimental evaluations conducted in several typical road driving scenarios demonstrate the superior efficiency performance of our proposed operator over classical methods.

[221]  arXiv:2512.04558 [pdf, ps, other]
Title: On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference
Comments: 45 pages, 6 figures, 3 tables
Subjects: Machine Learning (cs.LG)

Test-time compute (TTC) has become an increasingly prominent paradigm for enhancing large language models (LLMs). Despite the empirical success of methods such as best-of-$n$ (BoN) sampling and sequential revision, their fundamental limits remain unclear. We address this gap by analyzing a mixture-of-reference policy model and proving that standard BoN is inherently suboptimal. To move closer to the optimal frontier, we study reward-filtered sequential inference, a simple procedure that selectively incorporates only high-reward generations into the context. This mechanism concentrates computation on superior policy candidates and suppresses inferior ones. On the theoretical side, we show that reward-filtered sequential inference yields strictly stronger guarantees than standard TTC paradigms. On the empirical side, we evaluate such an inference strategy across diverse benchmarks and observe consistent improvements over widely used approaches, demonstrating the practical effectiveness of our framework.

[222]  arXiv:2512.04559 [pdf, ps, other]
Title: Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
Comments: 36 pages, 21 figures, 4 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Diffusion models excel at generating high-likelihood samples but often require alignment with downstream objectives. Existing fine-tuning methods for diffusion models significantly suffer from reward over-optimization, resulting in high-reward but unnatural samples and degraded diversity. To mitigate over-optimization, we propose \textbf{Soft Q-based Diffusion Finetuning (SQDF)}, a novel KL-regularized RL method for diffusion alignment that applies a reparameterized policy gradient of a training-free, differentiable estimation of the soft Q-function. SQDF is further enhanced with three innovations: a discount factor for proper credit assignment in the denoising process, the integration of consistency models to refine Q-function estimates, and the use of an off-policy replay buffer to improve mode coverage and manage the reward-diversity trade-off. Our experiments demonstrate that SQDF achieves superior target rewards while preserving diversity in text-to-image alignment. Furthermore, in online black-box optimization, SQDF attains high sample efficiency while maintaining naturalness and diversity.

[223]  arXiv:2512.04561 [pdf, ps, other]
Title: Omniscient Attacker in Stochastic Security Games with Interdependent Nodes
Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)

The adoption of reinforcement learning for critical infrastructure defense introduces a vulnerability where sophisticated attackers can strategically exploit the defense algorithm's learning dynamics. While prior work addresses this vulnerability in the context of repeated normal-form games, its extension to the stochastic games remains an open research gap. We close this gap by examining stochastic security games between an RL defender and an omniscient attacker, utilizing a tractable linear influence network model. To overcome the structural limitations of prior methods, we propose and apply neuro-dynamic programming. Our experimental results demonstrate that the omniscient attacker can significantly outperform a naive defender, highlighting the critical vulnerability introduced by the learning dynamics and the effectiveness of the proposed strategy.

[224]  arXiv:2512.04562 [pdf, ps, other]
Title: LeMat-GenBench: A Unified Evaluation Framework for Crystal Generative Models
Comments: 46 pages, 17 figures, 16 tables
Subjects: Machine Learning (cs.LG)

Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of chemical space. Yet, the lack of standardized evaluation frameworks makes it challenging to evaluate, compare, and further develop these ML models meaningfully. In this work, we introduce LeMat-GenBench, a unified benchmark for generative models of crystalline materials, supported by a set of evaluation metrics designed to better inform model development and downstream applications. We release both an open-source evaluation suite and a public leaderboard on Hugging Face, and benchmark 12 recent generative models. Results reveal that an increase in stability leads to a decrease in novelty and diversity on average, with no model excelling across all dimensions. Altogether, LeMat-GenBench establishes a reproducible and extensible foundation for fair model comparison and aims to guide the development of more reliable, discovery-oriented generative models for crystalline materials.

[225]  arXiv:2512.04563 [pdf, ps, other]
Title: COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual Spatial Reasoning is crucial for enabling Multimodal Large Language Models (MLLMs) to understand object properties and spatial relationships, yet current models still struggle with 3D-aware reasoning. Existing approaches typically enhance either perception, by augmenting RGB inputs with auxiliary modalities such as depth and segmentation, or reasoning, by training on spatial VQA datasets and applying reinforcement learning, and thus treat these two aspects in isolation. In this work, we investigate whether a unified MLLM can develop an intrinsic ability to enhance spatial perception and, through adaptive interleaved reasoning, achieve stronger spatial intelligence. We propose \textbf{COOPER}, a unified MLLM that leverages depth and segmentation as auxiliary modalities and is trained in two stages to acquire auxiliary modality generation and adaptive, interleaved reasoning capabilities. COOPER achieves an average \textbf{6.91\%} improvement in spatial reasoning while maintaining general performance. Moreover, even a variant trained only for auxiliary modality generation attains a \textbf{7.92\%} gain on distance and size estimation, suggesting that learning to generate auxiliary modalities helps internalize spatial knowledge and strengthen spatial understanding.

[226]  arXiv:2512.04564 [pdf, ps, other]
Title: Dataset creation for supervised deep learning-based analysis of microscopic images -- review of important considerations and recommendations
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Supervised deep learning (DL) receives great interest for automated analysis of microscopic images with an increasing body of literature supporting its potential. The development and validation of those DL models relies heavily on the availability of high-quality, large-scale datasets. However, creating such datasets is a complex and resource-intensive process, often hindered by challenges such as time constraints, domain variability, and risks of bias in image collection and label creation. This review provides a comprehensive guide to the critical steps in dataset creation, including: 1) image acquisition, 2) selection of annotation software, and 3) annotation creation. In addition to ensuring a sufficiently large number of images, it is crucial to address sources of image variability (domain shifts) - such as those related to slide preparation and digitization - that could lead to algorithmic errors if not adequately represented in the training data. Key quality criteria for annotations are the three "C"s: correctness, completeness, and consistency. This review explores methods to enhance annotation quality through the use of advanced techniques that mitigate the limitations of single annotators. To support dataset creators, a standard operating procedure (SOP) is provided as supplemental material, outlining best practices for dataset development. Furthermore, the article underscores the importance of open datasets in driving innovation and enhancing reproducibility of DL research. By addressing the challenges and offering practical recommendations, this review aims to advance the creation of and availability to high-quality, large-scale datasets, ultimately contributing to the development of generalizable and robust DL models for pathology applications.

[227]  arXiv:2512.04565 [pdf, ps, other]
Title: Adapt and Stabilize, Then Learn and Optimize: A New Approach to Adaptive LQR
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper focuses on adaptive control of the discrete-time linear quadratic regulator (adaptive LQR). Recent literature has made significant contributions in proving non-asymptotic convergence rates, but existing approaches have a few drawbacks that pose barriers for practical implementation. These drawbacks include (i) a requirement of an initial stabilizing controller, (ii) a reliance on exploration for closed-loop stability, and/or (iii) computationally intensive algorithms. This paper proposes a new algorithm that overcomes these drawbacks for a particular class of discrete-time systems. This algorithm leverages direct Model-Reference Adaptive Control (direct MRAC) and combines it with an epoch-based approach in order to address the drawbacks (i)-(iii) with a provable high-probability regret bound comparable to existing literature. Simulations demonstrate that the proposed approach yields regrets that are comparable to those from existing methods when the conditions (i) and (ii) are met, and yields regrets that are significantly smaller when either of these two conditions is not met.

[228]  arXiv:2512.04566 [pdf, ps, other]
Title: Reliable Statistical Guarantees for Conformal Predictors with Small Datasets
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

Surrogate models (including deep neural networks and other machine learning algorithms in supervised learning) are capable of approximating arbitrarily complex, high-dimensional input-output problems in science and engineering, but require a thorough data-agnostic uncertainty quantification analysis before these can be deployed for any safety-critical application. The standard approach for data-agnostic uncertainty quantification is to use conformal prediction (CP), a well-established framework to build uncertainty models with proven statistical guarantees that do not assume any shape for the error distribution of the surrogate model. However, since the classic statistical guarantee offered by CP is given in terms of bounds for the marginal coverage, for small calibration set sizes (which are frequent in realistic surrogate modelling that aims to quantify error at different regions), the potentially strong dispersion of the coverage distribution around its average negatively impacts the reliability of the uncertainty model, often obtaining coverages below the expected value, resulting in a less applicable framework. After providing a gentle presentation of uncertainty quantification for surrogate models for machine learning practitioners, in this paper we bridge the gap by proposing a new statistical guarantee that offers probabilistic information for the coverage of a single conformal predictor. We show that the proposed framework converges to the standard solution offered by CP for large calibration set sizes and, unlike the classic guarantee, still offers reliable information about the coverage of a conformal predictor for small data sizes. We illustrate and validate the methodology in a suite of examples, and implement an open access software solution that can be used alongside common conformal prediction libraries to obtain uncertainty models that fulfil the new guarantee.

[229]  arXiv:2512.04568 [pdf, ps, other]
Title: Prompt2Craft: Generating Functional Craft Assemblies with LLMs
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labeled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.

[230]  arXiv:2512.04571 [pdf, ps, other]
Title: Temp-SCONE: A Novel Out-of-Distribution Detection and Domain Generalization Framework for Wild Data with Temporal Shift
Comments: 22 pages, 12 figures, 72 subfigures, 6 tables
Subjects: Machine Learning (cs.LG)

Open-world learning (OWL) requires models that can adapt to evolving environments while reliably detecting out-of-distribution (OOD) inputs. Existing approaches, such as SCONE, achieve robustness to covariate and semantic shifts but assume static environments, leading to degraded performance in dynamic domains. In this paper, we propose Temp-SCONE, a temporally consistent extension of SCONE designed to handle temporal shifts in dynamic environments. Temp-SCONE introduces a confidence-driven regularization loss based on Average Thresholded Confidence (ATC), penalizing instability in predictions across time steps while preserving SCONE's energy-margin separation. Experiments on dynamic datasets demonstrate that Temp-SCONE significantly improves robustness under temporal drift, yielding higher corrupted-data accuracy and more reliable OOD detection compared to SCONE. On distinct datasets without temporal continuity, Temp-SCONE maintains comparable performance, highlighting the importance and limitations of temporal regularization. Our theoretical insights on temporal stability and generalization error further establish Temp-SCONE as a step toward reliable OWL in evolving dynamic environments.

[231]  arXiv:2512.04573 [pdf, other]
Title: A Rocq Formalization of Monomial and Graded Orders
Authors: Sylvie Boldo (TOCCATA), François Clément (SERENA, CERMICS), Vincent Martin (LMAC), Micaela Mayero (LIPN)
Subjects: Logic in Computer Science (cs.LO)

Even if binary relations and orders are a common formalization topic, we need to formalize specific orders (namely monomial and graded) in the process of formalizing in Rocq the finite element method. This article is therefore definitions, operators, and proofs of properties about relations and orders, thus providing a comprehensive Rocq library. We especially focus on monomial orders, that are total orders compatible with the monoid operation. More than its definition and proved properties, we define several of them, among them the lexicographic and grevlex orders. For the sake of genericity, we formalize the grading of an order, a high-level operator that transforms a binary relation into another one, and we prove that grading an order preserves many of its properties, such as the monomial order property. This leads us to the definition and properties of four different graded orders, with very factorized proofs. We therefore provide a comprehensive and user-friendly library in Rocq about orders, including monomial and graded orders, that contains more than 700 lemmas.

[232]  arXiv:2512.04576 [pdf, ps, other]
Title: TARDis: Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Tumor segmentation and diagnosis in contrast-enhanced Computed Tomography (CT) rely heavily on the physiological dynamics of contrast agents. However, obtaining a complete multi-phase series is often clinically unfeasible due to radiation concerns or scanning limitations, leading to the "missing modality" problem. Existing deep learning approaches typically treat missing phases as absent independent channels, ignoring the inherent temporal continuity of hemodynamics. In this work, we propose Time Attenuated Representation Disentanglement (TARDis), a novel physics-aware framework that redefines missing modalities as missing sample points on a continuous Time-Attenuation Curve. TARDis explicitly disentangles the latent feature space into a time-invariant static component (anatomy) and a time-dependent dynamic component (perfusion). We achieve this via a dual-path architecture: a quantization-based path using a learnable embedding dictionary to extract consistent anatomical structures, and a probabilistic path using a Conditional Variational Autoencoder to model dynamic enhancement conditioned on the estimated scan time. This design allows the network to hallucinate missing hemodynamic features by sampling from the learned latent distribution. Extensive experiments on a large-scale private abdominal CT dataset (2,282 cases) and two public datasets demonstrate that TARDis significantly outperforms state-of-the-art incomplete modality frameworks. Notably, our method maintains robust diagnostic performance even in extreme data-sparsity scenarios, highlighting its potential for reducing radiation exposure while maintaining diagnostic precision.

[233]  arXiv:2512.04578 [pdf, ps, other]
Title: LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence
Subjects: Computation and Language (cs.CL)

Legal general intelligence (GI) refers to artificial intelligence (AI) that encompasses legal understanding, reasoning, and decision-making, simulating the expertise of legal experts across domains. However, existing benchmarks are result-oriented and fail to systematically evaluate the legal intelligence of large language models (LLMs), hindering the development of legal GI. To address this, we propose LexGenius, an expert-level Chinese legal benchmark for evaluating legal GI in LLMs. It follows a Dimension-Task-Ability framework, covering seven dimensions, eleven tasks, and twenty abilities. We use the recent legal cases and exam questions to create multiple-choice questions with a combination of manual and LLM reviews to reduce data leakage risks, ensuring accuracy and reliability through multiple rounds of checks. We evaluate 12 state-of-the-art LLMs using LexGenius and conduct an in-depth analysis. We find significant disparities across legal intelligence abilities for LLMs, with even the best LLMs lagging behind human legal professionals. We believe LexGenius can assess the legal intelligence abilities of LLMs and enhance legal GI development. Our project is available at https://github.com/QwenQKing/LexGenius.

[234]  arXiv:2512.04579 [pdf, ps, other]
Title: Gauss-Newton accelerated MPPI Control
Comments: 6 pages, 3 figures, submitted to the IFAC World Congress 2026
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

Model Predictive Path Integral (MPPI) control is a sampling-based optimization method that has recently attracted attention, particularly in the robotics and reinforcement learning communities. MPPI has been widely applied as a GPU-accelerated random search method to deterministic direct single-shooting optimal control problems arising in model predictive control (MPC) formulations. MPPI offers several key advantages, including flexibility, robustness, ease of implementation, and inherent parallelizability. However, its performance can deteriorate in high-dimensional settings since the optimal control problem is solved via Monte Carlo sampling. To address this limitation, this paper proposes an enhanced MPPI method that incorporates a Jacobian reconstruction technique and the second-order Generalized Gauss-Newton method. This novel approach is called \textit{Gauss-Newton accelerated MPPI}. The numerical results show that the Gauss-Newton accelerated MPPI approach substantially improves MPPI scalability and computational efficiency while preserving the key benefits of the classical MPPI framework, making it a promising approach even for high-dimensional problems.

[235]  arXiv:2512.04580 [pdf, ps, other]
Title: A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

To enhance the performance of large language models (LLMs) in various domain-specific applications, sensitive data such as healthcare, law, and finance are being used to privately customize or fine-tune these models. Such privately adapted LLMs are regarded as either personal privacy assets or corporate intellectual property. Therefore, protecting model weights and maintaining strict confidentiality during deployment and distribution have become critically important. However, existing model formats and deployment frameworks provide little to no built-in support for confidentiality, access control, or secure integration with trusted hardware. Current methods for securing model deployment either rely on computationally expensive cryptographic techniques or tightly controlled private infrastructure. Although these approaches can be effective in specific scenarios, they are difficult and costly for widespread deployment.
In this paper, we introduce CryptoTensors, a secure and format-compatible file structure for confidential LLM distribution. Built as an extension to the widely adopted Safetensors format, CryptoTensors incorporates tensor-level encryption and embedded access control policies, while preserving critical features such as lazy loading and partial deserialization. It enables transparent decryption and automated key management, supporting flexible licensing and secure model execution with minimal overhead. We implement a proof-of-concept library, benchmark its performance across serialization and runtime scenarios, and validate its compatibility with existing inference frameworks, including Hugging Face Transformers and vLLM. Our results highlight CryptoTensors as a light-weight, efficient, and developer-friendly solution for safeguarding LLM weights in real-world and widespread deployments.

[236]  arXiv:2512.04581 [pdf, ps, other]
Title: Infrared UAV Target Tracking with Dynamic Feature Refinement and Global Contextual Attention Knowledge Distillation
Comments: Accepted by IEEE TMM
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Unmanned aerial vehicle (UAV) target tracking based on thermal infrared imaging has been one of the most important sensing technologies in anti-UAV applications. However, the infrared UAV targets often exhibit weak features and complex backgrounds, posing significant challenges to accurate tracking. To address these problems, we introduce SiamDFF, a novel dynamic feature fusion Siamese network that integrates feature enhancement and global contextual attention knowledge distillation for infrared UAV target (IRUT) tracking. The SiamDFF incorporates a selective target enhancement network (STEN), a dynamic spatial feature aggregation module (DSFAM), and a dynamic channel feature aggregation module (DCFAM). The STEN employs intensity-aware multi-head cross-attention to adaptively enhance important regions for both template and search branches. The DSFAM enhances multi-scale UAV target features by integrating local details with global features, utilizing spatial attention guidance within the search frame. The DCFAM effectively integrates the mixed template generated from STEN in the template branch and original template, avoiding excessive background interference with the template and thereby enhancing the emphasis on UAV target region features within the search frame. Furthermore, to enhance the feature extraction capabilities of the network for IRUT without adding extra computational burden, we propose a novel tracking-specific target-aware contextual attention knowledge distiller. It transfers the target prior from the teacher network to the student model, significantly improving the student network's focus on informative regions at each hierarchical level of the backbone network. Extensive experiments on real infrared UAV datasets demonstrate that the proposed approach outperforms state-of-the-art target trackers under complex backgrounds while achieving a real-time tracking speed.

[237]  arXiv:2512.04585 [pdf, ps, other]
Title: SAM3-I: Segment Anything with Instructions
Comments: Preliminary results; work in progress
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Segment Anything Model 3 (SAM3) has advanced open-vocabulary segmentation through promptable concept segmentation, allowing users to segment all instances corresponding to a given concept, typically specified with short noun-phrase (NP) prompts. While this marks the first integration of language-level concepts within the SAM family, real-world usage typically requires far richer expressions that include attributes, spatial relations, functionalities, actions, states, and even implicit reasoning over instances. Currently, SAM3 relies on external multi-modal agents to convert complex instructions into NPs and then conduct iterative mask filtering. However, these NP-level concepts remain overly coarse, often failing to precisely represent a specific instance. In this work, we present SAM3-I, an enhanced framework that unifies concept-level understanding and instruction-level reasoning within the SAM family. SAM3-I introduces an instruction-aware cascaded adaptation mechanism that progressively aligns expressive instruction semantics with SAM3's existing vision-language representations, enabling direct instruction-following segmentation without sacrificing its original concept-driven capabilities. Furthermore, we design a structured instruction taxonomy spanning concept, simple, and complex levels, and develop a scalable data engine to construct a dataset with diverse instruction-mask pairs. Experiments show that SAM3-I delivers appealing performance, demonstrating that SAM3 can be effectively extended to follow natural-language instructions while preserving its strong concept grounding. We open-source SAM3-I and provide practical fine-tuning workflows, enabling researchers to adapt it to domain-specific applications. The source code is available here.

[238]  arXiv:2512.04588 [pdf, ps, other]
Title: UserSimCRS v2: Simulation-Based Evaluation for Conversational Recommender Systems
Subjects: Information Retrieval (cs.IR)

Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade aligning the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, introduction of large language model-based simulators, integration for a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.

[239]  arXiv:2512.04590 [pdf, ps, other]
Title: Exploiting \texttt{ftrace}'s \texttt{function\_graph} Tracer Features for Machine Learning: A Case Study on Encryption Detection
Comments: Conference paper presented at AICCSA 2025
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

This paper proposes using the Linux kernel ftrace framework, particularly the function graph tracer, to generate informative system level data for machine learning (ML) applications. Experiments on a real world encryption detection task demonstrate the efficacy of the proposed features across several learning algorithms. The learner faces the problem of detecting encryption activities across a large dataset of files, using function call traces and graph based features. Empirical results highlight an outstanding accuracy of 99.28 on the task at hand, underscoring the efficacy of features derived from the function graph tracer. The results were further validated in an additional experiment targeting a multilabel classification problem, in which running programs were identified from trace data. This work provides comprehensive methodologies for preprocessing raw trace data and extracting graph based features, offering significant advancements in applying ML to system behavior analysis, program identification, and anomaly detection. By bridging the gap between system tracing and ML, this paper paves the way for innovative solutions in performance monitoring and security analytics.

[240]  arXiv:2512.04592 [pdf, ps, other]
Title: Stable self-adaptive timestepping for Reduced Order Models for incompressible flows
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)

This work introduces RedEigCD, the first self-adaptive timestepping technique specifically tailored for reduced-order models (ROMs) of the incompressible Navier-Stokes equations. Building upon linear stability concepts, the method adapts the timestep by directly bounding the stability function of the employed time integration scheme using exact spectral information of matrices related to the reduced operators. Unlike traditional error-based adaptive methods, RedEigCD relies on the eigenbounds of the convective and diffusive ROM operators, whose computation is feasible at reduced scale and fully preserves the online efficiency of the ROM. A central theoretical contribution of this work is the proof, based on the combined theorems of Bendixson and Rao, that, under linearized assumptions, the maximum stable timestep for projection-based ROMs is shown to be larger than or equal to that of their corresponding full-order models (FOMs). Numerical experiments for both periodic and non-homogeneous boundary conditions demonstrate that RedEigCD yields stable timestep increases up to a factor 40 compared to the FOM, without compromising accuracy. The methodology thus establishes a new link between linear stability theory and reduced-order modeling, offering a systematic path towards efficient, self-regulating ROM integration in incompressible flow simulations.

[241]  arXiv:2512.04596 [pdf, ps, other]
Title: QoSDiff: An Implicit Topological Embedding Learning Framework Leveraging Denoising Diffusion and Adversarial Attention for Robust QoS Prediction
Comments: Preprint submitted to IEEE Transactions on Services Computing
Subjects: Machine Learning (cs.LG)

Accurate Quality of Service (QoS) prediction is fundamental to service computing, providing essential data-driven guidance for service selection and ensuring superior user experiences. However, prevalent approaches, particularly Graph Neural Networks (GNNs), heavily rely on constructing explicit user--service interaction graphs. This dependency introduces severe scalability bottlenecks and limits performance when explicit connections are sparse or corrupted by noise. To address these challenges, this paper introduces \emph{QoSDiff}, a novel embedding learning framework that bypasses the prerequisite of explicit graph construction. Specifically, it leverages a denoising diffusion probabilistic model to recover intrinsic latent structures from noisy initializations. To further capture high-order interactions, we propose an adversarial interaction module that integrates a bidirectional hybrid attention mechanism. This adversarial paradigm dynamically distinguishes informative patterns from noise, enabling a dual-perspective modeling of intricate user--service associations. Extensive experiments on two large-scale real-world datasets demonstrate that QoSDiff significantly outperforms state-of-the-art baselines. Notably, the results highlight the framework's superior cross-dataset generalization capability and exceptional robustness against data sparsity and observational noise.

[242]  arXiv:2512.04597 [pdf, ps, other]
Title: When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Embodied Question Answering (EQA) requires an agent to interpret language, perceive its environment, and navigate within 3D scenes to produce responses. Existing EQA benchmarks assume that every question must be answered, but embodied agents should know when they do not have sufficient information to answer. In this work, we focus on a minimal requirement for EQA agents, abstention: knowing when to withhold an answer. From an initial study of 500 human queries, we find that 32.4% contain missing or underspecified context. Drawing on this initial study and cognitive theories of human communication errors, we derive five representative categories requiring abstention: actionability limitation, referential underspecification, preference dependence, information unavailability, and false presupposition. We augment OpenEQA by having annotators transform well-posed questions into ambiguous variants outlined by these categories. The resulting dataset, AbstainEQA, comprises 1,636 annotated abstention cases paired with 1,636 original OpenEQA instances for balanced evaluation. Evaluating on AbstainEQA, we find that even the best frontier model only attains 42.79% abstention recall, while humans achieve 91.17%. We also find that scaling, prompting, and reasoning only yield marginal gains, and that fine-tuned models overfit to textual cues. Together, these results position abstention as a fundamental prerequisite for reliable interaction in embodied settings and as a necessary basis for effective clarification.

[243]  arXiv:2512.04598 [pdf, ps, other]
Title: The Ethics of Generative AI
Authors: Michael Klenk
Comments: Draft version to appear as a chapter in the Encyclopedia of Applied Ethics, 3rd Edition, edited by Ruth Chadwick
Subjects: Artificial Intelligence (cs.AI)

This chapter discusses the ethics of generative AI. It provides a technical primer to show how generative AI affords experiencing technology as if it were human, and this affordance provides a fruitful focus for the philosophical ethics of generative AI. It then shows how generative AI can both aggravate and alleviate familiar ethical concerns in AI ethics, including responsibility, privacy, bias and fairness, and forms of alienation and exploitation. Finally, the chapter examines ethical questions that arise specifically from generative AI's mimetic generativity, such as debates about authorship and credit, the emergence of as-if social relationships with machines, and new forms of influence, persuasion, and manipulation.

[244]  arXiv:2512.04599 [pdf, ps, other]
Title: Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Detecting illicit visual content demands more than image-level NSFW flags; moderators must also know what objects make an image illegal and where those objects occur. We introduce a zero-shot pipeline that simultaneously (i) detects if an image contains harmful content, (ii) identifies each critical element involved, and (iii) localizes those elements with pixel-accurate masks - all in one pass. The system first applies foundation segmentation model (SAM) to generate candidate object masks and refines them into larger independent regions. Each region is scored for malicious relevance by a vision-language model using open-vocabulary prompts; these scores weight a fusion step that produces a consolidated malicious object map. An ensemble across multiple segmenters hardens the pipeline against adaptive attacks that target any single segmentation method. Evaluated on a newly-annotated 790-image dataset spanning drug, sexual, violent and extremist content, our method attains 85.8% element-level recall, 78.1% precision and a 92.1% segment-success rate - exceeding direct zero-shot VLM localization by 27.4% recall at comparable precision. Against PGD adversarial perturbations crafted to break SAM and VLM, our method's precision and recall decreased by no more than 10%, demonstrating high robustness against attacks. The full pipeline processes an image in seconds, plugs seamlessly into existing VLM workflows, and constitutes the first practical tool for fine-grained, explainable malicious-image moderation.

[245]  arXiv:2512.04601 [pdf, ps, other]
Title: Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
Comments: 22 pages, 4 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Large language model (LLM) agents -- LLMs that dynamically interact with an environment over long horizons -- have become an increasingly important area of research, enabling automation in complex tasks involving tool-use, web browsing, and dialogue with people. In the absence of expert demonstrations, training LLM agents has relied on policy gradient methods that optimize LLM policies with respect to an (often sparse) reward function. However, in long-horizon tasks with sparse rewards, learning from trajectory-level rewards can be noisy, leading to training that is unstable and has high sample complexity. Furthermore, policy improvement hinges on discovering better actions through exploration, which can be difficult when actions lie in natural language space. In this paper, we propose Natural Language Actor-Critic (NLAC), a novel actor-critic algorithm that trains LLM policies using a generative LLM critic that produces natural language rather than scalar values. This approach leverages the inherent strengths of LLMs to provide a richer and more actionable training signal; particularly, in tasks with large, open-ended action spaces, natural language explanations for why an action is suboptimal can be immensely useful for LLM policies to reason how to improve their actions, without relying on random exploration. Furthermore, our approach can be trained off-policy without policy gradients, offering a more data-efficient and stable alternative to existing on-policy methods. We present results on a mixture of reasoning, web browsing, and tool-use with dialogue tasks, demonstrating that NLAC shows promise in outperforming existing training approaches and offers a more scalable and stable training paradigm for LLM agents.

[246]  arXiv:2512.04609 [pdf, ps, other]
Title: Strategies for zero boil-off liquid hydrogen transfer: an export terminal case-study
Comments: 9 pages, 9 figures
Subjects: Systems and Control (eess.SY)

To ensure economic viability, LH2 export terminals must minimize boil-off losses. We show two strategies to achieve zero boil-off losses for the transfer of 160 000 m3 LH2 (11 248 tons) using a centrifugal pump. In the first strategy, a pump with variable speed drive (VSD) and split-range control for the flow rate achieves losses from 0 wt% to 0.24 wt% in an uncertainty analysis. A pump efficiency approaching 70% is the most important factor to minimize losses. In contrast, a fixed-speed pump has unacceptably high losses ranging from 0.76 wt% to 1.06 wt% (119 tons per ship). The second strategy is to increase the maximum pressure in the seaborne tank (base case is 1.15 bara). Zero loss is achieved for the fixed speed pump if the maximum pressure is increased to 1.35 bara, while 1.22 bara is required for the pump with VSD assuming an efficiency of 60%.

[247]  arXiv:2512.04610 [pdf, ps, other]
Title: Weakly-sparse and strongly flip-flat classes of graphs are uniformly almost-wide
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

In this work we take a step towards characterising strongly flip-flat classes of graphs. Strong flip-flatness appears to be the analogue of uniform almost-wideness in the setting of dense classes of graphs. We prove that strongly flip-flat classes of graphs that are weakly sparse are indeed uniformly almost-wide.

[248]  arXiv:2512.04611 [pdf, ps, other]
Title: PBFuzz: Agentic Directed Fuzzing for PoV Generation
Comments: 24 pages, 8 figures
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Proof-of-Vulnerability (PoV) input generation is a critical task in software security and supports downstream applications such as path generation and validation. Generating a PoV input requires solving two sets of constraints: (1) reachability constraints for reaching vulnerable code locations, and (2) triggering constraints for activating the target vulnerability. Existing approaches, including directed greybox fuzzing and LLM-assisted fuzzing, struggle to efficiently satisfy these constraints. This work presents an agentic method that mimics human experts. Human analysts iteratively study code to extract semantic reachability and triggering constraints, form hypotheses about PoV triggering strategies, encode them as test inputs, and refine their understanding using debugging feedback. We automate this process with an agentic directed fuzzing framework called PBFuzz. PBFuzz tackles four challenges in agentic PoV generation: autonomous code reasoning for semantic constraint extraction, custom program-analysis tools for targeted inference, persistent memory to avoid hypothesis drift, and property-based testing for efficient constraint solving while preserving input structure. Experiments on the Magma benchmark show strong results. PBFuzz triggered 57 vulnerabilities, surpassing all baselines, and uniquely triggered 17 vulnerabilities not exposed by existing fuzzers. PBFuzz achieved this within a 30-minute budget per target, while conventional approaches use 24 hours. Median time-to-exposure was 339 seconds for PBFuzz versus 8680 seconds for AFL++ with CmpLog, giving a 25.6x efficiency improvement with an API cost of 1.83 USD per vulnerability.

[249]  arXiv:2512.04614 [pdf, ps, other]
Title: On Tight FPT Time Approximation Algorithms for k-Clustering Problems
Subjects: Data Structures and Algorithms (cs.DS)

Following recent advances in combining approximation algorithms with fixed-parameter tractability (FPT), we study FPT-time approximation algorithms for minimum-norm $k$-clustering problems, parameterized by the number $k$ of open facilities.
For the capacitated setting, we give a tight $(3+\epsilon)$-approximation for the general-norm capacitated $k$-clustering problem in FPT-time parameterized by $k$ and $\epsilon$. Prior to our work, such a result was only known for the capacitated $k$-median problem [CL, ICALP, 2019]. As a special case, our result yields an FPT-time $3$-approximation for capacitated $k$-center. The problem has not been studied in the FPT-time setting, with the previous best known polynomial-time approximation ratio being 9 [ABCG, MP, 2015].
In the uncapacitated setting, we consider the $top$-$cn$ norm $k$-clustering problem, where the goal of the problem is to minimize the $top$-$cn$ norm of the connection distance vector. Our main result is a tight $\big(1 + \frac 2{ec} + \epsilon\big)$-approximation algorithm for the problem with $c \in \big(\frac1e, 1\big]$. (For the case $c \leq \frac1e$, there is a simple tight $(3+\epsilon)$-approximation.) Our framework can be easily extended to give a tight $\left(3, 1+\frac2e + \epsilon\right)$-bicriteria approximation for the ($k$-center, $k$-median) problem in FPT time, improving the previous best polynomial-time $(4, 8)$ guarantee [AB, WAOA, 2017].
All results are based on a unified framework: computing a $(1+\epsilon)$-approximate solution using $O\left(\frac{k\log n}{\epsilon}\right)$ facilities $S$ via LP rounding, sampling a few client representatives $R$ based on the solution $S$, guessing a few pivots from $S \cup R$ and some radius information on the pivots, and solving the problem using the guesses. We believe this framework can lead to further results on $k$-clustering problems.

[250]  arXiv:2512.04616 [pdf, ps, other]
Title: Standard audiogram classification from loudness scaling data using unsupervised, supervised, and explainable machine learning techniques
Subjects: Sound (cs.SD); Medical Physics (physics.med-ph)

To address the calibration and procedural challenges inherent in remote audiogram assessment for rehabilitative audiology, this study investigated whether calibration-independent adaptive categorical loudness scaling (ACALOS) data can be used to approximate individual audiograms by classifying listeners into standard Bisgaard audiogram types using machine learning. Three classes of machine learning approaches - unsupervised, supervised, and explainable - were evaluated. Principal component analysis (PCA) was performed to extract the first two principal components, which together explained more than 50 percent of the variance. Seven supervised multi-class classifiers were trained and compared, alongside unsupervised and explainable methods. Model development and evaluation used a large auditory reference database containing ACALOS data (N = 847). The PCA factor map showed substantial overlap between listeners, indicating that cleanly separating participants into six Bisgaard classes based solely on their loudness patterns is challenging. Nevertheless, the models demonstrated reasonable classification performance, with logistic regression achieving the highest accuracy among supervised approaches. These findings demonstrate that machine learning models can predict standard Bisgaard audiogram types, within certain limits, from calibration-independent loudness perception data, supporting potential applications in remote or resource-limited settings without requiring a traditional audiogram.

[251]  arXiv:2512.04617 [pdf, ps, other]
Title: Score Matching for Estimating Finite Point Processes
Subjects: Machine Learning (cs.LG)

Score matching estimators have garnered significant attention in recent years because they eliminate the need to compute normalizing constants, thereby mitigating the computational challenges associated with maximum likelihood estimation (MLE).While several studies have proposed score matching estimators for point processes, this work highlights the limitations of these existing methods, which stem primarily from the lack of a mathematically rigorous analysis of how score matching behaves on finite point processes -- special random configurations on bounded spaces where many of the usual assumptions and properties of score matching no longer hold. To this end, we develop a formal framework for score matching on finite point processes via Janossy measures and, within this framework, introduce an (autoregressive) weighted score-matching estimator, whose statistical properties we analyze in classical parametric settings. For general nonparametric (e.g., deep) point process models, we show that score matching alone does not uniquely identify the ground-truth distribution due to subtle normalization issues, and we propose a simple survival-classification augmentation that yields a complete, integration-free training objective for any intensity-based point process model for spatio-temporal case. Experiments on synthetic and real-world temporal and spatio-temporal datasets, demonstrate that our method accurately recovers intensities and achieves performance comparable to MLE with better efficiency.

[252]  arXiv:2512.04618 [pdf, ps, other]
Title: Neural Decoding of Overt Speech from ECoG Using Vision Transformers and Contrastive Representation Learning
Subjects: Artificial Intelligence (cs.AI)

Speech Brain Computer Interfaces (BCIs) offer promising solutions to people with severe paralysis unable to communicate. A number of recent studies have demonstrated convincing reconstruction of intelligible speech from surface electrocorticographic (ECoG) or intracortical recordings by predicting a series of phonemes or words and using downstream language models to obtain meaningful sentences. A current challenge is to reconstruct speech in a streaming mode by directly regressing cortical signals into acoustic speech. While this has been achieved recently using intracortical data, further work is needed to obtain comparable results with surface ECoG recordings. In particular, optimizing neural decoders becomes critical in this case. Here we present an offline speech decoding pipeline based on an encoder-decoder deep neural architecture, integrating Vision Transformers and contrastive learning to enhance the direct regression of speech from ECoG signals. The approach is evaluated on two datasets, one obtained with clinical subdural electrodes in an epileptic patient, and another obtained with the fully implantable WIMAGINE epidural system in a participant of a motor BCI trial. To our knowledge this presents a first attempt to decode speech from a fully implantable and wireless epidural recording system offering perspectives for long-term use.

[253]  arXiv:2512.04619 [pdf, ps, other]
Title: Denoise to Track: Harnessing Video Diffusion Priors for Robust Correspondence
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this work, we introduce HeFT (Head-Frequency Tracker), a zero-shot point tracking framework that leverages the visual priors of pretrained video diffusion models. To better understand how they encode spatiotemporal information, we analyze the internal representations of Video Diffusion Transformer (VDiT). Our analysis reveals that attention heads act as minimal functional units with distinct specializations for matching, semantic understanding, and positional encoding. Additionally, we find that the low-frequency components in VDiT features are crucial for establishing correspondences, whereas the high-frequency components tend to introduce noise. Building on these insights, we propose a head- and frequency-aware feature selection strategy that jointly selects the most informative attention head and low-frequency components to enhance tracking performance. Specifically, our method extracts discriminative features through single-step denoising, applies feature selection, and employs soft-argmax localization with forward-backward consistency checks for correspondence estimation. Extensive experiments on TAP-Vid benchmarks demonstrate that HeFT achieves state-of-the-art zero-shot tracking performance, approaching the accuracy of supervised methods while eliminating the need for annotated training data. Our work further underscores the promise of video diffusion models as powerful foundation models for a wide range of downstream tasks, paving the way toward unified visual foundation models.

[254]  arXiv:2512.04625 [pdf, ps, other]
Title: Rethinking Decoupled Knowledge Distillation: A Predictive Distribution Perspective
Comments: Accepted to IEEE TNNLS
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In the history of knowledge distillation, the focus has once shifted over time from logit-based to feature-based approaches. However, this transition has been revisited with the advent of Decoupled Knowledge Distillation (DKD), which re-emphasizes the importance of logit knowledge through advanced decoupling and weighting strategies. While DKD marks a significant advancement, its underlying mechanisms merit deeper exploration. As a response, we rethink DKD from a predictive distribution perspective. First, we introduce an enhanced version, the Generalized Decoupled Knowledge Distillation (GDKD) loss, which offers a more versatile method for decoupling logits. Then we pay particular attention to the teacher model's predictive distribution and its impact on the gradients of GDKD loss, uncovering two critical insights often overlooked: (1) the partitioning by the top logit considerably improves the interrelationship of non-top logits, and (2) amplifying the focus on the distillation loss of non-top logits enhances the knowledge extraction among them. Utilizing these insights, we further propose a streamlined GDKD algorithm with an efficient partition strategy to handle the multimodality of teacher models' predictive distribution. Our comprehensive experiments conducted on a variety of benchmarks, including CIFAR-100, ImageNet, Tiny-ImageNet, CUB-200-2011, and Cityscapes, demonstrate GDKD's superior performance over both the original DKD and other leading knowledge distillation methods. The code is available at https://github.com/ZaberKo/GDKD.

[255]  arXiv:2512.04629 [pdf, ps, other]
Title: BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation
Subjects: Artificial Intelligence (cs.AI)

Molecules play a crucial role in biomedical research and discovery, particularly in the field of small molecule drug development. Given the rapid advancements in large language models, especially the recent emergence of reasoning models, it is natural to explore how a general-purpose language model can be efficiently adapted for molecular science applications. In this work, we introduce BioMedGPT-Mol, a molecular language model designed to support molecular understanding and generation tasks. By curating and unifying existing public instruction datasets, we have assembled a large-scale, comprehensive, and high-quality training dataset. The model is then fine-tuned through a meticulously designed multi-task learning framework. On a consolidated benchmark derived from LlaSMol, TOMG-Bench, and MuMOInstruct, BioMedGPT-Mol achieves remarkable performance. Our experimental results demonstrate that a general-purpose reasoning model can be effectively and efficiently post-trained into a professional molecular language model through a well-structured multi-task curriculum. Leveraging the power of it, we further explore retrosynthetic planning task, and the performance on RetroBench demonstrates its competitive capability of acting as an end-to-end retrosynthetic planner. We anticipate that our approach can be extended to other biomedical scientific domains.

[256]  arXiv:2512.04630 [pdf, ps, other]
Title: Reflection-Satisfaction Tradeoff: Investigating Impact of Reflection on Student Engagement with AI-Generated Programming Hints
Comments: Preprint
Subjects: Computers and Society (cs.CY)

Generative AI tools, such as AI-generated hints, are increasingly integrated into programming education to offer timely, personalized support. However, little is known about how to effectively leverage these hints while ensuring autonomous and meaningful learning. One promising approach involves pairing AI-generated hints with reflection prompts, asking students to review and analyze their learning, when they request hints. This study investigates the interplay between AI-generated hints and different designs of reflection prompts in an online introductory programming course. We conducted a two-trial field experiment. In Trial 1, students were randomly assigned to receive prompts either before or after receiving hints, or no prompt at all. Each prompt also targeted one of three SRL phases: planning, monitoring, and evaluation. In Trial 2, we examined two types of prompt guidance: directed (offering more explicit and structured guidance) and open (offering more general and less constrained guidance). Findings show that students in the before-hint (RQ1), planning (RQ2), and directed (RQ3) prompt groups produced higher-quality reflections but reported lower satisfaction with AI-generated hints than those in other conditions. Immediate performance did not differ across conditions. This negative relationship between reflection quality and hint satisfaction aligns with previous work on student mental effort and satisfaction. Our results highlight the need to reconsider how AI models are trained and evaluated for education, as prioritizing user satisfaction can undermine deeper learning.

[257]  arXiv:2512.04632 [pdf, other]
Title: Turbo-Muon: Accelerating Orthogonality-Based Optimization with Pre-Conditioning
Authors: Thibaut Boissin (IRIT-MISFIT), Thomas Massena (DTIPG - SNCF, IRIT-MISFIT), Franck Mamalet, Mathieu Serrurier (IRIT-MISFIT)
Subjects: Artificial Intelligence (cs.AI)

Orthogonality-based optimizers, such as Muon, have recently shown strong performance across large-scale training and community-driven efficiency challenges. However, these methods rely on a costly gradient orthogonalization step. Even efficient iterative approximations such as Newton-Schulz remain expensive, typically requiring dozens of matrix multiplications to converge. We introduce a preconditioning procedure that accelerates Newton-Schulz convergence and reduces its computational cost. We evaluate its impact and show that the overhead of our preconditioning can be made negligible. Furthermore, the faster convergence it enables allows us to remove one iteration out of the usual five without degrading approximation quality. Our publicly available implementation achieves up to a 2.8x speedup in the Newton-Schulz approximation. We also show that this has a direct impact on end-to-end training runtime with 5-10% improvement in realistic training scenarios across two efficiency-focused tasks. On challenging language or vision tasks, we validate that our method maintains equal or superior model performance while improving runtime. Crucially, these improvements require no hyperparameter tuning and can be adopted as a simple drop-in replacement. Our code is publicly available on github.

[258]  arXiv:2512.04634 [pdf, ps, other]
Title: Interface layers and coupling conditions for discrete kinetic models on networks: a spectral approac
Subjects: Numerical Analysis (math.NA)

We consider kinetic and related macroscopic equations on networks. A class of linear kinetic BGK models is considered, where the limit equation for small Knudsen numbers is given by the wave equation. Coupling conditions for the macroscopic equations are obtained from the kinetic coupling conditions via an asymptotic analysis near the nodes of the network and the consideration of coupled solutions of kinetic half-space problems. Analytical results are obtained for a discrete velocity version of the coupled half-space problems. Moreover, an efficient spectral method is developed to solve the coupled discrete velocity half-space problems. In particular, this allows to determine the relevant coefficients in the coupling conditions for the macroscopic equations
from the underlying kinetic network problem. These coefficients correspond to the so-called extrapolation length for kinetic boundary value problems. Numerical results show the accuracy and fast convergence of the approach. Moreover, a comparison of the kinetic solution on the network with the macroscopic solution is presented.

[259]  arXiv:2512.04635 [pdf, ps, other]
Title: Federated Learning for Anomaly Detection in Maritime Movement Data
Comments: Accepted at MDM2024
Subjects: Machine Learning (cs.LG)

This paper introduces M3fed, a novel solution for federated learning of movement anomaly detection models. This innovation has the potential to improve data privacy and reduce communication costs in machine learning for movement anomaly detection. We present the novel federated learning (FL) strategies employed to train M3fed, perform an example experiment with maritime AIS data, and evaluate the results with respect to communication costs and FL model quality by comparing classic centralized M3 and the new federated M3fed.

[260]  arXiv:2512.04639 [pdf, ps, other]
Title: When GenAI Meets Fake News: Understanding Image Cascade Dynamics on Reddit
Comments: Accepted at 2025 MIT Undergraduate Research Technology Conference (URTC'25)
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

AI-generated content and misinformation are increasingly prevalent on social networks. While prior research primarily examined textual misinformation, fewer studies have focused on visual content's role in virality. In this work, we present the first large-scale analysis of how misinformation and AI-generated images propagate through repost cascades across five ideologically diverse Reddit communities. By integrating textual sentiment, visual attributes, and diffusion metrics (e.g., time-to-first repost, community reach), our framework accurately predicts both immediate post-level virality (AUC=0.83) and long-term cascade-level spread (AUC=0.998). These findings offer essential insights for moderating synthetic and misleading visual content online.

[261]  arXiv:2512.04643 [pdf, ps, other]
Title: SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Video Large Language Models (VideoLLMs) have shown remarkable progress in video understanding. However, these models still struggle to effectively perceive and exploit rich temporal information in videos when responding to user queries. Therefore, they often generate descriptions of events that are temporal inconsistent or causally implausible, causing severe hallucination issues. While most prior studies have focused on spatial hallucinations (e.g. object mismatches), temporal reasoning in video understanding remains relatively underexplored. To address this issue, we propose Self-Diagnostic Contrastive Decoding (SEASON), a training-free method that adaptively enhances temporal and spatial faithfulness for each output token. It achieves this by dynamically diagnosing each token's hallucination tendency and applying adaptive contrastive decoding against its corresponding temporal and spatial negatives. Extensive experiments demonstrate that SEASON outperforms all existing training-free hallucination mitigation approaches on three hallucination examination benchmarks, while further improves VideoLLMs across four general video understanding benchmarks. The code will be released upon acceptance.

[262]  arXiv:2512.04644 [pdf, ps, other]
Title: Contract-Governed Training for Earth Observation: Observed Service Agreement Graphs and Coverage-Accuracy Trade-offs
Authors: Wenzhang Du
Comments: 9 pages, 2 figures
Subjects: Machine Learning (cs.LG)

Earth observation (EO) models are frequently trained under implicit sampling policies that optimize global accuracy but provide no explicit guarantees on who (which regions, classes, or mission-critical strata) is being served throughout training. This paper introduces a contract-governed training paradigm for EO in which training samples are grouped into service contracts -- semantically meaningful units such as (dataset, region, rare-crop indicator) -- and each contract is assigned a target service share. We instantiate this paradigm as an Observed Service Agreement Graph (OSAG), a lightweight governance layer that (i) monitors contract-level exposure (coverage) during optimization, (ii) drives empirical coverage toward target shares via contract-normalized sampling weights, and (iii) exposes explicit accuracy-governance trade-offs through two knobs: a sampling mixture coefficient alpha and a contract-regularization weight lambda_C. We provide a compact theory in a toy setting: OSAG sampling concentrates empirical coverage to targets; coverage deviations upper-bound service-risk deviations; and contract design (coarse vs. fine) modulates governance cost. Experiments on AVIRIS hyperspectral scenes (Indian Pines plus Salinas) and multispectral Sentinel-2 EuroSAT demonstrate that OSAG can substantially reduce priority coverage error while maintaining global accuracy and improving high-priority accuracy. A EuroSAT coarse-vs-fine contract ablation further evidences how semantically refined contracts can reduce the accuracy cost per unit of governance improvement.

[263]  arXiv:2512.04647 [pdf, ps, other]
Title: Auto-Optimization with Active Learning in Uncertain Environment: A Predictive Control Approach
Subjects: Systems and Control (eess.SY)

This paper presents an auto-optimal model predictive control (MPC) framework enhanced with active learning, designed to autonomously track optimal operational conditions in an unknown environment,where the conditions may dynamically adjust to environmental changes. First, an exploitation-oriented MPC (EO-MPC) is proposed, integrating real-time sampling data with robust set-based parameter estimation techniques to address the critical challenge of parameter identification. By introducing virtual excitation signals into the terminal constraint and establishing a validation mechanism for persistent excitation condition, the EO-MPC effectively resolves the issue of insufficient persistent excitation in parameter identification. Building upon this foundation, an active learning MPC (AL-MPC) approach is developed to integrate both available and virtual future data to resolve the fundamental conflict between tracking an unknown optimal operational condition and parameter identification. The recursive feasibility and convergence of the proposed methods are rigorously established, and numerous examples substantiate the reliability and effectiveness of the approach in practical applications.

[264]  arXiv:2512.04652 [pdf, ps, other]
Title: Quantised Academic Mobility: Network and Cluster Analysis of Degree Switching, Plan Changes, and Re-entries in an Engineering Faculty (1980-2019)
Authors: H. R. Paz
Comments: 23 pages, 4 tables , 5 figures
Subjects: Computers and Society (cs.CY)

This study challenges the traditional binary view of student progression (retention versus dropout) by conceptualising academic trajectories as complex, quantised pathways. Utilising a 40-year longitudinal dataset from an Argentine engineering faculty (N = 24,016), we introduce CAPIRE, an analytical framework that differentiates between degree major switches, curriculum plan changes, and same-plan re-entries. While 73.3 per cent of students follow linear trajectories (Estables), a significant 26.7 per cent exhibit complex mobility patterns. By applying Principal Component Analysis (PCA) and DBSCAN clustering, we reveal that these trajectories are not continuous but structurally quantised, occupying discrete bands of complexity. The analysis identifies six distinct student archetypes, including 'Switchers' (10.7 per cent) who reorient vocationally, and 'Stable Re-entrants' (6.9 per cent) who exhibit stop-out behaviours without changing discipline. Furthermore, network analysis highlights specific 'hub majors' - such as electronics and computing - that act as systemic attractors. These findings suggest that student flux is an organised ecosystemic feature rather than random noise, offering institutions a new lens for curriculum analytics and predictive modelling.

[265]  arXiv:2512.04653 [pdf, ps, other]
Title: Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control
Comments: Co-first authors: Pouria Yazdani and Arash Rezaali
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Multi-agent reinforcement learning (MARL) has emerged as a promising paradigm for adaptive traffic signal control (ATSC) of multiple intersections. Existing approaches typically follow either a fully centralized or a fully decentralized design. Fully centralized approaches suffer from the curse of dimensionality, and reliance on a single learning server, whereas purely decentralized approaches operate under severe partial observability and lack explicit coordination resulting in suboptimal performance. These limitations motivate region-based MARL, where the network is partitioned into smaller, tightly coupled intersections that form regions, and training is organized around these regions. This paper introduces a Semi-Centralized Training, Decentralized Execution (SEMI-CTDE) architecture for multi intersection ATSC. Within each region, SEMI-CTDE performs centralized training with regional parameter sharing and employs composite state and reward formulations that jointly encode local and regional information. The architecture is highly transferable across different policy backbones and state-reward instantiations. Building on this architecture, we implement two models with distinct design objectives. A multi-perspective experimental analysis of the two implemented SEMI-CTDE-based models covering ablations of the architecture's core elements including rule based and fully decentralized baselines shows that they achieve consistently superior performance and remain effective across a wide range of traffic densities and distributions.

[266]  arXiv:2512.04660 [pdf, ps, other]
Title: I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image editing models are advancing rapidly, yet comprehensive evaluation remains a significant challenge. Existing image editing benchmarks generally suffer from limited task scopes, insufficient evaluation dimensions, and heavy reliance on manual annotations, which significantly constrain their scalability and practical applicability. To address this, we propose \textbf{I2I-Bench}, a comprehensive benchmark for image-to-image editing models, which features (i) diverse tasks, encompassing 10 task categories across both single-image and multi-image editing tasks, (ii) comprehensive evaluation dimensions, including 30 decoupled and fine-grained evaluation dimensions with automated hybrid evaluation methods that integrate specialized tools and large multimodal models (LMMs), and (iii) rigorous alignment validation, justifying the consistency between our benchmark evaluations and human preferences. Using I2I-Bench, we benchmark numerous mainstream image editing models, investigating the gaps and trade-offs between editing models across various dimensions. We will open-source all components of I2I-Bench to facilitate future research.

[267]  arXiv:2512.04668 [pdf, ps, other]
Title: Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
Comments: Under review at ACL Rolling Review
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a framework that measures how network structure shapes leakage. MAMA operates on synthetic documents containing labeled Personally Identifiable Information (PII) entities, from which we generate sanitized task instructions. We execute a two-phase protocol: Engram (seeding private information into a target agent's memory) and Resonance (multi-round interaction where an attacker attempts extraction). Over up to 10 interaction rounds, we quantify leakage as the fraction of ground-truth PII recovered from attacking agent outputs via exact matching. We systematically evaluate six common network topologies (fully connected, ring, chain, binary tree, star, and star-ring), varying agent counts $n\in\{4,5,6\}$, attacker-target placements, and base models. Our findings reveal consistent patterns: fully connected graphs exhibit maximum leakage while chains provide strongest protection; shorter attacker-target graph distance and higher target centrality significantly increase vulnerability; leakage rises sharply in early rounds before plateauing; model choice shifts absolute leakage rates but preserves topology rankings; temporal/locational PII attributes leak more readily than identity credentials or regulated identifiers. These results provide the first systematic mapping from architectural choices to measurable privacy risk, yielding actionable guidance: prefer sparse or hierarchical connectivity, maximize attacker-target separation, limit node degree and network radius, avoid shortcuts bypassing hubs, and implement topology-aware access controls.

[268]  arXiv:2512.04673 [pdf, ps, other]
Title: Cross-Task Benchmarking and Evaluation of General-Purpose and Code-Specific Large Language Models
Subjects: Software Engineering (cs.SE)

Large Language Models (LLMs) have revolutionized both general natural language processing and domain-specific applications such as code synthesis, legal reasoning, and finance. However, while prior studies have explored individual model capabilities, a systematic cross-domain comparison that unifies linguistic, reasoning, and code understanding abilities remains underexplored. In this work, we present a comprehensive evaluation of five general-purpose and three code-specific state-of-the-art LLMs across six diverse benchmarks encompassing linguistic competence, mathematical reasoning, and trustworthiness. Additionally, we analyze model behavior on the CoNaLa dataset for code explanation, comparing natural language and code-specialized LLMs. Our findings reveal that models optimized for code (e.g., CodeLLaMA variants) exhibit strong reasoning and syntactic precision, that even for non-coding tasks can show measurable performance gains, in contrast to general-purpose models like Mistral-7B and Llama-3-8B.

[269]  arXiv:2512.04675 [pdf, ps, other]
Title: Cryptanalysis of Gleeok-128
Comments: 44 pages, 5 figures
Subjects: Cryptography and Security (cs.CR)

Gleeok is a family of low latency keyed pseudorandom functions (PRFs) consisting of three parallel SPN based permutations whose outputs are XORed to form the final value. Both Gleeok-128 and Gleeok-256 use a 256 bit key, with block sizes of 128 and 256 bits, respectively. Owing to its multi branch structure, evaluating security margins and mounting effective key recovery attacks present nontrivial challenges. This paper provides the first comprehensive third party cryptanalysis of Gleeok-128. We introduce a two stage MILP based framework for constructing branch wise and full cipher differential linear (DL) distinguishers, together with an integral based key recovery framework tailored to multi branch designs. Our DL analysis yields 7, 7, 8, and 4 round distinguishers for Branch 1, Branch 2, Branch 3, and Gleeok-128, respectively, with squared correlations approximately 2 to the power minus 88.12, 2 to the power minus 88.12, 2 to the power minus 38.73, and 2 to the power minus 49.04, outperforming those in the design document except for the full PRF case. By tightening algebraic degree bounds, we further derive 9, 9, and 7 round integral distinguishers for the three branches and a 7 round distinguisher for the full PRF, extending the designers results by 3, 3, and 2 rounds and by 2 rounds, respectively. These integral properties enable 7 round and 8 round key recovery attacks in the non full codebook and full codebook settings. In addition, we identify a flaw in the original linear security evaluation of Branch 3, showing that it can be distinguished over all 12 rounds with data complexity about 2 to the power 48. We also propose optimized linear layer parameters that significantly improve linear resistance without sacrificing diffusion. Our results advance the understanding of Gleeok-128 and provide general methods for analyzing multi branch symmetric designs.

[270]  arXiv:2512.04676 [pdf, ps, other]
Title: A Unified Low-rank ADI Framework with Shared Linear Solves for Simultaneously Solving Multiple Lyapunov, Sylvester, and Riccati Equations
Subjects: Systems and Control (eess.SY); Numerical Analysis (math.NA)

It is known in the literature that the low-rank ADI method for Lyapunov equations is a Petrov-Galerkin projection algorithm that implicitly performs model order reduction. In this paper, we show that the low-rank ADI methods for Sylvester and Riccati equations are also Petrov-Galerkin projection algorithms that implicitly perform model order reduction. By observing that the ADI methods for Lyapunov, Sylvester, and Riccati equations differ only in pole placement and not in their interpolatory nature, we show that the shifted linear solves-which constitute the bulk of the computational cost-can be shared. The pole-placement step involves only small-scale operations and is therefore inexpensive. We propose a unified ADI framework that requires only two shifted linear solves per iteration to simultaneously solve six Lyapunov equations, one Sylvester equation, and ten Riccati equations, thus substantially increasing the return on investment for the computational cost spent on the linear solves. All operations needed to extract the individual solutions from these shared linear solves are small-scale and inexpensive.
Since all ADI methods implicitly perform model order reduction when solving these linear matrix equations, we show that the resulting reduced-order models can be obtained as an additional byproduct. These models not only interpolate the original transfer function at the mirror images of the ADI shifts but also preserve important system properties such as stability, minimum-phase property, positive-realness, bounded-realness, and passivity. Consequently, the proposed unified ADI framework also serves as a recursive, interpolation-based model order reduction method, which can preserve several important properties of the original model in the reduced-order model.

[271]  arXiv:2512.04677 [pdf, ps, other]
Title: Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing diffusion-based video generation methods are fundamentally constrained by sequential computation and long-horizon inconsistency, limiting their practical adoption in real-time, streaming audio-driven avatar synthesis. We present Live Avatar, an algorithm-system co-designed framework that enables efficient, high-fidelity, and infinite-length avatar generation using a 14-billion-parameter diffusion model. Our approach introduces Timestep-forcing Pipeline Parallelism (TPP), a distributed inference paradigm that pipelines denoising steps across multiple GPUs, effectively breaking the autoregressive bottleneck and ensuring stable, low-latency real-time streaming. To further enhance temporal consistency and mitigate identity drift and color artifacts, we propose the Rolling Sink Frame Mechanism (RSFM), which maintains sequence fidelity by dynamically recalibrating appearance using a cached reference image. Additionally, we leverage Self-Forcing Distribution Matching Distillation to facilitate causal, streamable adaptation of large-scale models without sacrificing visual quality. Live Avatar demonstrates state-of-the-art performance, reaching 20 FPS end-to-end generation on 5 H800 GPUs, and, to the best of our knowledge, is the first to achieve practical, real-time, high-fidelity avatar generation at this scale. Our work establishes a new paradigm for deploying advanced diffusion models in industrial long-form video synthesis applications.

[272]  arXiv:2512.04678 [pdf, ps, other]
Title: Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Efficient streaming video generation is critical for simulating interactive and dynamic worlds. Existing methods distill few-step video diffusion models with sliding window attention, using initial frames as sink tokens to maintain attention performance and reduce error accumulation. However, video frames become overly dependent on these static tokens, resulting in copied initial frames and diminished motion dynamics. To address this, we introduce Reward Forcing, a novel framework with two key designs. First, we propose EMA-Sink, which maintains fixed-size tokens initialized from initial frames and continuously updated by fusing evicted tokens via exponential moving average as they exit the sliding window. Without additional computation cost, EMA-Sink tokens capture both long-term context and recent dynamics, preventing initial frame copying while maintaining long-horizon consistency. Second, to better distill motion dynamics from teacher models, we propose a novel Rewarded Distribution Matching Distillation (Re-DMD). Vanilla distribution matching treats every training sample equally, limiting the model's ability to prioritize dynamic content. Instead, Re-DMD biases the model's output distribution toward high-reward regions by prioritizing samples with greater dynamics rated by a vision-language model. Re-DMD significantly enhances motion quality while preserving data fidelity. We include both quantitative and qualitative experiments to show that Reward Forcing achieves state-of-the-art performance on standard benchmarks while enabling high-quality streaming video generation at 23.1 FPS on a single H100 GPU.

[273]  arXiv:2512.04679 [pdf, ps, other]
Title: Timely Information for Strategic Persuasion
Subjects: Information Theory (cs.IT); Computer Science and Game Theory (cs.GT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP); Systems and Control (eess.SY)

This work investigates a dynamic variant of Bayesian persuasion, in which a strategic sender seeks to influence a receiver's belief over time through controlling the timing of the information disclosure, under resource constraints. We consider a binary information source (i.e., taking values 0 or 1), where the source's state evolve according to a continuous-time Markov chain (CTMC). In this setting, the receiver aims to estimate the source's state as accurately as possible. In contrast, the sender seeks to persuade the receiver to estimate the state to be 1, regardless of whether this estimate reflects the true state. This misalignment between their objectives naturally leads to a Stackelberg game formulation where the sender, acting as the leader, chooses an information-revelation policy, and the receiver, as the follower, decides whether to follow the sender's messages. As a result, the sender's objective is to maximize the long-term average time that the receiver's estimate equals 1, subject to a total sampling constraint and a constraint for the receiver to follow the sender's messages called incentive compatibility (IC) constraint. We first consider the single-source problem and show that the sender's optimal policy is to allocate a minimal sampling rate to the undesired state 0 (just enough to satisfy the IC constraint) and assign the remaining sampling rate to the desired state 1. Next, we extend the analysis to the multi-source case, where each source has a different minimal sampling rate. Our results show that the sender can leverage the timeliness of the revealed information to influence the receiver, thereby achieving a higher utility.

[274]  arXiv:2512.04680 [pdf, ps, other]
Title: Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap
Comments: Accepted by ACM Transactions on Autonomous and Adaptive Systems
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a feedback loop with four core functionalities: monitoring, analyzing, planning, and execution. Recently, generative artificial intelligence (GenAI), especially the area of large language models, has shown impressive performance in data comprehension and logical reasoning. These capabilities are highly aligned with the functionalities required in SASs, suggesting a strong potential to employ GenAI to enhance SASs. However, the specific benefits and challenges of employing GenAI in SASs remain unclear. Yet, providing a comprehensive understanding of these benefits and challenges is complex due to several reasons: limited publications in the SAS field, the technological and application diversity within SASs, and the rapid evolution of GenAI technologies. To that end, this paper aims to provide researchers and practitioners a comprehensive snapshot that outlines the potential benefits and challenges of employing GenAI's within SAS. Specifically, we gather, filter, and analyze literature from four distinct research fields and organize them into two main categories to potential benefits: (i) enhancements to the autonomy of SASs centered around the specific functions of the MAPE-K feedback loop, and (ii) improvements in the interaction between humans and SASs within human-on-the-loop settings. From our study, we outline a research roadmap that highlights the challenges of integrating GenAI into SASs. The roadmap starts with outlining key research challenges that need to be tackled to exploit the potential for applying GenAI in the field of SAS. The roadmap concludes with a practical reflection, elaborating on current shortcomings of GenAI and proposing possible mitigation strategies.

[275]  arXiv:2512.04683 [pdf, ps, other]
Title: Geschlechtsübergreifende Maskulina im Sprachgebrauch Eine korpusbasierte Untersuchung zu lexemspezifischen Unterschieden
Comments: 32 pages, 8 figures
Subjects: Computation and Language (cs.CL)

This study examines the distribution and linguistic characteristics of generic masculines (GM) in contemporary German press texts. The use of masculine personal nouns to refer to mixed-gender groups or unspecified individuals has been widely debated in academia and the public, with con-flicting perspectives on its gender-neutrality. While psycholinguistic studies suggest that GM is more readily associated with male referents, corpus-based analyses of its actual use remain scarce. We investigate GM in a large corpus of press texts, focusing on lexeme-specific differences across dif-ferent types of personal nouns. We conducted manual annotations of the whole inflectional para-digm of 21 personal nouns, resulting in 6,195 annotated tokens. Our findings reveal considerable differences between lexical items, especially between passive role nouns and prestige-related per-sonal nouns. On a grammatical level, we find that GM occurs predominantly in the plural and in indefinite noun phrases. Furthermore, our data shows that GM is not primarily used to denote entire classes of people, as has been previously claimed. By providing an empirical insight into the use of GM in authentic written language, we contribute to a more nuanced understanding of its forms and manifestations. These findings provide a solid basis for aligning linguistic stimuli in psy-cholinguistic studies more closely with real-world language use.

[276]  arXiv:2512.04686 [pdf, ps, other]
Title: Towards Cross-View Point Correspondence in Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cross-view correspondence is a fundamental capability for spatial understanding and embodied AI. However, it is still far from being realized in Vision-Language Models (VLMs), especially in achieving precise point-level correspondence, which is crucial for precise affordance interaction. So we propose the Cross-View Point Correspondence (CVPC) task and CrossPoint-Bench, a comprehensive benchmark with hierarchical design, inspired by the human cognitive process of "perceive", "reason", and "correspond". Our evaluation shows the state-of-the-art models (e.g., Gemini-2.5-Pro) still fall far behind humans, with a gap of over 54.65% in overall accuracy, exposing a challenge in transitioning from coarse-grained judgement to fine-grained coordinate prediction. To address this problem, we construct CrossPoint-378K, a dataset with 378K question-answering pairs across 900 scenes, focused on actionable affordance regions that better reflect real-world manipulation and interaction scenarios. Furthermore, we propose CroPond that trained on the CrossPoint-378K dataset. Our CroPond achieves state-of-the-art performance on CrossPoint-Bench, surpassing Gemini-2.5-Pro by 39.7% accuracy, which offers a foundation for advancing future work on cross-view correspondence. The benchmark, dataset, and model are publicly available at https://github.com/WangYipu2002/CrossPoint.

[277]  arXiv:2512.04687 [pdf, ps, other]
Title: Intuitionistic modal logic LIK4 is decidable
Subjects: Logic in Computer Science (cs.LO)

In this note, we prove that intuitionistic modal logic LIK4 is decidable.

[278]  arXiv:2512.04691 [pdf, ps, other]
Title: Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective
Comments: Accepted to LaMAS 2026@AAAI'26 (this https URL)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA)

Large language models (LLMs) have been widely deployed in various applications, often functioning as autonomous agents that interact with each other in multi-agent systems. While these systems have shown promise in enhancing capabilities and enabling complex tasks, they also pose significant ethical challenges. This position paper outlines a research agenda aimed at ensuring the ethical behavior of multi-agent systems of LLMs (MALMs) from the perspective of mechanistic interpretability. We identify three key research challenges: (i) developing comprehensive evaluation frameworks to assess ethical behavior at individual, interactional, and systemic levels; (ii) elucidating the internal mechanisms that give rise to emergent behaviors through mechanistic interpretability; and (iii) implementing targeted parameter-efficient alignment techniques to steer MALMs towards ethical behaviors without compromising their performance.

[279]  arXiv:2512.04692 [pdf, ps, other]
Title: Interactive Communication -- cross-disciplinary perspectives from psychology, acoustics, and media technology
Subjects: Human-Computer Interaction (cs.HC)

Interactive communication (IC), i.e., the reciprocal exchange of information between two or more interactive partners, is a fundamental part of human nature. As such, it has been studied across multiple scientific disciplines with different goals and methods. This article provides a cross-disciplinary primer on contemporary IC that integrates psychological mechanisms with acoustic and media-technological constraints across theory, measurement, and applications. First, we outline theoretical frameworks that account for verbal, nonverbal and multimodal aspects of IC, including distinctions between face-to-face and computer-mediated communication. Second, we summarize key methodological approaches, including behavioral, cognitive, and experiential measures of communicative synchrony and acoustic signal quality. Third, we discuss selected applications, i.e. assistive listening technologies, conversational agents, alongside ethical considerations. Taken together, this review highlights how human capacities and technical systems jointly shape IC, consolidating concepts, findings, and challenges that have often been discussed in separate lines of research.

[280]  arXiv:2512.04694 [pdf, ps, other]
Title: TimesNet-Gen: Deep Learning-based Site Specific Strong Motion Generation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Effective earthquake risk reduction relies on accurate site-specific evaluations. This requires models that can represent the influence of local site conditions on ground motion characteristics. In this context, data driven approaches that learn site controlled signatures from recorded ground motions offer a promising direction. We address strong ground motion generation from time-domain accelerometer records and introduce the TimesNet-Gen, a time-domain conditional generator. The approach uses a station specific latent bottleneck. We evaluate generation by comparing HVSR curves and fundamental site-frequency $f_0$ distributions between real and generated records per station, and summarize station specificity with a score based on the $f_0$ distribution confusion matrices. TimesNet-Gen achieves strong station-wise alignment and compares favorably with a spectrogram-based conditional VAE baseline for site-specific strong motion synthesis. Our codes are available via https://github.com/brsylmz23/TimesNet-Gen.

[281]  arXiv:2512.04695 [pdf, ps, other]
Title: TRINITY: An Evolved LLM Coordinator
Subjects: Machine Learning (cs.LG)

Combining diverse foundation models is promising, but weight-merging is limited by mismatched architectures and closed APIs. Trinity addresses this with a lightweight coordinator that orchestrates collaboration among large language models (LLMs). The coordinator, comprising a compact language model (approximately $0.6$B parameters) and a lightweight head (approximately $10$K parameters), is optimized with an evolutionary strategy for efficient and adaptive delegation. Trinity processes queries over multiple turns, where at each turn the coordinator assigns one of three roles (Thinker, Worker, or Verifier) to a selected LLM, effectively offloading complex skill acquisition from the coordinator itself. Experiments show that Trinity consistently outperforms individual models and existing methods across coding, math, reasoning, and domain knowledge tasks, and generalizes robustly to out-of-distribution tasks. On standard benchmarks, Trinity achieves state-of-the-art results, including a score of 86.2% on LiveCodeBench. Theoretical and empirical analyses identify two main factors behind this performance: (1) the coordinator's hidden-state representations provide rich contextualization of inputs, and (2) under high dimensionality and strict budget constraints, the separable Covariance Matrix Adaptation Evolution Strategy offers advantages over reinforcement learning, imitation learning, and random search by exploiting potential block-epsilon-separability.

[282]  arXiv:2512.04699 [pdf, ps, other]
Title: OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution
Comments: Accepted as TCSVT, 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Arbitrary-scale super-resolution (ASSR) overcomes the limitation of traditional super-resolution (SR) methods that operate only at fixed scales (e.g., 4x), enabling a single model to handle arbitrary magnification. Most existing ASSR approaches rely on implicit neural representation (INR), but its regression-driven feature extraction and aggregation intrinsically limit the ability to synthesize fine details, leading to low realism. Recent diffusion-based realistic image super-resolution (Real-ISR) models leverage powerful pre-trained diffusion priors and show impressive results at the 4x setting. We observe that they can also achieve ASSR because the diffusion prior implicitly adapts to scale by encouraging high-realism generation. However, without explicit scale control, the diffusion process cannot be properly adjusted for different magnification levels, resulting in excessive hallucination or blurry outputs, especially under ultra-high scales. To address these issues, we propose OmniScaleSR, a diffusion-based realistic arbitrary-scale SR framework designed to achieve both high fidelity and high realism. We introduce explicit, diffusion-native scale control mechanisms that work synergistically with implicit scale adaptation, enabling scale-aware and content-aware modulation of the diffusion process. In addition, we incorporate multi-domain fidelity enhancement designs to further improve reconstruction accuracy. Extensive experiments on bicubic degradation benchmarks and real-world datasets show that OmniScaleSR surpasses state-of-the-art methods in both fidelity and perceptual realism, with particularly strong performance at large magnification factors. Code will be released at https://github.com/chaixinning/OmniScaleSR.

[283]  arXiv:2512.04702 [pdf, ps, other]
Title: POLARIS: Is Multi-Agentic Reasoning the Next Wave in Engineering Self-Adaptive Systems?
Comments: Accepted as a short paper at SEAMS 2026
Subjects: Software Engineering (cs.SE)

The growing scale, complexity, interconnectivity, and autonomy of modern software ecosystems introduce unprecedented uncertainty, challenging the foundations of traditional self-adaptation. Existing approaches, typically rule-driven controllers or isolated learning components, struggle to generalize to novel contexts or coordinate responses across distributed subsystems, leaving them ill-equipped for emergent unknown unknowns. Recent discussions on Self-Adaptation 2.0 emphasize an equal partnership between AI and adaptive systems, merging learning-driven intelligence with adaptive control for predictive and proactive behavior. Building on this foundation, we introduce POLARIS, a three-layer multi-agentic self-adaptation framework that advances beyond reactive adaptation. POLARIS integrates: (1) a low-latency Adapter layer for monitoring and safe execution, (2) a transparent Reasoning layer that generates and verifies plans using tool-aware, explainable agents, and (3) a Meta layer that records experiences and meta-learns improved adaptation policies over time. Through shared knowledge and predictive models, POLARIS handles uncertainty, learns from past actions, and evolves its strategies, enabling systems that anticipate change and maintain resilient, goal-directed behavior. Preliminary evaluation on two self-adaptive exemplars, SWIM and SWITCH, shows that POLARIS consistently outperforms state-of-the-art baselines. We argue this marks a shift toward Self-Adaptation 3.0, akin to Software 3.0: a paradigm where systems not only learn from their environment but also reason about and evolve their own adaptation processes, continuously improving to meet novel challenges.

[284]  arXiv:2512.04703 [pdf, ps, other]
Title: Towards Continuous-Time Approximations for Stochastic Gradient Descent without Replacement
Authors: Stefan Perko
Subjects: Machine Learning (cs.LG); Probability (math.PR)

Gradient optimization algorithms using epochs, that is those based on stochastic gradient descent without replacement (SGDo), are predominantly used to train machine learning models in practice. However, the mathematical theory of SGDo and related algorithms remain underexplored compared to their "with replacement" and "one-pass" counterparts. In this article, we propose a stochastic, continuous-time approximation to SGDo with additive noise based on a Young differential equation driven by a stochastic process we call an "epoched Brownian motion". We show its usefulness by proving the almost sure convergence of the continuous-time approximation for strongly convex objectives and learning rate schedules of the form $u_t = \frac{1}{(1+t)^\beta}, \beta \in (0,1)$. Moreover, we compute an upper bound on the asymptotic rate of almost sure convergence, which is as good or better than previous results for SGDo.

[285]  arXiv:2512.04705 [pdf, ps, other]
Title: Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators
Comments: Submitted to IEEE Transactions on Emerging Topics in Computing
Subjects: Computational Complexity (cs.CC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)

Advancements in high-performance computing and cloud technologies have enabled the development of increasingly sophisticated Deep Learning (DL) models. However, the growing demand for embedded intelligence at the edge imposes stringent computational and energy constraints, challenging the deployment of these large-scale models. Early Exiting Neural Networks (EENN) have emerged as a promising solution, allowing dynamic termination of inference based on input complexity to enhance efficiency. Despite their potential, EENN performance is highly influenced by the heterogeneity of edge accelerators and the constraints imposed by quantization, affecting accuracy, energy efficiency, and latency. Yet, research on the automatic optimization of EENN design for edge hardware remains limited. To bridge this gap, we propose a hardware-aware Neural Architecture Search (NAS) framework that systematically integrates the effects of quantization and hardware resource allocation to optimize the placement of early exit points within a network backbone. Experimental results on the CIFAR-10 dataset demonstrate that our NAS framework can discover architectures that achieve over a 50\% reduction in computational costs compared to conventional static networks, making them more suitable for deployment in resource-constrained edge environments.

[286]  arXiv:2512.04711 [pdf, ps, other]
Title: Large Speech Model Enabled Semantic Communication
Comments: 15 pages, 9 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

Existing speech semantic communication systems mainly based on Joint Source-Channel Coding (JSCC) architectures have demonstrated impressive performance, but their effectiveness remains limited by model structures specifically designed for particular tasks and datasets. Recent advances indicate that generative large models pre-trained on massive datasets, can achieve outstanding performance arexhibit exceptional performance across diverse downstream tasks with minimal fine-tuning. To exploit the rich semantic knowledge embedded in large models and enable adaptive transmission over lossy channels, we propose a Large Speech Model enabled Semantic Communication (LargeSC) system. Simultaneously achieving adaptive compression and robust transmission over lossy channels remains challenging, requiring trade-offs among compression efficiency, speech quality, and latency. In this work, we employ the Mimi as a speech codec, converting speech into discrete tokens compatible with existing network architectures. We propose an adaptive controller module that enables adaptive transmission and in-band Unequal Error Protection (UEP), dynamically adjusting to both speech content and packet loss probability under bandwidth constraints. Additionally, we employ Low-Rank Adaptation (LoRA) to finetune the Moshi foundation model for generative recovery of lost speech tokens. Simulation results show that the proposed system supports bandwidths ranging from 550 bps to 2.06 kbps, outperforms conventional baselines in speech quality under high packet loss rates and achieves an end-to-end latency of approximately 460 ms, thereby demonstrating its potential for real-time deployment.

[287]  arXiv:2512.04714 [pdf, ps, other]
Title: Playing the Player: A Heuristic Framework for Adaptive Poker AI
Comments: 49 pages, 39 figures. White Paper by Spiderdime Systems
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

For years, the discourse around poker AI has been dominated by the concept of solvers and the pursuit of unexploitable, machine-perfect play. This paper challenges that orthodoxy. It presents Patrick, an AI built on the contrary philosophy: that the path to victory lies not in being unexploitable, but in being maximally exploitative. Patrick's architecture is a purpose-built engine for understanding and attacking the flawed, psychological, and often irrational nature of human opponents. Through detailed analysis of its design, its novel prediction-anchored learning method, and its profitable performance in a 64,267-hand trial, this paper makes the case that the solved myth is a distraction from the real, far more interesting challenge: creating AI that can master the art of human imperfection.

[288]  arXiv:2512.04720 [pdf, ps, other]
Title: M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis
Comments: Submitted to ICASSP 2026
Subjects: Sound (cs.SD)

Non-autoregressive (NAR) text-to-speech synthesis relies on length alignment between text sequences and audio representations, constraining naturalness and expressiveness. Existing methods depend on duration modeling or pseudo-alignment strategies that severely limit naturalness and computational efficiency. We propose M3-TTS, a concise and efficient NAR TTS paradigm based on multi-modal diffusion transformer (MM-DiT) architecture. M3-TTS employs joint diffusion transformer layers for cross-modal alignment, achieving stable monotonic alignment between variable-length text-speech sequences without pseudo-alignment requirements. Single diffusion transformer layers further enhance acoustic detail modeling. The framework integrates a mel-vae codec that provides 3* training acceleration. Experimental results on Seed-TTS and AISHELL-3 benchmarks demonstrate that M3-TTS achieves state-of-the-art NAR performance with the lowest word error rates (1.36\% English, 1.31\% Chinese) while maintaining competitive naturalness scores. Code and demos will be available at https://wwwwxp.github.io/M3-TTS.

[289]  arXiv:2512.04727 [pdf, ps, other]
Title: Sequential Enumeration in Large Language Models
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Reliably counting and generating sequences of items remain a significant challenge for neural networks, including Large Language Models (LLMs). Indeed, although this capability is readily handled by rule-based symbolic systems based on serial computation, learning to systematically deploy counting procedures is difficult for neural models, which should acquire these skills through learning. Previous research has demonstrated that recurrent architectures can only approximately track and enumerate sequences of events, and it remains unclear whether modern deep learning systems, including LLMs, can deploy systematic counting procedures over sequences of discrete symbols. This paper aims to fill this gap by investigating the sequential enumeration abilities of five state-of-the-art LLMs, including proprietary, open-source, and reasoning models. We probe LLMs in sequential naming and production tasks involving lists of letters and words, adopting a variety of prompting instructions to explore the role of chain-of-thought in the spontaneous emerging of counting strategies. We also evaluate open-source models with the same architecture but increasing size to see whether the mastering of counting principles follows scaling laws, and we analyze the embedding dynamics during sequential enumeration to investigate the emergent encoding of numerosity. We find that some LLMs are indeed capable of deploying counting procedures when explicitly prompted to do so, but none of them spontaneously engage in counting when simply asked to enumerate the number of items in a sequence. Our results suggest that, despite their impressive emergent abilities, LLMs cannot yet robustly and systematically deploy counting procedures, highlighting a persistent gap between neural and symbolic approaches to compositional generalization.

[290]  arXiv:2512.04728 [pdf, ps, other]
Title: Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Generative psychological analysis of in-the-wild conversations faces two fundamental challenges: (1) existing Vision-Language Models (VLMs) fail to resolve Articulatory-Affective Ambiguity, where visual patterns of speech mimic emotional expressions; and (2) progress is stifled by a lack of verifiable evaluation metrics capable of assessing visual grounding and reasoning depth. We propose a complete ecosystem to address these twin challenges. First, we introduce Multilevel Insight Network for Disentanglement(MIND), a novel hierarchical visual encoder that introduces a Status Judgment module to algorithmically suppress ambiguous lip features based on their temporal feature variance, achieving explicit visual disentanglement. Second, we construct ConvoInsight-DB, a new large-scale dataset with expert annotations for micro-expressions and deep psychological inference. Third, Third, we designed the Mental Reasoning Insight Rating Metric (PRISM), an automated dimensional framework that uses expert-guided LLM to measure the multidimensional performance of large mental vision models. On our PRISM benchmark, MIND significantly outperforms all baselines, achieving a +86.95% gain in micro-expression detection over prior SOTA. Ablation studies confirm that our Status Judgment disentanglement module is the most critical component for this performance leap. Our code has been opened.

[291]  arXiv:2512.04729 [pdf, ps, other]
Title: Weighted total variation regularization for inverse problems with significant null spaces
Subjects: Numerical Analysis (math.NA)

We consider inverse problems with large null spaces, which arise in important applications such as in inverse ECG and EEG procedures. Standard regularization methods typically produce solutions in or near the orthogonal complement of the forward operator's null space. This often leads to inadequate results, where internal sources are mistakenly interpreted as being near the data acquisition sites -- e.g., near or at the body surface in connection with EEG and ECG recordings.
To mitigate this, we previously proposed weighting schemes for Tikhonov and sparsity regularization. Here, we extend this approach to total variation (TV) regularization, which is particularly suited for identifying spatially extended regions with approximately constant values. We introduce a weighted TV-regularization method, provide supporting analysis, and demonstrate its performance through numerical experiments. Unlike standard TV regularization, the weighted version successfully recovers the location and size of large, piecewise constant sources away from the boundary, though not their exact shape.
Additionally, we explore a hybrid weighted-sparsity and TV regularization approach, which better captures both small and large sources, albeit with somewhat more blurred reconstructions than the weighted TV method alone.

[292]  arXiv:2512.04731 [pdf, ps, other]
Title: Bridging Simulation and Reality: Cross-Domain Transfer with Semantic 2D Gaussian Splatting
Subjects: Robotics (cs.RO)

Cross-domain transfer in robotic manipulation remains a longstanding challenge due to the significant domain gap between simulated and real-world environments. Existing methods such as domain randomization, adaptation, and sim-real calibration often require extensive tuning or fail to generalize to unseen scenarios. To address this issue, we observe that if domain-invariant features are utilized during policy training in simulation, and the same features can be extracted and provided as the input to policy during real-world deployment, the domain gap can be effectively bridged, leading to significantly improved policy generalization. Accordingly, we propose Semantic 2D Gaussian Splatting (S2GS), a novel representation method that extracts object-centric, domain-invariant spatial features. S2GS constructs multi-view 2D semantic fields and projects them into a unified 3D space via feature-level Gaussian splatting. A semantic filtering mechanism removes irrelevant background content, ensuring clean and consistent inputs for policy learning. To evaluate the effectiveness of S2GS, we adopt Diffusion Policy as the downstream learning algorithm and conduct experiments in the ManiSkill simulation environment, followed by real-world deployment. Results demonstrate that S2GS significantly improves sim-to-real transferability, maintaining high and stable task performance in real-world scenarios.

[293]  arXiv:2512.04733 [pdf, ps, other]
Title: E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

End-to-end autonomous driving (AD) systems increasingly adopt vision-language-action (VLA) models, yet they typically ignore the passenger's emotional state, which is central to comfort and AD acceptance. We introduce Open-Domain End-to-End (OD-E2E) autonomous driving, where an autonomous vehicle (AV) must interpret free-form natural-language commands, infer the emotion, and plan a physically feasible trajectory. We propose E3AD, an emotion-aware VLA framework that augments semantic understanding with two cognitively inspired components: a continuous Valenc-Arousal-Dominance (VAD) emotion model that captures tone and urgency from language, and a dual-pathway spatial reasoning module that fuses egocentric and allocentric views for human-like spatial cognition. A consistency-oriented training scheme, combining modality pretraining with preference-based alignment, further enforces coherence between emotional intent and driving actions. Across real-world datasets, E3AD improves visual grounding and waypoint planning and achieves state-of-the-art (SOTA) VAD correlation for emotion estimation. These results show that injecting emotion into VLA-style driving yields more human-aligned grounding, planning, and human-centric feedback.

[294]  arXiv:2512.04734 [pdf, ps, other]
Title: MT-Depth: Multi-task Instance feature analysis for the Depth Completion
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Depth completion plays a vital role in 3D perception systems, especially in scenarios where sparse depth data must be densified for tasks such as autonomous driving, robotics, and augmented reality. While many existing approaches rely on semantic segmentation to guide depth completion, they often overlook the benefits of object-level understanding. In this work, we introduce an instance-aware depth completion framework that explicitly integrates binary instance masks as spatial priors to refine depth predictions. Our model combines four main components: a frozen YOLO V11 instance segmentation branch, a U-Net-based depth completion backbone, a cross-attention fusion module, and an attention-guided prediction head. The instance segmentation branch generates per-image foreground masks that guide the depth branch via cross-attention, allowing the network to focus on object-centric regions during refinement. We validate our method on the Virtual KITTI 2 dataset, showing that it achieves lower RMSE compared to both a U-Net-only baseline and previous semantic-guided methods, while maintaining competitive MAE. Qualitative and quantitative results demonstrate that the proposed model effectively enhances depth accuracy near object boundaries, occlusions, and thin structures. Our findings suggest that incorporating instance-aware cues offers a promising direction for improving depth completion without relying on dense semantic labels.

[295]  arXiv:2512.04735 [pdf, ps, other]
Title: A Fast Ethereum-Compatible Forkless Database
Subjects: Databases (cs.DB)

The State Database of a blockchain stores account data and enables authentication. Modern blockchains use fast consensus protocols to avoid forking, improving throughput and finality. However, Ethereum's StateDB was designed for a forking chain that maintains multiple state versions. While newer blockchains adopt Ethereum's standard for DApp compatibility, they do not require multiple state versions, making legacy Ethereum databases inefficient for fast, non-forking blockchains. Moreover, existing StateDB implementations have been built on key-value stores (e.g., LevelDB), which make them less efficient.
This paper introduces a novel state database that is a native database implementation and maintains Ethereum compatibility while being specialized for non-forking blockchains. Our database delivers ten times speedups and 99% space reductions for validators, and a threefold decrease in storage requirements for archive nodes.

[296]  arXiv:2512.04737 [pdf, ps, other]
Title: Recent advances in the numerical solution of multi-order fractional differential equations
Comments: 35 pages, 8 figures, 4 tables
Subjects: Numerical Analysis (math.NA)

The efficient numerical solution of fractional differential equations has been recently tackled through the definition of Fractional HBVMs (FHBVMs), a class of Runge-Kutta type methods. Corresponding Matlab (c) codes have been also made available on the internet, proving to be very competitive w.r.t. existing ones. However, so far, FHBVMs have been given for solving systems of fractional differential equations with the same order of fractional derivative, whereas the numerical solution of multi-order problems (i.e., problems in which different orders of fractional derivatives occur) has not been handled, yet. Due to their relevance in applications, in this paper we propose an extension of FHBVMs for addressing fractional multi-order problems, providing full details for such an approach. A corresponding Matlab (c) code, handling the case of two different fractional orders, is also made available, proving very effective for numerically solving these problems.

[297]  arXiv:2512.04738 [pdf, ps, other]
Title: OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models
Comments: 42nd IEEE International Conference on Data Engineering (ICDE)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB)

Bridging natural language and structured query languages is a long-standing challenge in the database community. While recent advances in language models have shown promise in this direction, existing solutions often rely on large-scale closed-source models that suffer from high inference costs, limited transparency, and lack of adaptability for lightweight deployment. In this paper, we present OsmT, an open-source tag-aware language model specifically designed to bridge natural language and Overpass Query Language (OverpassQL), a structured query language for accessing large-scale OpenStreetMap (OSM) data. To enhance the accuracy and structural validity of generated queries, we introduce a Tag Retrieval Augmentation (TRA) mechanism that incorporates contextually relevant tag knowledge into the generation process. This mechanism is designed to capture the hierarchical and relational dependencies present in the OSM database, addressing the topological complexity inherent in geospatial query formulation. In addition, we define a reverse task, OverpassQL-to-Text, which translates structured queries into natural language explanations to support query interpretation and improve user accessibility. We evaluate OsmT on a public benchmark against strong baselines and observe consistent improvements in both query generation and interpretation. Despite using significantly fewer parameters, our model achieves competitive accuracy, demonstrating the effectiveness of open-source pre-trained language models in bridging natural language and structured query languages within schema-rich geospatial environments.

[298]  arXiv:2512.04742 [pdf, ps, other]
Title: Rotatable Antenna-Enhanced Cell-Free Communication
Subjects: Information Theory (cs.IT)

Rotatable antenna (RA) is a promising technology that can exploit new spatial degrees-of-freedom (DoFs) by flexibly adjusting the three-dimensional (3D) boresight direction of antennas. In this letter, we investigate an RA-enhanced cell-free system for downlink transmission, where multiple RA-equipped access points (APs) cooperatively serve multiple single-antenna users over the same time-frequency resource. Specifically, we aim to maximize the sum rate of all users by jointly optimizing the AP-user associations and the RA boresight directions. Accordingly, we propose a two-stage strategy to solve the AP-user association problem, and then employ fractional programming (FP) and successive convex approximation (SCA) techniques to optimize the RA boresight directions. Numerical results demonstrate that the proposed RA-enhanced cell-free system significantly outperforms various benchmark schemes.

[299]  arXiv:2512.04746 [pdf, ps, other]
Title: SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Extreme low-bit quantization is critical for efficiently deploying Large Language Models (LLMs), yet it often leads to severe performance degradation at 2-bits and even 4-bits (e.g., MXFP4). We present SignRoundV2, a post-training quantization framework that is highly effective even without mixed-precision. SignRoundV2 introduces (1) a fast sensitivity metric that combines gradient information with quantization-induced deviations to guide layer-wise bit allocation, and (2) a lightweight pre-tuning search for quantization scales to improve extremely low-bit quantization. These components allow SignRoundV2 to close the gap with full-precision models. Extensive experiments indicate that our method sustains competitive accuracy for LLMs, achieving production-grade performance with about 1 percent variance at 4-5 bits and strong results even at 2 bits. The implementation is available at https://github.com/intel/auto-round.

[300]  arXiv:2512.04747 [pdf, ps, other]
Title: A Tutorial on Regression Analysis: From Linear Models to Deep Learning -- Lecture Notes on Artificial Intelligence
Subjects: Machine Learning (cs.LG)

This article serves as the regression analysis lecture notes in the Intelligent Computing course cluster (including the courses of Artificial Intelligence, Data Mining, Machine Learning, and Pattern Recognition). It aims to provide students -- who are assumed to possess only basic university-level mathematics (i.e., with prerequisite courses in calculus, linear algebra, and probability theory) -- with a comprehensive and self-contained understanding of regression analysis without requiring any additional references. The lecture notes systematically introduce the fundamental concepts, modeling components, and theoretical foundations of regression analysis, covering linear regression, logistic regression, multinomial logistic regression, polynomial regression, basis-function models, kernel-based methods, and neural-network-based nonlinear regression. Core methodological topics include loss-function design, parameter-estimation principles, ordinary least squares, gradient-based optimization algorithms and their variants, as well as regularization techniques such as Ridge and LASSO regression. Through detailed mathematical derivations, illustrative examples, and intuitive visual explanations, the materials help students understand not only how regression models are constructed and optimized, but also how they reveal the underlying relationships between features and response variables. By bridging classical statistical modeling and modern machine-learning practice, these lecture notes aim to equip students with a solid conceptual and technical foundation for further study in advanced artificial intelligence models.

[301]  arXiv:2512.04748 [pdf, ps, other]
Title: Model Whisper: Steering Vectors Unlock Large Language Models' Potential in Test-time
Comments: accepted to aaai2026
Subjects: Computation and Language (cs.CL)

It is a critical challenge to efficiently unlock the powerful reasoning potential of Large Language Models (LLMs) for specific tasks or new distributions. Existing test-time adaptation methods often require tuning model parameters, which is not only computationally expensive but also risks degrading the model's pre-existing abilities.To address this, we introduce a lightweight component, Test-Time Steering Vectors (TTSV), which is prepended to the input while keeping the LLM's parameters entirely frozen. By optimizing the TTSV on test data to minimize the model's output entropy, we steer the model towards an internal state of higher confidence, activating its inherent abilities most relevant to the current task. TTSV is both lightweight and highly efficient to optimize, making it a true plug-and-play enhancement. Extensive experiments validate our approach's effectiveness on both base models and reasoning-enhanced models. For instance, on the MATH500 task, TTSV achieves a 45.88% relative performance gain on the Qwen2.5-Math-7B model and a 16.22% relative gain on the Qwen3-4B model. Furthermore, our approach exhibits robust generalization, with its steering vectors proving highly transferable across diverse tasks.

[302]  arXiv:2512.04750 [pdf, ps, other]
Title: Robust Precoding Designs of RSMA for Multiuser MIMO Systems
Comments: This work has been accepted for publication in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

Rate-splitting multiple access (RSMA) has been studied for multiuser multiple-input multiple-output (MUMIMO) systems especially in the presence of imperfect channel state information (CSI) at the transmitter. However, its precoding designs that maximize the sum rate normally have high computational complexity. To implement an efficient RSMA scheme for the MU-MIMO system, in this work, we propose a novel robust precoding design, which can handle imperfect CSI. Specifically, we first adopt the generalized mutual information to construct a lower bound of the objective function in the sum rate maximization problem. Then, we apply a smooth lower bound of the non-smooth sum rate objective function to construct a new optimization problem. By revealing the relationship between the generalized signal-to-interference-plus-noise ratio and the minimum mean square error matrices, we transform the constructed problem into a tractable one. After decomposing the transformed problem into three subproblems, we investigate a new alternating precoding design based on sequential solutions. Simulation results demonstrate that the proposed precoding scheme achieves comparable performance to conventional methods, while significantly reducing the computational complexity.

[303]  arXiv:2512.04751 [pdf, ps, other]
Title: NAWOA-XGBoost: A Novel Model for Early Prediction of Academic Potential in Computer Science Students
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Whale Optimization Algorithm (WOA) suffers from limited global search ability, slow convergence, and tendency to fall into local optima, restricting its effectiveness in hyperparameter optimization for machine learning models. To address these issues, this study proposes a Nonlinear Adaptive Whale Optimization Algorithm (NAWOA), which integrates strategies such as Good Nodes Set initialization, Leader-Followers Foraging, Dynamic Encircling Prey, Triangular Hunting, and a nonlinear convergence factor to enhance exploration, exploitation, and convergence stability. Experiments on 23 benchmark functions demonstrate NAWOA's superior optimization capability and robustness. Based on this optimizer, an NAWOA-XGBoost model was developed to predict academic potential using data from 495 Computer Science undergraduates at Macao Polytechnic University (2009-2019). Results show that NAWOA-XGBoost outperforms traditional XGBoost and WOA-XGBoost across key metrics, including Accuracy (0.8148), Macro F1 (0.8101), AUC (0.8932), and G-Mean (0.8172), demonstrating strong adaptability on multi-class imbalanced datasets.

[304]  arXiv:2512.04752 [pdf, ps, other]
Title: RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting
Subjects: Machine Learning (cs.LG)

Reinforcement Learning from Human Feedback (RLHF) is an important fine-tuning technique for large language models (LLMs) and comprises three stages: generation, inference, and training. The generation stage generates samples that are then used to infer learnable experiences for training. We observe that the generation stage is the bottleneck of the entire execution process and consider it a key point for optimization. Specifically, we realize the first attempt to integrate speculative decoding into the RLHF generation stage and propose RLHFSpec, an RLHF system that accelerates generation execution with adaptive speculative decoding and sample reallocation. To fully exploit the performance potential provided by speculative decoding, especially dealing with the dynamic workload of the generation stage, RLHFSpec proposes a workload-aware drafting strategy selection mechanism, which selects the near-optimal strategy by jointly considering the verification cost and the number of accepted tokens. Moreover, RLHFSpec also proposes sample reallocation to fully utilize the GPU resources, and optimizes it with an efficient sample migration mechanism. The experimental results show that the RLHFSpec can achieve higher throughput in the generation stage compared to state-of-the-art works. Moreover, due to the effective alleviation of the generation bottleneck, RLHFSpec also shows significant performance speedup in the entire RLHF execution.

[305]  arXiv:2512.04753 [pdf, ps, other]
Title: EtCon: Edit-then-Consolidate for Reliable Knowledge Editing
Subjects: Computation and Language (cs.CL)

Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts sought to tune the knowledge layers of LLMs, proving effective for making selective edits. However, a significant gap exists between their performance in controlled, teacher-forcing evaluations and their real-world effectiveness in lifelong learning scenarios, which greatly limits their practical applicability. This work's empirical analysis reveals two recurring issues associated with this gap: (1) Most traditional methods lead the edited model to overfit to the new fact, thereby degrading pre-trained capabilities; (2) There is a critical absence of a knowledge consolidation stage, leaving new facts insufficiently integrated into LLMs' inference-time behavior under autoregressive generation, thereby leading to a mismatch between parametric knowledge and actual generation behavior. To this end, we propose Edit-then-Consolidate, a novel knowledge editing paradigm that aims to bridge the gap between theoretical knowledge editing methods and their real-world applicability. Specifically, (1) our framework mitigates overfitting via Targeted Proximal Supervised Fine-Tuning (TPSFT) that localizes the edit via a trust-region objective to limit policy drift; (2) Then, a consolidation stage using Group Relative Policy Optimization (GRPO) aligns the edited knowledge with CoT-based inference policy by optimizing trajectory-level behavior under comprehensive reward signals. Extensive experiments demonstrate our framework consistently improves editing reliability and generalization under real-world evaluations, while better preserving locality and pre-trained capabilities.

[306]  arXiv:2512.04755 [pdf, ps, other]
Title: Typing Fallback Functions: A Semantic Approach to Type Safe Smart Contracts
Subjects: Programming Languages (cs.PL)

This paper develops semantic typing in a smart-contract setting to ensure type safety of code that uses statically untypable language constructs, such as the fallback function. The idea is that the creator of a contract on the blockchain equips code containing such constructs with a formal proof of its type safety, given in terms of the semantics of types. Then, a user of the contract only needs to check the validity of the provided `proof certificate' of type safety. This is a form of proof-carrying code, which naturally fits with the immutable nature of the blockchain environment.
As a concrete application of our approach, we focus on ensuring information flow control and non-interference for the language TINYSOL, a distilled version of the Solidity language, through security types. We provide the semantics of types in terms of a typed operational semantics of TINYSOL, and a way for expressing the proofs of safety as coinductively-defined typing interpretations and for representing them compactly via up-to techniques, similar to those used for bisimilarity. We also show how our machinery can be used to type the typical pointer-to-implementation pattern based on the fallback function. However, our main contribution is not the safety theorem per se (and so security properties different from non-interference can be considered as well), but rather the presentation of the theoretical developments necessary to make this approach work in a blockchain/smart-contract setting.

[307]  arXiv:2512.04759 [pdf, ps, other]
Title: Challenging the Abilities of Large Language Models in Italian: a Community Initiative
Subjects: Computation and Language (cs.CL)

The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of these models, especially for languages beyond English, remains limited. "Challenging the Abilities of LAnguage Models in ITAlian" (CALAMITA) is a large-scale collaborative benchmarking initiative for Italian, coordinated under the Italian Association for Computational Linguistics. Unlike existing efforts that focus on leaderboards, CALAMITA foregrounds methodology: it federates more than 80 contributors from academia, industry, and the public sector to design, document, and evaluate a diverse collection of tasks, covering linguistic competence, commonsense reasoning, factual consistency, fairness, summarization, translation, and code generation. Through this process, we not only assembled a benchmark of over 20 tasks and almost 100 subtasks, but also established a centralized evaluation pipeline that supports heterogeneous datasets and metrics. We report results for four open-weight LLMs, highlighting systematic strengths and weaknesses across abilities, as well as challenges in task-specific evaluation. Beyond quantitative results, CALAMITA exposes methodological lessons: the necessity of fine-grained, task-representative metrics, the importance of harmonized pipelines, and the benefits and limitations of broad community engagement. CALAMITA is conceived as a rolling benchmark, enabling continuous integration of new tasks and models. This makes it both a resource -- the most comprehensive and diverse benchmark for Italian to date -- and a framework for sustainable, community-driven evaluation. We argue that this combination offers a blueprint for other languages and communities seeking inclusive and rigorous LLM evaluation practices.

[308]  arXiv:2512.04761 [pdf, ps, other]
Title: Order Matters: 3D Shape Generation from Sequential VR Sketches
Subjects: Computer Vision and Pattern Recognition (cs.CV)

VR sketching lets users explore and iterate on ideas directly in 3D, offering a faster and more intuitive alternative to conventional CAD tools. However, existing sketch-to-shape models ignore the temporal ordering of strokes, discarding crucial cues about structure and design intent. We introduce VRSketch2Shape, the first framework and multi-category dataset for generating 3D shapes from sequential VR sketches. Our contributions are threefold: (i) an automated pipeline that generates sequential VR sketches from arbitrary shapes, (ii) a dataset of over 20k synthetic and 900 hand-drawn sketch-shape pairs across four categories, and (iii) an order-aware sketch encoder coupled with a diffusion-based 3D generator. Our approach yields higher geometric fidelity than prior work, generalizes effectively from synthetic to real sketches with minimal supervision, and performs well even on partial sketches. All data and models will be released open-source at https://chenyizi086.github.io/VRSketch2Shape_website.

[309]  arXiv:2512.04763 [pdf, ps, other]
Title: MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-based personalization is also key in on-device settings that allow users to keep their conversations and data private. However, memory-augmented systems typically rely on LLMs that are too costly for local on-device deployment. Even though Small Language Models (SLMs) are more suitable for on-device inference than LLMs, they cannot achieve sufficient performance. Additionally, these LLM-based systems lack native visual capabilities, limiting their applicability in multimodal contexts. In this paper, we introduce (i) MemLoRA, a novel memory system that enables local deployment by equipping SLMs with specialized memory adapters, and (ii) its vision extension MemLoRA-V, which integrates small Vision-Language Models (SVLMs) to memory systems, enabling native visual understanding. Following knowledge distillation principles, each adapter is trained separately for specific memory operations$\unicode{x2013}$knowledge extraction, memory update, and memory-augmented generation. Equipped with memory adapters, small models enable accurate on-device memory operations without cloud dependency. On text-only operations, MemLoRA outperforms 10$\times$ larger baseline models (e.g., Gemma2-27B) and achieves performance comparable to 60$\times$ larger models (e.g., GPT-OSS-120B) on the LoCoMo benchmark. To evaluate visual understanding operations instead, we extend LoCoMo with challenging Visual Question Answering tasks that require direct visual reasoning. On this, our VLM-integrated MemLoRA-V shows massive improvements over caption-based approaches (81.3 vs. 23.7 accuracy) while keeping strong performance in text-based tasks, demonstrating the efficacy of our method in multimodal contexts.

[310]  arXiv:2512.04764 [pdf, ps, other]
Title: Human Cognitive Biases in Explanation-Based Interaction: The Case of Within and Between Session Order Effect
Comments: 18 pages, 10 figures, published at AAAI 2026
Subjects: Artificial Intelligence (cs.AI)

Explanatory Interactive Learning (XIL) is a powerful interactive learning framework designed to enable users to customize and correct AI models by interacting with their explanations. In a nutshell, XIL algorithms select a number of items on which an AI model made a decision (e.g. images and their tags) and present them to users, together with corresponding explanations (e.g. image regions that drive the model's decision). Then, users supply corrective feedback for the explanations, which the algorithm uses to improve the model. Despite showing promise in debugging tasks, recent studies have raised concerns that explanatory interaction may trigger order effects, a well-known cognitive bias in which the sequence of presented items influences users' trust and, critically, the quality of their feedback. We argue that these studies are not entirely conclusive, as the experimental designs and tasks employed differ substantially from common XIL use cases, complicating interpretation. To clarify the interplay between order effects and explanatory interaction, we ran two larger-scale user studies (n = 713 total) designed to mimic common XIL tasks. Specifically, we assessed order effects both within and between debugging sessions by manipulating the order in which correct and wrong explanations are presented to participants. Order effects had a limited, through significant impact on users' agreement with the model (i.e., a behavioral measure of their trust), and only when examined withing debugging sessions, not between them. The quality of users' feedback was generally satisfactory, with order effects exerting only a small and inconsistent influence in both experiments. Overall, our findings suggest that order effects do not pose a significant issue for the successful employment of XIL approaches. More broadly, our work contributes to the ongoing efforts for understanding human factors in AI.

[311]  arXiv:2512.04765 [pdf, ps, other]
Title: AdiBhashaa: A Community-Curated Benchmark for Machine Translation into Indian Tribal Languages
Subjects: Computation and Language (cs.CL)

Large language models and multilingual machine translation (MT) systems increasingly drive access to information, yet many languages of the tribal communities remain effectively invisible in these technologies. This invisibility exacerbates existing structural inequities in education, governance, and digital participation. We present AdiBhashaa, a community-driven initiative that constructs the first open parallel corpora and baseline MT systems for four major Indian tribal languages-Bhili, Mundari, Gondi, and Santali. This work combines participatory data creation with native speakers, human-in-the-loop validation, and systematic evaluation of both encoder-decoder MT models and large language models. In addition to reporting technical findings, we articulate how AdiBhashaa illustrates a possible model for more equitable AI research: it centers local expertise, builds capacity among early-career researchers from marginalized communities, and foregrounds human validation in the development of language technologies.

[312]  arXiv:2512.04770 [pdf, ps, other]
Title: Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Systems and Control (eess.SY)

Brain-body co-evolution enables animals to develop complex behaviors in their environments. Inspired by this biological synergy, embodied co-design (ECD) has emerged as a transformative paradigm for creating intelligent agents-from virtual creatures to physical robots-by jointly optimizing their morphologies and controllers rather than treating control in isolation. This integrated approach facilitates richer environmental interactions and robust task performance. In this survey, we provide a systematic overview of recent advances in ECD. We first formalize the concept of ECD and position it within related fields. We then introduce a hierarchical taxonomy: a lower layer that breaks down agent design into three fundamental components-controlling brain, body morphology, and task environment-and an upper layer that integrates these components into four major ECD frameworks: bi-level, single-level, generative, and open-ended. This taxonomy allows us to synthesize insights from more than one hundred recent studies. We further review notable benchmarks, datasets, and applications in both simulated and real-world scenarios. Finally, we identify significant challenges and offer insights into promising future research directions. A project associated with this survey has been created at https://github.com/Yuxing-Wang-THU/SurveyBrainBody.

[313]  arXiv:2512.04771 [pdf, ps, other]
Title: Complementary Characterization of Agent-Based Models via Computational Mechanics and Diffusion Models
Authors: Roberto Garrone
Comments: 11 pages. Methods paper introducing a dual-domain framework for analyzing ABM dynamics. Companion temporal-analysis preprint: arXiv:2510.12729
Subjects: Multiagent Systems (cs.MA); Machine Learning (cs.LG)

This article extends the preprint "Characterizing Agent-Based Model Dynamics via $\epsilon$-Machines and Kolmogorov-Style Complexity" by introducing diffusion models as orthogonal and complementary tools for characterizing the output of agent-based models (ABMs). Where $\epsilon$-machines capture the predictive temporal structure and intrinsic computation of ABM-generated time series, diffusion models characterize high-dimensional cross-sectional distributions, learn underlying data manifolds, and enable synthetic generation of plausible population-level outcomes. We provide a formal analysis demonstrating that the two approaches operate on distinct mathematical domains -- processes vs. distributions -- and show that their combination yields a two-axis representation of ABM behavior based on temporal organization and distributional geometry. To our knowledge, this is the first framework to integrate computational mechanics with score-based generative modeling for the structural analysis of ABM outputs, thereby situating ABM characterization within the broader landscape of modern machine-learning methods for density estimation and intrinsic computation. The framework is validated using the same elder-caregiver ABM dataset introduced in the companion paper, and we provide precise definitions and propositions formalizing the mathematical complementarity between $\epsilon$-machines and diffusion models. This establishes a principled methodology for jointly analyzing temporal predictability and high-dimensional distributional structure in complex simulation models.

[314]  arXiv:2512.04772 [pdf, ps, other]
Title: TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

In recent years, precision agriculture has been introducing groundbreaking innovations in the field, with a strong focus on automation. However, research studies in robotics and autonomous navigation often rely on controlled simulations or isolated field trials. The absence of a realistic common benchmark represents a significant limitation for the diffusion of robust autonomous systems under real complex agricultural conditions. Vineyards pose significant challenges due to their dynamic nature, and they are increasingly drawing attention from both academic and industrial stakeholders interested in automation. In this context, we introduce the TEMPO-VINE dataset, a large-scale multi-temporal dataset specifically designed for evaluating sensor fusion, simultaneous localization and mapping (SLAM), and place recognition techniques within operational vineyard environments. TEMPO-VINE is the first multi-modal public dataset that brings together data from heterogeneous LiDARs of different price levels, AHRS, RTK-GPS, and cameras in real trellis and pergola vineyards, with multiple rows exceeding 100 m in length. In this work, we address a critical gap in the landscape of agricultural datasets by providing researchers with a comprehensive data collection and ground truth trajectories in different seasons, vegetation growth stages, terrain and weather conditions. The sequence paths with multiple runs and revisits will foster the development of sensor fusion, localization, mapping and place recognition solutions for agricultural fields. The dataset, the processing tools and the benchmarking results will be available at the dedicated webpage upon acceptance.

[315]  arXiv:2512.04773 [pdf, ps, other]
Title: Using Machine Learning to Take Stay-or-Go Decisions in Data-driven Drone Missions
Comments: 19 pages, 3 figures, to appear in the proceedings of MobiQuitous 2025
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Drones are becoming indispensable in many application domains. In data-driven missions, besides sensing, the drone must process the collected data at runtime to decide whether additional action must be taken on the spot, before moving to the next point of interest. If processing does not reveal an event or situation that requires such an action, the drone has waited in vain instead of moving to the next point. If, however, the drone starts moving to the next point and it turns out that a follow-up action is needed at the previous point, it must spend time to fly-back. To take this decision, we propose different machine-learning methods based on branch prediction and reinforcement learning. We evaluate these methods for a wide range of scenarios where the probability of event occurrence changes with time. Our results show that the proposed methods consistently outperform the regression-based method proposed in the literature and can significantly improve the worst-case mission time by up to 4.1x. Also, the achieved median mission time is very close, merely up to 2.7% higher, to that of a method with perfect knowledge of the current underlying event probability at each point of interest.

[316]  arXiv:2512.04776 [pdf, ps, other]
Title: Customer Identification for Electricity Retailers Based on Monthly Demand Profiles by Activity Sectors and Locations
Subjects: Computational Engineering, Finance, and Science (cs.CE)

The increasing competition in the electric sector is challenging retail companies as they must assign its commercial efforts to attract the most profitable customers. Those are whose energy demand best fit certain target profiles, which usually depend on generation or cost policies. But, even when the demand profile is available, it is in an anonymous way, preventing its association to a particular client. In this paper, we explore a large dataset containing several millions of monthly demand profiles in Spain and use the available information about the associated economic sector and location for an indirect identification of the customers. The distance of the demand profile from the target is used to define a key performance indicator (KPI) which is used as the main driver of the proposed marketing strategy. The combined use of activity and location has been revealed as a powerful tool for indirect identification of customers, as 100,000 customers are uniquely identified, while about 300,000 clients are identifiable in small sets containing 10 or less consumers. To assess the proposed marketing strategy, it has been compared to the random attraction of new clients, showing a reduction of distance from the target of 40% for 10,000 new customers.

[317]  arXiv:2512.04779 [pdf, ps, other]
Title: YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
Comments: 13 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

Singing Voice Synthesis (SVS) remains constrained in practical deployment due to its strong dependence on accurate phoneme-level alignment and manually annotated melody contours, requirements that are resource-intensive and hinder scalability. To overcome these limitations, we propose a melody-driven SVS framework capable of synthesizing arbitrary lyrics following any reference melody, without relying on phoneme-level alignment. Our method builds on a Diffusion Transformer (DiT) architecture, enhanced with a dedicated melody extraction module that derives melody representations directly from reference audio. To ensure robust melody encoding, we employ a teacher model to guide the optimization of the melody extractor, alongside an implicit alignment mechanism that enforces similarity distribution constraints for improved melodic stability and coherence. Additionally, we refine duration modeling using weakly annotated song data and introduce a Flow-GRPO reinforcement learning strategy with a multi-objective reward function to jointly enhance pronunciation clarity and melodic fidelity. Experiments show that our model achieves superior performance over existing approaches in both objective measures and subjective listening tests, especially in zero-shot and lyric adaptation settings, while maintaining high audio quality without manual annotation. This work offers a practical and scalable solution for advancing data-efficient singing voice synthesis. To support reproducibility, we release our inference code and model checkpoints.

[318]  arXiv:2512.04781 [pdf, ps, other]
Title: Pick-to-Learn for Systems and Control: Data-driven Synthesis with State-of-the-art Safety Guarantees
Comments: 27 double-column pages, 18 figures
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Data-driven methods have become paramount in modern systems and control problems characterized by growing levels of complexity. In safety-critical environments, deploying these methods requires rigorous guarantees, a need that has motivated much recent work at the interface of statistical learning and control. However, many existing approaches achieve this goal at the cost of sacrificing valuable data for testing and calibration, or by constraining the choice of learning algorithm, thus leading to suboptimal performances. In this paper, we describe Pick-to-Learn (P2L) for Systems and Control, a framework that allows any data-driven control method to be equipped with state-of-the-art safety and performance guarantees. P2L enables the use of all available data to jointly synthesize and certify the design, eliminating the need to set aside data for calibration or validation purposes. In presenting a comprehensive version of P2L for systems and control, this paper demonstrates its effectiveness across a range of core problems, including optimal control, reachability analysis, safe synthesis, and robust control. In many of these applications, P2L delivers designs and certificates that outperform commonly employed methods, and shows strong potential for broad applicability in diverse practical settings.

[319]  arXiv:2512.04784 [pdf, ps, other]
Title: PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Consistent image generation requires faithfully preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. Supervised training approaches struggle with this task due to the lack of large-scale datasets capturing visual consistency and the complexity of modeling human perceptual preferences. In this paper, we argue that reinforcement learning (RL) offers a promising alternative by enabling models to learn complex and subjective visual criteria in a data-free manner. To achieve this, we introduce PaCo-RL, a comprehensive framework that combines a specialized consistency reward model with an efficient RL algorithm. The first component, PaCo-Reward, is a pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and CoT reasons. The second component, PaCo-GRPO, leverages a novel resolution-decoupled optimization strategy to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization. Extensive experiments across the two representative subtasks show that PaCo-Reward significantly improves alignment with human perceptions of visual consistency, and PaCo-GRPO achieves state-of-the-art consistency performance with improved training efficiency and stability. Together, these results highlight the promise of PaCo-RL as a practical and scalable solution for consistent image generation. The project page is available at https://x-gengroup.github.io/HomePage_PaCo-RL/.

[320]  arXiv:2512.04785 [pdf, ps, other]
Title: ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

AI agent-based systems are becoming increasingly integral to modern software architectures, enabling autonomous decision-making, dynamic task execution, and multimodal interactions through large language models (LLMs). However, these systems introduce novel and evolving security challenges, including prompt injection attacks, context poisoning, model manipulation, and opaque agent-to-agent communication, that are not effectively captured by traditional threat modeling frameworks. In this paper, we introduce ASTRIDE, an automated threat modeling platform purpose-built for AI agent-based systems. ASTRIDE extends the classical STRIDE framework by introducing a new threat category, A for AI Agent-Specific Attacks, which encompasses emerging vulnerabilities such as prompt injection, unsafe tool invocation, and reasoning subversion, unique to agent-based applications. To automate threat modeling, ASTRIDE combines a consortium of fine-tuned vision-language models (VLMs) with the OpenAI-gpt-oss reasoning LLM to perform end-to-end analysis directly from visual agent architecture diagrams, such as data flow diagrams(DFDs). LLM agents orchestrate the end-to-end threat modeling automation process by coordinating interactions between the VLM consortium and the reasoning LLM. Our evaluations demonstrate that ASTRIDE provides accurate, scalable, and explainable threat modeling for next-generation intelligent systems. To the best of our knowledge, ASTRIDE is the first framework to both extend STRIDE with AI-specific threats and integrate fine-tuned VLMs with a reasoning LLM to fully automate diagram-driven threat modeling in AI agent-based applications.

[321]  arXiv:2512.04786 [pdf, ps, other]
Title: LaFiTe: A Generative Latent Field for 3D Native Texturing
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating high-fidelity, seamless textures directly on 3D surfaces, what we term 3D-native texturing, remains a fundamental open challenge, with the potential to overcome long-standing limitations of UV-based and multi-view projection methods. However, existing native approaches are constrained by the absence of a powerful and versatile latent representation, which severely limits the fidelity and generality of their generated textures. We identify this representation gap as the principal barrier to further progress. We introduce LaFiTe, a framework that addresses this challenge by learning to generate textures as a 3D generative sparse latent color field. At its core, LaFiTe employs a variational autoencoder (VAE) to encode complex surface appearance into a sparse, structured latent space, which is subsequently decoded into a continuous color field. This representation achieves unprecedented fidelity, exceeding state-of-the-art methods by >10 dB PSNR in reconstruction, by effectively disentangling texture appearance from mesh topology and UV parameterization. Building upon this strong representation, a conditional rectified-flow model synthesizes high-quality, coherent textures across diverse styles and geometries. Extensive experiments demonstrate that LaFiTe not only sets a new benchmark for 3D-native texturing but also enables flexible downstream applications such as material synthesis and texture super-resolution, paving the way for the next generation of 3D content creation workflows.

[322]  arXiv:2512.04790 [pdf, ps, other]
Title: Spatially-Enhanced Retrieval-Augmented Generation for Walkability and Urban Discovery
Subjects: Information Retrieval (cs.IR)

Large Language Models (LLMs) have become foundational tools in artificial intelligence, supporting a wide range of applications beyond traditional natural language processing, including urban systems and tourist recommendations. However, their tendency to hallucinate and their limitations in spatial retrieval and reasoning are well known, pointing to the need for novel solutions. Retrieval-augmented generation (RAG) has recently emerged as a promising way to enhance LLMs with accurate, domain-specific, and timely information. Spatial RAG extends this approach to tasks involving geographic understanding. In this work, we introduce WalkRAG, a spatial RAG-based framework with a conversational interface for recommending walkable urban itineraries. Users can request routes that meet specific spatial constraints and preferences while interactively retrieving information about the path and points of interest (POIs) along the way. Preliminary results show the effectiveness of combining information retrieval, spatial reasoning, and LLMs to support urban discovery.

[323]  arXiv:2512.04793 [pdf, ps, other]
Title: YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases
Comments: 17 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

Singing voice conversion (SVC) aims to render the target singer's timbre while preserving melody and lyrics. However, existing zero-shot SVC systems remain fragile in real songs due to harmony interference, F0 errors, and the lack of inductive biases for singing. We propose YingMusic-SVC, a robust zero-shot framework that unifies continuous pre-training, robust supervised fine-tuning, and Flow-GRPO reinforcement learning. Our model introduces a singing-trained RVC timbre shifter for timbre-content disentanglement, an F0-aware timbre adaptor for dynamic vocal expression, and an energy-balanced rectified flow matching loss to enhance high-frequency fidelity. Experiments on a graded multi-track benchmark show that YingMusic-SVC achieves consistent improvements over strong open-source baselines in timbre similarity, intelligibility, and perceptual naturalness, especially under accompanied and harmony-contaminated conditions, demonstrating its effectiveness for real-world SVC deployment.

[324]  arXiv:2512.04797 [pdf, ps, other]
Title: SIMA 2: A Generalist Embodied Agent for Virtual Worlds
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

We introduce SIMA 2, a generalist embodied agent that understands and acts in a wide variety of 3D virtual worlds. Built upon a Gemini foundation model, SIMA 2 represents a significant step toward active, goal-directed interaction within an embodied environment. Unlike prior work (e.g., SIMA 1) limited to simple language commands, SIMA 2 acts as an interactive partner, capable of reasoning about high-level goals, conversing with the user, and handling complex instructions given through language and images. Across a diverse portfolio of games, SIMA 2 substantially closes the gap with human performance and demonstrates robust generalization to previously unseen environments, all while retaining the base model's core reasoning capabilities. Furthermore, we demonstrate a capacity for open-ended self-improvement: by leveraging Gemini to generate tasks and provide rewards, SIMA 2 can autonomously learn new skills from scratch in a new environment. This work validates a path toward creating versatile and continuously learning agents for both virtual and, eventually, physical worlds.

[325]  arXiv:2512.04799 [pdf, ps, other]
Title: DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors
Subjects: Computation and Language (cs.CL)

We present an enhanced benchmark for evaluating linguistic acceptability in Danish. We first analyze the most common errors found in written Danish. Based on this analysis, we introduce a set of fourteen corruption functions that generate incorrect sentences by systematically introducing errors into existing correct Danish sentences. To ensure the accuracy of these corruptions, we assess their validity using both manual and automatic methods. The results are then used as a benchmark for evaluating Large Language Models on a linguistic acceptability judgement task. Our findings demonstrate that this extension is both broader and more comprehensive than the current state of the art. By incorporating a greater variety of corruption types, our benchmark provides a more rigorous assessment of linguistic acceptability, increasing task difficulty, as evidenced by the lower performance of LLMs on our benchmark compared to existing ones. Our results also suggest that our benchmark has a higher discriminatory power which allows to better distinguish well-performing models from low-performing ones.

[326]  arXiv:2512.04810 [pdf, ps, other]
Title: EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose EMMA, an efficient and unified architecture for multimodal understanding, generation and editing. Specifically, EMMA primarily consists of 1) An efficient autoencoder with a 32x compression ratio, which significantly reduces the number of tokens required for generation. This also ensures the training balance between understanding and generation tasks by applying the same compression ratio to images. 2) Channel-wise concatenation instead of token-wise concatenation among visual understanding and generation tokens, which further reduces the visual tokens in unified architectures. 3) A shared-and-decoupled network that enables mutual improvements across tasks while meeting the task-specific modeling requirements. 4) A mixture-of-experts mechanism adopted for visual understanding encoder, which substantially improves perceptual capabilities with a few parameters increase. Extensive experiments have shown that EMMA-4B can significantly outperform state-of-the-art unified multimodal approaches (e.g., BAGEL-7B) in both efficiency and performance, while also achieving competitive results compared to recent multimodal understanding and generation experts (e.g., Qwen3-VL and Qwen-Image). We believe that EMMA lays a solid foundation for the future development of unified multimodal architectures.

[327]  arXiv:2512.04812 [pdf, ps, other]
Title: Construction of the Nearest Nonnegative Hankel Matrix for a Prescribed Eigenpair
Subjects: Numerical Analysis (math.NA)

We study the problem of determining whether a prescribed eigenpair $(\lambda,x)$
can be made an exact eigenpair of a nonnegative Hankel matrix through the smallest
possible structured perturbation. The task reduces to check the feasibility of a
set of linear constraints that encode both the Hankel structure and entrywise
nonnegativity. When the feasibility set is nonempty, we compute the minimum-norm
perturbation $\Delta H$ such that $(H+\Delta H)x=\lambda x$. When no such perturbation
exists, we compute the nearest nonnegative Hankel matrix in a residual sense by
minimizing $\|(H+\Delta H)x-\lambda x\|_{2}$ subject to the imposed constraints.
Because closed-form formulas for the structured backward error are generally
unavailable, our method provides a fully numerical and optimization-based
framework for evaluating eigenpair sensitivity under nonnegativity-preserving
Hankel perturbations. Numerical examples illustrate both feasible and infeasible cases.

[328]  arXiv:2512.04813 [pdf, ps, other]
Title: MOVE: A Simple Motion-Based Data Collection Paradigm for Spatial Generalization in Robotic Manipulation
Comments: 9 pages, 9 figures
Subjects: Robotics (cs.RO)

Imitation learning method has shown immense promise for robotic manipulation, yet its practical deployment is fundamentally constrained by the data scarcity. Despite prior work on collecting large-scale datasets, there still remains a significant gap to robust spatial generalization. We identify a key limitation: individual trajectories, regardless of their length, are typically collected from a \emph{single, static spatial configuration} of the environment. This includes fixed object and target spatial positions as well as unchanging camera viewpoints, which significantly restricts the diversity of spatial information available for learning. To address this critical bottleneck in data efficiency, we propose \textbf{MOtion-Based Variability Enhancement} (\emph{MOVE}), a simple yet effective data collection paradigm that enables the acquisition of richer spatial information from dynamic demonstrations. Our core contribution is an augmentation strategy that injects motion into any movable objects within the environment for each demonstration. This process implicitly generates a dense and diverse set of spatial configurations within a single trajectory. We conduct extensive experiments in both simulation and real-world environments to validate our approach. For example, in simulation tasks requiring strong spatial generalization, \emph{MOVE} achieves an average success rate of 39.1\%, a 76.1\% relative improvement over the static data collection paradigm (22.2\%), and yields up to 2--5$\times$ gains in data efficiency on certain tasks. Our code is available at https://github.com/lucywang720/MOVE.

[329]  arXiv:2512.04814 [pdf, ps, other]
Title: Shared Multi-modal Embedding Space for Face-Voice Association
Comments: Ranked 1st in Fame 2026 Challenge, ICASSP
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)

The FAME 2026 challenge comprises two demanding tasks: training face-voice associations combined with a multilingual setting that includes testing on languages on which the model was not trained. Our approach consists of separate uni-modal processing pipelines with general face and voice feature extraction, complemented by additional age-gender feature extraction to support prediction. The resulting single-modal features are projected into a shared embedding space and trained with an Adaptive Angular Margin (AAM) loss. Our approach achieved first place in the FAME 2026 challenge, with an average Equal-Error Rate (EER) of 23.99%.

[330]  arXiv:2512.04815 [pdf, ps, other]
Title: RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGS
Comments: arXiv admin note: substantial text overlap with arXiv:2506.02751
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle with accurately modeling in-the-wild scenes affected by transient objects and illuminations, leading to artifacts in the rendered images. We identify that the Gaussian densification process, while enhancing scene detail capture, unintentionally contributes to these artifacts by growing additional Gaussians that model transient disturbances and illumination variations. To address this, we propose RobustSplat++, a robust solution based on several critical designs. First, we introduce a delayed Gaussian growth strategy that prioritizes optimizing static scene structure before allowing Gaussian splitting/cloning, mitigating overfitting to transient objects in early optimization. Second, we design a scale-cascaded mask bootstrapping approach that first leverages lower-resolution feature similarity supervision for reliable initial transient mask estimation, taking advantage of its stronger semantic consistency and robustness to noise, and then progresses to high-resolution supervision to achieve more precise mask prediction. Third, we incorporate the delayed Gaussian growth strategy and mask bootstrapping with appearance modeling to handling in-the-wild scenes including transients and illuminations. Extensive experiments on multiple challenging datasets show that our method outperforms existing methods, clearly demonstrating the robustness and effectiveness of our method.

[331]  arXiv:2512.04821 [pdf, ps, other]
Title: LatentFM: A Latent Flow Matching Approach for Generative Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generative models have achieved remarkable progress with the emergence of flow matching (FM). It has demonstrated strong generative capabilities and attracted significant attention as a simulation-free flow-based framework capable of learning exact data densities. Motivated by these advances, we propose LatentFM, a flow-based model operating in the latent space for medical image segmentation. To model the data distribution, we first design two variational autoencoders (VAEs) to encode both medical images and their corresponding masks into a lower-dimensional latent space. We then estimate a conditional velocity field that guides the flow based on the input image. By sampling multiple latent representations, our method synthesizes diverse segmentation outputs whose pixel-wise variance reliably captures the underlying data distribution, enabling both highly accurate and uncertainty-aware predictions. Furthermore, we generate confidence maps that quantify the model certainty, providing clinicians with richer information for deeper analysis. We conduct experiments on two datasets, ISIC-2018 and CVC-Clinic, and compare our method with several prior baselines, including both deterministic and generative approach models. Through comprehensive evaluations, both qualitative and quantitative results show that our approach achieves superior segmentation accuracy while remaining highly efficient in the latent space.

[332]  arXiv:2512.04822 [pdf, ps, other]
Title: Enabling Ethical AI: A case study in using Ontological Context for Justified Agentic AI Decisions
Comments: 24 pages including references, with 6 images and 2 tables. Appendices, supporting data and additional reference provided from page 25 to 117
Subjects: Artificial Intelligence (cs.AI)

In this preprint, we present A collaborative human-AI approach to building an inspectable semantic layer for Agentic AI. AI agents first propose candidate knowledge structures from diverse data sources; domain experts then validate, correct, and extend these structures, with their feedback used to improve subsequent models. Authors show how this process captures tacit institutional knowledge, improves response quality and efficiency, and mitigates institutional amnesia. We argue for a shift from post-hoc explanation to justifiable Agentic AI, where decisions are grounded in explicit, inspectable evidence and reasoning accessible to both experts and non-specialists.

[333]  arXiv:2512.04823 [pdf, ps, other]
Title: Constrained Control of PDE Traffic Flow via Spatial Control Barrier Functions
Comments: Submitted to 2026 European Control Conference, 6 pages, 7 figures
Subjects: Systems and Control (eess.SY)

In this paper, a constrained control approach to variable speed limit (VSL) control for macroscopic partial differential equations (PDE) traffic models is developed. Control Lyapunov function (CLF) theory for ordinary differential equations (ODE) is extended to account for spatially and temporally varying states and control inputs. The stabilizing CLF is then unified with safety constraints through the introduction of spatially varying control barrier functions (sCBF). These methods are applied to in-domain VSL control of the Lighthill-Whitham-Richards (LWR) model to regulate traffic density to a desired profile while ensuring the density remains below prescribed limits enforced by the sCBF. Results show that incorporating constrained control minimally affects the stabilizing control input while successfully maintaining the density with the defined safe set.

[334]  arXiv:2512.04824 [pdf, ps, other]
Title: Hierarchical matrix approximability of inverse of convection dominated finite element matrices
Comments: 32 pages, 22 figures. Submited to the Journal of Computational Physics
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

Several researchers have developed a rich toolbox of matrix compression techniques that exploit structure and redundancy in large matrices. Classical methods such as the block low-rank format and the Fast Multipole Method make it possible to manipulate very large systems by representing them in a reduced form. Among the most sophisticated tools in this area are hierarchical matrices (H-matrices), which exploit local properties of the underlying kernel or operator to approximate matrix blocks by low-rank factors, organized in a recursive hierarchy. H-matrices offer a flexible and scalable framework, yielding nearly linear complexity in both storage and computation. Hierarchical matrix techniques, originally developed for boundary integral equations, have recently been applied to matrices stemming from the discretization of advection-dominated problems. However, their effectiveness is limited by the loss of coercivity induced by convection phenomena, where traditional methods fail. Initial work by Le Borne addressed this by modifying the admissibility criterion for structured grids with constant convection, but challenges remain for more general grids and advection fields. In this work, we propose a novel partitioning strategy based on "convection tubes", clusters aligned with the convection vector field. This method does not require a structured grid or constant convection, overcoming the limitations of previous approaches. We present both theoretical analyses and numerical experiments, that demonstrate the efficiency and robustness of our method for convection-dominated PDEs on unstructured grids. The approach builds on a P\'eclet-robust Caccioppoli inequality, crucial for handling convection-dominated problems.

[335]  arXiv:2512.04827 [pdf, ps, other]
Title: Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs
Authors: Wenzhang Du
Comments: 11 pages, 3 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG)

Subjective mean opinion scores (MOS) remain the de-facto target for non-intrusive speech and singing quality assessment. However, MOS is a scalar that collapses heterogeneous user expectations, ignores service-level objectives, and is difficult to compare across deployment graphs. We propose a contract-driven QoE auditing framework: each service graph G is evaluated under a set of human-interpretable experience contracts C, yielding a contract-level satisfaction vector Q(G, C). We show that (i) classical MOS regression is a special case with a degenerate contract set, (ii) contract-driven quality is more stable than MOS under graph view transformations (e.g., pooling by system vs. by system type), and (iii) the effective sample complexity of learning contracts is governed by contract semantics rather than merely the dimensionality of C. We instantiate the framework on URGENT2024 MOS (6.9k speech utterances with raw rating vectors) and SingMOS v1 (7,981 singing clips; 80 systems). On URGENT, we train a contract-aware neural auditor on self-supervised WavLM embeddings; on SingMOS, we perform contract-driven graph auditing using released rating vectors and metadata without decoding audio. Empirically, our auditor matches strong MOS predictors in MOS accuracy while providing calibrated contract probabilities; on SingMOS, Q(G, C) exhibits substantially smaller cross-view drift than raw MOS and graph-only baselines; on URGENT, difficulty curves reveal that mis-specified "simple" contracts can be harder to learn than richer but better aligned contract sets.

[336]  arXiv:2512.04828 [pdf, ps, other]
Title: The Stagnant Persistence Paradox: Survival Analysis and Temporal Efficiency in Exact Sciences and Engineering Education
Authors: H. R. Paz
Comments: 18 pages , 5 figures, 3 tables
Subjects: Computers and Society (cs.CY)

Research on student progression in higher education has traditionally focused on vertical outcomes such as persistence and dropout, often reducing complex academic histories to binary indicators. While the structural component of horizontal mobility (major switching, plan changes, re-entries) has recently been recognised as a core feature of contemporary university systems, the temporal cost and efficiency of these pathways remain largely unquantified. Using forty years of administrative records from a large faculty of engineering and exact sciences in Argentina (N = 24,016), this study applies a dual-outcome survival analysis framework to two key outcomes: definitive dropout and first major switch. We reconstruct academic trajectories as sequences of enrolment spells and typed transitions under the CAPIRE protocol, and then deploy non-parametric Kaplan-Meier estimators to model time-to-event under right-censoring. Results uncover a critical systemic inefficiency: a global median survival time of 4.33 years prior to definitive dropout, with a pronounced long tail of extended enrolment. This pattern reveals a phenomenon of stagnant persistence, where students remain formally enrolled for long periods without commensurate curricular progression. In contrast, major switching follows an early-event regime, with a median time of 1.0 year among switchers and most switches concentrated within the first academic year. We argue that academic failure in rigid engineering curricula is not a sudden outcome but a long-tail process that generates high opportunity costs, and that institutional indicators should shift from static retention metrics towards measures of curricular velocity based on time-to-event analysis.

[337]  arXiv:2512.04829 [pdf, ps, other]
Title: Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing
Subjects: Artificial Intelligence (cs.AI)

Sphere packing, Hilbert's eighteenth problem, asks for the densest arrangement of congruent spheres in n-dimensional Euclidean space. Although relevant to areas such as cryptography, crystallography, and medical imaging, the problem remains unresolved: beyond a few special dimensions, neither optimal packings nor tight upper bounds are known. Even a major breakthrough in dimension $n=8$, later recognised with a Fields Medal, underscores its difficulty. A leading technique for upper bounds, the three-point method, reduces the problem to solving large, high-precision semidefinite programs (SDPs). Because each candidate SDP may take days to evaluate, standard data-intensive AI approaches are infeasible. We address this challenge by formulating SDP construction as a sequential decision process, the SDP game, in which a policy assembles SDP formulations from a set of admissible components. Using a sample-efficient model-based framework that combines Bayesian optimisation with Monte Carlo Tree Search, we obtain new state-of-the-art upper bounds in dimensions $4-16$, showing that model-based search can advance computational progress in longstanding geometric problems. Together, these results demonstrate that sample-efficient, model-based search can make tangible progress on mathematically rigid, evaluation limited problems, pointing towards a complementary direction for AI-assisted discovery beyond large-scale LLM-driven exploration.

[338]  arXiv:2512.04830 [pdf, ps, other]
Title: FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis
Comments: Novel View Synthesis, Driving Scene, Free Trajectory, Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Closed-loop simulation and scalable pre-training for autonomous driving require synthesizing free-viewpoint driving scenes. However, existing datasets and generative pipelines rarely provide consistent off-trajectory observations, limiting large-scale evaluation and training. While recent generative models demonstrate strong visual realism, they struggle to jointly achieve interpolation consistency and extrapolation realism without per-scene optimization. To address this, we propose FreeGen, a feed-forward reconstruction-generation co-training framework for free-viewpoint driving scene synthesis. The reconstruction model provides stable geometric representations to ensure interpolation consistency, while the generation model performs geometry-aware enhancement to improve realism at unseen viewpoints. Through co-training, generative priors are distilled into the reconstruction model to improve off-trajectory rendering, and the refined geometry in turn offers stronger structural guidance for generation. Experiments demonstrate that FreeGen achieves state-of-the-art performance for free-viewpoint driving scene synthesis.

[339]  arXiv:2512.04832 [pdf, ps, other]
Title: Tokenizing Buildings: A Transformer for Layout Synthesis
Comments: 8 pages, 1 page References, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize buildings by unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure. Such feature sets are represented as a sparse attribute-feature matrix that captures room properties. We then design a unified embedding module that learns joint representations of categorical and possibly correlated continuous feature groups. Lastly, we train a single Transformer backbone in two modes: an encoder-only pathway that yields high-fidelity room embeddings, and an encoder-decoder pipeline for autoregressive prediction of room entities, referred to as Data-Driven Entity Prediction (DDEP). Experiments across retrieval and generative layout synthesis show that SBM learns compact room embeddings that reliably cluster by type and topology, enabling strong semantic retrieval. In DDEP mode, SBM produces functionally sound layouts, with fewer collisions and boundary violations and improved navigability.

[340]  arXiv:2512.04833 [pdf, ps, other]
Title: Counterfactual Explanations for Power System Optimisation
Subjects: Systems and Control (eess.SY)

Enhanced computational capabilities of modern decision-making software have allowed us to solve increasingly sophisticated optimisation problems. But in complex socio-economic, technical environments such as electricity markets, transparent operation is key to ensure a fair treatment of all parties involved, particularly regarding dispatch decisions. We address this issue by building on the concept of counterfactual explanations, answering questions such as "Why was this generator not dispatched?" by identifying minimum changes in the input parameters that would have changed the optimal solution. Both DC Optimal Power Flow and Unit Commitment problems are considered, wherein the variable parameters are the spatial and temporal demand profiles, respectively. The thereby obtained explanations allow users to identify the most important differences between the real and expected market outcomes and observe which constraints have led to the solution. The framework uses a bilevel optimisation problem to find the counterfactual demand scenarios. State-of-the-art methods are compared with data-driven heuristics on the basis of computational efficiency and explanation accuracy. Results show that leveraging historical data from previously solved instances can provide significant speed benefits and allows us to derive explanations in cases where conventional methods would not be tractable.

[341]  arXiv:2512.04834 [pdf, ps, other]
Title: Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

Large Language Models (LLMs) have become a key topic in AI and NLP, transforming sectors like healthcare, finance, education, and marketing by improving customer service, automating tasks, providing insights, improving diagnostics, and personalizing learning experiences. Information extraction from clinical records is a crucial task in digital healthcare. Although traditional NLP techniques have been used for this in the past, they often fall short due to the complexity, variability of clinical language, and high inner semantics in the free clinical text. Recently, Large Language Models (LLMs) have become a powerful tool for better understanding and generating human-like text, making them highly effective in this area. In this paper, we explore the ability of open-source multilingual LLMs to understand EHRs (Electronic Health Records) in Italian and help extract information from them in real-time. Our detailed experimental campaign on comorbidity extraction from EHR reveals that some LLMs struggle in zero-shot, on-premises settings, and others show significant variation in performance, struggling to generalize across various diseases when compared to native pattern matching and manual annotations.

[342]  arXiv:2512.04837 [pdf, ps, other]
Title: A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing methods for deepfake detection aim to develop generalizable detectors. Although "generalizable" is the ultimate target once and for all, with limited training forgeries and domains, it appears idealistic to expect generalization that covers entirely unseen variations, especially given the diversity of real-world deepfakes. Therefore, introducing large-scale multi-domain data for training can be feasible and important for real-world applications. However, within such a multi-domain scenario, the differences between multiple domains, rather than the subtle real/fake distinctions, dominate the feature space. As a result, despite detectors being able to relatively separate real and fake within each domain (i.e., high AUC), they struggle with single-image real/fake judgments in domain-unspecified conditions (i.e., low ACC). In this paper, we first define a new research paradigm named Multi-In-Domain Face Forgery Detection (MID-FFD), which includes sufficient volumes of real-fake domains for training. Then, the detector should provide definitive real-fake judgments to the domain-unspecified inputs, which simulate the frame-by-frame independent detection scenario in the real world. Meanwhile, to address the domain-dominant issue, we propose a model-agnostic framework termed DevDet (Developer for Detector) to amplify real/fake differences and make them dominant in the feature space. DevDet consists of a Face Forgery Developer (FFDev) and a Dose-Adaptive detector Fine-Tuning strategy (DAFT). Experiments demonstrate our superiority in predicting real-fake under the MID-FFD scenario while maintaining original generalization ability to unseen data.

[343]  arXiv:2512.04838 [pdf, ps, other]
Title: DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution
Comments: 18 pages, 10 Figures
Subjects: Computation and Language (cs.CL)

In the age of advanced large language models (LLMs), the boundaries between human and AI-generated text are becoming increasingly blurred. We address the challenge of segmenting mixed-authorship text, that is identifying transition points in text where authorship shifts from human to AI or vice-versa, a problem with critical implications for authenticity, trust, and human oversight. We introduce a novel framework, called Info-Mask for mixed authorship detection that integrates stylometric cues, perplexity-driven signals, and structured boundary modeling to accurately segment collaborative human-AI content. To evaluate the robustness of our system against adversarial perturbations, we construct and release an adversarial benchmark dataset Mixed-text Adversarial setting for Segmentation (MAS), designed to probe the limits of existing detectors. Beyond segmentation accuracy, we introduce Human-Interpretable Attribution (HIA overlays that highlight how stylometric features inform boundary predictions, and we conduct a small-scale human study assessing their usefulness. Across multiple architectures, Info-Mask significantly improves span-level robustness under adversarial conditions, establishing new baselines while revealing remaining challenges. Our findings highlight both the promise and limitations of adversarially robust, interpretable mixed-authorship detection, with implications for trust and oversight in human-AI co-authorship.

[344]  arXiv:2512.04841 [pdf, ps, other]
Title: SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
Authors: Wei Zhao, Zhe Li, Jun Sun
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) exhibit remarkable capabilities but remain vulnerable to adversarial manipulations such as jailbreaking, where crafted prompts bypass safety mechanisms. Understanding the causal factors behind such vulnerabilities is essential for building reliable defenses.
In this work, we introduce a unified causality analysis framework that systematically supports all levels of causal investigation in LLMs, ranging from token-level, neuron-level, and layer-level interventions to representation-level analysis. The framework enables consistent experimentation and comparison across diverse causality-based attack and defense methods. Accompanying this implementation, we provide the first comprehensive survey of causality-driven jailbreak studies and empirically evaluate the framework on multiple open-weight models and safety-critical benchmarks including jailbreaks, hallucination detection, backdoor identification, and fairness evaluation. Our results reveal that: (1) targeted interventions on causally critical components can reliably modify safety behavior; (2) safety-related mechanisms are highly localized (i.e., concentrated in early-to-middle layers with only 1--2\% of neurons exhibiting causal influence); and (3) causal features extracted from our framework achieve over 95\% detection accuracy across multiple threat types.
By bridging theoretical causality analysis and practical model safety, our framework establishes a reproducible foundation for research on causality-based attacks, interpretability, and robust attack detection and mitigation in LLMs. Code is available at https://github.com/Amadeuszhao/SOK_Casuality.

[345]  arXiv:2512.04843 [pdf, ps, other]
Title: From Symptoms to Systems: An Expert-Guided Approach to Understanding Risks of Generative AI for Eating Disorders
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Generative AI systems may pose serious risks to individuals vulnerable to eating disorders. Existing safeguards tend to overlook subtle but clinically significant cues, leaving many risks unaddressed. To better understand the nature of these risks, we conducted semi-structured interviews with 15 clinicians, researchers, and advocates with expertise in eating disorders. Using abductive qualitative analysis, we developed an expert-guided taxonomy of generative AI risks across seven categories: (1) providing generalized health advice; (2) encouraging disordered behaviors; (3) supporting symptom concealment; (4) creating thinspiration; (5) reinforcing negative self-beliefs; (6) promoting excessive focus on the body; and (7) perpetuating narrow views about eating disorders. Our results demonstrate how certain user interactions with generative AI systems intersect with clinical features of eating disorders in ways that may intensify risk. We discuss implications of our work, including approaches for risk assessment, safeguard design, and participatory evaluation practices with domain experts.

[346]  arXiv:2512.04844 [pdf, ps, other]
Title: Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Expanding the linguistic diversity of instruct large language models (LLMs) is crucial for global accessibility but is often hindered by the reliance on costly specialized target language labeled data and catastrophic forgetting during adaptation. We tackle this challenge under a realistic, low-resource constraint: adapting instruct LLMs using only unlabeled target language data. We introduce Source-Shielded Updates (SSU), a selective parameter update strategy that proactively preserves source knowledge. Using a small set of source data and a parameter importance scoring method, SSU identifies parameters critical to maintaining source abilities. It then applies a column-wise freezing strategy to protect these parameters before adaptation. Experiments across five typologically diverse languages and 7B and 13B models demonstrate that SSU successfully mitigates catastrophic forgetting. It reduces performance degradation on monolingual source tasks to just 3.4% (7B) and 2.8% (13B) on average, a stark contrast to the 20.3% and 22.3% from full fine-tuning. SSU also achieves target-language performance highly competitive with full fine-tuning, outperforming it on all benchmarks for 7B models and the majority for 13B models.

[347]  arXiv:2512.04845 [pdf, ps, other]
Title: A High-Order Discretization Scheme for Surface Integral Equations for Analyzing the Electroencephalography Forward Problem
Comments: 9 pages, 7 figures
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph); Biological Physics (physics.bio-ph)

A Nystrom-based high-order (HO) discretization scheme for surface integral equations (SIEs) for analyzing the electroencephalography (EEG) forward problem is proposed in this work. We use HO surface elements and interpolation functions for the discretization of the interfaces of the head volume and the unknowns on the elements, respectively. The advantage of this work over existing isoparametric HO discretization schemes resides in the fact that the interpolation points are different from the mesh nodes, allowing for the flexible manipulation of the order of the basis functions without regenerating the mesh of the interfaces. Moreover, the interpolation points are chosen from the quadrature rules with the same number of points on the elements simplifying the numerical computation of the surface integrals for the far-interaction case. In this contribution, we extend the implementation of the HO discretization scheme to the double-layer and the adjoint double-layer formulations, as well as to the isolated-skull-approach for the double-layer formulation and to the indirect adjoint double-layer formulation, employed to improve the solution accuracy in case of high conductivity contrast models, which requires the development of different techniques for the singularity treatment. Numerical experiments are presented to demonstrate the accuracy, flexibility, and efficiency of the proposed scheme for the four SIEs for analyzing the EEG forward problem.

[348]  arXiv:2512.04847 [pdf, ps, other]
Title: Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

Pre-trained audio models excel at detecting acoustic patterns in auscultation sounds but often fail to grasp their clinical significance, limiting their use and performance in diagnostic tasks. To bridge this gap, we introduce AcuLa (Audio-Clinical Understanding via Language Alignment), a lightweight post-training framework that instills semantic understanding into any audio encoder by aligning it with a medical language model, which acts as a "semantic teacher." To enable alignment at scale, we construct a large-scale dataset by leveraging off-the-shelf large language models to translate the rich, structured metadata accompanying existing audio recordings into coherent clinical reports. Our alignment strategy combines a representation-level contrastive objective with a self-supervised modeling, ensuring that the model learns clinical semantics while preserving fine-grained temporal cues. AcuLa achieves state-of-the-art results across 18 diverse cardio-respiratory tasks from 10 different datasets, improving the mean AUROC on classification benchmarks from 0.68 to 0.79 and, on the most challenging COVID-19 cough detection task, boosting the AUROC from 0.55 to 0.89. Our work demonstrates that this audio-language alignment transforms purely acoustic models into clinically-aware diagnostic tools, establishing a novel paradigm for enhancing physiological understanding in audio-based health monitoring.

[349]  arXiv:2512.04850 [pdf, ps, other]
Title: Side-by-side first-price auctions with imperfect bidders
Authors: Benjamin Heymann
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH); Optimization and Control (math.OC)

We model a procurement scenario in which two \textit{imperfect} bidders act simultaneously on behalf of a single buyer, a configuration common in display advertising and referred to as \textit{side-by-side bidding} but largely unexplored in theory. We prove that the iterated best response algorithm converges to an equilibrium under standard distributional assumptions and provide sufficient condition for uniqueness. Beyond establishing existence and convergence, our analysis provides a tractable numerical method for quantitative studies of side-by-side procurement.

[350]  arXiv:2512.04852 [pdf, ps, other]
Title: Ask Safely: Privacy-Aware LLM Query Generation for Knowledge Graphs
Subjects: Information Retrieval (cs.IR)

Large Language Models (LLMs) are increasingly used to query knowledge graphs (KGs) due to their strong semantic understanding and extrapolation capabilities compared to traditional approaches. However, these methods cannot be applied when the KG contains sensitive data and the user lacks the resources to deploy a local generative LLM. To address this issue, we propose a privacy-aware query generation approach for KGs. Our method identifies sensitive information in the graph based on its structure and omits such values before requesting the LLM to translate natural language questions into Cypher queries. Experimental results show that our approach preserves the quality of the generated queries while preventing sensitive data from being transmitted to third-party services.

[351]  arXiv:2512.04854 [pdf, ps, other]
Title: From Task Executors to Research Partners: Evaluating AI Co-Pilots Through Workflow Integration in Biomedical Research
Subjects: Artificial Intelligence (cs.AI)

Artificial intelligence systems are increasingly deployed in biomedical research. However, current evaluation frameworks may inadequately assess their effectiveness as research collaborators. This rapid review examines benchmarking practices for AI systems in preclinical biomedical research. Three major databases and two preprint servers were searched from January 1, 2018 to October 31, 2025, identifying 14 benchmarks that assess AI capabilities in literature understanding, experimental design, and hypothesis generation. The results revealed that all current benchmarks assess isolated component capabilities, including data analysis quality, hypothesis validity, and experimental protocol design. However, authentic research collaboration requires integrated workflows spanning multiple sessions, with contextual memory, adaptive dialogue, and constraint propagation. This gap implies that systems excelling on component benchmarks may fail as practical research co-pilots. A process-oriented evaluation framework is proposed that addresses four critical dimensions absent from current benchmarks: dialogue quality, workflow orchestration, session continuity, and researcher experience. These dimensions are essential for evaluating AI systems as research co-pilots rather than as isolated task executors.

[352]  arXiv:2512.04855 [pdf, ps, other]
Title: A Novel Trust-Based DDoS Cyberattack Detection Model for Smart Business Environments
Comments: 22 Pages
Journal-ref: International Journal of Network Security & Its Applications (IJNSA) Vol.17, No.5/6, November 2025
Subjects: Cryptography and Security (cs.CR)

As the frequency and complexity of Distributed Denial-of-Service (DDoS) attacks continue to increase, the level of threats posed to Smart Internet of Things (SIoT) business environments have also increased. These environments generally have several interconnected SIoT systems and devices that are integral to daily operations, usually depending on cloud infrastructure and real-time data analytics, which require continuous availability and secure data exchange. Conventional detection mechanisms, while useful in static or traditional network environments, often are inadequate in responding to the needs of these dynamic and diverse SIoT networks. In this paper, we introduce a novel trust-based DDoS detection model tailored to meet the unique requirements of smart business environments. The proposed model incorporates a trust evaluation engine that continuously monitors node behaviour, calculating trust scores based on packet delivery ratio, response time, and anomaly detection. These trust metrics are then aggregated by a central trust-based repository that uses inherent trust values to identify traffic patterns indicative of DDoS attacks. By integrating both trust scores and central trust-based outputs, the trust calculation is enhanced, ensuring that threats are accurately identified and addressed in real-time. The model demonstrated a significant improvement in detection accuracy, and a low false-positive rate with enhanced scalability and adaptability under TCP SYN, Ping Flood, and UDP Flood attacks. The results show that a trust-based approach provides an effective, lightweight alternative for securing resource-constrained business IoT environments.

[353]  arXiv:2512.04856 [pdf, ps, other]
Title: Safe model-based Reinforcement Learning via Model Predictive Control and Control Barrier Functions
Comments: Submitted to IFAC WC 2026, 7 pages, 3 figures
Subjects: Systems and Control (eess.SY)

Optimal control strategies are often combined with safety certificates to ensure both performance and safety in safety-critical systems. A prominent example is combining Model Predictive Control (MPC) with Control Barrier Functions (CBF). Yet, efficient tuning of MPC parameters and choosing an appropriate class $\mathcal{K}$ function in the CBF is challenging and problem dependent. This paper introduces a safe model-based Reinforcement Learning (RL) framework where a parametric MPC controller incorporates a CBF constraint with a parameterized class $\mathcal{K}$ function and serves as a function approximator to learn improved safe control policies from data. Three variations of the framework are introduced, distinguished by the way the optimization problem is formulated and the class $\mathcal{K}$ function is parameterized, including neural architectures. Numerical experiments on a discrete double-integrator with static and dynamic obstacles demonstrate that the proposed methods improve performance while ensuring safety.

[354]  arXiv:2512.04857 [pdf, ps, other]
Title: Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Autoregressive (AR) visual generation has emerged as a powerful paradigm for image and multimodal synthesis, owing to its scalability and generality. However, existing AR image generation suffers from severe memory bottlenecks due to the need to cache all previously generated visual tokens during decoding, leading to both high storage requirements and low throughput. In this paper, we introduce \textbf{LineAR}, a novel, training-free progressive key-value (KV) cache compression pipeline for autoregressive image generation. By fully exploiting the intrinsic characteristics of visual attention, LineAR manages the cache at the line level using a 2D view, preserving the visual dependency regions while progressively evicting less-informative tokens that are harmless for subsequent line generation, guided by inter-line attention. LineAR enables efficient autoregressive (AR) image generation by utilizing only a few lines of cache, achieving both memory savings and throughput speedup, while maintaining or even improving generation quality. Extensive experiments across six autoregressive image generation models, including class-conditional and text-to-image generation, validate its effectiveness and generality. LineAR improves ImageNet FID from 2.77 to 2.68 and COCO FID from 23.85 to 22.86 on LlamaGen-XL and Janus-Pro-1B, while retaining only 1/6 KV cache. It also improves DPG on Lumina-mGPT-768 with just 1/8 KV cache. Additionally, LineAR achieves significant memory and throughput gains, including up to 67.61% memory reduction and 7.57x speedup on LlamaGen-XL, and 39.66% memory reduction and 5.62x speedup on Janus-Pro-7B.

[355]  arXiv:2512.04858 [pdf, ps, other]
Title: Exact 3-D Channel Impulse Response for Spherical Receivers with Arbitrary Drift Directions
Comments: 5 pages, 4 figures. Preprint prepared for journal submission
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Probability (math.PR)

Accurate channel modeling for spherical absorbing receivers is fundamental to the design of realistic molecular multiple-input multiple-output (MIMO) systems. While advanced modulation schemes have been proposed to mitigate interference, determining the channel impulse response (CIR) under arbitrary flow directions remains a challenge; existing exact solutions are restricted to either 1-D/no-drift scenarios or planar receiver geometries. Addressing this gap, we derive the first exact analytical CIR for a spherical receiver in a 3-D molecular communication system with uniform drift in an arbitrary direction. Unlike prior approximations that ignore the angle between the drift and the transmission axis, our approach utilizes the Girsanov theorem to analytically transform the hitting-time distribution from a stationary medium to a drifted one. The proposed closed-form expression not only eliminates modeling errors inherent in previous approximations for off-axis receivers but also enables efficient parameter-space exploration of critical system metrics (e.g., peak time and amplitude), a task that would be computationally costly with pure simulation-based approaches.

[356]  arXiv:2512.04859 [pdf, ps, other]
Title: High-Performance DBMSs with io_uring: When and How to use it
Subjects: Databases (cs.DB)

We study how modern database systems can leverage the Linux io_uring interface for efficient, low-overhead I/O. io_uring is an asynchronous system call batching interface that unifies storage and network operations, addressing limitations of existing Linux I/O interfaces. However, naively replacing traditional I/O interfaces with io_uring does not necessarily yield performance benefits. To demonstrate when io_uring delivers the greatest benefits and how to use it effectively in modern database systems, we evaluate it in two use cases: Integrating io_uring into a storage-bound buffer manager and using it for high-throughput data shuffling in network-bound analytical workloads. We further analyze how advanced io_uring features, such as registered buffers and passthrough I/O, affect end-to-end performance. Our study shows when low-level optimizations translate into tangible system-wide gains and how architectural choices influence these benefits. Building on these insights, we derive practical guidelines for designing I/O-intensive systems using io_uring and validate their effectiveness in a case study of PostgreSQL's recent io_uring integration, where applying our guidelines yields a performance improvement of 14%.

[357]  arXiv:2512.04862 [pdf, ps, other]
Title: Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing
Comments: * Equal contribution. Minor figure corrections compared to the ICCV 2025 version
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Capturing accurate 3D human pose in the wild would provide valuable data for training pose estimation and motion generation methods. While video-based estimation approaches have become increasingly accurate, they often fail in common scenarios involving self-contact, such as a hand touching the face. In contrast, wearable bioimpedance sensing can cheaply and unobtrusively measure ground-truth skin-to-skin contact. Consequently, we propose a novel framework that combines visual pose estimators with bioimpedance sensing to capture the 3D pose of people by taking self-contact into account. Our method, BioTUCH, initializes the pose using an off-the-shelf estimator and introduces contact-aware pose optimization during measured self-contact: reprojection error and deviations from the input estimate are minimized while enforcing vertex proximity constraints. We validate our approach using a new dataset of synchronized RGB video, bioimpedance measurements, and 3D motion capture. Testing with three input pose estimators, we demonstrate an average of 11.7% improvement in reconstruction accuracy. We also present a miniature wearable bioimpedance sensor that enables efficient large-scale collection of contact-aware training data for improving pose estimation and generation using BioTUCH. Code and data are available at biotuch.is.tue.mpg.de

[358]  arXiv:2512.04864 [pdf, ps, other]
Title: Are Your Agents Upward Deceivers?
Subjects: Artificial Intelligence (cs.AI)

Large Language Model (LLM)-based agents are increasingly used as autonomous subordinates that carry out tasks for users. This raises the question of whether they may also engage in deception, similar to how individuals in human organizations lie to superiors to create a good image or avoid punishment. We observe and define agentic upward deception, a phenomenon in which an agent facing environmental constraints conceals its failure and performs actions that were not requested without reporting. To assess its prevalence, we construct a benchmark of 200 tasks covering five task types and eight realistic scenarios in a constrained environment, such as broken tools or mismatched information sources. Evaluations of 11 popular LLMs reveal that these agents typically exhibit action-based deceptive behaviors, such as guessing results, performing unsupported simulations, substituting unavailable information sources, and fabricating local files. We further test prompt-based mitigation and find only limited reductions, suggesting that it is difficult to eliminate and highlighting the need for stronger mitigation strategies to ensure the safety of LLM-based agents.

[359]  arXiv:2512.04867 [pdf, ps, other]
Title: Functional Stability of Software-Hardware Neural Network Implementation The NeuroComp Project
Comments: 14 pages
Subjects: Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE)

This paper presents an innovative approach to ensuring functional stability of neural networks through hardware redundancy at the individual neuron level. Unlike the classical Dropout method, which is used during training for regularization purposes, the proposed system ensures resilience to hardware failures during network operation. Each neuron is implemented on a separate microcomputer (ESP32), allowing the system to continue functioning even when individual computational nodes fail.

[360]  arXiv:2512.04868 [pdf, ps, other]
Title: SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Knowledge-based conversational question answering (KBCQA) confronts persistent challenges in resolving coreference, modeling contextual dependencies, and executing complex logical reasoning. Existing approaches, whether end-to-end semantic parsing or stepwise agent-based reasoning, often suffer from structural inaccuracies and prohibitive computational costs, particularly when processing intricate queries over large knowledge graphs. To address these limitations, we introduce SEAL, a novel two-stage semantic parsing framework grounded in self-evolving agentic learning. In the first stage, a large language model (LLM) extracts a minimal S-expression core that captures the essential semantics of the input query. This core is then refined by an agentic calibration module, which corrects syntactic inconsistencies and aligns entities and relations precisely with the underlying knowledge graph. The second stage employs template-based completion, guided by question-type prediction and placeholder instantiation, to construct a fully executable S-expression. This decomposition not only simplifies logical form generation but also significantly enhances structural fidelity and linking efficiency. Crucially, SEAL incorporates a self-evolving mechanism that integrates local and global memory with a reflection module, enabling continuous adaptation from dialog history and execution feedback without explicit retraining. Extensive experiments on the SPICE benchmark demonstrate that SEAL achieves state-of-the-art performance, especially in multi-hop reasoning, comparison, and aggregation tasks. The results validate notable gains in both structural accuracy and computational efficiency, underscoring the framework's capacity for robust and scalable conversational reasoning.

[361]  arXiv:2512.04869 [pdf, ps, other]
Title: Developing a General Personal Tutor for Education
Journal-ref: Trends in Cognitive Sciences, 29 (11),957-960 (2025)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

The vision of a universal AI tutor has remained elusive, despite decades of effort. Could LLMs be the game-changer? We overview novel issues arising from developing a nationwide AI tutor. We highlight the practical questions that point to specific gaps in our scientific understanding of the learning process.

[362]  arXiv:2512.04871 [pdf, ps, other]
Title: STELLA: Guiding Large Language Models for Time Series Forecasting with Semantic Abstractions
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent adaptations of Large Language Models (LLMs) for time series forecasting often fail to effectively enhance information for raw series, leaving LLM reasoning capabilities underutilized. Existing prompting strategies rely on static correlations rather than generative interpretations of dynamic behavior, lacking critical global and instance-specific context. To address this, we propose STELLA (Semantic-Temporal Alignment with Language Abstractions), a framework that systematically mines and injects structured supplementary and complementary information. STELLA employs a dynamic semantic abstraction mechanism that decouples input series into trend, seasonality, and residual components. It then translates intrinsic behavioral features of these components into Hierarchical Semantic Anchors: a Corpus-level Semantic Prior (CSP) for global context and a Fine-grained Behavioral Prompt (FBP) for instance-level patterns. Using these anchors as prefix-prompts, STELLA guides the LLM to model intrinsic dynamics. Experiments on eight benchmark datasets demonstrate that STELLA outperforms state-of-the-art methods in long- and short-term forecasting, showing superior generalization in zero-shot and few-shot settings. Ablation studies further validate the effectiveness of our dynamically generated semantic anchors.

[363]  arXiv:2512.04875 [pdf, ps, other]
Title: SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automated lesion detection in chest X-rays has demonstrated significant potential for improving clinical diagnosis by precisely localizing pathological abnormalities. While recent promptable detection frameworks have achieved remarkable accuracy in target localization, existing methods typically rely on manual annotations as prompts, which are labor-intensive and impractical for clinical applications. To address this limitation, we propose SP-Det, a novel self-prompted detection framework that automatically generates rich textual context to guide multi-label lesion detection without requiring expert annotations. Specifically, we introduce an expert-free dual-text prompt generator (DTPG) that leverages two complementary textual modalities: semantic context prompts that capture global pathological patterns and disease beacon prompts that focus on disease-specific manifestations. Moreover, we devise a bidirectional feature enhancer (BFE) that synergistically integrates comprehensive diagnostic context with disease-specific embeddings to significantly improve feature representation and detection accuracy. Extensive experiments on two chest X-ray datasets with diverse thoracic disease categories demonstrate that our SP-Det framework outperforms state-of-the-art detection methods while completely eliminating the dependency on expert-annotated prompts compared to existing promptable architectures.

[364]  arXiv:2512.04876 [pdf, ps, other]
Title: Optimizations and extensions for fair join pattern matching
Authors: Ioannis Karras
Comments: This is a Master's thesis for the Master's in Computer Science and Engineering at DTU (Technical University of Denmark)
Subjects: Programming Languages (cs.PL); Data Structures and Algorithms (cs.DS)

Join patterns are an underexplored approach for the programming of concurrent and distributed systems. When applied to the actor model, join patterns offer the novel capability of matching combinations of messages in the mailbox of an actor. Previous work by Philipp Haller et al. in the paper "Fair Join Pattern Matching for Actors" (ECOOP 2024) explored join patterns with conditional guards in an actor-based setting with a specification of fair and deterministic matching semantics. Nevertheless, the question of time efficiency in fair join pattern matching has remained underexplored. The stateful tree-based matching algorithm of Haller et al. performs worse than an implementation that adapts the Rete algorithm to the regular version of a join pattern matching benchmark, while outperforming on a variant with heavy conditional guards, which take longer to evaluate. Nevertheless, conforming Rete to the problem of join pattern matching requires heavy manual adaptation.
In this thesis, we enhance and optimize the stateful tree-based matching algorithm of Haller et al. to achieve up to tenfold performance improvements on certain benchmarks, approaching the performance of Rete on regular benchmarks while maintaining the advantages of versatility and performance with heavy guards. We also enhance the benchmark suite, adding new features and enhancing its extensibility and user-friendliness. We extend the join pattern implementation with a less ambiguous syntax as well as dynamic pattern switching. Finally, we present a new complex model use case for join patterns, showing their applicability in a microservice web architecture.

[365]  arXiv:2512.04883 [pdf, ps, other]
Title: SDG-Track: A Heterogeneous Observer-Follower Framework for High-Resolution UAV Tracking on Embedded Platforms
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Real-time tracking of small unmanned aerial vehicles (UAVs) on edge devices faces a fundamental resolution-speed conflict. Downsampling high-resolution imagery to standard detector input sizes causes small target features to collapse below detectable thresholds. Yet processing native 1080p frames on resource-constrained platforms yields insufficient throughput for smooth gimbal control. We propose SDG-Track, a Sparse Detection-Guided Tracker that adopts an Observer-Follower architecture to reconcile this conflict. The Observer stream runs a high-capacity detector at low frequency on the GPU to provide accurate position anchors from 1920x1080 frames. The Follower stream performs high-frequency trajectory interpolation via ROI-constrained sparse optical flow on the CPU. To handle tracking failures from occlusion or model drift caused by spectrally similar distractors, we introduce Dual-Space Recovery, a training-free re-acquisition mechanism combining color histogram matching with geometric consistency constraints. Experiments on a ground-to-air tracking station demonstrate that SDG-Track achieves 35.1 FPS system throughput while retaining 97.2\% of the frame-by-frame detection precision. The system successfully tracks agile FPV drones under real-world operational conditions on an NVIDIA Jetson Orin Nano. Our paper code is publicly available at https://github.com/Jeffry-wen/SDG-Track

[366]  arXiv:2512.04884 [pdf, ps, other]
Title: Hoi! -- A Multimodal Dataset for Force-Grounded, Cross-View Articulated Manipulation
Subjects: Robotics (cs.RO)

We present a dataset for force-grounded, cross-view articulated manipulation that couples what is seen with what is done and what is felt during real human interaction. The dataset contains 3048 sequences across 381 articulated objects in 38 environments. Each object is operated under four embodiments - (i) human hand, (ii) human hand with a wrist-mounted camera, (iii) handheld UMI gripper, and (iv) a custom Hoi! gripper - where the tool embodiment provides synchronized end-effector forces and tactile sensing. Our dataset offers a holistic view of interaction understanding from video, enabling researchers to evaluate how well methods transfer between human and robotic viewpoints, but also investigate underexplored modalities such as force sensing and prediction.

[367]  arXiv:2512.04885 [pdf, ps, other]
Title: Stability-Guaranteed Dual Kalman Filtering for Electrochemical Battery State Estimation
Comments: This work has been submitted to 23rd IFAC World Congress for possible publication
Subjects: Systems and Control (eess.SY)

Accurate and stable state estimation is critical for battery management. Although dual Kalman filtering can jointly estimate states and parameters, the strong coupling between filters may cause divergence under large initialization errors or model mismatch. This paper proposes a Stability Guaranteed Dual Kalman Filtering (SG-DKF) method. A Lyapunov-based analysis yields a sufficient stability condition, leading to an adaptive dead-zone rule that suspends parameter updates when the innovation exceeds a stability bound. Applied to an electrochemical battery model, SG-DKF achieves accuracy comparable to a dual EKF and reduces state of charge RMSE by over 45% under large initial state errors.

[368]  arXiv:2512.04888 [pdf, ps, other]
Title: You Only Train Once (YOTO): A Retraining-Free Object Detection Framework
Comments: under review in the Elsevier Engineering Journal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object detection constitutes the primary task within the domain of computer vision. It is utilized in numerous domains. Nonetheless, object detection continues to encounter the issue of catastrophic forgetting. The model must be retrained whenever new products are introduced, utilizing not only the new products dataset but also the entirety of the previous dataset. The outcome is obvious: increasing model training expenses and significant time consumption. In numerous sectors, particularly retail checkout, the frequent introduction of new products presents a great challenge. This study introduces You Only Train Once (YOTO), a methodology designed to address the issue of catastrophic forgetting by integrating YOLO11n for object localization with DeIT and Proxy Anchor Loss for feature extraction and metric learning. For classification, we utilize cosine similarity between the embedding features of the target product and those in the Qdrant vector database. In a case study conducted in a retail store with 140 products, the experimental results demonstrate that our proposed framework achieves encouraging accuracy, whether for detecting new or existing products. Furthermore, without retraining, the training duration difference is significant. We achieve almost 3 times the training time efficiency compared to classical object detection approaches. This efficiency escalates as additional new products are added to the product database. The average inference time is 580 ms per image containing multiple products, on an edge device, validating the proposed framework's feasibility for practical use.

[369]  arXiv:2512.04890 [pdf, ps, other]
Title: Equivariant Symmetry-Aware Head Pose Estimation for Fetal MRI
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present E(3)-Pose, a novel fast pose estimation method that jointly and explicitly models rotation equivariance and object symmetry. Our work is motivated by the challenging problem of accounting for fetal head motion during a diagnostic MRI scan. We aim to enable automatic adaptive prescription of 2D diagnostic MRI slices with 6-DoF head pose estimation, supported by 3D MRI volumes rapidly acquired before each 2D slice. Existing methods struggle to generalize to clinical volumes, due to pose ambiguities induced by inherent anatomical symmetries, as well as low resolution, noise, and artifacts. In contrast, E(3)-Pose captures anatomical symmetries and rigid pose equivariance by construction, and yields robust estimates of the fetal head pose. Our experiments on publicly available and representative clinical fetal MRI datasets demonstrate the superior robustness and generalization of our method across domains. Crucially, E(3)-Pose achieves state-of-the-art accuracy on clinical MRI volumes, paving the way for clinical translation. Our implementation is available at github.com/ramyamut/E3-Pose.

[370]  arXiv:2512.04892 [pdf, ps, other]
Title: Small-Signal Stability Oriented Real-Time Operation of Power Systems with a High Penetration of Inverter-Based Resources
Subjects: Systems and Control (eess.SY)

This study proposes a control strategy to ensure the safe operation of modern power systems with high penetration of inverter-based resources (IBRs) within an optimal operation framework. The objective is to obtain operating points that satisfy the optimality conditions of a predefined problem while guaranteeing small-signal stability. The methodology consists of two stages. First, an offline analysis of a set of operating points is performed to derive a data-driven regression-based expression that captures a damping-based stability index as a function of the operating conditions. Second, an Online Feedback Optimization (OFO) controller is employed to drive the system toward an optimal operating point while maintaining a secure distance from the instability region. The proposed strategy is evaluated on an academic test case based on a modified version of the IEEE 9-bus system, in which synchronous generators are replaced by IBRs operating under both grid-following and grid-forming control modes. The results demonstrate the effectiveness of the method and are discussed in detail.

[371]  arXiv:2512.04894 [pdf, ps, other]
Title: Data-driven Methods for Delay Differential Equations
Subjects: Numerical Analysis (math.NA)

Data-driven methodologies are nowadays ubiquitous. Their rapid development and spread have led to applications even beyond the traditional fields of science. As far as dynamical systems and differential equations are concerned, neural networks and sparse identification tools have emerged as powerful approaches to recover the governing equations from available temporal data series. In this chapter we first illustrate possible extensions of the sparse identification of nonlinear dynamics (SINDy) algorithm, originally developed for ordinary differential equations (ODEs), to delay differential equations (DDEs) with discrete, possibly multiple and unknown delays. Two methods are presented for SINDy, one directly tackles the underlying DDE and the other acts on the system of ODEs approximating the DDE through pseudospectral collocation. We also introduce another way of capturing the dynamics of DDEs using neural networks and trainable delays in continuous time, and present the training algorithms developed for these neural delay differential equations (NDDEs). The relevant MATLAB implementations for both the SINDy approach and for the NDDE approach are provided. These approaches are tested on several examples, including classical systems such as the delay logistic and the Mackey-Glass equation, and directly compared to each other on the delayed R\"ossler system. We provide insights on the connection between the approaches and future directions on developing data-driven methods for time delay systems.

[372]  arXiv:2512.04895 [pdf, ps, other]
Title: Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems
Authors: M Zeeshan, Saud Satti
Comments: 5 pages, 2 figures, IEEE Transactions on Dependable and Secure Computing
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Multimodal Artificial Intelligence (AI) systems, particularly Vision-Language Models (VLMs), have become integral to critical applications ranging from autonomous decision-making to automated document processing. As these systems scale, they rely heavily on preprocessing pipelines to handle diverse inputs efficiently. However, this dependency on standard preprocessing operations, specifically image downscaling, creates a significant yet often overlooked security vulnerability. While intended for computational optimization, scaling algorithms can be exploited to conceal malicious visual prompts that are invisible to human observers but become active semantic instructions once processed by the model. Current adversarial strategies remain largely static, failing to account for the dynamic nature of modern agentic workflows. To address this gap, we propose Chameleon, a novel, adaptive adversarial framework designed to expose and exploit scaling vulnerabilities in production VLMs. Unlike traditional static attacks, Chameleon employs an iterative, agent-based optimization mechanism that dynamically refines image perturbations based on the target model's real-time feedback. This allows the framework to craft highly robust adversarial examples that survive standard downscaling operations to hijack downstream execution. We evaluate Chameleon against Gemini 2.5 Flash model. Our experiments demonstrate that Chameleon achieves an Attack Success Rate (ASR) of 84.5% across varying scaling factors, significantly outperforming static baseline attacks which average only 32.1%. Furthermore, we show that these attacks effectively compromise agentic pipelines, reducing decision-making accuracy by over 45% in multi-step tasks. Finally, we discuss the implications of these vulnerabilities and propose multi-scale consistency checks as a necessary defense mechanism.

[373]  arXiv:2512.04904 [pdf, ps, other]
Title: ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Despite tremendous recent progress, Flow Matching methods still suffer from exposure bias due to discrepancies in training and inference. This paper investigates the root causes of exposure bias in Flow Matching, including: (1) the model lacks generalization to biased inputs during training, and (2) insufficient low-frequency content captured during early denoising, leading to accumulated bias. Based on these insights, we propose ReflexFlow, a simple and effective reflexive refinement of the Flow Matching learning objective that dynamically corrects exposure bias. ReflexFlow consists of two components: (1) Anti-Drift Rectification (ADR), which reflexively adjusts prediction targets for biased inputs utilizing a redesigned loss under training-time scheduled sampling; and (2) Frequency Compensation (FC), which reflects on missing low-frequency components and compensates them by reweighting the loss using exposure bias. ReflexFlow is model-agnostic, compatible with all Flow Matching frameworks, and improves generation quality across datasets. Experiments on CIFAR-10, CelebA-64, and ImageNet-256 show that ReflexFlow outperforms prior approaches in mitigating exposure bias, achieving a 35.65% reduction in FID on CelebA-64.

[374]  arXiv:2512.04908 [pdf, ps, other]
Title: Logic-Driven Cybersecurity: A Novel Framework for System Log Anomaly Detection using Answer Set Programming
Comments: Submitted to FLOPS 2026
Subjects: Cryptography and Security (cs.CR); Logic in Computer Science (cs.LO)

This study explores the application of Answer Set Programming (ASP) for detecting anomalies in system logs, addressing the challenges posed by evolving cyber threats. We propose a novel framework that leverages ASP's declarative nature and logical reasoning capabilities to encode complex security rules as logical predicates. Our ASP-based system was applied to a real-world Linux system log dataset, demonstrating its effectiveness in identifying various anomalies such as potential brute-force attacks, privilege escalations, frequent network connections from specific IPs, and various system-level issues. Key findings highlight ASP's strengths in handling structured log data, rule flexibility, and event correlation. The approach shows promise in providing explainable alerts from real-world data. This research contributes to computer forensics by demonstrating a logic-based paradigm for log analysis on a practical dataset, opening avenues for more nuanced and adaptive cyber intelligence systems.

[375]  arXiv:2512.04910 [pdf, ps, other]
Title: Declarative Synthesis and Multi-Objective Optimization of Stripboard Circuit Layouts Using Answer Set Programming
Authors: Fang Li
Comments: Accepted by the 43rd IEEE International Conference on Computer Design (ICCD 2025)
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)

This paper presents a novel approach to automated stripboard circuit layout design using Answer Set Programming (ASP). The work formulates the layout problem as both a synthesis and multi-objective optimization task that simultaneously generates viable layouts while minimizing board area and component strip crossing. By leveraging ASP's declarative nature, this work expresses complex geometric and electrical constraints in a natural and concise manner. The two-phase solving methodology first ensures feasibility before optimizing layout quality. Experimental results demonstrate that this approach generates compact, manufacturable layouts for a range of circuit complexities. This work represents a significant advancement in automated stripboard layout, offering a practical tool for electronics prototyping and education while showcasing the power of declarative programming for solving complex design automation problems.

[376]  arXiv:2512.04912 [pdf, ps, other]
Title: A result relating convex n-widths to covering numbers with some applications to neural networks
Subjects: Machine Learning (cs.LG)

In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or ``features'' is known to be hard. Typically, the worst-case error of the best basis set decays only as fast as $\Theta\(n^{-1/d}\)$, where $n$ is the number of basis functions and $d$ is the input dimension. However, there are many examples of high-dimensional pattern recognition problems (such as face recognition) where linear combinations of small sets of features do solve the problem well. Hence these function classes do not suffer from the ``curse of dimensionality'' associated with more general classes. It is natural then, to look for characterizations of high-dimensional function classes that nevertheless are approximated well by linear combinations of small sets of features. In this paper we give a general result relating the error of approximation of a function class to the covering number of its ``convex core''. For one-hidden-layer neural networks, covering numbers of the class of functions computed by a single hidden node upper bound the covering numbers of the convex core. Hence, using standard results we obtain upper bounds on the approximation rate of neural network classes.

[377]  arXiv:2512.04917 [pdf, ps, other]
Title: On Disturbance-Aware Minimum-Time Trajectory Planning: Evidence from Tests on a Dynamic Driving Simulator
Comments: 18 pages, 11 figures, 5 tables
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This work investigates how disturbance-aware, robustness-embedded reference trajectories translate into driving performance when executed by professional drivers in a dynamic simulator. Three planned reference trajectories are compared against a free-driving baseline (NOREF) to assess trade-offs between lap time (LT) and steering effort (SE): NOM, the nominal time-optimal trajectory; TLC, a track-limit-robust trajectory obtained by tightening margins to the track edges; and FLC, a friction-limit-robust trajectory obtained by tightening against axle and tire saturation. All trajectories share the same minimum lap-time objective with a small steering-smoothness regularizer and are evaluated by two professional drivers using a high-performance car on a virtual track. The trajectories derive from a disturbance-aware minimum-lap-time framework recently proposed by the authors, where worst-case disturbance growth is propagated over a finite horizon and used to tighten tire-friction and track-limit constraints, preserving performance while providing probabilistic safety margins. LT and SE are used as performance indicators, while RMS lateral deviation, speed error, and drift angle characterize driving style. Results show a Pareto-like LT-SE trade-off: NOM yields the shortest LT but highest SE; TLC minimizes SE at the cost of longer LT; FLC lies near the efficient frontier, substantially reducing SE relative to NOM with only a small LT increase. Removing trajectory guidance (NOREF) increases both LT and SE, confirming that reference trajectories improve pace and control efficiency. Overall, the findings highlight reference-based and disturbance-aware planning, especially FLC, as effective tools for training and for achieving fast yet stable trajectories.

[378]  arXiv:2512.04918 [pdf, ps, other]
Title: Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty
Subjects: Machine Learning (cs.LG)

Intraday surgical scheduling is a multi-objective decision problem under uncertainty-balancing elective throughput, urgent and emergency demand, delays, sequence-dependent setups, and overtime. We formulate the problem as a cooperative Markov game and propose a multi-agent reinforcement learning (MARL) framework in which each operating room (OR) is an agent trained with centralized training and decentralized execution. All agents share a policy trained via Proximal Policy Optimization (PPO), which maps rich system states to actions, while a within-epoch sequential assignment protocol constructs conflict-free joint schedules across ORs. A mixed-integer pre-schedule provides reference starting times for electives; we impose type-specific quadratic delay penalties relative to these references and a terminal overtime penalty, yielding a single reward that captures throughput, timeliness, and staff workload. In simulations reflecting a realistic hospital mix (six ORs, eight surgery types, random urgent and emergency arrivals), the learned policy outperforms six rule-based heuristics across seven metrics and three evaluation subsets, and, relative to an ex post MIP oracle, quantifies optimality gaps. Policy analytics reveal interpretable behavior-prioritizing emergencies, batching similar cases to reduce setups, and deferring lower-value electives. We also derive a suboptimality bound for the sequential decomposition under simplifying assumptions. We discuss limitations-including OR homogeneity and the omission of explicit staffing constraints-and outline extensions. Overall, the approach offers a practical, interpretable, and tunable data-driven complement to optimization for real-time OR scheduling.

[379]  arXiv:2512.04921 [pdf, ps, other]
Title: The AI Consumer Index (ACE)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform high-value consumer tasks. ACE contains a hidden heldout set of 400 test cases, split across four consumer activities: shopping, food, gaming, and DIY. We are also open sourcing 80 cases as a devset with a CC-BY license. For the ACE leaderboard we evaluated 10 frontier models (with websearch turned on) using a novel grading methodology that dynamically checks whether relevant parts of the response are grounded in the retrieved web sources. GPT 5 (Thinking = High) is the top-performing model, scoring 56.1%, followed by o3 Pro (Thinking = On) (55.2%) and GPT 5.1 (Thinking = High) (55.1%). Models differ across domains, and in Shopping the top model scores under 50%. For some requests (such as giving the correct price or providing working links), models are highly prone to hallucination. Overall, ACE shows a substantial gap between the performance of even the best models and consumers' AI needs.

[380]  arXiv:2512.04923 [pdf, ps, other]
Title: Algorithmic Thinking Theory
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large language models (LLMs) have proven to be highly effective for solving complex reasoning tasks. Surprisingly, their capabilities can often be improved by iterating on previously generated solutions. In this context, a reasoning plan for generating and combining a set of solutions can be thought of as an algorithm for reasoning using a probabilistic oracle.
We introduce a theoretical framework for analyzing such reasoning algorithms. This framework formalizes the principles underlying popular techniques for iterative improvement and answer aggregation, providing a foundation for designing a new generation of more powerful reasoning methods. Unlike approaches for understanding models that rely on architectural specifics, our model is grounded in experimental evidence. As a result, it offers a general perspective that may extend to a wide range of current and future reasoning oracles.

[381]  arXiv:2512.04926 [pdf, ps, other]
Title: Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Latent Diffusion Models (LDMs) inherently follow a coarse-to-fine generation process, where high-level semantic structure is generated slightly earlier than fine-grained texture. This indicates the preceding semantics potentially benefit texture generation by providing a semantic anchor. Recent advances have integrated semantic priors from pretrained visual encoders to further enhance LDMs, yet they still denoise semantic and VAE-encoded texture synchronously, neglecting such ordering. Observing these, we propose Semantic-First Diffusion (SFD), a latent diffusion paradigm that explicitly prioritizes semantic formation. SFD first constructs composite latents by combining a compact semantic latent, which is extracted from a pretrained visual encoder via a dedicated Semantic VAE, with the texture latent. The core of SFD is to denoise the semantic and texture latents asynchronously using separate noise schedules: semantics precede textures by a temporal offset, providing clearer high-level guidance for texture refinement and enabling natural coarse-to-fine generation. On ImageNet 256x256 with guidance, SFD achieves FID 1.06 (LightningDiT-XL) and FID 1.04 (1.0B LightningDiT-XXL), while achieving up to 100x faster convergence than the original DiT. SFD also improves existing methods like ReDi and VA-VAE, demonstrating the effectiveness of asynchronous, semantics-led modeling. Project page and code: https://yuemingpan.github.io/SFD.github.io/.

[382]  arXiv:2512.04927 [pdf, ps, other]
Title: Virtually Unrolling the Herculaneum Papyri by Diffeomorphic Spiral Fitting
Authors: Paul Henderson
Comments: Accepted at WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Herculaneum Papyri are a collection of rolled papyrus documents that were charred and buried by the famous eruption of Mount Vesuvius. They promise to contain a wealth of previously unseen Greek and Latin texts, but are extremely fragile and thus most cannot be unrolled physically. A solution to access these texts is virtual unrolling, where the papyrus surface is digitally traced out in a CT scan of the scroll, to create a flattened representation. This tracing is very laborious to do manually in gigavoxel-sized scans, so automated approaches are desirable. We present the first top-down method that automatically fits a surface model to a CT scan of a severely damaged scroll. We take a novel approach that globally fits an explicit parametric model of the deformed scroll to existing neural network predictions of where the rolled papyrus likely passes. Our method guarantees the resulting surface is a single continuous 2D sheet, even passing through regions where the surface is not detectable in the CT scan. We conduct comprehensive experiments on high-resolution CT scans of two scrolls, showing that our approach successfully unrolls large regions, and exceeds the performance of the only existing automated unrolling method suitable for this data.

[383]  arXiv:2512.04929 [pdf, ps, other]
Title: Weak convergence rates for spectral regularization via sampling inequalities
Subjects: Numerical Analysis (math.NA)

Convergence rates in spectral regularization methods quantify the approximation error in inverse problems as a function of the noise level or the number of sampling points. Classical strong convergence rate results typically rely on source conditions, which are essential for estimating the truncation error. However, in the framework of kernel approximation, the truncation error in the case of Tikhonov regularization can be characterized entirely through sampling inequalities, without invoking source conditions. In this paper, we first generalize sampling inequalities to spectral regularization, and then, by exploiting the connection between inverse problems and kernel approximation, we derive weak convergence rate bounds for inverse problems, independently of source conditions. These weak convergence rates are established and analyzed when the forward operator is compact and uniformly bounded, or the kernel operator is of trace class.

[384]  arXiv:2512.04938 [pdf, ps, other]
Title: Toward Continuous Neurocognitive Monitoring: Integrating Speech AI with Relational Graph Transformers for Rare Neurological Diseases
Subjects: Artificial Intelligence (cs.AI)

Patients with rare neurological diseases report cognitive symptoms -"brain fog"- invisible to traditional tests. We propose continuous neurocognitive monitoring via smartphone speech analysis integrated with Relational Graph Transformer (RELGT) architectures. Proof-of-concept in phenylketonuria (PKU) shows speech-derived "Proficiency in Verbal Discourse" correlates with blood phenylalanine (p = -0.50, p < 0.005) but not standard cognitive tests (all |r| < 0.35). RELGT could overcome information bottlenecks in heterogeneous medical data (speech, labs, assessments), enabling predictive alerts weeks before decompensation. Key challenges: multi-disease validation, clinical workflow integration, equitable multilingual deployment. Success would transform episodic neurology into continuous personalized monitoring for millions globally.

[385]  arXiv:2512.04939 [pdf, ps, other]
Title: LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D vision foundation models like Visual Geometry Grounded Transformer (VGGT) have advanced greatly in geometric perception. However, it is time-consuming and memory-intensive for long sequences, limiting application to large-scale scenes beyond hundreds of images. To address this, we propose LiteVGGT, achieving up to 10x speedup and substantial memory reduction, enabling efficient processing of 1000-image scenes. We derive two key insights for 3D reconstruction: (1) tokens from local image regions have inherent geometric correlations, leading to high similarity and computational redundancy; (2) token similarity across adjacent network layers remains stable, allowing for reusable merge decisions. Guided by these, we design a simple yet efficient strategy, dubbed geometry-aware cached token merging. We analyze each token's geometric importance, optimizing anchor token selection to better preserve key information for reconstruction. We also cache and reuse merge indices across layers, substantially reducing latency with minimal accuracy impact. This strategy retains VGGT's core performance, enabling efficient fine-tuning and FP8 quantization for further gains. Extensive experiments validate LiteVGGT's effectiveness, scalability, and robustness. Project page: https://garlicba.github.io/LiteVGGT/

[386]  arXiv:2512.04943 [pdf, ps, other]
Title: Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This study introduces a pioneering methodology for human action recognition by harnessing deep neural network techniques and adaptive fusion strategies across multiple modalities, including RGB, optical flows, audio, and depth information. Employing gating mechanisms for multimodal fusion, we aim to surpass limitations inherent in traditional unimodal recognition methods while exploring novel possibilities for diverse applications. Through an exhaustive investigation of gating mechanisms and adaptive weighting-based fusion architectures, our methodology enables the selective integration of relevant information from various modalities, thereby bolstering both accuracy and robustness in action recognition tasks. We meticulously examine various gated fusion strategies to pinpoint the most effective approach for multimodal action recognition, showcasing its superiority over conventional unimodal methods. Gating mechanisms facilitate the extraction of pivotal features, resulting in a more holistic representation of actions and substantial enhancements in recognition performance. Our evaluations across human action recognition, violence action detection, and multiple self-supervised learning tasks on benchmark datasets demonstrate promising advancements in accuracy. The significance of this research lies in its potential to revolutionize action recognition systems across diverse fields. The fusion of multimodal information promises sophisticated applications in surveillance and human-computer interaction, especially in contexts related to active assisted living.

[387]  arXiv:2512.04947 [pdf, ps, other]
Title: Crack detection by holomorphic neural networks and transfer-learning-enhanced genetic optimization
Subjects: Computational Engineering, Finance, and Science (cs.CE)

A new strategy for detecting cracks in 2D solids based on strain data is introduced. Crack detection is formulated as an inverse problem and solved using genetic optimization. The novelty lies in the evaluation of the model response at each generation. Specifically, the solution to the corresponding plane elasticity problem is expressed via holomorphic potentials, which are determined by training two holomorphic neural networks. As the potentials satisfy equilibrium and traction-free conditions along the crack faces a priori, the training proceeds quickly based solely on boundary information. Training efficiency is further improved by splitting the genetic search into long-range and short-range stages, enabling the use of transfer learning in the latter. The new strategy is tested on three benchmark problems, showing that an optimal number of training epochs exists that provides the best overall performance. A comparison is also made with a popular crack detection approach that uses XFEM to compute the model response. Under the assumption of identical stress-field representation accuracy, the proposed method is found to be between 7 and 23 times faster than the XFEM-based approach. While the strategy is presented here for the simplified case of a single internal crack, generalization is feasible. Overall, the present findings demonstrate that combining genetic optimization with holomorphic neural networks and transfer learning offers a promising avenue for developing crack detection strategies with higher efficiency than those currently available.

[388]  arXiv:2512.04949 [pdf, ps, other]
Title: CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent
Comments: 10 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conventional group-level policy optimization algorithm becomes suboptimal because of its underlying assumption that each action holds equal contribution, which deviates significantly from reality. Our analysis reveals that only a small fraction of actions are critical in determining the final outcome. Building on this insight, we propose CARL, a critical-action-focused reinforcement learning algorithm tailored for multi-step agents. CARL achieves focused training through providing action-level optimization signals for high-criticality actions while excluding low-criticality actions from model update. Extensive experiments demonstrate that CARL achieves both stronger performance and higher efficiency during training and inference across diverse evaluation settings.

[389]  arXiv:2512.04950 [pdf, ps, other]
Title: Opacity problems in multi-energy timed automata
Comments: This is the author version (extended with all proofs) of the manuscript of the same name published in the proceedings of the 41st ACM/SIGAPP Symposium on Applied Computing (SAC 2026)
Subjects: Cryptography and Security (cs.CR)

Cyber-physical systems can be subject to information leakage; in the presence of continuous variables such as time and energy, these leaks can be subtle to detect. We study here the verification of opacity problems over systems with observation over both timing and energy information. We introduce guarded multi-energy timed automata as an extension of timed automata with multiple energy variables and guards over such variables. Despite undecidability of this general formalism, we establish positive results over a number of subclasses, notably when the attacker observes the final energy and/or the execution time, but also when they have access to the value of the energy variables every time unit.

[390]  arXiv:2512.04951 [pdf, ps, other]
Title: MAX BISECTION might be harder to approximate than MAX CUT
Comments: 35 pages, 2 figures
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)

The MAX BISECTION problem seeks a maximum-size cut that evenly divides the vertices of a given undirected graph. An open problem raised by Austrin, Benabbas, and Georgiou is whether MAX BISECTION can be approximated as well as MAX CUT, i.e., to within ${\alpha_{GW}}\approx 0.8785672\ldots$, which is the approximation ratio achieved by the celebrated Goemans-Williamson algorithm for MAX CUT, which is best possible assuming the Unique Games Conjecture (UGC). They conjectured that the answer is yes.
The current paradigm for obtaining approximation algorithms for MAX BISECTION, due to Raghavendra and Tan and Austrin, Benabbas, and Georgiou, follows a two-phase approach. First, a large number of rounds of the Sum-of-Squares (SoS) hierarchy is used to find a solution to the ``Basic SDP'' relaxation of MAX CUT which is $\varepsilon$-uncorrelated, for an arbitrarily small $\varepsilon > 0$. Second, standard SDP rounding techniques (such as ${\cal THRESH}$) are used to round this $\varepsilon$-uncorrelated solution, producing with high probability a cut that is almost balanced, i.e., a cut that has at most $\frac12+\varepsilon$ fraction of the vertices on each side. This cut is then converted into an exact bisection of the graph with only a small loss.
In this paper, we show that this two-stage paradigm cannot be used to obtain an $\alpha_{GW}$-approximation algorithm for MAX BISECTION if one relies only on the $\varepsilon$-uncorrelatedness property of the solution produced by the first phase. More precisely, for any $\varepsilon > 0$, we construct an explicit instance of MAX BISECTION for which the ratio between the value of the optimal integral solution and the value of some $\varepsilon$-uncorrelated solution of the Basic SDP relaxation is less than $0.87853 < {\alpha_{GW}}$. Our instances are also integrality gaps for the Basic SDP relaxation of MAX BISECTION.

[391]  arXiv:2512.04952 [pdf, ps, other]
Title: FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Autoregressive vision-language-action (VLA) models have recently demonstrated strong capabilities in robotic manipulation. However, their core process of action tokenization often involves a trade-off between reconstruction fidelity and inference efficiency. We introduce FASTer, a unified framework for efficient and generalizable robot learning that integrates a learnable tokenizer with an autoregressive policy built upon it. FASTerVQ encodes action chunks as single-channel images, capturing global spatio-temporal dependencies while maintaining a high compression ratio. FASTerVLA builds on this tokenizer with block-wise autoregressive decoding and a lightweight action expert, achieving both faster inference and higher task performance. Extensive experiments across simulated and real-world benchmarks show that FASTerVQ delivers superior reconstruction quality, high token utilization, and strong cross-task and cross-embodiment generalization, while FASTerVLA further improves overall capability, surpassing previous state-of-the-art VLA models in both inference speed and task performance.

[392]  arXiv:2512.04954 [pdf, ps, other]
Title: Amortized Inference of Multi-Modal Posteriors using Likelihood-Weighted Normalizing Flows
Authors: Rajneil Baruah
Comments: 14 pages, 8 figures
Subjects: Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an)

We present a novel technique for amortized posterior estimation using Normalizing Flows trained with likelihood-weighted importance sampling. This approach allows for the efficient inference of theoretical parameters in high-dimensional inverse problems without the need for posterior training samples. We implement the method on multi-modal benchmark tasks in 2D and 3D to check for the efficacy. A critical observation of our study is the impact of the topology of the base distributions on the modelled posteriors. We find that standard unimodal base distributions fail to capture disconnected support, resulting in spurious probability bridges between modes. We demonstrate that initializing the flow with a Gaussian Mixture Model that matches the cardinality of the target modes significantly improves reconstruction fidelity, as measured by some distance and divergence metrics.

[393]  arXiv:2512.04955 [pdf, ps, other]
Title: Bounds on Maximal Leakage over Bayesian Networks
Comments: 9 pages, double column format, 2 figures
Journal-ref: IEEE International Symposium on Information Theory (ISIT 2025)
Subjects: Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST)

Maximal leakage quantifies the leakage of information from data $X \in \mathcal{X}$ due to an observation $Y$. While fundamental properties of maximal leakage, such as data processing, sub-additivity, and its connection to mutual information, are well-established, its behavior over Bayesian networks is not well-understood and existing bounds are primarily limited to binary $\mathcal{X}$. In this paper, we investigate the behavior of maximal leakage over Bayesian networks with finite alphabets. Our bounds on maximal leakage are established by utilizing coupling-based characterizations which exist for channels satisfying certain conditions. Furthermore, we provide more general conditions under which such coupling characterizations hold for $|\mathcal{X}| = 4$. In the course of our analysis, we also present a new simultaneous coupling result on maximal leakage exponents. Finally, we illustrate the effectiveness of the proposed bounds with some examples.

[394]  arXiv:2512.04957 [pdf, ps, other]
Title: LLMs Know More Than Words: A Genre Study with Syntax, Metaphor & Phonetics
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) demonstrate remarkable potential across diverse language related tasks, yet whether they capture deeper linguistic properties, such as syntactic structure, phonetic cues, and metrical patterns from raw text remains unclear. To analysis whether LLMs can learn these features effectively and apply them to important nature language related tasks, we introduce a novel multilingual genre classification dataset derived from Project Gutenberg, a large-scale digital library offering free access to thousands of public domain literary works, comprising thousands of sentences per binary task (poetry vs. novel;drama vs. poetry;drama vs. novel) in six languages (English, French, German, Italian, Spanish, and Portuguese). We augment each with three explicit linguistic feature sets (syntactic tree structures, metaphor counts, and phonetic metrics) to evaluate their impact on classification performance. Experiments demonstrate that although LLM classifiers can learn latent linguistic structures either from raw text or from explicitly provided features, different features contribute unevenly across tasks, which underscores the importance of incorporating more complex linguistic signals during model training.

[395]  arXiv:2512.04958 [pdf, ps, other]
Title: Realizable Abstractions: Near-Optimal Hierarchical Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The main focus of Hierarchical Reinforcement Learning (HRL) is studying how large Markov Decision Processes (MDPs) can be more efficiently solved when addressed in a modular way, by combining partial solutions computed for smaller subtasks. Despite their very intuitive role for learning, most notions of MDP abstractions proposed in the HRL literature have limited expressive power or do not possess formal efficiency guarantees. This work addresses these fundamental issues by defining Realizable Abstractions, a new relation between generic low-level MDPs and their associated high-level decision processes. The notion we propose avoids non-Markovianity issues and has desirable near-optimality guarantees. Indeed, we show that any abstract policy for Realizable Abstractions can be translated into near-optimal policies for the low-level MDP, through a suitable composition of options. As demonstrated in the paper, these options can be expressed as solutions of specific constrained MDPs. Based on these findings, we propose RARL, a new HRL algorithm that returns compositional and near-optimal low-level policies, taking advantage of the Realizable Abstraction given in the input. We show that RARL is Probably Approximately Correct, it converges in a polynomial number of samples, and it is robust to inaccuracies in the abstraction.

[396]  arXiv:2512.04960 [pdf, ps, other]
Title: Hybrid-Diffusion Models: Combining Open-loop Routines with Visuomotor Diffusion Policies
Subjects: Robotics (cs.RO)

Despite the fact that visuomotor-based policies obtained via imitation learning demonstrate good performances in complex manipulation tasks, they usually struggle to achieve the same accuracy and speed as traditional control based methods. In this work, we introduce Hybrid-Diffusion models that combine open-loop routines with visuomotor diffusion policies. We develop Teleoperation Augmentation Primitives (TAPs) that allow the operator to perform predefined routines, such as locking specific axes, moving to perching waypoints, or triggering task-specific routines seamlessly during demonstrations. Our Hybrid-Diffusion method learns to trigger such TAPs during inference. We validate the method on challenging real-world tasks: Vial Aspiration, Open-Container Liquid Transfer, and container unscrewing. All experimental videos are available on the project's website: https://hybriddiffusion.github.io/

[397]  arXiv:2512.04963 [pdf, ps, other]
Title: GeoPE:A Unified Geometric Positional Embedding for Structured Tensors
Authors: Yupu Yao, Bowen Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Standard Vision Transformers flatten 2D images into 1D sequences, disrupting the natural spatial topology. While Rotary Positional Embedding (RoPE) excels in 1D, it inherits this limitation, often treating spatially distant patches (e.g., at row edges) as sequence neighbors. Existing 2D approaches typically treat spatial axes independently, failing to decouple this false sequential proximity from true spatial distance. To restore the 2D spatial manifold, we introduce Geometric Positional Embedding (GeoPE), a framework that extends rotations to 3D Euclidean space using quaternions. To overcome non-commutativity and ensure symmetry, GeoPE constructs a unified rotational operator by computing the geometric mean in the Lie algebra. This creates a geometrically coupled encoding that effectively separates spatial dimensions. Extensive experiments on image classification, object detection, and 3D semantic segmentation demonstrate that GeoPE consistently outperforms existing 2D RoPE variants and significantly enhances shape bias, confirming its ability to capture true geometric structure.

[398]  arXiv:2512.04966 [pdf, ps, other]
Title: Environment-Aware Channel Inference via Cross-Modal Flow: From Multimodal Sensing to Wireless Channels
Comments: 13 pages, 13 figures, 40 references, submitted to IEEE for possible publication
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

Accurate channel state information (CSI) underpins reliable and efficient wireless communication. However, acquiring CSI via pilot estimation incurs substantial overhead, especially in massive multiple-input multiple-output (MIMO) systems operating in high-Doppler environments. By leveraging the growing availability of environmental sensing data, this treatise investigates pilot-free channel inference that estimates complete CSI directly from multimodal observations, including camera images, LiDAR point clouds, and GPS coordinates. In contrast to prior studies that rely on predefined channel models, we develop a data-driven framework that formulates the sensing-to-channel mapping as a cross-modal flow matching problem. The framework fuses multimodal features into a latent distribution within the channel domain, and learns a velocity field that continuously transforms the latent distribution toward the channel distribution. To make this formulation tractable and efficient, we reformulate the problem as an equivalent conditional flow matching objective and incorporate a modality alignment loss, while adopting low-latency inference mechanisms to enable real-time CSI estimation. In experiments, we build a procedural data generator based on Sionna and Blender to support realistic modeling of sensing scenes and wireless propagation. System-level evaluations demonstrate significant improvements over pilot- and sensing-based benchmarks in both channel estimation accuracy and spectral efficiency for the downstream beamforming task.

[399]  arXiv:2512.04967 [pdf, ps, other]
Title: Balanced Few-Shot Episodic Learning for Accurate Retinal Disease Diagnosis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Automated retinal disease diagnosis is vital given the rising prevalence of conditions such as diabetic retinopathy and macular degeneration. Conventional deep learning approaches require large annotated datasets, which are costly and often imbalanced across disease categories, limiting their reliability in practice. Few-shot learning (FSL) addresses this challenge by enabling models to generalize from only a few labeled samples per class. In this study,we propose a balanced few-shot episodic learning framework tailored to the Retinal Fundus Multi-Disease Image Dataset (RFMiD). Focusing on the ten most represented classes, which still show substantial imbalance between majority diseases (e.g., Diabetic Retinopathy, Macular Hole) and minority ones (e.g., Optic Disc Edema, Branch Retinal Vein Occlusion), our method integrates three key components: (i) balanced episodic sampling, ensuring equal participation of all classes in each 5-way 5-shot episode; (ii) targeted augmentation, including Contrast Limited Adaptive Histogram Equalization (CLAHE) and color/geometry transformations, to improve minority-class di- versity; and (iii) a ResNet-50 encoder pretrained on ImageNet, selected for its superior ability to capture fine-grained retinal features. Prototypes are computed in the embedding space and classification is performed with cosine similarity for improved stability. Trained on 100 episodes and evaluated on 1,000 test episodes, our framework achieves substantial accuracy gains and reduces bias toward majority classes, with notable improvements for underrepresented diseases. These results demonstrate that dataset-aware few-shot pipelines, combined with balanced sampling and CLAHE-enhanced preprocessing, can deliver more robust and clinically fair retinal disease diagnosis under data-constrained conditions.

[400]  arXiv:2512.04969 [pdf, ps, other]
Title: Rethinking the Use of Vision Transformers for AI-Generated Image Detection
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Rich feature representations derived from CLIP-ViT have been widely utilized in AI-generated image detection. While most existing methods primarily leverage features from the final layer, we systematically analyze the contributions of layer-wise features to this task. Our study reveals that earlier layers provide more localized and generalizable features, often surpassing the performance of final-layer features in detection tasks. Moreover, we find that different layers capture distinct aspects of the data, each contributing uniquely to AI-generated image detection. Motivated by these findings, we introduce a novel adaptive method, termed MoLD, which dynamically integrates features from multiple ViT layers using a gating-based mechanism. Extensive experiments on both GAN- and diffusion-generated images demonstrate that MoLD significantly improves detection performance, enhances generalization across diverse generative models, and exhibits robustness in real-world scenarios. Finally, we illustrate the scalability and versatility of our approach by successfully applying it to other pre-trained ViTs, such as DINOv2.

[401]  arXiv:2512.04970 [pdf, ps, other]
Title: Stable Single-Pixel Contrastive Learning for Semantic and Geometric Tasks
Comments: UniReps Workshop 2025, 12 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We pilot a family of stable contrastive losses for learning pixel-level representations that jointly capture semantic and geometric information. Our approach maps each pixel of an image to an overcomplete descriptor that is both view-invariant and semantically meaningful. It enables precise point-correspondence across images without requiring momentum-based teacher-student training. Two experiments in synthetic 2D and 3D environments demonstrate the properties of our loss and the resulting overcomplete representations.

[402]  arXiv:2512.04971 [pdf, ps, other]
Title: Exploring YouTube's Political Communication Networks during the 2024 French Elections
Comments: 12 pages, 8 figures, to be published in ICWSM'2026
Subjects: Social and Information Networks (cs.SI)

In 2024, France was shaken by the far-right National Rally's victory in the European elections. In response to this unprecedented result, French President Emmanuel Macron dissolved the National Assembly, triggering legislative elections just two weeks later. A whirlwind campaign followed, partly on social media, as is now the norm, and concluded with the victory of a left-wing coalition. This article examines the YouTube activity of two key actors during this period, news media and politicians, and the commenting behavior they generated. We built a dataset of 35 news media channels, 28 politicians and parties channels, 43.5k videos posted from three months before the European elections to one week after the second round of the legislative elections, and 7.4M associated comments. We examined upload activity and engagement across political orientations and used network analysis methods to uncover the structure of their commenting communities. We also identified politicians' appearances on news media channels and assessed their impact on commenting user bases. Our findings show that, among politicians and parties channels, far-right and left-wing ones were significantly more active and received substantially higher engagement (views, likes, and comments) than other groups, with denser and more clustered commenting communities. About 7% of commenters commented across political orientations and were much more active than in-group commenters. News media channels tended to favor politically aligned guests, while centrist politicians were over-represented. Finally, politicians' presence in the videos of a specific news media channel increased the share of commenters who were active on this channel and political channels, regardless of their orientation.

[403]  arXiv:2512.04973 [pdf, ps, other]
Title: Preliminary Analysis and Simulation of a Compact Variable Stiffness Wrist
Comments: This article has been accepted for publication in Springer Proceedings in Advanced Robotics, vol 31. Springer, Cham. This is the author's version, which has not been fully edited, and the content may change prior to final publication. Citation information: DOI this https URL
Journal-ref: In International Symposium on Advances in Robot Kinematics, 2024, pp. 69-76. Cham: Springer Nature Switzerland
Subjects: Robotics (cs.RO)

Variable Stiffness Actuators prove invaluable for robotics applications in unstructured environments, fostering safe interactions and enhancing task adaptability. Nevertheless, their mechanical design inevitably results in larger and heavier structures compared to classical rigid actuators. This paper introduces a novel 3 Degrees of Freedom (DoFs) parallel wrist that achieves variable stiffness through redundant elastic actuation. Leveraging its parallel architecture, the device employs only four motors, rendering it compact and lightweight. This characteristic makes it particularly well-suited for applications in prosthetics or humanoid robotics. The manuscript delves into the theoretical model of the device and proposes a sophisticated control strategy for independent regulation of joint position and stiffness. Furthermore, it validates the proposed controller through simulation, utilizing a comprehensive analysis of the system dynamics. The reported results affirm the ability of the device to achieve high accuracy and disturbance rejection in rigid configurations while minimizing interaction forces with its compliant behavior.

[404]  arXiv:2512.04974 [pdf, ps, other]
Title: Efficient Generative Transformer Operators For Million-Point PDEs
Subjects: Machine Learning (cs.LG)

We introduce ECHO, a transformer-operator framework for generating million-point PDE trajectories. While existing neural operators (NOs) have shown promise for solving partial differential equations, they remain limited in practice due to poor scalability on dense grids, error accumulation during dynamic unrolling, and task-specific design. ECHO addresses these challenges through three key innovations. (i) It employs a hierarchical convolutional encode-decode architecture that achieves a 100 $\times$ spatio-temporal compression while preserving fidelity on mesh points. (ii) It incorporates a training and adaptation strategy that enables high-resolution PDE solution generation from sparse input grids. (iii) It adopts a generative modeling paradigm that learns complete trajectory segments, mitigating long-horizon error drift. The training strategy decouples representation learning from downstream task supervision, allowing the model to tackle multiple tasks such as trajectory generation, forward and inverse problems, and interpolation. The generative model further supports both conditional and unconditional generation. We demonstrate state-of-the-art performance on million-point simulations across diverse PDE systems featuring complex geometries, high-frequency dynamics, and long-term horizons.

[405]  arXiv:2512.04981 [pdf, ps, other]
Title: Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Large vision-language model (LVLM) based text-to-image (T2I) systems have become the dominant paradigm in image generation, yet whether they amplify social biases remains insufficiently understood. In this paper, we show that LVLM-based models produce markedly more socially biased images than non-LVLM-based models. We introduce a 1,024 prompt benchmark spanning four levels of linguistic complexity and evaluate demographic bias across multiple attributes in a systematic manner. Our analysis identifies system prompts, the predefined instructions guiding LVLMs, as a primary driver of biased behavior. Through decoded intermediate representations, token-probability diagnostics, and embedding-association analyses, we reveal how system prompts encode demographic priors that propagate into image synthesis. To this end, we propose FairPro, a training-free meta-prompting framework that enables LVLMs to self-audit and construct fairness-aware system prompts at test time. Experiments on two LVLM-based T2I models, SANA and Qwen-Image, show that FairPro substantially reduces demographic bias while preserving text-image alignment. We believe our findings provide deeper insight into the central role of system prompts in bias propagation and offer a practical, deployable approach for building more socially responsible T2I systems.

[406]  arXiv:2512.04983 [pdf, ps, other]
Title: A tangential low-rank ADI method for solving indefinite Lyapunov equations
Comments: 33 pages, 6 figures
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)

Continuous-time algebraic Lyapunov equations have become an essential tool in various applications. In the case of large-scale sparse coefficient matrices and indefinite constant terms, indefinite low-rank factorizations have successfully been used to allow methods like the alternating direction implicit (ADI) iteration to efficiently compute accurate approximations to the solution of the Lyapunov equation. However, classical block-type approaches quickly increase in computational costs when the rank of the constant term grows. In this paper, we propose a novel tangential reformulation of the ADI iteration that allows for the efficient construction of low-rank approximations to the solution of Lyapunov equations with indefinite right-hand sides even in the case of constant terms with higher ranks. We provide adaptive methods for the selection of the corresponding ADI parameters, namely shifts and tangential directions, which allow for the automatic application of the method to any relevant problem setting. The effectiveness of the developed algorithms is illustrated by several numerical examples.

[407]  arXiv:2512.04984 [pdf, ps, other]
Title: Federated Learning for Terahertz Wireless Communication
Comments: 10 pages, 4 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The convergence of Terahertz (THz) communications and Federated Learning (FL) promises ultra-fast distributed learning, yet the impact of realistic wideband impairments on optimization dynamics remains theoretically uncharacterized. This paper bridges this gap by developing a multicarrier stochastic framework that explicitly couples local gradient updates with frequency-selective THz effects, including beam squint, molecular absorption, and jitter. Our analysis uncovers a critical diversity trap: under standard unbiased aggregation, the convergence error floor is driven by the harmonic mean of subcarrier SNRs. Consequently, a single spectral hole caused by severe beam squint can render the entire bandwidth useless for reliable model updates. We further identify a fundamental bandwidth limit, revealing that expanding the spectrum beyond a critical point degrades convergence due to the integration of thermal noise and gain collapse at band edges. Finally, we demonstrate that an SNR-weighted aggregation strategy is necessary to suppress the variance singularity at these spectral holes, effectively recovering convergence in high-squint regimes where standard averaging fails. Numerical results validate the expected impact of the discussed physical layer parameters' on performance of THz-FL systems.

[408]  arXiv:2512.04987 [pdf, ps, other]
Title: Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
Subjects: Computation and Language (cs.CL)

The evolution of Large Language Models (LLMs) from passive responders to autonomous agents necessitates a fundamental shift in learning paradigms -- from static imitation to incentive-driven decision making. However, this transition is significantly impeded by the lack of scalable infrastructure capable of constructing high-quality interaction signals for effective policy learning. To address this, we introduce a comprehensive method designed to systematically scale the diversity and complexity of interactive environments. Our method realizes this scaling by addressing three orthogonal dimensions: (1) Complexity: NexAU, a flexible agent framework that supports building complex agent hierarchies via simple configurations; (2) Diversity: NexA4A automatically generates diverse agent hierarchies from natural language to cover infinite domains; and (3) Fidelity: NexGAP bridges the simulation-reality gap by integrating dynamic real-world environment for grounded trajectories synthesis. We train Nex-N1 upon the diverse and complex interactive environments established by our infrastructure. Empirical results on benchmarks such as SWE-bench and tau2 demonstrate that Nex-N1 consistently outperforms SOTA open-source models and achieves competitive performance against frontier proprietary models on complex agentic tasks. We open-source the Nex ecosystem and model weights to facilitate further research.

[409]  arXiv:2512.04988 [pdf, ps, other]
Title: Strategic Self-Improvement for Competitive Agents in AI Labour Markets
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

As artificial intelligence (AI) agents are deployed across economic domains, understanding their strategic behavior and market-level impact becomes critical. This paper puts forward a groundbreaking new framework that is the first to capture the real-world economic forces that shape agentic labor markets: adverse selection, moral hazard, and reputation dynamics. Our framework encapsulates three core capabilities that successful LLM-agents will need: \textbf{metacognition} (accurate self-assessment of skills), \textbf{competitive awareness} (modeling rivals and market dynamics), and \textbf{long-horizon strategic planning}. We illustrate our framework through a tractable simulated gig economy where agentic Large Language Models (LLMs) compete for jobs, develop skills, and adapt their strategies under competitive pressure. Our simulations illustrate how LLM agents explicitly prompted with reasoning capabilities learn to strategically self-improve and demonstrate superior adaptability to changing market conditions. At the market level, our simulations reproduce classic macroeconomic phenomena found in human labor markets, while controlled experiments reveal potential AI-driven economic trends, such as rapid monopolization and systemic price deflation. This work provides a foundation to further explore the economic properties of AI-driven labour markets, and a conceptual framework to study the strategic reasoning capabilities in agents competing in the emerging economy.

[410]  arXiv:2512.04991 [pdf, ps, other]
Title: Parametric disjunctive timed networks
Comments: This is the author version of the manuscript of the same name published in the proceedings of the 34th EACSL Annual Conference on Computer Science Logic (CSL 2026)
Subjects: Logic in Computer Science (cs.LO)

We consider distributed systems with an arbitrary number of processes, modelled by timed automata that communicate through location guards: a process can take a guarded transition if at least one other process is in a given location. In this work, we introduce parametric disjunctive timed networks, where each timed automaton may contain timing parameters, i.e. unknown constants. We investigate two problems: deciding the emptiness of the set of parameter valuations for which
1) a given location is reachable for at least one process (local property), and
2) a global state is reachable where all processes are in a given location (global property).
Our main positive result is that the first problem is decidable for networks of processes with a single clock and without invariants; this result holds for arbitrarily many timing parameters -- a setting with few known decidability results. However, it becomes undecidable when invariants are allowed, or when considering global properties, even for systems with a single parameter. This highlights the significant expressive power of invariants in these networks. Additionally, we exhibit further decidable subclasses by restraining the syntax of guards and invariants.

[411]  arXiv:2512.04992 [pdf, ps, other]
Title: Evolutionary Architecture Search through Grammar-Based Sequence Alignment
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Neural architecture search (NAS) in expressive search spaces is a computationally hard problem, but it also holds the potential to automatically discover completely novel and performant architectures. To achieve this we need effective search algorithms that can identify powerful components and reuse them in new candidate architectures. In this paper, we introduce two adapted variants of the Smith-Waterman algorithm for local sequence alignment and use them to compute the edit distance in a grammar-based evolutionary architecture search. These algorithms enable us to efficiently calculate a distance metric for neural architectures and to generate a set of hybrid offspring from two parent models. This facilitates the deployment of crossover-based search heuristics, allows us to perform a thorough analysis on the architectural loss landscape, and track population diversity during search. We highlight how our method vastly improves computational complexity over previous work and enables us to efficiently compute shortest paths between architectures. When instantiating the crossover in evolutionary searches, we achieve competitive results, outperforming competing methods. Future work can build upon this new tool, discovering novel components that can be used more broadly across neural architecture design, and broadening its applications beyond NAS.

[412]  arXiv:2512.04996 [pdf, ps, other]
Title: A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUs
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a memory-efficient optimization strategy for the high-performance point cloud registration algorithm VANICP, enabling lightweight execution on embedded GPUs with constrained hardware resources. VANICP is a recently published acceleration framework that significantly improves the computational efficiency of point-cloud-based applications. By transforming the global nearest neighbor search into a localized process through a dilation-based information propagation mechanism, VANICP greatly reduces the computational complexity of the NNS. However, its original implementation demands a considerable amount of memory, which restricts its deployment in resource-constrained environments such as embedded systems. To address this issue, we propose a GPU-oriented dynamic memory assignment strategy that optimizes the memory usage of the dilation operation. Furthermore, based on this strategy, we construct an enhanced version of the VANICP framework that achieves over 97% reduction in memory consumption while preserving the original performance. Source code is published on: https://github.com/changqiong/VANICP4Em.git.

[413]  arXiv:2512.04998 [pdf, ps, other]
Title: Introducing V-Soft Pro: a Modular Platform for a Transhumeral Prosthesis with Controllable Stiffness
Comments: This article has been accepted for publication in Proceedings of the International Conference On Rehabilitation Robotics (ICORR), 2025. This is the author's version, which has not been fully edited, and content may change prior to final publication. Citation information: DOI 10.1109/ICORR66766.2025.11062964
Journal-ref: In 2025 International Conference On Rehabilitation Robotics (ICORR) (pp. 154-159). IEEE
Subjects: Robotics (cs.RO)

Current upper limb prostheses aim to enhance user independence in daily activities by incorporating basic motor functions. However, they fall short of replicating the natural movement and interaction capabilities of the human arm. In contrast, human limbs leverage intrinsic compliance and actively modulate joint stiffness, enabling adaptive responses to varying tasks, impact absorption, and efficient energy transfer during dynamic actions. Inspired by this adaptability, we developed a transhumeral prosthesis with Variable Stiffness Actuators (VSAs) to replicate the controllable compliance found in biological joints. The proposed prosthesis features a modular design, allowing customization for different residual limb shapes and accommodating a range of independent control signals derived from users' biological cues. Integrated elastic elements passively support more natural movements, facilitate safe interactions with the environment, and adapt to diverse task requirements. This paper presents a comprehensive overview of the platform and its functionalities, highlighting its potential applications in the field of prosthetics.

[414]  arXiv:2512.05000 [pdf, ps, other]
Title: Reflection Removal through Efficient Adaptation of Diffusion Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We introduce a diffusion-transformer (DiT) framework for single-image reflection removal that leverages the generalization strengths of foundation diffusion models in the restoration setting. Rather than relying on task-specific architectures, we repurpose a pre-trained DiT-based foundation model by conditioning it on reflection-contaminated inputs and guiding it toward clean transmission layers. We systematically analyze existing reflection removal data sources for diversity, scalability, and photorealism. To address the shortage of suitable data, we construct a physically based rendering (PBR) pipeline in Blender, built around the Principled BSDF, to synthesize realistic glass materials and reflection effects. Efficient LoRA-based adaptation of the foundation model, combined with the proposed synthetic data, achieves state-of-the-art performance on in-domain and zero-shot benchmarks. These results demonstrate that pretrained diffusion transformers, when paired with physically grounded data synthesis and efficient adaptation, offer a scalable and high-fidelity solution for reflection removal. Project page: https://hf.co/spaces/huawei-bayerlab/windowseat-reflection-removal-web

[415]  arXiv:2512.05006 [pdf, ps, other]
Title: Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent Objects
Comments: conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The perception of transparent objects is one of the well-known challenges in computer vision. Conventional depth sensors have difficulty in sensing the depth of transparent objects due to refraction and reflection of light. Previous research has typically train a neural network to complete the depth acquired by the sensor, and this method can quickly and accurately acquire accurate depth maps of transparent objects. However, previous training relies on a large amount of annotation data for supervision, and the labeling of depth maps is costly. To tackle this challenge, we propose a new self-supervised method for training depth completion networks. Our method simulates the depth deficits of transparent objects within non-transparent regions and utilizes the original depth map as ground truth for supervision. Experiments demonstrate that our method achieves performance comparable to supervised approach, and pre-training with our method can improve the model performance when the training samples are small.

[416]  arXiv:2512.05008 [pdf, ps, other]
Title: Contact-Implicit Modeling and Simulation of a Snake Robot on Compliant and Granular Terrain
Authors: Haroon Hublikar
Subjects: Robotics (cs.RO)

This thesis presents a unified modeling and simulation framework for analyzing sidewinding and tumbling locomotion of the COBRA snake robot across rigid, compliant, and granular terrains. A contact-implicit formulation is used to model distributed frictional interactions during sidewinding, and validated through MATLAB Simscape simulations and physical experiments on rigid ground and loose sand. To capture terrain deformation effects, Project Chrono's Soil Contact Model (SCM) is integrated with the articulated multibody dynamics, enabling prediction of slip, sinkage, and load redistribution that reduce stride efficiency on deformable substrates. For high-energy rolling locomotion on steep slopes, the Chrono DEM Engine is used to simulate particle-resolved granular interactions, revealing soil failure, intermittent lift-off, and energy dissipation mechanisms not captured by rigid models. Together, these methods span real-time control-oriented simulation and high-fidelity granular physics. Results demonstrate that rigid-ground models provide accurate short-horizon motion prediction, while continuum and particle-based terrain modeling becomes necessary for reliable mobility analysis in soft and highly dynamic environments. This work establishes a hierarchical simulation pipeline that advances robust, terrain-aware locomotion for robots operating in challenging unstructured settings.

[417]  arXiv:2512.05012 [pdf, ps, other]
Title: Factuality and Transparency Are All RAG Needs! Self-Explaining Contrastive Evidence Re-ranking
Comments: This work was presented as a poster at the Applied Social Media Lab during the 2025 Synthesizer & Open Showcase at the Berkman Klein Center for Internet & Society at Harvard University
Subjects: Computation and Language (cs.CL)

This extended abstract introduces Self-Explaining Contrastive Evidence Re-Ranking (CER), a novel method that restructures retrieval around factual evidence by fine-tuning embeddings with contrastive learning and generating token-level attribution rationales for each retrieved passage. Hard negatives are automatically selected using a subjectivity-based criterion, forcing the model to pull factual rationales closer while pushing subjective or misleading explanations apart. As a result, the method creates an embedding space explicitly aligned with evidential reasoning. We evaluated our method on clinical trial reports, and initial experimental results show that CER improves retrieval accuracy, mitigates the potential for hallucinations in RAG systems, and provides transparent, evidence-based retrieval that enhances reliability, especially in safety-critical domains.

[418]  arXiv:2512.05013 [pdf, ps, other]
Title: Detecting Perspective Shifts in Multi-agent Systems
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Methodology (stat.ME)

Generative models augmented with external tools and update mechanisms (or \textit{agents}) have demonstrated capabilities beyond intelligent prompting of base models. As agent use proliferates, dynamic multi-agent systems have naturally emerged. Recent work has investigated the theoretical and empirical properties of low-dimensional representations of agents based on query responses at a single time point. This paper introduces the Temporal Data Kernel Perspective Space (TDKPS), which jointly embeds agents across time, and proposes several novel hypothesis tests for detecting behavioral change at the agent- and group-level in black-box multi-agent systems. We characterize the empirical properties of our proposed tests, including their sensitivity to key hyperparameters, in simulations motivated by a multi-agent system of evolving digital personas. Finally, we demonstrate via natural experiment that our proposed tests detect changes that correlate sensitively, specifically, and significantly with a real exogenous event. As far as we are aware, TDKPS is the first principled framework for monitoring behavioral dynamics in black-box multi-agent systems -- a critical capability as generative agent deployment continues to scale.

[419]  arXiv:2512.05015 [pdf, ps, other]
Title: Plug-and-Play Homeostatic Spark: Zero-Cost Acceleration for SNN Training Across Paradigms
Comments: 12 pages, 4 figures
Subjects: Neural and Evolutionary Computing (cs.NE)

Spiking neural networks offer event driven computation, sparse activation, and hardware efficiency, yet training often converges slowly and lacks stability. We present Adaptive Homeostatic Spiking Activity Regulation (AHSAR), an extremely simple plug in and training paradigm agnostic method that stabilizes optimization and accelerates convergence without changing the model architecture, loss, or gradients. AHSAR introduces no trainable parameters. It maintains a per layer homeostatic state during the forward pass, maps centered firing rate deviations to threshold scales through a bounded nonlinearity, uses lightweight cross layer diffusion to avoid sharp imbalance, and applies a slow across epoch global gain that combines validation progress with activity energy to tune the operating point. The computational cost is negligible. Across diverse training methods, SNN architectures of different depths, widths, and temporal steps, and both RGB and DVS datasets, AHSAR consistently improves strong baselines and enhances out of distribution robustness. These results indicate that keeping layer activity within a moderate band is a simple and effective principle for scalable and efficient SNN training.

[420]  arXiv:2512.05016 [pdf, ps, other]
Title: Generative Neural Video Compression via Video Diffusion Prior
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present GNVC-VD, the first DiT-based generative neural video compression framework built upon an advanced video generation foundation model, where spatio-temporal latent compression and sequence-level generative refinement are unified within a single codec. Existing perceptual codecs primarily rely on pre-trained image generative priors to restore high-frequency details, but their frame-wise nature lacks temporal modeling and inevitably leads to perceptual flickering. To address this, GNVC-VD introduces a unified flow-matching latent refinement module that leverages a video diffusion transformer to jointly enhance intra- and inter-frame latents through sequence-level denoising, ensuring consistent spatio-temporal details. Instead of denoising from pure Gaussian noise as in video generation, GNVC-VD initializes refinement from decoded spatio-temporal latents and learns a correction term that adapts the diffusion prior to compression-induced degradation. A conditioning adaptor further injects compression-aware cues into intermediate DiT layers, enabling effective artifact removal while maintaining temporal coherence under extreme bitrate constraints. Extensive experiments show that GNVC-VD surpasses both traditional and learned codecs in perceptual quality and significantly reduces the flickering artifacts that persist in prior generative approaches, even below 0.01 bpp, highlighting the promise of integrating video-native generative priors into neural codecs for next-generation perceptual video compression.

[421]  arXiv:2512.05021 [pdf, ps, other]
Title: HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Handwritten Text Recognition remains challenging due to the limited data, high writing style variance, and scripts with complex diacritics. Existing approaches, though partially address these issues, often struggle to generalize without massive synthetic data. To address these challenges, we propose HTR-ConvText, a model designed to capture fine-grained, stroke-level local features while preserving global contextual dependencies. In the feature extraction stage, we integrate a residual Convolutional Neural Network backbone with a MobileViT with Positional Encoding block. This enables the model to both capture structural patterns and learn subtle writing details. We then introduce the ConvText encoder, a hybrid architecture combining global context and local features within a hierarchical structure that reduces sequence length for improved efficiency. Additionally, an auxiliary module injects textual context to mitigate the weakness of Connectionist Temporal Classification. Evaluations on IAM, READ2016, LAM and HANDS-VNOnDB demonstrate that our approach achieves improved performance and better generalization compared to existing methods, especially in scenarios with limited training samples and high handwriting diversity.

[422]  arXiv:2512.05025 [pdf, ps, other]
Title: RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Earth observation (EO) data spans a wide range of spatial, spectral, and temporal resolutions, from high-resolution optical imagery to low resolution multispectral products or radar time series. While recent foundation models have improved multimodal integration for learning meaningful representations, they often expect fixed input resolutions or are based on sensor-specific encoders limiting generalization across heterogeneous EO modalities. To overcome these limitations we introduce RAMEN, a resolution-adjustable multimodal encoder that learns a shared visual representation across EO data in a fully sensor-agnostic manner. RAMEN treats the modality and spatial and temporal resolutions as key input data features, enabling coherent analysis across modalities within a unified latent space. Its main methodological contribution is to define spatial resolution as a controllable output parameter, giving users direct control over the desired level of detail at inference and allowing explicit trade-offs between spatial precision and computational cost. We train a single, unified transformer encoder reconstructing masked multimodal EO data drawn from diverse sources, ensuring generalization across sensors and resolutions. Once pretrained, RAMEN transfers effectively to both known and unseen sensor configurations and outperforms larger state-of-the-art models on the community-standard PANGAEA benchmark, containing various multi-sensor and multi-resolution downstream tasks. Our code and pretrained model are available at https://github.com/nicolashoudre/RAMEN.

[423]  arXiv:2512.05030 [pdf, ps, other]
Title: Dual-Path Region-Guided Attention Network for Ground Reaction Force and Moment Regression
Authors: Xuan Li, Samuel Bello
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Accurate estimation of three-dimensional ground reaction forces and moments (GRFs/GRMs) is crucial for both biomechanics research and clinical rehabilitation evaluation. In this study, we focus on insole-based GRF/GRM estimation and further validate our approach on a public walking dataset. We propose a Dual-Path Region-Guided Attention Network that integrates anatomy-inspired spatial priors and temporal priors into a region-level attention mechanism, while a complementary path captures context from the full sensor field. The two paths are trained jointly and their outputs are combined to produce the final GRF/GRM predictions. Conclusions: Our model outperforms strong baseline models, including CNN and CNN-LSTM architectures on two datasets, achieving the lowest six-component average NRMSE of 5.78% on the insole dataset and 1.42% for the vertical ground reaction force on the public dataset. This demonstrates robust performance for ground reaction force and moment estimation.

[424]  arXiv:2512.05033 [pdf, ps, other]
Title: Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
Comments: 22 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Modern Large Language Models achieve impressive reasoning capabilities with long Chain of Thoughts, but they incur substantial computational cost during inference, and this motivates techniques to improve the performance-cost ratio. Among these techniques, Speculative Decoding accelerates inference by employing a fast but inaccurate draft model to autoregressively propose tokens, which are then verified in parallel by a more capable target model. However, due to unnecessary rejections caused by token mismatches in semantically equivalent steps, traditional token-level Speculative Decoding struggles in reasoning tasks. Although recent works have shifted to step-level semantic verification, which improve efficiency by accepting or rejecting entire reasoning steps, existing step-level methods still regenerate many rejected steps with little improvement, wasting valuable target compute. To address this challenge, we propose Arbitrage, a novel step-level speculative generation framework that routes generation dynamically based on the relative advantage between draft and target models. Instead of applying a fixed acceptance threshold, Arbitrage uses a lightweight router trained to predict when the target model is likely to produce a meaningfully better step. This routing approximates an ideal Arbitrage Oracle that always chooses the higher-quality step, achieving near-optimal efficiency-accuracy trade-offs. Across multiple mathematical reasoning benchmarks, Arbitrage consistently surpasses prior step-level Speculative Decoding baselines, reducing inference latency by up to $\sim2\times$ at matched accuracy.

[425]  arXiv:2512.05038 [pdf, ps, other]
Title: SuperActivators: Only the Tail of the Distribution Contains Reliable Concept Signals
Subjects: Machine Learning (cs.LG)

Concept vectors aim to enhance model interpretability by linking internal representations with human-understandable semantics, but their utility is often limited by noisy and inconsistent activations. In this work, we uncover a clear pattern within the noise, which we term the SuperActivator Mechanism: while in-concept and out-of-concept activations overlap considerably, the token activations in the extreme high tail of the in-concept distribution provide a reliable signal of concept presence. We demonstrate the generality of this mechanism by showing that SuperActivator tokens consistently outperform standard vector-based and prompting concept detection approaches, achieving up to a 14% higher F1 score across image and text modalities, model architectures, model layers, and concept extraction techniques. Finally, we leverage SuperActivator tokens to improve feature attributions for concepts.

[426]  arXiv:2512.05039 [pdf, ps, other]
Title: Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
Comments: Submitted for review CVPR-2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Facial Image inpainting aim is to restore the missing or corrupted regions in face images while preserving identity, structural consistency and photorealistic image quality, a task specifically created for photo restoration. Though there are recent lot of advances in deep generative models, existing methods face problems with large irregular masks, often producing blurry textures on the edges of the masked region, semantic inconsistencies, or unconvincing facial structures due to direct pixel level synthesis approach and limited exploitation of facial priors. In this paper we propose a novel architecture, which address these above challenges through semantic-guided hierarchical synthesis. Our approach starts with a method that organizes and synthesizes information based on meaning, followed by refining the texture. This process gives clear insights into the facial structure before we move on to creating detailed images. In the first stage, we blend two techniques: one that focuses on local features with CNNs and global features with Vision Transformers. This helped us create clear and detailed semantic layouts. In the second stage, we use a Multi-Modal Texture Generator to refine these layouts by pulling in information from different scales, ensuring everything looks cohesive and consistent. The architecture naturally handles arbitrary mask configurations through dynamic attention without maskspecific training. Experiment on two datasets CelebA-HQ and FFHQ shows that our model outperforms other state-of-the-art methods, showing improvements in metrics like LPIPS, PSNR, and SSIM. It produces visually striking results with better semantic preservation, in challenging large-area inpainting situations.

[427]  arXiv:2512.05044 [pdf, ps, other]
Title: Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Comments: 18 Pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating interactive and dynamic 4D scenes from a single static image remains a core challenge. Most existing generate-then-reconstruct and reconstruct-then-generate methods decouple geometry from motion, causing spatiotemporal inconsistencies and poor generalization. To address these, we extend the reconstruct-then-generate framework to jointly perform Motion generation and geometric Reconstruction for 4D Synthesis (MoRe4D). We first introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories, addressing the scarcity of high-quality 4D scene data. Based on this, we propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) to jointly generate geometrically consistent and motion-plausible 4D point trajectories. To leverage single-view priors, we design a depth-guided motion normalization strategy and a motion-aware module for effective geometry and dynamics integration. We then propose a 4D View Synthesis Module (4D-ViSM) to render videos with arbitrary camera trajectories from 4D point track representations. Experiments show that MoRe4D generates high-quality 4D scenes with multi-view consistency and rich dynamic details from a single image. Code: https://github.com/Zhangyr2022/MoRe4D.

[428]  arXiv:2512.05053 [pdf, ps, other]
Title: A Randomized Scheduling Framework for Privacy-Preserving Multi-robot Rendezvous given Prior Information
Subjects: Systems and Control (eess.SY)

Privacy has become a critical concern in modern multi-robot systems, driven by both ethical considerations and operational constraints. As a result, growing attention has been directed toward privacy-preserving coordination in dynamical multi-robot systems. This work introduces a randomized scheduling mechanism for privacy-preserving robot rendezvous. The proposed approach achieves improved privacy even at lower communication rates, where privacy is quantified via pointwise maximal leakage. We show that lower transmission rates provide stronger privacy guarantees and prove that rendezvous is still achieved under the randomized scheduling mechanism. Numerical simulations are provided to demonstrate the effectiveness of the method.

[429]  arXiv:2512.05060 [pdf, ps, other]
Title: 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer
Comments: Code: this https URL, Webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Constructing 4D language fields is crucial for embodied AI, augmented/virtual reality, and 4D scene understanding, as they provide enriched semantic representations of dynamic environments and enable open-vocabulary querying in complex scenarios. However, existing approaches to 4D semantic field construction primarily rely on scene-specific Gaussian splatting, which requires per-scene optimization, exhibits limited generalization, and is difficult to scale to real-world applications. To address these limitations, we propose 4DLangVGGT, the first Transformer-based feed-forward unified framework for 4D language grounding, that jointly integrates geometric perception and language alignment within a single architecture. 4DLangVGGT has two key components: the 4D Visual Geometry Transformer, StreamVGGT, which captures spatio-temporal geometric representations of dynamic scenes; and the Semantic Bridging Decoder (SBD), which projects geometry-aware features into a language-aligned semantic space, thereby enhancing semantic interpretability while preserving structural fidelity. Unlike prior methods that depend on costly per-scene optimization, 4DLangVGGT can be jointly trained across multiple dynamic scenes and directly applied during inference, achieving both deployment efficiency and strong generalization. This design significantly improves the practicality of large-scale deployment and establishes a new paradigm for open-vocabulary 4D scene understanding. Experiments on HyperNeRF and Neu3D datasets demonstrate that our approach not only generalizes effectively but also achieves state-of-the-art performance, achieving up to 2% gains under per-scene training and 1% improvements under multi-scene training. Our code released in https://github.com/hustvl/4DLangVGGT

[430]  arXiv:2512.05062 [pdf, ps, other]
Title: Configuration Defects in Kubernetes
Subjects: Software Engineering (cs.SE)

Kubernetes is a tool that facilitates rapid deployment of software. Unfortunately, configuring Kubernetes is prone to errors. Configuration defects are not uncommon and can result in serious consequences. This paper reports an empirical study about configuration defects in Kubernetes with the goal of helping practitioners detect and prevent these defects. We study 719 defects that we extract from 2,260 Kubernetes configuration scripts using open source repositories. Using qualitative analysis, we identify 15 categories of defects. We find 8 publicly available static analysis tools to be capable of detecting 8 of the 15 defect categories. We find that the highest precision and recall of those tools are for defects related to data fields. We develop a linter to detect two categories of defects that cause serious consequences, which none of the studied tools are able to detect. Our linter revealed 26 previously-unknown defects that have been confirmed by practitioners, 19 of which have already been fixed. We conclude our paper by providing recommendations on how defect detection and repair techniques can be used for Kubernetes configuration scripts. The datasets and source code used for the paper are publicly available online.

[431]  arXiv:2512.05065 [pdf, ps, other]
Title: Personalizing Agent Privacy Decisions via Logical Entailment
Subjects: Cryptography and Security (cs.CR)

Personal language model-based agents are becoming more widespread for completing tasks on behalf of users; however, this raises serious privacy questions regarding whether these models will appropriately disclose user data. While prior work has evaluated language models on data-sharing scenarios based on general privacy norms, we focus on personalizing language models' privacy decisions, grounding their judgments directly in prior user privacy decisions. Our findings suggest that general privacy norms are insufficient for effective personalization of privacy decisions. Furthermore, we find that eliciting privacy judgments from the model through In-context Learning (ICL) is unreliable to due misalignment with the user's prior privacy judgments and opaque reasoning traces, which make it difficult for the user to interpret the reasoning behind the model's decisions. To address these limitations, we propose ARIEL (Agentic Reasoning with Individualized Entailment Logic), a framework that jointly leverages a language model and rule-based logic for structured data-sharing reasoning. ARIEL is based on formulating personalization of data sharing as an entailment, whether a prior user judgment on a data-sharing request implies the same judgment for an incoming request. Our experimental evaluations on advanced models and publicly-available datasets demonstrate that ARIEL can reduce the F1 score error by $\textbf{39.1%}$ over language model-based reasoning (ICL), demonstrating that ARIEL is effective at correctly judging requests where the user would approve data sharing. Overall, our findings suggest that combining LLMs with strict logical entailment is a highly effective strategy for enabling personalized privacy judgments for agents.

[432]  arXiv:2512.05066 [pdf, ps, other]
Title: Multi-LLM Collaboration for Medication Recommendation
Comments: 8 pages, 5 figures, 1 table
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

As healthcare increasingly turns to AI for scalable and trustworthy clinical decision support, ensuring reliability in model reasoning remains a critical challenge. Individual large language models (LLMs) are susceptible to hallucinations and inconsistency, whereas naive ensembles of models often fail to deliver stable and credible recommendations. Building on our previous work on LLM Chemistry, which quantifies the collaborative compatibility among LLMs, we apply this framework to improve the reliability in medication recommendation from brief clinical vignettes. Our approach leverages multi-LLM collaboration guided by Chemistry-inspired interaction modeling, enabling ensembles that are effective (exploiting complementary strengths), stable (producing consistent quality), and calibrated (minimizing interference and error amplification). We evaluate our Chemistry-based Multi-LLM collaboration strategy on real-world clinical scenarios to investigate whether such interaction-aware ensembles can generate credible, patient-specific medication recommendations. Preliminary results are encouraging, suggesting that LLM Chemistry-guided collaboration may offer a promising path toward reliable and trustworthy AI assistants in clinical practice.

[433]  arXiv:2512.05067 [pdf, ps, other]
Title: Perceptually-Minimal Color Optimization for Web Accessibility: A Multi-Phase Constrained Approach
Authors: Lalitha A R
Subjects: Human-Computer Interaction (cs.HC); Optimization and Control (math.OC)

Web accessibility guidelines require sufficient color contrast between text and backgrounds; yet, manually adjusting colors often necessitates significant visual deviation, compromising vital brand aesthetics. We present a novel, multi-phase optimization approach for automatically generating WCAG-compliant colors while minimizing perceptual change to original design choices.
Our method treats this as a constrained, non-linear optimization problem, utilizing the modern perceptually uniform OKLCH color space. Crucially, the optimization is constrained to preserve the original hue ($\text{H}$) of the color, ensuring that modifications are strictly limited to necessary adjustments in lightness ($\text{L}$) and chroma ($\text{C}$). This is achieved through a three-phase sequence: binary search, gradient descent, and progressive constraint relaxation.
Evaluation on a dataset of 10,000 procedurally generated color pairs demonstrates that the algorithm successfully resolves accessibility violations in $77.22\%$ of cases, with $88.51\%$ of successful corrections exhibiting imperceptible color difference ($\Delta E_{2000} < 2.0$) as defined by standard perceptibility thresholds. The median perceptual change for successful adjustments is only $0.76\ \Delta E_{2000}$, and the algorithm achieves this with a median processing time of $0.876\text{ms}$ per color pair.
The approach demonstrates that accessibility compliance and visual design integrity can be achieved simultaneously through a computationally efficient, perceptually-aware optimization that respects brand identity. The algorithm is publicly implemented in the open-source cm-colors Python library.

[434]  arXiv:2512.05069 [pdf, ps, other]
Title: Hybrid Quantum-Classical Autoencoders for Unsupervised Network Intrusion Detection
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Quantum Physics (quant-ph)

Unsupervised anomaly-based intrusion detection requires models that can generalize to attack patterns not observed during training. This work presents the first large-scale evaluation of hybrid quantum-classical (HQC) autoencoders for this task. We construct a unified experimental framework that iterates over key quantum design choices, including quantum-layer placement, measurement approach, variational and non-variational formulations, and latent-space regularization. Experiments across three benchmark NIDS datasets show that HQC autoencoders can match or exceed classical performance in their best configurations, although they exhibit higher sensitivity to architectural decisions. Under zero-day evaluation, well-configured HQC models provide stronger and more stable generalization than classical and supervised baselines. Simulated gate-noise experiments reveal early performance degradation, indicating the need for noise-aware HQC designs. These results provide the first data-driven characterization of HQC autoencoder behavior for network intrusion detection and outline key factors that govern their practical viability. All experiment code and configurations are available at https://github.com/arasyi/hqcae-network-intrusion-detection.

[435]  arXiv:2512.05071 [pdf, ps, other]
Title: The Evolving Landscape of Interactive Surface Sensing Technologies
Comments: 12 pages, 10 figures
Subjects: Systems and Control (eess.SY)

Interactive surfaces have evolved from capacitive touch and IR based systems into a diverse ecosystem of sensing technologies that support rich and expressive human computer interaction. This survey traces that progression, beginning with infrared vision based approaches, such as FTIR and diffuse illumination, and the rise of capacitive touch as the dominant technology in modern devices, to focusing on contemporary modalities including vision and acoustic sensing. New technologies under development are also discussed, including mmWave radar, and vibration based techniques. Each sensing technique is examined in terms of its operating principles, resolution, scalability, and applications, along with discussions of multimodal integration. By comparing tradeoffs between sensing modalities, the survey highlights the technical and design factors that shape interactive surface performance and user experience. The review concludes by identifying persistent challenges, including sensing accuracy, power constraints, and privacy concerns, and outlines how emerging sensing modalities can enable future interactive environments to be ubiquitous and intelligent.

[436]  arXiv:2512.05073 [pdf, ps, other]
Title: David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Software Engineering (cs.SE)

Large Language Model(LLM) inference demands massive compute and energy, making domain-specific tasks expensive and unsustainable. As foundation models keep scaling, we ask: Is bigger always better for hardware design? Our work tests this by evaluating Small Language Models coupled with a curated agentic AI framework on NVIDIA's Comprehensive Verilog Design Problems(CVDP) benchmark. Results show that agentic workflows: through task decomposition, iterative feedback, and correction - not only unlock near-LLM performance at a fraction of the cost but also create learning opportunities for agents, paving the way for efficient, adaptive solutions in complex design tasks.

[437]  arXiv:2512.05076 [pdf, ps, other]
Title: BulletTime: Decoupled Control of Time and Camera Pose for Video Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Emerging video diffusion models achieve high visual fidelity but fundamentally couple scene dynamics with camera motion, limiting their ability to provide precise spatial and temporal control. We introduce a 4D-controllable video diffusion framework that explicitly decouples scene dynamics from camera pose, enabling fine-grained manipulation of both scene dynamics and camera viewpoint. Our framework takes continuous world-time sequences and camera trajectories as conditioning inputs, injecting them into the video diffusion model through a 4D positional encoding in the attention layer and adaptive normalizations for feature modulation. To train this model, we curate a unique dataset in which temporal and camera variations are independently parameterized; this dataset will be made public. Experiments show that our model achieves robust real-world 4D control across diverse timing patterns and camera trajectories, while preserving high generation quality and outperforming prior work in controllability. See our website for video results: https://19reborn.github.io/Bullet4D/

[438]  arXiv:2512.05079 [pdf, ps, other]
Title: Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Object geometry is key information for robot manipulation. Yet, object reconstruction is a challenging task because cameras only capture partial observations of objects, especially when occlusion occurs. In this paper, we leverage two extra sources of information to reduce the ambiguity of vision signals. First, generative models learn priors of the shapes of commonly seen objects, allowing us to make reasonable guesses of the unseen part of geometry. Second, contact information, which can be obtained from videos and physical interactions, provides sparse constraints on the boundary of the geometry. We combine the two sources of information through contact-guided 3D generation. The guidance formulation is inspired by drag-based editing in generative models. Experiments on synthetic and real-world data show that our approach improves the reconstruction compared to pure 3D generation and contact-based optimization.

[439]  arXiv:2512.05080 [pdf, ps, other]
Title: OMTRA: A Multi-Task Generative Model for Structure-Based Drug Design
Comments: Presented at the Machine Learning for Structural Biology Workshop, 2025
Subjects: Machine Learning (cs.LG)

Structure-based drug design (SBDD) focuses on designing small-molecule ligands that bind to specific protein pockets. Computational methods are integral in modern SBDD workflows and often make use of virtual screening methods via docking or pharmacophore search. Modern generative modeling approaches have focused on improving novel ligand discovery by enabling de novo design. In this work, we recognize that these tasks share a common structure and can therefore be represented as different instantiations of a consistent generative modeling framework. We propose a unified approach in OMTRA, a multi-modal flow matching model that flexibly performs many tasks relevant to SBDD, including some with no analogue in conventional workflows. Additionally, we curate a dataset of 500M 3D molecular conformers, complementing protein-ligand data and expanding the chemical diversity available for training. OMTRA obtains state of the art performance on pocket-conditioned de novo design and docking; however, the effects of large-scale pretraining and multi-task training are modest. All code, trained models, and dataset for reproducing this work are available at https://github.com/gnina/OMTRA

[440]  arXiv:2512.05081 [pdf, ps, other]
Title: Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances in autoregressive video diffusion have enabled real-time frame streaming, yet existing solutions still suffer from temporal repetition, drift, and motion deceleration. We find that naively applying StreamingLLM-style attention sinks to video diffusion leads to fidelity degradation and motion stagnation. To overcome this, we introduce Deep Forcing, which consists of two training-free mechanisms that address this without any fine-tuning. Specifically, 1) Deep Sink dedicates half of the sliding window to persistent sink tokens and re-aligns their temporal RoPE phase to the current timeline, stabilizing global context during long rollouts. 2) Participative Compression performs importance-aware KV cache pruning that preserves only tokens actively participating in recent attention while safely discarding redundant and degraded history, minimizing error accumulation under out-of-distribution length generation. Together, these components enable over 12x extrapolation (e.g. 5s-trained to 60s+ generation) with better imaging quality than LongLive, better aesthetic quality than RollingForcing, almost maintaining overall consistency, and substantial gains in dynamic degree, all while maintaining real-time generation. Our results demonstrate that training-free KV-cache management can match or exceed training-based approaches for autoregressively streaming long-video generation.

[441]  arXiv:2512.05084 [pdf, ps, other]
Title: Gradient Descent with Provably Tuned Learning-rate Schedules
Authors: Dravyansh Sharma
Subjects: Machine Learning (cs.LG)

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches without formal near-optimality guarantees. Recent work by Gupta and Roughgarden studies how to learn a good step-size in gradient descent. However, like most of the literature with theoretical guarantees for gradient-based optimization, their results rely on strong assumptions on the function class including convexity and smoothness which do not hold in typical applications. In this work, we develop novel analytical tools for provably tuning hyperparameters in gradient-based algorithms that apply to non-convex and non-smooth functions. We obtain matching sample complexity bounds for learning the step-size in gradient descent shown for smooth, convex functions in prior work (up to logarithmic factors) but for a much broader class of functions. Our analysis applies to gradient descent on neural networks with commonly used activation functions (including ReLU, sigmoid and tanh). We extend our framework to tuning multiple hyperparameters, including tuning the learning rate schedule, simultaneously tuning momentum and step-size, and pre-training the initialization vector. Our approach can be used to bound the sample complexity for minimizing both the validation loss as well as the number of gradient descent iterations.

[442]  arXiv:2512.05085 [pdf, ps, other]
Title: Performance Analysis of Fluid Reconfigurable Intelligent Surface over Covert Communications
Subjects: Information Theory (cs.IT)

This paper investigates the impact of the recently proposed concept of fluid reconfigurable intelligent surfaces (FRIS) on covert communications. Specifically, we consider a communication scenario where a legitimate transmitter aims to covertly deliver information to its intended receiver through a planar FRIS, while an adversary attempts to detect whether any transmission is occurring. In this context, we analyze the false alarm (FA) and missed detection (MD) probabilities, and derive a closed-form expression for the covertness outage probability (COP). Furthermore, the success probability is characterized under the optimal detection threshold, providing new insights into the trade-off between covertness and reliable transmission. Numerical results reveal that FRIS provides a clear advantage over fixed-position RIS at low-to-moderate transmit powers by improving reliability and enhancing covertness, while at very high power levels, fixed-position RIS may sustain slightly higher success probability due to reduced leakage toward the adversary.

[443]  arXiv:2512.05089 [pdf, ps, other]
Title: The Geometry of Intelligence: Deterministic Functional Topology as a Foundation for Real-World Perception
Authors: Eduardo Di Santi
Comments: 35 pages, 6 figures. This preprint develops a deterministic functional-topological framework showing that physical systems generate compact perceptual manifolds with finite radius. We provide theory, Monte-Carlo estimators, and validation across PM, battery, and ECG domains, unifying biological perception and self-supervised AI
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Real-world physical processes do not generate arbitrary variability: their signals concentrate on compact and low-variability subsets of functional space. This geometric structure enables rapid generalization from a few examples in both biological and artificial systems.
This work develops a deterministic functional-topological framework in which the set of valid realizations of a physical phenomenon forms a compact perceptual manifold with stable invariants and a finite Hausdorff radius. We show that the boundaries of this manifold can be discovered in a fully self-supervised manner through Monte Carlo sampling, even when the governing equations of the system are unknown.
We provide theoretical guarantees, practical estimators of knowledge boundaries, and empirical validations across three domains: electromechanical railway point machines, electrochemical battery discharge curves, and physiological ECG signals.
Our results demonstrate that deterministic functional topology offers a unified mathematical foundation for perception, representation, and world-model construction, explaining why biological learners and self-supervised AI models can generalize from limited observations.

[444]  arXiv:2512.05091 [pdf, ps, other]
Title: Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
Comments: Technical Report; Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances in Multimodal Large Language Models (MLLMs) have significantly improved performance on tasks such as visual grounding and visual question answering. However, the reasoning processes of these models remain largely opaque; they typically output only final predictions without revealing the intermediate steps or fine-grained evidence (e.g., pixels, locations) that lead to the result. This contrasts with human intelligence, which naturally operates through a chain of visual reasoning. To address this limitation, we introduce the Visual Reasoning Tracer (VRT) task, which requires models to not only localize the target object but also explicitly predict the intermediate objects that form the reasoning path. To advance research in this area, we contribute: (1) VRT-Bench, a human-annotated benchmark for evaluating visual reasoning; (2) a new metric for assessing the quality of reasoning traces; and (3) VRT-80k, a large-scale dataset for reasoning model training. Our experiments reveal that while existing models often produce the correct final output, they struggle to ground their intermediate reasoning. In contrast, models trained on VRT-80k achieve substantial improvements in tracing the reasoning path.

[445]  arXiv:2512.05094 [pdf, ps, other]
Title: From Generated Human Videos to Physically Plausible Robot Trajectories
Comments: For project website, see this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, holding the potential to serve as high-level planners for contextual robot control. To realize this potential, a key research question remains open: how can a humanoid execute the human actions from generated videos in a zero-shot manner? This challenge arises because generated videos are often noisy and exhibit morphological distortions that make direct imitation difficult compared to real video. To address this, we introduce a two-stage pipeline. First, we lift video pixels into a 4D human representation and then retarget to the humanoid morphology. Second, we propose GenMimic-a physics-aware reinforcement learning policy conditioned on 3D keypoints, and trained with symmetry regularization and keypoint-weighted tracking rewards. As a result, GenMimic can mimic human actions from noisy, generated videos. We curate GenMimicBench, a synthetic human-motion dataset generated using two video generation models across a spectrum of actions and contexts, establishing a benchmark for assessing zero-shot generalization and policy robustness. Extensive experiments demonstrate improvements over strong baselines in simulation and confirm coherent, physically stable motion tracking on a Unitree G1 humanoid robot without fine-tuning. This work offers a promising path to realizing the potential of video generation models as high-level policies for robot control.

[446]  arXiv:2512.05098 [pdf, ps, other]
Title: SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards
Authors: Yuan Gao, Jin Song
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In recent years, Image Quality Assessment (IQA) for AI-generated images (AIGI) has advanced rapidly; however, existing methods primarily target portraits and artistic images, lacking a systematic evaluation of interior scenes. We introduce Spatial Aesthetics, a paradigm that assesses the aesthetic quality of interior images along four dimensions: layout, harmony, lighting, and distortion. We construct SA-BENCH, the first benchmark for spatial aesthetics, comprising 18,000 images and 50,000 precise annotations. Employing SA-BENCH, we systematically evaluate current IQA methodologies and develop SA-IQA, through MLLM fine-tuning and a multidimensional fusion approach, as a comprehensive reward framework for assessing spatial aesthetics. We apply SA-IQA to two downstream tasks: (1) serving as a reward signal integrated with GRPO reinforcement learning to optimize the AIGC generation pipeline, and (2) Best-of-N selection to filter high-quality images and improve generation quality. Experiments indicate that SA-IQA significantly outperforms existing methods on SA-BENCH, setting a new standard for spatial aesthetics evaluation. Code and dataset will be open-sourced to advance research and applications in this domain.

[447]  arXiv:2512.05100 [pdf, ps, other]
Title: Structured Document Translation via Format Reinforcement Learning
Comments: IJCNLP-AACL 2025 Main (Oral)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent works on structured text translation remain limited to the sentence level, as they struggle to effectively handle the complex document-level XML or HTML structures. To address this, we propose \textbf{Format Reinforcement Learning (FormatRL)}, which employs Group Relative Policy Optimization on top of a supervised fine-tuning model to directly optimize novel structure-aware rewards: 1) TreeSim, which measures structural similarity between predicted and reference XML trees and 2) Node-chrF, which measures translation quality at the level of XML nodes. Additionally, we apply StrucAUC, a fine-grained metric distinguishing between minor errors and major structural failures. Experiments on the SAP software-documentation benchmark demonstrate improvements across six metrics and an analysis further shows how different reward functions contribute to improvements in both structural and translation quality.

[448]  arXiv:2512.05103 [pdf, ps, other]
Title: TV2TV: A Unified Framework for Interleaved Language and Video Generation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Video generation models are rapidly advancing, but can still struggle with complex video outputs that require significant semantic branching or repeated high-level reasoning about what should happen next. In this paper, we introduce a new class of omni video-text models that integrate ideas from recent LM reasoning advances to address this challenge. More specifically, we present TV2TV, a unified generative modeling framework which decomposes video generation into an interleaved text and video generation process. TV2TV jointly learns language modeling (next-token prediction) and video flow matching (next-frame prediction) using a Mixture-of-Transformers (MoT) architecture. At inference time, TV2TV decides when to alternate between generating text and video frames, allowing the model to "think in words" about subsequent content before ``acting in pixels'' to produce frames. This design offloads much of the responsibility for deciding what should happen next to the language modeling tower, enabling improved visual quality and prompt alignment of generated videos. It also enables fine-grained controllability, allowing users to modify the video generation trajectory through text interventions at any point in the process. In controlled experiments on video game data, TV2TV demonstrates substantial improvements in both visual quality and controllability. TV2TV also scales to natural videos, as we show by augmenting sports videos with interleaved natural language action descriptions using vision-language models (VLMs). Training TV2TV on this corpus yields strong visual quality and prompt alignment, showcasing the model's ability to reason about and generate complex real-world action sequences. Together, these results highlight TV2TV as a promising step toward video generation with open-ended textual reasoning and control.

[449]  arXiv:2512.05104 [pdf, ps, other]
Title: EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

All-in-One Image Restoration (AiOIR) tasks often involve diverse degradation that require robust and versatile strategies. However, most existing approaches typically lack explicit frequency modeling and rely on fixed or heuristic optimization schedules, which limit the generalization across heterogeneous degradation. To address these limitations, we propose EvoIR, an AiOIR-specific framework that introduces evolutionary frequency modulation for dynamic and adaptive image restoration. Specifically, EvoIR employs the Frequency-Modulated Module (FMM) that decomposes features into high- and low-frequency branches in an explicit manner and adaptively modulates them to enhance both structural fidelity and fine-grained details. Central to EvoIR, an Evolutionary Optimization Strategy (EOS) iteratively adjusts frequency-aware objectives through a population-based evolutionary process, dynamically balancing structural accuracy and perceptual fidelity. Its evolutionary guidance further mitigates gradient conflicts across degradation and accelerates convergence. By synergizing FMM and EOS, EvoIR yields greater improvements than using either component alone, underscoring their complementary roles. Extensive experiments on multiple benchmarks demonstrate that EvoIR outperforms state-of-the-art AiOIR methods.

[450]  arXiv:2512.05105 [pdf, ps, other]
Title: Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

Long context reasoning in large language models (LLMs) has demonstrated enhancement of their cognitive capabilities via chain-of-thought (CoT) inference. Training such models is usually done via reinforcement learning with verifiable rewards (RLVR) in reasoning based problems, like math and programming. However, RLVR is limited by several bottlenecks, such as, lack of dense reward, and inadequate sample efficiency. As a result, it requires significant compute resources in post-training phase. To overcome these limitations, in this work, we propose \textbf{Semantic Soft Bootstrapping (SSB)}, a self-distillation technique, in which the same base language model plays the role of both teacher and student, but receives different semantic contexts about the correctness of its outcome at training time. The model is first prompted with a math problem and several rollouts are generated. From them, the correct and most common incorrect response are filtered, and then provided to the model in context to produce a more robust, step-by-step explanation with a verified final answer. This pipeline automatically curates a paired teacher-student training set from raw problem-answer data, without any human intervention. This generation process also produces a sequence of logits, which is what the student model tries to match in the training phase just from the bare question alone. In our experiment, Qwen2.5-3B-Instruct on GSM8K dataset via parameter-efficient fine-tuning. We then tested its accuracy on MATH500, and AIME2024 benchmarks. Our experiments show a jump of 10.6%, and 10% improvements in accuracy, respectively, over group relative policy optimization (GRPO), which is a commonly used RLVR algorithm. Our code is available at https://github.com/purbeshmitra/semantic-soft-bootstrapping, and the model, curated dataset is available at https://huggingface.co/purbeshmitra/semantic-soft-bootstrapping.

[451]  arXiv:2512.05106 [pdf, ps, other]
Title: NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO)

Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion {\phi}-PD, a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. {\phi}-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, {\phi}-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, {\phi}-PD improves CARLA-to-Waymo planner performance by 50\%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our \href{https://yuzeng-at-tri.github.io/ppd-page/}{project page}.

[452]  arXiv:2512.05107 [pdf, ps, other]
Title: STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
Subjects: Robotics (cs.RO)

Recent advances in Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimization methods such as Trajectory-wise Preference Optimization (TPO) or Proximal Policy Optimization (PPO), leading to coarse credit assignment and unstable training. However, unlike language, where a unified semantic meaning is preserved despite flexible sentence order, action trajectories progress through causally chained stages with different learning difficulties. This motivates progressive stage optimization. Thereby, we present Stage-Aware Reinforcement (STARE), a module that decomposes a long-horizon action trajectory into semantically meaningful stages and provides dense, interpretable, and stage-aligned reinforcement signals. Integrating STARE into TPO and PPO, we yield Stage-Aware TPO (STA-TPO) and Stage-Aware PPO (STA-PPO) for offline stage-wise preference and online intra-stage interaction, respectively. Further building on supervised fine-tuning as initialization, we propose the Imitation -> Preference -> Interaction (IPI), a serial fine-tuning pipeline for improving action accuracy in VLA models. Experiments on SimplerEnv and ManiSkill3 demonstrate substantial gains, achieving state-of-the-art success rates of 98.0 percent on SimplerEnv and 96.4 percent on ManiSkill3 tasks.

[453]  arXiv:2512.05110 [pdf, ps, other]
Title: ShadowDraw: From Any Object to Shadow-Drawing Compositional Art
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)

We introduce ShadowDraw, a framework that transforms ordinary 3D objects into shadow-drawing compositional art. Given a 3D object, our system predicts scene parameters, including object pose and lighting, together with a partial line drawing, such that the cast shadow completes the drawing into a recognizable image. To this end, we optimize scene configurations to reveal meaningful shadows, employ shadow strokes to guide line drawing generation, and adopt automatic evaluation to enforce shadow-drawing coherence and visual quality. Experiments show that ShadowDraw produces compelling results across diverse inputs, from real-world scans and curated datasets to generative assets, and naturally extends to multi-object scenes, animations, and physical deployments. Our work provides a practical pipeline for creating shadow-drawing art and broadens the design space of computational visual art, bridging the gap between algorithmic design and artistic storytelling. Check out our project page https://red-fairy.github.io/ShadowDraw/ for more results and an end-to-end real-world demonstration of our pipeline!

[454]  arXiv:2512.05111 [pdf, ps, other]
Title: ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Reward models are critical for aligning vision-language systems with human preferences, yet current approaches suffer from hallucination, weak visual grounding, and an inability to use tools for verification, limiting their reliability on complex multimodal reasoning tasks. We present ARM-Thinker, an A}gentic multimodal Reward Model that autonomously invokes external tools (e.g., image cropping, doc page retrieval) to ground judgments in verifiable evidence, replacing static, non-interactive reward scoring. This enables the model to verify fine-grained visual details, cross-reference multi-page evidence, and validate reasoning claims, which are capabilities absent in existing reward models. We train ARM-Thinker with multi-stage reinforcement learning, jointly optimizing tool-calling decisions and judgment accuracy. To evaluate agentic reward modeling, we introduce ARMBench-VL, comprising three benchmarks that assess fine-grained visual grounding (image-level tools), multi-page document understanding (retrieval tools), and instruction following (text-level verification). ARM-Thinker achieves +16.2% average improvement on reward modeling benchmarks, +9.6% on tool-use tasks, and outperforms baselines on multimodal math and logical reasoning benchmarks. Our results demonstrate that agentic capabilities significantly enhance both accuracy and interpretability of reward models.

[455]  arXiv:2512.05112 [pdf, ps, other]
Title: DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, existing approaches remain limited, either treating the model merely as a standalone generator or relying on abstract textual planning. To this end, we propose Draft-as-CoT (DraCo), a novel interleaved reasoning paradigm that fully leverages both textual and visual contents in CoT for better planning and verification. Our method first generates a low-resolution draft image as preview, providing more concrete and structural visual planning and guidance. Then, we employ the model's inherent understanding capability to verify potential semantic misalignments between the draft and input prompt, and performs refinement through selective corrections with super-resolution. In this way, our approach addresses two fundamental challenges: the coarse-grained nature of textual planning and the difficulty in generating rare attribute combinations. To support training, we curate DraCo-240K, aiming to enhance three atomic capabilities spanning general correction, instance manipulation, and layout reorganization. Supported by DraCo-CFG, a specialized classifier-free guidance (CFG) strategy for interleaved reasoning, DraCo achieves a tremendous increase on GenEval (+8%), Imagine-Bench (+0.91), and GenEval++ (+3%), significantly outperforming direct generation and other generation methods empowered by CoT.

[456]  arXiv:2512.05113 [pdf, ps, other]
Title: Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting
Comments: WACV 2025. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos is a unique problem distinct from standard dynamic scene reconstruction. Instead of focusing on modeling motion, our goal is to create a frozen scene while strategically preserving subtle dynamics to enable user-controlled instant selection. To achieve this, we introduce a novel application of dynamic Gaussian splatting: the scene is modeled dynamically, which retains nearby temporal variation, and a static scene is rendered by fixing the model's time parameter. However, under this usage, monocular capture with sparse temporal supervision introduces artifacts like ghosting and blur for Gaussians that become unobserved or occluded at weakly supervised timestamps. We propose Splannequin, an architecture-agnostic regularization that detects two states of Gaussian primitives, hidden and defective, and applies temporal anchoring. Under predominantly forward camera motion, hidden states are anchored to their recent well-observed past states, while defective states are anchored to future states with stronger supervision. Our method integrates into existing dynamic Gaussian pipelines via simple loss terms, requires no architectural changes, and adds zero inference overhead. This results in markedly improved visual quality, enabling high-fidelity, user-selectable frozen-time renderings, validated by a 96% user preference. Project page: https://chien90190.github.io/splannequin/

[457]  arXiv:2512.05114 [pdf, ps, other]
Title: Deep infant brain segmentation from multi-contrast MRI
Comments: 8 pages, 8 figures, 1 table, website at this https URL, presented at the 2025 IEEE Asilomar Conference on Signals, Systems, and Computers
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Segmentation of magnetic resonance images (MRI) facilitates analysis of human brain development by delineating anatomical structures. However, in infants and young children, accurate segmentation is challenging due to development and imaging constraints. Pediatric brain MRI is notoriously difficult to acquire, with inconsistent availability of imaging modalities, substantial non-head anatomy in the field of view, and frequent motion artifacts. This has led to specialized segmentation models that are often limited to specific image types or narrow age groups, or that are fragile for more variable images such as those acquired clinically. We address this method fragmentation with BabySeg, a deep learning brain segmentation framework for infants and young children that supports diverse MRI protocols, including repeat scans and image types unavailable during training. Our approach builds on recent domain randomization techniques, which synthesize training images far beyond realistic bounds to promote dataset shift invariance. We also describe a mechanism that enables models to flexibly pool and interact features from any number of input scans. We demonstrate state-of-the-art performance that matches or exceeds the accuracy of several existing methods for various age cohorts and input configurations using a single model, in a fraction of the runtime required by many existing tools.

[458]  arXiv:2512.05115 [pdf, ps, other]
Title: Light-X: Generative 4D Video Rendering with Camera and Illumination Control
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances in illumination control extend image-based methods to video, yet still facing a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key step toward generative modeling of real-world scenes is the joint control of camera trajectory and illumination, since visual dynamics are inherently shaped by both geometry and lighting. To this end, we present Light-X, a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. 1) We propose a disentangled design that decouples geometry and lighting signals: geometry and motion are captured via dynamic point clouds projected along user-defined camera trajectories, while illumination cues are provided by a relit frame consistently projected into the same geometry. These explicit, fine-grained cues enable effective disentanglement and guide high-quality illumination. 2) To address the lack of paired multi-view and multi-illumination videos, we introduce Light-Syn, a degradation-based pipeline with inverse-mapping that synthesizes training pairs from in-the-wild monocular footage. This strategy yields a dataset covering static, dynamic, and AI-generated scenes, ensuring robust training. Extensive experiments show that Light-X outperforms baseline methods in joint camera-illumination control and surpasses prior video relighting methods under both text- and background-conditioned settings.

[459]  arXiv:2512.05116 [pdf, ps, other]
Title: Value Gradient Guidance for Flow Matching Alignment
Comments: Accepted at NeurIPS 2025; 26 pages, 20 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

While methods exist for aligning flow matching models--a popular and effective class of generative models--with human preferences, existing approaches fail to achieve both adaptation efficiency and probabilistically sound prior preservation. In this work, we leverage the theory of optimal control and propose VGG-Flow, a gradient-matching-based method for finetuning pretrained flow matching models. The key idea behind this algorithm is that the optimal difference between the finetuned velocity field and the pretrained one should be matched with the gradient field of a value function. This method not only incorporates first-order information from the reward model but also benefits from heuristic initialization of the value function to enable fast adaptation. Empirically, we show on a popular text-to-image flow matching model, Stable Diffusion 3, that our method can finetune flow matching models under limited computational budgets while achieving effective and prior-preserving alignment.

[460]  arXiv:2512.05117 [pdf, ps, other]
Title: The Universal Weight Subspace Hypothesis
Comments: 37 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain. Through mode-wise spectral analysis of over 1100 models - including 500 Mistral-7B LoRAs, 500 Vision Transformers, and 50 LLaMA-8B models - we identify universal subspaces capturing majority variance in just a few principal directions. By applying spectral decomposition techniques to the weight matrices of various architectures trained on a wide range of tasks and datasets, we identify sparse, joint subspaces that are consistently exploited, within shared architectures across diverse tasks and datasets. Our findings offer new insights into the intrinsic organization of information within deep networks and raise important questions about the possibility of discovering these universal subspaces without the need for extensive data and computational resources. Furthermore, this inherent structure has significant implications for model reusability, multi-task learning, model merging, and the development of training and inference-efficient algorithms, potentially reducing the carbon footprint of large-scale neural models.

Cross-lists for Fri, 5 Dec 25

[461]  arXiv:2512.04087 (cross-list from q-bio.NC) [pdf, ps, other]
Title: Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4o
Authors: Sui He, Shenbin Qian
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Effective communication is central to achieving positive healthcare outcomes in mental health contexts, yet international students often face linguistic and cultural barriers that hinder their communication of mental distress. In this study, we evaluate the effectiveness of AI-generated images in supporting self-expression of mental distress. To achieve this, twenty Chinese international students studying at UK universities were invited to describe their personal experiences of mental distress. These descriptions were elaborated using GPT-4o with four persona-based prompt templates rooted in contemporary counselling practice to generate corresponding images. Participants then evaluated the helpfulness of generated images in facilitating the expression of their feelings based on their original descriptions. The resulting dataset comprises 100 textual descriptions of mental distress, 400 generated images, and corresponding human evaluation scores. Findings indicate that prompt design substantially affects perceived helpfulness, with the illustrator persona achieving the highest ratings. This work introduces the first publicly available text-to-image evaluation dataset with human judgment scores in the mental health domain, offering valuable resources for image evaluation, reinforcement learning with human feedback, and multi-modal research on mental health communication.

[462]  arXiv:2512.04092 (cross-list from physics.soc-ph) [pdf, ps, other]
Title: The changing surface of the world's roads
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

Resilient road infrastructure is a cornerstone of the UN Sustainable Development Goals. Yet a primary indicator of network functionality and resilience is critically lacking: a comprehensive global baseline of road surface information. Here, we overcome this gap by applying a deep learning framework to a global mosaic of Planetscope satellite imagery from 2020 and 2024. The result is the first global multi-temporal dataset of road pavedness and width for 9.2 million km of critical arterial roads, achieving 95.5% coverage where nearly half the network was previously unclassified. This dataset reveals a powerful multi-scale geography of human development. At the planetary scale, we show that the rate of change in pavedness is a robust proxy for a country's development trajectory (correlation with HDI = 0.65). At the national scale, we quantify how unpaved roads constitute a fragile backbone for economic connectivity. We further synthesize our data into a global Humanitarian Passability Matrix with direct implications for humanitarian logistics. At the local scale, case studies demonstrate the framework's versatility: in Ghana, road quality disparities expose the spatial outcomes of governance; in Pakistan, the data identifies infrastructure vulnerabilities to inform climate resilience planning. Together, this work delivers both a foundational dataset and a multi-scale analytical framework for monitoring global infrastructure, from the dynamics of national development to the realities of local governance, climate adaptation, and equity. Unlike traditional proxies such as nighttime lights, which reflect economic activity, road surface data directly measures the physical infrastructure that underpins prosperity and resilience - at higher spatial resolution.

[463]  arXiv:2512.04099 (cross-list from q-fin.ST) [pdf, ps, other]
Title: Partial multivariate transformer as a tool for cryptocurrencies time series prediction
Comments: Accepted for publication in the proceedings of ICTAI 2025
Subjects: Statistical Finance (q-fin.ST); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Trading and Market Microstructure (q-fin.TR)

Forecasting cryptocurrency prices is hindered by extreme volatility and a methodological dilemma between information-scarce univariate models and noise-prone full-multivariate models. This paper investigates a partial-multivariate approach to balance this trade-off, hypothesizing that a strategic subset of features offers superior predictive power. We apply the Partial-Multivariate Transformer (PMformer) to forecast daily returns for BTCUSDT and ETHUSDT, benchmarking it against eleven classical and deep learning models. Our empirical results yield two primary contributions. First, we demonstrate that the partial-multivariate strategy achieves significant statistical accuracy, effectively balancing informative signals with noise. Second, we experiment and discuss an observable disconnect between this statistical performance and practical trading utility; lower prediction error did not consistently translate to higher financial returns in simulations. This finding challenges the reliance on traditional error metrics and highlights the need to develop evaluation criteria more aligned with real-world financial objectives.

[464]  arXiv:2512.04100 (cross-list from eess.SP) [pdf, ps, other]
Title: ReVeal-MT: A Physics-Informed Neural Network for Multi-Transmitter Radio Environment Mapping
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

Accurately mapping the radio environment (e.g., identifying wireless signal strength at specific frequency bands and geographic locations) is crucial for efficient spectrum sharing, enabling Secondary Users~(SUs) to access underutilized spectrum bands while protecting Primary Users~(PUs). While existing models have made progress, they often degrade in performance when multiple transmitters coexist, due to the compounded effects of shadowing, interference from adjacent transmitters. To address this challenge, we extend our prior work on Physics-Informed Neural Networks~(PINNs) for single-transmitter mapping to derive a new multi-transmitter Partial Differential Equation~(PDE) formulation of the Received Signal Strength Indicator~(RSSI). We then propose \emph{ReVeal-MT} (Re-constructor and Visualizer of Spectrum Landscape for Multiple Transmitters), a novel PINN which integrates the multi-source PDE residual into a neural network loss function, enabling accurate spectrum landscape reconstruction from sparse RF sensor measurements. ReVeal-MT is validated using real-world measurements from the ARA wireless living lab across rural and suburban environments, and benchmarked against 3GPP and ITU-R channel models and a baseline PINN model for a single transmitter use-case. Results show that ReVeal-MT achieves substantial accuracy gains in multi-transmitter scenarios, e.g., achieving an RMSE of only 2.66\,dB with as few as 45 samples over a 370-square-kilometer region, while maintaining low computational complexity. These findings demonstrate that ReVeal-MT significantly advances radio environment mapping under realistic multi-transmitter conditions, with strong potential for enabling fine-grained spectrum management and precise coexistence between PUs and SUs.

[465]  arXiv:2512.04133 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Effective permeabilities for flow through anisotropic microscopic geometries
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)

This work develops a computational and theoretical framework for determining effective permeabilities in anisotropic microscopic geometries containing dense, fibre-like obstacles, motivated by the need to model flow in coiled aneurysm domains accurately. Building on homogenisation theory and fully resolved simulations in Representative Elementary Volumes (REVs), we validate the permeability model introduced in [C. Boutin, Study of permeability by periodic and self-consistent homogenisation. Eur. J. Mech. A Solids, 19(4):603-632, 2000] and propose a systematic methodology for capturing the directional variations induced by fibre orientation. The resulting permeability tensors are incorporated into macroscopic flow simulations based on the Darcy equation, enabling direct comparison of anisotropic and isotropic permeability models across several benchmark configurations. Our findings show that anisotropy has a significant impact on local flow direction and magnitude, generating directional permeability contrasts which cannot be reproduced by classical isotropic approximations. By integrating coil-induced microstructural effects into continuum-scale hemodynamic models, the proposed approach enables more realistic assessment of post-treatment aneurysm flow behaviour. Beyond this clinical application, the framework is broadly applicable to other biomedical and engineering systems involving fibrous or filamentous porous microstructures.

[466]  arXiv:2512.04149 (cross-list from hep-ph) [pdf, ps, other]
Title: Enhancing next token prediction based pre-training for jet foundation models
Subjects: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Data Analysis, Statistics and Probability (physics.data-an)

Next token prediction is an attractive pre-training task for jet foundation models, in that it is simulation free and enables excellent generative capabilities that can transfer across datasets. Here we study multiple improvements to next token prediction, building on the initial work of OmniJet-$\alpha$. Instead of tokenizing particles and subsequently only using the token-ID as the model input for both the generative and the classification task, we adopt a hybrid setup, which allows us to use continuous feature vectors as model input while only using token-IDs in the next token prediction target. Secondly, we explore a combined pre-training strategy that combines masked particle modeling and generative learning objectives. Taken together, these changes greatly improve the performance in downstream classification tasks without any loss in generative performance.

[467]  arXiv:2512.04169 (cross-list from quant-ph) [pdf, ps, other]
Title: Exploiting Movable Logical Qubits for Lattice Surgery Compilation
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

Lattice surgery with two-dimensional quantum error correcting codes is among the leading schemes for fault-tolerant quantum computation, motivated by superconducting hardware architectures. In conventional lattice surgery compilation schemes, logical circuits are compiled following a place-and-route paradigm, where logical qubits remain statically fixed in space throughout the computation. In this work, we introduce a paradigm shift by exploiting movable logical qubits via teleportation during the logical lattice surgery CNOT gate. Focusing on lattice surgery with the color code, we propose a proof-of-concept compilation scheme that leverages this capability. Numerical simulations show that the proposed approach can substantially reduce the routed circuit depth compared to standard place-and-route compilation techniques. Our results demonstrate that optimizations based on movable logical qubits are not limited to architectures with physically movable qubits, such as neutral atoms or trapped ions - they are also readily applicable to superconducting quantum hardware. An open-source implementation of our method is available on GitHub https://github.com/munich-quantum-toolkit/qecc.

[468]  arXiv:2512.04182 (cross-list from eess.SP) [pdf, ps, other]
Title: A Spatial Array for Spectrally Agile Wireless Processing
Comments: Accepted and presented in 2025 Asilomar Conference on Signals, Systems, and Computers
Subjects: Signal Processing (eess.SP); Hardware Architecture (cs.AR)

Massive MIMO is a cornerstone of next-generation wireless communication, offering significant gains in capacity, reliability, and energy efficiency. However, to meet emerging demands such as high-frequency operation, wide bandwidths, co-existence, integrated sensing, and resilience to dynamic interference, future systems must exhibit both scalability and spectral agility. These requirements place increasing pressure on the underlying processing hardware to be both efficient and reconfigurable. This paper proposes a custom-designed spatial array architecture that serves as a reconfigurable, general-purpose core optimized for a class of wireless kernels that commonly arise in diverse communications and sensing tasks. The proposed spatial array is evaluated against specialized cores for each kernel using High-Level Synthesis (HLS). Both the reconfigurable and specialized designs are synthesized in a 32 nm process to assess latency, throughput, area, and power in realistic processes. The results identify conditions under which general-purpose systolic architectures can approach the efficiency of specialized cores, thereby paving the way toward more scalable and agile systems.

[469]  arXiv:2512.04194 (cross-list from math.OC) [pdf, ps, other]
Title: Probabilistic Safety under Arbitrary Disturbance Distributions using Piecewise-Affine Control Barrier Functions
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We propose a simple safety filter design for stochastic discrete-time systems based on piecewise affine probabilistic control barrier functions, providing an appealing balance between modeling flexibility and computational complexity. Exact evaluation of the safety filter consists of solving a mixed-integer quadratic program (MIQP) if the dynamics are control-affine (or a mixed-integer nonlinear program in general). We propose a heuristic search method that replaces this by a small number of small-scale quadratic programs (QPs), or nonlinear programs (NLPs) respectively. The proposed approach provides a flexible framework in which arbitrary (data-driven) quantile estimators can be used to bound the probability of safety violations. Through extensive numerical experiments, we demonstrate improvements in conservatism and computation time with respect to existing methods, and we illustrate the flexibility of the method for modeling complex safety sets. Supplementary material can be found at https://mathijssch.github.io/ecc26-supplementary/.

[470]  arXiv:2512.04204 (cross-list from astro-ph.IM) [pdf, ps, other]
Title: Machine Phenomenology: A Simple Equation Classifying Fast Radio Bursts
Comments: 19 pages, 9 figures, 3 tables. Submitted to SCIENCE CHINA Physics, Mechanics & Astronomy
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); High Energy Astrophysical Phenomena (astro-ph.HE); Artificial Intelligence (cs.AI)

This work shows how human physical reasoning can guide machine-driven symbolic regression toward discovering empirical laws from observations. As an example, we derive a simple equation that classifies fast radio bursts (FRBs) into two distinct Gaussian distributions, indicating the existence of two physical classes. This human-AI workflow integrates feature selection, dimensional analysis, and symbolic regression: deep learning first analyzes CHIME Catalog 1 and identifies six independent parameters that collectively provide a complete description of FRBs; guided by Buckingham-$\pi$ analysis and correlation analysis, humans then construct dimensionless groups; finally, symbolic regression performed by the machine discovers the governing equation. When applied to the newer CHIME Catalog, the equation produces consistent results, demonstrating that it captures the underlying physics. This framework is applicable to a broad range of scientific domains.

[471]  arXiv:2512.04211 (cross-list from quant-ph) [pdf, ps, other]
Title: Simulation of a Heterogeneous Quantum Network
Comments: 9 pages, 8 figures
Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI)

Quantum networks are expected to be heterogeneous systems, combining distinct qubit platforms, photon wavelengths, and device timescales to achieve scalable, multiuser connectivity. Building and iterating on such systems is costly and slow, motivating hardware-faithful simulations to explore architecture design space and justify implementation decisions. This paper presents a framework for simulating heterogeneous quantum networks based on SeQUeNCe, a discrete-event simulator of quantum networks. We introduce faithful device models for two representative platforms - Ytterbium atoms and superconducting qubits. On top of these models, we implement entanglement generation and entanglement swapping protocols for time-bin encoded photons that account for disparate clock rates and quantum frequency conversion and transducer losses/noise brought by the heterogeneity. Using extensive simulations, we map the rate-fidelity trade space and identify the dominant bottlenecks unique to heterogeneous systems. The models are open source and extensible, enabling reproducible evaluation of future heterogeneous designs and protocols.

[472]  arXiv:2512.04216 (cross-list from quant-ph) [pdf, ps, other]
Title: Maestro: Intelligent Execution for Quantum Circuit Simulation
Subjects: Quantum Physics (quant-ph); Mathematical Software (cs.MS); Software Engineering (cs.SE)

Quantum circuit simulation remains essential for developing and validating quantum algorithms, especially as current quantum hardware is limited in scale and quality. However, the growing diversity of simulation methods and software tools creates a high barrier to selecting the most suitable backend for a given circuit. We introduce Maestro, a unified interface for quantum circuit simulation that integrates multiple simulation paradigms - state vector, MPS, tensor network, stabilizer, GPU-accelerated, and p-block methods - under a single API. Maestro includes a predictive runtime model that automatically selects the optimal simulator based on circuit structure and available hardware, and applies backend-specific optimizations such as multiprocessing, GPU execution, and improved sampling. Benchmarks across heterogeneous workloads demonstrate that Maestro outperforms individual simulators in both single-circuit and large batched settings, particularly in high-performance computing environments. Maestro provides a scalable, extensible platform for quantum algorithm research, hybrid quantum-classical workflows, and emerging distributed quantum computing architectures.

[473]  arXiv:2512.04232 (cross-list from q-bio.PE) [pdf, ps, other]
Title: Decentralized Social Media and Artificial Intelligence in Digital Public Health Monitoring
Subjects: Populations and Evolution (q-bio.PE); Computers and Society (cs.CY)

Digital public health monitoring has long relied on data from major social media platforms. Twitter was once an indispensable resource for tracking disease outbreaks and public sentiment in real time. Researchers used Twitter to monitor everything from influenza spread to vaccine hesitancy, demonstrating that social media data can serve as an early-warning system for emerging health threats. However, recent shifts in the social media landscape have challenged this data-driven paradigm. Platform policy changes, exemplified by Twitter's withdrawal of free data access, now restrict the very data that fueled a decade of digital public health research. At the same time, advances in artificial intelligence, particularly large language models (LLMs), have dramatically expanded our capacity to analyze large-scale textual data across languages and contexts. This presents a paradox: we possess powerful new AI tools to extract insights from social media, but face dwindling access to the data. In this viewpoint, we examine how digital public health monitoring is navigating these countervailing trends. We discuss the rise of decentralized social networks like Mastodon and Bluesky as alternative data sources, weighing their openness and ethical alignment with research against their smaller scale and potential biases. Ultimately, we argue that digital public health surveillance must adapt by embracing new platforms and methodologies, focusing on common diseases and broad signals that remain detectable, while advocating for policies that preserve researchers' access to public data in privacy-respective ways.

[474]  arXiv:2512.04298 (cross-list from quant-ph) [pdf, ps, other]
Title: Experimental Sensitivity Enhancement of a Quantum Rydberg Atom-Based RF Receiver with a Metamaterial GRIN Lens
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)

We experimentally demonstrate enhanced sensitivity of an atom-based Rydberg radio frequency (RF) receiver integrated with a gradient refractive index (GRIN) Luneburg-type metamaterial lens. By analyzing the electromagnetically induced transparency (EIT) effect in Cesium vapor, we compare receiver performance with and without the GRIN lens under a 2.2~GHz and a 3.6~GHz far-field excitation. Our measurements reveal a significant amplification of the EIT transparency window when the lens is introduced, consistent with the theoretical prediction that the local E-field enhancement at the vapor cell reduces the minimum detectable electric field and increases the signal-to-noise ratio (SNR) of the Rydberg RF receiver. This experimental validation highlights the potential of metamaterial-assisted quantum sensing to overcome the inherent bandwidth and sensitivity limitations of bare Rydberg receivers for a variety of applications, such as electromagnetic compatibility (EMC) testing, quantum radar, and wireless communications.

[475]  arXiv:2512.04312 (cross-list from physics.optics) [pdf, ps, other]
Title: Universal Quantum Interconnects via Phase-Coherent Four-Wave Mixing
Subjects: Optics (physics.optics); Systems and Control (eess.SY); Applied Physics (physics.app-ph); Quantum Physics (quant-ph)

Quantum transduction, which enables the coherent conversion of quantum information between disparate physical platforms, is a cornerstone for realizing scalable and interoperable quantum networks. Among various approaches, parametric frequency mixing processes such as four-wave mixing (FWM) offer a promising pathway toward efficient and low-noise transduction. In this work, we demonstrate the feasibility of coherent quantum state transfer by indirectly verifying high-fidelity wavefunction's phase mapping (>99%) from the input field to the generated output field wave. Using a gas-filled hollow-core capillary fiber, we systematically investigate spectral phase evolution across a broad range, including infrared (IR) to ultraviolet (UV) transitions, as well as conversions from telecom-band (1550 nm) to visible (516 nm) and deep-UV (308 nm) wavelengths. Our results reveal that strong phase coherence can be maintained throughout these diverse conversion regimes. Because quantum properties such as coherence and entanglement are intrinsically encoded in both the amplitude and phase of a photonic wavefunction, preserving spectral phase is essential for faithful quantum information transfer. We further show that efficient and phase-preserving transduction can be achieved by tuning system parameters, offering valuable insights into nonlinear coupling dynamics. These findings establish a promising foundation for advancing FWM-based quantum transduction schemes and open new avenues for integrating heterogeneous quantum systems across wide spectral domains within future quantum communication networks.

[476]  arXiv:2512.04322 (cross-list from physics.chem-ph) [pdf, ps, other]
Title: Quantum Chemistry Simulation of Dibenzothiophene for Asphalt Aging Analysis
Authors: Om Tailor
Comments: Runner-up Submission to QWC MITRE Industry Challenge
Subjects: Chemical Physics (physics.chem-ph); Emerging Technologies (cs.ET)

This paper presents the execution and analysis of a comprehensive quantum chemistry pipeline for gathering actionable insight into asphalt aging mechanisms through the study of dibenzothiophene (DBT), a key sulfur-containing compound in asphalt binders. Using advanced quantum algorithms, specifically Variational Quantum Eigensolver (VQE) with k-UpCCGSD and ADAPT-VQE ansatze, we achieved ground state energy calculations with accuracies reaching -864.69 Ha. Our implementation demonstrates quantum advantages in handling strongly correlated electron systems while providing actionable insights for designing oxidation-resistant asphalt formulations. The work also establishes a scalable framework for quantum-enhanced materials design.

[477]  arXiv:2512.04345 (cross-list from quant-ph) [pdf, ps, other]
Title: The operator layer cake theorem is equivalent to Frenkel's integral formula
Comments: This note is related to arXiv:2208.12194, arXiv:2507.06232, and arXiv:2507.07065
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph); Functional Analysis (math.FA)

The operator layer cake theorem provides an integral representation for the directional derivative of the operator logarithm in terms of a family of projections [arXiv:2507.06232]. Recently, the related work [arXiv:2507.07065] showed that the theorem gives an alternative proof to Frenkel's integral formula for Umegaki's relative entropy [Quantum, 7:1102 (2023)]. In this short note, we find a converse implication, demonstrating that the operator layer cake theorem is equivalent to Frenkel's integral formula.

[478]  arXiv:2512.04365 (cross-list from eess.SP) [pdf, ps, other]
Title: RRAM-Based Analog Matrix Computing for Massive MIMO Signal Processing: A Review
Authors: Pushen Zuo, Zhong Sun
Subjects: Signal Processing (eess.SP); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)

Resistive random-access memory (RRAM) provides an excellent platform for analog matrix computing (AMC), enabling both matrix-vector multiplication (MVM) and the solution of matrix equations through open-loop and closed-loop circuit architectures. While RRAM-based AMC has been widely explored for accelerating neural networks, its application to signal processing in massive multiple-input multiple-output (MIMO) wireless communication is rapidly emerging as a promising direction. In this Review, we summarize recent advances in applying AMC to massive MIMO, including DFT/IDFT computation for OFDM modulation and demodulation using MVM circuits; MIMO detection and precoding using MVM-based iterative algorithms; and rapid one-step solutions enabled by matrix inversion (INV) and generalized inverse (GINV) circuits. We also highlight additional opportunities, such as AMC-based compressed-sensing recovery for channel estimation and eigenvalue circuits for leakage-based precoding. Finally, we outline key challenges, including RRAM device reliability, analog circuit precision, array scalability, and data conversion bottlenecks, and discuss the opportunities for overcoming these barriers. With continued progress in device-circuit-algorithm co-design, RRAM-based AMC holds strong promise for delivering high-efficiency, high-reliability solutions to (ultra)massive MIMO signal processing in the 6G era.

[479]  arXiv:2512.04371 (cross-list from math.PR) [pdf, ps, other]
Title: Constructive Approximation under Carleman's Condition, with Applications to Smoothed Analysis
Subjects: Probability (math.PR); Machine Learning (cs.LG); Functional Analysis (math.FA); Statistics Theory (math.ST); Machine Learning (stat.ML)

A classical result of Carleman, based on the theory of quasianalytic functions, shows that polynomials are dense in $L^2(\mu)$ for any $\mu$ such that the moments $\int x^k d\mu$ do not grow too rapidly as $k \to \infty$. In this work, we develop a fairly tight quantitative analogue of the underlying Denjoy-Carleman theorem via complex analysis, and show that this allows for nonasymptotic control of the rate of approximation by polynomials for any smooth function with polynomial growth at infinity. In many cases, this allows us to establish $L^2$ approximation-theoretic results for functions over general classes of distributions (e.g., multivariate sub-Gaussian or sub-exponential distributions) which were previously known only in special cases. As one application, we show that the Paley--Wiener class of functions bandlimited to $[-\Omega,\Omega]$ admits superexponential rates of approximation over all strictly sub-exponential distributions, which leads to a new characterization of the class. As another application, we solve an open problem recently posed by Chandrasekaran, Klivans, Kontonis, Meka and Stavropoulos on the smoothed analysis of learning, and also obtain quantitative improvements to their main results and applications.

[480]  arXiv:2512.04384 (cross-list from econ.TH) [pdf, ps, other]
Title: A 'Turtle Model' of Food System Transformations: Embracing Citizens' Diverse Values and Knowledge in Change Processes
Comments: 4 figures
Subjects: Theoretical Economics (econ.TH); Social and Information Networks (cs.SI)

We explore the challenges and opportunities of transitioning towards sustainable food systems through the lens of democratic food governance fostering inclusive and systemic transformation. Drawing on concepts of wicked problems and systems thinking, we propose a theory of change represented as a 'turtle model' that embraces the diversity of citizens' values and knowledge to highlight multiple avenues of transformation. As quadruple helix innovation and governance hubs, cities can be hotspots for food system transformations. We illustrate this for Dublin, Ireland, where local citizens' value-based food identities were galvanized to activate ecological awareness and promote sustainable seafood consumption. Within this democratic food governance framework, approaches such as open science, transdisciplinarity, and citizen engagement are fit-for-purpose to engage diverse food actors from government, industry, academia, and civil society in shared dialogue and action to transform food systems.

[481]  arXiv:2512.04391 (cross-list from quant-ph) [pdf, ps, other]
Title: Adversarial Limits of Quantum Certification: When Eve Defeats Detection
Authors: Davut Emre Tasar
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)

Security of quantum key distribution (QKD) relies on certifying that observed correlations arise from genuine quantum entanglement rather than eavesdropper manipulation. Theoretical security proofs assume idealized conditions, practical certification must contend with adaptive adversaries who optimize their attack strategies against detection systems. Established fundamental adversarial limits for quantum certification using Eve GAN, a generative adversarial network trained to produce classical correlations indistinguishable from quantum. Our central finding: when Eve interpolates her classical correlations with quantum data at mixing parameter, all tested detection methods achieve ROC AUC = 0.50, equivalent to random guessing. This means an eavesdropper needs only 5% classical admixture to completely evade detection. Critically, we discover that same distribution calibration a common practice in prior certification studies inflates detection performance by 44 percentage points compared to proper cross distribution evaluation, revealing a systematic flaw that may have led to overestimated security claims. Analysis of Popescu Rohrlich (PR Box) regime identifies a sharp phase transition at CHSH S = 2.05: below this value, no statistical method distinguishes classical from quantum correlations; above it, detection probability increases monotonically. Hardware validation on IBM Quantum demonstrates that Eve-GAN achieves CHSH = 2.736, remarkably exceeding real quantum hardware performance (CHSH = 2.691), illustrating that classical adversaries can outperform noisy quantum systems on standard certification metrics. These results have immediate implications for QKD security: adversaries maintaining 95% quantum fidelity evade all tested detection methods. We provide corrected methodology using cross-distribution calibration and recommend mandatory adversarial testing for quantum security claims.

[482]  arXiv:2512.04392 (cross-list from stat.ML) [pdf, ps, other]
Title: Informative missingness and its implications in semi-supervised learning
Comments: 1
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance prediction performance. This defines an incomplete-data problem, which statistically can be formulated within the likelihood framework for finite mixture models that can be fitted using the expectation-maximisation (EM) algorithm. Ideally, one would prefer a completely labelled sample, as one would anticipate that a labelled observation provides more information than an unlabelled one. However, when the mechanism governing label absence depends on the observed features or the class labels or both, the missingness indicators themselves contain useful information. In certain situations, the information gained from modelling the missing-label mechanism can even outweigh the loss due to missing labels, yielding a classifier with a smaller expected error than one based on a completely labelled sample analysed. This improvement arises particularly when class overlap is moderate, labelled data are sparse, and the missingness is informative. Modelling such informative missingness thus offers a coherent statistical framework that unifies likelihood-based inference with the behaviour of empirical SSL methods.

[483]  arXiv:2512.04405 (cross-list from eess.SP) [pdf, ps, other]
Title: Towards 6G Native-AI Edge Networks: A Semantic-Aware and Agentic Intelligence Paradigm
Comments: submitted to Digital Communications and Networks
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

The evolution toward sixth-generation wireless systems positions intelligence as a native network capability, fundamentally transforming the design of radio access networks (RANs). Within this vision, Semantic-native communication and agentic intelligence are expected to play central roles. SemCom departs from bit-level fidelity and instead emphasizes task-oriented meaning exchange, enabling compact SC and introducing new performance measures such as semantic fidelity and task success rate. Agentic intelligence endows distributed RAN entities with goal-driven autonomy, reasoning, planning, and multi-agent collaboration, increasingly supported by foundation models and knowledge graphs. In this work, we first introduce the conceptual foundations of SemCom and agentic networking, and discuss why existing AI-driven O-RAN solutions remain largely bit-centric and task-siloed. We then present a unified taxonomy that organizes recent research along three axes: i) semantic abstraction level (symbol/feature/intent/knowledge), ii) agent autonomy and coordination granularity (single-, multi-, and hierarchical-agent), and iii) RAN control placement across PHY/MAC, near-real-time RIC, and non-real-time RIC. Based on this taxonomy, we systematically introduce enabling technologies including task-oriented semantic encoders/decoders, multi-agent reinforcement learning, foundation-model-assisted RAN agents, and knowledge-graph-based reasoning for cross-layer awareness. Representative 6G use cases, such as immersive XR, vehicular V2X, and industrial digital twins, are analyzed to illustrate the semantic-agentic convergence in practice. Finally, we identify open challenges in semantic representation standardization, scalable trustworthy agent coordination, O-RAN interoperability, and energy-efficient AI deployment, and outline research directions toward operational semantic-agentic AI-RAN.

[484]  arXiv:2512.04434 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Predicting Time-Dependent Flow Over Complex Geometries Using Operator Networks
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)

Fast, geometry-generalizing surrogates for unsteady flow remain challenging. We present a time-dependent, geometry-aware Deep Operator Network that predicts velocity fields for moderate-Re flows around parametric and non-parametric shapes. The model encodes geometry via a signed distance field (SDF) trunk and flow history via a CNN branch, trained on 841 high-fidelity simulations. On held-out shapes, it attains $\sim 5\%$ relative L2 single-step error and up to 1000X speedups over CFD. We provide physics-centric rollout diagnostics, including phase error at probes and divergence norms, to quantify long-horizon fidelity. These reveal accurate near-term transients but error accumulation in fine-scale wakes, most pronounced for sharp-cornered geometries. We analyze failure modes and outline practical mitigations. Code, splits, and scripts are openly released at: https://github.com/baskargroup/TimeDependent-DeepONet to support reproducibility and benchmarking.

[485]  arXiv:2512.04450 (cross-list from physics.comp-ph) [pdf, ps, other]
Title: On the Construction of High-Order and Exact Pressure Equilibrium Schemes for Arbitrary Equations of State
Comments: 29 pages, 13 figures
Subjects: Computational Physics (physics.comp-ph); Numerical Analysis (math.NA)

Typical fully conservative discretizations of the Euler compressible single or multi-component fluid equations governed by a real-fluid equation of state exhibit spurious pressure oscillations due to the nonlinearity of the thermodynamic relation between pressure, density, and internal energy. A fully conservative, pressure-equilibrium preserving method and a high-order, fully conservative, approximate pressure-equilibrium preserving method are presented. Both methods are general and can handle an arbitrary equation of state and arbitrary number of species. Unlike existing approaches to discretize the multi-component Euler equations, we do not introduce non conservative updates, overspecified equations, or design for a specific equation of state. The proposed methods are demonstrated on inviscid smooth interface advection problems governed by three equations of state: ideal-gas, stiffened-gas, and van der Waals where we show orders of magnitude reductions in spurious pressure oscillations compared to existing schemes.

[486]  arXiv:2512.04452 (cross-list from physics.ao-ph) [pdf, ps, other]
Title: NORi: An ML-Augmented Ocean Boundary Layer Parameterization
Comments: 48 pages, 16 figures, submitted to Journal of Advances in Modeling Earth Systems (JAMES)
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)

NORi is a machine-learned (ML) parameterization of ocean boundary layer turbulence that is physics-based and augmented with neural networks. NORi stands for neural ordinary differential equations (NODEs) Richardson number (Ri) closure. The physical parameterization is controlled by a Richardson number-dependent diffusivity and viscosity. The NODEs are trained to capture the entrainment through the base of the boundary layer, which cannot be represented with a local diffusive closure. The parameterization is trained using large-eddy simulations in an "a posteriori" fashion, where parameters are calibrated with a loss function that explicitly depends on the actual time-integrated variables of interest rather than the instantaneous subgrid fluxes, which are inherently noisy. NORi is designed for the realistic nonlinear equation of state of seawater and demonstrates excellent prediction and generalization capabilities in capturing entrainment dynamics under different convective strengths, oceanic background stratifications, rotation strengths, and surface wind forcings. NORi is numerically stable for at least 100 years of integration time in large-scale simulations, despite only being trained on 2-day horizons, and can be run with time steps as long as one hour. The highly expressive neural networks, combined with a physically-rigorous base closure, prove to be a robust paradigm for designing parameterizations for climate models where data requirements are drastically reduced, inference performance can be directly targeted and optimized, and numerical stability is implicitly encouraged during training.

[487]  arXiv:2512.04497 (cross-list from quant-ph) [pdf, ps, other]
Title: QReach: A Reachability Analysis Tool for Quantum Markov Chains
Comments: 15 pages, 5 figures
Journal-ref: Computer Aided Verification (CAV 2024), LNCS 14683, 2024, pp. 520-532
Subjects: Quantum Physics (quant-ph); Logic in Computer Science (cs.LO)

We present QReach, the first reachability analysis tool for quantum Markov chains based on decision diagrams CFLOBDD (presented at CAV 2023). QReach provides a novel framework for finding reachable subspaces, as well as a series of model-checking subprocedures like image computation. Experiments indicate its practicality in verification of quantum circuits and algorithms. QReach is expected to play a central role in future quantum model checkers.

[488]  arXiv:2512.04523 (cross-list from math.OC) [pdf, ps, other]
Title: An accelerated proximal bundle method for convex optimization
Comments: 21 pages, 1 figure
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

The proximal bundle method (PBM) is a powerful and widely used approach for minimizing nonsmooth convex functions. However, for smooth objectives, its best-known convergence rate remains suboptimal, and whether PBM can be accelerated remains open. In this work, we present the first accelerated proximal bundle method that achieves the optimal $\mathscr{O}(1/\sqrt{\epsilon})$ iteration complexity for obtaining an $\epsilon$-accurate solution in smooth convex optimization. The proposed method is conceptually simple, which differs from Nesterov's accelerated gradient descent by only a single line and retains all key structural properties of the classical PBM. In particular, it relies on the same minimal assumptions on model approximations and preserves the standard bundle testing criterion. Numerical experiments confirm the accelerated $\mathscr{O}(1/\sqrt{\epsilon})$ convergence rate predicted by our theory.

[489]  arXiv:2512.04574 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Can Explicit Subgrid Models Enhance Implicit LES Simulations? A GPU-Oriented High-Order-Solver Perspective
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)

High-order Discontinuous Galerkin (DG) methods offer excellent accuracy for turbulent flow simulations, especially when implemented on GPU-oriented architectures that favor very high polynomial orders. On modern GPUs, high-order polynomial evaluations cost roughly the same as low-order ones, provided the DG degrees of freedom fit within device memory. However, even with high-order discretizations, simulations at high Reynolds numbers still require some level of under-resolution, leaving them sensitive to numerical dissipation and aliasing effects. Here, we investigate the interplay between intrinsic DG dissipation mechanisms (implicit dissipation) -- in particular split forms and Riemann solvers -- and explicit subgrid-scale models in Large Eddy Simulations (LES). Using the three-dimensional Taylor--Green vortex at $Re = 1600$ and an inviscid case ($Re \to \infty$), we evaluate kinetic energy dissipation, spectral accuracy, and numerical stability.
Our results show that when stability for under-resolved turbulence is ensured through split-forms (energy- or entropy-stable) schemes, subgrid-scale (SGS) LES models are not strictly necessary. At moderate Reynolds numbers, when the spatial resolution is sufficient to capture the relevant turbulence scales (i.e., in well-resolved LES), adding SGS models does not improve accuracy because the wavenumber range where they act overlaps with the inherent numerical dissipation of the DG scheme. In contrast, when the resolution is insufficient, as is typical at high Reynolds numbers, explicit subgrid-scale models complement the numerical dissipation and enhance accuracy by removing the excess energy that numerical fluxes alone cannot dissipate.
These findings provide practical guidance for choosing numerical strategies in high-order turbulence simulations.

[490]  arXiv:2512.04642 (cross-list from q-bio.NC) [pdf, ps, other]
Title: Limit cycles for speech
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL)

Rhythmic fluctuations in acoustic energy and accompanying neuronal excitations in cortical oscillations are characteristic of human speech, yet whether a corresponding rhythmicity inheres in the articulatory movements that generate speech remains unclear. The received understanding of speech movements as discrete, goal-oriented actions struggles to make contact with the rhythmicity findings. In this work, we demonstrate that an unintuitive -- but no less principled than the conventional -- representation for discrete movements reveals a pervasive limit cycle organization and unlocks the recovery of previously inaccessible rhythmic structure underlying the motor activity of speech. These results help resolve a time-honored tension between the ubiquity of biological rhythmicity and discreteness in speech, the quintessential human higher function, by revealing a rhythmic organization at the most fundamental level of individual articulatory actions.

[491]  arXiv:2512.04663 (cross-list from quant-ph) [pdf, ps, other]
Title: Fermionic neural Gibbs states
Subjects: Quantum Physics (quant-ph); Strongly Correlated Electrons (cond-mat.str-el); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

We introduce fermionic neural Gibbs states (fNGS), a variational framework for modeling finite-temperature properties of strongly interacting fermions. fNGS starts from a reference mean-field thermofield-double state and uses neural-network transformations together with imaginary-time evolution to systematically build strong correlations. Applied to the doped Fermi-Hubbard model, a minimal lattice model capturing essential features of strong electronic correlations, fNGS accurately reproduces thermal energies over a broad range of temperatures, interaction strengths, even at large dopings, for system sizes beyond the reach of exact methods. These results demonstrate a scalable route to studying finite-temperature properties of strongly correlated fermionic systems beyond one dimension with neural-network representations of quantum states.

[492]  arXiv:2512.04671 (cross-list from physics.soc-ph) [pdf, ps, other]
Title: Evolutionary Dynamics Based on Reputation in Networked Populations with Game Transitions
Comments: 13 pages
Journal-ref: IEEE Transactions on Signal and Information Processing over Networks, 2025, 1-13
Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)

The environment undergoes perpetual changes that are influenced by a combination of endogenous and exogenous factors. Consequently, it exerts a substantial influence on an individual's physical and psychological state, directly or indirectly affecting the evolutionary dynamics of a population described by a network, which in turn can also alter the environment. Furthermore, the evolution of strategies, shaped by reputation, can diverge due to variations in multiple factors. To explore the potential consequences of the mentioned situations, this paper studies how game and reputation dynamics alter the evolution of cooperation. Concretely, game transitions are determined by individuals' behaviors and external uncontrollable factors. The cooperation level of its neighbors reflects individuals' reputation, and further, a general fitness function regarding payoff and reputation is provided. Within the context of the donation game, we investigate the relevant outcomes associated with the aforementioned evolutionary process, considering various topologies for distinct interactions. Additionally, a biased mutation is introduced to gain a deeper insight into the strategy evolution. We detect a substantial increase in the cooperation level through intensive simulations, and some important phenomena are observed, e.g., the unilateral increase of the value of prosocial behavior limits promotion in cooperative behavior in square-lattice networks.

[493]  arXiv:2512.04690 (cross-list from stat.ML) [pdf, ps, other]
Title: Recurrent Neural Networks with Linear Structures for Electricity Price Forecasting
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We present a novel recurrent neural network architecture designed explicitly for day-ahead electricity price forecasting, aimed at improving short-term decision-making and operational management in energy systems. Our combined forecasting model embeds linear structures, such as expert models and Kalman filters, into recurrent networks, enabling efficient computation and enhanced interpretability. The design leverages the strengths of both linear and non-linear model structures, allowing it to capture all relevant stylised price characteristics in power markets, including calendar and autoregressive effects, as well as influences from load, renewable energy, and related fuel and carbon markets. For empirical testing, we use hourly data from the largest European electricity market spanning 2018 to 2025 in a comprehensive forecasting study, comparing our model against state-of-the-art approaches, particularly high-dimensional linear and neural network models. The proposed model achieves approximately 12% higher accuracy than leading benchmarks. We evaluate the contributions of the interpretable model components and conclude on the impact of combining linear and non-linear structures.

[494]  arXiv:2512.04696 (cross-list from stat.ML) [pdf, ps, other]
Title: Provable FDR Control for Deep Feature Selection: Deep MLPs and Beyond
Authors: Kazuma Sawaya
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We develop a flexible feature selection framework based on deep neural networks that approximately controls the false discovery rate (FDR), a measure of Type-I error. The method applies to architectures whose first layer is fully connected. From the second layer onward, it accommodates multilayer perceptrons (MLPs) of arbitrary width and depth, convolutional and recurrent networks, attention mechanisms, residual connections, and dropout. The procedure also accommodates stochastic gradient descent with data-independent initializations and learning rates. To the best of our knowledge, this is the first work to provide a theoretical guarantee of FDR control for feature selection within such a general deep learning setting.
Our analysis is built upon a multi-index data-generating model and an asymptotic regime in which the feature dimension $n$ diverges faster than the latent dimension $q^{*}$, while the sample size, the number of training iterations, the network depth, and hidden layer widths are left unrestricted. Under this setting, we show that each coordinate of the gradient-based feature-importance vector admits a marginal normal approximation, thereby supporting the validity of asymptotic FDR control. As a theoretical limitation, we assume $\mathbf{B}$-right orthogonal invariance of the design matrix, and we discuss broader generalizations. We also present numerical experiments that underscore the theoretical findings.

[495]  arXiv:2512.04697 (cross-list from math.OC) [pdf, ps, other]
Title: Continuous-time reinforcement learning for optimal switching over multiple regimes
Comments: Keywords: Optimal regime switching, multiple regimes, continuous-time reinforcement learning, system of HJB equations, policy improvement, policy iteration convergence
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Computational Finance (q-fin.CP)

This paper studies the continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider a type of exploratory formulation under entropy regularization where the agent randomizes both the timing of switches and the selection of regimes through the generator matrix of an associated continuous-time finite-state Markov chain. We establish the well-posedness of the associated system of Hamilton-Jacobi-Bellman (HJB) equations and provide a characterization of the optimal policy. The policy improvement and the convergence of the policy iterations are rigorously established by analyzing the system of equations. We also show the convergence of the value function in the exploratory formulation towards the value function in the classical formulation as the temperature parameter vanishes. Finally, a reinforcement learning algorithm is devised and implemented by invoking the policy evaluation based on the martingale characterization. Our numerical examples with the aid of neural networks illustrate the effectiveness of the proposed RL algorithm.

[496]  arXiv:2512.04710 (cross-list from quant-ph) [pdf, ps, other]
Title: Quantum-Inspired Optimization through Qudit-Based Imaginary Time Evolution
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

Imaginary-time evolution has been shown to be a promising framework for tackling combinatorial optimization problems on quantum hardware. In this work, we propose a classical quantum-inspired strategy for solving combinatorial optimization problems with integer-valued decision variables by encoding decision variables into multi-level quantum states known as qudits. This method results in a reduced number of decision variables compared to binary formulations while inherently incorporating single-association constraints. Efficient classical simulation is enabled by constraining the system to remain in a product state throughout optimization. The qudit states are optimized by applying a sequence of unitary operators that iteratively approximate the dynamics of imaginary time evolution. Unlike previous studies, we propose a gradient-based method of adaptively choosing the Hermitian operators used to generate the state evolution at each optimization step, as a means to improve the convergence properties of the algorithm. The proposed algorithm demonstrates promising results on Min-d-Cut problem with constraints, outperforming Gurobi on penalized constraint formulation, particularly for larger values of d.

[497]  arXiv:2512.04716 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Towards an AI Fluid Scientist: LLM-Powered Scientific Discovery in Experimental Fluid Mechanics
Subjects: Fluid Dynamics (physics.flu-dyn); Artificial Intelligence (cs.AI)

The integration of artificial intelligence into experimental fluid mechanics promises to accelerate discovery, yet most AI applications remain narrowly focused on numerical studies. This work proposes an AI Fluid Scientist framework that autonomously executes the complete experimental workflow: hypothesis generation, experimental design, robotic execution, data analysis, and manuscript preparation. We validate this through investigation of vortex-induced vibration (VIV) and wake-induced vibration (WIV) in tandem cylinders. Our work has four key contributions: (1) A computer-controlled circulating water tunnel (CWT) with programmatic control of flow velocity, cylinder position, and forcing parameters (vibration frequency and amplitude) with data acquisition (displacement, force, and torque). (2) Automated experiments reproduce literature benchmarks (Khalak and Williamson [1999] and Assi et al. [2013, 2010]) with frequency lock-in within 4% and matching critical spacing trends. (3) The framework with Human-in-the-Loop (HIL) discovers more WIV amplitude response phenomena, and uses a neural network to fit physical laws from data, which is 31% higher than that of polynomial fitting. (4) The framework with multi-agent with virtual-real interaction system executes hundreds of experiments end-to-end, which automatically completes the entire process of scientific research from hypothesis generation, experimental design, experimental execution, data analysis, and manuscript preparation. It greatly liberates human researchers and improves study efficiency, providing new paradigm for the development and research of experimental fluid mechanics.

[498]  arXiv:2512.04719 (cross-list from eess.SP) [pdf, ps, other]
Title: Pinching-Antenna System Design under Random LoS and NLoS Channels
Comments: 13 pages, 8 figures, 2 tables
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Pinching antennas, realized through position-adjustable radiating elements along dielectric waveguides, have emerged as a promising flexible-antenna technology thanks to their ability to dynamically reshape large-scale channel conditions. However, most existing studies focus on idealized LoS-dominated environments, overlooking the stochastic nature of realistic wireless propagation. This paper investigates a more practical multiuser pinching-antenna system under a composite probabilistic channel model that captures distance-dependent LoS blockage and NLoS scattering. To account for both efficiency and reliability aspects of communication, two complementary design metrics are considered: an average signal-to-noise ratio (SNR) metric characterizing long-term throughput and fairness, and an outage-constrained metric ensuring a prescribed reliability level. Based on these metrics, we formulate two optimization problems: the first maximizes the max-min average SNR across users, while the second maximizes a guaranteed SNR threshold under per-user outage constraints. Although both problems are inherently nonconvex, we exploit their underlying monotonic structures and develop low-complexity, bisection-based algorithms that achieve globally optimal solutions using only simple scalar evaluations. Extensive simulations validate the effectiveness of the proposed methods and demonstrate that pinching-antenna systems significantly outperform conventional fixed-antenna designs even under random LoS and NLoS channels.

[499]  arXiv:2512.04745 (cross-list from math.OC) [pdf, ps, other]
Title: Neural Policy Composition from Free Energy Minimization
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)

The ability to compose acquired skills to plan and execute behaviors is a hallmark of natural intelligence. Yet, despite remarkable cross-disciplinary efforts, a principled account of how task structure shapes gating and how such computations could be delivered in neural circuits, remains elusive. Here we introduce GateMod, an interpretable theoretically grounded computational model linking the emergence of gating to the underlying decision-making task, and to a neural circuit architecture. We first develop GateFrame, a normative framework casting policy gating into the minimization of the free energy. This framework, relating gating rules to task, applies broadly across neuroscience, cognitive and computational sciences. We then derive GateFlow, a continuous-time energy based dynamics that provably converges to GateFrame optimal solution. Convergence, exponential and global, follows from a contractivity property that also yields robustness and other desirable properties. Finally, we derive a neural circuit from GateFlow, GateNet. This is a soft-competitive recurrent circuit whose components perform local and contextual computations consistent with known dendritic and neural processing motifs. We evaluate GateMod across two different settings: collective behaviors in multi-agent systems and human decision-making in multi-armed bandits. In all settings, GateMod provides interpretable mechanistic explanations of gating and quantitatively matches or outperforms established models. GateMod offers a unifying framework for neural policy gating, linking task objectives, dynamical computation, and circuit-level mechanisms. It provides a framework to understand gating in natural agents beyond current explanations and to equip machines with this ability.

[500]  arXiv:2512.04749 (cross-list from physics.geo-ph) [pdf, ps, other]
Title: UnwrapDiff: Conditional Diffusion for Robust InSAR Phase Unwrapping
Subjects: Geophysics (physics.geo-ph); Artificial Intelligence (cs.AI)

Phase unwrapping is a fundamental problem in InSAR data processing, supporting geophysical applications such as deformation monitoring and hazard assessment. Its reliability is limited by noise and decorrelation in radar acquisitions, which makes accurate reconstruction of the deformation signal challenging. We propose a denoising diffusion probabilistic model (DDPM)-based framework for InSAR phase unwrapping, UnwrapDiff, in which the output of the traditional minimum cost flow algorithm (SNAPHU) is incorporated as conditional guidance. To evaluate robustness, we construct a synthetic dataset that incorporates atmospheric effects and diverse noise patterns, representative of realistic InSAR observations. Experiments show that the proposed model leverages the conditional prior while reducing the effect of diverse noise patterns, achieving on average a 10.11\% reduction in NRMSE compared to SNAPHU. It also achieves better reconstruction quality in difficult cases such as dyke intrusions.

[501]  arXiv:2512.04762 (cross-list from math.LO) [pdf, ps, other]
Title: Maehara Interpolation in Extensions of R-mingle
Subjects: Logic (math.LO); Logic in Computer Science (cs.LO)

We show that there are exactly five quasivarieties of Sugihara algebras with the amalgamation property, and that all of these have the relative congruence extension property. As a consequence, we obtain that the amalgamation property and transferable injections property coincide for arbitrary quasivarieties of Sugihara algebras. These results provide a complete description of arbitrary (not merely axiomatic) extensions of the logic R-mingle that have the Maehara interpolation property, and further demonstrates that the Robinson property and Maehara interpolation property coincide for arbitrary extensions of R-mingle. Further, we show that the question of whether a given finitely based extension of R-mingle has the Maehara interpolation property is decidable.

[502]  arXiv:2512.04778 (cross-list from math.OC) [pdf, ps, other]
Title: Adaptive Online Optimization for Microgrids with Renewable Energy Sources
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this paper we propose a novel adaptive online optimization algorithm tailored to the management of microgrids with high renewable energy penetration, which can be formulated as a constrained, online optimization problem. The proposed algorithm is characterized by a control-based design that applies the internal model principle, and a system identification routine tasked with identifying such internal model. In addition, in order to ensure the constraints are verified, we integrate a projection onto the constraint set. We showcase promising numerical results for the microgrid use case, highlighting in particular the enhanced adaptability of the proposed algorithm to changes in the internal model. The performance of the proposed algorithm is shown to outperform state-of-the-art alternative in the long-term, ensuring efficient management of the grid.

[503]  arXiv:2512.04795 (cross-list from math.CO) [pdf, ps, other]
Title: Unavoidable patterns and plane paths in dense topological graphs
Subjects: Combinatorics (math.CO); Computational Geometry (cs.CG)

Let $C_{s,t}$ be the complete bipartite geometric graph, with $s$ and $t$ vertices on two distinct parallel lines respectively, and all $s t$ straight-line edges drawn between them. In this paper, we show that every complete bipartite simple topological graph, with parts of size $2(k-1)^4 + 1$ and $2^{k^{5k}}$, contains a topological subgraph weakly isomorphic to $C_{k,k}$. As a corollary, every $n$-vertex simple topological graph not containing a plane path of length $k$ has at most $O_k(n^{2 - 8/k^4})$ edges. When $k = 3$, we obtain a stronger bound by showing that every $n$-vertex simple topological graph not containing a plane path of length 3 has at most $O(n^{4/3})$ edges. We also prove that $x$-monotone simple topological graphs not containing a plane path of length 3 have at most a linear number of edges.

[504]  arXiv:2512.04803 (cross-list from astro-ph.GA) [pdf, ps, other]
Title: 287,872 Supermassive Black Holes Masses: Deep Learning Approaching Reverberation Mapping Accuracy
Comments: 14 pages, 9 figures. Submitted to Journal of High Energy Astrophysics
Subjects: Astrophysics of Galaxies (astro-ph.GA); High Energy Astrophysical Phenomena (astro-ph.HE); Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI)

We present a population-scale catalogue of 287,872 supermassive black hole masses with high accuracy. Using a deep encoder-decoder network trained on optical spectra with reverberation-mapping (RM) based labels of 849 quasars and applied to all SDSS quasars up to $z=4$, our method achieves a root-mean-square error of $0.058$\,dex, a relative uncertainty of $\approx 14\%$, and coefficient of determination $R^{2}\approx0.91$ with respect to RM-based masses, far surpassing traditional single-line virial estimators. Notably, the high accuracy is maintained for both low ($<10^{7.5}\,M_\odot$) and high ($>10^{9}\,M_\odot$) mass quasars, where empirical relations are unreliable.

[505]  arXiv:2512.04808 (cross-list from q-bio.NC) [pdf, ps, other]
Title: Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

Discovering the neural mechanisms underpinning cognition is one of the grand challenges of neuroscience. However, previous approaches for building models of RNN dynamics that explain behaviour required iterative refinement of architectures and/or optimisation objectives, resulting in a piecemeal, and mostly heuristic, human-in-the-loop process. Here, we offer an alternative approach that automates the discovery of viable RNN mechanisms by explicitly training RNNs to reproduce behaviour, including the same characteristic errors and suboptimalities, that humans and animals produce in a cognitive task. Achieving this required two main innovations. First, as the amount of behavioural data that can be collected in experiments is often too limited to train RNNs, we use a non-parametric generative model of behavioural responses to produce surrogate data for training RNNs. Second, to capture all relevant statistical aspects of the data, we developed a novel diffusion model-based approach for training RNNs. To showcase the potential of our approach, we chose a visual working memory task as our test-bed, as behaviour in this task is well known to produce response distributions that are patently multimodal (due to swap errors). The resulting network dynamics correctly qualitative features of macaque neural data. Importantly, these results were not possible to obtain with more traditional approaches, i.e., when only a limited set of behavioural signatures (rather than the full richness of behavioural response distributions) were fitted, or when RNNs were trained for task optimality (instead of reproducing behaviour). Our approach also yields novel predictions about the mechanism of swap errors, which can be readily tested in experiments. These results suggest that fitting RNNs to rich patterns of behaviour provides a powerful way to automatically discover mechanisms of important cognitive functions.

[506]  arXiv:2512.04865 (cross-list from math.AG) [pdf, ps, other]
Title: Series of quasi-uniform scatterings with fast search, root systems and neural network classifications
Authors: Igor V. Netay
Subjects: Algebraic Geometry (math.AG); Machine Learning (cs.LG); Representation Theory (math.RT)

In this paper we describe an approach to construct large extendable collections of vectors in predefined spaces of given dimensions. These collections are useful for neural network latent space configuration and training. For classification problem with large or unknown number of classes this allows to construct classifiers without classification layer and extend the number of classes without retraining of network from the very beginning. The construction allows to create large well-spaced vector collections in spaces of minimal possible dimension. If the number of classes is known or approximately predictable, one can choose sufficient enough vector collection size. If one needs to significantly extend the number of classes, one can extend the collection in the same latent space, or to incorporate the collection into collection of higher dimensions with same spacing between vectors. Also, regular symmetric structure of constructed vector collections can significantly simplify problems of search for nearest cluster centers or embeddings in the latent space. Construction of vector collections is based on combinatorics and geometry of semi-simple Lie groups irreducible representations with highest weight.

[507]  arXiv:2512.04873 (cross-list from physics.comp-ph) [pdf, ps, other]
Title: VNS Tokamak OpenMC-Serpent Validation for Medical Isotope Studies
Subjects: Computational Physics (physics.comp-ph); Computational Engineering, Finance, and Science (cs.CE)

The Volumetric Neutron Source (VNS) tokamak is a proposed fusion reactor for testing and qualification of reactor components for future use in a fusion power facility, and has potential use for radioisotope production. The VNS geometry is modeled in the Serpent and OpenMC neutronics codes. Analog neutron-photon coupled simulations are carried out to compare the model's vacuum vessel and blanket components across codes. In the vacuum vessel, neutron and photon flux maps are calculated, while in the blanket region, neutron and photon spectra, (n,T), and (n,2n) reaction rates are calculated and compared between models. The detector response comparisons found the following: neutron flux and (n,T) reactions achieved excellent agreement, the (n,2n) detector response had good agreement, and photon flux had regional discrepancies depending on Serpent tracking used. Hybrid tracking lead to a relative difference of about 20% in the outboard side blanket, where as employment of delta tracking resulted in less than 1% relative difference. On an HPC cluster, Serpent was found to have shorter computation time than OpenMC in neutron photon coupled simulations using both hybrid tracking and delta tracking, but longer in neutron only simulations. An exemplary radioisotope production case is presented for the demonstration of additional VNS capabilities.

[508]  arXiv:2512.04874 (cross-list from math.FA) [pdf, ps, other]
Title: Shorting Dynamics and Structured Kernel Regularization
Authors: James Tian
Subjects: Functional Analysis (math.FA); Machine Learning (cs.LG)

This paper develops a nonlinear operator dynamic that progressively removes the influence of a prescribed feature subspace while retaining maximal structure elsewhere. The induced sequence of positive operators is monotone, admits an exact residual decomposition, and converges to the classical shorted operator. Transporting this dynamic to reproducing kernel Hilbert spaces yields a corresponding family of kernels that converges to the largest kernel dominated by the original one and annihilating the given subspace. In the finite-sample setting, the associated Gram operators inherit a structured residual decomposition that leads to a canonical form of kernel ridge regression and a principled way to enforce nuisance invariance. This gives a unified operator-analytic approach to invariant kernel construction and structured regularization in data analysis.

[509]  arXiv:2512.04881 (cross-list from eess.SP) [pdf, ps, other]
Title: Beampattern Synthesis for Discrete Phase RIS in Communication and Sensing Systems
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Extensive research on Reconfigurable Intelligent Surfaces (RIS) has primarily focused on optimizing reflective coefficients for passive beamforming in specific target directions. This optimization typically assumes prior knowledge of the target direction, which is unavailable before the target is detected. To enhance direction estimation, it is critical to develop array pattern synthesis techniques that yield a wider beam by maximizing the received power over the entire target area. Although this challenge has been addressed with active antennas, RIS systems pose a unique challenge due to their inherent phase constraints, which can be continuous or discrete.
This work addresses this challenge through a novel array pattern synthesis method tailored for discrete phase constraints in RIS. We introduce a penalty method that pushes these constraints to the boundary of the convex hull. Then, the Minorization-Maximization (MM) method is utilized to reformulate the problem into a convex one. Our numerical results show that our algorithm can generate a wide beam pattern comparable to that achievable with per-power constraints, with both the amplitudes and phases being adjustable. We compare our method with a traditional beam sweeping technique, showing a) several orders of magnitude reduction of the MSE of Angle of Arrival (AOA) at low to medium Signal-to-Noise Ratio (SNR)s; and b) $8$~dB SNR reduction to achieve a high probability of detection.

[510]  arXiv:2512.04980 (cross-list from stat.ML) [pdf, ps, other]
Title: Learning Causality for Longitudinal Data
Comments: PhD thesis manuscript
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

This thesis develops methods for causal inference and causal representation learning (CRL) in high-dimensional, time-varying data.
The first contribution introduces the Causal Dynamic Variational Autoencoder (CDVAE), a model for estimating Individual Treatment Effects (ITEs) by capturing unobserved heterogeneity in treatment response driven by latent risk factors that affect only outcomes. CDVAE comes with theoretical guarantees on valid latent adjustment and generalization bounds for ITE error. Experiments on synthetic and real datasets show that CDVAE outperforms baselines, and that state-of-the-art models greatly improve when augmented with its latent substitutes, approaching oracle performance without access to true adjustment variables.
The second contribution proposes an efficient framework for long-term counterfactual regression based on RNNs enhanced with Contrastive Predictive Coding (CPC) and InfoMax. It captures long-range dependencies under time-varying confounding while avoiding the computational cost of transformers, achieving state-of-the-art results and introducing CPC into causal inference.
The third contribution advances CRL by addressing how latent causes manifest in observed variables. We introduce a model-agnostic interpretability layer based on the geometry of the decoder Jacobian. A sparse self-expression prior induces modular, possibly overlapping groups of observed features aligned with shared latent influences. We provide recovery guarantees in both disjoint and overlapping settings and show that meaningful latent-to-observed structure can be recovered without anchor features or single-parent assumptions. Scalable Jacobian-based regularization techniques are also developed.

[511]  arXiv:2512.04985 (cross-list from stat.ML) [pdf, ps, other]
Title: Towards a unified framework for guided diffusion models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Guided or controlled data generation with diffusion models\blfootnote{Partial preliminary results of this work appeared in International Conference on Machine Learning 2025 \citep{li2025provable}.} has become a cornerstone of modern generative modeling. Despite substantial advances in diffusion model theory, the theoretical understanding of guided diffusion samplers remains severely limited. We make progress by developing a unified algorithmic and theoretical framework that accommodates both diffusion guidance and reward-guided diffusion. Aimed at fine-tuning diffusion models to improve certain rewards, we propose injecting a reward guidance term -- constructed from the difference between the original and reward-reweighted scores -- into the backward diffusion process, and rigorously quantify the resulting reward improvement over the unguided counterpart. As a key application, our framework shows that classifier-free guidance (CFG) decreases the expected reciprocal of the classifier probability, providing the first theoretical characterization of the specific performance metric that CFG improves for general target distributions. When applied to reward-guided diffusion, our framework yields a new sampler that is easy-to-train and requires no full diffusion trajectories during training. Numerical experiments further corroborate our theoretical findings.

[512]  arXiv:2512.05024 (cross-list from stat.ME) [pdf, ps, other]
Title: Model-Free Assessment of Simulator Fidelity via Quantile Curves
Comments: 33 pages, 11 figures
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Simulation of complex systems originated in manufacturing and queuing applications. It is now widely used for large-scale, ML-based systems in research, education, and consumer surveys. However, characterizing the discrepancy between simulators and ground truth remains challenging for increasingly complex, machine-learning-based systems. We propose a computationally tractable method to estimate the quantile function of the discrepancy between the simulated and ground-truth outcome distributions. Our approach focuses on output uncertainty and treats the simulator as a black box, imposing no modeling assumptions on its internals, and hence applies broadly across many parameter families, from Bernoulli and multinomial models to continuous, vector-valued settings. The resulting quantile curve supports confidence interval construction for unseen scenarios, risk-aware summaries of sim-to-real discrepancy (e.g., VaR/CVaR), and comparison of simulators' performance. We demonstrate our methodology in an application assessing LLM simulation fidelity on the WorldValueBench dataset spanning four LLMs.

[513]  arXiv:2512.05040 (cross-list from math.MG) [pdf, ps, other]
Title: Geometric Data Science
Comments: Questions and comments are welcome at [email protected]. The latest version is at this http URL
Subjects: Metric Geometry (math.MG); Materials Science (cond-mat.mtrl-sci); Computational Geometry (cs.CG)

This book introduces the new research area of Geometric Data Science, where data can represent any real objects through geometric measurements.
The first part of the book focuses on finite point sets. The most important result is a complete and continuous classification of all finite clouds of unordered points under rigid motion in any Euclidean space. The key challenge was to avoid the exponential complexity arising from permutations of the given unordered points. For a fixed dimension of the ambient Euclidean space, the times of all algorithms for the resulting invariants and distance metrics depend polynomially on the number of points.
The second part of the book advances a similar classification in the much more difficult case of periodic point sets, which model all periodic crystals at the atomic scale. The most significant result is the hierarchy of invariants from the ultra-fast to complete ones. The key challenge was to resolve the discontinuity of crystal representations that break down under almost any noise. Experimental validation on all major materials databases confirmed the Crystal Isometry Principle: any real periodic crystal has a unique location in a common moduli space of all periodic structures under rigid motion. The resulting moduli space contains all known and not yet discovered periodic crystals and hence continuously extends Mendeleev's table to the full crystal universe.

[514]  arXiv:2512.05049 (cross-list from quant-ph) [pdf, ps, other]
Title: QKAN-LSTM: Quantum-inspired Kolmogorov-Arnold Long Short-term Memory
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Long short-term memory (LSTM) models are a particular type of recurrent neural networks (RNNs) that are central to sequential modeling tasks in domains such as urban telecommunication forecasting, where temporal correlations and nonlinear dependencies dominate. However, conventional LSTMs suffer from high parameter redundancy and limited nonlinear expressivity. In this work, we propose the Quantum-inspired Kolmogorov-Arnold Long Short-Term Memory (QKAN-LSTM), which integrates Data Re-Uploading Activation (DARUAN) modules into the gating structure of LSTMs. Each DARUAN acts as a quantum variational activation function (QVAF), enhancing frequency adaptability and enabling an exponentially enriched spectral representation without multi-qubit entanglement. The resulting architecture preserves quantum-level expressivity while remaining fully executable on classical hardware. Empirical evaluations on three datasets, Damped Simple Harmonic Motion, Bessel Function, and Urban Telecommunication, demonstrate that QKAN-LSTM achieves superior predictive accuracy and generalization with a 79% reduction in trainable parameters compared to classical LSTMs. We extend the framework to the Jiang-Huang-Chen-Goan Network (JHCG Net), which generalizes KAN to encoder-decoder structures, and then further use QKAN to realize the latent KAN, thereby creating a Hybrid QKAN (HQKAN) for hierarchical representation learning. The proposed HQKAN-LSTM thus provides a scalable and interpretable pathway toward quantum-inspired sequential modeling in real-world data environments.

[515]  arXiv:2512.05058 (cross-list from quant-ph) [pdf, ps, other]
Title: Meta-Learning for Quantum Optimization via Quantum Sequence Model
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The Quantum Approximate Optimization Algorithm (QAOA) is a leading approach for solving combinatorial optimization problems on near-term quantum processors. However, finding good variational parameters remains a significant challenge due to the non-convex energy landscape, often resulting in slow convergence and poor solution quality. In this work, we propose a quantum meta-learning framework that trains advanced quantum sequence models to generate effective parameter initialization policies. We investigate four classical or quantum sequence models, including the Quantum Kernel-based Long Short-Term Memory (QK-LSTM), as learned optimizers in a "learning to learn" paradigm. Our numerical experiments on the Max-Cut problem demonstrate that the QK-LSTM optimizer achieves superior performance, obtaining the highest approximation ratios and exhibiting the fastest convergence rate across all tested problem sizes (n=10 to 13). Crucially, the QK-LSTM model achieves perfect parameter transferability by synthesizing a single, fixed set of near-optimal parameters, leading to a remarkable sustained acceleration of convergence even when generalizing to larger problems. This capability, enabled by the compact and expressive power of the quantum kernel architecture, underscores its effectiveness. The QK-LSTM, with only 43 trainable parameters, substantially outperforms the classical LSTM (56 parameters) and other quantum sequence models, establishing a robust pathway toward highly efficient parameter initialization for variational quantum algorithms in the NISQ era.

[516]  arXiv:2512.05070 (cross-list from stat.ML) [pdf, ps, other]
Title: Control Consistency Losses for Diffusion Bridges
Comments: Frontiers in Probabilistic Inference: Sampling Meets Learning Workshop at NeurIPS 2025 (Oral)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Simulating the conditioned dynamics of diffusion processes, given their initial and terminal states, is an important but challenging problem in the sciences. The difficulty is particularly pronounced for rare events, for which the unconditioned dynamics rarely reach the terminal state. In this work, we leverage a self-consistency property of the conditioned dynamics to learn the diffusion bridge in an iterative online manner, and demonstrate promising empirical results in a range of settings.

[517]  arXiv:2512.05092 (cross-list from stat.ML) [pdf, ps, other]
Title: Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Although diffusion models now occupy a central place in generative modeling, introductory treatments commonly assume Euclidean data and seldom clarify their connection to discrete-state analogues. This article is a self-contained primer on diffusion over general state spaces, unifying continuous domains and discrete/categorical structures under one lens. We develop the discrete-time view (forward noising via Markov kernels and learned reverse dynamics) alongside its continuous-time limits -- stochastic differential equations (SDEs) in $\mathbb{R}^d$ and continuous-time Markov chains (CTMCs) on finite alphabets -- and derive the associated Fokker--Planck and master equations. A common variational treatment yields the ELBO that underpins standard training losses. We make explicit how forward corruption choices -- Gaussian processes in continuous spaces and structured categorical transition kernels (uniform, masking/absorbing and more) in discrete spaces -- shape reverse dynamics and the ELBO. The presentation is layered for three audiences: newcomers seeking a self-contained intuitive introduction; diffusion practitioners wanting a global theoretical synthesis; and continuous-diffusion experts looking for an analogy-first path into discrete diffusion. The result is a unified roadmap to modern diffusion methodology across continuous domains and discrete sequences, highlighting a compact set of reusable proofs, identities, and core theoretical principles.

Replacements for Fri, 5 Dec 25

[518]  arXiv:2007.16161 (replaced) [pdf, ps, other]
Title: Coinductive proof search for polarized logic with applications to full intuitionistic propositional logic
Comments: 57 pages incl. appendices (v2 has 22 pages incl. appendices and corresponds to DOI 10.4230/LIPIcs.TYPES.2020.4); besides adding running and meta-theoretic examples, the previous version is extended threefold: (1) from proof terms to co-proof terms as objects of study, (2) LJQ is used as source system besides LJT, (3) there is an entire scheme for the properties being studied
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
[519]  arXiv:2201.07139 (replaced) [pdf, ps, other]
Title: Safe Online Bid Optimization with Return on Investment and Budget Constraints
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[520]  arXiv:2202.04044 (replaced) [pdf, ps, other]
Title: Aging and the Narrowing of Scientific Innovation
Comments: 15 pages, 4 figures
Subjects: Digital Libraries (cs.DL)
[521]  arXiv:2204.09329 (replaced) [pdf, ps, other]
Title: Deterministic Distributed Algorithms and Measurable Combinatorics on $Δ$-Regular Forests
Comments: This paper is an extension of some parts of the conference paper "Local Problems on Trees from the Perspectives of Distributed Algorithms, Finitary Factors, and Descriptive Combinatorics (arXiv:2106.02066)
Subjects: Logic (math.LO); Distributed, Parallel, and Cluster Computing (cs.DC)
[522]  arXiv:2212.00476 (replaced) [pdf, ps, other]
Title: Efficient stereo matching on embedded GPUs with zero-means cross correlation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR)
[523]  arXiv:2305.15140 (replaced) [pdf, ps, other]
Title: Polynomial-Time Pseudodeterministic Construction of Primes
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[524]  arXiv:2309.07104 (replaced) [pdf, ps, other]
Title: Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[525]  arXiv:2312.10657 (replaced) [pdf, ps, other]
Title: UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks
Subjects: Cryptography and Security (cs.CR)
[526]  arXiv:2312.12462 (replaced) [pdf, ps, other]
Title: Towards an end-to-end artificial intelligence driven global weather forecasting system
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[527]  arXiv:2312.12919 (replaced) [pdf, ps, other]
Title: Coloring Grids Avoiding Bicolored Paths
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
[528]  arXiv:2402.13927 (replaced) [pdf, ps, other]
Title: The Delusional Hedge Algorithm as a Model of Human Learning from Diverse Opinions
Subjects: Artificial Intelligence (cs.AI)
[529]  arXiv:2403.02514 (replaced) [pdf, ps, other]
Title: The Autonomy-Alignment Problem in Open-Ended Learning Robots: Formalising the Purpose Framework
Comments: 33 pages, 5 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[530]  arXiv:2404.01445 (replaced) [pdf, ps, other]
Title: Using Dynamic Safety Margins as Control Barrier Functions
Comments: 12 pages, 5 figures, 2 tables
Subjects: Systems and Control (eess.SY)
[531]  arXiv:2404.02112 (replaced) [pdf, ps, other]
Title: ImageNot: A contrast with ImageNet preserves model rankings
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[532]  arXiv:2405.02568 (replaced) [pdf, ps, other]
Title: Surface-Based Visibility-Guided Uncertainty for Continuous Active 3D Neural Reconstruction
Comments: The main claims are the same as in the previous version, but the naming and explanations have been changed. Accepted at AAAI 2026 Artificial Intelligence with Biased or Scarce Data workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[533]  arXiv:2405.07953 (replaced) [pdf, ps, other]
Title: On the Decidability of Monadic Theories of Arithmetic Predicates
Comments: 32 pages, conference version of "On the Decidability of Monadic Second-Order Logic with Arithmetic Predicates" from LICS 2024 (Distinguished Paper Award)
Subjects: Logic in Computer Science (cs.LO)
[534]  arXiv:2405.18770 (replaced) [pdf, ps, other]
Title: Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships
Comments: WACV 2026 Accepted. Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[535]  arXiv:2406.03280 (replaced) [pdf, ps, other]
Title: FusionBench: A Unified Library and Comprehensive Benchmark for Deep Model Fusion
Comments: Project homepage: this https URL Online documentation: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[536]  arXiv:2406.11062 (replaced) [pdf, ps, other]
Title: To Ban or Not to Ban: Uses and Gratifications of Mobile Phones among Township High School Learners
Comments: African Conference On Information Systems & Technology, 2017, 11 pages
Subjects: Computers and Society (cs.CY)
[537]  arXiv:2406.12090 (replaced) [pdf, other]
Title: Data-Aware Hybrid Tableaux
Subjects: Logic in Computer Science (cs.LO)
[538]  arXiv:2406.17374 (replaced) [pdf, ps, other]
Title: Generalizability of experimental studies
Comments: Under review at TMLR
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
[539]  arXiv:2407.06929 (replaced) [pdf, ps, other]
Title: Convergence of the Semi-Discrete WaveHoltz Iteration
Subjects: Numerical Analysis (math.NA)
[540]  arXiv:2407.11698 (replaced) [pdf, ps, other]
Title: NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks
Comments: 15 pages, 3 figures
Journal-ref: 28th European Conference on Artificial Intelligence (ECAI 2025)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
[541]  arXiv:2407.13535 (replaced) [pdf, ps, other]
Title: Visuospatial navigation from the bottom-up: without vestibular integration, distance prediction, or maps
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
[542]  arXiv:2408.05540 (replaced) [pdf, ps, other]
Title: Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE)
[543]  arXiv:2408.16389 (replaced) [pdf, ps, other]
Title: Addressing common misinterpretations of KART and UAT in neural network literature
Authors: Vugar Ismailov
Comments: 16 pages. A section, a subsection, a table, and references have been added
Journal-ref: Neural Networks 196 (2026), Paper no. 108361
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[544]  arXiv:2408.16547 (replaced) [pdf, ps, other]
Title: OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
Comments: published in ECCV2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545]  arXiv:2409.12103 (replaced) [pdf, ps, other]
Title: Towards practical secure delegated quantum computing with semi-classical light
Comments: 45 pages, 15 figures
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
[546]  arXiv:2409.17892 (replaced) [pdf, ps, other]
Title: EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Subjects: Computation and Language (cs.CL)
[547]  arXiv:2409.18530 (replaced) [pdf, ps, other]
Title: A Static Analysis of Popular C Packages in Linux
Comments: Proceedings of the 22nd Annual International Conference on Privacy, Security, and Trust (PST 2025), Fredericton, IEEE, pp. 1-10
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
[548]  arXiv:2409.19051 (replaced) [pdf, ps, other]
Title: Multimodal Markup Document Models for Graphic Design Completion
Comments: Accepted by ACM Multimedia 2025, Project page: this https URL
Journal-ref: Proceedings of the 33rd ACM International Conference on Multimedia. 2025. p.11022-11031
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[549]  arXiv:2410.09414 (replaced) [pdf, ps, other]
Title: Multi-agent Assisted Automatic Test Generation for Java JSON Libraries
Comments: In the 32nd Asia-Pacific Software Engineering Conference (APSEC 2025)
Subjects: Software Engineering (cs.SE)
[550]  arXiv:2410.10095 (replaced) [pdf, ps, other]
Title: Candidate Monotonicity and Proportionality for Lotteries and Non-Resolute Rules
Authors: Jannik Peters
Comments: Paper accepted at Social Choice and Welfare
Subjects: Computer Science and Game Theory (cs.GT)
[551]  arXiv:2410.15555 (replaced) [pdf, ps, other]
Title: Bayesian Concept Bottleneck Models with LLM Priors
Comments: 2025 Conference on Neural Information Processing Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[552]  arXiv:2410.16888 (replaced) [pdf, ps, other]
Title: Unsupervised Time Series Anomaly Prediction with Importance-based Generative Contrastive Learning
Comments: ACM SIGKDD 2025
Subjects: Machine Learning (cs.LG)
[553]  arXiv:2410.18797 (replaced) [pdf, ps, other]
Title: Learning Geodesics of Geometric Shape Deformations From Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[554]  arXiv:2410.18895 (replaced) [pdf, ps, other]
Title: ArterialNet: Reconstructing Arterial Blood Pressure Waveform with Wearable Pulsatile Signals, a Cohort-Aware Approach
Subjects: Machine Learning (cs.LG)
[555]  arXiv:2410.21288 (replaced) [pdf, ps, other]
Title: Digital requirements engineering with an INCOSE-derived SysML meta-model
Comments: 24 pages; 11 figures; 3 tables; journal preprint
Subjects: Software Engineering (cs.SE); Systems and Control (eess.SY)
[556]  arXiv:2411.00241 (replaced) [pdf, ps, other]
Title: A Fast and Model Based Approach for Evaluating Task-Competence of Antagonistic Continuum Arms
Comments: Published in the 8th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2025). See this https URL for code, proofs, and supplementary information. Please note the officially published version of the paper in IEEE contains an error in Equation 7. That has been corrected here, so this is the final version of the paper. Apologies for the confusion!
Journal-ref: 2025 IEEE 8th International Conference on Soft Robotics (RoboSoft), Lausanne, Switzerland, 2025, pp. 1-8
Subjects: Robotics (cs.RO); Computational Engineering, Finance, and Science (cs.CE)
[557]  arXiv:2411.09840 (replaced) [pdf, ps, other]
Title: Self-Propelled Agents and Group Social Force
Authors: Peng Wang, Peter Luh
Comments: 16 pages, 10 figures
Subjects: Physics and Society (physics.soc-ph); Multiagent Systems (cs.MA); Dynamical Systems (math.DS); Adaptation and Self-Organizing Systems (nlin.AO)
[558]  arXiv:2411.15061 (replaced) [pdf, ps, other]
Title: Empowering Clients -- Transformation of Design Processes Due to Generative AI
Subjects: Artificial Intelligence (cs.AI)
[559]  arXiv:2411.17461 (replaced) [pdf, ps, other]
Title: SoK: Decentralized AI (DeAI)
Comments: This is a Systematization of Knowledge (SoK) for the rapidly evolving field of Decentralized AI (DeAI). We welcome valuable comments, suggestions, and collaboration to further refine and enhance this work. We hope our contribution will help accelerate the advancement of DeAI
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[560]  arXiv:2412.00387 (replaced) [pdf, ps, other]
Title: A generalization of Burmester-Desmedt GKE based on a non-abelian finite group action
Comments: 20 pages. Submitted to the Journal of Discrete Mathematical Sciences and Cryptography in April 2025
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
[561]  arXiv:2412.09058 (replaced) [pdf, ps, other]
Title: EmbedGenius: Towards Automated Software Development for Generic Embedded IoT Systems
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[562]  arXiv:2412.13222 (replaced) [pdf, ps, other]
Title: Near Real-time Adaptive Isotropic and Anisotropic Image-to-mesh Conversion for Numerical Simulations Involving Cerebral Aneurysms
Comments: 58 pages, 16 figures, 13 tables, presented at the 18th U.S. National Congress on Computational Mechanics conference
Subjects: Fluid Dynamics (physics.flu-dyn); Distributed, Parallel, and Cluster Computing (cs.DC); Graphics (cs.GR); Mathematical Software (cs.MS); Numerical Analysis (math.NA)
[563]  arXiv:2412.13547 (replaced) [pdf, ps, other]
Title: Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[564]  arXiv:2412.16181 (replaced) [pdf, ps, other]
Title: Minimum Weighted Feedback Arc Sets for Ranking from Pairwise Comparisons
Comments: This is a preliminary paper
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[565]  arXiv:2412.17704 (replaced) [pdf, ps, other]
Title: Enhanced Quantum Circuit Cutting Framework for Sampling Overhead Reduction
Comments: 29 pages, 6 figures
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)
[566]  arXiv:2412.17961 (replaced) [pdf, ps, other]
Title: Extending Graph Condensation to Multi-Label Datasets: A Benchmark Study
Comments: Accepted by Transactions on Machine Learning Research (TMLR)
Journal-ref: Transactions on Machine Learning Research, 2025
Subjects: Machine Learning (cs.LG)
[567]  arXiv:2501.00926 (replaced) [pdf, ps, other]
Title: Differentially Private Matchings: Symmetry Lower Bounds, Arboricity Sparsifiers, and Public Vertex Subset Mechanism
Comments: Abstract truncated to fit arXiv limits
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR)
[568]  arXiv:2501.07033 (replaced) [pdf, ps, other]
Title: Detection of AI Deepfake and Fraud in Online Payments Using GAN-Based Models
Comments: The paper will be published and indexed by IEEE at 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE 2025)
Journal-ref: 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[569]  arXiv:2501.12119 (replaced) [pdf, ps, other]
Title: ENTIRE: Learning-based Volume Rendering Time Prediction
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[570]  arXiv:2501.12374 (replaced) [pdf, ps, other]
Title: Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists
Comments: Eisenmann and Karjus contributed equally to this work and share first authorship
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[571]  arXiv:2501.13820 (replaced) [pdf, ps, other]
Title: Consistent spectral clustering in sparse tensor block models
Comments: 52 pages
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Probability (math.PR)
[572]  arXiv:2501.14120 (replaced) [pdf, ps, other]
Title: On the Transfer of Knowledge in Quantum Algorithms
Comments: 14 pages, 8 figures, 4 tables. Paper submitted for its review in Expert Systems journal
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)
[573]  arXiv:2501.15731 (replaced) [pdf, ps, other]
Title: Renewable Energy Prediction: A Comparative Study of Deep Learning Models for Complex Dataset Analysis
Comments: 11 pages, 2 figures and 6 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[574]  arXiv:2501.16165 (replaced) [pdf, ps, other]
Title: A Survey of Operating System Kernel Fuzzing
Comments: This work has been accepted by ACM Transactions on Software Engineering and Methodology (TOSEM)
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Operating Systems (cs.OS)
[575]  arXiv:2502.05101 (replaced) [pdf, ps, other]
Title: A generalized Active Flux method of arbitrarily high order in two dimensions
Subjects: Numerical Analysis (math.NA)
[576]  arXiv:2502.11678 (replaced) [pdf, ps, other]
Title: Which Type of Students can LLMs Act? Investigating Authentic Simulation with Graph-based Human-AI Collaborative System
Comments: This work has been submitted to AI Open for possible publication
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)
[577]  arXiv:2502.12151 (replaced) [pdf, ps, other]
Title: VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution
Subjects: Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
[578]  arXiv:2502.13247 (replaced) [pdf, ps, other]
Title: Grounding LLM Reasoning with Knowledge Graphs
Subjects: Computation and Language (cs.CL)
[579]  arXiv:2502.13348 (replaced) [pdf, ps, other]
Title: System-level Analysis of Dual-Mode Networked Sensing: ISAC Integration & Coordination Gains
Comments: This work has been accepted for publication in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
[580]  arXiv:2502.15177 (replaced) [pdf, ps, other]
Title: Optimizing Product Provenance Verification using Data Valuation Methods
Comments: Proceedings of the AAAI Conference on Artificial Intelligence 2026
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
[581]  arXiv:2502.15522 (replaced) [pdf, ps, other]
Title: Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[582]  arXiv:2502.15851 (replaced) [pdf, ps, other]
Title: Control Illusion: The Failure of Instruction Hierarchies in Large Language Models
Comments: Accepted to AAAI-26 Main Technical Track Proceedings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[583]  arXiv:2502.15982 (replaced) [pdf, ps, other]
Title: Hunting a rabbit: complexity, approximability and some characterizations
Subjects: Combinatorics (math.CO); Computational Complexity (cs.CC)
[584]  arXiv:2502.17793 (replaced) [pdf, ps, other]
Title: SYNTHIA: Novel Concept Design with Affordance Composition
Comments: ACL 2025 Main, Code is available this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[585]  arXiv:2502.19159 (replaced) [pdf, ps, other]
Title: Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[586]  arXiv:2503.01269 (replaced) [pdf, ps, other]
Title: ChatGPT for President! Presupposed content in politicians versus GPT-generated texts
Comments: 36 pages, 6 figures
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
[587]  arXiv:2503.02269 (replaced) [pdf, ps, other]
Title: Experience Replay with Random Reshuffling
Authors: Yasuhiro Fujita
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[588]  arXiv:2503.04058 (replaced) [pdf, ps, other]
Title: EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[589]  arXiv:2503.05168 (replaced) [pdf, ps, other]
Title: SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting
Subjects: Graphics (cs.GR)
[590]  arXiv:2503.05255 (replaced) [pdf, ps, other]
Title: CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[591]  arXiv:2503.09321 (replaced) [pdf, ps, other]
Title: DAVE: Diagnostic benchmark for Audio Visual Evaluation
Comments: First two authors contributed equally
Journal-ref: NeurIPS 2025 Datasets & Benchmarks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[592]  arXiv:2503.12921 (replaced) [pdf, ps, other]
Title: Objective Measurement of AI Literacy: Development and Validation of the AI Competency Objective Scale (AICOS)
Journal-ref: Computers and Education: Artificial Intelligence, 2025
Subjects: Human-Computer Interaction (cs.HC)
[593]  arXiv:2503.15470 (replaced) [pdf, ps, other]
Title: EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[594]  arXiv:2503.18916 (replaced) [pdf, ps, other]
Title: Entropic Analysis of Time Series through Kernel Density Estimation
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[595]  arXiv:2503.22622 (replaced) [pdf, ps, other]
Title: Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[596]  arXiv:2503.23228 (replaced) [pdf, ps, other]
Title: Energy-Aware Lane Planning for Connected Electric Vehicles in Urban Traffic: Design and Vehicle-in-the-Loop Validation
Comments: Accepted at 2025 IEEE Conference on Decision and Control (CDC25')
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
[597]  arXiv:2503.24387 (replaced) [pdf, ps, other]
Title: CoCoIns: Consistent Subject Generation via Contrastive Instantiated Concepts
Comments: TMLR 2025. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[598]  arXiv:2504.00409 (replaced) [pdf, ps, other]
Title: Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[599]  arXiv:2504.12158 (replaced) [pdf, ps, other]
Title: What is a monoid?
Comments: 26 pages. Version accepted to POPL 2026
Subjects: Category Theory (math.CT); Programming Languages (cs.PL)
[600]  arXiv:2504.12606 (replaced) [pdf, ps, other]
Title: Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution Can Improve Robust Scene Graph Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[601]  arXiv:2504.12679 (replaced) [pdf, ps, other]
Title: TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
Comments: AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[602]  arXiv:2504.16632 (replaced) [pdf, ps, other]
Title: Efficient Algorithms for Maximal Matroid Degenerations and Irreducible Decompositions of Circuit Varieties
Subjects: Combinatorics (math.CO); Symbolic Computation (cs.SC); Algebraic Geometry (math.AG)
[603]  arXiv:2505.04488 (replaced) [pdf, ps, other]
Title: "I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
Comments: 17 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[604]  arXiv:2505.05323 (replaced) [pdf, ps, other]
Title: Approximation-free Control for Signal Temporal Logic Specifications using Spatiotemporal Tubes
Comments: This paper is a revised version of the article published in IEEE Control Systems Letters (L-CSS). This version corrects the errors in Theorem 3.2 and the proof of Theorem 3.3, that was present in the initial submitted manuscript
Journal-ref: IEEE Control Systems Letters, vol. 9, pp. 1562-1567, 2025
Subjects: Systems and Control (eess.SY)
[605]  arXiv:2505.07614 (replaced) [pdf, ps, other]
Title: Bant: Byzantine Antidote via Trial Function and Trust Scores
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[606]  arXiv:2505.10961 (replaced) [pdf, ps, other]
Title: Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[607]  arXiv:2505.11085 (replaced) [pdf, ps, other]
Title: A Fast Kernel-based Conditional Independence test with Application to Causal Discovery
Comments: 11 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[608]  arXiv:2505.12723 (replaced) [pdf, ps, other]
Title: On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
Subjects: Computation and Language (cs.CL)
[609]  arXiv:2505.13243 (replaced) [pdf, ps, other]
Title: Conformalized Decision Risk Assessment
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[610]  arXiv:2505.15436 (replaced) [pdf, ps, other]
Title: Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[611]  arXiv:2505.16732 (replaced) [pdf, ps, other]
Title: Sequential Monte Carlo for Policy Optimization in Continuous POMDPs
Comments: Accepted at NeurIPS 2025
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[612]  arXiv:2505.18091 (replaced) [pdf, ps, other]
Title: Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
Comments: NeurIPS'25 Spotlight
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[613]  arXiv:2505.19361 (replaced) [pdf, ps, other]
Title: Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
[614]  arXiv:2505.19550 (replaced) [pdf, ps, other]
Title: Turing Test 2.0: The General Intelligence Threshold
Subjects: Artificial Intelligence (cs.AI)
[615]  arXiv:2505.22758 (replaced) [pdf, ps, other]
Title: FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[616]  arXiv:2506.00331 (replaced) [pdf, ps, other]
Title: TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering
Subjects: Computation and Language (cs.CL)
[617]  arXiv:2506.00469 (replaced) [pdf, ps, other]
Title: Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data
Comments: EMMA-500 Gen 2; refer to Gen 1 in arXiv:2409.17892
Subjects: Computation and Language (cs.CL)
[618]  arXiv:2506.01269 (replaced) [pdf, ps, other]
Title: Region-of-Interest-Guided Deep Joint Source-Channel Coding for Image Transmission
Subjects: Information Theory (cs.IT)
[619]  arXiv:2506.01926 (replaced) [pdf, ps, other]
Title: Large language models can learn and generalize steganographic chain-of-thought under process supervision
Comments: 10 pages main text, 3 figures main text, 17 pages supplementary material, 1 figure supplementary material, accepted at NeurIPS 2025
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[620]  arXiv:2506.03726 (replaced) [pdf, ps, other]
Title: Introducing multiverse analysis to bibliometrics: The case of team size effects on disruptive research
Comments: 50 pages, 3 figures, 11 tables
Subjects: Digital Libraries (cs.DL); Applications (stat.AP)
[621]  arXiv:2506.03942 (replaced) [pdf, ps, other]
Title: Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Comments: 15 pages, 6 figures, IEEE TMI submission. This version originally appeared in error as arXiv:2403.06759(v2)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[622]  arXiv:2506.05901 (replaced) [pdf, ps, other]
Title: Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[623]  arXiv:2506.08123 (replaced) [pdf, ps, other]
Title: QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA
Comments: Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20619-20642, Suzhou, China
Journal-ref: Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20619-20642, Suzhou, China, 2025
Subjects: Computation and Language (cs.CL)
[624]  arXiv:2506.08139 (replaced) [pdf, ps, other]
Title: SoftStep: Learning Sparse Similarity Powers Deep Neighbor-Based Regression
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[625]  arXiv:2506.09495 (replaced) [pdf, ps, other]
Title: Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[626]  arXiv:2506.09508 (replaced) [pdf, ps, other]
Title: Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
[627]  arXiv:2506.09532 (replaced) [pdf, ps, other]
Title: Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[628]  arXiv:2506.10098 (replaced) [pdf, ps, other]
Title: Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models
Comments: 9 pages, 4 figures; This work has been submitted to the IEEE for possible publication; Code available at: this https URL
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[629]  arXiv:2506.10138 (replaced) [pdf, ps, other]
Title: Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
Comments: Presented at the Mechanistic Interpretability Workshop at NeurIPS 2025. 34 pages, 26 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[630]  arXiv:2506.12200 (replaced) [pdf, ps, other]
Title: PRO-V-R1: Reasoning Enhanced Programming Agent for RTL Verification
Subjects: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
[631]  arXiv:2506.13222 (replaced) [pdf, ps, other]
Title: NeuroPhysNet: A FitzHugh-Nagumo-Based Physics-Informed Neural Network Framework for Electroencephalograph (EEG) Analysis and Motor Imagery Classification
Comments: Here is a revised version of the manuscript, incorporating Prof. Yuantong Gu's contributions to restructuring and revising the manuscript
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[632]  arXiv:2506.16016 (replaced) [pdf, ps, other]
Title: Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[633]  arXiv:2506.20240 (replaced) [pdf, ps, other]
Title: Low-order finite element complex with application to a fourth-order elliptic singular perturbation problem
Comments: 28 pages, 1 figure
Subjects: Numerical Analysis (math.NA)
[634]  arXiv:2506.22406 (replaced) [pdf, ps, other]
Title: Baseline-improved Economic Model Predictive Control for Optimal Microgrid Dispatch
Comments: 18 pages, 4 tables, Manuscript submitted on Automatica
Subjects: Systems and Control (eess.SY)
[635]  arXiv:2506.23749 (replaced) [pdf, ps, other]
Title: A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications
Subjects: Software Engineering (cs.SE)
[636]  arXiv:2507.04315 (replaced) [pdf, ps, other]
Title: HLStrans: Dataset for C-to-HLS Hardware Code Synthesis
Subjects: Hardware Architecture (cs.AR)
[637]  arXiv:2507.06625 (replaced) [pdf, ps, other]
Title: Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic
Comments: 9 pages, 10 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[638]  arXiv:2507.06968 (replaced) [pdf, ps, other]
Title: Scaling Towards the Information Boundary of Instruction Sets: The Infinity Instruct Subject Technical Report
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[639]  arXiv:2507.07381 (replaced) [pdf, ps, other]
Title: Multi-Focus Temporal Shifting for Precise Event Spotting in Sports Videos
Comments: 7 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[640]  arXiv:2507.09080 (replaced) [pdf, ps, other]
Title: BioAnalyst: A Foundation Model for Biodiversity
Subjects: Artificial Intelligence (cs.AI)
[641]  arXiv:2507.10965 (replaced) [pdf, ps, other]
Title: Convolutive sequences, I: Through the lens of integer partition functions
Comments: 23 pages, accepted by Experimental Mathematics
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Number Theory (math.NT)
[642]  arXiv:2507.17588 (replaced) [pdf, ps, other]
Title: Dual-branch Prompting for Multimodal Machine Translation
Comments: This manuscript is currently under review at the ACM Transactions on Multimedia Computing, Communications, and Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[643]  arXiv:2507.18926 (replaced) [pdf, ps, other]
Title: Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction
Subjects: Machine Learning (cs.LG)
[644]  arXiv:2507.20862 (replaced) [pdf, ps, other]
Title: Bi-cephalic self-attended model to classify Parkinson's disease patients with freezing of gait
Authors: Shomoita Jahid Mitin (1,2), Rodrigue Rizk (2), Maximilian Scherer (3), Thomas Koeglsperger (3), Daniel Lench (4), KC Santosh (2), Arun Singh (1,5) ((1) Biomedical and Translational Sciences, University of South Dakota, Vermillion, SD, USA and (2) Artificial Intelligence Research lab, Department of Computer Science, University of South Dakota, Vermillion, SD and (3) Department of Neurology, Ludwig Maximilian University, Munich, Germany and (4) Department of Neurology, Medical University of South Carolina, Charleston, SC, USA and (5) Department of Neuroscience, Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA)
Comments: 39 pages, 8339 words, 4 figures, 3 tables, European Journal of Neuroscience: Special edition FOG
Subjects: Machine Learning (cs.LG)
[645]  arXiv:2508.07198 (replaced) [pdf, ps, other]
Title: WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis
Subjects: Software Engineering (cs.SE)
[646]  arXiv:2508.08833 (replaced) [pdf, ps, other]
Title: An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
Comments: 34 pages, 9 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[647]  arXiv:2508.09476 (replaced) [pdf, ps, other]
Title: Collaborative Face Experts Fusion in Video Generation: Boosting Identity Consistency Across Large Face Poses
Comments: \href{this https URL}{Project page}
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[648]  arXiv:2508.09560 (replaced) [pdf, ps, other]
Title: WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[649]  arXiv:2508.09950 (replaced) [pdf, ps, other]
Title: PPL: Point Cloud Supervised Proprioceptive Locomotion Reinforcement Learning for Legged Robots in Crawl Spaces
Comments: Accepted by RA-L
Subjects: Robotics (cs.RO)
[650]  arXiv:2508.11422 (replaced) [pdf, ps, other]
Title: Principles of Physiological Closed-Loop Controllers in Neuromodulation
Comments: 36 pages, 10 figures
Subjects: Systems and Control (eess.SY); Neurons and Cognition (q-bio.NC)
[651]  arXiv:2508.12365 (replaced) [pdf, ps, other]
Title: TaoSR1: The Thinking Model for E-commerce Relevance Search
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[652]  arXiv:2508.12943 (replaced) [pdf, ps, other]
Title: OPTIC-ER: A Reinforcement Learning Framework for Real-Time Emergency Response and Equitable Resource Allocation in Underserved African Communities
Authors: Mary Tonwe
Comments: Source code and data available at: this https URL
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
[653]  arXiv:2508.16634 (replaced) [pdf, ps, other]
Title: Few-shot Class-incremental Fault Diagnosis by Preserving Class-Agnostic Knowledge with Dual-Granularity Representations
Comments: This manuscript is currently under review at the Measurement
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[654]  arXiv:2508.16667 (replaced) [pdf, ps, other]
Title: BrainPath: A Biologically-Informed AI Framework for Individualized Aging Brain Generation
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[655]  arXiv:2508.18246 (replaced) [pdf, ps, other]
Title: Flight-Ready Precise and Robust Carrier-Phase GNSS Navigation Software for Distributed Space Systems
Subjects: Systems and Control (eess.SY)
[656]  arXiv:2508.19891 (replaced) [pdf, ps, other]
Title: Simpler is Faster: Practical Distance Reporting by Sorting Along a Space-Filling Curve
Subjects: Computational Geometry (cs.CG)
[657]  arXiv:2509.00030 (replaced) [pdf, ps, other]
Title: SignBind-LLM: Multi-Stage Modality Fusion for Sign Language Translation
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[658]  arXiv:2509.01721 (replaced) [pdf, ps, other]
Title: Convolutional Monge Mapping between EEG Datasets to Support Independent Component Labeling
Comments: Code available at: this https URL; Accepted to NeurIPS 2025 Workshop on Learning from Time Series for Health
Subjects: Machine Learning (cs.LG)
[659]  arXiv:2509.03015 (replaced) [pdf, ps, other]
Title: Harnessing Batched BLAS/LAPACK Kernels on GPUs for Parallel Solutions of Block Tridiagonal Systems
Comments: 7 pages
Subjects: Mathematical Software (cs.MS)
[660]  arXiv:2509.04084 (replaced) [pdf, ps, other]
Title: Optimizing Frequent Checkpointing via Low-Cost Differential for Distributed Training Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[661]  arXiv:2509.04588 (replaced) [pdf, ps, other]
Title: Beyond Output Faithfulness: Learning Attributions that Preserve Computational Pathways
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[662]  arXiv:2509.04734 (replaced) [pdf, ps, other]
Title: Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[663]  arXiv:2509.04791 (replaced) [pdf, ps, other]
Title: What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
Subjects: Artificial Intelligence (cs.AI)
[664]  arXiv:2509.08765 (replaced) [pdf, ps, other]
Title: One-shot acceleration of transient PDE solvers via online-learned preconditioners
Comments: code available at this https URL
Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[665]  arXiv:2509.11418 (replaced) [pdf, ps, other]
Title: Mechanizing Synthetic Tait Computability in Istari
Subjects: Programming Languages (cs.PL)
[666]  arXiv:2509.12712 (replaced) [pdf, ps, other]
Title: A Lightweight Architecture for Multi-instrument Transcription with Practical Optimizations
Subjects: Sound (cs.SD); Information Retrieval (cs.IR)
[667]  arXiv:2509.12760 (replaced) [pdf, ps, other]
Title: Similarity-Distance-Magnitude Activations
Authors: Allen Schmaltz
Comments: 21 pages, 8 tables, 1 algorithm. arXiv admin note: substantial text overlap with arXiv:2502.20167
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[668]  arXiv:2509.13400 (replaced) [pdf, ps, other]
Title: Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[669]  arXiv:2509.16717 (replaced) [pdf, ps, other]
Title: Semi-Supervised Synthetic Data Generation with Fine-Grained Relevance Control for Short Video Search Relevance Modeling
Comments: Submitted to AAAI 2026
Subjects: Computation and Language (cs.CL)
[670]  arXiv:2509.18874 (replaced) [pdf, ps, other]
Title: When Ads Become Profiles: Uncovering the Invisible Risk of Web Advertising at Scale with LLMs
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[671]  arXiv:2509.19830 (replaced) [pdf, ps, other]
Title: On the Rate of Convergence of Kolmogorov-Arnold Network Regression Estimators
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[672]  arXiv:2509.21399 (replaced) [pdf, ps, other]
Title: Downscaling climate projections to 1 km with single-image super resolution
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[673]  arXiv:2509.21423 (replaced) [pdf, ps, other]
Title: Near-Optimal Experiment Design in Linear non-Gaussian Cyclic Models
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[674]  arXiv:2509.21584 (replaced) [pdf, ps, other]
Title: IndiSeek learns information-guided disentangled representations
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[675]  arXiv:2509.21670 (replaced) [pdf, ps, other]
Title: MORPH: PDE Foundation Models with Arbitrary Data Modality
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
[676]  arXiv:2509.22761 (replaced) [pdf, ps, other]
Title: MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
Comments: 21 pages,13 figures,9 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[677]  arXiv:2509.23808 (replaced) [pdf, ps, other]
Title: Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[678]  arXiv:2509.24894 (replaced) [pdf, ps, other]
Title: Improved Stochastic Optimization of LogSumExp
Comments: 19 pages, 5 figures, 1 table
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[679]  arXiv:2509.25723 (replaced) [pdf, ps, other]
Title: SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition
Comments: 23 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[680]  arXiv:2509.25998 (replaced) [pdf, ps, other]
Title: VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[681]  arXiv:2510.00002 (replaced) [pdf, ps, other]
Title: Formally and Empirically Verified Methodologies for Scalable Hierarchical Full-Stack Systems
Authors: Dong Liu
Comments: Significant revision: refined paper structure, updated FDR 4.2.7 machine-checked CSP models, strengthened LTL-based correctness verification, and incorporated a detailed description of the enterprise-grade experimental environment
Subjects: Software Engineering (cs.SE)
[682]  arXiv:2510.01012 (replaced) [pdf, ps, other]
Title: Random Feature Spiking Neural Networks
Comments: 37 pages incl. references & appendix, 6 figures, 6 tables
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[683]  arXiv:2510.01440 (replaced) [pdf, ps, other]
Title: Cobham's theorem for the Gaussian integers
Comments: 15 pages, 2 figures
Subjects: Number Theory (math.NT); Formal Languages and Automata Theory (cs.FL); Commutative Algebra (math.AC)
[684]  arXiv:2510.01523 (replaced) [pdf, ps, other]
Title: MetaSynth: Multi-Agent Metadata Generation from Implicit Feedback in Black-Box Systems
Comments: IEEE Big Data 2025
Subjects: Information Retrieval (cs.IR)
[685]  arXiv:2510.01903 (replaced) [pdf, ps, other]
Title: MelTok: 2D Tokenization for Single-Codebook Audio Compression
Comments: 11 pages, 6 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[686]  arXiv:2510.02067 (replaced) [pdf, ps, other]
Title: Adaptive Kernel Selection for Stein Variational Gradient Descent
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[687]  arXiv:2510.02333 (replaced) [pdf, ps, other]
Title: Human Mobility Datasets Enriched With Contextual and Social Dimensions
Comments: 5 pages, 3 figures, 1 table
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
[688]  arXiv:2510.02967 (replaced) [pdf, ps, other]
Title: Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[689]  arXiv:2510.03163 (replaced) [pdf, ps, other]
Title: ROGR: Relightable 3D Objects using Generative Relighting
Comments: NeurIPS 2025 Spotlight. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[690]  arXiv:2510.03731 (replaced) [pdf, ps, other]
Title: Optimizing Fine-Tuning through Advanced Initialization Strategies for Low-Rank Adaptation
Authors: Yongfu Xue
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[691]  arXiv:2510.05013 (replaced) [pdf, ps, other]
Title: Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration
Comments: 20 pages, 19 pages of supplementary material
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[692]  arXiv:2510.05497 (replaced) [pdf, ps, other]
Title: Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG)
[693]  arXiv:2510.06783 (replaced) [pdf, ps, other]
Title: TTRV: Test-Time Reinforcement Learning for Vision Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[694]  arXiv:2510.07594 (replaced) [pdf, ps, other]
Title: Locality-Sensitive Hashing-Based Efficient Point Transformer for Charged Particle Reconstruction
Comments: Accepted to NeurIPS 2025 Machine Learning and the Physical Sciences Workshop
Subjects: High Energy Physics - Experiment (hep-ex); Machine Learning (cs.LG)
[695]  arXiv:2510.07828 (replaced) [pdf, ps, other]
Title: MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions
Comments: Accepted to WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[696]  arXiv:2510.10062 (replaced) [pdf, ps, other]
Title: HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
Comments: Submitted to ICLR 2026
Subjects: Computation and Language (cs.CL)
[697]  arXiv:2510.12586 (replaced) [pdf, ps, other]
Title: There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-training
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[698]  arXiv:2510.13872 (replaced) [pdf, ps, other]
Title: Joint Discriminative-Generative Modeling via Dual Adversarial Training
Comments: V2: ImageNet 256x256 with ConvNeXt-Large (FID 3.29), formal theoretical analysis, L_inf training, computational analysis, expanded related work/baselines. Revised presentation. Code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[699]  arXiv:2510.15712 (replaced) [pdf, ps, other]
Title: PLS-complete problems with lexicographic cost functions: Max-$k$-SAT and Abelian Permutation Orbit Minimization
Comments: 29 pages
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
[700]  arXiv:2510.18870 (replaced) [pdf, ps, other]
Title: Triangle Multiplication Is All You Need For Biomolecular Structure Representations
Comments: Preprint
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)
[701]  arXiv:2510.19430 (replaced) [pdf, ps, other]
Title: GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Comments: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[702]  arXiv:2510.19667 (replaced) [pdf, ps, other]
Title: Dara: Automated multiple-hypothesis phase identification and refinement from powder X-ray diffraction
Subjects: Materials Science (cond-mat.mtrl-sci); Software Engineering (cs.SE); Data Analysis, Statistics and Probability (physics.data-an)
[703]  arXiv:2510.21245 (replaced) [pdf, ps, other]
Title: Convergence of Stochastic Gradient Langevin Dynamics in the Lazy Training Regime
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[704]  arXiv:2510.22379 (replaced) [pdf, ps, other]
Title: TraceTrans: Translation and Spatial Tracing for Surgical Prediction
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[705]  arXiv:2510.22890 (replaced) [pdf, ps, other]
Title: Reducing measurements in quantum erasure correction by quantum local recovery
Comments: 15 pages, 1 figure. Review of stabilizer codes is reused from arXiv:2505.18840. Version 2 corrected errors in original Proposition 3 (Theorem 3 and Proposition 4 in V2), Figure 1, and example computations in Section 6. Version 3 corrected confusing sentences in Remark 10 and several typos. No essential changes were made between versions 2 and 3
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
[706]  arXiv:2510.25121 (replaced) [pdf, ps, other]
Title: Bilevel Models for Adversarial Learning and A Case Study
Comments: This paper has been accepted by Mathematics
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[707]  arXiv:2510.25982 (replaced) [pdf, ps, other]
Title: Enabling Fast and Accurate Neutral Atom Readout through Image Denoising
Comments: 12 pages, 15 figures
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
[708]  arXiv:2511.00494 (replaced) [pdf, ps, other]
Title: An Indoor Radio Mapping Dataset Combining 3D Point Clouds and RSSI
Comments: 12 pages, 7 figures, 3 tables, under review to Nature Scientific Data
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
[709]  arXiv:2511.01437 (replaced) [pdf, ps, other]
Title: Designing for Distributed Heterogeneous Modularity: On Software Architecture and Deployment of MoonBots
Comments: 6 pages, 8 figures. Accepted at ISPARO 2025
Subjects: Robotics (cs.RO)
[710]  arXiv:2511.03070 (replaced) [pdf, ps, other]
Title: Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[711]  arXiv:2511.03928 (replaced) [pdf, ps, other]
Title: SynQuE: Estimating Synthetic Dataset Quality Without Annotations
Comments: Under review
Subjects: Machine Learning (cs.LG)
[712]  arXiv:2511.04439 (replaced) [pdf, ps, other]
Title: The Peril of Preference: Why GRPO fails on Ordinal Rewards
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[713]  arXiv:2511.04473 (replaced) [pdf, ps, other]
Title: Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
[714]  arXiv:2511.04479 (replaced) [pdf, ps, other]
Title: ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai
Comments: Accepted at IJCNLP-AACL 2025 (Main). This version includes the corrected Table 2 and an updated conclusion regarding the deletion count of the Gemma model
Subjects: Computation and Language (cs.CL)
[715]  arXiv:2511.04572 (replaced) [pdf, ps, other]
Title: Fisher Meets Lindahl: A Unified Duality Framework for Market Equilibrium
Comments: 51 pages. Abstract shortened to meet arXiv's requirement. Fixed a few typos
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
[716]  arXiv:2511.06208 (replaced) [pdf, ps, other]
Title: Resilience Inference for Supply Chains with Hypergraph Neural Network
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[717]  arXiv:2511.07093 (replaced) [pdf, ps, other]
Title: Stability of 0-dimensional persistent homology in enriched and sparsified point clouds
Comments: 18 pages, 13 figures. Clarified exposition and motivation. Comments welcome
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG)
[718]  arXiv:2511.07161 (replaced) [pdf, ps, other]
Title: LLMscape
Comments: Accepted to NeurIPS 2025, Creative AI Track (updated to include poster)
Subjects: Machine Learning (cs.LG)
[719]  arXiv:2511.07441 (replaced) [pdf, ps, other]
Title: AudAgent: Automated Auditing of Privacy Policy Compliance in AI Agents
Authors: Ye Zheng, Yidan Hu
Comments: Submitted to PETS'26 Issue 3
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[720]  arXiv:2511.08003 (replaced) [pdf, ps, other]
Title: Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning
Comments: The 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26) Poster
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[721]  arXiv:2511.09144 (replaced) [pdf, ps, other]
Title: Practical Global and Local Bounds in Gaussian Process Regression via Chaining
Comments: Accepted as a conference paper at AAAI2026
Subjects: Machine Learning (cs.LG)
[722]  arXiv:2511.09568 (replaced) [pdf, ps, other]
Title: VEDA: 3D Molecular Generation via Variance-Exploding Diffusion with Annealing
Subjects: Chemical Physics (physics.chem-ph); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[723]  arXiv:2511.10119 (replaced) [pdf, ps, other]
Title: Intelligence Foundation Model: A New Perspective to Approach Artificial General Intelligence
Authors: Borui Cai, Yao Zhao
Subjects: Artificial Intelligence (cs.AI)
[724]  arXiv:2511.10202 (replaced) [pdf, ps, other]
Title: Algorithms and Complexity of Hedge Cluster Deletion Problems
Comments: Added the kernelization complexity and corrected few statements
Subjects: Data Structures and Algorithms (cs.DS)
[725]  arXiv:2511.11520 (replaced) [pdf, ps, other]
Title: Scalable Policy Evaluation with Video World Models
Subjects: Robotics (cs.RO)
[726]  arXiv:2511.12063 (replaced) [pdf, ps, other]
Title: Bayesian Optimization in Language Space: An Eval-Efficient AI Self-Improvement Framework
Subjects: Artificial Intelligence (cs.AI)
[727]  arXiv:2511.12784 (replaced) [pdf, ps, other]
Title: Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing
Subjects: Computation and Language (cs.CL); Logic in Computer Science (cs.LO)
[728]  arXiv:2511.13240 (replaced) [pdf, ps, other]
Title: Incoherent Beliefs & Inconsistent Actions in Large Language Models
Subjects: Machine Learning (cs.LG)
[729]  arXiv:2511.13725 (replaced) [pdf, ps, other]
Title: AI Kill Switch for malicious web-based LLM agent
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[730]  arXiv:2511.14317 (replaced) [pdf, ps, other]
Title: Intervention Efficiency and Perturbation Validation Framework: Capacity-Aware and Robust Clinical Model Selection under the Rashomon Effect
Comments: Accepted to the Workshop on Navigating Model Uncertainty and the Rashomon Effect: From Theory and Tools to Applications and Impact (AAAI 2026)
Subjects: Machine Learning (cs.LG)
[731]  arXiv:2511.15371 (replaced) [pdf, ps, other]
Title: CID: Measuring Feature Importance Through Counterfactual Distributions
Comments: Accepted at Northern Lights Deep Learning (NLDL) 2026 Conference
Subjects: Machine Learning (cs.LG)
[732]  arXiv:2511.16139 (replaced) [pdf, ps, other]
Title: Multidimensional Rubric-oriented Reward Model Learning via Geometric Projection Reference Constraints
Subjects: Artificial Intelligence (cs.AI)
[733]  arXiv:2511.16275 (replaced) [pdf, ps, other]
Title: SeSE: A Structural Information-Guided Uncertainty Quantification Framework for Hallucination Detection in LLMs
Comments: 14 pages of main text and 10 pages of appendices;Submit to IEEE TKDE
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[734]  arXiv:2511.16708 (replaced) [pdf, ps, other]
Title: Multi-Agent Code Verification via Information Theory
Authors: Shreshth Rajan
Comments: 18 pages, 3 figures, 9 tables
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[735]  arXiv:2511.16857 (replaced) [pdf, ps, other]
Title: BOP-ASK: Object-Interaction Reasoning for Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[736]  arXiv:2511.17127 (replaced) [pdf, ps, other]
Title: Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
[737]  arXiv:2511.17147 (replaced) [src]
Title: A lightweight detector for real-time detection of remote sensing images
Comments: wrong results
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[738]  arXiv:2511.17191 (replaced) [pdf, ps, other]
Title: Independent sets and colorings of $K_{t,t,t}$-free graphs
Comments: 24 pages; 2 figures
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
[739]  arXiv:2511.17402 (replaced) [pdf, ps, other]
Title: PUCP-Metrix: An Open-source and Comprehensive Toolkit for Linguistic Analysis of Spanish Texts
Comments: 1 figure, Submitted to EACL Demo track (under review)
Subjects: Computation and Language (cs.CL)
[740]  arXiv:2511.17419 (replaced) [pdf, ps, other]
Title: DS-Span: Single-Phase Discriminative Subgraph Mining for Efficient Graph Embeddings
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[741]  arXiv:2511.17983 (replaced) [pdf, ps, other]
Title: An Adaptive Resonance Theory-based Topological Clustering Algorithm with a Self-Adjusting Vigilance Parameter
Comments: This manuscript is currently under review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[742]  arXiv:2511.18203 (replaced) [pdf, ps, other]
Title: SkillWrapper: Generative Predicate Invention for Skill Abstraction
Subjects: Robotics (cs.RO)
[743]  arXiv:2511.18533 (replaced) [pdf, ps, other]
Title: DE-KAN: A Kolmogorov Arnold Network with Dual Encoder for accurate 2D Teeth Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[744]  arXiv:2511.18685 (replaced) [pdf, ps, other]
Title: Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[745]  arXiv:2511.18723 (replaced) [src]
Title: N2N: A Parallel Framework for Large-Scale MILP under Distributed Memory
Comments: We cannot publish the paper at this time because some internal processes have not yet been completed
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
[746]  arXiv:2511.19035 (replaced) [pdf, ps, other]
Title: Changes in Gaza: DINOv3-Powered Multi-Class Change Detection for Damage Assessment in Conflict Zones
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[747]  arXiv:2511.19708 (replaced) [pdf, ps, other]
Title: An Accelerated Distributed Optimization with Equality and Inequality Coupling Constraints
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[748]  arXiv:2511.19730 (replaced) [pdf, ps, other]
Title: Training-Free Active Learning Framework in Materials Science with Large Language Models
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
[749]  arXiv:2511.20193 (replaced) [pdf, ps, other]
Title: Separating the Wheat from the Chaff: Understanding (In-)Completeness of Proof Mechanisms for Separation Logic with Inductive Definitions
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
[750]  arXiv:2511.20275 (replaced) [pdf, ps, other]
Title: HAFO: A Force-Adaptive Control Framework for Humanoid Robots in Intense Interaction Environments
Subjects: Robotics (cs.RO)
[751]  arXiv:2511.20395 (replaced) [pdf, ps, other]
Title: Identifying environmental factors associated with tetrodotoxin contamination in bivalve mollusks using eXplainable AI
Comments: 18 pages, 6 figures, submitted to npj Science of Food
Subjects: Machine Learning (cs.LG)
[752]  arXiv:2511.20785 (replaced) [pdf, ps, other]
Title: LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[753]  arXiv:2511.21164 (replaced) [pdf, ps, other]
Title: Preference-Aligned Options from Generative AI Compensates for Age-Related Cognitive Decline in Decision Making
Subjects: Human-Computer Interaction (cs.HC)
[754]  arXiv:2511.21319 (replaced) [pdf, ps, other]
Title: Analytical Phasor-Based Fault Location Enhancement for Wind Farm Collector Networks
Subjects: Systems and Control (eess.SY)
[755]  arXiv:2511.21523 (replaced) [pdf, ps, other]
Title: EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor?
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[756]  arXiv:2511.21620 (replaced) [pdf, ps, other]
Title: Mean-square exponential stability of exact and numerical solutions for neutral stochastic delay differential equations with Markovian switching
Comments: 23 pages, 4 figures; Added a missing institutional affiliation; other content unchanged
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
[757]  arXiv:2511.21750 (replaced) [pdf, ps, other]
Title: SO-Bench: A Structural Output Evaluation of Multimodal LLMs
Comments: v2 preprint. Fixed some typos, add a discussion about limitation, provide pseudo-codes for eval
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)
[758]  arXiv:2511.21908 (replaced) [src]
Title: Multi-Modal Machine Learning for Early Trust Prediction in Human-AI Interaction Using Face Image and GSR Bio Signals
Comments: This version contains errors in content presentation and arrangement, so it is being withdrawn until a corrected version is generated
Subjects: Machine Learning (cs.LG)
[759]  arXiv:2511.22037 (replaced) [pdf, ps, other]
Title: What AI Speaks for Your Community: Polling AI Agents for Public Opinion on Data Center Projects
Comments: 35 Pages. Accepted to NeurIPS 2025 Workshop on Socially Responsible and Trustworthy Foundation Models (ResponsibleFM)
Subjects: Computers and Society (cs.CY)
[760]  arXiv:2511.22148 (replaced) [pdf, ps, other]
Title: Towards Heterogeneous Quantum Federated Learning: Challenges and Solutions
Comments: Accepted at IEEE Network Magazine
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)
[761]  arXiv:2511.22244 (replaced) [src]
Title: Joint Scheduling of Workload Demand and Energy Supply in Low-carbon Data Centers with Decision-Dependent Uncertainty Set
Comments: The upper and lower bounds of the uncertainty set relied on in decision-making need to be reconsidered
Subjects: Systems and Control (eess.SY)
[762]  arXiv:2511.22254 (replaced) [pdf, ps, other]
Title: Co-Evolving Agents: Learning from Failures as Hard Negatives
Subjects: Artificial Intelligence (cs.AI)
[763]  arXiv:2511.22345 (replaced) [pdf, ps, other]
Title: Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[764]  arXiv:2511.22421 (replaced) [pdf, ps, other]
Title: Semantic-Aware Caching for Efficient Image Generation in Edge Computing
Subjects: Networking and Internet Architecture (cs.NI)
[765]  arXiv:2511.22809 (replaced) [pdf, ps, other]
Title: AI summaries in online search influence users' attitudes
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[766]  arXiv:2511.22861 (replaced) [pdf, ps, other]
Title: Escaping Barren Plateaus in Variational Quantum Algorithms Using Negative Learning Rate in Quantum Internet of Things
Comments: Accepted at IEEE Internet of Things Journal
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)
[767]  arXiv:2511.23002 (replaced) [pdf, ps, other]
Title: JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization
Comments: 31 pages, 18 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[768]  arXiv:2511.23075 (replaced) [pdf, ps, other]
Title: SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[769]  arXiv:2511.23256 (replaced) [pdf, ps, other]
Title: Robust HRRP Recognition under Interrupted Sampling Repeater Jamming using a Prior Jamming Information-Guided Network
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
[770]  arXiv:2512.00074 (replaced) [pdf, ps, other]
Title: Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[771]  arXiv:2512.00572 (replaced) [pdf, ps, other]
Title: Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[772]  arXiv:2512.00580 (replaced) [pdf, ps, other]
Title: Non-Asymptotic Convergence of Discrete Diffusion Models: Masked and Random Walk dynamics
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[773]  arXiv:2512.00626 (replaced) [pdf, ps, other]
Title: XAI-Driven Skin Disease Classification: Leveraging GANs to Augment ResNet-50 Performance
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[774]  arXiv:2512.00686 (replaced) [pdf, ps, other]
Title: Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks
Comments: 8 pages, preprint
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[775]  arXiv:2512.00770 (replaced) [pdf, ps, other]
Title: Rate-Splitting Multiple Access for Secure Near-Field Integrated Sensing and Communication
Comments: 13 pages and 6 figures
Subjects: Information Theory (cs.IT)
[776]  arXiv:2512.00774 (replaced) [pdf, ps, other]
Title: Hybrid Beamfocusing Design for RSMA-Enhanced Near-Field Secure Communications
Comments: 10 pages and 6 figures
Subjects: Information Theory (cs.IT)
[777]  arXiv:2512.01152 (replaced) [pdf, ps, other]
Title: Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[778]  arXiv:2512.01424 (replaced) [pdf, ps, other]
Title: ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models
Comments: 22 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[779]  arXiv:2512.01621 (replaced) [src]
Title: Ergodicity and invariant measure approximation of the stochastic Cahn-Hilliard equation via an explicit fully discrete scheme
Comments: This version contains errors that require substantial revision. I will resubmit a corrected version later
Subjects: Numerical Analysis (math.NA)
[780]  arXiv:2512.01710 (replaced) [pdf, ps, other]
Title: MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications
Authors: Stefano Zeppieri
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[781]  arXiv:2512.01973 (replaced) [pdf, ps, other]
Title: Bounded treewidth, multiple context-free grammars, and downward closures
Subjects: Formal Languages and Automata Theory (cs.FL)
[782]  arXiv:2512.02056 (replaced) [pdf, ps, other]
Title: Reversing Large Language Models for Efficient Training and Fine-Tuning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[783]  arXiv:2512.02200 (replaced) [pdf, ps, other]
Title: Modelling the Doughnut of social and planetary boundaries with frugal machine learning
Comments: Presented at the Rethinking AI Workshop @ EurIPS'25
Subjects: Machine Learning (cs.LG); General Economics (econ.GN)
[784]  arXiv:2512.02403 (replaced) [pdf, ps, other]
Title: ESACT: An End-to-End Sparse Accelerator for Compute-Intensive Transformers via Local Similarity
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
[785]  arXiv:2512.03025 (replaced) [pdf, ps, other]
Title: LORE: A Large Generative Model for Search Relevance
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[786]  arXiv:2512.03377 (replaced) [pdf, ps, other]
Title: Nexus: Higher-Order Attention Mechanisms in Transformers
Subjects: Computation and Language (cs.CL)
[787]  arXiv:2512.03397 (replaced) [pdf, ps, other]
Title: Surfel-LIO: Fast LiDAR-Inertial Odometry with Pre-computed Surfels and Hierarchical Z-order Voxel Hashing
Subjects: Robotics (cs.RO)
[788]  arXiv:2512.03405 (replaced) [pdf, ps, other]
Title: ViDiC: Video Difference Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[789]  arXiv:2512.03636 (replaced) [pdf, ps, other]
Title: Head, posture, and full-body gestures in dyadic conversations
Comments: 7 figures, 10 tables, 29 pages
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[790]  arXiv:2512.03762 (replaced) [pdf, ps, other]
Title: RoCo: Role-Based LLMs Collaboration for Automatic Heuristic Design
Subjects: Artificial Intelligence (cs.AI)
[791]  arXiv:2512.03767 (replaced) [pdf, ps, other]
Title: CaFTRA: Frequency-Domain Correlation-Aware Feedback-Free MIMO Transmission and Resource Allocation for 6G and Beyond
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT)
[792]  arXiv:2512.03771 (replaced) [pdf, ps, other]
Title: In-Context Representation Hijacking
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[793]  arXiv:2512.03783 (replaced) [pdf, ps, other]
Title: Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
[794]  arXiv:2512.03807 (replaced) [pdf, ps, other]
Title: Algorithms for Boolean Matrix Factorization using Integer Programming and Heuristics
Comments: 24 pages, 12 tables, 3 figures, 2 typos corrected in v2, code and data available from this https URL
Subjects: Information Retrieval (cs.IR); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
[795]  arXiv:2512.03887 (replaced) [pdf, ps, other]
Title: A Hierarchical Tree-based approach for creating Configurable and Static Deep Research Agent (Static-DRA)
Authors: Saurav Prateek
Comments: 16 pages, 6 figures, 4 tables. Code available at: this https URL
Subjects: Artificial Intelligence (cs.AI)
[796]  arXiv:2512.03906 (replaced) [pdf, ps, other]
Title: IBM Multilevel Process Mining vs de facto Object-Centric Process Mining approaches
Subjects: Databases (cs.DB)
[797]  arXiv:2512.03914 (replaced) [pdf, ps, other]
Title: Integrating High Performance In-Memory Data Streaming and In-Situ Visualization in Hybrid MPI+OpenMP PIC MC Simulations Towards Exascale
Comments: Accepted by The International Journal of High Performance Computing Applications (IJHPCA) prepared in English, formatted in SAGE Publications (LaTeX) template and consists of 22 pages, which includes the main text, references, and figures
Subjects: Plasma Physics (physics.plasm-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[798]  arXiv:2512.03915 (replaced) [pdf, ps, other]
Title: A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models
Authors: X.Y. Han, Yuan Zhong
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[799]  arXiv:2512.03932 (replaced) [pdf, ps, other]
Title: Beyond the Ground Truth: Enhanced Supervision for Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[800]  arXiv:2512.03963 (replaced) [pdf, ps, other]
Title: TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[801]  arXiv:2512.03977 (replaced) [pdf, ps, other]
Title: An Information Theory of Finite Abstractions and their Fundamental Scalability Limits
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT); Dynamical Systems (math.DS); Optimization and Control (math.OC)
[802]  arXiv:2512.04032 (replaced) [pdf, ps, other]
Title: Jina-VLM: Small Multilingual Vision Language Model
Comments: 18 pages, 1-7 main content, 13-18 appendix for tables and dataset
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[ total of 802 entries: 1-802 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2512, contact, help  (Access key information)