
Statistics

New submissions


New submissions for Thu, 5 Feb 26

[1]  arXiv:2602.03889 [pdf, ps, other]
Title: Transcendental Regularization of Finite Mixtures: Theoretical Guarantees and Practical Limitations
Authors: Ernest Fokoué
Comments: 24 pages, 6 figures, 2 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Finite mixture models are widely used for unsupervised learning, but maximum likelihood estimation via EM suffers from degeneracy as components collapse. We introduce transcendental regularization, a penalized likelihood framework with analytic barrier functions that prevent degeneracy while maintaining asymptotic efficiency. The resulting Transcendental Algorithm for Mixtures of Distributions (TAMD) offers strong theoretical guarantees: identifiability, consistency, and robustness. Empirically, TAMD successfully stabilizes estimation and prevents collapse, yet achieves only modest improvements in classification accuracy, highlighting fundamental limits of mixture models for unsupervised learning in high dimensions. Our work provides both a novel theoretical framework and an honest assessment of practical limitations, implemented in an open-source R package.

[2]  arXiv:2602.03896 [pdf, ps, other]
Title: A Hitchhiker's Guide to Poisson Gradient Estimation
Comments: Code: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation and Gumbel-SoftMax (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.
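
As an illustration of the arrival-time idea mentioned in the abstract, the sketch below draws Poisson counts by counting exponential inter-arrival times that land in the unit interval. It is the plain, non-differentiable simulation only, not the authors' modified EAT estimator; the function name, the `max_events` truncation, and its default are assumptions made for this example.

```python
import numpy as np


def sample_poisson_by_arrival_times(rate, num_samples, rng, max_events=None):
    """Draw Poisson(rate) counts by counting exponential arrivals in [0, 1].

    Hypothetical helper for illustration. The number of events of a rate-`rate`
    Poisson process in unit time is Poisson(rate), and the gaps between events
    are i.i.d. Exponential(rate).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if max_events is None:
        # Assumed truncation: far enough into the upper tail that the
        # probability of needing more events is negligible.
        max_events = int(np.ceil(rate + 10.0 * np.sqrt(rate) + 20.0))
    # One row of inter-arrival gaps per sample.
    gaps = rng.exponential(scale=1.0 / rate, size=(num_samples, max_events))
    arrival_times = np.cumsum(gaps, axis=1)
    # The count is the number of arrivals that occur before time 1.
    return np.sum(arrival_times <= 1.0, axis=1)
```

The hard indicator `arrival_times <= 1.0` is what blocks gradients with respect to `rate`; any relaxation of it is outside the scope of this sketch.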

[3]  arXiv:2602.03899 [pdf, ps, other]
Title: Byzantine Machine Learning: MultiKrum and an optimal notion of robustness
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)

Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule has been extensively studied, and endowed with formal robustness and convergence guarantees. Yet, MultiKrum, a natural extension of Krum, is often preferred in practice for its superior empirical performance, even though no theoretical guarantees were available until now. In this work, we provide the first proof that MultiKrum is a robust aggregation rule, and bound its robustness coefficient. To do so, we introduce $\kappa^\star$, the optimal *robustness coefficient* of an aggregation rule, which quantifies the accuracy of mean estimation in the presence of adversaries in a tighter manner compared with previously adopted notions of robustness. We then construct an upper and a lower bound on MultiKrum's robustness coefficient. As a by-product, we also improve on the best-known bounds on Krum's robustness coefficient. We show that MultiKrum's bounds are never worse than Krum's, and better in realistic regimes. We illustrate this analysis by an experimental investigation on the quality of the lower bound.
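
For readers unfamiliar with the rule, here is a minimal sketch of MultiKrum as it is commonly described: each vector is scored by the sum of squared distances to its n - f - 2 nearest neighbours, and the lowest-scoring vectors are averaged. The function and parameter names are assumptions for this example, and nothing here reproduces the paper's robustness analysis.

```python
import numpy as np


def multi_krum(vectors, num_byzantine, num_selected):
    """Average the `num_selected` vectors with the smallest Krum scores.

    Illustrative sketch; with num_selected=1 it reduces to Krum.
    """
    x = np.asarray(vectors, dtype=float)
    n = x.shape[0]
    num_neighbors = n - num_byzantine - 2
    if num_neighbors < 1:
        raise ValueError("need n - f - 2 >= 1")
    if not 1 <= num_selected <= n:
        raise ValueError("num_selected must be between 1 and n")
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = x[:, None, :] - x[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Sort each row; column 0 is the zero distance of a vector to itself.
    sorted_dists = np.sort(sq_dists, axis=1)
    scores = sorted_dists[:, 1:num_neighbors + 1].sum(axis=1)
    # Keep the vectors with the lowest scores and average them.
    selected = np.argsort(scores)[:num_selected]
    return x[selected].mean(axis=0), selected
```

A far-away vector accumulates a large score because all of its nearest neighbours are distant, so it is excluded from the average.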

[4]  arXiv:2602.03948 [pdf, ps, other]
Title: Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Statistics Theory (math.ST)

In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network. We adopt the $\beta$ model, which is the prototypical statistical model adopted for this form of aggregated relational information, and study the problem of minimax-optimal parameter estimation under both local and central differential privacy constraints. We establish finite sample minimax lower bounds that characterize the precise dependence of the estimation risk on the network size and the privacy parameters, and we propose simple estimators that achieve these bounds up to constants and logarithmic factors under both local and central differential privacy frameworks. Our results provide the first comprehensive finite sample characterization of privacy utility trade offs for parameter estimation in $\beta$ models, addressing the classical graph case and extending the analysis to higher order hypergraph models. We further demonstrate the effectiveness of our methods through experiments on synthetic data and a real world communication network.
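
For reference, the standard graph form of the $\beta$ model can be written as below. The symbols $A_{ij}$ (edge indicator) and $n$ (number of nodes) are notation assumed here, and the hypergraph extension studied in the paper is not shown.

```latex
\[
  \Pr(A_{ij} = 1) = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}},
  \qquad 1 \le i < j \le n,
\]
```

Edges are independent under this model, which is why the node degrees are sufficient statistics for $\beta$.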

[5]  arXiv:2602.03954 [pdf, ps, other]
Title: Learning Multi-type heterogeneous interacting particle systems
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

We propose a framework for the joint inference of network topology, multi-type interaction kernels, and latent type assignments in heterogeneous interacting particle systems from multi-trajectory data. This learning task is a challenging non-convex mixed-integer optimization problem, which we address through a novel three-stage approach. First, we leverage shared structure across agent interactions to recover a low-rank embedding of the system parameters via matrix sensing. Second, we identify discrete interaction types by clustering within the learned embedding. Third, we recover the network weight matrix and kernel coefficients through matrix factorization and a post-processing refinement. We provide theoretical guarantees with estimation error bounds under a Restricted Isometry Property (RIP) assumption and establish conditions for the exact recovery of interaction types based on cluster separability. Numerical experiments on synthetic datasets, including heterogeneous predator-prey systems, demonstrate that our method yields an accurate reconstruction of the underlying dynamics and is robust to noise.

[6]  arXiv:2602.03970 [pdf, ps, other]
Title: Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Metric Geometry (math.MG); Statistics Theory (math.ST)

We study the statistical behaviour of reasoning probes in a stylized model of looped reasoning, given by Boolean circuits whose computational graph is a perfect $\nu$-ary tree ($\nu\ge 2$) and whose output is appended to the input and fed back iteratively for subsequent computation rounds. A reasoning probe has access to a sampled subset of internal computation nodes, possibly without covering the entire graph, and seeks to infer which $\nu$-ary Boolean gate is executed at each queried node, representing uncertainty via a probability distribution over a fixed collection of $\mathtt{m}$ admissible $\nu$-ary gates. This partial observability induces a generalization problem, which we analyze in a realizable, transductive setting.
We show that, when the reasoning probe is parameterized by a graph convolutional network (GCN)-based hypothesis class and queries $N$ nodes, the worst-case generalization error attains the optimal rate $\mathcal{O}(\sqrt{\log(2/\delta)}/\sqrt{N})$ with probability at least $1-\delta$, for $\delta\in (0,1)$. Our analysis combines snowflake metric embedding techniques with tools from statistical optimal transport. A key insight is that this optimal rate is achievable independently of graph size, owing to the existence of a low-distortion one-dimensional snowflake embedding of the induced graph metric. As a consequence, our results provide a sharp characterization of how structural properties of the computational graph govern the statistical efficiency of reasoning under partial access.

[7]  arXiv:2602.03972 [pdf, ps, other]
Title: Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic Factors
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning, which has two flavors: the fixed-budget setting (FB) and the fixed-confidence setting (FC).
For $K$-armed bandits with a unique best arm, the optimal sample complexities for both settings have been settled, and they match up to logarithmic factors.
This prompts an interesting research question about the generic, potentially structured BAI problems: Is FB harder than FC or the other way around?
In this paper, we show that FB is no harder than FC up to logarithmic factors.
We do this constructively: we propose a novel algorithm called FC2FB (fixed confidence to fixed budget), a meta-algorithm that takes in an FC algorithm $\mathcal{A}$ and turns it into an FB algorithm.
We prove that FC2FB enjoys a sample complexity that matches that of $\mathcal{A}$ up to logarithmic factors.
This means that the optimal FC sample complexity is an upper bound on the optimal FB sample complexity, up to logarithmic factors.
Our result not only reveals a fundamental relationship between FB and FC, but also has a significant implication: FC2FB, combined with existing state-of-the-art FC algorithms, leads to improved sample complexity for a number of FB problems.

[8]  arXiv:2602.03985 [pdf, ps, other]
Title: Doubly-Robust Bayesian Estimation of Optimal Individualized Treatment Rules using Network Meta-Analysis
Subjects: Methodology (stat.ME); Applications (stat.AP)

An optimal individualized treatment rule (ITR) is a function that takes a patient's characteristics, such as demographics, biomarkers, and treatment history, and outputs a treatment that is expected to give the best outcome for that patient. Major Depressive Disorder (MDD) is a common and disabling mental health condition for which an optimal ITR is of interest. Unfortunately, the power to detect treatment-covariate interactions in individual studies of MDD treatments is low. Additionally, not all treatments of interest are compared head-to-head in a single study. Network meta-analysis (NMA) is a method of synthesizing data from multiple studies to estimate the relative effects of a set of treatments. Recently, two-stage ITR NMA was proposed as a method to estimate ITRs that has the potential to improve power and simultaneously consider all relevant treatment options. In the first stage, study-specific ITRs are estimated, and in the second stage, they are pooled using a Bayesian NMA model. The existing approach is vulnerable to model misspecification and fails to address missing outcomes, which occur in the MDD data. We overcome these challenges by proposing Bayesian Bootstrap dynamic Weighted Ordinary Least Squares (BBdWOLS), a doubly-robust approach to ITR estimation that accounts for missing-at-random outcomes and naturally quantifies the uncertainty in estimation. We also propose an improvement to the NMA model that incorporates the full variance-covariance matrix of study-specific estimates. In a simulation study, we show that our fully Bayesian ITR NMA method is more robust and efficient than the existing approach. We apply our method to the motivating dataset consisting of three studies of pharmacological treatments for MDD, and explore how ITR NMA results can support personalized decision making in this context.

[9]  arXiv:2602.04010 [pdf, ps, other]
Title: Robust Nonparametric Two-Sample Tests via Mutual Information using Extended Bregman Divergence
Authors: Arijit Pyne
Subjects: Methodology (stat.ME)

We introduce a generalized formulation of mutual information (MI) based on the extended Bregman divergence, a framework that subsumes the generalized S-Bregman (GSB) divergence family. The GSB divergence unifies two important classes of statistical distances, namely the S-divergence and the Bregman exponential divergence (BED), thereby encompassing several widely used subfamilies, including the power divergence (PD), density power divergence (DPD), and S-Hellinger distance (S-HD). In parametric inference, minimum divergence estimators are well known to balance robustness with high asymptotic efficiency relative to the maximum likelihood estimator. However, nonparametric tests based on such statistical distances have been relatively less explored. In this paper, we construct a class of consistent and robust nonparametric two-sample tests for the equality of two absolutely continuous distributions using the generalized MI. We establish the asymptotic normality of the proposed test statistics under the null and contiguous alternatives. The robustness properties of the generalized MI are rigorously studied through the influence function and the breakdown point, demonstrating that stability of the generalized MI translates into stability of the associated tests. Extensive simulation studies show that divergences beyond the PD family often yield superior robustness under contamination while retaining high asymptotic power. A data-driven scheme for selecting optimal tuning parameters is also proposed. Finally, the methodology is illustrated with applications to real data.

[10]  arXiv:2602.04077 [pdf, ps, other]
Title: Efficient Subgroup Analysis via Optimal Trees with Global Parameter Fusion
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Identifying and making statistical inferences on differential treatment effects (commonly known as subgroup analysis in clinical research) is central to precision health. Subgroup analysis allows practitioners to pinpoint populations for whom a treatment is especially beneficial or protective, thereby advancing targeted interventions. Tree based recursive partitioning methods are widely used for subgroup analysis due to their interpretability. Nevertheless, these approaches encounter significant limitations, including suboptimal partitions induced by greedy heuristics and overfitting from locally estimated splits, especially under limited sample sizes. To address these limitations, we propose a fused optimal causal tree method that leverages mixed integer optimization (MIO) to facilitate precise subgroup identification. Our approach ensures globally optimal partitions and introduces a parameter fusion constraint to facilitate information sharing across related subgroups. This design substantially improves subgroup discovery accuracy and enhances statistical efficiency. We provide theoretical guarantees by rigorously establishing out of sample risk bounds and comparing them with those of classical tree based methods. Empirically, our method consistently outperforms popular baselines in simulations. Finally, we demonstrate its practical utility through a case study on the Health and Aging Brain Study Health Disparities (HABS-HD) dataset, where our approach yields clinically meaningful insights.

[11]  arXiv:2602.04092 [pdf, ps, other]
Title: Time-to-Event Estimation with Unreliably Reported Events in Medicare Health Plan Payment
Comments: 36 pages, 9 figures
Subjects: Applications (stat.AP); Econometrics (econ.EM); Methodology (stat.ME)

Time-to-event estimation (i.e., survival analysis) is common in health research, most often using methods that assume proportional hazards and no competing risks. Because both assumptions are frequently invalid, estimators more aligned with real-world settings have been proposed. An effect can be estimated as the difference in areas below the cumulative incidence functions of two groups up to a pre-specified time point. This approach, restricted mean time lost (RMTL), can be used in settings with competing risks as well. We extend RMTL estimation for use in an understudied health policy application in Medicare. Medicare currently supports healthcare payment for over 69 million beneficiaries, most of whom are enrolled in Medicare Advantage plans and receive insurance from private insurers. These insurers are prospectively paid by the federal government for each of their beneficiaries' anticipated health needs using an ordinary least squares linear regression algorithm. As all coefficients are positive and predictor variables are largely insurer-submitted health conditions, insurers are incentivized to upcode, or report more diagnoses than may be accurate. Such gaming is projected to cost the federal government $40 billion in 2025 alone without clear benefit to beneficiaries. We propose several novel estimators of coding intensity and possible upcoding in Medicare Advantage, including accounting for unreliable reporting. We demonstrate estimator performance in simulated data leveraging the National Institutes of Health's All of Us study and also develop an open source R package to simulate realistic labeled upcoding data, which were not previously available.
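
The RMTL contrast described in the abstract can be written out as follows. The notation is assumed for this note: $F_g$ is the cumulative incidence function of the event of interest in group $g \in \{0, 1\}$ and $\tau$ is the pre-specified time point.

```latex
\[
  \mathrm{RMTL}_g(\tau) = \int_0^{\tau} F_g(t)\,dt,
  \qquad
  \Delta(\tau) = \mathrm{RMTL}_1(\tau) - \mathrm{RMTL}_0(\tau).
\]
```

The paper's extensions for unreliably reported events are not captured by this basic form.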

[12]  arXiv:2602.04124 [pdf, ps, other]
Title: Privacy Amplification for Synthetic data using Range Restriction
Comments: 25 pages, 20 figures
Subjects: Methodology (stat.ME)

We introduce a new class of range restricted formal data privacy standards that condition on owner beliefs about sensitive data ranges. By incorporating this additional information, we can provide a stronger privacy guarantee (i.e., an amplification). The range restricted formal privacy standards protect only a subset (or ball) of data values and exclude ranges (or balls) believed to be already publicly known. The privacy standards are designed for the risk-weighted pseudo posterior (model) mechanism (PPM) used to generate synthetic data under an asymptotic differential privacy (aDP) guarantee. The PPM downweights the likelihood contribution for each record proportionally to its disclosure risk. The PPM is adapted to include these beliefs by adjusting the risk-weighted pseudo likelihood. We introduce two alternative adjustments. The first expresses data owner knowledge of the sensitive range as a probability, $\lambda$, that a datum value drawn from the underlying generating distribution lies outside the ball or subspace of values that are sensitive. The portion of each datum likelihood contribution deemed sensitive is then $(1-\lambda) \leq 1$ and is the only portion of the likelihood subject to risk down-weighting. The second adjustment encodes knowledge as the difference in probability masses $P(R) \leq 1$ between the edges of the sensitive range, $R$. We use the resulting conditional (pseudo) likelihood for a sensitive record, which boosts its worst case tail values away from 0. We compare privacy and utility properties for the PPM under the aDP and range restricted privacy standards.

[13]  arXiv:2602.04125 [pdf, ps, other]
Title: Attack-Resistant Uniform Fairness for Linear and Smooth Contextual Bandits
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

Modern systems, such as digital platforms and service systems, increasingly rely on contextual bandits for online decision-making; however, their deployment can inadvertently create unfair exposure among arms, undermining long-term platform sustainability and supplier trust. This paper studies the contextual bandit problem under a uniform $(1-\delta)$-fairness constraint, and addresses its unique vulnerabilities to strategic manipulation. The fairness constraint ensures that preferential treatment is strictly justified by an arm's actual reward across all contexts and time horizons, using uniformity to prevent statistical loopholes. We develop novel algorithms that achieve (nearly) minimax-optimal regret for both linear and smooth reward functions, while maintaining strong $(1-\tilde{O}(1/T))$-fairness guarantees, and further characterize the theoretically inherent yet asymptotically marginal "price of fairness". However, we reveal that such merit-based fairness becomes uniquely susceptible to signal manipulation. We show that an adversary with a minimal $\tilde{O}(1)$ budget can not only degrade overall performance as in traditional attacks, but also selectively induce insidious fairness-specific failures while leaving conspicuous regret measures largely unaffected. To counter this, we design robust variants incorporating corruption-adaptive exploration and error-compensated thresholding. Our approach yields the first minimax-optimal regret bounds under $C$-budgeted attack while preserving $(1-\tilde{O}(1/T))$-fairness. Numerical experiments and a real-world case demonstrate that our algorithms sustain both fairness and efficiency.

[14]  arXiv:2602.04146 [pdf, ps, other]
Title: Bayes, E-values and Testing
Subjects: Statistics Theory (math.ST)

This paper studies relationships between Kolmogorov complexity, Shannon entropy, Bayes factors, E-values, and exchangeability testing. The focus is on negative log marginal or predictive probabilities, what I.J. Good termed the "weight of evidence", as a common evidence statistic linking coding, prediction, and sequential testing. The paper reviews the relevant information-theoretic and martingale tools, and discusses exchangeability testing via conformal e-prediction.

[15]  arXiv:2602.04155 [pdf, ps, other]
Title: Maximin Relative Improvement: Fair Learning as a Bargaining Problem
Subjects: Machine Learning (stat.ML); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. This game-theoretic perspective reveals that existing robust optimization methods such as minimizing worst-group loss or regret correspond to classical bargaining solutions and embody different fairness principles. We propose relative improvement, the ratio of actual risk reduction to potential reduction from a baseline predictor, which recovers the Kalai-Smorodinsky solution. Unlike absolute-scale methods that may not be comparable when groups have different potential predictability, relative improvement provides axiomatic justification including scale invariance and individual monotonicity. We establish finite-sample convergence guarantees under mild conditions.
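
To make the ratio in the abstract concrete, the sketch below computes one plausible reading of relative improvement for a finite set of candidate predictors and picks the candidate whose worst group fares best. Treating "potential reduction" as baseline risk minus the best achievable group risk is an assumption of this example, as are all names; the paper's exact definition may differ.

```python
import numpy as np


def maximin_relative_improvement(risks, baseline_risks, best_risks):
    """Pick the candidate maximizing the minimum relative improvement.

    risks:          array (num_candidates, num_groups) of per-group risks.
    baseline_risks: array (num_groups,) of risks of the baseline predictor.
    best_risks:     array (num_groups,) of best achievable per-group risks
                    (assumed known here, for illustration only).
    """
    risks = np.asarray(risks, dtype=float)
    baseline = np.asarray(baseline_risks, dtype=float)
    best = np.asarray(best_risks, dtype=float)
    potential = baseline - best
    if np.any(potential <= 0):
        raise ValueError("each group needs a positive potential reduction")
    # Actual risk reduction divided by potential reduction, per group.
    relative = (baseline - risks) / potential
    # Maximin: compare candidates by their worst-off group.
    worst_group = relative.min(axis=1)
    return int(np.argmax(worst_group)), relative
```

Because each group's improvement is divided by its own potential, rescaling one group's loss leaves the comparison unchanged, which is the scale invariance the abstract refers to.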

[16]  arXiv:2602.04178 [pdf, ps, other]
Title: Sparse group principal component analysis via double thresholding with application to multi-cellular programs
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)

Multi-cellular programs (MCPs) are coordinated patterns of gene expression across interacting cell types that collectively drive complex biological processes such as tissue development and immune responses. While MCPs are typically estimated from high-dimensional gene expression data using methods like sparse principal component analysis or latent factor models, these approaches often suffer from high computational costs and limited statistical power. In this work, we propose Sparse Group Principal Component Analysis (SGPCA) to estimate MCPs by leveraging their inherent group and individual sparsity. We introduce an efficient double-thresholding algorithm based on power iteration. In each iteration, a group thresholding step first identifies relevant gene groups, followed by an individual thresholding step to select active cell types. This algorithm achieves a linear computational complexity of $O(np)$, making it highly efficient and scalable for large-scale genomic analyses. We establish theoretical guarantees for SGPCA, including statistical consistency and a convergence rate that surpasses competing methods. Through extensive simulations, we demonstrate that SGPCA achieves superior estimation accuracy and improved statistical power for signal detection. Furthermore, we apply SGPCA to a Lupus study, discovering differentially expressed MCPs distinguishing Lupus patients from normal subjects.

[17]  arXiv:2602.04230 [pdf, ps, other]
Title: Validating Causal Message Passing Against Network-Aware Methods on Real Experiments
Subjects: Methodology (stat.ME); Econometrics (econ.EM)

Estimating total treatment effects in the presence of network interference typically requires knowledge of the underlying interaction structure. However, in many practical settings, network data is either unavailable, incomplete, or measured with substantial error. We demonstrate that causal message passing, a methodology that leverages temporal structure in outcome data rather than network topology, can recover total treatment effects comparable to network-aware approaches. We apply causal message passing to two large-scale field experiments where a recently developed bipartite graph methodology, which requires network knowledge, serves as a benchmark. Despite having no access to the interaction network, causal message passing produces effect estimates that match the network-aware approach in direction across all metrics and in statistical significance for the primary decision metric. Our findings validate the premise of causal message passing: that temporal variation in outcomes can serve as an effective substitute for network observation when estimating spillover effects. This has important practical implications: practitioners facing settings where network data is costly to collect, proprietary, or unreliable can instead exploit the temporal dynamics of their experimental data.

[18]  arXiv:2602.04233 [pdf, ps, other]
Title: Provable Target Sample Complexity Improvements as Pre-Trained Models Scale
Comments: AISTATS2026
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Pre-trained models have become indispensable for efficiently building models across a broad spectrum of downstream tasks. The advantages of pre-trained models have been highlighted by empirical studies on scaling laws, which demonstrate that larger pre-trained models can significantly reduce the sample complexity of downstream learning. However, existing theoretical investigations of pre-trained models lack the capability to explain this phenomenon. In this paper, we provide a theoretical investigation by introducing a novel framework, caulking, inspired by parameter-efficient fine-tuning (PEFT) methods such as adapter-based fine-tuning, low-rank adaptation, and partial fine-tuning. Our analysis establishes that improved pre-trained models provably decrease the sample complexity of downstream tasks, thereby offering theoretical justification for the empirically observed scaling laws relating pre-trained model size to downstream performance, a relationship not covered by existing results.

[19]  arXiv:2602.04272 [pdf, ps, other]
Title: Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications
Comments: 27 pages, 6 figures. Submitted to Bayesian Analysis
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Methodology (stat.ME)

The Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI), tightening the standard ELBO and mitigating the mode-seeking behaviour.
However, optimizing the IW-ELBO in Euclidean space is often inefficient, as its gradient estimators suffer from a vanishing signal-to-noise ratio (SNR). This paper formulates the optimisation of the IW-ELBO in Bures-Wasserstein space, a manifold of Gaussian distributions equipped with the 2-Wasserstein metric. We derive the Wasserstein gradient of the IW-ELBO and project it onto the Bures-Wasserstein space to yield a tractable algorithm for Gaussian VI.
A pivotal contribution of our analysis concerns the stability of the gradient estimator. While the SNR of the standard Euclidean gradient estimator is known to vanish as the number of importance samples $K$ increases, we prove that the SNR of the Wasserstein gradient scales favourably as $\Omega(\sqrt{K})$, ensuring optimisation efficiency even for large $K$. We further extend this geometric analysis to the Variational Rényi Importance-Weighted Autoencoder bound, establishing analogous stability guarantees. Experiments demonstrate that the proposed framework achieves superior approximation performance compared to other baselines.
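
For context, the IW-ELBO is usually written as below. The symbols are assumptions of this note rather than the paper's notation: $p(x, z)$ is the joint density, $q$ the variational distribution, and $K$ the number of importance samples.

```latex
\[
  \mathcal{L}_K(q)
  = \mathbb{E}_{z_1, \dots, z_K \overset{\text{iid}}{\sim} q}
    \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k)} \right],
  \qquad
  \mathcal{L}_1(q) = \mathrm{ELBO}(q) \le \mathcal{L}_K(q) \le \log p(x).
\]
```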

[20]  arXiv:2602.04302 [pdf, ps, other]
Title: Phase Transition of Spectral Fluctuations in Large Gram Matrices with a Variance Profile: A Unified Framework for Sparse CLTs
Comments: 25 pages, 4 figures
Subjects: Statistics Theory (math.ST)

We study the asymptotic spectral behavior of high-dimensional random Gram matrices with sparsity and a given variance profile, motivated by applications in wireless communication. Specifically, we consider the Gram matrices $\mathbf S_n=\mathbf Y_n\mathbf Y_n^*$, where the entries of $\mathbf Y_n$ are independent, centered, heteroscedastic, and sparse through Bernoulli masking. The sparsity level is parameterized as $s=q^2/n$, with $q$ ranging from polynomial order to order $n^{1/2}$.
We investigate two asymptotic regimes in a high-dimensional framework: a moderate-sparsity regime with fixed $s\in(0,1]$, and a high-sparsity regime where $s\to0$. In both regimes, we establish the convergence of the empirical spectral distribution of $\mathbf S_n$ to a deterministic limit, and further derive central limit theorems for linear spectral statistics using resolvent techniques and martingale difference arguments. Our analysis reveals a phase transition in the fluctuation behavior across the two regimes. In the high-sparsity regime, the asymptotic fluctuations are governed by fourth-moment effects, with sparsity-scaled contributions being suppressed. Moreover, a mismatch between the scaling of the mean and variance, of different orders in $q$, necessitates an explicit correction in the centering of the linear spectral statistic. The theory applies to both Gaussian and non-Gaussian entries, and its statistical utility is illustrated through applications to hypothesis testing and outage probability analysis in large-scale MIMO systems.

[21]  arXiv:2602.04318 [pdf, ps, other]
Title: Accurate and Efficient Approximation of the Null Distribution of Rao's Spacing Test
Comments: 10 pages
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Rao's spacing test is a widely used nonparametric method for assessing uniformity on the circle. However, its broader applicability in practical settings has been limited because the null distribution is not easily calculated. As a result, practitioners have traditionally depended on pre-tabulated critical values computed for a limited set of sample sizes, which restricts the flexibility and generality of the method. In this paper, we address this limitation by recursively computing higher-order moments of Rao's spacing test statistic and employing the Gram-Charlier expansion to derive an accurate approximation to its null distribution. This approach allows for the efficient and direct computation of p-values for arbitrary sample sizes, thereby eliminating the dependency on existing critical value tables. Moreover, we confirm that our method remains accurate and effective even for large sample sizes that are not represented in current tables, thus overcoming a significant practical limitation. Comparative evaluations with published critical values and saddlepoint approximations demonstrate that our method achieves a high degree of accuracy across a wide range of sample sizes. These findings greatly improve the practicality and usability of Rao's spacing test in both theoretical investigations and applied statistical analyses.
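
As background, the test statistic itself is simple to compute; it is the null distribution that is hard. The sketch below computes the statistic in its usual degree-based form, half the summed absolute deviation of the circular spacings from 360/n. The function name is an assumption, and the paper's moment recursion and Gram-Charlier approximation are not implemented here.

```python
import numpy as np


def rao_spacing_statistic(angles_deg):
    """Rao's spacing statistic U for angles given in degrees.

    Illustrative helper: U = 0.5 * sum_i |T_i - 360/n|, where T_i are the
    gaps between consecutive sorted angles, including the wrap-around gap.
    """
    angles = np.sort(np.mod(np.asarray(angles_deg, dtype=float), 360.0))
    n = angles.size
    if n < 2:
        raise ValueError("need at least two angles")
    # Appending the first angle plus 360 yields the wrap-around spacing.
    spacings = np.diff(angles, append=angles[0] + 360.0)
    return 0.5 * np.sum(np.abs(spacings - 360.0 / n))
```

Perfectly even spacing gives U = 0, while n coincident points give the maximum value 360(1 - 1/n); large values are evidence against uniformity.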

[22]  arXiv:2602.04322 [pdf, other]
Title: Exact Multiple Change-Point Detection Via Smallest Valid Partitioning
Authors: Vincent Runge (LaMME), Anica Kostic (LSE), Alexandre Combeau (LaMME), Gaetano Romano
Subjects: Methodology (stat.ME)

We introduce smallest valid partitioning (SVP), a segmentation method for multiple change-point detection in time-series. SVP relies on a local notion of segment validity: a candidate segment is retained only if it passes a user-chosen validity test (e.g., a single change-point test). From the collection of valid segments, we propose a coherent aggregation procedure that constructs a global segmentation which is the exact solution of an optimization problem. Our main contribution is the use of a lexicographic order for the optimization problem that prioritizes parsimony. We analyze the computational complexity of the resulting procedure, which ranges from linear to cubic time depending on the chosen cost and validity functions, the data regime and the number of detected changes. Finally, we assess the quality of SVP through comparisons with standard optimal partitioning algorithms, showing that SVP yields competitive segmentations while explicitly enforcing segment validity. The flexibility of SVP makes it applicable to a broad class of problems; as an illustration, we demonstrate robust change-point detection by encoding robustness in the validity criterion.

[23]  arXiv:2602.04335 [pdf, other]
Title: Geometry-Aware Optimal Transport: Fast Intrinsic Dimension and Wasserstein Distance Estimation
Authors: Ferdinand Genans (SU, LPSM), Olivier Wintenberger (SU, LPSM)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Solving large scale Optimal Transport (OT) in machine learning typically relies on sampling measures to obtain a tractable discrete problem. While the discrete solver's accuracy is controllable, the rate of convergence of the discretization error is governed by the intrinsic dimension of our data. Therefore, the true bottleneck is the knowledge and control of the sampling error. In this work, we tackle this issue by introducing novel estimators for both sampling error and intrinsic dimension. The key finding is a simple, tuning-free estimator of $\text{OT}_c(\rho, \hat\rho)$ that utilizes the semi-dual OT functional and, remarkably, requires no OT solver. Furthermore, we derive a fast intrinsic dimension estimator from the multi-scale decay of our sampling error estimator. This framework unlocks significant computational and statistical advantages in practice, enabling us to (i) quantify the convergence rate of the discretization error, (ii) calibrate the entropic regularization of Sinkhorn divergences to the data's intrinsic geometry, and (iii) introduce a novel, intrinsic-dimension-based Richardson extrapolation estimator that strongly debiases Wasserstein distance estimation. Numerical experiments demonstrate that our geometry-aware pipeline effectively mitigates the discretization error bottleneck while maintaining computational efficiency.

[24]  arXiv:2602.04347 [pdf, ps, other]
Title: A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization
Comments: Accepted for publication in INFORMS Transactions on Education
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In recent years, instructional practices in Operations Research (OR), Management Science (MS), and Analytics have increasingly shifted toward digital environments, where large and diverse groups of learners make it difficult to provide practice that adapts to individual needs. This paper introduces a method that generates personalized sequences of exercises by selecting, at each step, the exercise most likely to advance a learner's understanding of a targeted skill. The method uses information about the learner and their past performance to guide these choices, and learning progress is measured as the change in estimated skill level before and after each exercise. Using data from an online mathematics tutoring platform, we find that the approach recommends exercises associated with greater skill improvement and adapts effectively to differences across learners. From an instructional perspective, the framework enables personalized practice at scale, highlights exercises with consistently strong learning value, and helps instructors identify learners who may benefit from additional support.

[25]  arXiv:2602.04353 [pdf, ps, other]
Title: Anyone for chess? Analysing chess ratings above high thresholds
Authors: Nils Lid Hjort
Comments: 9 pages, 7 figures
Subjects: Other Statistics (stat.OT)

Suppose some cleverness score parameter is sufficiently interesting to be defined and then measured, perhaps for different strata of specialists or for the broader population. Such phenomena could have Gaussian distributions, when it comes to all players in a stratum, but when interest focuses on the very tails, for the top few percent, those above certain high thresholds, different models are called for, along with the need to analyse such based on the listed top scores only. In this note I develop such models and tools, and apply them to the top-100 and above 2100 points lists for regular chess ratings, for the currently active 14671 men and 753 women, as given by the FIDE, January 2026. It is argued that even when two or more distributions have close to identical expected values, or medians, even smaller differences in variance may explain gaps for the few very best ones.
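
The variance argument can be illustrated with a small calculation. The sketch below assumes two Gaussian populations with the same mean and slightly different standard deviations, and compares their tail probabilities above a threshold. The Gaussian form is used only for illustration and the helper names are hypothetical; the note itself develops different models for the tails.

```python
import math

def normal_tail(threshold, mean, sd):
    """P(X > threshold) for X ~ Normal(mean, sd^2)."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2.0)))

def tail_ratio(threshold, mean, sd_a, sd_b):
    """Ratio of tail probabilities of two equal-mean normals with different spreads."""
    return normal_tail(threshold, mean, sd_a) / normal_tail(threshold, mean, sd_b)

# Hypothetical numbers (not taken from the note): equal means, a 5% gap in spread.
for threshold in (2100, 2300, 2500):
    print(threshold, tail_ratio(threshold, mean=1500, sd_a=300, sd_b=285))
```

Under these assumptions the ratio is above one at every threshold and grows as the threshold rises, so a modest gap in spread becomes more visible the further out in the tail one looks.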

[26]  arXiv:2602.04364 [pdf, ps, other]
Title: Anytime-Valid Conformal Risk Control
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Prediction sets provide a means of quantifying the uncertainty in predictive tasks. Using held out calibration data, conformal prediction and risk control can produce prediction sets that exhibit statistically valid error control in a computationally efficient manner. However, in the standard formulations, the error is only controlled on average over many possible calibration datasets of fixed size. In this paper, we extend the control to remain valid with high probability over a cumulatively growing calibration dataset at any time point. We derive such guarantees using quantile-based arguments and illustrate the applicability of the proposed framework to settings involving distribution shift. We further establish a matching lower bound and show that our guarantees are asymptotically tight. Finally, we demonstrate the practical performance of our methods through both simulations and real-world numerical examples.
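
As background, the standard fixed-size calibration step that this abstract starts from can be sketched as follows. This is ordinary split conformal prediction with a calibration set of fixed size n, where the threshold is the ceil((n + 1)(1 - alpha))-th smallest calibration score. It is not the anytime-valid procedure of the paper, and the function name is illustrative.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal score threshold for target miscoverage alpha."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    # Rank of the finite-sample-corrected quantile among the calibration scores.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: only the trivial set is valid.
        return np.inf
    return scores[k - 1]

# A prediction set then keeps every candidate label whose score is <= the threshold.
```

The guarantee attached to this threshold holds on average over calibration sets of one fixed size, which is the limitation the abstract describes for growing calibration data.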

[27]  arXiv:2602.04400 [pdf, ps, other]
Title: Unit Shiha Distribution and its Applications to Engineering and Medical Data
Authors: F. A. Shiha
Subjects: Methodology (stat.ME); Probability (math.PR); Statistics Theory (math.ST)

There is a growing need for flexible statistical distributions that can accurately model data defined on the unit interval. This paper introduces a new unit distribution, termed the unit Shiha (USh) distribution, which is derived from the original Shiha (Sh) distribution through an inverse exponential transformation. The probability density function of the USh distribution is sufficiently flexible to model both left- and right-skewed data, while its hazard rate function is capable of capturing various failure-rate patterns, including increasing, bathtub-shaped, and J-shaped forms. Several statistical properties of the proposed distribution are investigated, including moments and related measures, the quantile function, entropy, and stress-strength reliability. Parameter estimation is carried out using the maximum likelihood method, and its performance is evaluated through a simulation study. The practical usefulness of the USh distribution is demonstrated using four real-life data sets, and its performance is compared with several well-known competing unit distributions. The comparative results indicate that the proposed model fits the data better than the competitive models applied in this study.

[28]  arXiv:2602.04402 [pdf, ps, other]
Title: Performative Learning Theory
Comments: 52 pages, 2 figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Statistics Theory (math.ST)

Performative predictions influence the very outcomes they aim to forecast. We study performative predictions that affect a sample (e.g., only existing users of an app) and/or the whole population (e.g., all potential app users). This raises the question of how well models generalize under performativity. For example, how well can we draw insights about new app users based on existing users when both of them react to the app's predictions? We address this question by embedding performative predictions into statistical learning theory. We prove generalization bounds under performative effects on the sample, on the population, and on both. A key intuition behind our proofs is that in the worst case, the population negates predictions, while the sample deceptively fulfills them. We cast such self-negating and self-fulfilling predictions as min-max and min-min risk functionals in Wasserstein space, respectively. Our analysis reveals a fundamental trade-off between performatively changing the world and learning from it: the more a model affects data, the less it can learn from it. Moreover, our analysis results in a surprising insight on how to improve generalization guarantees by retraining on performatively distorted samples. We illustrate our bounds in a case study on prediction-informed assignments of unemployed German residents to job trainings, drawing upon administrative labor market records from 1975 to 2017 in Germany.

[29]  arXiv:2602.04457 [pdf, ps, other]
Title: Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
Comments: ICLR 2026
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)

A/B testing on platforms often faces challenges from network interference, where a unit's outcome depends not only on its own treatment but also on the treatments of its network neighbors. To address this, cluster-level randomization has become standard, enabling the use of network-aware estimators. These estimators typically trim the data to retain only a subset of informative units, achieving low bias under suitable conditions but often suffering from high variance. In this paper, we first demonstrate that the interior nodes - units whose neighbors all lie within the same cluster - constitute the vast majority of the post-trimming subpopulation. In light of this, we propose directly averaging over the interior nodes to construct the mean-in-interior (MII) estimator, which circumvents the delicate reweighting required by existing network-aware estimators and substantially reduces variance in classical settings. However, we show that interior nodes are often not representative of the full population, particularly in terms of network-dependent covariates, leading to notable bias. We then augment the MII estimator with a counterfactual predictor trained on the entire network, allowing us to adjust for covariate distribution shifts between the interior nodes and the full population. By rearranging the expression, we reveal that our augmented MII estimator embodies an analytical form of the point estimator within the prediction-powered inference framework. This insight motivates a semi-supervised lens, wherein interior nodes are treated as labeled data subject to selection bias. Extensive and challenging simulation studies demonstrate the outstanding performance of our augmented MII estimator across various settings.
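
The interior-node idea can be sketched directly from the definition in the abstract. The code below assumes an undirected graph stored as an adjacency dictionary and reads "averaging over the interior nodes" as a difference in mean outcomes between interior nodes of treated and control clusters. That reading, the data layout, and the function names are assumptions; the augmented estimator with a counterfactual predictor is not shown.

```python
def interior_nodes(adjacency, cluster_of):
    """Nodes whose neighbors all lie in the node's own cluster."""
    return [
        u for u, nbrs in adjacency.items()
        if all(cluster_of[v] == cluster_of[u] for v in nbrs)
    ]

def mean_in_interior(adjacency, cluster_of, treated_clusters, outcome):
    """Difference in mean outcome between treated and control interior nodes."""
    interior = interior_nodes(adjacency, cluster_of)
    treated = [outcome[u] for u in interior if cluster_of[u] in treated_clusters]
    control = [outcome[u] for u in interior if cluster_of[u] not in treated_clusters]
    if not treated or not control:
        raise ValueError("need interior nodes in both treatment arms")
    return sum(treated) / len(treated) - sum(control) / len(control)
```

Boundary nodes are simply dropped in this sketch, which is where the representativeness concern raised in the abstract comes from.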

[30]  arXiv:2602.04459 [pdf, ps, other]
Title: Bayesian PINNs for uncertainty-aware inverse problems (BPINN-IP)
Comments: submitted to ICIP 2006 conference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The main contribution of this paper is to develop a hierarchical Bayesian formulation of PINNs for linear inverse problems, which is called BPINN-IP. The proposed methodology extends PINN to account for prior knowledge on the nature of the expected NN output, as well as its weights. Because we have access to the posterior probability distributions, uncertainties can be quantified naturally. In addition, variational inference and Monte Carlo dropout are employed to provide predictive means and variances for reconstructed images. An example application to deconvolution and super-resolution is considered, details of the different implementation steps are given, and some preliminary results are presented.

[31]  arXiv:2602.04472 [pdf, ps, other]
Title: Universality of General Spiked Tensor Models
Comments: 102 pages
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

We study the rank-one spiked tensor model in the high-dimensional regime, where the noise entries are independent and identically distributed with zero mean, unit variance, and finite fourth moment. This setting extends the classical Gaussian framework to a substantially broader class of noise distributions. Focusing on asymmetric tensors of order $d$ ($\ge 3$), we analyze the maximum likelihood estimator of the best rank-one approximation. Under a mild assumption isolating informative critical points of the associated optimization landscape, we show that the empirical spectral distribution of a suitably defined block-wise tensor contraction converges almost surely to a deterministic limit that coincides with the Gaussian case. As a consequence, the asymptotic singular value and the alignments between the estimated and true spike directions admit explicit characterizations identical to those obtained under Gaussian noise. These results establish a universality principle for spiked tensor models, demonstrating that their high-dimensional spectral behavior and statistical limits are robust to non-Gaussian noise.
Our analysis relies on resolvent methods from random matrix theory, cumulant expansions valid under finite moment assumptions, and variance bounds based on Efron-Stein-type arguments. A key challenge in the proof is how to handle the statistical dependence between the signal term and the noise term.

[32]  arXiv:2602.04554 [pdf, ps, other]
Title: mmcmcBayes: An R Package Implementing a Multistage MCMC Framework for Detecting the Differentially Methylated Regions
Comments: 27 pages, 3 figures
Subjects: Applications (stat.AP); Computation (stat.CO)

Identifying differentially methylated regions is an important task in epigenome-wide association studies, where differential signals often arise across groups of neighboring CpG sites. Many existing methods detect differentially methylated regions by aggregating CpG-level test results, which may limit their ability to capture complex regional methylation patterns. In this paper, we introduce the R package mmcmcBayes, which implements a multistage Markov chain Monte Carlo procedure for region-level detection of differentially methylated regions. The method models sample-wise regional methylation summaries using the alpha-skew generalized normal distribution and evaluates evidence for differential methylation between groups through Bayes factors. We use a multistage region-splitting strategy to refine candidate regions based on statistical evidence. We describe the underlying methodology and software implementation, and illustrate its performance through simulation studies and applications to Illumina 450K methylation data. The mmcmcBayes package provides a practical region-level alternative to existing CpG-based differentially methylated regions detection methods and includes supporting functions for summarizing, comparing, and visualizing detected regions.

[33]  arXiv:2602.04594 [pdf, ps, other]
Title: Distributed Convoluted Rank Regression for Non-Shareable Data under Non-Additive Losses
Subjects: Methodology (stat.ME)

We study high-dimensional rank regression when data are distributed across multiple machines and the loss is a non-additive U-statistic, as in convoluted rank regression (CRR). Classical communication-efficient surrogate likelihood (CSL) methods crucially rely on the additivity of the empirical loss and therefore break down for CRR, whose global loss couples all sample pairs across machines. We propose a distributed convoluted rank regression (DCRR) framework that constructs a similar surrogate loss and demonstrate its validity under the non-additive losses. We show that this surrogate shares the same population minimizer as the full-data CRR loss and yields estimators that are statistically equivalent to centralized CRR. Building on this, we develop a two-stage sparse DCRR procedure -- an iterative $\ell_1$-penalized stage followed by a folded-concave refinement -- and establish non-asymptotic error bounds, a distributed strong oracle property, and a DHBIC-type criterion for consistent model selection. A scaling result shows that the number of machines may diverge as $M = o({N/(s^2\log p)})$ while achieving centralized oracle rates with only $O(\log N)$ communication rounds. Simulations and a large-scale real data example demonstrate substantial gains over naive divide-and-conquer, particularly under heavy-tailed errors.

[34]  arXiv:2602.04596 [pdf, ps, other]
Title: A principled framework for uncertainty decomposition in TabPFN
Comments: 9 pages (+2 reference, +34 appendix). Code in this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

TabPFN is a transformer that achieves state-of-the-art performance on supervised tabular tasks by amortizing Bayesian prediction into a single forward pass. However, there is currently no method for uncertainty decomposition in TabPFN. Because it behaves, in an idealised limit, as a Bayesian in-context learner, we cast the decomposition challenge as a Bayesian predictive inference (BPI) problem. The main computational tool in BPI, predictive Monte Carlo, is challenging to apply here as it requires simulating unmodeled covariates. We therefore pursue the asymptotic alternative, filling a gap in the theory for supervised settings by proving a predictive CLT under quasi-martingale conditions. We derive variance estimators determined by the volatility of predictive updates along the context. The resulting credible bands are fast to compute, target epistemic uncertainty, and achieve near-nominal frequentist coverage. For classification, we further obtain an entropy-based uncertainty decomposition.

[35]  arXiv:2602.04611 [pdf, ps, other]
Title: Targeted Synthetic Control Method
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The synthetic control method (SCM) estimates causal effects in panel data with a single treated unit by constructing a counterfactual outcome as a weighted combination of untreated control units that matches the pre-treatment trajectory. In this paper, we introduce the targeted synthetic control (TSC) method, a new two-stage estimator that directly estimates the counterfactual outcome. Specifically, our TSC method (1) yields a targeted debiasing estimator, in the sense that the targeted updating refines the initial weights to produce more stable weights; and (2) ensures that the final counterfactual estimation is a convex combination of observed control outcomes to enable direct interpretation of the synthetic control weights. TSC is flexible and can be instantiated with arbitrary machine learning models. Methodologically, TSC starts from an initial set of synthetic-control weights and refines them via a one-dimensional targeted update through the weight-tilting submodel, which calibrates the weights to reduce the bias in weight estimation arising from pre-treatment fit. Furthermore, TSC avoids key shortcomings of existing methods (e.g., the augmented SCM), which can produce unbounded counterfactual estimates. Across extensive synthetic and real-world experiments, TSC consistently improves estimation accuracy over state-of-the-art SCM baselines.

[36]  arXiv:2602.04638 [pdf, ps, other]
Title: Inference for Within- and Between-Partnership Transmission Rates for HIV Infection
Comments: 14 pages, 3 figures, 3 tables
Subjects: Applications (stat.AP)

HIV transmission within serodiscordant couples remains a significant public health challenge, particularly in sub-Saharan Africa. Estimating the rate of such infection, alongside the rates of introduction of infection from outside the partnership, is a special case of the more general epidemiological challenge of inferring within- and between-group intensities of transmission. This study presents a stochastic susceptible-infected (SI) pair model for estimating key epidemiological parameters governing HIV transmission within and between couples, which we further extend to account for gender-specific differences in infection dynamics. Using a likelihood-based inference approach, we estimate transmission parameters and associated uncertainty from observed data. These values can be used to inform infection prevention strategies for HIV, and the methodology proposed can be generalised to other epidemiological settings.

[37]  arXiv:2602.04667 [pdf, ps, other]
Title: Causal explanations of outliers in systems with lagged time-dependencies
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Root-cause analysis in controlled time-dependent systems poses a major challenge in applications. Energy systems are especially difficult to handle, as they exhibit instantaneous as well as delayed effects and, if equipped with storage, have memory. In this paper we adapt the causal root-cause analysis method of Budhathoki et al. [2022] to general time-dependent systems, as it can be regarded as a strictly causal definition of the term "root-cause". In particular, we discuss two truncation approaches to handle the infinite dependency graphs present in time-dependent systems. While one leaves the causal mechanisms intact, the other approximates the mechanisms at the start nodes. The effectiveness of the different approaches is benchmarked using a challenging data generation process inspired by a problem in factory energy management: the avoidance of peaks in the power consumption. We show that, given enough lags, our extension is able to localize the root-causes in the feature and time domain. Further, the effect of mechanism approximation is discussed.

[38]  arXiv:2602.04668 [pdf, ps, other]
Title: Estimation of reliability and accuracy of models of $\varphi$-sub-Gaussian process using generating functions of polynomial expansions
Subjects: Statistics Theory (math.ST)

Stochastic processes are often represented through orthonormal series expansions, a framework originating in the classical works of Loève and Karhunen and widely used for simulation and numerical approximation. While truncation error in such expansions has been extensively studied, practical models frequently involve an additional source of error arising from the approximation of coefficient functions when closed-form expressions are unavailable. The combined effect of these two errors remains insufficiently addressed in the literature. Building on the author's earlier work on reliability and accuracy estimates for $\varphi$-sub-Gaussian processes, this paper extends the methodology to orthonormal polynomial systems that do not possess normalized generating functions in analytical form, including the Legendre, generalized Laguerre, and Gegenbauer families. New bounds are derived for models in $L_p(T)$ and $C([0,T])$ that simultaneously account for truncation and coefficient approximation. The resulting criteria provide practical guidance for selecting the number of series terms required to achieve prescribed levels of reliability and accuracy across a broader class of polynomial-based stochastic process models.

[39]  arXiv:2602.04679 [pdf, ps, other]
Title: LID Framework: A new method for geospatial and exploratory data analysis of potential innovation determinants at the neighborhood level
Subjects: Computation (stat.CO); Computers and Society (cs.CY)

The geography of innovation offers a framework to understand how territorial characteristics shape innovation, often via spatial and cognitive proximity. Empirical research has focused largely on national and regional scales, while urban and sub-regional geographies receive less attention. Local studies typically rely on limited indicators (e.g., firm-level data, patents, basic socioeconomic measures), with few offering a systematic framework integrating urban form, mobility, amenities, and human-capital proxies at the neighborhood scale. Our study investigates innovation at a finer spatial resolution, going beyond proprietary or static indicators. We develop the Local Innovation Determinants (LID) database and framework to identify key enabling factors across regions, combining traditional government data with publicly available data via APIs for a more granular understanding of spatial dynamics shaping innovation capacity. Using exploratory big and geospatial data analytics and random forest models, we examine neighborhoods in New York and Massachusetts across four dimensions: social factors, economic characteristics, land use and mobility, morphology, and environment. Results show that alternative data sources offer significant yet underexplored potential to enhance insights into innovation dynamics. City policymakers should consider neighborhood-specific determinants and characteristics when designing and implementing local innovation strategies.

[40]  arXiv:2602.04682 [pdf, ps, other]
Title: Covariate Selection for Joint Latent Space Modeling of Sparse Network Data
Subjects: Methodology (stat.ME)

Network data are increasingly common in the social sciences and infectious disease epidemiology. Analyses often link network structure to node-level covariates, but existing methods falter with sparse networks and high-dimensional node features. We propose a joint latent space modeling framework for sparse networks with high-dimensional binary node covariates that performs covariate selection while accounting for uncertainty in estimated latent positions. Building on joint latent space models that couple edges and node variables through shared latent positions, we introduce a group lasso screening step and incorporate a measurement-error-aware stabilization term to mitigate bias from using estimated latent positions as predictors. We establish prediction error rates for the covariate component both when latent positions are treated as observed and when they are estimated with bounded error; under uniform control across $q$ covariates and $n$ nodes, the rate is of order $O(\log q / n)$ up to an additional term due to latent position estimation error. Our method addresses three challenges: (1) incorporating information from isolated nodes, which are common in sparse networks but often ignored; (2) selecting relevant covariates from high-dimensional spaces; and (3) accounting for uncertainty in estimated latent positions. Simulations show predictive performance remains stable as covariate sparsity grows, while naive approaches degrade. We illustrate how the method can support efficient study design using household social networks from 75 Indian villages, where an emulated pilot study screens a large covariate battery and substantially reduces required subsequent data collection without sacrificing network predictive accuracy.

[41]  arXiv:2602.04691 [pdf, ps, other]
Title: Linear Regression: Inference Based on Cluster Estimates
Subjects: Methodology (stat.ME)

This article proposes a novel estimator for regression coefficients in clustered data that explicitly accounts for within-cluster dependence. We study the asymptotic properties of the proposed estimator under both finite and infinite cluster sizes. The analysis is then extended to a standard random coefficient model, where we derive asymptotic results for the average (common) parameters and develop a Wald-type test for general linear hypotheses. We also investigate the performance of the conventional pooled ordinary least squares (POLS) estimator within the random coefficients framework and show that it can be unreliable across a wide range of empirically relevant settings. Furthermore, we introduce a new test for parameter stability at a higher (superblock; Tier 2, Tier 3,...) level, assuming that parameters are stable across clusters within that level. Extensive simulation studies demonstrate the effectiveness of the proposed tests, and an empirical application illustrates their practical relevance.

[42]  arXiv:2602.04708 [pdf, ps, other]
Title: Statistical inference for the stochastic wave equation based on discrete observations
Comments: 44 pages, 6 figures
Subjects: Statistics Theory (math.ST)

The wave speed of a stochastic wave equation driven by Riesz noise on the unbounded multidimensional spatial domain is estimated based on discrete measurements. Central limit theorems for second-order variations of the observations in space, time, and space-time are established. Under general assumptions on the spatial and temporal sampling frequencies, the resulting method-of-moments estimators are asymptotically normally distributed. The covariance structure of the discrete increments admits a closed-form representation involving two different Fejér-type kernels, enabling a precise analysis of the interplay between spatial and temporal contributions.

[43]  arXiv:2602.04736 [pdf, ps, other]
Title: Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates
Comments: Code is available at this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

A complete understanding of heterogeneous treatment effects involves characterizing the full conditional distribution of potential outcomes. To this end, we propose the Conditional Counterfactual Mean Embeddings (CCME), a framework that embeds conditional distributions of counterfactual outcomes into a reproducing kernel Hilbert space (RKHS). Under this framework, we develop a two-stage meta-estimator for CCME that accommodates any RKHS-valued regression in each stage. Based on this meta-estimator, we develop three practical CCME estimators: (1) Ridge Regression estimator, (2) Deep Feature estimator that parameterizes the feature map by a neural network, and (3) Neural-Kernel estimator that performs RKHS-valued regression, with the coefficients parameterized by a neural network. We provide finite-sample convergence rates for all estimators, establishing that they possess the double robustness property. Our experiments demonstrate that our estimators accurately recover distributional features including multimodal structure of conditional counterfactual distributions.

[44]  arXiv:2602.04751 [pdf, ps, other]
Title: Multiple Imputation Methods under Extreme Values
Comments: 36 pages main text, 20 pages appendix, 12 figures, 28 tables. Submitted to the Austrian Journal of Statistics (under review)
Subjects: Computation (stat.CO); Methodology (stat.ME)

Missing data are ubiquitous in empirical databases, yet statistical analyses typically require complete data matrices. Multiple imputation offers a principled solution for filling these gaps. This study evaluates the performance of several multiple imputation methods, both in the presence and absence of extreme values, using the MICE package in R. Through Monte Carlo simulations, we generated incomplete data sets with three variables and assessed each imputation method within regression models. The results indicate that the linear regression based imputation method showed the best overall predictive performance (CV-MSE), whereas the sparse model approach was generally less efficient. Our findings underscore the relevance of extreme values when selecting an imputation strategy and highlight sample size, proportion of missingness, presence of extremes, and the type of fitted model as key determinants of performance. Despite its limitations, the study offers practical recommendations for researchers, stressing the need to examine the missingness mechanism and the occurrence of extreme values before choosing an imputation method.

[45]  arXiv:2602.04788 [pdf, ps, other]
Title: Species Sensitivity Distribution revisited: a Bayesian nonparametric approach
Subjects: Methodology (stat.ME); Applications (stat.AP)

We present a novel approach to ecological risk assessment by recasting the Species Sensitivity Distribution (SSD) method within a Bayesian nonparametric (BNP) framework. Widely mandated by environmental regulatory bodies globally, SSD has faced criticism due to its historical reliance on parametric assumptions when modeling species variability. By adopting nonparametric mixture models, we address this limitation, establishing a statistically robust foundation for SSD. Our BNP approach offers several advantages, including its efficacy in handling small datasets or censored data, which are common in ecological risk assessment, and its ability to provide principled uncertainty quantification alongside simultaneous density estimation and clustering. We utilize a specific nonparametric prior as the mixing measure, chosen for its robust clustering properties, a crucial consideration given the lack of strong prior beliefs about the number of components. Through simulation studies and analysis of real datasets, we demonstrate the superiority of our BNP-SSD over classical SSD methods. We also provide a BNP-SSD Shiny application, making our methodology available to the Ecotoxicology community. Moreover, we exploit the inherent clustering structure of the mixture model to explore patterns in species sensitivity. Our findings underscore the effectiveness of the proposed approach in improving ecological risk assessment methodologies.

[46]  arXiv:2602.04798 [pdf, ps, other]
Title: Score-Based Change-Point Detection and Region Localization for Spatio-Temporal Point Processes
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

We study sequential change-point detection for spatio-temporal point processes, where actionable detection requires not only identifying when a distributional change occurs but also localizing where it manifests in space. While classical quickest change detection methods provide strong guarantees on detection delay and false-alarm rates, existing approaches for point-process data predominantly focus on temporal changes and do not explicitly infer affected spatial regions. We propose a likelihood-free, score-based detection framework that jointly estimates the change time and the change region in continuous space-time without assuming parametric knowledge of the pre- or post-change dynamics. The method leverages a localized and conditionally weighted Hyvärinen score to quantify event-level deviations from nominal behavior and aggregates these scores using a spatio-temporal CUSUM-type statistic over a prescribed class of spatial regions. Operating sequentially, the procedure outputs both a stopping time and an estimated change region, enabling real-time detection with spatial interpretability. We establish theoretical guarantees on false-alarm control, detection delay, and spatial localization accuracy, and demonstrate the effectiveness of the proposed approach through simulations and real-world spatio-temporal event data.
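
For orientation, the CUSUM-type aggregation mentioned in this abstract can be sketched generically. This is only a minimal illustration and not the paper's procedure: the function name, the `drift` parameter, and the threshold are hypothetical, the per-event scores are assumed to be negative on average before the change and positive after it, and the scan over spatial regions is omitted.

```python
def cusum_stopping_time(scores, threshold, drift=0.0):
    """Return the first index where a one-sided CUSUM statistic exceeds
    `threshold`, or None if it never does.

    `scores` are hypothetical per-event deviation scores; `drift` is an
    optional constant subtracted from each score before accumulation.
    """
    stat = 0.0
    for t, score in enumerate(scores):
        # Accumulate evidence, resetting to zero whenever it turns negative.
        stat = max(0.0, stat + score - drift)
        if stat > threshold:
            return t
    return None


# Five nominal events followed by five events with elevated scores.
alarm = cusum_stopping_time([-1.0] * 5 + [2.0] * 5, threshold=5.0)
```

The reset to zero is what keeps a long nominal stretch from masking a later change; a spatial version would presumably maintain one such statistic per candidate region.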

[47]  arXiv:2602.04823 [pdf, ps, other]
Title: Adaptive estimation of Sobolev-type energy functionals on the sphere
Comments: 26 pages, 3 figures
Subjects: Statistics Theory (math.ST)

We study the estimation of quadratic Sobolev-type integral functionals of an unknown density on the unit sphere. The functional is defined through fractional powers of the Laplace--Beltrami operator and provides a global measure of smoothness and spectral energy. Our approach relies on spherical needlet frames, which yield a localized multiscale decomposition while preserving tight frame properties in the natural square-integrable function space on the sphere.
We construct unbiased estimators of suitably truncated versions of the functional and derive sharp oracle risk bounds through an explicit bias--variance analysis. When the smoothness of the density is unknown, we propose a Lepski-type data-driven selection of the resolution level. The resulting adaptive estimator achieves minimax-optimal rates over Sobolev classes, without resorting to nonlinear or sparsity-based methods.

[48]  arXiv:2602.04855 [pdf, ps, other]
Title: Marginal Likelihood Inference for Fitting Dynamical Survival Analysis Models to Epidemic Count Data
Comments: 25 pages, 2 figures and 6 tables
Subjects: Methodology (stat.ME)

Stochastic compartmental models are prevalent tools for describing disease spread, but inference under these models is challenging for many types of surveillance data when the marginal likelihood function becomes intractable due to missing information. To address this, we develop a closed-form likelihood for discretely observed incidence count data under the dynamical survival analysis (DSA) paradigm. The method approximates the stochastic population-level hazard by a large population limit while retaining a count-valued stochastic model, and leads to survival analytic inferential strategies that are both computationally efficient and flexible to model generalizations. Through simulation, we show that parameter estimation is competitive with recent exact but computationally expensive likelihood-based methods in partially observed settings. Previous work has shown that the DSA approximation is generalizable, and we show that the inferential developments here also carry over to models featuring individual heterogeneity, such as frailty models. We consider case studies of both Ebola and COVID-19 data on variants of the model, including a network-based epidemic model and a model with distributions over susceptibility, demonstrating its flexibility and practical utility on real, partially observed datasets.

[49]  arXiv:2602.04872 [pdf, ps, other]
Title: Multi-layer Cross-Attention is Provably Optimal for Multi-modal In-context Learning
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent progress has rapidly advanced our understanding of the mechanisms underlying in-context learning in modern attention-based neural networks. However, existing results focus exclusively on unimodal data; in contrast, the theoretical underpinnings of in-context learning for multi-modal data remain poorly understood. We introduce a mathematically tractable framework for studying multi-modal learning and explore when transformer-like architectures can recover Bayes-optimal performance in-context. To model multi-modal problems, we assume the observed data arises from a latent factor model. Our first result comprises a negative take on expressibility: we prove that single-layer, linear self-attention fails to recover the Bayes-optimal predictor uniformly over the task distribution. To address this limitation, we introduce a novel, linearized cross-attention mechanism, which we study in the regime where both the number of cross-attention layers and the context length are large. We show that this cross-attention mechanism is provably Bayes optimal when optimized using gradient flow. Our results underscore the benefits of depth for in-context learning and establish the provable utility of cross-attention for multi-modal distributions.

Cross-lists for Thu, 5 Feb 26

[50]  arXiv:2602.03906 (cross-list from cs.LG) [pdf, ps, other]
Title: GeoIB: Geometry-Aware Information Bottleneck via Statistical-Manifold Compression
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

Information Bottleneck (IB) is widely used, but in deep learning, it is usually implemented through tractable surrogates, such as variational bounds or neural mutual information (MI) estimators, rather than directly controlling the MI I(X;Z) itself. The looseness and estimator-dependent bias can make IB "compression" only indirectly controlled and optimization fragile.
We revisit the IB problem through the lens of information geometry and propose a Geometric Information Bottleneck (GeoIB) that dispenses with MI estimation. We show that I(X;Z) and I(Z;Y) admit exact projection forms as minimal Kullback-Leibler (KL) distances from the joint distributions to their respective independence manifolds. Guided by this view, GeoIB controls information compression with two complementary terms: (i) a distribution-level Fisher-Rao (FR) discrepancy, which matches KL to second order and is reparameterization-invariant; and (ii) a geometry-level Jacobian-Frobenius (JF) term that provides a local capacity-type upper bound on I(Z;X) by penalizing pullback volume expansion of the encoder. We further derive a natural-gradient optimizer consistent with the FR metric and prove that the standard additive natural-gradient step is first-order equivalent to the geodesic update. We conducted extensive experiments and observed that GeoIB achieves a better trade-off between prediction accuracy and compression ratio in the information plane than the mainstream IB baselines on popular datasets. GeoIB improves invariance and optimization stability by unifying distributional and geometric regularization under a single bottleneck multiplier. The source code of GeoIB is released at "https://anonymous.4open.science/r/G-IB-0569".
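
The projection form of mutual information referred to in this abstract can be checked numerically on a small discrete example. The sketch below is illustrative only and is not taken from the GeoIB code: the helper names and the 2x2 joint table are assumptions. It relies on the standard identity KL(p_XZ || q_X r_Z) = I(X;Z) + KL(p_X || q_X) + KL(p_Z || r_Z), so the minimum over product distributions is attained at the true marginals.

```python
import math


def kl(p, q):
    """KL divergence in nats between two discrete distributions (flat lists)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_to_product(joint, qx, qz):
    """KL from the joint table joint[x][z] to the product distribution qx * qz."""
    flat_joint = [joint[i][j] for i in range(len(qx)) for j in range(len(qz))]
    flat_prod = [qx[i] * qz[j] for i in range(len(qx)) for j in range(len(qz))]
    return kl(flat_joint, flat_prod)


def mutual_information(joint):
    """I(X;Z), computed as the KL from the joint to the product of its marginals."""
    px = [sum(row) for row in joint]
    pz = [sum(col) for col in zip(*joint)]
    return kl_to_product(joint, px, pz)


joint = [[0.4, 0.1], [0.1, 0.4]]       # hypothetical joint distribution of (X, Z)
mi = mutual_information(joint)
off_marginal = kl_to_product(joint, [0.6, 0.4], [0.5, 0.5])  # strictly larger than mi
```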

[51]  arXiv:2602.03911 (cross-list from cs.LG) [pdf, ps, other]
Title: The Role of Target Update Frequencies in Q-Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

The target network update frequency (TUF) is a central stabilization mechanism in (deep) Q-learning. However, its selection remains poorly understood and is often treated merely as another tunable hyperparameter rather than as a principled design decision. This work provides a theoretical analysis of target fixing in tabular Q-learning through the lens of approximate dynamic programming. We formulate periodic target updates as a nested optimization scheme in which each outer iteration applies an inexact Bellman optimality operator, approximated by a generic inner loop optimizer. Rigorous theory yields a finite-time convergence analysis for the asynchronous sampling setting, specializing to stochastic gradient descent in the inner loop. Our results deliver an explicit characterization of the bias-variance trade-off induced by the target update period, showing how to optimally set this critical hyperparameter. We prove that constant target update schedules are suboptimal, incurring a logarithmic overhead in sample complexity that is entirely avoidable with adaptive schedules. Our analysis shows that the optimal target update frequency increases geometrically over the course of the learning process.
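
The nested structure described in this abstract (inner updates against a frozen target, with a periodic outer refresh) can be sketched in tabular form. This is a minimal sketch under stated assumptions, not the authors' algorithm: the toy chain MDP, the uniform exploration policy, the function name, and all parameter values are hypothetical, and the schedule is a constant period, which the abstract itself identifies as suboptimal.

```python
import random


def q_learning_with_target(n_states=4, n_actions=2, steps=5000, period=50,
                           lr=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain MDP with a periodically refreshed target table."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    target = [row[:] for row in q]          # frozen copy used for bootstrapping
    state = 0
    for t in range(steps):
        action = rng.randrange(n_actions)   # uniform exploration (assumption)
        # Toy dynamics: action 1 moves right, action 0 moves left (clipped at 0).
        nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if nxt == n_states - 1 else 0.0
        # Inner-loop step: bootstrap from the frozen target table, not from q.
        td_target = reward + gamma * max(target[nxt])
        q[state][action] += lr * (td_target - q[state][action])
        # Outer iteration: refresh the target every `period` steps.
        if (t + 1) % period == 0:
            target = [row[:] for row in q]
        # The last state is terminal: its row is never updated and the episode restarts.
        state = 0 if nxt == n_states - 1 else nxt
    return q


q_table = q_learning_with_target(steps=20000)
```

An adaptive schedule would replace the fixed `period` with one that changes across outer iterations; the abstract does not give the schedule in enough detail to reproduce it here.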

[52]  arXiv:2602.03914 (cross-list from cs.LG) [pdf, ps, other]
Title: Causal Discovery for Cross-Sectional Data Based on Super-Structure and Divide-and-Conquer
Authors: Wenyu Wang (1), Yaping Wan (1) ((1) University of South China)
Comments: 7 pages, 16 figures
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

This paper tackles a critical bottleneck in Super-Structure-based divide-and-conquer causal discovery: the high computational cost of constructing accurate Super-Structures, particularly when conditional independence (CI) tests are expensive and domain knowledge is unavailable. We propose a novel, lightweight framework that relaxes the strict requirements on Super-Structure construction while preserving the algorithmic benefits of divide-and-conquer. By integrating weakly constrained Super-Structures with efficient graph partitioning and merging strategies, our approach substantially lowers CI test overhead without sacrificing accuracy. We instantiate the framework in a concrete causal discovery algorithm and rigorously evaluate its components on synthetic data. Comprehensive experiments on Gaussian Bayesian networks, including magic-NIAB, ECOLI70, and magic-IRRI, demonstrate that our method matches or closely approximates the structural accuracy of PC and FCI while drastically reducing the number of CI tests. Further validation on the real-world China Health and Retirement Longitudinal Study (CHARLS) dataset confirms its practical applicability. Our results establish that accurate, scalable causal discovery is achievable even under minimal assumptions about the initial Super-Structure, opening new avenues for applying divide-and-conquer methods to large-scale, knowledge-scarce domains such as biomedical and social science research.

[53]  arXiv:2602.03919 (cross-list from cond-mat.stat-mech) [pdf, ps, other]
Title: Tsallis Entropy derived from the Chaitin-Kolmogorov Informational Entropy
Authors: Airton Deppman
Comments: 16 pages, 1 figure
Subjects: Statistical Mechanics (cond-mat.stat-mech); Statistics Theory (math.ST)

We provide a rigorous first-principles derivation of the non-additive Tsallis entropy by employing the Chaitin-Kolmogorov algorithmic information theory. By applying non-local restrictive rules on the string formation (grammar), we show that the algorithmic cost follows a power law of the string length, instead of the linear behaviour obtained in the classical theory. As a result, the Tsallis entropy governs the increase of information. We explore this result, showing through Landauer's limit that the heat dissipation in systems with long-range correlations is diminished. The $\Omega_q$ number, which remains incompressible, now offers the possibility of a continuous increase of complexity, measured by the parameter $q$. We show the consistency of the results by a numerical simulation, and discuss Zipf's law in light of the new findings.

[54]  arXiv:2602.03999 (cross-list from math.PR) [pdf, ps, other]
Title: Functional Stochastic Localization
Comments: Comments welcome!
Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Eldan's stochastic localization is a probabilistic construction that has proved instrumental to modern breakthroughs in high-dimensional geometry and the design of sampling algorithms. Motivated by sampling under non-Euclidean geometries and the mirror descent algorithm in optimization, we develop a functional generalization of Eldan's process that replaces Gaussian regularization with regularization by any positive integer multiple of a log-Laplace transform. We further give a mixing time bound on the Markov chain induced by our localization process, which holds if our target distribution satisfies a functional Poincaré inequality. Finally, we apply our framework to differentially private convex optimization in $\ell_p$ norms for $p \in [1, 2)$, where we improve state-of-the-art query complexities in a zeroth-order model.

[55]  arXiv:2602.04021 (cross-list from cs.LG) [pdf, ps, other]
Title: Group Contrastive Learning for Weakly Paired Multimodal Data
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

We present GROOVE, a semi-supervised multi-modal representation learning approach for high-content perturbation data where samples across modalities are weakly paired through shared perturbation labels but lack direct correspondence. Our primary contribution is GroupCLIP, a novel group-level contrastive loss that bridges the gap between CLIP for paired cross-modal data and SupCon for uni-modal supervised contrastive learning, addressing a fundamental gap in contrastive learning for weakly-paired settings. We integrate GroupCLIP with an on-the-fly backtranslating autoencoder framework to encourage cross-modally entangled representations while maintaining group-level coherence within a shared latent space. Critically, we introduce a comprehensive combinatorial evaluation framework that systematically assesses representation learners across multiple optimal transport aligners, addressing key limitations in existing evaluation strategies. This framework includes novel simulations that systematically vary shared versus modality-specific perturbation effects enabling principled assessment of method robustness. Our combinatorial benchmarking reveals that there is not yet an aligner that uniformly dominates across settings or modality pairs. Across simulations and two real single-cell genetic perturbation datasets, GROOVE performs on par with or outperforms existing approaches for downstream cross-modal matching and imputation tasks. Our ablation studies demonstrate that GroupCLIP is the key component driving performance gains. These results highlight the importance of leveraging group-level constraints for effective multi-modal representation learning in scenarios where only weak pairing is available.

[56]  arXiv:2602.04028 (cross-list from cs.AI) [pdf, ps, other]
Title: Axiomatic Foundations of Counterfactual Explanations
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)

Explaining autonomous and intelligent systems is critical in order to improve trust in their decisions. Counterfactuals have emerged as one of the most compelling forms of explanation. They address "why not" questions by revealing how decisions could be altered. Despite the growing literature, most existing explainers focus on a single type of counterfactual and are restricted to local explanations, focusing on individual instances. There has been no systematic study of alternative counterfactual types, nor of global counterfactuals that shed light on a system's overall reasoning process.
This paper addresses the two gaps by introducing an axiomatic framework built on a set of desirable properties for counterfactual explainers. It proves impossibility theorems showing that no single explainer can satisfy certain axiom combinations simultaneously, and fully characterizes all compatible sets. Representation theorems then establish five one-to-one correspondences between specific subsets of axioms and the families of explainers that satisfy them. Each family gives rise to a distinct type of counterfactual explanation, uncovering five fundamentally different types of counterfactuals. Some of these correspond to local explanations, while others capture global explanations. Finally, the framework situates existing explainers within this taxonomy, formally characterizes their behavior, and analyzes the computational complexity of generating such explanations.

[57]  arXiv:2602.04042 (cross-list from cs.LG) [pdf, ps, other]
Title: Partition Trees: Conditional Density Estimation over General Outcome Spaces
Comments: Code available at this https URL
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We propose Partition Trees, a tree-based framework for conditional density estimation over general outcome spaces, supporting both continuous and categorical variables within a unified formulation. Our approach models conditional distributions as piecewise-constant densities on data adaptive partitions and learns trees by directly minimizing conditional negative log-likelihood. This yields a scalable, nonparametric alternative to existing probabilistic trees that does not make parametric assumptions about the target distribution. We further introduce Partition Forests, an ensemble extension obtained by averaging conditional densities. Empirically, we demonstrate improved probabilistic prediction over CART-style trees and competitive or superior performance compared to state-of-the-art probabilistic tree methods and Random Forests, along with robustness to redundant features and heteroscedastic noise.

[58]  arXiv:2602.04078 (cross-list from cs.LG) [pdf, ps, other]
Title: Principles of Lipschitz continuity in neural networks
Authors: Róisín Luo
Comments: Ph.D. Thesis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Deep learning has achieved remarkable success across a wide range of domains, significantly expanding the frontiers of what is achievable in artificial intelligence. Yet, despite these advances, critical challenges remain, most notably ensuring robustness to small input perturbations and generalization to out-of-distribution data. These challenges underscore the need to understand the fundamental principles that govern robustness and generalization. Among the theoretical tools available, Lipschitz continuity plays a pivotal role in governing the fundamental properties of neural networks related to robustness and generalization. It quantifies the worst-case sensitivity of a network's outputs to small input perturbations. While its importance is widely acknowledged, prior research has predominantly focused on empirical regularization approaches based on Lipschitz constraints, leaving the underlying principles less explored. This thesis seeks to advance a principled understanding of Lipschitz continuity in neural networks within the paradigm of machine learning, examined from two complementary perspectives: an internal perspective, focusing on the temporal evolution of Lipschitz continuity in neural networks during training (i.e., training dynamics); and an external perspective, investigating how Lipschitz continuity modulates the behavior of neural networks with respect to features in the input data, particularly its role in governing frequency signal propagation (i.e., modulation of frequency signal propagation).

[59]  arXiv:2602.04164 (cross-list from cs.ET) [pdf, ps, other]
Title: The Dynamics of Attention across Automated and Manual Driving Modes: A Driving Simulation Study
Subjects: Emerging Technologies (cs.ET); Applications (stat.AP); Computation (stat.CO); Other Statistics (stat.OT)

This study aims to explore the dynamics of driver attention to various zones, including the road, the central mirror, the embedded Human-Machine Interface (HMI), and the speedometer, across different driving modes in autonomous vehicles (AVs). The integration of AVs into transportation systems has introduced critical safety concerns, particularly regarding driver re-engagement during mode transitions. Past accidents underscore the risks of overreliance on automation and highlight the need to understand dynamic attention allocation to support safety in autonomous driving. A high-fidelity driving simulation was conducted. Eye-tracking technology was used to measure fixation duration, fixation count, and time to first fixation across distinct driving modes (automated, manual, and transition), which were then used to assess how drivers allocated attention to various areas of interest (AOIs). Findings show that drivers' attention varies significantly across driving modes. In manual mode, attention consistently focuses on the road, while in automated mode, prolonged fixation on the embedded HMI was observed. During the handover and takeover phases, attention shifts dynamically between environmental and technological elements. The study reveals that driver attention allocation is mode-dependent. These findings inform the design of adaptive HMIs in AVs that align with drivers' attention patterns. By presenting relevant information according to the driving context, such systems can enhance driver-vehicle interaction, support effective transitions, and improve overall safety. Systematic analysis of visual attention dynamics across driving modes is gaining prominence, as it informs adaptive HMI designs and driver readiness interventions. The generalized linear mixed model (GLMM) findings can be directly applied to the design of adaptive HMIs or driver training programs to enhance attention and improve safety.

[60]  arXiv:2602.04189 (cross-list from cs.LG) [pdf, ps, other]
Title: Benchmarking Uncertainty Quantification of Plug-and-Play Diffusion Priors for Inverse Problems Solving
Subjects: Machine Learning (cs.LG); Computation (stat.CO)

Plug-and-play diffusion priors (PnPDP) have become a powerful paradigm for solving inverse problems in scientific and engineering domains. Yet, current evaluations of reconstruction quality emphasize point-estimate accuracy metrics on a single sample, which do not reflect the stochastic nature of PnPDP solvers and the intrinsic uncertainty of inverse problems, critical for scientific tasks. This creates a fundamental mismatch: in inverse problems, the desired output is typically a posterior distribution and most PnPDP solvers induce a distribution over reconstructions, but existing benchmarks only evaluate a single reconstruction, ignoring distributional characterization such as uncertainty. To address this gap, we conduct a systematic study to benchmark the uncertainty quantification (UQ) of existing diffusion inverse solvers. Specifically, we design a rigorous toy model simulation to evaluate the uncertainty behavior of various PnPDP solvers, and propose a UQ-driven categorization. Through extensive experiments on toy simulations and diverse real-world scientific inverse problems, we observe uncertainty behaviors consistent with our taxonomy and theoretical justification, providing new insights for evaluating and understanding the uncertainty for PnPDPs.

[61]  arXiv:2602.04250 (cross-list from math.PR) [pdf, ps, other]
Title: A Note on Physical Dependence and Mixing Conditions for Triangular Arrays
Comments: Keywords: Weak Dependence, Strong Mixing, $\beta$-Mixing, Physical Dependence, Triangular Arrays, Local Stationarity
Subjects: Probability (math.PR); Statistics Theory (math.ST)

Under mild structural assumptions and regularity conditions on the marginal and conditional densities, an explicit bound on the $\beta$-mixing coefficients in terms of the physical dependence measure is provided. Consequently, weak physical dependence implies $\beta$-mixing and strong mixing for triangular arrays, complementing Hill (2025), who proved the converse implication under moment assumptions.

[62]  arXiv:2602.04270 (cross-list from cs.LG) [pdf, ps, other]
Title: Multi-Integration of Labels across Categories for Component Identification (MILCCI)
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

Many fields collect large-scale temporal data through repeated measurements (trials), where each trial is labeled with a set of metadata variables spanning several categories. For example, a trial in a neuroscience study may be linked to a value from category (a): task difficulty, and category (b): animal choice. A critical challenge in time-series analysis is to understand how these labels are encoded within the multi-trial observations, and disentangle the distinct effect of each label entry across categories. Here, we present MILCCI, a novel data-driven method that i) identifies the interpretable components underlying the data, ii) captures cross-trial variability, and iii) integrates label information to understand each category's representation within the data. MILCCI extends a sparse per-trial decomposition that leverages label similarities within each category to enable subtle, label-driven cross-trial adjustments in component compositions and to distinguish the contribution of each category. MILCCI also learns each component's corresponding temporal trace, which evolves over time within each trial and varies flexibly across trials. We demonstrate MILCCI's performance through both synthetic and real-world examples, including voting patterns, online page view trends, and neuronal recordings.

[63]  arXiv:2602.04408 (cross-list from cs.LG) [pdf, ps, other]
Title: Separation-Utility Pareto Frontier: An Information-Theoretic Characterization
Authors: Shizhou Xu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the Pareto frontier (optimal trade-off) between utility and separation, a fairness criterion requiring predictive independence from sensitive attributes conditional on the true outcome. Through an information-theoretic lens, we prove a characterization of the utility-separation Pareto frontier, establish its concavity, and thereby prove the increasing marginal cost of separation in terms of utility. In addition, we characterize the conditions under which this trade-off becomes strict, providing a guide for trade-off selection in practice. Based on the theoretical characterization, we develop an empirical regularizer based on conditional mutual information (CMI) between predictions and sensitive attributes given the true outcome. The CMI regularizer is compatible with any deep model trained via gradient-based optimization and serves as a scalar monitor of residual separation violations, offering tractable guarantees during training. Finally, numerical experiments support our theoretical findings: across COMPAS, UCI Adult, UCI Bank, and CelebA, the proposed method substantially reduces separation violations while matching or exceeding the utility of established baseline methods. This study thus offers a provable, stable, and flexible approach to enforcing separation in deep learning.
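
The quantity behind the separation criterion in this abstract, the conditional mutual information between predictions and sensitive attributes given the true outcome, can be illustrated with a discrete plug-in estimate. This sketch is not the paper's regularizer, which has to be usable with gradient-based training; the function name and the toy samples are assumptions made for illustration.

```python
import math
from collections import Counter


def conditional_mutual_information(preds, sensitive, outcomes):
    """Plug-in estimate of I(pred; sensitive | outcome) in nats from discrete samples."""
    n = len(preds)
    c_pay = Counter(zip(preds, sensitive, outcomes))
    c_py = Counter(zip(preds, outcomes))
    c_ay = Counter(zip(sensitive, outcomes))
    c_y = Counter(outcomes)
    cmi = 0.0
    for (p, a, y), count in c_pay.items():
        # p(p, a | y) / (p(p | y) * p(a | y)) written in terms of raw counts.
        ratio = count * c_y[y] / (c_py[(p, y)] * c_ay[(a, y)])
        cmi += (count / n) * math.log(ratio)
    return cmi


outcomes = [0, 0, 1, 1]
sensitive = [0, 1, 0, 1]
cmi_separated = conditional_mutual_information(outcomes, sensitive, outcomes)
cmi_violating = conditional_mutual_information(sensitive, sensitive, outcomes)
```

In the first call the prediction equals the outcome, so separation holds and the estimate is zero; in the second the prediction copies the sensitive attribute and the estimate equals log 2.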

[64]  arXiv:2602.04527 (cross-list from cs.GT) [pdf, ps, other]
Title: Graph-Based Audits for Meek Single Transferable Vote Elections
Subjects: Computer Science and Game Theory (cs.GT); Applications (stat.AP)

In the context of election security, a Risk-Limiting Audit (RLA) is a statistical framework that uses a minimal partial recount of the ballots to guarantee that the results of the election were correctly reported. A generalized RLA framework has remained elusive for algorithmic election rules such as the Single Transferable Vote (STV) rule, because of the dependence of these rules on the chronology of eliminations and elections leading to the outcome of the election. This paper proposes a new graph-based approach to audit these algorithmic election rules, by considering the space of all possible sequences of elections and eliminations. If we fix a subgraph of this universal space ahead of the audit, a sufficient strategy is to verify statistically that the true election sequence does not leave the fixed subgraph. This makes for a flexible framework to audit these elections in a chronology-agnostic way.

[65]  arXiv:2602.04548 (cross-list from cs.LG) [pdf, ps, other]
Title: Gradient Flow Through Diagram Expansions: Learning Regimes and Explicit Solutions
Comments: 48 pages, under review for ICML'2026
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We develop a general mathematical framework to analyze scaling regimes and derive explicit analytic solutions for gradient flow (GF) in large learning problems. Our key innovation is a formal power series expansion of the loss evolution, with coefficients encoded by diagrams akin to Feynman diagrams. We show that this expansion has a well-defined large-size limit that can be used to reveal different learning phases and, in some cases, to obtain explicit solutions of the nonlinear GF. We focus on learning Canonical Polyadic (CP) decompositions of high-order tensors, and show that this model has several distinct extreme lazy and rich GF regimes such as free evolution, NTK and under- and over-parameterized mean-field. We show that these regimes depend on the parameter scaling, tensor order, and symmetry of the model in a specific and subtle way. Moreover, we propose a general approach to summing the formal loss expansion by reducing it to a PDE; in a wide range of scenarios, it turns out to be first-order and solvable by the method of characteristics. We observe very good agreement between our theoretical predictions and experiments.

[66]  arXiv:2602.04550 (cross-list from quant-ph) [pdf, ps, other]
Title: Locally Gentle State Certification for High Dimensional Quantum Systems
Subjects: Quantum Physics (quant-ph); Statistics Theory (math.ST)

Standard approaches to quantum statistical inference rely on measurements that induce a collapse of the wave function, effectively consuming the quantum state to extract information. In this work, we investigate the fundamental limits of locally-gentle quantum state certification, where the learning algorithm is constrained to perturb the state by at most $\alpha$ in trace norm, thereby allowing for the reuse of samples. We analyze the hypothesis testing problem of distinguishing whether an unknown state $\rho$ is equal to a reference $\rho_0$ or $\epsilon$-far from it. We derive the minimax sample complexity for this problem, quantifying the information-theoretic price of non-destructive measurements. Specifically, by constructing explicit measurement operators, we show that the constraint of $\alpha$-gentleness imposes a sample size penalty of $\frac{d}{\alpha^2}$, yielding a total sample complexity of $n = \Theta(\frac{d^3}{\epsilon^2 \alpha^2})$. Our results clarify the trade-off between information extraction and state disturbance, and highlight deep connections between physical measurement constraints and privacy mechanisms in quantum learning. Crucially, we find that the sample size penalty incurred by enforcing $\alpha$-gentleness scales linearly with the Hilbert-space dimension $d$ rather than the number of parameters $d^2-1$ typical for high-dimensional private estimation.

[67]  arXiv:2602.04761 (cross-list from cs.LG) [pdf, ps, other]
Title: Improved Dimension Dependence for Bandit Convex Optimization with Gradient Variations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Gradient-variation online learning has drawn increasing attention due to its deep connections to game theory, optimization, etc. It has been studied extensively in the full-information setting, but is underexplored with bandit feedback. In this work, we focus on gradient variation in Bandit Convex Optimization (BCO) with two-point feedback. By proposing a refined analysis on the non-consecutive gradient variation, a fundamental quantity in gradient variation with bandits, we improve the dimension dependence for both convex and strongly convex functions compared with the best known results (Chiang et al., 2013). Our improved analysis for the non-consecutive gradient variation also implies other favorable problem-dependent guarantees, such as gradient-variance and small-loss regrets. Beyond the two-point setup, we demonstrate the versatility of our technique by achieving the first gradient-variation bound for one-point bandit linear optimization over hyper-rectangular domains. Finally, we validate the effectiveness of our results in more challenging tasks such as dynamic/universal regret minimization and bandit games, establishing the first gradient-variation dynamic and universal regret bounds for two-point BCO and fast convergence rates in bandit games.

[68]  arXiv:2602.04762 (cross-list from q-bio.PE) [pdf, ps, other]
Title: Uncertainty in Island-based Ecosystem Services and Climate Change
Subjects: Populations and Evolution (q-bio.PE); Applications (stat.AP); Other Statistics (stat.OT)

Small and medium-sized islands are acutely exposed to climate change and ecosystem degradation, yet the extent to which uncertainty is systematically addressed in scientific assessments of their ecosystem services remains poorly understood. This study revisits 226 peer-reviewed articles drawn from two global systematic reviews on island ecosystem services and climate change, applying a structured post hoc analysis to evaluate how uncertainty is treated across methods, service categories, ecosystem realms, and decision contexts. Studies were classified according to whether uncertainty was explicitly analysed, just mentioned, or ignored. Only 30 percent of studies incorporated uncertainty explicitly, while more than half did not address it at all. Scenario-based approaches dominated uncertainty assessment, whereas probabilistic and ensemble-based frameworks remained limited. Cultural ecosystem services and extreme climate impacts exhibited the lowest levels of uncertainty integration, and few studies connected uncertainty treatment to policy-relevant decision frameworks. Weak or absent treatment of uncertainty emerges as a structural challenge in island systems, where narrow ecological thresholds, strong land-sea coupling, limited spatial buffers, and reduced institutional redundancy amplify the consequences of decision-making under incomplete knowledge. Systematic mapping of how uncertainty is framed, operationalised, or neglected reveals persistent methodological and conceptual gaps and informs concrete directions for strengthening uncertainty integration in future island-focused ecosystem service and climate assessments. Embedding uncertainty more robustly into modelling practices, participatory processes, and policy tools is essential for enhancing scientific credibility, governance relevance, and adaptive capacity in insular socio-ecological systems.

[69]  arXiv:2602.04774 (cross-list from cond-mat.dis-nn) [pdf, ps, other]
Title: Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Machine Learning (stat.ML)

Setting the learning rate for a deep learning model is a critical part of successful training, yet choosing this hyperparameter is often done empirically with trial and error. In this work, we explore a solvable model of optimal learning rate schedules for a power-law random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $\eta_T^\star(t)$ where $t$ is the current iterate and $T$ is the total training horizon. This schedule is computed both numerically and analytically (when possible) using optimal control methods. Our analysis reveals two regimes, which we term the easy phase and the hard phase. In the easy phase the optimal schedule is a polynomial decay $\eta_T^\star(t) \simeq T^{-\xi} (1-t/T)^{\delta}$ where $\xi$ and $\delta$ depend on the properties of the features and task. In the hard phase, the optimal schedule resembles warmup-stable-decay with constant (in $T$) initial learning rate and annealing performed over a vanishing (in $T$) fraction of training steps. We investigate joint optimization of learning rate and batch size, identifying a degenerate optimality condition. Our model also predicts the compute-optimal scaling laws (where model size and training steps are chosen optimally) in both easy and hard regimes. Going beyond SGD, we consider optimal schedules for the momentum $\beta(t)$, where speedups in the hard phase are possible. We compare our optimal schedule to various benchmarks in our task, including (1) optimal constant learning rates $\eta_T(t) \sim T^{-\xi}$ and (2) optimal power laws $\eta_T(t) \sim T^{-\xi} t^{-\chi}$, finding that our schedule achieves better rates than either of these. Our theory suggests that learning rate transfer across training horizons depends on the structure of the model and task. We explore these ideas in simple experimental pretraining setups.
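
As a rough illustration of the easy-phase form $\eta_T^\star(t) \simeq T^{-\xi} (1-t/T)^{\delta}$ quoted in the abstract above, the following Python sketch evaluates that polynomial decay at a few steps. The function name `easy_phase_lr` and the numeric values chosen for $\xi$ and $\delta$ are assumptions made for illustration only; the abstract does not give them, and the expression holds only up to constant factors.

```python
def easy_phase_lr(t, T, xi, delta):
    """Polynomial-decay schedule T**(-xi) * (1 - t/T)**delta.

    Hypothetical helper: mirrors the easy-phase formula from the abstract,
    ignoring any constant prefactor.
    """
    if T <= 0 or not 0 <= t <= T:
        raise ValueError("require T > 0 and 0 <= t <= T")
    return T ** (-xi) * (1.0 - t / T) ** delta


if __name__ == "__main__":
    T = 1000                 # total training horizon (illustrative)
    xi, delta = 0.5, 1.0     # assumed exponents, not taken from the paper
    for t in (0, 250, 500, 750, 1000):
        print(t, easy_phase_lr(t, T, xi, delta))
```

With any positive `delta` the rate starts at `T ** (-xi)` and reaches zero at `t = T`, which is the qualitative shape the abstract describes for the easy phase.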

[70]  arXiv:2602.04795 (cross-list from cs.LG) [pdf, ps, other]
Title: Maximum-Volume Nonnegative Matrix Factorization
Comments: arXiv admin note: substantial text overlap with arXiv:2412.06380
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Numerical Analysis (math.NA); Machine Learning (stat.ML)

Nonnegative matrix factorization (NMF) is a popular data embedding technique. Given a nonnegative data matrix $X$, it aims at finding two lower-dimensional matrices, $W$ and $H$, such that $X\approx WH$, where the factors $W$ and $H$ are constrained to be element-wise nonnegative. The factor $W$ serves as a basis for the columns of $X$. In order to obtain more interpretable and unique solutions, minimum-volume NMF (MinVol NMF) minimizes the volume of $W$. In this paper, we consider the dual approach, where the volume of $H$ is maximized instead; this is referred to as maximum-volume NMF (MaxVol NMF). MaxVol NMF is identifiable under the same conditions as MinVol NMF in the noiseless case, but it behaves rather differently in the presence of noise. In practice, MaxVol NMF is much more effective at extracting a sparse decomposition and does not generate rank-deficient solutions. In fact, we prove that the solutions of MaxVol NMF with the largest volume correspond to clustering the columns of $X$ into disjoint clusters, while the solutions of MinVol NMF with the smallest volume are rank deficient. We propose two algorithms to solve MaxVol NMF. We also present a normalized variant of MaxVol NMF that exhibits better performance than MinVol NMF and MaxVol NMF, and can be interpreted as a continuum between standard NMF and orthogonal NMF. We illustrate our results in the context of hyperspectral unmixing.
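
To make the baseline factorization $X\approx WH$ concrete, here is a minimal Python sketch of plain NMF using the classical multiplicative updates for the Frobenius loss. This is standard NMF only, not the MinVol or MaxVol variants described in the abstract, and it contains no volume term. The names `nmf_multiplicative` and `rel_error`, along with the iteration count and the small `eps` guard, are assumptions chosen for illustration.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF via multiplicative updates (no volume regularization)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a nonnegative ratio,
        # so W and H stay element-wise nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def rel_error(X, W, H):
    """Relative Frobenius reconstruction error ||X - WH|| / ||X||."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((10, 3)) @ rng.random((3, 8))  # nonnegative, rank <= 3
    W, H = nmf_multiplicative(X, rank=3)
    print("relative error:", rel_error(X, W, H))
```

A volume-based variant would add a penalty or constraint on $W$ (MinVol) or $H$ (MaxVol) to this objective; the exact form used in the paper is not given in the abstract and is therefore not sketched here.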

[71]  arXiv:2602.04863 (cross-list from cs.LG) [pdf, ps, other]
Title: Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
Comments: Code available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Training modern large language models (LLMs) has become a veritable smorgasbord of algorithms and datasets designed to elicit particular behaviors, making it critical to develop techniques to understand the effects of datasets on the model's properties. This is exacerbated by recent experiments that show datasets can transmit signals that are not directly observable from individual datapoints, posing a conceptual challenge for dataset-centric understandings of LLM training and suggesting a missing fundamental account of such phenomena. Towards understanding such effects, inspired by recent work on the linear structure of LLMs, we uncover a general mechanism through which hidden subtexts can arise in generic datasets.
We introduce Logit-Linear-Selection (LLS), a method that prescribes how to select subsets of a generic preference dataset to elicit a wide range of hidden effects. We apply LLS to discover subsets of real-world datasets so that models trained on them exhibit behaviors ranging from having specific preferences, to responding to prompts in a different language not present in the dataset, to taking on a different persona. Crucially, the effect persists for the selected subset, across models with varying architectures, supporting its generality and universality.

Replacements for Thu, 5 Feb 26

[72]  arXiv:2305.00081 (replaced) [pdf, ps, other]
Title: Mixture Quantiles Estimated by Constrained Linear Regression
Subjects: Methodology (stat.ME)
[73]  arXiv:2305.19557 (replaced) [pdf, ps, other]
Title: Dictionary Learning under Symmetries via Group Representations
Comments: 33 pages, 3 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[74]  arXiv:2306.10767 (replaced) [pdf, ps, other]
Title: P-Tensors: a General Formalism for Constructing Higher Order Message Passing Networks
Journal-ref: Proc. AISTATS, PMLR 238:424-432, 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[75]  arXiv:2310.01184 (replaced) [pdf, ps, other]
Title: Applications of Improvements to the Pythagorean Won-Loss Expectation in Optimizing Rosters
Subjects: Applications (stat.AP)
[76]  arXiv:2403.14881 (replaced) [pdf, ps, other]
Title: The German Tank Problem with Multiple Factories
Subjects: Statistics Theory (math.ST)
[77]  arXiv:2404.01390 (replaced) [pdf, ps, other]
Title: Convex relaxation for the generalized maximum-entropy sampling problem
Subjects: Statistics Theory (math.ST); Optimization and Control (math.OC)
[78]  arXiv:2405.04636 (replaced) [pdf, ps, other]
Title: Data-driven Error Estimation: Excess Risk Bounds without Class Complexity as Input
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[79]  arXiv:2407.13731 (replaced) [pdf, ps, other]
Title: Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[80]  arXiv:2410.07427 (replaced) [pdf, ps, other]
Title: A Generalization Bound for a Family of Implicit Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[81]  arXiv:2410.18844 (replaced) [pdf, ps, other]
Title: Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
[82]  arXiv:2410.18869 (replaced) [pdf, ps, other]
Title: On the Mean-Field limit of diffusive games through the master equation: $L^{\infty}$ estimates and extreme value behavior
Comments: 41 pages including references
Subjects: Probability (math.PR); Analysis of PDEs (math.AP); Optimization and Control (math.OC); Statistics Theory (math.ST); Mathematical Finance (q-fin.MF)
[83]  arXiv:2501.06148 (replaced) [pdf, ps, other]
Title: From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training
Comments: TMLR final version; code: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[84]  arXiv:2504.00049 (replaced) [pdf, ps, other]
Title: Scalable Durational Event Models: Application to Physical and Digital Interactions
Subjects: Methodology (stat.ME); Computation (stat.CO)
[85]  arXiv:2504.21106 (replaced) [pdf, ps, other]
Title: An Axiomatic Approach to Comparing Sensitivity Parameters
Comments: This paper is a revised, shorter version of our now-superseded previous working paper arXiv:2206.02303v4, without the identification analysis or empirical results of the former sections 4 and 5. The identification analysis and empirical results can now be found in our companion paper arXiv:2206.02303v5
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
[86]  arXiv:2505.00785 (replaced) [pdf, ps, other]
Title: Proper Correlation Coefficients for Nominal Random Variables
Subjects: Methodology (stat.ME); Econometrics (econ.EM)
[87]  arXiv:2505.13732 (replaced) [pdf, ps, other]
Title: Backward Conformal Prediction
Comments: Code available at: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[88]  arXiv:2505.18526 (replaced) [pdf, ps, other]
Title: Scalable Deep Basis Kernel Gaussian Processes
Comments: Previous title: Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[89]  arXiv:2506.01913 (replaced) [pdf, ps, other]
Title: Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[90]  arXiv:2506.06571 (replaced) [pdf, ps, other]
Title: Graph Persistence goes Spectral
Comments: 32 pages, 4 figures, 7 tables. Accepted at NeurIPS 2025. Final version, clarified minor bug
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[91]  arXiv:2506.12818 (replaced) [pdf, ps, other]
Title: Taking the GP Out of the Loop
Comments: 12 pages, 11 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[92]  arXiv:2506.15492 (replaced) [pdf, ps, other]
Title: LIT-LVM: Structured Regularization for Interaction Terms in Linear Predictors using Latent Variable Models
Comments: Published in the Transactions on Machine Learning Research (2025). this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[93]  arXiv:2506.24007 (replaced) [pdf, ps, other]
Title: Minimax and Bayes Optimal Best-Arm Identification
Authors: Masahiro Kato
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
[94]  arXiv:2507.06969 (replaced) [pdf, ps, other]
Title: Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
Comments: NeurIPS 2025
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (stat.ML)
[95]  arXiv:2507.10419 (replaced) [pdf, ps, other]
Title: Multiple Choice Learning of Low-Rank Adapters for Language Modeling
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
[96]  arXiv:2508.05844 (replaced) [pdf, ps, other]
Title: Online Budget Allocation with Censored Semi-Bandit Feedback
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)
[97]  arXiv:2508.06377 (replaced) [pdf, ps, other]
Title: DP-SPRT: Differentially Private Sequential Probability Ratio Tests
Comments: Accepted for spotlight presentation at AISTATS 2026. 36 pages, 5 figures, 1 table
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Statistics Theory (math.ST)
[98]  arXiv:2508.13131 (replaced) [pdf, ps, other]
Title: Improving Detection of Watermarked Language Models
Comments: Published at TMLR 2026
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
[99]  arXiv:2509.03317 (replaced) [pdf, ps, other]
Title: Bayesian Additive Regression Trees for functional ANOVA model
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[100]  arXiv:2509.24095 (replaced) [pdf, ps, other]
Title: Singleton-Optimized Conformal Prediction
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[101]  arXiv:2510.04441 (replaced) [pdf, ps, other]
Title: Domain Generalization Under Posterior Drift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[102]  arXiv:2510.04769 (replaced) [pdf, ps, other]
Title: When Do Credal Sets Stabilize? Fixed-Point Theorems for Credal Set Updates
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)
[103]  arXiv:2510.06136 (replaced) [pdf, ps, other]
Title: Geometric Model Selection for Latent Space Network Models: Hypothesis Testing via Multidimensional Scaling and Resampling Techniques
Subjects: Methodology (stat.ME)
[104]  arXiv:2510.07473 (replaced) [pdf, ps, other]
Title: metabeta -- A fast neural model for Bayesian mixed-effects regression
Comments: 19 pages, 9 main text, 8 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[105]  arXiv:2510.12663 (replaced) [pdf, ps, other]
Title: The $α$--regression for compositional data: a unified framework for standard, spatially-lagged, spatial autoregressive and geographically-weighted regression models
Subjects: Methodology (stat.ME)
[106]  arXiv:2510.13060 (replaced) [pdf, ps, other]
Title: Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)
[107]  arXiv:2511.02235 (replaced) [pdf, ps, other]
Title: Diffusion Index Forecasting with Tensor Data
Subjects: Methodology (stat.ME); Econometrics (econ.EM)
[108]  arXiv:2512.00252 (replaced) [pdf, ps, other]
Title: DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants
Comments: 44 pages, 26 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
[109]  arXiv:2512.00698 (replaced) [pdf, ps, other]
Title: Flow Matching for Tabular Data Synthesis
Comments: 16 pages main, 19 pages appendix, 5 figures. Fixed results on the Indonesia dataset; this does not affect the overall results. Added a standard tabular generative model benchmark
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[110]  arXiv:2512.12742 (replaced) [pdf, ps, other]
Title: A Novel Framework Using Variational Inference with Normalizing Flows to Train Transport Reversible Jump Proposals
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[111]  arXiv:2512.13599 (replaced) [pdf, ps, other]
Title: Correcting exponentiality test for binned earthquake magnitudes
Subjects: Geophysics (physics.geo-ph); Methodology (stat.ME)
[112]  arXiv:2512.17688 (replaced) [pdf, ps, other]
Title: Convergence Guarantees for Federated SARSA with Local Training and Heterogeneous Agents
Comments: Deep FedSARSA !
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[113]  arXiv:2512.19805 (replaced) [pdf, ps, other]
Title: Guardrailed Uplift Targeting: A Causal Optimization Playbook for Marketing Strategy
Authors: Deepit Sapru
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
[114]  arXiv:2512.21005 (replaced) [pdf, ps, other]
Title: Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments
Comments: v2: Revised version incorporating peer review feedback from book chapter submission. Clarifies modeling objectives for infectious disease prediction and situates the work within a three-paper PHIBP framework, highlighting suitability for future AI/LLM plug-and-play model specification
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
[115]  arXiv:2601.06514 (replaced) [pdf, ps, other]
Title: Inference-Time Alignment for Diffusion Models via Variationally Stable Doob's Matching
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Statistics Theory (math.ST)
[116]  arXiv:2601.11213 (replaced) [pdf, ps, other]
Title: Study on Light Propagation through Space-Time Random Media via Stochastic Partial Differential Equations
Comments: 5 pages, 3 figures
Subjects: Optics (physics.optics); Mathematical Physics (math-ph); Applications (stat.AP)
[117]  arXiv:2601.13874 (replaced) [pdf, ps, other]
Title: Unified Unbiased Variance Estimation for Maximum Mean Discrepancy: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[118]  arXiv:2601.15500 (replaced) [pdf, ps, other]
Title: Low-Dimensional Adaptation of Rectified Flow: A Diffusion and Stochastic Localization Perspective
Comments: 32 pages, 7 figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)
[119]  arXiv:2601.22378 (replaced) [pdf, ps, other]
Title: It's all In the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms
Comments: 36 pages, 15 figures, accepted to AISTATS 2026 (poster)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
[120]  arXiv:2601.22409 (replaced) [pdf, ps, other]
Title: Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks
Comments: 41 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[121]  arXiv:2602.01928 (replaced) [pdf, ps, other]
Title: Privacy Amplification by Missing Data
Authors: Simon Roburin (LPSM (UMR\_8001)), Rafaël Pinot (LPSM (UMR\_8001)), Erwan Scornet (LPSM (UMR\_8001))
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[122]  arXiv:2602.02806 (replaced) [pdf, ps, other]
Title: De-Linearizing Agent Traces: Bayesian Inference of Latent Partial Orders for Efficient Execution
Subjects: Applications (stat.AP)