Eloïse Berthier

 

Briefly

I have been a part-time researcher at ENSTA, Institut Polytechnique de Paris, in the Unité d'Informatique et d'Ingénierie des Systèmes (U2IS), since November 2022. I am also a Hi! Paris associate member. I am interested in computational methods for reinforcement learning and optimal control, global optimization methods, and explainability for machine learning. I am always open to collaborations outside these topics!

From September 2019 to October 2022, I was a Ph.D. student under the supervision of Francis Bach. I worked in the SIERRA team in Paris, a joint team between Inria Paris, ENS Paris and CNRS. My research focused on developing efficient algorithms for optimal control and reinforcement learning, with a particular interest in methods that can be applied to robotics and that come with theoretical guarantees.

Before that, I worked in the MLO team, under the supervision of Martin Jaggi, on privacy-preserving machine learning.

Contact

  • E-mail: e......@ensta.fr

  • Physical address: ENSTA campus de Paris-Saclay, 828 Boulevard des Maréchaux, 91120 Palaiseau.

    Get directions with Citymapper


Selected Recent Publications and Preprints

  • Z. Kobeissi, E. Berthier. Model-independent O(1/k)-convergence rate for TD(0) with linear function approximation, universal learning steps and iid samples. AISTATS, 2026.
    [conference]

    Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. We establish a new convergence rate for the Mean-Square Error (MSE) on the approximated function that is (i) fast, in the sense that it admits an optimal dependency on the number of iterations k (i.e., of order 1/k), (ii) robust to ill-conditioning, as it only depends on an initial error and model-independent constants, and (iii) sharp up to a multiplicative constant lower than 11. In particular, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization, unlike all pre-existing O(1/k) rates in the TD(0) literature. We also introduce PCTD(0), a variant of TD(0) that benefits from better convergence properties under an additional assumption of strong mixing on the Markov chain.

  • E. Mauduit, E. Berthier, A. Simonetto. No-Regret Gaussian Process Optimization of Time-Varying Functions. preprint, 2025.
    [arxiv]

    Abstract: Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and often unrealistic assumptions are imposed. In this article, we propose a novel method to optimize time-varying rewards in the frequentist setting, where the objective has bounded RKHS norm. Time variations are captured through uncertainty injection (UI), which enables heteroscedastic GP regression that adapts past observations to the current time step. As no-regret is unattainable in general in the strict bandit setting, we relax the latter by allowing additional queries on previously observed points. Building on sparse inference and the effect of UI on regret, we propose W-SparQ-GP-UCB, an online algorithm that achieves no-regret with only a vanishing number of additional queries per iteration. To assess the theoretical limits of this approach, we establish a lower bound on the number of additional queries required for no-regret, proving the efficiency of our method. Finally, we provide a comprehensive analysis linking the degree of time-variation of the function to achievable regret rates, together with upper and lower bounds on the number of additional queries needed in each regime.

  • R. Kazmierczak, S. Azzolin, E. Berthier, A. Hedström, P. Delhomme, N. Bousquet, G. Frehse, M. Mancini, B. Caramiaux, A. Passerini, G. Franchi. Benchmarking XAI Explanations with Human-Aligned Evaluations. AAAI AI Alignment Track, 2026.
    [arxiv]

    Abstract: In this paper, we introduce PASTA (Perceptual Assessment System for explanaTion of Artificial intelligence), a novel framework for a human-centric evaluation of XAI techniques in computer vision. Our first key contribution is a human evaluation of XAI explanations on four diverse datasets (COCO, Pascal Parts, Cats Dogs Cars, and MonumAI) which constitutes the first large-scale benchmark dataset for XAI, with annotations at both the image and concept levels. This dataset allows for robust evaluation and comparison across various XAI methods. Our second major contribution is a data-based metric for assessing the interpretability of explanations. It mimics human preferences, based on a database of human evaluations of explanations in the PASTA-dataset. With its dataset and metric, the PASTA framework provides consistent and reliable comparisons between XAI techniques, in a way that is scalable but still aligned with human evaluations. Additionally, our benchmark allows for comparisons between explanations across different modalities, an aspect previously unaddressed. Our findings indicate that humans tend to prefer saliency maps over other explanation types. Moreover, we provide evidence that human assessments show a low correlation with existing XAI metrics that are numerically simulated by probing the model.