Eloïse Berthier

Briefly

Since November 2022, I have been a part-time researcher at ENSTA, in the Unité d'Informatique et d'Ingénierie des Systèmes (U2IS). I am interested in computational methods for reinforcement learning and optimal control, along with global optimization methods and explainability for machine learning. But of course, I am always open to collaborations outside these topics!

From September 2019 to October 2022, I was a Ph.D. student under the supervision of Francis Bach. I worked in the SIERRA team in Paris, a joint team between Inria Paris, ENS Paris, and CNRS. My research focused on developing efficient algorithms for optimal control and reinforcement learning, with a particular interest in methods that can be applied to robotics and that come with theoretical guarantees.

Before that, I worked in the MLO team, under the supervision of Martin Jaggi, on privacy-preserving machine learning.

Contact

  • E-mail: e......@ensta.fr

  • Physical address: ENSTA campus de Paris-Saclay, 828 Boulevard des Maréchaux, 91120 Palaiseau.

    Get directions with Citymapper

Selected Recent Publications and Preprints

  • E. Mauduit, E. Berthier, A. Simonetto. No-Regret Gaussian Process Optimization of Time-Varying Functions. Preprint, 2025.
    [arxiv]

    Abstract: Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and often unrealistic assumptions are imposed. In this article, we propose a novel method to optimize time-varying rewards in the frequentist setting, where the objective has bounded RKHS norm. Time variations are captured through uncertainty injection (UI), which enables heteroscedastic GP regression that adapts past observations to the current time step. As no-regret is unattainable in general in the strict bandit setting, we relax the latter by allowing additional queries on previously observed points. Building on sparse inference and the effect of UI on regret, we propose W-SparQ-GP-UCB, an online algorithm that achieves no-regret with only a vanishing number of additional queries per iteration. To assess the theoretical limits of this approach, we establish a lower bound on the number of additional queries required for no-regret, proving the efficiency of our method. Finally, we provide a comprehensive analysis linking the degree of time-variation of the function to achievable regret rates, together with upper and lower bounds on the number of additional queries needed in each regime.
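
    A minimal sketch of the uncertainty-injection idea, assuming an RBF kernel on [0, 1] and hand-picked parameters (injection rate eps, exploration weight beta): past observations receive extra noise proportional to their age, which makes the GP regression heteroscedastic, and the next query maximizes a UCB acquisition. This is not the W-SparQ-GP-UCB algorithm itself; in particular, it omits the sparse inference and the additional queries on past points.

```python
# Sketch: GP-UCB with uncertainty injection (UI) on a drifting 1-D objective.
# Kernel width, eps, and beta are illustrative assumptions, not the paper's.
import numpy as np

def rbf(X, Y, ell=0.2):
    """Squared-exponential kernel on a 1-D domain."""
    d = X[:, None] - Y[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def ucb_step(X_obs, y_obs, ages, X_cand, sigma2=0.01, eps=0.05, beta=2.0):
    """One UCB step: older observations get injected variance proportional
    to their age, yielding a heteroscedastic GP posterior."""
    noise = sigma2 + eps * ages               # per-observation noise level
    K = rbf(X_obs, X_obs) + np.diag(noise)
    k_star = rbf(X_obs, X_cand)
    mu = k_star.T @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, k_star)
    var = 1.0 - np.sum(k_star * v, axis=0)    # prior variance is 1 for RBF
    return X_cand[np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0)))]

# Toy run: optimize f_t(x) = sin(2*pi*(x - 0.01*t)), drifting over time.
rng = np.random.default_rng(0)
X_cand = np.linspace(0.0, 1.0, 200)
X_obs, y_obs = np.array([0.5]), np.array([np.sin(2 * np.pi * 0.5)])
for t in range(1, 30):
    ages = np.arange(len(X_obs))[::-1].astype(float)  # oldest = largest age
    x_next = ucb_step(X_obs, y_obs, ages, X_cand)
    y_next = np.sin(2 * np.pi * (x_next - 0.01 * t)) + 0.1 * rng.standard_normal()
    X_obs, y_obs = np.append(X_obs, x_next), np.append(y_obs, y_next)
print("last query:", X_obs[-1])
```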

  • R. Kazmierczak, S. Azzolin, E. Berthier, A. Hedström, P. Delhomme, N. Bousquet, G. Frehse, M. Mancini, B. Caramiaux, A. Passerini, G. Franchi. Benchmarking XAI Explanations with Human-Aligned Evaluations. AAAI AI Alignment Track, 2026.
    [arxiv]

    Abstract: In this paper, we introduce PASTA (Perceptual Assessment System for explanaTion of Artificial intelligence), a novel framework for a human-centric evaluation of XAI techniques in computer vision. Our first key contribution is a human evaluation of XAI explanations on four diverse datasets (COCO, Pascal Parts, Cats Dogs Cars, and MonumAI), which constitutes the first large-scale benchmark dataset for XAI, with annotations at both the image and concept levels. This dataset allows for robust evaluation and comparison across various XAI methods. Our second major contribution is a data-based metric for assessing the interpretability of explanations. It mimics human preferences, based on a database of human evaluations of explanations in the PASTA dataset. With its dataset and metric, the PASTA framework provides consistent and reliable comparisons between XAI techniques, in a way that is scalable but still aligned with human evaluations. Additionally, our benchmark allows for comparisons between explanations across different modalities, an aspect previously unaddressed. Our findings indicate that humans tend to prefer saliency maps over other explanation types. Moreover, we provide evidence that human assessments show a low correlation with existing XAI metrics that are numerically simulated by probing the model.
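
    A hedged sketch of what a data-based interpretability metric can look like, on synthetic data: a ridge regressor is fit from numeric descriptors of an explanation to human preference scores, then used to score new explanations. The descriptors and data below are illustrative assumptions, not the PASTA metric or dataset.

```python
# Sketch: fit a predictor of human preference from explanation descriptors.
# Descriptor names and the synthetic scores are placeholders, not PASTA's.
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical descriptors: e.g. saliency sparsity, localization, entropy.
X = rng.random((n, 3))
# Synthetic stand-in for human preference scores, correlated with the
# descriptors plus noise (real scores would come from annotators).
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] - 0.2 * X[:, 2] + 0.05 * rng.standard_normal(n)

# Ridge regression in closed form: w = (X^T X + lam * I)^{-1} X^T y
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

def learned_metric(features):
    """Score a new explanation by its predicted human preference."""
    return features @ w

print("predicted human score:", learned_metric(np.array([0.8, 0.5, 0.1])))
```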

  • D. Brellmann, E. Berthier, D. Filliat, G. Frehse. On double descent in reinforcement learning with LSTD and random features. The Twelfth International Conference on Learning Representations (ICLR), 2024.
    [arxiv]

    Abstract: Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning, the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and l2-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime when it is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Square Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and states go to infinity, maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. Correction terms vanish when the l2-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
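
    A minimal sketch of regularized LSTD with random Fourier features, assuming a toy deterministic chain environment and hand-picked hyperparameters (not the paper's setup): it solves LSTD in closed form for a growing number of random features and prints the empirical MSBE as the parameter/state ratio crosses one.

```python
# Sketch: regularized LSTD with random Fourier features on a chain MDP,
# sweeping the parameter/state ratio m/S. Environment and constants are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
gamma, lam = 0.95, 1e-6
S = 50                                       # number of visited states
states = np.linspace(0, 1, S)[:, None]
next_states = np.roll(states, -1, axis=0)    # deterministic cyclic chain
rewards = np.sin(2 * np.pi * states[:, 0])

def features(X, W, b):
    """Random Fourier features: phi(s) = sqrt(2/m) * cos(W s + b)."""
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

for m in [10, 25, 50, 75, 100, 200, 400]:    # number of random features
    W = rng.standard_normal((m, 1)) * 5.0
    b = rng.uniform(0, 2 * np.pi, m)
    Phi, Phi_next = features(states, W, b), features(next_states, W, b)
    # Regularized LSTD: (Phi^T (Phi - gamma Phi') + lam I) w = Phi^T r
    A = Phi.T @ (Phi - gamma * Phi_next) + lam * np.eye(m)
    w = np.linalg.solve(A, Phi.T @ rewards)
    # Empirical Mean-Squared Bellman Error on the visited states.
    residual = rewards + gamma * Phi_next @ w - Phi @ w
    print(f"m/S = {m / S:5.2f}   empirical MSBE = {np.mean(residual**2):.4f}")
```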