Eloïse Berthier

Briefly

I have been a part-time researcher at ENSTA since November 2022, in the Unité d'Informatique et d'Ingénierie des Systèmes (U2IS).

From September 2019 to October 2022, I was a Ph.D. student under the supervision of Francis Bach. I worked in the SIERRA team in Paris, a joint team between Inria Paris, ENS Paris and CNRS. My research focused on developing efficient algorithms for optimal control and reinforcement learning, with a particular interest in methods that can be applied to robotics and that come with theoretical guarantees.

Before that, I worked in the MLO team, under the supervision of Martin Jaggi, on privacy-preserving machine learning.

Contact

E-mail: eloise [dot] berthier [at] ensta [dot] fr
Physical address: ENSTA campus de Paris-Saclay, 828 Boulevard des Maréchaux, 91120 Palaiseau.

Selected Recent Publications and Preprints

R. Kazmierczak, E. Berthier, G. Frehse, G. Franchi. Explainability for Vision Foundation Models: A Survey. Information Fusion, 2025.
[arxiv]
Abstract: As artificial intelligence systems become increasingly integrated into daily life, the field of explainability has gained significant attention. This trend is particularly driven by the complexity of modern AI models and their decision-making processes. The advent of foundation models, characterized by their extensive generalization capabilities and emergent uses, has further complicated this landscape. Foundation models occupy an ambiguous position in the explainability domain: their complexity makes them inherently challenging to interpret, yet they are increasingly leveraged as tools to construct explainable models. In this survey, we explore the intersection of foundation models and eXplainable AI (XAI) in the vision domain. We begin by compiling a comprehensive corpus of papers that bridge these fields. Next, we categorize these works based on their architectural characteristics. We then discuss the challenges faced by current research in integrating XAI within foundation models. Furthermore, we review common evaluation methodologies for these combined approaches. Finally, we present key observations and insights from our survey, offering directions for future research in this rapidly evolving field.

D. Brellmann, E. Berthier, D. Filliat, G. Frehse. On double descent in reinforcement learning with LSTD and random features. The Twelfth International Conference on Learning Representations (ICLR), 2024.
[arxiv]
Abstract: Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning, the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and l2-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime when it is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Square Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and states go to infinity, maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. Correction terms vanish when the l2-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.

E. Berthier, Z. Kobeissi, F. Bach. A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning. Advances in Neural Information Processing Systems (NeurIPS), 2022.
[hal, poster, slides]
Abstract: Temporal-difference learning is a popular algorithm for policy evaluation. In this paper, we study the convergence of the regularized non-parametric TD(0) algorithm, in both the independent and Markovian observation settings. In particular, when TD is performed in a universal reproducing kernel Hilbert space (RKHS), we prove convergence of the averaged iterates to the optimal value function, even when it does not belong to the RKHS. We provide explicit convergence rates that depend on a source condition relating the regularity of the optimal value function to the RKHS. We illustrate this convergence numerically on a simple continuous-state Markov reward process.