Eloïse Berthier
Briefly
Since November 2022, I have been a part-time researcher at ENSTA Paris, in the Unité d'Informatique et d'Ingénierie des Systèmes (U2IS).
From September 2019 to October 2022, I was a Ph.D. student under the supervision of Francis Bach. I worked in the SIERRA team in Paris, a joint team between Inria Paris, ENS Paris, and CNRS. My research focused on developing efficient algorithms for optimal control and reinforcement learning, with a particular interest in methods that can be applied to robotics and come with theoretical guarantees.
Before that, I worked in the MLO team, under the supervision of Martin Jaggi, on privacy-preserving machine learning.
Contact
E-mail: eloise [dot] berthier [at] ensta-paris [dot] fr
Physical address: ENSTA Paris, 828 Boulevard des Maréchaux, 91120 Palaiseau.
Thesis defense!
I defended my thesis on Thursday, October 27th, 2022, at Inria Paris. You can download the manuscript. You can also have a look at the slides.
Recent Publications and Preprints
D. Brellmann, E. Berthier, D. Filliat, G. Frehse. On double-descent in reinforcement learning with LSTD and random features. International Conference on Learning Representations (ICLR), 2024. [arxiv] [Show Abstract]
Abstract: Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning, the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and l2-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime when it is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Square Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and states go to infinity, maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. Correction terms vanish when the l2-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
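To convey the setting studied in this paper, here is a rough sketch of regularized LSTD with random features on a toy one-dimensional Markov reward process. The environment, the cosine feature map, and all parameter choices are hypothetical, for illustration only; the parameter/state ratio is set above one, i.e., in the over-parameterized regime.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_features(states, W, b):
    # random cosine features: phi(s) = cos(s @ W + b)
    return np.cos(states @ W + b)

# toy 1-D Markov reward process (hypothetical, for illustration only)
n_states, gamma, lam = 50, 0.9, 1e-3
s = rng.uniform(-1, 1, size=(n_states, 1))                    # visited states
s_next = np.clip(s + 0.1 * rng.standard_normal(s.shape), -1, 1)
r = np.cos(3 * s).ravel()                                     # rewards

n_feat = 200          # parameter/state ratio = 4: over-parameterized regime
W = rng.standard_normal((1, n_feat))
b = rng.uniform(0, 2 * np.pi, n_feat)

Phi = random_features(s, W, b)
Phi_next = random_features(s_next, W, b)

# regularized LSTD: theta = (Phi^T (Phi - gamma Phi') + lam I)^{-1} Phi^T r
A = Phi.T @ (Phi - gamma * Phi_next) + lam * np.eye(n_feat)
theta = np.linalg.solve(A, Phi.T @ r)

# empirical Mean-Squared Bellman Error on the visited states
v_hat = Phi @ theta
msbe = np.mean((r + gamma * (Phi_next @ theta) - v_hat) ** 2)
print(f"empirical MSBE: {msbe:.4f}")
```

Sweeping `n_feat` across the parameter/state ratio of one, with `lam` close to zero, is the regime where the double-descent peak analyzed in the paper appears.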
R. Kazmierczak, E. Berthier, G. Frehse, G. Franchi. CLIP-QDA: An explainable concept bottleneck model. arXiv preprint arXiv:2312.00110, 2023. [arxiv] [Show Abstract]
Abstract: In this paper, we introduce an explainable algorithm designed from a multi-modal foundation model, which performs fast and explainable image classification. Drawing inspiration from CLIP-based Concept Bottleneck Models (CBMs), our method creates a latent space where each neuron is linked to a specific word. Observing that this latent space can be modeled with simple distributions, we use a Mixture of Gaussians (MoG) formalism to enhance the interpretability of this latent space. We then introduce CLIP-QDA, a classifier that only uses statistical values to infer labels from the concepts. In addition, this formalism allows for both local and global explanations. Since these explanations come from the inner design of our architecture, our work is part of a new family of greybox models, combining the performance of opaque foundation models with the interpretability of transparent models. Our empirical findings show that, in instances where the MoG assumption holds, CLIP-QDA achieves accuracy similar to that of state-of-the-art CBMs. Our explanations compete with existing XAI methods while being faster to compute.
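To convey the flavor of the classifier, here is a minimal QDA sketch over simulated concept activations: it fits one Gaussian per class (the MoG modeling of the latent space) and predicts by comparing class log-likelihoods. The data, dimensions, and regularization constant are invented for illustration; a real pipeline would use CLIP similarity scores to a concept vocabulary rather than synthetic vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical concept activations: two classes, 3 concepts per image
n_per_class, d = 200, 3
means = [np.array([0.2, 0.8, 0.1]), np.array([0.7, 0.3, 0.5])]
X = np.vstack([m + 0.1 * rng.standard_normal((n_per_class, d)) for m in means])
y = np.repeat([0, 1], n_per_class)

# fit one Gaussian per class (the MoG modeling of the latent space)
params = []
for c in (0, 1):
    Xc = X[y == c]
    mu = Xc.mean(axis=0)
    cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(d)   # small jitter for stability
    params.append((mu, cov))

def qda_log_likelihood(x, mu, cov):
    # Gaussian log-density up to a constant: the QDA decision statistic
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))

def predict(x):
    return int(np.argmax([qda_log_likelihood(x, mu, cov) for mu, cov in params]))

acc = np.mean([predict(x) == c for x, c in zip(X, y)])
print(f"training accuracy: {acc:.2f}")
```

Because the classifier is just per-class Gaussian statistics, the fitted means and covariances themselves can be read off as global explanations, which is the point of the greybox design.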
E. Berthier, Z. Kobeissi, F. Bach. A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning. Advances in Neural Information Processing Systems (NeurIPS), 2022. [hal, poster, slides] [Show Abstract]
Abstract: Temporal-difference learning is a popular algorithm for policy evaluation. In this paper, we study the convergence of the regularized non-parametric TD(0) algorithm, in both the independent and Markovian observation settings. In particular, when TD is performed in a universal reproducing kernel Hilbert space (RKHS), we prove convergence of the averaged iterates to the optimal value function, even when it does not belong to the RKHS. We provide explicit convergence rates that depend on a source condition relating the regularity of the optimal value function to the RKHS. We illustrate this convergence numerically on a simple continuous-state Markov reward process.
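The following is a minimal sketch of regularized non-parametric TD(0) with averaged iterates, run on a toy continuous-state Markov reward process. The Gaussian kernel, its bandwidth, the step sizes, the regularization constant, and the process itself are illustrative assumptions, not the paper's exact setup. The value function is stored as a kernel expansion over visited states.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(x, y, bw=0.3):
    return np.exp(-(x - y) ** 2 / (2 * bw ** 2))

# toy continuous-state Markov reward process (illustrative)
gamma, lam, n_steps = 0.9, 1e-3, 2000

def step(s):
    s_next = np.clip(0.9 * s + 0.1 * rng.standard_normal(), -1, 1)
    return s_next, np.cos(2 * s)

# value function as a kernel expansion: V(.) = sum_i coefs[i] * k(states[i], .)
states = np.empty(n_steps)
coefs = np.zeros(n_steps)
avg = np.zeros(n_steps)        # averaged iterates, as kernel coefficients

def value(point, weights, n):
    if n == 0:
        return 0.0
    return float(weights[:n] @ gaussian_kernel(states[:n], point))

s = rng.uniform(-1, 1)
for t in range(n_steps):
    s_next, r = step(s)
    delta = r + gamma * value(s_next, coefs, t) - value(s, coefs, t)
    alpha = 1.0 / np.sqrt(t + 1)
    coefs[:t] *= 1.0 - alpha * lam      # regularization shrinks past coefficients
    states[t] = s
    coefs[t] = alpha * delta            # TD(0) step adds alpha * delta * k(s, .)
    # running average of the iterates (zero-padded to the current length)
    avg[:t + 1] = (t * avg[:t + 1] + coefs[:t + 1]) / (t + 1)
    s = s_next

print(f"averaged-iterate value estimate at s=0: {value(0.0, avg, n_steps):.3f}")
```

The averaged expansion is what the paper's convergence rates are stated for; each TD iterate adds one kernel function centered at the visited state, so the estimate lives in the RKHS without any parametric model.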
E. Berthier, J. Carpentier, A. Rudi, F. Bach. Infinite-Dimensional Sums-of-Squares for Optimal Control. Conference on Decision and Control (CDC), 2022. [hal, poster, slides] [Show Abstract]
Abstract: We introduce an approximation method to solve an optimal control problem via the Lagrange dual of its weak formulation. It is based on a sum-of-squares representation of the Hamiltonian, and extends a previous method from polynomial optimization to the generic case of smooth problems. Such a representation is infinite-dimensional and relies on a particular space of functions (a reproducing kernel Hilbert space) chosen to fit the structure of the control problem. After subsampling, it leads to a practical method that amounts to solving a semi-definite program. We illustrate our approach by a numerical application on a simple low-dimensional control problem.