Resources

Publications and Preprints

  • D. Brellmann, E. Berthier, D. Filliat, G. Frehse. On double-descent in reinforcement learning with LSTD and random features. International Conference on Learning Representations (ICLR), 2024.
    [arxiv] [Show Abstract]

    Abstract: Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning, the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and l2-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime when it is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Square Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and states go to infinity, maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. Correction terms vanish when the l2-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
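
    As a toy illustration of the setting studied in this paper (not the authors' code), the sketch below runs regularized LSTD with random cosine features on a synthetic Markov reward process; all names (`phi`, `reg`, the toy dynamics) are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process: n visited states in 1-D, discount gamma.
n_states, dim, n_feats, gamma, reg = 50, 1, 200, 0.9, 1e-2
S  = rng.uniform(-1, 1, (n_states, dim))                      # visited states
S2 = np.clip(S + 0.1 * rng.standard_normal(S.shape), -1, 1)   # next states
R  = np.cos(np.pi * S).sum(axis=1)                            # rewards

# Random features (random, frozen first layer, lazy-training style)
W = rng.standard_normal((dim, n_feats))
b = rng.uniform(0, 2 * np.pi, n_feats)
phi = lambda X: np.cos(X @ W + b) / np.sqrt(n_feats)
Phi, Phi2 = phi(S), phi(S2)

# Regularized LSTD: solve (Phi^T (Phi - gamma*Phi2) + reg*I) theta = Phi^T R
A = Phi.T @ (Phi - gamma * Phi2) + reg * np.eye(n_feats)
theta = np.linalg.solve(A, Phi.T @ R)

# Empirical Mean-Squared Bellman Error on the visited states
msbe = np.mean((Phi @ theta - (R + gamma * Phi2 @ theta)) ** 2)
```

    Sweeping `n_feats` across the parameter/state ratio of one (while varying `reg`) is how one would observe the double-descent behavior the paper analyzes.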

  • R. Kazmierczak, E. Berthier, G. Frehse, G. Franchi. CLIP-QDA: An explainable concept bottleneck model. arXiv preprint arXiv:2312.00110, 2023.
    [arxiv] [Show Abstract]

    Abstract: In this paper, we introduce an explainable algorithm designed from a multi-modal foundation model, that performs fast and explainable image classification. Drawing inspiration from CLIP-based Concept Bottleneck Models (CBMs), our method creates a latent space where each neuron is linked to a specific word. Observing that this latent space can be modeled with simple distributions, we use a Mixture of Gaussians (MoG) formalism to enhance the interpretability of this latent space. Then, we introduce CLIP-QDA, a classifier that only uses statistical values to infer labels from the concepts. In addition, this formalism allows for both local and global explanations. Because these explanations come from the inner design of our architecture, our work is part of a new family of greybox models, combining the performance of opaque foundation models with the interpretability of transparent models. Our empirical findings show that, in instances where the MoG assumption holds, CLIP-QDA achieves accuracy similar to state-of-the-art CBMs. Our explanations compete with existing XAI methods while being faster to compute.
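
    To make the QDA step concrete, here is a minimal sketch (not the paper's implementation) in which a toy stand-in for the CLIP concept space is modeled with one Gaussian per class, and labels are inferred from per-class log-posteriors; the data and all names are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in for a concept space: each class follows its own Gaussian
# over per-concept activations (the MoG assumption).
n_concepts, n_per_class = 5, 300
means = {0: np.zeros(n_concepts), 1: np.full(n_concepts, 1.5)}
X = np.vstack([rng.multivariate_normal(means[c], np.eye(n_concepts), n_per_class)
               for c in (0, 1)])
y = np.repeat([0, 1], n_per_class)

# QDA: fit one Gaussian (mean, covariance, prior) per class...
stats = {}
for c in (0, 1):
    Xc = X[y == c]
    stats[c] = (Xc.mean(axis=0), np.cov(Xc.T), len(Xc) / len(X))

def log_posterior(x, c):
    mu, cov, prior = stats[c]
    diff = x - mu
    return (np.log(prior) - 0.5 * np.log(np.linalg.det(cov))
            - 0.5 * diff @ np.linalg.solve(cov, diff))

# ...and classify by the larger log-posterior
predict = lambda x: max((0, 1), key=lambda c: log_posterior(x, c))
accuracy = np.mean([predict(x) == c for x, c in zip(X, y)])
```

    The classifier only stores per-class statistics, which is what makes both the inference and the explanations cheap to compute.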

  • T. Berthier, E. Berthier. Mesurer la (haute) intensité d’un combat [In French], 2023, RDN 860.
    [journal] [Show Abstract]

    Abstract: Defining the high intensity of a combat is no easy task. There is the combatant's subjective experience, but also factual data to take into account. These data can be combined with variability factors, making it possible to quantify this notion of intensity. Such a mathematical approach has become necessary in light of current developments.

  • E. Berthier. Efficient algorithms for control and reinforcement learning [PhD Thesis], 2022, ENS Paris, PSL Research University, France.
    [manuscript] [Show Abstract]

    Abstract: Reinforcement learning describes how an agent can learn to act in an unknown environment in order to maximize its reward in the long run. It has its origins in the field of optimal control, as well as in some works in psychology. The increase in computational power and the use of approximation methods such as neural networks have led to recent successes, in particular in solving games, yet without systematically providing theoretical guarantees. As for the field of optimal control, for which a model of the environment is provided, it has seen solid theoretical developments since the 1960s, with numerical tools that have proven useful in many industrial applications. Nevertheless, the numerical resolution of high-dimensional nonlinear control problems, which are typically encountered in robotics, remains relatively open today. In this thesis, we develop and analyze efficient algorithms, when possible with theoretical guarantees, for control and reinforcement learning. We show that, even though they are formulated differently, these two problems are very similar. We first focus on the discretization of continuous-state deterministic Markov decision processes, by adapting a method developed for continuous-time control. Then we propose a method for fast estimation of stability regions applicable to imperfectly known high-dimensional dynamical systems. We then generalize an algorithm for solving control problems derived from polynomial optimization to non-polynomial systems known through a finite number of observations. For this, we use a sum-of-squares representation of smooth positive functions from kernel methods. Finally, we analyze a classical algorithm in reinforcement learning, the temporal-difference learning algorithm, in its non-parametric version. In particular, we emphasize the link between the temporal-difference learning algorithm and the stochastic gradient descent algorithm, for which many convergence results are known.

  • E. Berthier, Z. Kobeissi, F. Bach. A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning. Advances in Neural Information Processing Systems (NeurIPS), 2022.
    [hal, poster, slides] [Show Abstract]

    Abstract: Temporal-difference learning is a popular algorithm for policy evaluation. In this paper, we study the convergence of the regularized non-parametric TD(0) algorithm, in both the independent and Markovian observation settings. In particular, when TD is performed in a universal reproducing kernel Hilbert space (RKHS), we prove convergence of the averaged iterates to the optimal value function, even when it does not belong to the RKHS. We provide explicit convergence rates that depend on a source condition relating the regularity of the optimal value function to the RKHS. We illustrate this convergence numerically on a simple continuous-state Markov reward process.
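
    The following sketch illustrates the flavor of non-parametric TD(0) in an RKHS on a toy continuous-state Markov reward process; it is an illustrative simplification (the kernel, step size, and toy dynamics are all invented), not the analyzed algorithm verbatim:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, step, reg = 0.9, 0.5, 1e-3

# Gaussian kernel (a universal kernel on a compact state space)
k = lambda x, y: np.exp(-0.5 * (x - y) ** 2)

# Simple continuous-state Markov reward process on [0, 1]
def transition(s):
    return min(1.0, max(0.0, 0.9 * s + 0.1 * rng.standard_normal()))

centers, coefs = [], []   # V(s) = sum_i coefs[i] * k(centers[i], s)
V = lambda s: sum(a * k(c, s) for a, c in zip(coefs, centers))

s = 0.5
for t in range(500):
    s_next = transition(s)
    r = s                                       # toy reward: the current state
    delta = r + gamma * V(s_next) - V(s)        # TD error
    # Regularized TD(0) step in the RKHS: shrink, then add delta * k(s, .)
    coefs = [(1 - step * reg) * a for a in coefs]
    centers.append(s)
    coefs.append(step * delta)
    s = s_next
```

    Each iteration adds one kernel function centered at the visited state, which is the non-parametric analogue of a stochastic-gradient step; the paper's guarantees concern the averaged iterates of such a scheme.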

  • E. Berthier, J. Carpentier, A. Rudi, F. Bach. Infinite-Dimensional Sums-of-Squares for Optimal Control. Conference on Decision and Control (CDC), 2022.
    [hal, poster, slides] [Show Abstract]

    Abstract: We introduce an approximation method to solve an optimal control problem via the Lagrange dual of its weak formulation. It is based on a sum-of-squares representation of the Hamiltonian, and extends a previous method from polynomial optimization to the generic case of smooth problems. Such a representation is infinite-dimensional and relies on a particular space of functions-a reproducing kernel Hilbert space-chosen to fit the structure of the control problem. After subsampling, it leads to a practical method that amounts to solving a semi-definite program. We illustrate our approach by a numerical application on a simple low-dimensional control problem.

  • E. Berthier, J. Carpentier, F. Bach. Fast and Robust Stability Region Estimation for Nonlinear Dynamical Systems. European Control Conference (ECC), 2021.
    [hal, slides] [Show Abstract]

    Abstract: A linear quadratic regulator can stabilize a nonlinear dynamical system with a local feedback controller around a linearization point, while minimizing a given performance criterion. An important practical problem is to estimate the region of attraction of such a controller, that is, the region around this point where the controller is certified to be valid. This is especially important in the context of highly nonlinear dynamical systems. In this paper, we propose two stability certificates that are fast to compute and robust when the first or second derivatives of the system dynamics are bounded. Associated with an efficient oracle to compute these bounds, this provides a simple stability region estimation algorithm compared to classical state-of-the-art approaches. We experimentally validate that it can be applied to both polynomial and non-polynomial systems of various dimensions, including standard robotic systems, for estimating regions of attraction around equilibrium points, as well as for trajectory tracking.
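
    For context, here is the baseline setup the paper improves upon, sketched on a toy pendulum: design an LQR controller at an equilibrium, then estimate a region of attraction from the quadratic Lyapunov function. The sampled check below is a naive stand-in, not the paper's derivative-bound certificates, and all constants are invented:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Damped pendulum around the upright equilibrium: x = (angle, velocity)
g, l, damp = 9.81, 1.0, 0.1
def f(x, u):  # nonlinear dynamics
    return np.array([x[1], (g / l) * np.sin(x[0]) - damp * x[1] + u])

# Linearization at x = 0 and LQR design
A = np.array([[0.0, 1.0], [g / l, -damp]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)          # feedback u = -K x

# Crude sampled estimate of a region of attraction: the largest level set
# {x : x^T P x <= rho} on which the Lyapunov derivative stays negative.
rng = np.random.default_rng(2)
samples = rng.uniform(-1.5, 1.5, (2000, 2))
V  = np.einsum('ni,ij,nj->n', samples, P, samples)
dV = np.array([2 * x @ P @ f(x, float(-K @ x)) for x in samples])
violating = V[dV >= 0]
rho = violating.min() if violating.size else V.max()
```

    The paper's certificates replace this sampling step with bounds on the first or second derivatives of the dynamics, which makes the estimate both fast and robust.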

  • E. Berthier, F. Bach. Max-Plus Linear Approximations for Deterministic Continuous-State Markov Decision Processes. IEEE Control Systems Letters, 4(3):767-772, 2020.
    [hal, journal, slides] [Show Abstract]

    Abstract: We consider deterministic continuous-state Markov decision processes (MDPs). We apply a max-plus linear method to approximate the value function with a specific dictionary of functions that leads to an adequate state-discretization of the MDP. This is more efficient than a direct discretization of the state space, typically intractable in high dimension. We propose a simple strategy to adapt the discretization to a problem instance, thus mitigating the curse of dimensionality. We provide numerical examples showing that the method works well on simple MDPs.
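
    A minimal sketch of the max-plus idea on a 1-D toy MDP (illustrative only: the basis, projection, and toy dynamics are made up, and the paper's adaptive discretization strategy is replaced by fixed centers with plain interpolation):

```python
import numpy as np

# Deterministic continuous-state MDP on [0, 1]: x' = f(x, u), reward r(x, u)
gamma = 0.9
actions = np.linspace(-0.2, 0.2, 5)
f = lambda x, u: np.clip(x + u, 0.0, 1.0)
r = lambda x, u: -(x - 0.7) ** 2 - 0.1 * u ** 2

# Max-plus basis: concave bumps phi_i(x) = -c * (x - x_i)^2 at fixed centers
centers = np.linspace(0, 1, 11)
c = 50.0
phi = lambda x: -c * (x - centers) ** 2          # vector of basis values at x

# Max-plus linear approximation: V(x) ~ max_i (a[i] + phi_i(x))
a = np.zeros_like(centers)
V = lambda x: np.max(a + phi(x))

# Approximate value iteration: back up the Bellman operator at the centers
# and project back onto the max-plus span by interpolation
for _ in range(100):
    backup = np.array([max(r(x, u) + gamma * V(f(x, u)) for u in actions)
                       for x in centers])
    a = backup
```

    Only the coefficients `a` are updated, so the cost scales with the number of basis functions rather than with a dense grid over the state space, which is the source of the efficiency gain discussed in the abstract.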

  • E. Berthier. Protection des données d'entraînement pour l'apprentissage statistique [In French], 2019, Conférence Intelligence Artificielle et Défense.
    [pdf] [Show Abstract]

    Abstract: Statistical learning models are liable to expose the data that were used during their training. This phenomenon must be taken into account when assessing the sensitivity level of a model. The notion of differential privacy, originally created to protect personal privacy, partially addresses this problem. In particular, the training process can be adapted so as to satisfy certain confidentiality properties. When the sensitive data are distributed across several machines, cryptographic protocols make it possible to jointly train a model without sharing the training data.

  • E. Berthier. Differential Privacy for Machine Learning [Master's Thesis], 2019, EPFL, Lausanne, Switzerland.
    [pdf, poster] [Show Abstract]

    Abstract: Machine learning algorithms can leak private information contained in particular training data. Differential privacy ensures that an algorithm does not rely too strongly on any individual data point. Differentially private machine learning can be achieved by injecting noise in the training process. In particular, the privacy of DP-SGD has already been well-studied, yet only in the case where each training example is sampled with replacement. We focus on the more practical case of sampling without replacement, or shuffling, and try to provide privacy guarantees for this algorithm. We also explore possible relaxations of differential privacy.
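
    As a concrete illustration of the mechanism under study (a simplified sketch, not the thesis code: the toy regression task, clipping norm, and noise scale are all invented), here is DP-SGD with per-example gradient clipping and Gaussian noise, sampling without replacement via shuffling:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: 1-D linear regression, per-example loss (w*x - y)^2 / 2
X = rng.standard_normal(200)
Y = 2.0 * X + 0.1 * rng.standard_normal(200)

clip, sigma, step, batch = 1.0, 1.0, 0.1, 20
w = 0.0
for epoch in range(10):
    # Sampling without replacement: one pass over a fresh permutation
    perm = rng.permutation(len(X))
    for i in range(0, len(X), batch):
        idx = perm[i:i + batch]
        grads = (w * X[idx] - Y[idx]) * X[idx]                 # per-example gradients
        grads = grads / np.maximum(1.0, np.abs(grads) / clip)  # clip to norm <= clip
        noise = sigma * clip * rng.standard_normal()           # Gaussian mechanism
        w -= step * (grads.sum() + noise) / batch
```

    Existing analyses cover the variant where each minibatch is drawn with replacement; the thesis asks what privacy guarantees carry over to the shuffled variant shown here, which is the one used in practice.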

  • O. Kempf, E. Berthier. IA, explicabilité et défense [In French], 2019, RDN 820 - L'Intelligence artificielle et ses enjeux pour la Défense.
    [journal, synopsis] [Show Abstract]

    Abstract: AI is in itself old news, but its fields of application never cease to expand and capture new domains, particularly in defence. AI is polymorphic and faces a problem of explainability. Why? and How? are the questions to be asked about those applications with a military connection.

Presentations & Outreach

  • On March 23rd, 2023, I was invited to present our CDC 2022 paper at Avignon Université.

  • I defended my thesis on Thursday, October 27th, 2022, at Inria Paris. You can download the manuscript. You can also have a look at the slides.

  • On June 17th, 2022, I gave an invited talk at CANUM 2020 (45th Congrès National d'Analyse Numérique) in Évian, in a mini-symposium dedicated to numerical methods for Hamilton-Jacobi equations and mean-field games.

  • On June 3rd, 2022, I gave a talk at the SMAI-MODE days in Limoges. SMAI-MODE is the subgroup of the French Applied and Industrial Mathematics Society dedicated to optimization and decision.

  • On October 28th, 2021, I gave a talk at the CJC-MA seminar at École polytechnique, Palaiseau. CJC-MA stands for "Congrès des Jeunes Chercheuses et Chercheurs en Mathématiques Appliquées". This was its first edition; it will hopefully be organized each year in a different location by PhD students in applied mathematics.

  • In October 2020, Clémentine Fourrier and I co-organized the RJMI (Rendez-vous des Jeunes Mathématiciennes et Informaticiennes) at Inria Paris, with the support of Animath. This year's challenge was to set up a hybrid in-person and online event, which received great feedback from the participants!

  • From February 2020 to December 2021, Denis Merigoux and I co-organized Inria's Junior Seminar, a monthly seminar where PhD students, interns, and post-docs present their work through easily understandable talks, so that anyone can attend. Depending on the circumstances, the seminar was held online, in person, or in hybrid form.

  • In October 2019, I had the opportunity to co-organize the RJMI (Rendez-vous des Jeunes Mathématiciennes et Informaticiennes) at Inria Paris. Over two days, a small group of female high-school students meet researchers, attend research talks, and work on challenging math and computer science problems. This event is meant to promote scientific careers for women and counter self-censorship. Stay tuned for next year's edition!

  • Poster Presentation at Prairie Artificial Intelligence Summer School P.A.I.S.S., Paris, October 2019.

  • Junior Organizing Committee & Poster Presentation at Paris-Saclay Junior Conference on Data Science and Engineering #JSDE2019, Saclay, September 2019.

Miscellaneous