Registered Data

[00400] Bilevel optimization in machine learning and imaging sciences

  • Session Time & Room :
    • 00400 (1/2) : 2C (Aug.22, 13:20-15:00) @A618
    • 00400 (2/2) : 2D (Aug.22, 15:30-17:10) @A618
  • Type : Proposal of Minisymposium
  • Abstract : In the framework of functional minimisation approaches, the task of customising the functional expression of both the prior and the likelihood terms to the data at hand by means of a further optimisation problem has recently been popularised under the name of bilevel optimisation. In this minisymposium, we gather experts working in this field, from both theoretical and algorithmic perspectives, with the aim of providing an overview of how bilevel learning can be effectively employed to estimate data-adaptive regularisation and data models for both imaging and machine learning applications.
  • Organizer(s) : Luca Calatroni, Samuel Vaiter
  • Classification : 46N10, 65K10, 90C26
  • Minisymposium Program :
    • 00400 (1/2) : 2C @A618 [Chair: Luca Calatroni/Samuel Vaiter]
      • [03653] Fixed-Point Automatic Differentiation of Forward-Backward Splitting Algorithms for Partly Smooth Functions
        • Format : Talk at Waseda University
        • Author(s) :
          • Sheheryar Mehmood (University of Tuebingen)
          • Peter Ochs (Saarland University)
        • Abstract : A large class of non-smooth practical optimization problems can be written as the minimization of a sum of smooth and partly smooth functions. We consider such structured problems that also depend on a parameter vector and study the problem of differentiating their solution mapping with respect to this parameter, which has far-reaching applications in sensitivity analysis and parameter-learning optimization problems. We show that under partial smoothness and other mild assumptions, Automatic Differentiation (AD) of the sequence generated by proximal splitting algorithms converges to the derivative of the solution mapping. For a variant of automatic differentiation, which we call Fixed-Point Automatic Differentiation (FPAD), we remedy the memory-overhead problem of reverse-mode AD and moreover obtain faster convergence theoretically. We numerically illustrate the convergence and convergence rates of AD and FPAD on Lasso and Group Lasso problems, and demonstrate FPAD on a prototypical image denoising problem by learning the regularization term.
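        • Illustrative sketch : As a concrete (editorial) companion to this abstract, the Python/JAX fragment below differentiates unrolled forward-backward (ISTA) iterations for the Lasso with respect to the regularisation parameter via reverse-mode automatic differentiation; the quadratic data term, problem sizes and variable names are illustrative assumptions, and the memory-efficient FPAD variant discussed in the talk is not implemented here.

          import jax
          import jax.numpy as jnp

          def soft_threshold(x, tau):
              # proximal operator of tau * ||.||_1
              return jnp.sign(x) * jnp.maximum(jnp.abs(x) - tau, 0.0)

          def ista_solution(lam, A, b, n_iter=300):
              # forward-backward splitting for min_x 0.5*||A x - b||^2 + lam*||x||_1
              step = 1.0 / jnp.linalg.norm(A, ord=2) ** 2
              x = jnp.zeros(A.shape[1])
              for _ in range(n_iter):
                  x = soft_threshold(x - step * (A.T @ (A @ x - b)), step * lam)
              return x

          A = jax.random.normal(jax.random.PRNGKey(0), (20, 10))
          b = jax.random.normal(jax.random.PRNGKey(1), (20,))
          loss = lambda lam: jnp.sum(ista_solution(lam, A, b) ** 2)
          print(jax.grad(loss)(0.1))  # reverse-mode AD through the unrolled iterations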
      • [05289] A framework for bilevel optimization that enables stochastic and global variance reduction algorithms
        • Format : Talk at Waseda University
        • Author(s) :
          • Thomas Moreau (Inria - MIND)
        • Abstract : Bilevel optimization, the problem of minimizing a value function that involves the arg-minimum of another function, appears in many areas of machine learning. In a large-scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to make progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem, we introduce a novel framework in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as sums, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. This allows us to design near-optimal algorithms for solving the bilevel problem.
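        • Illustrative sketch : The Python/JAX fragment below is a full-batch, deterministic caricature (not the speaker's algorithm or code) of the framework's central idea: the inner variable z, the linear-system variable v and the outer variable x all evolve simultaneously along simple gradient-like directions, each of which would become a sum over samples in the empirical-risk setting; the toy objectives f and g, step sizes and variable names are illustrative assumptions.

          import jax
          import jax.numpy as jnp

          def g(x, z):  # inner (lower-level) objective, toy strongly convex example
              return 0.5 * jnp.sum((z - x) ** 2) + 0.05 * jnp.sum(z ** 2)

          def f(x, z):  # outer objective defining the value function
              return 0.5 * jnp.sum(z ** 2) + 0.1 * jnp.sum(x ** 2)

          grad_g_z = jax.grad(g, argnums=1)
          grad_f_z = jax.grad(f, argnums=1)
          grad_f_x = jax.grad(f, argnums=0)

          def hvp_zz(x, z, v):  # Hessian-vector product (d^2 g / dz^2) v
              return jax.jvp(lambda u: grad_g_z(x, u), (z,), (v,))[1]

          def cross_vp(x, z, v):  # cross-derivative product (d^2 g / dx dz) v
              return jax.grad(lambda u: jnp.vdot(grad_g_z(u, z), v))(x)

          x, z, v = jnp.ones(3), jnp.zeros(3), jnp.zeros(3)
          rho, gamma = 0.5, 0.1
          for _ in range(200):
              d_z = grad_g_z(x, z)                      # tracks the inner solution
              d_v = hvp_zz(x, z, v) - grad_f_z(x, z)    # tracks the linear-system solution
              d_x = grad_f_x(x, z) - cross_vp(x, z, v)  # approximate hypergradient
              z, v, x = z - rho * d_z, v - rho * d_v, x - gamma * d_x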
      • [03309] Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start
        • Format : Talk at Waseda University
        • Author(s) :
          • Saverio Salzo (Sapienza Università di Roma)
          • Riccardo Grazzi (Istituto Italiano di Tecnologia)
          • Massimiliano Pontil (Istituto Italiano di Tecnologia)
        • Abstract : We present a stochastic algorithm for a general class of bilevel problems consisting of a minimization problem at the upper level and a fixed-point equation at the lower level. This setting includes instances of meta-learning, equilibrium models, and hyperparameter optimization. The main feature of our solution is to avoid the warm-start procedure at the lower level, which is not always well-suited in applications, while still achieving order-wise optimal or near-optimal sample complexity.
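        • Illustrative formulation : In the notation of this editorial sketch (the symbols are illustrative, not taken from the talk), the problem class can be written as an upper-level minimisation constrained by a lower-level fixed-point equation,

          \min_{\lambda}\; f\bigl(\lambda, w(\lambda)\bigr)
          \quad \text{subject to} \quad
          w(\lambda) = \Phi\bigl(\lambda, w(\lambda)\bigr),

          where \Phi(\lambda,\cdot) is a contraction. Warm-starting would initialise the fixed-point iterations for a new \lambda at the approximate solution computed for the previous one; the algorithm presented in the talk avoids this step.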
      • [02768] Bilevel subspace optimisation in heterogeneous clustering for cryo-EM
        • Format : Talk at Waseda University
        • Author(s) :
          • Willem Diepeveen (University of Cambridge)
          • Carlos Esteve-Yagüe (University of Cambridge)
          • Jan Lellmann (University of Lübeck)
          • Ozan Öktem (KTH Royal Institute of Technology)
          • Carola-Bibiane Schönlieb (University of Cambridge)
        • Abstract : In heterogeneous cryo-EM we are concerned with retrieving protein conformations from noisy 2D projection images. Attempting to solve this directly is challenging in the absence of a good prior. In recent work, it has been observed that MD simulations lie on a low-dimensional manifold of conformation space. Although this subspace might not be a perfect reflection of reality, it potentially yields a good prior. In this work we use this information in cryo-EM conformation retrieval: we aim to retrieve both the conformations and the underlying manifold from cryo-EM data, while requiring the manifold to match the MD data. We propose a bilevel optimisation approach to this problem.
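        • Illustrative formulation : One schematic way to read this abstract (an editorial sketch with illustrative symbols, not the authors' exact formulation) is as a bilevel problem in which the lower level reconstructs conformations constrained to a candidate manifold from the cryo-EM data, while the upper level asks that manifold to stay close to the MD data,

          \min_{\mathcal{M}} \; d\bigl(\mathcal{M}, \mathcal{M}_{\mathrm{MD}}\bigr)
          \quad \text{subject to} \quad
          c_i \in \operatorname*{arg\,min}_{c \in \mathcal{M}} \, \lVert P(c) - y_i \rVert^2, \qquad i = 1, \dots, N,

          where the y_i are the projection images, P denotes the forward (projection and imaging) operator, and d measures the discrepancy between manifolds.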
    • 00400 (2/2) : 2D @A618 [Chair: Luca Calatroni/Samuel Vaiter]
      • [05645] Test like you train in implicit deep learning
        • Author(s) :
          • Pierre Ablin (Apple)
        • Abstract : Implicit deep learning relies on expressing some components of deep learning pipelines implicitly via a root equation. Training such a model is thus a bilevel optimization problem. In practice, the root equation is solved using a fixed number of iterations of a solver. We discuss the effect of using a different number of iterations at test time than at train time, challenging the popular assumption that more iterations at test time improve performance.
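        • Illustrative sketch : The Python/JAX toy below (an editorial sketch, not the speaker's code) shows the object under discussion: a layer whose output is defined by a root equation and approximated in practice by a fixed number of fixed-point iterations, so that using a different iteration budget at test time than at train time changes the function being evaluated; the tanh parameterisation, shapes and names are illustrative assumptions.

          import jax
          import jax.numpy as jnp

          def implicit_layer(params, x, n_iter):
              # Output defined implicitly by the root equation z = tanh(W z + U x),
              # approximated here by a fixed number of fixed-point iterations.
              W, U = params
              z = jnp.zeros(W.shape[0])
              for _ in range(n_iter):
                  z = jnp.tanh(W @ z + U @ x)
              return z

          k1, k2 = jax.random.split(jax.random.PRNGKey(0))
          params = (0.1 * jax.random.normal(k1, (4, 4)), jax.random.normal(k2, (4, 3)))
          x = jnp.ones(3)
          z_train = implicit_layer(params, x, n_iter=5)   # solver budget used at train time
          z_test = implicit_layer(params, x, n_iter=50)   # larger budget at test time
          print(jnp.linalg.norm(z_test - z_train))        # the train/test mismatch under discussion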