2025-02-18 equivariant lecture 3

[[lecture-data]]

2025-02-18

Readings


Summary

Today


Invariant functions on sets (Zaheer et al., 2017). Goals:

  1. Define a class of functions that can deal with sets/multisets as inputs
  2. Translate this class of functions into an efficient neural network
  3. Show that it works well (state-of-the-art) for several real-world applications

Sets/multisets: identify $(x_1, \dots, x_n)$ with $(x_{\pi(1)}, \dots, x_{\pi(n)})$ for all permutations $\pi \in S_n$ +++ see notes. We can think of these as $\mathbb{R}^n / S_n$, or unordered tuples from the reals.

Example:
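A tiny illustration (mine, not from the lecture): a function that aggregates entries symmetrically, like the mean, returns the same value on every ordering of the same multiset.

```python
# Minimal check: the mean is identical for every ordering of the same multiset.
from itertools import permutations

x = (3.0, 1.0, 1.0, 7.0)  # a multiset, written as a tuple
means = {sum(p) / len(p) for p in permutations(x)}
assert len(means) == 1  # one value across all 4! orderings
```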

Motivation

  • sometimes the order is not important for the “downstream task”
  • Applications in the paper
    • learning to estimate pop stats (mean, variance)
    • learning to sum digits from images
    • point cloud classification
    • outlier detection
    • predicting a set of tags (for text)

prior implementations of invariant functions on these sets

Invariant polynomials

  • an invariant (symmetric) polynomial is one that can be written as a combination of the elementary symmetric polynomials $e_1, \dots, e_n$

Equivalently, the power sums $p_k(x) = \sum_i x_i^k$ for $k = 1, \dots, n$ are also a basis/generating set for the symmetric polynomials +++++ see notes

This requires polynomials of degree up to $n$.
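A minimal sketch of this representation (my code, using the power-sum convention above): compute the $n$ power sums as invariant features and check permutation invariance.

```python
# Power sums p_k(x) = sum_i x_i^k for k = 1..n: a generating set for the
# symmetric polynomials. Note we need n features (degrees up to n) per input.
import numpy as np

def power_sums(x: np.ndarray) -> np.ndarray:
    n = len(x)
    return np.array([np.sum(x ** k) for k in range(1, n + 1)])

x = np.array([2.0, -1.0, 3.0])
x_perm = np.random.default_rng(0).permutation(x)
assert np.allclose(power_sums(x), power_sums(x_perm))  # permutation invariant
```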

Another way to describe them

Canonicalization

Given $x \in \mathbb{R}^n$, find $\pi \in S_n$ that sorts the entries, i.e. $x_{\pi(1)} \le \dots \le x_{\pi(n)}$. A function is invariant if and only if it is a function of the sorted entries.

i.e., choose a “representative” or “template” of the orbit of my tuple
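A short sketch of canonicalization (my code, assuming scalar entries): sort, then apply any function to the sorted representative.

```python
# Canonicalization by sorting: any invariant f factors through the sorted
# representative of the orbit, f(x) = g(sort(x)).
import numpy as np

def canonicalize(x: np.ndarray) -> np.ndarray:
    return np.sort(x)  # the chosen orbit representative

def invariant_f(x: np.ndarray) -> float:
    s = canonicalize(x)
    # any function g of the sorted entries, e.g. a weighted sum, is invariant
    return float(np.dot(s, np.arange(len(s))))

assert invariant_f(np.array([3.0, 1.0, 2.0])) == invariant_f(np.array([2.0, 3.0, 1.0]))
```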

Deep Sets Paper

Input $X \subseteq \mathfrak{X}$ where $\mathfrak{X}$ is countable. Consider the space of functions on such sets: $f$ is permutation invariant if and only if it can be decomposed as $f(X) = \rho\big(\sum_{x \in X} \phi(x)\big)$ for suitable $\rho$ and $\phi$.

  • the sum does not need to be a sum; it just needs to be a permutation-invariant aggregation of $\phi$ applied to each element in the set

(can think of this as a layer in a neural network)
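A minimal sketch of such a layer in PyTorch; the MLP sizes and tensor shapes are my assumptions, not the paper's exact architecture.

```python
# Deep Sets form: f(X) = rho( sum_i phi(x_i) ), with phi and rho small MLPs.
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int, out_dim: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU(),
                                 nn.Linear(latent_dim, latent_dim))
        self.rho = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU(),
                                 nn.Linear(latent_dim, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, in_dim); summing over the set axis gives invariance
        return self.rho(self.phi(x).sum(dim=1))

model = DeepSets(in_dim=1, latent_dim=32, out_dim=1)
x = torch.randn(4, 10, 1)
perm = torch.randperm(10)
assert torch.allclose(model(x), model(x[:, perm, :]), atol=1e-5)
```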

Wait whoa

  • can write an invariant function as a sum of just one feature $\phi(x) \in \mathbb{R}$ per element

With the previous polynomial representation, we needed $n$ features per element.

  • will see in the proof that perhaps we can get simpler representations by using a higher-dimensional latent space ($\phi: \mathfrak{X} \to \mathbb{R}^d$ with $d > 1$)

Note from the 2nd lecture:

  • if $f$ is invariant and the group $G$ is finite, then $f(x) = \frac{1}{|G|} \sum_{g \in G} f(g \cdot x)$ (group averaging)

Here $G = S_n$, the permutations of $n$ objects. So $|G| = n!$, which is a huge set.

But Deep Sets tells us that we only need to sum over the $n$ objects instead of the $n!$ permutations.
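To make the contrast concrete (an illustration of mine, not the lecture's): group averaging sums over all $n!$ orderings, while the Deep Sets form touches each element once.

```python
from itertools import permutations

def group_average(f, x):
    perms = list(permutations(x))            # |S_n| = n! terms
    return sum(f(p) for p in perms) / len(perms)

def deep_sets(phi, rho, x):
    return rho(sum(phi(xi) for xi in x))     # only n terms

x = (1.0, 2.0, 3.0, 4.0)
spread = lambda p: max(p) - min(p)           # already invariant, so averaging changes nothing
assert group_average(spread, x) == spread(x)

# the mean, written in Deep Sets form: phi = identity, rho = divide by n
assert deep_sets(lambda t: t, lambda s: s / len(x), x) == sum(x) / len(x)
```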

Proof (sketch). Suppose we have a function $f$ that is permutation invariant: $f(x_{\pi(1)}, \dots, x_{\pi(n)}) = f(x_1, \dots, x_n)$ for every permutation $\pi$. (Not the cleanest version, but this was the first paper showing this.)

  1. use $\phi$ to encode each possible input into a real number
    • injective over the whole set of sets
  2. learn $\rho$ that executes that function on the encoded set

Since $\mathfrak{X}$ is countable, there exists a bijection $c: \mathfrak{X} \to \mathbb{N}$. We can encode each subset of $\mathfrak{X}$ (i.e., each element of $2^{\mathfrak{X}}$).

Let $\phi(x) = 2^{-c(x)}$. We can encode $X$ as a binary sequence such that $\Phi(X) = \sum_{x \in X} \phi(x)$ has a 1 in binary position $c(x)$ exactly when $x \in X$.

  • this gives us a real number per orbit, and then learning an invariant function is the same as learning a function of each orbit representative
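A sketch of the encoding step, under the simplifying assumption that $\mathfrak{X}$ is a set of small non-negative integers, so $c(x) = x + 1$ is already an injection into the naturals (shifted by one just to keep all weights below 1):

```python
# Phi(X) = sum_{x in X} 2**-(x+1) writes X as a binary expansion, so distinct
# finite sets get distinct real numbers, and the value ignores ordering.
def encode(X: set[int]) -> float:
    return sum(2.0 ** -(x + 1) for x in X)

assert encode({0, 2}) != encode({1, 2})  # injective on these sets
assert encode({0, 2}) == encode({2, 0})  # order irrelevant: invariant by construction
```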

There are also better theoretical proofs of the same thing (Tabaghi and Wang, 2024):

  • sets of $n$ elements in $\mathbb{R}^d$

If such a function is invariant (permutation invariant in the elements of the set), then there exist some $\phi$ and $\rho$ such that $f(X) = \rho\big(\sum_i \phi(x_i)\big)$, and $\phi$ and $\rho$ can be chosen to be continuous.

architecture (++++++see notes)

point clouds

invariant functions on point clouds

motivation:

  • emulation of cosmological simulations
  • galaxy property predictions from dark matter using only simulations; a “regression” framework for these types of measurements ++++ see notes

$n$ points in $d$ dimensions

  • invariant to translations
  • rotations
  • reflections

$E(d)$-invariant and permutation invariant

translations

  • center the point cloud (this gives translation invariance the way sorting gave permutation invariance: choosing a representative from our orbit); see the sketch below
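A minimal sketch of the centering step (my code):

```python
# Canonicalize translations by centering (subtract the centroid). The centroid
# is permutation invariant, so centering commutes with reordering the points.
import numpy as np

def center(X: np.ndarray) -> np.ndarray:
    # X: (n, d) point cloud
    return X - X.mean(axis=0, keepdims=True)

X = np.random.default_rng(1).normal(size=(5, 3))
t = np.array([10.0, -2.0, 0.5])           # arbitrary translation
assert np.allclose(center(X), center(X + t))
```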

permutations: want $f(PX) = f(X)$ and $f(XQ) = f(X)$ for all permutation matrices $P \in S_n$ and orthogonal $Q \in O(d)$, where the rows of $X \in \mathbb{R}^{n \times d}$ are the points

  • $XX^\top$ is $O(d)$-invariant: $(XQ)(XQ)^\top = XX^\top$
  • $G = XX^\top$ is the Gram matrix
  • want invariance to permutations acting by conjugation: $G \mapsto PGP^\top$
  • this is the symmetry of graphs that we saw in the first class
    • graph neural networks (we will cover GNNs in the next lectures)

We want functions of $G$ that are invariant to simultaneous permutations of rows and columns.
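A quick numeric check of both facts (my code):

```python
# The Gram matrix G = X X^T kills rotations/reflections (Q in O(d)), while a
# permutation P of the points acts on G by conjugation: G -> P G P^T.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))       # a random orthogonal matrix
assert np.allclose(X @ X.T, (X @ Q) @ (X @ Q).T)   # O(d)-invariant

P = np.eye(5)[rng.permutation(5)]                  # a random permutation matrix
G = X @ X.T
assert np.allclose((P @ X) @ (P @ X).T, P @ G @ P.T)  # conjugation action
```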

Why invariant theory?

  1. after constructing the graph (the Gram matrix, $O(n^2 d)$ complexity), we'll be able to learn invariant functions with complexity polynomial in $n$
  2. ++++ see notes
  3. Galois theory - interesting mathematical properties

Galois Theory (setup): $S_n$ acts on symmetric matrices by conjugation (permuting rows and columns)

  • always sending diagonal entries to diagonal entries and off-diagonal entries to off-diagonal entries
  • $G \mapsto PGP^\top$ (permuting rows and columns at the same time)

The above is a subgroup of $S_n \times S_{\binom{n}{2}}$ (which permutes the $n$ diagonal entries and the $\binom{n}{2}$ off-diagonal entries independently).

++++ see notes: if $f$ is $S_n \times S_{\binom{n}{2}}$-invariant then $f$ is conjugation-invariant; we also want to characterize the conjugation-invariant functions ++++ notes

Galois theory approach

  • idea: if we construct a set of invariants that is fixed only by the desired group (i.e., no bigger group fixes all the invariants), then those invariants generate the field of invariant functions
  • (fundamental theorem, rephrased) Rosenlicht (1956): the set of orbits that the field generators fail to uniquely identify (the bad set) has to satisfy a finite set of equations (so the bad set is measure zero)

  • a field extension “ties up” the permutation of the diagonal with the permutation of the off-diagonal entries
    • it constrains the permutations of diagonal and off-diagonal entries to be consistent with each other
    • this generates the field of invariant functions
  • one can construct a polynomial that satisfies this property

++see notes

That is, the field is generated by the diagonal and off-diagonal invariants together with the tying polynomial; these sets are field generators for the space. And note that the tying polynomial is algebraically independent from the other invariants.

  • the relevant notion considered here is separating orbits under the group action

(Galois theory is not a necessary argument; an easier argument works: construct a set of invariants that is fixed only by the desired group, and then it separates orbits almost everywhere)

Using Deep Sets on these invariants, we can universally approximate invariant functions on symmetric matrices outside an invariant, closed, zero-measure “bad set” where orbits are not separable.

From the Gram matrix to invariant features

  • consider “centers” $c_1, \dots, c_k$ that are $O(d)$-equivariant and $S_n$-invariant
    • e.g., $k$-means centroids ordered by 2-norm

Invariant machine learning model with features $\langle x_i, c_j \rangle$: $f(X) = \rho\big(\sum_{i=1}^{n} \phi(\langle x_i, c_1 \rangle, \dots, \langle x_i, c_k \rangle)\big)$, and this is invariant.
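A hedged sketch of this construction as I read the notes: centers that move with the data under $O(d)$ and don't depend on point order make the inner products $\langle x_i, c_j \rangle$ rotation-invariant, and summing over points gives permutation invariance. `kmeans_centers` below is my stand-in for whatever equivariant procedure the lecture used; the exact construction is in the handwritten notes.

```python
import numpy as np

def kmeans_centers(X: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    # plain Lloyd iterations; centroids are then ordered by 2-norm.
    # NOTE: for exact S_n-invariance the initialization must itself be
    # order-independent; this sketch ignores that subtlety.
    C = X[np.random.default_rng(3).choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return C[np.argsort(np.linalg.norm(C, axis=1))]

def invariant_features(X: np.ndarray, k: int = 2) -> np.ndarray:
    C = kmeans_centers(X, k)
    H = X @ C.T            # (n, k): <x_i, c_j>, O(d)-invariant
    return H.sum(axis=0)   # sum over points: S_n-invariant
```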

Theorem (Huang, Blum-Smith, et al., 2024)

  • the machine learning model above can universally approximate invariant functions on point clouds outside an invariant, closed, zero-measure “bad set” where orbits are not separable