2025-01-22 graphs lecture 1
2025-01-22
0. Intro/Class Overview
Signal processing vibes
- This class will be a bit more supervised learning
- GNNs from the very basics
- HWs = labs: pytorch, pygeometric : tutorial on Canvas
No textbook, based on research
Syllabus
3 homeworks 20% each (see syllabus on Canvas)
- project in pytorch
- covers 4 weeks of material
Final project 30% - design a graph dataset (implementation)
- or most recent papers and present (lit review)
Participation 10% - Scribing (google doc)
1. Basics
see [[Lecture 1.pdf]]
- Why data science and why large scale graphs?
- What are graphs/graph signals?
- How do graphs and graph signals interact with each other?
- Graph diffusion processes
- adjacency matrix and laplacian
- Total variation energy on graphs
- Canonical frequencies and oscillation models
What is graph data science?
- Road networks (graph is the structure of then data: data is "on top" of the graph)
- Network / partially complete network (graph and data are entangled)
- Molecules (graph itself is the datapoint)
Why large scale graphs?
- Graphs can get BIG
nodes
Genome sequencing using graphs.
- lots of tiny pieces of dna sections
- Want to find a path through all of them that makes sense
- edge when there is any overlap
A graph
(see undirected graph)
A graph
corresponds to no edge corresponds to an edge
These correspond to a binary relationship between nodes
- Friendship networks are unweighted
- Citation networks
- etc
A graph is weighted if
- Road networks, weights proportional to the lengths (traveling salesman problem)
- Correlation networks (graph induced by covariance matrix)
- etc
(see unweighted graph)
Graph signals are data that exist on a graph
There are two types of signals:
- fixed node properties or features are information associated with nodes
- ex. Nodes belong to group A or group B
- graph signals (which often implies variability) can be interpreted as variables on the nodes of the graph
- ex. traffic counts on the roads of minnesota
(see graph signals)
How do we process graph signals? How do signals
We can do this through a network diffusion process
- sometimes called message passing, aggregation
A network diffusion process is a way to process graph signals. It is a procedure or description of ways to estimate information about a graph using available information and computations.
Suppose we have a graph (see notes on Canvas and add graphic) with labelled nodes. We want to determine the labels of some nodes with unknown label.
Local (node-level) estimation or attributes with limited communications imposed by
- idea: take info from all its neighbors and average their labels
where is the neighborhood (graph) of .
Suppose
Define
We can estimate
- We assume that all traffic at node
leaves ie
Diffusion processes are local to each node and we represent them locally above.
- Can also represent these globally using matrices
The adjacency matrix for a graph
++++ (see notes)
In our above example,
We can define more general diffusion processes by defining the matrix
(can only equal 0 along the diagonal?)
This matrix is called the graph shift operator (GSO). We way
Most common are adjacency matrix and laplacian
We don't really use 5,6,7 for directed graphs
We can always recover the local implementations:
(see graph shift operator)
- adjacency matrix
- degree matrix
- graph laplacian
- random walk matrix
- random walk laplacian
- normalized adjacency matrix
- normalized graph laplacian
We usually don't use the last 3 with directed graphs.