There are a multitude of practical reasons to perform analysis on graphs, as I’ve covered in other posts. As we’ve seen, neighborhood graphs provide a computationally convenient nonparametric tool for a host of algorithms.
Suppose we’re in a standard data analysis situation, with an \(n \times d\) matrix of data \(X\). PCA and kernel PCA are classic ways to learn some structure in the data, by decomposing the covariance matrix, i.e. performing SVD on \(X\).
The SVD learns the linear structure of \(X\) very well, but adapting to nonlinear structure is a bit more involved. Efficient and powerful algorithms for nonlinear dimensionality reduction, like diffusion maps, are based on a nearest-neighbor graph \(G\) with \(n\) nodes connected by (possibly weighted) edges, represented by an adjacency matrix \(A\). It’s interesting to compare PCA and diffusion maps because they both find the top few eigenvectors, but of different affinity-type matrices – the covariance matrix vs. the normalized adjacency matrix \(A\) of the neighborhood graph.
When these eigenvectors are the same, the statistics of the data are intimately linked to the structure of the graph. This opens up many applications, including:
An ultra-efficient algorithm for PCA on the data, regularized by the graph structure.
A computationally simple way to use the graph to generate data.
These methods are extremely efficient yet versatile, thanks to the magic of linear algebra on graphs.
Stationary signals on graphs
The concept of stationary signals is borrowed from signal processing. For data matrices \(X\) it links to the following central question:
What is the difference between dimensionality reduction with diffusion maps and with PCA?
and helps link the structure of the data to the structure of an affinity graph, allowing us to interpolate and generate new points on the data manifold.
A stationary signal is one whose statistical properties don’t change under shifts in the graph domain. Essentially, the characteristics of the signal don’t change as you traverse the nodes and edges of the graph, making it a useful property for graph signal processing, similar to the way that time invariance properties are used in traditional signal processing.
The generation process implemented here will transform white noise into a signal on the graph, generalizing well-known ideas from the analysis of time-varying signals.1 It’s extremely efficient, requiring only a few sparse matrix multiplications, which makes it well-suited to large sparse graphs and is why I wanted to implement it.
The setup here is pretty common. Let’s say we’re given a dataset (signal) \(X \in \mathbb{R}^{n \times d}\) with \(n\) points in \(d\) dimensions. We assume it’s centered, i.e. each feature has zero mean: \(\sum_{i=1}^{n} X_{ij} = 0\) for every \(j\). Conceptually I’ll explain things with \(d=1\) (i.e. \(X \in \mathbb{R}^{n}\)), as all the methods will seamlessly vectorize for \(d > 1\).
Two definitions of stationary signals
Remember the definition of the graph Laplacian \(L_{G} := \text{diag}(\textbf{1}^\top A) - A\) (or one of its normalized versions).
Also, let’s recall two covariance matrices (a small numpy sketch of all three matrices follows this list):
\(\Sigma^{n}_{X} := \frac{1}{d} X X^{\top}\), the \(n \times n\) sample-wise covariance matrix of \(X\)
\(\Sigma^{d}_{X} := \frac{1}{n} X^{\top} X\), the \(d \times d\) feature-wise covariance matrix of \(X\)
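To fix the notation, here is a tiny numpy sketch on a made-up 4-node graph; the adjacency matrix and data values are arbitrary toy numbers, not anything used later in the post.

```python
import numpy as np

# Toy weighted adjacency matrix of a 4-node graph (symmetric, zero diagonal)
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

# Graph Laplacian: L_G = diag(1^T A) - A
L_G = np.diag(A.sum(axis=0)) - A

# Toy centered data: n = 4 samples (one per node), d = 2 features
X = np.random.standard_normal(size=(4, 2))
X = X - X.mean(axis=0)

Sigma_n = (1.0 / X.shape[1]) * X @ X.T   # n x n sample-wise covariance
Sigma_d = (1.0 / X.shape[0]) * X.T @ X   # d x d feature-wise covariance
```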
Stationary signals I: covariance matches the graph structure
A graph-stationary signal with respect to a graph \(G\) is a signal \(X \in \mathbb{R}^{n}\) on \(G\)’s vertices whose covariance matrix \(\Sigma_{X}^{n}\) has the same eigenvectors as the graph Laplacian \(L_{G}\) (Perraudin and Vandergheynst 2017).
Stationary signals II: constructive definition
A graph-stationary signal with respect to a graph \(G\) is a signal \(X \in \mathbb{R}^{n}\) that can be written as a polynomial in the graph Laplacian \(L_{G}\) applied to white noise. The polynomial should be of degree \(q < n\): for some coefficients \(h_0, \dots, h_{q}\), and a signal \(W \in \mathbb{R}^{n}\) that is white (i.e. \(\mathbb{E}(W) = 0\) and \(\mathbb{E}(W W^{\top}) = I\)), we have (in matrix notation):
\[ X = \sum_{l=0}^{q} h_l L_{G}^{l} W \]
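As a quick sanity check on this definition, here is a short sketch (reusing the toy \(L_G\) from the snippet above, with made-up coefficients) that pushes white noise through a degree-2 polynomial of the Laplacian:

```python
# White noise on the nodes of the toy graph above: zero mean, identity covariance
W = np.random.standard_normal(size=L_G.shape[0])

# Arbitrary illustrative coefficients h_0, ..., h_q with q = 2
h = [1.0, -0.5, 0.1]

# X = sum_l h_l * L_G^l @ W : a degree-q polynomial in L_G applied to white noise
X_stat = sum(h_l * np.linalg.matrix_power(L_G, l) @ W for l, h_l in enumerate(h))
```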
Equivalence of definitions
These definitions are equivalent for graph Laplacians, and for some other matrices too (Marques et al. 2017). This is very useful, as the two definitions have different strengths: our data generation strategy will be based on the second (constructive) definition, while the first is the more useful one for measuring stationarity.
Measuring stationarity through coherence
Expressing the covariance matrix in the basis of the graph Laplacian eigenvectors, we get the graph spectral covariance matrix \(\Gamma\), which tells us all we need to know about how stationary a signal is.
To understand this we have to define:
The SVD of the data as \(X = U \Sigma V^\top\), so that the eigenvectors of the covariance matrices \(\Sigma^{n}_{X}\) and \(\Sigma^{d}_{X}\) are \(U\) and \(V\) respectively.
The eigendecomposition of the graph Laplacian \(L_{G} = Q \Lambda Q^\top\).
Then the graph spectral covariance matrix \(\Gamma\) is defined as: \[ \Gamma := Q^\top \Sigma^{n}_{X} Q \]
The power spectral density (PSD) \(\gamma\) is the diagonal of this matrix \(\Gamma\), i.e. \(\gamma = \text{diag}(\Gamma)\). The PSD (along with the eigenvectors of the Laplacian) completely describes a stationary signal.
The degree of stationarity can be quantified by how much of the energy of this matrix lies on its diagonal, a ratio the code below reports as the “alignment”: \[ \text{align}(\Gamma) := \frac{\|\text{diag}(\Gamma)\|_{2}}{\|\Gamma\|_{F}} \]
```python
import sys, numpy as np, time, scipy
import anndata, scanpy as sc

def graph_spectral_covariance(
    datamat, graph_mat, sym_compute=True, n_comps=40, center=True
):
    """
    Compute the graph spectral covariance matrix.

    Parameters
    ----------
    datamat : sparse matrix
        The data matrix, with cells as rows and genes as columns.
    graph_mat : sparse matrix
        The graph matrix, with cells as rows and cells as columns.
    sym_compute : bool
        Whether to compute the symmetric transition matrix.
    n_comps : int
        The number of components to use in the spectral decomposition.
    center : bool
        Whether to center the data matrix before computing the covariance.

    Returns
    -------
    gamma_mat : array
        The graph spectral covariance matrix.
    evecs : array
        The eigenvectors of the graph Laplacian.
    """
    # Find the eigenbasis of the Laplacian matrix.
    # Compute the covariance matrix implicitly using a centered version of the data
    # projected onto the eigenbasis of the Laplacian.
    eigvals, evecs = compute_eigen(
        compute_transitions(graph_mat, sym=sym_compute), n_comps=n_comps, sym=sym_compute
    )
    # datamat might be sparse, so we avoid centering it explicitly
    n_features = datamat.shape[1]
    eig_dot_data = datamat.T.dot(evecs)                   # n_features x n_comps
    meanx = evecs.T.dot(np.ravel(datamat.mean(axis=1)))   # n_comps
    if not center:
        meanx = np.zeros_like(meanx)
    gamma_mat = (1.0 / n_features) * (eig_dot_data - meanx).T.dot(eig_dot_data - meanx)
    alignment_factor = np.linalg.norm(np.diagonal(gamma_mat)) / np.linalg.norm(gamma_mat)
    print("Alignment: {}".format(alignment_factor))
    return gamma_mat, evecs
```
Alignment: 0.9564772268926722
0.9564772268926724
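For context, a call along the following lines is how the function gets used; the AnnData object `adata` and its `'connectivities'` kNN graph (e.g. from `sc.pp.neighbors`) are assumptions about the single-cell setup here, so treat this as a usage sketch rather than the exact cell that produced the numbers above.

```python
# Sketch: assumes an AnnData `adata` with a kNN graph, e.g. after sc.pp.neighbors(adata)
gamma_mat, evecs = graph_spectral_covariance(
    adata.X,                        # cells x genes data matrix
    adata.obsp["connectivities"],   # cells x cells neighborhood graph
    n_comps=40,
)
# Fraction of the spectral covariance energy on the diagonal (the alignment)
np.linalg.norm(np.diagonal(gamma_mat)) / np.linalg.norm(gamma_mat)
```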
Efficient graph-infused PCA
(Jiang, Ding, and Tang 2013) describe a basic algorithm of this kind for learning a basis to describe a dataset, like PCA. This “graph-Laplacian PCA” (gLPCA) algorithm is a blend of diffusion maps and PCA. For a parameter \(\alpha \geq 0\), the algorithm takes a mixture of the covariance matrix (as in PCA) and the graph Laplacian (as in diffusion maps) and eigendecomposes this:
Compute normalized covariance matrix \(X X^{\top}\) and normalized transition matrix \(\tilde{W}\), both normalized to have top eigenvalue \(1\).
Find the top \(k\) eigenvectors of \(K := X X^{\top} + \alpha \tilde{W}\).
This solves the following reconstruction problem for the data \(X\): \[ \min_{U \in \mathbb{R}^{d \times k},\, Q \in \mathbb{R}^{n \times k}} \| X - Q U^{\top} \|_{F}^{2} + \alpha \sum_{i, j = 1}^{n} \| q_i - q_j \|^{2} \tilde{W}_{ij} \] where \(q_i\) denotes the \(i\)-th row of \(Q\).
The first term is the usual PCA reconstruction error, and the second term is a graph Laplacian regularization term. The two are minimized by the same vectors if the data are stationary over the graph, because the graph’s Laplacian and the data’s covariance matrix then have identical eigenvectors.
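Here is a minimal dense-matrix sketch of the two-step recipe above; it assumes `X` is an \(n \times d\) numpy array and `W_tilde` a symmetric \(n \times n\) numpy array, and it returns the embedding directly rather than reproducing the closed-form solution worked out in the paper.

```python
import numpy as np

def glpca_embedding(X, W_tilde, alpha=1.0, k=10):
    """Sketch of graph-Laplacian PCA: top-k eigenvectors of a mixture of the
    normalized covariance matrix and the normalized transition matrix."""
    # Normalize both matrices so that their top eigenvalue is 1
    C = X @ X.T
    C = C / np.linalg.eigvalsh(C)[-1]
    W = W_tilde / np.linalg.eigvalsh(W_tilde)[-1]
    # Top-k eigenvectors of K = C + alpha * W give the graph-regularized components
    eigvals, eigvecs = np.linalg.eigh(C + alpha * W)
    return eigvecs[:, ::-1][:, :k]
```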
Generating stationary signals on graphs
We can use the eigenvalues and eigenvectors of the graph Laplacian to generate stationary signals on the graph.
The generation process is simple: filter white noise (independent standard Gaussians on each node) through a graph filter, modulating the spectrum of the signal (in the graph Laplacian basis) by the square root of the PSD of the data.
```python
def gen_signal(psd, lap_mat, num_samples=100, num_components=40, return_fourier=True):
    """
    Generate a signal from a power spectral density and the Laplacian matrix.

    Parameters
    ----------
    psd : array
        Length n_components vector of the power spectral density.
    lap_mat : sparse matrix
        The Laplacian matrix of the graph.
    num_samples : int, optional (default: 100)
        The number of samples to generate.
    num_components : int, optional (default: 40)
        The number of components to use in the spectral decomposition.
    return_fourier : bool, optional (default: True)
        Whether to also return the graph Fourier basis of the signal.

    Returns
    -------
    gensig : array
        The generated signals (n x num_samples).
    components : array, optional
        The graph Fourier basis (n x n_components).
    """
    # Partially diagonalize the graph to get the Fourier basis U
    # (reuses the helper above; data_mat and graph_mat are the globals from earlier)
    _, graphbasis = graph_spectral_covariance(data_mat, graph_mat, n_comps=num_components)
    # Start with white noise on each node
    white_noise = np.random.standard_normal(size=(graph_mat.shape[0], num_samples))
    # Graph Fourier transform of the noise
    noise_fourier = graphbasis.T.dot(white_noise)
    # Filter: calculate sqrt(psd) * U.T * white_noise, i.e. scale each graph
    # frequency by the square root of the PSD
    filter_mat = scipy.sparse.diags(np.sqrt(psd))
    filtered_noise_fourier = filter_mat.dot(noise_fourier)
    # Transform back to the node domain
    gensig = graphbasis.dot(filtered_noise_fourier)
    if return_fourier:
        return gensig, graphbasis
    return gensig
```
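And a usage sketch, assuming `gamma_mat` came from the `graph_spectral_covariance` call above and that the `data_mat` / `graph_mat` globals are still in scope:

```python
# PSD of the observed data: diagonal of the graph spectral covariance matrix
psd = np.diag(gamma_mat)

# Draw 100 stationary samples on the graph; lap_mat is unused in this version,
# since the Fourier basis is recomputed from the graph_mat global inside gen_signal
samples = gen_signal(psd, lap_mat=None, num_samples=100, num_components=40,
                     return_fourier=False)
print(samples.shape)   # (n_nodes, 100)
```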
Jiang, Bo, Chris Ding, and Jin Tang. 2013. “Graph-Laplacian PCA: Closed-Form Solution and Robustness.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3492–98.
Marques, Antonio G, Santiago Segarra, Geert Leus, and Alejandro Ribeiro. 2017. “Stationary Graph Processes and Spectral Estimation.” IEEE Transactions on Signal Processing 65 (22): 5911–26.
Perraudin, Nathanaël, and Pierre Vandergheynst. 2017. “Stationary Signal Processing on Graphs.” IEEE Transactions on Signal Processing 65 (13): 3462–77.
Footnotes
Specifically, the synthesis of time-stationary processes by linear time-invariant systems driven by white noise.