Laplacian regularization and trend filtering

graphs
statistics
Using graphs to regularize learning
Author

Akshay Balsubramani

Modified

February 4, 2023

Regularizing learning with graphs

Here, I’ll go over a family of useful and efficient methods for using a given network structure over the data to regularize learning. These techniques flexibly allow us to incorporate prior knowledge about the data into the learning process. They can be used to improve performance, to learn more interpretable predictors from less data, and more. I’ll demonstrate this on single-cell gene expression data from the GTEx consortium (as I’ve written about elsewhere).

CODE
import os, sys, time
import numpy as np, pandas as pd, scipy, scipy.sparse as sp, scipy.sparse.linalg
import requests
import matplotlib.pyplot as plt, seaborn as sns
%matplotlib inline

import anndata, scanpy as sc
sc.settings.verbosity = 0

# fname = 'GTEx_8_tissues_snRNAseq_atlas_071421.public_obs.h5ad'
# url_str = f'https://storage.googleapis.com/gtex_analysis_v9/snrna_seq_data/{fname}'
# response = requests.get(url_str)
# with open(file_path, 'wb') as fp:
#     fp.write(response.content)
# adta = sc.read(file_path)
# adta = sc.pp.subsample(adta, n_obs=int(num_subsample), random_state=random_state, copy=True)

adta = sc.read('../GTEx_8_snRNAseq_atlas_30k.h5ad')
sc.pp.neighbors(adta)
sc.tl.umap(adta)

I’ll be predicting the cell type label of the cells, a 44-way classification problem.1 Cell type labels are typically annotated manually by human scientists examining gene expression profiles; clearly, this does not scale to the thousands of cells in any modern single-cell experiment. So it’s worth trying to impute these annotations in a few-shot setting.

I’ll create train/test splits based on the label annotations. The labeling strategy is to group cells into clusters based on their gene expression profiles, annotate one representative cell in each cluster, and classify the remaining cells using those labels. This is a few-shot semi-supervised learning setting that really tests the ability of the method to generalize to new data based on the graph structure.

First, here’s a small utility to get the index of one cell in each cluster.

CODE
def index_of_each_class(class_arr):
    indices_encountered = []
    elements_encountered = []
    for i in range(len(class_arr)):
        if class_arr[i] not in elements_encountered:
            elements_encountered.append(class_arr[i])
            indices_encountered.append(i)
    return indices_encountered


num_labeled = 500

split_lst = np.array(['unlabeled']*adta.shape[0])

# adta.obs['split'][np.random.choice(len(adta.obs['split']), size=num_labeled, replace=False)] = 'labeled'
# # The cells also have a `subtissue` label, which is `nan` for some cells for which no subtissue can be denoted. Using tissue labels on these cells, we'll try to predict the rest.

labeled_mask = np.array([False]*adta.shape[0])
#labeled_mask = adta.obs['Broad cell type'] == 'Unknown'
#labeled_mask = adta.obs['Tissue composition'] == 'Other'
labeled_mask[index_of_each_class(adta.obs['leiden'])] = True
#labeled_mask = (np.random.rand(adta.shape[0]) < num_labeled/adta.shape[0])

split_lst[labeled_mask] = 'labeled'
adta.obs['split'] = split_lst

print( "{} labels gathered, one for each of {} clusters".format(
    np.sum(labeled_mask), len(np.unique(adta.obs['leiden'])) ))
45 labels gathered, one for each of 45 clusters
CODE
#sc.pl.umap(adta, color=['prep'], title='Data split')
sc.pl.umap(adta, color=['leiden'], title='Clusters')
sc.pl.umap(adta, color=['Broad cell type'], title='Cell type label')

Label propagation (regularized labels)

Like the harmonic functions method, the label propagation algorithm of (Zhou, Bousquet, et al. 2003) is a classic method for manipulating a vector of values (labels) on nodes of a graph. It propagates the label signal from each node to its neighbors, converging to a smoothed version of the labels over the graph.

Algorithm

The node labels are given by a matrix \(Y \in [0,1]^{n \times |\mathcal{Y}|}\), with each labeled node being a one-hot encoded row, and unlabeled rows zeroed out.

To fill in predictions for the unlabeled rows, the method of (Zhou, Bousquet, et al. 2003) takes a parameter \(\alpha \in (0,1)\), computes the normalized adjacency matrix \(R = D^{-1/2} A D^{-1/2}\), and iterates the following until convergence to solve for prediction scores \(F\): \[ F \leftarrow \alpha R F + (1-\alpha) Y \] When this has converged, predictions are made by taking the highest-scoring class for each node from \(F\). The iteration always converges, regardless of how \(F\) is initialized! As before, the convergence is exponential, so it just takes a few iterations in practice.
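
To see why, unroll the iteration from any starting point \(F^{(0)}\):

\[ F^{(t)} = (\alpha R)^t F^{(0)} + (1-\alpha) \sum_{i=0}^{t-1} (\alpha R)^i Y \;\; \xrightarrow{\; t \to \infty \;} \;\; (1-\alpha) (I - \alpha R)^{-1} Y \]

The eigenvalues of \(R\) lie in \([-1, 1]\) and \(\alpha < 1\), so the term depending on \(F^{(0)}\) vanishes geometrically and the series converges; the limit is the same no matter how \(F\) is initialized.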

CODE
def label_propagation(
    labeled_signal, # (n x |Y|) matrix with unlabeled rows set to arbitrary values.
    adj_mat, # (n x n) adjacency matrix
    labeled_ndces, # Boolean mask indicating which cells are in the labeled set.
    param_alpha=0.8, 
    method='iterative', 
    eps_tol=0.01   # Min. relative error in consecutive iterations of F before stopping (normally <= 20 iterations)
):
    """
    From Zhou et al. 2003 "Learning with local and global consistency".
    Returns an n-vector of real-valued relative confidences.
    """
    labeled_signal[~labeled_ndces, :] = 0
    dw_invsqrt = scipy.sparse.diags(
        np.reciprocal(np.sqrt(np.ravel(adj_mat.sum(axis=0))))
    )
    R = dw_invsqrt.dot(adj_mat).dot(dw_invsqrt)
    F = labeled_signal.copy()
    if scipy.sparse.issparse(F):
        F = F.toarray()
    cumu = []
    if method == 'iterative':
        stop_crit = False
        while not stop_crit:
            F_new = np.array((param_alpha*R.dot(F)) + ((1-param_alpha)*labeled_signal))
            rel_err = np.square(F_new - F).sum()/np.square(F_new).sum()
            F = F_new
            cumu.append(F)
            if rel_err <= eps_tol:
                stop_crit = True
        return cumu
    elif method == 'direct':
        # Solve the fixed point (I - alpha*R) F = (1 - alpha) Y as a sparse linear system.
        lhs = scipy.sparse.identity(R.shape[0], format='csc') - param_alpha*R
        return scipy.sparse.linalg.spsolve(lhs, scipy.sparse.csc_matrix((1 - param_alpha)*labeled_signal))

Motivation: smoothness-regularized optimization

\(\alpha\) is the only parameter, and controls the tradeoff between respecting the known labels \(Y\) and the graph structure \(R\).

  • \(\alpha \to 0\): Respects known labels, ignores graph structure
  • \(\alpha \to 1\): Respects graph structure, ignores known labels

This algorithm is exactly equivalent to solving a least-squares problem, with the label predictions regularized to be “smooth” with respect to the graph structure.

More specifically, defining a parameter \(\mu\) such that \(\alpha = \frac{1}{1 + \mu}\), the iteration converges to an \(F^*\) which solves an optimization problem:

\[ \min_{F \in \mathbb{R}^{n \times |\mathcal{Y}| }} \mu \underbrace{ \left\lVert Y - F \right\rVert_F^2 }_{\text{loss}} + \underbrace{\operatorname{tr}\left( F^\top (I - R) F \right)}_{\text{smoothness}} \]

Here the “loss” term forces the predictions \(F\) to be close to the labels \(Y\), while the “smoothness” term forces the predictions to vary smoothly over the graph, so that the differences between neighboring nodes are not large.
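
As a quick check of this equivalence, here’s a small sketch (on a toy random graph, not the GTEx data) verifying that the iterative solution matches the closed-form fixed point \((1-\alpha)(I - \alpha R)^{-1} Y\):

CODE
# Sanity check on a toy graph: iterative label propagation vs. the closed-form fixed point.
A_toy = scipy.sparse.random(50, 50, density=0.2, random_state=0)
A_toy = A_toy + A_toy.T                               # symmetrize to get a valid adjacency matrix
Y_toy = np.zeros((50, 3)); Y_toy[:3, :] = np.eye(3)   # three labeled nodes, one per class

F_iter = label_propagation(Y_toy.copy(), A_toy, np.arange(50) < 3, param_alpha=0.8, eps_tol=1e-14)[-1]

dw = scipy.sparse.diags(np.reciprocal(np.sqrt(np.ravel(A_toy.sum(axis=0)))))
R_toy = dw.dot(A_toy).dot(dw)
F_direct = 0.2*np.linalg.solve(np.eye(50) - 0.8*R_toy.toarray(), Y_toy)
print("Max deviation between iterative and closed-form solutions:", np.abs(F_iter - F_direct).max())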

Worked example

Label propagation can be run on this graph, with these labels. We’ll use a one-hot encoding for label_mat, as is standard practice.

CODE
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder()
label_arr = adta.obs[['Broad cell type']]

label_mat = ohe.fit_transform(label_arr)
all_labels = label_mat.copy()
label_mat[adta.obs['split'] == 'unlabeled', :] = 0
label_mat.eliminate_zeros()   # Just for cleanup
CODE
cumuvals = label_propagation(label_mat, adta.obsp['connectivities'], np.array(adta.obs['split'] == 'labeled'), param_alpha=0.999)

real_labels = np.ravel(np.argmax(all_labels, axis=1))
for i in range(len(cumuvals)):
    ml = np.ravel(np.argmax(cumuvals[i], axis=1))
    print( "Iteration {}: \tAccuracy = {}".format(i, (real_labels == ml)[adta.obs['split'] == 'unlabeled'].mean() ))
Iteration 0:    Accuracy = 0.0370889667835086
Iteration 1:    Accuracy = 0.17656484727090635
Iteration 2:    Accuracy = 0.39365715239525956
Iteration 3:    Accuracy = 0.541745952261726
Iteration 4:    Accuracy = 0.6264730428976798
Iteration 5:    Accuracy = 0.6613920881321983
Iteration 6:    Accuracy = 0.6762477048906693
Iteration 7:    Accuracy = 0.685561675847104

In this few-shot setting (given only 45 labels, one per cluster), over 68% accuracy is not bad!

Using label propagation for ranking

So the label propagation algorithm is a useful way to regularize learning, and can be used for classification. Another key application is ranking, which is a very common task in machine learning, and for which a neighborhood graph-based solution is cheap and effective.

Connections to Personalized PageRank

Label propagation is essentially the same computation as the Personalized PageRank (PPR) kernel (see (Page et al. 1999)), which is formulated in terms of random walks on the graph that restart at (are biased towards) the labeled nodes.
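
For reference, the PPR vector \(\pi\) for a personalization (query) distribution \(q\) solves an analogous fixed point, with the random-walk normalization \(P = A D^{-1}\) in place of the symmetric one:

\[ \pi = \alpha P \pi + (1 - \alpha) q \quad \Longrightarrow \quad \pi = (1-\alpha) (I - \alpha P)^{-1} q \]

so the two constructions differ only in how the adjacency matrix is normalized.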

The paper (Zhou, Weston, et al. 2003) formalizes these connections, and discusses how to use label propagation for ranking. Ranking involves dealing with a search query – a single data point, or a short list of points – and quickly returning a ranked list of the most relevant points to that query.

The intuition for using this label propagation algorithm for ranking is very simple:

  • The graph is over all data, both queries and non-queries.
  • The labels are the queries, with label 1, and the unlabeled points are the remaining points.
  • Apply label propagation to get a ranking score for all points.

This can also be viewed as a case of positive-unlabeled learning (Kiryo et al. 2017). In the words of the original paper (Zhou, Weston, et al. 2003):

“We… assign a positive ranking score to each query and zero to the remaining points which are ranked with respect to the queries. All points then spread their ranking score to their nearby neighbors via the weighted network. The spread process is repeated until a global stable state is achieved, and all points except queries are ranked according to their final ranking scores.”

These connections have been explored much more recently in the context of graph neural networks, for which this PPR kernel regularizes learning effectively (Klicpera, Bojchevski, and Günnemann 2019). So this same linear-algebraic iteration continues to find use in the GNN context.

In animated form

The quick convergence of the iteration is nicely illustrated with an animation. I’ll pick 2 points to be the query, and highlight the diffusion of ranks from these two seed query points.

CODE
carr = np.zeros((adta.shape[0], 1))

# Arbitrary points as the initial query
carr[0, 0] = 1
carr[1, 0] = 1

itime = time.time()
cumuvals = label_propagation(carr, adta.obsp['connectivities'], np.ravel(carr == 1), param_alpha=0.999)

I’ll import the animation code from an earlier post.

CODE
import requests
url = 'https://raw.githubusercontent.com/b-akshay/blog-tools/main/create_animation.py'
r = requests.get(url)

# make sure your filename is the same as how you want to import 
with open('create_animation.py', 'w') as f:
    f.write(r.text)
from create_animation import *
coords = adta.obsm['X_umap']
animation = create_animation(coords, cumuvals)
iter: [0/8]

CODE
from IPython.display import HTML

itime = time.time()
#HTML(animation.to_html5_video())
html_anim = HTML(animation.to_jshtml())
print("Time taken to produce html animation: {}".format(time.time() - itime))
html_anim

Feature propagation for imputation on graphs

There’s an incredibly simple algorithm for feature imputation (Rossi et al. 2022), based on diffusion on graphs in a very similar way. The idea is to use the graph Laplacian to propagate the known feature values of nodes to missing ones, clamping known feature values. This is quite a strong heuristic baseline for imputation, and it can be used in any context where we have a graph and some features on the nodes.

The idea is to iteratively update the feature matrix \(X\) by propagating from known features to unknown ones, while clamping the known values at all times. This uses a symmetric normalized Laplacian kernel \(\tilde{A}\) (which I’ve written about in more detail elsewhere).
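
Concretely, writing \(\Omega\) for the set of observed (node, feature) entries of \(X\), the iteration is

\[ X^{(t+1)} = \tilde{A} X^{(t)}, \qquad \text{then reset } X^{(t+1)}_{ij} = X_{ij} \text{ for all } (i,j) \in \Omega, \]

repeated until the iterates stop changing.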

As the pseudocode in (Rossi et al. 2022) suggests, it’s very easy to implement – I’ll write code to finish my explanation of it.

CODE
def compute_transitions(adj_mat, alpha=1.0, sym=True, normalize=True):
    """Compute symmetrized transition matrix. 
    alpha : The exponent of the diffusion kernel. 
    * 1 = Laplace-Beltrami (density)-normalized [default]. 
    * 0.5 = normalized graph Laplacian (Fokker-Planck dynamics, cf. "Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators" NIPS2006).
    * 0 = classical graph Laplacian. 
    """
    similarity_mat = 0.5*(adj_mat + adj_mat.T)
    dens = np.asarray(similarity_mat.sum(axis=0))  # dens[i] is an estimate for the sampling density at point i.
    K = scipy.sparse.spdiags(np.power(dens, -alpha), 0, similarity_mat.shape[0], similarity_mat.shape[0])
    W = scipy.sparse.csr_matrix(K.dot(similarity_mat).dot(K))
    if not normalize:
        return W
    else:
        z = np.sqrt(np.asarray(W.sum(axis=0)).astype(np.float64))    # sqrt(density)
        recipsqrt = np.reciprocal(z)
        recipsqrt[np.isinf(recipsqrt)] = 0
        Zmat = scipy.sparse.spdiags(recipsqrt, 0, W.shape[0], W.shape[0])
        return Zmat.dot(W).dot(Zmat) if sym else Zmat.power(2).dot(W)


def feature_propagation(adj_mat, feature_mat, missing_values, eps_tol=1e-5):
    """Feature propagation on a graph, given its adjacency matrix and feature matrix.

    Parameters
    ----------
    adj_mat : sparse matrix
        Adjacency matrix of the graph.
    feature_mat : sparse matrix
        (n x d) matrix of features for each vertex.
    missing_values : sparse matrix
        (n x d) {0,1} matrix of missing values. 1 = missing, 0 = observed.
    eps_tol : float , optional
        Tolerance for convergence of feature propagation (in Frobenius norm). Defaults to 1e-5.
    
    Returns
    -------
    feature_mat : sparse matrix
        (n x d) matrix of features for each vertex, after feature propagation.
    """
    diffusion_kernel = compute_transitions(adj_mat, alpha=0.5, sym=True, normalize=False)
    new_features = feature_mat.copy()
    new_features[missing_values == 1] = 0
    resid = np.inf
    while resid > eps_tol:
        prev_features = new_features.copy()
        new_features = diffusion_kernel.dot(new_features)
        # Clamp the observed entries back to their known values after each diffusion step.
        new_features[missing_values == 0] = feature_mat[missing_values == 0]
        diff = new_features - prev_features
        resid = scipy.sparse.linalg.norm(diff) if scipy.sparse.issparse(diff) else np.linalg.norm(diff)
    return new_features

Now let’s see how this works on the GTEx data. Evaluating imputation methods is itself a real challenge (Hou et al. 2020).

CODE
# See how this works on the GTEx data.

adta.X
<30000x17695 sparse matrix of type '<class 'numpy.float32'>'
    with 25073163 stored elements in Compressed Sparse Row format>
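
As a rough sketch of how one might evaluate this here (not a careful benchmark – the gene subset, densification, and uniformly random masking below are all simplifying assumptions), we can hide some observed entries, impute them back with feature propagation, and compare against imputing zeros:

CODE
# Sketch: mask entries of a small dense submatrix of the data, impute them back, and measure the error.
rng = np.random.default_rng(0)
X_sub = np.asarray(adta.X[:, :200].todense())          # first 200 genes, densified (an arbitrary subset)
missing = (rng.random(X_sub.shape) < 0.2).astype(int)  # 1 = hidden, 0 = observed

# Loose tolerance just to keep the sketch quick.
X_imputed = feature_propagation(adta.obsp['connectivities'], X_sub, missing, eps_tol=1e-2)

mse_fp = np.mean(np.square(X_imputed[missing == 1] - X_sub[missing == 1]))
mse_zero = np.mean(np.square(X_sub[missing == 1]))     # baseline: impute zeros everywhere
print("MSE (feature propagation): {:.4f} \t MSE (zero imputation): {:.4f}".format(mse_fp, mse_zero))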

Graph-based denoising with trend filtering

Trend filtering is an efficient and accurate way to denoise a signal over vertices of a graph. It doesn’t rely on any particular basis being learned beforehand, but just penalizes the signal where it varies too irregularly along the graph structure. The wonderful work of (Wang et al. 2015) provides the standard algorithm for this problem.

Scenario

Let’s say we have a graph \(G = (V,E)\) with \(n\) vertices, and \(y\) is an observed signal over the graph that we want to denoise. The \(k^{th}\) order trend filtering estimator is an interpretable way to denoise \(y\) by penalizing its \((k+1)^{th}\) order differences, encouraging the estimate to behave like a piecewise degree-\(k\) polynomial over paths on the graph.

\[ y^{*}_{TF} := \arg\min_{\beta \in \mathbb{R}^n} \left[ \frac{1}{2} \| y - \beta \|^2 + \lambda \| \Delta^{(k+1)} \beta \|_1 \right] \]

Here, \(\Delta^{(k+1)}\) is the \((k+1)\)-th order difference operator. For sequences of numbers in one dimension, this is easy to appreciate; for instance, the second differences of a sequence \(x\) are \[ (\Delta^{(2)} x)_i = x_{i+1} - 2 x_i + x_{i-1} \]

It’s clear that the \(k^{th}\)-order difference around a point – the discrete derivative – incorporates information from its neighbors at distance \(k\) (i.e. \(k\) “hops” from the point). Similarly, \(\Delta^{(k+1)}\) is a discrete differential operator on the graph that incorporates differences between each node and its \(O(k)\)-hop neighbors.
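
For a general graph, (Wang et al. 2015) build these operators recursively from the oriented edge incidence matrix \(D \in \{-1,0,1\}^{|E| \times n}\) (so \(\Delta^{(1)} = D\)) and the graph Laplacian \(L = D^\top D\):

\[ \Delta^{(k+1)} = \begin{cases} D \, L^{k/2} & k \text{ even} \\ L^{(k+1)/2} & k \text{ odd} \end{cases} \]

so the odd-order operators are indexed by edges and the even-order ones by nodes.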

First-order filtering: piecewise linear functions

First-order trend filtering of a signal tries to find a piecewise linear function over the graph which nicely approximates the signal. This is a very interpretable way to denoise a signal. I’ll illustrate with an example.
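
One note: the code below calls a helper laplacian_op_graph that isn’t defined in this post. Here’s a minimal sketch of what it’s assumed to return – the combinatorial graph Laplacian \(L = D - A\), optionally symmetrically normalized:

CODE
def laplacian_op_graph(adj_mat, normalize=False):
    """Sketch of the assumed helper: the sparse graph Laplacian L = D - A,
    or the symmetrically normalized D^{-1/2} L D^{-1/2} if normalize=True."""
    adj_mat = scipy.sparse.csr_matrix(0.5*(adj_mat + adj_mat.T))   # symmetrize
    degs = np.ravel(adj_mat.sum(axis=0))
    lapmat = scipy.sparse.diags(degs) - adj_mat
    if normalize:
        recip_sqrt = np.reciprocal(np.sqrt(degs))
        recip_sqrt[np.isinf(recip_sqrt)] = 0      # guard against isolated nodes
        dinvsqrt = scipy.sparse.diags(recip_sqrt)
        lapmat = dinvsqrt.dot(lapmat).dot(dinvsqrt)
    return lapmat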

CODE
def lap_genlasso_graph(
    signal, 
    adj_mat, 
    step_size_mult=0.5, 
    step_gap=5, 
    max_steps=None, 
    normalize=False, 
    stop_mult=3, 
    verbose=False
):
    """
    Stagewise regression that fits a signal as piecewise linear between a few source/sink cells. 
    Outputs the entire regularization curve, not just the final fit.
    
    Parameters
    ---------- 
        signal : array
            A function on vertices
        adj_mat : sparse matrix
            (n x n) adjacency matrix of graph
    
    Returns
    -------
        toret : list of arrays
            List of vectors representing sparse source/sink maps.
    
    Details: Runs graph trend filtering as in http://www.jmlr.org/papers/volume17/15-147/15-147.pdf , 
    with a stagewise update (Eq. 44 of https://www.stat.cmu.edu/~ryantibs/papers/stagewise.pdf ) .
    """
    lapmat = laplacian_op_graph(adj_mat, normalize=normalize)
    toret = []
    laplacians = []
    approx_errors = []
    sink_volumes = []
    ctr = 0
    approx_sig = signal
    signorm = np.sum(np.square(signal))
    init_vol = np.sum(np.abs(lapmat.dot(approx_sig)))
    secant_slope = 1.0*init_vol/signorm
    print("Starting fit. Signal norm: {}".format(signorm))
    last_err = 0
    last_vol = init_vol
    while ((max_steps is None) or (ctr < max_steps)):
        approx_error = np.sum(np.square(signal - approx_sig))
        sink_volume = np.sum(np.abs(lapmat.dot(approx_sig)))
        step = lapmat.T.dot(np.sign(lapmat.dot(approx_sig)))
        approx_sig = approx_sig - (step_size_mult * (1.0/len(step)) * step)
        if np.isclose(approx_error, last_err):
            # Avoid dividing by zero when the error hasn't changed between checkpoints.
            slope = 100*secant_slope
        else:
            slope = (last_vol - sink_volume)/(approx_error - last_err)
        last_err = approx_error
        last_vol = sink_volume
        if ctr%step_gap == 0:
            toret.append(approx_sig)
            laplacians.append(lapmat.dot(approx_sig))
            if verbose:
                print("Error: {}\t\t Source/sink volume: {}".format(approx_error, sink_volume))
            approx_errors.append(approx_error)
            sink_volumes.append(sink_volume)
        if slope < stop_mult*secant_slope:
            break
        ctr += 1
    return toret, np.array(approx_errors), np.array(sink_volumes), signorm, laplacians

This returns a regularization path, which is extremely interpretable as an animation.

CODE
coords = adta.obsm['X_umap']

# Example signal to denoise (an illustrative choice): the final ranking scores from the diffusion above.
noisy_signal = cumuvals[-1]
reg_path, approx_errors, sink_volumes, signorm, laplacians = lap_genlasso_graph(
    noisy_signal, adta.obsp['connectivities'], max_steps=100)
animation = create_animation(coords, reg_path)

Second-order filtering

Second-order filtering penalizes \(\| \Delta^{(3)} \beta \|_1\), encouraging piecewise-quadratic behavior along the graph. A rough way to adapt the label propagation iteration to this \(\ell_1\) penalty is to replace its smoothing step with a subgradient step, as in the code below.

CODE
def lapreg_lsq(
    labeled_signal, # n-vector with labeled values set and unlabeled values zeroed. 
    adj_mat, 
    param_alpha=0.3, # Controls tradeoff between square loss on labeled data (weight 1-\alpha) and smoothness over graph (weight \alpha)
    num_iter='auto', 
    reg_norm='l2', # One of ['l2', 'l1']. Following the discussion in Sec. 2.4 of the paper.
    eps_tol=0.01 # Min. relative error in consecutive iterations of F before stopping (normally <= 20 iterations)
):
    """
    Given a graph and a continuous label signal over its first L vertices, impute the signal 
    using Laplacian regularization. Returns n-vector predicting signal over all vertices.
    """
    dw_invsqrt = scipy.sparse.diags( np.reciprocal(np.sqrt(np.ravel(adj_mat.sum(axis=0)))) )
    R = dw_invsqrt.dot(adj_mat).dot(dw_invsqrt)
    F = labeled_signal.copy()#.toarray()
    cumu = []
    if num_iter == 'auto':
        stop_crit = False
        while not stop_crit:
            if reg_norm == 'l2':
                F_new = np.array(
                    (param_alpha*R.dot(F)) + ((1-param_alpha)*np.array(labeled_signal))
                )
            elif reg_norm == 'l1':   # Second-order trend filtering (subgradient step on the l1 penalty)
                F_new = np.array(
                    (param_alpha*R.dot(np.sign(R.dot(F)))) + ((1-param_alpha)*np.array(labeled_signal))
                )
            rel_err = np.square(F_new - F).sum()/np.square(F_new).sum()
            F = F_new
            cumu.append(F)
            if rel_err <= eps_tol:
                stop_crit = True
    return cumu

The generalized lasso: using graphs for regularization

There’s a versatile framework for using graphs to regularize a regression problem. It’s called the generalized lasso (Tibshirani and Taylor 2011); the idea is to use any given graph to impose a penalty on the size of the coefficients. The method is useful when the graph structure is known to be relevant to the problem.

When instantiated in different ways, the framework recovers many usefully interpretable regressions.

  • It can be used to penalize the size of the coefficients and encourage coefficient sparsity, as in the standard Lasso.
  • It can be used to impose sparsity and smoothness on the coefficients in groups, as in the group Lasso. These groups can be defined by the graph structure.
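
Concretely, for a response \(y\), design matrix \(X\), and a penalty matrix \(D\) built from the graph, the generalized lasso solves

\[ \hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2} \| y - X \beta \|^2 + \lambda \| D \beta \|_1 \]

Taking \(D = I\) recovers the standard lasso; taking \(X = I\) and \(D\) to be the oriented edge incidence matrix of the graph gives the fused lasso over the graph; and \(X = I\), \(D = \Delta^{(k+1)}\) gives the graph trend filtering problems above.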

I’ll show how to solve either the primal problem or its dual (pg. 12 of (Tibshirani and Taylor 2011)), where the subgradient conditions reduce to soft thresholding. The solution is interpretable in terms of the graph structure \(A\), and conditions for its uniqueness are given in (Ali and Tibshirani 2019).

Duality

Convex problems like this are nice for a couple of reasons. Of course, gradient descent and other efficient iterative optimization methods are guaranteed to converge nicely. Furthermore, there’s a completely equivalent corresponding “dual problem”, which is often easier to solve.

\[\begin{align*} \min_{\beta \in \mathbb{R}^n} \frac{1}{2} \| y - \beta \|^2 + \lambda \| \Delta^{(k+1)} \beta \|_1 &= \min_{\beta \in \mathbb{R}^n : \| \Delta^{(k+1)} \beta \|_1 \leq t} \frac{1}{2} \| y - \beta \|^2 \stackrel{(a)}{=} \min_{ u \in \mathbb{R}^{|E|} : \| u \|_{\infty} \leq \lambda} \frac{1}{2} \| y - \Delta^{(k+1) \top} u \|^2 \end{align*}\]

where \((a)\) is by convex (Fenchel/Lagrange) duality, and the first equality holds for a constraint level \(t\) corresponding to \(\lambda\). The quantity \(\| \Delta^{(k+1)} \beta \|_1\) is the \(\ell_1\) norm of the \((k+1)^{th}\) order differences of the signal; this should be low for signals that are low-order polynomials along the graph.

We can verify component-wise, from the optimality conditions, that \(u\) takes the boundary value \(\pm \lambda\) wherever the corresponding component of the primal quantity \(\Delta^{(k+1)} \beta\) is nonzero at the solution, and lies strictly inside \([-\lambda, \lambda]\) only where that component is zero.

Which method should be used to optimize this? Of course the convex formulation means we can do gradient descent, but other methods can converge much faster.
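
One simple option is projected gradient descent on the dual: the dual objective is smooth, and projecting onto the \(\ell_\infty\) ball is just coordinate-wise clipping. Here’s a minimal sketch – the helper name, chain-graph example, step size, and \(\lambda\) below are all illustrative choices, not taken from the papers above:

CODE
def trend_filter_dual_pgd(y, D, lam, num_iters=500):
    """Sketch: projected gradient descent on the dual
        min_{||u||_inf <= lam} 0.5*||y - D^T u||^2 ,
    where D plays the role of Delta^{(k+1)}. Returns the denoised signal beta = y - D^T u."""
    D = scipy.sparse.csr_matrix(D)
    # Conservative step size 1/L, using ||D D^T||_2 <= ||D||_1 * ||D||_inf.
    step_size = 1.0 / (scipy.sparse.linalg.norm(D, ord=1) * scipy.sparse.linalg.norm(D, ord=np.inf))
    u = np.zeros(D.shape[0])
    for _ in range(num_iters):
        grad = D.dot(D.T.dot(u) - y)                  # gradient of the (smooth) dual objective
        u = np.clip(u - step_size*grad, -lam, lam)    # projection onto the l_inf ball
    return y - D.T.dot(u)


# Illustrative usage: k=1 trend filtering of a noisy piecewise-linear sequence,
# using the second-difference operator on a chain graph.
n = 200
D2 = scipy.sparse.diags([1., -2., 1.], [0, 1, 2], shape=(n-2, n))
y_noisy = np.concatenate([np.linspace(0, 1, n//2), np.linspace(1, 0, n//2)]) + 0.1*np.random.randn(n)
beta_hat = trend_filter_dual_pgd(y_noisy, D2, lam=1.0)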

References

Ali, Alnur, and Ryan J Tibshirani. 2019. “The Generalized Lasso Problem and Uniqueness.” Electronic Journal of Statistics 13: 2307–47.
Eraslan, Gökcen, Eugene Drokhlyansky, Shankara Anand, Evgenij Fiskin, Ayshwarya Subramanian, Michal Slyper, Jiali Wang, et al. 2022. “Single-Nucleus Cross-Tissue Molecular Reference Maps Toward Understanding Disease Gene Function.” Science 376 (6594): eabl4290.
Hou, Wenpin, Zhicheng Ji, Hongkai Ji, and Stephanie C Hicks. 2020. “A Systematic Evaluation of Single-Cell RNA-Sequencing Imputation Methods.” Genome Biology 21: 1–30.
Kiryo, Ryuichi, Gang Niu, Marthinus C Du Plessis, and Masashi Sugiyama. 2017. “Positive-Unlabeled Learning with Non-Negative Risk Estimator.” Advances in Neural Information Processing Systems 30.
Klicpera, Johannes, Aleksandar Bojchevski, and Stephan Günnemann. 2019. “Combining Neural Networks with Personalized Pagerank for Classification on Graphs.” In International Conference on Learning Representations.
Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. “The PageRank Citation Ranking: Bringing Order to the Web.” Stanford InfoLab.
Rossi, Emanuele, Henry Kenlay, Maria I Gorinova, Benjamin Paul Chamberlain, Xiaowen Dong, and Michael M Bronstein. 2022. “On the Unreasonable Effectiveness of Feature Propagation in Learning on Graphs with Missing Node Features.” In Learning on Graphs Conference, 11–11. PMLR.
Tibshirani, Ryan J, and Jonathan Taylor. 2011. “The Solution Path of the Generalized Lasso.” The Annals of Statistics, 1335–71.
Wang, Yu-Xiang, James Sharpnack, Alex Smola, and Ryan Tibshirani. 2015. “Trend Filtering on Graphs.” In Artificial Intelligence and Statistics, 1042–50. PMLR.
Zhou, Dengyong, Olivier Bousquet, Thomas Lal, Jason Weston, and Bernhard Schölkopf. 2003. “Learning with Local and Global Consistency.” In Advances in Neural Information Processing Systems, edited by S. Thrun, L. Saul, and B. Schölkopf. Vol. 16. MIT Press. https://proceedings.neurips.cc/paper/2003/file/87682805257e619d49b8e0dfdc14affa-Paper.pdf.
Zhou, Dengyong, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf. 2003. “Ranking on Data Manifolds.” Advances in Neural Information Processing Systems 16.

Footnotes

  1. To generate the GTEx cell type labels as ground truth, the process used in the paper (Eraslan et al. 2022) involves significant human effort based on the marker genes typically signifying cell types. ↩︎