import sys, numpy as np, time, scipy, scipy.linalg, scipy.sparse
import matplotlib.pyplot as plt
%matplotlib inline
import anndata, scanpy as sc
def triangle_motif_kernel(gmat, ID='M1'):
"""
Compute an adjacency matrix between nodes,
whose (i,j)th entry is the number of motifs that i and j are both involved in.
gmat: A Scipy sparse adjacency matrix.
"""
= gmat.multiply(gmat.T)
Bmat = gmat - Bmat
Umat if ID == 'M1':
= np.dot(Umat, Umat).multiply(Umat.T)
c return c + c.T
elif ID == 'M2':
= np.dot(Bmat, Umat).multiply(Umat.T) + np.dot(Umat, Bmat).multiply(Umat.T) + np.dot(Umat, Umat).multiply(Bmat)
c return c + c.T
elif ID == 'M3':
= np.dot(Bmat, Bmat).multiply(Umat) + np.dot(Bmat, Umat).multiply(Bmat) + np.dot(Umat, Bmat).multiply(Bmat)
c return c + c.T
elif ID == 'M4':
= np.dot(Bmat, Bmat).multiply(Bmat)
c return c
elif ID == 'M5':
= np.dot(Umat, Umat).multiply(Umat) + np.dot(Umat, Umat.T).multiply(Umat) + np.dot(Umat.T, Umat).multiply(Umat)
c return c + c.T
elif ID == 'M6':
= np.dot(Umat, Bmat).multiply(Umat) + np.dot(Bmat, Umat.T).multiply(Umat.T) + np.dot(Umat.T, Umat).multiply(Bmat)
c return c
elif ID == 'M7':
= np.dot(Umat.T, Bmat).multiply(Umat.T) + np.dot(Bmat, Umat).multiply(Umat) + np.dot(Umat, Umat.T).multiply(Bmat)
c return c
Detecting motifs in networks
Triangles and derived graphs
Counting the number of triangles in a graph is an essential operation in graph theory. I’ll detail an extremely efficient approach to this task, which is especially useful in analysis of relatively sparse real-world graphs where tightly connected groups can indicate significant relationships.
Counting triangles with linear algebra
A network may be modeled as an undirected graph, like the one shown below.
Adjacency matrix. Let \(A\) be the adjacency matrix representation of the graph, defined as follows. The entries of \(A\) are either 0 or 1; and \(a_{i,j}\) equals 1 if and only if there is an edge connecting nodes \(i\) and \(j\). For instance, for the graph shown above,
\[ A = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}. \]
Given this representation of the graph, here is a way to count triangles using linear algebra.
First, let \(A \cdot B\) denote matrix multiplication. That is, \(E = A \cdot B\) means \(e_{i,j} = \sum_k a_{i,k} b_{k, j}\). Then \([A \cdot A]_{i, j}\) counts the number of paths of length 2 from node \(i\) to node \(j\).
Next, let \(A \circ B\) denote elementwise multiplication. That is, \(E = A \circ B\) means \(e_{i, j} = a_{i, j} b_{i, j}\).
Then the matrix \(C = (A \cdot A) \circ A\) counts the number of triangles shared by each pair of vertices, i.e. each element \(c_{i,j}\), counts the number of triangles in which both \(i\) and \(j\) appear. For the example shown above, it turns out that \[ C = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \]
This reflects the fact that there is exactly one triangle in the graph, which is formed by nodes 1, 2, and 5.
Enumerating all 3-node motifs
Extending this example, there are a number of undirected and directed triangles exhaustively listed by (Benson, Gleich, and Leskovec 2016). Here are all the triangle-like motifs. We can compute them using linear algebra, using variations of the above reasoning.
The original papers (Benson, Gleich, and Leskovec 2016), (Yin et al. 2017) describe this in more detail.
The GraphBLAS library (Davis 2019) uses these ideas too, focusing more on speedups that have been a focus over the last few years (Azad, Buluç, and Gilbert 2015).
Trusses: cohesive subgraphs of a graph
A truss is a subgraph of a graph that is highly connected, like a clique. It’s a useful concept because trusses, unlike cliques, are efficient to find, so I’ve used them before and will cover them here.1
A \(k\)-truss is a subgraph of \(k\) vertices, where every pair participates in \(k-2\) triangles in the truss (J. Cohen 2008). This is a tight-knit subgraph, but a bit less restrictive than a \(k\)-clique, in which every pair by definition participates in \(k-2\) clique triangles, but there are also connection requirements on other vertices outside the triangle.
Detecting a truss can be done very efficiently (J. Cohen 2009). Starting from the given graph, the idea is to compute the number of triangles each edge participates in, and then remove edges that participate in fewer than \(k-2\) triangles. This is done iteratively until no more edges can be removed; the edges that remain are the edges in the truss.
These are very efficient with linear algebra, using the triangle enumeration operations we’ve just defined (Low et al. 2018). After initializing gmat
to be the given adjacency matrix adj_mat
, the algorithm repeats the following steps until no more edges can be removed:
- Enumerate \(M_4\) (undirected triangle motif) counts along each edge. We do this with
triangle_motif_kernel(gmat, ID='M4')
- Remove edges that participate in fewer than \(k-2\) triangles. We do this by thresholding the \(M_4\) counts from the previous step.
def get_k_truss(adj_mat, k):
"""Returns the k-truss of a graph.
Parameters
----------
adj_mat : sparse matrix
Adjacency matrix of the graph. (n x n), where n is the number of vertices.
k : int
The k-truss to be extracted.
Returns
-------
gmat : sparse matrix
(n x n) adjacency matrix of the k-truss.
Includes all n vertices, but only edges in the k-truss.
"""
= adj_mat.copy()
k_truss = np.ones_like(k_truss.data)
k_truss.data while True: # repeat until no edges are removed
= triangle_motif_kernel(k_truss, ID='M4')
tri_count = adj_mat.multiply(tri_count >= k-2)
gmat_new if (k_truss != gmat_new).nnz == 0:
break
= gmat_new
k_truss # Binarize the k-truss to get its sparsity structure, and return those edges of the original matrix.
= np.ones_like(k_truss.data)
k_truss.data return adj_mat.multiply(k_truss)
Higher-order motifs
The classic paper (Ponstein 1966) devises an ingenious algorithm to count arbitrary cycles. This is one of the most general ways to count motifs, and I might implement it at some point because I’ve not found an implementation. Instead, I’ll pursue construction of a 4th-order version of a clique, which was introduced more recently.
Higher-order connected subgraphs
(J. D. Cohen 2019) looks for higher-order connected subgraphs, which they call “trapezes.” These can be enumerated efficiently too, by adapting the above techniques - but I’ll leave that for another post.
References
Footnotes
The term comes from mechanical engineering of structures like bridges and roofs, and the idea comes from social network analysis.↩︎