Detecting motifs in networks

graphs
ML
Some linear algebra one-liners
Author

Akshay Balsubramani

Modified

April 22, 2023

Triangles and derived graphs

Counting the number of triangles in a graph is an essential operation in graph theory. I’ll detail an extremely efficient approach to this task, which is especially useful in analysis of relatively sparse real-world graphs where tightly connected groups can indicate significant relationships.

Counting triangles with linear algebra

A network may be modeled as an undirected graph, like the one shown below.

My Happy SVG

Adjacency matrix. Let \(A\) be the adjacency matrix representation of the graph, defined as follows. The entries of \(A\) are either 0 or 1; and \(a_{i,j}\) equals 1 if and only if there is an edge connecting nodes \(i\) and \(j\). For instance, for the graph shown above,

\[ A = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}. \]

Given this representation of the graph, here is a way to count triangles using linear algebra.

First, let \(A \cdot B\) denote matrix multiplication. That is, \(E = A \cdot B\) means \(e_{i,j} = \sum_k a_{i,k} b_{k, j}\). Then \([A \cdot A]_{i, j}\) counts the number of paths of length 2 from node \(i\) to node \(j\).

Next, let \(A \circ B\) denote elementwise multiplication. That is, \(E = A \circ B\) means \(e_{i, j} = a_{i, j} b_{i, j}\).

Then the matrix \(C = (A \cdot A) \circ A\) counts the number of triangles shared by each pair of vertices, i.e. each element \(c_{i,j}\), counts the number of triangles in which both \(i\) and \(j\) appear. For the example shown above, it turns out that \[ C = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \]

This reflects the fact that there is exactly one triangle in the graph, which is formed by nodes 1, 2, and 5.

Enumerating all 3-node motifs

Extending this example, there are a number of undirected and directed triangles exhaustively listed by (Benson, Gleich, and Leskovec 2016). Here are all the triangle-like motifs. We can compute them using linear algebra, using variations of the above reasoning.

M1

M2

M3

M4

M5

M6

M7

import sys, numpy as np, time, scipy, scipy.linalg, scipy.sparse
import matplotlib.pyplot as plt
%matplotlib inline

import anndata, scanpy as sc


def triangle_motif_kernel(gmat, ID='M1'):
    """
    Compute an adjacency matrix between nodes, 
    whose (i,j)th entry is the number of motifs that i and j are both involved in.
    gmat: A Scipy sparse adjacency matrix.
    """
    Bmat = gmat.multiply(gmat.T)
    Umat = gmat - Bmat
    if ID == 'M1':
        c = np.dot(Umat, Umat).multiply(Umat.T)
        return c + c.T
    elif ID == 'M2':
        c = np.dot(Bmat, Umat).multiply(Umat.T) + np.dot(Umat, Bmat).multiply(Umat.T) + np.dot(Umat, Umat).multiply(Bmat)
        return c + c.T
    elif ID == 'M3':
        c = np.dot(Bmat, Bmat).multiply(Umat) + np.dot(Bmat, Umat).multiply(Bmat) + np.dot(Umat, Bmat).multiply(Bmat)
        return c + c.T
    elif ID == 'M4':
        c = np.dot(Bmat, Bmat).multiply(Bmat)
        return c
    elif ID == 'M5':
        c = np.dot(Umat, Umat).multiply(Umat) + np.dot(Umat, Umat.T).multiply(Umat) + np.dot(Umat.T, Umat).multiply(Umat)
        return c + c.T
    elif ID == 'M6':
        c = np.dot(Umat, Bmat).multiply(Umat) + np.dot(Bmat, Umat.T).multiply(Umat.T) + np.dot(Umat.T, Umat).multiply(Bmat)
        return c
    elif ID == 'M7':
        c = np.dot(Umat.T, Bmat).multiply(Umat.T) + np.dot(Bmat, Umat).multiply(Umat) + np.dot(Umat, Umat.T).multiply(Bmat)
        return c

The original papers (Benson, Gleich, and Leskovec 2016), (Yin et al. 2017) describe this in more detail.

The GraphBLAS library (Davis 2019) uses these ideas too, focusing more on speedups that have been a focus over the last few years (Azad, Buluç, and Gilbert 2015).

Trusses: cohesive subgraphs of a graph

A truss is a subgraph of a graph that is highly connected, like a clique. It’s a useful concept because trusses, unlike cliques, are efficient to find, so I’ve used them before and will cover them here.1

A \(k\)-truss is a subgraph of \(k\) vertices, where every pair participates in \(k-2\) triangles in the truss (J. Cohen 2008). This is a tight-knit subgraph, but a bit less restrictive than a \(k\)-clique, in which every pair by definition participates in \(k-2\) clique triangles, but there are also connection requirements on other vertices outside the triangle.

Detecting a truss can be done very efficiently (J. Cohen 2009). Starting from the given graph, the idea is to compute the number of triangles each edge participates in, and then remove edges that participate in fewer than \(k-2\) triangles. This is done iteratively until no more edges can be removed; the edges that remain are the edges in the truss.

These are very efficient with linear algebra, using the triangle enumeration operations we’ve just defined (Low et al. 2018). After initializing gmat to be the given adjacency matrix adj_mat, the algorithm repeats the following steps until no more edges can be removed:

  • Enumerate \(M_4\) (undirected triangle motif) counts along each edge. We do this with triangle_motif_kernel(gmat, ID='M4')
  • Remove edges that participate in fewer than \(k-2\) triangles. We do this by thresholding the \(M_4\) counts from the previous step.
def get_k_truss(adj_mat, k):
    """Returns the k-truss of a graph.

    Parameters
    ----------
    adj_mat : sparse matrix
        Adjacency matrix of the graph. (n x n), where n is the number of vertices.
    k : int
        The k-truss to be extracted.
    
    Returns
    -------
    gmat : sparse matrix
        (n x n) adjacency matrix of the k-truss. 
        Includes all n vertices, but only edges in the k-truss. 
    """
    k_truss = adj_mat.copy()
    k_truss.data = np.ones_like(k_truss.data)
    while True: # repeat until no edges are removed
        tri_count = triangle_motif_kernel(k_truss, ID='M4')
        gmat_new = adj_mat.multiply(tri_count >= k-2)
        if (k_truss != gmat_new).nnz == 0:
            break
        k_truss = gmat_new
    # Binarize the k-truss to get its sparsity structure, and return those edges of the original matrix.
    k_truss.data = np.ones_like(k_truss.data)
    return adj_mat.multiply(k_truss)

Higher-order motifs

The classic paper (Ponstein 1966) devises an ingenious algorithm to count arbitrary cycles. This is one of the most general ways to count motifs, and I might implement it at some point because I’ve not found an implementation. Instead, I’ll pursue construction of a 4th-order version of a clique, which was introduced more recently.

Higher-order connected subgraphs

(J. D. Cohen 2019) looks for higher-order connected subgraphs, which they call “trapezes.” These can be enumerated efficiently too, by adapting the above techniques - but I’ll leave that for another post.

References

Azad, Ariful, Aydin Buluç, and John Gilbert. 2015. “Parallel Triangle Counting and Enumeration Using Matrix Algebra.” In 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 804–11. IEEE.
Benson, Austin R, David F Gleich, and Jure Leskovec. 2016. “Higher-Order Organization of Complex Networks.” Science 353 (6295): 163–66.
Cohen, Jonathan. 2008. “Trusses: Cohesive Subgraphs for Social Network Analysis.” National Security Agency Technical Report 16 (3.1): 1–29.
———. 2009. “Graph Twiddling in a Mapreduce World.” Computing in Science & Engineering 11 (4): 29–41.
Cohen, Jonathan D. 2019. “Trusses and Trapezes: Easily-Interpreted Communities in Social Networks.” arXiv Preprint arXiv:1907.09417.
Davis, Timothy A. 2019. “Algorithm 1000: SuiteSparse: GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra.” ACM Transactions on Mathematical Software (TOMS) 45 (4): 1–25.
Low, Tze Meng, Daniele G Spampinato, Anurag Kutuluru, Upasana Sridhar, Doru Thom Popovici, Franz Franchetti, and Scott McMillan. 2018. “Linear Algebraic Formulation of Edge-Centric k-Truss Algorithms with Adjacency Matrices.” In 2018 IEEE High Performance Extreme Computing Conference (HPEC), 1–7. IEEE.
Ponstein, Jacob. 1966. “Self-Avoiding Paths and the Adjacency Matrix of a Graph.” SIAM Journal on Applied Mathematics 14 (3): 600–609.
Yin, Hao, Austin R Benson, Jure Leskovec, and David F Gleich. 2017. “Local Higher-Order Graph Clustering.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 555–64.

Footnotes

  1. The term comes from mechanical engineering of structures like bridges and roofs, and the idea comes from social network analysis.↩︎