Using graphs to express structure

graphs
Overview
Efficient, practical tools
Author

Akshay Balsubramani

Summary

Graphs are an indispensable tool for learning because of both their flexibility and their efficiency.

  • Flexibility: They can represent many kinds of structured data and causal relationships.
  • Efficiency: They are cheap to represent with basic tools, and computing with them relies on fast linear algebra routines, often applied in creative ways.
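
As a rough illustration of the efficiency point, here is a minimal sketch that assumes a hypothetical four-node path graph and uses only NumPy. The graph, the signal `x`, and the variable names are made up for the example. A dense matrix is used for brevity; larger graphs would normally use a sparse format.

```python
import numpy as np

# Hypothetical toy graph: the path 0 - 1 - 2 - 3, stored as an undirected edge list.
edges = np.array([[0, 1], [1, 2], [2, 3]])
n = 4

# Dense adjacency matrix (fine for a toy example; sparse formats scale better).
A = np.zeros((n, n))
A[edges[:, 0], edges[:, 1]] = 1.0
A[edges[:, 1], edges[:, 0]] = 1.0

degrees = A.sum(axis=1)        # number of neighbors of each node
L = np.diag(degrees) - A       # combinatorial graph Laplacian

# One matrix-vector product averages a node signal over each node's neighbors.
x = np.array([0.0, 1.0, 2.0, 3.0])
neighbor_mean = (A @ x) / degrees
```

The whole graph is held in one array, and a common graph operation (averaging over neighbors) reduces to a single matrix-vector product.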

Here we collect tools that are useful for data analysis in biochemistry and other sciences.

A guide to these posts

CODE
s = """- [Constructing graphs](https://akshay.bio/variable-expectations/posts/graphs/Neighborhood-graph-construction-techniques.html)

- [Calculus on graphs as manifolds](https://akshay.bio/variable-expectations/posts/graphs/Graph-calculus.html)

- [Graph motifs and higher-order structures](https://akshay.bio/variable-expectations/posts/graphs/Graph-motifs.html)

- [Learning from neighbors with harmonic analysis](https://akshay.bio/variable-expectations/posts/graphs/Graph-harmonic.html)

- [Graph sparsification](https://akshay.bio/variable-expectations/posts/graphs/Graph-sparsification.html)

- [Trees and finding hierarchies](https://akshay.bio/variable-expectations/posts/graphs/Hierarchical-clustering.html)

- [Learning with the graph Laplacian and trend filtering](https://akshay.bio/variable-expectations/posts/graphs/Laplacian-regularization.html)

- [Localized graph descriptors](https://akshay.bio/variable-expectations/posts/graphs/Localized-graph-descriptions.html)

- [Stationarity, coherence, and generating signals on graphs](https://akshay.bio/variable-expectations/posts/graphs/Generating-signals-from-kNN.html)

- [Conformal prediction with confidence for chemistry](https://akshay.bio/variable-expectations/posts/neighborhoods/Conformal-metric.html)

- [Modeling distributions on data neighborhoods](https://akshay.bio/variable-expectations/posts/neighborhoods/Neighborhoods-and-distributions.html)

- [Which data are intrinsically hard to classify?](https://akshay.bio/variable-expectations/posts/neighborhoods/Nonparametric-margins.html)

- [Structured correlation coefficients](https://akshay.bio/variable-expectations/posts/statistics/A-new-correlation-coefficient.html)
"""


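mermaid_tools_path = "../../files/utils/mermaid_tools.py"

# Load the local helper module from its file path.
# (importlib.util replaces the deprecated SourceFileLoader.load_module().)
import importlib.util
spec = importlib.util.spec_from_file_location("mermaid_tools", mermaid_tools_path)
mermaid_tools = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mermaid_tools)
import re, pandas as pd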

# Match markdown list items of the form "- [title](url)".
# Using * rather than + lets the title or the URL be empty.
pat = re.compile(r'-\s*\[([^\]]*)\]\(([^)]*)\)')

titles, urls, names, ids = [], [], [], []
for title, url in pat.findall(s):
    if title or url:            # keep the row if at least one field is non-empty
        titles.append(title)
        urls.append(url)
        post_name = url.split("/")[-1].split(".")[0].replace("-", " ")
        # Get initial of each word in post_name
        initials = "".join(word[0] for word in post_name.split() if word)
        names.append(post_name)
        ids.append(initials)

node_df = pd.DataFrame({
    "label": titles,
    "url": urls, 
    "name": names, 
    "id": ids, 
    "subgraph": ["Construction", "Algorithms", "Structures", "Algorithms", "Algorithms", "Structures", "Algorithms", "Algorithms", "Neighborhoods", "Evaluation", "Neighborhoods", "Neighborhoods", "Evaluation"],
})

# node_df['text_color'] = '#000000'
# node_df['text_style'] = 'bold'
node_df
|    | label | url | name | id | subgraph |
|---:|---|---|---|---|---|
| 0 | Constructing graphs | https://akshay.bio/variable-expectations/posts... | Neighborhood graph construction techniques | Ngct | Construction |
| 1 | Calculus on graphs as manifolds | https://akshay.bio/variable-expectations/posts... | Graph calculus | Gc | Algorithms |
| 2 | Graph motifs and higher-order structures | https://akshay.bio/variable-expectations/posts... | Graph motifs | Gm | Structures |
| 3 | Learning from neighbors with harmonic analysis | https://akshay.bio/variable-expectations/posts... | Graph harmonic | Gh | Algorithms |
| 4 | Graph sparsification | https://akshay.bio/variable-expectations/posts... | Graph sparsification | Gs | Algorithms |
| 5 | Trees and finding hierarchies | https://akshay.bio/variable-expectations/posts... | Hierarchical clustering | Hc | Structures |
| 6 | Learning with the graph Laplacian and trend fi... | https://akshay.bio/variable-expectations/posts... | Laplacian regularization | Lr | Algorithms |
| 7 | Localized graph descriptors | https://akshay.bio/variable-expectations/posts... | Localized graph descriptions | Lgd | Algorithms |
| 8 | Stationarity, coherence, and generating signal... | https://akshay.bio/variable-expectations/posts... | Generating signals from kNN | Gsfk | Neighborhoods |
| 9 | Conformal prediction with confidence for chemi... | https://akshay.bio/variable-expectations/posts... | Conformal metric | Cm | Evaluation |
| 10 | Modeling distributions on data neighborhoods | https://akshay.bio/variable-expectations/posts... | Neighborhoods and distributions | Nad | Neighborhoods |
| 11 | Which data are intrinsically hard to classify? | https://akshay.bio/variable-expectations/posts... | Nonparametric margins | Nm | Neighborhoods |
| 12 | Structured correlation coefficients | https://akshay.bio/variable-expectations/posts... | A new correlation coefficient | Ancc | Evaluation |
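
The `id` column is derived from each post's URL slug, and Mermaid needs these ids to be unique. Here is a small self-contained sketch of that derivation for a single link, plus a duplicate check. The helper name `slug_initials` and the `duplicates` check are illustrative assumptions, not part of the original cell.

```python
import re

# Same pattern as in the cell above: captures the link text and the target URL.
pat = re.compile(r'-\s*\[([^\]]*)\]\(([^)]*)\)')

def slug_initials(url):
    """Hypothetical helper: '.../Graph-calculus.html' -> 'Gc'."""
    slug = url.split("/")[-1].split(".")[0]
    return "".join(word[0] for word in slug.split("-") if word)

line = ("- [Calculus on graphs as manifolds]"
        "(https://akshay.bio/variable-expectations/posts/graphs/Graph-calculus.html)")
title, url = pat.findall(line)[0]
node_id = slug_initials(url)

# Initials can collide (two slugs with the same first letters), so check before use.
ids = [node_id, slug_initials("posts/graphs/Graph-motifs.html")]
duplicates = {i for i in ids if ids.count(i) > 1}
```

If `duplicates` were non-empty, two posts would map to the same node, so a longer id (for example the full slug) would be a safer choice.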
CODE
# Sanity check: one subgraph label per parsed post.
len(node_df["subgraph"]) == len(titles)
True
CODE
edge_df = pd.DataFrame({
    "src": ["Construction", "Construction"],
    "dst": ["Algorithms", "Structures"],
    "label": ["", ""], "color": ["", ""], "style": ["", ""],
    #"label": ["", "various", "", ""], "color": ["", "cb181d", "", ""], "style": ["", "dotted", "", ""],
    "url": ["", ""],
})
CODE
md_file_path = "Graphs-expressing-structure.mmd"
html_file_path = "Graphs-expressing-structure.html"

m = mermaid_tools.build_mermaid(
    node_df, edge_df, 
    direction="TB", 
    output_path=md_file_path
)

mermaid_tools.mermaid_to_html(md_file_path, html_file_path)

from pathlib import Path
from IPython.display import Markdown, display

diagram = Path(md_file_path).read_text()
mermaid_str = f"```{{mermaid}}\n{diagram}\n```"  # doubled braces emit a literal {mermaid} fence label
display(Markdown(mermaid_str))
Wrote Graphs-expressing-structure.html

flowchart TB
    Construction --> Algorithms
    Construction --> Structures
subgraph Construction
    Ngct["Constructing graphs"]
click Ngct "https://akshay.bio/variable-expectations/posts/graphs/Neighborhood-graph-construction-techniques.html" "Constructing graphs"
end
subgraph Algorithms
    Gc["Calculus on graphs as manifolds"]
click Gc "https://akshay.bio/variable-expectations/posts/graphs/Graph-calculus.html" "Calculus on graphs as manifolds"
    Gh["Learning from neighbors with harmonic analysis"]
click Gh "https://akshay.bio/variable-expectations/posts/graphs/Graph-harmonic.html" "Learning from neighbors with harmonic analysis"
    Gs["Graph sparsification"]
click Gs "https://akshay.bio/variable-expectations/posts/graphs/Graph-sparsification.html" "Graph sparsification"
    Lr["Learning with the graph Laplacian and trend filtering"]
click Lr "https://akshay.bio/variable-expectations/posts/graphs/Laplacian-regularization.html" "Learning with the graph Laplacian and trend filtering"
    Lgd["Localized graph descriptors"]
click Lgd "https://akshay.bio/variable-expectations/posts/graphs/Localized-graph-descriptions.html" "Localized graph descriptors"
end
subgraph Structures
    Gm["Graph motifs and higher-order structures"]
click Gm "https://akshay.bio/variable-expectations/posts/graphs/Graph-motifs.html" "Graph motifs and higher-order structures"
    Hc["Trees and finding hierarchies"]
click Hc "https://akshay.bio/variable-expectations/posts/graphs/Hierarchical-clustering.html" "Trees and finding hierarchies"
end
subgraph Neighborhoods
    Gsfk["Stationarity, coherence, and generating signals on graphs"]
click Gsfk "https://akshay.bio/variable-expectations/posts/graphs/Generating-signals-from-kNN.html" "Stationarity, coherence, and generating signals on graphs"
    Nad["Modeling distributions on data neighborhoods"]
click Nad "https://akshay.bio/variable-expectations/posts/neighborhoods/Neighborhoods-and-distributions.html" "Modeling distributions on data neighborhoods"
    Nm["Which data are intrinsically hard to classify?"]
click Nm "https://akshay.bio/variable-expectations/posts/neighborhoods/Nonparametric-margins.html" "Which data are intrinsically hard to classify?"
end
subgraph Evaluation
    Cm["Conformal prediction with confidence for chemistry"]
click Cm "https://akshay.bio/variable-expectations/posts/neighborhoods/Conformal-metric.html" "Conformal prediction with confidence for chemistry"
    Ancc["Structured correlation coefficients"]
click Ancc "https://akshay.bio/variable-expectations/posts/statistics/A-new-correlation-coefficient.html" "Structured correlation coefficients"
end
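
The source of `mermaid_tools.build_mermaid` is not shown in this post. The sketch below is only a guess at how a function of that kind could assemble flowchart text like the output above from node and edge records. `build_flowchart` and its arguments are hypothetical, and it uses plain dicts rather than DataFrames to stay self-contained.

```python
def build_flowchart(nodes, edges, direction="TB"):
    """Hypothetical stand-in for mermaid_tools.build_mermaid (not the real one)."""
    lines = [f"flowchart {direction}"]
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    # Group nodes by subgraph, preserving first-seen order.
    groups = {}
    for node in nodes:
        groups.setdefault(node["subgraph"], []).append(node)
    for name, members in groups.items():
        lines.append(f"subgraph {name}")
        for node in members:
            lines.append(f'    {node["id"]}["{node["label"]}"]')
            lines.append(f'click {node["id"]} "{node["url"]}" "{node["label"]}"')
        lines.append("end")
    return "\n".join(lines)

nodes = [
    {"id": "Ngct", "label": "Constructing graphs", "subgraph": "Construction",
     "url": "https://akshay.bio/variable-expectations/posts/graphs/Neighborhood-graph-construction-techniques.html"},
    {"id": "Gc", "label": "Calculus on graphs as manifolds", "subgraph": "Algorithms",
     "url": "https://akshay.bio/variable-expectations/posts/graphs/Graph-calculus.html"},
]
text = build_flowchart(nodes, [("Construction", "Algorithms")])
```

Grouping by first-seen subgraph gives the same ordering as the diagram above (Construction, Algorithms, Structures, Neighborhoods, Evaluation), though the real helper may order things differently.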


Citation

BibTeX citation:
@online{balsubramani,
  author = {Balsubramani, Akshay},
  title = {Using Graphs to Express Structure},
  langid = {en}
}
For attribution, please cite this work as:
Balsubramani, Akshay. n.d. “Using Graphs to Express Structure.”