Graphs as interactive diagrams

dataviz

Translating between interactive flowcharts and data science

Author

Akshay Balsubramani

Graphs are a language for decision-making

Decision-making almost always involves moving from one situation to another under uncertainty and constraints. That “movement” has a direction (from current state → next state) and often branches (different actions or external events). Graphs are the natural data structure for representing exactly those ideas:

Sequential structure → paths (often trees).
Alternative futures / feedback → branching & cycles.
Quantities attached to situations or transitions → node/edge attributes.

We’ll focus on operational decisions under uncertainty, where we make sequential choices, move through observable or latent states, and need human‑legible artifacts for review, audit, and iteration.

Sequential choices → trees

Many decisions are temporal and branching: choose an action, observe an outcome, then choose again. A tree captures exactly that structure. Trees make conditionality explicit (“if A then B else C”). They also support roll‑ups such as expected values: over the set \(\mathcal{P}\) of root‑to‑leaf paths, with node costs \(c_i\), the expected total cost is \(\sum_{P \in \mathcal{P}} \Pr(P)\,\sum_{i \in P} c_i\). They are legible to non‑technical stakeholders (product, ops, legal), and they localize edits: changing one branch’s probability or cost changes only the roll‑ups along the path from that branch back to the root, not the rest of the model.

flowchart TD
    S{"Choose approach"}
    A[Build MVP]
    B[Run Survey]
    C((Strong demand))
    D((Weak demand))
    E([Ship])
    F([Pivot])

    S -->|cost=$100k| A
    S -->|cost=$30k| B
    A -->|p=0.6| C
    A -->|p=0.4| D
    B -->|p=0.5| C
    B -->|p=0.5| D
    C -->|EV=$1.2M| E
    D -->|EV=$0.2M| F
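The roll‑up for this toy tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EV labels are read as payoffs in dollars, the edge costs are treated as paid up front, and the names `tree` and `expected_net_value` are hypothetical helpers, not part of any library.

```python
# Toy decision tree from the flowchart above.
# Assumption: EV labels are payoffs in dollars; edge costs are paid up front.
tree = {
    "Build MVP": {"cost": 100_000, "outcomes": [(0.6, 1_200_000), (0.4, 200_000)]},
    "Run Survey": {"cost": 30_000, "outcomes": [(0.5, 1_200_000), (0.5, 200_000)]},
}


def expected_net_value(option):
    """Probability-weighted payoff minus the up-front cost of the option."""
    expected_payoff = sum(p * payoff for p, payoff in option["outcomes"])
    return expected_payoff - option["cost"]


net_values = {name: expected_net_value(opt) for name, opt in tree.items()}
best = max(net_values, key=net_values.get)

for name, value in net_values.items():
    print(f"{name}: {value:,.0f}")
print("Best first step:", best)
```

Under these assumptions the MVP branch comes out slightly ahead (about $700k versus $670k), and changing a single probability only requires recomputing the option that contains it.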

Business intelligence suites (e.g., Salesforce Einstein, Microsoft Power BI) expose “decision tree” explainers, for a few common reasons:

Interpretability – each split is a plain-language rule, so every root-to-leaf path yields a traceable rationale.
Structural alignment with KPIs – branches align with “funnel” metrics (region → segment → customer) and with organizational structures.
Perturbability – analysts prune or graft branches to test policy changes.

The underlying structure is still a directed graph with attributes, rendered tidily for narrative clarity.

Stateful processes are graphical, too

When the same state can be revisited (e.g., inventory levels, patient health states, Markov chains), a pure tree explodes in size. State diagrams compress identical states into single nodes and allow cycles. This type of graphical model allows for meaningful reachability queries (“How likely is stock-out within 7 steps?”) and shortest-path optimizations (“Minimal steps back to In Stock”).

The natural ML/AI abstraction for this is a state diagram, which often corresponds to a Markov decision process:

States carry context: Under Review, Flagged‑High Risk, Resolved
Transitions encode actions or events with guards: severity > 3, timeout, approval received
Policies correspond to distributions over outgoing edges from each state
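A reachability query such as “How likely is stock-out within 7 steps?” can be sketched for a small Markov chain as follows. The states and transition probabilities here are invented purely for illustration, and `prob_reach_within` is a hypothetical helper name. The idea is to propagate probability mass step by step and set aside whatever mass has already hit the target.

```python
# Hypothetical inventory chain; the probabilities are illustrative only.
P = {
    "In Stock": {"In Stock": 0.8, "Low": 0.2},
    "Low": {"In Stock": 0.5, "Low": 0.3, "Stock-out": 0.2},
    "Stock-out": {"In Stock": 0.6, "Stock-out": 0.4},
}


def prob_reach_within(P, start, target, steps):
    """Probability of visiting `target` at least once within `steps` transitions."""
    if start == target:
        return 1.0
    alive = {start: 1.0}  # mass that has not yet reached the target
    reached = 0.0         # mass that has reached the target at some point
    for _ in range(steps):
        nxt = {}
        for state, mass in alive.items():
            for succ, p in P[state].items():
                if succ == target:
                    reached += mass * p
                else:
                    nxt[succ] = nxt.get(succ, 0.0) + mass * p
        alive = nxt
    return reached


p0 = prob_reach_within(P, "In Stock", "Stock-out", 0)
p1 = prob_reach_within(P, "In Stock", "Stock-out", 1)
p2 = prob_reach_within(P, "In Stock", "Stock-out", 2)
p7 = prob_reach_within(P, "In Stock", "Stock-out", 7)
print(f"P(stock-out within 7 steps) = {p7:.3f}")
```

Treating the target as absorbing in this way avoids double counting paths that pass through it more than once.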

A minimal Mermaid state diagram illustrates these ideas, including a self-loop (Monitor → Monitor) and the unnamed start node [*].

stateDiagram-v2
    [*] --> Idle
    Idle --> Investigate: alert
    Investigate --> Remediate: HIGH severity
    Investigate --> Monitor: LOW severity
    Remediate --> Monitor
    Monitor --> Monitor
    Monitor --> Idle: resolve
    Monitor --> Investigate: regress
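A shortest-path query over this diagram (“minimal steps back to Idle”) can be sketched with a plain breadth-first search. The edge list below is transcribed by hand from the diagram, omitting the `[*]` start marker; `shortest_path` is a hypothetical helper, not a Mermaid feature.

```python
from collections import deque

# Transitions transcribed from the state diagram above (start marker omitted).
edges = [
    ("Idle", "Investigate"),
    ("Investigate", "Remediate"),
    ("Investigate", "Monitor"),
    ("Remediate", "Monitor"),
    ("Monitor", "Monitor"),
    ("Monitor", "Idle"),
    ("Monitor", "Investigate"),
]


def shortest_path(edges, start, goal):
    """Return a fewest-transitions path from start to goal, or None if unreachable."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:  # also skips self-loops such as Monitor -> Monitor
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


back_to_idle = shortest_path(edges, "Remediate", "Idle")
print(" -> ".join(back_to_idle))
```

Because identical states are merged into single nodes, the search stays small even though the equivalent tree of all possible histories would be unbounded.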

Interactive visualization for these graphs

For such graphs, interactive visualization matters: people grasp patterns faster when they can pan and zoom, collapse subtrees, or highlight cycles, which supports exploratory analysis and well-grounded storytelling.

Node+edge file formats for data analysis

For the purposes of data analysis, graphs are often specified by two parallel files.

A node file lists nodes, one per line, with associated metadata.
An edge file lists edges, one per line. Each has a source node and a destination node, as well as any associated metadata for that edge.
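As a minimal sketch of this two-file layout, the snippet below parses a node file and an edge file with Python's standard `csv` module and checks that every edge refers to a known node. The file contents and column names (`id`, `label`, `src`, `dst`) are assumptions chosen for illustration; in practice the data would live in files on disk.

```python
import csv
import io

# Hypothetical file contents, inlined as strings for the example.
nodes_csv = """id,label
Idle,Idle
Investigate,Investigate alert
Monitor,Monitor system
"""
edges_csv = """src,dst,label
Idle,Investigate,alert
Investigate,Monitor,
"""

nodes = list(csv.DictReader(io.StringIO(nodes_csv)))
edges = list(csv.DictReader(io.StringIO(edges_csv)))

# Basic integrity check: every edge endpoint must appear in the node file.
known_ids = {n["id"] for n in nodes}
dangling = [e for e in edges if e["src"] not in known_ids or e["dst"] not in known_ids]

print(f"{len(nodes)} nodes, {len(edges)} edges, {len(dangling)} dangling edges")
```

Keeping the check separate from rendering means broken references are caught before any diagram is generated.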

This covers all forms of graphs that are normally used in decision-making. It’s great for joins, versioning, and analysis, but a translator is needed for common visualization languages. Mermaid is a low‑friction target: it is popular because it’s plaintext, renders in docs and wikis, and supports labels, shapes, and links.

An example

As a working example, we can illustrate these concepts with a schematic of a standard drug discovery pipeline, as is common in pharma and biotech. This is essentially a linear graph, with some of the nodes and edges styled differently from the rest to denote special semantics. Writing it in node+edge form mimics the way such charts are often represented.

CODE

import pandas as pd, itertools, pathlib
from typing import Optional, List

node_dict = {
    "id": ["TID", "VAL", "HTS", "H2L", "PRE", "IND", "P1", "P2", "P3", "NDA", "FDA"],
    "label": ["Target Identification", "Target Validation?", "Hit Discovery (HTS)", "Hit‑to‑Lead / Lead Optimization", "Pre-clinical Safety", "IND Filing", "Phase I", "Phase II", "Phase III", "NDA Submission", "Regulatory Approval"],
    "shape": ["rect", "diamond", "rect", "rect", "rect", "rect", "rect", "rect", "rect", "rect", "rect"],
    "fill": ["#c6dbef", "#fdd49e", "#c6dbef", "#c6dbef", "#c6dbef", "#c6dbef", "#c7e9c0", "#c7e9c0", "#c7e9c0", "#c6dbef", "#c6dbef"],
    "stroke": ["#3182bd", "#e6550d", "#3182bd", "#3182bd", "#3182bd", "#3182bd", "#238b45", "#238b45", "#238b45", "#3182bd", "#3182bd"],
    "url": ["https://en.wikipedia.org/wiki/Drug_target", "https://www.cell.com/trends/pharmacological-sciences/fulltext/S0165-6147(23)00137-2", "https://en.wikipedia.org/wiki/High-throughput_screening", "", "", "https://en.wikipedia.org/wiki/Investigational_New_Drug", "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_I", "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_II", "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_III", "https://en.wikipedia.org/wiki/New_Drug_Application", "https://en.wikipedia.org/wiki/Food_and_Drug_Administration"], 
    "subgraph": ["", "", "", "", "", "", "Clinical + post-clinical", "Clinical + post-clinical", "Clinical + post-clinical", "Clinical + post-clinical", "Clinical + post-clinical"],
}

node_df = pd.DataFrame(node_dict)

node_df['text_color'] = '#000000'
node_df['text_style'] = 'bold'

edge_dict = {
    "src": ["TID", "VAL", "HTS", "H2L", "PRE", "IND", "P1", "P2", "P3", "NDA"],
    "dst": ["VAL", "HTS", "H2L", "PRE", "IND", "P1", "P2", "P3", "NDA", "FDA"],
    "label": ["", "various", "", "", "", "", "", "", "", ""],
    "color": ["", "cb181d", "", "", "", "", "", "", "", ""],
    "style": ["", "dotted", "", "", "", "", "", "", "", ""],
    "url": ["", "", "", "", "", "", "", "", "", ""],
}

edge_df = pd.DataFrame(edge_dict)

node_df

	id	label	shape	fill	stroke	url	subgraph	text_color	text_style
0	TID	Target Identification	rect	#c6dbef	#3182bd	https://en.wikipedia.org/wiki/Drug_target		#000000	bold
1	VAL	Target Validation?	diamond	#fdd49e	#e6550d	https://www.cell.com/trends/pharmacological-sc...		#000000	bold
2	HTS	Hit Discovery (HTS)	rect	#c6dbef	#3182bd	https://en.wikipedia.org/wiki/High-throughput_...		#000000	bold
3	H2L	Hit‑to‑Lead / Lead Optimization	rect	#c6dbef	#3182bd			#000000	bold
4	PRE	Pre-clinical Safety	rect	#c6dbef	#3182bd			#000000	bold
5	IND	IND Filing	rect	#c6dbef	#3182bd	https://en.wikipedia.org/wiki/Investigational_...		#000000	bold
6	P1	Phase I	rect	#c7e9c0	#238b45	https://en.wikipedia.org/wiki/Phases_of_clinic...	Clinical + post-clinical	#000000	bold
7	P2	Phase II	rect	#c7e9c0	#238b45	https://en.wikipedia.org/wiki/Phases_of_clinic...	Clinical + post-clinical	#000000	bold
8	P3	Phase III	rect	#c7e9c0	#238b45	https://en.wikipedia.org/wiki/Phases_of_clinic...	Clinical + post-clinical	#000000	bold
9	NDA	NDA Submission	rect	#c6dbef	#3182bd	https://en.wikipedia.org/wiki/New_Drug_Applica...	Clinical + post-clinical	#000000	bold
10	FDA	Regulatory Approval	rect	#c6dbef	#3182bd	https://en.wikipedia.org/wiki/Food_and_Drug_Ad...	Clinical + post-clinical	#000000	bold

Translating data-science graphs → interactive flowcharts

This process of translation is largely a mechanical exercise, which we give code for below. A few points turn out to be key in writing this code.

Edge metadata → label: join selected columns as key=value pairs.
Keep stable IDs (sanitize to alphanumerics/underscore) and human labels separately.
Preserve ordering where it helps reading (top‑down graph TD), but don’t rely on it semantically.
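The first two points can be sketched as a pair of small helpers. Both names (`sanitize_id`, `edge_label`) are hypothetical, and the digit-prefix rule is an assumption made here for readability rather than a documented Mermaid requirement. The label helper expects plain dict rows.

```python
import re


def sanitize_id(raw):
    """Map an arbitrary name to an ID made of letters, digits, and underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", str(raw).strip())
    # Assumption: prefix IDs that start with a digit so they read as identifiers.
    return f"n_{cleaned}" if cleaned[:1].isdigit() else cleaned


def edge_label(row, columns):
    """Join the selected non-empty columns of a dict row as 'key=value' pairs."""
    return ", ".join(
        f"{col}={row[col]}" for col in columns if row.get(col) not in (None, "")
    )


print(sanitize_id("Phase I"))
print(edge_label({"p": 0.6, "cost": "$100k"}, ["p", "cost"]))
```

The human-readable label stays in its own column, so sanitizing the ID never changes what the reader sees on the node.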

A node-edge-format → Mermaid converter

A first cut at this problem involves writing a compact syntactic converter in Python. Let’s demonstrate with the following directed graph, a toy schematic of a drug discovery pipeline:

graph LR
    TID --> VAL
    VAL -->|various| HTS
    HTS --> H2L
    H2L --> LO
    LO --> PRE
    PRE --> IND
    IND --> P1
    subgraph Clinical + post-clinical
    P1 --> P2
    P2 --> P3
    P3 --> NDA
    NDA --> FDA
    end
    TID["Target Identification"]
click TID "https://en.wikipedia.org/wiki/Drug_target" "Target Identification"
style TID fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    VAL{"Target Validation?"}
click VAL "https://www.cell.com/trends/pharmacological-sciences/fulltext/S0165-6147(23)00137-2" "Target Validation?"
style VAL fill:#fdd49e,stroke:#e6550d,color:#000000,font-weight:bold
    HTS["Hit Discovery (HTS)"]
click HTS "https://en.wikipedia.org/wiki/High-throughput_screening" "Hit Discovery (HTS)"
style HTS fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    H2L["Hit‑to‑Lead"]
style H2L fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    LO["Lead Optimization"]
style LO fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    PRE["Pre-clinical Safety"]
style PRE fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    IND["IND Filing"]
click IND "https://en.wikipedia.org/wiki/Investigational_New_Drug" "IND Filing"
style IND fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    P1["Phase I"]
click P1 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_I" "Phase I"
style P1 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    P2["Phase II"]
click P2 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_II" "Phase II"
style P2 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    P3["Phase III"]
click P3 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_III" "Phase III"
style P3 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    NDA["NDA Submission"]
click NDA "https://en.wikipedia.org/wiki/New_Drug_Application" "NDA Submission"
style NDA fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    FDA["Regulatory Approval"]
click FDA "https://en.wikipedia.org/wiki/Food_and_Drug_Administration" "Regulatory Approval"
style FDA fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
linkStyle 1 stroke:#cb181d,stroke-dasharray:2 2

It’s useful to have a one-stop function that builds these and other types of charts, as Mermaid source (optionally fenced for Markdown), from node-edge representations. Here we provide one such function, build_mermaid. It accepts extra node- and edge-level columns that specify style parameters of the Mermaid chart per node and per edge. This requires a significant amount of Mermaid-specific bookkeeping, which is implemented below.

Subgraph grouping relies on an optional column in node_df, named by the node_subgraph parameter (default "subgraph"). Each row has either (a) NaN or an empty value, in which case the node goes to the global scope; or (b) a string, in which case the node is placed inside the subgraph with that name. For each distinct subgraph name we emit a subgraph NAME … end block containing all its nodes. Nodes without a subgraph remain at the top level.

Such code can be tested on the above stylized model of the drug discovery pipeline, to demonstrate some of the style and linking options in a simple specification.

CODE

import itertools, pathlib, pandas as pd
from typing import Optional, List, Dict

def build_mermaid(
    node_df: pd.DataFrame,
    edge_df: pd.DataFrame,
    output_path: str = "diagram.mmd",
    fence: bool = False,
    # ─── node columns ────────────────────────────────────────────────────
    node_id: str = "id",
    node_label: str = "label",
    node_shape: Optional[str] = "shape",
    node_fill: Optional[str] = "fill",
    node_stroke: Optional[str] = "stroke",
    node_url: Optional[str] = "url",
    node_tooltip: Optional[str] = "tooltip",
    node_text_color: Optional[str] = "text_color",
    node_text_style: Optional[str] = "text_style",
    node_subgraph: Optional[str] = "subgraph",
    # ─── edge columns ────────────────────────────────────────────────────
    edge_src: str = "src",
    edge_dst: str = "dst",
    edge_label: Optional[str] = "label",
    edge_color: Optional[str] = "color",
    edge_style: Optional[str] = "style",      # dashed | dotted
    edge_url: Optional[str] = "url",
    edge_label_color: Optional[str] = "label_color",
    edge_label_style: Optional[str] = "label_style",
    # ─── layout ──────────────────────────────────────────────────────────
    direction: str = "TD",
) -> str:
    """
    Build a Mermaid flow-chart from *node_df* and *edge_df* **with optional
    sub-graph grouping**.

    If *node_subgraph* names an existing column, every distinct non-blank
    value starts a block::

         subgraph <value>
             …nodes…
         end

    All styling logic (node shapes, colours, text styles, edge styles,
    helper vertices for clickable edges) is identical to the legacy
    `old_build_mermaid`, so diagrams are pixel-perfect when no subgraph
    column is present.

    Parameters
    ----------
    node_df, edge_df : pd.DataFrame
        Vertices and directed edges.
    output_path : str
        Target .mmd file.
    fence : bool, default False
        Wrap result in ```{mermaid}``` fences (for Quarto/Markdown).
    <column-name parameters>
        Override these if your DataFrame uses different headers; pass *None*
        when an attribute is absent.
    direction : {"TD","LR","RL","BT"}, default "TD"
        Graph orientation.

    Returns
    -------
    str
        Mermaid source that was written to *output_path*.
    """
    g: List[str] = [f"flowchart {direction}"]
    link_styles: List[str] = []
    edge_index, dummy_iter = 0, itertools.count()

    # helpers ------------------------------------------------------------
    get = lambda row, col, d="": row[col] if col and col in row and pd.notna(row[col]) else d
    css = lambda s: [c for t, c in (
                        ("italic",   "font-style:italic"),
                        ("bold",     "font-weight:bold"),
                        ("underline","text-decoration:underline")
                     ) if t in str(s)]
    colorize = lambda c: "" if not c else c if str(c).startswith("#") else f"#{c}"

    shape = {
        "rect"      : lambda i,l: f'{i}["{l}"]',
        "round"     : lambda i,l: f'{i}("{l}")',
        "circle"    : lambda i,l: f'{i}(("{l}"))',
        "stadium"   : lambda i,l: f'{i}(["{l}"])',
        "subroutine": lambda i,l: f'{i}[[{l}]]',
        "diamond"   : lambda i,l: f'{i}{{"{l}"}}',
    }

    # edge writer --------------------------------------------------------
    def add_edge(src, dst, lab, styles):
        nonlocal edge_index
        arrow = f' -->|{lab}| ' if lab else ' --> '
        g.append(f"    {src}{arrow}{dst}")
        if styles:
            x = ",".join(styles)
            link_styles.append(f"linkStyle {edge_index} {x}")
        edge_index += 1

    # edges (done first) -------------------------------------------------
    for _, e in edge_df.iterrows():
        lab, src, dst, url = get(e, edge_label), get(e, edge_src), get(e, edge_dst), get(e, edge_url)

        e_styles: List[str] = []
        if colorize(get(e, edge_color)):
            e_styles.append(f"stroke:{colorize(get(e, edge_color))}")
        if get(e, edge_style).lower() in {"dashed", "dotted"}:
            pattern = "5 5" if get(e, edge_style).lower() == "dashed" else "2 2"
            e_styles.append(f"stroke-dasharray:{pattern}")
        if colorize(get(e, edge_label_color)):
            e_styles.append(f"color:{colorize(get(e, edge_label_color))}")
        e_styles += css(get(e, edge_label_style))

        if url:  # invisible helper vertex keeps the edge clickable
            helper = f"h{next(dummy_iter)}"
            tooltip = lab or "link"
            g += [
                f"    {src} --> {helper}",
                f'    {helper}[""]',
                f"style {helper} fill:transparent,stroke:transparent",
                f'click {helper} "{url}" "{tooltip}"',
            ]
            if e_styles:
                x = ",".join(e_styles)
                link_styles.append(f"linkStyle {edge_index} {x}")
            edge_index += 1
            add_edge(helper, dst, lab, e_styles)
        else:
            add_edge(src, dst, lab, e_styles)

    # group nodes by subgraph -------------------------------------------
    groups: Dict[str, List[pd.Series]] = {}
    for _, row in node_df.iterrows():
        grp = get(row, node_subgraph) if node_subgraph and node_subgraph in node_df.columns else ""
        groups.setdefault(str(grp), []).append(row)

    def render_node(row: pd.Series):
        nid, lbl = get(row, node_id), get(row, node_label, get(row, node_id))
        tooltip = get(row, node_tooltip, lbl)

        g.append("    " + shape.get(get(row, node_shape, "rect").lower(), shape["rect"])(nid, lbl))

        if get(row, node_url):
            g.append(f'click {nid} "{get(row, node_url)}" "{tooltip}"')

        n_styles: List[str] = []
        if colorize(get(row, node_fill)):
            n_styles.append(f"fill:{colorize(get(row, node_fill))}")
        if colorize(get(row, node_stroke)):
            n_styles.append(f"stroke:{colorize(get(row, node_stroke))}")
        if colorize(get(row, node_text_color)):
            n_styles.append(f"color:{colorize(get(row, node_text_color))}")
        n_styles += css(get(row, node_text_style))
        if n_styles:
            g.append(f"style {nid} {','.join(n_styles)}")

    for sg, rows in groups.items():
        if sg and sg.lower() not in {"", "nan"}:
            g.append(f"subgraph {sg}")
        for r in rows:
            render_node(r)
        if sg and sg.lower() not in {"", "nan"}:
            g.append("end")

    g.extend(link_styles)
    if fence:
        g = ["```{mermaid}", *g, "```"]

    text = "\n".join(g)
    pathlib.Path(output_path).write_text(text)
    return text

Running the translation code on these dataframes gives the desired stylized pipeline, with supporting links as necessary.

CODE

md_file_path = "graph.mmd"

build_mermaid(
    node_df, edge_df, 
    direction="LR", 
    output_path=md_file_path
)

from pathlib import Path
from IPython.display import Markdown, display

diagram = Path(md_file_path).read_text()
# Double the braces so the f-string emits a literal "{mermaid}" fence label
mermaid_str = f"```{{mermaid}}\n{diagram}\n```"
display(Markdown(mermaid_str))

flowchart LR
    TID --> VAL
    VAL -->|various| HTS
    HTS --> H2L
    H2L --> PRE
    PRE --> IND
    IND --> P1
    P1 --> P2
    P2 --> P3
    P3 --> NDA
    NDA --> FDA
    TID["Target Identification"]
click TID "https://en.wikipedia.org/wiki/Drug_target" "Target Identification"
style TID fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    VAL{"Target Validation?"}
click VAL "https://www.cell.com/trends/pharmacological-sciences/fulltext/S0165-6147(23)00137-2" "Target Validation?"
style VAL fill:#fdd49e,stroke:#e6550d,color:#000000,font-weight:bold
    HTS["Hit Discovery (HTS)"]
click HTS "https://en.wikipedia.org/wiki/High-throughput_screening" "Hit Discovery (HTS)"
style HTS fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    H2L["Hit‑to‑Lead / Lead Optimization"]
style H2L fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    PRE["Pre-clinical Safety"]
style PRE fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    IND["IND Filing"]
click IND "https://en.wikipedia.org/wiki/Investigational_New_Drug" "IND Filing"
style IND fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
subgraph Clinical + post-clinical
    P1["Phase I"]
click P1 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_I" "Phase I"
style P1 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    P2["Phase II"]
click P2 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_II" "Phase II"
style P2 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    P3["Phase III"]
click P3 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_III" "Phase III"
style P3 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    NDA["NDA Submission"]
click NDA "https://en.wikipedia.org/wiki/New_Drug_Application" "NDA Submission"
style NDA fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    FDA["Regulatory Approval"]
click FDA "https://en.wikipedia.org/wiki/Food_and_Drug_Administration" "Regulatory Approval"
style FDA fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
end
linkStyle 1 stroke:#cb181d,stroke-dasharray:2 2

Export to HTML

Exporting such a chart as a standalone HTML file only requires loading the freely available Mermaid JavaScript library. This rounds off our implementation with a portable file for any Mermaid chart.

CODE

def mermaid_to_html(
    mermaid_file_path: str,
    html_out: str
):
    HTML_TEMPLATE="""<!DOCTYPE html>
<html><head><meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<style>.node rect{{rx:6;ry:6;}}</style>
</head><body>
<div class="mermaid">
{diagram}
</div>
<script>mermaid.initialize({{startOnLoad:true}});</script>
</body></html>"""
    diagram = Path(mermaid_file_path).read_text()
    pathlib.Path(html_out).write_text(HTML_TEMPLATE.format(diagram=diagram))
    print(f"Wrote {html_out}")


mermaid_to_html("graph.mmd", "graph.html")

Wrote graph.html

The result can be opened in any browser, and the HTML file is barely bigger than the Mermaid file it wraps. It essentially just loads the freely available JavaScript implementation of Mermaid from a CDN, which does the heavy lifting.

Practical themes

As all this illustrates, it’s extremely useful to have purely text-based representations of graphs, for ease of manipulation and storage. Mermaid is a widely used open-source format for such text representations of networks, graphs, and flowcharts, rendered by a JavaScript library.

The structure of graphs gives ample opportunities to defer computation and rendering, which makes many interactive manipulations possible. Large graphs can be summarized with subgraphs, letting users drill in interactively. These can also provide the right abstractions for efficient deferred decision-making and rendering.
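One way to sketch the summarization idea is to collapse every node into its subgraph and keep only the edges that cross group boundaries. The helper name `collapse_to_groups` and the shortened group name "Clinical" are assumptions for illustration; the node IDs are borrowed from the drug discovery example.

```python
def collapse_to_groups(node_group, edges):
    """Summarize a graph by replacing each node with its group, if it has one.

    node_group: dict mapping node id -> group name ("" or missing means ungrouped).
    edges: iterable of (src, dst) pairs.
    Returns de-duplicated summary edges in first-seen order. Edges internal to
    one group, and self-loops, are dropped.
    """
    def representative(node):
        return node_group.get(node) or node

    summary = []
    for src, dst in edges:
        pair = (representative(src), representative(dst))
        if pair[0] != pair[1] and pair not in summary:
            summary.append(pair)
    return summary


node_group = {"P1": "Clinical", "P2": "Clinical", "P3": "Clinical",
              "NDA": "Clinical", "FDA": "Clinical"}
edges = [("PRE", "IND"), ("IND", "P1"), ("P1", "P2"),
         ("P2", "P3"), ("P3", "NDA"), ("NDA", "FDA")]

summary_edges = collapse_to_groups(node_group, edges)
print(summary_edges)
```

The collapsed edge list can be rendered first, with the full contents of a group rendered only when a reader asks to drill into it.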

Ultimately, extremely rich relational data in data science are typically represented by node and edge files. Visualizing these is often only possible incompletely, a few layers at a time, in a summarized manner. Keeping the canonical data in node/edge tables, with lightweight translators like the one above to produce diagrams for documentation and review, preserves rigor and information while keeping changes trackable.

Exporting as code

This functionality is useful to have as a separate script.

CODE

all_code = r'''import pandas as pd
from typing import Optional, List, Dict
import itertools
import pathlib
from pathlib import Path


def build_mermaid(
    node_df: pd.DataFrame,
    edge_df: pd.DataFrame,
    output_path: str = "diagram.mmd",
    fence: bool = False,
    # ─── node columns ────────────────────────────────────────────────────
    node_id: str = "id",
    node_label: str = "label",
    node_shape: Optional[str] = "shape",
    node_fill: Optional[str] = "fill",
    node_stroke: Optional[str] = "stroke",
    node_url: Optional[str] = "url",
    node_tooltip: Optional[str] = "tooltip",
    node_text_color: Optional[str] = "text_color",
    node_text_style: Optional[str] = "text_style",
    node_subgraph: Optional[str] = "subgraph",
    # ─── edge columns ────────────────────────────────────────────────────
    edge_src: str = "src",
    edge_dst: str = "dst",
    edge_label: Optional[str] = "label",
    edge_color: Optional[str] = "color",
    edge_style: Optional[str] = "style",      # dashed | dotted
    edge_url: Optional[str] = "url",
    edge_label_color: Optional[str] = "label_color",
    edge_label_style: Optional[str] = "label_style",
    # ─── layout ──────────────────────────────────────────────────────────
    direction: str = "TD",
) -> str:
    """
    Build a Mermaid flow-chart from *node_df* and *edge_df* **with optional
    sub-graph grouping**.

    If *node_subgraph* names an existing column, every distinct non-blank
    value starts a block::

         subgraph <value>
             …nodes…
         end

    All styling logic (node shapes, colours, text styles, edge styles,
    helper vertices for clickable edges) is identical to the legacy
    `old_build_mermaid`, so diagrams are pixel-perfect when no subgraph
    column is present.

    Parameters
    ----------
    node_df, edge_df : pd.DataFrame
        Vertices and directed edges.
    output_path : str
        Target .mmd file.
    fence : bool, default False
        Wrap result in ```{mermaid}``` fences (for Quarto/Markdown).
    <column-name parameters>
        Override these if your DataFrame uses different headers; pass *None*
        when an attribute is absent.
    direction : {"TD","LR","RL","BT"}, default "TD"
        Graph orientation.

    Returns
    -------
    str
        Mermaid source that was written to *output_path*.
    """
    g: List[str] = [f"flowchart {direction}"]
    link_styles: List[str] = []
    edge_index, dummy_iter = 0, itertools.count()

    # helpers ------------------------------------------------------------
    get = lambda row, col, d="": row[col] if col and col in row and pd.notna(row[col]) else d
    css = lambda s: [c for t, c in (
                        ("italic",   "font-style:italic"),
                        ("bold",     "font-weight:bold"),
                        ("underline","text-decoration:underline")
                     ) if t in str(s)]
    colorize = lambda c: "" if not c else c if str(c).startswith("#") else f"#{c}"

    shape = {
        "rect"      : lambda i,l: f'{i}["{l}"]',
        "round"     : lambda i,l: f'{i}("{l}")',
        "circle"    : lambda i,l: f'{i}(("{l}"))',
        "stadium"   : lambda i,l: f'{i}(["{l}"])',
        "subroutine": lambda i,l: f'{i}[[{l}]]',
        "diamond"   : lambda i,l: f'{i}{{"{l}"}}',
    }

    # edge writer --------------------------------------------------------
    def add_edge(src, dst, lab, styles):
        nonlocal edge_index
        arrow = f' -->|{lab}| ' if lab else ' --> '
        g.append(f"    {src}{arrow}{dst}")
        if styles:
            x = ",".join(styles)
            link_styles.append(f"linkStyle {edge_index} {x}")
        edge_index += 1

    # edges (done first) -------------------------------------------------
    for _, e in edge_df.iterrows():
        lab, src, dst, url = get(e, edge_label), get(e, edge_src), get(e, edge_dst), get(e, edge_url)

        e_styles: List[str] = []
        if colorize(get(e, edge_color)):
            e_styles.append(f"stroke:{colorize(get(e, edge_color))}")
        if get(e, edge_style).lower() in {"dashed", "dotted"}:
            pattern = "5 5" if get(e, edge_style).lower() == "dashed" else "2 2"
            e_styles.append(f"stroke-dasharray:{pattern}")
        if colorize(get(e, edge_label_color)):
            e_styles.append(f"color:{colorize(get(e, edge_label_color))}")
        e_styles += css(get(e, edge_label_style))

        if url:  # invisible helper vertex keeps the edge clickable
            helper = f"h{next(dummy_iter)}"
            tooltip = lab or "link"
            g += [
                f"    {src} --> {helper}",
                f'    {helper}[""]',
                f"style {helper} fill:transparent,stroke:transparent",
                f'click {helper} "{url}" "{tooltip}"',
            ]
            if e_styles:
                x = ",".join(e_styles)
                link_styles.append(f"linkStyle {edge_index} {x}")
            edge_index += 1
            add_edge(helper, dst, lab, e_styles)
        else:
            add_edge(src, dst, lab, e_styles)

    # group nodes by subgraph -------------------------------------------
    groups: Dict[str, List[pd.Series]] = {}
    for _, row in node_df.iterrows():
        grp = get(row, node_subgraph) if node_subgraph and node_subgraph in node_df.columns else ""
        groups.setdefault(str(grp), []).append(row)

    def render_node(row: pd.Series):
        nid, lbl = get(row, node_id), get(row, node_label, get(row, node_id))
        tooltip = get(row, node_tooltip, lbl)

        g.append("    " + shape.get(get(row, node_shape, "rect").lower(), shape["rect"])(nid, lbl))

        if get(row, node_url):
            g.append(f'click {nid} "{get(row, node_url)}" "{tooltip}"')

        n_styles: List[str] = []
        if colorize(get(row, node_fill)):
            n_styles.append(f"fill:{colorize(get(row, node_fill))}")
        if colorize(get(row, node_stroke)):
            n_styles.append(f"stroke:{colorize(get(row, node_stroke))}")
        if colorize(get(row, node_text_color)):
            n_styles.append(f"color:{colorize(get(row, node_text_color))}")
        n_styles += css(get(row, node_text_style))
        if n_styles:
            g.append(f"style {nid} {','.join(n_styles)}")

    for sg, rows in groups.items():
        if sg and sg.lower() not in {"", "nan"}:
            g.append(f"subgraph {sg}")
        for r in rows:
            render_node(r)
        if sg and sg.lower() not in {"", "nan"}:
            g.append("end")

    g.extend(link_styles)
    if fence:
        g = ["```{mermaid}", *g, "```"]

    text = "\n".join(g)
    pathlib.Path(output_path).write_text(text)
    return text



def mermaid_to_html(
    mermaid_file_path: str,
    html_out: str
):
    HTML_TEMPLATE="""<!DOCTYPE html>
<html><head><meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<style>.node rect{{rx:6;ry:6;}}</style>
</head><body>
<div class="mermaid">
{diagram}
</div>
<script>mermaid.initialize({{startOnLoad:true}});</script>
</body></html>"""
    diagram = Path(mermaid_file_path).read_text()
    pathlib.Path(html_out).write_text(HTML_TEMPLATE.format(diagram=diagram))
    print(f"Wrote {html_out}")
'''

CODE

file_pfx = "../../files/utils/"

with open(file_pfx + "mermaid_tools.py", "w", encoding="utf-8") as f:
    f.write(all_code)

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{balsubramani,
  author = {Balsubramani, Akshay},
  title = {Graphs as Interactive Diagrams},
  langid = {en}
}

For attribution, please cite this work as:

Balsubramani, Akshay. n.d. “Graphs as Interactive Diagrams.”