flowchart TD
    S{"Choose approach"}
    A[Build MVP]
    B[Run Survey]
    C((Strong demand))
    D((Weak demand))
    E([Ship])
    F([Pivot])
    S -->|cost=$100k| A
    S -->|cost=$30k| B
    A -->|p=0.6| C
    A -->|p=0.4| D
    B -->|p=0.5| C
    B -->|p=0.5| D
    C -->|EV=$1.2M| E
    D -->|EV=$0.2M| F
Graphs as interactive diagrams
Graphs are a language for decision-making
Decision-making almost always involves moving from one situation to another under uncertainty and constraints. That “movement” has a direction (from current state → next state) and often branches (different actions or external events). Graphs are the natural data structure for representing exactly those ideas:
Sequential structure → paths (often trees).
Alternative futures / feedback → branching & cycles.
Quantities attached to situations or transitions → node/edge attributes.
We’ll focus on operational decisions under uncertainty, where we make sequential choices, move through observable or latent states, and need human‑legible artifacts for review, audit, and iteration.
Sequential choices → trees
Many decisions are temporal and branching: choose an action, observe an outcome, then choose again. A tree captures exactly that structure. Trees make conditionality explicit (“if A then B else C”). They also support roll-ups such as expected values: over the set \(\mathcal{P}\) of root-to-leaf paths, with node costs \(c_i\), the expected total cost is \(\sum_{P \in \mathcal{P}} \Pr(P) \sum_{i \in P} c_i\). They are legible to non‑technical stakeholders (product, ops, legal), and they localize edits: changing one branch’s probability or cost changes only the roll-ups that depend on that branch, not the whole model.
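The roll-up formula above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it borrows the numbers from the opening flowchart and assumes (our reading, not something the chart states) that the edge costs can be attached to nodes A and B as negative values and the EV labels to nodes C and D as positive payoffs, all in thousands of dollars. The names `value`, `children`, `paths`, and `expected_value` are hypothetical.

```python
# Signed node values in $k (assumption: costs negative, payoffs positive).
value = {"A": -100, "B": -30, "C": 1200, "D": 200}

# Outgoing chance branches: node -> [(child, probability), ...].
children = {
    "A": [("C", 0.6), ("D", 0.4)],
    "B": [("C", 0.5), ("D", 0.5)],
}

def paths(node, prob=1.0, trail=()):
    """Yield (probability, nodes-on-path) for every path from node to a leaf."""
    trail = trail + (node,)
    branches = children.get(node, [])
    if not branches:          # leaf: the path is complete
        yield prob, trail
        return
    for child, p in branches:
        yield from paths(child, prob * p, trail)

def expected_value(root):
    """Sum over paths of Pr(path) times the sum of node values on that path."""
    return sum(p * sum(value[n] for n in trail) for p, trail in paths(root))

for option in ("A", "B"):
    print(option, round(expected_value(option), 2))

best = max(("A", "B"), key=expected_value)
```

Under these assumptions the two options roll up to roughly 700 and 670 (in $k), so the comparison between them is driven by the gap in up-front cost and in the probability of strong demand.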
Business intelligence suites (e.g., Salesforce Einstein, Microsoft Power BI) expose “decision tree” explainers, for a few common reasons:
- Interpretability – each split is a plain-language rule, so every root-to-leaf path yields a traceable rationale.
- Structural alignment with KPIs – branches align with “funnel” metrics (region → segment → customer) and with organizational structures.
- Perturbability – analysts prune or graft branches to test policy changes.
The underlying structure is still a directed graph with attributes, rendered tidily for narrative clarity.
Stateful processes are graphical, too
When the same state can be revisited (e.g., inventory levels, patient health states, Markov chains), a pure tree explodes in size. State diagrams compress identical states into single nodes and allow cycles. This type of graphical model allows for meaningful reachability queries (“How likely is stock-out within 7 steps?”) and shortest-path optimizations (“Minimal steps back to In Stock”).
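A reachability query of the kind quoted above can be sketched as follows. The three inventory states and all transition probabilities are invented for illustration, and `hit_probability` is a hypothetical helper name. The target state is treated as absorbing, so probability mass that arrives there stays counted.

```python
# Hypothetical transition probabilities: state -> {next_state: probability}.
P = {
    "In Stock":  {"In Stock": 0.75, "Low": 0.25},
    "Low":       {"In Stock": 0.5, "Low": 0.25, "Stock-out": 0.25},
    "Stock-out": {"Stock-out": 1.0},
}

def hit_probability(P, start, target, steps):
    """Probability of reaching `target` from `start` within `steps` transitions."""
    dist = {s: 0.0 for s in P}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {s: 0.0 for s in P}
        for state, mass in dist.items():
            if state == target:
                nxt[target] += mass      # absorbing: once reached, stay counted
                continue
            for succ, p in P[state].items():
                nxt[succ] += mass * p
        dist = nxt
    return dist[target]

within_7 = hit_probability(P, "In Stock", "Stock-out", 7)
print(f"P(stock-out within 7 steps) = {within_7:.3f}")
```

Because each state appears once, the work per step is proportional to the number of edges, whereas unrolling the same process into a tree would grow exponentially with the horizon.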
The natural ML/AI abstraction here is a state diagram, which often corresponds to a Markov decision process:
- States carry context: Under Review, Flagged‑High Risk, Resolved
- Transitions encode actions or events with guards: severity > 3, timeout, approval received
- Policies correspond to distributions over outgoing edges from each state
A minimal Mermaid state diagram illustrates these ideas, including a self-loop and the unnamed start node.
stateDiagram-v2
    [*] --> Idle
    Idle --> Investigate: alert
    Investigate --> Remediate: HIGH severity
    Investigate --> Monitor: LOW severity
    Remediate --> Monitor
    Monitor --> Monitor
    Monitor --> Idle: resolve
    Monitor --> Investigate: regress
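The transitions of this diagram can also be queried directly. As a sketch of the shortest-path idea mentioned earlier, the following breadth-first search finds a minimal-step route between two states. The edge list is transcribed by hand from the diagram, guards are ignored, and `shortest_path` is a hypothetical helper.

```python
from collections import deque

# Transitions transcribed from the state diagram (labels/guards dropped).
transitions = [
    ("Idle", "Investigate"),
    ("Investigate", "Remediate"),
    ("Investigate", "Monitor"),
    ("Remediate", "Monitor"),
    ("Monitor", "Monitor"),
    ("Monitor", "Idle"),
    ("Monitor", "Investigate"),
]

def shortest_path(edges, start, goal):
    """Return a minimal-step list of states from start to goal, or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:          # visit each state once; handles cycles
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path(transitions, "Remediate", "Idle"))
```

The `seen` set is what keeps the search finite in the presence of the self-loop and the Monitor/Investigate cycle.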
Interactive visualization for these graphs
Interactive visualization matters for such graphs because people grasp patterns faster when they can pan and zoom, collapse subtrees, or highlight cycles. This supports exploratory analysis and grounded storytelling.
Node+edge file formats for data analysis
For the purposes of data analysis, graphs are often specified by two parallel files.
A node file lists nodes, one per line, with associated metadata.
An edge file lists edges, one per line. Each has a source node and a destination node, as well as any associated metadata for that edge.
This covers all forms of graphs that are normally used in decision-making. It’s great for joins, versioning, and analysis, but a translator is needed for common visualization languages. Mermaid is an effective low‑friction target: it is plain text, renders in many docs and wikis, and supports labels, shapes, and links.
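As a small illustration of the two-file layout, the sketch below reads a node table and an edge table from CSV text, checks that every edge endpoint exists in the node table, and joins node labels onto the edges. The file contents are hypothetical (loosely based on the opening flowchart, with an invented `kind` column), and the standard-library `csv` module is used only to keep the example dependency-free.

```python
import csv
import io

# Hypothetical node file: one node per line, with metadata columns.
NODES_CSV = """id,label,kind
S,Choose approach,decision
A,Build MVP,action
B,Run Survey,action
"""

# Hypothetical edge file: one edge per line, source, destination, metadata.
EDGES_CSV = """src,dst,cost
S,A,100000
S,B,30000
"""

def read_table(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

nodes = {row["id"]: row for row in read_table(NODES_CSV)}
edges = read_table(EDGES_CSV)

# Referential integrity: every edge endpoint must appear in the node file.
dangling = [e for e in edges if e["src"] not in nodes or e["dst"] not in nodes]

# Join: attach human-readable labels to each edge.
joined = [
    {**e, "src_label": nodes[e["src"]]["label"], "dst_label": nodes[e["dst"]]["label"]}
    for e in edges
]
for row in joined:
    print(f'{row["src_label"]} -> {row["dst_label"]} (cost={row["cost"]})')
```

The same join is a one-line merge in a DataFrame library; the point here is only that the node file acts as a lookup table keyed by ID.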
An example
As a working example, we can illustrate these concepts with a schematic of a standard drug discovery pipeline, as is common in pharma and biotech. This is essentially a linear graph, with some of the nodes and edges styled differently from the rest to denote special semantics. Writing it in node+edge form mimics the way such charts are often represented.
CODE
import pandas as pd, itertools, pathlib
from typing import Optional, List

# Node table: one entry per pipeline stage, with styling and link metadata.
node_dict = {
    "id": ["TID", "VAL", "HTS", "H2L", "PRE", "IND", "P1", "P2", "P3", "NDA", "FDA"],
    "label": ["Target Identification", "Target Validation?", "Hit Discovery (HTS)", "Hit‑to‑Lead / Lead Optimization", "Pre-clinical Safety", "IND Filing", "Phase I", "Phase II", "Phase III", "NDA Submission", "Regulatory Approval"],
    "shape": ["rect", "diamond", "rect", "rect", "rect", "rect", "rect", "rect", "rect", "rect", "rect"],
    "fill": ["#c6dbef", "#fdd49e", "#c6dbef", "#c6dbef", "#c6dbef", "#c6dbef", "#c7e9c0", "#c7e9c0", "#c7e9c0", "#c6dbef", "#c6dbef"],
    "stroke": ["#3182bd", "#e6550d", "#3182bd", "#3182bd", "#3182bd", "#3182bd", "#238b45", "#238b45", "#238b45", "#3182bd", "#3182bd"],
    "url": ["https://en.wikipedia.org/wiki/Drug_target", "https://www.cell.com/trends/pharmacological-sciences/fulltext/S0165-6147(23)00137-2", "https://en.wikipedia.org/wiki/High-throughput_screening", "", "", "https://en.wikipedia.org/wiki/Investigational_New_Drug", "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_I", "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_II", "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_III", "https://en.wikipedia.org/wiki/New_Drug_Application", "https://en.wikipedia.org/wiki/Food_and_Drug_Administration"],
    "subgraph": ["", "", "", "", "", "", "Clinical + post-clinical", "Clinical + post-clinical", "Clinical + post-clinical", "Clinical + post-clinical", "Clinical + post-clinical"],
}
node_df = pd.DataFrame(node_dict)

# Text styling shared by every node.
node_df['text_color'] = '#000000'
node_df['text_style'] = 'bold'

# Edge table: one entry per transition between consecutive stages.
edge_dict = {
    "src": ["TID", "VAL", "HTS", "H2L", "PRE", "IND", "P1", "P2", "P3", "NDA"],
    "dst": ["VAL", "HTS", "H2L", "PRE", "IND", "P1", "P2", "P3", "NDA", "FDA"],
    "label": ["", "various", "", "", "", "", "", "", "", ""],
    "color": ["", "cb181d", "", "", "", "", "", "", "", ""],
    "style": ["", "dotted", "", "", "", "", "", "", "", ""],
    "url": ["", "", "", "", "", "", "", "", "", ""],
}
edge_df = pd.DataFrame(edge_dict)

node_df
| | id | label | shape | fill | stroke | url | subgraph | text_color | text_style |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TID | Target Identification | rect | #c6dbef | #3182bd | https://en.wikipedia.org/wiki/Drug_target | | #000000 | bold |
| 1 | VAL | Target Validation? | diamond | #fdd49e | #e6550d | https://www.cell.com/trends/pharmacological-sc... | | #000000 | bold |
| 2 | HTS | Hit Discovery (HTS) | rect | #c6dbef | #3182bd | https://en.wikipedia.org/wiki/High-throughput_... | | #000000 | bold |
| 3 | H2L | Hit‑to‑Lead / Lead Optimization | rect | #c6dbef | #3182bd | | | #000000 | bold |
| 4 | PRE | Pre-clinical Safety | rect | #c6dbef | #3182bd | | | #000000 | bold |
| 5 | IND | IND Filing | rect | #c6dbef | #3182bd | https://en.wikipedia.org/wiki/Investigational_... | | #000000 | bold |
| 6 | P1 | Phase I | rect | #c7e9c0 | #238b45 | https://en.wikipedia.org/wiki/Phases_of_clinic... | Clinical + post-clinical | #000000 | bold |
| 7 | P2 | Phase II | rect | #c7e9c0 | #238b45 | https://en.wikipedia.org/wiki/Phases_of_clinic... | Clinical + post-clinical | #000000 | bold |
| 8 | P3 | Phase III | rect | #c7e9c0 | #238b45 | https://en.wikipedia.org/wiki/Phases_of_clinic... | Clinical + post-clinical | #000000 | bold |
| 9 | NDA | NDA Submission | rect | #c6dbef | #3182bd | https://en.wikipedia.org/wiki/New_Drug_Applica... | Clinical + post-clinical | #000000 | bold |
| 10 | FDA | Regulatory Approval | rect | #c6dbef | #3182bd | https://en.wikipedia.org/wiki/Food_and_Drug_Ad... | Clinical + post-clinical | #000000 | bold |
Translating data-science graphs → interactive flowcharts
This process of translation is largely a mechanical exercise, which we give code for below. A few points turn out to be key in writing this code.
- Edge metadata → label: join selected columns as key=value pairs.
- Keep stable IDs (sanitize to alphanumerics/underscore) and human labels separately.
- Preserve ordering where it helps reading (top‑down graph TD), but don’t rely on it semantically.
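The first two points can be sketched as small helpers. This is an illustrative fragment, separate from the converter presented later; `sanitize_id`, `edge_label`, and `edge_line` are hypothetical names, and rows are plain dicts for simplicity.

```python
import re

def sanitize_id(raw):
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    return re.sub(r"\W", "_", str(raw), flags=re.ASCII)

def edge_label(row, columns):
    """Join the selected metadata columns as key=value pairs, skipping blanks."""
    parts = [f"{c}={row[c]}" for c in columns if row.get(c) not in (None, "")]
    return ", ".join(parts)

def edge_line(row, columns):
    """Render one edge row as a Mermaid flowchart edge statement."""
    src, dst = sanitize_id(row["src"]), sanitize_id(row["dst"])
    label = edge_label(row, columns)
    return f"{src} -->|{label}| {dst}" if label else f"{src} --> {dst}"

row = {"src": "Build MVP", "dst": "Strong demand", "p": 0.6, "cost": ""}
print(edge_line(row, ["p", "cost"]))
```

Note that sanitizing can map two distinct raw IDs to the same string (for example "a-b" and "a b"), which is one reason to keep the stable ID and the human label in separate columns rather than deriving one from the other.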
A node-edge-format → Mermaid converter
A first cut at this problem involves writing a compact syntactic converter in Python. Let’s demonstrate with the following directed graph, a toy schematic of a drug discovery pipeline:
graph LR
    TID --> VAL
    VAL -->|various| HTS
    HTS --> H2L
    H2L --> LO
    LO --> PRE
    PRE --> IND
    IND --> P1
    subgraph Clinical + post-clinical
        P1 --> P2
        P2 --> P3
        P3 --> NDA
        NDA --> FDA
    end
    TID["Target Identification"]
    click TID "https://en.wikipedia.org/wiki/Drug_target" "Target Identification"
    style TID fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    VAL{"Target Validation?"}
    click VAL "https://www.cell.com/trends/pharmacological-sciences/fulltext/S0165-6147(23)00137-2" "Target Validation?"
    style VAL fill:#fdd49e,stroke:#e6550d,color:#000000,font-weight:bold
    HTS["Hit Discovery (HTS)"]
    click HTS "https://en.wikipedia.org/wiki/High-throughput_screening" "Hit Discovery (HTS)"
    style HTS fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    H2L["Hit‑to‑Lead"]
    style H2L fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    LO["Lead Optimization"]
    style LO fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    PRE["Pre-clinical Safety"]
    style PRE fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    IND["IND Filing"]
    click IND "https://en.wikipedia.org/wiki/Investigational_New_Drug" "IND Filing"
    style IND fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    P1["Phase I"]
    click P1 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_I" "Phase I"
    style P1 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    P2["Phase II"]
    click P2 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_II" "Phase II"
    style P2 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    P3["Phase III"]
    click P3 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_III" "Phase III"
    style P3 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
    NDA["NDA Submission"]
    click NDA "https://en.wikipedia.org/wiki/New_Drug_Application" "NDA Submission"
    style NDA fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    FDA["Regulatory Approval"]
    click FDA "https://en.wikipedia.org/wiki/Food_and_Drug_Administration" "Regulatory Approval"
    style FDA fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    linkStyle 1 stroke:#cb181d,stroke-dasharray:2 2
It’s useful to have a one-stop function that makes these and other charts, in Markdown, from node-edge representations. Here we provide one such function, build_mermaid. It accepts extra node and edge columns that specify Mermaid style parameters on a per-node and per-edge basis. This requires a significant amount of Mermaid-specific bookkeeping, which is implemented below.
Grouping is driven by an optional subgraph column in node_df, whose name is passed via the node_subgraph parameter. Each row holds either (a) NaN or an empty string, in which case the node goes to the global scope; or (b) a string, in which case the node is placed inside the subgraph with that name. After collecting all distinct subgraph names, we emit one subgraph NAME … end block per subgraph, containing all its nodes. Nodes without a subgraph remain at the top level.
Such code can be tested on the stylized drug discovery pipeline above, which demonstrates some of the style and linking options in a simple specification.
CODE
import itertools, pathlib, pandas as pd
from typing import Optional, List, Dict

def build_mermaid(
    node_df: pd.DataFrame,
    edge_df: pd.DataFrame,
    output_path: str = "diagram.mmd",
    fence: bool = False,
    # ─── node columns ────────────────────────────────────────────────────
    node_id: str = "id",
    node_label: str = "label",
    node_shape: Optional[str] = "shape",
    node_fill: Optional[str] = "fill",
    node_stroke: Optional[str] = "stroke",
    node_url: Optional[str] = "url",
    node_tooltip: Optional[str] = "tooltip",
    node_text_color: Optional[str] = "text_color",
    node_text_style: Optional[str] = "text_style",
    node_subgraph: Optional[str] = "subgraph",
    # ─── edge columns ────────────────────────────────────────────────────
    edge_src: str = "src",
    edge_dst: str = "dst",
    edge_label: Optional[str] = "label",
    edge_color: Optional[str] = "color",
    edge_style: Optional[str] = "style",  # dashed | dotted
    edge_url: Optional[str] = "url",
    edge_label_color: Optional[str] = "label_color",
    edge_label_style: Optional[str] = "label_style",
    # ─── layout ──────────────────────────────────────────────────────────
    direction: str = "TD",
) -> str:
    """
    Build a Mermaid flow-chart from *node_df* and *edge_df* **with optional
    sub-graph grouping**.

    If *node_subgraph* names an existing column, every distinct non-blank
    value starts a block::

        subgraph <value>
            …nodes…
        end

    All styling logic (node shapes, colours, text styles, edge styles,
    helper vertices for clickable edges) is identical to the legacy
    `old_build_mermaid`, so diagrams are pixel-perfect when no subgraph
    column is present.

    Parameters
    ----------
    node_df, edge_df : pd.DataFrame
        Vertices and directed edges.
    output_path : str
        Target .mmd file.
    fence : bool, default False
        Wrap result in ```{mermaid}``` fences (for Quarto/Markdown).
    <column-name parameters>
        Override these if your DataFrame uses different headers; pass *None*
        when an attribute is absent.
    direction : {"TD","LR","RL","BT"}, default "TD"
        Graph orientation.

    Returns
    -------
    str
        Mermaid source that was written to *output_path*.
    """
    g: List[str] = [f"flowchart {direction}"]
    link_styles: List[str] = []
    edge_index, dummy_iter = 0, itertools.count()

    # helpers ------------------------------------------------------------
    get = lambda row, col, d="": row[col] if col and col in row and pd.notna(row[col]) else d
    css = lambda s: [c for t, c in (
        ("italic", "font-style:italic"),
        ("bold", "font-weight:bold"),
        ("underline", "text-decoration:underline")
    ) if t in str(s)]
    colorize = lambda c: "" if not c else c if str(c).startswith("#") else f"#{c}"

    shape = {
        "rect"      : lambda i, l: f'{i}["{l}"]',
        "round"     : lambda i, l: f'{i}("{l}")',
        "circle"    : lambda i, l: f'{i}(("{l}"))',
        "stadium"   : lambda i, l: f'{i}(["{l}"])',
        "subroutine": lambda i, l: f'{i}[[{l}]]',
        "diamond"   : lambda i, l: f'{i}{{"{l}"}}',
    }

    # edge writer --------------------------------------------------------
    def add_edge(src, dst, lab, styles):
        nonlocal edge_index
        arrow = f' -->|{lab}| ' if lab else ' --> '
        g.append(f" {src}{arrow}{dst}")
        if styles:
            x = ",".join(styles)
            link_styles.append(f"linkStyle {edge_index} {x}")
        edge_index += 1

    # edges (done first) -------------------------------------------------
    for _, e in edge_df.iterrows():
        lab, src, dst, url = get(e, edge_label), get(e, edge_src), get(e, edge_dst), get(e, edge_url)

        e_styles: List[str] = []
        if colorize(get(e, edge_color)):
            e_styles.append(f"stroke:{colorize(get(e, edge_color))}")
        if get(e, edge_style).lower() in {"dashed", "dotted"}:
            pattern = "5 5" if get(e, edge_style).lower() == "dashed" else "2 2"
            e_styles.append(f"stroke-dasharray:{pattern}")
        if colorize(get(e, edge_label_color)):
            e_styles.append(f"color:{colorize(get(e, edge_label_color))}")
        e_styles += css(get(e, edge_label_style))

        if url:  # invisible helper vertex keeps the edge clickable
            helper = f"h{next(dummy_iter)}"
            tooltip = lab or "link"
            g += [
                f" {src} --> {helper}",
                f' {helper}[""]',
                f"style {helper} fill:transparent,stroke:transparent",
                f'click {helper} "{url}" "{tooltip}"',
            ]
            if e_styles:
                x = ",".join(e_styles)
                link_styles.append(f"linkStyle {edge_index} {x}")
            edge_index += 1
            add_edge(helper, dst, lab, e_styles)
        else:
            add_edge(src, dst, lab, e_styles)

    # group nodes by subgraph -------------------------------------------
    groups: Dict[str, List[pd.Series]] = {}
    for _, row in node_df.iterrows():
        grp = get(row, node_subgraph) if node_subgraph and node_subgraph in node_df.columns else ""
        groups.setdefault(str(grp), []).append(row)

    def render_node(row: pd.Series):
        nid, lbl = get(row, node_id), get(row, node_label, get(row, node_id))
        tooltip = get(row, node_tooltip, lbl)

        g.append(" " + shape.get(get(row, node_shape, "rect").lower(), shape["rect"])(nid, lbl))

        if get(row, node_url):
            g.append(f'click {nid} "{get(row, node_url)}" "{tooltip}"')

        n_styles: List[str] = []
        if colorize(get(row, node_fill)):
            n_styles.append(f"fill:{colorize(get(row, node_fill))}")
        if colorize(get(row, node_stroke)):
            n_styles.append(f"stroke:{colorize(get(row, node_stroke))}")
        if colorize(get(row, node_text_color)):
            n_styles.append(f"color:{colorize(get(row, node_text_color))}")
        n_styles += css(get(row, node_text_style))
        if n_styles:
            g.append(f"style {nid} {','.join(n_styles)}")

    for sg, rows in groups.items():
        if sg and sg.lower() not in {"", "nan"}:
            g.append(f"subgraph {sg}")
        for r in rows:
            render_node(r)
        if sg and sg.lower() not in {"", "nan"}:
            g.append("end")

    g.extend(link_styles)
    if fence:
        g = ["```{mermaid}", *g, "```"]

    text = "\n".join(g)

    pathlib.Path(output_path).write_text(text)
    return text
Running the translation code on these dataframes gives the desired stylized pipeline, with supporting links as necessary.
CODE
md_file_path = "graph.mmd"

build_mermaid(
    node_df, edge_df,
    direction="LR",
    output_path=md_file_path
)

from pathlib import Path
from IPython.display import Markdown, display

diagram = Path(md_file_path).read_text()
# Double the braces so the f-string emits the literal text {mermaid}.
mermaid_str = f"```{{mermaid}}\n{diagram}\n```"
display(Markdown(mermaid_str))
flowchart LR
    TID --> VAL
    VAL -->|various| HTS
    HTS --> H2L
    H2L --> PRE
    PRE --> IND
    IND --> P1
    P1 --> P2
    P2 --> P3
    P3 --> NDA
    NDA --> FDA
    TID["Target Identification"]
    click TID "https://en.wikipedia.org/wiki/Drug_target" "Target Identification"
    style TID fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    VAL{"Target Validation?"}
    click VAL "https://www.cell.com/trends/pharmacological-sciences/fulltext/S0165-6147(23)00137-2" "Target Validation?"
    style VAL fill:#fdd49e,stroke:#e6550d,color:#000000,font-weight:bold
    HTS["Hit Discovery (HTS)"]
    click HTS "https://en.wikipedia.org/wiki/High-throughput_screening" "Hit Discovery (HTS)"
    style HTS fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    H2L["Hit‑to‑Lead / Lead Optimization"]
    style H2L fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    PRE["Pre-clinical Safety"]
    style PRE fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    IND["IND Filing"]
    click IND "https://en.wikipedia.org/wiki/Investigational_New_Drug" "IND Filing"
    style IND fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    subgraph Clinical + post-clinical
        P1["Phase I"]
        click P1 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_I" "Phase I"
        style P1 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
        P2["Phase II"]
        click P2 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_II" "Phase II"
        style P2 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
        P3["Phase III"]
        click P3 "https://en.wikipedia.org/wiki/Phases_of_clinical_research#Phase_III" "Phase III"
        style P3 fill:#c7e9c0,stroke:#238b45,color:#000000,font-weight:bold
        NDA["NDA Submission"]
        click NDA "https://en.wikipedia.org/wiki/New_Drug_Application" "NDA Submission"
        style NDA fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
        FDA["Regulatory Approval"]
        click FDA "https://en.wikipedia.org/wiki/Food_and_Drug_Administration" "Regulatory Approval"
        style FDA fill:#c6dbef,stroke:#3182bd,color:#000000,font-weight:bold
    end
    linkStyle 1 stroke:#cb181d,stroke-dasharray:2 2
Export to HTML
Exporting such charts as standalone HTML files requires referring to the freely available Mermaid JavaScript implementation, and rounds off our implementation with a portable, writable file for any Mermaid chart.
CODE
def mermaid_to_html(
    mermaid_file_path: str,
    html_out: str
):
    # Doubled braces are literal braces; {diagram} is the only placeholder.
    HTML_TEMPLATE="""<!DOCTYPE html>
<html><head><meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<style>.node rect{{rx:6;ry:6;}}</style>
</head><body>
<div class="mermaid">
{diagram}
</div>
<script>mermaid.initialize({{startOnLoad:true}});</script>
</body></html>"""
    diagram = Path(mermaid_file_path).read_text()
    pathlib.Path(html_out).write_text(HTML_TEMPLATE.format(diagram=diagram))
    print(f"Wrote {html_out}")

mermaid_to_html("graph.mmd", "graph.html")
Wrote graph.html
The resulting file opens in any browser and is barely bigger than the Mermaid file that specifies it. It simply loads the freely available Mermaid JavaScript bundle from a CDN, which does the heavy lifting.
Practical themes
As all this illustrates, it’s extremely useful to have purely text-based representations of graphs, for ease of manipulation and storage. Mermaid is a widely used open-source, JavaScript-based tool for such text representations of networks, graphs, and flowcharts.
The structure of graphs gives ample opportunities to defer computation and rendering, making plenty of interactive manipulations possible. Large graphs can be summarized with subgraphs, letting users interact to drill in. These can also provide the right abstractions for efficient deferred decision making and rendering.
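One way to picture the summarization step is to collapse every subgraph into a single node and keep only the edges that cross group boundaries. The sketch below does this for the drug discovery pipeline used earlier. `collapse` is a hypothetical helper, and counting parallel edges is our own design choice rather than anything Mermaid provides.

```python
def collapse(edges, group_of):
    """Summarize edges at group level.

    Nodes listed in `group_of` are replaced by their group name; nodes
    without a group stay as they are. Edges inside one group are hidden,
    and parallel edges between the same pair are counted.
    """
    summary = {}
    for src, dst in edges:
        s, d = group_of.get(src, src), group_of.get(dst, dst)
        if s == d:
            continue  # intra-group edge: only visible after drilling in
        summary[(s, d)] = summary.get((s, d), 0) + 1
    return summary

stages = ["TID", "VAL", "HTS", "H2L", "PRE", "IND", "P1", "P2", "P3", "NDA", "FDA"]
edges = list(zip(stages, stages[1:]))            # linear pipeline
group_of = {n: "Clinical + post-clinical" for n in ["P1", "P2", "P3", "NDA", "FDA"]}

for (s, d), count in collapse(edges, group_of).items():
    print(f"{s} --> {d} (x{count})")
```

Drilling in then amounts to removing one group from `group_of` and re-running the summary, which leaves the canonical edge list untouched.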
Ultimately, extremely rich relational data in data science are typically represented by node and edge files. Visualizing these is often only possible incompletely, a few layers at a time, in a summarized manner. Keeping the canonical data in node/edge tables, with lightweight translators as above to produce documentation and review, has many benefits; it preserves rigor and information, while enabling trackability.
Exporting as code
This functionality is useful to have as a separate script.
CODE
all_code = r'''import pandas as pd
from typing import Optional, List, Dict
import itertools
import pathlib
from pathlib import Path


def build_mermaid(
    node_df: pd.DataFrame,
    edge_df: pd.DataFrame,
    output_path: str = "diagram.mmd",
    fence: bool = False,
    # ─── node columns ────────────────────────────────────────────────────
    node_id: str = "id",
    node_label: str = "label",
    node_shape: Optional[str] = "shape",
    node_fill: Optional[str] = "fill",
    node_stroke: Optional[str] = "stroke",
    node_url: Optional[str] = "url",
    node_tooltip: Optional[str] = "tooltip",
    node_text_color: Optional[str] = "text_color",
    node_text_style: Optional[str] = "text_style",
    node_subgraph: Optional[str] = "subgraph",
    # ─── edge columns ────────────────────────────────────────────────────
    edge_src: str = "src",
    edge_dst: str = "dst",
    edge_label: Optional[str] = "label",
    edge_color: Optional[str] = "color",
    edge_style: Optional[str] = "style",  # dashed | dotted
    edge_url: Optional[str] = "url",
    edge_label_color: Optional[str] = "label_color",
    edge_label_style: Optional[str] = "label_style",
    # ─── layout ──────────────────────────────────────────────────────────
    direction: str = "TD",
) -> str:
    """
    Build a Mermaid flow-chart from *node_df* and *edge_df* **with optional
    sub-graph grouping**.

    If *node_subgraph* names an existing column, every distinct non-blank
    value starts a block::

        subgraph <value>
            …nodes…
        end

    All styling logic (node shapes, colours, text styles, edge styles,
    helper vertices for clickable edges) is identical to the legacy
    `old_build_mermaid`, so diagrams are pixel-perfect when no subgraph
    column is present.

    Parameters
    ----------
    node_df, edge_df : pd.DataFrame
        Vertices and directed edges.
    output_path : str
        Target .mmd file.
    fence : bool, default False
        Wrap result in ```{mermaid}``` fences (for Quarto/Markdown).
    <column-name parameters>
        Override these if your DataFrame uses different headers; pass *None*
        when an attribute is absent.
    direction : {"TD","LR","RL","BT"}, default "TD"
        Graph orientation.

    Returns
    -------
    str
        Mermaid source that was written to *output_path*.
    """
    g: List[str] = [f"flowchart {direction}"]
    link_styles: List[str] = []
    edge_index, dummy_iter = 0, itertools.count()

    # helpers ------------------------------------------------------------
    get = lambda row, col, d="": row[col] if col and col in row and pd.notna(row[col]) else d
    css = lambda s: [c for t, c in (
        ("italic", "font-style:italic"),
        ("bold", "font-weight:bold"),
        ("underline", "text-decoration:underline")
    ) if t in str(s)]
    colorize = lambda c: "" if not c else c if str(c).startswith("#") else f"#{c}"

    shape = {
        "rect"      : lambda i, l: f'{i}["{l}"]',
        "round"     : lambda i, l: f'{i}("{l}")',
        "circle"    : lambda i, l: f'{i}(("{l}"))',
        "stadium"   : lambda i, l: f'{i}(["{l}"])',
        "subroutine": lambda i, l: f'{i}[[{l}]]',
        "diamond"   : lambda i, l: f'{i}{{"{l}"}}',
    }

    # edge writer --------------------------------------------------------
    def add_edge(src, dst, lab, styles):
        nonlocal edge_index
        arrow = f' -->|{lab}| ' if lab else ' --> '
        g.append(f" {src}{arrow}{dst}")
        if styles:
            x = ",".join(styles)
            link_styles.append(f"linkStyle {edge_index} {x}")
        edge_index += 1

    # edges (done first) -------------------------------------------------
    for _, e in edge_df.iterrows():
        lab, src, dst, url = get(e, edge_label), get(e, edge_src), get(e, edge_dst), get(e, edge_url)

        e_styles: List[str] = []
        if colorize(get(e, edge_color)):
            e_styles.append(f"stroke:{colorize(get(e, edge_color))}")
        if get(e, edge_style).lower() in {"dashed", "dotted"}:
            pattern = "5 5" if get(e, edge_style).lower() == "dashed" else "2 2"
            e_styles.append(f"stroke-dasharray:{pattern}")
        if colorize(get(e, edge_label_color)):
            e_styles.append(f"color:{colorize(get(e, edge_label_color))}")
        e_styles += css(get(e, edge_label_style))

        if url:  # invisible helper vertex keeps the edge clickable
            helper = f"h{next(dummy_iter)}"
            tooltip = lab or "link"
            g += [
                f" {src} --> {helper}",
                f' {helper}[""]',
                f"style {helper} fill:transparent,stroke:transparent",
                f'click {helper} "{url}" "{tooltip}"',
            ]
            if e_styles:
                x = ",".join(e_styles)
                link_styles.append(f"linkStyle {edge_index} {x}")
            edge_index += 1
            add_edge(helper, dst, lab, e_styles)
        else:
            add_edge(src, dst, lab, e_styles)

    # group nodes by subgraph -------------------------------------------
    groups: Dict[str, List[pd.Series]] = {}
    for _, row in node_df.iterrows():
        grp = get(row, node_subgraph) if node_subgraph and node_subgraph in node_df.columns else ""
        groups.setdefault(str(grp), []).append(row)

    def render_node(row: pd.Series):
        nid, lbl = get(row, node_id), get(row, node_label, get(row, node_id))
        tooltip = get(row, node_tooltip, lbl)

        g.append(" " + shape.get(get(row, node_shape, "rect").lower(), shape["rect"])(nid, lbl))

        if get(row, node_url):
            g.append(f'click {nid} "{get(row, node_url)}" "{tooltip}"')

        n_styles: List[str] = []
        if colorize(get(row, node_fill)):
            n_styles.append(f"fill:{colorize(get(row, node_fill))}")
        if colorize(get(row, node_stroke)):
            n_styles.append(f"stroke:{colorize(get(row, node_stroke))}")
        if colorize(get(row, node_text_color)):
            n_styles.append(f"color:{colorize(get(row, node_text_color))}")
        n_styles += css(get(row, node_text_style))
        if n_styles:
            g.append(f"style {nid} {','.join(n_styles)}")

    for sg, rows in groups.items():
        if sg and sg.lower() not in {"", "nan"}:
            g.append(f"subgraph {sg}")
        for r in rows:
            render_node(r)
        if sg and sg.lower() not in {"", "nan"}:
            g.append("end")

    g.extend(link_styles)
    if fence:
        g = ["```{mermaid}", *g, "```"]

    text = "\n".join(g)

    pathlib.Path(output_path).write_text(text)
    return text


def mermaid_to_html(
    mermaid_file_path: str,
    html_out: str
):
    HTML_TEMPLATE="""<!DOCTYPE html>
<html><head><meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<style>.node rect{{rx:6;ry:6;}}</style>
</head><body>
<div class="mermaid">
{diagram}
</div>
<script>mermaid.initialize({{startOnLoad:true}});</script>
</body></html>"""
    diagram = Path(mermaid_file_path).read_text()
    pathlib.Path(html_out).write_text(HTML_TEMPLATE.format(diagram=diagram))
    print(f"Wrote {html_out}")
'''
CODE
file_pfx = "../../files/utils/"

with open(file_pfx + "mermaid_tools.py", "w", encoding="utf-8") as f:
    f.write(all_code)
Citation
@online{balsubramani,
author = {Balsubramani, Akshay},
title = {Graphs as Interactive Diagrams},
langid = {en}
}