In working with ultralarge chemical spaces, it is often useful for drug designers and other scientists like medicinal chemists to visualize the chemical decompositions of molecules, with a view towards understanding the local chemical space and the relationships between molecules.
In this post, I’ll cover a few different ways to do this which are appropriate for different settings.
Context
We often encounter a situation where we have computed a decomposition of a molecule into a set of structure fragments. The molecule itself is represented by these fragments and the reaction rules to reconstitute them back together.
Highlighting the origin of individual atoms
It’s useful to be able to manipulate atom-level visualizations of these fragment decompositions. For each atom, we’d like to be able to track which fragment it came from.
This is an extra bit of bookkeeping that can be done when evaluating the infix expression corresponding to each structure. To do this, we modify and refactor the evaluate_expression function to return a list of mol objects with annotated atoms, which can be interpreted in parallel with each product:
CODE
from collections import defaultdictimport matplotlib.colors, iofrom rdkit import Chem, RDLoggerfrom rdkit.Chem import AllChem, Drawfrom rdkit.Chem.Draw import rdMolDraw2Dfrom IPython.display import SVG, displayimport matplotlib.pyplot as plt, numpy as npfrom PIL import ImageRDLogger.DisableLog('rdApp.*') # keep the output tidydef _propagate_frag_ids(prod, reactants): p = Chem.Mol(prod)for at in p.GetAtoms(): ids =set()if at.HasProp("react_idx") and at.HasProp("react_atom_idx"): r =int(at.GetProp("react_idx")) a =int(at.GetProp("react_atom_idx")) ra = reactants[r].GetAtomWithIdx(a)if ra.HasProp("frag_id"): ids.update(ra.GetProp("frag_id").split(","))elif at.HasProp("old_mapno"): m =int(at.GetProp("old_mapno"))for rm in reactants:for ra in rm.GetAtoms():if ra.GetAtomMapNum() == m and ra.HasProp("frag_id"): ids.update(ra.GetProp("frag_id").split(","))ifnot ids: ids.add("NEW") at.SetProp("frag_id", ",".join(sorted(ids))) at.SetAtomMapNum(0)return pdef evaluate_expression(expr, frags_program, rxn_dict): tok = expr.split('_')iflen(tok) ==1and tok[0].startswith('F'): idx =int(tok[0][1:]) mol = Chem.MolFromSmiles(frags_program[idx])for a in mol.GetAtoms(): a.SetProp("frag_id", f"F{idx}")return [Chem.CanonSmiles(frags_program[idx])], [mol]# assume F# _ R# _ F# (binary) left, rcode, right = tok l_smi, l_mol = evaluate_expression(left, frags_program, rxn_dict) r_smi, r_mol = evaluate_expression(right, frags_program, rxn_dict) rxn = AllChem.ReactionFromSmarts(rxn_dict[rcode]) products, annotated = [], []for lm in l_mol:for rm in r_mol:for prods in rxn.RunReactants((lm, rm)): prod = prods[0] smi = Chem.CanonSmiles(Chem.MolToSmiles(prod, True))if smi in products:continue products.append(smi) annotated.append(_propagate_frag_ids(prod, (lm, rm)))return products, annotateddef _avg_rgb(cols):returntuple(float(np.mean([c[i] for c in cols])) for i inrange(3))def highlight_fragments(mol, size=(300,200), palette=["#bdbdbd", '#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231', '#911eb4', '#46f0f0', '#f032e6', '#bcf60c', '#fabebe', '#008080', '#e6beff', '#9a6324', '#fffac8', '#800000', '#aaffc3', '#808000', '#ffd8b1', '#000075', '#808080', '#7d87b9', '#bec1d4', '#d6bcc0']): athighlights = defaultdict(list) arads = {} uq_fids =set()for a in mol.GetAtoms():for fid in a.GetProp("frag_id").split(","): uq_fids.add(fid) athighlights[a.GetIdx()].append( matplotlib.colors.to_rgb(palette[int(fid[1:])]) ) arads[a.GetIdx()] =0.5 bndhighlights = defaultdict(list)for b in mol.GetBonds():for fid in uq_fids:if fid in athighlights[b.GetBeginAtomIdx()] and fid in athighlights[b.GetEndAtomIdx()]: bndhighlights[b.GetIdx()].append( matplotlib.colors.to_rgb(palette[int(fid[1:])]) ) d = rdMolDraw2D.MolDraw2DSVG(*size) d = rdMolDraw2D.MolDraw2DCairo(*size)# rdMolDraw2D.PrepareAndDrawMolecule(# d, mol,# highlightAtoms=list(a_cols),# highlightAtomColors=a_cols,# highlightBonds=list(b_cols),# highlightBondColors=b_cols,# ) d.DrawMoleculeWithHighlights( mol, "", dict(athighlights),dict(bndhighlights),arads,{} ) d.FinishDrawing() img = Image.open(io.BytesIO(d.GetDrawingText())) display(img)#display(SVG(d.GetDrawingText()))
To illustrate this in action, we can turn to a standard example of a common ionizable lipid and break it down at its ester linkages.
CODE
frags = ["CCC[CH3:1]", # F0 : a methyl radical (map #1)"O=C[CH3:2]", # F1 : another methyl radical (map #2)]rxn_smarts = {"R1": "[CH3:1].[CH3:2]>>[C:1]-[C:2]"}expr ="F0_R1_F1"smis, mols = evaluate_expression(expr, frags, rxn_smarts)print("canonical SMILES:", smis[0])for a in mols[0].GetAtoms():print(f"Atom {a.GetIdx():>2}: frag_id = {a.GetProp('frag_id')}")highlight_fragments(mols[0])
With complex structures, this really does delineate the fragment-based decomposition of any chemical structure.
Tree-based images
A complementary way to proceed is to render fragments individually and use the inherent tree-based structure of the decomposition as a guide to visualization.
CODE
# Importsfrom datetime import datetimeimport numpy as npimport networkx as nxfrom networkx.drawing.nx_pydot import graphviz_layoutfrom networkx.readwrite import json_graphfrom rdkit import Chemfrom rdkit.Chem import Drawimport matplotlib.pyplot as pltimport matplotlib.lines as mlinesfrom matplotlib.offsetbox import OffsetImage, AnnotationBboxfrom PIL import Imageprint('Last modified:', datetime.now())
Last modified: 2025-08-22 02:26:20.542314
Style config
We want to color each reaction differently to better illustrate the skeleton of the molecule.
import itertoolsdef evaluate_expression_mod( infix_str, frags_program, rxn_smarts_dict, rxn_codes):""" Returns a list of SMILES strings that are the possible results of evaluating the infix expression infix_str. frags_program is a list of SMILES strings that correspond to fragment IDs in the expression. random_choice is the number of random fragments to subsample and add to the expression (None for no subsampling). """ rxn_names = {v: k for k, v in rxn_codes.items()} tokens = infix_str.split('_')# Base case: if token is a SMILES, return its valueiflen(tokens) ==1:if tokens[0].startswith('F'): # It's a fragment this_frag_smiles = frags_program[int(tokens[0][1:])] toret = [ this_frag_smiles ] # Add the fragment that constituted the original structure possible_results = [Chem.CanonSmiles(x) for x inlist(set(toret))]else: # It's a SMILES string possible_results = [ Chem.CanonSmiles(tokens[0]) ]return possible_results# Handle Parenthesesfor i, ch inenumerate(tokens):if ch =='(': start = ielif ch ==')': end = i mid_values = evaluate_expression('_'.join(tokens[start+1:end]), frags_program, rxn_smarts_dict, rxn_codes) toret = [] leftstr ='_'.join(tokens[:start]) rightstr ='_'.join(tokens[end+1:])for left_val in mid_values: strlst = []iflen(leftstr) >0: strlst.append(leftstr) strlst.append(str(left_val))iflen(rightstr) >0: strlst.append(rightstr) toret.extend(evaluate_expression('_'.join(strlst), frags_program, rxn_smarts_dict, rxn_codes))return toret# Handle reactionsfor i inreversed(range(len(tokens))):if tokens[i].startswith('R'): toret = [] left_vals = evaluate_expression('_'.join(tokens[:i]), frags_program, rxn_smarts_dict, rxn_codes) right_vals = evaluate_expression('_'.join(tokens[i+1:]), frags_program, rxn_smarts_dict, rxn_codes) reaction_code = tokens[i]for comp1, comp2 in itertools.product(left_vals, right_vals): reactants = [comp1, comp2] rxn = AllChem.ReactionFromSmarts(rxn_smarts_dict[rxn_names[reaction_code]]) products, annotated = [], [] left_smiles, left_mols = left_vals[0], left_vals[1] right_smiles, right_mols = right_vals[0], right_vals[1]for l_smi, l_mol inzip(left_smiles, left_mols):for r_smi, r_mol inzip(right_smiles, right_mols):for prods in rxn.RunReactants((l_mol, r_mol)):for prod in prods: smi = Chem.CanonSmiles(Chem.MolToSmiles(prod, True))if smi in products: # de‑dupcontinue products.append(smi) annotated.append(_propagate_frag_ids(prod, (l_mol, r_mol)))return products, annotatedfrom rdkit import Chemfrom rdkit.Chem import Drawfrom rdkit.Chem.Draw import rdMolDraw2Dfrom IPython.display import SVG, displayimport matplotlib.pyplot as pltimport numpy as npdef _average_rgb(colours):"""Blend ≥2 RGB tuples by arithmetic mean."""returntuple(float(np.mean([c[i] for c in colours])) for i inrange(3))def highlight_fragments(mol, size=(600, 400), cmap="tab20"):""" Render an SVG of `mol` where each fragment ID (mol atom prop 'frag_id') gets its own colour, following the RDKit Cookbook recipe for multiple highlight colours. Parameters ---------- mol : RDKit Mol that already carries per-atom `frag_id` properties size : (w,h) tuple in px cmap : any Matplotlib qualitative palette name with ≥ #fragments colours """ mol = Chem.Mol(mol) # defensive copy rdMolDraw2D.PrepareMolForDrawing(mol)# 1) collect atoms per fragment frag_to_atoms = {}for atom in mol.GetAtoms():ifnot atom.HasProp("frag_id"): continuefor fid in atom.GetProp("frag_id").split(","): frag_to_atoms.setdefault(fid, set()).add(atom.GetIdx())# 2) assign a distinct base colour to each fragment ID palette = plt.cm.get_cmap(cmap, len(frag_to_atoms)) fid2colour = {fid: palette(i)[:3] for i, fid inenumerate(sorted(frag_to_atoms))}# 3) build highlight dictionaries following the Cookbook idiom atom_highlights, bond_highlights = {}, {}for fid, atoms in frag_to_atoms.items(): colour = fid2colour[fid]for a in atoms:if a in atom_highlights: # linker atom → blend colours atom_highlights[a] = _average_rgb([atom_highlights[a], colour])else: atom_highlights[a] = colour# bonds where both atoms belong to this fragmentfor bond in mol.GetBonds():if bond.GetBeginAtomIdx() in atoms and bond.GetEndAtomIdx() in atoms: bidx = bond.GetIdx()if bidx in bond_highlights: bond_highlights[bidx] = _average_rgb([bond_highlights[bidx], colour])else: bond_highlights[bidx] = colour# 4) flat list of all highlighted atoms/bonds hatoms =list(atom_highlights) hbonds =list(bond_highlights)# 5) RDKit Cookbook draw → SVG drawer = rdMolDraw2D.MolDraw2DSVG(*size) rdMolDraw2D.PrepareAndDrawMolecule( drawer, mol, highlightAtoms=hatoms, highlightAtomColors=atom_highlights, highlightBonds=hbonds, highlightBondColors=bond_highlights, ) drawer.FinishDrawing() display(SVG(drawer.GetDrawingText()))