Embedding ultralarge spaces

cheminformatics
LNP
Revisiting LNP libraries for rational design
Author: Akshay Balsubramani

Introduction

Implicitly defining combinatorial spaces with synthons is often the only practical way to proceed when virtual spaces get truly large. Even so, we can score these spaces with machine learning models and run neighborhood lookups over them to great effect.

Another indispensable addition to the toolkit is embedding methods, which map any molecule to a fixed-length vector. With these in hand, we can inject prior knowledge into generative algorithms, generate analogs of existing drugs in different classes, interpolate between successful candidates, and more.
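As a concrete baseline for what "a fixed-length vector per molecule" means, the sketch below uses an RDKit Morgan fingerprint as a stand-in embedding. This is an illustration only, not the learned embedding used later in the post; the function name and bit width are our own choices.

```python
# Minimal stand-in for a molecular embedding: a Morgan fingerprint
# gives every molecule a fixed-length binary vector.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def embed(smiles: str, n_bits: int = 2048) -> np.ndarray:
    """Map a SMILES string to a fixed-length binary vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(fp, dtype=np.float32)

v = embed("CCOC(=O)CCCCCCCN(C)C")  # an ionizable-lipid-like fragment
assert v.shape == (2048,)
```

A learned autoencoder embedding replaces this fingerprint with a continuous vector, which is what makes interpolation between candidates meaningful.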

In this post, we show this embedding functionality in action by running it on a fragment-based ionizable lipid library for LNP engineering.

Loading synthon library

Other posts cover how to construct such a library; we build on one that saves the library to a file. As written, the file is a self-contained recipe for a large combinatorial virtual space, but it does not specify reaction rules for putting the synthons together.
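The exact file format depends on the earlier post; the sketch below assumes a simple CSV with hypothetical `position` and `synthon_smiles` columns, grouping synthons by the scaffold position they fill. Adapt the column names to your own file.

```python
# Sketch of loading a synthon library from disk. The CSV columns
# ("position", "synthon_smiles") are assumptions, not the actual
# format from the earlier post.
import csv
import os
import tempfile
from collections import defaultdict

def load_synthon_library(path: str) -> dict:
    """Group synthon SMILES by the scaffold position they fill."""
    library = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            library[row["position"]].append(row["synthon_smiles"])
    return dict(library)

# tiny demonstration file with the assumed columns
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("position,synthon_smiles\n"
            "head,CN(C)CC[*:1]\n"
            "tail,[*:1]C(=O)CCCCCCCC\n")
    path = f.name
lib = load_synthon_library(path)
os.unlink(path)
assert lib["head"] == ["CN(C)CC[*:1]"]
```

The `[*:1]` dummy atoms mark the reaction sites; the next section relies on them.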

But the CSL-VAE algorithm requires such reaction rules to function, and its specification includes them: the reaction rules are in fact implicitly defined, as forming bonds between marked reaction sites on the synthons.
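This implicit rule, "bond the marked sites together", can be sketched with RDKit's `molzip`, which fuses dummy atoms carrying the same atom-map number (so `[*:1]` on one synthon pairs with `[*:1]` on another). The example synthons are hypothetical.

```python
# Join two synthons at their marked reaction sites. RDKit's molzip
# pairs dummy atoms by atom-map number and forms a bond between
# their neighbors, which is exactly the implicit reaction rule.
from rdkit import Chem

def join_synthons(smi_a: str, smi_b: str) -> str:
    """Fuse two SMILES fragments at matching [*:n] sites."""
    combo = Chem.CombineMols(Chem.MolFromSmiles(smi_a),
                             Chem.MolFromSmiles(smi_b))
    return Chem.MolToSmiles(Chem.molzip(combo))

# e.g. an amine head synthon plus an acyl tail synthon
product = join_synthons("CN(C)CC[*:1]", "[*:1]C(=O)CCCCCCCC")
assert "*" not in product  # no dangling reaction sites remain
```

Enumerating one product per combination of positions is then a nested loop over the library, which is why these spaces grow combinatorially.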

Loading autoencoder model

The autoencoder is described in the CSL-VAE paper (Pedawi et al. 2022); we load a trained model from the code accompanying it.
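Loading a trained PyTorch model typically looks like the sketch below. The checkpoint key `"model"` and the model class are assumptions; substitute the classes and checkpoint layout from the actual released CSL-VAE code.

```python
# Hedged sketch of restoring a trained autoencoder checkpoint.
# The "model" key inside the checkpoint dict is an assumption.
import torch

def load_encoder(checkpoint_path: str,
                 model: torch.nn.Module) -> torch.nn.Module:
    """Restore trained weights and switch to inference mode."""
    state = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state["model"])
    model.eval()  # disable dropout / batch-norm updates for encoding
    return model
```

Once loaded, the encoder maps each enumerated molecule to its fixed-length latent vector, which is the embedding used in the rest of the post.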

Analogue generation with neighborhood lookups
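Analogue generation by neighborhood lookup reduces to nearest-neighbor search in the embedding space: embed a query molecule, rank the library by similarity, and return the closest entries. A pure-NumPy sketch with cosine similarity (a brute-force stand-in for whatever index the actual pipeline uses):

```python
# Brute-force cosine nearest-neighbor lookup over a library
# embedding matrix; rows are molecules, columns are latent dims.
import numpy as np

def nearest_neighbors(query: np.ndarray,
                      library: np.ndarray,
                      k: int = 5) -> np.ndarray:
    """Indices of the k library rows most similar to `query`."""
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q                 # cosine similarity to every row
    return np.argsort(-sims)[:k]   # highest similarity first

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))       # stand-in library embeddings
idx = nearest_neighbors(emb[42], emb)   # query with a library member
assert idx[0] == 42  # a molecule is its own nearest neighbor
```

At the scale of ultralarge libraries, the brute-force scan would be replaced by an approximate index, but the interface stays the same: query vector in, analogue indices out.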

References

Pedawi, Aryan, Pawel Gniewek, Chaoyi Chang, Brandon Anderson, and Henry van den Bedem. 2022. “An Efficient Graph Generative Model for Navigating Ultra-Large Combinatorial Synthesis Libraries.” Advances in Neural Information Processing Systems 35: 8731–45.

Citation

BibTeX citation:
@online{balsubramani,
  author = {Balsubramani, Akshay},
  title = {Embedding Ultralarge Spaces},
  langid = {en}
}
For attribution, please cite this work as:
Balsubramani, Akshay. n.d. “Embedding Ultralarge Spaces.”