Embedding ultralarge spaces
Introduction
Implicitly defining combinatorial spaces using synthons is often the only way to proceed when virtual spaces get truly large. Even without enumerating them, we can score such spaces with machine learning models and run neighborhood lookups on them to great effect.
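To make "implicitly defined" concrete, here is a minimal sketch of addressing any product in a synthon space without enumerating the space. The slot names and synthon labels are hypothetical placeholders, not the contents of the lipid library used in this post, and `space_size` and `product_at` are illustrative helpers rather than functions from any library.

```python
from math import prod

# Hypothetical synthon slots with toy labels (assumed for illustration only).
slots = {
    "head": ["H1", "H2", "H3"],
    "linker": ["L1", "L2"],
    "tail": ["T1", "T2", "T3", "T4"],
}


def space_size(slots):
    """Number of products: one synthon is chosen independently per slot."""
    return prod(len(options) for options in slots.values())


def product_at(slots, index):
    """Decode a flat index into one synthon per slot (mixed-radix decoding)."""
    if not 0 <= index < space_size(slots):
        raise IndexError(index)
    choice = {}
    # The last slot varies fastest, like the last digit of a number.
    for name, options in reversed(list(slots.items())):
        index, digit = divmod(index, len(options))
        choice[name] = options[digit]
    return choice


size = space_size(slots)
first = product_at(slots, 0)
last = product_at(slots, size - 1)
```

Storage grows with the sum of the slot sizes while the number of addressable products grows with their product, which is why the implicit representation stays tractable at scale.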
Another indispensable addition to the toolkit is embedding methods, which reduce any molecule to a fixed-length vector. With these in hand, we can add prior knowledge to generative algorithms, generate analogs of existing drugs in different classes, interpolate between successful candidates, and more.
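As a rough illustration of what fixed-length vectors make possible, the sketch below interpolates between two embeddings and looks up the nearest library member by cosine similarity. The three-dimensional vectors and the names `lipid_a`, `lipid_b`, and `lipid_c` are made up for the example; a real model would produce much longer vectors, and the choice of cosine similarity is an assumption rather than the metric used by any particular model.

```python
import math


def lerp(u, v, t):
    """Linear interpolation between two embeddings, with t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def nearest(query, library, k=1):
    """Names of the k library entries most similar to the query vector."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)
    return ranked[:k]


# Toy embeddings (hypothetical values, not the output of a trained model).
library = {
    "lipid_a": [1.0, 0.0, 0.0],
    "lipid_b": [0.0, 1.0, 0.0],
    "lipid_c": [0.7, 0.7, 0.1],
}

midpoint = lerp(library["lipid_a"], library["lipid_b"], 0.5)
closest = nearest(midpoint, library, k=1)
```

In this toy setup the midpoint between `lipid_a` and `lipid_b` lands closest to `lipid_c`, which is the sense in which interpolation in embedding space can suggest intermediate candidates.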
This post shows some of this embedding functionality in action by running it on a fragment-based ionizable lipid library for LNP engineering.
Loading synthon library
Other posts cover how to construct such a library. We build on another post that saves such a library to a file. As written, the file is a self-contained recipe for a large combinatorial virtual space, but it does not specify reaction rules for putting the synthons together.
But the CSL-VAE algorithm requires such reaction rules to function. Its specification includes them:
The reaction rules are in fact implicitly defined: bonds form between marked reaction sites on the synthons.
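The idea of an implicit rule can be sketched with a toy molecular graph: each synthon carries placeholder atoms at its labeled reaction sites, and joining two synthons bonds the neighbors of equally labeled placeholders and then discards the placeholders. The `Fragment` class and `join` function are hypothetical illustrations written for this sketch, not the CSL-VAE implementation or the file format used here, and the example assumes each placeholder has exactly one bond.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    atoms: list   # element symbols; "*" marks a reaction-site placeholder
    bonds: list   # undirected bonds as (i, j) index pairs
    sites: dict   # site label -> index of its "*" placeholder atom


def join(a, b):
    """Bond the neighbors of equally labeled placeholders, then drop the placeholders."""
    offset = len(a.atoms)
    atoms = a.atoms + b.atoms
    bonds = list(a.bonds) + [(i + offset, j + offset) for i, j in b.bonds]
    sites_b = {label: idx + offset for label, idx in b.sites.items()}

    def neighbor(dummy):
        # Assumes the placeholder takes part in exactly one bond.
        (bond,) = [bd for bd in bonds if dummy in bd]
        return bond[0] if bond[1] == dummy else bond[1]

    shared = set(a.sites) & set(sites_b)
    drop, new_bonds = set(), []
    for label in shared:
        da, db = a.sites[label], sites_b[label]
        new_bonds.append((neighbor(da), neighbor(db)))
        drop |= {da, db}

    # Remove bonds to placeholders, add the new bonds, and reindex surviving atoms.
    bonds = [bd for bd in bonds if not (set(bd) & drop)] + new_bonds
    keep = [i for i in range(len(atoms)) if i not in drop]
    remap = {old: new for new, old in enumerate(keep)}
    leftover = {
        label: remap[idx]
        for label, idx in {**a.sites, **sites_b}.items()
        if label not in shared
    }
    return Fragment(
        atoms=[atoms[i] for i in keep],
        bonds=[(remap[i], remap[j]) for i, j in bonds],
        sites=leftover,
    )


# Toy head group C-N-* and toy tail *-C-C, both marked with site label 1.
head = Fragment(atoms=["C", "N", "*"], bonds=[(0, 1), (1, 2)], sites={1: 2})
tail = Fragment(atoms=["*", "C", "C"], bonds=[(0, 1), (1, 2)], sites={1: 0})
product = join(head, tail)
```

Because the rule is only "connect matching labels", no per-reaction template is needed: the labels stored with each synthon already determine how the pieces fit together.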
Loading autoencoder model
Analogue generation with neighborhood lookups
Citation
@online{balsubramani,
author = {Balsubramani, Akshay},
title = {Embedding Ultralarge Spaces},
langid = {en}
}