TL;DR: This article collects linear algebra methods for working with matrices that fall within the standard numpy/scipy stack.
Linear algebra provides some of the fastest and most fundamental computational tools that AI methods use. For those who want to work efficiently with high-resolution sparse data, linear algebra methods are indispensable, even when other machine learning methods abstract them away.
The workhorse of all linear algebra is the singular value decomposition (SVD), which breaks down matrices quickly; it is thoroughly studied and very well implemented. But when dealing with data from proteins, chemicals, signaling molecules, and nucleic acid sequences, the sparsity and statistical patterns can tax even the SVD. Using it correctly and getting reliable results requires guides and usage examples, for humans and other agents alike, which carry tacit information that is not explicitly documented.
We exhibit many such methods from linear algebra for these and other purposes.
## A guide to these posts
There are a number of posts on specific uses of linear algebra tools, each overviewing a set of tools in its modern form. These tools are among the most efficient known on GPUs for various algorithmic tasks.
s ="""- [The SVD and its uses](https://akshay.bio/variable-expectations/posts/linear-algebra/SVD-and-friends.html)- [Random projections and dimensionality reduction](https://akshay.bio/variable-expectations/posts/linear-algebra/RP-dimension-reduction.html)- [SVD-based imputation methods](https://akshay.bio/variable-expectations/posts/linear-algebra/SVD-imputation.html)- [Low-rank approximation of matrices and graphs](https://akshay.bio/variable-expectations/posts/linear-algebra/Low-rank-Nystrom.html)- [Solving sparse linear equations](https://akshay.bio/variable-expectations/posts/linear-algebra/Sparse-linear-solvers.html)"""
CODE
decomp_tools_path ="../../files/utils/"+"mermaid_tools.py"from importlib.machinery import SourceFileLoadermermaid_tools = SourceFileLoader("mermaid_tools", decomp_tools_path).load_module()import re, pandas as pd# pat = re.compile(r'-\s*\[([^\]]+)]\(([^)]+)\)')pat = re.compile(r'-\s*\[([^\]]*)\]\(([^)]*)\)') # * instead of + → empty allowedtitles, urls, names = [], [], []for title, url in pat.findall(s):if title or url: # keep the row if at least one field is non-empty titles.append(title) urls.append(url) names.append(url.split("/")[-1].split(".")[0])node_df = pd.DataFrame({"label": titles,"url": urls, "name": names, "id": names})node_df
```mermaid
flowchart LR
SVD-and-friends --> RP-dimension-reduction
SVD-and-friends --> SVD-imputation
SVD-and-friends --> Low-rank-Nystrom
Low-rank-Nystrom --> Sparse-linear-solvers
SVD-and-friends["The SVD and its uses"]
click SVD-and-friends "https://akshay.bio/variable-expectations/posts/linear-algebra/SVD-and-friends.html" "The SVD and its uses"
RP-dimension-reduction["Random projections and dimensionality reduction"]
click RP-dimension-reduction "https://akshay.bio/variable-expectations/posts/linear-algebra/RP-dimension-reduction.html" "Random projections and dimensionality reduction"
SVD-imputation["SVD-based imputation methods"]
click SVD-imputation "https://akshay.bio/variable-expectations/posts/linear-algebra/SVD-imputation.html" "SVD-based imputation methods"
Low-rank-Nystrom["Low-rank approximation of matrices and graphs"]
click Low-rank-Nystrom "https://akshay.bio/variable-expectations/posts/linear-algebra/Low-rank-Nystrom.html" "Low-rank approximation of matrices and graphs"
Sparse-linear-solvers["Solving sparse linear equations"]
click Sparse-linear-solvers "https://akshay.bio/variable-expectations/posts/linear-algebra/Sparse-linear-solvers.html" "Solving sparse linear equations"
```
## Linear algebra for summarization
The SVD and related methods can summarize a matrix by a few vectors in a way that is highly quantifiable. Since data come to machine learning and AI methods as matrices by default, such summarization is extremely relevant: it reduces high-dimensional data to a few dimensions with high fidelity, using only sparse matrix multiplications.
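As a minimal sketch of this kind of summarization (the matrix and rank below are illustrative), a truncated SVD of a sparse matrix via `scipy.sparse.linalg.svds` touches the matrix only through sparse matrix-vector products:

```python
import numpy as np
import scipy.sparse
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = scipy.sparse.random(1000, 500, density=0.01, random_state=rng, format="csr")

# Rank-10 summary A ≈ U @ diag(s) @ Vt, computed via sparse matvecs only.
U, s, Vt = svds(A, k=10)
order = np.argsort(s)[::-1]            # svds returns singular values in ascending order
U, s, Vt = U[:, order], s[order], Vt[order]

# Each row of A is now summarized by 10 coordinates.
embedding = U * s
print(embedding.shape)                 # (1000, 10)
```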
Linear algebra sketching and projection tools form the backbone of most known ultra-efficient summarization techniques. Beyond the SVD and related decompositions like the polar decomposition, there are lesser-known methods that summarize a matrix by archetypal rows/columns, or by combinations of them.
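For instance, the polar decomposition splits a matrix into an orthogonal factor and a symmetric positive semidefinite one; a minimal sketch with `scipy.linalg.polar`:

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

# A = U @ P with U orthogonal (the "rotation" part) and
# P symmetric positive semidefinite (the "stretch" part).
U, P = polar(A)
print(np.allclose(A, U @ P))             # True
print(np.allclose(U.T @ U, np.eye(5)))   # True
```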
Randomness is an essential tool in linear algebra, particularly in summarization, where it can capture structure obliviously (without ever looking at the data) and extremely effectively. Random projections, sketching, and other dimensionality reduction tools are vital parts of the commonly used repertoire. Some of these tools are known only in specific applications outside their native fields, and deserve to be more widely used.
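A minimal sketch of a data-oblivious Gaussian random projection in the Johnson-Lindenstrauss style, in plain numpy (the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 5000, 128               # n points in d dimensions, projected to k
X = rng.standard_normal((n, d))

# Data-oblivious projection: R is drawn without looking at X, yet pairwise
# distances are approximately preserved (Johnson-Lindenstrauss).
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))  # approximately equal
```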
## Linear algebra for analysis
Breaking down a matrix into its component vectors also helps with analysis and with finding its structure. The SVD itself does so at a joint level across groups of examples. Co-clustering methods, which find joint structure in several matrices, are of this kind.
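One widely available implementation of this idea is scikit-learn's `SpectralCoclustering`, which finds joint row/column structure through an SVD of a normalized matrix; a sketch on synthetic data with planted blocks (the parameters are illustrative):

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

# Synthetic nonnegative matrix with planted row/column blocks.
X, rows, cols = make_biclusters(shape=(300, 200), n_clusters=4, random_state=0)

model = SpectralCoclustering(n_clusters=4, random_state=0)
model.fit(X)

# Joint structure: every row and every column receives a cluster label.
print(model.row_labels_[:10])
print(model.column_labels_[:10])
```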
More structured matrices like graphs can be broken down by slightly more sophisticated linear algebra techniques. Low-rank approximation tools like the Nyström approximation are perhaps somewhat underused in their native form, and are key tools in modern randomized linear algebra. Meanwhile, the well-studied machinery of Schur decompositions and Krylov subspaces culminates in conjugate gradient methods, which are some of the crown jewels of effective computation for analysis.
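A minimal sketch of both tools (with illustrative sizes and an assumed RBF kernel): the Nyström approximation builds K ≈ C W⁺ Cᵀ from a few landmark columns, and `scipy.sparse.linalg.cg` solves a sparse symmetric positive definite system by conjugate gradients:

```python
import numpy as np
import scipy.sparse
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))

def rbf_columns(idx):
    """Columns K[:, idx] of an RBF kernel matrix, without forming all of K."""
    d2 = ((X[:, None, :] - X[None, idx, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / X.shape[1])

# Nyström: sample m landmark columns, then K ≈ C @ pinv(W) @ C.T,
# with C = K[:, idx] and W = K[np.ix_(idx, idx)].
m = 50
idx = rng.choice(X.shape[0], size=m, replace=False)
C = rbf_columns(idx)                   # (500, 50): only m columns are ever formed
W = C[idx]                             # (50, 50) landmark-landmark block
K_nystrom = C @ np.linalg.pinv(W) @ C.T

K_full = rbf_columns(np.arange(X.shape[0]))
print(np.linalg.norm(K_full - K_nystrom) / np.linalg.norm(K_full))  # relative error

# Conjugate gradients on a sparse SPD system (a 1D Laplacian), the other tool named above.
L = 2.0 * scipy.sparse.eye(500) - scipy.sparse.eye(500, k=1) - scipy.sparse.eye(500, k=-1)
b = rng.standard_normal(500)
x, info = cg(L.tocsr(), b)
print(info)                            # 0 indicates convergence
```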
SVD-based imputation methods also exist, and work well as part of broader pipelines or at scales where only sparse matrix multiplications are feasible.
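A minimal sketch of the iterative flavor of such methods (not the exact algorithm of the post above): alternate a truncated SVD of the current estimate with re-imposing the observed entries.

```python
import numpy as np

rng = np.random.default_rng(0)
# Rank-3 ground truth, with 40% of the entries missing.
A = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 100))
mask = rng.random(A.shape) < 0.6       # True where an entry is observed

X = np.where(mask, A, 0.0)             # initialize missing entries at zero
for _ in range(50):
    # Truncated SVD of the current estimate ...
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_low = (U[:, :3] * s[:3]) @ Vt[:3]
    # ... then re-impose the observed entries.
    X = np.where(mask, A, X_low)

# Relative error on the entries that were never observed.
print(np.linalg.norm((X - A)[~mask]) / np.linalg.norm(A[~mask]))
```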
## Linear algebra for visualization
These linear algebra methods are especially effective in visualization and interactive, user-guided computation.
All spectral graph methods are essentially linear algebra methods on the adjacency matrices of graphs. In common data science practice, the graph is a kNN graph with constant degree, and there are many powerful methods to construct such graphs so that they reflect the structure of the data manifold.
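For instance, a kNN graph with constant degree can be built as a sparse adjacency matrix in one call with scikit-learn's `kneighbors_graph` (the data and parameters are illustrative):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))    # 1000 points in 20 dimensions

# Sparse adjacency of the kNN graph: every row has exactly 15 nonzeros.
A = kneighbors_graph(X, n_neighbors=15, mode="connectivity", include_self=False)
A = 0.5 * (A + A.T)                    # symmetrize for downstream spectral methods
print(A.shape, A.nnz)
```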
Once such a graph is constructed, linear algebra methods on it reveal nonparametric structure at all scales. Since they involve only iterative matrix multiplications, they are among the most efficient tools of their kind for tasks like clustering and structure determination. We show in interactive dashboards how these tools can operate in real time, giving users seamless ways of performing computations on data.
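A minimal sketch of this machinery (repeating the kNN construction above so the snippet is self-contained; parameters are illustrative): the leading eigenvectors of the symmetrically normalized adjacency, computed by Lanczos iterations that only ever perform sparse matrix multiplications, give a spectral embedding suitable for clustering.

```python
import numpy as np
import scipy.sparse
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
A = kneighbors_graph(X, n_neighbors=15, mode="connectivity", include_self=False)
A = 0.5 * (A + A.T)                    # symmetric kNN adjacency

# Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}; its top
# eigenvectors are computed by Lanczos iterations (sparse matvecs only).
deg = np.asarray(A.sum(axis=1)).ravel()
D_inv_sqrt = scipy.sparse.diags(1.0 / np.sqrt(deg))
A_norm = D_inv_sqrt @ A @ D_inv_sqrt

vals, vecs = eigsh(A_norm, k=8, which="LA")
embedding = vecs                       # (1000, 8) spectral coordinates per point
print(embedding.shape)
```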