Why say lot word when few do trick?
June 08, 2025
The Minimum Description Length (MDL) principle, Kolmogorov complexity, linear regression, data compression and learning.
Written by Liam Bai, who works on software at Ginkgo Bioworks and writes about math, AI, and biology. He's on LinkedIn and Twitter.