Protein language models through the logit lens
May 21, 2025. Applying the logit lens to ESM-2. Logit visualization & interpretation, attention analysis, looking inside the mind of a protein language model.
Written by Liam Bai, who works on software at Ginkgo Bioworks and writes about math, AI, and biology. He's on LinkedIn and Twitter.
Predicting variant effects with variational autoencoders. DeepSequence, EVE, EVEScape. Machine learning for clinical decisions, improving pandemic preparedness by predicting antibody escape.
Predicting protein function using deep generative models. Latent variable models, reconstruction, variational autoencoders (VAEs), Bayesian inference, evidence lower bound (ELBO).
Protein design by hallucination. DeepDream, Markov Chain Monte Carlo (MCMC), KL divergence, gradient optimization, scaffolding functional sites, SARS-CoV-2 receptor traps.
Learning protein representations. Transfer learning, protein language models, contextual embeddings, Transformers, masked language modeling, BERT, UniRep, ESM, attention analysis.
Predicting protein structure and function. Multiple Sequence Alignments (MSAs), the protein folding problem, the Potts model, Direct Coupling Analysis (DCA), EVCouplings.