Episode 293

February 17, 2026

00:17:53

293: IndeLLM (ESM2) zero-shot scoring and Siamese transfer learning for in-frame indel prediction (MCC 0.77)

Hosted by

Gustavo B Barra
Base by Base


Show Notes

Gracia Carmona O et al., Patterns. 7 (2026) 101425 - IndeLLM uses protein language models (ESM2) to score in-frame indels, and adds a compact Siamese transfer-learning model that achieves state-of-the-art pathogenicity prediction with a Matthews correlation coefficient (MCC) of 0.77.

Study Highlights:
Using human protein sequences and ESM2 embeddings, the authors develop IndeLLM, a zero-shot scoring function that sums overlapping-region probabilities to correct length bias in in-frame indels. They train a compact Siamese one-hidden-layer network on PLM embeddings with biologically guided embedding splitting and achieve MCC = 0.77 on the test set. Per-residue probability differences mapped onto structures (FGFR1, GLMN) identify local regions affected by indels and improve interpretability. The framework reduces insertion false negatives and is released with Colab and GitHub tools for indel annotation and disease-variant analysis.
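To make the length-bias point concrete, here is a minimal Python sketch of scoring only the region shared by the wild-type and mutant sequences. It is an illustration under assumptions, not the authors' implementation: the function names (overlap_indices, overlap_score), the use of summed per-residue log-probabilities, and the prefix/suffix alignment are all hypothetical choices, and the per-residue values would in practice come from a protein language model such as ESM2 rather than being supplied by hand.

```python
from typing import Sequence, Tuple


def overlap_indices(wt: str, mut: str) -> Tuple[int, int]:
    """Return (prefix_len, suffix_len): residues shared at each end.

    Assumption: a single in-frame indel, so the two sequences agree on a
    common prefix and a common suffix around the inserted/deleted residues.
    """
    max_shared = min(len(wt), len(mut))
    prefix = 0
    while prefix < max_shared and wt[prefix] == mut[prefix]:
        prefix += 1
    suffix = 0
    # Cap the suffix so prefix and suffix never overlap in the shorter sequence.
    while suffix < max_shared - prefix and wt[-1 - suffix] == mut[-1 - suffix]:
        suffix += 1
    return prefix, suffix


def overlap_score(
    wt: str,
    mut: str,
    wt_logp: Sequence[float],
    mut_logp: Sequence[float],
) -> float:
    """Mutant-minus-wild-type sum of per-residue log-probabilities,
    restricted to positions present in both sequences.

    Both sums cover the same number of residues, so a longer or shorter
    sequence does not shift the score by length alone.
    """
    prefix, suffix = overlap_indices(wt, mut)
    wt_sum = sum(wt_logp[:prefix]) + sum(wt_logp[len(wt) - suffix:])
    mut_sum = sum(mut_logp[:prefix]) + sum(mut_logp[len(mut) - suffix:])
    return mut_sum - wt_sum


# Toy example with made-up log-probabilities (not real model output).
wt_seq, mut_seq = "MKTAYIA", "MKTYIA"  # one residue deleted
wt_lp = [-1.0] * 7
mut_lp = [-1.0, -1.0, -2.0, -2.0, -1.0, -1.0]
score = overlap_score(wt_seq, mut_seq, wt_lp, mut_lp)
```

In this sketch a negative score means the shared residues became less likely in the mutant context. The sign convention and any threshold for calling a variant pathogenic are assumptions here and should be taken from the paper or its GitHub tools.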

Conclusion:
IndeLLM zero-shot scoring and a small Siamese transfer-learning model provide improved, interpretable indel pathogenicity prediction, with the Siamese model achieving MCC = 0.77.
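For listeners unfamiliar with the metric, MCC summarizes a binary confusion matrix in a single value between -1 and 1, where 1 is perfect prediction and 0 is no better than chance. The short Python sketch below computes it from the standard formula; the function name mcc and the example counts are illustrative only and are not taken from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not the study's results).
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
```

Because MCC uses all four cells of the confusion matrix, it stays informative when pathogenic and benign variants are imbalanced, which is one reason it is often reported for variant-effect predictors.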

Music:
Enjoy the music based on this article at the end of the episode.

Reference:
Gracia Carmona O, Leipart V, Amdam GV, Orengo C, Fraternali F. Leveraging protein language models and a scoring function for indel characterization and transfer learning. Patterns. 7 (2026) 101425. https://doi.org/10.1016/j.patter.2025.101425

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/

Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00

Official website: https://basebybase.com

On PaperCast Base by Base you’ll discover the latest in genomics, functional genomics, structural genomics, and proteomics.

Episode link: https://basebybase.castos.com/episodes/indellm-indel-siamese-model
