Feb 23 2026

ExomeBench: A Benchmark for Clinical Variant Interpretation in Exome Regions

1. What is ExomeBench?

We are excited to announce the public release of ExomeBench, a reproducible benchmark for clinically relevant variant interpretation in exome regions. This benchmark is designed to help researchers evaluate and improve models for health-relevant predictions, complementing existing tools and datasets in genomics. This post summarizes the benchmark tasks, baseline results, and how to get started.

There has been tremendous progress in DNA and genomics modeling with transformer-based models, such as Nucleotide Transformer[1] and Evo[2,3]. These models are typically evaluated on structural and functional genomics tasks, such as predicting regulatory elements, chromatin accessibility, or other sequence-level properties, and they achieve impressive performance on these benchmarks. However, because most existing benchmarks focus on general sequence modeling, it is unclear how well these state-of-the-art models perform on clinically relevant tasks, for example, predicting the pathogenicity of a variant or its association with disease phenotypes. What has been missing is an evaluation of clinically meaningful variant interpretation: whether a model can use context from exome regions to support health-relevant predictions.

ExomeBench aims to fill this gap. Built from ClinVar-derived single-nucleotide variants in exome regions, it includes five supervised classification tasks that emphasize clinically relevant signals. Unlike existing benchmarks, it focuses on questions that arise in medical and translational genomics, so researchers can test models on the kinds of predictions that matter in clinical settings.

By complementing existing benchmarks, ExomeBench provides a public resource to help researchers and clinicians iterate faster and develop novel modeling techniques for clinical genomics.

Key takeaways

A standardized, reproducible benchmark for exome-region variant interpretation curated from the ClinVar database.
Five supervised classification tasks spanning pathogenicity, phenotype-linked association, and gene attribution—aligned with practical clinical genomics questions.
Public baselines and experimental artifacts to support transparent, repeatable evaluation and easy comparisons across model families.

Important note on intended use

ExomeBench is a research benchmark. It is not a diagnostic tool and should not be used to make clinical decisions. ClinVar labels reflect submitted interpretations and evidence levels; any model performance reported here does not constitute clinical validation.

2. ExomeBench dataset overview

ExomeBench is built from ClinVar (November 2024 release) and focuses on variants in exome regions (the repository provides a detailed definition and region mask). We include only single-nucleotide variants (SNVs), the most common type of genetic variant in exomes.

For each variant, the input to a model is a DNA sequence window of 100 base pairs centered on the variant. These sequences are extracted from the GRCh38 reference genome, with the reference or alternate allele applied at the center position (see the repository for exact encoding details).
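
As a rough illustration of the windowing described above, the sketch below builds a fixed-length window around an SNV from an in-memory chromosome string. It is not the repository's code: the function name `variant_window`, the 1-based position convention, and the choice to place the variant at 0-based index 50 of a 100-bp window (an even-length window has no exact center) are assumptions made here. The repository remains the reference for the actual encoding.

```python
def variant_window(chrom_seq: str, pos: int, ref: str, alt: str,
                   window: int = 100, use_alt: bool = True) -> str:
    """Return a `window`-bp sequence around a 1-based SNV position.

    Assumption: the variant sits at 0-based index window // 2 of the result.
    """
    idx = pos - 1  # convert the 1-based coordinate to a 0-based index
    if chrom_seq[idx].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")

    start = idx - window // 2
    end = start + window
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window extends past the sequence boundary")

    # Apply either the alternate or the reference allele at the variant site.
    allele = alt if use_alt else ref
    return (chrom_seq[start:idx] + allele + chrom_seq[idx + 1:end]).upper()


# Toy example on a synthetic 200-bp sequence (not real genome data).
toy_chrom = "ACGT" * 50
alt_window = variant_window(toy_chrom, pos=101, ref="A", alt="G")
ref_window = variant_window(toy_chrom, pos=101, ref="A", alt="G", use_alt=False)
```

In this toy case the two windows differ only at the variant site, which is the property a model needs in order to compare reference and alternate alleles.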

The dataset contains a total of over 158k variants across five clinically relevant tasks:

47k for pathogenicity prediction (PV),
35k for cancer-predisposing syndromes (CPS),
19k for cardiovascular phenotypes (CP),
13k for BRCA classification, and
44k for the top five genes task (TFG).

Each task has integer-encoded labels and an accompanying published label map for consistent interpretation.

To support proper evaluation, the data is split into train, validation, and test sets, ensuring that no exact variant appears in more than one split. For the PV task, we take an extra precaution and use gene-disjoint splits: all variants from the same gene are kept in a single split. This prevents models from memorizing gene-specific patterns and makes performance a better indicator of generalization to new genes.
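
The idea behind a gene-disjoint split can be sketched in a few lines. This is a minimal illustration, not the script released with the benchmark: the function name `gene_disjoint_split`, the `"gene"` record key, the 80/10/10 target fractions, and the greedy assignment strategy are all assumptions chosen for the example.

```python
import random
from collections import defaultdict


def gene_disjoint_split(variants, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole genes to train/validation/test so no gene spans two splits.

    `variants` is a list of dicts that each carry a "gene" key (hypothetical
    schema). `fractions` are target shares of variants, not of genes.
    """
    by_gene = defaultdict(list)
    for variant in variants:
        by_gene[variant["gene"]].append(variant)

    genes = sorted(by_gene)                 # deterministic starting order
    random.Random(seed).shuffle(genes)      # reproducible shuffle

    names = ("train", "validation", "test")
    targets = [f * len(variants) for f in fractions]
    splits = {name: [] for name in names}

    # Greedy: give each gene to the split that is furthest below its target.
    for gene in genes:
        deficits = [targets[i] - len(splits[names[i]]) for i in range(3)]
        chosen = max(range(3), key=lambda i: deficits[i])
        splits[names[chosen]].extend(by_gene[gene])
    return splits


# Toy data: 20 synthetic genes with 5 variants each.
toy_variants = [{"gene": f"G{i}", "pos": j} for i in range(20) for j in range(5)]
toy_splits = gene_disjoint_split(toy_variants)
```

Because genes differ in size, the resulting split sizes only approximate the target fractions; the guarantee is that each gene appears in exactly one split.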

To keep the dataset reproducible, we release the scripts used to build it from the ClinVar database.

3. ExomeBench tasks

ExomeBench includes five supervised classification tasks:

Pathogenic Variant Prediction (PV, 4-class): Predict clinical significance of a variant as one of {pathogenic, likely pathogenic, likely benign, benign}, according to ClinVar. To prevent models from memorizing gene-specific patterns, PV uses gene-disjoint splits, where all variants from the same gene are assigned to the same split.
Cancer-Predisposing Syndrome (CPS, binary): Predict whether the variant is associated with a hereditary cancer-predisposing syndrome, using ClinVar’s condition and phenotype annotations.
Cardiovascular Phenotype (CP, binary): Similar in style to CPS, predict whether the variant is linked to cardiovascular conditions, according to ClinVar annotations.
BRCA classification (BRCA, 3-class): Predict whether the variant is in BRCA1, BRCA2, or neither, reflecting a real-world diagnostic scenario in hereditary breast and ovarian cancer.
Top 5 Genes (TFG, multi-class): Predict the variant’s gene label among the five most frequent genes used for this task (as defined in the dataset metadata).

4. Baseline model performance (MCC)

We report Matthews correlation coefficient (MCC) on the test set (higher is better). MCC is informative for imbalanced classification, which is common in clinical variant datasets.
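
For readers who want to see how MCC is computed, here is a small dependency-free sketch of the multi-class form of the metric, which reduces to the familiar binary formula when there are two classes. The function name `mcc` and the convention of returning 0.0 when the denominator is zero are choices made for this example; the benchmark's own evaluation code may use a library implementation instead.

```python
import math
from collections import Counter


def mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from integer labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    total = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)   # how often each class truly occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    classes = set(true_counts) | set(pred_counts)

    numerator = correct * total - sum(
        true_counts[k] * pred_counts[k] for k in classes
    )
    spread_true = total * total - sum(n * n for n in true_counts.values())
    spread_pred = total * total - sum(n * n for n in pred_counts.values())
    denominator = math.sqrt(spread_true * spread_pred)

    # Convention assumed here: undefined MCC (zero denominator) maps to 0.0.
    return numerator / denominator if denominator else 0.0


# Toy labels: predicting the same class everywhere gives an MCC of 0.0,
# even though accuracy on imbalanced data could still look high.
constant_score = mcc([0, 0, 0, 0, 1], [0, 0, 0, 0, 0])
```

This is why MCC is a useful headline metric for imbalanced tasks: a model that ignores the minority class gets no credit.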

Note: Some tasks (e.g., gene-attribution tasks like TFG) can be close to saturated and may be less discriminative across strong models; we recommend focusing comparisons on PV/CPS/CP for clinical interpretation signal.

5. Getting started

To use ExomeBench, start with the dataset on Hugging Face and follow the end-to-end workflows in the project’s GitHub repository (data loading, training, evaluation, and hyperparameter sweeps). Those resources are the source of truth for reproducing the baselines and running your own models.

Contact

Corresponding Email: exome-bench@cerebras.net

Citation

This benchmark was developed as part of the work supporting the STRAND paper (in collaboration with Mayo Clinic).

If you use ExomeBench, please (1) cite the STRAND paper and (2) acknowledge ClinVar and GRCh38 as the underlying sources used to construct the benchmark.

Ayanian, S. et al. Introducing a foundational sequence transformer for range adaptive nucleotide decoding (STRAND). Briefings in Bioinformatics, Volume 26, Issue 6, November 2025, bbaf618. https://doi.org/10.1093/bib/bbaf618

References

[1] Dalla-Torre, H., Gonzalez, L., Mendoza-Revilla, J. et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat Methods 22, 287–297 (2025). https://doi.org/10.1038/s41592-024-02523-z
[2] Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024). https://doi.org/10.1126/science.ado9336
[3] Brixi, G., Durrant, M. G., Ku, J. et al. Genome modeling and design across all domains of life with Evo 2. bioRxiv 2025.02.18.638918 (2025). https://doi.org/10.1101/2025.02.18.638918