Publications

February 26, 2025

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs

November 18, 2024

Empirical Upper Bounds for Unstructured Sparsity in Compute-Efficient Language Modeling

November 01, 2024

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

October 31, 2024

Sparse maximal update parameterization: A holistic approach to sparse training dynamics

October 13, 2024

Self-Data Distillation for Recovering Quality in Pruned Large Language Models

September 04, 2024

Bilingual Adaptation of Monolingual Foundation Models

May 20, 2024

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models

May 15, 2024

Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System

May 15, 2024

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

November 30, 2023

Efficient and Approximate Per-Example Gradient Norms for Gradient Noise Scale

November 13, 2023

Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware
