Version: nightly

Sparse LLMs

Neural Magic offers a wide range of optimized Large Language Models (LLMs) ready for efficient inference or further fine-tuning on your datasets. This guide helps you explore and use models from our Hugging Face Hub organization and from SparseZoo.

The following curated collections contain state-of-the-art LLMs that have been rigorously sparsified for inference performance, each with usage guides for its ready-to-use optimized models:

📄️ Llama 2

Discover Neural Magic's optimized Llama 2 models and see how sparsification speeds up LLM inference.

Hugging Face Hub Models

The Hugging Face Hub hosts a large repository of LLMs that are compatible with DeepSparse. These models may not have undergone the same extensive quality assurance as SparseZoo models, but they are often available sooner, which can shorten your path to production. Always evaluate Hugging Face Hub models to confirm their results meet your quality expectations.

Use the following links to find LLMs on our Hugging Face Hub:

Neural Magic Organization

Find official models optimized by the Neural Magic team.

Sparse LLMs Collection

Explore LLMs with at least 50% weight pruning and DeepSparse optimization.

NM-Testing

Discover experimental models created by the community.
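As a rough illustration of what "at least 50% weight pruning" in the Sparse LLMs Collection means, the sketch below measures a layer's sparsity as the fraction of weights that are exactly zero and compares it against a 50% threshold. It is a plain-Python toy, assuming a layer's weights are given as a list of rows; the helper names `weight_sparsity` and `meets_pruning_threshold` are hypothetical and are not part of DeepSparse, SparseZoo, or any Hugging Face API.

```python
from typing import Sequence


def weight_sparsity(weights: Sequence[Sequence[float]]) -> float:
    """Return the fraction of entries in a 2-D weight matrix that are exactly zero.

    Hypothetical helper for illustration only.
    """
    total = 0
    zeros = 0
    for row in weights:
        for w in row:
            total += 1
            if w == 0.0:  # a pruned weight is stored as an exact zero
                zeros += 1
    if total == 0:
        raise ValueError("weight matrix is empty")
    return zeros / total


def meets_pruning_threshold(weights: Sequence[Sequence[float]], threshold: float = 0.5) -> bool:
    """Check whether at least `threshold` of the weights have been pruned (hypothetical)."""
    return weight_sparsity(weights) >= threshold


# Toy 2x4 layer: 5 of its 8 weights are zero.
layer = [
    [0.0, 0.31, 0.0, -0.12],
    [0.0, 0.0, 0.58, 0.0],
]
print(weight_sparsity(layer))          # 0.625
print(meets_pruning_threshold(layer))  # True
```

Real models apply this idea per layer across many large tensors, and sparsity-aware runtimes such as DeepSparse can skip the zeroed weights at inference time.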

SparseZoo Models

SparseZoo houses a collection of LLMs that have undergone a rigorous testing process to ensure they maintain a high level of accuracy even after pruning and quantization. These models are optimized specifically for seamless integration with DeepSparse, saving you the time and effort of additional optimization steps.

Visit the following page to dive into the full range of LLMs available on the SparseZoo:

SparseZoo LLMs

Explore pre-sparsified LLMs available in the SparseZoo.