Version: nightly

LLMs - Causal Language Modeling

This section provides guidance and resources for Large Language Model (LLM) tasks using Neural Magic's suite of tools.

Deployment

Learn how to seamlessly deploy sparsified LLMs for accelerated text generation and reduced resource demands.

๐Ÿ“„๏ธ Serving LLMs

DeepSparse is a CPU inference runtime that takes advantage of sparsity to accelerate neural network inference. Coupled with SparseML, our optimization library for pruning and quantizing your models, DeepSparse delivers exceptional inference performance on CPU hardware.
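As a rough illustration of what serving a sparsified LLM can look like, here is a minimal server-config sketch. The endpoint fields and the model stub below are assumptions for illustration only, not values taken from this page; see the Serving LLMs guide for the supported schema and real SparseZoo stubs.

```yaml
# Hypothetical deepsparse.server config (field names and stub are assumptions):
# a single text-generation endpoint backed by a sparsified model.
# Typically launched with something like:
#   deepsparse.server --config-file server-config.yaml
# (the exact flag name may differ by version).
endpoints:
  - task: text_generation
    model: zoo:some/sparsified-llm-stub   # illustrative placeholder stub
```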

Optimizing LLMs

Discover state-of-the-art techniques to significantly reduce the footprint and increase the inference speed of your LLMs.

Optimizing LLMs with SparseML

Discover how to optimize LLMs using SparseML, Neural Magic's open-source library for model optimization.

Data Formats

Explore the most common data formats for LLMs, including text, JSONL, and more.

SparseML Data Formats

Learn about the most common data formats for LLMs, including text, JSONL, and more.
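To make the JSONL format concrete, here is a minimal stdlib-only sketch that writes and reads back a line-delimited dataset. The `text` field name and the file name are assumptions for illustration; match whatever schema your pipeline expects.

```python
import json

# Each line of a JSONL file is one standalone JSON object.
# A hypothetical dataset of "text" records (field name is an assumption).
records = [
    {"text": "DeepSparse accelerates inference on CPUs."},
    {"text": "SparseML prunes and quantizes models."},
]

# Write: one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read: parse each line independently.
with open("train.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded))  # prints 2
```

Because each line is self-contained, JSONL files can be streamed and appended to without re-parsing the whole file, which is why they are common for LLM training data.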

Presparsified Models

Discover a selection of presparsified LLMs to help you get started with your text generation tasks.

๐Ÿ“„๏ธ Sparse LLMs

Discover and utilize optimized LLM models from SparseZoo and Hugging Face Hub for efficient DeepSparse deployment.

Guides

Explore best practices and step-by-step tutorials for specific LLM use cases.

๐Ÿ“„๏ธ Guides

Best practices and step-by-step tutorials for specific LLM use cases.