LLMs - Causal Language Modeling
This section provides guidance and resources for Large Language Model (LLM) tasks using Neural Magic's suite of tools.
Deployment
Learn how to seamlessly deploy sparsified LLMs for accelerated text generation and reduced resource demands.
Serving LLMs
DeepSparse is a CPU inference runtime that takes advantage of sparsity to accelerate neural network inference. Coupled with SparseML, our optimization library for pruning and quantizing your models, DeepSparse delivers exceptional inference performance on CPU hardware.
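The intuition behind sparsity-accelerated inference can be sketched in plain Python. This is a conceptual illustration only, not DeepSparse's implementation (its CPU kernels are vectorized and cache-aware): a pruned weight matrix stores only its nonzero entries, so multiply-accumulate work scales with the number of nonzeros rather than the full matrix size.

```python
# Conceptual sketch only -- not DeepSparse's actual kernels.
# Shows why skipping zero weights reduces inference work.

def dense_matvec(weights, x):
    """Dense matrix-vector product: every entry is multiplied, zeros included."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def to_sparse(weights):
    """Store only (column, value) pairs for the nonzero weights in each row."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in weights]

def sparse_matvec(sparse_rows, x):
    """Sparse product: multiply-accumulate only over the stored nonzeros."""
    return [sum(w * x[j] for j, w in row) for row in sparse_rows]

# A heavily pruned 4x4 layer: most entries are zero.
weights = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
]
x = [1.0, 2.0, 3.0, 4.0]

dense_out = dense_matvec(weights, x)
sparse_out = sparse_matvec(to_sparse(weights), x)
assert dense_out == sparse_out  # same result, far fewer multiplies
```

Here the sparse path performs 3 multiplies instead of 16; at the 50-90% sparsity levels typical of pruned LLMs, this kind of work reduction is what a sparsity-aware runtime exploits.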
Optimizing LLMs
Discover state-of-the-art techniques to significantly reduce the footprint and increase the inference speed of your LLMs.
Optimizing LLMs with SparseML
Discover how to optimize LLMs using SparseML, Neural Magic's open-source library for model optimization.
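SparseML drives optimization through recipes, but the core idea of magnitude pruning is simple enough to sketch in plain Python. The snippet below is a conceptual illustration, not SparseML's API: weights closest to zero contribute least to the output, so they are removed first.

```python
# Conceptual sketch of unstructured magnitude pruning -- not SparseML's API.
# SparseML applies this idea gradually during training via recipes.

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * sparsity)                # number of weights to remove
    threshold = flat[k - 1] if k > 0 else float("-inf")
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.01, -1.5, 0.3, -0.02, 2.1, 0.05, -0.4, 1.0]
pruned = magnitude_prune(weights, sparsity=0.5)

# Half the weights (the smallest magnitudes) are now exactly zero.
assert pruned.count(0.0) == 4
```

In practice, pruning is applied iteratively with fine-tuning between steps so the model can recover accuracy; the recipes covered in the SparseML guide automate that schedule.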
Data Formats
Explore the most common data formats for LLMs, including text, JSONL, and more.
SparseML Data Formats
A reference for the data formats SparseML accepts in LLM workflows, including raw text and JSONL.
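As a concrete illustration of the JSONL format, each line of the file is an independent JSON object. The field names below (`prompt`/`completion`) are assumptions for illustration; check the SparseML data-format docs for the exact schema your workflow expects.

```python
import json

# One JSON object per line; the "prompt"/"completion" field names are
# illustrative only -- confirm the exact schema in the SparseML docs.
samples = [
    {"prompt": "Translate to French: Hello", "completion": "Bonjour"},
    {"prompt": "Summarize: DeepSparse runs sparse models on CPUs.",
     "completion": "DeepSparse accelerates sparse models on CPU."},
]

# Writing: serialize each record on its own line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")

# Reading: parse each line independently, so large files can be streamed.
with open("train.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

assert loaded == samples
```

Because each line stands alone, JSONL files can be streamed, appended to, and split without parsing the whole file, which is why the format is common for LLM training data.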
Pre-Sparsified Models
Discover a selection of pre-sparsified LLMs to help you get started with your text generation tasks.
Sparse LLMs
Discover and utilize optimized LLM models from SparseZoo and Hugging Face Hub for efficient DeepSparse deployment.
Guides
Explore best practices and step-by-step tutorials for specific LLM use cases.
Guides
Best practices and step-by-step tutorials covering specific LLM use cases.