Guides
Explore best practices and step-by-step tutorials for specific LLM use cases. Dive into the following guides to learn how to use LLMs to build your own applications.
📄️ Why is Sparsity Important for LLMs?
Large Language Models (LLMs) are often so large that they pose challenges for computational efficiency and memory usage. Weight sparsity is a technique that can significantly alleviate these issues, enhancing the practicality and scalability of LLMs. Here we outline the key benefits of weight sparsity in LLMs, focusing on three main aspects.
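As a quick, self-contained illustration of why sparsity helps (a toy NumPy/SciPy sketch, not DeepSparse's actual sparsity-aware kernels): once most weights in a layer are pruned to zero, the layer can be stored in a compressed format and a matrix-vector product only needs multiply-adds for the weights that survive.

```python
# Toy illustration of weight sparsity (NumPy/SciPy, not DeepSparse kernels):
# prune a dense layer to 75% sparsity, then compare storage and work.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.standard_normal((4096, 4096)).astype(np.float32)

# Magnitude pruning: zero out the 75% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(dense), 0.75)
pruned = np.where(np.abs(dense) >= threshold, dense, 0.0)

# Compressed (CSR) storage keeps only the nonzero values plus their indices.
csr = sparse.csr_matrix(pruned)
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(f"dense storage:  {dense.nbytes / 1e6:.1f} MB")
print(f"sparse storage: {sparse_bytes / 1e6:.1f} MB")

# A matrix-vector product against the pruned layer only multiplies the
# ~25% of weights that survived pruning.
x = rng.standard_normal(4096).astype(np.float32)
y = csr @ x
print(f"multiply-adds needed: {csr.nnz} of {dense.size}")
```

Real deployments go further than this sketch: dedicated formats avoid most of the per-element index overhead, and inference engines such as DeepSparse use sparsity-aware kernels rather than generic sparse-matrix libraries.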
📄️ Convert LLMs From Hugging Face
This guide is for people interested in exporting their Hugging Face-compatible LLMs to work in DeepSparse.
📄️ Compress LLMs With SparseGPT
This page describes how to perform one-shot quantization of large language models using SparseML. This workflow requires a GPU with at least 16GB of VRAM and 64GB of system RAM.
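To make the term concrete before diving into the guide, the sketch below shows the arithmetic behind post-training ("one-shot") INT8 weight quantization on a single tiny matrix. It is only a conceptual illustration, not the SparseML/SparseGPT API: SparseGPT-style methods additionally use calibration data and error correction to decide how to round, which is what the recipe in the guide handles.

```python
# Conceptual sketch of one-shot (post-training) INT8 weight quantization.
# Not the SparseML API; it only shows the per-channel scale/round/dequantize
# arithmetic that such workflows apply to each weight matrix.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)  # one tiny weight matrix

# Symmetric per-output-channel scales map float weights onto [-127, 127].
scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)

# INT8 kernels use q and scales directly; dequantizing shows the error introduced.
w_hat = q.astype(np.float32) * scales
print("max abs quantization error:", np.abs(w - w_hat).max())
```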
📄️ Sparse Fine-Tuning LLMs on GSM8k
A guide to sparse fine-tuning the Llama2 7B model on the GSM8K dataset, including the steps, commands, and recipes used for optimization.
📄️ LLM Serving on Windows
This guide covers running a large language model (LLM) for text generation on Windows using Windows Subsystem for Linux (WSL) and DeepSparse Server.
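For orientation, here is a minimal sketch of what running a sparse LLM from inside a WSL shell can look like. It uses DeepSparse's Python text-generation pipeline rather than the HTTP server the guide sets up; the `TextGeneration` class, its arguments, and the output fields are assumptions based on recent DeepSparse releases, and the model stub is a placeholder to replace with a real SparseZoo stub or local deployment directory. DeepSparse itself requires Linux, which is why the guide routes through WSL on Windows.

```python
# Minimal sketch (assumed API) of text generation with DeepSparse inside WSL.
# The guide itself launches DeepSparse Server and sends HTTP requests instead.
from deepsparse import TextGeneration  # assumed import path

# Placeholder: substitute a real SparseZoo stub or a local deployment directory.
MODEL = "zoo:<sparse-llm-stub>"

pipeline = TextGeneration(model=MODEL)  # argument name assumed
output = pipeline(prompt="Write a haiku about sparsity.", max_new_tokens=64)
print(output.generations[0].text)  # output field names assumed
```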