Deploying LLMs
Deploy large language models (LLMs) for text generation using Neural Magic's DeepSparse. This doc includes code examples, performance benchmarking, and server setup.
Deploy large language models (LLMs) for text generation using Neural Magic's DeepSparse. This doc includes code examples, performance benchmarking, and server setup.
Install SparseZoo, Neural Magic's repository of pre-sparsified models, or learn how to access it through SparseML and DeepSparse.
Optimize large language models (LLMs) for efficient inference using one-shot pruning and quantization. Learn how to improve model performance and reduce costs without sacrificing accuracy.
Improve the performance of your large language models (LLMs) through fine-tuning with Neural Magic's SparseML. Optimize LLMs for specific tasks while maintaining accuracy.
Adapt large language models (LLMs) to new domains and tasks using sparse transfer learning with Neural Magic's SparseML. Maintain accuracy while optimizing for efficiency.