FAQs
General Product FAQs
What is Neural Magic?
Neural Magic was founded by a team of award-winning MIT computer scientists and is funded by Amdocs, Andreessen Horowitz, Comcast Ventures, NEA, Pillar VC, Ridgeline Partners, Verizon Ventures, and VMware. The Neural Magic Platform includes several components: DeepSparse, SparseML, and SparseZoo. DeepSparse is an inference runtime offering GPU-class performance on CPUs, along with tooling to integrate ML into your application. SparseML and SparseZoo are an open-source tooling and model repository combination that enables you to create an inference-optimized sparse model for deployment with DeepSparse.
Together, these components remove the tradeoff between performance and the simplicity and scalability of software-delivered deployments.
In 2024, Neural Magic will announce sparsity on GPUs to complement our CPU efforts. Stay tuned!
What is DeepSparse?
DeepSparse, created by Neural Magic, is an inference runtime for deep learning models. It delivers state-of-the-art, GPU-class performance on commodity CPUs, as well as tooling for integrating a model into an application and monitoring models in production.
Why Neural Magic?
Learn more about Neural Magic and DeepSparse (formerly known as the Neural Magic Inference Engine). Watch the Why Neural Magic video
How does Neural Magic make it work?
This is an older webinar (50 minutes) where we went through the process of optimizing and deploying a model. We have enhanced our software since the recording went out, but it will give you some background: Watch the How Does it Work video
Does Neural Magic support training of deep learning models on CPUs?
Neural Magic does not support training of deep learning models at this time. We do see value in providing a consistent CPU environment for our end users to train and infer on for their deep learning needs, and we have added this to our engineering backlog.
Do you run on AMD hardware?
DeepSparse is validated to work on x86 Intel (Haswell generation and later) and AMD CPUs running Linux, with support for AVX2, AVX-512, and VNNI instruction sets. Specific support details for some algorithms across different microarchitectures are available.
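On Linux, one quick way to see whether your CPU exposes these instruction sets is to inspect the flags line of /proc/cpuinfo. The helper below is a hypothetical sketch (not part of DeepSparse) that parses such a dump; the flag names checked are the common Linux spellings for AVX2, AVX-512, and VNNI.

```python
def supported_isa_flags(cpuinfo_text, candidates=("avx2", "avx512f", "avx512_vnni")):
    """Return which of the candidate ISA flags appear in a /proc/cpuinfo dump."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU repeats a "flags : ..." line listing its features.
        if line.lower().startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return [f for f in candidates if f in flags]

# Trimmed illustrative excerpt; on a real Linux machine you would pass
# open("/proc/cpuinfo").read() instead.
sample = "processor : 0\nflags : fpu avx avx2 avx512f avx512_vnni\n"
print(supported_isa_flags(sample))  # ['avx2', 'avx512f', 'avx512_vnni']
```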
We are open to opportunities to expand our support footprint for different CPU-based processor architectures, based on market adoption and deep learning use cases.
Do you run on ARM architecture?
We have provided ARM support as of our 1.6 release, primarily focused on LLMs and transformer models for server-grade systems like AWS Graviton and Ampere. Currently, we have limited alpha support for CNN models on embedded systems, particularly those with dot product instructions (ARMv8.2+). ARM on macOS has beta support. Feel free to pip install deepsparse-nightly if you would like to try it out. We would like to hear your use cases and keep you in the loop! Contact us to continue the conversation.
To what use cases is the Neural Magic Platform best suited?
We focus on models and use cases in LLMs, computer vision, and NLP where deployments are cost-sensitive and subject to real-time latency or throughput constraints.
What types of models does Neural Magic support?
Today, we offer support for LLMs and CNN-based computer vision models, specifically classification and object detection. NLP models like BERT are also available. We are continuously adding models to the SparseZoo. Additionally, we are constantly investigating new model architectures.
Is dynamic shape supported?
Dynamic shape is currently not supported; be sure to use models with fixed inputs and compile the model for a particular batch size. Dynamic shape and dynamic batch sizes are on the Neural Magic roadmap; subscribe for updates.
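Because the engine is compiled for fixed input shapes, variable-length inputs must be padded or truncated to that shape before inference. A minimal pre-processing sketch (the sequence length and pad value are illustrative assumptions, not DeepSparse defaults):

```python
def to_fixed_shape(tokens, seq_len=128, pad_value=0):
    """Pad or truncate a token sequence to the fixed length the engine
    was compiled for."""
    clipped = list(tokens)[:seq_len]          # truncate anything too long
    return clipped + [pad_value] * (seq_len - len(clipped))  # pad the rest

# Short and long inputs both come out at the compiled length.
short = to_fixed_shape([101, 2054, 2003, 102])
long = to_fixed_shape(list(range(200)))
print(len(short), len(long))  # 128 128
```

The same idea applies to batching: run a fixed batch size and pad the final partial batch rather than recompiling per request.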
Can multiple model inferences be executed?
Model inferences are executed as a single stream by default; concurrent execution can be enabled by configuring the engine's execution strategy (scheduler).
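Under a concurrent execution strategy, the application-side pattern is simply fanning independent requests out over worker threads. The sketch below stubs the engine with a plain function to show that pattern; the names are illustrative and not the DeepSparse API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_inference(request_id):
    # Stand-in for a thread-safe engine call; a real engine would run
    # the model on this request's input here.
    return {"request": request_id, "result": request_id * 2}

# Fan four independent requests out over worker threads; map() returns
# results in submission order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_inference, range(4)))

print([r["result"] for r in results])  # [0, 2, 4, 6]
```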