BERT: Sparsifying to Improve NLP Performance
Neural Magic creates models and recipes that let anyone plug in their own data and leverage SparseML’s recipe-driven approach on top of Hugging Face’s robust training pipelines for the popular BERT NLP network. Sparsification removes redundant information from a neural network using algorithms such as pruning and quantization. The result is faster inference and smaller file sizes for deployment.
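As a brief sketch of what the recipe-driven approach looks like in code, the snippet below wires a SparseML recipe into a standard PyTorch optimizer before running an otherwise ordinary Hugging Face training loop. The recipe path, model name, and hyperparameters are placeholders, and the exact SparseML import path can vary between versions.

```python
# A minimal sketch of applying a SparseML recipe to a Hugging Face BERT model.
# The recipe path, model name, and hyperparameters below are placeholders.
import torch
from transformers import AutoModelForQuestionAnswering
from sparseml.pytorch.optim import ScheduledModifierManager

model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# The recipe is a YAML file describing when and how to prune and quantize layers.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=100)

# ... run the usual Hugging Face training loop with the wrapped optimizer ...

manager.finalize(model)  # remove sparsification hooks once training completes
```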
This page walks through the following use cases for trying out the sparsified BERT models:
Compare the models’ accuracy and inference performance (a latency-comparison sketch follows this list)
Run the models for inference in deployments or applications (an inference sketch follows this list)
Train the models on new datasets
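As a sketch of running inference, DeepSparse exposes a Hugging Face-style pipeline interface. The model stub below is a placeholder; substitute the SparseZoo stub or local ONNX path of the sparsified BERT model you want to deploy.

```python
# A minimal question-answering inference sketch using the DeepSparse Pipeline API.
# The model_path value is a placeholder, not a specific published model.
from deepsparse import Pipeline

qa_pipeline = Pipeline.create(
    task="question-answering",
    model_path="zoo:placeholder/sparsified-bert-stub",  # placeholder stub or local ONNX path
)

output = qa_pipeline(
    question="What does sparsification remove from a network?",
    context="Sparsification removes redundant information from neural networks "
            "using algorithms such as pruning and quantization.",
)
print(output)
```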
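For a rough performance comparison, one option is to time the same pipeline built from a dense export and from a sparsified model. The two model paths below are placeholders, and the deepsparse.benchmark CLI is the more rigorous way to measure throughput and latency.

```python
# A rough latency-comparison sketch; both model paths are placeholders.
import time

from deepsparse import Pipeline


def average_latency(model_path: str, iterations: int = 20) -> float:
    """Return the mean seconds per question-answering call for a model."""
    pipeline = Pipeline.create(task="question-answering", model_path=model_path)
    question = "What does sparsification remove from a network?"
    context = "Sparsification removes redundant information via pruning and quantization."
    pipeline(question=question, context=context)  # warm-up so compile time is not counted
    start = time.perf_counter()
    for _ in range(iterations):
        pipeline(question=question, context=context)
    return (time.perf_counter() - start) / iterations


dense = average_latency("bert-base-dense.onnx")              # placeholder dense export
sparse = average_latency("zoo:placeholder/sparsified-bert")  # placeholder SparseZoo stub
print(f"dense: {dense:.4f} s/query, sparsified: {sparse:.4f} s/query")
```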