Neural Magic LogoNeural Magic Logo
DeepSparse EngineSparseMLSparseZoo
Use Cases
Natural Language Processing
Text Classification

Text Classification with Hugging Face Transformers and SparseML

This page explains how to create and deploy a sparse Transformer for Text Classification.

SparseML Text Classification Pipelines integrate with Hugging Face’s Transformers library to enable the sparsification of a large set of transformers models. Sparsification is a powerful technique that results in faster, smaller, and cheaper deployable models. A sparse model can be deployed with Neural Magic's DeepSparse Engine with GPU-class performance directly on your CPU.

This integration enables you to create a sparse model in two ways:

  • Sparsification of Popular Transformer Models - sparsify any popular Hugging Face Transformer model from scratch.
  • Sparse Transfer Learning - fine-tune a sparse model (or use one of our sparse pre-trained models) on your own private dataset.

Each option is useful in different situations:

  • Sparsification from Scratch enables you to create a sparse version of any model (even those not in the SparseZoo), but requires hand-tuning the hyperparameters of the sparsification algorithm.
  • Sparse Transfer Learning is the easiest path to creating a sparse model trained on your data. Simply pull a pre-sparsified model and transfer learning recipe from the SparseZoo and fine-tune on your data with a single command.

Installation Requirements

This section requires SparseML Torch Install and DeepSparse General Install.

It is recommended to run Python 3.8 as some of the scripts within the transformers repository require it.

Transformers will not immediately install with this command. Instead, a sparsification-compatible version of Transformers will install on the first invocation of the Transformers code in SparseML.


There are some additional tutorials for this functionality on GitHub.

Getting Started

In the example below, a dense BERT model is sparsified and fine-tuned it on the MNLI dataset.

1sparseml.transformers.text_classification \
2 --model_name_or_path bert-base-uncased \
3 --task_name mnli \
4 --do_train \
5 --do_eval \
6 --output_dir './output' \
7 --cache_dir cache \
8 --distill_teacher disable \
9 --recipe zoo:nlp/text_classification/bert-base/pytorch/huggingface/mnli/12layer_pruned90-none

The SparseML train script is a wrapper around a Hugging Face script, and usage for most arguments follows the Hugging Face. The most important arguments for SparseML are:

  • model_name_or_path: specifies starting model. It can be a SparseZoo stub, Hugging Face model identifier, or a local directory with, tokenizer.json and config.json
  • recipe: recipe containing the training hyperparamters (SparseZoo stub or a local file)
  • task_name: specifies the sentiment analysis task for the MNLI dataset

To utilize a custom dataset, use the --train_file and --validation_file arguments. To use a dataset from the Hugging Face hub, use --dataset_name. See the Hugging Face Docs for more details.

Run the following to see the full list of options:

$ sparseml.transformers.text_classification -h

Sparse Transfer Learning

SparseML also enables you to fine-tune a pre-sparsified model onto your own dataset. While you are free to use your backbone, we encourage you to leverage one of our sparse pre-trained models to boost your productivity!

In the example below, we fetch a pruned, quantized BERT model, pre-trained on Wikipedia and Bookcorpus datasets. We then fine-tune the model to the SST2 dataset.

1sparseml.transformers.text_classification \
2 --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/12layer_pruned80_quant-none-vnni \
3 --task_name sst2 \
4 --do_train \
5 --do_eval \
6 --output_dir './output' \
7 --distill_teacher disable \
8 --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/12layer_pruned80_quant-none-vnni?recipe_type=transfer-text_classification

This usage of the script is the same as the above.

In this example, however, the starting model is a pruned-quantized version of BERT from SparseZoo (rather than a dense BERT) and the recipe is a transfer learning recipe, which instructs Transformers to maintain sparsity (rather than a sparsification recipe that sparsifies the model from scratch).

Additionally, this example uses the SST2 task (which uses the SST2 dataset).

Knowledge Distillation

By modifying the distill_teacher argument, you can enable Knowledge Distillation (KD) functionality. KD provides additional support to the sparsification process, enabling higher accuracy at higher levels of sparsity.

For example, the --distill_teacher argument can be set to pull a dense SST2 model from the SparseZoo to support the training process:

--distill_teacher zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none

Alternatively, the user may decide to train their own dense teacher model. The following command uses the dense BERT base model from the SparseZoo and fine-tunes it on the SST2 dataset for use as a dense teacher.

1sparseml.transformers.text_classification \
2 --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none \
3 --task_name sst2 \
4 --do_train \
5 --do_eval \
6 --output_dir models/teacher \
7 --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-text_classification

Once the dense teacher is trained we may reuse it for KD in Sparsification or Sparse Transfer learning. Simply pass the path to the directory with the teacher model to the --distill_teacher argument. For example:

--distill_teacher models/teacher

SparseML CLI

The SparseML installation provides a CLI for sparsifying your models for a specific task; appending the --help argument displays a full list of options for training in SparseML:

sparseml.transformers.text_classification --help


1 --model_name_or_path MODEL_NAME_OR_PATH
2 Path to pretrained model, sparsezoo stub. or model identifier from (default: None)
3 --distill_teacher DISTILL_TEACHER
4 Teacher model which must be a trained text classification model (default: None)
5 --cache_dir CACHE_DIR
6 Where to store the pretrained data from (default: None)
7 --recipe RECIPE
8 Path to a SparseML sparsification recipe, see for more information (default: None)
9 --task_name TASK_NAME
10 The name of the task to train on: cola, mnli, mrpc, qnli, qqp, rte, sst2, stsb, wnli (default: None)

To learn about the Hugging Face Transformers run-scripts in more detail, refer to Hugging Face Transformers documentation.

Deploying with DeepSparse

The artifacts of the training process are saved to the directory --output_dir. Once the script terminates, the directory will have everything required to deploy or further modify the model such as:

  • The recipe (with the full description of the sparsification attributes).
  • Checkpoint files (saved in the appropriate framework format).
  • Additional configuration files (e.g., tokenizer, dataset info).

Exporting the Sparse Model to ONNX

The DeepSparse Engine uses the ONNX format to load neural networks and then deliver breakthrough performance for CPUs by leveraging the sparsity and quantization within a network.

The SparseML installation provides a sparseml.transformers.export_onnx command that you can use to load the training model folder and create a new model.onnx file within. Be sure the --model_path argument points to your trained model.

1sparseml.transformers.export_onnx \
2 --model_path './output' \
3 --task 'text-classification'

DeepSparse Engine Deployment

Once the model is exported in the ONNX format, it is ready for deployment with the DeepSparse Engine.

The deployment is intuitive due to the DeepSparse Python API.

1from deepsparse import Pipeline
2tc_pipeline = Pipeline.create(
3 task="text-classification",
4 model_path='./output'
7inference = tc_pipeline("Snorlax loves my Tesla!")
>> [{'label': 'LABEL_1', 'score': 0.9884248375892639}]
1inference = tc_pipeline("Snorlax hates pineapple pizza!")
>> [{'label': 'LABEL_0', 'score': 0.9981569051742554}]

To learn more, refer to the appropriate documentation in the DeepSparse repository.


For Neural Magic Support, sign up or log in to our Deep Sparse Community Slack. Bugs, feature requests, or additional questions can also be posted to our GitHub Issue Queue.

NLP Question Answering
NLP Token Classification