Transfer a Sparsified Model for Text Classification

This page walks through an example of fine-tuning a pre-sparsified model onto a new dataset for sentiment analysis.

For NLP tasks, model distillation from a dense teacher to a sparse student model helps reach higher sparsity while maintaining accuracy. We will follow two steps using SparseML:

  1. Fine-tune a dense teacher model (BERT) onto a new dataset (SST2)
  2. Transfer learn a pre-sparsified model (DistilBERT) from the SparseZoo onto SST2, distilling from the dense teacher model trained in step 1

If you already have a trained teacher model ready to go, you can skip step 1.

An example of transfer learning without model distillation is available on the Use Cases page.

Install Requirements

This example requires the SparseML General Install.
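For reference, installing SparseML with PyTorch support in a pip-based environment looks roughly like the command below. The [torch] extra is an assumption here; follow the SparseML General Install page for the authoritative instructions.

# [torch] extra is an assumption; see the SparseML install docs for the exact extras
$ pip install sparseml[torch]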

Train a Teacher

To create a teacher for the desired text classification dataset, we will fine-tune a dense BERT model from the SparseZoo, the stub of which is below:

zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none

The following script fine-tunes the dense teacher onto the SST2 dataset for sentiment analysis. After the command completes, the trained model will be around 92.7% accurate on SST2 and stored in the local models/teacher directory.

$ sparseml.transformers.text_classification \
--output_dir models/teacher \
--model_name_or_path "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none" \
--recipe "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-text_classification" \
--recipe_args '{"init_lr":0.00003}' \
--task_name sst2 \
--max_seq_length 128 \
--per_device_train_batch_size 32 --per_device_eval_batch_size 32 \
--do_train --do_eval --evaluation_strategy epoch --fp16 \
--save_strategy epoch --save_total_limit 1
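
To sanity-check the teacher before moving on, the same entry point can be run in evaluation-only mode against the saved models/teacher directory. This is a sketch that reuses only the flags shown above; the eval output directory name is arbitrary, and it assumes no recipe is needed when only evaluating.

# evaluation-only sanity check; models/teacher-eval is a hypothetical output directory
$ sparseml.transformers.text_classification \
--output_dir models/teacher-eval \
--model_name_or_path models/teacher \
--task_name sst2 \
--max_seq_length 128 \
--per_device_eval_batch_size 32 \
--do_eval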

The SparseML training script is a wrapper around a HuggingFace script, and usage for most arguments follows the HuggingFace conventions. The most important arguments for SparseML are:

  • model_name_or_path: specifies the starting model. It can be a SparseZoo stub, a HuggingFace model identifier, or a local directory containing model.pt, tokenizer.json, and config.json
  • recipe: the recipe containing the training hyperparameters (a SparseZoo stub or a local file)
  • task_name: specifies the sentiment analysis task (sst2 here), which also determines the dataset, pipelines, and eval metrics

To train on a custom dataset, use the --train_file and --validation_file arguments. To use a dataset from the HuggingFace Hub, use --dataset_name. See the HuggingFace docs for more details.
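
As an illustration, a run against local CSV files might look like the following. The file paths are hypothetical, and the remaining arguments mirror the teacher command above; note that the custom dataset arguments take the place of --task_name.

# data/train.csv and data/validation.csv are hypothetical local files
$ sparseml.transformers.text_classification \
--output_dir models/teacher \
--model_name_or_path "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none" \
--recipe "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-text_classification" \
--train_file data/train.csv \
--validation_file data/validation.csv \
--max_seq_length 128 \
--per_device_train_batch_size 32 --per_device_eval_batch_size 32 \
--do_train --do_eval --evaluation_strategy epoch \
--save_strategy epoch --save_total_limit 1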

Run the following to see the full list of options:

$ sparseml.transformers.text_classification -h

Transfer Learning

With the teacher model trained, its knowledge can be distilled into a sparsified student model.

SparseZoo contains several pre-sparsified NLP models and recipes ready to be used as the student. This example uses the SparseZoo stub for a pruned-quantized DistilBERT:

zoo:nlp/masked_language_modeling/distilbert-none/pytorch/huggingface/wikipedia_bookcorpus/pruned80_quant-none-vnni

The following script then transfers the sparsified DistilBERT onto the SST2 dataset for sentiment analysis with the help of the teacher from above. After the command completes, the trained sparse model will be around 90.5% accurate on SST2 and stored in the local models/sparsified directory.

$ sparseml.transformers.text_classification \
--output_dir models/sparsified \
--model_name_or_path "zoo:nlp/masked_language_modeling/distilbert-none/pytorch/huggingface/wikipedia_bookcorpus/pruned80_quant-none-vnni" \
--recipe "zoo:nlp/sentiment_analysis/distilbert-none/pytorch/huggingface/sst2/pruned80_quant-none-vnni" \
--distill_teacher models/teacher \
--task_name sst2 \
--max_seq_length 128 \
--per_device_train_batch_size 32 --per_device_eval_batch_size 32 \
--do_train --do_eval --evaluation_strategy epoch --fp16 \
--save_strategy epoch --save_total_limit 1

Usage is the same as above. The --distill_teacher argument instructs SparseML to perform model distillation from the teacher saved at models/teacher.

There are many additional command line arguments that can be passed to tweak your fine-tuning process. Run the following to see the full list of options:

$ sparseml.transformers.text_classification -h

Exporting for Inference

With the sparsified model successfully trained, it is time to export it for inference. The sparseml.transformers.export_onnx command is used to export the training graph to a performant inference graph. After the command completes, a model.onnx file is created in the models/deployment folder. It is now ready for deployment with the DeepSparse Engine.

$ sparseml.transformers.export_onnx \
--model_path models/sparsified \
--task 'text-classification' --finetuning_task sst2 \
--sequence_length 128
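
As a quick check that the exported file loads and runs, it can be benchmarked with the DeepSparse Engine's benchmarking command. The path below assumes the model.onnx location described above.

# quick smoke test of the exported ONNX file with DeepSparse
$ deepsparse.benchmark models/deployment/model.onnx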