This page explains how to create and deploy a sparse Transformer for Question Answering.
SparseML Question Answering Pipelines integrate with Hugging Face’s Transformers library to enable the sparsification of a large set of Transformers models.
Sparsification is a powerful technique that results in faster, smaller, and cheaper deployable models.
A sparse model can be deployed with DeepSparse for GPU-class performance directly on your CPU.
This integration enables you to create a sparse model in two ways. Each option is useful in different situations:
This use case requires installation of:
It is recommended to run Python 3.8 as some of the scripts within the Transformers repository require it.
Transformers will not immediately install with this command. Instead, a sparsification-compatible version of Transformers will install on the first invocation of the Transformers code in SparseML.
Here are additional tutorials for this functionality.
In the example below, a dense BERT model is sparsified and fine-tuned on the SQuAD dataset.
sparseml.transformers.question_answering \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --output_dir './output' \
  --cache_dir cache \
  --distill_teacher disable \
  --recipe zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98
The SparseML train script is a wrapper around a Hugging Face script, and usage for most arguments follows the Hugging Face script. The most important arguments for SparseML are:
--model_name_or_path indicates the model from which to start the pruning process. It can be a SparseZoo stub, a Hugging Face model identifier, or a path to a local model.
--recipe points to a recipe file containing the sparsification hyperparameters. It can be a SparseZoo stub or a local file. See Creating Sparsification Recipes for more information.
--dataset_name indicates the dataset to fine-tune on, which is SQuAD in this example.
To use a custom dataset, pass the --train_file and --validation_file arguments. To use a dataset from the Hugging Face hub, use --dataset_name.
See the Hugging Face documentation for more details.
Run the following to see the full list of options:
$ sparseml.transformers.question_answering -h
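The training command above can also be assembled from Python, which is convenient when you launch several runs with different recipes. The following is a minimal sketch, not part of SparseML: build_train_command and the RUN_TRAINING switch are hypothetical names introduced here, and the flags are only those shown in the example above.

```python
import shlex
import shutil
import subprocess


def build_train_command(model, recipe, dataset="squad",
                        output_dir="./output", distill_teacher="disable"):
    """Assemble the argument list for the question answering train script.

    Hypothetical helper: it only mirrors the flags used in the example above.
    """
    return [
        "sparseml.transformers.question_answering",
        "--model_name_or_path", model,
        "--dataset_name", dataset,
        "--do_train",
        "--do_eval",
        "--output_dir", output_dir,
        "--distill_teacher", distill_teacher,
        "--recipe", recipe,
    ]


command = build_train_command(
    model="bert-base-uncased",
    recipe="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98",
)

# Print the shell-quoted command so it can be inspected before running.
print(shlex.join(command))

# Training is long-running, so it is only launched when explicitly enabled
# and when the SparseML CLI is actually installed on this machine.
RUN_TRAINING = False
if RUN_TRAINING and shutil.which(command[0]):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with stubs and paths; set RUN_TRAINING to True only once the printed command looks right.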
SparseML also enables you to fine-tune a pre-sparsified model onto your own dataset. While you are free to use your own backbone, we encourage you to leverage one of our sparse pre-trained models to boost your productivity!
In the example below, we fetch a pruned, quantized BERT model, pre-trained on the Wikipedia and BookCorpus datasets. We then fine-tune the model on the SQuAD dataset.
sparseml.transformers.question_answering \
  --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/12layer_pruned80_quant-none-vnni \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --output_dir './output' \
  --distill_teacher disable \
  --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/12layer_pruned80_quant-none-vnni?recipe_type=transfer-question_answering
The usage of the script is the same as for Sparsifying Popular Transformer Models, above. However, in this example, the starting model is a pruned-quantized version of BERT from the SparseZoo (rather than a dense BERT model) and the recipe is a transfer learning recipe, which instructs Transformers to maintain sparsity as it fine-tunes (rather than a recipe that sparsifies a model from scratch).
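In the transfer learning example, the recipe stub is the model stub followed by a ?recipe_type=transfer-question_answering suffix. The sketch below illustrates that relationship with plain string handling. It assumes only the stub shape seen in the examples on this page; transfer_recipe_stub and split_stub are hypothetical helpers, not SparseZoo APIs.

```python
def transfer_recipe_stub(model_stub, task="question_answering"):
    """Build a transfer learning recipe stub from a model stub.

    Assumption: the recipe stub is the model stub plus a recipe_type
    query parameter, as in the examples on this page.
    """
    return f"{model_stub}?recipe_type=transfer-{task}"


def split_stub(stub):
    """Split a stub into its base path and a dict of query parameters."""
    base, _, query = stub.partition("?")
    params = dict(
        pair.split("=", 1) for pair in query.split("&") if "=" in pair
    )
    return base, params


model_stub = (
    "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/"
    "wikipedia_bookcorpus/12layer_pruned80_quant-none-vnni"
)
recipe_stub = transfer_recipe_stub(model_stub)

base, params = split_stub(recipe_stub)
print(base == model_stub)      # the recipe points at the same SparseZoo entry
print(params["recipe_type"])   # the suffix selects the transfer recipe
```

Deriving the recipe stub from the model stub keeps the two arguments consistent, so the starting model and its transfer recipe always refer to the same SparseZoo entry.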
By modifying the --distill_teacher argument, you can enable Knowledge Distillation (KD) functionality. KD provides additional support to the sparsification or transfer learning process, enabling higher accuracy at higher levels of sparsity.
For example, the --distill_teacher argument can be set to pull a dense SQuAD model from the SparseZoo to support the training process:
--distill_teacher zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none
Alternatively, SparseML enables you to use a custom dense teacher model. The following command uses the dense BERT base model from the SparseZoo and fine-tunes it on the SQuAD dataset for use as a dense teacher.
sparseml.transformers.question_answering \
  --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --output_dir models/teacher \
  --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-question_answering
Once the dense teacher is trained, you may reuse it for KD in sparsification or sparse transfer learning.
Simply pass the path to the directory with the teacher model to the --distill_teacher argument. For example:
--distill_teacher models/teacher
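The three teacher settings used on this page (disable, a SparseZoo stub, and a local directory) differ only in the value passed to --distill_teacher. The following sketch shows one way to switch between them when a command is kept as a Python argument list; with_distill_teacher is a hypothetical helper, not a SparseML function.

```python
def with_distill_teacher(command, teacher):
    """Return a copy of `command` with --distill_teacher set to `teacher`.

    Hypothetical helper: replaces the existing value if the flag is
    present, otherwise appends the flag and its value.
    """
    updated = list(command)
    if "--distill_teacher" in updated:
        updated[updated.index("--distill_teacher") + 1] = teacher
    else:
        updated += ["--distill_teacher", teacher]
    return updated


base_command = [
    "sparseml.transformers.question_answering",
    "--model_name_or_path", "bert-base-uncased",
    "--dataset_name", "squad",
    "--distill_teacher", "disable",
]

# Teacher pulled from the SparseZoo, as in the stub shown above.
zoo_command = with_distill_teacher(
    base_command,
    "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none",
)

# Teacher trained locally and saved to models/teacher.
local_command = with_distill_teacher(base_command, "models/teacher")

print(zoo_command[-1])
print(local_command[-1])
```

The helper copies the list instead of mutating it, so the same base command can be reused for runs with and without distillation.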
The SparseML installation provides a CLI for sparsifying your models for a specific task. Appending the --help argument displays a full list of options for training in SparseML:
sparseml.transformers.question_answering --help
The output is:
  --model_name_or_path MODEL_NAME_OR_PATH
        Path to pre-trained model or model identifier from huggingface.co/models
  --distill_teacher DISTILL_TEACHER
        Teacher model which needs to be a trained QA model
  --cache_dir CACHE_DIR
        Directory path to store the pre-trained models downloaded from huggingface.co
  --recipe RECIPE
        Path to a SparseML sparsification recipe, see https://github.com/neuralmagic/sparseml for more information
  --dataset_name DATASET_NAME
        The name of the dataset to use (via the datasets library).
  ...
To learn about the Hugging Face Transformers run-scripts in more detail, refer to Hugging Face Transformers documentation.
The artifacts of the training process are saved to the directory specified by --output_dir. Once the script terminates, the directory will have everything required to deploy or further modify the model.
DeepSparse uses the ONNX format to load neural networks and then deliver breakthrough performance for CPUs by leveraging the sparsity and quantization within a network.
The SparseML installation provides a sparseml.transformers.export_onnx command that you can use to load the training model folder and create a new model.onnx file within. Be sure the --model_path argument points to your trained model.
sparseml.transformers.export_onnx \
  --model_path './output' \
  --task 'question-answering'
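Because the export step writes model.onnx into the model folder, a quick existence check can catch a skipped or failed export before deployment. This is a minimal sketch using only the standard library; find_exported_model is a hypothetical helper, and it assumes the file lands directly inside the folder passed as --model_path.

```python
from pathlib import Path


def find_exported_model(model_dir):
    """Return the path to model.onnx inside model_dir, or None if missing."""
    candidate = Path(model_dir) / "model.onnx"
    return candidate if candidate.is_file() else None


exported = find_exported_model("./output")
if exported is None:
    print("model.onnx not found; run sparseml.transformers.export_onnx first")
else:
    print(f"Ready for DeepSparse: {exported}")
```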
Once the model is exported in the ONNX format, it is ready for deployment with DeepSparse.
The deployment is intuitive due to the DeepSparse Python API.
from deepsparse import Pipeline

qa_pipeline = Pipeline.create(
    task="question-answering",
    model_path='./output'
)

inference = qa_pipeline(question="What's my name?", context="My name is Snorlax")
>> {'score': 0.9947717785835266, 'start': 11, 'end': 18, 'answer': 'Snorlax'}
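In the output above, start and end line up with character offsets into the context string. The sketch below checks this using the printed values copied into a plain dict, so it runs without DeepSparse installed. The real pipeline result object may expose these fields differently, and extract_answer with its min_score threshold is a hypothetical helper.

```python
def extract_answer(context, result, min_score=0.5):
    """Return the answer span from `context`, or None for low-confidence results.

    Assumptions: `result` is a dict shaped like the printed output above,
    and `end` is an exclusive character offset. `min_score` is an
    arbitrary illustrative threshold.
    """
    if result["score"] < min_score:
        return None
    return context[result["start"]:result["end"]]


context = "My name is Snorlax"
# Values copied from the example output above.
inference = {
    "score": 0.9947717785835266,
    "start": 11,
    "end": 18,
    "answer": "Snorlax",
}

span = extract_answer(context, inference)
print(span)                          # Snorlax
print(span == inference["answer"])   # True
```

Slicing the context with the returned offsets is useful when you need to highlight the answer in the original text instead of only displaying the answer string.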
To learn more, refer to the appropriate documentation in the DeepSparse repository.
For Neural Magic Support, sign up or log into our Neural Magic Community Slack. Bugs, feature requests, or additional questions can also be posted to our GitHub Issue Queue.