Using MobilenetSSD
Tensorflow MobilenetSSD model
Tensorflow MobilenetSSD frozen graphs come in two flavors: a standard frozen graph and a quantization-aware frozen graph. The following example uses a quantization-aware frozen graph to ensure accurate results on the Qualcomm® Neural Processing SDK runtimes.
Prerequisites
The quantization-aware model conversion process was tested using Tensorflow v1.11; however, other versions may also work. The CPU version of Tensorflow was used to avoid out-of-memory issues observed across various GPU cards during conversion.
Set up the Tensorflow Object Detection Framework
The quantization-aware model is provided as a TFLite frozen graph. However, the Qualcomm® Neural Processing SDK requires a Tensorflow frozen graph (.pb). To convert the quantized model, the object detection framework is used to export a Tensorflow frozen graph. Follow these steps to clone the object detection framework:
mkdir ~/tfmodels
cd ~/tfmodels
git clone https://github.com/tensorflow/models.git
Check out a tested object detection framework commit (SHA)
cd models
git checkout ad386df597c069873ace235b931578671526ee00
Follow the third-party instructions to set up the Tensorflow object detection framework
Download the quantization-aware model
A specific version of the Tensorflow MobilenetSSD model has been tested: ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz
wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz
After downloading the model, extract the contents to a directory.
tar xzvf ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz
Export a trained graph from the object detection framework
Follow these instructions to export the Tensorflow graph:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md
or modify and execute this sample script:
Create this file, export_train.sh, using your favorite editor. Modify the paths to point to the directory containing the downloaded quantization-aware model files.
#!/bin/bash
INPUT_TYPE=image_tensor
PIPELINE_CONFIG_PATH=<path_to>/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/pipeline.config
TRAINED_CKPT_PREFIX=<path_to>/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt
EXPORT_DIR=<path_to>/exported
pushd ~/tfmodels/models/research
python3 object_detection/export_inference_graph.py \
--input_type=${INPUT_TYPE} \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--trained_checkpoint_prefix=${TRAINED_CKPT_PREFIX} \
--output_directory=${EXPORT_DIR}
popd
Make the script executable
chmod u+x export_train.sh
Run the script
./export_train.sh
This should generate a frozen graph in
<path_to>/exported/frozen_inference_graph.pb
Convert the frozen graph using the snpe-tensorflow-to-dlc converter.
snpe-tensorflow-to-dlc --input_network <path_to>/exported/frozen_inference_graph.pb --input_dim Preprocessor/sub 1,300,300,3 --out_node detection_classes --out_node detection_boxes --out_node detection_scores --output_path mobilenet_ssd.dlc --allow_unconsumed_nodes
After Qualcomm® Neural Processing SDK conversion you should have a mobilenet_ssd.dlc that can be loaded and run in the Qualcomm® Neural Processing SDK runtimes.
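The network's Preprocessor/sub input expects a 1,300,300,3 float32 tensor with pixel values normalized to [-1, 1], typically supplied to SDK tools as a raw file. The following is a minimal, illustrative preprocessing sketch; the dependency-free nearest-neighbor resize and the exact scaling constants are assumptions based on the standard MobilenetSSD preprocessing, not taken from the SDK:

```python
import numpy as np

def preprocess(image, size=300):
    """Resize an HxWx3 uint8 image and normalize it to [-1, 1].

    Nearest-neighbor resize keeps the example dependency-free; a real
    pipeline would normally use a proper image library for resizing.
    """
    h, w, _ = image.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    # MobilenetSSD's Preprocessor scales pixels as x * 2/255 - 1
    return resized.astype(np.float32) * (2.0 / 255.0) - 1.0
```

The resulting array can be written out with tensor.tofile("input.raw") and the file path listed, one per line, in the input list consumed by snpe-net-run.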
The output layers for the model are:
Postprocessor/BatchMultiClassNonMaxSuppression
add
The output buffer names are:
(classes) detection_classes:0 (+1 index offset)
(classes) Postprocessor/BatchMultiClassNonMaxSuppression_classes (0 index offset)
(boxes) Postprocessor/BatchMultiClassNonMaxSuppression_boxes
(scores) Postprocessor/BatchMultiClassNonMaxSuppression_scores
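Each raw output buffer can be loaded (e.g. with numpy.fromfile) and the classes, scores and boxes combined into filtered detections. A minimal sketch, assuming the classes buffer carries the +1 index offset noted above, boxes in the Tensorflow object detection convention (ymin, xmin, ymax, xmax, normalized), and a hypothetical score threshold:

```python
import numpy as np

def filter_detections(classes, scores, boxes, threshold=0.5):
    """Keep detections scoring at least `threshold`.

    classes and scores are length-N arrays, boxes is Nx4. Subtracts 1
    to undo the +1 label offset of the detection_classes:0 buffer.
    """
    keep = scores >= threshold
    return [
        (int(c) - 1, float(s), b.tolist())
        for c, s, b in zip(classes[keep], scores[keep], boxes[keep])
    ]
```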
Running the model in Qualcomm® Neural Processing SDK
The following are limitations and suggestions for running the DLC model in Qualcomm® Neural Processing SDK:
Batch dimension > 1 is not supported.
The DetectionOutput layer is supported on the CPU runtime only. To run the model on a different runtime, such as GPU or DSP, CPU fallback mode must be enabled in the runtime list (see the Snpe_SNPEBuilder_SetRuntimeProcessorOrder() description in the Qualcomm® Neural Processing SDK API). If using the snpe-net-run tool, use the --runtime_order option.
Configure the DetectionOutput layer reasonably. Performance of the DetectionOutput layer (i.e. processing time) is a function of the layer parameters top_k, keep_top_k and confidence_threshold. For example, the top_k parameter has a practically exponential impact on processing time; e.g. top_k=100 will result in a much smaller processing time than top_k=1000. A smaller confidence_threshold will result in a larger number of boxes to output, and vice versa.
Resizing input dimensions at SNPE object creation/build time is not allowed. Note that input dimensions are embedded into the DLC model during conversion and in some cases can be overridden via Snpe_SNPEBuilder_SetInputDimensions() (see the description in the Qualcomm® Neural Processing SDK API) at SNPE object creation/build time; however, due to PriorBox layer folding in the model converter, input/network resizing is not possible for this model.