Running the Spoken Digit Recognition Model
Overview
The example C++ application in this tutorial is called snpe-net-run. It is a command-line executable that executes a neural network using the Qualcomm® Neural Processing SDK APIs.
The required arguments to snpe-net-run are:
A neural network model in the DLC file format
An input list file with paths to the input data
Optional arguments to snpe-net-run are:
Choice of GPU or DSP runtime (default is CPU)
Output directory (default is ./output)
Show help description
snpe-net-run creates and populates an output directory with the results of executing the neural network on the input data.
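As an illustration, an input list can be as simple as one raw-file path per line. The content below is hypothetical (this tutorial ships its own input_list.txt, and the file name example_input_list.txt is used here only to avoid clobbering it):

```shell
# A minimal input list: one path to a raw input tensor per line.
printf 'input.raw\n' > example_input_list.txt
cat example_input_list.txt
```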
The Qualcomm® Neural Processing SDK provides Linux and Android binaries of snpe-net-run under
$SNPE_ROOT/bin/x86_64-linux-clang
$SNPE_ROOT/bin/aarch64-android
$SNPE_ROOT/bin/aarch64-oe-linux-gcc8.2
$SNPE_ROOT/bin/aarch64-oe-linux-gcc9.3
Introduction
This chapter walks through recognizing the 10 classes in the Free Spoken Digit Dataset with a 4-layer neural network and the accompanying data processing, using the Qualcomm® Neural Processing SDK. The step-by-step example creates, trains, converts, and executes a TensorFlow-Keras audio model with the Qualcomm® Neural Processing SDK.
As a prerequisite, users should download the Free Spoken Digit Dataset (FSDD).
cd $SNPE_ROOT/examples/Models/spoken_digit
git clone https://github.com/Jakobovski/free-spoken-digit-dataset
The external python3 packages required for this example are:
librosa (0.10.2)
tensorflow (2.10.1)
There are five files and a single directory in the $SNPE_ROOT/examples/Models/spoken_digit folder:
free-spoken-digit-dataset (download from git)
input_list.txt
interpretRawDNNOutput.py
processSpokenDigitInput.py
spoken_digit.py
NOTICE.txt
The interpretRawDNNOutput.py script translates the Qualcomm® Neural Processing SDK output and displays the prediction.
The processSpokenDigitInput.py script processes a user-supplied wav audio file into the raw format required by snpe-net-run.
The spoken_digit.py python3 script creates and trains the 4-layer neural network model. After training completes, the corresponding frozen protobuf file is generated.
The free-spoken-digit-dataset directory is the dataset downloaded by the user.
Prerequisites
The Qualcomm® Neural Processing SDK has been set up following the Qualcomm® Neural Processing SDK Setup chapter.
The Tutorials Setup has been completed.
TensorFlow is installed (see TensorFlow Setup)
Create, Train, and Convert Spoken Digit Model
Run spoken_digit.py to create and train the spoken digit model.
cd $SNPE_ROOT/examples/Models/spoken_digit
python3 spoken_digit.py
The terminal will show messages similar to the following (loss and accuracy values will vary between runs).
Successfully split free-spoken-digit-dataset training/testing data.
Training data created.
Epoch 1/20
4/4 [==============================] - 1s 142ms/step - loss: 30.2495 - accuracy: 0.1328 - val_loss: 25.0048 - val_accuracy: 0.1406
Epoch 2/20
4/4 [==============================] - 0s 42ms/step - loss: 16.7755 - accuracy: 0.1680 - val_loss: 10.5915 - val_accuracy: 0.1680
Epoch 3/20
4/4 [==============================] - 0s 44ms/step - loss: 9.0951 - accuracy: 0.1504 - val_loss: 8.5512 - val_accuracy: 0.1641
...
...
...
Epoch 18/20
4/4 [==============================] - 0s 31ms/step - loss: 0.2890 - accuracy: 0.9297 - val_loss: 1.8068 - val_accuracy: 0.6504
Epoch 19/20
4/4 [==============================] - 0s 28ms/step - loss: 0.2399 - accuracy: 0.9336 - val_loss: 1.7628 - val_accuracy: 0.6602
Epoch 20/20
4/4 [==============================] - 0s 29ms/step - loss: 0.2024 - accuracy: 0.9512 - val_loss: 1.7234 - val_accuracy: 0.6777
Optimization done.
Save frozen graph in spoken_digit.pb.
Next, convert the frozen graph model with snpe-tensorflow-to-dlc.
snpe-tensorflow-to-dlc --input_network model/spoken_digit.pb \
--input_dim x "1, 10, 35" \
--out_node "Identity" \
--output_path spoken_digit.dlc
After DLC conversion, we can view the converted DLC architecture with snpe-dlc-info and snpe-dlc-viewer as follows:
snpe-dlc-info -i spoken_digit.dlc
The output will be:
-------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Name | Type | Inputs | Outputs | Out Dims | Runtimes | Parameters |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | sequential/dense/Tensordot/transpose | Transpose | x:0 (data type: Float_32; tensor dimension: [1,10,35]; tensor type: APP_WRITE) [NW Input] | sequential/dense/Tensordot/transpose:0 (data type: Float_32; tensor dimension: [1,10,35]; tensor type: NATIVE) | 1x10x35 | A D G C | packageName: qti.aisw |
| | | | | | | | perm: [0, 1, 2] |
| 1 | sequential/dense/Tensordot/Reshape:0 | Reshape | sequential/dense/Tensordot/transpose:0 (data type: Float_32; tensor dimension: [1,10,35]; tensor type: NATIVE) | sequential/dense/Tensordot/Reshape:0 (data type: Float_32; tensor dimension: [10,35]; tensor type: NATIVE) | 10x35 | A D G C | packageName: qti.aisw |
| | | | | | | | shape: [10, 35] |
| 2 | sequential/dense/Tensordot/MatMul | FullyConnected | sequential/dense/Tensordot/Reshape:0 (data type: Float_32; tensor dimension: [10,35]; tensor type: NATIVE) | sequential/dense/Tensordot/MatMul:0 (data type: Float_32; tensor dimension: [10,256]; tensor type: NATIVE) | 10x256 | A D G C | bias_op_name: sequential/dense/BiasAdd |
| | | | sequential/dense/Tensordot/ReadVariableOp:0 (data type: Float_32; tensor dimension: [256,35]; tensor type: STATIC) | | | | packageName: qti.aisw |
| | | | sequential/dense/BiasAdd/ReadVariableOp:0 (data type: Float_32; tensor dimension: [256]; tensor type: STATIC) | | | | param count: 9k (16.2%) |
| | | | | | | | MACs per inference: 8k (15.9%) |
| 3 | sequential/dense/BiasAdd:0 | Reshape | sequential/dense/Tensordot/MatMul:0 (data type: Float_32; tensor dimension: [10,256]; tensor type: NATIVE) | sequential/dense/BiasAdd:0 (data type: Float_32; tensor dimension: [1,10,256]; tensor type: NATIVE) | 1x10x256 | A D G C | packageName: qti.aisw |
| | | | | | | | shape: [1, 10, 256] |
| 4 | sequential/dense/Relu | ElementWiseNeuron | sequential/dense/BiasAdd:0 (data type: Float_32; tensor dimension: [1,10,256]; tensor type: NATIVE) | sequential/dense_1/Tensordot/transpose:0 (data type: Float_32; tensor dimension: [1,10,256]; tensor type: NATIVE) | 1x10x256 | A D G C | operation: 4 |
| | | | | | | | packageName: qti.aisw |
| 5 | sequential/dense_1/Tensordot/Reshape:0 | Reshape | sequential/dense_1/Tensordot/transpose:0 (data type: Float_32; tensor dimension: [1,10,256]; tensor type: NATIVE) | sequential/dense_1/Tensordot/Reshape:0 (data type: Float_32; tensor dimension: [10,256]; tensor type: NATIVE) | 10x256 | A D G C | packageName: qti.aisw |
| | | | | | | | shape: [10, 256] |
| 6 | sequential/dense_1/Tensordot/MatMul | FullyConnected | sequential/dense_1/Tensordot/Reshape:0 (data type: Float_32; tensor dimension: [10,256]; tensor type: NATIVE) | sequential/dense_1/Tensordot/MatMul:0 (data type: Float_32; tensor dimension: [10,128]; tensor type: NATIVE) | 10x128 | A D G C | bias_op_name: sequential/dense_1/BiasAdd |
| | | | sequential/dense_1/Tensordot/ReadVariableOp:0 (data type: Float_32; tensor dimension: [128,256]; tensor type: STATIC) | | | | packageName: qti.aisw |
| | | | sequential/dense_1/BiasAdd/ReadVariableOp:0 (data type: Float_32; tensor dimension: [128]; tensor type: STATIC) | | | | param count: 32k (57.9%) |
| | | | | | | | MACs per inference: 32k (58.2%) |
| 7 | sequential/dense_1/BiasAdd:0 | Reshape | sequential/dense_1/Tensordot/MatMul:0 (data type: Float_32; tensor dimension: [10,128]; tensor type: NATIVE) | sequential/dense_1/BiasAdd:0 (data type: Float_32; tensor dimension: [1,10,128]; tensor type: NATIVE) | 1x10x128 | A D G C | packageName: qti.aisw |
| | | | | | | | shape: [1, 10, 128] |
| 8 | sequential/dense_1/Relu | ElementWiseNeuron | sequential/dense_1/BiasAdd:0 (data type: Float_32; tensor dimension: [1,10,128]; tensor type: NATIVE) | sequential/dense_2/Tensordot/transpose:0 (data type: Float_32; tensor dimension: [1,10,128]; tensor type: NATIVE) | 1x10x128 | A D G C | operation: 4 |
| | | | | | | | packageName: qti.aisw |
| 9 | sequential/dense_2/Tensordot/Reshape:0 | Reshape | sequential/dense_2/Tensordot/transpose:0 (data type: Float_32; tensor dimension: [1,10,128]; tensor type: NATIVE) | sequential/dense_2/Tensordot/Reshape:0 (data type: Float_32; tensor dimension: [10,128]; tensor type: NATIVE) | 10x128 | A D G C | packageName: qti.aisw |
| | | | | | | | shape: [10, 128] |
| 10 | sequential/dense_2/Tensordot/MatMul | FullyConnected | sequential/dense_2/Tensordot/Reshape:0 (data type: Float_32; tensor dimension: [10,128]; tensor type: NATIVE) | sequential/dense_2/Tensordot/MatMul:0 (data type: Float_32; tensor dimension: [10,64]; tensor type: NATIVE) | 10x64 | A D G C | bias_op_name: sequential/dense_2/BiasAdd |
| | | | sequential/dense_2/Tensordot/ReadVariableOp:0 (data type: Float_32; tensor dimension: [64,128]; tensor type: STATIC) | | | | packageName: qti.aisw |
| | | | sequential/dense_2/BiasAdd/ReadVariableOp:0 (data type: Float_32; tensor dimension: [64]; tensor type: STATIC) | | | | param count: 8k (14.5%) |
| | | | | | | | MACs per inference: 8k (14.5%) |
| 11 | sequential/dense_2/BiasAdd:0 | Reshape | sequential/dense_2/Tensordot/MatMul:0 (data type: Float_32; tensor dimension: [10,64]; tensor type: NATIVE) | sequential/dense_2/BiasAdd:0 (data type: Float_32; tensor dimension: [1,10,64]; tensor type: NATIVE) | 1x10x64 | A D G C | packageName: qti.aisw |
| | | | | | | | shape: [1, 10, 64] |
| 12 | sequential/dense_2/Relu | ElementWiseNeuron | sequential/dense_2/BiasAdd:0 (data type: Float_32; tensor dimension: [1,10,64]; tensor type: NATIVE) | sequential/dense_2/Relu:0 (data type: Float_32; tensor dimension: [1,10,64]; tensor type: NATIVE) | 1x10x64 | A D G C | operation: 4 |
| | | | | | | | packageName: qti.aisw |
| 13 | sequential/dense_3/MatMul | FullyConnected | sequential/dense_2/Relu:0 (data type: Float_32; tensor dimension: [1,10,64]; tensor type: NATIVE) | sequential/dense_3/BiasAdd:0 (data type: Float_32; tensor dimension: [1,10]; tensor type: NATIVE) | 1x10 | A D G C | bias_op_name: sequential/dense_3/BiasAdd |
| | | | sequential/dense_3/MatMul/ReadVariableOp:0 (data type: Float_32; tensor dimension: [10,640]; tensor type: STATIC) | | | | packageName: qti.aisw |
| | | | sequential/dense_3/BiasAdd/ReadVariableOp:0 (data type: Float_32; tensor dimension: [10]; tensor type: STATIC) | | | | param count: 6k (11.3%) |
| | | | | | | | MACs per inference: 6k (11.4%) |
| 14 | sequential/dense_3/Softmax | Softmax | sequential/dense_3/BiasAdd:0 (data type: Float_32; tensor dimension: [1,10]; tensor type: NATIVE) | Identity:0 (data type: Float_32; tensor dimension: [1,10]; tensor type: APP_READ) | 1x10 | A D G C | axis: 1 |
| | | | | | | | beta: 1 |
| | | | | | | | packageName: qti.aisw |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
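The layer sequence in the table above can be sketched as a plain numpy forward pass. This is an illustrative re-implementation, not the SDK's execution path: the weights below are random stand-ins for the trained parameters, and only the shapes (taken from the Out Dims column) are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, out_dim):
    # Random stand-in weights; the real values come from training.
    w = rng.standard_normal((x.shape[-1], out_dim)).astype(np.float32)
    return x @ w

x = rng.standard_normal((1, 10, 35)).astype(np.float32)  # network input
h = np.maximum(dense(x, 256), 0.0)    # FullyConnected + ReLU -> (1, 10, 256)
h = np.maximum(dense(h, 128), 0.0)    # -> (1, 10, 128)
h = np.maximum(dense(h, 64), 0.0)     # -> (1, 10, 64)
logits = dense(h.reshape(1, -1), 10)  # flatten to (1, 640), then -> (1, 10)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                  # softmax over the 10 digit classes
```

The final reshape-then-matmul mirrors rows 13-14 of the table, where the [10,640] weight tensor collapses the 1x10x64 activation into the ten class probabilities.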
snpe-dlc-viewer -i spoken_digit.dlc
The output network model HTML file will be saved at /tmp/spoken_digit.html.
Run on Linux Host
First, run the processSpokenDigitInput.py script to convert the audio input test/5_jackson_0.wav to raw format. The output file will be named input.raw.
python3 processSpokenDigitInput.py test/5_jackson_0.wav
Next, run snpe-net-run to get the inference result.
snpe-net-run --container spoken_digit.dlc --input_list input_list.txt
After snpe-net-run completes, verify that the results are populated in the $SNPE_ROOT/examples/Models/spoken_digit/output directory. There should be one or more .log files and several Result_X directories.
The raw output prediction will be located in $SNPE_ROOT/examples/Models/spoken_digit/output/Result_0/Identity:0.raw. It holds the output tensor data: 10 probabilities for the 10 categories, where the element with the highest value is the top classification. We can use a python3 script to interpret the classification results as follows:
python3 interpretRawDNNOutput.py output/Result_0/Identity:0.raw
The output should look like the following, showing the probabilities for all 10 classes.
0 : 0.000110
1 : 0.012185
2 : 0.000011
3 : 0.000593
4 : 0.002053
5 : 0.814478
6 : 0.002425
7 : 0.043664
8 : 0.003228
9 : 0.121254
Classification Result: Class 5.
The final output shows the audio file was classified as “Class 5” (from a total of 10 labels) with a probability of 0.814478. The rest of the output shows the probabilities the model assigned to the other classes.
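The interpretation step itself is simple. The hypothetical helper below re-implements the essence of interpretRawDNNOutput.py under the assumption that the raw file holds ten little-endian float32 probabilities; the shipped script may differ in detail.

```python
import numpy as np

def interpret_raw_output(raw_path):
    # Read the ten float32 class probabilities written by snpe-net-run
    # and report the argmax as the predicted digit.
    probs = np.fromfile(raw_path, dtype=np.float32)
    for digit, p in enumerate(probs):
        print(f"{digit} : {p:.6f}")
    best = int(np.argmax(probs))
    print(f"Classification Result: Class {best}.")
    return best
```

For example, `interpret_raw_output("output/Result_0/Identity:0.raw")` would reproduce the listing above.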
Binary data input
Note that the spoken digit classification model does not accept wav files as input. The model expects its input tensor dimension to be 1 x 10 x 35 as a float array. The processSpokenDigitInput.py script performs a wav to binary data conversion. The script is an example of how wav audio files can be preprocessed to generate input for the classification model.
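To make the binary format concrete, the snippet below writes a 1 x 10 x 35 float32 tensor as raw bytes, the layout snpe-net-run reads. The zero-filled array is a stand-in for real features, and the file name example_input.raw is chosen so the real input.raw produced by processSpokenDigitInput.py is not clobbered.

```python
import numpy as np

# Stand-in for the 1 x 10 x 35 float32 feature tensor the model expects,
# serialized as raw bytes in the layout snpe-net-run reads.
features = np.zeros((1, 10, 35), dtype=np.float32)
features.tofile("example_input.raw")  # 1 * 10 * 35 * 4 = 1400 bytes
```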
Run on Target Platform (Android/LE/UBUN)
Select target architecture
The Qualcomm® Neural Processing SDK provides binaries for different target platforms. Android binaries are compiled with clang using the libc++ STL implementation. Below are examples for aarch64-android (Android platforms) and the aarch64-oe-linux-gcc11.2 toolchain (LE platforms); other toolchains for other platforms can be set as SNPE_TARGET_ARCH in the same way.
# For Android targets: architecture: arm64-v8a - compiler: clang - STL: libc++
export SNPE_TARGET_ARCH=aarch64-android
# Example for LE targets
export SNPE_TARGET_ARCH=aarch64-oe-linux-gcc11.2
For simplicity, this tutorial sets the target binaries to aarch64-android.
Push libraries and binaries to target
Push Qualcomm® Neural Processing SDK libraries and the prebuilt snpe-net-run executable to /data/local/tmp/snpeexample on the Android target. Set SNPE_TARGET_DSPARCH to the DSP architecture of the target Android device.
export SNPE_TARGET_ARCH=aarch64-android
export SNPE_TARGET_DSPARCH=hexagon-v73
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin"
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib"
adb shell "mkdir -p /data/local/tmp/snpeexample/dsp/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_ARCH/*.so \
/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push $SNPE_ROOT/lib/$SNPE_TARGET_DSPARCH/unsigned/*.so \
/data/local/tmp/snpeexample/dsp/lib
adb push $SNPE_ROOT/bin/$SNPE_TARGET_ARCH/snpe-net-run \
/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
Set up environment variables
Set up the library path, the PATH variable, and the target architecture in the adb shell, then run the executable with the -h argument to see its usage description.
adb shell
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
snpe-net-run -h
exit
Push model data to Android target
To execute the spoken digit classification model on your Android target follow these steps:
adb shell "mkdir -p /data/local/tmp/spoken_digit"
adb push input.raw /data/local/tmp/spoken_digit
adb push input_list.txt /data/local/tmp/spoken_digit
adb push spoken_digit.dlc /data/local/tmp/spoken_digit
Note: It may take some time to push the DLC file to your target.
Running on Android using CPU Runtime
Run the Android C++ executable with the following commands:
adb shell
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
cd /data/local/tmp/spoken_digit
snpe-net-run --container spoken_digit.dlc --input_list input_list.txt
exit
The executable will create the results folder: /data/local/tmp/spoken_digit/output. To pull the output:
adb pull /data/local/tmp/spoken_digit/output output_android
Check the classification results by running the interpret python3 script.
python3 interpretRawDNNOutput.py output_android/Result_0/Identity:0.raw
The output should look like the following, showing the probabilities for all 10 classes.
0 : 0.000110
1 : 0.012185
2 : 0.000011
3 : 0.000593
4 : 0.002053
5 : 0.814478
6 : 0.002425
7 : 0.043664
8 : 0.003228
9 : 0.121254
Classification Result: Class 5.
Running on Android using GPU Runtime
Try running on an Android target with the --use_gpu option as follows. By default, the GPU runtime runs in GPU_FLOAT32_16_HYBRID mode (math: full float; data storage: half float). The mode can be changed to GPU_FLOAT16 (math: half float; data storage: half float) with the --gpu_mode option.
adb shell
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
cd /data/local/tmp/spoken_digit
snpe-net-run --container spoken_digit.dlc --input_list input_list.txt --use_gpu
exit
Pull the output into an output_android_gpu directory.
adb pull /data/local/tmp/spoken_digit/output output_android_gpu
Again, we can run the interpret script to see the classification results.
python3 interpretRawDNNOutput.py output_android_gpu/Result_0/Identity:0.raw
The output should look like the following, showing the probabilities for all 10 classes.
0 : 0.000113
1 : 0.012330
2 : 0.000011
3 : 0.000604
4 : 0.002087
5 : 0.813591
6 : 0.002461
7 : 0.043883
8 : 0.003279
9 : 0.121640
Classification Result: Class 5.
Review the output for the classification results. The classification result is identical to the run with the CPU runtime, but the individual probabilities differ slightly due to floating-point precision differences between runtimes.
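To quantify the runtime-to-runtime difference, the two raw output tensors can be compared element-wise. The helper below is a hypothetical utility (not part of the SDK), assuming both result directories have been pulled as shown above.

```python
import numpy as np

def max_abs_diff(raw_a, raw_b):
    # Element-wise comparison of two raw float32 output tensors.
    a = np.fromfile(raw_a, dtype=np.float32)
    b = np.fromfile(raw_b, dtype=np.float32)
    return float(np.abs(a - b).max())

# Example usage with the results pulled above:
# max_abs_diff("output_android/Result_0/Identity:0.raw",
#              "output_android_gpu/Result_0/Identity:0.raw")
```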