C++ Tutorial - Build the Sample
Prerequisites
The Qualcomm® Neural Processing SDK has been set up following the Qualcomm® Neural Processing SDK Setup chapter.
The Tutorials Setup has been completed.
Introduction
This tutorial demonstrates how to build a C++ sample application that can execute neural network models on a PC or target device. Note that while this sample code does not perform any error checking, users are strongly encouraged to check for errors when using the Qualcomm® Neural Processing SDK APIs.
Most applications follow this pattern when using a neural network:
static zdl::DlSystem::Runtime_t runtime = checkRuntime();
std::unique_ptr<zdl::DlContainer::IDlContainer> container = loadContainerFromFile(dlc);
std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtime, useUserSuppliedBuffers);
std::unique_ptr<zdl::DlSystem::ITensor> inputTensor = loadInputTensor(snpe, fileLine);        // ITensor
loadInputUserBuffer(applicationInputBuffers, snpe, fileLine);                                 // User Buffer
executeNetwork(snpe, inputTensor, OutputDir, inputListNum);                                   // ITensor
executeNetwork(snpe, inputMap, outputMap, applicationOutputBuffers, OutputDir, inputListNum); // User Buffer

The sections below describe how to implement each of the steps above. For more details, please refer to the collection of source code files located at $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP.
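Since the sample code omits error checking, here is a minimal sketch of how the container-load and build steps above might be guarded. It assumes zdl::DlSystem::getLastErrorString() (declared in DlSystem/DlError.hpp) is available to retrieve a failure description; the null checks themselves rely only on the functions already shown.

// Minimal error-checking sketch (not part of the sample): both calls
// return empty unique_ptrs on failure, so a null check is sufficient.
std::unique_ptr<zdl::DlContainer::IDlContainer> container = loadContainerFromFile(dlc);
if (container == nullptr)
{
    std::cerr << "Error while opening the container file: "
              << zdl::DlSystem::getLastErrorString() << std::endl; // assumed API
    std::exit(EXIT_FAILURE);
}
std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtime, useUserSuppliedBuffers);
if (snpe == nullptr)
{
    std::cerr << "Error while building SNPE object: "
              << zdl::DlSystem::getLastErrorString() << std::endl; // assumed API
    std::exit(EXIT_FAILURE);
}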
Get Available Runtime
The code excerpt below illustrates how to check if a specific runtime is available using the native APIs (the GPU runtime is used as an example).
zdl::DlSystem::Runtime_t checkRuntime()
{
    static zdl::DlSystem::Version_t Version = zdl::SNPE::SNPEFactory::getLibraryVersion();
    static zdl::DlSystem::Runtime_t Runtime;
    std::cout << "Qualcomm (R) Neural Processing SDK Version: " << Version.asString().c_str() << std::endl; // Print Version number
    if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::GPU))
    {
        Runtime = zdl::DlSystem::Runtime_t::GPU;
    }
    else
    {
        Runtime = zdl::DlSystem::Runtime_t::CPU;
    }
    return Runtime;
}

Load Network
The code excerpt below illustrates how to load a network from the Qualcomm® Neural Processing SDK container file (DLC).
std::unique_ptr<zdl::DlContainer::IDlContainer> loadContainerFromFile(std::string containerPath)
{
    std::unique_ptr<zdl::DlContainer::IDlContainer> container;
    container = zdl::DlContainer::IDlContainer::open(containerPath);
    return container;
}

Load UDO
The code excerpt below illustrates how to load UDO package(s).
bool loadUDOPackage(const std::string& UdoPackagePath)
{
    std::vector<std::string> udoPkgPathsList;
    split(udoPkgPathsList, UdoPackagePath, ',');
    for (const auto& u : udoPkgPathsList)
    {
        if (false == zdl::SNPE::SNPEFactory::addOpPackage(u))
        {
            std::cerr << "Error while loading UDO package: " << u << std::endl;
            return false;
        }
    }
    return true;
}

The Qualcomm® Neural Processing SDK can execute networks that contain user-defined operations (UDOs). Please refer to the UDO Tutorial for implementing a UDO. UDO packages can be specified to snpe-sample using the "-u" option.
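The split helper used above comes from the sample's utility sources rather than the SDK. Here is a minimal sketch of what such a helper might look like; the name and signature are assumed from the call above.

#include <sstream>
#include <string>
#include <vector>

// Illustrative sketch only: splits `str` on `delimiter` and appends the
// pieces to `out`. The sample ships its own version in its utility sources.
void split(std::vector<std::string>& out, const std::string& str, char delimiter)
{
    out.clear();
    std::istringstream iss(str);
    std::string piece;
    while (std::getline(iss, piece, delimiter))
    {
        out.push_back(piece);
    }
}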
Set Network Builder Options
The following code demonstrates how to instantiate a SNPE Builder object, which will be used to execute the network with the given parameters.
std::unique_ptr<zdl::SNPE::SNPE> setBuilderOptions(std::unique_ptr<zdl::DlContainer::IDlContainer>& container,
                                                   zdl::DlSystem::RuntimeList runtimeList,
                                                   bool useUserSuppliedBuffers)
{
    std::unique_ptr<zdl::SNPE::SNPE> snpe;
    zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
    snpe = snpeBuilder.setOutputLayers({})
                      .setRuntimeProcessorOrder(runtimeList)
                      .setUseUserSuppliedBuffers(useUserSuppliedBuffers)
                      .build();
    return snpe;
}
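Note that setBuilderOptions takes a zdl::DlSystem::RuntimeList, while checkRuntime (shown earlier) returns a single Runtime_t. A minimal sketch of wiring the two together, assuming RuntimeList::add appends runtimes in order of precedence:

zdl::DlSystem::RuntimeList runtimeList;
runtimeList.add(checkRuntime()); // highest-priority runtime first; a CPU fallback could be added after it
bool useUserSuppliedBuffers = false;
std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtimeList, useUserSuppliedBuffers);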
Load Network Inputs

Network inputs and outputs can be either user-backed buffers or ITensors (built-in Qualcomm® Neural Processing SDK buffers), but not both. The advantage of using user-backed buffers is that it eliminates an extra copy from user buffers to create ITensors. Both methods of loading network inputs are shown below.
Using User Buffers
Qualcomm® Neural Processing SDK can create its network inputs and outputs from user-backed buffers. Note that Qualcomm® Neural Processing SDK expects the values of the buffers to be present and valid for the duration of its execution.
Here is a function for creating a Qualcomm® Neural Processing SDK UserBuffer from a user-backed buffer and storing it in a zdl::DlSystem::UserBufferMap. These maps are a convenient collection of all input or output user buffers that can be passed to Qualcomm® Neural Processing SDK to execute the network.
Disclaimer: The strides of the buffer should already be known by the user and should not be calculated as shown below. The calculation shown is solely used for executing the example code.
void createUserBuffer(zdl::DlSystem::UserBufferMap& userBufferMap,
                      std::unordered_map<std::string, std::vector<uint8_t>>& applicationBuffers,
                      std::vector<std::unique_ptr<zdl::DlSystem::IUserBuffer>>& snpeUserBackedBuffers,
                      std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                      const char* name)
{
    // get attributes of buffer by name
    auto bufferAttributesOpt = snpe->getInputOutputBufferAttributes(name);
    if (!bufferAttributesOpt) throw std::runtime_error(std::string("Error obtaining attributes for input tensor ") + name);

    // calculate the size of buffer required by the input tensor
    const zdl::DlSystem::TensorShape& bufferShape = (*bufferAttributesOpt)->getDims();

    // Calculate the stride based on buffer strides, assuming tightly packed.
    // Note: Strides = Number of bytes to advance to the next element in each dimension.
    // For example, if a float tensor of dimension 2x4x3 is tightly packed in a buffer of 96 bytes,
    // then the strides would be (48, 12, 4).
    // Note: Buffer stride is usually known and does not need to be calculated.
    std::vector<size_t> strides(bufferShape.rank());
    strides[strides.size() - 1] = sizeof(float);
    size_t stride = strides[strides.size() - 1];
    for (size_t i = bufferShape.rank() - 1; i > 0; i--)
    {
        stride *= bufferShape[i];
        strides[i - 1] = stride;
    }
    const size_t bufferElementSize = (*bufferAttributesOpt)->getElementSize();
    size_t bufSize = calcSizeFromDims(bufferShape.getDimensions(), bufferShape.rank(), bufferElementSize);

    // set the buffer encoding type
    zdl::DlSystem::UserBufferEncodingFloat userBufferEncodingFloat;
    // create user-backed storage to load input data onto it
    applicationBuffers.emplace(name, std::vector<uint8_t>(bufSize));
    // create Qualcomm (R) Neural Processing SDK user buffer from the user-backed buffer
    zdl::DlSystem::IUserBufferFactory& ubFactory = zdl::SNPE::SNPEFactory::getUserBufferFactory();
    snpeUserBackedBuffers.push_back(ubFactory.createUserBuffer(applicationBuffers.at(name).data(),
                                                               bufSize,
                                                               strides,
                                                               &userBufferEncodingFloat));
    // add the user-backed buffer to the inputMap, which is later on fed to the network for execution
    userBufferMap.add(name, snpeUserBackedBuffers.back().get());
}
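createUserBuffer relies on a calcSizeFromDims helper that comes from the sample's utility sources and is not shown above. A minimal sketch of what it might look like, multiplying out the dimensions and, for simplicity, treating resizable (0) dimensions as 1 (the actual sample substitutes a user-provided maximum, cf. the -z option described later):

// Illustrative sketch only: computes the byte size of a tensor buffer from
// its dimensions, rank, and element size. Resizable dimensions are reported
// as 0 and are treated as 1 here for simplicity.
size_t calcSizeFromDims(const zdl::DlSystem::Dimension* dims, size_t rank, size_t elementSize)
{
    if (rank == 0) return 0;
    size_t size = elementSize;
    while (rank--)
    {
        size *= (*dims == 0) ? 1 : *dims;
        dims++;
    }
    return size;
}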
The following function then shows how to load input data from file(s) into the user-backed buffers, on top of which the Qualcomm® Neural Processing SDK UserBuffers were created as shown above.

void loadInputUserBuffer(std::unordered_map<std::string, std::vector<uint8_t>>& applicationBuffers,
                         std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                         const std::string& fileLine)
{
    // get input tensor names of the network that need to be populated
    const auto& inputNamesOpt = snpe->getInputTensorNames();
    if (!inputNamesOpt) throw std::runtime_error("Error obtaining input tensor names");
    const zdl::DlSystem::StringList& inputNames = *inputNamesOpt;
    assert(inputNames.size() > 0);

    // treat each line as a space-separated list of input files
    std::vector<std::string> filePaths;
    split(filePaths, fileLine, ' ');

    if (inputNames.size()) std::cout << "Processing DNN Input: " << std::endl;

    for (size_t i = 0; i < inputNames.size(); i++)
    {
        const char* name = inputNames.at(i);
        std::string filePath(filePaths[i]);

        // print out which file is being processed
        std::cout << "\t" << i + 1 << ") " << filePath << std::endl;

        // load file content onto application storage buffer,
        // on top of which, Qualcomm (R) Neural Processing SDK has created a user buffer
        loadByteDataFile(filePath, applicationBuffers.at(name));
    }
}
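The loadByteDataFile helper above, and the loadFloatDataFile helper used in the ITensor path below, also come from the sample's utility sources. Minimal sketches follow, assuming the inputs are raw binary files whose sizes exactly match the buffers:

#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative sketch only: reads a raw file into a pre-sized byte buffer
// (used with user buffers).
void loadByteDataFile(const std::string& inputFile, std::vector<uint8_t>& loadVector)
{
    std::ifstream in(inputFile, std::ifstream::binary);
    if (!in) throw std::runtime_error("Failed to open input file: " + inputFile);
    if (!in.read(reinterpret_cast<char*>(loadVector.data()), loadVector.size()))
        throw std::runtime_error("Failed to read input file: " + inputFile);
}

// Illustrative sketch only: reads a raw file of 32-bit floats into a vector
// (used with ITensors).
std::vector<float> loadFloatDataFile(const std::string& inputFile)
{
    std::ifstream in(inputFile, std::ifstream::binary | std::ifstream::ate);
    if (!in) throw std::runtime_error("Failed to open input file: " + inputFile);
    const std::streamsize length = in.tellg();
    in.seekg(0, std::ifstream::beg);
    std::vector<float> data(static_cast<size_t>(length) / sizeof(float));
    if (!in.read(reinterpret_cast<char*>(data.data()), length))
        throw std::runtime_error("Failed to read input file: " + inputFile);
    return data;
}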
Using ITensors

std::unique_ptr<zdl::DlSystem::ITensor> loadInputTensor(std::unique_ptr<zdl::SNPE::SNPE>& snpe, std::string& fileLine)
{
    std::unique_ptr<zdl::DlSystem::ITensor> input;
    const auto& strList_opt = snpe->getInputTensorNames();
    if (!strList_opt) throw std::runtime_error("Error obtaining Input tensor names");
    const auto& strList = *strList_opt;
    // Make sure the network requires only a single input
    assert(strList.size() == 1);

    // If the network has a single input, each line represents the input file to be loaded for that input
    std::string filePath(fileLine);
    std::cout << "Processing DNN Input: " << filePath << "\n";
    std::vector<float> inputVec = loadFloatDataFile(filePath);

    /* Create an input tensor that is correctly sized to hold the input of the network.
       Dimensions that have no fixed size will be represented with a value of 0. */
    const auto& inputDims_opt = snpe->getInputDimensions(strList.at(0));
    const auto& inputShape = *inputDims_opt;

    /* Calculate the total number of elements that can be stored in the tensor so that we can
       check that the input contains the expected number of elements.
       With the input dimensions computed, create a tensor to convey the input into the network. */
    input = zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(inputShape);

    /* Copy the loaded input file contents into the network's input tensor.
       SNPE's ITensor supports C++ STL functions like std::copy() */
    std::copy(inputVec.begin(), inputVec.end(), input->begin());
    return input;
}

Execute the Network & Process Output
The following snippets of code use the native API to execute the network (in UserBuffer or ITensor mode) and show how to iterate through the newly populated output tensor.
Using User Buffers
void executeNetwork(std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                    zdl::DlSystem::UserBufferMap& inputMap,
                    zdl::DlSystem::UserBufferMap& outputMap,
                    std::unordered_map<std::string, std::vector<uint8_t>>& applicationOutputBuffers,
                    const std::string& outputDir,
                    int num)
{
    // Execute the network and store the outputs in user buffers specified in outputMap
    snpe->execute(inputMap, outputMap);

    // Get all output buffer names from the network
    const zdl::DlSystem::StringList& outputBufferNames = outputMap.getUserBufferNames();

    // Iterate through output buffers and print each output to a raw file
    std::for_each(outputBufferNames.begin(), outputBufferNames.end(), [&](const char* name)
    {
        std::ostringstream path;
        path << outputDir << "/Result_" << num << "/" << name << ".raw";
        SaveUserBuffer(path.str(), applicationOutputBuffers.at(name));
    });
}

// The following is a partial snippet of the function
void SaveUserBuffer(const std::string& path, const std::vector<uint8_t>& buffer)
{
    ...
    std::ofstream os(path, std::ofstream::binary);
    if (!os)
    {
        std::cerr << "Failed to open output file for writing: " << path << "\n";
        std::exit(EXIT_FAILURE);
    }
    if (!os.write((char*)(buffer.data()), buffer.size()))
    {
        std::cerr << "Failed to write data to: " << path << "\n";
        std::exit(EXIT_FAILURE);
    }
}

Using ITensors
void executeNetwork(std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                    std::unique_ptr<zdl::DlSystem::ITensor>& input,
                    std::string OutputDir,
                    int num)
{
    // Execute the network and store the outputs that were specified when creating the network in a TensorMap
    static zdl::DlSystem::TensorMap outputTensorMap;
    snpe->execute(input.get(), outputTensorMap);
    zdl::DlSystem::StringList tensorNames = outputTensorMap.getTensorNames();

    // Iterate through the output Tensor map, and print each output layer name
    std::for_each(tensorNames.begin(), tensorNames.end(), [&](const char* name)
    {
        std::ostringstream path;
        path << OutputDir << "/" << "Result_" << num << "/" << name << ".raw";
        auto tensorPtr = outputTensorMap.getTensor(name);
        SaveITensor(path.str(), tensorPtr);
    });
}

// The following is a partial snippet of the function
void SaveITensor(const std::string& path, const zdl::DlSystem::ITensor* tensor)
{
    ...
    std::ofstream os(path, std::ofstream::binary);
    if (!os)
    {
        std::cerr << "Failed to open output file for writing: " << path << "\n";
        std::exit(EXIT_FAILURE);
    }
    for (auto it = tensor->cbegin(); it != tensor->cend(); ++it)
    {
        float f = *it;
        if (!os.write(reinterpret_cast<char*>(&f), sizeof(float)))
        {
            std::cerr << "Failed to write data to: " << path << "\n";
            std::exit(EXIT_FAILURE);
        }
    }
}

Using IOBufferDataTypeMap
The IOBufferDataTypeMap is used to specify the intended data type for input/output of a network. The data type values include zdl::DlSystem::IOBufferDataType_t::FLOATING_POINT_32, zdl::DlSystem::IOBufferDataType_t::FIXED_POINT_8 and zdl::DlSystem::IOBufferDataType_t::FIXED_POINT_16.
If the output of a network is of type FIXED_POINT_8 and the user intends to access the output in FLOATING_POINT_32 format, the dequantization operation is performed on the ARM side. By specifying the data type as FLOATING_POINT_32 using the IOBufferDataTypeMap API, the dequantization operation is added directly to the graph.
The following snippet of code shows how to specify the data type for a buffer using the native API.
void setBufferDataType(zdl::DlSystem::IOBufferDataTypeMap& bufferDataTypeMap, std::string bufferName, zdl::DlSystem::IOBufferDataType_t dataType)
{
    bufferDataTypeMap.add(bufferName.c_str(), dataType);
}

setBufferDataType(bufferDataTypeMap, "output_1", zdl::DlSystem::IOBufferDataType_t::FLOATING_POINT_32);
zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
snpeBuilder.setBufferDataType(bufferDataTypeMap);

Building the C++ Application
Building and Running on x86 Linux and Embedded Linux
Start by going to the snpe-sample base directory.
cd $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP

Note the different makefiles associated with the different Linux platforms. $CXX needs to be set according to the target platform. Here is a table of the supported targets and their corresponding settings for $CXX and the Makefiles to use.
Target                     | Makefile                           | Possible CXX value     | Output Location
aarch64-oe-linux-gcc8.2    | Makefile.aarch64-oe-linux-gcc8.2   | aarch64-oe-linux-g++   | aarch64-oe-linux-gcc8.2
aarch64-oe-linux-gcc9.3    | Makefile.aarch64-oe-linux-gcc9.3   | aarch64-oe-linux-g++   | aarch64-oe-linux-gcc9.3
aarch64-oe-linux-gcc11.2   | Makefile.aarch64-oe-linux-gcc11.2  | aarch64-oe-linux-g++   | aarch64-oe-linux-gcc11.2
aarch64-ubuntu-gcc9.4      | Makefile.aarch64-ubuntu-gcc9.4     | aarch64-linux-gnu-g++  | aarch64-ubuntu-gcc9.4
x86_64-linux               | Makefile.x86_64-linux-clang        | g++                    | x86_64-linux-clang
export CXX=<Name of c++ cross compiler>
make -f <Makefile for the target>

Note: Ensure that the path to the compiler binary is already set in $PATH.
Along with the sample executable, all other required libraries need to be pushed onto their respective targets. $LD_LIBRARY_PATH may also need to be updated to point to the support libraries. You can run the executable with the -h argument to see its usage description.
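For example, on an x86_64 Linux host the library path might be set as follows (the exact library directory is an assumption and depends on your target, cf. the Output Location column of the table above):

export LD_LIBRARY_PATH=$SNPE_ROOT/lib/x86_64-linux-clang:$LD_LIBRARY_PATH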
snpe-sample -h

The description should look like the following:
DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C++ API.

REQUIRED ARGUMENTS:
-------------------
  -d  <FILE>        Path to the DL container containing the network.
  -i  <FILE>        Path to a file listing the inputs for the network.
  -o  <PATH>        Path to directory to store output results.

OPTIONAL ARGUMENTS:
-------------------
  -b  <TYPE>        Type of buffers to use [USERBUFFER_FLOAT, USERBUFFER_TF8, ITENSOR, USERBUFFER_TF16] (ITENSOR is default).
  -q  <BOOL>        Specifies to use static quantization parameters from the model instead of input specific quantization [true, false]. Used in conjunction with USERBUFFER_TF8.
  -r  <RUNTIME>     The runtime to be used [gpu, dsp, aip, cpu] (cpu is default).
  -u  <VAL,VAL>     Path to UDO package with registration library for UDOs. Optionally, user can provide multiple packages as a comma-separated list.
  -z  <NUMBER>      The maximum number that resizable dimensions can grow into. Used as a hint to create UserBuffers for models with dynamic sized outputs. Should be a positive integer and is not applicable when using ITensor.
  -c                Enable init caching to accelerate the initialization process of SNPE. Defaults to disable.
  -l  <VAL,VAL,VAL> Specifies the order of precedence for runtime e.g. cpu_float32, dsp_fixed8_tf etc. Valid values are:
                      cpu_float32 (Snapdragon CPU)       = Data & Math: float 32bit
                      gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit, Math: float 32bit
                      dsp_fixed8_tf (Hexagon DSP)        = Data & Math: 8bit fixed point Tensorflow style format
                      gpu_float16 (Adreno GPU)           = Data: float 16bit, Math: float 16bit
                      cpu (Snapdragon CPU)               = Same as cpu_float32
                      gpu (Adreno GPU)                   = Same as gpu_float32_16_hybrid
                      dsp (Hexagon DSP)                  = Same as dsp_fixed8_tf

Running snpe-sample assumes that Running the Inception v3 Model has previously been set up.
Run snpe-sample with the Inception v3 model:
cd $SNPE_ROOT/examples/Models/InceptionV3/data
$SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP/obj/local/x86_64-linux-clang/snpe-sample -b ITENSOR -d ../dlc/inception_v3.dlc -i target_raw_list.txt -o output

The results are stored in the output directory. To process the output, run the following script to generate the classification results.
python3 $SNPE_ROOT/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i target_raw_list.txt -o output/ -l imagenet_slim_labels.txt

Classification results
cropped/notice_sign.raw 0.167454 459 brass
cropped/plastic_cup.raw 0.990612 648 measuring cup
cropped/chairs.raw      0.382222 832 studio couch
cropped/trash_bin.raw   0.684572 413 ashcan

Building and Running on ARM Android
Prerequisite: You will need the Android NDK to build the Android C++ executable. The tutorial assumes that you can invoke ‘ndk-build’ from the shell.
To build snpe-sample with clang against the aarch64-android Qualcomm® Neural Processing SDK binaries, use the following commands:
cd $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP
ndk-build NDK_TOOLCHAIN_VERSION=clang APP_STL=c++_static NDK_PROJECT_PATH=. NDK_APPLICATION_MK=Application.mk APP_BUILD_SCRIPT=Android.mk

The ndk-build command will build arm64-v8a binaries of snpe-sample:
$SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP/obj/local/arm64-v8a/snpe-sample
To run the Android C++ executable, push the appropriate Qualcomm® Neural Processing SDK libraries and the executable onto the Android target.
export SNPE_TARGET_ARCH=aarch64-android
export SNPE_TARGET_DSPARCH=hexagon-v73
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin"
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib"
adb shell "mkdir -p /data/local/tmp/snpeexample/dsp/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_ARCH/*.so /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push $SNPE_ROOT/lib/$SNPE_TARGET_DSPARCH/unsigned/*.so /data/local/tmp/snpeexample/dsp/lib
adb push $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP/obj/local/arm64-v8a/snpe-sample /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin

Run snpe-sample with the Inception v3 model on the target. This assumes that the setup steps in Run on Android Target have been completed, so that all the sample data files and the Inception v3 model are already on the target.
adb shell
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
cd /data/local/tmp/inception_v3
snpe-sample -b ITENSOR -d inception_v3.dlc -i target_raw_list.txt -o output_sample
exit

Pull the target output into a host-side output directory.
cd $SNPE_ROOT/examples/Models/InceptionV3
adb pull /data/local/tmp/inception_v3/output_sample output_sample

Again, we can run the interpretation script to see the classification results.
python3 $SNPE_ROOT/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/target_raw_list.txt -o output_sample/ -l data/imagenet_slim_labels.txt

Classification results
cropped/notice_sign.raw 0.167454 459 brass
cropped/plastic_cup.raw 0.990612 648 measuring cup
cropped/chairs.raw      0.382221 832 studio couch
cropped/trash_bin.raw   0.684573 413 ashcan

Building and running on Linux (Yocto Based)
Prerequisite: This assumes that the Tutorials Setup has been completed.
For devices running a Yocto-based Linux OS, the GCC compiler needs to be used to build the sample source code. To support Yocto Kirkstone based devices, the libraries are compiled with gcc11.2. Follow the steps below to build the SNPE sample app:
export SNPE_ROOT=/path/to/extracted/snpe-sdk
cd ${SNPE_ROOT}/examples/SNPE/NativeCpp/SampleCode_CPP/
export AARCH64_LINUX_OE_GCC_112=/path/to/extracted/toolchain
make CXX="<installed_toolchain_path>/tmp/sysroots/x86_64/usr/bin/aarch64-qcom-linux/aarch64-qcom-linux-g++ --sysroot=<installed_toolchain_path>/tmp/sysroots/qcm6490" -f Makefile.aarch64-oe-linux-gcc11.2

After executing make from above, you should be able to see two new folders in the same directory:
bin: contains snpe-sample binaries for each platform within respective directories.
obj: contains all the object files that were used for building and linking the executable.
To delete all the artifacts that were generated in the above step, run:
cd ${SNPE_ROOT}/examples/SNPE/NativeCpp/SampleCode_CPP
make clean
To run the snpe-sample C++ executable, push the appropriate Qualcomm® Neural Processing SDK libraries and the executable (i.e., the aarch64-oe-linux-gcc11.2 build) onto the target. Run snpe-sample using the command below:
snpe-sample -b ITENSOR -d <input_dlc> -i <target_raw_list> -o output_sample