C++ Tutorial - Build the Sample

Introduction

This tutorial demonstrates how to build a C++ sample application that can execute neural network models on the PC or a target device. Please note that while this sample code does not do any error checking, it is strongly recommended that users check for errors when using the Qualcomm® Neural Processing SDK APIs.
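As a minimal sketch of such error checking (dlcPath is a hypothetical std::string holding the path to a DLC file; zdl::DlSystem::getLastErrorString() is declared in DlSystem/DlError.hpp), a failed container load could be reported as follows:

// Check the result of an SDK call and surface the most recent SDK error.
std::unique_ptr<zdl::DlContainer::IDlContainer> container = zdl::DlContainer::IDlContainer::open(dlcPath);
if (container == nullptr)
{
    std::cerr << "Error opening DLC: " << zdl::DlSystem::getLastErrorString() << std::endl;
    std::exit(EXIT_FAILURE);
}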

Most applications follow this pattern when using a neural network:

  1. Get Available Runtime

  2. Load Network

  3. Load UDO

  4. Set Network Builder Options

  5. Load Network Inputs

    1. Using User Buffers

    2. Using ITensors

  6. Execute the Network & Process Output

    1. Using User Buffers

    2. Using ITensors

  7. Using IOBufferDataTypeMap

The code excerpt below shows how these steps fit together at a high level.

static zdl::DlSystem::Runtime_t runtime = checkRuntime();
std::unique_ptr<zdl::DlContainer::IDlContainer> container = loadContainerFromFile(dlc);
std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtime, useUserSuppliedBuffers);
std::unique_ptr<zdl::DlSystem::ITensor> inputTensor = loadInputTensor(snpe, fileLine); // ITensor
loadInputUserBuffer(applicationInputBuffers, snpe, fileLine); // User Buffer
executeNetwork(snpe, inputTensor, OutputDir, inputListNum); // ITensor
executeNetwork(snpe, inputMap, outputMap, applicationOutputBuffers, OutputDir, inputListNum); // User Buffer

The sections below describe how to implement each step described above. For more details, please refer to the collection of source code files located at $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP.

Get Available Runtime

The code excerpt below illustrates how to check if a specific runtime is available using the native APIs (the GPU runtime is used as an example).

zdl::DlSystem::Runtime_t checkRuntime()
{
    static zdl::DlSystem::Version_t Version = zdl::SNPE::SNPEFactory::getLibraryVersion();
    static zdl::DlSystem::Runtime_t Runtime;
    std::cout << "Qualcomm (R) Neural Processing SDK Version: " << Version.asString().c_str() << std::endl; //Print Version number
    if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::GPU)) {
        Runtime = zdl::DlSystem::Runtime_t::GPU;
    } else {
        Runtime = zdl::DlSystem::Runtime_t::CPU;
    }
    return Runtime;
}
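The same pattern extends to other runtimes. For example, a hedged variant that prefers the DSP runtime, then the GPU, before falling back to the CPU:

// Probe runtimes in a hypothetical order of preference: DSP, then GPU, then CPU.
zdl::DlSystem::Runtime_t runtime = zdl::DlSystem::Runtime_t::CPU;
if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::DSP)) {
    runtime = zdl::DlSystem::Runtime_t::DSP;
} else if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::GPU)) {
    runtime = zdl::DlSystem::Runtime_t::GPU;
}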

Load Network

The code excerpt below illustrates how to load a network from the Qualcomm® Neural Processing SDK container file (DLC).

std::unique_ptr<zdl::DlContainer::IDlContainer> loadContainerFromFile(std::string containerPath)
{
    std::unique_ptr<zdl::DlContainer::IDlContainer> container;
    container = zdl::DlContainer::IDlContainer::open(containerPath);
    return container;
}

Load UDO

The code excerpt below illustrates how to load UDO package(s).

bool loadUDOPackage(const std::string& UdoPackagePath)
{
    std::vector<std::string> udoPkgPathsList;
    split(udoPkgPathsList, UdoPackagePath, ',');
    for (const auto &u : udoPkgPathsList)
    {
       if (false == zdl::SNPE::SNPEFactory::addOpPackage(u))
       {
          std::cerr << "Error while loading UDO package: "<< u << std::endl;
          return false;
       }
    }
    return true;
}

The Qualcomm® Neural Processing SDK can execute networks that contain user-defined operations (UDOs). Please refer to the UDO Tutorial for implementing a UDO. UDO packages can be specified to snpe-sample using the "-u" option.
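As a usage sketch of the function above (the package file names here are hypothetical placeholders), multiple registration libraries can be passed as a single comma-separated string:

// Hypothetical UDO registration libraries; replace with your own package paths.
std::string udoPackagePaths = "libUdoMyPackageReg.so,libUdoExtraPackageReg.so";
if (!loadUDOPackage(udoPackagePaths))
{
    std::cerr << "Failed to load UDO package(s)." << std::endl;
    std::exit(EXIT_FAILURE);
}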

Set Network Builder Options

The following code demonstrates how to instantiate a SNPE Builder object, which will be used to execute the network with the given parameters.

std::unique_ptr<zdl::SNPE::SNPE> setBuilderOptions(std::unique_ptr<zdl::DlContainer::IDlContainer>& container,
                                                   zdl::DlSystem::RuntimeList runtimeList,
                                                   bool useUserSuppliedBuffers)
{
    std::unique_ptr<zdl::SNPE::SNPE> snpe;
    zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
    snpe = snpeBuilder.setOutputLayers({})
       .setRuntimeProcessorOrder(runtimeList)
       .setUseUserSuppliedBuffers(useUserSuppliedBuffers)
       .build();
    return snpe;
}
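Note that setBuilderOptions() takes a zdl::DlSystem::RuntimeList, while checkRuntime() returns a single zdl::DlSystem::Runtime_t. A minimal sketch of bridging the two (assuming the container and useUserSuppliedBuffers variables from the overview above):

// Wrap the single selected runtime in a RuntimeList for the builder.
zdl::DlSystem::RuntimeList runtimeList;
runtimeList.add(checkRuntime());
std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtimeList, useUserSuppliedBuffers);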

Load Network Inputs

Network inputs and outputs can be either user-backed buffers or ITensors (built-in Qualcomm® Neural Processing SDK buffers), but not both. The advantage of using user-backed buffers is that it eliminates an extra copy from user buffers to create ITensors. Both methods of loading network inputs are shown below.

Using User Buffers

Qualcomm® Neural Processing SDK can create its network inputs and outputs from user-backed buffers. Note that Qualcomm® Neural Processing SDK expects the values of the buffers to be present and valid for the duration of its execution.

Here is a function for creating a Qualcomm® Neural Processing SDK UserBuffer from a user-backed buffer and storing it in a zdl::DlSystem::UserBufferMap. These maps are a convenient collection of all input or output user buffers that can be passed to Qualcomm® Neural Processing SDK to execute the network.

Disclaimer: The strides of the buffer should already be known by the user and should not be calculated as shown below. The calculation shown is solely used for executing the example code.

void createUserBuffer(zdl::DlSystem::UserBufferMap& userBufferMap,
                      std::unordered_map<std::string, std::vector<uint8_t>>& applicationBuffers,
                      std::vector<std::unique_ptr<zdl::DlSystem::IUserBuffer>>& snpeUserBackedBuffers,
                      std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                      const char * name)
{
   // get attributes of buffer by name
   auto bufferAttributesOpt = snpe->getInputOutputBufferAttributes(name);
   if (!bufferAttributesOpt) throw std::runtime_error(std::string("Error obtaining attributes for input tensor ") + name);
   // calculate the size of buffer required by the input tensor
   const zdl::DlSystem::TensorShape& bufferShape = (*bufferAttributesOpt)->getDims();
   // Calculate the stride based on buffer strides, assuming tightly packed.
   // Note: Strides = Number of bytes to advance to the next element in each dimension.
   // For example, if a float tensor of dimension 2x4x3 is tightly packed in a buffer of 96 bytes, then the strides would be (48,12,4)
   // Note: Buffer stride is usually known and does not need to be calculated.
   std::vector<size_t> strides(bufferShape.rank());
   strides[strides.size() - 1] = sizeof(float);
   size_t stride = strides[strides.size() - 1];
   for (size_t i = bufferShape.rank() - 1; i > 0; i--)
   {
      stride *= bufferShape[i];
      strides[i-1] = stride;
   }
   const size_t bufferElementSize = (*bufferAttributesOpt)->getElementSize();
   size_t bufSize = calcSizeFromDims(bufferShape.getDimensions(), bufferShape.rank(), bufferElementSize);
   // set the buffer encoding type
   zdl::DlSystem::UserBufferEncodingFloat userBufferEncodingFloat;
   // create user-backed storage to load input data onto it
   applicationBuffers.emplace(name, std::vector<uint8_t>(bufSize));
   // create Qualcomm (R) Neural Processing SDK user buffer from the user-backed buffer
   zdl::DlSystem::IUserBufferFactory& ubFactory = zdl::SNPE::SNPEFactory::getUserBufferFactory();
   snpeUserBackedBuffers.push_back(ubFactory.createUserBuffer(applicationBuffers.at(name).data(),
                                                              bufSize,
                                                              strides,
                                                              &userBufferEncodingFloat));
   // add the user-backed buffer to the inputMap, which is later on fed to the network for execution
   userBufferMap.add(name, snpeUserBackedBuffers.back().get());
}
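A hedged sketch of how createUserBuffer() might be called for every network input and output follows; the maps and vectors are assumed to be declared by the caller, and the names mirror those used in the overview.

// Create SNPE user buffers for all inputs and outputs of the network.
zdl::DlSystem::UserBufferMap inputMap, outputMap;
std::unordered_map<std::string, std::vector<uint8_t>> applicationInputBuffers, applicationOutputBuffers;
std::vector<std::unique_ptr<zdl::DlSystem::IUserBuffer>> snpeUserBackedInputBuffers, snpeUserBackedOutputBuffers;
const auto& inputNamesOpt = snpe->getInputTensorNames();
const auto& outputNamesOpt = snpe->getOutputTensorNames();
if (!inputNamesOpt || !outputNamesOpt) throw std::runtime_error("Error obtaining tensor names");
for (const char* name : *inputNamesOpt)
   createUserBuffer(inputMap, applicationInputBuffers, snpeUserBackedInputBuffers, snpe, name);
for (const char* name : *outputNamesOpt)
   createUserBuffer(outputMap, applicationOutputBuffers, snpeUserBackedOutputBuffers, snpe, name);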

The following function then shows how to load input data from file(s) to user buffers. Note that the input values are simply loaded onto user-backed buffers, on top of which Qualcomm® Neural Processing SDK can create Qualcomm® Neural Processing SDK UserBuffers, as shown above.

void loadInputUserBuffer(std::unordered_map<std::string, std::vector<uint8_t>>& applicationBuffers,
                               std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                               const std::string& fileLine)
{
    // get input tensor names of the network that need to be populated
    const auto& inputNamesOpt = snpe->getInputTensorNames();
    if (!inputNamesOpt) throw std::runtime_error("Error obtaining input tensor names");
    const zdl::DlSystem::StringList& inputNames = *inputNamesOpt;
    assert(inputNames.size() > 0);
    // treat each line as a space-separated list of input files
    std::vector<std::string> filePaths;
    split(filePaths, fileLine, ' ');
    // Guard against lines that list fewer files than the network has inputs
    assert(filePaths.size() >= inputNames.size());
    if (inputNames.size()) std::cout << "Processing DNN Input: " << std::endl;
    for (size_t i = 0; i < inputNames.size(); i++) {
        const char* name = inputNames.at(i);
        std::string filePath(filePaths[i]);
        // print out which file is being processed
        std::cout << "\t" << i + 1 << ") " << filePath << std::endl;
        // load file content onto application storage buffer,
        // on top of which, Qualcomm (R) Neural Processing SDK has created a user buffer
        loadByteDataFile(filePath, applicationBuffers.at(name));
    }
}
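For context, here is a hedged sketch of the loop that could drive this function; inputListPath is a hypothetical path to the input list file, and the buffer maps are created as shown earlier.

// Process each line of the input list: load the inputs, then execute the network.
std::ifstream inputList(inputListPath);
std::string fileLine;
int inputListNum = 0;
while (std::getline(inputList, fileLine))
{
   if (fileLine.empty()) continue;
   loadInputUserBuffer(applicationInputBuffers, snpe, fileLine);
   executeNetwork(snpe, inputMap, outputMap, applicationOutputBuffers, OutputDir, inputListNum++);
}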

Using ITensors

The code excerpt below illustrates how to load input data from a file into an ITensor, assuming the network takes a single input.

std::unique_ptr<zdl::DlSystem::ITensor> loadInputTensor(std::unique_ptr<zdl::SNPE::SNPE>& snpe, std::string& fileLine)
{
    std::unique_ptr<zdl::DlSystem::ITensor> input;
    const auto &strList_opt = snpe->getInputTensorNames();
    if (!strList_opt) throw std::runtime_error("Error obtaining Input tensor names");
    const auto &strList = *strList_opt;
    // Make sure the network requires only a single input
    assert (strList.size() == 1);
    // If the network has a single input, each line represents the input file to be loaded for that input
    std::string filePath(fileLine);
    std::cout << "Processing DNN Input: " << filePath << "\n";
    std::vector<float> inputVec = loadFloatDataFile(filePath);
    /* Create an input tensor that is correctly sized to hold the input of the network. Dimensions that have no fixed size will be represented with a value of 0. */
    const auto &inputDims_opt = snpe->getInputDimensions(strList.at(0));
    const auto &inputShape = *inputDims_opt;
    /* Calculate the total number of elements that can be stored in the tensor so that we can check that the input contains the expected number of elements.
       With the input dimensions computed create a tensor to convey the input into the network. */
    input = zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(inputShape);
    if (input->getSize() != inputVec.size())
       throw std::runtime_error("Size of input data does not match the network input tensor size");
    /* Copy the loaded input file contents into the network's input tensor. SNPE's ITensor supports C++ STL functions like std::copy(). */
    std::copy(inputVec.begin(), inputVec.end(), input->begin());
    return input;
}

Execute the Network & Process Output

The following snippets of code use the native API to execute the network (in UserBuffer or ITensor mode) and show how to iterate through the newly populated output tensor.

Using User Buffers

void executeNetwork(std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                    zdl::DlSystem::UserBufferMap& inputMap,
                    zdl::DlSystem::UserBufferMap& outputMap,
                    std::unordered_map<std::string,std::vector<uint8_t>>& applicationOutputBuffers,
                    const std::string& outputDir,
                    int num)
{
    // Execute the network and store the outputs in user buffers specified in outputMap
    snpe->execute(inputMap, outputMap);
    // Get all output buffer names from the network
    const zdl::DlSystem::StringList& outputBufferNames = outputMap.getUserBufferNames();
    // Iterate through output buffers and print each output to a raw file
    std::for_each(outputBufferNames.begin(), outputBufferNames.end(), [&](const char* name)
    {
       std::ostringstream path;
       path << outputDir << "/Result_" << num << "/" << name << ".raw";
       SaveUserBuffer(path.str(), applicationOutputBuffers.at(name));
    });
}
// The following is a partial snippet of the function
void SaveUserBuffer(const std::string& path, const std::vector<uint8_t>& buffer) {
   ...
   std::ofstream os(path, std::ofstream::binary);
   if (!os)
   {
      std::cerr << "Failed to open output file for writing: " << path << "\n";
      std::exit(EXIT_FAILURE);
   }
   if (!os.write(reinterpret_cast<const char*>(buffer.data()), buffer.size()))
   {
      std::cerr << "Failed to write data to: " << path << "\n";
      std::exit(EXIT_FAILURE);
   }
}

Using ITensors

void executeNetwork(std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                    std::unique_ptr<zdl::DlSystem::ITensor>& input,
                    std::string OutputDir,
                    int num)
{
    //Execute the network and store the outputs that were specified when creating the network in a TensorMap
    static zdl::DlSystem::TensorMap outputTensorMap;
    snpe->execute(input.get(), outputTensorMap);
    zdl::DlSystem::StringList tensorNames = outputTensorMap.getTensorNames();
    //Iterate through the output Tensor map, and print each output layer name
    std::for_each( tensorNames.begin(), tensorNames.end(), [&](const char* name)
    {
        std::ostringstream path;
        path << OutputDir << "/"
        << "Result_" << num << "/"
        << name << ".raw";
        auto tensorPtr = outputTensorMap.getTensor(name);
        SaveITensor(path.str(), tensorPtr);
    });
}
// The following is a partial snippet of the function
void SaveITensor(const std::string& path, const zdl::DlSystem::ITensor* tensor)
{
   ...
   std::ofstream os(path, std::ofstream::binary);
   if (!os)
   {
      std::cerr << "Failed to open output file for writing: " << path << "\n";
      std::exit(EXIT_FAILURE);
   }
   for ( auto it = tensor->cbegin(); it != tensor->cend(); ++it )
   {
      float f = *it;
      if (!os.write(reinterpret_cast<char*>(&f), sizeof(float)))
      {
         std::cerr << "Failed to write data to: " << path << "\n";
         std::exit(EXIT_FAILURE);
      }
   }
}

Using IOBufferDataTypeMap

  • The IOBufferDataTypeMap is used to specify the intended data type for input/output of a network. The data type values include zdl::DlSystem::IOBufferDataType_t::FLOATING_POINT_32, zdl::DlSystem::IOBufferDataType_t::FIXED_POINT_8 and zdl::DlSystem::IOBufferDataType_t::FIXED_POINT_16.

  • If the output of a network is of type FIXED_POINT_8 and the user intends to access the output in FLOATING_POINT_32 format, the dequantization operation is performed on the ARM side. By specifying the data type as FLOATING_POINT_32 using the IOBufferDataTypeMap API, the dequantization operation is added directly to the graph.

The following snippet of code shows how to specify the data type for a buffer using the native API.

void setBufferDataType(zdl::DlSystem::IOBufferDataTypeMap& bufferDataTypeMap, std::string bufferName, zdl::DlSystem::IOBufferDataType_t dataType)
{
    bufferDataTypeMap.add(bufferName.c_str(), dataType);
}
zdl::DlSystem::IOBufferDataTypeMap bufferDataTypeMap;
setBufferDataType(bufferDataTypeMap, "output_1", zdl::DlSystem::IOBufferDataType_t::FLOATING_POINT_32);
zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
snpeBuilder.setBufferDataType(bufferDataTypeMap);
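For completeness, a hedged sketch of how setBuilderOptions() shown earlier could fold in the data type map (runtimeList and useUserSuppliedBuffers are the variables introduced in Set Network Builder Options):

// Apply the buffer data type map as part of the builder chain before building.
std::unique_ptr<zdl::SNPE::SNPE> snpe = snpeBuilder.setOutputLayers({})
   .setRuntimeProcessorOrder(runtimeList)
   .setUseUserSuppliedBuffers(useUserSuppliedBuffers)
   .setBufferDataType(bufferDataTypeMap)
   .build();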

Building the C++ Application

Building and Running on x86 Linux and Embedded Linux

Start by going to the snpe-sample base directory.

cd $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP

Note the different makefiles associated with the different Linux platforms. The $CXX variable needs to be set according to the target platform. The table below lists the supported targets with their corresponding $CXX settings and the makefiles to use.

Target                      Makefile                             Possible CXX value      Output Location
aarch64-oe-linux-gcc8.2     Makefile.aarch64-oe-linux-gcc8.2     aarch64-oe-linux-g++    aarch64-oe-linux-gcc8.2
aarch64-oe-linux-gcc9.3     Makefile.aarch64-oe-linux-gcc9.3     aarch64-oe-linux-g++    aarch64-oe-linux-gcc9.3
aarch64-oe-linux-gcc11.2    Makefile.aarch64-oe-linux-gcc11.2    aarch64-oe-linux-g++    aarch64-oe-linux-gcc11.2
aarch64-ubuntu-gcc9.4       Makefile.aarch64-ubuntu-gcc9.4       aarch64-linux-gnu-g++   aarch64-ubuntu-gcc9.4
x86_64-linux                Makefile.x86_64-linux-clang          g++                     x86_64-linux-clang

export CXX=<Name of c++ cross compiler>
make -f <Makefile for the target>

Note: Ensure that the path to the compiler binary is already set in $PATH.

Along with the sample executable, all other required libraries need to be pushed onto their respective targets. The $LD_LIBRARY_PATH may also need to be updated to point to the support libraries. You can run the executable with the -h argument to see its usage description.

snpe-sample -h

The description should look like the following:

DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C++ API.


REQUIRED ARGUMENTS:
-------------------
  -d  <FILE>   Path to the DL container containing the network.
  -i  <FILE>   Path to a file listing the inputs for the network.
  -o  <PATH>   Path to directory to store output results.

OPTIONAL ARGUMENTS:
-------------------
  -b  <TYPE>   Type of buffers to use [USERBUFFER_FLOAT, USERBUFFER_TF8, ITENSOR, USERBUFFER_TF16] (ITENSOR is default).
  -q  <BOOL>    Specifies to use static quantization parameters from the model instead of input specific quantization [true, false]. Used in conjunction with USERBUFFER_TF8.
  -r  <RUNTIME> The runtime to be used [gpu, dsp, aip, cpu] (cpu is default).
  -u  <VAL,VAL> Path to UDO package with registration library for UDOs.
                Optionally, user can provide multiple packages as a comma-separated list.
  -z  <NUMBER>  The maximum number that resizable dimensions can grow into.
                Used as a hint to create UserBuffers for models with dynamic sized outputs. Should be a positive integer and is not applicable when using ITensor.
  -c           Enable init caching to accelerate the initialization process of SNPE. Defaults to disable.
  -l  <VAL,VAL,VAL> Specifies the order of precedence for runtime e.g  cpu_float32, dsp_fixed8_tf etc. Valid values are:-
                    cpu_float32 (Snapdragon CPU)       = Data & Math: float 32bit
                    gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
                    dsp_fixed8_tf (Hexagon DSP)        = Data & Math: 8bit fixed point Tensorflow style format
                    gpu_float16 (Adreno GPU)           = Data: float 16bit Math: float 16bit
                    cpu (Snapdragon CPU)               = Same as cpu_float32
                    gpu (Adreno GPU)                   = Same as gpu_float32_16_hybrid
                    dsp (Hexagon DSP)                  = Same as dsp_fixed8_tf

Running snpe-sample assumes that the setup described in Running the Inception v3 Model has already been completed.

Run snpe-sample with the Inception v3 model:

cd $SNPE_ROOT/examples/Models/InceptionV3/data
$SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP/obj/local/x86_64-linux-clang/snpe-sample -b ITENSOR -d ../dlc/inception_v3.dlc -i target_raw_list.txt -o output

The results are stored in the output directory. To process the output, run the following script to generate the classification results.

python3 $SNPE_ROOT/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i target_raw_list.txt -o output/ -l imagenet_slim_labels.txt
Classification results
cropped/notice_sign.raw 0.167454 459 brass
cropped/plastic_cup.raw 0.990612 648 measuring cup
cropped/chairs.raw      0.382222 832 studio couch
cropped/trash_bin.raw   0.684572 413 ashcan

Building and Running on ARM Android

Prerequisite: You will need the Android NDK to build the Android C++ executable. The tutorial assumes that you can invoke ‘ndk-build’ from the shell.

To build snpe-sample against the clang-built Qualcomm® Neural Processing SDK binaries (i.e., aarch64-android), use the following commands:

cd $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP
ndk-build NDK_TOOLCHAIN_VERSION=clang APP_STL=c++_static NDK_PROJECT_PATH=. NDK_APPLICATION_MK=Application.mk APP_BUILD_SCRIPT=Android.mk

The ndk-build command will build arm64-v8a binaries of snpe-sample.

  • $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP/obj/local/arm64-v8a/snpe-sample

To run the Android C++ executable, push the appropriate Qualcomm® Neural Processing SDK libraries and the executable onto the Android target.

export SNPE_TARGET_ARCH=aarch64-android
export SNPE_TARGET_DSPARCH=hexagon-v73
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin"
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib"
adb shell "mkdir -p /data/local/tmp/snpeexample/dsp/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_ARCH/*.so /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push $SNPE_ROOT/lib/$SNPE_TARGET_DSPARCH/unsigned/*.so /data/local/tmp/snpeexample/dsp/lib
adb push $SNPE_ROOT/examples/SNPE/NativeCpp/SampleCode_CPP/obj/local/arm64-v8a/snpe-sample /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin

Run snpe-sample with the Inception v3 model on the target. This assumes that the setup steps in Run on Android Target have been completed, pushing all the sample data files and the Inception v3 model to the target.

adb shell
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
cd /data/local/tmp/inception_v3
snpe-sample -b ITENSOR -d inception_v3.dlc -i target_raw_list.txt -o output_sample
exit

Pull the target output into a host side output directory.

cd $SNPE_ROOT/examples/Models/InceptionV3
adb pull /data/local/tmp/inception_v3/output_sample output_sample

Again, we can run the script to interpret the classification results.

python3 $SNPE_ROOT/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/target_raw_list.txt -o output_sample/ -l data/imagenet_slim_labels.txt
Classification results
cropped/notice_sign.raw 0.167454 459 brass
cropped/plastic_cup.raw 0.990612 648 measuring cup
cropped/chairs.raw      0.382221 832 studio couch
cropped/trash_bin.raw   0.684573 413 ashcan

Building and Running on Linux (Yocto Based)

Prerequisite: This assumes that Tutorials Setup has been completed.

For devices running a Yocto-based Linux OS, the GCC compiler needs to be used to build the sample source code. To support Yocto Kirkstone based devices, the libraries are compiled with gcc11.2. Refer to the steps below to build the SNPE sample app:

export SNPE_ROOT=/path/to/extracted/snpe-sdk
cd ${SNPE_ROOT}/examples/SNPE/NativeCpp/SampleCode_CPP/
export AARCH64_LINUX_OE_GCC_112=/path/to/extracted/toolchain
make CXX="<installed_toolchain_path>/tmp/sysroots/x86_64/usr/bin/aarch64-qcom-linux/aarch64-qcom-linux-g++
--sysroot=<installed_toolchain_path>/tmp/sysroots/qcm6490" -f Makefile.aarch64-oe-linux-gcc11.2

After running make as shown above, you should see two new folders in the same directory:

  1. bin: contains snpe-sample binaries for each platform within respective directories.

  2. obj: contains all the object files that were used for building and linking the executable.

To delete all the artifacts that were generated in the above step, run:

cd ${SNPE_ROOT}/examples/SNPE/NativeCpp/SampleCode_CPP
make clean

To run the snpe-sample C++ executable, push the appropriate Qualcomm® Neural Processing SDK libraries and the executable (i.e., the aarch64-oe-linux-gcc11.2 build) onto the target, then run snpe-sample using the following command:

snpe-sample -b ITENSOR -d <input_dlc> -i <target_raw_list> -o output_sample