PSNPE C Tutorial

Prerequisites

Introduction

This tutorial demonstrates how to use the PSNPE C APIs to build a C sample application that executes neural network models with multiple runtimes on the target device. While this sample code does not do any error checking, it is strongly recommended that users check for errors when using the PSNPE APIs. In addition, since the sample code is based on the C API, all relevant handles need to be freed at the end.

Using synchronous mode as an example, a PSNPE-integrated application follows this pattern when using a neural network:

  1. Get Configuration of Available Runtimes

  2. Get Builder Configuration

  3. Build PSNPE Instance

  4. Load Network Inputs with User Buffer List

  5. Execute the Network & Process Output for Sync Mode

  6. C Application Example

PSNPE uses synchronous mode by default. To choose an asynchronous mode, refer to BuildConfig for Async Mode. In output-async mode, loading input data and executing PSNPE are similar to sync mode, but output data is obtained by defining an output callback function as described in Callback for OutputAsync Mode. In input/output-async mode, both loading input data and getting output data require callback functions; refer to Execution and Callback for InputOutputAsync Mode.

The sections below describe how to implement each step described above.

Get Configuration of Available Runtimes

The code excerpt below illustrates how to set the configuration for each available runtime with the given parameters. Multiple instances of the same runtime can be created by adding multiple runtime config handles to the runtime config list. Each instance, even of the same runtime, creates a worker thread to queue work for execution, improving throughput.

auto runtimeConfigListHandle = Snpe_RuntimeConfigList_Create();
for (size_t j = 0; j < numRequestedInstances; j++)
{
    auto runtimeConfigHandle = Snpe_RuntimeConfig_Create();
    Snpe_RuntimeConfig_SetRuntimeList(runtimeConfigHandle, RuntimesListVector[j]);
    Snpe_RuntimeConfig_SetPerformanceProfile(runtimeConfigHandle, PerfProfile[j]);
    Snpe_RuntimeConfigList_PushBack(runtimeConfigListHandle, runtimeConfigHandle);
    Snpe_RuntimeConfig_Delete(runtimeConfigHandle);
    numCreatedInstances++;
}

Get Builder Configuration

The code excerpt below illustrates how to set the configuration for the PSNPE builder with the given parameters, including the DLC container, runtime config list, output layers, transmission mode, etc.

auto containerHandle = Snpe_DlContainer_Open(ContainerPath.c_str());
auto bcHandle = Snpe_BuildConfig_Create();
Snpe_BuildConfig_SetContainer(bcHandle, containerHandle);
Snpe_BuildConfig_SetRuntimeConfigList(bcHandle, runtimeConfigListHandle);
Snpe_BuildConfig_SetOutputBufferNames(bcHandle, outputLayers);
Snpe_BuildConfig_SetInputOutputTransmissionMode(bcHandle, static_cast<Snpe_PSNPE_InputOutputTransmissionMode_t>(inputOutputTransmissionMode));
Snpe_BuildConfig_SetEncode(bcHandle, input_encode[0], input_encode[1]);
Snpe_BuildConfig_SetEnableInitCache(bcHandle, usingInitCache);
Snpe_BuildConfig_SetProfilingLevel(bcHandle, profilingLevel);
Snpe_BuildConfig_SetPlatformOptions(bcHandle, platformOptions.c_str());
Snpe_BuildConfig_SetOutputTensors(bcHandle, outputTensors);

Build PSNPE Instance

The following code demonstrates how to build the PSNPE instance that will be used to execute the network. The psnpeHandle is assumed to have been created beforehand with Snpe_PSNPE_Create.

buildStatus = (Snpe_PSNPE_Build(psnpeHandle, bcHandle) == SNPE_SUCCESS);

Load Network Inputs with User Buffer List

This input-loading method is used in synchronous mode and output-asynchronous mode. It is similar to the method used by the Qualcomm® Neural Processing SDK to create inputs and outputs from user-backed buffers.

std::vector<std::unordered_map <std::string, std::vector<uint8_t>>> outputBuffersVec(nums);
std::vector<std::unordered_map <std::string, std::vector<uint8_t>>> inputBuffersVec(nums);
std::vector<Snpe_IUserBuffer_Handle_t> snpeUserBackedInputBuffers;
std::vector<Snpe_IUserBuffer_Handle_t> snpeUserBackedOutputBuffers;
Snpe_UserBufferList_Handle_t inputMapList  = Snpe_UserBufferList_CreateSize(BufferNum);
Snpe_UserBufferList_Handle_t outputMapList = Snpe_UserBufferList_CreateSize(BufferNum);
if(inputOutputTransmissionMode != SNPE_PSNPE_INPUTOUTPUTTRANSMISSIONMODE_INPUTOUTPUTASYNC)
{
   for (size_t i = 0; i < inputs.size(); ++i) {
      for (size_t j = 0; j < Snpe_StringList_Size(inputTensorNamesList[0]); ++j) {
         const char* name = Snpe_StringList_At(inputTensorNamesList[0], j);
         uint8_t bufferBitWidth = bitWidthMap[bufferDataTypeMap[name]];
         uint8_t nativeBitWidth = usingNativeInputDataType ? bitWidthMap[nativeDataTypeMap[name]]: 32;
         std::string nativeDataType = usingNativeInputDataType ? nativeDataTypeMap[name] : "float32";
         if(bufferDataTypeMap[name] == "float16" || bufferDataTypeMap[name] == "float32"){
            if(!LoadInputBufferMapsFloatN(inputs[i][j], name, {psnpeHandle, true},
                                        Snpe_UserBufferList_At_Ref(inputMapList, i),
                                        snpeUserBackedInputBuffers, inputBuffersVec[i],numFilesCopied, batchSize, dynamicQuantization,
                                        bufferBitWidth,10, rpcMemAllocFnHandle, false, ionBufferMapHandle,
                                        usingNativeInputDataType, nativeDataType, nativeBitWidth))
            {
               return EXIT_FAILURE;
            }
         }
      }
      Snpe_StringList_Handle_t outputBufferNamesHandle = Snpe_PSNPE_GetOutputTensorNames(psnpeHandle);
      for (size_t j = 0; j < Snpe_StringList_Size(outputBufferNamesHandle); ++j) {
         const char* name = Snpe_StringList_At(outputBufferNamesHandle, j);
         if(bufferDataTypeMap.find(name) == bufferDataTypeMap.end()){
            std::cerr << "DataType not specified for buffer " << name << std::endl;
            return EXIT_FAILURE;
         }
         uint8_t bitWidth = bitWidthMap[bufferDataTypeMap[name]];
         if(bufferDataTypeMap[name] == "float16" || bufferDataTypeMap[name] == "float32"){
            PopulateOutputBufferMapsFloatN({psnpeHandle, true}, name,
                                          Snpe_UserBufferList_At_Ref(outputMapList, i),
                                          snpeUserBackedOutputBuffers, outputBuffersVec[i], bitWidth, 10,
                                          rpcMemAllocFnHandle, usingIonBuffer, ionBufferMapHandle);
         }
      }
   }
}

Execute the Network & Process Output for Sync Mode

The following code uses the native API to execute the network in synchronous mode. For the saveOutput function, refer to the PSNPE C++ Tutorial.

exeStatus = SNPE_SUCCESS == Snpe_PSNPE_Execute(psnpeHandle, inputMapList, outputMapList);
for (size_t i = 0; i < inputs.size(); i++) {
   saveOutput(Snpe_UserBufferList_At_Ref(outputMapList, i), outputBuffersVec[i], ionBufferMapReg, OutputDir, i * batchSize,  batchSize, false);
}

BuildConfig for Async Mode

To run output-async mode or input/output-async mode, the corresponding callback functions need to be set in the build config.

if (inputOutputTransmissionMode == SNPE_PSNPE_INPUTOUTPUTTRANSMISSIONMODE_OUTPUTASYNC) {
   Snpe_BuildConfig_SetOutputThreadNumbers(bcHandle, outputNum);
   Snpe_BuildConfig_SetOutputCallback(bcHandle, OCallback);
}
if (inputOutputTransmissionMode == SNPE_PSNPE_INPUTOUTPUTTRANSMISSIONMODE_INPUTOUTPUTASYNC) {
   Snpe_BuildConfig_SetInputThreadNumbers(bcHandle, inputNum);
   Snpe_BuildConfig_SetOutputThreadNumbers(bcHandle, outputNum);
   Snpe_BuildConfig_SetInputOutputCallback(bcHandle, IOCallback);
   Snpe_BuildConfig_SetInputOutputInputCallback(bcHandle, inputCallback);
}

Callback for OutputAsync Mode

Output asynchronous mode provides real-time output by calling a callback function.

void OCallback(Snpe_PSNPE_OutputAsyncCallbackParam_Handle_t oacpHandle) {
   if(!Snpe_PSNPE_OutputAsyncCallbackParam_GetExecuteStatus(oacpHandle)) {
      std::cerr << "execute failed, index: " << Snpe_PSNPE_OutputAsyncCallbackParam_GetDataIdx(oacpHandle) << std::endl;
   }
}

Execution and Callback for InputOutputAsync Mode

Asynchronous execution provides real-time output results, while synchronous mode provides the outputs only after execution has finished.

for (size_t i = 0; i < inputs.size(); ++i) {
   std::vector<std::string> filePaths;
   std::vector<std::queue<std::string>> temp = inputs[i];
   for (size_t j = 0; j < temp.size(); j++) {
      while (temp[j].size() != 0) {
         filePaths.push_back(temp[j].front());
         temp[j].pop();
      }
      numLines++;
      Snpe_StringList_Handle_t filePathsHandle = toStringList(filePaths);
      exeStatus = SNPE_SUCCESS == Snpe_PSNPE_ExecuteInputOutputAsync(psnpeHandle, filePathsHandle, i, usingTf8UserBuffer, usingTf8UserBuffer);
      Snpe_StringList_Delete(filePathsHandle);
   }
}
//In input/output asynchronous mode, loading input data through callback function with TF8 vector.
Snpe_ApplicationBufferMap_Handle_t inputCallback(Snpe_StringList_Handle_t inputs, Snpe_StringList_Handle_t inputNames) {
  Snpe_ApplicationBufferMap_Handle_t inputMap = Snpe_ApplicationBufferMap_Create();
  for (size_t j = 0; j < Snpe_StringList_Size(inputNames); j++) {
    std::vector<uint8_t> loadVector;
    ...  //load input data
    Snpe_ApplicationBufferMap_Add(inputMap, Snpe_StringList_At(inputNames, j), loadVector.data(), loadVector.size());
  }
  return inputMap;
}
// In input/output asynchronous mode, the index and data of output can be obtained through a callback function
void IOCallback(Snpe_PSNPE_InputOutputAsyncCallbackParam_Handle_t ioacpHandle)
{
   Snpe_StringList_Handle_t names = Snpe_PSNPE_InputOutputAsyncCallbackParam_GetUserBufferNames(ioacpHandle);
   std::vector<std::pair<const char*, Snpe_UserBufferData_t>> vec;
   const auto end = Snpe_StringList_End(names);
   for(auto it = Snpe_StringList_Begin(names); it != end; ++it){
      vec.emplace_back(*it, Snpe_PSNPE_InputOutputAsyncCallbackParam_GetUserBuffer(ioacpHandle, *it));
   }
   saveOutput(vec, OutputDir, Snpe_PSNPE_InputOutputAsyncCallbackParam_GetDataIdx(ioacpHandle));
}
// The below shows parts of the function.
void saveOutput(const std::vector<std::pair<const char*, Snpe_UserBufferData_t>>& applicationOutputBuffers, const std::string& outputDir, int num){
  std::for_each(applicationOutputBuffers.begin(),
                applicationOutputBuffers.end(),
                [&](std::pair<std::string, Snpe_UserBufferData_t> a) {
                  std::ostringstream path;
                  path << outputDir << "/"
                       << "Result_" << num << "/" << pal::FileOp::toLegalFilename(a.first) << ".raw";
                  std::string outputPath = path.str();
                  std::string::size_type pos = outputPath.find(":");
                  if (pos != std::string::npos) outputPath = outputPath.replace(pos, 1, "_");
                  SaveUserBuffer(outputPath, a.second.data, a.second.size);
                });
}

C Application Example

The C application integrated with PSNPE in this tutorial is called snpe-parallel-run. It is a command-line executable that executes a DLC model using the Qualcomm® Neural Processing SDK APIs. Its usage is the same as the snpe-net-run example from Running the Inception v3 Model when running on an Android target.

  1. Push model data to Android target.

  2. Select target architecture.

  3. Push binaries to target.

  4. Set up environment variables and run snpe-parallel-run. In the command below, the runtime arguments (--use_dsp --perf_profile burst --cpu_fallback false) appear twice so that two runtime instances are created to execute in parallel.

adb shell
export ADSP_LIBRARY_PATH="/data/local/tmp/snpeexample/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/aarch64-android/lib
export PATH=$PATH:/data/local/tmp/snpeexample/aarch64-android/bin/
cd /data/local/tmp/inception_v3
snpe-parallel-run --container inception_v3_quantized.dlc --input_list target_raw_list.txt --use_dsp --perf_profile burst --cpu_fallback false --use_dsp --perf_profile burst --cpu_fallback false --runtime_mode output_async
exit