PSNPE C++ Tutorial

Prerequisites

Introduction

This tutorial demonstrates how to use the PSNPE C++ APIs to build a C++ sample application that executes neural network models with multiple runtimes on the target device. The sample code does no error checking, but users are strongly advised to check for errors when using the PSNPE APIs.

Using sync mode as an example, a PSNPE-integrated application follows this pattern when using a neural network:

  1. Get Configuration of Available Runtimes

  2. Get Builder Configuration

  3. Build PSNPE Instance

  4. Load Network Inputs with User Buffer List

  5. Execute the Network & Process Output for Sync Mode

  6. C++ Application Example

zdl::PSNPE::RuntimeConfigList runtimeconfigs;
zdl::PSNPE::BuildConfig buildConfig;
buildStatus = psnpe->build(buildConfig);
exeStatus = psnpe->execute(inputMapList, outputMapList);

PSNPE uses sync mode by default. To use an async mode, refer to BuildConfig for Async Mode. In output async mode, loading input data and executing PSNPE work the same way as in sync mode, but output data is retrieved through an outputCallback function, as described in Callback for OutputAsync Mode. In input/output async mode, both loading input data and retrieving output data require callback functions; see Execution and Callback for InputOutputAsync Mode.

The sections below describe how to implement each step described above.

Get Configuration of Available Runtimes

The code excerpt below illustrates how to set the config for each available runtime with the given parameters.

zdl::PSNPE::RuntimeConfigList runtimeconfigs;
for (size_t j = 0; j < numRequestedInstances; j++)
{
    zdl::PSNPE::RuntimeConfig runtimeConfig;
    // Note: the availability result is discarded here; real code should check it.
    zdl::SNPE::SNPEFactory::isRuntimeAvailable(Runtimes[j]);
    runtimeConfig.runtime = Runtimes[j];
    runtimeConfig.enableCPUFallback = cpuFallBacks[j];
    runtimeConfig.perfProfile = PerfProfile[j];
    runtimeconfigs.push_back(runtimeConfig);
}
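
The excerpt above ignores the result of the availability query. A minimal sketch of acting on that result follows. The Runtime and RuntimeConfig types and the isRuntimeAvailable() function here are simplified stand-ins written for this illustration, not the SDK definitions; a real application would query zdl::SNPE::SNPEFactory::isRuntimeAvailable() instead.

```cpp
#include <cstddef>
#include <vector>

// Stand-in types for illustration only; the real ones come from the SDK headers.
enum class Runtime { CPU, GPU, DSP };

struct RuntimeConfig {
    Runtime runtime = Runtime::CPU;
    bool enableCPUFallback = false;
};

// Hypothetical availability probe: pretends the GPU is missing on this device.
inline bool isRuntimeAvailable(Runtime r) { return r != Runtime::GPU; }

// Builds the config list, skipping runtimes that are reported as unavailable.
std::vector<RuntimeConfig> makeRuntimeConfigs(const std::vector<Runtime>& requested,
                                              const std::vector<bool>& cpuFallbacks) {
    std::vector<RuntimeConfig> configs;
    for (std::size_t j = 0; j < requested.size(); ++j) {
        if (!isRuntimeAvailable(requested[j])) {
            continue;  // unavailable runtime: do not request an instance on it
        }
        RuntimeConfig cfg;
        cfg.runtime = requested[j];
        // A missing fallback entry is treated as "no fallback".
        cfg.enableCPUFallback = j < cpuFallbacks.size() && cpuFallbacks[j];
        configs.push_back(cfg);
    }
    return configs;
}
```

Whether to skip an unavailable runtime, fall back to another one, or abort is an application decision; skipping is only one option.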

Get Builder Configuration

The code excerpt below illustrates how to set the configuration for the PSNPE builder with the given parameters, including the DLC container, the runtime config list, the output layers, and the transmission mode.

zdl::PSNPE::BuildConfig buildConfig;
std::unique_ptr<zdl::DlContainer::IDlContainer> container = zdl::DlContainer::IDlContainer::open(ContainerPath);
buildConfig.container = container.get();
buildConfig.runtimeConfigList = runtimeconfigs;
buildConfig.outputBufferNames = outputLayers;
buildConfig.inputOutputTransmissionMode = inputOutputTransmissionMode;
buildConfig.enableInitCache = usingInitCache;
buildConfig.profilingLevel = profilingLevel;
buildConfig.platformOptions = platformOptions;
buildConfig.outputTensors = outputTensors;
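
In the excerpt above, buildConfig.container holds a raw pointer obtained from container.get(), so the owning unique_ptr has to stay alive for as long as the configuration is used. One possible way to make that relationship explicit is sketched below. Container, BuildConfig, OwnedBuildConfig, and makeBuildConfig are hypothetical stand-ins for this illustration and are not part of the SDK.

```cpp
#include <memory>
#include <string>

// Stand-ins for illustration only; they are not the SDK types.
struct Container { std::string path; };
struct BuildConfig { Container* container = nullptr; };

// Keeps the owning pointer next to the config that borrows from it, so the
// raw pointer in `config` stays valid for as long as this object lives.
struct OwnedBuildConfig {
    std::unique_ptr<Container> container;
    BuildConfig config;
};

OwnedBuildConfig makeBuildConfig(const std::string& containerPath) {
    OwnedBuildConfig owned;
    owned.container.reset(new Container{containerPath});
    owned.config.container = owned.container.get();
    // Moving the unique_ptr out of this function does not change the
    // address of the object it owns, so the borrowed pointer stays valid.
    return owned;
}
```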

Build PSNPE Instance

The following code builds the PSNPE instance that will be used to execute the network.

bool buildStatus = psnpe->build(buildConfig);
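
Since build() reports success as a bool, the returned status is worth checking before going further. The sketch below shows one way to do so, assuming only that the instance exposes a build() member returning bool. FakePsnpe and this BuildConfig are stand-ins for illustration, and buildOrThrow is a hypothetical helper, not an SDK function.

```cpp
#include <stdexcept>

// Stand-ins for illustration only; they are not the SDK types.
struct BuildConfig { const void* container = nullptr; };

struct FakePsnpe {
    // Pretends the build fails when no container was supplied.
    bool build(const BuildConfig& cfg) { return cfg.container != nullptr; }
};

// Calls build() and turns a false result into an exception, so a failed
// build cannot be silently followed by an execute() call.
template <typename Psnpe, typename Config>
void buildOrThrow(Psnpe& psnpe, const Config& config) {
    if (!psnpe.build(config)) {
        throw std::runtime_error("PSNPE build failed");
    }
}
```

Returning an error code instead of throwing would work equally well; the point is only that the status is inspected.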

Load Network Inputs with User Buffer List

This input loading method is used in synchronous mode and output asynchronous mode. It is similar to the method the Qualcomm® Neural Processing SDK uses to create inputs and outputs from user-backed buffers. For the functions createUserBuffer() and loadInputUserBuffer(), refer to C++ Tutorial - Build the Sample.

std::vector<std::unordered_map <std::string, std::vector<uint8_t>>> outputBuffersVec(nums);
std::vector<std::unordered_map <std::string, std::vector<uint8_t>>> inputBuffersVec(nums);
std::vector <std::unique_ptr<zdl::DlSystem::IUserBuffer>> snpeUserBackedInputBuffers, snpeUserBackedOutputBuffers;
zdl::PSNPE::UserBufferList inputMapList(nums), outputMapList(nums);
const zdl::DlSystem::StringList innames = psnpe->getInputTensorNames();
const zdl::DlSystem::StringList outnames = psnpe->getOutputTensorNames();
if(inputOutputTransmissionMode != zdl::PSNPE::InputOutputTransmissionMode::inputOutputAsync)
{
   for (size_t i = 0; i < inputs.size(); ++i)
   {
      for (const char* name : innames)
      {
         createUserBuffer(inputMapList[i],
                          inputBuffersVec[i],
                          snpeUserBackedInputBuffers,
                          psnpe,
                          name,
                          usingTf8UserBuffer);
      }
      for (const char* name : outnames)
      {
         createUserBuffer(outputMapList[i],
                          outputBuffersVec[i],
                          snpeUserBackedOutputBuffers,
                          psnpe,
                          name,
                          usingTf8UserBuffer);
      }
      loadInputUserBuffer(inputBuffersVec[i], psnpe, inputs[i], inputMapList[i], usingTf8UserBuffer);
   }
}
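
The application-side storage in the excerpt above is a vector with one name-to-bytes map per input. The sketch below illustrates how such storage could be sized, assuming one byte per element for 8-bit quantized data and sizeof(float) bytes per element otherwise. TensorInfo and allocateBuffers are hypothetical names for this illustration; they do not reproduce createUserBuffer(), which also registers the buffers with the SDK.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using BufferMap = std::unordered_map<std::string, std::vector<uint8_t>>;

// Hypothetical description of one tensor: its name and number of elements.
struct TensorInfo {
    std::string name;
    std::size_t elementCount;
};

// Allocates one zero-filled byte buffer per tensor for each of `nums` inputs.
// Assumption: 1 byte per element for TF8 data, sizeof(float) otherwise.
std::vector<BufferMap> allocateBuffers(std::size_t nums,
                                       const std::vector<TensorInfo>& tensors,
                                       bool usingTf8UserBuffer) {
    const std::size_t elementSize =
        usingTf8UserBuffer ? sizeof(uint8_t) : sizeof(float);
    std::vector<BufferMap> buffers(nums);
    for (BufferMap& perInput : buffers) {
        for (const TensorInfo& t : tensors) {
            perInput[t.name].assign(t.elementCount * elementSize, 0);
        }
    }
    return buffers;
}
```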

Execute the Network & Process Output for Sync Mode

The following code uses the native API to execute the network in synchronous mode.

bool exeStatus = psnpe->execute(inputMapList, outputMapList);
// Called once per input index i after execution.
saveOutput(outputMapList[i], outputBuffersVec[i], OutputDir, i * batchSize, batchSize, true);
// Excerpt of the saveOutput function:
void saveOutput (zdl::DlSystem::UserBufferMap& outputMap,
                 std::unordered_map<std::string,std::vector<uint8_t>>& applicationOutputBuffers,
                 const std::string& outputDir,
                 int num,
                 size_t batchSize,
                 bool isTf8Buffer)
{
   const zdl::DlSystem::StringList& outputBufferNames = outputMap.getUserBufferNames();
   for(auto & name : outputBufferNames )
   {
      ... //get output data
   }
}
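
When the output buffers hold 8-bit quantized data (the isTf8Buffer case), an application typically converts the bytes back to floating point before using them. The sketch below assumes a plain affine encoding of the form real = (q - zeroPoint) * stepSize. The function name and both parameters are hypothetical; in a real application the encoding values would have to come from the SDK's buffer encoding, and the exact scheme should be checked against the SDK documentation.

```cpp
#include <cstdint>
#include <vector>

// Converts 8-bit quantized values to float under an assumed affine encoding:
//   real = (q - zeroPoint) * stepSize
std::vector<float> dequantize(const std::vector<uint8_t>& quantized,
                              uint8_t zeroPoint,
                              float stepSize) {
    std::vector<float> out;
    out.reserve(quantized.size());
    for (uint8_t q : quantized) {
        // Widen to int first so the subtraction can go negative.
        const int centered = static_cast<int>(q) - static_cast<int>(zeroPoint);
        out.push_back(static_cast<float>(centered) * stepSize);
    }
    return out;
}
```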

BuildConfig for Async Mode

To run in outputAsync or inputOutputAsync mode, set the callback functions in buildConfig.

if (inputOutputTransmissionMode == zdl::PSNPE::InputOutputTransmissionMode::outputAsync) {
   buildConfig.outputThreadNumbers = outputNum;
   buildConfig.outputCallback = OCallback;
}
if (inputOutputTransmissionMode == zdl::PSNPE::InputOutputTransmissionMode::inputOutputAsync) {
   buildConfig.inputThreadNumbers = inputNum;
   buildConfig.outputThreadNumbers = outputNum;
   buildConfig.inputOutputCallback = IOCallback;
   buildConfig.inputOutputInputCallback = inputCallback;
}
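
Because each async mode depends on different callbacks, a quick check before building can catch a forgotten assignment. The sketch below mirrors the field names used above in a simplified stand-in struct. The callback signatures (taking only an index) and the hasRequiredCallbacks helper are assumptions made for this illustration and do not match the SDK's callback parameter types.

```cpp
#include <cstddef>
#include <functional>

// Stand-ins for illustration only; they are not the SDK types.
enum class TransmissionMode { sync, outputAsync, inputOutputAsync };

struct AsyncConfig {
    TransmissionMode mode = TransmissionMode::sync;
    std::function<void(std::size_t)> outputCallback;            // outputAsync
    std::function<void(std::size_t)> inputOutputCallback;       // inputOutputAsync
    std::function<void(std::size_t)> inputOutputInputCallback;  // inputOutputAsync
};

// Returns true when every callback used by the selected mode has been set.
bool hasRequiredCallbacks(const AsyncConfig& cfg) {
    switch (cfg.mode) {
        case TransmissionMode::sync:
            return true;  // sync mode uses no callbacks
        case TransmissionMode::outputAsync:
            return static_cast<bool>(cfg.outputCallback);
        case TransmissionMode::inputOutputAsync:
            return static_cast<bool>(cfg.inputOutputCallback) &&
                   static_cast<bool>(cfg.inputOutputInputCallback);
    }
    return false;
}
```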

Callback for OutputAsync Mode

Output asynchronous mode provides real-time output by invoking a callback function.

void OCallback(zdl::PSNPE::OutputAsyncCallbackParam p) {
   if (!p.executeStatus) {
      std::cerr << "execute failed, index: " << p.dataIndex << std::endl;
   }
}
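
Since buildConfig.outputThreadNumbers allows more than one output thread, it seems prudent to assume the callback can run on several threads at once and to guard any shared state it touches. The sketch below collects the indices of failed executions behind a mutex. OutputParam is a stand-in carrying only the two fields used in the excerpt above, and FailureLog is a hypothetical helper class.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Stand-in with only the two fields used in the callback excerpt.
struct OutputParam {
    std::size_t dataIndex;
    bool executeStatus;
};

// Collects the indices of failed executions; safe to call from several threads.
class FailureLog {
public:
    void record(const OutputParam& p) {
        if (p.executeStatus) {
            return;  // successful executions are not logged
        }
        std::lock_guard<std::mutex> lock(mutex_);
        failed_.push_back(p.dataIndex);
    }

    // Returns a copy so the caller can inspect it without holding the lock.
    std::vector<std::size_t> failedIndices() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return failed_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::size_t> failed_;
};
```

A callback such as OCallback could forward its parameter's index and status to a shared FailureLog instead of only printing to std::cerr.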

Execution and Callback for InputOutputAsync Mode

Asynchronous execution can deliver output results in real time, whereas synchronous mode provides the outputs only after execution has finished.

bool exeStatus = psnpe->executeInputOutputAsync(zdl::PSNPE::ApplicationBufferMap(inputMap),i,usingTf8UserBuffer);
// In input/output asynchronous mode, input data is loaded through a callback function as a TF8 vector.
std::shared_ptr<zdl::PSNPE::ApplicationBufferMap> inputCallback(
        std::vector<std::string> inputs, const zdl::DlSystem::StringList& inputNames) {
   std::shared_ptr<zdl::PSNPE::ApplicationBufferMap> inputMap(new zdl::PSNPE::ApplicationBufferMap);
   for (std::string fileLine : inputs) {
      for (size_t j = 0; j < inputNames.size(); j++) {
         ... //load input data
         inputMap->add(inputNames.at(j), loadVector);
      }
   }
   return inputMap;
}
// In input/output asynchronous mode, the index and data of output can be obtained through a callback function
void IOCallback(zdl::PSNPE::InputOutputAsyncCallbackParam p)
{
   saveOutput(p.outputMap.getUserBuffer(), OutputDir, p.dataIndex);
}
// Excerpt of the saveOutput function:
void saveOutput(const std::unordered_map<std::string, std::vector<uint8_t>>& applicationOutputBuffers, const std::string& outputDir, int num)
{
   std::for_each(applicationOutputBuffers.begin(), applicationOutputBuffers.end(), [&](std::pair<std::string, std::vector<uint8_t>> a) {
       std::ostringstream path;
       path << outputDir << "/" << "Result_" << num << "/" <<a.first.data()<< ".raw";
       std::string outputPath = path.str();
       std::string::size_type pos = outputPath.find(":");
       if(pos != std::string::npos) outputPath = outputPath.replace(pos, 1, "_");
       SaveUserBuffer(outputPath,a.second.data(),a.second.size());
     });
}
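
The path handling in saveOutput can be isolated into a small function that is easy to test. Note that the excerpt above replaces only the first ':' found anywhere in the full path, while the sketch below replaces every ':' in the buffer name and leaves the directory untouched; that is a deliberate variation, not the original behavior. makeOutputPath is a hypothetical helper name.

```cpp
#include <algorithm>
#include <sstream>
#include <string>

// Builds "<outputDir>/Result_<num>/<bufferName>.raw", replacing every ':'
// in the buffer name with '_' so the name is usable as a file name.
std::string makeOutputPath(const std::string& outputDir,
                           int num,
                           const std::string& bufferName) {
    std::string safeName = bufferName;
    std::replace(safeName.begin(), safeName.end(), ':', '_');

    std::ostringstream path;
    path << outputDir << "/Result_" << num << "/" << safeName << ".raw";
    return path.str();
}
```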

C++ Application Example

The C++ application integrated with PSNPE in this tutorial is called snpe-parallel-run. It is a command line executable that executes a DLC model using Qualcomm® Neural Processing SDK APIs. Its usage is the same as that of the snpe-net-run example from Running the Inception v3 Model when running on an Android target.

  1. Push model data to Android target.

  2. Select target architecture.

  3. Push binaries to target.

  4. Set up environment variables.

adb shell
export ADSP_LIBRARY_PATH="/data/local/tmp/snpeexample/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/aarch64-android/lib
export PATH=$PATH:/data/local/tmp/snpeexample/aarch64-android/bin/
cd /data/local/tmp/inception_v3
snpe-parallel-run --container inception_v3_quantized.dlc --input_list target_raw_list.txt --use_dsp --perf_profile burst --cpu_fallback false --use_dsp --perf_profile burst --cpu_fallback false --runtime_mode output_async
exit