PSNPE C Tutorial
Prerequisites
The Qualcomm® Neural Processing SDK has been set up following the Qualcomm® Neural Processing SDK setup.
The Tutorials Setup has been completed.
The APIs used on this page can be found in the C Tutorial - Build the Sample chapter.
Introduction
This tutorial demonstrates how to use the PSNPE C APIs to build a C sample application that executes neural network models with multiple runtimes on the target device. While this sample code does not do any error checking, it is strongly recommended that users check for errors when using the PSNPE APIs. In addition, since the sample code is based on the C API, all relevant handles need to be freed at the end.
Using sync mode as an example, a PSNPE-integrated application will follow this pattern while using a neural network:
auto runtimeConfigListHandle = Snpe_RuntimeConfigList_Create();
auto bcHandle = Snpe_BuildConfig_Create();
buildStatus = (Snpe_PSNPE_Build(psnpeHandle, bcHandle) == SNPE_SUCCESS);
exeStatus = SNPE_SUCCESS == Snpe_PSNPE_Execute(psnpeHandle, inputMapList, outputMapList);
PSNPE uses sync mode by default. To choose async mode, refer to BuildConfig for Async Mode. In output async mode, loading input data and executing PSNPE are similar to sync mode, but output data is retrieved through the outputCallback function defined in Callback for OutputAsync Mode. In input/output async mode, both loading input data and getting output data require callback functions; refer to Execution and Callback for InputOutputAsync Mode.
The sections below describe how to implement each step described above.
Get Configuration of Available Runtimes
The code excerpt below illustrates how to set the config for each available runtime with the given parameters. Creation of multiple instances for the same runtime can be done by adding multiple runtime config handles to the runtime config list. Multiple instances, even of the same runtime, would create multiple worker threads to queue work for execution, improving throughput.
auto runtimeConfigListHandle = Snpe_RuntimeConfigList_Create();
for (size_t j = 0; j < numRequestedInstances; j++)
{
    auto runtimeConfigHandle = Snpe_RuntimeConfig_Create();
    Snpe_RuntimeConfig_SetRuntimeList(runtimeConfigHandle, RuntimesListVector[j]);
    Snpe_RuntimeConfig_SetPerformanceProfile(runtimeConfigHandle, PerfProfile[j]);
    Snpe_RuntimeConfigList_PushBack(runtimeConfigListHandle, runtimeConfigHandle);
    Snpe_RuntimeConfig_Delete(runtimeConfigHandle);
    numCreatedInstances++;
}
Get Builder Configuration
The code excerpt below illustrates how to set the configuration for the PSNPE builder with the given parameters, including the DLC, runtimeConfigList, output layers, transmission mode, etc.
auto containerHandle = Snpe_DlContainer_Open(ContainerPath.c_str());
auto bcHandle = Snpe_BuildConfig_Create();
Snpe_BuildConfig_SetContainer(bcHandle, containerHandle);
Snpe_BuildConfig_SetRuntimeConfigList(bcHandle, runtimeConfigListHandle);
Snpe_BuildConfig_SetOutputBufferNames(bcHandle, outputLayers);
Snpe_BuildConfig_SetInputOutputTransmissionMode(bcHandle, static_cast<Snpe_PSNPE_InputOutputTransmissionMode_t>(inputOutputTransmissionMode));
Snpe_BuildConfig_SetEncode(bcHandle, input_encode[0], input_encode[1]);
Snpe_BuildConfig_SetEnableInitCache(bcHandle, usingInitCache);
Snpe_BuildConfig_SetProfilingLevel(bcHandle, profilingLevel);
Snpe_BuildConfig_SetPlatformOptions(bcHandle, platformOptions.c_str());
Snpe_BuildConfig_SetOutputTensors(bcHandle, outputTensors);
Build PSNPE Instance
The following code demonstrates how to instantiate a PSNPE Builder object which will be used to execute the network.
buildStatus = (Snpe_PSNPE_Build(psnpeHandle, bcHandle) == SNPE_SUCCESS);
Load Network Inputs with User Buffer List
This input loading method is used in synchronous mode and output asynchronous mode, and is similar to the method used by the Qualcomm® Neural Processing SDK to create inputs and outputs from user-backed buffers.
std::vector<std::unordered_map <std::string, std::vector<uint8_t>>> outputBuffersVec(nums);
std::vector<std::unordered_map <std::string, std::vector<uint8_t>>> inputBuffersVec(nums);
std::vector<Snpe_IUserBuffer_Handle_t> snpeUserBackedInputBuffers;
std::vector<Snpe_IUserBuffer_Handle_t> snpeUserBackedOutputBuffers;
Snpe_UserBufferList_Handle_t inputMapList = Snpe_UserBufferList_CreateSize(BufferNum);
Snpe_UserBufferList_Handle_t outputMapList = Snpe_UserBufferList_CreateSize(BufferNum);
if (inputOutputTransmissionMode != SNPE_PSNPE_INPUTOUTPUTTRANSMISSIONMODE_INPUTOUTPUTASYNC)
{
    for (size_t i = 0; i < inputs.size(); ++i) {
        for (size_t j = 0; j < Snpe_StringList_Size(inputTensorNamesList[0]); ++j) {
            const char* name = Snpe_StringList_At(inputTensorNamesList[0], j);
            uint8_t bufferBitWidth = bitWidthMap[bufferDataTypeMap[name]];
            uint8_t nativeBitWidth = usingNativeInputDataType ? bitWidthMap[nativeDataTypeMap[name]] : 32;
            std::string nativeDataType = usingNativeInputDataType ? nativeDataTypeMap[name] : "float32";
            if (bufferDataTypeMap[name] == "float16" || bufferDataTypeMap[name] == "float32") {
                if (!LoadInputBufferMapsFloatN(inputs[i][j], name, {psnpeHandle, true},
                        Snpe_UserBufferList_At_Ref(inputMapList, i),
                        snpeUserBackedInputBuffers, inputBuffersVec[i], numFilesCopied, batchSize, dynamicQuantization,
                        bufferBitWidth, 10, rpcMemAllocFnHandle, false, ionBufferMapHandle,
                        usingNativeInputDataType, nativeDataType, nativeBitWidth))
                {
                    return EXIT_FAILURE;
                }
            }
        }
        Snpe_StringList_Handle_t outputBufferNamesHandle = Snpe_PSNPE_GetOutputTensorNames(psnpeHandle);
        for (size_t j = 0; j < Snpe_StringList_Size(outputBufferNamesHandle); ++j) {
            const char* name = Snpe_StringList_At(outputBufferNamesHandle, j);
            if (bufferDataTypeMap.find(name) == bufferDataTypeMap.end()) {
                std::cerr << "DataType not specified for buffer " << name << std::endl;
            }
            uint8_t bitWidth = bitWidthMap[bufferDataTypeMap[name]];
            if (bufferDataTypeMap[name] == "float16" || bufferDataTypeMap[name] == "float32") {
                PopulateOutputBufferMapsFloatN({psnpeHandle, true}, name,
                    Snpe_UserBufferList_At_Ref(outputMapList, i),
                    snpeUserBackedOutputBuffers, outputBuffersVec[i], bitWidth, 10,
                    rpcMemAllocFnHandle, usingIonBuffer, ionBufferMapHandle);
            }
        }
    }
}
Execute the Network & Process Output for Sync Mode
The following code uses the native API to execute the network in synchronous mode. For the saveOutput function, refer to the PSNPE C++ Tutorial.
exeStatus = SNPE_SUCCESS == Snpe_PSNPE_Execute(psnpeHandle, inputMapList, outputMapList);
for (size_t i = 0; i < inputs.size(); i++) {
    saveOutput(Snpe_UserBufferList_At_Ref(outputMapList, i), outputBuffersVec[i], ionBufferMapReg, OutputDir, i * batchSize, batchSize, false);
}
BuildConfig for Async Mode
If you want to run outputAsync mode or inputOutputAsync mode, you need to set the callback functions in the buildConfig.
if (inputOutputTransmissionMode == SNPE_PSNPE_INPUTOUTPUTTRANSMISSIONMODE_OUTPUTASYNC) {
    Snpe_BuildConfig_SetOutputThreadNumbers(bcHandle, outputNum);
    Snpe_BuildConfig_SetOutputCallback(bcHandle, OCallback);
}
if (inputOutputTransmissionMode == SNPE_PSNPE_INPUTOUTPUTTRANSMISSIONMODE_INPUTOUTPUTASYNC) {
    Snpe_BuildConfig_SetInputThreadNumbers(bcHandle, inputNum);
    Snpe_BuildConfig_SetOutputThreadNumbers(bcHandle, outputNum);
    Snpe_BuildConfig_SetInputOutputCallback(bcHandle, IOCallback);
    Snpe_BuildConfig_SetInputOutputInputCallback(bcHandle, inputCallback);
}
Callback for OutputAsync Mode
Output asynchronous mode provides real-time output by calling a callback function.
void OCallback(Snpe_PSNPE_OutputAsyncCallbackParam_Handle_t oacpHandle) {
    if (!Snpe_PSNPE_OutputAsyncCallbackParam_GetExecuteStatus(oacpHandle)) {
        std::cerr << "execute failed, index: " << Snpe_PSNPE_OutputAsyncCallbackParam_GetDataIdx(oacpHandle) << std::endl;
    }
}
Execution and Callback for InputOutputAsync Mode
Asynchronous execution can provide real-time output results, while synchronous mode provides the outputs only after execution finishes.
for (size_t i = 0; i < inputs.size(); ++i) {
    std::vector<std::string> filePaths;
    std::vector<std::queue<std::string>> temp = inputs[i];
    for (size_t j = 0; j < temp.size(); j++)
    {
        while (temp[j].size() != 0) {
            filePaths.push_back(temp[j].front());
            temp[j].pop();
        }
        numLines++;
        Snpe_StringList_Handle_t filePathsHandle = toStringList(filePaths);
        exeStatus = SNPE_SUCCESS == Snpe_PSNPE_ExecuteInputOutputAsync(psnpeHandle, filePathsHandle, i, usingTf8UserBuffer, usingTf8UserBuffer);
    }
}
// In input/output asynchronous mode, input data is loaded through a callback function with a TF8 vector.
Snpe_ApplicationBufferMap_Handle_t inputCallback(Snpe_StringList_Handle_t inputs, Snpe_StringList_Handle_t inputNames) {
    Snpe_ApplicationBufferMap_Handle_t inputMap = Snpe_ApplicationBufferMap_Create();
    for (size_t j = 0; j < Snpe_StringList_Size(inputNames); j++) {
        std::vector<uint8_t> loadVector;
        ... // load input data
        Snpe_ApplicationBufferMap_Add(inputMap, Snpe_StringList_At(inputNames, j), loadVector.data(), loadVector.size());
    }
    return inputMap;
}
// In input/output asynchronous mode, the index and data of the output can be obtained through a callback function.
void IOCallback(Snpe_PSNPE_InputOutputAsyncCallbackParam_Handle_t ioacpHandle)
{
    Snpe_StringList_Handle_t names = Snpe_PSNPE_InputOutputAsyncCallbackParam_GetUserBufferNames(ioacpHandle);
    std::vector<std::pair<const char*, Snpe_UserBufferData_t>> vec;
    const auto end = Snpe_StringList_End(names);
    for (auto it = Snpe_StringList_Begin(names); it != end; ++it) {
        vec.emplace_back(*it, Snpe_PSNPE_InputOutputAsyncCallbackParam_GetUserBuffer(ioacpHandle, *it));
    }
    saveOutput(vec, OutputDir, Snpe_PSNPE_InputOutputAsyncCallbackParam_GetDataIdx(ioacpHandle));
}
// The code below shows parts of the function.
void saveOutput(const std::vector<std::pair<const char*, Snpe_UserBufferData_t>>& applicationOutputBuffers, const std::string& outputDir, int num) {
    std::for_each(applicationOutputBuffers.begin(),
                  applicationOutputBuffers.end(),
                  [&](const std::pair<std::string, Snpe_UserBufferData_t>& a) {
                      std::ostringstream path;
                      path << outputDir << "/"
                           << "Result_" << num << "/" << pal::FileOp::toLegalFilename(a.first) << ".raw";
                      std::string outputPath = path.str();
                      std::string::size_type pos = outputPath.find(":");
                      if (pos != std::string::npos) outputPath = outputPath.replace(pos, 1, "_");
                      SaveUserBuffer(outputPath, a.second.data, a.second.size);
                  });
}
C Application Example
The C application integrated with PSNPE in this tutorial is called snpe-parallel-run. It is a command line executable that executes a DLC model using Qualcomm® Neural Processing SDK APIs. Its usage is the same as the snpe-net-run example from Running the Inception v3 Model while running on an Android target.
Push model data to Android target.
Select target architecture.
Push binaries to target.
Set up environment variables.
adb shell
export ADSP_LIBRARY_PATH="/data/local/tmp/snpeexample/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/aarch64-android/lib
export PATH=$PATH:/data/local/tmp/snpeexample/aarch64-android/bin/
cd /data/local/tmp/inception_v3
snpe-parallel-run --container inception_v3_quantized.dlc --input_list target_raw_list.txt --use_dsp --perf_profile burst --cpu_fallback false --runtime_mode output_async
exit