Overview of UDO
Introduction
Qualcomm® Neural Processing SDK lets users plug in custom neural network operations that the runtime engine does not inherently support, in the form of User-Defined Operations (hereafter referred to as UDO). These could be operations defined in popular training frameworks such as TensorFlow, or custom operations that are built as framework extensions but are not available in the Qualcomm® Neural Processing SDK. They can be natively executed on any of the supported hardware accelerators for which they are implemented. Qualcomm® Neural Processing SDK provides the infrastructure to execute these operations seamlessly, with little to no overhead compared to executing internally supported operations.
Anatomy of a UDO package
The registration library consists of methods that specify all user-defined operations and the hardware cores they are designed for. It also consists of methods that allow operations to be validated for sanity at the time of network creation. The registration library is loaded and executed on the ARM CPU.
The hardware-specific implementation libraries expose several other methods that implement operation instance creation, execution, profiling and destruction. These are implemented with the programming constructs supported by the corresponding software platforms, such as OpenCL for GPU and Hexagon-NN SDK for DSP. While core-specific implementation files may differ entirely in source, they are all required to interface with Qualcomm® Neural Processing SDK using a set of C APIs defined in $SNPE_ROOT/include/SNPE/SnpeUdo. The complete details on these APIs can be obtained from Qualcomm® Neural Processing SDK API.
UDO workflow
Qualcomm® Neural Processing SDK recommends the following workflow for developing a UDO and integrating it into the runtime:
The first step in the workflow is to identify the operations in the model that need to be expressed as user-defined operations and to describe their attributes in a configuration file. The format and contents of this file are described in Defining a UDO.
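As an illustration of this first step, the sketch below assembles a UDO description as a Python dictionary and serializes it to JSON. The key names (`UdoPackage_0`, `Operators`, `core_types` and so on), the operation name `MyCustomOp`, the package name, and the data type strings are all assumptions made for illustration; the authoritative schema is the one described in Defining a UDO.

```python
import json

# Hypothetical description of one user-defined operation.
# Every key name and value here is an illustrative assumption;
# consult "Defining a UDO" for the real schema.
udo_config = {
    "UdoPackage_0": {
        "Operators": [
            {
                "type": "MyCustomOp",
                "inputs": [{"name": "input", "data_type": "FLOAT_32"}],
                "outputs": [{"name": "output", "data_type": "FLOAT_32"}],
                "scalar_params": [{"name": "alpha", "data_type": "FLOAT_32"}],
                "core_types": ["CPU", "GPU", "DSP"],
            }
        ],
        "UDO_PACKAGE_NAME": "MyCustomUdoPackage",
    }
}


def cores_for(config, op_type):
    """Return the hardware cores declared for the given operation type."""
    for package in config.values():
        for op in package["Operators"]:
            if op["type"] == op_type:
                return op["core_types"]
    return []


# Serialize the description; this text is what would be saved as the config file.
config_text = json.dumps(udo_config, indent=4)
print(config_text)
```

Keeping the description in one place is useful because the same file is consumed later by both the package generator and the model conversion tools.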
The next set of steps produces the components of a UDO package by creating source files for the UDO kernels and compiling them against the appropriate toolchains to generate dynamic libraries specific to hardware cores such as the GPU and DSP. Qualcomm® Neural Processing SDK provides a tool called snpe-udo-package-generator that assists users in creating common skeleton code for interfacing with the Qualcomm® Neural Processing SDK UDO APIs and leaves placeholders for users to fill in the kernel implementation. It also generates makefiles for common targets such as x86 and Android, and for the runtimes per target specified in the config file. For more details on package generation, refer to Creating a UDO Package. For details on compiling the UDO package for a specific runtime, refer to Compiling a UDO package.
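A minimal sketch of how the package generator might be invoked from a script follows. The `-p` (config path) and `-o` (output directory) flags and both file names are assumptions, not confirmed by this page; check the tool's help output and Creating a UDO Package before relying on them. The sketch only builds and prints the command line and does not run the tool.

```python
import shlex


def package_generator_command(config_path, output_dir):
    """Build the argument list for a package-generator invocation.

    The -p and -o flags are assumptions; confirm them against the
    tool's own help output.
    """
    return ["snpe-udo-package-generator", "-p", config_path, "-o", output_dir]


# Hypothetical config file and output directory names.
cmd = package_generator_command("my_udo_config.json", "udo_packages")
print(shlex.join(cmd))
```

To execute the command rather than print it, the list can be passed to `subprocess.run(cmd, check=True)` in an environment where the SDK tools are on the PATH.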
The config file created in the first step must also be passed to the Qualcomm® Neural Processing SDK model conversion tools along with the trained model, so that the user-defined operations are interpreted using the definitions in the file. The resulting DLC files can then be inspected using tools like snpe-dlc-info to probe the attributes of the UDOs in the model. For details on creating (and optionally quantizing) DLCs with UDOs, refer to Preparing a model with UDO. Optionally, models with UDOs can also be quantized using the Qualcomm® Neural Processing SDK quantization tools for use with fixed-point runtimes such as DSP. The quantizer tool estimates the quantization ranges for activations from all layers in the network, including UDOs. Since the tool runs offline on an x86 host machine, a CPU implementation of the UDO is required in order to perform inference through the entire network. This is also illustrated with dotted lines in the workflow diagram. Refer to Quantizing a DLC with UDO for details on the quantization process.
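The CPU requirement for offline quantization can be checked before running the quantizer. The sketch below is a hypothetical pre-flight check, not an SDK feature: it assumes each operation is described by a dictionary with `type` and `core_types` keys (names chosen for illustration) and reports the operations that declare no CPU implementation.

```python
def ops_missing_cpu(operators):
    """Return the names of operations that declare no CPU implementation.

    `operators` is a hypothetical list of dicts with "type" and
    "core_types" keys; it is not an SDK data structure.
    """
    return [op["type"] for op in operators if "CPU" not in op["core_types"]]


# Hypothetical operations: one with a CPU implementation, one DSP-only.
operators = [
    {"type": "MyCustomOp", "core_types": ["CPU", "DSP"]},
    {"type": "DspOnlyOp", "core_types": ["DSP"]},
]

blocking = ops_missing_cpu(operators)
if blocking:
    # The quantizer runs on the x86 host, so these ops would block it.
    print("No CPU implementation for:", ", ".join(blocking))
```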
The final step in this workflow is to execute network models with UDOs. Qualcomm® Neural Processing SDK applications use the UDO package to register UDO implementations within the process that runs inference on select network models. Note that these UDOs can be exercised by multiple instances of Qualcomm® Neural Processing SDK simultaneously without race conditions, which increases the overall throughput for network inference. For more details on the UDO package registration process, refer to Running a model with UDO.
If the DSP implementation library of the UDO is not signed for execution on a signed process domain (the default for a Qualcomm® Neural Processing SDK application), the application must request the use of an unsigned process domain. Unsigned process domains apply only to the DSP target, and allow Qualcomm® Neural Processing SDK to use unsigned UDO implementation libraries. To see how to use an unsigned process domain with a Qualcomm® Neural Processing SDK application, refer to Running a model with UDO.
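The sketch below shows one way a script might assemble a command line for running a model with a UDO package, optionally requesting an unsigned process domain. The flag names (`--container`, `--input_list`, `--udo_package_path`, `--use_dsp`, `--platform_options unsignedPD:ON`) and all file names are assumptions that should be verified against Running a model with UDO and the tool's help output. Because unsigned process domains apply only to the DSP target, the sketch adds that option only when the DSP runtime is selected.

```python
def net_run_command(dlc, input_list, registration_lib,
                    use_dsp=False, unsigned_pd=False):
    """Build an argument list for running a DLC that contains UDOs.

    All flag names are assumptions; confirm them before use.
    """
    cmd = [
        "snpe-net-run",
        "--container", dlc,
        "--input_list", input_list,
        "--udo_package_path", registration_lib,
    ]
    if use_dsp:
        cmd.append("--use_dsp")
        # Unsigned process domains are only meaningful on the DSP target.
        if unsigned_pd:
            cmd += ["--platform_options", "unsignedPD:ON"]
    return cmd


# Hypothetical file names for the model, inputs, and registration library.
dsp_cmd = net_run_command("model.dlc", "inputs.txt", "libUdoMyPackageReg.so",
                          use_dsp=True, unsigned_pd=True)
print(" ".join(dsp_cmd))
```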
UDO Backward Compatibility
This section specifies limitations of UDO packages:
UDOs compiled for DSP V68 or later against a particular Qualcomm® Neural Processing SDK release must be used with that same release version; they cannot be used with a different release version.
Users need to recompile UDO packages generated for DSP V68 using the Qualcomm® AI Direct SDK version that is compatible with the particular Qualcomm® Neural Processing SDK release.
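The compatibility rule can be expressed as a small check that an application or build script might run before loading a package. This is a hypothetical helper, not part of the SDK: the function name, the integer encoding of the DSP architecture, and the version strings are all assumptions made for illustration.

```python
def needs_recompile(core, dsp_arch, built_with_release, runtime_release):
    """Return True if a UDO package must be rebuilt for the runtime release.

    Hypothetical helper. `dsp_arch` is the DSP architecture as an integer
    (68 for V68); release values are opaque version strings.
    """
    if core == "DSP" and dsp_arch >= 68:
        # DSP V68 or later: the release versions must match exactly.
        return built_with_release != runtime_release
    # This section states no restriction for other targets,
    # so this sketch does not flag them.
    return False


# Hypothetical release strings.
print(needs_recompile("DSP", 68, "2.10.0", "2.12.0"))
print(needs_recompile("DSP", 73, "2.12.0", "2.12.0"))
```

Exact string comparison is used deliberately, since the rule above requires the same release version rather than a compatible range.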