snpe-net-run¶
snpe-net-run loads a DLC file, loads the data for the input tensor(s), and executes the network on the specified runtime.
DESCRIPTION:
------------
Tool that loads and executes a neural network using the SDK API.
REQUIRED ARGUMENTS:
-------------------
--container <FILE> Path to the DL container containing the network.
--input_list <FILE> Path to a file listing the inputs for the network.
Optionally, the file can contain a line starting with "#" to specify the layer names, or with "%" to specify the output tensor
names, for which output tensor files are to be produced. For more details about the input_list file format,
please refer to the SDK HTML documentation (docs/general/tools.html#snpe-net-run input_list argument).
OPTIONAL ARGUMENTS:
-------------------
--use_gpu Use the GPU runtime for SNPE. Default float32 math and float16 storage.
--use_dsp Use the DSP fixed point runtime for SNPE. Data & Math: 8bit fixed point Tensorflow style format.
--use_aip Use the AIP fixed point runtime for SNPE. Data & Math: 8bit fixed point Tensorflow style format.
--debug Specifies that output from all layers of the network
will be saved.
--output_dir=<val>
The directory to save output to. Defaults to ./output
--storage_dir=<val>
The directory to store metadata files
--encoding_type=<val>
Specifies the encoding type of input file. Valid settings are "nv21".
Cannot be combined with --userbuffer*.
--use_native_input_files
Specifies to consume the input file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--use_native_output_files
Specifies to write the output file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--userbuffer_auto
Specifies to use userbuffer for input and output, with auto detection of types enabled.
Must be used with user specified buffer. Cannot be combined with --encoding_type.
--userbuffer_float
Specifies to use userbuffer for inference, and the input type is float.
Cannot be combined with --encoding_type.
--userbuffer_floatN=<val>
Specifies to use userbuffer for inference, and the input type is float 16 or float 32.
Cannot be combined with --encoding_type.
--userbuffer_tf8 Specifies to use userbuffer for inference, and the input type is tf8exact0.
Cannot be combined with --encoding_type.
--userbuffer_tfN=<val>
Specifies to use userbuffer for inference, and the input type is tf8exact0 or tf16exact0.
Must be used with user specified buffer.
--userbuffer_float_output
Overrides the userbuffer output used for inference, and the output type is float. Must be used with user
specified buffer.
--userbuffer_floatN_output=<val>
Overrides the userbuffer output used for inference, and the output type is float 16 or float 32. Must be used with user
specified buffer.
--userbuffer_tfN_output=<val>
Overrides the userbuffer output used for inference, and the output type is tf8exact0 or tf16exact0.
Must be used with user specified buffer.
--userbuffer_tf8_output
Overrides the userbuffer output used for inference, and the output type is tf8exact0.
--userbuffer_uintN_output=<val>
Overrides the userbuffer output used for inference, and the output type is Uint N. Must be used with user
specified buffer.
--userbuffer_memorymapped
Specifies to use memory-mapped (zero-copy) user buffer. Must be used with --userbuffer_float,
--userbuffer_tf8, --userbuffer_tfN, --userbuffer_auto, etc. Cannot be combined with --encoding_type or
--userbuffer_memorymapped_shared. Note: Passing this option will turn all input and output userbuffers into memory-mapped buffers.
--userbuffer_memorymapped_shared
Specifies to use memory-mapped (zero-copy) user buffer with a shared memory chunk. Must be used with --userbuffer_float,
--userbuffer_tf8, --userbuffer_tfN, --userbuffer_auto, etc. Cannot be combined with --encoding_type or
--userbuffer_memorymapped. Note: Passing this option will turn all input and output userbuffers into memory-mapped buffers.
--static_min_max Specifies to use quantization parameters from the model instead of
input specific quantization. Used in conjunction with --userbuffer_tf8.
--resizable_dim=<val>
Specifies the maximum number that resizable dimensions can grow into.
Used as a hint to create UserBuffers for models with dynamic sized outputs. Should be a
positive integer and is not applicable when using ITensor.
--userbuffer_glbuffer
[EXPERIMENTAL] Specifies to use userbuffer for inference, and the input source is OpenGL buffer.
Cannot be combined with --encoding_type.
GL buffer mode is only supported on Android OS.
--data_type_map=<val>
Sets data type of IO buffers during prepare.
Arguments should be provided in the following format:
--data_type_map buffer_name1=buffer_name1_data_type --data_type_map buffer_name2=buffer_name2_data_type
Data Type can have the following values: float16, float32, fixedPoint8, fixedPoint16, int8, int16, int32, int64, uint8, uint16, uint32, uint64, bool8
Note: Must use this option with --tensor_mode.
--tensor_mode=<val>
Sets type of tensor to use.
Arguments should be provided in the following format:
--tensor_mode itensor
Data Type can have the following values: userBuffer, itensor
--perf_profile=<val>
Specifies perf profile to set. Valid settings are "low_balanced" , "balanced" , "default",
"high_performance" ,"sustained_high_performance", "burst", "low_power_saver", "power_saver",
"high_power_saver", "extreme_power_saver", and "system_settings".
--perf_config_yaml Specifies the path to the yaml file containing the perf profile settings.
--profiling_level=<val>
Specifies the profiling level. Valid settings are "off", "basic", "moderate", "detailed", and "linting".
Default is detailed.
--enable_cpu_fallback
Enables cpu fallback functionality. Defaults to disable mode.
--input_name=<val>
Specifies the name of input for which dimensions are specified
e.g. --input_name=<input name>
If the DLC has multiple graphs, graph names are required.
Quotes are required if graph names are specified. Graph names can be found in snpe-dlc-info.
e.g. --input_name="<graph name> <input name>"
--input_dimensions=<val>
Specifies new dimensions for input whose name is specified in input_name.
e.g. --input_dimensions=1,224,224,3
For multiple inputs, specify --input_name=<input name> and --input_dimensions=<input dimensions> multiple times.
If the DLC has multiple graphs, graph names are required.
Quotes are required if graph names are specified. Graph names can be found in snpe-dlc-info.
e.g. --input_dimensions="<graph name> 1,224,224,3"
--gpu_mode=<val>
Specifies gpu operation mode. Valid settings are "default", "float16".
default = float32 math and float16 storage (equivalent to the --use_gpu argument).
float16 = float16 math and float16 storage.
--enable_init_cache
Enable init caching mode to accelerate the network building process. Defaults to disable.
--platform_options=<val>
Specifies value to pass as platform options.
--priority_hint=<val>
Specifies hint for priority level. Valid settings are "low", "normal", "normal_high", "high". Defaults to normal.
Note: "normal_high" is only available on DSP.
--inferences_per_duration=<val>
Specifies the number of inferences in specific duration (in seconds). e.g. "10,20".
--runtime_order=<val>
Specifies the order of precedence for runtimes.
Valid values are: cpu (Snapdragon CPU), gpu_float16 (Adreno GPU), gpu (Adreno GPU), aip (Snapdragon HTA+HVX), and dsp (Hexagon DSP).
This option cannot be passed when any variant of --use_<RUNTIME> is used.
--set_output_tensors=<val>
Optionally specifies a comma-separated list of tensors to be output after execution.
If using a multi-graph DLC, use --set_output_tensors for each graph.
e.g. --set_output_tensors="graphA tensorA1,tensorA2" --set_output_tensors="graphB tensorB1,tensorB2"
Graph names are shown in snpe-dlc-info; the default is the name of the first graph in the DLC.
--set_unconsumed_as_output
Sets all unconsumed tensors as outputs.
Runtime aliases:
cpu (Snapdragon CPU) = Same as cpu_float32
gpu (Adreno GPU) = Same as gpu_float32_16_hybrid
dsp (Hexagon DSP) = Same as dsp_fixed8_tf
aip (Snapdragon HTA+HVX) = Same as aip_fixed8_tf
aip_fixed8_tf (Snapdragon HTA+HVX) = Data & Math: 8bit fixed point Tensorflow style format
--udo_package_path=<val>
Path to the registration library for UDO package(s).
Optionally, user can provide multiple packages as a comma-separated list.
--duration=<val> Specifies the duration of the run in seconds. Loops over the input_list until this amount of time has transpired.
--keep_num_outputs=<val>
Specifies the number of output sets to be generated. Loops over the input_list until this amount of outputs have been saved.
--enable_cpu_fxp Enable the fixed point execution on CPU runtime
--dbglogs
--timeout=<val> Terminates execution when the time limit (in microseconds) is exceeded. Only valid for HTP (dsp v68+) runtime.
--userlogs=<val> Specifies the user level logging as level,<optional logPath>.
Valid values are: "warn", "verbose", "info", "error", "fatal"
--model_name=<val>
Specifies model name for logging.
--cache_compatibility_mode=<val>
Specifies the cache compatibility check mode; valid values are: "permissive" (default), "strict", and "always_generate_new".
Only valid for HTP (dsp v68+) runtime.
--validate_cache Perform an additional validation step just before building SNPE to check the validity of the selected cache record in the DLC.
Upon success, app will proceed as usual. On validation failure, the app will report the validation error before exiting.
--graph_init=<val>
Optionally specifies a comma-separated list of graphs in the current DLC to be initialized.
e.g. --graph_init graph1, graph2, graph3
--graph_execute=<val>
Optionally specifies a comma-separated list of graphs in the current DLC to be executed.
e.g. --graph_execute graph1, graph2, graph3
--memory_limit_hint=<val>
Specifies the memory limit in MB that the DSP can use for initializing a cache.
--init_from_buffer
Specifies to use the Snpe_DlContainer_OpenBuffer API to create DLContainer handle.
--enable_htp_accelerated_init
Enable accelerated initialization; only available for the HTP runtime and offline-prepared DLCs.
--help Show this help message.
--version Show SDK Version Number.
Running batched inputs:
snpe-net-run can automatically batch the input data. The batch size is indicated in the model container (DLC file), but can also be set using the "input_dimensions" argument passed to snpe-net-run. Users do not need to batch their input data; if the input data is not batched, the input size needs to be a multiple of the size of the input data files. snpe-net-run groups the provided inputs into batches and pads any incomplete batch with zeros.
In the example below, the model is set to accept batches of three inputs, so snpe-net-run automatically groups the inputs into batches and pads the final batch. Note that snpe-net-run generates five output files:
…
Processing DNN input(s):
cropped/notice_sign.raw cropped/trash_bin.raw cropped/plastic_cup.raw
Processing DNN input(s):
cropped/handicap_sign.raw cropped/chairs.raw
Applying padding
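The grouping and padding behaviour above can be sketched as follows. This is an illustrative Python model of the behaviour, not SNPE code; `None` stands in for a zero-filled input tensor.

```python
def group_into_batches(inputs, batch_size):
    """Group input file names into batches, padding the final incomplete batch."""
    batches = []
    for i in range(0, len(inputs), batch_size):
        batch = inputs[i:i + batch_size]
        # Pad an incomplete final batch with placeholders for zero inputs.
        batch += [None] * (batch_size - len(batch))
        batches.append(batch)
    return batches

inputs = ["notice_sign.raw", "trash_bin.raw", "plastic_cup.raw",
          "handicap_sign.raw", "chairs.raw"]
batches = group_into_batches(inputs, 3)
# Two batches are executed; the five real inputs yield five output files.
```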
input_list argument:
snpe-net-run can take multiple input files as input data per iteration, and can specify multiple output names, in an input list file formatted as below:
#<output_name>[<space><output_name>] <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>] …
The first line, starting with "#", specifies the names of the output layers. If there is more than one output, use whitespace as a delimiter. Following the first line, use one line per iteration to supply the input files; each name:=path pair supplies one input layer. If there is more than one input per line, use whitespace as a delimiter.
Here is an example where the input layer names are "Input_1" and "Input_2", and the input files are located under "Placeholder_1/real_input_inputs_1/". The input list file should look like this:
#Output_1 Output_2
Input_1:=Placeholder_1/real_input_inputs_1/0-0#e6fb51.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/0-1#8a171b.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/1-0#67c965.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/1-1#54f1ff.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/2-0#b42dc6.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/2-1#346a0e.rawtensor
Similarly, a first line starting with "%" specifies the output tensor names.
Note: If the batch dimension of the model is greater than 1, the number of batch elements in the input file has to either match the batch dimension specified in the DLC or it has to be one. In the latter case, snpe-net-run will combine multiple lines into a single input tensor.
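The format above can be generated programmatically. The following is a hypothetical helper, not part of the SDK; all input names and file paths in the usage example are illustrative.

```python
def write_input_list(path, iterations, output_names=None):
    """Write an input_list file: an optional '#' header naming the outputs,
    then one line per iteration of name:=path pairs."""
    lines = []
    if output_names:
        lines.append("#" + " ".join(output_names))
    for pairs in iterations:  # pairs: list of (input_name, file_path)
        lines.append(" ".join(f"{name}:={p}" for name, p in pairs))
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_input_list(
    "input_list.txt",
    iterations=[
        [("Input_1", "inputs/0-0.raw"), ("Input_2", "inputs/0-1.raw")],
        [("Input_1", "inputs/1-0.raw"), ("Input_2", "inputs/1-1.raw")],
    ],
    output_names=["Output_1", "Output_2"],
)
```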
Running AIP Runtime:
AIP Runtime requires a quantized DLC with HTA sections generated offline.
AIP Runtime does not support debug_mode
AIP Runtime requires a DLC with all the layers partitioned to HTA to support batched inputs
Set cache compatibility mode:
A DLC can include more than one cache record and users can set the compatibility mode to check whether the best cache record is optimal for the device. The available modes indicate binary cache compatibility as follows.
permissive – Compatible if it could run on the device.
strict – Compatible if it could run on the device and fully utilize hardware capability.
always_generate_new – Always incompatible; SNPE will generate a new cache.
snpe-parallel-run¶
snpe-parallel-run loads a DLC file, loads the data for the input tensor(s), and executes the network on the specified runtime. This app is similar to snpe-net-run, but is able to run multiple threads of inference on the same network for benchmarking purposes.
DESCRIPTION:
------------
Tool that loads and executes one or more neural networks on different threads with optional asynchronous input/output processing using SDK APIs.
REQUIRED ARGUMENTS:
-------------------
--container <FILE> Path to the DL container containing the network.
--input_list <FILE> Path to a file listing the inputs for the network.
--perf_profile <VAL>
Specifies perf profile to set. Valid settings are "balanced" , "default" , "high_performance" , "sustained_high_performance" , "burst" , "power_saver", "low_power_saver", "high_power_saver", "extreme_power_saver", "low_balanced", and "system_settings".
NOTE: "balanced" and "default" are the same; "default" will be deprecated in a future release.
--cpu_fallback Enables cpu fallback functionality. Valid settings are "false", "true".
--runtime_order <VAL,VAL,VAL,..>
Specifies the order of precedence for runtimes, e.g. cpu,gpu. Valid values are: cpu, gpu, gpu_float16, dsp, aip. This option cannot be passed when any variant of --use_<RUNTIME> is used.
--use_cpu Use the CPU runtime for SNPE (Snapdragon CPU). Data & Math: float 32bit. Only one --use_<RUNTIME> option is needed.
--use_gpu Use the GPU float32 runtime (Adreno GPU). Data: float 16bit Math: float 32bit.
--use_gpu_fp16 Use the GPU float16 runtime (Adreno GPU). Data: float 16bit Math: float 16bit.
--use_dsp Use the DSP fixed point runtime (Hexagon DSP). Data & Math: 8bit fixed point Tensorflow style format.
--use_aip Use the AIP fixed point runtime (Snapdragon HTA+HVX). Data & Math: 8bit fixed point Tensorflow style format.
--perf_config_yaml <VAL>
Specifies the path to the yaml file containing the paths to the init, execute and dinit yaml files.
NOTE: --perf_profile and --perf_config_yaml are mutually exclusive. Only one of the options can be specified at a time.
OPTIONAL ARGUMENTS:
-------------------
--userbuffer_float Specifies to use userbuffer for inference, and the input type is float.
--userbuffer_tf8 Specifies to use userbuffer for inference, and the input type is tf8exact0.
--userbuffer_auto Specifies to use userbuffer with automatic input and output type detection for inference.
--use_native_input_files
Specifies to consume the input file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--use_native_output_files
Specifies to write the output file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--input_name <INPUT_NAME>
Specifies the name of input for which dimensions are specified.
--input_dimensions <INPUT_DIM>
Specifies new dimensions for input whose name is specified in input_name. e.g. "1,224,224,3".
--output_dir <DIR> The directory to save result files
--static_min_max Specifies to use quantization parameters from the model instead of
input specific quantization. Used in conjunction with --userbuffer_tf8.
--userbuffer_float_output
Overrides the userbuffer output used for inference, and the output type is float.
Must be used with user specified buffer.
--userbuffer_tf8_output
Overrides the userbuffer output used for inference, and the output type is tf8exact0.
Must be used with user specified buffer.
--userbuffer_memorymapped
Specifies to use memory-mapped (zero-copy) user buffer. Must be used with --userbuffer_float,
--userbuffer_tf8, --userbuffer_tfN, --userbuffer_auto, etc. Cannot be combined with --encoding_type.
Note: Passing this option will turn all input and output userbuffers into memory-mapped buffers.
--userbuffer_memorymapped_shared
Specifies to use memory-mapped (zero-copy) user buffer with a shared memory chunk.
Must be used with --userbuffer_float, --userbuffer_tf8, --userbuffer_tfN, or
--userbuffer_auto, etc. Cannot be combined with --encoding_type or --userbuffer_memorymapped.
Note: Passing this option will turn all input and output userbuffers into memory-mapped buffers.
--enable_init_cache Enable init caching mode to accelerate the network building process. Defaults to disable.
--profiling_level Specifies the profiling level. Valid settings are "off", "basic", "moderate", "detailed", and "linting".
Default is off.
--platform_options Specifies value to pass as platform options. Valid settings: "HtaDLBC:ON/OFF", "unsignedPD:ON/OFF".
--platform_options_local
Specifies the value to pass for the current SNPE instance. Valid settings: "HtpDLBC:ON/OFF", "HtaDLBC:ON/OFF;HtpDLBC:ON/OFF".
--runtime_mode Specifies the transmission mode for input/output processing. Valid settings are "sync", "output_async", "inputoutput_async".
--set_output_tensors Specifies a comma separated list of tensors to be output after execution.
--userlogs <VAL> Specifies the user level logging as level,<optional logPath>.
--enable_cpu_fxp Enable the fixed point execution on CPU runtime.
--init_from_buffer Specifies to use the Snpe_DlContainer_OpenBuffer API to create DLContainer handle.
--enable_htp_accelerated_init
Enable accelerated initialization; only available for the HTP runtime and offline-prepared DLCs.
--version Show SDK Version Number.
--help Show this help message.
Required runtime argument:
For the required arguments pertaining to runtime specification, either --runtime_order OR --use_cpu OR --use_gpu etc. needs to be specified. The following example demonstrates an equivalent command using either of these options.
snpe-parallel-run --container container.dlc --input_list input_list.txt --perf_profile burst --cpu_fallback true --use_dsp --use_gpu --userbuffer_auto
is equivalent to
snpe-parallel-run --container container.dlc --input_list input_list.txt --perf_profile burst --cpu_fallback true --runtime_order dsp,gpu --userbuffer_auto
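The equivalence above can be sketched as a simple mapping: each --use_<RUNTIME> flag corresponds to a runtime name, and their order on the command line forms the --runtime_order value. The runtime names follow the help text above; the function itself is illustrative, not part of the SDK.

```python
# Mapping from --use_<RUNTIME> flags to --runtime_order runtime names.
FLAG_TO_RUNTIME = {
    "--use_cpu": "cpu",
    "--use_gpu": "gpu",
    "--use_gpu_fp16": "gpu_float16",
    "--use_dsp": "dsp",
    "--use_aip": "aip",
}

def runtime_order_from_flags(argv):
    # Keep only runtime flags, preserving their command-line order.
    return ",".join(FLAG_TO_RUNTIME[a] for a in argv if a in FLAG_TO_RUNTIME)

order = runtime_order_from_flags(["--use_dsp", "--use_gpu"])
# order == "dsp,gpu", matching --runtime_order dsp,gpu above
```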
Spawning multiple threads:
snpe-parallel-run is able to create multiple threads to execute identical inference passes.
In the example below, the command begins with the required container and input list arguments. After these two options, the remaining options form a repeating sequence, with one group per thread. In this example, the runtime is varied across the threads (dsp for the first, gpu for the second, and dsp for the last).
snpe-parallel-run --container container.dlc --input_list input_list.txt
--perf_profile burst --cpu_fallback true --use_dsp --userbuffer_auto
--perf_profile burst --cpu_fallback true --use_gpu --userbuffer_auto
--perf_profile burst --cpu_fallback true --use_dsp --userbuffer_auto
When this command is executed, the following section of output is observed:
...
Processing DNN input(s):
input.raw
PSNPE start executing...
runtimes: dsp_fixed8_tf gpu_float32_16_hybrid dsp_fixed8_tf - Mode :0- Number of images processed: x
Build time: x seconds.
...
Note that the number of runtimes listed corresponds to the number of threads specified, as well as the order in which those threads were specified.
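The repeating per-thread grouping can be sketched as below. Treating each "--perf_profile" occurrence as the start of a new thread's option group is an assumption drawn from the example command above, not a documented rule.

```python
def split_thread_groups(args, delimiter="--perf_profile"):
    """Split the trailing options into per-thread groups, each starting at the delimiter."""
    groups, current = [], None
    for a in args:
        if a == delimiter:
            if current is not None:
                groups.append(current)
            current = [a]
        elif current is not None:
            current.append(a)
    if current is not None:
        groups.append(current)
    return groups

per_thread = split_thread_groups([
    "--perf_profile", "burst", "--cpu_fallback", "true", "--use_dsp", "--userbuffer_auto",
    "--perf_profile", "burst", "--cpu_fallback", "true", "--use_gpu", "--userbuffer_auto",
    "--perf_profile", "burst", "--cpu_fallback", "true", "--use_dsp", "--userbuffer_auto",
])
# Three groups, one per thread: dsp, gpu, dsp.
```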
snpe-throughput-net-run¶
snpe-throughput-net-run concurrently runs multiple instances of SNPE for a certain duration of time and measures inference throughput. Each instance of SNPE can have its own model, designated runtime and performance profile. Please note that the --duration parameter is common for all instances of SNPE created.
DESCRIPTION:
------------
Tool to load and execute concurrent SNPE objects using the SDK API.
REQUIRED ARGUMENTS:
-------------------
--container <FILE> Path to the DL container containing the network.
--duration <VAL> Duration of time (in seconds) to run network execution.
--use_cpu Use the CPU runtime for SNPE (Snapdragon CPU). Data & Math: float 32bit. Only one --use_<RUNTIME> option is needed.
--use_gpu Use the GPU float32 runtime (Adreno GPU). Data: float 16bit Math: float 32bit.
--use_gpu_fp16 Use the GPU float16 runtime (Adreno GPU). Data: float 16bit Math: float 16bit.
--use_dsp Use the DSP fixed point runtime (Hexagon DSP). Data & Math: 8bit fixed point Tensorflow style format.
--use_aip Use the AIP fixed point runtime (Snapdragon HTA+HVX). Data & Math: 8bit fixed point Tensorflow style format.
--perf_profile <VAL> Specifies perf profile to set. Valid settings are "balanced" , "default" , "high_performance" ,
"sustained_high_performance" , "burst" , "power_saver", "low_power_saver", "high_power_saver", "extreme_power_saver", "low_balanced", and "system_settings".
NOTE: "balanced" and "default" are the same; "default" will be deprecated in a future release.
--perf_config_yaml <VAL> Specifies the path to the yaml config file containing the perf configs.
--runtime_order <VAL,VAL,VAL,..> Specifies the order of precedence for runtimes, e.g. cpu,gpu. Valid values are: cpu, gpu, gpu_float16, dsp, aip. This option cannot be passed when any variant of --use_<RUNTIME> is used.
OPTIONAL ARGUMENTS:
-------------------
--debug Specifies that output from all layers of the network
will be saved.
--userbuffer_auto Specifies to use userbuffer for input and output, with auto detection of types enabled.
Must be used with user specified buffer.
--userbuffer_float Specifies to use userbuffer for inference, and the input type is float.
Must be used with user specified buffer.
--userbuffer_floatN Specifies to use userbuffer for inference, and the input type is float16 or float32.
Must be used with user specified buffer.
--userbuffer_tf8 Specifies to use userbuffer for inference, and the input type is tf8exact0.
Must be used with user specified buffer.
--userbuffer_tfN Specifies to use userbuffer for inference, and the input type is tf8exact0 or tf16exact0.
Must be used with user specified buffer.
--userbuffer_memorymapped Specifies to use memory-mapped (zero-copy) user buffer. Must be used with --userbuffer_float,
--userbuffer_tf8, --userbuffer_tfN, --userbuffer_auto, etc. Cannot be combined with --encoding_type or
--userbuffer_memorymapped_shared. Note: Passing this option will turn all input and output userbuffers into memory-mapped buffers.
--userbuffer_memorymapped_shared Specifies to use memory-mapped user buffer in a shared memory chunk. Must be used with --userbuffer_float,
--userbuffer_tf8, --userbuffer_tfN, --userbuffer_auto, etc. Cannot be combined with --encoding_type or
--userbuffer_memorymapped. Note: Passing this option will turn all input and output userbuffers into memory-mapped buffers.
--userbuffer_float_output Overrides the userbuffer output used for inference, and the output type is float.
Must be used with user specified buffer.
--userbuffer_floatN_output Overrides the userbuffer output used for inference, and the output type is float16 or float32.
Must be used with user specified buffer.
--userbuffer_tf8_output Overrides the userbuffer output used for inference, and the output type is tf8exact0.
Must be used with user specified buffer.
--userbuffer_tfN_output Overrides the userbuffer output used for inference, and the output type is tf8exact0 or tf16exact0.
Must be used with user specified buffer.
--storage_dir <DIR> The directory to store metadata files
--version Show SDK Version Number.
--iterations <VAL> Number of times to iterate through entire input list
--verbose Print more debug information.
--skip_execute Don't do execution (just graph build/teardown)
--enable_cpu_fallback Enables cpu fallback functionality. Defaults to disable mode.
--json <FILE> Generated JSON report.
--input_raw <FILE> Path to raw inputs for the network, separated by ",".
--fixed_fps <VAL> Fixes the FPS so as to control system loading; total FPS will be limited to around <VAL>. e.g. 30, 20, 0 (free run).
--udo_package_path <VAL,VAL> Path to UDO package with registration library for UDOs.
Optionally, user can provide multiple packages as a comma-separated list.
--enable_init_cache Enable init caching mode to accelerate the network building process. Defaults to disable.
--platform_options <VAL> Specifies value to pass as platform options for all SNPE instances.
--platform_options_local <VAL> Specifies per-instance platform options for the current SNPE instance.
If specified, it overrides the global --platform_options value for the current SNPE instance.
--priority_hint <VAL> Specifies hint for priority level. Valid settings are "low", "normal", "normal_high", "high". Defaults to normal.
Note: "normal_high" is only available on DSP.
--groupDuration <VAL> Duration (in ms) of execution before next sleep.(Optional)
--groupSleep <VAL> Sleep interval (in ms) after execution of group.(Optional)
--set_output_layers <VAL> Optionally, user can provide a comma separated list of layers to be output after execution.
If using multi-graph DLC, provide <graph name> <comma separated layers> in double quotes.
It should be defined for all instances or none at all.
Use an empty string for instances that don't need any layer outputs.
e.g --set_output_layers "graphA layer1,layer2,layer3"
--set_output_tensors <VAL> Optionally, user can provide a comma separated list of tensors to be output after execution.
If using multi-graph DLC, provide <graph name> <comma separated tensors> in double quotes.
It should be defined for all instances or none at all.
Use an empty string for instances that don't need any tensor outputs.
e.g --set_output_tensors "graphA tensor1,tensor2,tensor3"
--userlogs=<val> Specifies the user level logging as level,<optional logPath>.
Valid values are: "warn", "verbose", "info", "error", "fatal"
--enable_cpu_fxp Enable the fixed point execution on CPU runtime.
--model_name <VAL> To add the model name to the logs (Optional)
--cache_compatibility_mode=<val> Specifies the cache compatibility check mode; valid values are: "permissive" (default), "strict", and "always_generate_new".
Only valid for HTP (dsp v68+) runtime.
--validate_cache Perform an additional validation step just before building SNPE to check the validity of the selected cache record in the DLC.
Upon success, app will proceed as usual. On validation failure, the app will report the validation error before exiting.
--graph_init Optionally specifies a comma-separated list of graphs in the current DLC to be initialized.
e.g. --graph_init graph1, graph2, graph3
--graph_execute Optionally specifies a comma-separated list of graphs in the current DLC to be executed.
e.g. --graph_execute graph1, graph2, graph3
--enable_htp_accelerated_init
Enable accelerated initialization; only available for the HTP runtime and offline-prepared DLCs.
--help Show this help message.