Low Level Perf APIs
Introduction
In SNPE-2.22, we introduced a set of low level performance management APIs, primarily for the DSP runtime, in SNPEPerfProfile.h (C++ wrapper: SNPEPerfProfile.hpp). In addition, two new perf profile setting APIs were added to SNPE and one new API was added to the Builder. Details are in New perf APIs in SNPE-2.22.
These new APIs let a client application apply different perf settings for initialization, between inferences, and for de-initialization. Up to SNPE-2.21, a single user-supplied preset profile (defaulting to Balanced) was applied during the entire lifecycle of SNPE.
The new APIs can build a perf profile from the ground up OR start from a preset profile (like Burst) and tweak individual parameters case by case. Such customized (hybrid) profiles can be applied during initialization via the new Builder API, and after initialization, for inference and de-initialization, via the new SNPE API. Detailed usage examples are in New perf APIs in SNPE-2.22.
SNPE applies certain heuristics based on the preset profile, as it already did in SNPE-2.21. With the new custom perf profile, these heuristics are exposed as knobs that can be tweaked via the APIs in SNPEPerfProfile.h. Details are in Ability to hold the vote for a longer duration – Minimizing RPC calls.
The new APIs also give cooperating SNPE instances, whether in the same process or in different processes, more flexibility to synchronize how their voting packets are aggregated at the HTP/NSP level. An example use case is described in Synchronization problem for multiple SNPE instances in SNPE-2.21 and Better synchronization flexibility for cooperative multiple SNPE instances in SNPE-2.22.
The net run applications snpe-net-run and snpe-throughput-net-run accept a --perf_config_yaml command-line argument that takes a user-provided YAML config file configuring all the perf settings. A sample YAML config corresponding to the sustained high performance profile is provided in Sample snpe-net-run YAML file.
PSNPE API configurability options are available in SNPE-2.23.
The following sections demonstrate the Perf support in Qualcomm® Neural Processing SDK:
New perf APIs in SNPE-2.22
Before SNPE-2.22, the performance setting Builder API only allowed selecting a preset profile during SNPE initialization. That preset profile was used for voting before and after initialization, inference, and de-initialization. Based on the preset, SNPE internally applies certain heuristics for RPC polling, the async voting thread, and the voting hysteresis timer. The limitation of this design is its lack of flexibility: the same profile is used for the entire lifecycle of SNPE. Additionally, the client application is stuck with the voltage corners selected by these preset profiles, even if the hardware offers alternate voltages that better suit the performance-power trade-off of a particular use case.
SNPE-2.22 introduces the low level perf API header SNPEPerfProfile.h (C++ wrapper: SNPEPerfProfile.hpp), which lets a client application create an SNPE perf profile handle/object:
Snpe_SNPEPerfProfile_Handle_t perfHandle = Snpe_SNPEPerfProfile_Create();
SNPEPerfProfile perfProfile; // C++ Wrapper API
A perf profile handle can also be created with a preset perf profile as a starting point:
Snpe_SNPEPerfProfile_Handle_t perfHandle = Snpe_SNPEPerfProfile_CreatePreset(SNPE_PERFORMANCE_PROFILE_BURST);
SNPEPerfProfile perfProfile(BURST); // C++ Wrapper API
This perf profile handle can then be customized as follows:
Snpe_SNPEPerfProfile_SetDcvsVoltageCornerDcvsVCornerMinDone(perfHandle, SNPE_DCVS_VOLTAGE_VCORNER_NOM_PLUS);
perfProfile.SetDcvsVoltageCornerDcvsVCornerMinDone(DCVS_VOLTAGE_VCORNER_NOM_PLUS); // C++ Wrapper API
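To make the idea of a hybrid profile concrete, here is a minimal C++ sketch that models a profile as a plain struct instead of calling SNPE. `PerfProfileModel`, `Corner`, `sustainedHighPerformancePreset` and `makeHybrid` are hypothetical names, and only a few knobs are modeled. The preset values are copied from the sample SUSTAINED_HIGH_PERFORMANCE YAML file later in this document. The sketch shows one thing: a hybrid profile keeps every preset value and changes only the knobs you override.

```cpp
#include <cassert>

// Hypothetical model of a perf profile; NOT the SNPE API.
// Names mirror the SNPE_DCVS_VOLTAGE_VCORNER_* values used in this document.
enum class Corner { Disable, MinVoltageCorner, Svs2, Svs, Nom, NomPlus, Turbo, TurboPlus };

// A handful of the knobs a perf profile exposes, one field per knob.
struct PerfProfileModel {
    bool asyncVoting = false;
    unsigned hysteresisUs = 0;
    Corner busMinStart = Corner::Disable;
    Corner busMinDone = Corner::Disable;
    Corner coreMaxStart = Corner::Disable;
    Corner coreMaxDone = Corner::Disable;
};

// Preset seeded from the sample SUSTAINED_HIGH_PERFORMANCE YAML (general + execute sections).
PerfProfileModel sustainedHighPerformancePreset() {
    PerfProfileModel p;
    p.asyncVoting = false;          // ASYNC_VOTING_ENABLE: false
    p.hysteresisUs = 300000;        // DSP_HYSTERESIS_TIME_US: 300000
    p.busMinStart = Corner::Turbo;  // BUS_VOLTAGE_CORNER_MIN_START
    p.busMinDone = Corner::Svs2;    // BUS_VOLTAGE_CORNER_MIN_DONE
    p.coreMaxStart = Corner::Turbo; // CORE_VOLTAGE_CORNER_MAX_START
    p.coreMaxDone = Corner::Svs;    // CORE_VOLTAGE_CORNER_MAX_DONE
    return p;
}

// Hybrid profile: start from the preset, then override a single done corner.
PerfProfileModel makeHybrid() {
    PerfProfileModel p = sustainedHighPerformancePreset();
    p.busMinDone = Corner::NomPlus; // the only knob that differs from the preset
    return p;
}
```

Every field that `makeHybrid` does not touch still holds the preset value. This mirrors starting with `Snpe_SNPEPerfProfile_CreatePreset` and then calling a single setter.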
This custom perf profile handle/object can now be associated with an SNPE Builder handle (for init) or an SNPE handle (for inference and de-init):
Snpe_SNPEBuilder_SetCustomPerfProfile(snpeBuilderHandle, perfHandle);
builder.setCustomPerfProfile(perfProfile); // C++ Wrapper API
Snpe_SNPE_SetCustomPerfProfile(snpeHandle, perfHandle);
snpe->setCustomPerformanceProfile(perfProfile); // C++ Wrapper API
Snpe_SNPE_ExecuteUserBuffers(snpeHandle, inputHandle, outputHandle);
snpe->execute(inputUbMap, outputUbMap); // C++ Wrapper API
An example using the C API, with different perf settings for init, between inferences, and for de-init:
Snpe_SNPEPerfProfile_Handle_t perfHandle = Snpe_SNPEPerfProfile_CreatePreset(SNPE_PERFORMANCE_PROFILE_HIGH_PERFORMANCE);
Snpe_SNPEPerfProfile_SetCoreVoltageCornerMaxMvStart(perfHandle, SNPE_DCVS_VOLTAGE_VCORNER_TURBO_PLUS);
Snpe_SNPEPerfProfile_SetCoreVoltageCornerMaxMvDone(perfHandle, SNPE_DCVS_VOLTAGE_VCORNER_SVS);
Snpe_SNPEBuilder_SetCustomPerfProfile(snpeBuilderHandle, perfHandle); // perf settings set to high_performance + new corners
Snpe_SNPEPerfProfile_Delete(perfHandle);
Snpe_SNPE_Handle_t snpeHandle = Snpe_SNPEBuilder_Build(snpeBuilderHandle); //Init performed with this custom setting
Snpe_SNPE_ExecuteUserBuffers(snpeHandle, inputHandle, outputHandle); // 1st inference with this custom setting
Snpe_SNPE_SetPerformanceProfile(snpeHandle, SNPE_PERFORMANCE_PROFILE_BURST); //perf setting now overwritten to Burst
Snpe_SNPE_ExecuteUserBuffers(snpeHandle, inputHandle, outputHandle); // 2nd inference with preset Burst profile
Snpe_SNPEPerfProfile_Handle_t perfHandle2 = Snpe_SNPEPerfProfile_Create();
Snpe_SNPEPerfProfile_SetBusVoltageCornerMinStart(perfHandle2, SNPE_DCVS_VOLTAGE_VCORNER_NOM);
// ……… more settings ………
Snpe_SNPE_SetCustomPerfProfile(snpeHandle, perfHandle2); // perf settings for the SNPE handle set to custom values
Snpe_SNPEPerfProfile_Delete(perfHandle2);
Snpe_SNPE_ExecuteUserBuffers(snpeHandle, inputHandle, outputHandle); // 3rd inference with custom perf settings
Snpe_SNPE_SetPerformanceProfile(snpeHandle, SNPE_PERFORMANCE_PROFILE_LOW_BALANCED); //perf setting now overwritten to preset
Snpe_SNPE_Delete(snpeHandle); //De-Init performed with this new preset of low_balanced profile
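The example follows one simple rule: the profile set most recently governs the next init, inference, or de-init. The C++ sketch below is a hypothetical tracer that replays the calls above and records which profile each lifecycle step would use. `LifecycleTracer` and `replayExample` are illustrative names, not SNPE APIs. For simplicity, the tracer does not distinguish between the Builder handle and the SNPE handle; it only tracks the profile that was set last.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not SNPE code: each lifecycle step uses whichever
// profile was set last (on the builder for init, on the SNPE handle afterwards).
class LifecycleTracer {
public:
    void setProfile(const std::string& name) { active_ = name; }
    void build()   { log_.push_back("init:" + active_); }
    void execute() { log_.push_back("execute:" + active_); }
    void destroy() { log_.push_back("deinit:" + active_); }
    const std::vector<std::string>& log() const { return log_; }
private:
    std::string active_ = "BALANCED"; // default preset when nothing is set
    std::vector<std::string> log_;
};

// Replays the sequence of calls from the C API example.
std::vector<std::string> replayExample() {
    LifecycleTracer t;
    t.setProfile("custom(HIGH_PERFORMANCE + corners)"); // builder SetCustomPerfProfile
    t.build();                                          // Snpe_SNPEBuilder_Build
    t.execute();                                        // 1st inference
    t.setProfile("BURST");                              // SetPerformanceProfile
    t.execute();                                        // 2nd inference
    t.setProfile("custom(perfHandle2)");                // SNPE SetCustomPerfProfile
    t.execute();                                        // 3rd inference
    t.setProfile("LOW_BALANCED");                       // SetPerformanceProfile
    t.destroy();                                        // Snpe_SNPE_Delete
    return t.log();
}
```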
Ability to hold the vote for a longer duration – Minimizing RPC calls
With the preset profiles Burst and Sustained High Performance, inference voting has a hysteresis feature. Once clocks are bumped up before execute, they are not brought down for a certain period (300 ms) after the inference completes, which reduces the number of voting packets sent to the NSP. This feature already exists in SNPE-2.21 and particularly helps with back-to-back inferences. SNPE-2.22 exposes it as part of the custom perf profile, so the hysteresis timer can be increased, reduced, or turned off entirely for Burst and Sustained High Performance. It can now also be enabled for any other perf profile. Example:
Snpe_SNPEPerfProfile_Handle_t perfHandle = Snpe_SNPEPerfProfile_CreatePreset(SNPE_PERFORMANCE_PROFILE_LOW_POWER_SAVER);
Snpe_SNPEPerfProfile_SetDspHysteresisTime(perfHandle, 500); //Set inference voting hysteresis to 500 ms
Snpe_SNPE_SetCustomPerfProfile(snpeHandle, perfHandle);
Snpe_SNPE_ExecuteUserBuffers(snpeHandle, inputHandle, outputHandle);
// …… more inferences ……
Snpe_SNPEPerfProfile_SetDspHysteresisTime(perfHandle, 0); //Disable voting hysteresis after a few inferences
Snpe_SNPE_SetCustomPerfProfile(snpeHandle, perfHandle);
Snpe_SNPE_ExecuteUserBuffers(snpeHandle, inputHandle, outputHandle);
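A small simulation shows how the hysteresis timer affects the number of voting packets. This is a hypothetical C++ model, not SNPE internals. It assumes that a before/start vote is sent only when clocks are down. It also assumes that the after/done vote fires when the hysteresis timer expires without a new inference having started. `Inference`, `countVotePackets` and `periodicRuns` are illustrative names. Times are in milliseconds, whereas the YAML key DSP_HYSTERESIS_TIME_US is in microseconds.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of inference voting hysteresis; not SNPE internals.
struct Inference {
    long startMs;    // when execute() is called
    long durationMs; // how long the inference runs
};

// Counts voting packets for non-overlapping inferences, assuming:
//  - a before/start vote is sent only if clocks are currently down;
//  - the after/done vote fires hysteresisMs after an inference ends,
//    unless another inference starts first (clocks then stay up).
int countVotePackets(const std::vector<Inference>& runs, long hysteresisMs) {
    int packets = 0;
    bool clocksUp = false;
    long releaseAtMs = 0; // when the pending after/done vote would fire
    for (const Inference& r : runs) {
        if (clocksUp && r.startMs >= releaseAtMs) {
            ++packets;       // timer expired: after/done vote was sent
            clocksUp = false;
        }
        if (!clocksUp) {
            ++packets;       // before/start vote to bump clocks up
            clocksUp = true;
        }
        releaseAtMs = r.startMs + r.durationMs + hysteresisMs;
    }
    if (clocksUp) ++packets; // final after/done vote once the last timer expires
    return packets;
}

// Back-to-back inferences: `count` runs of `durationMs`, one every `periodMs`.
std::vector<Inference> periodicRuns(int count, long periodMs, long durationMs) {
    std::vector<Inference> runs;
    for (int i = 0; i < count; ++i) runs.push_back({i * periodMs, durationMs});
    return runs;
}
```

Consider ten 5 ms inferences issued every 20 ms. Under these assumptions, the model sends 2 packets with a 300 ms hysteresis and 20 packets with hysteresis disabled (0). That difference is the saving the feature targets for back-to-back inferences.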
Synchronization problem for multiple SNPE instances in SNPE-2.21
Consider the use case of two SNPE instances, as shown in the diagram above. These instances could be part of the same process or different processes. They are executing concurrently, with one having a high-performance workload (represented in red) and the other having a low-performance workload (represented in blue).
When these two SNPE instances are in separate processes, they get unique power config IDs. When they are in the same process, they also get different power config IDs if unique IDs were requested via the platform option “dspPowerSettingContext” (which is the default).
As a result, during vote aggregation at the HTP/NSP level, the after/done vote of SNPE-1 overrides the before/start vote of SNPE-2, because SNPE-1’s default after/done vote is still higher than SNPE-2’s before/start vote. Both the before and after voting voltage corners are controlled by the preset performance profile passed at the builder/init stage. Consequently, SNPE-2 executes at a higher performance level than intended and consumes more power.
The new perf APIs in SNPE-2.22 solve this problem, as outlined in the next section.
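The aggregation problem can be reproduced with a toy model. The sketch below is hypothetical C++, not SNPE or HTP code. It assumes the NSP keeps the latest vote per power config ID and runs at the highest corner among them. It also uses illustrative corners: SNPE-1’s done vote is SVS, as in the sustained high performance sample YAML, and SNPE-2’s start vote is SVS2. `Corner`, `VoteAggregator` and `snpe2ExecutionCorner` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Declaration order encodes low-to-high voltage (only the ordering matters here).
enum class Corner { MinVoltageCorner, Svs2, Svs, Nom, NomPlus, Turbo, TurboPlus };

// Hypothetical aggregator: the latest vote per power config id is kept and the
// NSP is assumed to run at the highest corner among them (a simplification).
class VoteAggregator {
public:
    void vote(int powerConfigId, Corner c) { votes_[powerConfigId] = c; }
    Corner effective() const {
        Corner best = Corner::MinVoltageCorner; // floor when nobody votes
        for (const auto& entry : votes_) best = std::max(best, entry.second);
        return best;
    }
private:
    std::map<int, Corner> votes_;
};

// Corner at which SNPE-2's low-performance workload actually executes.
Corner snpe2ExecutionCorner() {
    VoteAggregator nsp;
    const int snpe1 = 1, snpe2 = 2;  // distinct power config ids
    nsp.vote(snpe1, Corner::Turbo);  // SNPE-1 before/start vote
    nsp.vote(snpe1, Corner::Svs);    // SNPE-1 after/done vote, still relatively high
    nsp.vote(snpe2, Corner::Svs2);   // SNPE-2 before/start vote
    return nsp.effective();
}
```

In this model, SNPE-2 ends up at SVS instead of the SVS2 it asked for, because SNPE-1’s done vote is the higher of the two.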
Better synchronization flexibility for cooperative multiple SNPE instances in SNPE-2.22
With the new performance APIs (illustrated in the diagram above), SNPE-1 can selectively control its after/done vote (indicated by the dotted red arrow). SNPE-1 can set its after/done vote voltage corner to match SNPE-2’s before/start vote, set it lower than that, or not vote at all. This lets SNPE-1 cooperate with SNPE-2 so that SNPE-2 executes in the desired low power mode. The code snippet below disables the after/done vote of SNPE-1, which runs the high performance workload:
// C API
Snpe_SNPEPerfProfile_Handle_t perfHandle1 = Snpe_SNPEPerfProfile_CreatePreset(SNPE_PERFORMANCE_PROFILE_BURST);
Snpe_SNPEPerfProfile_SetBusVoltageCornerMinDone(perfHandle1, SNPE_DCVS_VOLTAGE_CORNER_DISABLE);
Snpe_SNPEPerfProfile_SetBusVoltageCornerTargetDone(perfHandle1, SNPE_DCVS_VOLTAGE_CORNER_DISABLE);
Snpe_SNPEPerfProfile_SetBusVoltageCornerMaxDone(perfHandle1, SNPE_DCVS_VOLTAGE_CORNER_DISABLE);
Snpe_SNPEPerfProfile_SetCoreVoltageCornerMinMvDone(perfHandle1, SNPE_DCVS_VOLTAGE_CORNER_DISABLE);
Snpe_SNPEPerfProfile_SetCoreVoltageCornerTargetMvDone(perfHandle1, SNPE_DCVS_VOLTAGE_CORNER_DISABLE);
Snpe_SNPEPerfProfile_SetCoreVoltageCornerMaxMvDone(perfHandle1, SNPE_DCVS_VOLTAGE_CORNER_DISABLE);
Snpe_SNPE_SetCustomPerfProfile(snpe1Handle, perfHandle1);
Snpe_SNPE_ExecuteUserBuffers(snpe1Handle, inputHandle, outputHandle);
// C++ Wrapper API
SNPEPerfProfile perfProfile1(BURST);
perfProfile1.setBusVoltageCornerMinDone(DCVS_VOLTAGE_CORNER_DISABLE);
perfProfile1.setBusVoltageCornerTargetDone(DCVS_VOLTAGE_CORNER_DISABLE);
perfProfile1.setBusVoltageCornerMaxDone(DCVS_VOLTAGE_CORNER_DISABLE);
perfProfile1.setCoreVoltageCornerMinMvDone(DCVS_VOLTAGE_CORNER_DISABLE);
perfProfile1.setCoreVoltageCornerTargetMvDone(DCVS_VOLTAGE_CORNER_DISABLE);
perfProfile1.setCoreVoltageCornerMaxMvDone(DCVS_VOLTAGE_CORNER_DISABLE);
snpe1->setCustomPerformanceProfile(perfProfile1);
snpe1->execute(inputUbMap, outputUbMap);
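The toy model below compares SNPE-1’s options for its after/done vote: keep a high vote, match SNPE-2’s before/start vote, go lower, or not vote at all. It is hypothetical C++, not SNPE code. It assumes highest-corner-wins aggregation per power config ID. It also models DCVS_VOLTAGE_CORNER_DISABLE as `std::nullopt`, meaning that power config ID contributes no corner; this is an assumption. `Corner`, `VoteAggregator` and `snpe2CornerGiven` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>

// Declaration order encodes low-to-high voltage (only the ordering matters here).
enum class Corner { MinVoltageCorner, Svs2, Svs, Nom, NomPlus, Turbo, TurboPlus };

// Hypothetical aggregator; std::nullopt stands in for DCVS_VOLTAGE_CORNER_DISABLE
// and is assumed to mean "this power config id contributes no corner".
class VoteAggregator {
public:
    void vote(int id, std::optional<Corner> c) { votes_[id] = c; }
    Corner effective() const {
        Corner best = Corner::MinVoltageCorner; // floor when nobody votes
        for (const auto& entry : votes_)
            if (entry.second) best = std::max(best, *entry.second);
        return best;
    }
private:
    std::map<int, std::optional<Corner>> votes_;
};

// Corner at which SNPE-2 executes for a given SNPE-1 after/done vote.
Corner snpe2CornerGiven(std::optional<Corner> snpe1Done) {
    VoteAggregator nsp;
    nsp.vote(1, Corner::Turbo); // SNPE-1 before/start vote
    nsp.vote(1, snpe1Done);     // SNPE-1 after/done vote replaces it
    nsp.vote(2, Corner::Svs2);  // SNPE-2 before/start vote (illustrative)
    return nsp.effective();
}
```

Under these assumptions, matching, lowering, or disabling SNPE-1’s done vote all leave SNPE-2 at its requested SVS2. Only a done vote above SVS2 raises it.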
Sample snpe-net-run YAML file
Below is a snippet of a YAML config for snpe-net-run / snpe-throughput-net-run corresponding to the SUSTAINED_HIGH_PERFORMANCE profile:
general:
  ASYNC_VOTING_ENABLE: false
  DSP_HYSTERESIS_TIME_US: 300000
  DSP_SLEEP_DISABLE_MS: 0
  DSP_RPC_POLLING_TIME_US: 9999
init:
  DSP_ENABLE_DCVS_START: false
  DSP_ENABLE_DCVS_DONE: true
  DSP_SLEEP_LATENCY_START_US: 100
  HIGH_PERFORMANCE_MODE: true
  POWERMODE_START: SNPE_DSP_PERF_INFRASTRUCTURE_POWERMODE_ADJUST_UP_DOWN
  POWERMODE_DONE: SNPE_DSP_PERF_INFRASTRUCTURE_POWERMODE_ADJUST_UP_DOWN
  BUS_VOLTAGE_CORNER_MIN_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_TARGET_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_MAX_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_MIN_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_TARGET_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_MAX_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_MIN_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS2
  BUS_VOLTAGE_CORNER_TARGET_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS
  BUS_VOLTAGE_CORNER_MAX_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS
  CORE_VOLTAGE_CORNER_MIN_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS2
  CORE_VOLTAGE_CORNER_TARGET_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS
  CORE_VOLTAGE_CORNER_MAX_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS
  DSP_SLEEP_LATENCY_DONE_US: 2000
execute:
  DSP_ENABLE_DCVS_START: false
  DSP_ENABLE_DCVS_DONE: true
  DSP_SLEEP_LATENCY_START_US: 100
  HIGH_PERFORMANCE_MODE: true
  POWERMODE_START: SNPE_DSP_PERF_INFRASTRUCTURE_POWERMODE_ADJUST_UP_DOWN
  POWERMODE_DONE: SNPE_DSP_PERF_INFRASTRUCTURE_POWERMODE_ADJUST_UP_DOWN
  BUS_VOLTAGE_CORNER_MIN_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_TARGET_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_MAX_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_MIN_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_TARGET_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_MAX_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_MIN_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS2
  BUS_VOLTAGE_CORNER_TARGET_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS
  BUS_VOLTAGE_CORNER_MAX_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS
  CORE_VOLTAGE_CORNER_MIN_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS2
  CORE_VOLTAGE_CORNER_TARGET_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS
  CORE_VOLTAGE_CORNER_MAX_DONE: SNPE_DCVS_VOLTAGE_VCORNER_SVS
  DSP_SLEEP_LATENCY_DONE_US: 2000
deinit:
  DSP_ENABLE_DCVS_START: false
  DSP_ENABLE_DCVS_DONE: true
  HIGH_PERFORMANCE_MODE: true
  DSP_SLEEP_LATENCY_START_US: 100
  POWERMODE_START: SNPE_DSP_PERF_INFRASTRUCTURE_POWERMODE_ADJUST_UP_DOWN
  POWERMODE_DONE: SNPE_DSP_PERF_INFRASTRUCTURE_POWERMODE_ADJUST_UP_DOWN
  BUS_VOLTAGE_CORNER_MIN_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_TARGET_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_MAX_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_MIN_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_TARGET_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  CORE_VOLTAGE_CORNER_MAX_START: SNPE_DCVS_VOLTAGE_VCORNER_TURBO
  BUS_VOLTAGE_CORNER_MIN_DONE: SNPE_DCVS_VOLTAGE_VCORNER_MIN_VOLTAGE_CORNER
  BUS_VOLTAGE_CORNER_TARGET_DONE: SNPE_DCVS_VOLTAGE_VCORNER_MIN_VOLTAGE_CORNER
  BUS_VOLTAGE_CORNER_MAX_DONE: SNPE_DCVS_VOLTAGE_VCORNER_MIN_VOLTAGE_CORNER
  CORE_VOLTAGE_CORNER_MIN_DONE: SNPE_DCVS_VOLTAGE_VCORNER_MIN_VOLTAGE_CORNER
  CORE_VOLTAGE_CORNER_TARGET_DONE: SNPE_DCVS_VOLTAGE_VCORNER_MIN_VOLTAGE_CORNER
  CORE_VOLTAGE_CORNER_MAX_DONE: SNPE_DCVS_VOLTAGE_VCORNER_MIN_VOLTAGE_CORNER
  DSP_SLEEP_LATENCY_DONE_US: 2000
Note: These settings can be retrieved via the getter APIs after creating a profile from a preset. For example:
C API
Snpe_SNPEPerfProfile_Handle_t perfHandle = Snpe_SNPEPerfProfile_CreatePreset(SNPE_PERFORMANCE_PROFILE_BURST);
auto asyncVoteStatus = Snpe_SNPEPerfProfile_GetEnableAsyncVoting(perfHandle);
auto sleepLatencyStart = Snpe_SNPEPerfProfile_GetSleepLatencyStart(perfHandle);
auto busVoltageCornerMinDone = Snpe_SNPEPerfProfile_GetBusVoltageCornerMinDone(perfHandle);
C++ Wrapper API
SNPEPerfProfile perfProfile(BURST);
auto asyncVoteStatus = perfProfile.getEnableAsyncVoting();
auto sleepLatencyStart = perfProfile.getSleepLatencyStart();
auto busVoltageCornerMinDone = perfProfile.getBusVoltageCornerMinDone();