QNN GPU Tuning Mode Tutorial
Introduction
This tutorial demonstrates how to use the QNN GPU Tuning Mode. The following sections introduce the components used for tuning and the configuration each one needs in order to tune a model.
Below is a representation of the high-level call flow of the tuning mode:
Performance Cache
The performance cache file maps each kernel to its performance numbers and its currently supported tunable parameters. Tunable parameters include:
- Initialization and Usage:
When a backend is initialized with the custom config PerformanceCacheDir, the cache file is created and stored in the given directory. The file is unique to each device tier. Refer to File Management below for more details on the cache file.
During kernel profiling, the model looks up these cached entries instead of profiling the kernels again, which reduces the tuning time.
- File Management:
If the cache file does not exist in the specified directory, it is created automatically for the device on which the model is tuned.
If an existing cache file is found and InvalidatePerformanceCache is set to true, the file is deleted and a new one is created.
- Steps to set Custom Config for different modes (enable tuning, set performanceCache directory, invalidate performanceCache):
QnnGpuBackend_CustomConfig_t gpuTuningEnableConfig;
QnnGpuBackend_CustomConfig_t gpuTuningPerformanceCacheConfig;
QnnGpuBackend_CustomConfig_t gpuTuningInvalidatePerformanceCacheConfig;

// enable tuning mode
gpuTuningEnableConfig.option = QNN_GPU_BACKEND_CONFIG_OPTION_ENABLE_TUNING_MODE;
gpuTuningEnableConfig.enableTuningMode = true;

// set performanceCache directory
gpuTuningPerformanceCacheConfig.option = QNN_GPU_BACKEND_CONFIG_OPTION_PERFORMANCE_CACHE_DIR;
gpuTuningPerformanceCacheConfig.performanceCacheDir = "path_to_user_supplied_directory";

// invalidate performanceCache (set to true only to discard an outdated cache)
gpuTuningInvalidatePerformanceCacheConfig.option = QNN_GPU_BACKEND_CONFIG_OPTION_INVALIDATE_PERFORMANCE_CACHE;
gpuTuningInvalidatePerformanceCacheConfig.invalidatePerformanceCache = false;
Note
Ensure that the cache file directory has the necessary permissions.
Set the invalidate flag only when you need to discard an outdated cache and build a fresh one.
Ensure that tuning mode is enabled whenever performanceCacheDir or invalidatePerformanceCache is set.
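The checks in the note above can be automated before launching a tuning run. The following is a minimal sketch and not part of the QNN SDK: check_tuning_settings and cache_action are hypothetical helper names, the requirement that the directory already exists is an assumption, and cache_action only restates the file-management rules described earlier (reuse of an existing cache when the invalidate flag is false is inferred from the profiling description). Run it on the machine that holds the cache directory.

```python
import os
import tempfile
from typing import List, Optional


def check_tuning_settings(tuning_mode: bool,
                          cache_dir: Optional[str],
                          invalidate: bool) -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    # The cache directory and the invalidate flag are only meaningful in tuning mode.
    if not tuning_mode and (cache_dir is not None or invalidate):
        problems.append("enable tuning mode before setting the cache "
                        "directory or the invalidate flag")
    if cache_dir is not None:
        # Assumption: the directory must exist before the backend is initialized.
        if not os.path.isdir(cache_dir):
            problems.append(f"cache directory does not exist: {cache_dir}")
        elif not os.access(cache_dir, os.R_OK | os.W_OK | os.X_OK):
            problems.append(f"cache directory is not readable and writable: {cache_dir}")
    return problems


def cache_action(cache_file_exists: bool, invalidate: bool) -> str:
    """Illustrative model of the file-management rules for the cache file."""
    if not cache_file_exists:
        return "create"      # no cache yet: a new file is created
    if invalidate:
        return "recreate"    # the existing file is deleted and rebuilt
    return "reuse"           # existing entries are looked up during profiling


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        print(check_tuning_settings(True, cache_dir, False))
        print(check_tuning_settings(False, cache_dir, False))
    print(cache_action(cache_file_exists=True, invalidate=True))
```

An empty list from check_tuning_settings means none of the conditions in the note is violated. It does not guarantee that the backend accepts the configuration.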
Target Server
The target server is the device-side half of a client-server setup. Its main operations are compiling, profiling, and executing kernels. The client offloads these tasks to the server over a TCP connection, with messages encoded using MsgPack.
- The server performs the following tasks:
Compile Kernel: compiles an OpenCL kernel and returns the compiled binary for on-host context serialization.
Profile Kernel: profiles kernel variants for different tunable parameters and returns the most performant kernel.
Execute Kernel: executes finalize kernels on the device (weights rearrangement, static kernels, etc.).
- The client performs the following tasks:
Connects to the target server.
Processes the data received from the server.
- Steps to run Target Server:
$ adb forward tcp:22598 tcp:22598
$ adb push $SDK_ROOT/$DEVICE_TARGET/qnn-gpu-target-server $USER_SELECTED_LOCATION
$ adb shell "$USER_SELECTED_LOCATION/qnn-gpu-target-server"
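To illustrate how the Profile Kernel task and the performance cache work together, here is a conceptual sketch. It is not the target server's implementation: the cache layout, the tune_kernel function, the measure callback, the kernel name, and the timings are all hypothetical, and the variant labels stand in for whatever tunable parameters the backend supports.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical in-memory cache: kernel name -> (best variant, time in milliseconds).
PerfCache = Dict[str, Tuple[Hashable, float]]


def tune_kernel(name: str,
                variants: Iterable[Hashable],
                measure: Callable[[str, Hashable], float],
                cache: PerfCache) -> Hashable:
    """Return the fastest variant of a kernel, profiling only on a cache miss."""
    if name in cache:
        return cache[name][0]             # cache hit: no profiling needed
    best_variant, best_time = None, float("inf")
    for variant in variants:
        elapsed = measure(name, variant)  # one profiling run per variant
        if best_variant is None or elapsed < best_time:
            best_variant, best_time = variant, elapsed
    if best_variant is None:
        raise ValueError(f"no variants to profile for kernel {name!r}")
    cache[name] = (best_variant, best_time)
    return best_variant


# Made-up timings that stand in for on-device measurements.
FAKE_TIMINGS = {"variant_a": 1.8, "variant_b": 1.2, "variant_c": 2.5}
profile_calls = []


def fake_measure(name: str, variant: Hashable) -> float:
    """Record the call and return the made-up timing for this variant."""
    profile_calls.append((name, variant))
    return FAKE_TIMINGS[variant]


cache: PerfCache = {}
first = tune_kernel("conv2d", FAKE_TIMINGS, fake_measure, cache)
second = tune_kernel("conv2d", FAKE_TIMINGS, fake_measure, cache)
print(first, second, len(profile_calls))
```

The second call returns the cached result without invoking the measurement callback. This is the saving that a populated performance cache provides on repeated tuning runs.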
Note
If the device is attached to a remote host, set the environment variable
TARGET_SERVER_SOCKET=<hostname>:<port>
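Before starting a tuning run, it can help to confirm that the server endpoint is reachable. The sketch below parses the <hostname>:<port> value and attempts a plain TCP connection. Several points are assumptions: the helper names are hypothetical, falling back to localhost:22598 (the port forwarded with adb above) when the variable is unset is a guess at the default, and the server's reaction to a connection that closes without sending any request has not been verified.

```python
import os
import socket
from typing import Mapping, Tuple

# Assumed default: the local port forwarded by `adb forward tcp:22598 tcp:22598`.
DEFAULT_ENDPOINT = ("localhost", 22598)


def target_server_endpoint(environ: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Parse TARGET_SERVER_SOCKET=<hostname>:<port>, or return the assumed default."""
    value = environ.get("TARGET_SERVER_SOCKET")
    if not value:
        return DEFAULT_ENDPOINT
    host, separator, port_text = value.rpartition(":")
    if not separator or not host:
        raise ValueError(f"expected <hostname>:<port>, got {value!r}")
    try:
        port = int(port_text)
    except ValueError:
        raise ValueError(f"port is not a number in {value!r}") from None
    return host, port


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = target_server_endpoint()
    state = "reachable" if is_reachable(host, port) else "not reachable"
    print(f"target server at {host}:{port} is {state}")
```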
Context Binary Generator
With the target server running, you can tune a specific model by generating a context binary cache from a separate terminal.
$ qnn-context-binary-generator --model <qnn_model.so> \
--backend <path_to_backend_library>/libQnnGpu.so \
--binary_file context_cache \
--config_file <path_to_JSON_of_backend_extensions>
This generates a context binary cache with the best performing kernels, resulting in an optimized model.
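When several models need tuning, the command can be assembled programmatically. This is a minimal sketch that uses only the four options shown above. The function name and the example paths are hypothetical placeholders.

```python
import os
import shlex
from typing import List


def build_generator_command(model: str,
                            backend_dir: str,
                            binary_file: str,
                            config_file: str) -> List[str]:
    """Assemble the qnn-context-binary-generator argument list for one model."""
    return [
        "qnn-context-binary-generator",
        "--model", model,
        "--backend", os.path.join(backend_dir, "libQnnGpu.so"),
        "--binary_file", binary_file,
        "--config_file", config_file,
    ]


# Placeholder paths; replace them with the real model, backend, and config locations.
command = build_generator_command(
    model="qnn_model.so",
    backend_dir="path_to_backend_library",
    binary_file="context_cache",
    config_file="tuning_config.json",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the target server is running.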
A sample JSON file for context binary generation in tuning mode is provided below:
{
  "tuning_mode": true,
  "performance_cache_dir": "path_to_user_supplied_directory",
  "invalidate_performance_cache": false
}
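The sample file can also be generated from a script, which avoids hand-editing mistakes when switching cache directories. This sketch writes exactly the three keys from the sample above. The function name and the output file name are hypothetical, and whether the backend accepts any additional keys is not covered here.

```python
import json
import os
import tempfile
from typing import Dict, Union


def write_tuning_config(path: str,
                        cache_dir: str,
                        tuning_mode: bool = True,
                        invalidate: bool = False) -> Dict[str, Union[bool, str]]:
    """Write the tuning-mode JSON shown in the sample and return its contents."""
    config = {
        "tuning_mode": tuning_mode,
        "performance_cache_dir": cache_dir,
        "invalidate_performance_cache": invalidate,
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
        handle.write("\n")
    return config


if __name__ == "__main__":
    # Write into a temporary directory so the example leaves no files behind.
    with tempfile.TemporaryDirectory() as workdir:
        config_path = os.path.join(workdir, "tuning_config.json")
        write_tuning_config(config_path, "path_to_user_supplied_directory")
        with open(config_path, encoding="utf-8") as handle:
            print(handle.read())
```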
Note
- Known limitations
Currently, only the QAIRT internal GPU OpPackage is supported.
