QNN LPAI Setup & Configuration

Set up the environment variables

Set up your environment with the required SDK paths and configuration files:

  • This includes setting paths to toolchains, libraries, and runtime binaries.

  • Key environment variables:

    • QNN_SDK_ROOT: Root directory of the QNN SDK installation.

    • PATH: Must include paths to QNN tools and binaries (e.g., $QNN_SDK_ROOT/bin).

    • LD_LIBRARY_PATH (Linux only): Must include paths to required shared libraries (e.g., $QNN_SDK_ROOT/lib).

Important

Ensure the following environment variables are set before using offline tools:

Linux Example:

export QNN_SDK_ROOT=/path/to/qnn_sdk
export PATH=$QNN_SDK_ROOT/bin/x86_64-linux-clang:$PATH
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang:$LD_LIBRARY_PATH

Windows Example (Command Prompt):

set QNN_SDK_ROOT=C:\path\to\qnn_sdk
set PATH=%QNN_SDK_ROOT%\bin\x86_64-windows-msvc;%PATH%
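As a quick sanity check before running the offline tools, the required variables can be verified programmatically. Below is a minimal Python sketch; the variable list mirrors the settings above, with LD_LIBRARY_PATH required on Linux only:

```python
import os
import platform

# Environment variables required by the QNN offline tools (see above).
REQUIRED_VARS = ["QNN_SDK_ROOT", "PATH"]
if platform.system() == "Linux":
    REQUIRED_VARS.append("LD_LIBRARY_PATH")

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Run `missing_vars()` before invoking tools such as qnn-context-binary-generator; an empty list means the environment is ready.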

Prepare the JSON configuration file

The configuration file defines both model generation and execution parameters for a specific LPAI hardware version.

  • The JSON file consists of two sections:

    • Model generation: Specifies how the model should be compiled for the target LPAI version.

    • Model execution: Defines runtime behavior, including memory allocation and device-specific settings.

  • Different Snapdragon platforms may support different LPAI versions. Refer to the compatibility table in Supported Snapdragon Devices.

Create a configuration JSON file with model generation and execution parameters. Example:

{
   "lpai_backend": {
      "target_env": "adsp",
      "enable_hw_ver": "v6"
   }
}

QNN LPAI Backend Configuration Guide

This document outlines the structure and usage of LPAI backend configuration files employed by QNN tools such as qnn-net-run and qnn-context-binary-generator. These JSON-formatted files enable fine-grained control over model preparation, runtime behavior, debugging, profiling, and internal backend features.

Overview

There are two primary JSON configuration files:

  1. Backend Extension Configuration File: Specifies the path to the LPAI backend extension shared library and the path to the LPAI backend configuration file.

    Example usage: --config_file <path_to_backend_extension_JSON>

    Example format:

    {
        "backend_extensions" : {
            "shared_library_path" : "path_to_Lpai_extension_shared_library",
            "config_file_path" : "path_to_Lpai_extension_config_file"
        }
    }
    
  2. LPAI Backend Configuration File: Defines all configurable parameters for model generation and execution. This file is parsed by the LPAI backend extension library.

Configuration Schema

The configuration is organized into the following sections:

  • lpai_backend: Global backend settings.

  • lpai_graph : Graph generation and execution parameters.

  • lpai_profile: Profiling options (optional).

Each section and its parameters are described below.

lpai_backend

  • target_env (string):

    Target environment for model execution.

    Options: arm, adsp, x86
    Default: adsp

  • enable_hw_ver (string):

    Hardware version of the target; refer to Supported Snapdragon Devices.

    Options: v5, v5_1, v6
    Default: v6

lpai_graph

Graph generation and execution parameters. The execute-stage parameters (fps, ftrt_ratio, client_type, affinity, core_selection) are described in the QNN LPAI Backend Configuration Parameters section below and in the full JSON scheme.

lpai_profile (Optional)

  • level (string): Profiling level. Options: basic, detailed. Default: basic. See LPAI Profiling.

QNN LPAI Backend Configuration Parameters

fps and ftrt_ratio parameters

These parameters define how a client configures its processing behavior for eNPU hardware.

  • fps (Frames Per Second)
    • Specifies how frequently inference must be completed.

    • For example, fps = 10 means the system must process one frame every 100 milliseconds (i.e., 1000 ms / 10).

    • This sets the overall time budget for each frame, including pre-processing, inference, and post-processing.

  • ftrt_ratio (Factor to Real-Time Ratio)
    • Determines the hardware configuration to meet the latency requirement for inference.

    • If pre- and post-processing take up most of the frame time (e.g., 80 ms out of 100 ms), only 20 ms remain for inference.

    • To ensure inference completes within this reduced time window, the eNPU must be boosted.

    • Setting ftrt_ratio = 50 applies a multiplication factor of 5.0 to the base clock frequency, helping the eNPU meet the tighter latency constraint.

  • Default Values
    • fps = 1 (1 frame per second, allowing 1000 ms per frame)

    • ftrt_ratio = 10 (moderate clock scaling factor)

These defaults imply a relaxed processing schedule and a balanced performance-power tradeoff.
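The arithmetic above can be made concrete. In this sketch, the frame period follows directly from fps; the clock multiplier is an assumption inferred from the documented example (ftrt_ratio = 50 applies a factor of 5.0, i.e., ratio / 10):

```python
def frame_period_ms(fps):
    """Total time budget per frame, covering pre-processing,
    inference, and post-processing."""
    return 1000.0 / fps

def clock_factor(ftrt_ratio):
    """Clock multiplier implied by ftrt_ratio.
    Assumption: factor = ftrt_ratio / 10, inferred from the
    documented example (ftrt_ratio = 50 -> factor 5.0)."""
    return ftrt_ratio / 10.0

# fps = 10 gives a 100 ms budget per frame; if 80 ms go to pre- and
# post-processing, only 20 ms remain for inference, motivating a
# higher ftrt_ratio to boost the eNPU clock.
```

With the defaults (fps = 1, ftrt_ratio = 10), each frame has a 1000 ms budget and the base clock factor is 1.0, matching the relaxed schedule described above.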

Realtime vs Non-Realtime client

  • Real-time: Indicates that the model is intended for real-time use cases, where a specific performance threshold must be met. If the required performance cannot be achieved, the finalize function will return an error.

  • Non-real-time: Refers to models without strict performance requirements. In these cases, LPAI will make a best-effort attempt to accommodate the workload, and finalize will not fail due to performance limitations.

Core Selection & Affinity

Any client can set its core selection and affinity settings for the eAI, which will be applied to the offloaded Ops of that client’s model. If the client does not set them, the default is any core and soft affinity.

Core selection and affinity settings are defined in the following table:

| Core Selection | Hard Affinity | Soft Affinity |
| --- | --- | --- |
| core_0 | Offloaded Ops shall be executed on core_0 only. | Offloaded Ops shall be executed on core_0 if it is available, otherwise on core_1 if available. |
| core_1 | Offloaded Ops shall be executed on core_1 only. | Offloaded Ops shall be executed on core_1 if it is available, otherwise on core_0 if available. |
| Any | Offloaded Ops shall be executed on whichever core is available. | Offloaded Ops shall be executed on whichever core is available. |

Core availability is defined as whether the core is not currently executing any Op, and is determined at runtime.

Usage Guidelines

  • For models with heavy computational workloads (for example, large convNets), Core 1 (big core) is recommended.

  • Choosing core selection and affinity is a system-level decision: it requires understanding the system’s concurrency level, workload, expected KPIs, and power budget, and then profiling and tuning against the overall integrated use case on the given system.

  • For integration, each customer may define their own guidelines. For example:

    - Dedicate Core 0 (small core) to audio use cases
    - Dedicate Core 1 (big core) to camera use cases
    - Share both cores across audio and camera use cases

  • Core 1 (big core) uses slightly more power than Core 0, but leads to shorter inference time.

  • Customers cannot shut down any of the cores; this is also unnecessary, since idle cores are power collapsed automatically.
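Following these guidelines, a client that wants to pin a computationally heavy model to the big core could set the lpai_graph execute parameters accordingly. A sketch with illustrative values:

```json
{
   "lpai_graph": {
      "execute": {
         "core_selection": 1,
         "affinity": "hard"
      }
   }
}
```

Hard affinity guarantees execution on core_1 only; switching to "soft" would let Ops fall back to core_0 when core_1 is busy.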

Full JSON Scheme

Below is a complete scheme of the LPAI backend configuration file with all supported parameters:

{
   "lpai_backend": {

      // Selection of targets [options: arm/adsp/x86] [default: adsp] (Simulator or target)
      // Used by qnn-context-binary-generator during offline generation
      "target_env": "adsp",

      // Corresponds to the LPAI hardware version [options: v5/v5_1/v6] [default: v6]
      // Used by qnn-context-binary-generator during offline generation
      "enable_hw_ver": "v6"
   },
   "lpai_graph": {
      "execute": {

         // Specify the fps rate number, used for clock voting [options: number] [default: 1]
         // Used by qnn-net-run during execution
         "fps": {"type": "integer"},

         // Specify the ftrt_ratio number [options: number] [default: 10]
         // Used by qnn-net-run during execution
         "ftrt_ratio": {"type": "integer"},

         // Definition of client type [options: real_time/non_real_time] [default: real_time]
         // Used by qnn-net-run during execution
         "client_type": {"type": "string"},

         // Definition of affinity type [options: soft/hard] [default: soft]
         // Used by qnn-net-run during execution
         "affinity": {"type": "string"},

         // Specify the core number [options: number] [default: 0]
         // Used by qnn-net-run during execution
         "core_selection": {"type": "integer"}
      }
   }
}

Full JSON Example

Below is a complete example of the LPAI backend configuration file with all supported parameters:

{
   "lpai_backend": {
      "target_env": "adsp",
      "enable_hw_ver": "v6"
   },
   "lpai_graph": {
      "execute": {
         "fps": 1,
         "ftrt_ratio": 10,
         "client_type": "real_time",
         "affinity": "soft",
         "core_selection": 0
      }
   }
}
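Before handing a configuration like the one above to the tools, it can be checked against the allowed options. A minimal Python sketch, with option lists taken from the schema in this document (the helper name is illustrative):

```python
# Allowed values per the LPAI backend configuration schema in this document.
ALLOWED = {
    "target_env": {"arm", "adsp", "x86"},
    "enable_hw_ver": {"v5", "v5_1", "v6"},
    "client_type": {"real_time", "non_real_time"},
    "affinity": {"soft", "hard"},
}

def validate_lpai_config(cfg):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    backend = cfg.get("lpai_backend", {})
    execute = cfg.get("lpai_graph", {}).get("execute", {})
    for key in ("target_env", "enable_hw_ver"):
        if key in backend and backend[key] not in ALLOWED[key]:
            problems.append(f"lpai_backend.{key}: invalid value {backend[key]!r}")
    for key in ("client_type", "affinity"):
        if key in execute and execute[key] not in ALLOWED[key]:
            problems.append(f"lpai_graph.execute.{key}: invalid value {execute[key]!r}")
    for key in ("fps", "ftrt_ratio", "core_selection"):
        if key in execute and not isinstance(execute[key], int):
            problems.append(f"lpai_graph.execute.{key}: expected an integer")
    return problems
```

Running the checker on the example configuration above returns an empty list; a typo such as an unsupported hardware version is reported with its JSON path.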

Best Practices

  • Minimal Changes: Use default values unless specific tuning is required.

  • Validation: Ensure all values conform to expected types and allowed options.

  • Version Compatibility: Refer to the Supported Snapdragon Devices for supported LPAI versions.