Revision History

This page contains the revision history starting from QAIRT SDK v2.34.0. For QNN revision history from earlier releases, refer to ReleaseNotes.txt in QNN_SDK_ROOT.

Version

Date

Description

2.39.0

Sep 2025

  • API:Genie: Added the GenieDialog_embeddingTokenQuery API. {148803}

  • API:Genie: Added the GenieDialog_setMaxNumTokens API. {146820}

  • API:HTP: Added a new HTP-specific property to support a detachable buffers feature. {148227}

  • API:HTP: Enhanced profiling capabilities to expose detailed timing information for each component during the graph preparation phase (QnnGraph_finalize). {143804}

  • API:HTP: Implemented a feature allowing read-only weights buffers to be detached and unmapped. {141354}

  • API:HTP: Introduced new APIs and configuration options to support a detachable buffers feature. {143832}

  • API:SNPE: Added a new builder API for enabling accelerated HTP initialization with a pre-prepared cache: Snpe_SNPEBuilder_SetAcceleratedInit() / SNPEBuilder::setAcceleratedInit(). Support was also added to snpe-net-run, snpe-throughput-net-run, and snpe-parallel-run via the command-line argument --enable_htp_accelerated_init. {149873}
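As a sketch of how the new flag might be exercised from the command line (the model and input-list file names below are illustrative, not taken from the release notes):

```shell
# Illustrative only: enable accelerated HTP initialization with a
# pre-prepared cache when running a DLC on the DSP/HTP runtime.
snpe-net-run \
  --container model_with_cache.dlc \
  --input_list input_list.txt \
  --use_dsp \
  --enable_htp_accelerated_init
```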

  • Docs: Updated documentation for qairt-accuracy-debugger to include support for the Windows on Snapdragon (WoS) platform, including updated help sections and sample commands. {149286}

  • Docs: Updated the LPAI documentation to include a summary of the required steps for model preparation. {142076}

  • Genie: Added new profiling option for collecting detailed trace events. {133638}

  • Genie: Added the GENIE_STATUS_ERROR_CONTEXT_EXCEEDED error code to provide a specific status when a prompt exceeds the model’s context length limit. {145721}

  • HTP: Added support for multi-graph switching, which allows multiple graphs to be loaded and retained in memory simultaneously. {139603}

  • HTP: Added support for several operator fusion patterns on the HTP backend, including combinations like Conv-Relu and Conv-Batchnorm-HardSwish. {125633}

  • HTP: Added support for the BFloat16 data type by including the necessary header and definitions in the HTP backend. {140994}

  • HTP: Minor performance improvement for benchmark models. {147751}

  • LPAI: Fixed an issue where the quantization process would incorrectly modify the offset specified in a quant.json file. {145916}

  • LPAI: Resolved an accuracy issue with audio context detection models on the LPAI backend. The issue was caused by incorrect bias quantization settings for convolution and GEMM operations. {146710}

  • Op:GPU: Added support for QNN_DATATYPE_INT_32 inputs to StridedSlice op. {142629}

  • Op:HTP: Added support for 6D variants of Cast, GatherElements, Pad, and StridedSlice with certain constraints. For GatherElements, input and index shapes must match except along the axis dimension. For Pad, padding is limited to dimensions 5D or smaller. For StridedSlice, slicing is limited to dimensions 5D or smaller, and some axis parameters are not supported. {147157}

  • Op:HTP: Enabled support for the SFIXED_POINT_16 data type for the Sqrt Op in QNN HTP Op validation flow. {142710}

  • OpDef: Added support for the RandomUniformLike Op. This includes the ONNX to QNN IR translation in the converter and the backend implementation. {138616}

  • OpDef: Updated the NonZero Op definition to clarify that it outputs -1 for padded values in static shapes. Also updated Gather and Scatter Ops to restrict index tensors to non-negative values, allowing -1 only as a sentinel value for indices generated from other Ops. {142505}

  • QNN: TFLite Delegate: Added support for the Broadcast_to Op. {149782}

  • Tool: Added native support for WoS to the Accuracy Evaluator tool. This includes updates to handle platform-specific file paths and resolves a file permission error in the SQuAD evaluation script on Windows. {136566}

  • Tool: Added support for multi-graph switching in qnn-net-run and qnn-throughput-net-run via the new custom configuration option graphs_retention_order. {145979}
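As a rough sketch of how such a custom configuration option is typically supplied (the JSON layout and graph names here are assumptions for illustration, not taken from the SDK documentation), the retention order might be expressed in a config file passed to qnn-net-run or qnn-throughput-net-run:

```json
{
  "graphs_retention_order": ["encoder", "decoder_prefill", "decoder_token"]
}
```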

  • Tool: Enabled support for the Windows on Snapdragon (WoS) platform in the accuracy debugger. Users can now debug models on WoS using both the CLI and Python API interfaces. {147963}

  • Tool:Converter: Added reference implementations for static tensor manipulation Ops, including Add, Mul, Sub, Div, Transpose, and Reshape. {133602}

  • Tool:Converter: Fixed a segmentation fault in qairt-converter that occurred during float fallback for models with external data. {147000}

  • Tool:Converter: Fixed an issue where FP16 constant tensors were not correctly interpreted at the Python layer. {147009}

  • Tool:Converter: Introduced new flags to provide fine-grained control over the IR optimizer passes. {135982}

  • Tool:Converter: RMSNorm node names now use either the common prefix of all matched nodes in the pattern or, if no common prefix exists, the output buffer name of the pattern. This replaces the previous rms_norm_i naming based on topological order. {146838}

  • Tool:Converter: Removed exception handling for 6D tensors in the converter. {144599}

  • API:HTA: Resolved an application crash that occurred when calling the QNN API to get the HTA device infrastructure for performance tuning. {146157}

  • DLC: Fixed issues within the DLC format when per-channel block quantization is employed on a multi-graph DLC. {138853}

  • GPU: Improved performance by updating heuristics for Pooling and Reduction Ops to better utilize hardware resources, addressing inference time regressions on some models. {147242}

  • Genie: Fixed an accuracy bug with cross-layer attention networks when the decoder block is a single context binary. {150908}

  • Genie: Fixed an issue that caused incorrect calculation of KV cache tensor sizes on the HTP backend, which could lead to segmentation faults. {148675}

  • Genie: Fixed an issue where no output was generated for certain models when the prompt prefill phase required multiple graph executions. {145896}

  • HTP: Enabled support for using the ScatterElements Op within LoRA-updatable models. {147845}

  • HTP: Fixed a checksum mismatch error that could occur during graph finalization for models using LoRA. {147901}

  • HTP: Fixed a crash that could occur during long-running stress tests involving VTCM sharing. {148064}

  • HTP: Fixed a graph finalization failure by adjusting the optimization pass order for certain Ops like Split and Unpack. {141064}

  • HTP: Fixed a memory leak that occurred in the HTP backend during repeated inference runs when performance profiling was enabled. {146627}

  • HTP: Fixed an Op package deregistration failure that could occur in specific multi-core use cases. {143977}

  • HTP: Fixed an issue preventing context binary generation for models using LoRA adapters where a MatMul operation of size 16x16 was present. {149711}

  • HTP: Fixed an issue that caused graph finalization failures for certain large models on specific SoCs. {147402}

  • HTP: Fixed an issue that caused incorrect error code translation when writing shared weight buffers. {147793}

  • HTP: Fixed an issue where applying a LoRA adapter binary would fail for multicore scenarios or float-precision graphs. {149995}

  • HTP: Fixed an issue where requesting a signed PD would fail on x86 simulation environments. The configuration is now ignored for x86, as it makes no difference in that context. {145651}

  • HTP: Fixed an occasional VTCM memory allocation error that could occur during context binary generation. {145879}

  • HTP: Optimized performance for a text encoder model by successfully applying MHA-to-SHA transformations, converting MatMuls to Convolutions, and ensuring correct quantization settings. {136947}

  • HTP: Resolved a failure in on-device context binary generation when using custom Ops. {147187}

  • HTP: Resolved an error where applying a LoRA adapter failed with the message “Apply cannot happen as context bin did not have serialized bin.” {149992}

  • HTP: Resolved an issue where using Op packages in multi-threaded applications could cause a QNN_OP_PACKAGE_ERROR_LIBRARY_ALREADY_INITIALIZED error, halting execution. {147431}

  • HTP: Resolved memory leaks observed under specific stress scenarios. {145181}

  • LPAI: Fixed an issue that caused the ADSP driver to fail to load on certain Windows on Snapdragon platforms. {149188}

  • Op:CPU: Fixed the Mod Op to align its calculation with the behavior of standard frameworks. {147060}

  • Op:CPU: Resolved an issue that caused model failures on the CPU backend when a quantized Div Op encountered a zero-valued divisor. {150630}

  • SDK: Optimized specific library functions on Windows by replacing parts of the C++ standard library with native Windows API calls, reducing the overall binary size. {150497}

  • SNPE:DSP: Resolved an issue where executing a model with a UDO package on the DSP backend could fail with a QNN_OP_PACKAGE_ERROR_LIBRARY_ALREADY_INITIALIZED error. {135967}

  • Tool: Fixed an input parsing issue in the ModelModifierArchChecker tool. {144884}

  • Tool: Resolved an issue where qnn-accuracy-debugger would fail with a FileNotFoundError when using a compiled model (--stage compiled). {149891}

  • Tool:Compiler: Fixed an issue in the context binary generator where a SpaceToDepth Op adjacent to a graph input could cause an error. {147548}

  • Tool:Converter: Enabled support for dynamic 16-bit weights by default in qairt-converter and qairt-quantizer. This resolves an issue where an unnecessary Convert Op was inserted for MatMul weights, which previously led to increased model size and reduced accuracy. A new --disable_dynamic_16_bit_weights flag has been added to revert to 8-bit conversion if needed. {147008}

  • Tool:Converter: Fixed a bug in the quantizer where node-squashing logic could fail for nodes that were both a graph output and had inputs with multiple consumers. {136028}

  • Tool:Converter: Fixed a bug that could cause a ‘Duplicate buffer name’ error during certain graph optimizations. {145690}

  • Tool:Converter: Fixed a fatal “access violation” exception that occurred when running the ONNX converter on WoS devices. {149750}

  • Tool:Converter: Fixed an issue with generating quantization encodings for models containing LSTM or GRU layers. {146424}

  • Tool:Converter: Fixed an issue with handling dynamic inputs for the slope tensor in the PReLU Op. {145599}

  • Tool:Converter: Fixed an issue with the LoRA model conversion flow where certain graph optimization passes were not being applied consistently. {150868}

  • Tool:Converter: Fixed incorrect weight broadcasting behavior in the RMSNorm and LayerNorm fusion patterns within the ONNX converter. {124105}

  • Tool:Converter: Resolved an issue where certain graph optimizations could incorrectly remove a tensor that was also a graph output. {150933}

  • Tool:qairt-tool: Added support for Clip, SpaceToDepth, and Relu Ops in mha2sha-v2. {149759}

  • KI: Models with very large buffers (~1 GB or more) can abort during execution with “Could not create context from binary” due to FastRPC mapping failures. {148198}

2.38.0

Aug 2025

  • API: Generalized the qairt.transform API to support multiple, interchangeable transformation implementations. {138775}

  • API:GPU: Added support for the QNN_GPU_PRECISION_USER_PROVIDED precision mode to the GPU backend extension API, allowing users to specify custom precision settings for a graph. {142096}

  • API:Genie: Added GENIE_NODE_IMAGE_ENCODER_IMAGE_FULL_ATTN_MASK and GENIE_NODE_IMAGE_ENCODER_IMAGE_WINDOW_ATTN_MASK node inputs. {145051}

  • Genie: Added a source code example for genie-t2e-run to the SDK. {144427}

  • Genie: Added embeddingQuery support for offline embeddings in genie-app. {146044}

  • Genie: Added engine sharing support for models used across different dialogs, currently available for the HTP backend and applicable to basic and SSD dialogs. {147585}

  • Genie: Added support for encoder-decoder models in Gen AI Transformer. {136070}

  • HTP: Improved performance and reduced memory usage for certain vision models by removing redundant space_rearrange operations from the graph. {141570}

  • HTP: Removed the -ffast-math compiler flag from the build configuration to prevent potential numerical inconsistencies and improve accuracy alignment for floating-point operations. {139547}

  • Op:CPU: Added support for the Logit Op. {136656}

  • Op:GPU: Added support for INT32 data type inputs to the ArgMax Op on the GPU backend. {133989}

  • Op:GPU: Added support for the CumulativeSum Op. {38682}

  • Op:HTP: Added backend support for the STFT Op. {134956}

  • Op:HTP: Added documentation for dynamic dimension constraints in HTP Op definitions. {143878}

  • Op:HTP: Added support for Int32 ElementWiseAbs and ElementWiseUnary with Abs operation. {138856}

  • Op:HTP: Added support for signed int16 data type in Unpack Op validation. {142708}

  • Op:HTP: Enabled support for the 5D Cast Op. {143121}

  • Op:HTP: Enabled support for the 5D GatherElements Op with non-zero axis values. {143123}

  • Op:HTP: Enabled support for the 5D Pad Op with a constant padding scheme for FP16 and FP32 data types. {143122}

  • OpDef: Added the Op definition for STFT. {134955}

  • OpDef: Added support for int32 and UFIXEDPOINT8 data types for the RandomUniformLike Op. {146810}

  • QNN: TFLite Delegate: Added support for the Broadcast_to Op. {138848}

  • QNN:HTP: Enabled Quantize and Dequantize Ops between FP32 and QINT16 in the Op validator. {141056}

  • SDK: Added a new RandomUniformLike Op definition and reference implementation to align with the ONNX specification. {134859}

  • SDK: Enhanced OEM control over QNN priority levels, allowing more flexible configuration of graph execution priorities on HTP backend. {126262}

  • SNPE: Added documentation for low-level performance APIs under “Tutorials and Examples” > “Application Tips”. {145899}

  • Tool: Added the ability to debug a specific subgraph by introducing two new command-line options: --debug_subgraph_inputs and --debug_subgraph_outputs. These options allow specifying the input and output tensors that define the subgraph to be analyzed. {127762}

  • Tool: Introduced a new Network Specialization module and API to programmatically convert and optimize models with multiple graph configurations into a single DLC file. This replaces the previous command-line-only workflow. {108571}

  • Tool:Converter: Added support for the Logit Op. {138107}

  • Tool:Converter: Added support for the ONNX RandomUniformLike Op. {134348}

  • Tool:Converter: Added support for the ONNX STFT Op. {134349}

  • Tool:Converter: Added support for the STFT Op in the ONNX converter. {138613}

  • Tool:Converter: Added support for the buffer_padding parameter in the Buffer Op. {128998}

  • Tool:Converter: Enhanced the converter to automatically apply a float-fallback quantization behavior for models that contain Quantize-Dequantize nodes or are provided with quantization overrides (e.g., for LoRA). {139341}

  • Tool:Converter: Released the first version (v0.1) of the QAIRT Quantization Specification, which supports schema version 2.0.0 for the quantization overrides file. {114160}

  • DSP: Significantly improved performance for models with a batch size greater than one by optimizing the 5D Reshape-Transpose-Gather pattern in the backend. {140837}

  • GPU: Improved inference performance for select models in GPU FP16 mode on certain chipsets. {144204}

  • Genie: Added the missing ‘type’ field to the sampler.json configuration example. {138004}

  • Genie: Fixed a regression in Eaglet token generation rate. {145608}

  • Genie: Fixed a segmentation fault caused by uninitialized variables. {144692}

  • Genie: Fixed a segmentation fault that occurred when running LLM models with the genie-t2t-run tool. {147760}

  • Genie: Fixed an issue loading lm_head or LoRA adapters on Windows platforms. {143661}

  • Genie: Fixed an issue where paused queries with LUT encoder models could not resume. {145135}

  • Genie: Fixed an issue where prompt templates were not applied when GenieEmbedding_generate outputs were truncated. {143445}

  • Genie: Fixed memory leaks occurring during GenieDialog_applyLora. {136542}

  • HTP: Added support for casting from uint8 to fp16 to resolve an accuracy issue where uint8 was incorrectly interpreted during a cast to a float type. {135317}

  • HTP: Enabled support for asynchronous context initialization in multi-core environments. {138427}

  • HTP: Fixed a memory corruption crash that could occur in multi-threaded applications during deinitialization. {144587}

  • HTP: Fixed a segmentation fault that occurred when using asynchronous initialization on multi-core HTP configurations. {138335}

  • HTP: Fixed an accuracy issue that produced incorrect output when using LPBQ. {146380}

  • HTP: Fixed an issue where models would crash or hang on the HTP backend when the inference batch size was greater than one. {144574}

  • HTP: Fixed an issue where the deviceGetPlatformInfo API returned incorrect SoC information when using the non-RPC path. {141569}

  • HTP: Implemented a fix to prevent a CDSP crash when Virtual Address space is exhausted during memory allocation. {145909}

  • HTP: Resolved an intermittent failure in asynchronous execution mode that could lead to errors. {138318}

  • HTP: Resolved an issue on certain platforms where a failure to lock the HMX context could cause a DMA execution failure. {138289}

  • HTP: Resolved execution failures for certain models in Gen AI corner cases. {129730}

  • HTP: Significantly improved performance for models using grouped TransposeConv2d by enabling an optimization that was previously restricted to operations with zero padding. {143544}

  • Op:HTP: Added support for FP32 weight-only quantization in fully connected layers. {131398}

  • Op:HTP: Fixed NullRequant Op registration failure when using w16 and per-channel quantization. {145523}

  • Op:HTP: Fixed a crash in PoolAvg2d Op when reducing NxM inputs to 1x1 with padding and count_pad=0. {131311}

  • Op:HTP: Fixed a crash occurring during GroupNorm fusion. {130501}

  • Op:HTP: Fixed a runtime failure during context creation when a spill_fill_buffer was configured. {143863}

  • Op:HTP: Fixed an accuracy issue in ElementWiseAdd Op when broadcasting a constant zero. {143254}

  • Op:HTP: Fixed an accuracy issue in FP16 models caused by a faulty SlicePad_shape->Transpose graph optimization rule. {145638}

  • Op:HTP: Improved performance of the ReduceSum Op for FP16 data types by ensuring a faster, optimized implementation is used. {143158}

  • Op:HTP: Resolved a performance regression affecting model execution. {145191}

  • Op:HTP: Resolved accuracy issue in Gather Op for depth=1 cases. {134448}

  • Op:HTP: Resolved performance regressions for select models. {143809}

  • SNPE: Added support for the --optimization_preset option in snpe-dlc-graph-prepare and enabled online preparation via platform options. {135223}

  • SNPE: Fixed an issue where setting HTP graph optimization levels in online preparation did not support distinct optimization levels for different SNPE instances. {142940}

  • SNPE: The snpe-dlc-info tool now displays input, output, and unconsumed tensors in topologically sorted order. {146793}

  • Tool: Fixed an accuracy regression that could occur in certain models due to an incorrect start index calculation in a transpose operation. {144858}

  • Tool: Fixed an issue where block quantized convolution with special dimensions could cause preparation failures. {144994}

  • Tool: Resolved an issue where snpe-parallel-run-cpp would crash when used with the --userbuffer_memorymapped argument. {119102}

  • Tool:Converter: Fixed a bug in Expand Op translation caused by incorrect data type population. {141810}

  • Tool:Converter: Fixed a bug in sink_transpose optimization where a transpose node could be consumed twice by the same node. {140535}

  • Tool:Converter: Fixed a bug that introduced redundant Convert nodes before LSTM/GRU nodes during mixed precision conversion. {145617}

  • Tool:Converter: Fixed an axis tracking issue in ONNX PRelu Op that could cause incorrect broadcasting. {142728}

  • Tool:Converter: Fixed an issue where 0D tensors were incorrectly retained as 1D tensors by propagating scalar tensor information as needed. {141899}

  • Tool:Converter: Fixed an issue where models with extremely small, near-zero quantization scale values (e.g., 1e-35) would fail during inference on the CPU backend. {127367}

  • Tool:Converter: Fixed an issue where the --float_bitwidth option could incorrectly update non-quantizable tensors. {145723}

  • Tool:Converter: Fixed an issue where the second input tensor of MatMul nodes from QDQ models was not correctly quantized. {136049}

  • Tool:Converter: Fixed an issue with encoding population in LayerNorm pattern matching. {141265}

  • Tool:Converter: Fixed an issue where squashable elementwise operations following convolution operations caused errors when encodings for the convolution’s weights/bias were provided. {85485}

  • Tool:Converter: Improved validation in Resize optimization to prevent errors when invalid scale values are provided. {138778}

  • Tool:Converter: Resolved a model conversion failure for large ONNX models caused by excessive memory consumption. {122217}

  • Tool:Converter: Resolved an issue where recent updates to the model converter caused excessive memory consumption during graph serialization, leading to failures when creating context binaries for large models. {136952}

  • Tool:Converter: Squashed identity Expand and Tile nodes in the graph to remove redundant operations. {144693}

  • Tool:Converter: Updated the logic for matching RmsNorm patterns to improve pattern recognition. {146093}

2.37.0

July 2025

  • Docs: Updated the QNN HTP opdef supplement documentation with descriptions of how to use the QNN_DEFINITION_IMPL_GENERATED encoding definition. {127977}

  • API:GPU: Added support for the Qnn_DeviceHandle_t argument in the QnnContext_create API. {123584}

  • API:GPU: Added support for the Qnn_GlobalConfig API. {135731}

  • Genie: Added an async command to genie-app allowing for execution of asynchronous statements. {137243}

  • Genie: Added support for non-updatable quantization (NUQ) and grouped LoRA adapters. {138782}

  • Genie: Added the cache-groups JSON configuration option allowing for the sliding window attention (SWA) cache management policy. {135552}

  • Genie: Introduced the SSD dialog “branch-mode” config option with “top-1” and “all-expand” supported values. {134925}

  • Genie: Added Eaglet dialog support for dual head draft models. {134373}

  • Genie:API: Added GENIE_NODE_IMAGE_ENCODER_IMAGE_POS_SIN and GENIE_NODE_IMAGE_ENCODER_IMAGE_POS_COS node inputs. {133935}

  • HTP: Support LoRA weights sharing feature by extracting updatable weights across all graphs into a shared blob. {126930}

  • HTP: Added support for the QAIRT block Ops Stateful LSTM, Stateful GRU, and Buffer at FP16 precision. {125048}

  • HTP: Added support for VA Reservation on Windows platforms. {138341}

  • HTP: Support LoRA weights sharing feature by extracting updatable weights across all graphs into a shared blob. {128558}

  • Op:GPU: Added support for the GatherND Op on the GPU backend. {61057}

  • OpDef: Added Op definition for IsNaN. {135847}

  • QNN: Fixed broken HTML documentation links to the SNPE documentation URL “Qualcomm Neural Processing SDK” under Overview -> Integration workflow and in the Utilizing DLCs tutorial. {143420}

  • Tool: LoRA Creator: Added support for any kernel shape for Conv in a LoRA branch, removing the limitation where only 1x1 Conv was supported. {140575}

  • Tool:Converter: Added support for SparseConvolution2D. {118014}

  • Tool:Converter: Optimized Lora Importer for non-updatable quantization (NUQ). {127586}

  • Tool:Converter: Resolved performance regression on CPU/DSP backends by removing redundant clip operations in the TFLite converter; now, clip is only added when required based on fused_activation_function. {123581}

  • Tool:Genie: Added support for GenieEmbedding APIs in genie-app. {123549}

  • Fixed incorrect freeing of RPC memory allocated for a LoRA adapter in scenarios where the context had multiple graphs. {138835}

  • Fixed an issue where LoRA weight tensor names were not found when graph transformations were involved. {136062}

  • QNN Docs: Corrected the HTML documentation for the qnn-net-run command-line argument from --output to --output_dir. {144805}

  • SNPE Tools: Fixed the snpe/qairt dlc-info tools to display the correct graph optimization level for HTP cache records generated via the Snpe_SNPEBuilder_SetInitCacheMode() / SNPEBuilder::setInitCacheMode() APIs or the net-run option --enable_init_cache. {142514}

  • Added support for Conv2D Ops with the reuse_space_indices parameter defined, preventing prepare/graph finalization failures. {143040}

  • Tool:Converter: Fixed performance regressions observed on CPU/DSP backends by removing redundant clip operations in the TFLite converter; now, clip is only added when required based on fused_activation_function. {141085}

  • Fixed an updatable attribute tracking error for torch models. {145158}

  • CPU: Fixed quantization issues for large models by correcting the softmax Op implementation. {140260}

  • CPU: Resolved an issue with axis permutation for BW_AXIS_SCALE_OFFSET quantization encoding in Conv operations. {138266}

  • DLC: Fixed a small memory leak in DLC-based initialization in SNPE and QNN. {135810}

  • Genie: Fixed a crash when running SSD or SPD dialog types on certain Linux platforms. {137954}

  • Genie: Fixed an out of bounds read issue observed on uint16 embedding LUTs. {144801}

  • Genie: Fixed an issue where the first context binary split did not contain sufficient information about graph variants to properly initialize the KV$ Manager. {136530}

  • Genie: Fixed an issue where the draft model EOS token was not set, causing an Eaglet initialization failure. {145057}

  • Genie: Fixed minor memory leaks. {136813}

  • Genie: Fixed segmentation fault when graph switching is enabled along with memory mapping. {143826}

  • HTP: Fixed a deadlock issue that could cause the qnn-throughput-net-run application to hang under stress conditions. {142471}

  • KI: In the QNN HTP backend, an update to the prepare sequence is causing a regression on some specific models. This will be fixed in the next release (2.36). {136438}

  • Op:HTP: Optimized the qu16 Dequantize Op. {136231}

  • Op:HTP: Optimized the TransposeConv2d-Dequantize pattern near the output. {134467}

  • Op:HTP: Optimized the TransposeConv2d-Dequantize pattern near the output. {136219}

  • Op:HTP: Reduced preparation time for 5D operations with large batch sizes. {130280}

  • SNPE: Fixed a crash in snpe-throughput-net-run when the container argument was not specified before certain optional arguments. {141598}

  • Tool: Handled calibration input validation, quantizer params, and input type conversion for the HTP memory pipeline. {138064}

  • Tool: Fixed a failure in the memory pipeline when filtered inference schemas were non-sequential. {142391}

  • Tool: Ordered ONNX Runtime outputs based on output name to resolve issues in memory pipeline inference. {136967}

  • Tool: Removed backend_info from the quantizer params to resolve an issue in memory pipeline compilation. {136586}

  • Tool: Updated how parameters are accessed on the pydantic object to resolve a preserve_io_datatype issue in the memory pipeline. {144331}

  • Tool:Converter: Added support for LayerNorm with multiple normalization dimensions. {137898}

  • Tool:Converter: Added support for matching new GeLU Op patterns that include Reshape operations to address an issue where semantic search models failed conversion with AutoMHA2SHA. {139465}

  • Tool:Converter: Fixed a bug in the Conv/MatMul quantizer optimization to ensure safe indexing. {142845}

  • Tool:Converter: Resolved performance regression on CPU/DSP backends by removing redundant clip operations in the TFLite converter; now, clip is only added when required based on fused_activation_function. {140762}

  • Tool:Converter: Updated conv node’s weight/bias naming during BatchNorm fusion to resolve quantization parameter naming conflicts. {139997}

  • Tool:Converter: Added support for a new pattern in RMSNorm pattern matching. {134922}

  • Tool:Converter: Added a fix to remove injected Ops that were blocking supergroups. {134113}

  • Tool:Converter: Fixed an accuracy drop in models with shared biases. {134589}

  • Tool:Converter: Updated the tensor name sanitization logic. {141135}

  • Tool:Converter: Updated the gamma and beta shapes of the LayerNorm ONNX Op. {130934}

  • Tool:Converter:TFLite: Added support for int64 quantized bias. {140882}

  • Tool:Converters: Fixed a LayerNorm pattern mismatch issue. {137459}

  • Tool:Converters: Added support for dynamic bias in the Conv Op. {142223}

  • Tool:qairt-accuracy-evaluator: Fixed the inclusion of converter params in the execution summary. {140752}

  • Tool:qairt-accuracy-evaluator: Limited parallel QNN x86 evaluations to 1. {138075}

  • Tool:snpe-net-run: Fixed a dynamic resizing issue in the Conv Op when using the --input_dimensions option. {142139}

  • Tools:Converters: Reduced conversion time for large models with more than 10000 ops. {135822}

2.36.0

June 2025

  • API: Added LLM support in the Python API. {118016}

  • API: Added support for quantizer-specific options in the Converter Python API, including parameters for act_quantizer_schema, param_quantizer_schema, and target_backend. These options are now available through the CalibrationConfig object, improving feature parity with the command-line interface. {136135}

  • API: Added support for the Baichuan2-7b model through the high-level Generative AI Python API, enabling both builder and executor workflows. {126702}

  • API: Added support for the Phi-3.5-mini model through the high-level Generative AI Python API, enabling both builder and executor workflows. {138126}

  • API: Added support for the Qwen2-7b model through the high-level Generative AI Python API, enabling both builder and executor workflows. {132444}

  • API: Enabled the generation and consumption of JSON profiling data on Windows platforms. Users can now utilize the profiling capabilities of the Python API on Windows on Snapdragon (WoS) systems. {138647}

  • API: Introduced a model conversion capability to modify the Auto-Regression (AR) number and Context Length (CL) of ONNX-based language models. This allows for flexible adaptation of models to different deployment requirements. {123570}

  • API:Genie: Introduced Genie Dialog and Embedding APIs to set and get performance policy. {137070}

  • API:HTP: Added support for ContextFinalize for the HTP backend, enhancing context management capabilities. {136699}

  • API:HTP: Implemented a URI Builder abstraction to simplify the programmatic construction of FastRPC URIs used for opening sessions with the HTP backend. {110797}

  • Core: Added custom Op support to the oe-gcc11.2 and oe-gcc9.3 toolchains for QNN Op Package support on LE targets for HTP. {130471}

  • Docs: Updated the LoRAv2 tutorial to indicate support for Windows operating systems in both offline and online workflows. {138772}

  • Genie: Added the skip-lora-validation option, which reduces LoRA adapter switch time by allowing LoRA CRC checks to be skipped on QnnHtp engines. {134913}

  • Genie: Added experimental support for the arm64x-windows-msvc platform. {129093}

  • Genie: Added support for Non-Updatable Quantization (NUQ) and Grouped LoRA, allowing LoRA adapter groups to share encoding bins and supporting non-updatable quant adapters. {138782}

  • Genie: Added support for pausing and resuming active queries using a signal API, introducing an architecture for resuming paused queries in SSD and basic dialogs. {119704}

  • Genie: Added support for profiling and logging of GenieEngine APIs, enabling measurement of switch time, creation time, and other metrics. {131908}

  • Genie: Added support for repetition penalties in sampling within the Genie Sampler. {118081}

  • HTP: Added support for HTP online graph preparation optimization level via platform options. {138420}

  • HTP: Added validation to reject Per-Graph-Execution (PGE) configurations that specify incompatible features such as shared spill/fill buffers or VTCM backup sharing. A warning is now issued to prevent these unsupported setups. {128832}

  • HTP: Enabled 64-bit UDMA support in QNN HTP, allowing access to memory beyond 4GB for large neural networks, and implemented shared-weights far mapping. {91520}

  • HTP: Enabled multi-context spill/fill buffer sharing for QNX. {128061}

  • HTP: Enhanced the HTP backend polling mechanism to support separate polling contexts and threads for each execution priority level. This design improves performance and resource management for multithreaded applications that concurrently run graphs with different priorities. {131859}

  • LPAI: Added support for LPAI backend RPC mode and QNN_GRAPH_ERROR_EARLY_TERMINATION in qnn-throughput-net-run. {121599}

  • Op:CPU: Added support for Sparse Convolution 2D. {120883}

  • Op:CPU: Updated the Cast Op to correctly map NaN (Not a Number) inputs to True when casting floating-point values to BOOL8, aligning with ONNX implementation. {136649}
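
The ONNX-aligned behavior described above can be illustrated with a minimal, self-contained sketch (not SDK code): under ONNX Cast semantics, any non-zero value maps to True, and NaN counts as non-zero.

```python
import math

def cast_float_to_bool8(values):
    # ONNX Cast semantics for float -> bool: any non-zero value maps
    # to True. NaN compares unequal to 0.0, so it also maps to True.
    return [v != 0.0 or math.isnan(v) for v in values]

print(cast_float_to_bool8([0.0, 1.5, float("nan")]))  # [False, True, True]
```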

  • Op:HTP: Added support for the MaskedSoftmax Op on the HTP backend for LLM use cases. {110661}

  • Op:LPAI: Added support for the frame_pad parameter to the Buffer Op on the LPAI backend. {128999}

  • OpDef: Added an optional parameter reuse_sparse_indices to the Conv2d Op, with default support for AIC, GPU, HTA, and LPAI backends. {118012}

  • SDK: Introduced QAIRT_SDK_ROOT as the new primary environment variable for setting the SDK path. The previous QNN_SDK_ROOT and SNPE_ROOT variables are now deprecated and will be removed in a future release. For backward compatibility, they are currently set based on QAIRT_SDK_ROOT. {121206}
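
The fallback behavior described here can be sketched as follows; `resolve_sdk_root` is a hypothetical illustration of the documented lookup order, not an SDK function.

```python
import os

def resolve_sdk_root(env=None):
    # Hypothetical helper mirroring the documented precedence:
    # QAIRT_SDK_ROOT is primary; the deprecated QNN_SDK_ROOT and
    # SNPE_ROOT are consulted only as fallbacks.
    env = os.environ if env is None else env
    for var in ("QAIRT_SDK_ROOT", "QNN_SDK_ROOT", "SNPE_ROOT"):
        path = env.get(var)
        if path:
            return var, path
    return None, None

# The new variable wins even when a deprecated one is also set:
print(resolve_sdk_root({"QAIRT_SDK_ROOT": "/opt/qairt", "QNN_SDK_ROOT": "/old"}))
```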

  • Tool: Enhanced layerwise debugging tools to accept externally provided “golden” reference outputs for comparison. This allows users to supply their own reference data. A new option to disable layout transformation during this process has also been added to accommodate various data sources. {122717}

  • Tool:Converter: Added support for the new Einsum equation nkctv,kvw->nctw, expanding the range of supported ONNX models. {126231}

  • Tool:Converter: Added support to serialize disconnected model inputs (dangling inputs) from the source framework into the DLC file. {139058}

  • Tool:Converter: Defer loading is now enabled by default for the ONNX converter to improve memory usage and processing time. To disable this feature, use the new --onnx_disable_defer_loading flag for the QAIRT converter or the --disable_defer_loading flag for the QNN/SNPE ONNX converter. {139858}

  • Tool:Converter: Enabled support for the --defer_loading option in the QNN ONNX converter when generating C++/binary outputs. This feature, which was previously unsupported for this output format, helps reduce memory consumption and processing time during conversion. {139859}

  • Tool:Converter: Removed a limitation in the ONNX converter that previously prevented using defer loading (--onnx_defer_loading) and ONNX model simplification in the same conversion. Both features can now be used simultaneously. {116422}

  • Tool:Converter:ONNX: Added support for the ONNX Size Op, which outputs the total number of elements of an input tensor as an int64 scalar. {138523}

  • API: Fixed a bug in the converter input configuration where the data type of the first input was incorrectly applied to all other inputs. {137113}

  • API: Fixed a bug in the model-level API where a typo in an internal variable could cause issues with input list file generation. {137830}

  • API: Fixed an issue in the Quantizer API where parsing an input list file containing comment lines (e.g., lines starting with ‘%’) could fail. {136414}

  • API: Fixed an issue where the GenAIExecutor would return invalid performance metrics, such as -1 or 0 for timing and tokens per second. {137575}

  • API: Reduced excessive warning messages generated by qairt.compile by correcting an internal log level configuration. {137628}

  • API: Refactored the Python API to ensure model configuration files (config.json) can be loaded correctly using standard methods like autoconfig.from_pretrained. {131057}

  • API:CPU: Fixed an issue where graph composition for the CPU backend would fail with an OpConfig validation error for the Transpose Op, particularly when using the float_precision=16 conversion option. {138242}

  • CPU: Fixed an issue where certain models failed during inference due to an invalid layer parameter value resulting from a GroupNorm operation failure. {135924}

  • Core: Improved model initialization time on the HTP backend by optimizing internal system calls during runtime setup. {136899}

  • Genie: Fixed LM head execution for split LEQ models during the last iteration of prefill. {139824}

  • Genie: Fixed a memory leak in the tokenizer implementation observed when running genie-t2t-run with the LoRA adapter. {130865}

  • Genie: Fixed an issue where LLM inference could produce random or incorrect output. {124867}

  • Genie: Fixed sampling for float16 models, which previously produced nonsensical response text. {134604}

  • Genie: Reduced peak RAM by removing unnecessary copies for embedding LUT encoders when running embeddings on CPU, addressing high memory usage for longer prompts. {134506}

  • Genie: Resolved a crash in the Genie runtime that occurred when using non-empty stop sequences in a dialogue query. {138311}

  • HTA: Fixed a segmentation fault that could occur when executing a cached model on the HTA backend if a subgraph fell back to the DSP backend. {127808}

  • HTP: Fixed a performance regression on the HTP backend that affected certain transformer models, including those using masked softmax. {137554}

  • HTP: Fixed an accuracy regression for models using the ResizeNearestNeighbour Op. The fix adapts the HTP backend to handle updated quantization parameters resulting from an improved CPU backend implementation of the Op. {116566}

  • HTP: Fixed an issue that prevented the DSP driver from loading correctly for multicore execution on Android. {135235}

  • HTP: Fixed memory deregistering failures in GenAI use cases by deallocating unused tensor buffers after inference completion in async mode. {129731}

  • HTP: Resolved a performance regression on the HTP backend that affected both synchronous and asynchronous inference modes for certain models. {137386}

  • HTP:Op: Fixed ElementwiseFloorDiv name mismatch. {135158}

  • LPAI: Fixed an accuracy regression for models using asymmetric parameter quantization. A change was introduced to correctly handle the --param_quantizer_schema flag, which may require users to update their quantization settings. When a tensor’s encoding is symmetric, the quantizer schema must now be set to unsignedsymmetric to ensure correct behavior. {138453}
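
The constraint described above can be expressed as a small validation sketch (function name hypothetical, not part of the SDK):

```python
def check_param_quantizer_schema(encoding_is_symmetric, schema):
    # Illustrative check (names hypothetical): per the note above,
    # a symmetric tensor encoding requires the quantizer schema to
    # be 'unsignedsymmetric'.
    if encoding_is_symmetric and schema != "unsignedsymmetric":
        raise ValueError(
            "symmetric encodings require --param_quantizer_schema "
            "unsignedsymmetric")
    return True

print(check_param_quantizer_schema(True, "unsignedsymmetric"))  # True
```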

  • Op:CPU: Fixed a dynamic bias issue in the DepthwiseConv2d Op that caused a segmentation fault with the QNN CPU backend. {137313}

  • Op:CPU: Fixed a memory leak in the Expand Dims Op by ensuring the freeing of space created for axis data. {138049}

  • Op:CPU: Fixed an issue by adding INT8 support for GroupNorm Op. {135932}

  • Op:DSP: Fixed a performance regression by preventing an unnecessary Reshape Op from being added by the LogSoftmax implementation when its input and output shapes are identical. {137013}

  • Op:HTP: Added 5D rank constraints for Softmax and Conv Ops, resolving an issue with ExecuTorch QNN Delegate model preparation. {137462}

  • Op:HTP: Fixed an accuracy drop in the HTP backend’s GridSample Op that occurred with multi-batch inputs (batch size > 1). {134663}

  • Op:HTP: Fixed an accuracy regression in the HTP backend implementation of the DepthToSpace Op. This change restores the behavior to align with previous versions, resolving potential output deviations for models utilizing this operation. {139578}

  • Op:HTP: Resolved an accuracy issue where models using the Concat Op on the HTP backend could produce different and less accurate results when running without the --debug flag in qnn-net-run. {134084}

  • Tool: Fixed an issue where an incorrect offset was generated during the dequantization of tensors with signed symmetric, per-channel encodings. {137056}
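
For reference, affine dequantization computes real = scale × (quantized − offset), and signed symmetric encodings use a zero offset per channel. A minimal sketch, illustrative only and not the tool's implementation:

```python
def dequantize_per_channel(q_rows, scales, offsets=None):
    # Affine dequantization: real = scale * (quantized - offset).
    # Signed symmetric encodings use a zero offset per channel, so
    # dequantization reduces to real = scale * quantized.
    if offsets is None:
        offsets = [0] * len(q_rows)
    return [[s * (q - o) for q in row]
            for row, s, o in zip(q_rows, scales, offsets)]

# Two channels of int8 values with per-channel scales:
print(dequantize_per_channel([[-128, 127], [2, -2]], [0.5, 0.25]))
# [[-64.0, 63.5], [0.5, -0.5]]
```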

  • Tool: Resolved a segmentation fault that could occur in the qnn-context-binary-generator tool during the QnnContext_free call. {139746}

  • Tool:Converter: Added support for GRU Op quantization, specifically enabling quantization for LPAI backend by optimizing static inputs. {126350}

  • Tool:Converter: Corrected an issue that could lead to accuracy regressions on the LPAI backend for models using 4-bit activation quantization. The SDK now correctly enforces the use of 8-bit activation quantization, as 4-bit is not supported on the LPAI backend. {137976}

  • Tool:Converter: Enabled enableQnnQuant flag for Resize Op in-out optimization, resolving issues with Nearest Neighbor and Bilinear modes. {137641}

  • Tool:Converter: Fixed a bug in the Converter tool to ensure the correct order of input and output tensors in the QNN graph JSON file during serialization, aligning them with the IR graph. {118500}

  • Tool:Converter: Fixed a corner case in the Expand Op pattern matching, specifically resolving an issue in the Squash Tile Unsqueeze optimization that led to incorrect shape inference for multi-consumer cases. {136864}

  • Tool:Converter: Fixed a log print format issue that affected accuracy when converting LLM models with maskedsoftmax. {137471}

  • Tool:Converter: Fixed an issue where Batch Normalization (BN) scales and offsets were not correctly obtained from QDQ models, ensuring proper application of BN parameter encodings. {129578}

  • Tool:Converter: Fixed an issue where ONNX Logsoftmax Opset11 would add unnecessary reshapes, leading to extra transpose operations, even when input/output shapes were identical. {137545}

  • Tool:Converter: Fixed an issue where per-Block/per-Channel encodings were not correctly applied for weights during QAIRT conversion, resolving the inability to quantize DLC with 4-bit BQ weights. {134363}

  • Tool:Converter: Fixed an issue where using multiple Static Tensor nodes in a single graph would fail due to duplicate output tensor names. {136080}

  • Tool:Converter: Fixed an issue with merging Mul and Add operations into Batchnorm by correcting pattern definitions and adding validation checks. {136756}

  • Tool:Converter: Reduced converter memory and time usage by avoiding unnecessary access to tensor weights. {137665}

  • Tool:Converter: Removed the beartype import in the PyTorch converter. {134045}

  • Tool:Converter: Resolved an issue in the Layout Transform post-optimization where a node could be incorrectly squashed multiple times, causing incorrect broadcasted output shapes for certain Reshape and Transpose operations. {139382}

  • Tool:Converter: Updated tensor name sanitization logic to ensure uniqueness and prevent conflicts, resolving issues like “Compose Graph failed: Sigmoid Tensor already exists”. {135409}
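
The uniqueness requirement can be sketched with a toy sanitizer (illustrative; the converter's actual sanitization rules may differ):

```python
import re

def sanitize_unique(names):
    # Toy sanitizer: replace unsupported characters, then append a
    # numeric suffix whenever sanitization collapses two distinct
    # source names into the same string.
    seen = {}
    result = []
    for name in names:
        base = re.sub(r"[^0-9A-Za-z_]", "_", name)
        count = seen.get(base, 0)
        seen[base] = count + 1
        result.append(base if count == 0 else f"{base}_{count}")
    return result

# Both source names sanitize to "Sigmoid_0"; the second is suffixed:
print(sanitize_unique(["Sigmoid:0", "Sigmoid.0"]))
```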

  • Tool:Converter:ONNX: Enhanced support for the If Op in the ONNX converter to allow subgraphs with multiple outputs. {136721}

  • Tool:Converter:ONNX: Resolved a NameError in the quantizer tool that occurred due to a missing internal logging function. {140893}

  • Tool:Quantizer: Resolved an issue in the quantizer to correctly apply per-channel quantization for grouped ConvTranspose Ops. {136585}

  • Tool:qnn-context-binary-generator: Enhanced qnn-context-binary-generator to precompute and validate adaptation weight metadata paths, allowing early error detection for erroneous LoRA config contents and avoiding long wait times. {126629}

  • Tool:qnn-model-lib-generator: Redirected error logs to stderr and all other logs to stdout. {135807}

2.35.0

May 2025

  • API: Added LLM support in the Python API. {118016}

  • API:Genie: Added a data-alignment-size configuration option for dialog and embeddings APIs. {130270}

  • API:Genie: Introduced the GeniePipeline.h and GenieNode.h APIs, providing multimodal support. {123389}

  • API:Genie: Introduced the GenieTokenizer.h API. {126408}

  • API:HTP: Added support for new memory buffer types (QNN_HTP_MEM_WEIGHTS_BUFFER and QNN_HTP_MEM_SCRATCH_BUFFER) in the QnnMem_register and QnnMem_deregister APIs. {121766}

  • API:HTP: Introduced API changes to support external weights and spillfill buffers. {121760}

  • CPU: Added Phi 3 and Phi 3.5 model configurations to the Genie SDK. {134117}

  • CPU: Added dangling inputs support in Graph. {134280}

  • Core: Added platform information to the JSON output of the context binary utility. {129905}

  • Docs: Updated QNN/SNPE documentation to include QCS8625 in the list of supported Snapdragon devices. {134450}

  • Genie: Added support for use-mmap on Windows platforms. {116519}

  • Genie: Enabled support for multi-modal inference with low latency through the GenIE pipeline, supporting various input/output modalities and utilizing shared embedding weights. {120507}

  • Genie: Removed printing of KPIs to stdout, favoring use of GenieProfile. {123352}

  • HTP: Added initial support for multi-core weight sharing during deserialization, including functions to handle VA allocation for weights per core and passing multi-core metadata. {124612}

  • HTP: Added multicore weight sharing support during deserialization to map shared weights to different cores without requiring VA reservations. {135411}

  • HTP: Added support for configuring extended_udma prepare time. {136435}

  • HTP: Added support for measuring end-to-end latency in the runtime. {98570}

  • HTP: Added support for the QNN_HTP_CONTEXT_CONFIG_OPTION_DEFER_GRAPH_INIT context configuration option to postpone graph-related tasks. {130605}

  • HTP: Added support for the QNN_HTP_CONTEXT_GET_PROP_BUFFER_START_ALIGNMENT context property to retrieve buffer start alignment. {134678}

  • HTP: Added support for the usage of external weights and scratch buffers on the HTP backend. {121767}

  • HTP: Added support to save the transport result for multicore transport during async execution. {132146}

  • HTP: Enabled support for dynamic input and output resolution for SD3 on the HTP backend. {105781}

  • HTP: Enabled the mmap budget feature for WoS to reduce peak RAM usage during context initialization for GenAI use cases. {131070}

  • HTP: Extended binary format support for spill/fill to include external buffers. {136017}

  • HTP: Implemented buffer size calculations for the HTP backend, including consideration for graph selection and calculation of maximum spill/fill buffer size. {121765}

  • HTP: Updated the Throughput Net Run (TNR) application to utilize thread_pool utilities for thread management. {113123}

  • Op:CPU: Added dynamic dimension support for AvgPool2D. {126775}

  • Op:CPU: Added dynamic dimension support for InstanceNorm Op. {101384}

  • Op:CPU: Added support for the ‘frame_pad’ parameter in Buffer Op. {133242}

  • Op:GPU: Added support for the Cast operation from INT64 to INT32 on Windows. {132750}

  • Op:HTP: Added INT16 support for the ElementWiseAsin Op on the HTP backend. {114479}

  • Op:HTP: Implemented performance optimizations for the Score Filter and NMS operations on the HTP backend. {134740}

  • OpDef: Added Op definition for IsInf. {125370}

  • SDK: Added an option to enable optrace profiling in the TNR application. {135588}

  • SDK: Enabled SNPE, QNN, and QNN delegate support for the QCM8550 platform. {129533}

  • Tool:Converter: Added dynamic weights support for the Deconv Op in TensorFlow models. {109713}

  • Tool:Converter: Added support for Add, Subtract, Multiply, and Divide operations in Float32 precision for static tensor manipulation within the G2G IR. {125540}

  • Tool:Converter: Added support for ONNX 1.16.1 in the Ubuntu 20.04 (Focal) environment. {134975}

  • Tool:Converter: Added support for the Size operation and updated Relu opset versions in the ONNX converter to address unsupported operations in certain models. {133472}

  • Tool:Genie: Introduced the genie-app command-line utility. {123548}

  • Tool:HTP: Added support for the HTP MCP Binary format in the QnnHtpBinaryBufferPrinter tool, enabling proper parsing and printing of MCP binaries. {128507}

  • API: Allowed passing extra arguments through the Python API’s ConverterConfig to underlying modules. {133985}

  • API: Fixed an encodings path issue during the build phase with GenAI models using the Python API. {133815}

  • API: Fixed an issue where quantized and compiled models failed during execution with the Python API when using default CalibrationConfig values. {134858}

  • API: Fixed an issue where the QAIRT Python API failed to load backend libraries (QnnCpu.dll/QnnHtp.dll) on certain devices. {134461}

  • API: Fixed an issue with the JSON reader setting in QNN profiling on Windows. {134565}

  • CPU: Fixed a memory management issue for xnnpack Conv2D nodes. {132710}

  • Core: Fixed cross SoC compatibility issues caused by unsynchronized GpuInfo fields between SocServer and SocUtility. {135786}

  • DSP: Fixed a context binary generation issue on OE Linux Platform. {124376}

  • DSP: Fixed an issue where snpe-net-run failed due to an unavailable runtime. {135399}

  • DSP: Fixed inference time regressions observed on HTP_FP16 and HTP backends by propagating DSP architecture characteristics to the HTP core. {133777}

  • GPU: Resolved model verification failures encountered with certain CNN models on the GPU backend, related to Conv Kernel processing. {130041}

  • Genie: Fixed an asynchronous initialization issue for Windows platforms. {135904}

  • Genie: Fixed an issue where GenieDialog_save/restore could not be used with GENIE_DIALOG_SENTENCE_REWIND. {135558}

  • Genie: Fixed an issue where GenieProfiling data could report invalid initialization time data. {134498}

  • Genie: Fixed an issue where stop sequences did not work with GenieDialog_embeddingQuery. {134592}

  • HTP: Adjusted max PD size calculation to correctly account for far weights, resolving an issue with unexpected secondary PD triggers during specific test conditions. {127268}

  • HTP: Fixed a stability issue with Llama 3 3B multicore models by updating the method for setting the mc_spill_fill buffer. {135253}

  • HTP: Fixed a crash occurring in multicore graphs due to incorrect identification of spillfill memory pools by the Hexagon NN API. {135543}

  • HTP: Fixed an issue where qnn-net-run failed to open a session due to library loading and device transport instance creation errors. {135028}

  • HTP: Fixed an issue where core information was not correctly captured in optrace for multicore execution. {133797}

  • HTP: Fixed an out-of-memory issue occurring when running Llama 3 8B models on a single core without splitting. {134696}

  • HTP: Fixed async execution failures observed while running certain models in a multicore configuration with shared buffers. {135047}

  • HTP: Fixed a logic error in graph switching. {133794}

  • HTP: Fixed multicore async inference failures, including issues observed with Zero copy. {134701}

  • HTP: Improved model execution time performance on SM8750, addressing an issue where the execution time KPI was not being met. {128145}

  • HTP: Resolved a graph execution failure issue observed during the async_group_init_llama7b_graph_switch_no_shared_resources test. {126402}

  • HTP: Resolved an issue causing incorrect mapping of test failures in nightly reports. {125884}

  • HTP: Resolved an issue leading to a “Failed to deregister ion memory with the backend” log message during multi-threaded HTP binary execution with shared buffers. {129716}

  • HTP: Resolved differences in adapter switch time between Genie and qnn-net-run by addressing issues related to graph switching and power settings. {131776}

  • Op:CPU: Fixed TransposeConv2d for asymmetric kernels in Float execution. {133778}

  • Op:GPU: Fixed accuracy errors with the ReduceSum operation when used with Image2DArray for non-Mean ops and specific dimensions. {131616}

  • Op:GPU: Fixed inference failures in models with Argmax/Argmin Ops. {133052}

  • Op:HTP: Added support for LayerNorm when the constant input is FP16 converted to FP32. {131420}

  • Op:HTP: Enabled UINT_8 datatype support for the StridedSlice Op on the HTP backend, resolving model conversion and graph preparation failures. {125597}

  • Op:HTP: Fixed accuracy issue for GatherNd Op. {110126}

  • Op:HTP: Fixed an accuracy issue with LPBQ convolution for MOE on v73. {133134}

  • Op:HTP: Fixed an issue where the Genie output resulted in an infinite loop with WoS by updating the prompt file. {134680}

  • Op:HTP: Fixed an issue with high power consumption for DepthwiseConv op with asymmetric stride by optimizing the pattern on the HTP backend. {133635}

  • Op:HTP: Improved accuracy of the Swish Op. {133898}

  • Op:HTP: Improved performance of the MatMul Op running on HVX. {135210}

  • Op:HTP: Improved the performance of the 5D GridSample Op on the HTP backend for W8A16 quantization. {122831}

  • Op:HTP: Improved the performance of the GridSample Op on the HTP backend by addressing tiling and scheduling issues. {126462}

  • SDK: Fixed an issue where some models failed at the concat operation during graph preparation. {132887}

  • Tool: Added a validation check for float fallback to prevent quantizer failures when encodings or calibration lists are not provided. {133463}

  • Tool: Added support for the --onnx_batch and --tensorflow_batch options in Hypertuner after QAIRT converter changes. {131064}

  • Tool: Eliminated a misleading warning message “Function not called, PrepareLib isn’t loaded!” that would appear when running qnn-net-run successfully on HTP. {122382}

  • Tool: Fixed an issue where the is_symmetric value for 32-bit bias tensors was incorrectly reset during Float Fallback, causing failures when the output DLC was passed back to the quantizer. {135379}

  • Tool: Fixed quantizer to insert Convert Op for LayerNorm weights with external encoding. {134466}

  • Tool: Resolved an issue where snpe-dlc-graph-prepare failed for certain models due to incompatible float bitwidths when QParams were present, particularly in the float fallback path. {130558}

  • Tool:Converter: Fixed a bug in LayerNorm squeeze_axes handling. {126234}

  • Tool:Converter: Added a pattern mapping to the Expand Op to reduce inference time. {132363}

  • Tool:Converter: Added a warning message for the Non-Zero Op when the output shape is dynamic. {126185}

  • Tool:Converter: Added support for a new einsum equation, expanding the range of supported ONNX models. {133824}

  • Tool:Converter: Converter-generated FullyConnected Ops now have 2D input and 2D output. {127049}

  • Tool:Converter: Ensured that ApplyEncodings is called by the quantizer when --use_quantize_v2 is provided internally, even if not on the command line. {133705}

  • Tool:Converter: Fixed JSON dumping for 4-bit quantized tensors. {133481}

  • Tool:Converter: Fixed KernelScale expansion for scalars in TFLite DeConv dequantization. {128978}

  • Tool:Converter: Fixed a bug in NonZero Op translation constant folding. {127165}

  • Tool:Converter: Fixed a bug in the squash_node_into_nn_node optimization. {126354}

  • Tool:Converter: Fixed a conversion error that occurred when --float_bitwidth 16 was provided on the command line with existing quantization parameters. {134716}

  • Tool:Converter: Fixed a corner case in the DCE process in the converter to correctly handle node removal based on the number of consumers of output tensors. {129704}

  • Tool:Converter: Fixed an error in the squash_node_into_nn_node optimization. {132836}

  • Tool:Converter: Fixed an issue where output nodes for BatchMatMul and BatchMatMulV2 Ops were missing by adding support to convert them to FullyConnected Op. {127139}

  • Tool:Converter: Fixed an issue where the converter failed when using the --desired_input_layout argument with the new layout transform algorithm by unifying its behavior with custom_io. {136144}

  • Tool:Converter: Fixed an issue with 6D support for Concat and Constant Ops in the frontend, resolving a core dump error during quantization. {117698}

  • Tool:Converter: Fixed incorrect population of the “is_symmetric” flag, ensuring encodings are dumped correctly. {134673}

  • Tool:Converter: Fixed an issue observed when several GRU Ops share one initial hidden state, and added a unit test for bidirectional GRU. {91127}

  • Tool:Converter: Resolved an accuracy regression issue related to the squash_batchnorm optimization in the converter by ensuring the optimization correctly handles encodings. {130130}

  • Tool:Converter: Skipped adding dummy weights and bias tensors during LayerNorm pattern matching. {128870}

  • Tool:Converter:ONNX: Added a fix for axis_format handling in matmul_to_fc translation. {118318}

  • Tool:Converter:ONNX: Fixed a model conversion issue with the Resize operation in the ONNX converter. {131677}

  • Tool:Converter:ONNX: Fixed an ONNX conversion failure for the Sam2 Image Encoder model by addressing layout format issues for Matmul node inputs and outputs. {131098}

  • Tool:Op:HTP: Optimized the DepthwiseConv op with asymmetric stride to improve performance for specific models. {132474}

  • Tool:accuracy_debugger: Corrected a tensor shape issue for the oneshot algorithm with ONNX batch=1; the onnx_batch override option is no longer accessible. {133915}

  • Tool:qairt-accuracy-evaluator: Removed the preproc-file option from the Accuracy Evaluator CLI as it is no longer valid due to the deprecation of minimal mode. {129278}

  • Tool:qnn-onnx-converter: Fixed an issue where static tensor framework trace information was missing for some tensors. {120982}

  • Tool:qnn-tensorflow-converter: Added logic to ensure the min-max in TensorFlow FakeQuantPerChannel nodes are symmetric. {118672}
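
One common way to make a (min, max) range symmetric, shown here as an illustrative sketch rather than the converter's exact rule, is to expand to the larger magnitude:

```python
def symmetrize(min_val, max_val):
    # Expand the range to the larger magnitude so that it is
    # symmetric about zero (one common convention; the converter's
    # exact rule may differ).
    bound = max(abs(min_val), abs(max_val))
    return -bound, bound

print(symmetrize(-0.8, 1.0))  # (-1.0, 1.0)
```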

  • Tool:quantizer: Fixed an issue with 2-bit weight quantization calculation, resolving incorrect output values. {132048}

2.34.0

April 2025

  • API:Genie: Added the GenieSampler_registerUserDataCallback API, which adds a userData argument to the sampler custom callback. {130164}

  • API:Genie: Added GenieEngine.h, GenieDialog_getEngine, and GenieDialog_bindEngine APIs. {126715}

  • API:SNPE: Added Java API setUnconsumedTensorsOutput(), equivalent to the C/C++ builder API Snpe_SNPEBuilder_SetUnconsumedTensorsAsOutputs() / SNPEBuilder::setUnconsumedTensorsAsOutputs(). {125891}

  • CPU: Added BOOL support in CPU Concat Op. {130940}

  • CPU: Added axes parameter support in L2Norm. {121463}

  • DSP:SNPE: Added the ability to display the exact priority of the HVX thread in the log to help identify potential issues related to HVX concurrency scenarios. {117790}

  • Genie: Added KV quantization support for GenAiTransformer backend. {123438}

  • Genie: Added a LoRAv3 reference/sample Genie configuration to the SDK examples. {130008}

  • Genie: Added the Eaglet dialog type. {126452}

  • Genie: Added token-acceptance-rate to the GenieProfile output for some dialog types. {123350}

  • Genie: Introduced a performance optimization where logits are sampled using the native datatype output of the model. {121359}

  • HTP: Deprecated optrace collection via debug configuration files. Use optrace via profiling instead. {124739}

  • HTP: Fixed an issue where the number of items was missing in the multicore callback. {129636}

  • HTP: Implemented a service call to perform dspqueue_close for multicore environments. {126381}

  • HTP: Introduced parallel graph execution, enabling concurrent running of multiple graphs on a single HTP core to improve throughput and resource utilization. {89181}

  • HTP: Improved performance of the Softmax Op with 32 channels or fewer. {130819}

  • Op:GPU: Added support for GridSample Op. {127898}

  • Op:HTP: Optimized DepthWiseConv2d op execution by ensuring it runs on HMX. {128655}

  • Op:HTP: Optimized DepthwiseConv op performance for an ASR model on SM8750 HTP W8A16. {129860}

  • OpDef: Added dynamic shape support for FullyConnected Op. {116235}

  • OpDef: Added optional parameter buffer_padding to Buffer Op. {125962}

  • Tool:Converter: Added support for BQ and LPBQ in JSON serializer and deserializer. {132650}

  • Tool:Converter: Added support for quantized DLC files as input to the quantizer module: (1) if all tensors are quantized or overridden to float, the DLC is returned directly; (2) if the DLC is half-quantized, the fixed-point tensors are dequantized back to float before quantization; (3) all float tensors are then quantized. {129135}
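
The three cases above can be sketched as a small planning function (data model hypothetical, not the quantizer's actual code):

```python
def plan_quantization(tensors):
    # tensors: per-tensor state, one of "quant", "float", or
    # "float_override" (hypothetical data model).
    if all(t in ("quant", "float_override") for t in tensors):
        return "return_directly"                # case 1
    steps = []
    if any(t == "quant" for t in tensors):
        steps.append("dequantize_fixed_point")  # case 2: half-quantized
    steps.append("quantize_float_tensors")      # case 3
    return steps

print(plan_quantization(["quant", "float"]))
# ['dequantize_fixed_point', 'quantize_float_tensors']
```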

  • Tool:Converter: Added support to trigger Quantizer with float_fallback mode. {129131}

  • Tool:Converter: Fixed handling of dynamic input shapes with a more informative error message. {127631}

  • Tool:Converter: Introduced a new Converter argument to guide different Converter output export formats: --export_format ["DLC_DEFAULT", "DLC_STRIP_QUANT"]. {129132}

  • Tool:Converter: QAIRT Quantizer now skips quantization steps if float_fallback is specified for an input Quant DLC. {130397}

  • Tool:qnn-onnx-converter: Added the --preserve_onnx_output_order option to maintain ONNX output order in the converted graph. {126070}

  • QNN Core: Fixed an issue where QNN Savecontext failed for multiple models on Windows platforms due to the inability to find the graph in the DLC. {130104}

  • CPU: Added int32 datatype support for ScatterElements. {126766}

  • CPU: Fixed L2Norm to handle multiple axes. {127053}

  • CPU: Fixed verifier failures for single-layer resize models on ONNX16 framework. {124524}

  • CPU: Implemented deep copy of opConfig in CPU to prevent model failures. {128204}

  • DSP: Fixed an SNPE inference failure due to QnnContext_createFromBinary failing with a memory allocation error. {127804}

  • DSP: Fixed an SNPE inference failure where multiple models failed due to errors obtaining input tensor names. {127809}

  • DSP: Fixed inference failures for specific models on HTP due to network partition issues. {131151}

  • GPU: Fixed accuracy error in QnnGpuOperationTestActivationAndroid. {125640}

  • GPU: Fixed accuracy error in QnnGpuOperationTestTransposeConvAndroid. {125992}

  • GPU: Fixed inference regressions in models having Convolution Op in gpu_fp16 mode for some devices. {120026}

  • Genie: Fixed an issue in genie-t2t-run where dialog de-initialization data was not saved. {132621}

  • Genie: Fixed an issue where GenieEmbedding_generate would return a rank of 0. {131581}

  • Genie: Fixed an issue where quantized values could overflow or underflow. {125929}

  • HTP: Addressed inference time regressions on multiple chipsets for HTP and HTP_FP16 configurations. {128165}

  • HTP: Corrected the TransportResult resize function to properly set the number of cores. {132311}

  • HTP: Fixed a LayerNorm validation failure by checking rank of bias only if it’s present in LayerNorm Op. {106186}

  • HTP: Fixed a Windows compatibility issue related to non-shared weight VA reservation. {130567}

  • HTP: Fixed a crash in libQnnHtp.so that occurred in graph switch scenarios involving spill fill buffer sharing. {131575}

  • HTP: Fixed a deadlock in allocateAndMapPersistentSpillFillBuffer() that occurred due to locking conflicts. {132488}

  • HTP: Fixed a hang issue in GenAI TNR tests when using asynchronous group initialization with weight sharing and spill-fill sharing. {132586}

  • HTP: Fixed a multithreaded concurrency issue with LLM and small models that caused a ‘memHandles registration failure’. {131051}

  • HTP: Fixed a performance regression for a MobileBERT model that was introduced in a previous release. {132111}

  • HTP: Fixed a prepare failure for the L2Norm op with fp16 when the relaxed_precision_flag is not set during the converter stage. {129566}

  • HTP: Fixed an issue where QNN HTP inference failed during MC detailed profiling. {132564}

  • HTP: Fixed an issue where multiple VA sharing groups caused the error ‘Unable to map reserved buffer for non-shared weights’. {131009}

  • HTP: Fixed an issue where qnn-context-binary-generator would hang, consuming excessive CPU and memory. {126833}

  • HTP: Fixed intermittent hangs that occurred during the creation of a context from a binary in concurrent scenarios. {131049}

  • HTP: Fixed the checker failures related to the OpPackage example by correcting the include path. {130707}

  • HTP: Improved performance to address inference time regressions observed on multiple chipsets. {131073}

  • HTP: Resolved an issue related to spill-fill buffer sharing, which caused incorrect output. {124544}

  • HTP: Resolved x86_prepare failures during savecontext and addressed high CPU utilization during graph preparation. {125093}

  • HTP: Resolved failures in LoRA v2 test cases due to DSP transport call issues, impacting multi-model context and graph switch scenarios. {130142}

  • HTP: Resolved inference time regressions on SM8750. Avoided broadcast overhead on mul_op to improve performance of uint16 elementwise multiplication. {125746}

  • HTP: Reverted the enablement of the 64-bit flag to address reported hangs. {130301}

  • HTP: Updated the PGE support check to use supported features of the SoC model. {127754}

  • LPAI: Fixed a failure in LPAI direct mode. {131750}

  • LPAI: Fixed an issue where LPAI single layer models were failing. {130729}

  • Op:DSP: Added support for LayerNorm and modified the hard-coded check. {122112}

  • Op:HTP: Added 5D support for float Sigmoid. {128867}

  • Op:HTP: Addressed performance issues when converting models with w8a16 compared to w8a8 on SM8350 by optimizing matmul and Gemm OPs. {121404}

  • Op:HTP: Fixed ReduceMax FP16 compilation error. {127900}

  • Op:HTP: Fixed a QNN context-binary-generator failure due to a TCM insufficient tile error when processing a custom model. {129510}

  • Op:HTP: Fixed context binary generation failures for ArgMin/ArgMax ops due to TCM overflow. {108763}

  • Op:HTP: Fixed model validation errors during context saving, specifically addressing issues with the DepthToSpace Op. {131083}

  • Op:HTP: Fixed numerical issue for DepthwiseConv2d -> HardSwish in a MobileNetV3 model. {128158}

  • Op:HTP: Fixed the rank constraints of an Op replacement rule. {130194}

  • Op:HTP: Improved DepthwiseConv2D performance. {126421}

  • Op:HTP: Optimized Reshape Ops when PCQ is enabled on constant tensors going into a MatMul Op, improving performance. {130415}

  • Op:HTP: Registered QInt16 for Concat Op to resolve graph preparation failures when using QuantInt16 tensors. {125735}

  • Op:HTP: Resolved an issue where context binary size calculation failed during graph preparation. {124130}

  • Op:HTP: Resolved an on-device hang issue during execution of Dynamic MobileNet V2, specifically during the Transpose Op. {126806}

  • Op:HTP: Resolved context binary generation failures for the BevFormer model with AMP encodings. {129991}

  • SDK: Fixed build issues in Qnn SampleApp, Qnn SampleAppAsyncExecution and Qnn SampleAppSharedBuffer. {131442}

  • SDK: Removed “pytorch to onnx conversion avoidance suggestions” from QNN SDK Docs. {132125}

  • SDK: ReleaseNotes.txt renamed to QAIRT_ReleaseNotes.txt and now contains release notes for both Unix and WoS. {127817}

  • SNPE: Fixed API Snpe_SNPEBuilder_SetInitCacheMode()/SNPEBuilder::setInitCacheMode() breakage for non-HTP backends when using the snpe-net-run option --enable_init_cache. {129545}

  • SNPE: Fixed the --enable_init_cache option (API SNPEBuilder::setInitCacheMode()/Snpe_SNPEBuilder_SetInitCacheMode()) in net-run for the AIP runtime. {131929}

  • Tool:Converter: Corrected an issue where qnn-context-binary-generator logged an incorrect QPC path when the --backend_binary option was used. {126169}

  • Tool:Converter: Corrected the allowed length for pad amounts for 4D tensors in the emitter. {132185}

  • Tool:Converter: Enabled data invariant optimizations for the Tile Op. If the input of Tile Op is quantized, the input dataType and qInfo are copied to the output. {126372}

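The Tile optimization above relies on Tile being data-invariant: it repeats values without changing them, so the input's datatype and quantization parameters carry over to the output unchanged. A hypothetical sketch of that propagation (names are illustrative, not converter internals):

```python
from dataclasses import dataclass, replace

@dataclass
class TensorInfo:
    data_type: str   # e.g. "uint8"
    scale: float     # quantization scale
    offset: int      # quantization zero point

def propagate_tile_qinfo(input_info: TensorInfo) -> TensorInfo:
    """Tile only repeats elements, so the output's quant params equal the input's."""
    return replace(input_info)

inp = TensorInfo("uint8", 0.05, -128)
out = propagate_tile_qinfo(inp)
# out carries the same data_type, scale, and offset as inp.
```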
  • Tool:Converter: Fixed Layout Transform to avoid unintentionally loading deferred weights. {132173}

  • Tool:Converter: Fixed a segfault issue in IrJsonDeserializer during deserialization of newly generated model JSON files. {129816}

  • Tool:Converter: Fixed an issue where Accuracy Evaluator runs failed at the Netrun stage. {129997}

  • Tool:Converter: Fixed an issue where FOLD_MULTIPLE_TRANSPOSE was incorrectly pruning graph outputs. {127963}

  • Tool:Converter: Fixed an issue where context binary generation failed with a ‘Graph Finalize failure’ when using multi-Qranium pipelined partitioning. {124908}

  • Tool:Converter: Fixed an issue where context binary generation failed for LVM UNet models due to tensor updateability and GroupNorm Op validation errors with the HTP backend. {127887}

  • Tool:Converter: Fixed an issue where the qnn-context-binary-generator tool failed on Windows-X86 when processing LoRAv3 models. {130894}

  • Tool:Converter: Fixed an index error in the remove-identity optimization. {125867}

  • Tool:Converter: Fixed an issue where folding multiple transposes did not retain graph output names. {128685}

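For background on the transpose-folding fixes above: consecutive Transpose ops can be fused by composing their permutations into one. A minimal sketch of the composition rule (not the converter's code):

```python
import numpy as np

def compose_perms(p1, p2):
    """Applying Transpose(p1) then Transpose(p2) equals Transpose(p1 composed with p2)."""
    return [p1[i] for i in p2]

x = np.random.rand(2, 3, 4)
p1, p2 = [2, 0, 1], [1, 2, 0]
fused = compose_perms(p1, p2)
# x.transpose(p1).transpose(p2) equals x.transpose(fused)
```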
  • Tool:Converter: Resolved a serialization issue with MatMul ops involving int16*int16 data types when using dynamic 16-bit weights. {129733}

  • Tool:Converter:ONNX: Added support for dynamic inputs for Clip Op. {124203}

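On the dynamic Clip inputs above: from ONNX opset 11 onward, Clip takes min and max as optional tensor inputs rather than attributes, so they may be produced dynamically at runtime. A minimal NumPy sketch of those semantics (illustrative only):

```python
import numpy as np

def clip_dynamic(x, min_t=None, max_t=None):
    """ONNX Clip (opset >= 11): min/max arrive as optional tensor inputs."""
    lo = -np.inf if min_t is None else min_t
    hi = np.inf if max_t is None else max_t
    return np.clip(x, lo, hi)

x = np.array([-2.0, 0.5, 3.0], dtype=np.float32)
y = clip_dynamic(x, min_t=np.float32(-1.0), max_t=np.float32(1.0))
# y == [-1.0, 0.5, 1.0]
```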
  • Tool:Converter:ONNX: Fixed an issue in the Converter to ensure correct name sanitization following C++ naming conventions. {129356}

  • Tool:Converter:ONNX: Fixed axis tracking in ScatterElements. {118614}

  • Tool:Converter:ONNX: Fixed an issue with the reverse GRU Op to ensure the correct order of input names for the first output. {130544}

  • Tool:Converter:ONNX: Updated translation for ExpandOp to reduce inference time. {127065}

  • Tool:qairt-accuracy-evaluator: Fixed issue where the input list was incorrectly passed to the quantizer. {130537}

  • Tool:qairt-accuracy-evaluator: Added support for the ‘algorithms’ quantizer parameter in the evaluator and provided the input shape to the converter for PyTorch models. {126291}

  • Tool:qnn-accuracy-debugger: Enhanced the qnn-accuracy-debugger tool to provide more meaningful metrics for intermediate tensor cosine similarity. {126437}

  • Tool:qnn-net-run: Resolved an issue in accuracy evaluator runs where the error “‘Namespace’ object has no attribute ‘preserve_graph_output_order’” was encountered. {132180}

  • Tool:qnn-onnx-converter: Aligned the ONNX Resize Op translator’s behavior with ONNX definitions. {123092}

  • Tool:snpe-architecture-checker: Fixed an issue where snpe-architecture-checker would fail due to an uninitialized variable. {126778}

  • Tool:snpe-stress-net-run: Fixed a memory leak issue when loading QNN models. {128498}