API:Genie: Added GenieSampler_registerUserDataCallback API which adds a userData argument to the sampler custom callback. {130164}
API:Genie: Added GenieEngine.h, GenieDialog_getEngine, and GenieDialog_bindEngine APIs. {126715}
API:SNPE: Added Java API setUnconsumedTensorsOutput(), equivalent to the C/C++ builder API
Snpe_SNPEBuilder_SetUnconsumedTensorsAsOutputs() / SNPEBuilder::setUnconsumedTensorsAsOutputs(). {125891}
CPU: Added BOOL support in CPU Concat Op. {130940}
CPU: Added axes parameter support in L2Norm. {121463}
DSP:SNPE: Added the ability to display the exact priority of the HVX thread in the log to help identify potential issues related
to HVX concurrency scenarios. {117790}
Genie: Added KV quantization support for GenAiTransformer backend. {123438}
Genie: Added a LoRAv3 reference/sample Genie configuration to the SDK examples. {130008}
Genie: Added the Eaglet dialog type. {126452}
Genie: Added token-acceptance-rate to the GenieProfile output for some dialog types. {123350}
Genie: Introduced a performance optimization where logits are sampled using the native datatype output of the model. {121359}
HTP: Deprecated optrace collection via debug configuration files. Use optrace via profiling instead. {124739}
HTP: Fixed an issue where the number of items was missing in the multicore callback. {129636}
HTP: Implemented service call to do dspqueue_close for multicore environments. {126381}
HTP: Introduced parallel graph execution, enabling concurrent execution of multiple graphs on a single HTP core to improve throughput and resource utilization. {89181}
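The concurrency pattern this enables can be illustrated with a generic sketch. The names below (run_graph, run_graphs_concurrently) are hypothetical stand-ins, not QNN APIs; the real feature schedules compiled graphs on the HTP core itself.

```python
from concurrent.futures import ThreadPoolExecutor

def run_graph(graph_id, inputs):
    # Hypothetical stand-in for a per-graph execute call; here it just
    # doubles each input so the sketch has observable output.
    return [x * 2 for x in inputs]

def run_graphs_concurrently(graphs, inputs):
    # Submit all graphs at once so they can overlap on shared hardware
    # instead of running strictly back to back.
    with ThreadPoolExecutor() as pool:
        futures = {gid: pool.submit(run_graph, gid, inputs) for gid in graphs}
        return {gid: f.result() for gid, f in futures.items()}
```

In the actual HTP feature the scheduling happens on-device, but the benefit is the same: idle cycles left by one graph can be used by another.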
HTP: Performance improvement for Softmax Op with 32 channels or less. {130819}
Op:GPU: Added support for GridSample Op. {127898}
Op:HTP: Optimized DepthWiseConv2d op execution by ensuring it runs on HMX. {128655}
Op:HTP: Optimized DepthwiseConv op performance for an ASR model on SM8750 HTP W8A16. {129860}
OpDef: Added dynamic shape support for FullyConnected Op. {116235}
OpDef: Added optional parameter buffer_padding to Buffer Op. {125962}
Tool:Converter: Added support for BQ and LPBQ in JSON serializer and deserializer. {132650}
Tool:Converter: Added support for quantized DLC files as input to the quantizer module: (1) if all tensors are quantized or overridden to float, the DLC is returned directly; (2) if the DLC is half-quantized, the fixed-point tensors are dequantized back to float before quantization; (3) all float tensors are then quantized. {129135}
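The three-case flow above can be sketched as follows. The tensor dictionaries and the scale-based quantize/dequantize helpers are hypothetical illustrations; the real quantizer operates on DLC graphs with full encodings, not Python lists.

```python
def process_dlc(tensors):
    # Case 1: nothing to do -- every tensor is already quantized or is
    # explicitly overridden to stay float.
    if all(t["quantized"] or t.get("float_override") for t in tensors):
        return tensors
    # Case 2: half-quantized DLC -- bring fixed-point tensors back to
    # float so the whole graph can be re-quantized uniformly.
    for t in tensors:
        if t["quantized"]:
            t["data"] = [x * t["scale"] for x in t["data"]]  # dequantize
            t["quantized"] = False
    # Case 3: quantize every float tensor that is not overridden.
    for t in tensors:
        if not t.get("float_override"):
            t["data"] = [round(x / t["scale"]) for x in t["data"]]
            t["quantized"] = True
    return tensors
```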
Tool:Converter: Added support to trigger Quantizer with float_fallback mode. {129131}
Tool:Converter: Fixed handling of dynamic input shapes with a more informative error message. {127631}
Tool:Converter: Introduced a new Converter argument, --export_format ["DLC_DEFAULT", "DLC_STRIP_QUANT"], to select the Converter output export format. {129132}
Tool:Converter: QAIRT Quantizer now skips quantization steps if float_fallback is specified for an input Quant DLC. {130397}
Tool:qnn-onnx-converter: Added the --preserve_onnx_output_order option to maintain ONNX output order in the converted graph. {126070}
QNN Core: Fixed an issue where QNN Savecontext failed for multiple models on Windows platforms due to the inability to find the
graph in the DLC. {130104}
CPU: Added int32 datatype support for ScatterElements. {126766}
CPU: Fixed L2Norm to handle multiple axes. {127053}
CPU: Fixed verifier failures for single-layer resize models on ONNX16 framework. {124524}
CPU: Implemented a deep copy of opConfig in the CPU backend to prevent model failures. {128204}
DSP: Fixed an SNPE inference failure due to QnnContext_createFromBinary failing with a memory allocation error. {127804}
DSP: Fixed an SNPE inference failure where multiple models failed due to errors obtaining input tensor names. {127809}
DSP: Fixed inference failures for specific models on HTP due to network partition issues. {131151}
GPU: Fixed accuracy error in QnnGpuOperationTestActivationAndroid. {125640}
GPU: Fixed accuracy error in QnnGpuOperationTestTransposeConvAndroid. {125992}
GPU: Fixed inference regressions on some devices for models containing a Convolution Op in gpu_fp16 mode. {120026}
Genie: Fixed issue in genie-t2t-run where dialog de-initialization data was not saved. {132621}
Genie: Fixed issue where GenieEmbedding_generate would return a rank of 0. {131581}
Genie: Fixed issue where quantized values may overflow or underflow. {125929}
HTP: Addressed inference time regressions on multiple chipsets for HTP and HTP_FP16 configurations. {128165}
HTP: Corrected the TransportResult resize function to properly set the number of cores. {132311}
HTP: Fixed a LayerNorm validation failure by checking rank of bias only if it’s present in LayerNorm Op. {106186}
HTP: Fixed a Windows compatibility issue related to non-shared weight VA reservation. {130567}
HTP: Fixed a crash in libQnnHtp.so that occurred in graph switch scenarios involving spill fill buffer sharing. {131575}
HTP: Fixed a deadlock in allocateAndMapPersistentSpillFillBuffer() that occurred due to locking conflicts. {132488}
HTP: Fixed a hang issue in GenAI TNR tests when using asynchronous group initialization with weight sharing and spill-fill sharing. {132586}
HTP: Fixed a multithreaded concurrency issue with LLM and small models that caused a ‘memHandles registration failure’. {131051}
HTP: Fixed a performance regression for a MobileBERT model that was introduced in a previous release. {132111}
HTP: Fixed a prepare failure for the L2Norm op with fp16 when the relaxed_precision_flag is not set during the converter stage. {129566}
HTP: Fixed an issue where QNN HTP inference failed during MC detailed profiling. {132564}
HTP: Fixed an issue where multiple VA sharing groups caused the error ‘Unable to map reserved buffer for non-shared weights’.
{131009}
HTP: Fixed an issue where qnn-context-binary-generator would hang, consuming excessive CPU and memory. {126833}
HTP: Fixed intermittent hangs that occurred during the creation of a context from a binary in concurrent scenarios. {131049}
HTP: Fixed checker failures in the OpPackage example by correcting the include path. {130707}
HTP: Improved performance to address inference time regressions observed on multiple chipsets. {131073}
HTP: Resolved an issue related to spill-fill buffer sharing, which caused incorrect output. {124544}
HTP: Resolved x86_prepare failures during savecontext and addressed high CPU utilization during graph preparation. {125093}
HTP: Resolved failures in LoRA v2 test cases due to DSP transport call issues, impacting multi-model context and graph switch
scenarios. {130142}
HTP: Resolved inference time regressions on SM8750. Avoided broadcast overhead on mul_op to improve performance of uint16
elementwise multiplication. {125746}
HTP: Reverted the enablement of the 64-bit flag to address reported hangs. {130301}
HTP: Updated the PGE support check to use the supported features of the SoC model. {127754}
LPAI: Fixed a failure in LPAI direct mode. {131750}
LPAI: Fixed an issue where LPAI single layer models were failing. {130729}
Op:DSP: Added LayerNorm support and modified the hard-coded check. {122112}
Op:HTP: Added 5D support for float Sigmoid. {128867}
Op:HTP: Addressed performance issues with w8a16 models relative to w8a8 on SM8350 by optimizing the MatMul and Gemm Ops. {121404}
Op:HTP: Fixed ReduceMax FP16 compilation error. {127900}
Op:HTP: Fixed a QNN context-binary-generator failure due to a TCM insufficient tile error when processing a custom model. {129510}
Op:HTP: Fixed context binary generation failures for ArgMin/ArgMax ops due to TCM overflow. {108763}
Op:HTP: Fixed model validation errors during context saving, specifically addressing issues with the DepthToSpace Op. {131083}
Op:HTP: Fixed numerical issue for DepthwiseConv2d -> HardSwish in a MobileNetV3 model. {128158}
Op:HTP: Fixed the rank constraints of an Op replacement rule. {130194}
Op:HTP: Improved DepthwiseConv2D performance. {126421}
Op:HTP: Optimized Reshape Ops when PCQ is enabled on constant tensors going into a MatMul Op, improving performance. {130415}
Op:HTP: Registered QInt16 for Concat Op to resolve graph preparation failures when using QuantInt16 tensors. {125735}
Op:HTP: Resolved an issue where context binary size calculation failed during graph preparation. {124130}
Op:HTP: Resolved an on-device hang issue during execution of Dynamic MobileNet V2, specifically during the Transpose Op. {126806}
Op:HTP: Resolved context binary generation failures for the BevFormer model with AMP encodings. {129991}
SDK: Fixed build issues in Qnn SampleApp, Qnn SampleAppAsyncExecution and Qnn SampleAppSharedBuffer. {131442}
SDK: Removed “pytorch to onnx conversion avoidance suggestions” from QNN SDK Docs. {132125}
SDK: ReleaseNotes.txt renamed to QAIRT_ReleaseNotes.txt and now contains release notes for both Unix and WoS. {127817}
SNPE: Fixed API Snpe_SNPEBuilder_SetInitCacheMode()/SNPEBuilder::setInitCacheMode() breakage for non-HTP backends when using the snpe-net-run option --enable_init_cache. {129545}
SNPE: Fixed the --enable_init_cache option (API SNPEBuilder::setInitCacheMode()/Snpe_SNPEBuilder_SetInitCacheMode()) in net-run for the AIP runtime. {131929}
Tool:Converter: Corrected an issue where qnn-context-binary-generator logged an incorrect QPC path when the --backend_binary option was used. {126169}
Tool:Converter: Corrected the allowed length for pad amounts for 4D tensors in the emitter. {132185}
Tool:Converter: Enabled data invariant optimizations for the Tile Op. If the input of the Tile Op is quantized, the input dataType and qInfo are copied to the output. {126372}
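The data-invariant rule above can be sketched as follows: Tile only repeats elements, so its output can reuse the input's dataType and quantization info verbatim. The dictionary field names here are hypothetical, not the converter's actual IR.

```python
def tile(tensor, reps):
    # Tile repeats the data without changing any values, so it is a
    # data-invariant op for quantization purposes.
    out = {"data": tensor["data"] * reps}
    if tensor.get("qinfo") is not None:
        # Propagate the input's dtype and quant encoding to the output
        # instead of requantizing.
        out["dtype"] = tensor["dtype"]
        out["qinfo"] = dict(tensor["qinfo"])
    return out
```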
Tool:Converter: Fixed Layout Transform to avoid unintentionally loading deferred weights. {132173}
Tool:Converter: Fixed a segfault issue in IrJsonDeserializer during deserialization of newly generated model JSON files. {129816}
Tool:Converter: Fixed an issue where Accuracy Evaluator runs failed at the Netrun stage. {129997}
Tool:Converter: Fixed an issue where FOLD_MULTIPLE_TRANSPOSE was incorrectly pruning graph outputs. {127963}
Tool:Converter: Fixed an issue where context binary generation failed with a ‘Graph Finalize failure’ when using multi-Qranium
pipelined partitioning. {124908}
Tool:Converter: Fixed an issue where qnn-context-binary generation failed for LVM UNet models due to tensor updateability and
GroupNorm Op validation errors with the HTP backend. {127887}
Tool:Converter: Fixed an issue where the qnn-context-binary-generator tool failed on Windows-X86 when processing LoRAv3 models.
{130894}
Tool:Converter: Fixed index error failure in remove identity optimization. {125867}
Tool:Converter: Fixed issue when folding multiple transposes to retain graph output names. {128685}
Tool:Converter: Resolved a serialization issue with MatMul ops involving int16*int16 data types when using dynamic 16-bit weights.
{129733}
Tool:Converter:ONNX: Added support for dynamic inputs for Clip Op. {124203}
Tool:Converter:ONNX: Fixed an issue in the Converter to ensure correct name sanitization following C++ naming conventions.
{129356}
Tool:Converter:ONNX: Fixed axis tracking in ScatterElements. {118614}
Tool:Converter:ONNX: Fixed issue for reverse GRU Op to ensure the correct order of input names for the first output. {130544}
Tool:Converter:ONNX: Updated translation for ExpandOp to reduce inference time. {127065}
Tool:qairt-accuracy-evaluator: Fixed issue where the input list was incorrectly passed to the quantizer. {130537}
Tool:qairt-accuracy-evaluator: Added support for the 'algorithms' quantizer parameter in the evaluator, and provided the input shape to the converter for PyTorch models. {126291}
Tool:qnn-accuracy-debugger: Enhanced the qnn-accuracy-debugger tool to provide more meaningful metrics for intermediate tensor
cosine similarity. {126437}
Tool:qnn-net-run: Resolved an issue in accuracy evaluator runs where the error “‘Namespace’ object has no attribute
‘preserve_graph_output_order’” was encountered. {132180}
Tool:qnn-onnx-converter: Aligned the ONNX Resize Op translator’s behavior with ONNX definitions. {123092}
Tool:snpe-architecture-checker: Fixed an issue where snpe-architecture-checker would fail due to an uninitialized variable.
{126778}
Tool:snpe-stress-net-run: Fixed a memory leak issue when loading QNN models. {128498}