API: Added LLM support in the Python API. {118016}
API:Genie: Added a data-alignment-size configuration option for dialog and embeddings APIs. {130270}
API:Genie: Introduced the GeniePipeline.h and GenieNode.h APIs, providing multimodal support. {123389}
API:Genie: Introduced the GenieTokenizer.h API. {126408}
API:HTP: Added support for new memory buffer types (QNN_HTP_MEM_WEIGHTS_BUFFER and QNN_HTP_MEM_SCRATCH_BUFFER) in the
QnnMem_register and QnnMem_deregister APIs. {121766}
API:HTP: Introduced API changes to support external weights and spill/fill buffers. {121760}
CPU: Added Phi 3 and Phi 3.5 model configurations to the Genie SDK. {134117}
CPU: Added support for dangling inputs in graphs. {134280}
Core: Added platform information to the JSON output of the context binary utility. {129905}
Docs: Updated QNN/SNPE documentation to include QCS8625 in the list of supported Snapdragon devices. {134450}
Genie: Added support for use-mmap on Windows platforms. {116519}
Genie: Enabled low-latency multimodal inference through the Genie pipeline, supporting various input/output modalities and
utilizing shared embedding weights. {120507}
Genie: Removed printing of KPIs to stdout, favoring use of GenieProfile. {123352}
HTP: Added initial support for multi-core weight sharing during deserialization, including functions to handle per-core VA
allocation for weights and to pass multi-core metadata. {124612}
HTP: Added multicore weight sharing support during deserialization to map shared weights to different cores without requiring VA
reservations. {135411}
HTP: Added support for configuring extended_udma prepare time. {136435}
HTP: Added support for measuring end-to-end latency in the runtime. {98570}
HTP: Added support for the QNN_HTP_CONTEXT_CONFIG_OPTION_DEFER_GRAPH_INIT context configuration option to postpone graph-related
tasks. {130605}
HTP: Added support for the QNN_HTP_CONTEXT_GET_PROP_BUFFER_START_ALIGNMENT context property to retrieve buffer start alignment.
{134678}
HTP: Added support for the usage of external weights and scratch buffers on the HTP backend. {121767}
HTP: Added support to save the transport result for multicore transport during async execution. {132146}
HTP: Enabled support for dynamic input and output resolution for SD3 on the HTP backend. {105781}
HTP: Enabled the mmap budget feature for WoS to reduce peak RAM usage during context initialization for GenAI use cases. {131070}
HTP: Extended binary format support for spill/fill to include external buffers. {136017}
HTP: Implemented buffer size calculations for the HTP backend, including consideration for graph selection and calculation of
maximum spill/fill buffer size. {121765}
HTP: Updated the Throughput Net Run (TNR) application to utilize thread_pool utilities for thread management. {113123}
Op:CPU: Added dynamic dimension support for AvgPool2D. {126775}
Op:CPU: Added dynamic dimension support for InstanceNorm Op. {101384}
Op:CPU: Added support for the 'frame_pad' parameter in Buffer Op. {133242}
Op:GPU: Added support for the Cast operation from INT64 to INT32 on Windows. {132750}
Op:HTP: Added INT16 support for the ElementWiseAsin Op on the HTP backend. {114479}
Op:HTP: Added support for the MaskedSoftmax Op on the HTP backend for LLM use cases. {110661}
Op:HTP: Implemented performance optimizations for the Score Filter and NMS operations on the HTP backend. {134740}
OpDef: Added Op definition for IsInf. {125370}
SDK: Added an option to enable optrace profiling in the TNR application. {135588}
SDK: Enabled SNPE, QNN, and QNN delegate support for the QCM8550 platform. {129533}
Tool:Converter: Added dynamic weights support for the Deconv Op in TensorFlow models. {109713}
Tool:Converter: Added support for Add, Subtract, Multiply, and Divide operations in Float32 precision for static tensor
manipulation within the G2G IR. {125540}
Tool:Converter: Added support for ONNX 1.16.1 in the Ubuntu 20.04 (Focal) environment. {134975}
Tool:Converter: Added support for the Size operation and updated Relu opset versions in the ONNX converter to address unsupported
operations in certain models. {133472}
Tool:Genie: Introduced the genie-app command-line utility. {123548}
Tool:HTP: Added support for the HTP MCP Binary format in the QnnHtpBinaryBufferPrinter tool, enabling proper parsing and
printing of MCP binaries. {128507}
API: Allowed passing extra arguments through the Python API's ConverterConfig to underlying modules. {133985}
API: Fixed an encodings path issue during the build phase with GenAI models using the Python API. {133815}
API: Fixed an issue where quantized and compiled models failed during execution with the Python API when using default
CalibrationConfig values. {134858}
API: Fixed an issue where the QAIRT Python API failed to load backend libraries (QnnCpu.dll/QnnHtp.dll) on certain devices.
{134461}
API: Fixed an issue with the JSON reader setting in QNN profiling on Windows. {134565}
CPU: Fixed a memory management issue for XNNPACK Conv2D nodes. {132710}
CPU: Fixed an issue where certain models failed during inference due to an invalid layer parameter value resulting from a
GroupNorm operation failure. {135924}
Core: Fixed cross-SoC compatibility issues caused by unsynchronized GpuInfo fields between SocServer and SocUtility. {135786}
DSP: Fixed a context binary generation issue on the OE Linux platform. {124376}
DSP: Fixed an issue where snpe-net-run failed due to an unavailable runtime. {135399}
DSP: Fixed inference time regressions observed on HTP_FP16 and HTP backends by propagating DSP architecture characteristics to the
HTP core. {133777}
GPU: Resolved model verification failures encountered with certain CNN models on the GPU backend, related to Conv Kernel
processing. {130041}
Genie: Fixed an asynchronous initialization issue for Windows platforms. {135904}
Genie: Fixed an issue where GenieDialog_save/restore could not be used with GENIE_DIALOG_SENTENCE_REWIND. {135558}
Genie: Fixed an issue where GenieProfiling data could report invalid initialization time data. {134498}
Genie: Fixed an issue where stop sequences did not work with GenieDialog_embeddingQuery. {134592}
HTP: Adjusted max PD size calculation to correctly account for far weights, resolving an issue with unexpected secondary PD
triggers during specific test conditions. {127268}
HTP: Fixed a stability issue with Llama 3 3B multicore models by updating the method for setting the mc_spill_fill buffer.
{135253}
HTP: Fixed a crash occurring in multicore graphs due to incorrect identification of spillfill memory pools by the Hexagon NN API.
{135543}
HTP: Fixed an issue where qnn-net-run failed to open a session due to library loading and device transport instance creation
errors. {135028}
HTP: Fixed an issue where core information was not correctly captured in optrace for multicore execution. {133797}
HTP: Fixed an out-of-memory issue occurring when running Llama 3 8B models on a single core without splitting. {134696}
HTP: Fixed async execution failures observed while running certain models in a multicore configuration with shared buffers.
{135047}
HTP: Fixed a bug in the graph-switching logic. {133794}
HTP: Fixed multicore async inference failures, including issues observed with zero-copy buffers. {134701}
HTP: Improved model execution time performance on SM8750, addressing an issue where the execution time KPI was not being met.
{128145}
HTP: Resolved a graph execution failure issue observed during the async_group_init_llama7b_graph_switch_no_shared_resources test.
{126402}
HTP: Resolved an issue causing incorrect mapping of test failures in nightly reports. {125884}
HTP: Resolved an issue leading to a "Failed to deregister ion memory with the backend" log message during multi-threaded HTP
binary execution with shared buffers. {129716}
HTP: Resolved differences in adapter switch time between Genie and qnn-net-run by addressing issues related to graph switching
and power settings. {131776}
Op:CPU: Fixed TransposeConv2d for asymmetric kernels in Float execution. {133778}
Op:CPU: Fixed an issue with the GroupNorm Op by adding INT8 support. {135932}
Op:GPU: Fixed accuracy errors with the ReduceSum operation when used with Image2DArray for non-Mean ops and specific dimensions.
{131616}
Op:GPU: Fixed inference failures in models with Argmax/Argmin Ops. {133052}
Op:HTP: Added support for LayerNorm when the constant input is FP16 converted to FP32. {131420}
Op:HTP: Enabled UINT8 data type support for the StridedSlice Op on the HTP backend, resolving model conversion and graph
preparation failures. {125597}
Op:HTP: Fixed an accuracy issue with the GatherNd Op. {110126}
Op:HTP: Fixed an accuracy issue with LPBQ convolution for MOE on v73. {133134}
Op:HTP: Fixed an issue on WoS where Genie output entered an infinite loop by updating the prompt file. {134680}
Op:HTP: Fixed an issue with high power consumption for DepthwiseConv op with asymmetric stride by optimizing the pattern on the
HTP backend. {133635}
Op:HTP: Improved accuracy of the Swish Op. {133898}
Op:HTP: Improved performance of the MatMul Op running on HVX. {135210}
Op:HTP: Improved the performance of the 5D GridSample Op on the HTP backend for W8A16 quantization. {122831}
Op:HTP: Improved the performance of the GridSample Op on the HTP backend by addressing tiling and scheduling issues. {126462}
SDK: Fixed an issue where some models failed at the concat operation during graph preparation. {132887}
Tool: Added a validation check for float fallback to prevent quantizer failures when encodings or calibration lists are not
provided. {133463}
Tool: Added support for the --onnx_batch and --tensorflow_batch options in Hypertuner after QAIRT converter changes. {131064}
Tool: Eliminated a misleading warning message "Function not called, PrepareLib isn't loaded!" that would appear when running
qnn-net-run successfully on HTP. {122382}
Tool: Fixed an issue where the is_symmetric value for 32-bit bias tensors was incorrectly reset during Float Fallback, causing
failures when the output DLC was passed back to the quantizer. {135379}
Tool: Fixed the quantizer to insert a Convert Op for LayerNorm weights with external encodings. {134466}
Tool: Resolved an issue where snpe-dlc-graph-prepare failed for certain models due to incompatible float bitwidths when QParams
were present, particularly in the float fallback path. {130558}
Tool:Converter: Fixed a bug in LayerNorm squeeze_axes handling. {126234}
Tool:Converter: Added a pattern that maps to the Expand Op to reduce inference time. {132363}
Tool:Converter: Added a warning message for the Non-Zero Op when the output shape is dynamic. {126185}
Tool:Converter: Added support for a new einsum equation, expanding the range of supported ONNX models. {133824}
Tool:Converter: Converter-generated FullyConnected Ops now have 2D input and 2D output. {127049}
Tool:Converter: Ensured that ApplyEncodings is called by the quantizer when --use_quantize_v2 is enabled internally, even if it
is not provided on the command line. {133705}
Tool:Converter: Fixed JSON dumping for 4-bit quantized tensors. {133481}
Tool:Converter: Fixed KernelScale expansion for scalars in TFLite DeConv dequantization. {128978}
Tool:Converter: Fixed a bug in NonZero Op translation constant folding. {127165}
Tool:Converter: Fixed a bug in the squash_node_into_nn_node optimization. {126354}
Tool:Converter: Fixed a conversion error that occurred when --float_bitwidth 16 was provided on the command line with existing
quantization parameters. {134716}
Tool:Converter: Fixed a corner case in the DCE process in the converter to correctly handle node removal based on the number of
consumers of output tensors. {129704}
Tool:Converter: Fixed an error in the squash_node_into_nn_node optimization. {132836}
Tool:Converter: Fixed an issue where output nodes for BatchMatMul and BatchMatMulV2 Ops were missing by adding support to convert
them to FullyConnected Op. {127139}
Tool:Converter: Fixed an issue where the converter failed when using the --desired_input_layout argument with the new layout
transform algorithm by unifying its behavior with custom_io. {136144}
Tool:Converter: Fixed an issue with 6D support for Concat and Constant Ops in the frontend, resolving a core dump error during
quantization. {117698}
Tool:Converter: Fixed incorrect population of the "is_symmetric" flag, ensuring encodings are dumped correctly. {134673}
Tool:Converter: Fixed an issue observed when several GRU Ops share one initial hidden state, and added a unit test for
bidirectional GRU. {91127}
Tool:Converter: Resolved an accuracy regression issue related to the squash_batchnorm optimization in the converter by ensuring
the optimization correctly handles encodings. {130130}
Tool:Converter: Skipped adding dummy weights and bias tensors during LayerNorm pattern matching. {128870}
Tool:Converter:ONNX: Added a fix for axis_format handling in matmul_to_fc translation. {118318}
Tool:Converter:ONNX: Fixed a model conversion issue with the Resize operation in the ONNX converter. {131677}
Tool:Converter:ONNX: Fixed an ONNX conversion failure for the Sam2 Image Encoder model by addressing layout format issues for
Matmul node inputs and outputs. {131098}
Tool:Op:HTP: Optimized the DepthwiseConv op with asymmetric stride to improve performance for specific models. {132474}
Tool:accuracy_debugger: Corrected a tensor shape issue for the oneshot algorithm with ONNX batch=1; the onnx_batch override option
is no longer accessible. {133915}
Tool:qairt-accuracy-evaluator: Removed the preproc-file option from the Accuracy Evaluator CLI as it is no longer valid due to the
deprecation of minimal mode. {129278}
Tool:qnn-onnx-converter: Fixed an issue where static tensor framework trace information was missing for some tensors. {120982}
Tool:qnn-tensorflow-converter: Added logic to ensure the min-max in TensorFlow FakeQuantPerChannel nodes are symmetric. {118672}
Tool:quantizer: Fixed an issue with 2-bit weight quantization calculation, resolving incorrect output values. {132048}