Revision history

QAIRT SDK Version

Description

2.39.0

  • Genie C API 1.13.0:

    • Introduced new profiling option for collecting detailed trace events.

    • Added the GenieDialog_embeddingTokenQuery API.

    • Added the GenieDialog_setMaxNumTokens API.

    • Added the GENIE_STATUS_WARNING_CONTEXT_EXCEEDED status code to provide a specific status when a prompt exceeds the model’s context length limit.

  • Bugfixes:

    • Fixed an issue that caused incorrect calculation of KV cache tensor sizes on the HTP backend, which could lead to segmentation faults.

    • Fixed an issue where no output was generated for certain models when the prompt prefill phase required multiple graph executions.

2.38.0

  • Genie C API 1.12.0:

    • Added GENIE_NODE_IMAGE_ENCODER_IMAGE_FULL_ATTN_MASK and GENIE_NODE_IMAGE_ENCODER_IMAGE_WINDOW_ATTN_MASK node inputs.

  • Dialog JSON configuration:

    • Added engine sharing support for HTP and basic or SSD dialogs.

  • SDK:

    • Added embeddingQuery support in genie-app.

    • Added support for encoder-decoder models in Gen AI Transformer.

    • Added support for Cross Layer Attention in the HTP backend.

    • Added genie-t2e-run source code example.

  • Bugfixes:

    • Added missing type field in sampler.json config example.

    • Fixed issue loading the lm_head or LoRA adapters on Windows platforms.

    • Fixed segmentation fault due to uninitialized variables.

    • Fixed an issue where Eaglet token generation rate had regressed.

    • Fixed an issue where the prompt-template is not applied when GenieEmbedding_generate outputs are truncated.

    • Fixed issue where Genie could crash if the vocabulary size is less than the AR-N.

    • Fixed issue where a paused query with LUT encoder models failed to resume.

    • Fixed memory leaks during GenieDialog_applyLora.

    • Fixed reduced SSD acceptance rate when using cache-group with long context.

2.37.0

  • Genie C API 1.11.0:

    • Added GENIE_NODE_IMAGE_ENCODER_IMAGE_POS_SIN and GENIE_NODE_IMAGE_ENCODER_IMAGE_POS_COS node inputs.

  • SDK:

    • Added an async command to genie-app allowing for execution of asynchronous statements.

    • Added GenieEmbedding support to genie-app.

    • Added Eaglet dialog support for dual head draft models.

  • Dialog JSON configuration:

    • Added support for non-updatable quantization (NUQ) and grouped LoRA.

    • Added the cache-groups JSON configuration option allowing for the sliding window attention (SWA) cache management policy.

    • Introduced the SSD dialog “branch-mode” config option with “top-1” and “all-expand” supported values.

  • Bugfixes:

    • Fixed issue where SSD or SPD dialog types would crash on the aarch64-oe-linux-gcc11.2 platform.

    • Fixed segmentation fault when graph switching is enabled along with memory mapping.

    • Fixed minor memory leaks.

2.36.0

  • Genie C API 1.10.0:

    • Added support for profiling and logging of GenieEngine APIs.

    • Introduced Genie Dialog and Embedding APIs to set and get performance policy.

    • Added support for pausing and resuming active dialog queries.

  • SDK:

    • Added experimental support for arm64x-windows-msvc.

    • Included the Eaglet dialog implementation in the SDK source code example.

    • Added support for KV cache rewind after KV cache restore.

    • Added a performance optimization where tokenizers are shared across dialogs and embeddings when their tokenizer file paths are identical.

    • Added a performance optimization for improved KV cache conversion logic for kv-share dialogs.

  • Dialog JSON configuration:

    • Added skip-lora-validation option to reduce LoRA adapter switch time on HTP.

    • Added support for repetition penalties in sampling within the Genie sampler.

  • Bugfixes:

    • Fixed sampling for float16 models, which previously produced nonsensical response text.

    • Fixed LM head scheduling optimization when native sampling is enabled.

    • Fixed segmentation fault when model validation fails on HTP.

    • Fixed memory leak in the tokenizer implementation.

    • Reduced memory overhead for token embedding LUT encoders.

2.35.0

  • Genie C API 1.9.0:

    • Introduced the (experimental quality) GeniePipeline.h and GenieNode.h APIs which provide multimodal support.

    • Introduced the GenieTokenizer.h API.

  • SDK:

    • Introduced the (experimental quality) genie-app command-line utility.

    • Removed printing of KPIs to stdout by the Genie library.

  • Dialog JSON configuration:

    • Added support for use-mmap on Windows platforms.

    • Added a data-alignment-size configuration option for the dialog and embedding APIs.

  • Bugfixes:

    • Fixed issue where GenieProfiling data could report invalid init time data.

    • Fixed an issue detected by BinSkim in Genie.dll.

    • Fixed issue where stop sequence would not work with GenieDialog_embeddingQuery.

    • Fixed issue where save/restore would not work for Eaglet dialogs.

    • Fixed issue in Eaglet dialogs where an incorrect sentence code was passed to the callback.

2.34.0

  • Genie C API 1.8.0:

    • Added GenieEngine.h, GenieDialog_getEngine, and GenieDialog_bindEngine APIs.

    • Added GenieSampler_registerUserDataCallback API which adds a userData argument to the sampler custom callback.

    • Added token-acceptance-rate to the GenieProfile output for some dialog types.

    • Added the Eaglet dialog type.

  • SDK:

    • Introduced a performance optimization where logits are sampled using the native datatype output of the LLM.

  • Bugfixes:

    • Fixed genie-t2t-run issue where dialog de-init data was not saved.

    • Fixed issue where GenieEmbedding_generate would return a rank of 0.

    • Fixed issue where quantized values could overflow or underflow.

2.33.0

  • Genie C API 1.7.0:

    • Added the GenieLog.h API.

    • Added LoRA adapter switch latency to GenieProfile output.

    • Allowed the sampler type to be changed in GenieSampler_applyConfig.

  • Bugfixes:

    • Fixed issue where RPC memory handles were not unregistered.

    • Fixed issue where queries after a KV cache rewind resulted in poor text generation.

2.32.0

  • Genie C API 1.6.0:

    • Added dialog priority support with GenieDialog_setOemKey and GenieDialog_setPriority.

  • SDK:

    • Added Windows build support for the source code examples.

    • Reorganized the Genie SDK documentation.

    • Removed the shift concat and pointer shift KV cache update methods in favor of smart mask.

  • Bugfixes:

    • Fixed issue where SPD token rate is incorrectly reported when the query is aborted.

    • Fixed issue where multi-token stop sequences were not fully omitted in queryCallback and the KV cache.

    • Fixed issue where tokenizer state is corrupted after a query abort.

    • Fixed issue where a Gen AI Transformer dialog attempts to double free memory.

    • Fixed a qnn-genai-transformer-composer failure when preparing LoRA adapters.

    • Fixed a performance regression for kv-share dialogs using the token query API.

2.31.0

  • Genie C API 1.5.0:

    • Added GenieDialog_signal API.

  • Dialog JSON configuration:

    • Added dialog debug configuration option.

  • Bugfixes:

    • Improved numerical stability of embedding requantization in genie-t2t-run.

    • Fixed a crash in Genie that occurred while setting up attention masks and RoPE position embeddings for lookahead decoding dialogs.

    • Fixed issue with value data type in GenieProfile JSON output.

2.30.0

  • Genie C API 1.4.0:

    • Added GenieProfile.h APIs.

    • Added GENIE_DIALOG_SENTENCE_REWIND sentence code option.

    • Added support for dialog custom sampler implementations.

    • Added GenieDialog_setStopSequence API to allow updating the stop sequence configuration between dialog queries.

  • Bugfixes:

    • Fixed issue causing FP16 model validation failures.

    • Fixed issue where the end sentence code was not provided on a stop sequence match.

2.29.0

  • SDK:

    • Improved prompt processing time for SSD dialogs.

    • Included the GenieEmbedding implementation in the SDK source code example.

    • Reduced the size of libGenie.so built from the source example.

  • Genie C API 1.3.0:

    • Added GenieSampler.h API.

  • Dialog JSON configuration:

    • Added QNN GPU engine type.

    • Added enable-graph-switching dialog JSON configuration option.

    • Added LoRA V1 support.

    • Added LoRA V2 support in GenAiTransformer backend.

    • Added JSON config support for longrope.

    • Added support for multistream SSD dialogs.

    • Added support for multistream embedding to text dialog.

  • Bugfixes:

    • Fixed issue in genie-t2t-run that could result in a double free of a dialog config handle.

    • Fixed issue where KV cache restore could hang on aarch64-windows-msvc.

    • Fixed handling of rope-theta and rope-scaling configuration.

2.28.0

  • SDK:

    • Added genie-t2e-run application and sample config for GenieEmbedding.h.

    • Added llama-3-8b JSON config example for HTP.

  • Genie C API 1.2.0:

    • Added GenieEmbedding.h.

    • Added GenieDialog_applyLora and GenieDialog_setLoraStrength.

    • Added GenieDialog_tokenQuery. (Supported for basic dialog type only).

    • Added GenieDialog_save and GenieDialog_restore.

  • Dialog JSON configuration:

    • Added self-speculative decoding (SSD) dialog type.

    • Added speculative decoding (SPD) dialog type.

    • Added lookahead decoding (LADE) dialog type.

    • Added multistream dialog type.

    • Added rope-scaling.

    • Added support for multiple EOS tokens.

    • Added alibi and absolute positional encoding support.

    • Added SSD support for GenieDialog_embeddingQuery.

  • Bugfixes:

    • Fixed issue where mmap-budget was unused.

    • Fixed link issue with libGenie.so on aarch64-oe-linux-gcc11.2.

    • Fixed memory leak in model loading when setting use-mmap to false.

2.27.0

  • SDK:

    • Added genie-t2t-run source code example.

    • Added llama-3-8b JSON config example for HTP.

  • Genie C API 1.1.0:

    • Added GenieDialog_embeddingQuery API with corresponding tool and configuration support.

  • Dialog JSON configuration:

    • Added kv-share dialog type which provides support for KV cache transfer between HTP and GenAiTransformer backends.

    • Added max-num-tokens dialog configuration option.

  • Bugfixes:

    • Fixed issue where IO encodings were not updated when a LoRA adapter was applied.

    • Worked around an issue where a segmentation fault occurred after GenieDialog_free when using the HTP backend.

2.26.0

  • Dialog JSON configuration:

    • Added eot-token configuration option.

    • Added rope-theta configuration option.

    • Added support for async initialization and added the allow-async-init config option.

    • Added stop-sequence configuration option to enable dialog query cancellation based upon response text matching.

  • Bugfixes:

    • Fixed issue where incorrect Windows .lib files were packaged.

    • Fixed issue where an unknown genie-t2t-run option did not generate an error.

    • Fixed memory allocation failures during HTP initialization.

2.25.0

  • Genie C API 1.0.0:

    • Genie C API moves into production.

  • Dialog JSON configuration:

    • Introduced GenieDialog JSON configuration format.

2.23.0

  • Genie C API 0.1.0:

    • Added GenieCommon.h and GenieDialog.h.