Tutorials

This section contains tutorials that help users become familiar with the Genie workflow. The tutorials are organized by backend: the QNN GenAITransformer backend, the QNN GPU backend, and the QNN HTP backend.

Note

Please refer to Setup before starting any of the tutorials.

QNN GenAITransformer backend workflow

The QNN GenAITransformer backend provided with Genie leverages the QNN op package interface to represent an entire LLaMA model as a single op. The model execution engine is provided via the QnnGenAiTransformerCpuOpPkg op package library. Genie also packages a prebuilt QnnGenAiTransformerModel model library; the corresponding source can be found at ${QNN_SDK_ROOT}/examples/Genie/Model/model.cpp. Because the QNN GenAITransformer backend model and op package are prebuilt, this backend only requires the qnn-genai-transformer-composer tool for model preparation.

Model conversion

The following section demonstrates converting a model using qnn-genai-transformer-composer.

Model conversion on Linux and Android

Open a command shell on Linux host and run:

# Make sure the environment is set up per the Setup instructions, or cd into the bin folder on the Linux host
cd ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/
./qnn-genai-transformer-composer --quantize Z4 \
                                 --outfile <output filename with complete path>.bin \
                                 --model <path-to-downloaded-LLama-model-directory>

Model conversion on Windows

Open Developer PowerShell for VS2022 on Windows host and run:

# Make sure the environment is set up per the Setup instructions, or cd into the bin folder on the Windows host
cd ${QNN_SDK_ROOT}\bin\x86_64-windows-msvc
python .\qnn-genai-transformer-composer --quantize Z4 `
                                        --outfile <output filename with complete path>.bin `
                                        --model <path-to-downloaded-LLama-model-directory>

Model configuration

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b/llama2-7b-genaitransformer.json. Note that the tokenizer path and model bin fields will need to be updated based on your actual preparation steps.
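As a hedged illustration only, the fields to update typically look like the sketch below. The key names and nesting here are hypothetical placeholders; copy the actual structure from the shipped llama2-7b-genaitransformer.json rather than from this fragment.

```json
{
  "dialog": {
    "tokenizer": {
      "path": "<path to tokenizer.json>"
    },
    "engine": {
      "model": {
        "binary": {
          "path": "<path to converted model>.bin"
        }
      }
    }
  }
}
```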

Model execution

The following section demonstrates running a model on the QNN GenAITransformer backend using genie-t2t-run.

Model execution on Linux

Open a command shell on Linux host and run:

# Make sure the environment is set up per the Setup instructions, or cd into the bin folder on the Linux host
cd ${QNN_SDK_ROOT}/bin/x86_64-linux-clang
./genie-t2t-run -c <path to cpu_model_config.json> \
                -p "Tell me about Qualcomm"

Model execution on Android

Open a command shell on Linux host and run:

# make sure a test device is connected
adb devices

# push artifacts to device
adb push ${QNN_SDK_ROOT}/bin/aarch64-android/genie-t2t-run /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libGenie.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformer.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformerCpuOpPkg.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformerModel.so /data/local/tmp/
adb push <path to tokenizer.json> /data/local/tmp/
adb push <path to cpu-model-config.json> /data/local/tmp/
adb push <path to model bin file, e.g. <path-to-downloaded-LLama-model-directory>/model.bin> /data/local/tmp/

# open adb shell
adb shell

export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to cpu-model-config.json> \
                -p "Tell me about Qualcomm"

Model execution on Windows

Open Developer PowerShell for VS2022 on Windows on Snapdragon host and run:

# Make sure the environment is set up per the Setup instructions, or cd into the bin folder on the Windows host
cd ${QNN_SDK_ROOT}\bin\aarch64-windows-msvc
.\genie-t2t-run.exe -c <path to cpu-model-config.json> `
                    -p "Tell me about Qualcomm"

BGE-large model inference using GenAiTransformer on Android

See Genie Embedding JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/bge-large-genaitransformer.json. Note that the tokenizer path and model bin fields will need to be updated based on your actual preparation steps.

Note

Use qnn-genai-transformer-composer without the --quantize Z4 option to generate the model binary.

To run on the QNN GenAiTransformer backend, open a command shell on the host and run the following:

Note

Results will be saved to an output.raw file in the working directory.

# make sure a test device is connected
adb devices

# push artifacts to device
adb push ${QNN_SDK_ROOT}/bin/aarch64-android/genie-t2e-run /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libGenie.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformer.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformerCpuOpPkg.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformerModel.so /data/local/tmp/
adb push <path to tokenizer.json> /data/local/tmp/
adb push <path to cpu-model-config.json> /data/local/tmp/
adb push <path to model bin file, e.g. <path-to-downloaded-BGE-model-directory>/model.bin> /data/local/tmp/

# open adb shell
adb shell

export LD_LIBRARY_PATH=/data/local/tmp

cd $LD_LIBRARY_PATH
./genie-t2e-run -c <path to bge-large-genaitransformer.json> \
                -p "Tell me about Qualcomm"
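After the run, the embedding written to output.raw can be pulled back to the host (adb pull /data/local/tmp/output.raw) and inspected. The helper below is a minimal sketch, assuming output.raw holds a flat array of 32-bit floats; confirm the dtype and layout against your model's output tensor before relying on it.

```python
import numpy as np

def load_embedding(path, dtype=np.float32):
    """Load a raw tensor dump (like output.raw from genie-t2e-run) and
    L2-normalize it. Assumes a flat float32 layout; adjust for your model."""
    vec = np.fromfile(path, dtype=dtype)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Demo with a synthetic file standing in for the device's output.raw:
np.asarray([3.0, 4.0], dtype=np.float32).tofile("output.raw")
emb = load_embedding("output.raw")
print(emb)  # a unit-length vector
```

Normalizing is a common post-processing step before computing cosine similarities with BGE-style embeddings; skip it if your application needs the raw vector.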

Building the example model library

Building an example model library is optional.

In the case of the GenAITransformer backend, the model is composed of a single custom op implemented by the prebuilt libQnnGenAiTransformerCpuOpPkg.so and QnnGenAiTransformerCpuOpPkg.dll op packages.

Genie provides prebuilt libQnnGenAiTransformerModel.so and QnnGenAiTransformerModel.dll libraries, as described in the Introduction. The source for these libraries is provided in ${QNN_SDK_ROOT}/examples/Genie/Model/model.cpp. This section shows how to compile this source into a model library consumable by Genie.

Model build on Linux host

Open a command shell on Linux host and run:

$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-model-lib-generator \
  -c ${QNN_SDK_ROOT}/examples/GenAiTransformer/Model/model.cpp \
  -o ${QNN_SDK_ROOT}/examples/GenAiTransformer/Model/model_libs # This can be any path

This will produce the following artifacts:

  • ${QNN_SDK_ROOT}/examples/GenAiTransformer/Model/model_libs/aarch64-android/libqnn_model.so

  • ${QNN_SDK_ROOT}/examples/GenAiTransformer/Model/model_libs/x86_64-linux-clang/libqnn_model.so

By default, libraries are built for all targets. To compile for a specific target, use the -t <target> option with qnn-model-lib-generator. Choices of <target> are aarch64-android and x86_64-linux-clang.

QNN GPU backend workflow

The following tutorial demonstrates running a model on the QNN GPU backend using genie-t2t-run.

Note

This section assumes that the QNN GPU context binaries have been obtained via the QNN workflow.

GPU Backend Example Model Config

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b/llama2-7b-gpu.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps.

LLaMA model inference on Android

To run on the QNN GPU backend, open a command shell on the host and run the following:

adb shell mkdir -p /data/local/tmp/
adb push ${QNN_SDK_ROOT}/bin/aarch64-android/genie-t2t-run /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libGenie.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGpu.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnSystem.so /data/local/tmp/
adb push <path to llama2-7b-gpu.json> /data/local/tmp/
adb push <path to tokenizer.json> /data/local/tmp/
adb push <path to model bin file> /data/local/tmp/

# open adb shell
adb shell

export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-gpu.json> \
                -p "Tell me about Qualcomm"

QNN HTP backend workflow

The following tutorial demonstrates running a model on the QNN HTP backend using genie-t2t-run.

Note

This section assumes that the QNN HTP context binaries have been obtained via the QNN workflow.

HTP Backend Example Model Config and Backend Extension Config

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b/llama2-7b-htp.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps. There is also a Windows-specific configuration file located at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b/llama2-7b-htp-windows.json.

An example backend_ext_config.json can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/htp_backend_ext_config.json.

For more information on the QNN HTP backend extension configuration options, please refer to ${QNN_SDK_ROOT}/docs/QNN/general/htp/htp_backend.html.

LLaMA model inference on Android

To run on the QNN HTP backend, open a command shell on the host and run the following. This assumes that the HTP architecture has been set (e.g., ARCH=75).

adb shell mkdir -p /data/local/tmp/
adb push ${QNN_SDK_ROOT}/bin/aarch64-android/genie-t2t-run /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libGenie.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtp.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnSystem.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpNetRunExtensions.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV${ARCH}Stub.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/hexagon-v${ARCH}/unsigned/libQnnHtpV${ARCH}Skel.so /data/local/tmp/
adb push <path to htp_backend_ext_config.json> /data/local/tmp/
adb push <path to llama2-7b-htp.json> /data/local/tmp/
adb push <path to tokenizer.json> /data/local/tmp/
adb push <path to model bin files> /data/local/tmp/

# open adb shell
adb shell

export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-htp.json> \
                -p "What is the most popular cookie in the world?"

LLaMA model inference on Windows

Open Developer PowerShell for VS2022 on Windows on Snapdragon host and run:

# Make sure the environment is set up per the Setup instructions, or cd into the bin folder on the Windows host
cd ${QNN_SDK_ROOT}\bin\aarch64-windows-msvc
.\genie-t2t-run.exe -c <path to llama2-7b-htp.json> `
                    -p "Tell me about Qualcomm"

LLaMA-2-7b model inference using SSD-Q1 on Android

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b-htp-ssd.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps.

Note

Use the LLaMA-2-7b notebooks for generating AR-N models.

To run using SSD on the QNN HTP backend, open a command shell on the host and run the following. This assumes that the HTP architecture has been set (e.g., ARCH=79). Use the steps above to push the libraries, binaries, tokenizer, and backend_ext_config.

adb shell mkdir -p /data/local/tmp/
adb push <path to llama2-7b-htp-ssd.json> /data/local/tmp/
adb push <path to forecast-prefix-dir> /data/local/tmp/

# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-htp-ssd.json> \
                -p "What is the most popular cookie in the world?"

LLaMA-2-7b model inference using LADE on Android

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b-htp-lade.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps.

Note

Use the LLaMA-2-7b notebooks for generating AR-N models.

To run using LADE on the QNN HTP backend, open a command shell on the host and run the following. This assumes that the HTP architecture has been set (e.g., ARCH=79). Use the steps above to push the libraries, binaries, tokenizer, and backend_ext_config.

adb shell mkdir -p /data/local/tmp/
adb push <path to llama2-7b-htp-lade.json> /data/local/tmp/

# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-htp-lade.json> \
                -p "What is the most popular cookie in the world?"

LLaMA-2-7b model LoRA inference using HTP on Android

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b-htp-lora.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps.

Note

Use the LLaMA-2-7b notebooks for generating AR-N models.

To run using LoRA on the QNN HTP backend, open a command shell on the host and run the following. This assumes that the HTP architecture has been set (e.g., ARCH=79). Use the steps above to push the libraries, binaries, tokenizer, and backend_ext_config.

adb shell mkdir -p /data/local/tmp/
adb push <path to llama2-7b-htp-lora.json> /data/local/tmp/
adb push <path to lora bin files> /data/local/tmp/

# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-htp-lora.json> \
                -p "What is the most popular cookie in the world?" \
                -l lora1,alpha,0.5

Model download

Download the Llama-2-7b LoRA adapter (a French-to-English translator) from https://huggingface.co/kaitchup/Llama-2-7b-mt-French-to-English.

Model conversion

The following section demonstrates converting a LoRA adapter using qnn-genai-transformer-composer.

Model conversion on Linux

Open a command shell on Linux host and run:

# Make sure the environment is set up per the Setup instructions, or cd into the bin folder on the Linux host
# Additionally do the following
export LD_LIBRARY_PATH=${QNN_SDK_ROOT}/lib/x86_64-linux-clang:$LD_LIBRARY_PATH
export PYTHONPATH=${QNN_SDK_ROOT}/lib/python/qti/aisw/genai:$PYTHONPATH

# lora adapter conversion command
cd ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/
./qnn-genai-transformer-composer --model <path-to-downloaded-LLama-model-directory> \
                                 --outfile <output filename with complete path>.bin \
                                 --lora <path-to-downloaded-Lora-adapter-directory>

Model configuration

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b/llama2-7b-genaitransformer-lora.json. Note that the tokenizer path and model bin fields will need to be updated based on your actual preparation steps.

Model execution

The following section demonstrates running a model on the QNN GenAiTransformer backend using genie-t2t-run.

Model execution on Linux

Open a command shell on Linux host and run:

# Make sure the environment is set up per the Setup instructions, or cd into the bin folder on the Linux host
cd ${QNN_SDK_ROOT}/bin/x86_64-linux-clang
./genie-t2t-run -c <path to llama2-7b-genaitransformer-lora.json> \
                -p "Le certificat peut être imprimé dans une ou plusieurs langues de la convention et doit être complété dans l'une de ces langues." \
                --lora lora1,alpha,1

Model execution on Android

Open a command shell on Linux host and run:

# make sure a test device is connected
adb devices

# push artifacts to device
adb push ${QNN_SDK_ROOT}/bin/aarch64-android/genie-t2t-run /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libGenie.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformer.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformerCpuOpPkg.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGenAiTransformerModel.so /data/local/tmp/
adb push <path to tokenizer.json> /data/local/tmp/
adb push <path to llama2-7b-genaitransformer-lora.json> /data/local/tmp/
adb push <path to model bin file, e.g. <path-to-converted-LLama-model-directory>/model.bin> /data/local/tmp/
adb push <path to lora adapter bin file, e.g. <path-to-converted-lora-adapter-directory>/lora_adapter.bin> /data/local/tmp/
# open adb shell
adb shell

export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-genaitransformer-lora.json> \
                -p "Le certificat peut être imprimé dans une ou plusieurs langues de la convention et doit être complété dans l'une de ces langues." \
                --lora lora1,alpha,1

LLaMA-2-7b model inference using SPD on Android

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b-draft-htp-target-htp-spd.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps.

Note

Use the LLaMA-2-7b notebooks for generating AR-N models for the target.

Note

Use small LLM models (e.g., 115M parameters) for the draft.

To run using SPD on the QNN HTP backend, open a command shell on the host and run the following. This assumes that the HTP architecture has been set (e.g., ARCH=79). Use the steps above to push the libraries, binaries, tokenizer, and backend_ext_config.

Note

If using different backends for the target and draft, ensure that the libraries and binaries for both engines are present.

adb shell mkdir -p /data/local/tmp/
adb push <path to llama2-7b-draft-htp-target-htp-spd.json> /data/local/tmp/

# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-draft-htp-target-htp-spd.json> \
                -p "What is the most popular cookie in the world?"

LLaMA-2-7b model inference using KV-SHARE on Android

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b-genaitransformer-htp-kv-share.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps.

Note

Use the LLaMA-2-7b notebooks for generating AR-N models.

KV-SHARE uses the QNN HTP backend for prompt processing and the QNN GenAITransformer backend for token generation. To run using a KV-SHARE dialog, open a command shell on the host and run the following. Use the steps above to push the libraries, binaries, tokenizer, and backend_ext_config.

Note

When using different backends for the primary and secondary engines, ensure that the libraries and binaries for both engines are present.

adb shell mkdir -p /data/local/tmp/
adb push <path to llama2-7b-genaitransformer-htp-kv-share.json> /data/local/tmp/

# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH
export ADSP_LIBRARY_PATH=$LD_LIBRARY_PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-genaitransformer-htp-kv-share.json> \
                -p "What is the most popular cookie in the world?"

BGE-large model inference using HTP on Android

See Genie Embedding JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/bge-large-htp.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps.

Note

Use the regular QNN workflow to obtain the required context binaries for the BGE model.

To run on the QNN HTP backend, open a command shell on the host and run the following. This assumes that the HTP architecture has been set (e.g., ARCH=79). Use the steps above to push the libraries, binaries, tokenizer, and backend_ext_config.

Note

Results will be saved to an output.raw file in the working directory.

adb shell mkdir -p /data/local/tmp/
adb push <path to bge-large-htp.json> /data/local/tmp/

# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH

cd $LD_LIBRARY_PATH
./genie-t2e-run -c <path to bge-large-htp.json> \
                -p "What is the most popular cookie in the world?"

Genie sample tutorial

Warning

libGenie.so is subject to change without notice.

Genie sample pre-requisites

Building libGenie.so has three external dependencies:
  1. clang compiler

  2. ndk-build (for Android targets only)

  3. Rust

If the clang compiler is not available in your system PATH, the ${QNN_SDK_ROOT}/bin/check-linux-dependency.sh script provided with the SDK can be used to install it and prepare your environment. Alternatively, you can install these dependencies yourself and make them available in your PATH.

Command to automatically install required dependencies:

$ sudo bash ${QNN_SDK_ROOT}/bin/check-linux-dependency.sh

For the second dependency, ndk-build needs to be available, which you can check with:

$ ${QNN_SDK_ROOT}/bin/envcheck -n

Note: libGenie.so has been verified to work with Android NDK version r26c and clang 14.

For the third dependency, Rust, run the following commands in a terminal:

$ export RUSTUP_HOME=</path/for/rustup>
$ mkdir -p ${RUSTUP_HOME}
$ export CARGO_HOME=</path/for/cargo>
$ mkdir -p ${CARGO_HOME}
$ curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh
$ source ${CARGO_HOME}/env
$ rustup target add aarch64-linux-android

Building libGenie.so

x86

$ cd ${QNN_SDK_ROOT}/examples/Genie/Genie
$ make x86

After executing make as shown above, you should see libGenie.so in lib/x86_64-linux-clang.

Android

$ cd ${QNN_SDK_ROOT}/examples/Genie/Genie
$ make android

After executing make as shown above, you should see libGenie.so in lib/aarch64-android. You can now link this library to your app and call the APIs exposed by libGenie.so.

Sample genie-t2t-run tutorial

genie-t2t-run sample pre-requisites

genie-t2t-run depends on libGenie.so; follow the instructions above to build it.

Building sample genie-t2t-run

x86

$ cd ${QNN_SDK_ROOT}/examples/Genie/genie-t2t-run
$ make x86

After executing make as shown above, you should see genie-t2t-run in bin/x86_64-linux-clang.

Android

$ cd ${QNN_SDK_ROOT}/examples/Genie/genie-t2t-run
$ make android

After executing make as shown above, you should see genie-t2t-run in bin/aarch64-android.

Executing sample genie-t2t-run

You can follow the QNN GenAITransformer backend workflow and QNN HTP backend workflow sections for instructions on running on CPU and HTP, respectively.

Model inference using token to token feature on Android

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b-genaitransformer.json.

Note

Use the LLaMA-2-7b notebooks for generating AR-N models.

adb shell mkdir -p /data/local/tmp/
adb push <path to llama2-7b-genaitransformer.json> /data/local/tmp/
adb push <path to token file(.txt)> /data/local/tmp/

# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH
export ADSP_LIBRARY_PATH=$LD_LIBRARY_PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-genaitransformer.json> \
                -tok <path to token file(.txt)>
# Example tokenfile.txt
24948 592 1048 15146 2055
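The token file is plain text containing whitespace-separated integer token IDs, as in the example above. A minimal host-side sketch for writing and reading such a file (the IDs below are copied from the example; in practice they come from your model's tokenizer):

```python
def write_token_file(path, token_ids):
    # One line of space-separated integer token IDs.
    with open(path, "w") as f:
        f.write(" ".join(str(t) for t in token_ids))

def read_token_file(path):
    with open(path) as f:
        return [int(t) for t in f.read().split()]

write_token_file("tokenfile.txt", [24948, 592, 1048, 15146, 2055])
print(read_token_file("tokenfile.txt"))  # [24948, 592, 1048, 15146, 2055]
```

The resulting tokenfile.txt is what gets pushed to the device and passed via the -tok option.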

Model inference using token to token feature on Windows

Open Developer PowerShell for VS2022 on Windows on Snapdragon host and run:

# Make sure the environment is set up per the Setup instructions, or cd into the bin folder on the Windows host
cd ${QNN_SDK_ROOT}\bin\aarch64-windows-msvc
.\genie-t2t-run.exe -c <path to cpu-model-config.json> `
                    -tok <path to token file(.txt)>
# Example tokenfile.txt
24948 592 1048 15146 2055

Update sampler params tutorial

Note

Please refer to ${QNN_SDK_ROOT}/examples/Genie/configs/sampler.json for the parameters that can be updated.

Genie provides the flexibility to update a single parameter or multiple parameters in one API call.

The APIs used for this exercise are:

  • GenieSamplerConfig_createFromJson

  • GenieDialog_getSampler

  • GenieSamplerConfig_setParam

  • GenieDialogSampler_applyConfig

Example of how to update sampler parameters between queries:

// Create dialog config
GenieDialogConfig_Handle_t dialogConfigHandle = NULL;
GenieDialogConfig_createFromJson(dialogConfigStr, &dialogConfigHandle);

// Create dialog
GenieDialog_Handle_t dialogHandle = NULL;
GenieDialog_create(dialogConfigHandle, &dialogHandle);

// Query with original config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback);

// Get dialog sampler handle
GenieDialogSampler_Handle_t samplerHandle = NULL;
GenieDialog_getSampler(dialogHandle, &samplerHandle);

// Create a sampler config handle from a new sampler config JSON
GenieSamplerConfig_Handle_t samplerConfigHandle = NULL;
GenieSamplerConfig_createFromJson(samplerConfigStr, &samplerConfigHandle);

// Apply the new sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);

// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback);

// Update single parameters
GenieSamplerConfig_setParam(samplerConfigHandle, "top-p", "0.8");
GenieSamplerConfig_setParam(samplerConfigHandle, "top-k", "30");

// Apply the new sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);

// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback);

// Update multiple parameters (top-k and top-p)
std::string valueStr = "\"sampler\" : {\n      \"top-k\" : 20,\n      \"top-p\" : 0.75\n } ";
GenieSamplerConfig_setParam(samplerConfigHandle, "", valueStr.c_str());

// Apply the new sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);

// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback);