Preparation on Linux

The QNN Gen AI Transformer uses the qnn-genai-transformer-composer utility to prepare models for inference.

Preparation

Open a command shell on a Linux host and run:

# Make sure the environment is set up as per the setup instructions (QNN_SDK_ROOT must be set),
# or cd into the bin folder on the Linux host directly
cd "${QNN_SDK_ROOT}/bin/x86_64-linux-clang/"
./qnn-genai-transformer-composer --quantize Z4 \
                                 --outfile <output filename with complete path>.bin \
                                 --model <path-to-downloaded-Llama-model-directory>
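The preparation step above can be wrapped in a small script that checks its inputs before invoking the composer. This is a minimal sketch, not part of the SDK: the function name prepare_model and its argument order are hypothetical. It uses only the three composer flags shown above and assumes QNN_SDK_ROOT points at the SDK root.

```shell
# Hypothetical helper (not shipped with the SDK):
#   prepare_model <model-directory> <output-file.bin>
prepare_model() {
  local model_dir="$1"
  local out_file="$2"

  # Assumes QNN_SDK_ROOT is exported by the SDK environment setup.
  if [[ -z "${QNN_SDK_ROOT:-}" ]]; then
    echo "error: QNN_SDK_ROOT is not set" >&2
    return 1
  fi

  # The composer lives in the x86_64 Linux bin folder of the SDK.
  local composer="${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-genai-transformer-composer"
  if [[ ! -x "$composer" ]]; then
    echo "error: composer not found at $composer" >&2
    return 1
  fi

  if [[ ! -d "$model_dir" ]]; then
    echo "error: model directory $model_dir does not exist" >&2
    return 1
  fi

  # Create the output directory so --outfile has somewhere to write.
  mkdir -p "$(dirname "$out_file")"

  # Same invocation as the manual command, with the paths quoted.
  "$composer" --quantize Z4 \
              --outfile "$out_file" \
              --model "$model_dir"
}
```

With hypothetical paths, a call looks like `prepare_model ~/models/llama-2-7b ~/qnn-out/llama2-7b.bin`. Failing early on a missing SDK root or model directory avoids a long composer run that cannot succeed.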

Dialog JSON Configuration

See Genie Dialog JSON configuration string for details on each field and its meaning. An example config is provided at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b/llama2-7b-genaitransformer.json. Note that the tokenizer path and model bin fields must be updated to point to your tokenizer and to the .bin file produced in the preparation step above.
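Instead of editing the example config by hand, you can write an updated copy with jq. This is a sketch under stated assumptions: jq must be installed, update_dialog_config is a hypothetical helper name, and the field locations .dialog.tokenizer.path and .dialog.engine.model.library["model-bin"] are assumed, so confirm both against llama2-7b-genaitransformer.json before relying on it. The helper refuses to write output if either field is missing, so a wrong key path fails loudly instead of silently adding new keys.

```shell
# Hypothetical helper (not shipped with the SDK):
#   update_dialog_config <src.json> <dst.json> <tokenizer-path> <model-bin-path>
update_dialog_config() {
  local src="$1" dst="$2" tokenizer="$3" model_bin="$4"

  # ASSUMED key paths; verify them against the example config:
  #   .dialog.tokenizer.path
  #   .dialog.engine.model.library["model-bin"]
  # jq -e exits non-zero when the expression is false, null, or the JSON is invalid.
  if ! jq -e '.dialog.tokenizer.path != null
              and .dialog.engine.model.library["model-bin"] != null' \
       "$src" > /dev/null; then
    echo "error: expected tokenizer path / model-bin fields not found in $src" >&2
    return 1
  fi

  # Rewrite only the two path fields; every other setting is copied unchanged.
  jq --arg tok "$tokenizer" --arg bin "$model_bin" \
     '.dialog.tokenizer.path = $tok
      | .dialog.engine.model.library["model-bin"] = $bin' \
     "$src" > "$dst"
}
```

For example, pass the example config path above as the source, a writable location as the destination, and the tokenizer and .bin paths from your own preparation run. Keeping the SDK's example untouched makes it easy to diff your copy against it.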

Inference

Choose your target platform for inference: