Windows

This tutorial walks through offline model preparation and inference with the QNN Gen AI Transformer backend on the Windows platform.

Offline Preparation

The QNN Gen AI Transformer backend uses the qnn-genai-transformer-composer utility to prepare models for inference.

Open Developer PowerShell for VS2022 on a Windows host and run:

# Make sure the environment is set up per the instructions, or cd into the bin folder on the Windows host
cd <QNN_SDK_ROOT>\bin\x86_64-windows-msvc
python .\qnn-genai-transformer-composer --quantize Z4 `
                                        --outfile <output filename with complete path>.bin `
                                        --model <path-to-downloaded-Llama-model-directory>

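When scripting the preparation step, it can help to assemble and validate the composer arguments before launching the tool. The sketch below is a hypothetical Python helper, not part of the SDK: the `compose_command` function name and the example paths are assumptions, while the flag names (`--quantize`, `--outfile`, `--model`) mirror the invocation shown above.

```python
from pathlib import Path

def compose_command(model_dir, out_bin, quantize="Z4"):
    """Build the argument list for qnn-genai-transformer-composer.

    model_dir and out_bin are placeholder example paths; only the
    flags shown in the tutorial invocation are used here.
    """
    out_bin = Path(out_bin)
    if out_bin.suffix != ".bin":
        raise ValueError("output file must end in .bin")
    return [
        "python", r".\qnn-genai-transformer-composer",
        "--quantize", quantize,
        "--outfile", str(out_bin),
        "--model", str(model_dir),
    ]

# Hypothetical example paths on a Windows host:
cmd = compose_command(r"C:\models\llama2-7b", r"C:\out\llama2-7b.bin")
print(" ".join(cmd))
```

The `.bin` suffix check catches a common mistake early, since the composer is documented above as writing a `.bin` artifact.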
Inference

Open Developer PowerShell for VS2022 on a Windows on Snapdragon host and run:

# Make sure the environment is set up per the instructions, or cd into the bin folder on the Windows host
cd <QNN_SDK_ROOT>\bin\aarch64-windows-msvc
.\genie-t2t-run.exe -c <path to llama2-7b-genaitransformer.json> -p "Tell me about Qualcomm"
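To drive the same inference command from a script, the arguments can be built programmatically. This is a minimal sketch, assuming a hypothetical helper name and config path; the executable name and the `-c`/`-p` flags come from the invocation above.

```python
import subprocess  # used only by the commented-out launch line below
from pathlib import Path

def genie_command(config_json, prompt):
    """Build the genie-t2t-run.exe argument list.

    config_json points at the Genie JSON config for the model
    (e.g. llama2-7b-genaitransformer.json); the path is a placeholder.
    """
    return [r".\genie-t2t-run.exe", "-c", str(Path(config_json)), "-p", prompt]

cmd = genie_command(r"C:\configs\llama2-7b-genaitransformer.json",
                    "Tell me about Qualcomm")
# On a Windows on Snapdragon host this would launch inference:
# subprocess.run(cmd, check=True)
print(cmd)
```

Keeping the prompt as a single list element avoids PowerShell quoting issues when the command is eventually passed to `subprocess.run`.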