QNN Gen AI Transformer

The following tutorial demonstrates running the Llama 2 7B model on the QNN Gen AI Transformer backend using genie-t2t-run.

The Genie-provided QNN GenAITransformer backend leverages the QNN op package interface to represent an entire Llama 2 7B model as a single op, and it runs inference on the host CPU. The model execution engine is provided by the QnnGenAiTransformerCpuOpPkg op package library, and Genie packages a prebuilt QnnGenAiTransformerModel model library; the corresponding source for this model library can be found at ${SDK_ROOT}/examples/Genie/Model/model.cpp. Because the model and op package are prebuilt, this backend uses the qnn-genai-transformer-composer tool for preparation.
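Once preparation is complete, the run step is a single genie-t2t-run invocation. The sketch below shows the typical shape of that command; the binary path, config-file name, and prompt are placeholders (assumptions, not values from this tutorial), so adjust them to your SDK install and prepared artifacts.

```shell
# Sketch only: GENIE_BIN and CONFIG are placeholders for your actual
# SDK install path and your prepared Genie config file.
GENIE_BIN="${SDK_ROOT}/bin/x86_64-linux-clang/genie-t2t-run"
CONFIG="genie_config.json"
PROMPT="What is an op package in QNN?"

# Echo the command first so the invocation is easy to inspect; uncomment
# the last line to actually run it once the SDK and config are in place.
echo "$GENIE_BIN -c $CONFIG -p \"$PROMPT\""
# "$GENIE_BIN" -c "$CONFIG" -p "$PROMPT"
```

The -c flag points genie-t2t-run at the Genie config describing the model and backend, and -p supplies the prompt text.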

Model download

Download Llama-2-7b-chat-hf from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main.
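The Llama-2-7b-chat-hf repository is gated, so you must first request access with your Hugging Face account. One common way to fetch it is the huggingface_hub CLI; the tool choice and local directory below are assumptions for illustration, not part of the SDK workflow.

```shell
# Requires: pip install -U "huggingface_hub[cli]" and a Hugging Face token
# for an account that has been granted access to the Llama 2 repository.
REPO="meta-llama/Llama-2-7b-chat-hf"
DEST="./Llama-2-7b-chat-hf"   # hypothetical local download directory

# Echo the command for inspection; uncomment the last line to download.
echo "huggingface-cli download $REPO --local-dir $DEST"
# huggingface-cli download "$REPO" --local-dir "$DEST"
```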

Preparation tutorials

Choose the offline preparation host platform: