QNN Gen AI Transformer¶
The following tutorial demonstrates running the Llama 2 7B model on the QNN Gen AI Transformer backend using genie-t2t-run.
The Genie-provided QNN GenAITransformer backend leverages the QNN op package interface to represent an entire Llama 2 7B
model as a single op. Inference runs on the host CPU. The model execution engine is provided by the
QnnGenAiTransformerCpuOpPkg op package library. Genie ships a prebuilt QnnGenAiTransformerModel model library; the
corresponding source for this model library can be found at ${SDK_ROOT}/examples/Genie/Model/model.cpp. Because the
model and op package for this backend are prebuilt, model preparation uses the qnn-genai-transformer-composer
tool.
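Once the model has been prepared, inference is launched through genie-t2t-run. A typical invocation might look like the following sketch; the config filename is a placeholder, and the -c/-p flag names are assumptions based on common Genie usage rather than values taken from this page:

```shell
# Sketch of a genie-t2t-run invocation (flag names and config path are
# assumptions, not taken from this tutorial):
#   -c  path to the Genie config describing the model, tokenizer, and sampler
#   -p  the prompt to complete
genie-t2t-run -c genie_config.json -p "What is an op package in QNN?"
```

The generated text is written to standard output, so the command can be wrapped in a script or piped into downstream tooling.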
Model download¶
Download Llama-2-7b-chat-hf from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main. Note that this is a gated repository: you must have a Hugging Face account and accept Meta's license before the files become accessible.
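One way to fetch the model files is with git and Git LFS; a minimal sketch, assuming git-lfs is installed and your Hugging Face account has been granted access to the repository:

```shell
# Clone the gated Llama-2-7b-chat-hf repository. Requires git-lfs and
# Hugging Face credentials for an account that has accepted Meta's license.
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
```

Alternatively, the huggingface_hub CLI (`huggingface-cli download meta-llama/Llama-2-7b-chat-hf`) fetches the same files through the Hub API.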