QNN GPU - Llama 2 7B - Android

The following tutorial demonstrates running a basic Llama 2 7B dialog on Android with the QNN GPU backend using genie-t2t-run.

Note

This section assumes that the QNN GPU context binaries have been obtained via the QAIRT SDK workflow.

Dialog JSON configuration

See Genie Dialog JSON configuration string for details on the fields and what they mean. An example JSON config for this tutorial can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b/llama2-7b-gpu.json. Note that the tokenizer path and context binary fields will need to be updated based on your actual preparation steps.
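After editing the config, it is worth confirming that it still parses as valid JSON before pushing it to the device. A minimal sketch, assuming python3 is available on the host and that your edited copy is named llama2-7b-gpu.json (adjust the filename to your setup):

```shell
# Sanity-check that the edited config parses as JSON.
# "llama2-7b-gpu.json" is an example name; use your edited copy.
python3 -m json.tool llama2-7b-gpu.json > /dev/null && echo "config OK"
```

A malformed config (for example, a stray trailing comma) will make json.tool exit non-zero, so "config OK" is only printed when the file parses.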

Inference

To run on the QNN GPU backend, open a command shell on the host, push the required binaries and model assets to the Android device, and then run genie-t2t-run from an adb shell as follows.
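Before pushing, you can quickly verify that the expected SDK artifacts are present. A hedged sketch, assuming QNN_SDK_ROOT is already exported and the aarch64-android paths match your SDK layout:

```shell
# Check that the binaries and libraries to be pushed exist in the SDK.
# Paths mirror the adb push commands below; adjust if your SDK differs.
for f in bin/aarch64-android/genie-t2t-run \
         lib/aarch64-android/libGenie.so \
         lib/aarch64-android/libQnnGpu.so \
         lib/aarch64-android/libQnnSystem.so; do
  [ -e "${QNN_SDK_ROOT}/${f}" ] || echo "missing: ${f}"
done
```

If this prints any "missing:" lines, re-check your QNN_SDK_ROOT setting before continuing.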

adb shell mkdir -p /data/local/tmp/
adb push ${QNN_SDK_ROOT}/bin/aarch64-android/genie-t2t-run /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libGenie.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGpu.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnSystem.so /data/local/tmp/
adb push <path to llama2-7b-gpu.json> /data/local/tmp/
adb push <path to tokenizer.json> /data/local/tmp/
adb push <path to model bin file> /data/local/tmp/

# open adb shell
adb shell

export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=/data/local/tmp/:$PATH

cd /data/local/tmp/
./genie-t2t-run -c llama2-7b-gpu.json -p "Tell me about Qualcomm"