QNN GPU - Llama 2 7B - Android¶
The following tutorial demonstrates running a basic Llama 2 7B dialog on Android with the QNN GPU backend using genie-t2t-run.
Note
This section assumes that the QNN GPU context binaries have been obtained via the QAIRT SDK workflow.
Dialog JSON configuration¶
See Genie Dialog JSON configuration string for details on the fields and their meanings. An example JSON config for this tutorial is provided at
${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b/llama2-7b-gpu.json. Note that the tokenizer path and
context binary fields must be updated to match your actual preparation steps.
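For orientation, a heavily trimmed sketch of the shape of such a config is shown below. Field names are illustrative only, the on-device file names are hypothetical, and most required fields (sampler, context, engine versions, etc.) are omitted; treat the shipped llama2-7b-gpu.json as authoritative.

```json
{
  "dialog": {
    "type": "basic",
    "tokenizer": {
      "path": "/data/local/tmp/tokenizer.json"
    },
    "engine": {
      "backend": { "type": "QnnGpu" },
      "model": {
        "binary": {
          "ctx-bins": [ "/data/local/tmp/llama2-7b.serialized.bin" ]
        }
      }
    }
  }
}
```

The two fields the tutorial asks you to update are tokenizer.path and the ctx-bins list, which must point at the on-device locations you push to below.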
Inference¶
To run on the QNN GPU backend, open a command shell on the host machine, push the required files to the device, and run the following.
adb shell mkdir -p /data/local/tmp/
adb push ${QNN_SDK_ROOT}/bin/aarch64-android/genie-t2t-run /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libGenie.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGpu.so /data/local/tmp/
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnSystem.so /data/local/tmp/
adb push <path to llama2-7b-gpu.json> /data/local/tmp/
adb push <path to tokenizer.json> /data/local/tmp/
adb push <path to model bin file> /data/local/tmp/
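The SDK artifact pushes above can also be scripted. The sketch below is a hypothetical convenience script (not part of the SDK) that generates the adb push commands for the SDK binaries and libraries as a dry run; pipe its output to `sh` to execute, and push your JSON config, tokenizer, and model bin file separately as shown above.

```shell
# Assumption: QNN_SDK_ROOT points at your QAIRT/QNN SDK install.
QNN_SDK_ROOT=${QNN_SDK_ROOT:-/opt/qnn-sdk}
DEST=/data/local/tmp
CMDS=""
# Collect one push command per required SDK artifact.
for f in \
    bin/aarch64-android/genie-t2t-run \
    lib/aarch64-android/libGenie.so \
    lib/aarch64-android/libQnnGpu.so \
    lib/aarch64-android/libQnnSystem.so
do
    CMDS="${CMDS}adb push ${QNN_SDK_ROOT}/${f} ${DEST}/
"
done
# Dry run: print the commands instead of executing them.
printf '%s' "$CMDS"
```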
# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=/data/local/tmp/:$PATH
cd /data/local/tmp/
./genie-t2t-run -c <path to llama2-7b-gpu.json> -p "Tell me about Qualcomm"
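One practical note: Llama 2 chat checkpoints were trained with the [INST] prompt template, so a bare prompt can produce degraded completions. The sketch below builds a wrapped prompt on the host for illustration; the commented genie-t2t-run invocation is how it would be passed via -p on the device (config file name assumed from the push steps above).

```shell
# Wrap the user text in the Llama 2 chat template before passing it via -p.
user_text="Tell me about Qualcomm"
prompt="<s>[INST] ${user_text} [/INST]"
# On the device, after cd /data/local/tmp:
# ./genie-t2t-run -c llama2-7b-gpu.json -p "$prompt"
echo "$prompt"
```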