How to Use Genie¶
This tutorial helps you download and configure example code that automates preparing, converting, and using an LLM on a target device with the Genie SDK. Throughout the tutorial we will refer to this example code as the “Jupyter Notebooks”, since that is how it is implemented. These code-based tutorials consist of three notebooks that must be run in order:
Step-1: Preparing your model - This notebook helps you quantize your model and prepare it for execution on the target device, using Qualcomm’s AIMET library for quantization. This is also where optimizations like LoRA are applied to improve your model’s accuracy and efficiency.
Step-2: Converting your model - The second notebook takes your optimized model and uses the QNN SDK (aka the “AI Engine Direct SDK”) to build it into a format that your target device can execute.
Step-3: Configuring Genie and executing your model - The final notebook shows you how to use the Genie specific settings and configuration files at runtime with your model.
This tutorial currently focuses on explaining how to configure the Genie-specific settings in Step-3, but it also provides guidance on completing Step-1 and Step-2 so that you have the proper files for Step-3.
These files are downloadable from Qualcomm’s Package Manager (QPM). There are many different Jupyter Notebooks which can help with various use cases.
Note
If you run into any problems, you can ask for help in the Developer Discord.
Pre-requisites¶
Begin the tutorial by accessing your Linux host machine. This tutorial assumes you are working on a Linux host machine (where the Jupyter Notebook code will run) with an Android target device (where your model will execute).
Based on your situation you may have to modify these steps, and the tutorial will try to call out where those changes will be needed.
Note
If you are using a Windows machine, you can use WSL to get Linux-like syntax or adapt the commands to PowerShell syntax (ex. using an AI like ChatGPT). If you are using WSL, install the Linux version of files from QPM.
You will need your host machine to be connected to your target device to run the steps in the Step-3 Notebook. If your target device is not Android, you will likely need to change how you move files onto the device (ex. using ssh instead of adb).
Part 1: Installing Qualcomm’s Package Manager (QPM3)¶
Warning
If you already have QPM3 installed, skip this section. This package manager is needed to download and extract the tutorial files later in this guide.
Open a terminal on your host machine (where you are doing your dev work).
Check whether you have qpm-cli installed by running:
qpm-cli --help
If you have qpm-cli installed, skip to Part 2: Downloading the GenAI Tutorial.
If you do NOT have qpm-cli installed, click this link and sign into Qualcomm Package Manager 3 in the browser.
If you do not have an account, you can create one by clicking “Sign up” in the bottom right corner and following the account creation steps.
Warning
After signing in, the site may show an error like “502 bad gateway”; the login will still be successful, and you can continue to the next step.
Click this link after signing in to go directly to the QPM3 desktop download: https://qpm.qualcomm.com/#/main/tools/details/QPM3
Download the version of QPM3 which matches your host machine’s OS.
If you are using another dev machine via ssh or some other connection, make sure you download the version that corresponds to that connected machine. You can transfer the executable file using scp.
Start QPM3 by running the QPM3 executable / downloaded file.
It should have a name similar to QPM.3.0.92.3.Windows-AnyCPU.exe if you are on Windows, with a different extension for Linux (ex. .deb).
This will open an installation wizard; follow all steps it presents to install QPM3.
This will also install the qpm-cli tool for installing packages via the CLI.
Wait for the installation to complete (this can take ~5-10 minutes).
Verify the new install was successful by running:
qpm-cli --help
You should see a list of options for qpm-cli if it was installed properly.
Part 2: Downloading the GenAI Tutorial (GAIT)¶
Check if you have Python installed by running:
python3 --version
If you do not have Python 3.10, you can install it by running:
sudo apt-get update && sudo apt-get install python3.10 python3-distutils libpython3.10
Install Jupyter Notebook if you do not have it already by running:
pip install notebook
Log in to the QPM3 CLI by replacing <username> in the command below with your Qualcomm ID username, then running the command:
qpm-cli --login <username>
Run this command to set the directory where our tutorial files will live:
export TUTORIAL_DIR="/tmp/genie_tutorial"
Note
You can modify this path, but /tmp/genie_tutorial is a sensible default.
Create the folder by running:
mkdir -p $TUTORIAL_DIR
Activate the license so we have permission to download the tutorial files by running:
qpm-cli --license-activate Tutorial_for_Llama_3_x
Install the tutorial files by running:
qpm-cli --install Tutorial_for_Llama_3_x --path $TUTORIAL_DIR
Note
You will have to confirm during the download by typing y.
Verify the files were installed properly by running:
ls $TUTORIAL_DIR
You should see common and model folders, along with a few other files.
Navigate to the newly installed files by running:
cd $TUTORIAL_DIR
Part 3: Read and run the Step-1 Notebook¶
This Notebook helps quantize and prepare your model; this is where optimizations such as LoRA take place. It shows how you can use AIMET to create an efficient representation of your model, with fine-tuned values.
Read the overall README by running:
cat $TUTORIAL_DIR/model/README.md
Follow the instructions in that README for any configuration you must complete.
Read the README.md for the Step-1 notebook by running:
cat $TUTORIAL_DIR/model/Step-1/README.md
Follow the instructions within the Step-1 README.md.
This will help you install necessary dependencies, set up a Docker container, and prepare your model for the Step-2 notebook.
It will also walk you through running the notebook itself.
Part 4: Run the Step-2 Notebook¶
This notebook helps you use the QNN SDK to prepare your model for execution on a specific target device (ex. an Android phone with a GPU). By the end, you should have one or more context binary files that the Step-3 Notebook can use to run the model on the target device.
Read the README for Step-2 by running:
cat $TUTORIAL_DIR/model/Step-2/host_linux/README.md
Follow the steps in the README for Step-2.
Follow any instructions within the notebook file ($TUTORIAL_DIR/model/Step-2/host_linux/qnn_model_compile.ipynb) and run each cell in order.
The end result should be context binaries that we will copy into the right folder for Step-3’s notebook.
Part 5: Prepare the config file¶
In order to run the Step-3 notebook (which contains the Genie-specific logic), we need to create several configuration files. These tell the notebook where to find dependencies such as the QAIRT SDK and configure basic information like where to output files.
Check if you have QNN_SDK_ROOT set:
echo $QNN_SDK_ROOT
If QNN_SDK_ROOT is not set:
This variable should have been set during the Step-1 Notebook setup, but may have been unset if you are using a new terminal.
If you do not have the QAIRT SDK, you can install it by clicking “Get Software” on this page.
Navigate to qairt/<QNN_SDK_ROOT_LOCATION>/bin (ex. qairt/2.22.6.240515/bin).
Run source ./envsetup.sh to set the environment variable.
Note
These changes will only apply to the current terminal instance.
Verify that QNN_SDK_ROOT is now set. If this does not work, you can manually set the value to the location of <QNN_SDK_ROOT_LOCATION> in the above path.
Set the folder where you want to put the outputs of the Step-2 notebook by running:
export WORKING_DIR="$TUTORIAL_DIR/working_dir"
Set the BINARIES_PATH variable and create the folder structure within your working directory by running:
export BINARIES_PATH="$WORKING_DIR/artifacts/serialized_binaries"
mkdir -p $BINARIES_PATH
Copy ALL serialized binary files produced by the Step-2 notebook (ex. weight_sharing_model_x_of_y.serialized.bin) into your working directory by replacing <folder-with-binaries> and running the command:
cp <folder-with-binaries>/* $BINARIES_PATH
See what your working directory looks like by running:
tree $WORKING_DIR
This should show output similar to the following (with potentially a different number of serialized.bin files):
/tmp/genie_tutorial/working_dir
├── artifacts
│   └── serialized_binaries
│       ├── weight_sharing_model_1_of_5.serialized.bin
│       ├── weight_sharing_model_2_of_5.serialized.bin
│       ├── weight_sharing_model_3_of_5.serialized.bin
│       ├── weight_sharing_model_4_of_5.serialized.bin
│       └── weight_sharing_model_5_of_5.serialized.bin

2 directories, 5 files
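Since the Step-3 notebook expects every shard, you may also want to verify that the x_of_y file names form a complete set before continuing. A minimal sketch; it runs against a throwaway directory with dummy files, so point BINARIES_PATH at your real serialized_binaries folder instead:

```shell
# Verify that all weight_sharing_model_<i>_of_<N>.serialized.bin shards exist.
# The throwaway directory and dummy files below are for illustration only.
BINARIES_PATH="$(mktemp -d)"
touch "$BINARIES_PATH/weight_sharing_model_1_of_3.serialized.bin"
touch "$BINARIES_PATH/weight_sharing_model_2_of_3.serialized.bin"
touch "$BINARIES_PATH/weight_sharing_model_3_of_3.serialized.bin"

# Read the expected total N from the first file name, then count the files.
total="$(ls "$BINARIES_PATH" | sed -n 's/.*_of_\([0-9]*\)\.serialized\.bin/\1/p' | head -n 1)"
count="$(ls "$BINARIES_PATH" | wc -l | tr -d ' ')"

if [ "$count" -eq "$total" ]; then
    echo "all $total shards present"
else
    echo "expected $total shards but found $count"
fi
```

If the counts disagree, re-run the copy step in the previous instruction before moving on.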
Set the target directory where you want the output artifacts from the Step-3 notebook to be saved:
export TARGET_DIR="/data/local/tmp/$(whoami)/genie_tutorial"
Check if you have adb installed by running:
adb --help
Note
This tutorial assumes you are working with Android target devices. If you are using a target device with a different OS (ex. Linux), you should use ssh instead of adb to connect, and may need to modify the code in the notebooks which uses adb.
You should see many options for adb if it is installed.
Warning
If adb is NOT installed, you can install it by downloading the Android NDK onto your host machine. See the Setup steps for QNN Part 5 (specifically for Android) for more details on how to install the Android NDK.
Connect your target device to your host machine (ex. using a USB connection).
Check the device ID of your target device by running:
adb devices
You should see your target device and its ID. The ID will look something like d925310.
Update <YOUR_TARGET_DEVICE_ID> below with the device ID from your target device and run the command:
export TARGET_DEVICE_ID="<YOUR_TARGET_DEVICE_ID>"
Run the following command, which combines all the values we have set so far into a single configuration file:
cat > "$TUTORIAL_DIR/model/config/notebookconfig.json" <<EOF
{
  "aimet_path": ".",
  "qnn_sdk_path": "${QNN_SDK_ROOT}",
  "export_dir": "${WORKING_DIR}",
  "target_dir": "${TARGET_DIR}",
  "device_id": "${TARGET_DEVICE_ID}",
  "adb_executable": "adb"
}
EOF
Note
We are hardcoding adb_executable to be adb for this tutorial. We are also setting aimet_path to the current directory, which is where it should be following Part-1. If you run into an AIMET-related error, you may need to modify this in your model/config/notebookconfig.json.
Check that your notebookconfig.json was set correctly by running:
cat $TUTORIAL_DIR/model/config/notebookconfig.json
You should see the values you set previously for each field in the config file.
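If the output looks off, you can also confirm the file parses as valid JSON before opening the notebook. A minimal sketch using Python's built-in json.tool; the throwaway file created here is a stand-in for your real notebookconfig.json:

```shell
# Validate a JSON config file; json.tool exits non-zero on a parse error.
# CONFIG_FILE is a throwaway example; substitute your real path, e.g.
# $TUTORIAL_DIR/model/config/notebookconfig.json
CONFIG_FILE="$(mktemp)"
echo '{"adb_executable": "adb", "aimet_path": "."}' > "$CONFIG_FILE"

if python3 -m json.tool "$CONFIG_FILE" > /dev/null; then
    echo "config OK"
else
    echo "config is NOT valid JSON"
fi
```

A stray quote or trailing comma in the config will surface here as a parse error instead of a confusing failure inside the notebook.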
Part 6: Modifying the Genie config.json¶
Next, we need to configure Genie in order for it to work with our serialized binaries. In this part of the tutorial, we will create and update a Genie JSON configuration file.
Note
If you are using a model other than Llama 3, you will need to use Genie’s docs to understand what each field in the config file is for in order to modify it to your situation. Keep in mind that you may need to copy files from your model (ex. tokenizer.json) into the config folder.
Set a variable for the path to the Genie config folder by running:
export GENIE_CONFIG_FOLDER="$TUTORIAL_DIR/model/config/genie"
Set a variable for the path to our Genie configuration JSON file by running:
export GENIE_CONFIG_FILE="$GENIE_CONFIG_FOLDER/config.json"
Create the Genie config file by running:
touch $GENIE_CONFIG_FILE
Open the newly created config file, for example using vim:
vim $GENIE_CONFIG_FILE
Copy the following template config file into your Genie config file:
Note
If you want to configure Genie differently, you can find other example configurations in the Genie docs’ Library section (by clicking into the features you are interested in seeing configurations for).
{ "dialog": { "version": 1, "type": "basic", "context": { "version": 1, "size": 4096, "n-vocab": 128256, "bos-token": 128000, "eos-token": 128001, "eot-token": 128009 }, "sampler": { "version": 1, "seed": 42, "temp": 0.8, "top-k": 40, "top-p": 0.95 }, "tokenizer": { "version": 1, "path": "<your/path/to/tokenizer_file.json>" }, "engine": { "version": 1, "n-threads": 3, "backend": { "version": 1, "type": "QnnHtp", "QnnHtp": { "version": 1, "use-mmap": true, "spill-fill-bufsize": 0, "mmap-budget": 0, "poll": true, "pos-id-dim": 64, "cpu-mask": "0xe0", "kv-dim": 128, "rope-theta": 10000 } }, "extensions": "htp_backend_ext_config.json" }, "model": { "version": 1, "type": "binary", "binary": { "version": 1, "ctx-bins": [ "<your-serialized.bin-files-listed-one-after-the-other>" ] } } } }
Update the "tokenizer" > "path" value with an absolute path to your tokenizer.
Note
You can search for a tokenizer.json file by running find . -type f -name 'tokenizer.json'. Keep in mind that different models may have slightly different file names for their tokenizer info file.
Ex. Replacing "<your/path/to/tokenizer_file.json>" with "/tmp/usr/model/tokenizer.json"
Look up the names of your context binary files by running:
ls $BINARIES_PATH
You should see a list of context binary file names such as:
weight_sharing_model_1_of_5.serialized.bin
weight_sharing_model_2_of_5.serialized.bin
weight_sharing_model_3_of_5.serialized.bin
weight_sharing_model_4_of_5.serialized.bin
weight_sharing_model_5_of_5.serialized.bin
Update "model" > "binary" > "ctx-bins" with the name of each context binary, as a list of strings.
For example:
"model": {
    "version": 1,
    "type": "binary",
    "binary": {
        "version": 1,
        "ctx-bins": [
            "weight_sharing_model_1_of_5.serialized.bin",
            "weight_sharing_model_2_of_5.serialized.bin",
            "weight_sharing_model_3_of_5.serialized.bin",
            "weight_sharing_model_4_of_5.serialized.bin",
            "weight_sharing_model_5_of_5.serialized.bin"
        ]
    }
}
We will need to update the Step-3 Jupyter Notebook to point to our Genie configuration as part of the genie-t2t-run command, but we will explain how to do that later once we have opened the Step-3 Notebook.
Part 7: Create and activate the Python virtual environment¶
Run:
cd $TUTORIAL_DIR/model/Step-2/host_linux/
Create a Python virtual environment by running:
python -m venv venv
Warning
If you receive a Permission Denied error, run ls -l and check whether venv already exists. If it already exists, you can skip to the next step.
Activate the virtual environment by running:
. venv/bin/activate
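The steps above can be sketched end-to-end; this version uses a throwaway location so it can run anywhere (the tutorial itself creates venv inside the Step-2 folder) and confirms the activation actually took effect:

```shell
# Create and activate a Python virtual environment, then confirm that the
# active interpreter lives inside it. Throwaway path for illustration.
VENV_DIR="$(mktemp -d)/venv"
python3 -m venv "$VENV_DIR"
. "$VENV_DIR/bin/activate"

# Inside an active venv, sys.prefix differs from sys.base_prefix.
python -c 'import sys; print(sys.prefix)'
```

If activation worked, the printed prefix points into the venv directory rather than your system Python install.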
Part 8: Open the notebook¶
Set the path to the Step-3 Notebook folder by running:
export STEP_3_FOLDER="$TUTORIAL_DIR/model/Step-3/host_linux_target_android_without_native/"
Note
If you are using other tutorial notebooks, you may need to specify a different folder after Step-3, as that folder name changes from tutorial to tutorial. In this case, the provided notebook expects to run on a Linux host with an Android target that is not using native Linux features.
Navigate to the Step-3 Notebook folder by running:
cd $STEP_3_FOLDER
Run the following commands to install all dependencies:
python3 -m pip install --upgrade pip
pip install -r ../../requirements.txt
pip install -r requirements.txt
Open the Jupyter Notebook by running:
jupyter notebook --ip=* --no-browser --allow-root &
If this succeeds, the terminal will list several ways you can view and interact with the Jupyter Notebook.
Pick one of the options to open the Jupyter Notebook in the browser.
Part 9: Run the notebook¶
Once you’ve opened the URL in your browser, click on qnn_model_execute.ipynb.
Search for where the Genie config file is specified, and set it to the value of your $GENIE_CONFIG_FILE.
You can run echo $GENIE_CONFIG_FILE in your terminal to recall the value you set.
In the top toolbar, click Run > Run All Cells.
Wait until the Notebook completes running before you click on anything.
Note
The last two cells can take a long time to run, and may look like they have failed even though they are still executing in the background.
Note
Click on the progress wheel in the top right corner of the toolbar (beside Python 3 (ipykernel)) to see which cells are still running.
When the Notebook has completed, scroll to the results of the final llama3.execute.run() command. You should see [Prompt] and [Begin] lines near the bottom of the results.
Summary¶
You have now successfully used Genie from start to finish! To apply this to your own model going forward, use these notebooks and configuration files as a starting point and tweak them to your situation.
To use a different model, consider starting from different notebooks by searching for “Generative” in QPM3 and expanding all results. That will show all notebook tutorials that have been released to date.
To change how your model is quantized or fine-tuned (ex. via LoRA) using AIMET, modify the Step-1 Notebook.
To change how you are preparing the serialized binaries (ex. to spread them out into smaller files that are easier to load into memory), configure and modify the Step-2 Notebook.
To change how Genie is used on your target device, configure and modify the Step-3 Notebook.
You can also leverage Genie’s profiling and benchmarking capabilities for further optimization of your model’s performance.
If you have any questions, you can ask in the Developer Discord!
