How to Use Genie

This tutorial helps you download and configure example code that automates preparing, converting, and running an LLM on a target device with the Genie SDK. Throughout the tutorial we refer to this example code as the “Jupyter Notebooks” because that is how it is implemented. These code-based tutorials consist of three notebooks that must be run in order:

  1. Step-1: Preparing your model - This notebook quantizes your model using Qualcomm’s AIMET library and prepares it for execution on the target device. This is also where optimizations such as LoRA are applied to improve the accuracy or efficiency of your model.

  2. Step-2: Converting your model - The second notebook takes your optimized model and uses the QNN SDK (aka the “AI Engine Direct SDK”) to build it into a format that your target device can execute.

  3. Step-3: Configuring Genie and executing your model - The final notebook shows you how to use Genie-specific settings and configuration files at runtime with your model.

This tutorial currently focuses on configuring the Genie-specific settings in Step-3, but it also provides some guidance on completing Step-1 and Step-2 so that you have the files Step-3 needs.

The notebooks are downloadable from Qualcomm’s Package Manager (QPM), which also offers many other Jupyter Notebooks for different use cases.

Note

If you run into any problems, you can ask for help in the Developer Discord.

Prerequisites

Begin the tutorial by accessing your Linux host machine.

This tutorial assumes you are working on a Linux host machine (where the Jupyter Notebook code will run) with an Android target device (where your model will execute).

Based on your situation you may have to modify these steps, and the tutorial will try to call out where those changes will be needed.

Note

If you are using a Windows machine, you can use WSL to run the Linux commands in this tutorial, or adapt the commands to PowerShell syntax (ex. using an AI like ChatGPT). If you are using WSL, install the Linux version of the files from QPM.

Your host machine must be connected to your target device to run the steps in the Step-3 Notebook. If you are using a target device other than Android, you will likely need to change how you move files onto it (ex. using ssh instead of adb).
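Before starting, you may want to see which of the host-side tools used later in this tutorial are already on your PATH. The sketch below assumes bash; the helper name missing_tools is hypothetical, and it will report tools such as qpm-cli, jupyter, or adb until you install them in the parts below.

```shell
# Hypothetical helper: print every command name that is NOT found on PATH.
# Returns 0 if all commands were found, 1 if any are missing.
missing_tools() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools python3 pip jupyter qpm-cli adb` prints each missing tool on its own line and returns a non-zero status if any are missing.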

Part 1: Installing Qualcomm’s Package Manager (QPM3)

Warning

If you already have QPM3 installed, skip this section. We need this package manager to download and extract the tutorial files later on.

  1. Open a terminal on your host machine (where you are doing your dev work).

  2. Check whether you have qpm-cli installed by running:

    qpm-cli --help
    
  3. If you have qpm-cli installed, skip to Part 2: Downloading the GenAI Tutorial.

  4. If you do NOT have qpm-cli installed, click this link and sign in to Qualcomm Package Manager 3 in your browser.

    1. If you do not have an account, you can create one by clicking “Sign up” in the bottom right corner and following the account creation steps.

    Warning

    After signing in, the site may show an error like “502 bad gateway”. The login still succeeds, and you can continue to the next step.

  5. After signing in, click this link to go directly to the QPM3 desktop download: https://qpm.qualcomm.com/#/main/tools/details/QPM3

  6. Download the version of QPM3 which matches your host machine’s OS.

    1. If you are working on another dev machine over ssh or some other connection, download the version that matches that machine’s OS. You can then transfer the installer to it using scp.

  7. Start QPM3 by running the QPM3 executable / downloaded file.

    1. It should have a name similar to QPM.3.0.92.3.Windows-AnyCPU.exe on Windows, or a different extension on Linux (ex. .deb).

  8. This opens an installation wizard. Follow all the steps it presents to install QPM3.

    1. This will also install the qpm-cli tool for installing packages via the CLI.

  9. Wait for the installation to complete (this can take ~5-10 minutes).

  10. Verify the new install was successful by running:

    qpm-cli --help
    

    You should see a list of options for qpm-cli if it was installed properly.
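The steps above assume a wizard-style installer such as the Windows .exe. On an Ubuntu or Debian host, where the download is a .deb package, one common way to install it is with apt, which also resolves dependencies. The sketch below assumes such a host. The helper name and the example file name are hypothetical, and the helper only prints the command so you can review it before running it.

```shell
# Hypothetical helper: print (not run) the command that installs a downloaded
# QPM3 .deb package on a Debian/Ubuntu host.
qpm_install_cmd() {
  local installer="$1"
  case "$installer" in
    /*|./*|../*) ;;                 # already an explicit path
    *) installer="./$installer" ;;  # apt treats a bare name as a package name, so add ./
  esac
  case "$installer" in
    *.deb) printf 'sudo apt install %s\n' "$installer" ;;
    *) echo "unsupported installer type: $installer" >&2; return 1 ;;
  esac
}
```

For example, `qpm_install_cmd QPM3.deb` prints `sudo apt install ./QPM3.deb`. After running the printed command, `qpm-cli --help` confirms whether qpm-cli is available.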

Part 2: Downloading the GenAI Tutorial (GAIT)

  1. Check if you have Python installed by running:

    python3 --version
    
  2. If you do not have Python 3.10, you can install it by running:

    sudo apt-get update && sudo apt-get install python3.10 python3-distutils libpython3.10
    
  3. Install Jupyter Notebook if you do not have it already by running:

    pip install notebook
    
  4. Log in to the QPM3 CLI by replacing <username> in the command below with your QualcommID username, then running the command:

    qpm-cli --login <username>
    
  5. Run this command to set which directory our tutorial files will live in.

    Note

    You can modify this path, but /tmp/genie_tutorial is a sensible default.

    export TUTORIAL_DIR="/tmp/genie_tutorial"
    
  6. Create the folder by running:

    mkdir -p $TUTORIAL_DIR
    
  7. Activate the license so we have permission to download the tutorial files by running:

    qpm-cli --license-activate Tutorial_for_Llama_3_x
    
  8. Install the tutorial files by running:

    qpm-cli --install Tutorial_for_Llama_3_x --path $TUTORIAL_DIR
    

    Note

    You will have to confirm during the download by typing y.

  9. Verify the files were installed properly by running:

    ls $TUTORIAL_DIR
    

    You should see common and model folders, along with a few other files.

  10. Navigate to the newly installed files by running:

    cd $TUTORIAL_DIR
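Steps 1, 2, and 9 can be combined into a quick scripted check. This minimal sketch assumes bash and awk on the host, and both helper names are hypothetical. The first helper extracts the major.minor version from the output of python3 --version. The second confirms that the common and model folders exist under $TUTORIAL_DIR.

```shell
# Hypothetical helper: turn "Python 3.10.12" (output of `python3 --version`) into "3.10".
python_major_minor() {
  printf '%s\n' "$1" | awk '{ split($2, v, "."); print v[1] "." v[2] }'
}

# Hypothetical helper: check that the GAIT install created the expected top-level folders.
verify_tutorial_dir() {
  local dir="$1" sub
  for sub in common model; do
    if [ ! -d "$dir/$sub" ]; then
      echo "missing: $dir/$sub" >&2
      return 1
    fi
  done
}
```

For example, `[ "$(python_major_minor "$(python3 --version)")" = "3.10" ] || echo "python3 is not 3.10"` flags a different default Python, and `verify_tutorial_dir "$TUTORIAL_DIR" && echo "GAIT files found"` confirms the install.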
    

Part 3: Read and run the Step-1 Notebook

This Notebook quantizes and prepares your model, and is where optimizations such as LoRA take place. It shows how you can use AIMET to create an efficient representation of your model with fine-tuned values.

  1. Read the overall README by running:

    cat $TUTORIAL_DIR/model/README.md
    
  2. Follow the instructions in that README to complete the required configuration.

  3. Read the README.md for the Step-1 notebook by running:

    cat $TUTORIAL_DIR/model/Step-1/README.md
    
  4. Follow the instructions within the Step-1 README.md.

    1. They help you install the necessary dependencies, set up a Docker container, and prepare your model for the Step-2 notebook.

    2. They also walk you through running the notebook.
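Each step folder ships its own README. If you want to see all of them before you start, a small sketch like the following lists every README.md under the tutorial's model folder in a stable order. The helper name is hypothetical, and the exact folder layout may differ between tutorial versions.

```shell
# Hypothetical helper: list README.md files under a directory in byte order,
# so model/README.md comes before model/Step-1/README.md, and so on.
list_readmes() {
  find "$1" -type f -name 'README.md' | LC_ALL=C sort
}
```

For example, `list_readmes "$TUTORIAL_DIR/model"` prints the overall README first, followed by the per-step READMEs.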

Part 4: Run the Step-2 Notebook

This notebook uses the QNN SDK to prepare your model for execution on a specific target device (ex. the Hexagon NPU (HTP) of an Android phone). By the end, you should have one or more context binary files that the Step-3 Notebook uses to run the model on the target device.

  1. Read the README for Step-2 by running:

    cat $TUTORIAL_DIR/model/Step-2/host_linux/README.md
    
  2. Follow the steps in the README for Step-2.

  3. Follow any instructions within the notebook file ($TUTORIAL_DIR/model/Step-2/host_linux/qnn_model_compile.ipynb) and run each cell in order.

The end result should be context binaries that we will copy into the right folder for Step-3’s notebook.
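Where the Step-2 notebook writes its context binaries depends on its configuration, so the <folder-with-binaries> needed in Part 5 may not be obvious. Assuming the binaries follow the *.serialized.bin naming shown in Part 5 and live somewhere under $TUTORIAL_DIR, a sketch like this one lists the folders that contain them. The helper name is hypothetical.

```shell
# Hypothetical helper: print each directory under $1 that contains *.serialized.bin files.
find_binary_dirs() {
  find "$1" -type f -name '*.serialized.bin' -exec dirname {} \; | LC_ALL=C sort -u
}
```

For example, `find_binary_dirs "$TUTORIAL_DIR"` prints one line per folder. If nothing is printed, the notebook may have written its output outside $TUTORIAL_DIR, so check the export path in the Step-2 README.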

Part 5: Prepare the config file

To run the Step-3 notebook (which contains the Genie-specific logic), we need to create several configuration files. They tell the notebook where to find dependencies such as the QAIRT SDK and set basic options such as where to write output files.

  1. Check if you have QNN_SDK_ROOT set:

    echo $QNN_SDK_ROOT
    
  2. If QNN_SDK_ROOT is not set:

    1. This variable should have been set during the Step-1 Notebook setup, but may have been unset if you are using a new terminal.

      1. If you do not have the QAIRT SDK installed, you can get it by clicking “Get Software” on this page.

    2. Navigate to qairt/<QNN_SDK_ROOT_LOCATION>/bin (Ex. qairt/2.22.6.240515/bin)

    3. Run source ./envsetup.sh to set the environment variable.

      Note

        These changes only apply to the current terminal session.

    4. Verify that QNN_SDK_ROOT is now set. If it is still not set, you can set it manually to the full path of the qairt/<QNN_SDK_ROOT_LOCATION> directory from the path above.

  3. Set the working directory that will hold the outputs of the Step-2 notebook by running:

    export WORKING_DIR="$TUTORIAL_DIR/working_dir"
    
  4. Set the BINARIES_PATH variable and create the folder structure within your working directory by running:

    export BINARIES_PATH="$WORKING_DIR/artifacts/serialized_binaries"
    mkdir -p $BINARIES_PATH
    
  5. Copy ALL serialized binary files produced by the Step-2 notebook (ex. weight_sharing_model_x_of_y.serialized.bin) into your working directory by replacing <folder-with-binaries> and running the command:

    cp <folder-with-binaries>/* $BINARIES_PATH
    
  6. See what your working directory looks like by running:

    tree $WORKING_DIR
    

    This should produce output similar to the following (possibly with a different number of serialized.bin files):

    /tmp/genie_tutorial/working_dir
    ├── artifacts
    │   └── serialized_binaries
    │       ├── weight_sharing_model_1_of_5.serialized.bin
    │       ├── weight_sharing_model_2_of_5.serialized.bin
    │       ├── weight_sharing_model_3_of_5.serialized.bin
    │       ├── weight_sharing_model_4_of_5.serialized.bin
    │       └── weight_sharing_model_5_of_5.serialized.bin
    
    2 directories, 5 files
    
  7. Set the directory on the target device where the Step-3 notebook will save its output artifacts:

    export TARGET_DIR="/data/local/tmp/$(whoami)/genie_tutorial"
    
  8. Check if you have adb installed by running:

    adb --help
    

    You should see many options for adb if it is installed.

    Note

    This tutorial assumes you are working with Android target devices. If you are using a target device with a different OS (ex. Linux), you should use ssh instead of adb to connect, and may need to modify the code in the notebooks that uses adb.

    Warning

    If adb is NOT installed, install the Android SDK Platform-Tools, which provide adb (adb is not part of the Android NDK). For installing the Android NDK, see the Setup steps for QNN Part 5 specifically for Android.

  9. Connect your target device to your host machine (ex. using a USB connection).

  10. Check the device ID of your target device by running:

    adb devices
    

    You should see your target device and the ID. The ID will look something like d925310.

  11. Update <YOUR_TARGET_DEVICE_ID> below with the device ID from your target device and run the command:

    export TARGET_DEVICE_ID="<YOUR_TARGET_DEVICE_ID>"
    
  12. Run the following command, which combines all the values we have set so far into a single configuration file:

    echo '{
    "aimet_path": ".",
    "qnn_sdk_path": "'${QNN_SDK_ROOT}'",
    "export_dir": "'${WORKING_DIR}'",
    "target_dir": "'${TARGET_DIR}'",
    "device_id": "'${TARGET_DEVICE_ID}'",
    "adb_executable": "adb"
    }' > "$TUTORIAL_DIR/model/config/notebookconfig.json"
    

    Note

    We are hardcoding adb_executable to adb for this tutorial. We are also setting aimet_path to the current directory, which is where AIMET should be after completing Step-1. If you run into an AIMET-related error, you may need to change this value in your model/config/notebookconfig.json.

  13. Check to see that your notebookconfig.json was set correctly by running:

    cat $TUTORIAL_DIR/model/config/notebookconfig.json
    

    You should see the values you set previously for each field in the config file.
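Steps 10 to 13 can also be scripted. The sketch below is one alternative to the echo command in step 12. It reads the device ID from `adb devices` output, refuses to write the file if a required variable is empty, writes the JSON with a heredoc, and validates it with Python's standard-library json.tool module, so a stray quote in a path is caught before the notebook runs. It assumes bash (for `${!var}` indirection) and python3, and the helper names are hypothetical.

```shell
# Hypothetical helper: print the ID of the first device that `adb devices`
# reports as ready ("device" state), reading that command's output from stdin.
first_adb_device() {
  awk 'NR > 1 && $2 == "device" { print $1; exit }'
}

# Hypothetical helper: write notebookconfig.json, failing early if any
# required variable is empty or unset, then check the file is valid JSON.
write_notebook_config() {
  local out="$1" var
  for var in QNN_SDK_ROOT WORKING_DIR TARGET_DIR TARGET_DEVICE_ID; do
    if [ -z "${!var:-}" ]; then
      echo "error: $var is not set" >&2
      return 1
    fi
  done
  mkdir -p "$(dirname "$out")" || return 1
  cat > "$out" <<EOF
{
  "aimet_path": ".",
  "qnn_sdk_path": "${QNN_SDK_ROOT}",
  "export_dir": "${WORKING_DIR}",
  "target_dir": "${TARGET_DIR}",
  "device_id": "${TARGET_DEVICE_ID}",
  "adb_executable": "adb"
}
EOF
  # json.tool exits non-zero if the file is not valid JSON.
  python3 -m json.tool "$out" > /dev/null
}
```

With adb available and the variables from this part set, you could run `export TARGET_DEVICE_ID="$(adb devices | first_adb_device)"` followed by `write_notebook_config "$TUTORIAL_DIR/model/config/notebookconfig.json"`. If more than one device is connected, first_adb_device simply picks the first ready one, so set TARGET_DEVICE_ID by hand in that case.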

Part 6: Modifying the Genie config.json

Next, we need to configure Genie in order for it to work with our serialized binaries. In this part of the tutorial, we will create and update a Genie JSON configuration file.

Note

If you are using a model other than Llama 3, you will need to use Genie’s docs to understand what each field in the config file is for in order to modify it to your situation. Keep in mind that you may need to copy files from your model (ex. tokenizer.json) into the config folder.

  1. Set a variable for the path to the Genie config folder by running:

    export GENIE_CONFIG_FOLDER="$TUTORIAL_DIR/model/config/genie"
    
  2. Set a variable for the path to our Genie configuration JSON file by running:

    export GENIE_CONFIG_FILE="$GENIE_CONFIG_FOLDER/config.json"
    
  3. Create the Genie config file by running:

    touch $GENIE_CONFIG_FILE
    
  4. Open the newly created config file, for example using vim:

    vim $GENIE_CONFIG_FILE
    
  5. Copy the following template config file into your Genie config file:

    Note

    If you want to configure Genie differently, you can find other example configurations in the Genie docs’ Library section (by clicking into the features you are interested in seeing configurations for).

    {
      "dialog": {
        "version": 1,
        "type": "basic",
        "context": {
          "version": 1,
          "size": 4096,
          "n-vocab": 128256,
          "bos-token": 128000,
          "eos-token": 128001,
          "eot-token": 128009
        },
        "sampler": {
          "version": 1,
          "seed": 42,
          "temp": 0.8,
          "top-k": 40,
          "top-p": 0.95
        },
        "tokenizer": {
          "version": 1,
          "path": "<your/path/to/tokenizer_file.json>"
        },
        "engine": {
          "version": 1,
          "n-threads": 3,
          "backend": {
            "version": 1,
            "type": "QnnHtp",
            "QnnHtp": {
              "version": 1,
              "use-mmap": true,
              "spill-fill-bufsize": 0,
              "mmap-budget": 0,
              "poll": true,
              "pos-id-dim": 64,
              "cpu-mask": "0xe0",
              "kv-dim": 128,
              "rope-theta": 500000
            }
          },
          "extensions": "htp_backend_ext_config.json"
        },
        "model": {
          "version": 1,
          "type": "binary",
          "binary": {
            "version": 1,
            "ctx-bins": [
              "<your-serialized.bin-files-listed-one-after-the-other>"
            ]
          }
        }
      }
    }
    
  6. Update the "tokenizer" > "path" field with the absolute path to your tokenizer file.

    Note

    You can search for a tokenizer.json file by running find . -type f -name 'tokenizer.json'. Keep in mind that different models may have slightly different file names for their tokenizer info file.

    For example, replace "<your/path/to/tokenizer_file.json>" with "/tmp/usr/model/tokenizer.json".

  7. Look up the names of your context binary files by running:

    ls $BINARIES_PATH
    

    You should see a list of context binary file names such as:

    weight_sharing_model_1_of_5.serialized.bin
    weight_sharing_model_2_of_5.serialized.bin
    weight_sharing_model_3_of_5.serialized.bin
    weight_sharing_model_4_of_5.serialized.bin
    weight_sharing_model_5_of_5.serialized.bin
    
  8. Update "model" > "binary" > "ctx-bins" with the name of each context binary, as a list of strings.

    1. For example:

    "model": {
        "version": 1,
        "type": "binary",
        "binary": {
            "version": 1,
            "ctx-bins": [
                "weight_sharing_model_1_of_5.serialized.bin",
                "weight_sharing_model_2_of_5.serialized.bin",
                "weight_sharing_model_3_of_5.serialized.bin",
                "weight_sharing_model_4_of_5.serialized.bin",
                "weight_sharing_model_5_of_5.serialized.bin"
            ]
        }
    }
    
  9. We will also need to update the Step-3 Jupyter Notebook so that its genie-t2t-run command points to our Genie configuration. We will explain how to do that later, once we have opened the Step-3 Notebook.
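Typing the ctx-bins list by hand in step 8 gets error-prone as the number of binaries grows. Assuming GNU find and sort (for `-printf` and `-V`), the sketch below prints a "ctx-bins" entry for every *.serialized.bin file in a directory. It uses version sort so that, for example, weight_sharing_model_10_of_12 comes after weight_sharing_model_9_of_12 rather than after 1_of_12. The helper name is hypothetical, and you should still check that the resulting order matches the order your Step-2 notebook intended.

```shell
# Hypothetical helper: print a JSON "ctx-bins" entry listing every
# *.serialized.bin file in a directory, in natural (version) order.
ctx_bins_json() {
  local dir="$1" first=1 name
  printf '"ctx-bins": [\n'
  while IFS= read -r name; do
    # Separate entries with a comma, but not before the first one.
    if [ "$first" -eq 1 ]; then first=0; else printf ',\n'; fi
    printf '  "%s"' "$name"
  done < <(find "$dir" -maxdepth 1 -type f -name '*.serialized.bin' -printf '%f\n' | sort -V)
  printf '\n]\n'
}
```

Running `ctx_bins_json "$BINARIES_PATH"` prints an entry you can paste over the "ctx-bins" entry in $GENIE_CONFIG_FILE.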

Part 7: Create and activate the Python virtual environment

  1. Navigate to the Step-2 folder by running:

    cd $TUTORIAL_DIR/model/Step-2/host_linux/
    
  2. Create a Python virtual environment by running:

    python3 -m venv venv
    

    Warning

    If you receive a Permission Denied error, run ls -l and check whether venv already exists. If it already exists, you can skip to the next step.

  3. Activate the virtual environment by running:

    . venv/bin/activate
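The warning in step 2 covers the case where venv already exists. A small sketch that only creates the environment when it is missing and then activates it in the current shell could look like this. The helper name is hypothetical. On Ubuntu, python3 -m venv can also fail until the matching python3.X-venv package (for example python3.10-venv) is installed.

```shell
# Hypothetical helper: create a virtual environment only if it does not
# exist yet, then activate it in the current shell. Defaults to ./venv.
ensure_venv() {
  local dir="${1:-venv}"
  if [ ! -f "$dir/bin/activate" ]; then
    python3 -m venv "$dir" || return 1
  fi
  # shellcheck disable=SC1091
  . "$dir/bin/activate"
}
```

Call it from the Step-2 folder as `ensure_venv venv`. Because it is a shell function rather than a separate script, the activation stays in effect in your terminal.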

Part 8: Open the notebook

  1. Set the path to the Step-3 Notebook folder by running:

    export STEP_3_FOLDER="$TUTORIAL_DIR/model/Step-3/host_linux_target_android_without_native/"
    

    Note

    If you are using other tutorial notebooks, you may need to specify a different folder under Step-3, because that folder name changes from tutorial to tutorial. In this case, the provided notebook expects to run on a Linux host, with an Android target that is not using native Linux features.

  2. Navigate to the Step 3 Notebook by running:

    cd $STEP_3_FOLDER
    
  3. Run the following commands to install all dependencies:

    python3 -m pip install --upgrade pip
    pip install -r ../../requirements.txt
    pip install -r requirements.txt
    
  4. Open the Jupyter Notebook by running:

    jupyter notebook --ip='*' --no-browser --allow-root &
    

    If this succeeds, the terminal prints several URLs you can use to open and interact with the Jupyter Notebook.

    Jupyter Notebook launch output showing access URLs for remote interaction.
  5. Pick one of the options to see the Jupyter Notebook in the browser.
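Because the server is started with --no-browser, it is often running on a remote host. Assuming you reach that host over SSH and Jupyter uses its default port 8888 (check the launch output, since Jupyter picks another port if 8888 is busy), the sketch below prints an SSH port-forwarding command. After running that command on your local machine, you can open the http://localhost:8888/... URL from the launch output in your local browser. The helper name is hypothetical.

```shell
# Hypothetical helper: print an SSH local port-forwarding command for a remote Jupyter server.
#   $1 = user@host of the machine running Jupyter
#   $2 = port shown in the Jupyter launch output (default 8888)
jupyter_tunnel_cmd() {
  local remote="$1" port="${2:-8888}"
  # -N: forward ports only, without running a remote command
  # -L: forward local port $port to localhost:$port on the remote host
  printf 'ssh -N -L %s:localhost:%s %s\n' "$port" "$port" "$remote"
}
```

For example, `jupyter_tunnel_cmd me@devbox` prints `ssh -N -L 8888:localhost:8888 me@devbox`, where me@devbox stands in for your own user and host.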

Part 9: Run the notebook

  1. Once you’ve opened the URL in your browser, click on qnn_model_execute.ipynb.

  2. You should see a Notebook that looks like this.

    The Genie execution notebook interface showing the qnn_model_execute.ipynb contents.
  3. Find where the Genie config file is specified (as part of the genie-t2t-run command) and set it to the value of your $GENIE_CONFIG_FILE.

    1. You can run echo $GENIE_CONFIG_FILE in your terminal to print the value you set.

  4. In the top toolbar, click Run > Run All Cells.

  5. Wait until the Notebook completes running before you click on anything.

    Note

    The last two cells can take a long time to run and may look like they have failed while they are still executing in the background.

    Note

    Click the progress wheel in the top right corner of the toolbar (beside Python 3 (ipykernel)) to see which cells are still running.

  6. When the Notebook is completed, scroll to the results of the final llama3.execute.run() command. You should see [Prompt] and [Begin] lines near the bottom of the results.

    Terminal output after llama3.execute.run() with [Prompt] and [Begin] markers.
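If you prefer to locate the Genie config path from the terminal, or to run the notebook without keeping a browser tab open, the following sketch may help. It assumes the notebook invokes genie-t2t-run as described in Part 6, and that nbconvert is installed (it normally comes with Jupyter Notebook). Both helper names are hypothetical.

```shell
# Hypothetical helper: show the notebook lines (with line numbers) that invoke
# genie-t2t-run, which is where the config path must point at $GENIE_CONFIG_FILE.
find_genie_cmd() {
  grep -n 'genie-t2t-run' "$1"
}

# Hypothetical helper: execute every cell without a browser using nbconvert.
# A timeout of -1 disables the per-cell timeout, since the last cells can run for a long time.
run_notebook_headless() {
  jupyter nbconvert --to notebook --execute \
    --ExecutePreprocessor.timeout=-1 \
    --output executed_qnn_model_execute.ipynb \
    "$1"
}
```

For example, `find_genie_cmd "$STEP_3_FOLDER/qnn_model_execute.ipynb"` prints the matching lines. `run_notebook_headless "$STEP_3_FOLDER/qnn_model_execute.ipynb"` writes an executed copy named executed_qnn_model_execute.ipynb next to the original, where you can look for the [Prompt] and [Begin] lines.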

Summary

You have now successfully used Genie from start to finish! In order to apply this to your own model and situation going forward, you can use this as a starting point and tweak the notebooks and configuration files to your situation.

  1. To use a different model, consider starting from different notebooks by searching for “Generative” in QPM3 and expanding all results. That will show all notebook tutorials that have been released to date.

  2. To change how your model is quantized or fine-tuned (ex. with LoRA) using AIMET, modify the Step-1 Notebook.

  3. To change how you prepare the serialized binaries (ex. to split them into smaller files that are easier to load into memory), configure and modify the Step-2 Notebook.

  4. To change how Genie is used on your target device, configure and modify the Step-3 Notebook.

You can also leverage Genie’s profiling and benchmarking capabilities for further optimization of your model’s performance.

If you have any questions, you can ask in the Developer Discord!