Introduction

Qualcomm® Genie is a software library that simplifies the deployment of generative AI (Gen AI) pipelines at the connected edge. Genie runs generative transformer models on the Qualcomm AI Runtime stack, for example on Qualcomm AI Engine Direct (QNN). Support is currently limited to large language models. Genie queries can execute on the QNN HTP backend, the QNN GPU backend, and the Snapdragon CPU through the Genie-provided QNN GenAiTransformer backend.

Note

Qualcomm® Gen AI Inference Extensions is also referred to as Genie in the source code and documentation.

Start with the tutorial section to learn how to use Genie.

After you have learned the general workflow, you can:

  1. Configure Genie for different types of AI workflows (e.g., Dialog, Pipeline).

  2. Implement an application that runs inference on your target device using Genie’s API (similar to genie-t2t-run).

  3. Optimize your model’s performance and accuracy with various techniques.

For reference documentation on these topics, see the Library section. It contains example configuration files and details on using the Genie API in your application.