Genie Embedding JSON configuration string

The following sections describe the format of the JSON configuration string that is supplied to GenieEmbeddingConfig_createFromJson. This JSON configuration can also be supplied to the genie-t2e-run tool.

Note

Please refer to the example configs contained in the SDK at ${SDK_ROOT}/examples/Genie/configs/.

General configuration schema

The following is the schema of the JSON configuration format accepted by GenieEmbeddingConfig_createFromJson. Note that dependencies are not specified in the schema; they are discussed in the per-backend sections below.

{
  "embedding" : {
    "type": "object",
    "properties": {
      "version" : {"type": "integer"},
      "context" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "ctx-size": {"type": "integer"},
          "n-vocab": {"type": "integer"},
          "embed-size": {"type": "integer"},
          "pad-token": {"type": "integer"}
        }
      },
      "prompt" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "prompt-template" : {"type": "array", "items": {"type": "string"}}
        }
      },
      "tokenizer" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "path" : {"type": "string"}
        }
      },
      "truncate-input" : {"type" : "boolean"},
      "engine" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "n-threads" : {"type": "integer"},
          "backend" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string","enum" : ["QnnHtp", "QnnGenAiTransformer"]},
              "QnnHtp" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "spill-fill-bufsize" : {"type": "integer"},
                  "data-alignment-size" : {"type": "integer"},
                  "use-mmap" : {"type": "boolean"},
                  "allow-async-init" : {"type": "boolean"},
                  "pooled-output" : {"type": "boolean"},
                  "disable-kv-cache" : {"type": "boolean"}
                }
              },
              "QnnGenAiTransformer" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "n-layer" : {"type": "integer"},
                  "n-embd" : {"type": "integer"},
                  "n-heads" : {"type": "integer"}
                }
              },
              "extensions" : {"type": "string"}
            }
          },
          "model" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string","enum":["binary", "library"]},
              "binary" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                }
              },
              "library" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "model-bin" : {"type": "string"}
                }
              }
            }
          }
        }
      }
    }
  }
}
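The schema above can be exercised with a lightweight validator before the string is handed to the SDK. The sketch below is illustrative only, assumes nothing beyond the Python standard library, and covers just the JSON Schema keywords the schema actually uses (type, properties, items, enum); it is not a substitute for the SDK's own validation.

```python
import json

# Map the schema's "type" names to Python types.
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}

def conforms(instance, schema):
    """Recursively check an instance against the schema subset used
    above: "type", "properties", "items", and "enum". Unknown schema
    keys are ignored; absent properties are treated as optional."""
    expected = schema.get("type")
    if expected is not None:
        if not isinstance(instance, TYPES[expected]):
            return False
        # bool is a subclass of int in Python; keep "integer" strict.
        if expected == "integer" and isinstance(instance, bool):
            return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if expected == "object":
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not conforms(instance[key], subschema):
                return False
    if expected == "array" and "items" in schema:
        return all(conforms(item, schema["items"]) for item in instance)
    return True

# Check a fragment against the "context" subschema from above.
context_schema = json.loads("""
{
  "type": "object",
  "properties": {
    "version": {"type": "integer"},
    "ctx-size": {"type": "integer"},
    "n-vocab": {"type": "integer"},
    "embed-size": {"type": "integer"},
    "pad-token": {"type": "integer"}
  }
}
""")

good = {"version": 1, "ctx-size": 512, "n-vocab": 30522}
bad = {"version": 1, "ctx-size": "512"}
print(conforms(good, context_schema))  # True
print(conforms(bad, context_schema))   # False
```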

Option | Applicability | Description
------ | ------------- | -----------
embedding::version | all backends | Version of the embedding object that is supported by the APIs. (1)
embedding::truncate-input | all backends | Allow truncation of the input when it exceeds the context length.
context::version | all backends | Version of the context object that is supported by the APIs. (1)
context::ctx-size | all backends | Context length. Maximum number of tokens to process.
context::n-vocab | all backends | Model vocabulary size.
context::embed-size | all backends | Embedding length. Length of the embedding vector for each token.
context::pad-token | all backends | Token ID of the pad token.
prompt::version | all backends | Version of the prompt object that is supported by the APIs. (1)
prompt::prompt-template | all backends | Prefix and suffix strings that are added to each prompt.
tokenizer::version | all backends | Version of the tokenizer object that is supported by the APIs. (1)
tokenizer::path | all backends | Path to the tokenizer file.
engine::version | all backends | Version of the engine object that is supported by the APIs. (1)
engine::n-threads | all backends | Number of threads to use for KV-cache updates.
backend::version | all backends | Version of the backend object that is supported by the APIs. (1)
backend::type | all backends | Backend type: "QnnHtp" for the QNN HTP backend, "QnnGenAiTransformer" for the QNN GenAiTransformer backend.
backend::extensions | QNN HTP | Path to the backend extensions configuration file.
QnnHtp::version | QNN HTP | Version of the QnnHtp object that is supported by the APIs. (1)
QnnHtp::spill-fill-bufsize | QNN HTP | Buffer size to pre-allocate for QNN HTP spill-fill. This field depends on the HTP VTCM memory size and should be set greater than the spill-fill required by each context binary in the model. Consult the QNN HTP backend documentation in the QAIRT SDK for more details.
QnnHtp::use-mmap | QNN HTP | Memory-map the context binary files. Typically should be enabled.
QnnHtp::data-alignment-size | QNN HTP | Data is aligned by rounding the size up to the nearest multiple of this value. Typically should be zero.
QnnHtp::allow-async-init | QNN HTP | Allow context binaries to be initialized asynchronously if the backend supports it.
QnnHtp::pooled-output | QNN HTP | Whether to return a pooled embedding or per-token embeddings as the generation result.
QnnHtp::disable-kv-cache | QNN HTP | Disable the KV cache manager, for models that do not have a KV cache.
QnnGenAiTransformer::version | QNN GenAiTransformer | Version of the QnnGenAiTransformer object that is supported by the APIs. (1)
QnnGenAiTransformer::n-layer | QNN GenAiTransformer | Number of decoder layers in the model.
QnnGenAiTransformer::n-embd | QNN GenAiTransformer | Size of the embedding vector for each token.
QnnGenAiTransformer::n-heads | QNN GenAiTransformer | Number of attention heads in the model.
model::version | all backends | Version of the model object that is supported by the APIs. (1)
model::type | all backends | Model object type: "binary" for QNN HTP, "library" for QNN GenAiTransformer.
binary::version | QNN HTP | Version of the binary object that is supported by the APIs. (1)
binary::ctx-bins | QNN HTP | List of serialized model (context binary) files.
library::version | QNN GenAiTransformer | Version of the library object that is supported by the APIs. (1)
library::model-bin | QNN GenAiTransformer | Path to the model.bin file.
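prompt::prompt-template supplies a prefix and a suffix that wrap each prompt, e.g. "[CLS]" and "[SEP]" for BERT-style models. As an illustration only (the exact concatenation behavior is an assumption here, not taken from the SDK), the wrapping can be pictured as:

```python
def wrap_prompt(text, prompt_template):
    # prompt_template is assumed to be [prefix, suffix], matching the
    # two-element "prompt-template" array in the configs below.
    prefix, suffix = prompt_template
    return prefix + text + suffix

print(wrap_prompt("hello world", ["[CLS]", "[SEP]"]))
# [CLS]hello world[SEP]
```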

QNN GenAiTransformer backend configuration example

The following is an example configuration for the QNN GenAiTransformer backend.

{
  "embedding" : {
    "version" : 1,
    "context": {
      "version": 1,
      "n-vocab": 30522,
      "ctx-size": 512,
      "embed-size" : 1024,
      "pad-token" : 0
    },
    "prompt": {
      "version" : 1,
      "prompt-template": ["[CLS]","[SEP]"]
    },
    "tokenizer" : {
      "version" : 1,
      "path" : "test_path"
    },
    "truncate-input" : true,
    "engine": {
      "version": 1,
      "n-threads" : 10,
      "backend" : {
        "version" : 1,
        "type" : "QnnGenAiTransformer",
        "QnnGenAiTransformer" : {
          "version" : 1,
          "n-layer": 24,
          "n-embd": 1024,
          "n-heads": 16
        }
      },
      "model" : {
        "version" : 1,
        "type" : "library",
        "library" : {
          "version" : 1,
          "model-bin" : "path_to_model_binary_file"
        }
      }
    }
  }
}
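Because backend::type and model::type each name a sibling object that must also be present ("QnnGenAiTransformer" and "library" in the example above), a quick cross-field check can catch a mismatched selection before the string reaches the SDK. The helper below is hypothetical, not part of the Genie API:

```python
import json

def check_selection(config_str):
    """Verify that the sub-object named by backend::type and by
    model::type actually exists in the parsed config. Raises
    ValueError on a mismatch. Illustrative helper only."""
    engine = json.loads(config_str)["embedding"]["engine"]
    for section in ("backend", "model"):
        obj = engine[section]
        selected = obj["type"]
        if selected not in obj:
            raise ValueError(
                f"{section}::type is '{selected}' but no "
                f"'{selected}' object is present")
    return True

config = """
{
  "embedding": {
    "version": 1,
    "engine": {
      "version": 1,
      "backend": {
        "version": 1,
        "type": "QnnGenAiTransformer",
        "QnnGenAiTransformer": {"version": 1, "n-layer": 24}
      },
      "model": {
        "version": 1,
        "type": "library",
        "library": {"version": 1, "model-bin": "path_to_model_binary_file"}
      }
    }
  }
}
"""
print(check_selection(config))  # True
```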

QNN HTP backend configuration example

The following is an example configuration for the QNN HTP backend.

{
  "embedding" : {
    "version" : 1,
    "context": {
      "version": 1,
      "n-vocab": 30522,
      "ctx-size": 512,
      "embed-size" : 1024,
      "pad-token" : 0
    },
    "prompt": {
      "version" : 1,
      "prompt-template": ["[CLS]","[SEP]"]
    },
    "tokenizer" : {
      "version" : 1,
      "path" : "test_path"
    },
    "truncate-input" : true,
    "engine" : {
      "version" : 1,
      "backend" : {
        "version" : 1,
        "type" : "QnnHtp",
        "QnnHtp" : {
          "version" : 1,
          "spill-fill-bufsize" : 0,
          "use-mmap" : true,
          "pooled-output" : true,
          "allow-async-init": false,
          "disable-kv-cache": true
        },
        "extensions" : "htp_backend_ext_config.json"
      },
      "model" : {
        "version" : 1,
        "type" : "binary",
        "binary" : {
          "version" : 1,
          "ctx-bins" : [
            "file_1_of_1.bin"
          ]
        }
      }
    }
  }
}
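For an HTP config such as the one above, the serialized context binaries listed under binary::ctx-bins must be readable at load time. A small pre-flight check (again a hypothetical helper, not part of the Genie API; the SDK performs its own validation when the config is loaded) could look like:

```python
import json
import os

def missing_ctx_bins(config_str, base_dir="."):
    """Return the ctx-bins entries that are not present on disk,
    resolved relative to base_dir. Returns an empty list for
    non-"binary" model types. Illustrative pre-flight check only."""
    model = json.loads(config_str)["embedding"]["engine"]["model"]
    if model["type"] != "binary":
        return []
    bins = model["binary"]["ctx-bins"]
    return [b for b in bins
            if not os.path.isfile(os.path.join(base_dir, b))]

config = """
{
  "embedding": {
    "engine": {
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {"version": 1, "ctx-bins": ["file_1_of_1.bin"]}
      }
    }
  }
}
"""
# Reports ['file_1_of_1.bin'] when that file is absent from base_dir.
print(missing_ctx_bins(config))
```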