# Genie Embedding JSON configuration string
The following sections describe the format of the JSON configuration string that is supplied to `GenieEmbeddingConfig_createFromJson`. The same JSON configuration can also be supplied to the `genie-t2e-run` tool.
> **Note:** Refer to the example configs contained in the SDK at `${SDK_ROOT}/examples/Genie/configs/`.
## General configuration schema
The following is the schema of the JSON configuration format provided to `GenieEmbeddingConfig_createFromJson`. Dependencies between fields are not expressed in the schema itself; they are discussed in the per-backend sections below.
```json
{
  "embedding": {
    "type": "object",
    "properties": {
      "version": {"type": "integer"},
      "context": {
        "type": "object",
        "properties": {
          "version": {"type": "integer"},
          "ctx-size": {"type": "integer"},
          "n-vocab": {"type": "integer"},
          "embed-size": {"type": "integer"},
          "pad-token": {"type": "integer"}
        }
      },
      "prompt": {
        "type": "object",
        "properties": {
          "version": {"type": "integer"},
          "prompt-template": {"type": "array", "items": {"type": "string"}}
        }
      },
      "tokenizer": {
        "type": "object",
        "properties": {
          "version": {"type": "integer"},
          "path": {"type": "string"}
        }
      },
      "truncate-input": {"type": "boolean"},
      "engine": {
        "type": "object",
        "properties": {
          "version": {"type": "integer"},
          "n-threads": {"type": "integer"},
          "backend": {
            "type": "object",
            "properties": {
              "version": {"type": "integer"},
              "type": {"type": "string", "enum": ["QnnHtp", "QnnGenAiTransformer"]},
              "QnnHtp": {
                "type": "object",
                "properties": {
                  "version": {"type": "integer"},
                  "spill-fill-bufsize": {"type": "integer"},
                  "data-alignment-size": {"type": "integer"},
                  "use-mmap": {"type": "boolean"},
                  "allow-async-init": {"type": "boolean"},
                  "pooled-output": {"type": "boolean"},
                  "disable-kv-cache": {"type": "boolean"}
                }
              },
              "QnnGenAiTransformer": {
                "type": "object",
                "properties": {
                  "version": {"type": "integer"},
                  "n-layer": {"type": "integer"},
                  "n-embd": {"type": "integer"},
                  "n-heads": {"type": "integer"}
                }
              },
              "extensions": {"type": "string"}
            }
          },
          "model": {
            "type": "object",
            "properties": {
              "version": {"type": "integer"},
              "type": {"type": "string", "enum": ["binary", "library"]},
              "binary": {
                "type": "object",
                "properties": {
                  "version": {"type": "integer"},
                  "ctx-bins": {"type": "array", "items": {"type": "string"}}
                }
              },
              "library": {
                "type": "object",
                "properties": {
                  "version": {"type": "integer"},
                  "model-bin": {"type": "string"}
                }
              }
            }
          }
        }
      }
    }
  }
}
```
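In practice, a configuration matching this schema is usually assembled programmatically and serialized to the JSON string expected by `GenieEmbeddingConfig_createFromJson`. A minimal sketch (the field values below are placeholders, not recommended settings):

```python
import json

# Sketch: assemble a minimal embedding config as a Python dict and serialize
# it to a JSON string. Values here are illustrative placeholders only.
config = {
    "embedding": {
        "version": 1,
        "context": {"version": 1, "ctx-size": 512, "n-vocab": 30522,
                    "embed-size": 1024, "pad-token": 0},
        "prompt": {"version": 1, "prompt-template": ["[CLS]", "[SEP]"]},
        "tokenizer": {"version": 1, "path": "tokenizer.json"},
        "truncate-input": True,
        "engine": {
            "version": 1,
            "backend": {"version": 1, "type": "QnnHtp",
                        "QnnHtp": {"version": 1, "use-mmap": True}},
            "model": {"version": 1, "type": "binary",
                      "binary": {"version": 1, "ctx-bins": ["model.bin"]}},
        },
    }
}

# This string is what would be handed to GenieEmbeddingConfig_createFromJson.
config_json = json.dumps(config, indent=2)
```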
| Option | Applicability | Description |
|---|---|---|
| `embedding::version` | all backends | Version of the embedding object supported by the APIs. (1) |
| `embedding::truncate-input` | all backends | Whether to truncate the input when it exceeds the context length. |
| `context::version` | all backends | Version of the context object supported by the APIs. (1) |
| `context::ctx-size` | all backends | Context length: the maximum number of tokens to process. |
| `context::n-vocab` | all backends | Model vocabulary size. |
| `context::embed-size` | all backends | Embedding length: the length of the embedding vector for each token. |
| `context::pad-token` | all backends | Token ID of the pad token. |
| `prompt::version` | all backends | Version of the prompt object supported by the APIs. (1) |
| `prompt::prompt-template` | all backends | Prefix and suffix strings that are added to each prompt. |
| `tokenizer::version` | all backends | Version of the tokenizer object supported by the APIs. (1) |
| `tokenizer::path` | all backends | Path to the tokenizer file. |
| `engine::version` | all backends | Version of the engine object supported by the APIs. (1) |
| `engine::n-threads` | all backends | Number of threads to use for KV-cache updates. |
| `backend::version` | all backends | Version of the backend object supported by the APIs. (1) |
| `backend::type` | all backends | Engine type: "QnnHtp" for the QNN HTP backend, "QnnGenAiTransformer" for the QNN GenAiTransformer backend. |
| `backend::extensions` | QNN HTP | Path to the backend extensions configuration file. |
| `QnnHtp::version` | QNN HTP | Version of the QnnHtp object supported by the APIs. (1) |
| `QnnHtp::spill-fill-bufsize` | QNN HTP | Buffer size to pre-allocate for QNN HTP spill-fill. This value depends on the HTP VTCM memory size and should be set greater than the spill-fill required by each context binary in the model. Consult the QNN HTP backend documentation in the QAIRT SDK for details. |
| `QnnHtp::use-mmap` | QNN HTP | Memory-map the context binary files. Typically should be enabled. |
| `QnnHtp::data-alignment-size` | QNN HTP | Data is aligned by rounding its size up to the nearest multiple of this value. Typically should be zero. |
| `QnnHtp::allow-async-init` | QNN HTP | Allow context binaries to be initialized asynchronously if the backend supports it. |
| `QnnHtp::pooled-output` | QNN HTP | Whether to return a pooled embedding or per-token embeddings as the generation result. |
| `QnnHtp::disable-kv-cache` | QNN HTP | Disables the KV-cache manager, for models that do not have a KV cache. |
| `QnnGenAiTransformer::version` | QNN GenAiTransformer | Version of the QnnGenAiTransformer object supported by the APIs. (1) |
| `QnnGenAiTransformer::n-layer` | QNN GenAiTransformer | Number of decoder layers in the model. |
| `QnnGenAiTransformer::n-embd` | QNN GenAiTransformer | Size of the embedding vector for each token. |
| `QnnGenAiTransformer::n-heads` | QNN GenAiTransformer | Number of attention heads in the model. |
| `model::version` | all backends | Version of the model object supported by the APIs. (1) |
| `model::type` | all backends | Model object type: "binary" for QNN HTP, "library" for QNN GenAiTransformer. |
| `binary::version` | QNN HTP | Version of the binary object supported by the APIs. (1) |
| `binary::ctx-bins` | QNN HTP | List of serialized model files (context binaries). |
| `library::version` | QNN GenAiTransformer | Version of the library object supported by the APIs. (1) |
| `library::model-bin` | QNN GenAiTransformer | Path to the model.bin file. |
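To make the `prompt-template` semantics concrete: per the description above, the first entry is a prefix and the second a suffix wrapped around each prompt. A small sketch of that behavior (a hypothetical helper for illustration; Genie performs this step internally):

```python
def apply_prompt_template(prompt: str, template: list[str]) -> str:
    """Wrap a prompt with the prefix/suffix pair from prompt-template.

    Hypothetical helper, shown only to illustrate the documented semantics.
    """
    prefix, suffix = template[0], template[1]
    return prefix + prompt + suffix

# With the BERT-style template used in the examples below:
wrapped = apply_prompt_template("hello world", ["[CLS]", "[SEP]"])
# wrapped == "[CLS]hello world[SEP]"
```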
## QNN GenAiTransformer backend configuration example

The following is an example configuration for the QNN GenAiTransformer backend.
```json
{
  "embedding": {
    "version": 1,
    "context": {
      "version": 1,
      "n-vocab": 30522,
      "ctx-size": 512,
      "embed-size": 1024,
      "pad-token": 0
    },
    "prompt": {
      "version": 1,
      "prompt-template": ["[CLS]", "[SEP]"]
    },
    "tokenizer": {
      "version": 1,
      "path": "test_path"
    },
    "truncate-input": true,
    "engine": {
      "version": 1,
      "n-threads": 10,
      "backend": {
        "version": 1,
        "type": "QnnGenAiTransformer",
        "QnnGenAiTransformer": {
          "version": 1,
          "n-layer": 24,
          "n-embd": 1024,
          "n-heads": 16
        }
      },
      "model": {
        "version": 1,
        "type": "library",
        "library": {
          "version": 1,
          "model-bin": "path_to_model_binary_file"
        }
      }
    }
  }
}
```
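The option table describes both `context::embed-size` and `QnnGenAiTransformer::n-embd` as the embedding vector length per token, so in a config like the one above the two values should agree (both are 1024 here). A sketch of that cross-check (a hypothetical helper, not part of the Genie API):

```python
import json

def embed_sizes_agree(config_json: str) -> bool:
    # Hypothetical cross-check, not part of the Genie API: both fields
    # describe the per-token embedding vector length, so they should match.
    emb = json.loads(config_json)["embedding"]
    backend = emb["engine"]["backend"]
    return emb["context"]["embed-size"] == backend["QnnGenAiTransformer"]["n-embd"]
```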
## QNN HTP backend configuration example

The following is an example configuration for the QNN HTP backend.
```json
{
  "embedding": {
    "version": 1,
    "context": {
      "version": 1,
      "n-vocab": 30522,
      "ctx-size": 512,
      "embed-size": 1024,
      "pad-token": 0
    },
    "prompt": {
      "version": 1,
      "prompt-template": ["[CLS]", "[SEP]"]
    },
    "tokenizer": {
      "version": 1,
      "path": "test_path"
    },
    "truncate-input": true,
    "engine": {
      "version": 1,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "spill-fill-bufsize": 0,
          "use-mmap": true,
          "pooled-output": true,
          "allow-async-init": false,
          "disable-kv-cache": true
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": [
            "file_1_of_1.bin"
          ]
        }
      }
    }
  }
}
```
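Because `backend::type` selects which backend sub-object is read, and `model::type` must pair with the backend ("binary" for QNN HTP, "library" for QNN GenAiTransformer, per the option table), a quick pre-flight check on the JSON string can catch mismatches before handing it to Genie. A sketch under those pairing rules (the helper itself is hypothetical, not part of the Genie API):

```python
import json

# Pairing rule taken from the option table above.
EXPECTED_MODEL_TYPE = {"QnnHtp": "binary", "QnnGenAiTransformer": "library"}

def check_config(config_json: str) -> None:
    """Hypothetical pre-flight check: the selected backend type must have a
    matching sub-object, and the model type must pair with the backend."""
    engine = json.loads(config_json)["embedding"]["engine"]
    backend_type = engine["backend"]["type"]
    if backend_type not in engine["backend"]:
        raise ValueError(f"missing '{backend_type}' sub-object in backend")
    model_type = engine["model"]["type"]
    if EXPECTED_MODEL_TYPE[backend_type] != model_type:
        raise ValueError(
            f"backend '{backend_type}' expects model type "
            f"'{EXPECTED_MODEL_TYPE[backend_type]}', got '{model_type}'")
```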