Genie Node JSON configuration string
The following sections contain information that pertains to the format of the JSON configuration string supplied to GenieNodeConfig_createFromJson. This JSON configuration can also be supplied to the genie-app tool.
Note
Please refer to the example configs contained in the SDK at ${SDK_ROOT}/examples/Genie/configs/pipeline.
General configuration schema
The following provides the schema of the JSON configuration format that is supplied to GenieNodeConfig_createFromJson. Note that the JSON configuration follows the dialog schema for text generation and the embedding schema for image and text encoding.
Text Generator schema:
```json
{
  "text-generator" : {
    "type": "object",
    "properties": {
      "version" : {"type": "integer"},
      "type" : {"type": "string", "enum": ["basic"]},
      "context" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "size": {"type": "integer"},
          "n-vocab": {"type": "integer"},
          "bos-token": {"type": "integer"},
          "eos-token": {"type": "integer"}
        }
      },
      "sampler" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "seed" : {"type": "integer"},
          "temp" : {"type": "float"},
          "top-k" : {"type": "integer"},
          "top-p" : {"type": "float"},
          "greedy" : {"type": "boolean"}
        }
      },
      "tokenizer" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "path" : {"type": "string"}
        }
      },
      "engine" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "n-threads" : {"type": "integer"},
          "backend" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string", "enum": ["QnnHtp", "QnnGenAiTransformer"]},
              "QnnHtp" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "spill-fill-bufsize" : {"type": "integer"},
                  "data-alignment-size" : {"type": "integer"},
                  "use-mmap" : {"type": "boolean"},
                  "mmap-budget" : {"type": "integer"},
                  "poll" : {"type": "boolean"},
                  "pos-id-dim" : {"type": "integer"},
                  "cpu-mask" : {"type": "string"},
                  "kv-dim" : {"type": "integer"},
                  "allow-async-init" : {"type": "boolean"},
                  "rope-theta" : {"type": "double"}
                }
              },
              "QnnGenAiTransformer" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "n-logits" : {"type": "integer"},
                  "n-layer" : {"type": "integer"},
                  "n-embd" : {"type": "integer"},
                  "n-heads" : {"type": "integer"},
                  "kv-quantization" : {"type": "boolean"}
                }
              },
              "extensions" : {"type": "string"}
            }
          },
          "model" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string", "enum": ["binary", "library"]},
              "binary" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                }
              },
              "library" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "model-bin" : {"type": "string"}
                }
              }
            }
          }
        }
      }
    }
  }
}
```
| Option | Applicability | Description |
|---|---|---|
| text-generator::version | all backends | Version of the node object supported by the APIs. (1) |
| text-generator::type | all backends | Type of node supported by the APIs (basic). |
| text-generator::stop-sequence | all backends | Stops the query when one of a set of sequences is detected in the response. Passed as an array of strings. |
| text-generator::max-num-tokens | all backends | Stops the query when the maximum number of tokens has been generated in the response. |
| context::version | all backends | Version of the context object supported by the APIs. (1) |
| context::size | all backends | Context length. Maximum number of tokens to store. |
| context::n-vocab | all backends | Model vocabulary size. |
| context::bos-token | all backends | Beginning-of-sentence token. |
| context::eos-token | all backends | End-of-sentence token. Passed as an integer or an array of integers. |
| context::eot-token | all backends | End-of-turn token. |
| sampler::version | all backends | Version of the sampler object supported by the APIs. (1) |
| sampler::type | all backends | Type of sampler to use. Supported options: basic, custom. |
| sampler::callback-name | all backends | Name of the callback function to use for sampling. |
| sampler::seed | all backends | Seed for the sampling random number generator. |
| sampler::temp | all backends | Sampling temperature. |
| sampler::top-k | all backends | Top-k number of samples. |
| sampler::top-p | all backends | Top-p sampling threshold. |
| sampler::greedy | all backends | Whether to use greedy or random sampling. A value of true specifies greedy sampling. |
| tokenizer::version | all backends | Version of the tokenizer object supported by the APIs. (1) |
| tokenizer::path | all backends | Path to the tokenizer file. |
| engine::version | all backends | Version of the engine object supported by the APIs. (1) |
| engine::n-threads | all backends | Number of threads to use for KV-cache updates. |
| debug::path | all backends | File path to dump debug information. |
| debug::dump-tensors | all backends | Raw data dump of input and output tensors. |
| debug::dump-specs | all backends | Dumps input/output tensor specifications such as bw, scale, offset, and dimensions. |
| debug::dump-outputs | all backends | Raw data dump of output tensors from the engine. |
| backend::version | all backends | Version of the backend object supported by the APIs. (1) |
| backend::type | all backends | Type of engine: “QnnHtp” for QNN HTP, “QnnGenAiTransformer” for the QNN GenAiTransformer backend, and “QnnGpu” for QNN GPU. |
| backend::extensions | QNN HTP | Path to the backend extensions configuration file. |
| QnnHtp::version | QNN HTP | Version of the QnnHtp object supported by the APIs. (1) |
| QnnHtp::spill-fill-bufsize | QNN HTP | Buffer size to pre-allocate for the QNN HTP spill-fill. This field depends on the HTP VTCM memory size and should be set greater than the spill-fill required by each context binary in the model. Consult the QNN HTP backend documentation in the QAIRT SDK for more details. |
| QnnHtp::data-alignment-size | QNN HTP | Data is aligned by rounding the size up to the nearest multiple of the alignment number. Typically should be zero. |
| QnnHtp::use-mmap | QNN HTP | Memory-map the context binary files. Typically should be turned on. |
| QnnHtp::mmap-budget | QNN HTP | Memory-map the context binary files in chunks of the given size. Typically should be 25 MB. |
| QnnHtp::poll | QNN HTP | Specifies whether to busy-wait on threads. |
| QnnHtp::pos-id-dim | QNN HTP | Dimension of the positional embeddings, usually (kv-dim) / 2. |
| QnnHtp::cpu-mask | QNN HTP | CPU affinity mask. |
| QnnHtp::kv-dim | QNN HTP | Dimension of the KV-cache embedding. |
| QnnHtp::allow-async-init | QNN HTP | Allow context binaries to be initialized asynchronously if the backend supports it. |
| QnnHtp::rope-theta | QNN HTP | Used to calculate rotary positional encodings. |
| QnnHtp::enable-graph-switching | QNN HTP | Enables graph switching for graphs within each context binary. |
| QnnGenAiTransformer::version | QNN GenAiTransformer | Version of the QnnGenAiTransformer object supported by the APIs. (1) |
| QnnGenAiTransformer::n-logits | QNN GenAiTransformer | Number of logit vectors that the result will have for sampling. |
| QnnGenAiTransformer::n-layer | QNN GenAiTransformer | Number of decoder layers in the model. |
| QnnGenAiTransformer::n-embd | QNN GenAiTransformer | Size of the embedding vector for each token. |
| QnnGenAiTransformer::n-heads | QNN GenAiTransformer | Number of attention heads in the model. |
| QnnGenAiTransformer::kv-quantization | QNN GenAiTransformer | Quantize the KV cache to Q8_0_32. |
| model::version | all backends | Version of the model object supported by the APIs. (1) |
| model::type | all backends | Type of model object: “binary” for QNN HTP and “library” for QNN GenAiTransformer. |
| model::positional-encoding | all backends | Captures the positional-encoding parameters for a model. |
| positional-encoding::type | all backends | Type of positional encoding. Supported types: rope, alibi, and absolute. |
| positional-encoding::rope-dim | all backends | Dimension of the RoPE positional embeddings, usually (kv-dim) / 2. |
| positional-encoding::rope-theta | all backends | Used to calculate rotary positional encodings for type rope. |
| binary::version | QNN HTP | Version of the binary object supported by the APIs. (1) |
| binary::ctx-bins | QNN HTP | List of serialized model files. |
| library::version | QNN GenAiTransformer | Version of the library object supported by the APIs. (1) |
| library::model-bin | QNN GenAiTransformer | Path to the model.bin file. |
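To make the schema and option table above concrete, the following is a minimal illustrative text-generator configuration for the QNN HTP backend. All values, token ids, and file names below are placeholder assumptions, not a validated configuration; consult the example configs shipped in the SDK under ${SDK_ROOT}/examples/Genie/configs/pipeline for working values for a specific model.

```json
{
  "text-generator": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 32000,
      "bos-token": 1,
      "eos-token": 2
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95,
      "greedy": false
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 4,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "pos-id-dim": 64,
          "kv-dim": 128,
          "rope-theta": 10000.0
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": ["model_part_1.bin", "model_part_2.bin"]
        }
      }
    }
  }
}
```

Note the nesting: both "backend" and "model" sit under "engine", and the backend-specific object ("QnnHtp" here) must match the value of "backend"::"type".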
Text Encoder schema:
```json
{
  "text-encoder" : {
    "type": "object",
    "properties": {
      "version" : {"type": "integer"},
      "context" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "ctx-size": {"type": "integer"},
          "n-vocab": {"type": "integer"},
          "embed-size": {"type": "integer"},
          "pad-token": {"type": "integer"}
        }
      },
      "prompt" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "prompt-template" : {"type": "array", "items": {"type": "string"}}
        }
      },
      "tokenizer" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "path" : {"type": "string"}
        }
      },
      "truncate-input" : {"type" : "boolean"},
      "engine" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "n-threads" : {"type": "integer"},
          "backend" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string", "enum": ["QnnHtp", "QnnGenAiTransformer"]},
              "QnnHtp" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "spill-fill-bufsize" : {"type": "integer"},
                  "data-alignment-size" : {"type": "integer"},
                  "use-mmap" : {"type": "boolean"},
                  "allow-async-init" : {"type": "boolean"},
                  "pooled-output" : {"type": "boolean"},
                  "disable-kv-cache" : {"type": "boolean"}
                }
              },
              "QnnGenAiTransformer" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "n-layer" : {"type": "integer"},
                  "n-embd" : {"type": "integer"},
                  "n-heads" : {"type": "integer"}
                }
              },
              "extensions" : {"type": "string"}
            }
          },
          "model" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string", "enum": ["binary", "library"]},
              "binary" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                }
              },
              "library" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "model-bin" : {"type": "string"}
                }
              }
            }
          }
        }
      }
    }
  }
}
```
| Option | Applicability | Description |
|---|---|---|
| text-encoder::version | all backends | Version of the encoder object supported by the APIs. (1) |
| text-encoder::truncate-input | all backends | Allow truncation of the input when it exceeds the context length. |
| context::version | all backends | Version of the context object supported by the APIs. (1) |
| context::ctx-size | all backends | Context length. Maximum number of tokens to process. |
| context::n-vocab | all backends | Model vocabulary size. |
| context::embed-size | all backends | Embedding length. Length of the embedding vector for each token. |
| context::pad-token | all backends | Token id of the pad token. |
| prompt::version | all backends | Version of the prompt object supported by the APIs. (1) |
| prompt::prompt-template | all backends | Prefix and suffix strings added to each prompt. |
| tokenizer::version | all backends | Version of the tokenizer object supported by the APIs. (1) |
| tokenizer::path | all backends | Path to the tokenizer file. |
| engine::version | all backends | Version of the engine object supported by the APIs. (1) |
| engine::n-threads | all backends | Number of threads to use for KV-cache updates. |
| backend::version | all backends | Version of the backend object supported by the APIs. (1) |
| backend::type | all backends | Type of engine: “QnnHtp” for QNN HTP and “QnnGenAiTransformer” for the QNN GenAiTransformer backend. |
| backend::extensions | QNN HTP | Path to the backend extensions configuration file. |
| QnnHtp::version | QNN HTP | Version of the QnnHtp object supported by the APIs. (1) |
| QnnHtp::spill-fill-bufsize | QNN HTP | Buffer size to pre-allocate for the QNN HTP spill-fill. This field depends on the HTP VTCM memory size and should be set greater than the spill-fill required by each context binary in the model. Consult the QNN HTP backend documentation in the QAIRT SDK for more details. |
| QnnHtp::use-mmap | QNN HTP | Memory-map the context binary files. Typically should be turned on. |
| QnnHtp::data-alignment-size | QNN HTP | Data is aligned by rounding the size up to the nearest multiple of the alignment number. Typically should be zero. |
| QnnHtp::allow-async-init | QNN HTP | Allow context binaries to be initialized asynchronously if the backend supports it. |
| QnnHtp::pooled-output | QNN HTP | Whether to return a pooled embedding or per-token embeddings as the generation result. |
| QnnHtp::disable-kv-cache | QNN HTP | Disables the KV-cache manager, for models that do not have a KV cache. |
| QnnGenAiTransformer::version | QNN GenAiTransformer | Version of the QnnGenAiTransformer object supported by the APIs. (1) |
| QnnGenAiTransformer::n-layer | QNN GenAiTransformer | Number of decoder layers in the model. |
| QnnGenAiTransformer::n-embd | QNN GenAiTransformer | Size of the embedding vector for each token. |
| QnnGenAiTransformer::n-heads | QNN GenAiTransformer | Number of attention heads in the model. |
| model::version | all backends | Version of the model object supported by the APIs. (1) |
| model::type | all backends | Type of model object: “binary” for QNN HTP and “library” for QNN GenAiTransformer. |
| binary::version | QNN HTP | Version of the binary object supported by the APIs. (1) |
| binary::ctx-bins | QNN HTP | List of serialized model files. |
| library::version | QNN GenAiTransformer | Version of the library object supported by the APIs. (1) |
| library::model-bin | QNN GenAiTransformer | Path to the model.bin file. |
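As an illustration of the text-encoder schema above, here is a minimal sketch of a configuration for a CLIP-style text encoder on the QNN HTP backend. All sizes, token ids, template strings, and file names are placeholder assumptions chosen for illustration only; refer to the SDK example configs for values that match a real model.

```json
{
  "text-encoder": {
    "version": 1,
    "truncate-input": true,
    "context": {
      "version": 1,
      "ctx-size": 77,
      "n-vocab": 49408,
      "embed-size": 768,
      "pad-token": 0
    },
    "prompt": {
      "version": 1,
      "prompt-template": ["a photo of ", ""]
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 2,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "pooled-output": true,
          "disable-kv-cache": true
        }
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": ["text_encoder.bin"]
        }
      }
    }
  }
}
```

Here "pooled-output": true requests a single pooled embedding per prompt rather than per-token embeddings, and "disable-kv-cache": true reflects that encoder-only models carry no KV cache.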
Image Encoder schema:
```json
{
  "image-encoder" : {
    "type": "object",
    "properties": {
      "version" : {"type": "integer"},
      "engine" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "n-threads" : {"type": "integer"},
          "backend" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string", "enum": ["QnnHtp"]},
              "QnnHtp" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "spill-fill-bufsize" : {"type": "integer"},
                  "data-alignment-size" : {"type": "integer"},
                  "use-mmap" : {"type": "boolean"},
                  "allow-async-init" : {"type": "boolean"}
                }
              },
              "extensions" : {"type": "string"}
            }
          },
          "model" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string", "enum": ["binary"]},
              "binary" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                }
              }
            }
          }
        }
      }
    }
  }
}
```
| Option | Applicability | Description |
|---|---|---|
| image-encoder::version | all backends | Version of the encoder object supported by the APIs. (1) |
| engine::version | all backends | Version of the engine object supported by the APIs. (1) |
| engine::n-threads | all backends | Number of threads to use for KV-cache updates. |
| backend::version | all backends | Version of the backend object supported by the APIs. (1) |
| backend::type | all backends | Type of engine: “QnnHtp” for QNN HTP and “QnnGenAiTransformer” for the QNN GenAiTransformer backend. |
| backend::extensions | QNN HTP | Path to the backend extensions configuration file. |
| QnnHtp::version | QNN HTP | Version of the QnnHtp object supported by the APIs. (1) |
| QnnHtp::spill-fill-bufsize | QNN HTP | Buffer size to pre-allocate for the QNN HTP spill-fill. This field depends on the HTP VTCM memory size and should be set greater than the spill-fill required by each context binary in the model. Consult the QNN HTP backend documentation in the QAIRT SDK for more details. |
| QnnHtp::use-mmap | QNN HTP | Memory-map the context binary files. Typically should be turned on. |
| QnnHtp::data-alignment-size | QNN HTP | Data is aligned by rounding the size up to the nearest multiple of the alignment number. Typically should be zero. |
| QnnHtp::allow-async-init | QNN HTP | Allow context binaries to be initialized asynchronously if the backend supports it. |
| QnnHtp::pooled-output | QNN HTP | Whether to return a pooled embedding or per-token embeddings as the generation result. |
| QnnHtp::disable-kv-cache | QNN HTP | Disables the KV-cache manager, for models that do not have a KV cache. |
| model::version | all backends | Version of the model object supported by the APIs. (1) |
| model::type | all backends | Type of model object: “binary” for QNN HTP and “library” for QNN GenAiTransformer. |
| binary::version | QNN HTP | Version of the binary object supported by the APIs. (1) |
| binary::ctx-bins | QNN HTP | List of serialized model files. |
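The image-encoder schema above is the simplest of the three, since it has no context, tokenizer, or sampler sections. The following minimal sketch shows its shape; the thread count and file names are placeholder assumptions, not values from a validated configuration.

```json
{
  "image-encoder": {
    "version": 1,
    "engine": {
      "version": 1,
      "n-threads": 2,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "allow-async-init": false
        }
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": ["image_encoder.bin"]
        }
      }
    }
  }
}
```

Because the image-encoder backend enum only allows "QnnHtp", the model type is always "binary" with a list of context binaries in "ctx-bins".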