Genie Node JSON configuration string

The following sections describe the format of the JSON configuration string that is supplied to GenieNodeConfig_createFromJson. The same JSON configuration can also be supplied to the genie-app tool.

Note

Please refer to the example configs contained in the SDK at ${SDK_ROOT}/examples/Genie/configs/pipeline.

General configuration schema

The following provides the schema of the JSON configuration format that is supplied to GenieNodeConfig_createFromJson. Note that the JSON configuration follows the dialog schema for text generation and the embedding schema for image and text encoding.

Text Generator schema:

{
  "text-generator" : {
    "type": "object",
    "properties": {
      "version" : {"type": "integer"},
      "type" : {"type": "string", "enum":["basic"]},
      "context" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "size": {"type": "integer"},
          "n-vocab": {"type": "integer"},
          "bos-token": {"type": "integer"},
          "eos-token": {"type": "integer"}
        }
      },
      "sampler" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "seed" : {"type": "integer"},
          "temp" : {"type": "float"},
          "top-k" : {"type": "integer"},
          "top-p" : {"type": "float"},
          "greedy" : {"type": "boolean"}
        }
      },
      "tokenizer" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "path" : {"type": "string"}
        }
      },
      "engine" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "n-threads" : {"type": "integer"},
          "backend" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string","enum" : ["QnnHtp", "QnnGenAiTransformer"]},
              "QnnHtp" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "spill-fill-bufsize" : {"type": "integer"},
                  "data-alignment-size" : {"type": "integer"},
                  "use-mmap" : {"type": "boolean"},
                  "mmap-budget" : {"type": "integer"},
                  "poll" : {"type": "boolean"},
                  "pos-id-dim" : {"type": "integer"},
                  "cpu-mask" : {"type": "string"},
                  "kv-dim" : {"type": "integer"},
                  "allow-async-init" : {"type": "boolean"},
                  "rope-theta" : {"type": "double"}
                }
              },
              "QnnGenAiTransformer" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "n-logits" : {"type": "integer"},
                  "n-layer" : {"type": "integer"},
                  "n-embd" : {"type": "integer"},
                  "n-heads" : {"type": "integer"},
                  "kv-quantization" : {"type": "boolean"}
                }
              },
              "extensions" : {"type": "string"}
            }
          },
          "model" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string","enum":["binary", "library"]},
              "binary" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                }
              },
              "library" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "model-bin" : {"type": "string"}
                }
              }
            }
          }
        }
      }
    }
  }
}
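The schema above can be illustrated with a minimal sketch of a complete text-generator configuration for the QNN HTP backend. All values below (context size, vocabulary, token IDs, sampler settings, file names) are purely illustrative assumptions; consult the example configs shipped with the SDK for values matching your model.

```json
{
  "text-generator": {
    "version": 1,
    "type": "basic",
    "context": {
      "version": 1,
      "size": 4096,
      "n-vocab": 32000,
      "bos-token": 1,
      "eos-token": 2
    },
    "sampler": {
      "version": 1,
      "seed": 42,
      "temp": 0.8,
      "top-k": 40,
      "top-p": 0.95,
      "greedy": false
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 4,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "poll": true,
          "kv-dim": 128,
          "pos-id-dim": 64,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": ["model_part_1.bin", "model_part_2.bin"]
        }
      }
    }
  }
}
```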

Option

Applicability

Description

text-generator::version

all backends

Version of node object that is supported by APIs. (1)

text-generator::type

all backends

Type of node supported by APIs. (basic)

text-generator::stop-sequence

all backends

Stop the query when one of a given set of sequences is detected in the response. The argument is passed as an array of strings.

text-generator::max-num-tokens

all backends

Stop the query when the maximum number of tokens has been generated in the response.

context::version

all backends

Version of context object that is supported by APIs. (1)

context::size

all backends

Context length. Maximum number of tokens to store.

context::n-vocab

all backends

Model vocabulary size.

context::bos-token

all backends

Beginning of sentence token.

context::eos-token

all backends

End of sentence token. The argument is passed as an integer or an array of integers.

context::eot-token

all backends

End of turn token.

sampler::version

all backends

Version of sampler object that is supported by APIs. (1)

sampler::type

all backends

Type of sampler to use. Supported options: basic, custom

sampler::callback-name

all backends

Name of the callback function to use for sampling.

sampler::seed

all backends

Sampling random number generation seed.

sampler::temp

all backends

Sampling temperature.

sampler::top-k

all backends

Top-k number of samples.

sampler::top-p

all backends

Top-p sampling threshold.

sampler::greedy

all backends

Whether to use random or greedy sampling. A value of true specifies greedy sampling.

tokenizer::version

all backends

Version of tokenizer object that is supported by APIs. (1)

tokenizer::path

all backends

Path to tokenizer file.

engine::version

all backends

Version of engine object that is supported by APIs. (1)

engine::n-threads

all backends

Number of threads to use for KV-cache updates.

debug::path

all backends

File path to dump debug information.

debug::dump-tensors

all backends

Raw data dump of input and output tensors.

debug::dump-specs

all backends

Dumps input and output tensor specifications such as bit-width, scale, offset, and dimensions.

debug::dump-outputs

all backends

Raw data dump of the output tensor from the engine.

backend::version

all backends

Version of backend object that is supported by APIs. (1)

backend::type

all backends

Type of backend: “QnnHtp” for the QNN HTP backend, “QnnGenAiTransformer” for the QNN GenAiTransformer backend, and “QnnGpu” for the QNN GPU backend.

backend::extensions

QNN HTP

Path to backend extensions configuration file.

QnnHtp::version

QNN HTP

Version of QnnHtp object that is supported by APIs. (1)

QnnHtp::spill-fill-bufsize

QNN HTP

Buffer size to pre-allocate for the QNN HTP spill fill. This field depends upon the HTP VTCM memory size. It should be set greater than the spill-fill required by each context binary in the model. Consult the QNN HTP backend documentation in the QAIRT SDK for more details.

QnnHtp::data-alignment-size

QNN HTP

Data will be aligned by rounding up the size to the nearest multiple of this alignment value. Typically should be zero.

QnnHtp::use-mmap

QNN HTP

Memory map the context binary files. Typically should be turned on.

QnnHtp::mmap-budget

QNN HTP

Memory map the context binary files in chunks of the given size. Typically should be 25 MB.

QnnHtp::poll

QNN HTP

Specify whether to busy-wait on threads.

QnnHtp::pos-id-dim

QNN HTP

Dimension of positional embeddings, usually (kv-dim) / 2.

QnnHtp::cpu-mask

QNN HTP

CPU affinity mask.

QnnHtp::kv-dim

QNN HTP

Dimension of the KV-cache embedding.

QnnHtp::allow-async-init

QNN HTP

Allow context binaries to be initialized asynchronously if the backend supports it.

QnnHtp::rope-theta

QNN HTP

Used to calculate rotary positional encodings.

QnnHtp::enable-graph-switching

QNN HTP

Enables graph switching for graphs within each context binary.

QnnGenAiTransformer::version

QNN GenAiTransformer

Version of QnnGenAiTransformer object that is supported by APIs. (1)

QnnGenAiTransformer::n-logits

QNN GenAiTransformer

Number of logit vectors that the result will contain for sampling.

QnnGenAiTransformer::n-layer

QNN GenAiTransformer

Number of decoder layers in the model.

QnnGenAiTransformer::n-embd

QNN GenAiTransformer

Size of embedding vector for each token.

QnnGenAiTransformer::n-heads

QNN GenAiTransformer

Number of attention heads in the model.

QnnGenAiTransformer::kv-quantization

QNN GenAiTransformer

Quantize the KV cache to Q8_0_32.

model::version

all backends

Version of model object that is supported by APIs. (1)

model::type

all backends

Type of model object: “binary” for QNN HTP, “library” for QNN GenAiTransformer.

model::positional-encoding

all backends

Captures positional encoding parameters for a model.

positional-encoding::type

all backends

Type of positional encoding. Supported types are rope, alibi, and absolute.

positional-encoding::rope-dim

all backends

Dimension of Rope positional embeddings, usually (kv-dim) / 2.

positional-encoding::rope-theta

all backends

Used to calculate rotary positional encodings for the rope type.

binary::version

QNN HTP

Version of binary object that is supported by APIs. (1)

binary::ctx-bins

QNN HTP

List of serialized model files.

library::version

QNN GenAiTransformer

Version of library object that is supported by APIs. (1)

library::model-bin

QNN GenAiTransformer

Path to model.bin file.
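Several options in the table above (text-generator::stop-sequence, text-generator::max-num-tokens, the array form of context::eos-token, and model::positional-encoding) do not appear in the schema listing. The fragment below is a sketch of how they might be expressed, following the naming and types given in the table; all concrete values and token IDs are illustrative assumptions.

```json
{
  "text-generator": {
    "stop-sequence": ["</s>", "<|im_end|>"],
    "max-num-tokens": 512,
    "context": {
      "eos-token": [2, 32000],
      "eot-token": 32007
    },
    "engine": {
      "model": {
        "positional-encoding": {
          "type": "rope",
          "rope-dim": 64,
          "rope-theta": 10000.0
        }
      }
    }
  }
}
```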

Text Encoder schema:

{
  "text-encoder" : {
    "type": "object",
    "properties": {
      "version" : {"type": "integer"},
      "context" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "ctx-size": {"type": "integer"},
          "n-vocab": {"type": "integer"},
          "embed-size": {"type": "integer"},
          "pad-token": {"type": "integer"}
        }
      },
      "prompt" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "prompt-template" : {"type": "array", "items": {"type": "string"}}
        }
      },
      "tokenizer" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "path" : {"type": "string"}
        }
      },
      "truncate-input" : {"type" : "boolean"},
      "engine" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "n-threads" : {"type": "integer"},
          "backend" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string","enum" : ["QnnHtp", "QnnGenAiTransformer"]},
              "QnnHtp" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "spill-fill-bufsize" : {"type": "integer"},
                  "data-alignment-size" : {"type": "integer"},
                  "use-mmap" : {"type": "boolean"},
                  "allow-async-init" : {"type": "boolean"},
                  "pooled-output" : {"type": "boolean"},
                  "disable-kv-cache" : {"type": "boolean"}
                }
              },
              "QnnGenAiTransformer" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "n-layer" : {"type": "integer"},
                  "n-embd" : {"type": "integer"},
                  "n-heads" : {"type": "integer"}
                }
              },
              "extensions" : {"type": "string"}
            }
          },
          "model" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string","enum":["binary", "library"]},
              "binary" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                }
              },
              "library" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "model-bin" : {"type": "string"}
                }
              }
            }
          }
        }
      }
    }
  }
}
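A minimal sketch of a text-encoder configuration following the schema above. The context sizes, pad token, and prompt-template strings below are illustrative assumptions (loosely modeled on a CLIP-style encoder); use the values appropriate for your model.

```json
{
  "text-encoder": {
    "version": 1,
    "truncate-input": true,
    "context": {
      "version": 1,
      "ctx-size": 77,
      "n-vocab": 49408,
      "embed-size": 768,
      "pad-token": 49407
    },
    "prompt": {
      "version": 1,
      "prompt-template": ["<|startoftext|>", "<|endoftext|>"]
    },
    "tokenizer": {
      "version": 1,
      "path": "tokenizer.json"
    },
    "engine": {
      "version": 1,
      "n-threads": 2,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "pooled-output": false,
          "disable-kv-cache": true
        }
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": ["text_encoder.bin"]
        }
      }
    }
  }
}
```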

Option

Applicability

Description

text-encoder::version

all backends

Version of encoder object that is supported by APIs. (1)

text-encoder::truncate-input

all backends

Allows truncation of the input when it exceeds the context length.

context::version

all backends

Version of context object that is supported by APIs. (1)

context::ctx-size

all backends

Context length. Maximum number of tokens to process.

context::n-vocab

all backends

Model vocabulary size.

context::embed-size

all backends

Embedding length. Embedding vector length for each token.

context::pad-token

all backends

Token id for pad token.

prompt::version

all backends

Version of prompt object that is supported by APIs. (1)

prompt::prompt-template

all backends

Prefix and suffix strings that will be added to each prompt.

tokenizer::version

all backends

Version of tokenizer object that is supported by APIs. (1)

tokenizer::path

all backends

Path to tokenizer file.

engine::version

all backends

Version of engine object that is supported by APIs. (1)

engine::n-threads

all backends

Number of threads to use for KV-cache updates.

backend::version

all backends

Version of backend object that is supported by APIs. (1)

backend::type

all backends

Type of backend: “QnnHtp” for the QNN HTP backend and “QnnGenAiTransformer” for the QNN GenAiTransformer backend.

backend::extensions

QNN HTP

Path to backend extensions configuration file.

QnnHtp::version

QNN HTP

Version of QnnHtp object that is supported by APIs. (1)

QnnHtp::spill-fill-bufsize

QNN HTP

Buffer size to pre-allocate for the QNN HTP spill fill. This field depends upon the HTP VTCM memory size. It should be set greater than the spill-fill required by each context binary in the model. Consult the QNN HTP backend documentation in the QAIRT SDK for more details.

QnnHtp::use-mmap

QNN HTP

Memory map the context binary files. Typically should be turned on.

QnnHtp::data-alignment-size

QNN HTP

Data will be aligned by rounding up the size to the nearest multiple of alignment number. Typically should be zero.

QnnHtp::allow-async-init

QNN HTP

Allow context binaries to be initialized asynchronously if the backend supports it.

QnnHtp::pooled-output

QNN HTP

Selects between a pooled or per-token embedding result as the generation output.

QnnHtp::disable-kv-cache

QNN HTP

Disables the KV-cache manager for models that do not have a KV cache.

QnnGenAiTransformer::version

QNN GenAiTransformer

Version of QnnGenAiTransformer object that is supported by APIs. (1)

QnnGenAiTransformer::n-layer

QNN GenAiTransformer

Number of decoder layers in the model.

QnnGenAiTransformer::n-embd

QNN GenAiTransformer

Size of embedding vector for each token.

QnnGenAiTransformer::n-heads

QNN GenAiTransformer

Number of attention heads in the model.

model::version

all backends

Version of model object that is supported by APIs. (1)

model::type

all backends

Type of model object: “binary” for QNN HTP, “library” for QNN GenAiTransformer.

binary::version

QNN HTP

Version of binary object that is supported by APIs. (1)

binary::ctx-bins

QNN HTP

List of serialized model files.

library::version

QNN GenAiTransformer

Version of library object that is supported by APIs. (1)

library::model-bin

QNN GenAiTransformer

Path to model.bin file.

Image Encoder schema:

{
  "image-encoder" : {
    "type": "object",
    "properties": {
      "version" : {"type": "integer"},
      "engine" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "n-threads" : {"type": "integer"},
          "backend" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string","enum" : ["QnnHtp"]},
              "QnnHtp" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "spill-fill-bufsize" : {"type": "integer"},
                  "data-alignment-size" : {"type": "integer"},
                  "use-mmap" : {"type": "boolean"},
                  "allow-async-init" : {"type": "boolean"}
                }
              },
              "extensions" : {"type": "string"}
            }
          },
          "model" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "type" : {"type": "string","enum":["binary"]},
              "binary" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                }
              }
            }
          }
        }
      }
    }
  }
}
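A minimal sketch of an image-encoder configuration following the schema above; the image encoder supports only the QNN HTP backend with a binary model. The file names and thread count below are illustrative assumptions.

```json
{
  "image-encoder": {
    "version": 1,
    "engine": {
      "version": 1,
      "n-threads": 2,
      "backend": {
        "version": 1,
        "type": "QnnHtp",
        "QnnHtp": {
          "version": 1,
          "use-mmap": true,
          "allow-async-init": false
        },
        "extensions": "htp_backend_ext_config.json"
      },
      "model": {
        "version": 1,
        "type": "binary",
        "binary": {
          "version": 1,
          "ctx-bins": ["image_encoder.bin"]
        }
      }
    }
  }
}
```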

Option

Applicability

Description

image-encoder::version

all backends

Version of encoder object that is supported by APIs. (1)

engine::version

all backends

Version of engine object that is supported by APIs. (1)

engine::n-threads

all backends

Number of threads to use for KV-cache updates.

backend::version

all backends

Version of backend object that is supported by APIs. (1)

backend::type

all backends

Type of backend: “QnnHtp” for the QNN HTP backend and “QnnGenAiTransformer” for the QNN GenAiTransformer backend.

backend::extensions

QNN HTP

Path to backend extensions configuration file.

QnnHtp::version

QNN HTP

Version of QnnHtp object that is supported by APIs. (1)

QnnHtp::spill-fill-bufsize

QNN HTP

Buffer size to pre-allocate for the QNN HTP spill fill. This field depends upon the HTP VTCM memory size. It should be set greater than the spill-fill required by each context binary in the model. Consult the QNN HTP backend documentation in the QAIRT SDK for more details.

QnnHtp::use-mmap

QNN HTP

Memory map the context binary files. Typically should be turned on.

QnnHtp::data-alignment-size

QNN HTP

Data will be aligned by rounding up the size to the nearest multiple of alignment number. Typically should be zero.

QnnHtp::allow-async-init

QNN HTP

Allow context binaries to be initialized asynchronously if the backend supports it.

QnnHtp::pooled-output

QNN HTP

Selects between a pooled or per-token embedding result as the generation output.

QnnHtp::disable-kv-cache

QNN HTP

Disables the KV-cache manager for models that do not have a KV cache.

model::version

all backends

Version of model object that is supported by APIs. (1)

model::type

all backends

Type of model object: “binary” for QNN HTP, “library” for QNN GenAiTransformer.

binary::version

QNN HTP

Version of binary object that is supported by APIs. (1)

binary::ctx-bins

QNN HTP

List of serialized model files.