genie-t2t-run

Note

The -tok/--tokens_file (token-to-token feature) option is currently supported for the basic dialog type only.

The genie-t2t-run tool is provided as a test application for running text-to-text inference on a provided LLM network. It takes a user prompt in text format and outputs the result in text format.

DESCRIPTION:
------------
Tool for text to text inference of LLMs using Genie.


REQUIRED ARGUMENTS:
-------------------

-c or --config                        <FILE>      Dialog JSON configuration file.

OPTIONAL ARGUMENTS:
-------------------
-h or --help                                      Show this help message and exit.

-p or --prompt                        <VAL>       Prompt to query. Mutually exclusive with --prompt_file.

--prompt_file                         <FILE>      Prompt to query provided as a file. Mutually exclusive with --prompt.

-e PATH or --embedding_file           <FILE>      Input embeddings provided as a file. Mutually exclusive with --prompt, --prompt_file and --tokens_file.
                                                  TYPE, SCALE, and OFFSET are optional parameters representing the model's input quantization encodings.
                                                  Required for lookup table requantization. Valid values of TYPE are int8, int16, uint8, uint16. The
                                                  signedness must be consistent with the lookup table encodings.

-t PATH or --embedding_table          <FILE>      Token-to-Embedding lookup table provided as a file. Mutually exclusive with --prompt and --prompt_file.
                                                  TYPE, SCALE, and OFFSET are optional parameters representing the lookup table's quantization encodings.
                                                  Required for lookup table requantization. Valid values of TYPE are int8, int16, uint8, uint16. The
                                                  signedness must be consistent with the input layer encodings.

-l or --lora                          <VAL>       ADAPTER_NAME,ALPHA_NAME_1,ALPHA_VALUE_1,ALPHA_NAME_2,ALPHA_VALUE_2,.. Apply a LoRA adapter to a dialog.

-tok PATH or --tokens_file            <FILE>      Input tokens provided as a file (supported format: .txt). Mutually exclusive with --prompt, --prompt_file and --embedding_file.

--log                                 <VAL>       Enables logging. LogLevel must be one of error, warn, info, or verbose.

--profile                             <FILE>      Enables profiling. FILE_NAME is a mandatory parameter that provides the name of the
                                                  output file for profiling data.

--action                              <VAL>       Name of the action to signal to the in-progress query of the currently active dialog.
                                                  The supported action is ABORT.

--sleep                               <VAL>       Time (in ms) for the signal thread to sleep.
                                                  Default sleep is 2025 ms.

--allow_engine_switch                 <VAL>       ENGINE_ROLE, STANDALONE_ENGINE_CONFIG.JSON
                                                  Allows switching the draft engine over the same dialog.

--engine_role                         <VAL>       Selects the engine in a multi-engine dialog.
                                                  Defaults to the "primary" engine.

See Tutorials for reference example on how to use the genie-t2t-run tool.
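
The mutual-exclusion rules listed above can be checked before the tool is launched. The following is a minimal Python sketch, not part of the Genie SDK: the helper name build_command and the file name dialog.json are hypothetical, and only options documented in the help text are used. It assumes the four input sources (--prompt, --prompt_file, --embedding_file, --tokens_file) are pairwise exclusive, as the option descriptions state, and it does not enforce that an input is supplied at all.

```python
def build_command(config, prompt=None, prompt_file=None,
                  embedding_file=None, tokens_file=None, embedding_table=None):
    """Return an argv list for genie-t2t-run, or raise ValueError on a conflict."""
    # The four input sources are documented as mutually exclusive.
    inputs = {
        "--prompt": prompt,
        "--prompt_file": prompt_file,
        "--embedding_file": embedding_file,
        "--tokens_file": tokens_file,
    }
    chosen = [flag for flag, value in inputs.items() if value is not None]
    if len(chosen) > 1:
        raise ValueError("mutually exclusive options: " + ", ".join(chosen))

    # --embedding_table is documented as exclusive with the two prompt options.
    if embedding_table is not None and (prompt is not None or prompt_file is not None):
        raise ValueError(
            "--embedding_table cannot be combined with --prompt or --prompt_file")

    cmd = ["genie-t2t-run", "--config", config]
    for flag in chosen:
        cmd += [flag, inputs[flag]]
    if embedding_table is not None:
        cmd += ["--embedding_table", embedding_table]
    return cmd
```

The returned list can be passed to subprocess.run on a system where genie-t2t-run is on the PATH; the sketch itself only assembles and validates the arguments.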

Embedding Vector Lookup Table

For Embedding-to-Text models, genie-t2t-run accepts an embedding vector lookup table file to convert tokens to embedding vectors. The file is a two-dimensional array stored in row-major order as raw binary. The row count should equal the tokenizer vocabulary size, and the column count should equal the embedding vector length. The datatype of the array is determined by the embedding::datatype setting in the Dialog JSON configuration.
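
The file layout described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not SDK code: the helper names are hypothetical, and it assumes 32-bit floating-point elements in little-endian byte order. The actual element type must match the embedding::datatype setting of the dialog configuration.

```python
import struct

BYTES_PER_ELEMENT = 4  # assumption: float32 elements


def write_lookup_table(path, rows):
    """Write a 2-D table as raw row-major float32 data with no header."""
    width = len(rows[0])
    with open(path, "wb") as f:
        for row in rows:
            if len(row) != width:
                raise ValueError("all rows must have the same length")
            # Rows are written back to back, which gives row-major order.
            f.write(struct.pack(f"<{width}f", *row))


def expected_file_size(vocab_size, embedding_dim):
    """File size in bytes: one row per token, one column per vector element."""
    return vocab_size * embedding_dim * BYTES_PER_ELEMENT


def read_embedding(path, token_id, embedding_dim):
    """Read the embedding vector of one token by seeking to its row."""
    row_bytes = embedding_dim * BYTES_PER_ELEMENT
    with open(path, "rb") as f:
        f.seek(token_id * row_bytes)
        data = f.read(row_bytes)
    if len(data) != row_bytes:
        raise IndexError("token_id is outside the table")
    return list(struct.unpack(f"<{embedding_dim}f", data))
```

Because the file has no header, comparing its size against expected_file_size(vocab_size, embedding_dim) is a quick sanity check that the row and column counts match the tokenizer and the model.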

Note

If a lookup table is not provided, genie-t2t-run can only generate the result of the first output token.