KV$ Rewind

The KV$ Rewind (KV$ prefix match) feature speeds up query processing by reusing previously cached KV values. With KV$ Rewind, Genie reuses the KV cache entries from a previous query when it processes a new query that shares a common prefix with the previous one, so the shared part is served from the cache.
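The prefix-matching idea can be illustrated with a small, self-contained sketch. This is not Genie's internal implementation: the helper names `common_prefix_len` and `tokens_to_process` and the integer token IDs are hypothetical, chosen only for illustration. The sketch assumes that queries are compared token by token, and that cache entries for the matched prefix are kept while the rest is discarded.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the Genie API): returns how many
 * leading tokens two tokenized queries have in common. */
static size_t common_prefix_len(const int *prev, size_t prev_len,
                                const int *next, size_t next_len) {
    size_t limit = prev_len < next_len ? prev_len : next_len;
    size_t i = 0;
    while (i < limit && prev[i] == next[i]) {
        ++i;
    }
    return i;
}

/* Hypothetical helper: after "rewinding" the cache to the shared prefix,
 * only the tokens past that prefix still need to be processed. */
static size_t tokens_to_process(const int *prev, size_t prev_len,
                                const int *next, size_t next_len) {
    return next_len - common_prefix_len(prev, prev_len, next, next_len);
}
```

For two six-token queries that differ only in the fifth token, `common_prefix_len` returns 4 and `tokens_to_process` returns 2. In this simplified model, four cached entries would be reused and only the last two tokens processed again.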

Using KV$ Rewind between queries

typedef enum {
  /// The string is the entire query/response.
  GENIE_DIALOG_SENTENCE_COMPLETE = 0,
  /// The string is the beginning of the query/response.
  GENIE_DIALOG_SENTENCE_BEGIN = 1,
  /// The string is a part of the query/response and not the beginning or end.
  GENIE_DIALOG_SENTENCE_CONTINUE = 2,
  /// The string is the end of the query/response.
  GENIE_DIALOG_SENTENCE_END = 3,
  /// The query has been aborted.
  GENIE_DIALOG_SENTENCE_ABORT = 4,
  /// Rewind the KV cache to the prefix matched against the previous query before processing the query.
  GENIE_DIALOG_SENTENCE_REWIND = 5,
} GenieDialog_SentenceCode_t;

GENIE_API
Genie_Status_t GenieDialog_query(const GenieDialog_Handle_t dialogHandle,
                                 const char* queryStr,
                                 const GenieDialog_SentenceCode_t sentenceCode,
                                 const GenieDialog_QueryCallback_t callback,
                                 const void* userData);

Use the sentence code GENIE_DIALOG_SENTENCE_REWIND and pass the query string as you would for a normal query. The API will handle prefix matching and KV rewind internally.

Note

KV$ prefix match works well with the SMART_MASK KV update method. With the POINTER_SHIFT KV update method, we observed memory register-related errors in a few cases for weight-shared bins. POINTER_SHIFT shows no issues with decoder-only models (AR1, AR8, AR128, etc.).

In genie-t2t-run, use the '-w' option to issue a rewind query.

For example:

./genie-t2t-run -c llama2-7b-htp.json \
                -p "Answer in one sentence, what is the capital city of India?" \
                -w "Answer in one sentence, what is the capital city of Russia?"