guidellm.data.deserializers.trace_mooncake
Trace deserializer for Mooncake-formatted files that generates a synthetic prompt for each row.
Reads a trace file whose rows carry timestamp, input_length, output_length, and hash_ids, and yields one row per line. Each yielded row includes a synthetic prompt that matches the row's input_length, for use in replay benchmarks.
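As an illustration of the row layout described above, the following is a minimal sketch that parses trace rows, assuming the file is JSON Lines with one object per line. The `TraceRow` class and `parse_trace_lines` function are hypothetical names used only for this example; they are not part of guidellm.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TraceRow:
    """Hypothetical container for one trace row (not a guidellm class)."""

    timestamp: int
    input_length: int
    output_length: int
    hash_ids: List[int]


def parse_trace_lines(lines: Iterable[str]) -> List[TraceRow]:
    """Parse JSON Lines text into rows, skipping blank lines.

    Assumption: each line is a JSON object with the four fields named
    in the module description.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        rows.append(
            TraceRow(
                timestamp=record["timestamp"],
                input_length=record["input_length"],
                output_length=record["output_length"],
                hash_ids=list(record["hash_ids"]),
            )
        )
    return rows


# Illustrative sample data, not taken from a real trace.
sample = [
    '{"timestamp": 0, "input_length": 1024, "output_length": 32, "hash_ids": [0, 1]}',
    "",
    '{"timestamp": 500, "input_length": 1536, "output_length": 8, "hash_ids": [0, 1, 2]}',
]
rows = parse_trace_lines(sample)
```

A row that lacks one of the four fields raises `KeyError` in this sketch, which makes malformed traces fail loudly instead of being skipped silently.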
TraceMooncakeDataArgs
Bases: TraceSyntheticDataArgs
Model for Mooncake trace dataset deserializer arguments.
Source code in src/guidellm/data/deserializers/trace_mooncake.py
TraceMooncakeDatasetDeserializer
Bases: DatasetDeserializer
Mooncake trace format deserializer
The Mooncake trace format requires columns for timestamps, prompt token counts, output token counts, and lists of hash IDs.
Hash IDs are globally unique identifiers based on the current and previous token blocks in a prompt. The relationships between IDs form a tree, in which the first ID of every prompt has a parent node of None. A parent node can have an unbounded number of children. Two different hash IDs can represent identical blocks of tokens, as long as they do not share the same parent (previous ID). For more details, see section 4 of https://arxiv.org/pdf/2407.00079.
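The tree structure can be made concrete with a small sketch. It assumes that the parent of each hash ID is the ID that precedes it in the same `hash_ids` list, as described above. The helper names `build_parent_map` and `shared_prefix_blocks` are hypothetical and are not part of guidellm.

```python
from typing import Dict, List, Optional, Sequence


def build_parent_map(prompts: Sequence[Sequence[int]]) -> Dict[int, Optional[int]]:
    """Map each hash ID to its parent (the previous ID in the prompt).

    The first ID of a prompt gets a parent of None. Because an ID encodes
    its own block and everything before it, it should always appear under
    the same parent; a mismatch is reported as an error.
    """
    parent: Dict[int, Optional[int]] = {}
    for hash_ids in prompts:
        prev: Optional[int] = None
        for hash_id in hash_ids:
            if hash_id in parent and parent[hash_id] != prev:
                raise ValueError(
                    f"hash ID {hash_id} seen under parents "
                    f"{parent[hash_id]} and {prev}"
                )
            parent[hash_id] = prev
            prev = hash_id
    return parent


def shared_prefix_blocks(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the leading blocks that two prompts have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Illustrative hash ID lists: two prompts share a two-block prefix.
prompts: List[List[int]] = [[0, 1, 2], [0, 1, 3], [4]]
parent_map = build_parent_map(prompts)
common = shared_prefix_blocks(prompts[0], prompts[1])
```

Here IDs 2 and 3 are both children of ID 1, which illustrates a parent with more than one child, while ID 4 starts a separate root.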
Generated prompts match the prompt token count of the row.
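The length-matching idea can be sketched with a toy example. This is not the deserializer's implementation: it assumes a whitespace "tokenizer" where one word equals one token, whereas a real replay would count tokens with the target model's tokenizer. The names `VOCAB`, `synthetic_prompt`, and `count_tokens` are hypothetical.

```python
import random

# Hypothetical vocabulary; each word counts as one token in this toy model.
VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]


def count_tokens(text: str) -> int:
    """Toy token counter: whitespace-separated words."""
    return len(text.split())


def synthetic_prompt(input_length: int, seed: int = 0) -> str:
    """Build a prompt with exactly `input_length` toy tokens.

    A fixed seed keeps the output reproducible across runs.
    """
    rng = random.Random(seed)
    return " ".join(rng.choice(VOCAB) for _ in range(input_length))


prompt = synthetic_prompt(17)
```

With a real subword tokenizer, joining words does not guarantee an exact token count, so an implementation would typically need to verify the count after generation and adjust the text.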