
CLI Migration Guide

guidellm benchmark [run]

Run a benchmark against a generative model.

This command is now guidellm run.

v0.6.0 option v0.7.0 equivalent
--backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api_key": "apikey-*", "verify": false}' Options passed to --backend constructor, for example --backend "kind=openai_http,api_key=sk…"
--backend Backend type. Options: vllm_python, openai_http. The "kind" of the backend specification, for example --backend '{"kind": "openai_http", "extras": {"body": {"temperature": 0.6}}}'
--cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema. Specify with the cooldown profile attribute, for example --profile kind=synchronous,cooldown=2 for a two-second cooldown or --profile '{"kind":"concurrent","cooldown":{"mode":"duration","value":2}}'
--data-args JSON string of arguments to pass to dataset creation. Specified with "load_kwargs" data attribute, e.g., --data '{"kind":"huggingface","load_kwargs":{"split":"train"}}'
--data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text_column": "article", "output_tokens_count_column": "output_tokens"}' Specify the kind and attributes of a data column mapper, for example --data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}'
--data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}' Specify the kind of finalizer, for example --data-finalizer kind=generative
--data-num-workers Number of worker processes for data loading. Specified with the Data Loader num_workers attribute, for example --data-loader kind=pytorch,num_workers=3
--data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors. Add parameters to the data preprocessor constructor, for example --data-preprocessor '{"kind":"encode_media","audio_kwargs":{"format":"mp3"}}'
--data-preprocessors List of preprocessors to apply to the dataset. E.g., 'encode_media,my_custom_preprocessor' Specify the preprocessor kind and attributes, for example --data-preprocessor kind=encode_media … can be repeated to configure multiple preprocessors.
--data-sampler Data sampler type. Shuffle function is a data loader attribute, for example --data-loader kind=pytorch,shuffle=true
--data-samples Number of samples from dataset. -1 (default) uses all samples and dynamically generates more. Specify as part of Data Loader configuration, for example --data-loader kind=pytorch,samples=10
--data HuggingFace dataset ID, path to dataset, path to data file (csv/json/jsonl/txt), or synthetic data config (json/key=value). Specify the kind of dataset together with attributes, for example --data kind=huggingface,source=<id> --data kind=csv_file,path=<file.csv> --data kind=synthetic_text,prompt_tokens=128,output_tokens=64
--dataloader-kwargs JSON string of arguments to pass to the dataloader constructor. Passed directly to Data Loader, for example --data-loader kind=pytorch,shuffle=true,samples=100
--detect-saturation Enable over-saturation detection with default settings. Specify oversaturation constraint kind and attributes, for example --constraint kind=over_saturation
--disable-console-interactive Disable interactive console progress updates. Unchanged: --disable-console-interactive or --disable-progress
--disable-console Disable all outputs to the console (updates, interactive progress, results). Unchanged: --disable-console or --disable-console-outputs
--max-error-rate Maximum error rate before stopping the benchmark. Specify maximum error rate constraint kind and attributes, for example --constraint kind=max_error_rate,rate=10
--max-errors Maximum errors before stopping the benchmark. Specify maximum error count constraint kind and attributes, for example --constraint kind=max_errors,count=10
--max-global-error-rate Maximum global error rate across all benchmarks. Specify maximum global error rate constraint kind and attributes, for example --constraint kind=max_global_error_rate,rate=10,minimum=100
--max-requests Maximum requests per benchmark. If None, runs until max_seconds or data exhaustion. Specify maximum requests constraint kind and attributes, for example --constraint kind=max_requests,count=1000
--max-seconds Maximum seconds per benchmark. If None, runs until max_requests or data exhaustion. Specify maximum duration constraint kind and attributes, for example --constraint kind=max_duration,seconds=60
--model Model ID to benchmark. If not provided, uses first available model. Specify with the model attribute of the backend configuration, for example --backend kind=openai_http,model=gpt4
--output-dir or --output-path The directory path in which to save file output types. Specify paths as part of the individual output configurations, for example --output kind=json,path=/tmp/reports/benchmark.json
--outputs The filename.ext for each of the outputs to create, or the aliases (json, csv, html) for the output files to create with their default file names (benchmark.[EXT]). Specify multiple output formats by repeating the --output option, for example --output kind=json,path=benchmark.json --output kind=csv,path=benchmark.csv
--over-saturation Enable over-saturation detection. Pass a JSON dict with configuration (e.g., '{"enabled": true, "min_seconds": 30}'). Defaults to None (disabled). Specify oversaturation constraint kind and attributes, for example --constraint kind=over_saturation,mode=enforce,min_seconds=30
--processor-args JSON string of arguments to pass to the processor constructor. Specify options directly along with the tokenizer kind, for example --tokenizer '{"kind":"huggingface_auto","load_kwargs":{"fast":true}}'
--processor Processor or tokenizer for token count calculations. If not provided, loads from model. Defaults to the default tokenizer for the first model supported by the backend target. To override, specify the tokenizer kind and attributes, for example --tokenizer kind=huggingface_auto,model=gpt4
--profile Benchmark profile type. Options: sweep, async, poisson, synchronous, throughput, concurrent, constant. Specify the benchmark profile kind and attributes to use, for example, --profile kind=sweep,sweep_size=10,warmup=1,cooldown=1
--rampup The time, in seconds, to ramp up the request rate over. Applicable for Throughput, Concurrent, and Constant strategies. Specify with the rampup_duration profile attribute, for example --profile kind=constant,rate=10,rampup_duration=2
--random-seed Random seed for reproducibility. Specify the random seed configuration kind and attributes, for example --seed kind=static,value=42
--rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second. "Rate" was overloaded to specify the primary configuration for each profile type. Specify the appropriate attribute with --profile or --override profile.<name>: async/constant/poisson → rate, concurrent → streams, sweep → sweep_size, throughput → max_concurrency.
--request-format Format to use for requests. Options depend on backend.

For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-template

For openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat_completions); default /v1/chat/completions.
Specify as part of backend configuration, like --backend kind=openai_http,request_format=/v1/responses
--sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20. Specify with the sample_size attribute of the metrics configuration, for example --metrics kind=generative,sample_size=20. The default if unspecified is to save all samples.
--scenario Builtin scenario name or path to config file. CLI options override scenario settings. The preferred name is now --config, although both --scenario and -c are aliases, for example --config chat or --config my-scenario.yaml.
--target Target backend URL (e.g., http://localhost:8000). Specify with the target attribute of the backend configuration, for example --backend kind=openai_http,target=http://localhost:8000
--warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema. Specify with the warmup profile attribute, for example --profile kind=synchronous,warmup=2 for a two-second warmup or --profile '{"kind":"concurrent","warmup":{"mode":"duration","value":2}}'
NEW OPTIONS v0.7.0 new options
Add metadata to output reports Specify key-value pairs of metadata labels which will be written to the output reports, for example --label gpu=NVIDIA_Z500 --label creator=Intrepid_Adventurer
Override concurrent profile stream count and async/constant/poisson rate Specify profile settings to override the default profile settings, for example --override profile.rate 10,20,30 or --override profile.streams 10,20,30
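The mappings above can be sketched as a small command builder. This is only an illustration: kv_spec, json_spec, profile_spec, and RATE_ATTRIBUTE are hypothetical helpers written for this guide and are not part of GuideLLM. The option names and attributes are taken from the table rows above, and the script only prints a command line rather than running it.

```python
import json
import shlex

# Hypothetical helper (not part of GuideLLM): render a key=value specification
# such as "kind=pytorch,shuffle=true,samples=100".
def kv_spec(kind, **attrs):
    parts = [f"kind={kind}"]
    for key, value in attrs.items():
        if isinstance(value, bool):
            # The table writes booleans in lowercase, e.g. shuffle=true.
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ",".join(parts)

# Hypothetical helper: render the JSON form, which the table uses for nested values.
def json_spec(kind, **attrs):
    return json.dumps({"kind": kind, **attrs}, separators=(",", ":"))

# Taken from the --rate row: the attribute that replaces --rate for each profile.
RATE_ATTRIBUTE = {
    "async": "rate",
    "constant": "rate",
    "poisson": "rate",
    "concurrent": "streams",
    "sweep": "sweep_size",
    "throughput": "max_concurrency",
}

# Hypothetical helper: turn a v0.6.0 "--profile NAME --rate N" pair into a
# v0.7.0 profile specification.
def profile_spec(profile, rate=None, **attrs):
    if rate is not None:
        attrs = {RATE_ATTRIBUTE[profile]: rate, **attrs}
    return kv_spec(profile, **attrs)

# Assemble a v0.7.0 command from values that appear in the table's examples.
command = [
    "guidellm", "run",
    "--backend", kv_spec("openai_http", target="http://localhost:8000"),
    "--profile", profile_spec("constant", rate=10, rampup_duration=2),
    "--constraint", kv_spec("max_duration", seconds=60),
    "--data", kv_spec("synthetic_text", prompt_tokens=128, output_tokens=64),
    "--output", kv_spec("json", path="benchmark.json"),
]

# shlex.join quotes arguments where the shell needs it.
print(shlex.join(command))
```

The key=value helper does not escape commas inside values, so nested or comma-containing values are better expressed with the JSON form, as the --cooldown and --warmup rows do.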

guidellm benchmark from-file

Load a saved benchmark report and optionally re-export data.

[!WARNING] This command may be changed to be more consistent with the run command in the future.

Option v0.7.0 equivalent
PATH Path to the saved benchmark report file (default: ./benchmarks.) Unchanged
--output-path Directory or file path to save re-exported benchmark results. If a directory, all output formats will be saved there. If a file, the matching format will be saved to that file. Unchanged
--output-formats Output formats for benchmark results (e.g., console, json, html, csv). Unchanged

guidellm config

Show configuration settings

Changed from guidellm config to guidellm env to clarify that it displays environment variables affecting GuideLLM operation.

guidellm config will be used later for a different purpose, to generate YAML config files from run options.

guidellm preprocess dataset

Tools for preprocessing datasets for use in benchmarks.

[!WARNING] This command may be changed to be more consistent with the run command in the future.

v0.6.0 option v0.7.0 equivalent
data (positional parameter) Use dataset descriptor, for example kind=huggingface,source=<id>
output_path (positional parameter) Results file path, for example file.json
--processor TEXT Processor or tokenizer name for calculating token counts. Unchanged
--config TEXT PreprocessDatasetConfig as JSON string, key=value pairs, or file path (.json, .yaml, .yml, .config). Example: prompt_tokens=100,output_tokens=50,prefix_tokens_max=10 or {"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10} [Mandatory] Unchanged
--processor-args TEXT JSON string of arguments to pass to the processor constructor. Unchanged
--data-args TEXT JSON string of arguments to pass to dataset creation. Unchanged
--data-column-mapper JSON string of column mappings to apply to the dataset. Specify a data column mapper object, for example --data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}'
--short-prompt-strategy [ignore, concatenate, pad, error] Strategy for handling prompts shorter than target length. [default: ignore] Unchanged
--pad-char TEXT Character to pad short prompts with when using the 'pad' strategy. Unchanged
--concat-delimiter TEXT Delimiter for concatenating short prompts (used with 'concatenate' strategy). Unchanged
--include-prefix-in-token-count Include prefix tokens in prompt token count calculation. Unchanged
--push-to-hub Push the processed dataset to Hugging Face Hub. Unchanged
--hub-dataset-id TEXT Hugging Face Hub dataset ID for upload (required if --push-to-hub is set). Unchanged
--random-seed INTEGER Random seed for reproducible token sampling. [default: 42] Unchanged
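As a rough illustration of the --config row, the sketch below checks that the key=value example and the JSON example describe the same settings, then prints a v0.7.0-style command line using the dataset descriptor from the table. parse_key_value is a hypothetical stand-in written for this guide, not GuideLLM's own parser, and <id> and file.json are the placeholders from the table. Other options, such as --processor, are omitted for brevity.

```python
import json
import shlex

# Hypothetical stand-in parser (not GuideLLM's own): turn "a=1,b=2" into a dict,
# converting integer-like values to int.
def parse_key_value(text):
    result = {}
    for pair in text.split(","):
        key, _, value = pair.partition("=")
        value = value.strip()
        result[key.strip()] = int(value) if value.isdigit() else value
    return result

# The two example forms given in the --config row.
kv_form = "prompt_tokens=100,output_tokens=50,prefix_tokens_max=10"
json_form = '{"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10}'

# Both forms carry the same settings.
assert parse_key_value(kv_form) == json.loads(json_form)

# v0.7.0 takes a dataset descriptor as the first positional parameter.
command = [
    "guidellm", "preprocess", "dataset",
    "kind=huggingface,source=<id>",  # placeholder from the table
    "file.json",                     # placeholder output path from the table
    "--config", kv_form,
]

# shlex.join quotes the descriptor because it contains < and >.
print(shlex.join(command))
```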