CLI Migration Guide
guidellm benchmark [run]
Run a benchmark against a generative model.
This command is now guidellm run.
| v0.6.0 option | v0.7.0 equivalent |
|---|---|
| --backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api_key": "apikey-*", "verify": false}' | Options passed to --backend constructor, for example --backend "kind=openai_http,api_key=sk…" |
| --backend Backend type. Options: vllm_python, openai_http. | The "kind" of the backend specification, for example --backend '{"kind": "openai_http", "extras": {"body": {"temperature": 0.6}}}' |
| --cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema. | Specify with the cooldown profile attribute, for example --profile kind=synchronous,cooldown=2 for a two second cooldown or --profile '{"kind":"concurrent","cooldown":{"mode":"duration","value":2}}' |
| --data-args JSON string of arguments to pass to dataset creation. | Specified with "load_kwargs" data attribute, e.g., --data '{"kind":"huggingface","load_kwargs":{"split":"train"}}' |
| --data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text_column": "article", "output_tokens_count_column": "output_tokens"}' | Specify the kind and attributes of a data column mapper, for example --data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}' |
| --data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}' | Specify the kind of finalizer, for example --data-finalizer kind=generative |
| --data-num-workers Number of worker processes for data loading. | Specified with the Data Loader num_workers attribute, for example --data-loader kind=pytorch,num_workers=3 |
| --data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors. | Add parameters to the data preprocessor constructor, for example --data-preprocessor '{"kind":"encode_media","audio_kwargs":{"format":"mp3"}}' |
| --data-preprocessors List of preprocessors to apply to the dataset. E.g., 'encode_media,my_custom_preprocessor' | Specify the preprocessor kind and attributes, for example --data-preprocessor kind=encode_media … can be repeated to configure multiple preprocessors. |
| --data-sampler Data sampler type. | Shuffle function is a data loader attribute, for example --data-loader kind=pytorch,shuffle=true |
| --data-samples Number of samples from dataset. -1 (default) uses all samples and dynamically generates more. | Specify as part of Data Loader configuration, for example --data-loader kind=pytorch,samples=10 |
| --data HuggingFace dataset ID, path to dataset, path to data file (csv/json/jsonl/txt), or synthetic data config (json/key=value). | Specify the kind of dataset together with attributes, for example --data kind=huggingface,source=<id> --data kind=csv_file,path=<file.csv> --data kind=synthetic_text,prompt_tokens=128,output_tokens=64 |
| --dataloader-kwargs JSON string of arguments to pass to the dataloader constructor. | Passed directly to Data Loader, for example --data-loader kind=pytorch,shuffle=true,samples=100 |
| --detect-saturation Enable over-saturation detection with default settings. | Specify oversaturation constraint kind and attributes, for example --constraint kind=over_saturation |
| --disable-console-interactive Disable interactive console progress updates. | Unchanged: --disable-console-interactive or --disable-progress |
| --disable-console Disable all outputs to the console (updates, interactive progress, results). | Unchanged: --disable-console or --disable-console-outputs |
| --max-error-rate Maximum error rate before stopping the benchmark. | Specify maximum error rate constraint kind and attributes, for example --constraint kind=max_error_rate,rate=10 |
| --max-errors Maximum errors before stopping the benchmark. | Specify maximum error count constraint kind and attributes, for example --constraint kind=max_errors,count=10 |
| --max-global-error-rate Maximum global error rate across all benchmarks. | Specify maximum global error rate constraint kind and attributes, for example --constraint kind=max_global_error_rate,rate=10,minimum=100 |
| --max-requests Maximum requests per benchmark. If None, runs until max_seconds or data exhaustion. | Specify maximum requests constraint kind and attributes, for example --constraint kind=max_requests,count=1000 |
| --max-seconds Maximum seconds per benchmark. If None, runs until max_requests or data exhaustion. | Specify maximum duration constraint kind and attributes, for example --constraint kind=max_duration,seconds=60 |
| --model Model ID to benchmark. If not provided, uses first available model. | Specify with the model attribute of the backend configuration, for example --backend kind=openai_http,model=gpt4 |
| --output-dir or --output-path The directory path in which to save file outputs. | Specify paths as part of the individual output configurations, for example --output kind=json,path=/tmp/reports/benchmark.json |
| --outputs The filename.ext for each output to create, or the aliases (json, csv, html) for output files to create with their default file names (benchmark.[EXT]). | Specify multiple output formats by repeating the --output option, for example --output kind=json,path=benchmark.json --output kind=csv,path=benchmark.csv |
| --over-saturation Enable over-saturation detection. Pass a JSON dict with configuration (e.g., '{"enabled": true, "min_seconds": 30}'). Defaults to None (disabled). | Specify oversaturation constraint kind and attributes, for example --constraint kind=over_saturation,mode=enforce,min_seconds=30 |
| --processor-args JSON string of arguments to pass to the processor constructor. | Specify options directly along with the tokenizer kind, for example --tokenizer '{"kind":"huggingface_auto","load_kwargs":{"fast":true}}' |
| --processor Processor or tokenizer for token count calculations. If not provided, loads from model. | Defaults to the default tokenizer for the first model supported by the backend target. To override, specify the tokenizer kind and attributes, for example --tokenizer kind=huggingface_auto,model=gpt4 |
| --profile Benchmark profile type. Options: sweep, async, poisson, synchronous, throughput, concurrent, constant. | Specify the benchmark profile kind and attributes to use, for example, --profile kind=sweep,sweep_size=10,warmup=1,cooldown=1 |
| --rampup The time, in seconds, to ramp up the request rate over. Applicable for Throughput, Concurrent, and Constant strategies | Specify with the rampup_duration profile attribute, for example --profile kind=constant,rate=10,rampup_duration=2 |
| --random-seed Random seed for reproducibility. | Specify the random seed configuration kind and attributes, for example --seed kind=static,value=42 |
| --rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second. | "Rate" was overloaded to specify the primary configuration for each profile type. Specify the appropriate attribute with --profile or --override profile.<name>: async/constant/poisson → rate, concurrent → streams, sweep → sweep_size, throughput → max_concurrency. |
| --request-format Format to use for requests. Options depend on backend. For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-template For openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat_completions); default /v1/chat/completions. | Specify as part of backend configuration, like --backend kind=openai_http,request_format=/v1/responses |
| --sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20. | Specify with the sample_size attribute of the metrics configuration, for example --metrics kind=generative,sample_size=20. The default if unspecified is to save all samples. |
| --scenario Builtin scenario name or path to config file. CLI options override scenario settings. | The preferred name is now --config, although both --scenario and -c are aliases, for example --config chat or --config my-scenario.yaml. |
| --target Target backend URL (e.g., http://localhost:8000). | Specify with the target attribute of the backend configuration, for example --backend kind=openai_http,target=http://localhost:8000 |
| --warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema. | Specify with the warmup profile attribute, for example --profile kind=synchronous,warmup=2 for a two second warmup or --profile '{"kind":"concurrent","warmup":{"mode":"duration","value":2}}' |
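
The v0.7.0 examples in this table use two spellings for a component specification: comma-separated key=value pairs for flat settings and a JSON object when an attribute is nested. As a rough illustration of how the two relate, the sketch below renders a Python dict in whichever form fits. The helper `to_spec` is hypothetical and not part of GuideLLM; it assumes values contain no commas or equals signs.

```python
import json


def to_spec(attrs: dict) -> str:
    """Render a component spec as key=value pairs when flat, else as compact JSON.

    Hypothetical helper for illustration only; not a GuideLLM API.
    """

    def scalar(value) -> str:
        # The guide writes booleans in lower case, e.g. shuffle=true.
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    # Nested attributes cannot be expressed as flat pairs, so fall back to JSON.
    if any(isinstance(value, (dict, list)) for value in attrs.values()):
        return json.dumps(attrs, separators=(",", ":"))
    return ",".join(f"{key}={scalar(value)}" for key, value in attrs.items())


print(to_spec({"kind": "synchronous", "cooldown": 2}))
# kind=synchronous,cooldown=2
print(to_spec({"kind": "concurrent", "warmup": {"mode": "duration", "value": 2}}))
# {"kind":"concurrent","warmup":{"mode":"duration","value":2}}
```

On a shell command line the JSON form needs single quotes around it, as in the table's examples.
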

| Purpose | v0.7.0 new option |
|---|---|
| Add metadata to output reports | Specify key-value pairs of metadata labels which will be written to the output reports, for example --label gpu=NVIDIA_Z500 --label creator=Intrepid_Adventurer |
| Override concurrent profile stream count and async/constant/poisson rate | Specify profile settings to override the default profile settings, for example --override profile.rate 10,20,30 or --override profile.streams 10,20,30 |
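
To make the mappings above concrete, the following sketch assembles a v0.7.0 command line from a few v0.6.0 values (--target, --profile, --rate, --max-seconds, --max-requests). The function `migrate_run_args` is a hypothetical helper, not something GuideLLM ships. It assumes the openai_http backend, a single rate value rather than a list, and that --constraint may be repeated in the same way the guide shows for --output.

```python
import shlex

# Attribute that replaces the overloaded --rate value for each profile kind,
# taken from the --rate row of the migration table.
RATE_ATTRIBUTE = {
    "async": "rate",
    "constant": "rate",
    "poisson": "rate",
    "concurrent": "streams",
    "sweep": "sweep_size",
    "throughput": "max_concurrency",
}


def migrate_run_args(target, profile, rate=None, max_seconds=None, max_requests=None):
    """Build a v0.7.0 `guidellm run` argument list from a few v0.6.0 values.

    Hypothetical helper for illustration only; not a GuideLLM API.
    """
    args = ["guidellm", "run", "--backend", f"kind=openai_http,target={target}"]

    # The old --rate value becomes a profile attribute whose name depends on the kind.
    profile_spec = f"kind={profile}"
    if rate is not None:
        profile_spec += f",{RATE_ATTRIBUTE[profile]}={rate}"
    args += ["--profile", profile_spec]

    # The old --max-seconds and --max-requests become constraints.
    if max_seconds is not None:
        args += ["--constraint", f"kind=max_duration,seconds={max_seconds}"]
    if max_requests is not None:
        args += ["--constraint", f"kind=max_requests,count={max_requests}"]
    return args


command = migrate_run_args("http://localhost:8000", "concurrent", rate=8, max_seconds=60)
print(shlex.join(command))
```

Here the old `--rate 8` with the concurrent profile turns into `streams=8` inside the --profile specification. For several rates in one run, the table points to --override profile.rate or --override profile.streams instead.
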
guidellm benchmark from-file
Load a saved benchmark report and optionally re-export data.
[!WARNING]\ This command may be changed to be more consistent with the
run command in the future.
| Option | v0.7.0 equivalent |
|---|---|
| PATH Path to the saved benchmark report file (default: ./benchmarks. | Unchanged |
| --output-path Directory or file path to save re-exported benchmark results. If a directory, all output formats will be saved there. If a file, the matching format will be saved to that file. | Unchanged |
| --output-formats Output formats for benchmark results (e.g., console, json, html, csv). | Unchanged |
guidellm config
Show configuration settings
This command has been renamed from guidellm config to guidellm env to make clear that it displays the environment variables that affect GuideLLM operation.
The name guidellm config is reserved for a future command that generates YAML config files from run options.
guidellm preprocess dataset
Tools for preprocessing datasets for use in benchmarks.
[!WARNING]\ This command may be changed to be more consistent with the
run command in the future.
| v0.6.0 option | v0.7.0 equivalent |
|---|---|
| data (positional parameter) | Use dataset descriptor, for example kind=huggingface,source=<id> |
| output_path (positional parameter) | Results file path, for example file.json |
| --processor TEXT Processor or tokenizer name for calculating token counts. | Unchanged |
| --config TEXT PreprocessDatasetConfig as JSON string, key=value pairs, or file path (.json, .yaml, .yml, .config). Example: prompt_tokens=100,output_tokens=50,prefix_tokens_max=10 or {"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10} [Mandatory] | Unchanged |
| --processor-args TEXT JSON string of arguments to pass to the processor constructor. | Unchanged |
| --data-args TEXT JSON string of arguments to pass to dataset creation | Unchanged |
| --data-column-mapper JSON string of column mappings to apply to the dataset | Specify a data column mapper object, for example --data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}' |
| --short-prompt-strategy [ignore, concatenate, pad, error] Strategy for handling prompts shorter than target length. [default: ignore] | Unchanged |
| --pad-char TEXT Character to pad short prompts with when using the 'pad' strategy. | Unchanged |
| --concat-delimiter TEXT Delimiter for concatenating short prompts (used with 'concatenate' strategy). | Unchanged |
| --include-prefix-in-token-count Include prefix tokens in prompt token count calculation. | Unchanged |
| --push-to-hub Push the processed dataset to Hugging Face Hub. | Unchanged |
| --hub-dataset-id TEXT Hugging Face Hub dataset ID for upload (required if --push-to-hub is set). | Unchanged |
| --random-seed INTEGER Random seed for reproducible token sampling. [default: 42] | Unchanged |
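
The --config row above gives the same settings as key=value pairs and as a JSON string. The sketch below shows that the two example strings describe the same configuration. `parse_config` is a hypothetical stand-in written for this illustration, not GuideLLM's own parser, and it handles only flat integer values like those in the example.

```python
import json


def parse_config(text: str) -> dict:
    """Parse a flat config given as JSON or as comma-separated key=value pairs.

    Hypothetical helper for illustration only; not GuideLLM's parser.
    """
    text = text.strip()
    if text.startswith("{"):
        return json.loads(text)
    config = {}
    for item in text.split(","):
        key, value = item.split("=", 1)
        # Only integer values are handled, matching the example in the table.
        config[key.strip()] = int(value)
    return config


as_pairs = parse_config("prompt_tokens=100,output_tokens=50,prefix_tokens_max=10")
as_json = parse_config('{"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10}')
print(as_pairs == as_json)  # True
```
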