guidellm.mock_server.utils
Mock server utilities for text generation and tokenization testing.
This module provides mock tokenization and text generation utilities for testing guidellm's mock server functionality. It includes a mock tokenizer that simulates tokenization processes, functions to generate reproducible fake text with specific token counts, and timing generators for realistic benchmarking scenarios.
MockTokenizer
Bases: PreTrainedTokenizerBase
Mock tokenizer implementation for testing text processing workflows.
Provides a simplified tokenizer that splits text using regex patterns and generates deterministic token IDs based on string hashing. Used for testing guidellm components without requiring actual model tokenizers.
Attributes:
| Name | Type | Description |
|---|---|---|
VocabSize | Fixed vocabulary size for the mock tokenizer |
Source code in src/guidellm/mock_server/utils.py
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 | |
__call__(text, **kwargs)
Tokenize text and return token IDs (callable interface).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text | str | list[str] | Input text to tokenize | required |
Returns:
| Type | Description |
|---|---|
list[int] | List of token IDs |
Source code in src/guidellm/mock_server/utils.py
__len__()
Get the vocabulary size of the tokenizer.
Returns:
| Type | Description |
|---|---|
int | The total number of tokens in the vocabulary |
apply_chat_template(conversation, tokenize=False, add_generation_prompt=False, **kwargs)
Apply a chat template to format conversation messages.
Mock implementation that concatenates all message content for testing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
conversation | list | List of chat messages | required |
tokenize | bool | Whether to return tokens or string | False |
add_generation_prompt | bool | Whether to add generation prompt | False |
Returns:
| Type | Description |
|---|---|
str | list[int] | Formatted text string or token IDs |
Source code in src/guidellm/mock_server/utils.py
convert_ids_to_tokens(ids, _skip_special_tokens=False)
Convert numeric token IDs back to token strings.
Generates fake text tokens using Faker library seeded with token IDs for deterministic and reproducible token generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ids | list[int] | Single token ID or list of token IDs to convert | required |
Returns:
| Type | Description |
|---|---|
list[str] | Single token string or list of token strings |
Source code in src/guidellm/mock_server/utils.py
convert_tokens_to_ids(tokens)
Convert token strings to numeric token IDs.
Uses deterministic hashing to generate consistent token IDs for reproducible testing scenarios.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tokens | str | list[str] | Single token string or list of token strings | required |
Returns:
| Type | Description |
|---|---|
list[int] | Single token ID or list of token IDs |
Source code in src/guidellm/mock_server/utils.py
convert_tokens_to_string(tokens)
Convert a list of token strings back to a single text string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tokens | list[str] | List of token strings to concatenate | required |
Returns:
| Type | Description |
|---|---|
str | Concatenated string from all tokens |
Source code in src/guidellm/mock_server/utils.py
decode(token_ids, skip_special_tokens=True, **kwargs)
Decode token IDs back to text string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
token_ids | list[int] | List of token IDs to decode | required |
skip_special_tokens | bool | Whether to skip special tokens | True |
Returns:
| Type | Description |
|---|---|
str | Decoded text string |
Source code in src/guidellm/mock_server/utils.py
tokenize(text, **_kwargs)
Tokenize input text into a list of token strings.
Splits text using regex to separate words, punctuation, and whitespace into individual tokens for processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text | TextInput | Input text to tokenize | required |
Returns:
| Type | Description |
|---|---|
list[str] | List of token strings from the input text |
Source code in src/guidellm/mock_server/utils.py
create_fake_text(num_tokens, processor, seed=42, fake=None)
Generate fake text using a tokenizer processor with specified token count.
Creates text by generating fake tokens and joining them into a string, ensuring the result has the exact number of tokens when processed by the given tokenizer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_tokens | int | Target number of tokens in the generated text | required |
processor | PreTrainedTokenizerBase | Tokenizer to use for token generation and validation | required |
seed | int | Random seed for reproducible text generation | 42 |
fake | Faker | None | Optional Faker instance for text generation | None |
Returns:
| Type | Description |
|---|---|
str | Generated text string with the specified token count |
Source code in src/guidellm/mock_server/utils.py
create_fake_tokens_str(num_tokens, processor, seed=42, fake=None)
Generate fake token strings using a tokenizer processor.
Creates a list of token strings by generating fake text and tokenizing it until the desired token count is reached. Uses the provided tokenizer for accurate token boundary detection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_tokens | int | Target number of tokens to generate | required |
processor | PreTrainedTokenizerBase | Tokenizer to use for token generation and validation | required |
seed | int | Random seed for reproducible token generation | 42 |
fake | Faker | None | Optional Faker instance for text generation | None |
Returns:
| Type | Description |
|---|---|
list[str] | List of token strings with the specified count |
Source code in src/guidellm/mock_server/utils.py
sample_number(mean, standard_dev)
Generate a single timing value from a normal distribution.
Samples one timing value from a normal distribution with the specified parameters, ensuring the result is non-negative for realistic timing simulation in benchmarking scenarios.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mean | float | Mean value for the normal distribution | required |
standard_dev | float | Standard deviation for the normal distribution | required |
Returns:
| Type | Description |
|---|---|
float | Non-negative timing value from the distribution |
Source code in src/guidellm/mock_server/utils.py
times_generator(mean, standard_dev)
Generate infinite timing values from a normal distribution.
Creates a generator that yields timing values sampled from a normal distribution, useful for simulating realistic request timing patterns in benchmarking scenarios.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mean | float | Mean value for the normal distribution | required |
standard_dev | float | Standard deviation for the normal distribution | required |
Returns:
| Type | Description |
|---|---|
Generator[float] | Generator yielding positive timing values from the distribution |