guidellm.scheduler
Scheduler subsystem for orchestrating benchmark workloads and managing worker processes.
This module provides the core scheduling infrastructure for guidellm, including strategies for controlling request timing patterns (synchronous, asynchronous, constant rate, Poisson), constraints for limiting benchmark execution (duration, error rates, request counts), and distributed execution through worker processes. The scheduler coordinates between backend interfaces, manages benchmark state transitions, and handles multi-turn request sequences with customizable timing strategies and resource constraints.
BackendT = TypeVar('BackendT', bound=BackendInterface) module-attribute
Generic backend interface type for request processing
DatasetIterT = TypeAliasType('DatasetIterT', Iterable[Iterable[RequestT]], type_params=(RequestT,)) module-attribute
Output of the data loader: an iterable of batches, where each batch is an iterable of requests.
HistoryT = TypeAliasType('HistoryT', list[tuple[RequestT, ResponseT | None]], type_params=(RequestT, ResponseT)) module-attribute
Record of (request, response) pairs in a conversation; the response may be None.
RequestDataT = TypeAliasType('RequestDataT', tuple[RequestT, RequestInfo], type_params=(RequestT,)) module-attribute
Request including external metadata and scheduling config.
RequestT = TypeVar('RequestT') module-attribute
Generic request object type for scheduler processing
ResponseT = TypeVar('ResponseT') module-attribute
Generic response object type returned by backend processing
StrategyT = TypeVar('StrategyT', bound=SchedulingStrategy) module-attribute
Type variable bound to SchedulingStrategy for generic strategy operations
AsyncConstantStrategy
Bases: SchedulingStrategy
Constant-rate scheduling for predictable load patterns.
Schedules requests at a fixed rate distributed evenly across worker processes, providing predictable timing behavior for steady-state load simulation and consistent system performance measurement. Requests arrive at uniform intervals.
Source code in src/guidellm/scheduler/strategies.py
processes_limit property
Returns:
| Type | Description |
|---|---|
| PositiveInt \| None | Max concurrency if set, otherwise None for unlimited |
requests_limit property
Returns:
| Type | Description |
|---|---|
| PositiveInt \| None | Max concurrency if set, otherwise None for unlimited |
__str__()
next_request_time(worker_index) async
Calculate next request time at fixed intervals with optional linear rampup.
Schedules requests at uniform intervals determined by the configured rate, independent of request completion times. If rampup_duration is set, the rate increases linearly from 0 to the target rate during the rampup period, then continues at the constant rate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_index | PositiveInt | Unused for constant strategy | required |
Returns:
| Type | Description |
|---|---|
float | Start time plus interval based on request index and rampup configuration |
Source code in src/guidellm/scheduler/strategies.py
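As a rough illustration of the fixed-interval timing with linear rampup described for `next_request_time`, the sketch below computes the offset of the n-th request from the start time. The helper name `constant_request_offset` and its closed-form rampup formula are assumptions made for this example, not guidellm's implementation; the formula follows from integrating a rate that rises linearly from 0 to `rate` over `rampup_duration`.

```python
import math


def constant_request_offset(
    index: int, rate: float, rampup_duration: float = 0.0
) -> float:
    """Seconds after start at which request `index` (1-based) is sent.

    Hypothetical helper: assumes the rate rises linearly from 0 to `rate`
    over `rampup_duration`, then stays constant.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if rampup_duration <= 0:
        # No rampup: uniform spacing of 1 / rate seconds.
        return index / rate

    # Requests issued by time t during rampup: rate * t**2 / (2 * rampup_duration).
    rampup_requests = rate * rampup_duration / 2
    if index <= rampup_requests:
        return math.sqrt(2 * index * rampup_duration / rate)

    # After rampup, continue at the constant rate.
    return rampup_duration + (index - rampup_requests) / rate
```

An absolute schedule would add these offsets to the synchronized start timestamp.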
request_completed(request_info)
Handle request completion (no-op for constant strategy).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request_info | RequestInfo | Completed request metadata (unused) | required |
Source code in src/guidellm/scheduler/strategies.py
AsyncPoissonStrategy
Bases: SchedulingStrategy
Poisson-distributed scheduling for realistic load simulation.
Schedules requests following a Poisson process with exponentially distributed inter-arrival times, providing realistic simulation of user behavior and network traffic patterns. Request arrivals have random variance around the target rate.
Source code in src/guidellm/scheduler/strategies.py
processes_limit property
Returns:
| Type | Description |
|---|---|
| PositiveInt \| None | Max concurrency if set, otherwise None for unlimited |
requests_limit property
Returns:
| Type | Description |
|---|---|
| PositiveInt \| None | Max concurrency if set, otherwise None for unlimited |
__str__()
init_processes_start(start_time)
Initialize the offset time for Poisson timing calculations.
Sets the initial timing offset from which exponentially distributed intervals are calculated.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
start_time | float | Unix timestamp when request processing should begin | required |
Source code in src/guidellm/scheduler/strategies.py
init_processes_timings(worker_count, max_concurrency, mp_context)
Initialize Poisson-specific timing state.
Sets up shared offset value for coordinating exponentially distributed request timing across worker processes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_count | PositiveInt | Number of worker processes to coordinate | required |
max_concurrency | PositiveInt | Maximum number of concurrent requests allowed | required |
Source code in src/guidellm/scheduler/strategies.py
next_request_time(worker_index) async
Calculate next request time using exponential distribution.
Generates inter-arrival times following exponential distribution, accumulating delays to produce Poisson-distributed request arrivals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_index | PositiveInt | Unused for Poisson strategy | required |
Returns:
| Type | Description |
|---|---|
float | Next arrival time based on Poisson process |
Source code in src/guidellm/scheduler/strategies.py
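The accumulation of exponentially distributed delays can be sketched with the standard library alone. The generator name `poisson_arrival_offsets` and the `seed` parameter are assumptions for this example; it shows the general technique, not the strategy's actual code or its cross-process coordination.

```python
import random
from itertools import islice
from typing import Iterator, Optional


def poisson_arrival_offsets(
    rate: float, seed: Optional[int] = None
) -> Iterator[float]:
    """Yield arrival offsets (seconds from start) of a Poisson process.

    Hypothetical helper: each inter-arrival gap is drawn from an
    exponential distribution with mean 1 / rate and added to a running sum.
    """
    rng = random.Random(seed)
    offset = 0.0
    while True:
        offset += rng.expovariate(rate)
        yield offset


# Draw 10,000 arrivals at a target rate of 5 requests per second.
offsets = list(islice(poisson_arrival_offsets(rate=5.0, seed=42), 10_000))
mean_gap = offsets[-1] / len(offsets)
```

Over many samples the mean gap approaches `1 / rate`, while individual gaps vary randomly around it.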
request_completed(request_info)
Handle request completion (no-op for Poisson strategy).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request_info | RequestInfo | Completed request metadata (unused) | required |
Source code in src/guidellm/scheduler/strategies.py
BackendInterface
Bases: Protocol, Generic[RequestT, ResponseT]
Protocol defining the interface for request processing backends.
Establishes the contract for backend implementations that process requests within the scheduler system. Backends manage initialization, validation, processing, and shutdown lifecycle. All properties must be pickleable before process_startup is called for multi-process environments.
Example: ::

    class CustomBackend(BackendInterface):
        @property
        def processes_limit(self) -> int:
            return 4

        async def resolve(self, request, request_info, history=None):
            yield response, updated_request_info
Source code in src/guidellm/scheduler/schemas.py
info property
Returns:
| Type | Description |
|---|---|
dict[str, Any] | Backend metadata including model initialization and configuration |
processes_limit property
Returns:
| Type | Description |
|---|---|
| int \| None | Maximum worker processes supported, or None if unlimited |
requests_limit property
Returns:
| Type | Description |
|---|---|
| int \| None | Maximum concurrent requests supported, or None if unlimited |
process_shutdown() async
Perform backend cleanup and shutdown procedures.
Raises:
| Type | Description |
|---|---|
Exception | Implementation-specific exceptions for shutdown failures |
process_startup() async
Perform backend initialization and startup procedures.
Raises:
| Type | Description |
|---|---|
Exception | Implementation-specific exceptions for startup failures |
resolve(request, request_info, history=None) async
Process a request and yield incremental response updates.
Yields: Tuples of (response, updated_request_info) for each response chunk. The response may be None for intermediate updates (e.g., first token arrival).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request | RequestT | The request object to process | required |
request_info | RequestInfo | Scheduling metadata and timing information | required |
| history | HistoryT[RequestT, ResponseT] \| None | Conversation history for multi-turn requests | None |
Raises:
| Type | Description |
|---|---|
Exception | Implementation-specific exceptions for processing failures |
Source code in src/guidellm/scheduler/schemas.py
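The incremental-update contract of `resolve` can be illustrated with a standalone async generator. Everything here is a stand-in: `EchoBackend` and `collect` are hypothetical, and a plain `dict` replaces `RequestInfo` so the sketch runs without guidellm installed.

```python
import asyncio
from typing import Any, AsyncIterator, Optional


class EchoBackend:
    """Hypothetical backend that echoes the request back as the response."""

    async def resolve(
        self,
        request: str,
        request_info: dict[str, Any],
        history: Optional[list] = None,
    ) -> AsyncIterator[tuple[Optional[str], dict[str, Any]]]:
        # Intermediate update: no response yet, only updated metadata.
        request_info = {**request_info, "first_token": True}
        yield None, request_info
        # Final update: the full response plus metadata marked as done.
        yield f"echo: {request}", {**request_info, "done": True}


async def collect(backend: EchoBackend, request: str) -> list:
    """Drain the async generator into a list of (response, info) tuples."""
    return [update async for update in backend.resolve(request, {})]


updates = asyncio.run(collect(EchoBackend(), "hello"))
```

The first tuple carries `None` as the response, matching the note that intermediate updates may not include one.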
validate() async
Validate backend configuration and operational status.
Raises:
| Type | Description |
|---|---|
Exception | Implementation-specific exceptions for validation failures |
ConcurrentStrategy
Bases: SchedulingStrategy
Parallel request processing with fixed concurrency limits.
Enables concurrent request processing up to a specified number of streams, providing balanced throughput while maintaining predictable resource usage. Requests are distributed across streams with completion-based timing coordination.
Source code in src/guidellm/scheduler/strategies.py
processes_limit property
Returns:
| Type | Description |
|---|---|
PositiveInt | Number of streams as maximum worker processes |
requests_limit property
Returns:
| Type | Description |
|---|---|
PositiveInt | Number of streams as maximum concurrent requests |
__str__()
next_request_time(worker_index) async
Calculate next request time with stream-based distribution.
Initial requests are staggered across streams during rampup; subsequent requests are scheduled after the previous completion within each stream.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_index | PositiveInt | Worker process index for distributing initial requests | required |
Returns:
| Type | Description |
|---|---|
float | Time of last completion or staggered start time if first request |
Source code in src/guidellm/scheduler/strategies.py
request_completed(request_info)
Update timing state with completed request information.
Tracks completion time to schedule next request in the same stream.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request_info | RequestInfo | Completed request metadata including timing | required |
Source code in src/guidellm/scheduler/strategies.py
Constraint
Bases: Protocol
Protocol for constraint evaluation functions that control scheduler behavior.
Defines the interface that all constraint implementations must follow. Constraints are callable objects that evaluate scheduler state and request information to determine whether processing should continue or stop. The protocol enables type checking and runtime validation of constraint implementations while allowing flexible implementation approaches (functions, classes, closures).
Example: ::

    def my_constraint(
        state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        if state.processing_requests > 100:
            return SchedulerUpdateAction(request_queuing="stop")
        return SchedulerUpdateAction(request_queuing="continue")
Source code in src/guidellm/scheduler/constraints/constraint.py
__call__(state, request)
Evaluate constraint against scheduler state and request information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | SchedulerState | Current scheduler state with metrics and timing information | required |
request | RequestInfo | Individual request information and metadata | required |
Returns:
| Type | Description |
|---|---|
SchedulerUpdateAction | Action indicating whether to continue or stop scheduler operations |
Source code in src/guidellm/scheduler/constraints/constraint.py
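Because the protocol only requires a callable taking `(state, request)`, a closure works as well as a class. The sketch below uses the stand-in dataclasses `State` and `Action` in place of `SchedulerState` and `SchedulerUpdateAction` so it runs on its own; `max_in_flight` is a hypothetical name for this example.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Stand-in for SchedulerState with only the field used here."""

    processing_requests: int = 0


@dataclass
class Action:
    """Stand-in for SchedulerUpdateAction."""

    request_queuing: str = "continue"


def max_in_flight(limit: int):
    """Build a closure-style constraint that stops queuing above `limit`."""

    def constraint(state: State, request: object) -> Action:
        if state.processing_requests > limit:
            return Action(request_queuing="stop")
        return Action(request_queuing="continue")

    return constraint


constraint = max_in_flight(100)
```

The closure captures `limit`, so each call to `max_in_flight` yields an independent constraint.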
ConstraintArgs
Bases: PydanticClassRegistryMixin['ConstraintArgs']
Base class for constraint configuration arguments.
Uses PydanticClassRegistryMixin to enable polymorphic deserialization based on the kind field. Each registered subclass represents a specific constraint type with its own parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
schema_discriminator | str | Field name for polymorphic deserialization |
Source code in src/guidellm/scheduler/constraints/args.py
constraint_key property
The key to use when inserting into the constraints dict.
Defaults to kind, but subclasses may override if the factory registry key differs from the args kind.
Returns:
| Type | Description |
|---|---|
str | Registry key for this constraint type |
__pydantic_schema_base_type__() classmethod
Return base type for polymorphic validation hierarchy.
Returns:
| Type | Description |
|---|---|
type[ConstraintArgs] | Base ConstraintArgs class for schema validation |
Source code in src/guidellm/scheduler/constraints/args.py
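Polymorphic deserialization on a `kind` discriminator can be sketched without Pydantic. The registry, decorator, and both argument classes below are hypothetical stand-ins (including the field names `max_num` and `max_duration`); they show the dispatch idea only, not `PydanticClassRegistryMixin` itself.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Maps each `kind` value to the class that parses it (hypothetical registry).
ARGS_REGISTRY: dict[str, type] = {}


def register(kind: str) -> Callable[[type], type]:
    """Class decorator that records a class under its discriminator value."""

    def decorator(cls: type) -> type:
        ARGS_REGISTRY[kind] = cls
        return cls

    return decorator


@register("max_requests")
@dataclass
class MaxRequestsArgs:
    max_num: int
    kind: str = "max_requests"


@register("max_duration")
@dataclass
class MaxDurationArgs:
    max_duration: float
    kind: str = "max_duration"


def parse_args(data: dict[str, Any]):
    """Pick the registered class from `kind` and build it from the dict."""
    kind = data.get("kind")
    if kind not in ARGS_REGISTRY:
        raise ValueError(f"unknown constraint kind: {kind!r}")
    return ARGS_REGISTRY[kind](**data)


parsed = parse_args({"kind": "max_duration", "max_duration": 30.0})
```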
ConstraintInitializer
Bases: Protocol
Protocol for constraint initializer factory functions that create constraints.
Defines the interface for factory objects that create constraint instances from configuration parameters. Constraint initializers enable dynamic constraint creation and configuration, supporting both simple boolean flags and complex parameter dictionaries. The protocol allows type checking while maintaining flexibility for different initialization patterns.
Example: ::

    class MaxRequestsInitializer:
        def __init__(self, max_requests: int):
            self.max_requests = max_requests

        def create_constraint(self) -> Constraint:
            def evaluate(state, request):
                if state.total_requests >= self.max_requests:
                    return SchedulerUpdateAction(request_queuing="stop")
                return SchedulerUpdateAction(request_queuing="continue")

            return evaluate
Source code in src/guidellm/scheduler/constraints/constraint.py
create_constraint(**kwargs)
Create a constraint instance from configuration parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kwargs | | Configuration parameters for constraint creation | {} |
Returns:
| Type | Description |
|---|---|
Constraint | Configured constraint evaluation function |
Source code in src/guidellm/scheduler/constraints/constraint.py
ConstraintsInitializerFactory
Bases: RegistryMixin[ConstraintInitializer]
Registry factory for creating and managing constraint initializers.
Provides centralized access to registered constraint types with support for creating constraints from ConstraintArgs instances or pre-configured initializer instances. Handles constraint resolution and type validation for the scheduler constraint system.
Example: ::

    from guidellm.scheduler import ConstraintsInitializerFactory

    # Register new constraint type
    @ConstraintsInitializerFactory.register("new_constraint")
    class NewConstraint:
        def create_constraint(self, **kwargs) -> Constraint:
            return lambda state, request: SchedulerUpdateAction()

    # Create and use constraint
    args = NewConstraintArgs(kind="new_constraint")
    initializer = ConstraintsInitializerFactory.create(args)
    constraint = initializer.create_constraint()
Source code in src/guidellm/scheduler/constraints/factory.py
create(args) classmethod
Create a constraint initializer from a ConstraintArgs instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
args | ConstraintArgs | Validated constraint arguments with kind discriminator | required |
Returns:
| Type | Description |
|---|---|
ConstraintInitializer | Configured constraint initializer instance |
Raises:
| Type | Description |
|---|---|
ValueError | If args.kind is not registered in the factory |
Source code in src/guidellm/scheduler/constraints/factory.py
deserialize(initializer_dict) classmethod
Deserialize constraint initializer from dictionary format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
initializer_dict | dict[str, Any] | Dictionary representation of constraint initializer | required |
Returns:
| Type | Description |
|---|---|
| SerializableConstraintInitializer \| UnserializableConstraintInitializer | Reconstructed constraint initializer instance |
Raises:
| Type | Description |
|---|---|
ValueError | If constraint type is unknown or cannot be deserialized |
Source code in src/guidellm/scheduler/constraints/factory.py
resolve(initializers) classmethod
Resolve constraint initializers to callable constraints.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| initializers | dict[str, Constraint \| ConstraintInitializer] | Dictionary mapping constraint keys to specifications. Values must be Constraint instances or ConstraintInitializer instances. | required |
Returns:
| Type | Description |
|---|---|
dict[str, Constraint] | Dictionary mapping constraint keys to callable functions |
Raises:
| Type | Description |
|---|---|
TypeError | If a value is not a supported type |
Source code in src/guidellm/scheduler/constraints/factory.py
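The resolution step described for `resolve` (initializers are asked to build a constraint, ready-made constraints pass through, anything else is rejected) can be sketched as follows. `resolve_constraints` and `AlwaysContinue` are hypothetical, and detecting an initializer via `hasattr(value, "create_constraint")` is an assumption of this example rather than the factory's actual check.

```python
from typing import Any, Callable


def resolve_constraints(initializers: dict[str, Any]) -> dict[str, Callable]:
    """Turn a mix of initializers and callables into callables only."""
    resolved: dict[str, Callable] = {}
    for key, value in initializers.items():
        if hasattr(value, "create_constraint"):
            # Initializer: ask it to build the constraint.
            resolved[key] = value.create_constraint()
        elif callable(value):
            # Already a constraint: use it as-is.
            resolved[key] = value
        else:
            raise TypeError(
                f"unsupported constraint value for {key!r}: "
                f"{type(value).__name__}"
            )
    return resolved


class AlwaysContinue:
    """Hypothetical initializer whose constraint never stops the run."""

    def create_constraint(self, **kwargs):
        return lambda state, request: "continue"


resolved = resolve_constraints(
    {
        "from_initializer": AlwaysContinue(),
        "from_callable": lambda state, request: "stop",
    }
)
```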
Environment
Bases: ABC, Generic[RequestT, ResponseT], InfoMixin
Abstract interface for coordinating scheduler execution across distributed nodes.
Defines the protocol for managing distributed scheduler execution including parameter synchronization, timing coordination, state updates, error propagation, and result aggregation. Implementations handle distributed coordination complexity while providing a unified interface for scheduler orchestration.
Source code in src/guidellm/scheduler/environments.py
sync_run_end() abstractmethod async
Finalize execution and aggregate results from all nodes.
Returns:
| Type | Description |
|---|---|
| AsyncIterator[tuple[ResponseT \| None, RequestT, RequestInfo, SchedulerState]] | Iterator of (response, request, request_info, state) tuples from remote nodes in distributed environments, empty for non-distributed |
Raises:
| Type | Description |
|---|---|
Exception | Any errors that occurred during execution |
Source code in src/guidellm/scheduler/environments.py
sync_run_error(err) abstractmethod async
Handle and propagate errors across all active nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| err | list[Exception] \| Exception | The exception(s) that occurred during execution | required |
sync_run_params(requests, strategy, constraints) abstractmethod async
Synchronize execution parameters across nodes and resolve local scope.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
requests | DatasetIterT[RequestT] | Complete set of requests to process across all nodes | required |
strategy | SchedulingStrategy | Scheduling strategy to apply during execution | required |
constraints | dict[str, Constraint] | Runtime constraints to enforce during execution | required |
Returns:
| Type | Description |
|---|---|
tuple[DatasetIterT[RequestT], SchedulingStrategy, dict[str, Constraint]] | Tuple of (local_requests, strategy, constraints) for this node |
Raises:
| Type | Description |
|---|---|
Exception | If parameter synchronization fails or nodes are inconsistent |
Source code in src/guidellm/scheduler/environments.py
sync_run_start() abstractmethod async
Coordinate synchronized start time across all nodes.
Returns:
| Type | Description |
|---|---|
float | Unix timestamp when all nodes should begin processing |
Raises:
| Type | Description |
|---|---|
Exception | If startup synchronization fails across nodes |
Source code in src/guidellm/scheduler/environments.py
update_run_iteration(response, request, request_info, state) abstractmethod async
Update environment state with completed request iteration results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| response | ResponseT \| None | Response generated for the request, if successful | required |
request | RequestT | The processed request | required |
request_info | RequestInfo | Metadata about request processing including timings | required |
state | SchedulerState | Current scheduler state with metrics and progress | required |
Raises:
| Type | Description |
|---|---|
Exception | If state update fails or indicates critical errors |
Source code in src/guidellm/scheduler/environments.py
MaxDurationConstraint
Bases: PydanticConstraintInitializer
Constraint that limits execution based on maximum time duration.
Stops both request queuing and processing when the elapsed time since scheduler start exceeds the maximum duration. Provides progress tracking based on remaining time and completion fraction.
Source code in src/guidellm/scheduler/constraints/request.py
__call__(state, request_info)
Evaluate constraint against current scheduler state and elapsed time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | SchedulerState | Current scheduler state with start time | required |
request_info | RequestInfo | Individual request information (unused) | required |
Returns:
| Type | Description |
|---|---|
SchedulerUpdateAction | Action indicating whether to continue or stop operations |
Source code in src/guidellm/scheduler/constraints/request.py
create_constraint(**_kwargs)
Return self as the constraint instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **_kwargs | | Additional keyword arguments (unused) | {} |
Returns:
| Type | Description |
|---|---|
Constraint | Self instance as the constraint |
Source code in src/guidellm/scheduler/constraints/request.py
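The elapsed-time check and progress tracking can be sketched as a pure function. The name `duration_check`, the dictionary keys, and the `"stop_all"` value are hypothetical, and treating "elapsed equals maximum" as a stop is an assumption; the real constraint returns a `SchedulerUpdateAction`.

```python
def duration_check(start_time: float, now: float, max_duration: float) -> dict:
    """Report whether a run should stop and how much time remains.

    Hypothetical helper: all arguments are in seconds and `max_duration`
    must be positive.
    """
    if max_duration <= 0:
        raise ValueError("max_duration must be positive")
    elapsed = now - start_time
    exceeded = elapsed >= max_duration
    return {
        # Both queuing and processing stop once the time budget is spent.
        "request_queuing": "stop" if exceeded else "continue",
        "request_processing": "stop_all" if exceeded else "continue",
        "remaining_duration": max(0.0, max_duration - elapsed),
        "remaining_fraction": max(0.0, 1.0 - elapsed / max_duration),
    }
```

For a 60-second budget checked 15 seconds after start, this reports 45 seconds and three quarters of the budget remaining.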
MaxDurationConstraintArgs
Bases: ConstraintArgs
Arguments for maximum duration constraint.
Limits benchmark execution time per strategy.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Literal['max_duration'] | Always "max_duration" |
Source code in src/guidellm/scheduler/constraints/request.py
MaxErrorRateConstraint
Bases: PydanticConstraintInitializer
Constraint that limits execution based on sliding window error rate.
Tracks error status of recent requests in a sliding window and stops all processing when the error rate exceeds the threshold. Only applies the constraint after processing enough requests to fill the minimum window size for statistical significance.
Source code in src/guidellm/scheduler/constraints/error.py
__call__(state, request_info)
Evaluate constraint against sliding window error rate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | SchedulerState | Current scheduler state with request counts | required |
request_info | RequestInfo | Individual request with completion status | required |
Returns:
| Type | Description |
|---|---|
SchedulerUpdateAction | Action indicating whether to continue or stop operations |
Source code in src/guidellm/scheduler/constraints/error.py
create_constraint(**_kwargs)
Create a new instance of MaxErrorRateConstraint (due to stateful window).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **_kwargs | | Additional keyword arguments (unused) | {} |
Returns:
| Type | Description |
|---|---|
Constraint | New instance of the constraint |
Source code in src/guidellm/scheduler/constraints/error.py
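A sliding-window error rate is commonly kept in a bounded deque, which also shows why the constraint is stateful and needs a fresh instance per run. The class `SlidingErrorRate` and its parameter names are hypothetical; this is a minimal sketch of the technique, not the constraint's code.

```python
from collections import deque


class SlidingErrorRate:
    """Track error flags for the most recent `window_size` requests."""

    def __init__(self, max_error_rate: float, window_size: int):
        self.max_error_rate = max_error_rate
        # deque(maxlen=...) drops the oldest flag when a new one arrives.
        self.window: deque = deque(maxlen=window_size)

    def update(self, errored: bool) -> str:
        """Record one request outcome and return "stop" or "continue"."""
        self.window.append(errored)
        if len(self.window) < self.window.maxlen:
            # Not enough samples yet for a meaningful rate.
            return "continue"
        rate = sum(self.window) / len(self.window)
        return "stop" if rate > self.max_error_rate else "continue"


checker = SlidingErrorRate(max_error_rate=0.5, window_size=4)
outcomes = [checker.update(e) for e in [True, True, True, False, False, False]]
```

In this run only the fourth update trips the threshold (three errors in a window of four); once older errors slide out, the rate falls back to 0.5 and below.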
MaxErrorRateConstraintArgs
Bases: ConstraintArgs
Arguments for maximum error rate constraint (sliding window).
Stops execution when the windowed error rate exceeds the threshold.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Literal['max_error_rate'] | Always "max_error_rate" |
Source code in src/guidellm/scheduler/constraints/error.py
MaxErrorsConstraint
Bases: PydanticConstraintInitializer
Constraint that limits execution based on absolute error count.
Stops both request queuing and all request processing when the total number of errored requests reaches the maximum threshold. Uses global error tracking across all requests for immediate constraint evaluation.
Source code in src/guidellm/scheduler/constraints/error.py
__call__(state, request_info)
Evaluate constraint against current error count.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | SchedulerState | Current scheduler state with error counts | required |
request_info | RequestInfo | Individual request information (unused) | required |
Returns:
| Type | Description |
|---|---|
SchedulerUpdateAction | Action indicating whether to continue or stop operations |
Source code in src/guidellm/scheduler/constraints/error.py
create_constraint(**_kwargs)
Return self as the constraint instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **_kwargs | | Additional keyword arguments (unused) | {} |
Returns:
| Type | Description |
|---|---|
Constraint | Self instance as the constraint |
Source code in src/guidellm/scheduler/constraints/error.py
MaxErrorsConstraintArgs
Bases: ConstraintArgs
Arguments for maximum error count constraint.
Stops execution when total errors reach the threshold.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Literal['max_errors'] | Always "max_errors" |
Source code in src/guidellm/scheduler/constraints/error.py
MaxGlobalErrorRateConstraint
Bases: PydanticConstraintInitializer
Constraint that limits execution based on global error rate.
Calculates error rate across all processed requests and stops all processing when the rate exceeds the threshold. Only applies the constraint after processing the minimum number of requests to ensure statistical significance for global error rate calculations.
Source code in src/guidellm/scheduler/constraints/error.py
__call__(state, request_info)
Evaluate constraint against global error rate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | SchedulerState | Current scheduler state with global request and error counts | required |
request_info | RequestInfo | Individual request information (unused) | required |
Returns:
| Type | Description |
|---|---|
SchedulerUpdateAction | Action indicating whether to continue or stop operations |
Source code in src/guidellm/scheduler/constraints/error.py
create_constraint(**_kwargs)
Return self as the constraint instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **_kwargs | | Additional keyword arguments (unused) | {} |
Returns:
| Type | Description |
|---|---|
Constraint | Self instance as the constraint |
Source code in src/guidellm/scheduler/constraints/error.py
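In contrast to the sliding window, the global variant needs only two counters and a minimum sample size. The function below is a hypothetical sketch of that check; its name, parameters, and string return values are assumptions for illustration.

```python
def global_error_rate_action(
    processed: int, errored: int, max_error_rate: float, min_processed: int
) -> str:
    """Return "stop" once the overall error rate exceeds the threshold.

    Hypothetical helper: the check is skipped until at least
    `min_processed` requests (and at least one) have been processed.
    """
    if processed < max(min_processed, 1):
        return "continue"
    return "stop" if errored / processed > max_error_rate else "continue"
```

With `min_processed=30`, nine errors in the first ten requests do not stop the run, while twenty errors in one hundred requests exceed a 10% threshold and do.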
MaxGlobalErrorRateConstraintArgs
Bases: ConstraintArgs
Arguments for maximum global error rate constraint.
Stops execution when the overall error rate across all requests exceeds the threshold. Only applies after min_processed requests are completed.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Literal['max_global_error_rate'] | Always "max_global_error_rate" |
Source code in src/guidellm/scheduler/constraints/error.py
MaxNumberConstraint
Bases: PydanticConstraintInitializer
Constraint that limits execution based on maximum request counts.
Stops request queuing when created requests reach the limit and stops local request processing when processed requests reach the limit. Provides progress tracking based on remaining requests and completion fraction.
Source code in src/guidellm/scheduler/constraints/request.py
__call__(state, request_info)
Evaluate constraint against current scheduler state and request count.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | SchedulerState | Current scheduler state with request counts | required |
request_info | RequestInfo | Individual request information (unused) | required |
Returns:
| Type | Description |
|---|---|
SchedulerUpdateAction | Action indicating whether to continue or stop operations |
Source code in src/guidellm/scheduler/constraints/request.py
create_constraint(**_kwargs)
Return self as the constraint instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **_kwargs | | Additional keyword arguments (unused) | {} |
Returns:
| Type | Description |
|---|---|
Constraint | Self instance as the constraint |
Source code in src/guidellm/scheduler/constraints/request.py
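The two separate thresholds (created requests gate queuing, processed requests gate processing) can be sketched as below. `Counts`, `max_number_action`, the dictionary keys, and the `"stop_local"` value are hypothetical stand-ins for `SchedulerState` and `SchedulerUpdateAction`.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Stand-in for the request counters on SchedulerState."""

    created_requests: int
    processed_requests: int


def max_number_action(counts: Counts, max_num: int) -> dict:
    """Evaluate both limits and report remaining progress."""
    if max_num <= 0:
        raise ValueError("max_num must be positive")
    queuing_done = counts.created_requests >= max_num
    processing_done = counts.processed_requests >= max_num
    return {
        # Queuing stops as soon as enough requests have been created.
        "request_queuing": "stop" if queuing_done else "continue",
        # Processing stops only once enough requests have finished.
        "request_processing": "stop_local" if processing_done else "continue",
        "remaining_requests": max(0, max_num - counts.processed_requests),
        "remaining_fraction": max(
            0.0, 1.0 - counts.processed_requests / max_num
        ),
    }
```

Queuing can therefore stop while in-flight requests are still allowed to finish.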
MaxRequestsConstraintArgs
Bases: ConstraintArgs
Arguments for maximum request count constraint.
Limits the number of requests processed per strategy.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Literal['max_requests'] | Always "max_requests" |
Source code in src/guidellm/scheduler/constraints/request.py
NonDistributedEnvironment
Bases: Environment[RequestT, ResponseT]
Single-node scheduler execution environment with minimal coordination overhead.
Implements the Environment interface with no-op synchronization for local testing, development, and single-machine benchmarking. All synchronization methods return immediately without distributed coordination logic.
Example: ::

    from guidellm.scheduler import (
        MaxNumberConstraint,
        MaxRequestsConstraintArgs,
        NonDistributedEnvironment,
        RequestInfo,
        SchedulerState,
        SynchronousStrategy,
    )

    env = NonDistributedEnvironment()
    requests = [f"req_{ind}" for ind in range(5)]
    strategy = SynchronousStrategy()
    args = MaxRequestsConstraintArgs(max_num=5)
    constraints = {"max_requests": MaxNumberConstraint(args=args)}
    state = SchedulerState()

    local_req, local_strat, local_const = await env.sync_run_params(
        requests, strategy, constraints
    )
    start_time = await env.sync_run_start()
    for req in local_req:
        state.processed_requests += 1
        await env.update_run_iteration(f"resp_{req}", req, RequestInfo(), state)
    async for nonlocal_req in env.sync_run_end():
        state.processed_requests += 1
Source code in src/guidellm/scheduler/environments.py
__init__()
sync_run_end() async
Finalize single-node execution and propagate any stored errors.
Returns:
| Type | Description |
|---|---|
| AsyncIterator[tuple[ResponseT \| None, RequestT, RequestInfo, SchedulerState]] | Empty iterator as there are no remote nodes |
Raises:
| Type | Description |
|---|---|
Exception | Any error stored during execution via sync_run_error |
Source code in src/guidellm/scheduler/environments.py
sync_run_error(err) async
Store error for later propagation during run finalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| err | Exception \| list[Exception] | The exception(s) that occurred during execution | required |
Source code in src/guidellm/scheduler/environments.py
sync_run_params(requests, strategy, constraints) async
Return parameters unchanged for single-node execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
requests | DatasetIterT[RequestT] | Requests to process locally | required |
strategy | SchedulingStrategy | Scheduling strategy to apply during execution | required |
constraints | dict[str, Constraint] | Runtime constraints to enforce during execution | required |
Returns:
| Type | Description |
|---|---|
tuple[DatasetIterT[RequestT], SchedulingStrategy, dict[str, Constraint]] | Original (requests, strategy, constraints) tuple unchanged |
Source code in src/guidellm/scheduler/environments.py
sync_run_start() async
Return current time plus configured delay for single-node startup.
Returns:
| Type | Description |
|---|---|
float | Unix timestamp when execution should begin |
Source code in src/guidellm/scheduler/environments.py
update_run_iteration(response, request, request_info, state) async
No-op for single-node execution with no distributed state synchronization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
response | ResponseT \| None | Response generated for the request, if successful | required |
request | RequestT | The processed request | required |
request_info | RequestInfo | Metadata about request processing including timings | required |
state | SchedulerState | Current scheduler state with metrics and progress | required |
Source code in src/guidellm/scheduler/environments.py
OverSaturationConstraint
Bases: Constraint
Constraint that stops execution when over-saturation is detected.
This constraint implements the Over-Saturation Detection (OSD) algorithm to identify when a model becomes over-saturated (response rate doesn't keep up with request rate). When over-saturation is detected, the constraint stops request queuing and optionally stops processing of existing requests.
The constraint maintains internal state for tracking concurrent requests and time-to-first-token (TTFT) metrics, using statistical slope detection to identify performance degradation patterns.
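As a rough illustration of what "statistical slope detection" can mean, the sketch below fits a least-squares line to a series and checks whether the slope is positive by more than its margin of error. It is not the OSD algorithm used by the library: the function names are hypothetical, `moe_threshold` is not modeled, and a normal quantile is used in place of the t-distribution quantile so that the example needs only the standard library.

```python
import math
from statistics import NormalDist


def slope_with_margin(xs, ys, confidence=0.95, eps=1e-12):
    """Return (slope, margin_of_error) of a least-squares line through the points."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points to estimate a slope")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + eps)  # eps guards against division by zero
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / (sxx + eps))
    # Assumption: normal quantile as an approximation of the t quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return slope, z * std_err


def is_rising(xs, ys, confidence=0.95):
    """True when the whole confidence interval of the slope lies above zero."""
    slope, moe = slope_with_margin(xs, ys, confidence)
    return slope - moe > 0
```

Applied to TTFT or concurrent-request samples over time, a persistently rising trend under this kind of test is the sort of signal that would indicate the backend is falling behind.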
Source code in src/guidellm/scheduler/constraints/saturation.py
info property
Get current constraint configuration and state information.
Returns:
| Type | Description |
|---|---|
dict[str, Any] | Dictionary containing configuration parameters. |
__call__(state, request_info)
Evaluate constraint against current scheduler state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | SchedulerState | Current scheduler state. | required |
request_info | RequestInfo | Individual request information. | required |
Returns:
| Type | Description |
|---|---|
SchedulerUpdateAction | Action indicating whether to continue or stop operations. |
Source code in src/guidellm/scheduler/constraints/saturation.py
__init__(minimum_duration=30.0, minimum_ttft=2.5, maximum_window_seconds=120.0, moe_threshold=2.0, maximum_window_ratio=0.75, minimum_window_size=5, confidence=0.95, eps=1e-12, mode='enforce')
Initialize the over-saturation constraint.
Creates a new constraint instance with specified detection parameters. The constraint will track concurrent requests and TTFT metrics, using statistical slope detection to identify when the model becomes over-saturated. All parameters have sensible defaults suitable for most benchmarking scenarios.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
minimum_duration | float | Minimum seconds before checking for over-saturation (default: 30.0) | 30.0 |
minimum_ttft | float | Minimum TTFT threshold in seconds for violation counting (default: 2.5) | 2.5 |
maximum_window_seconds | float | Maximum time window in seconds for data retention (default: 120.0) | 120.0 |
moe_threshold | float | Margin of error threshold for slope detection (default: 2.0) | 2.0 |
maximum_window_ratio | float | Maximum window size as ratio of total requests (default: 0.75) | 0.75 |
minimum_window_size | int | Minimum data points required for slope estimation (default: 5) | 5 |
confidence | float | Statistical confidence level for t-distribution (0-1) (default: 0.95) | 0.95 |
eps | float | Epsilon for numerical stability in calculations (default: 1e-12) | 1e-12 |
mode | Literal['enforce', 'monitor'] | Whether to stop when over-saturation is detected, or only monitor (default: "enforce") | 'enforce' |
Source code in src/guidellm/scheduler/constraints/saturation.py
reset()
Reset all internal state to initial values.
Clears all tracked requests, resets counters, and reinitializes slope checkers. Useful for reusing constraint instances across multiple benchmark runs or resetting state after configuration changes.
Source code in src/guidellm/scheduler/constraints/saturation.py
OverSaturationConstraintArgs
Bases: ConstraintArgs
Arguments for over-saturation detection constraint.
Detects when a model becomes over-saturated using statistical slope analysis of concurrent requests and time-to-first-token metrics.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Literal['over_saturation'] | Always "over_saturation" |
Source code in src/guidellm/scheduler/constraints/saturation.py
OverSaturationConstraintInitializer
Bases: PydanticConstraintInitializer
Factory for creating OverSaturationConstraint instances from configuration.
Stores an OverSaturationConstraintArgs instance and delegates to OverSaturationConstraint in create_constraint().
Example: ::

    from guidellm.scheduler import OverSaturationConstraintInitializer
    from guidellm.scheduler.constraints import OverSaturationConstraintArgs

    args = OverSaturationConstraintArgs(mode="enforce", min_seconds=60.0)
    initializer = OverSaturationConstraintInitializer(args=args)
    constraint = initializer.create_constraint()
Source code in src/guidellm/scheduler/constraints/saturation.py
create_constraint(**_kwargs)
Create an OverSaturationConstraint instance from stored args.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
_kwargs | | Additional keyword arguments (unused) | {} |
Returns:
| Type | Description |
|---|---|
Constraint | Configured OverSaturationConstraint instance ready for use |
Source code in src/guidellm/scheduler/constraints/saturation.py
PydanticConstraintInitializer
Bases: StandardBaseModel, ABC, InfoMixin
Abstract base for Pydantic-based constraint initializers.
Provides standardized serialization, validation, and metadata handling for constraint initializers using Pydantic models. Subclasses implement specific constraint creation logic while inheriting validation and persistence support. Integrates with the constraint factory system for dynamic instantiation and configuration management.
Example: ::

    import time

    from pydantic import Field

    @ConstraintsInitializerFactory.register("max_duration")
    class MaxDurationConstraintInitializer(PydanticConstraintInitializer):
        type_: str = "max_duration"
        max_seconds: float = Field(description="Maximum duration in seconds")

        def create_constraint(self) -> Constraint:
            def evaluate(state, request):
                if time.time() - state.start_time > self.max_seconds:
                    return SchedulerUpdateAction(request_queuing="stop")
                return SchedulerUpdateAction(request_queuing="continue")

            return evaluate
Attributes:
| Name | Type | Description |
|---|---|---|
type_ | str | Type identifier for the constraint initializer |
Source code in src/guidellm/scheduler/constraints/constraint.py
info property
Extract serializable information from this constraint initializer.
Returns:
| Type | Description |
|---|---|
dict[str, Any] | Dictionary containing constraint configuration and metadata |
create_constraint(**kwargs) abstractmethod
Create a constraint instance.
Must be implemented by subclasses to return their specific constraint type with appropriate configuration and validation. The returned constraint should be ready for evaluation against scheduler state and requests.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kwargs | | Additional keyword arguments (usually unused) | {} |
Returns:
| Type | Description |
|---|---|
Constraint | Configured constraint instance |
Raises:
| Type | Description |
|---|---|
NotImplementedError | Must be implemented by subclasses |
Source code in src/guidellm/scheduler/constraints/constraint.py
Scheduler
Bases: Generic[RequestT, ResponseT], ThreadSafeSingletonMixin
Thread-safe singleton scheduler for distributed benchmarking workload coordination.
Orchestrates request processing across worker processes with distributed timing coordination, constraint enforcement, and result aggregation. Abstracts the complexity of multi-process coordination, environment synchronization, and resource management while providing a unified interface for executing benchmarking operations. Implements singleton pattern to ensure consistent execution state.
Example: ::

    from guidellm.scheduler import Scheduler
    from guidellm.scheduler import NonDistributedEnvironment, SynchronousStrategy

    scheduler = Scheduler()
    async for response, request, info, state in scheduler.run(
        requests=request_list,
        backend=backend,
        strategy=SynchronousStrategy(),
        env=NonDistributedEnvironment(),
        max_requests=1000
    ):
        print(f"Processed: {request}")
Source code in src/guidellm/scheduler/scheduler.py
run(requests, backend, strategy, env, **constraints) async
Execute distributed request processing with coordinated timing and constraints.
Orchestrates the complete benchmarking workflow across worker processes with environment synchronization, constraint enforcement, and error handling. Manages resource lifecycle from initialization through cleanup while yielding real-time processing updates for monitoring and aggregation.
Yields: Request updates as (response, request, request_info, scheduler_state) tuples. Each request generates three ordered updates: queued, in_progress, and then exactly one of completed, errored, or cancelled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
requests | DatasetIterT[RequestT] | Request collection to process, supporting single requests or multi-turn sequences with optional inter-request delays | required |
backend | BackendInterface[RequestT, ResponseT] | Backend interface for request processing and response generation | required |
strategy | SchedulingStrategy | Scheduling strategy controlling request timing and distribution | required |
env | Environment[RequestT, ResponseT] \| None | Environment interface for distributed coordination and synchronization. Defaults to NonDistributedEnvironment if None | required |
constraints | Constraint \| ConstraintInitializer | Runtime constraints for execution control (max_requests, max_duration, max_error_rate, etc.) as primitives, dictionaries, or constraint instances | {} |
Raises:
| Type | Description |
|---|---|
Exception | Worker process errors, environment synchronization failures, or constraint evaluation errors are propagated after cleanup |
Source code in src/guidellm/scheduler/scheduler.py
SchedulerMessagingPydanticRegistry
Bases: RegistryMixin[RegistryObjT]
Registry for Pydantic types used in scheduler inter-process messaging.
Enables generic interface for defining Pydantic class types used for communication between distributed scheduler components and worker processes.
Source code in src/guidellm/scheduler/schemas.py
SchedulerProgress
Bases: StandardBaseModel
Progress tracking data for scheduler operations.
Provides estimates for remaining work in scheduler operations, including fraction complete, request counts, and duration. Used by constraints and monitoring systems to track execution progress and make termination decisions.
Source code in src/guidellm/scheduler/schemas.py
remaining_duration_fraction property
Returns:
| Type | Description |
|---|---|
float \| None | Estimated fraction of remaining duration, if known |
remaining_fraction property
Returns:
| Type | Description |
|---|---|
float \| None | Estimated fraction of remaining progress, if known |
remaining_requests_fraction property
Returns:
| Type | Description |
|---|---|
float \| None | Estimated fraction of remaining requests, if known |
combine(other)
Combine two progress instances, taking the minimum remaining estimates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
other | SchedulerProgress | Another progress instance to combine with | required |
Returns:
| Type | Description |
|---|---|
SchedulerProgress | New progress instance with combined estimates |
Source code in src/guidellm/scheduler/schemas.py
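The "minimum remaining estimate" rule described for combine can be sketched as follows. `ProgressSketch` is a hypothetical stand-in, not SchedulerProgress itself: it stores the remaining fractions directly as fields, and the choice to ignore unknown (None) estimates when taking the minimum is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


def min_known(a: float | None, b: float | None) -> float | None:
    """Return the smaller of two optional estimates, ignoring unknown ones."""
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


@dataclass(frozen=True)
class ProgressSketch:
    # Hypothetical fields; the real class exposes these as properties.
    remaining_requests_fraction: float | None = None
    remaining_duration_fraction: float | None = None

    @property
    def remaining_fraction(self) -> float | None:
        # Overall progress follows whichever limit is closest to being reached.
        return min_known(
            self.remaining_requests_fraction, self.remaining_duration_fraction
        )

    def combine(self, other: ProgressSketch) -> ProgressSketch:
        # Build a new instance rather than mutating either input.
        return ProgressSketch(
            remaining_requests_fraction=min_known(
                self.remaining_requests_fraction, other.remaining_requests_fraction
            ),
            remaining_duration_fraction=min_known(
                self.remaining_duration_fraction, other.remaining_duration_fraction
            ),
        )
```

Taking the minimum means the combined estimate reflects the constraint that will end the run first, which is the conservative choice when several constraints report progress.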
SchedulerState
Bases: StandardBaseModel
Comprehensive state tracking for scheduler execution.
Tracks scheduler execution progress, request counts, timing information, and constraint enforcement. Central to scheduler coordination, providing real-time metrics for monitoring and decision-making across distributed worker processes.
Example: ::

    state = SchedulerState(node_id=0, num_processes=4)
    state.created_requests += 1
    state.queued_requests += 1
    completion_rate = state.processed_requests / state.created_requests
Source code in src/guidellm/scheduler/schemas.py
SchedulerUpdateAction
Bases: StandardBaseModel
Control directives for scheduler behavior and operations.
Encapsulates control signals for scheduler operations including request queuing and processing directives. Used by constraints to communicate termination conditions and progress to scheduler components.
Example: ::

    action = SchedulerUpdateAction(
        request_queuing="stop",
        request_processing="continue",
        metadata={"reason": "max_requests_reached"}
    )
Source code in src/guidellm/scheduler/schemas.py
SchedulingStrategy
Bases: PydanticClassRegistryMixin['SchedulingStrategy'], InfoMixin
Base class for scheduling strategies controlling request processing patterns.
Defines the interface for strategies that combine timing implementations with process and concurrency constraints to enable various benchmark scenarios. Strategies manage request timing, worker process coordination, and concurrency limits across distributed execution environments.
Attributes:
| Name | Type | Description |
|---|---|---|
schema_discriminator | str | Field name used for polymorphic deserialization |
Source code in src/guidellm/scheduler/strategies.py
processes_limit property
Get the maximum number of worker processes supported by this strategy.
Returns:
| Type | Description |
|---|---|
PositiveInt \| None | Maximum number of worker processes, None if unlimited |
requests_limit property
Get the maximum number of concurrent requests supported by this strategy.
Returns:
| Type | Description |
|---|---|
PositiveInt \| None | Maximum number of concurrent requests, None if unlimited |
get_processes_start_time() async
Get the synchronized start time, waiting if not yet set.
Blocks until the main process sets the start time via init_processes_start, enabling synchronized request scheduling across all workers.
Returns:
| Type | Description |
|---|---|
float | Unix timestamp when request processing began |
Raises:
| Type | Description |
|---|---|
RuntimeError | If called before init_processes_timings |
Source code in src/guidellm/scheduler/strategies.py
init_processes_start(start_time)
Set the synchronized start time for all worker processes.
Updates shared state with the benchmark start time to coordinate request scheduling across all workers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
start_time | float | Unix timestamp when request processing should begin | required |
Raises:
| Type | Description |
|---|---|
RuntimeError | If called before init_processes_timings |
Source code in src/guidellm/scheduler/strategies.py
init_processes_timings(worker_count, max_concurrency, mp_context)
Initialize shared timing state for multi-process coordination.
Sets up synchronized counters and locks for coordinating request timing across distributed worker processes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_count | PositiveInt | Number of worker processes to coordinate | required |
max_concurrency | PositiveInt | Maximum number of concurrent requests allowed | required |
Source code in src/guidellm/scheduler/strategies.py
next_request_index()
Get the next sequential request index across all worker processes.
Thread-safe counter providing globally unique indices for request timing calculations in distributed environments.
Returns:
| Type | Description |
|---|---|
PositiveInt | Globally unique request index for timing calculations |
Raises:
| Type | Description |
|---|---|
RuntimeError | If called before init_processes_timings |
Source code in src/guidellm/scheduler/strategies.py
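A globally unique request index shared between processes is commonly built on a lock-protected shared-memory integer. The sketch below shows that pattern with the standard library only; `SharedRequestIndex` and `next_index` are hypothetical names, and starting the sequence at 1 is an assumption based on the documented PositiveInt return type.

```python
import multiprocessing


class SharedRequestIndex:
    """Hypothetical process-safe counter handing out increasing indices."""

    def __init__(self, mp_context=None):
        ctx = mp_context or multiprocessing.get_context()
        # "q" is a signed 64-bit integer held in shared memory; the default
        # lock=True wraps it with a lock usable across processes.
        self._counter = ctx.Value("q", 0)

    def next_index(self) -> int:
        # Increment and read under the same lock so that no two callers
        # can observe the same value.
        with self._counter.get_lock():
            self._counter.value += 1
            return self._counter.value
```

The counter has to be created in the main process and passed to workers at spawn time, which is presumably why the documented methods raise RuntimeError when called before init_processes_timings.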
next_request_time(worker_index) abstractmethod async
Calculate the scheduled start time for the next request.
Strategy-specific implementation determining when requests should be processed based on timing patterns and worker distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_index | NonNegativeInt | Worker process index for distributing request timing | required |
Returns:
| Type | Description |
|---|---|
float | Unix timestamp when the request should be processed |
Source code in src/guidellm/scheduler/strategies.py
request_completed(request_info) abstractmethod
Handle request completion and update internal timing state.
Strategy-specific handling of completed requests to maintain timing coordination and schedule subsequent requests.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request_info | RequestInfo | Completed request metadata including timing details and completion status | required |
Source code in src/guidellm/scheduler/strategies.py
requeue_delay()
Calculate delay before requeuing a conversation.
Default implementation returns zero delay. Strategies can override to implement custom requeue timing logic.
Returns:
| Type | Description |
|---|---|
float | Delay in seconds before the conversation should be requeued. |
Source code in src/guidellm/scheduler/strategies.py
resolve_dequeued_target_start(worker_index, provisional_start, settings) async
Resolve scheduled start time after dequeue using per-request settings.
Default returns provisional_start unchanged. Strategies with enqueue-bound timing metadata can override to reinterpret settings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_index | NonNegativeInt | Worker process index handling the request | required |
provisional_start | float | Start time from the worker's scheduling slot | required |
settings | RequestSettings | Per-request scheduling metadata attached at enqueue | required |
Returns:
| Type | Description |
|---|---|
float | Unix timestamp when the request should begin processing |
Source code in src/guidellm/scheduler/strategies.py
SerializableConstraintInitializer
Bases: Protocol
Protocol for serializable constraint initializers supporting persistence.
Extends ConstraintInitializer with serialization capabilities, enabling constraint configurations to be saved, loaded, and transmitted. Serializable initializers support validation, model-based configuration, and dictionary-based serialization for integration with configuration systems and persistence layers.
Example: ::

    class SerializableInitializer:
        @classmethod
        def model_validate(cls, data: dict) -> ConstraintInitializer:
            return cls(**data)

        def model_dump(self) -> dict[str, Any]:
            return {"type_": "max_requests", "max_requests": self.max_requests}

        def create_constraint(self) -> Constraint:
            # ... create constraint
            ...
Source code in src/guidellm/scheduler/constraints/constraint.py
create_constraint(**kwargs)
Create constraint instance from this initializer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kwargs | | Additional configuration parameters | {} |
Returns:
| Type | Description |
|---|---|
Constraint | Configured constraint evaluation function |
Source code in src/guidellm/scheduler/constraints/constraint.py
model_dump()
Serialize constraint initializer to dictionary format.
Returns:
| Type | Description |
|---|---|
dict[str, Any] | Dictionary representation of constraint initializer |
model_validate(**kwargs) classmethod
Create validated constraint initializer from configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kwargs | | Configuration dictionary for initializer creation | {} |
Returns:
| Type | Description |
|---|---|
ConstraintInitializer | Validated constraint initializer instance |
Source code in src/guidellm/scheduler/constraints/constraint.py
SynchronousStrategy
Bases: SchedulingStrategy
Sequential request processing with strict single-request-at-a-time execution.
Processes requests one at a time in strict sequential order, providing predictable timing behavior ideal for measuring maximum sequential throughput and ensuring complete request isolation. Each request completes before the next begins.
Source code in src/guidellm/scheduler/strategies.py
processes_limit property
Returns:
| Type | Description |
|---|---|
PositiveInt | Always 1 to enforce single-process constraint |
requests_limit property
Returns:
| Type | Description |
|---|---|
PositiveInt | Always 1 to enforce single-request constraint |
__str__()
next_request_time(worker_index) async
Calculate next request time based on previous completion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_index | NonNegativeInt | Unused for synchronous strategy | required |
Returns:
| Type | Description |
|---|---|
float | Time of last completion or start time if first request |
Source code in src/guidellm/scheduler/strategies.py
request_completed(request_info)
Update timing state with completed request information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request_info | RequestInfo | Completed request metadata including timing | required |
Source code in src/guidellm/scheduler/strategies.py
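The interplay between next_request_time and request_completed in a strictly sequential strategy can be sketched as below. This is a simplified, synchronous illustration with hypothetical names (`SequentialTimingSketch`, `completed_at`); the documented methods are async and receive a RequestInfo rather than a bare timestamp.

```python
class SequentialTimingSketch:
    """Hypothetical model of one-request-at-a-time scheduling."""

    def __init__(self, start_time: float):
        self.start_time = start_time
        self.last_completion = None  # no request has finished yet

    def next_request_time(self) -> float:
        # First request: begin at the synchronized start time.
        if self.last_completion is None:
            return self.start_time
        # Later requests: begin as soon as the previous one has finished.
        return self.last_completion

    def request_completed(self, completed_at: float) -> None:
        # Record the completion so the next request can be scheduled after it.
        self.last_completion = completed_at
```

For example, with a start time of 100.0 the first request is scheduled at 100.0; after a completion is recorded at 101.5, the next request is scheduled at 101.5.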
ThroughputStrategy
Bases: SchedulingStrategy
Maximum throughput scheduling with optional concurrency limits.
Schedules requests to maximize system throughput by allowing unlimited concurrent processing with optional constraints. Supports startup ramping to gradually distribute initial requests for controlled system ramp-up.
Source code in src/guidellm/scheduler/strategies.py
processes_limit property
Returns:
| Type | Description |
|---|---|
PositiveInt \| None | Max concurrency if set, otherwise None for unlimited |
requests_limit property
Returns:
| Type | Description |
|---|---|
PositiveInt \| None | Max concurrency if set, otherwise None for unlimited |
__str__()
next_request_time(worker_index) async
Calculate next request time with optional startup ramping.
Spreads initial requests linearly during rampup period, then schedules all subsequent requests immediately.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
worker_index | int | Unused for throughput strategy | required |
Returns:
| Type | Description |
|---|---|
float | Immediate start or ramped start time during startup period |
Source code in src/guidellm/scheduler/strategies.py
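One way to picture the linear startup ramp is the function below. It is only a sketch: the parameter names (`rampup_duration`, `rampup_requests`, `now`) are hypothetical, and the real strategy may decide which requests count as "initial" differently.

```python
def ramped_start_time(
    start_time: float,
    request_index: int,
    rampup_duration: float,
    rampup_requests: int,
    now: float,
) -> float:
    """Return the target start time for the request at a zero-based index."""
    if rampup_duration > 0 and request_index < rampup_requests:
        # Spread the first rampup_requests evenly across the ramp window.
        return start_time + rampup_duration * request_index / rampup_requests
    # After the ramp, requests are released immediately.
    return now
```

With a 10 second ramp spread over 5 requests starting at t=100.0, the requests at indices 0, 1, and 2 are targeted at 100.0, 102.0, and 104.0; any request at index 5 or later starts at the current time.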
request_completed(request_info)
Handle request completion (no-op for throughput strategy).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request_info | RequestInfo | Completed request metadata (unused) | required |
Source code in src/guidellm/scheduler/strategies.py
TraceReplayStrategy
Bases: SchedulingStrategy
Replay scheduling from a trace of timestamps.
Each request carries a relative_timestamp in RequestSettings from the dataset finalizer. next_request_time schedules dequeue immediately at benchmark start; resolve_dequeued_target_start applies the trace offset via start_time + time_scale * relative_timestamp, reproducing inter-arrival timing under multiprocessing.
Source code in src/guidellm/scheduler/strategies.py
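The offset formula quoted above, start_time + time_scale * relative_timestamp, is simple enough to check by hand. The helper name `trace_target_start` below is hypothetical and exists only to make the arithmetic concrete.

```python
def trace_target_start(
    start_time: float, relative_timestamp: float, time_scale: float = 1.0
) -> float:
    """Map a trace-relative timestamp onto the benchmark's wall-clock timeline."""
    return start_time + time_scale * relative_timestamp


# A trace with arrivals at 0 s, 2 s, and 5 s, replayed at twice the original speed.
trace = [0.0, 2.0, 5.0]
targets = [trace_target_start(1000.0, ts, time_scale=0.5) for ts in trace]
```

Here `targets` is `[1000.0, 1001.0, 1002.5]`: a time_scale below 1 compresses the inter-arrival gaps, while a value above 1 stretches them.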
UnserializableConstraintInitializer
Bases: PydanticConstraintInitializer
Placeholder for constraints that cannot be serialized or executed.
Represents constraint initializers that failed serialization or contain non-serializable components. Cannot be executed and raises errors when invoked to prevent runtime failures from invalid constraint state. Used by the factory system to preserve constraint information even when full serialization is not possible.
Example: ::

    # Created automatically by factory when serialization fails
    unserializable = UnserializableConstraintInitializer(
        orig_info={"type_": "custom", "data": non_serializable_object}
    )

    # Attempting to use it raises RuntimeError
    constraint = unserializable.create_constraint()  # Raises RuntimeError
Attributes:
| Name | Type | Description |
|---|---|---|
type_ | Literal['unserializable'] | Always "unserializable" to identify placeholder constraints |
orig_info | dict[str, Any] | Original constraint information before serialization failure |
Source code in src/guidellm/scheduler/constraints/constraint.py
__call__(state, request)
Raise error since unserializable constraints cannot be invoked.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | SchedulerState | Current scheduler state (unused) | required |
request | RequestInfo | Individual request information (unused) | required |
Raises:
| Type | Description |
|---|---|
RuntimeError | Always raised for unserializable constraints |
Source code in src/guidellm/scheduler/constraints/constraint.py
create_constraint(**_kwargs)
Raise error for unserializable constraint creation attempt.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
_kwargs | | Additional keyword arguments (unused) | {} |
Raises:
| Type | Description |
|---|---|
RuntimeError | Always raised since unserializable constraints cannot be executed |
Source code in src/guidellm/scheduler/constraints/constraint.py
WorkerProcess
Bases: Generic[RequestT, ResponseT]
Worker process for distributed request execution in the scheduler system.
Manages complete request lifecycle including queue consumption, backend processing, timing strategy application, and status publication. Coordinates with other workers through synchronization primitives while maintaining concurrency limits and handling graceful shutdown scenarios including errors and cancellations.
Example: ::

    worker = WorkerProcess(
        worker_index=0,
        messaging=messaging_interface,
        backend=backend_instance,
        strategy=timing_strategy,
        async_limit=10,
        fut_scheduling_time_limit=5.0,
        startup_barrier=barrier,
        requests_generated_event=generated_event,
        constraint_reached_event=constraint_event,
        shutdown_event=shutdown,
        error_event=error,
    )
    worker.run()
Source code in src/guidellm/scheduler/worker.py
__init__(worker_index, messaging, backend, strategy, async_limit, fut_scheduling_time_limit, startup_barrier, requests_generated_event, constraint_reached_event, shutdown_event, error_event)
Initialize worker process instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| worker_index | int | Unique identifier for this worker within the process group | required |
| messaging | InterProcessMessaging[tuple[ResponseT \| None, RequestT, RequestInfo], ConversationT[RequestT]] | Inter-process messaging interface for request coordination | required |
| backend | BackendInterface[RequestT, ResponseT] | Backend interface for processing requests | required |
| strategy | SchedulingStrategy | Scheduling strategy for determining request timing | required |
| async_limit | int | Maximum concurrent requests this worker can process | required |
| fut_scheduling_time_limit | float | Maximum time in seconds to schedule requests into the future | required |
| startup_barrier | Barrier | Synchronization barrier for coordinated startup | required |
| requests_generated_event | Event | Event signaling request generation completion | required |
| constraint_reached_event | Event | Event signaling processing constraint reached | required |
| shutdown_event | Event | Event signaling graceful shutdown request | required |
| error_event | Event | Event signaling error conditions across processes | required |
Source code in src/guidellm/scheduler/worker.py
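The barrier and event parameters above can be illustrated with a minimal, standard-library sketch. It uses `threading.Barrier` and `threading.Event` as stand-ins, on the assumption that the multiprocessing objects passed to `WorkerProcess` expose the same `wait()` and `set()` calls; the `worker` and `run_demo` functions are hypothetical and are not part of guidellm.

```python
import threading


def worker(index: int, barrier: threading.Barrier,
           shutdown_event: threading.Event, started: list) -> None:
    # Block until every worker has reached this point (coordinated startup).
    barrier.wait(timeout=5)
    started.append(index)
    # Stay alive until a shutdown is signaled.
    shutdown_event.wait(timeout=5)


def run_demo(num_workers: int = 3) -> list:
    barrier = threading.Barrier(num_workers)
    shutdown_event = threading.Event()
    started: list = []
    threads = [
        threading.Thread(target=worker, args=(i, barrier, shutdown_event, started))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    # Signal shutdown; workers exit once they have passed the barrier.
    shutdown_event.set()
    for thread in threads:
        thread.join(timeout=5)
    return sorted(started)
```

No worker proceeds past the barrier until all of them have arrived, which is the property a startup barrier is meant to provide.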
run()
Main entry point for worker process execution.
Initializes asyncio event loop with optional uvloop optimization and executes worker async operations. Handles event loop cleanup and error propagation.
Raises:
| Type | Description |
|---|---|
| RuntimeError | If worker encounters unrecoverable error during execution |
Source code in src/guidellm/scheduler/worker.py
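As a rough illustration of "optional uvloop optimization", the sketch below picks `uvloop.new_event_loop` when the package is installed and falls back to the standard asyncio loop otherwise. The names `worker_main` and `run_with_optional_uvloop` are hypothetical; this is one common way to structure such an entry point, not necessarily how `run()` is implemented.

```python
import asyncio


async def worker_main() -> str:
    # Hypothetical stand-in for the worker's asynchronous logic.
    await asyncio.sleep(0)
    return "done"


def run_with_optional_uvloop() -> str:
    try:
        import uvloop  # optional third-party dependency

        loop_factory = uvloop.new_event_loop
    except ImportError:
        # uvloop is not installed; use the default asyncio event loop.
        loop_factory = asyncio.new_event_loop

    loop = loop_factory()
    try:
        return loop.run_until_complete(worker_main())
    except Exception as err:
        # Surface failures as RuntimeError, mirroring the documented contract.
        raise RuntimeError(f"worker failed: {err}") from err
    finally:
        # Always release the loop's resources.
        loop.close()
```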
run_async() async
Execute main asynchronous worker process logic.
Orchestrates concurrent execution of request processing and shutdown monitoring. Handles task cleanup, error propagation, and cancellation coordination when any task completes or encounters an error.
Raises:
| Type | Description |
|---|---|
| RuntimeError | If worker tasks encounter unrecoverable errors |
| asyncio.CancelledError | If worker process was cancelled |
Source code in src/guidellm/scheduler/worker.py
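The pattern described for `run_async()`, running request processing alongside a shutdown monitor and cancelling whatever is left once one of them finishes, can be sketched with `asyncio.wait`. Everything here is an assumption for illustration: the task functions are hypothetical, and an `asyncio.Event` stands in for the cross-process shutdown event.

```python
import asyncio


async def process_requests() -> None:
    # Hypothetical long-running request loop.
    await asyncio.sleep(3600)


async def watch_shutdown(shutdown: asyncio.Event) -> None:
    # Completes as soon as a shutdown is signaled.
    await shutdown.wait()


async def run_until_first_done(shutdown: asyncio.Event) -> list[str]:
    tasks = {
        asyncio.create_task(process_requests(), name="requests"),
        asyncio.create_task(watch_shutdown(shutdown), name="shutdown"),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Cancel the remaining tasks and wait for the cancellations to settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # Propagate any failure from the task that finished first.
    for task in done:
        if task.exception() is not None:
            raise RuntimeError("worker task failed") from task.exception()
    return sorted(task.get_name() for task in done)


async def demo() -> list[str]:
    shutdown = asyncio.Event()
    # Signal shutdown shortly after startup.
    asyncio.get_running_loop().call_later(0.01, shutdown.set)
    return await run_until_first_done(shutdown)
```

Calling `asyncio.run(demo())` returns the names of the tasks that completed first; in this sketch that is the shutdown monitor, and the request loop is cancelled.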
WorkerProcessGroup
Bases: Generic[RequestT, ResponseT]
Orchestrates multiple worker processes for distributed request processing.
Manages process lifecycle, request distribution, response collection, and state synchronization across workers. Handles dynamic scaling, load balancing, and constraint evaluation with graceful shutdown coordination for high-throughput request processing workloads.
Example: ::

    import time

    from guidellm.scheduler.worker_group import WorkerProcessGroup

    group = WorkerProcessGroup(
        requests=request_iterable,
        backend=backend_instance,
        strategy=scheduling_strategy,
        # Constraints are passed as keyword arguments (**constraints)
        max_time=time_constraint,
    )
    await group.create_processes()
    await group.start(time.time())
    async for response, request, info, state in group.request_updates():
        if response is not None:
            # Process completed request
            handle_response(response)
    await group.shutdown()
Source code in src/guidellm/scheduler/worker_group.py
__init__(requests, backend, strategy, **constraints)
Initialize a worker process group for distributed request processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| requests | DatasetIterT[RequestT] | Finite iterable of requests to process sequentially | required |
| backend | BackendInterface[RequestT, ResponseT] | Backend interface for processing requests | required |
| strategy | SchedulingStrategy | Scheduling strategy for request timing and distribution | required |
| constraints | Constraint | Named constraints for controlling execution behavior | {} |
Source code in src/guidellm/scheduler/worker_group.py
create_processes() async
Create and initialize worker processes for distributed request processing.
Sets up multiprocessing infrastructure and worker processes based on strategy constraints, backend capabilities, and system configuration. Determines optimal process count and concurrency limits, then spawns worker processes with distributed request handling capabilities.
Raises:
| Type | Description |
|---|---|
| RuntimeError | If process initialization or startup fails |
Source code in src/guidellm/scheduler/worker_group.py
request_updates() async
Yield request processing updates as they become available.
Returns an async iterator of request updates including response, request, request scheduling info, and scheduler state. Updates occur on request queued, processing start, and completion. Response is None until processing completes.
Returns:
| Type | Description |
|---|---|
| AsyncIterator[tuple[ResponseT \| None, RequestT, RequestInfo, SchedulerState]] | Async iterator yielding (response, request, request_info, state) tuples where response is None until processing is complete |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If workers encounter unrecoverable errors |
Source code in src/guidellm/scheduler/worker_group.py
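To show how a consumer might handle the update stream, the sketch below uses a hypothetical async generator, `fake_request_updates`, that mimics the documented tuple shape: several updates per request, with `response` set to `None` until completion. Plain strings and dicts stand in for the real `RequestInfo` and `SchedulerState` objects.

```python
import asyncio


async def fake_request_updates():
    # Hypothetical stand-in for group.request_updates().
    state = {"processed": 0}
    for request in ("req-1", "req-2"):
        # Intermediate updates carry no response yet.
        yield None, request, "queued", dict(state)
        yield None, request, "in_progress", dict(state)
        state["processed"] += 1
        yield f"response to {request}", request, "completed", dict(state)


async def collect_completed() -> list[str]:
    completed = []
    async for response, request, info, state in fake_request_updates():
        if response is None:
            # Skip queued and in-progress updates.
            continue
        completed.append(response)
    return completed
```

Each request appears three times in the stream, so consumers that only care about results need to filter on `response is not None`.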
shutdown() async
Gracefully shut down the worker process group and clean up resources.
Performs safe shutdown of worker processes, background tasks, and multiprocessing resources. Coordinates orderly termination across all workers and collects any exceptions encountered during shutdown.
Returns:
| Type | Description |
|---|---|
| list[Exception] | List of exceptions encountered during shutdown; empty if no errors |
Source code in src/guidellm/scheduler/worker_group.py
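Returning exceptions instead of raising them lets shutdown finish every cleanup step even when some fail. One way to get that behavior in asyncio is `asyncio.gather(..., return_exceptions=True)`, as in this sketch; the `stop_ok`, `stop_failing`, and `shutdown_all` coroutines are hypothetical and only illustrate the collect-and-return pattern.

```python
import asyncio


async def stop_ok() -> None:
    # Hypothetical cleanup step that succeeds.
    await asyncio.sleep(0)


async def stop_failing() -> None:
    # Hypothetical cleanup step that fails.
    raise OSError("queue already closed")


async def shutdown_all() -> list[Exception]:
    # Run every cleanup step; failures are returned rather than raised.
    results = await asyncio.gather(
        stop_ok(), stop_failing(), return_exceptions=True
    )
    return [res for res in results if isinstance(res, Exception)]
```

A caller can then inspect the returned list and decide whether to log, re-raise, or ignore the errors.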
start(start_time) async
Begin request processing at the specified start time.
Initializes scheduler state and background tasks, then waits until the specified start time before beginning operations. Sets up inter-process communication and coordinates synchronized startup across all workers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start_time | float | Unix timestamp when processing should begin | required |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If workers encounter errors during startup or if create_processes() was not called first |
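Waiting for a Unix-timestamp start time can be sketched as follows. The helper `wait_until` is hypothetical and covers only the delay computation, not the state initialization or worker coordination that `start()` also performs.

```python
import asyncio
import time


async def wait_until(start_time: float) -> float:
    # Sleep only if the requested start time is still in the future.
    delay = start_time - time.time()
    if delay > 0:
        await asyncio.sleep(delay)
    # Return the wall-clock time at which processing would begin.
    return time.time()
```

Passing `time.time()`, as in the class example, yields no meaningful delay; passing a timestamp slightly in the future gives every worker time to get ready before processing begins.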