`guidellm.scheduler.constraints`

Constraint system for scheduler behavior control and request processing limits.

Provides flexible constraints for managing scheduler behavior with configurable thresholds based on time, error rates, and request counts. Constraints evaluate scheduler state and individual requests to determine whether processing should continue or stop based on predefined limits. The constraint system enables sophisticated benchmark stopping criteria through composable constraint types.

`Constraint`

Bases: Protocol

Protocol for constraint evaluation functions that control scheduler behavior.

Defines the interface that all constraint implementations must follow. Constraints are callable objects that evaluate scheduler state and request information to determine whether processing should continue or stop. The protocol enables type checking and runtime validation of constraint implementations while allowing flexible implementation approaches (functions, classes, closures).

Example:

    def my_constraint(
        state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        if state.processing_requests > 100:
            return SchedulerUpdateAction(request_queuing="stop")
        return SchedulerUpdateAction(request_queuing="continue")
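The structural typing described above can be sketched as follows. This is a minimal illustration, assuming simplified stand-in versions of `SchedulerState`, `RequestInfo`, and `SchedulerUpdateAction` (the real guidellm classes carry more fields); `max_in_flight` and `MaxInFlight` are hypothetical names. It shows that both a plain function and a class instance with `__call__` pass a runtime check against a `runtime_checkable` protocol.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Stand-in types for illustration only; not the guidellm definitions.
@dataclass
class SchedulerState:
    processing_requests: int = 0


@dataclass
class RequestInfo:
    status: str = "queued"


@dataclass
class SchedulerUpdateAction:
    request_queuing: str = "continue"


@runtime_checkable
class Constraint(Protocol):
    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction: ...


def max_in_flight(
    state: SchedulerState, request: RequestInfo
) -> SchedulerUpdateAction:
    """Function-style constraint: stop queuing above 100 in-flight requests."""
    if state.processing_requests > 100:
        return SchedulerUpdateAction(request_queuing="stop")
    return SchedulerUpdateAction(request_queuing="continue")


class MaxInFlight:
    """Class-style constraint with a configurable limit (hypothetical)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit

    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        queuing = "stop" if state.processing_requests > self.limit else "continue"
        return SchedulerUpdateAction(request_queuing=queuing)


# The runtime check only verifies that __call__ exists, not its signature.
function_ok = isinstance(max_in_flight, Constraint)
class_ok = isinstance(MaxInFlight(limit=10), Constraint)
not_callable = isinstance(42, Constraint)
```

Because the runtime check ignores signatures, a static type checker is still needed to catch a callable with the wrong parameters.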

Source code in src/guidellm/scheduler/constraints/constraint.py

@runtime_checkable
class Constraint(Protocol):
    """
    Protocol for constraint evaluation functions that control scheduler behavior.

    Defines the interface that all constraint implementations must follow. Constraints
    are callable objects that evaluate scheduler state and request information to
    determine whether processing should continue or stop. The protocol enables type
    checking and runtime validation of constraint implementations while allowing
    flexible implementation approaches (functions, classes, closures).

    Example:
    ::
        def my_constraint(
            state: SchedulerState, request: RequestInfo
        ) -> SchedulerUpdateAction:
            if state.processing_requests > 100:
                return SchedulerUpdateAction(request_queuing="stop")
            return SchedulerUpdateAction(request_queuing="continue")
    """

    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against scheduler state and request information.

        :param state: Current scheduler state with metrics and timing information
        :param request: Individual request information and metadata
        :return: Action indicating whether to continue or stop scheduler operations
        """

`__call__(state, request)`

Evaluate constraint against scheduler state and request information.

Parameters:

Name	Type	Description	Default
`state`	`SchedulerState`	Current scheduler state with metrics and timing information	required
`request`	`RequestInfo`	Individual request information and metadata	required

Returns:

Type	Description
`SchedulerUpdateAction`	Action indicating whether to continue or stop scheduler operations

Source code in src/guidellm/scheduler/constraints/constraint.py

def __call__(
    self, state: SchedulerState, request: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against scheduler state and request information.

    :param state: Current scheduler state with metrics and timing information
    :param request: Individual request information and metadata
    :return: Action indicating whether to continue or stop scheduler operations
    """

`ConstraintArgs`

Bases: PydanticClassRegistryMixin['ConstraintArgs']

Base class for constraint configuration arguments.

Uses PydanticClassRegistryMixin to enable polymorphic deserialization based on the kind field. Each registered subclass represents a specific constraint type with its own parameters.

Attributes:

Name	Type	Description
`schema_discriminator`	`str`	Field name for polymorphic deserialization
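To illustrate what discriminator-based polymorphic deserialization means, here is a minimal standard-library sketch. It does not use `PydanticClassRegistryMixin` and does not reproduce its behavior; `ArgsBase`, `DurationArgs`, `ErrorRateArgs`, and `from_dict` are hypothetical names, and the default `window` value is an arbitrary placeholder. The idea shown is only that the value of the `kind` field selects which registered subclass gets built.

```python
from dataclasses import dataclass
from typing import Any, Callable, ClassVar


@dataclass
class ArgsBase:
    """Hypothetical stand-in for a registry-backed args base class."""

    kind: str

    # Name of the field used to pick the subclass during deserialization.
    schema_discriminator: ClassVar[str] = "kind"
    registry: ClassVar[dict[str, type["ArgsBase"]]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type["ArgsBase"]], type["ArgsBase"]]:
        def decorator(subclass: type["ArgsBase"]) -> type["ArgsBase"]:
            cls.registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ArgsBase":
        # Look up the subclass by the discriminator value, then build it.
        key = data.get(cls.schema_discriminator)
        if key not in cls.registry:
            raise ValueError(f"Unknown kind: {key!r}")
        return cls.registry[key](**data)

    @property
    def constraint_key(self) -> str:
        return self.kind


@ArgsBase.register("max_duration")
@dataclass
class DurationArgs(ArgsBase):
    kind: str = "max_duration"
    seconds: float = 0.0


@ArgsBase.register("max_error_rate")
@dataclass
class ErrorRateArgs(ArgsBase):
    kind: str = "max_error_rate"
    rate: float = 0.0
    window: int = 30  # placeholder default for this sketch


parsed = ArgsBase.from_dict({"kind": "max_duration", "seconds": 30})
```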

Source code in src/guidellm/scheduler/constraints/args.py

class ConstraintArgs(PydanticClassRegistryMixin["ConstraintArgs"]):
    """
    Base class for constraint configuration arguments.

    Uses ``PydanticClassRegistryMixin`` to enable polymorphic deserialization
    based on the ``kind`` field. Each registered subclass represents a specific
    constraint type with its own parameters.

    :cvar schema_discriminator: Field name for polymorphic deserialization
    """

    model_config = ConfigDict(
        extra="forbid",
        serialize_by_alias=True,
        ser_json_bytes="base64",
        val_json_bytes="base64",
    )

    schema_discriminator: ClassVar[str] = "kind"

    @classmethod
    def __pydantic_schema_base_type__(cls) -> type[ConstraintArgs]:
        """
        Return base type for polymorphic validation hierarchy.

        :return: Base ConstraintArgs class for schema validation
        """
        if cls.__name__ == "ConstraintArgs":
            return cls

        return ConstraintArgs

    kind: str = Field(
        description="Constraint type discriminator for polymorphic serialization",
    )

    @property
    def constraint_key(self) -> str:
        """
        The key to use when inserting into the constraints dict.

        Defaults to ``kind``, but subclasses may override if the factory
        registry key differs from the args kind.

        :return: Registry key for this constraint type
        """
        return self.kind

`constraint_key` `property`

The key to use when inserting into the constraints dict.

Defaults to kind, but subclasses may override if the factory registry key differs from the args kind.

Returns:

Type	Description
`str`	Registry key for this constraint type

`__pydantic_schema_base_type__()` `classmethod`

Return base type for polymorphic validation hierarchy.

Returns:

Type	Description
`type[ConstraintArgs]`	Base ConstraintArgs class for schema validation

Source code in src/guidellm/scheduler/constraints/args.py

@classmethod
def __pydantic_schema_base_type__(cls) -> type[ConstraintArgs]:
    """
    Return base type for polymorphic validation hierarchy.

    :return: Base ConstraintArgs class for schema validation
    """
    if cls.__name__ == "ConstraintArgs":
        return cls

    return ConstraintArgs

`ConstraintInitializer`

Bases: Protocol

Protocol for constraint initializer factory functions that create constraints.

Defines the interface for factory objects that create constraint instances from configuration parameters. Constraint initializers enable dynamic constraint creation and configuration, supporting both simple boolean flags and complex parameter dictionaries. The protocol allows type checking while maintaining flexibility for different initialization patterns.

Example:

    class MaxRequestsInitializer:
        def __init__(self, max_requests: int):
            self.max_requests = max_requests

        def create_constraint(self) -> Constraint:
            def evaluate(state, request):
                if state.total_requests >= self.max_requests:
                    return SchedulerUpdateAction(request_queuing="stop")
                return SchedulerUpdateAction(request_queuing="continue")
            return evaluate

Source code in src/guidellm/scheduler/constraints/constraint.py

@runtime_checkable
class ConstraintInitializer(Protocol):
    """
    Protocol for constraint initializer factory functions that create constraints.

    Defines the interface for factory objects that create constraint instances from
    configuration parameters. Constraint initializers enable dynamic constraint
    creation and configuration, supporting both simple boolean flags and complex
    parameter dictionaries. The protocol allows type checking while maintaining
    flexibility for different initialization patterns.

    Example:
    ::
        class MaxRequestsInitializer:
            def __init__(self, max_requests: int):
                self.max_requests = max_requests

            def create_constraint(self) -> Constraint:
                def evaluate(state, request):
                    if state.total_requests >= self.max_requests:
                        return SchedulerUpdateAction(request_queuing="stop")
                    return SchedulerUpdateAction(request_queuing="continue")
                return evaluate
    """

    def create_constraint(self, **kwargs) -> Constraint:
        """
        Create a constraint instance from configuration parameters.

        :param kwargs: Configuration parameters for constraint creation
        :return: Configured constraint evaluation function
        """

`create_constraint(**kwargs)`

Create a constraint instance from configuration parameters.

Parameters:

Name	Type	Description	Default
`kwargs`		Configuration parameters for constraint creation	`{}`

Returns:

Type	Description
`Constraint`	Configured constraint evaluation function

Source code in src/guidellm/scheduler/constraints/constraint.py

def create_constraint(self, **kwargs) -> Constraint:
    """
    Create a constraint instance from configuration parameters.

    :param kwargs: Configuration parameters for constraint creation
    :return: Configured constraint evaluation function
    """

`ConstraintsInitializerFactory`

Bases: RegistryMixin[ConstraintInitializer]

Registry factory for creating and managing constraint initializers.

Provides centralized access to registered constraint types with support for creating constraints from ConstraintArgs instances or pre-configured initializer instances. Handles constraint resolution and type validation for the scheduler constraint system.

Example:

    from guidellm.scheduler import ConstraintsInitializerFactory

    # Register new constraint type
    @ConstraintsInitializerFactory.register("new_constraint")
    class NewConstraint:
        def create_constraint(self, **kwargs) -> Constraint:
            return lambda state, request: SchedulerUpdateAction()

    # Create and use constraint
    args = NewConstraintArgs(kind="new_constraint")
    initializer = ConstraintsInitializerFactory.create(args)
    constraint = initializer.create_constraint()

Source code in src/guidellm/scheduler/constraints/factory.py

class ConstraintsInitializerFactory(RegistryMixin[ConstraintInitializer]):
    """
    Registry factory for creating and managing constraint initializers.

    Provides centralized access to registered constraint types with support for
    creating constraints from ``ConstraintArgs`` instances or pre-configured
    initializer instances. Handles constraint resolution and type validation
    for the scheduler constraint system.

    Example:
    ::
        from guidellm.scheduler import ConstraintsInitializerFactory

        # Register new constraint type
        @ConstraintsInitializerFactory.register("new_constraint")
        class NewConstraint:
            def create_constraint(self, **kwargs) -> Constraint:
                return lambda state, request: SchedulerUpdateAction()

        # Create and use constraint
        args = NewConstraintArgs(kind="new_constraint")
        initializer = ConstraintsInitializerFactory.create(args)
        constraint = initializer.create_constraint()
    """

    @classmethod
    def create(cls, args: ConstraintArgs) -> ConstraintInitializer:
        """
        Create a constraint initializer from a ``ConstraintArgs`` instance.

        :param args: Validated constraint arguments with kind discriminator
        :return: Configured constraint initializer instance
        :raises ValueError: If args.kind is not registered in the factory
        """
        if cls.registry is None or args.kind not in cls.registry:
            raise ValueError(f"Unknown constraint discriminator: {args.kind}")

        initializer_class = cls.registry[args.kind]
        return initializer_class(args=args)  # type: ignore[operator]

    @classmethod
    def deserialize(
        cls, initializer_dict: dict[str, Any]
    ) -> SerializableConstraintInitializer | UnserializableConstraintInitializer:
        """
        Deserialize constraint initializer from dictionary format.

        :param initializer_dict: Dictionary representation of constraint initializer
        :return: Reconstructed constraint initializer instance
        :raises ValueError: If constraint type is unknown or cannot be deserialized
        """
        if initializer_dict.get("type_") == "unserializable":
            return UnserializableConstraintInitializer.model_validate(initializer_dict)

        if (
            cls.registry is not None
            and initializer_dict.get("type_")
            and initializer_dict["type_"] in cls.registry
        ):
            initializer_class = cls.registry[initializer_dict["type_"]]
            if hasattr(initializer_class, "model_validate"):
                return initializer_class.model_validate(initializer_dict)  # type: ignore[return-value]
            else:
                return initializer_class(**initializer_dict)  # type: ignore[return-value,operator]

        raise ValueError(
            f"Cannot deserialize unknown constraint initializer: "
            f"{initializer_dict.get('type_', 'unknown')}"
        )

    @classmethod
    def resolve(
        cls,
        initializers: dict[
            str,
            Constraint | ConstraintInitializer,
        ],
    ) -> dict[str, Constraint]:
        """
        Resolve constraint initializers to callable constraints.

        :param initializers: Dictionary mapping constraint keys to specifications.
            Values must be Constraint instances or ConstraintInitializer instances.
        :return: Dictionary mapping constraint keys to callable functions
        :raises TypeError: If a value is not a supported type
        """
        constraints = {}

        for key, val in initializers.items():
            if isinstance(val, Constraint):
                constraints[key] = val
            elif isinstance(val, ConstraintInitializer):
                constraints[key] = val.create_constraint()
            else:
                raise TypeError(
                    f"Constraint '{key}' has unsupported value type "
                    f"{type(val).__name__}. Expected a Constraint instance or "
                    f"ConstraintInitializer instance."
                )

        return constraints

`create(args)` `classmethod`

Create a constraint initializer from a ConstraintArgs instance.

Parameters:

Name	Type	Description	Default
`args`	`ConstraintArgs`	Validated constraint arguments with kind discriminator	required

Returns:

Type	Description
`ConstraintInitializer`	Configured constraint initializer instance

Raises:

Type	Description
`ValueError`	If args.kind is not registered in the factory

Source code in src/guidellm/scheduler/constraints/factory.py

@classmethod
def create(cls, args: ConstraintArgs) -> ConstraintInitializer:
    """
    Create a constraint initializer from a ``ConstraintArgs`` instance.

    :param args: Validated constraint arguments with kind discriminator
    :return: Configured constraint initializer instance
    :raises ValueError: If args.kind is not registered in the factory
    """
    if cls.registry is None or args.kind not in cls.registry:
        raise ValueError(f"Unknown constraint discriminator: {args.kind}")

    initializer_class = cls.registry[args.kind]
    return initializer_class(args=args)  # type: ignore[operator]

`deserialize(initializer_dict)` `classmethod`

Deserialize constraint initializer from dictionary format.

Parameters:

Name	Type	Description	Default
`initializer_dict`	`dict[str, Any]`	Dictionary representation of constraint initializer	required

Returns:

Type	Description
`SerializableConstraintInitializer \| UnserializableConstraintInitializer`	Reconstructed constraint initializer instance

Raises:

Type	Description
`ValueError`	If constraint type is unknown or cannot be deserialized
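The dispatch order in `deserialize` can be sketched as below. All classes and the `REGISTRY` dict are hypothetical stand-ins, assumed only for illustration; the logic follows the source listing: the `"unserializable"` marker is handled first, then a registry lookup on `type_` that prefers `model_validate` when the class provides it, and finally a `ValueError` for anything else.

```python
from typing import Any


class Unserializable:
    """Hypothetical placeholder for initializers that cannot be rebuilt."""

    def __init__(self, data: dict[str, Any]) -> None:
        self.data = data


class WithValidate:
    """Stand-in for a Pydantic-style class exposing model_validate."""

    def __init__(self, type_: str, seconds: float) -> None:
        self.type_ = type_
        self.seconds = seconds

    @classmethod
    def model_validate(cls, data: dict[str, Any]) -> "WithValidate":
        return cls(**data)


class PlainInit:
    """Stand-in for a class built directly from keyword arguments."""

    def __init__(self, type_: str, limit: int) -> None:
        self.type_ = type_
        self.limit = limit


REGISTRY: dict[str, type] = {"max_duration": WithValidate, "plain": PlainInit}


def deserialize(data: dict[str, Any]):
    type_ = data.get("type_")
    # 1. Explicit marker for initializers that were never serializable.
    if type_ == "unserializable":
        return Unserializable(data)
    # 2. Registered types: prefer model_validate, else plain construction.
    if type_ and type_ in REGISTRY:
        target = REGISTRY[type_]
        if hasattr(target, "model_validate"):
            return target.model_validate(data)
        return target(**data)
    # 3. Everything else is an error.
    raise ValueError(
        "Cannot deserialize unknown constraint initializer: "
        f"{data.get('type_', 'unknown')}"
    )


validated = deserialize({"type_": "max_duration", "seconds": 5.0})
plain = deserialize({"type_": "plain", "limit": 3})
placeholder = deserialize({"type_": "unserializable"})
```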

Source code in src/guidellm/scheduler/constraints/factory.py

@classmethod
def deserialize(
    cls, initializer_dict: dict[str, Any]
) -> SerializableConstraintInitializer | UnserializableConstraintInitializer:
    """
    Deserialize constraint initializer from dictionary format.

    :param initializer_dict: Dictionary representation of constraint initializer
    :return: Reconstructed constraint initializer instance
    :raises ValueError: If constraint type is unknown or cannot be deserialized
    """
    if initializer_dict.get("type_") == "unserializable":
        return UnserializableConstraintInitializer.model_validate(initializer_dict)

    if (
        cls.registry is not None
        and initializer_dict.get("type_")
        and initializer_dict["type_"] in cls.registry
    ):
        initializer_class = cls.registry[initializer_dict["type_"]]
        if hasattr(initializer_class, "model_validate"):
            return initializer_class.model_validate(initializer_dict)  # type: ignore[return-value]
        else:
            return initializer_class(**initializer_dict)  # type: ignore[return-value,operator]

    raise ValueError(
        f"Cannot deserialize unknown constraint initializer: "
        f"{initializer_dict.get('type_', 'unknown')}"
    )

`resolve(initializers)` `classmethod`

Resolve constraint initializers to callable constraints.

Parameters:

Name	Type	Description	Default
`initializers`	`dict[str, Constraint \| ConstraintInitializer]`	Dictionary mapping constraint keys to specifications. Values must be Constraint instances or ConstraintInitializer instances.	required

Returns:

Type	Description
`dict[str, Constraint]`	Dictionary mapping constraint keys to callable functions

Raises:

Type	Description
`TypeError`	If a value is not a supported type
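A minimal sketch of the resolution logic, assuming simplified stand-in protocols with the same method names as in the source listing. `always_continue` and `StopAfter` are hypothetical, and the constraints here return plain strings rather than `SchedulerUpdateAction` objects to keep the example short.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Constraint(Protocol):
    def __call__(self, state: Any, request: Any) -> Any: ...


@runtime_checkable
class ConstraintInitializer(Protocol):
    def create_constraint(self, **kwargs: Any) -> Constraint: ...


def resolve(initializers: dict[str, Any]) -> dict[str, Constraint]:
    constraints: dict[str, Constraint] = {}
    for key, val in initializers.items():
        if isinstance(val, Constraint):
            # Already callable: used as-is.
            constraints[key] = val
        elif isinstance(val, ConstraintInitializer):
            # Factory object: ask it to build the callable.
            constraints[key] = val.create_constraint()
        else:
            raise TypeError(
                f"Constraint '{key}' has unsupported value type "
                f"{type(val).__name__}."
            )
    return constraints


def always_continue(state: Any, request: Any) -> str:
    return "continue"


class StopAfter:
    """Initializer only: has create_constraint but is not itself callable."""

    def __init__(self, limit: int) -> None:
        self.limit = limit

    def create_constraint(self, **kwargs: Any) -> Constraint:
        limit = self.limit
        return lambda state, request: "stop" if state >= limit else "continue"


resolved = resolve({"passthrough": always_continue, "limit": StopAfter(3)})
```

Since the `Constraint` check runs first, an object that is both callable and exposes `create_constraint` is used directly in this sketch, without `create_constraint()` being called; the source listing checks in the same order.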

Source code in src/guidellm/scheduler/constraints/factory.py

@classmethod
def resolve(
    cls,
    initializers: dict[
        str,
        Constraint | ConstraintInitializer,
    ],
) -> dict[str, Constraint]:
    """
    Resolve constraint initializers to callable constraints.

    :param initializers: Dictionary mapping constraint keys to specifications.
        Values must be Constraint instances or ConstraintInitializer instances.
    :return: Dictionary mapping constraint keys to callable functions
    :raises TypeError: If a value is not a supported type
    """
    constraints = {}

    for key, val in initializers.items():
        if isinstance(val, Constraint):
            constraints[key] = val
        elif isinstance(val, ConstraintInitializer):
            constraints[key] = val.create_constraint()
        else:
            raise TypeError(
                f"Constraint '{key}' has unsupported value type "
                f"{type(val).__name__}. Expected a Constraint instance or "
                f"ConstraintInitializer instance."
            )

    return constraints

`MaxDurationConstraint`

Bases: PydanticConstraintInitializer

Constraint that limits execution based on maximum time duration.

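Stops request queuing and local request processing (`stop_local`) once the elapsed time reaches or exceeds the maximum duration. Elapsed time is measured from `state.start_requests_time`, falling back to `state.start_time` when that is unset. Provides progress tracking based on remaining time and completion fraction.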
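The arithmetic behind this constraint can be sketched as two small functions. `pick_limit` and `evaluate_duration` are hypothetical helpers, and the `now` parameter is an assumption added so the example is deterministic; the source listing calls `time.time()` directly and returns a `SchedulerUpdateAction` rather than a dict.

```python
import time


def pick_limit(seconds, current_index):
    """Select the limit for the current index; the last entry is reused past the end."""
    if isinstance(seconds, (int, float)):
        return seconds
    return seconds[min(max(0, current_index), len(seconds) - 1)]


def evaluate_duration(start_time, max_duration, now=None):
    """Return the stop/continue decision and progress numbers for one evaluation."""
    now = time.time() if now is None else now
    elapsed = now - start_time
    exceeded = elapsed >= max_duration
    # Clamp remaining time into [0, max_duration].
    remaining = min(max(0.0, max_duration - elapsed), max_duration)
    return {
        "request_queuing": "stop" if exceeded else "continue",
        "request_processing": "stop_local" if exceeded else "continue",
        "elapsed_time": elapsed,
        "remaining_duration": remaining,
        "stop_time": start_time + max_duration if exceeded else None,
    }


running = evaluate_duration(start_time=100.0, max_duration=30.0, now=110.0)
finished = evaluate_duration(start_time=100.0, max_duration=30.0, now=130.0)
```

With a 30 second limit, the evaluation at 10 seconds elapsed continues with 20 seconds remaining, while the evaluation at exactly 30 seconds stops, because the comparison is `>=`.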

Source code in src/guidellm/scheduler/constraints/request.py

@ConstraintsInitializerFactory.register("max_duration")
class MaxDurationConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on maximum time duration.

    Stops both request queuing and processing when the elapsed time since scheduler
    start exceeds the maximum duration. Provides progress tracking based on
    remaining time and completion fraction.
    """

    type_: Literal["max_duration"] = "max_duration"  # type: ignore[assignment]
    args: MaxDurationConstraintArgs = Field(
        description="Configuration arguments for max duration constraint",
    )
    current_index: int = Field(default=-1, description="Current index in duration list")

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Return self as the constraint instance.

        :param kwargs: Additional keyword arguments (unused)
        :return: Self instance as the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against current scheduler state and elapsed time.

        :param state: Current scheduler state with start time
        :param request_info: Individual request information (unused)
        :return: Action indicating whether to continue or stop operations
        """
        _ = request_info  # Unused parameters
        current_index = max(0, self.current_index)
        max_duration = (
            self.args.seconds
            if isinstance(self.args.seconds, int | float)
            else self.args.seconds[min(current_index, len(self.args.seconds) - 1)]
        )

        start_time = state.start_requests_time or state.start_time
        current_time = time.time()
        elapsed = current_time - start_time
        duration_exceeded = elapsed >= max_duration
        remaining_duration = min(max(0.0, max_duration - elapsed), max_duration)
        stop_time = None if not duration_exceeded else start_time + max_duration

        return SchedulerUpdateAction(
            request_queuing="stop" if duration_exceeded else "continue",
            request_processing="stop_local" if duration_exceeded else "continue",
            metadata={
                "max_duration": max_duration,
                "elapsed_time": elapsed,
                "duration_exceeded": duration_exceeded,
                "start_time": start_time,
                "current_time": current_time,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(
                remaining_duration=remaining_duration,
                total_duration=max_duration,
                stop_time=stop_time,
            ),
        )

`__call__(state, request_info)`

Evaluate constraint against current scheduler state and elapsed time.

Parameters:

Name	Type	Description	Default
`state`	`SchedulerState`	Current scheduler state with start time	required
`request_info`	`RequestInfo`	Individual request information (unused)	required

Returns:

Type	Description
`SchedulerUpdateAction`	Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/request.py

def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against current scheduler state and elapsed time.

    :param state: Current scheduler state with start time
    :param request_info: Individual request information (unused)
    :return: Action indicating whether to continue or stop operations
    """
    _ = request_info  # Unused parameters
    current_index = max(0, self.current_index)
    max_duration = (
        self.args.seconds
        if isinstance(self.args.seconds, int | float)
        else self.args.seconds[min(current_index, len(self.args.seconds) - 1)]
    )

    start_time = state.start_requests_time or state.start_time
    current_time = time.time()
    elapsed = current_time - start_time
    duration_exceeded = elapsed >= max_duration
    remaining_duration = min(max(0.0, max_duration - elapsed), max_duration)
    stop_time = None if not duration_exceeded else start_time + max_duration

    return SchedulerUpdateAction(
        request_queuing="stop" if duration_exceeded else "continue",
        request_processing="stop_local" if duration_exceeded else "continue",
        metadata={
            "max_duration": max_duration,
            "elapsed_time": elapsed,
            "duration_exceeded": duration_exceeded,
            "start_time": start_time,
            "current_time": current_time,
            "stop_time": stop_time,
        },
        progress=SchedulerProgress(
            remaining_duration=remaining_duration,
            total_duration=max_duration,
            stop_time=stop_time,
        ),
    )

`create_constraint(**_kwargs)`

Advance `current_index` by one and return a copy of this initializer (via `model_copy()`) to use as the constraint instance.

Parameters:

Name	Type	Description	Default
`_kwargs`		Additional keyword arguments (unused)	`{}`

Returns:

Type	Description
`Constraint`	Copy of this instance, used as the constraint
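The interplay between `current_index` and the returned copy can be sketched with a dataclass. `DurationInitializer` and `limit` are hypothetical stand-ins, and `copy.copy` is used here in place of Pydantic's `model_copy()`; the sketch assumes, as the source listing suggests, that each call to `create_constraint` moves on to the next entry of a list-valued `seconds`.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DurationInitializer:
    """Hypothetical stand-in; the source uses a Pydantic model and model_copy()."""

    seconds: list[float] = field(default_factory=list)
    current_index: int = -1

    def create_constraint(self) -> "DurationInitializer":
        # Advance to the next entry, then hand out a snapshot of the new index.
        self.current_index += 1
        return copy.copy(self)

    def limit(self) -> float:
        # Same clamping as the source: indexes past the end reuse the last entry.
        index = min(max(0, self.current_index), len(self.seconds) - 1)
        return self.seconds[index]


initializer = DurationInitializer(seconds=[10.0, 20.0])
first = initializer.create_constraint()
second = initializer.create_constraint()
third = initializer.create_constraint()
limits = [first.limit(), second.limit(), third.limit()]
```

Each returned copy keeps the index it was created with, so `first` still resolves to the first entry after later calls have advanced the initializer.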

Source code in src/guidellm/scheduler/constraints/request.py

def create_constraint(self, **_kwargs) -> Constraint:
    """
    Return self as the constraint instance.

    :param kwargs: Additional keyword arguments (unused)
    :return: Self instance as the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())

`MaxDurationConstraintArgs`

Bases: ConstraintArgs

Arguments for maximum duration constraint.

Limits benchmark execution time per strategy.

Attributes:

Name	Type	Description
`kind`	`Literal['max_duration']`	Always "max_duration"

Source code in src/guidellm/scheduler/constraints/request.py

@ConstraintArgs.register("max_duration")
class MaxDurationConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum duration constraint.

    Limits benchmark execution time per strategy.

    :cvar kind: Always "max_duration"
    """

    kind: Literal["max_duration"] = Field(
        default="max_duration",
        description="Constraint type discriminator",
    )
    seconds: PositiveNumOrList = Field(
        description="Maximum duration in seconds before stopping execution",
    )

`MaxErrorRateConstraint`

Bases: PydanticConstraintInitializer

Constraint that limits execution based on sliding window error rate.

Tracks error status of recent requests in a sliding window and stops all processing when the error rate exceeds the threshold. Only applies the constraint after processing enough requests to fill the minimum window size for statistical significance.
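A minimal sketch of the sliding-window check, under stated assumptions: `ErrorWindow` and `record` are hypothetical names, `collections.deque(maxlen=...)` is swapped in for the list with `pop(0)` used in the source listing, and `window` is restricted to an `int` here.

```python
from collections import deque


class ErrorWindow:
    """Hypothetical helper tracking errors over the most recent `window` requests."""

    def __init__(self, max_rate: float, window: int) -> None:
        self.max_rate = max_rate
        self.window = window
        # deque drops the oldest entry automatically once maxlen is reached.
        self.results: deque[bool] = deque(maxlen=window)

    def record(self, status: str, processed_requests: int) -> bool:
        """Record one request status and return True if processing should stop."""
        if status in ("completed", "errored", "cancelled"):
            self.results.append(status == "errored")
        rate = sum(self.results) / len(self.results) if self.results else 0.0
        # Both conditions must hold: enough requests processed and rate at threshold.
        return processed_requests >= self.window and rate >= self.max_rate


tracker = ErrorWindow(max_rate=0.5, window=4)
statuses = ["completed", "errored", "errored", "completed", "errored"]
decisions = [
    tracker.record(status, processed_requests=i + 1)
    for i, status in enumerate(statuses)
]
```

In this run the rate already reaches 0.5 after the second request, but the stop decision only fires from the fourth request onward, once the processed count has reached the window size.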

Source code in src/guidellm/scheduler/constraints/error.py

@ConstraintsInitializerFactory.register("max_error_rate")
class MaxErrorRateConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on sliding window error rate.

    Tracks error status of recent requests in a sliding window and stops all
    processing when the error rate exceeds the threshold. Only applies the
    constraint after processing enough requests to fill the minimum window size
    for statistical significance.
    """

    type_: Literal["max_error_rate"] = "max_error_rate"  # type: ignore[assignment]
    args: MaxErrorRateConstraintArgs = Field(
        description="Configuration arguments for max error rate constraint",
    )
    error_window: list[bool] = Field(
        default_factory=list,
        description="Sliding window tracking error status of recent requests",
    )
    current_index: int = Field(
        default=-1, description="Current index in the error window"
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Create a new instance of MaxErrorRateConstraint (due to stateful window).

        :param kwargs: Additional keyword arguments (unused)
        :return: New instance of the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against sliding window error rate.

        :param state: Current scheduler state with request counts
        :param request_info: Individual request with completion status
        :return: Action indicating whether to continue or stop operations
        """
        current_index = max(0, self.current_index)
        max_error_rate = (
            self.args.rate
            if isinstance(self.args.rate, int | float)
            else self.args.rate[min(current_index, len(self.args.rate) - 1)]
        )

        if request_info.status in ["completed", "errored", "cancelled"]:
            self.error_window.append(request_info.status == "errored")
            if len(self.error_window) > self.args.window:
                self.error_window.pop(0)

        error_count = sum(self.error_window)
        window_requests = len(self.error_window)
        error_rate = (
            error_count / float(window_requests) if window_requests > 0 else 0.0
        )
        exceeded_min_processed = state.processed_requests >= self.args.window
        exceeded_error_rate = error_rate >= max_error_rate
        exceeded = exceeded_min_processed and exceeded_error_rate
        stop_time = None if not exceeded else request_info.completed_at or time.time()

        return SchedulerUpdateAction(
            request_queuing="stop" if exceeded else "continue",
            request_processing="stop_all" if exceeded else "continue",
            metadata={
                "max_error_rate": max_error_rate,
                "window_size": self.args.window,
                "error_count": error_count,
                "processed_count": state.processed_requests,
                "current_window_size": len(self.error_window),
                "current_error_rate": error_rate,
                "exceeded_min_processed": exceeded_min_processed,
                "exceeded_error_rate": exceeded_error_rate,
                "exceeded": exceeded,
                "stop_time": stop_time,
            },
        )

`__call__(state, request_info)`

Evaluate constraint against sliding window error rate.

Parameters:

Name	Type	Description	Default
`state`	`SchedulerState`	Current scheduler state with request counts	required
`request_info`	`RequestInfo`	Individual request with completion status	required

Returns:

Type	Description
`SchedulerUpdateAction`	Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/error.py

def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against sliding window error rate.

    :param state: Current scheduler state with request counts
    :param request_info: Individual request with completion status
    :return: Action indicating whether to continue or stop operations
    """
    current_index = max(0, self.current_index)
    max_error_rate = (
        self.args.rate
        if isinstance(self.args.rate, int | float)
        else self.args.rate[min(current_index, len(self.args.rate) - 1)]
    )

    if request_info.status in ["completed", "errored", "cancelled"]:
        self.error_window.append(request_info.status == "errored")
        if len(self.error_window) > self.args.window:
            self.error_window.pop(0)

    error_count = sum(self.error_window)
    window_requests = len(self.error_window)
    error_rate = (
        error_count / float(window_requests) if window_requests > 0 else 0.0
    )
    exceeded_min_processed = state.processed_requests >= self.args.window
    exceeded_error_rate = error_rate >= max_error_rate
    exceeded = exceeded_min_processed and exceeded_error_rate
    stop_time = None if not exceeded else request_info.completed_at or time.time()

    return SchedulerUpdateAction(
        request_queuing="stop" if exceeded else "continue",
        request_processing="stop_all" if exceeded else "continue",
        metadata={
            "max_error_rate": max_error_rate,
            "window_size": self.args.window,
            "error_count": error_count,
            "processed_count": state.processed_requests,
            "current_window_size": len(self.error_window),
            "current_error_rate": error_rate,
            "exceeded_min_processed": exceeded_min_processed,
            "exceeded_error_rate": exceeded_error_rate,
            "exceeded": exceeded,
            "stop_time": stop_time,
        },
    )

`create_constraint(**_kwargs)`

Create a new instance of MaxErrorRateConstraint (due to stateful window).

Parameters:

Name	Type	Description	Default
`_kwargs`		Additional keyword arguments (unused)	`{}`

Returns:

Type	Description
`Constraint`	New instance of the constraint

Source code in src/guidellm/scheduler/constraints/error.py

def create_constraint(self, **_kwargs) -> Constraint:
    """
    Create a new instance of MaxErrorRateConstraint (due to stateful window).

    :param kwargs: Additional keyword arguments (unused)
    :return: New instance of the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())
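
Because `create_constraint` increments `current_index` before copying, each call returns a copy pinned to the next entry of a list-valued threshold, and the last entry is reused once the list runs out. A minimal sketch of that pattern follows, with a plain dataclass standing in for the Pydantic model. `ThresholdInitializer` and `resolve` are hypothetical names, and `copy.copy` is used here in place of `model_copy()`.

```python
import copy
from dataclasses import dataclass
from typing import Union


@dataclass
class ThresholdInitializer:
    """Hypothetical stand-in for an initializer with a scalar-or-list threshold."""

    rate: Union[float, list[float]]
    current_index: int = -1

    def create_constraint(self) -> "ThresholdInitializer":
        # Advance first, then copy, so the copy keeps the index it was made at
        self.current_index += 1
        return copy.copy(self)

    def resolve(self) -> float:
        """Pick the threshold for this copy, clamping to the last list entry."""
        if isinstance(self.rate, (int, float)):
            return self.rate
        index = max(0, self.current_index)
        return self.rate[min(index, len(self.rate) - 1)]


initializer = ThresholdInitializer(rate=[0.05, 0.1])
resolved = [initializer.create_constraint().resolve() for _ in range(3)]
print(resolved)  # the third call reuses the last list entry
```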

`MaxErrorRateConstraintArgs`

Bases: ConstraintArgs

Arguments for maximum error rate constraint (sliding window).

Stops execution when the windowed error rate reaches or exceeds the threshold.

Attributes:

Name	Type	Description
`kind`	`Literal['max_error_rate']`	Always "max_error_rate"

Source code in src/guidellm/scheduler/constraints/error.py

@ConstraintArgs.register("max_error_rate")
class MaxErrorRateConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum error rate constraint (sliding window).

    Stops execution when the windowed error rate reaches or exceeds the threshold.

    :cvar kind: Always "max_error_rate"
    """

    kind: Literal["max_error_rate"] = Field(
        default="max_error_rate",
        description="Constraint type discriminator",
    )
    rate: ErrorRateOrList = Field(
        description="Maximum error rate (0.0 to 1.0) before stopping execution",
    )
    window: int | float = Field(
        default_factory=lambda: settings.constraint_error_window_size,
        gt=0,
        description="Size of sliding window for calculating error rate",
    )

`MaxErrorsConstraint`

Bases: PydanticConstraintInitializer

Constraint that limits execution based on absolute error count.

Stops both request queuing and all request processing when the total number of errored requests reaches the maximum threshold. Uses global error tracking across all requests for immediate constraint evaluation.

Source code in src/guidellm/scheduler/constraints/error.py

@ConstraintsInitializerFactory.register("max_errors")
class MaxErrorsConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on absolute error count.

    Stops both request queuing and all request processing when the total number
    of errored requests reaches the maximum threshold. Uses global error tracking
    across all requests for immediate constraint evaluation.
    """

    type_: Literal["max_errors"] = "max_errors"  # type: ignore[assignment]
    args: MaxErrorsConstraintArgs = Field(
        description="Configuration arguments for max errors constraint",
    )
    current_index: int = Field(default=-1, description="Current index in error list")

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Advance the list index and return a copy of this initializer.

        :param _kwargs: Additional keyword arguments (unused)
        :return: Copy of this instance, used as the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against current error count.

        :param state: Current scheduler state with error counts
        :param request_info: Individual request information; only its
            completed_at value is read, to timestamp a stop
        :return: Action indicating whether to continue or stop operations
        """
        current_index = max(0, self.current_index)
        max_errors = (
            self.args.count
            if isinstance(self.args.count, int | float)
            else self.args.count[min(current_index, len(self.args.count) - 1)]
        )
        errors_exceeded = state.errored_requests >= max_errors
        stop_time = (
            None if not errors_exceeded else request_info.completed_at or time.time()
        )

        return SchedulerUpdateAction(
            request_queuing="stop" if errors_exceeded else "continue",
            request_processing="stop_all" if errors_exceeded else "continue",
            metadata={
                "max_errors": max_errors,
                "errors_exceeded": errors_exceeded,
                "current_errors": state.errored_requests,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(stop_time=stop_time),
        )

`__call__(state, request_info)`

Evaluate constraint against current error count.

Parameters:

Name	Type	Description	Default
`state`	`SchedulerState`	Current scheduler state with error counts	required
`request_info`	`RequestInfo`	Individual request information; only `completed_at` is read, to timestamp a stop	required

Returns:

Type	Description
`SchedulerUpdateAction`	Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/error.py

def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against current error count.

    :param state: Current scheduler state with error counts
    :param request_info: Individual request information; only its
        completed_at value is read, to timestamp a stop
    :return: Action indicating whether to continue or stop operations
    """
    current_index = max(0, self.current_index)
    max_errors = (
        self.args.count
        if isinstance(self.args.count, int | float)
        else self.args.count[min(current_index, len(self.args.count) - 1)]
    )
    errors_exceeded = state.errored_requests >= max_errors
    stop_time = (
        None if not errors_exceeded else request_info.completed_at or time.time()
    )

    return SchedulerUpdateAction(
        request_queuing="stop" if errors_exceeded else "continue",
        request_processing="stop_all" if errors_exceeded else "continue",
        metadata={
            "max_errors": max_errors,
            "errors_exceeded": errors_exceeded,
            "current_errors": state.errored_requests,
            "stop_time": stop_time,
        },
        progress=SchedulerProgress(stop_time=stop_time),
    )
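
As a rough illustration of the decision above, the sketch below reduces it to a pure function. `evaluate_max_errors` and its dictionary return value are hypothetical; the real constraint returns a `SchedulerUpdateAction`. It shows the `>=` comparison and how the stop timestamp falls back to the wall clock when the request carries no completion time.

```python
import time
from typing import Optional


def evaluate_max_errors(
    errored_requests: int, max_errors: int, completed_at: Optional[float]
) -> dict:
    """Hypothetical reduction of the max-errors decision to plain values."""
    exceeded = errored_requests >= max_errors
    # Prefer the request's completion time; otherwise use the current time
    stop_time = (completed_at or time.time()) if exceeded else None
    return {
        "request_queuing": "stop" if exceeded else "continue",
        "request_processing": "stop_all" if exceeded else "continue",
        "stop_time": stop_time,
    }


below = evaluate_max_errors(errored_requests=2, max_errors=3, completed_at=None)
at_limit = evaluate_max_errors(
    errored_requests=3, max_errors=3, completed_at=1700000000.0
)
print(below["request_queuing"], at_limit["request_processing"])
```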

`create_constraint(**_kwargs)`

Advance `current_index` and return a copy of this initializer as the constraint instance.

Parameters:

Name	Type	Description	Default
`**_kwargs`		Additional keyword arguments (unused)	`{}`

Returns:

Type	Description
`Constraint`	Copy of this instance, used as the constraint

Source code in src/guidellm/scheduler/constraints/error.py

def create_constraint(self, **_kwargs) -> Constraint:
    """
    Advance the list index and return a copy of this initializer.

    :param _kwargs: Additional keyword arguments (unused)
    :return: Copy of this instance, used as the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())

`MaxErrorsConstraintArgs`

Bases: ConstraintArgs

Arguments for maximum error count constraint.

Stops execution when total errors reach the threshold.

Attributes:

Name	Type	Description
`kind`	`Literal['max_errors']`	Always "max_errors"

Source code in src/guidellm/scheduler/constraints/error.py

@ConstraintArgs.register("max_errors")
class MaxErrorsConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum error count constraint.

    Stops execution when total errors reach the threshold.

    :cvar kind: Always "max_errors"
    """

    kind: Literal["max_errors"] = Field(
        default="max_errors",
        description="Constraint type discriminator",
    )
    count: PositiveNumOrList = Field(
        description="Maximum number of errors before stopping execution",
    )

`MaxGlobalErrorRateConstraint`

Bases: PydanticConstraintInitializer

Constraint that limits execution based on global error rate.

Calculates the error rate across all processed requests and stops all processing when the rate reaches or exceeds the threshold. The constraint applies only after the minimum number of requests has been processed, so that the global error rate is computed over a meaningful sample.

Source code in src/guidellm/scheduler/constraints/error.py

@ConstraintsInitializerFactory.register("max_global_error_rate")
class MaxGlobalErrorRateConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on global error rate.

    Calculates the error rate across all processed requests and stops all
    processing when the rate reaches or exceeds the threshold. The constraint
    applies only after the minimum number of requests has been processed, so
    that the global error rate is computed over a meaningful sample.
    """

    type_: Literal["max_global_error_rate"] = "max_global_error_rate"  # type: ignore[assignment]
    args: MaxGlobalErrorRateConstraintArgs = Field(
        description="Configuration arguments for max global error rate constraint",
    )
    current_index: int = Field(
        default=-1, description="Current index for list-based max_error_rate values"
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Advance the list index and return a copy of this initializer.

        :param _kwargs: Additional keyword arguments (unused)
        :return: Copy of this instance, used as the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against global error rate.

        :param state: Current scheduler state with global request and error counts
        :param request_info: Individual request information; only its
            completed_at value is read, to timestamp a stop
        :return: Action indicating whether to continue or stop operations
        """
        current_index = max(0, self.current_index)
        max_error_rate = (
            self.args.rate
            if isinstance(self.args.rate, int | float)
            else self.args.rate[min(current_index, len(self.args.rate) - 1)]
        )

        exceeded_min_processed = (
            self.args.minimum is None or state.processed_requests >= self.args.minimum
        )
        error_rate = (
            state.errored_requests / float(state.processed_requests)
            if state.processed_requests > 0
            else 0.0
        )
        exceeded_error_rate = error_rate >= max_error_rate
        exceeded = exceeded_min_processed and exceeded_error_rate
        stop_time = None if not exceeded else request_info.completed_at or time.time()

        return SchedulerUpdateAction(
            request_queuing="stop" if exceeded else "continue",
            request_processing="stop_all" if exceeded else "continue",
            metadata={
                "max_error_rate": max_error_rate,
                "min_processed": self.args.minimum,
                "processed_requests": state.processed_requests,
                "errored_requests": state.errored_requests,
                "error_rate": error_rate,
                "exceeded_min_processed": exceeded_min_processed,
                "exceeded_error_rate": exceeded_error_rate,
                "exceeded": exceeded,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(stop_time=stop_time),
        )

`__call__(state, request_info)`

Evaluate constraint against global error rate.

Parameters:

Name	Type	Description	Default
`state`	`SchedulerState`	Current scheduler state with global request and error counts	required
`request_info`	`RequestInfo`	Individual request information; only `completed_at` is read, to timestamp a stop	required

Returns:

Type	Description
`SchedulerUpdateAction`	Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/error.py

def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against global error rate.

    :param state: Current scheduler state with global request and error counts
    :param request_info: Individual request information; only its
        completed_at value is read, to timestamp a stop
    :return: Action indicating whether to continue or stop operations
    """
    current_index = max(0, self.current_index)
    max_error_rate = (
        self.args.rate
        if isinstance(self.args.rate, int | float)
        else self.args.rate[min(current_index, len(self.args.rate) - 1)]
    )

    exceeded_min_processed = (
        self.args.minimum is None or state.processed_requests >= self.args.minimum
    )
    error_rate = (
        state.errored_requests / float(state.processed_requests)
        if state.processed_requests > 0
        else 0.0
    )
    exceeded_error_rate = error_rate >= max_error_rate
    exceeded = exceeded_min_processed and exceeded_error_rate
    stop_time = None if not exceeded else request_info.completed_at or time.time()

    return SchedulerUpdateAction(
        request_queuing="stop" if exceeded else "continue",
        request_processing="stop_all" if exceeded else "continue",
        metadata={
            "max_error_rate": max_error_rate,
            "min_processed": self.args.minimum,
            "processed_requests": state.processed_requests,
            "errored_requests": state.errored_requests,
            "error_rate": error_rate,
            "exceeded_min_processed": exceeded_min_processed,
            "exceeded_error_rate": exceeded_error_rate,
            "exceeded": exceeded,
            "stop_time": stop_time,
        },
        progress=SchedulerProgress(stop_time=stop_time),
    )
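
The `minimum` gate matters most early in a run, when one failure among a handful of requests could otherwise trip the threshold. The sketch below illustrates that reading with a hypothetical `global_error_rate_exceeded` function rather than the library class; the parameter names are chosen for the example.

```python
from typing import Optional


def global_error_rate_exceeded(
    errored: int, processed: int, max_rate: float, minimum: Optional[int]
) -> bool:
    """Hypothetical pure-function version of the global error-rate check."""
    enough_processed = minimum is None or processed >= minimum
    error_rate = errored / processed if processed > 0 else 0.0
    return enough_processed and error_rate >= max_rate


# One error in two requests is a 50% rate, but the gate of 10 holds it back
early = global_error_rate_exceeded(errored=1, processed=2, max_rate=0.2, minimum=10)
# The same rate stops the run once the gate is satisfied
later = global_error_rate_exceeded(errored=5, processed=10, max_rate=0.2, minimum=10)
# With no minimum, the rate applies from the first processed request
ungated = global_error_rate_exceeded(errored=1, processed=2, max_rate=0.2, minimum=None)
print(early, later, ungated)
```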

`create_constraint(**_kwargs)`

Advance `current_index` and return a copy of this initializer as the constraint instance.

Parameters:

Name	Type	Description	Default
`**_kwargs`		Additional keyword arguments (unused)	`{}`

Returns:

Type	Description
`Constraint`	Copy of this instance, used as the constraint

Source code in src/guidellm/scheduler/constraints/error.py

def create_constraint(self, **_kwargs) -> Constraint:
    """
    Advance the list index and return a copy of this initializer.

    :param _kwargs: Additional keyword arguments (unused)
    :return: Copy of this instance, used as the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())

`MaxGlobalErrorRateConstraintArgs`

Bases: ConstraintArgs

Arguments for maximum global error rate constraint.

Stops execution when the overall error rate across all processed requests reaches or exceeds the threshold. Applies only after at least `minimum` requests have been processed; if `minimum` is `None`, the check applies from the start.

Attributes:

Name	Type	Description
`kind`	`Literal['max_global_error_rate']`	Always "max_global_error_rate"

Source code in src/guidellm/scheduler/constraints/error.py

@ConstraintArgs.register("max_global_error_rate")
class MaxGlobalErrorRateConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum global error rate constraint.

    Stops execution when the overall error rate across all processed requests
    reaches or exceeds the threshold. Applies only after at least ``minimum``
    requests have been processed.

    :cvar kind: Always "max_global_error_rate"
    """

    kind: Literal["max_global_error_rate"] = Field(
        default="max_global_error_rate",
        description="Constraint type discriminator",
    )
    rate: ErrorRateOrList = Field(
        description="Maximum global error rate (0.0 to 1.0) before stopping",
    )
    minimum: int | float | None = Field(
        default_factory=lambda: settings.constraint_error_min_processed,
        gt=0,
        description="Minimum requests processed before applying error rate constraint",
    )

`MaxNumberConstraint`

Bases: PydanticConstraintInitializer

Constraint that limits execution based on maximum request counts.

Stops request queuing when created requests reach the limit and stops local request processing when processed requests reach the limit. Provides progress tracking based on remaining requests and completion fraction.

Source code in src/guidellm/scheduler/constraints/request.py

@ConstraintsInitializerFactory.register("max_requests")
class MaxNumberConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on maximum request counts.

    Stops request queuing when created requests reach the limit and stops local
    request processing when processed requests reach the limit. Provides progress
    tracking based on remaining requests and completion fraction.
    """

    type_: Literal["max_requests"] = "max_requests"  # type: ignore[assignment]
    args: MaxRequestsConstraintArgs = Field(
        description="Configuration arguments for max request count constraint",
    )
    current_index: int = Field(
        default=-1, description="Current index for list-based max_num values"
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Advance the list index and return a copy of this initializer.

        :param _kwargs: Additional keyword arguments (unused)
        :return: Copy of this instance, used as the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against current scheduler state and request count.

        :param state: Current scheduler state with request counts
        :param request_info: Individual request information; only its
            completed_at value is read, to timestamp a stop
        :return: Action indicating whether to continue or stop operations
        """
        current_index = max(0, self.current_index)
        max_num = (
            self.args.count
            if isinstance(self.args.count, int | float)
            else self.args.count[min(current_index, len(self.args.count) - 1)]
        )

        create_exceeded = state.created_requests >= max_num
        processed_exceeded = state.processed_requests >= max_num
        remaining_requests = min(max(0, max_num - state.processed_requests), max_num)
        stop_time = (
            None if remaining_requests > 0 else request_info.completed_at or time.time()
        )

        return SchedulerUpdateAction(
            request_queuing="stop" if create_exceeded else "continue",
            request_processing="stop_local" if processed_exceeded else "continue",
            metadata={
                "max_requests": max_num,
                "create_exceeded": create_exceeded,
                "processed_exceeded": processed_exceeded,
                "created_requests": state.created_requests,
                "processed_requests": state.processed_requests,
                "remaining_requests": remaining_requests,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(
                remaining_requests=remaining_requests,
                total_requests=max_num,
                stop_time=stop_time,
            ),
        )

`__call__(state, request_info)`

Evaluate constraint against current scheduler state and request count.

Parameters:

Name	Type	Description	Default
`state`	`SchedulerState`	Current scheduler state with request counts	required
`request_info`	`RequestInfo`	Individual request information; only `completed_at` is read, to timestamp a stop	required

Returns:

Type	Description
`SchedulerUpdateAction`	Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/request.py

def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against current scheduler state and request count.

    :param state: Current scheduler state with request counts
    :param request_info: Individual request information; only its
        completed_at value is read, to timestamp a stop
    :return: Action indicating whether to continue or stop operations
    """
    current_index = max(0, self.current_index)
    max_num = (
        self.args.count
        if isinstance(self.args.count, int | float)
        else self.args.count[min(current_index, len(self.args.count) - 1)]
    )

    create_exceeded = state.created_requests >= max_num
    processed_exceeded = state.processed_requests >= max_num
    remaining_requests = min(max(0, max_num - state.processed_requests), max_num)
    stop_time = (
        None if remaining_requests > 0 else request_info.completed_at or time.time()
    )

    return SchedulerUpdateAction(
        request_queuing="stop" if create_exceeded else "continue",
        request_processing="stop_local" if processed_exceeded else "continue",
        metadata={
            "max_requests": max_num,
            "create_exceeded": create_exceeded,
            "processed_exceeded": processed_exceeded,
            "created_requests": state.created_requests,
            "processed_requests": state.processed_requests,
            "remaining_requests": remaining_requests,
            "stop_time": stop_time,
        },
        progress=SchedulerProgress(
            remaining_requests=remaining_requests,
            total_requests=max_num,
            stop_time=stop_time,
        ),
    )
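
The two limits are checked independently: queuing stops once enough requests have been created, while local processing stops only when that many have been processed. The sketch below makes the in-between state visible. `request_budget` is a hypothetical function, and the `fraction_done` field is an illustration derived from the same counters, not a value the constraint returns.

```python
def request_budget(created: int, processed: int, max_num: int) -> dict:
    """Hypothetical summary of the max-requests decision for one evaluation."""
    # Clamp to the range [0, max_num], as the constraint does
    remaining = min(max(0, max_num - processed), max_num)
    return {
        "request_queuing": "stop" if created >= max_num else "continue",
        "request_processing": "stop_local" if processed >= max_num else "continue",
        "remaining_requests": remaining,
        # Illustrative only: the constraint reports remaining and total requests
        "fraction_done": 1.0 - remaining / max_num,
    }


# All 100 requests are queued, but 20 are still in flight
draining = request_budget(created=100, processed=80, max_num=100)
finished = request_budget(created=100, processed=100, max_num=100)
print(draining)
print(finished)
```

While the run is draining, no new requests are queued, yet processing continues until the in-flight requests finish.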

`create_constraint(**_kwargs)`

Advance `current_index` and return a copy of this initializer as the constraint instance.

Parameters:

Name	Type	Description	Default
`**_kwargs`		Additional keyword arguments (unused)	`{}`

Returns:

Type	Description
`Constraint`	Copy of this instance, used as the constraint

Source code in src/guidellm/scheduler/constraints/request.py

def create_constraint(self, **_kwargs) -> Constraint:
    """
    Advance the list index and return a copy of this initializer.

    :param _kwargs: Additional keyword arguments (unused)
    :return: Copy of this instance, used as the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())

`MaxRequestsConstraintArgs`

Bases: ConstraintArgs

Arguments for maximum request count constraint.

Limits the number of requests processed per strategy.

Attributes:

Name	Type	Description
`kind`	`Literal['max_requests']`	Always "max_requests"

Source code in src/guidellm/scheduler/constraints/request.py

@ConstraintArgs.register("max_requests")
class MaxRequestsConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum request count constraint.

    Limits the number of requests processed per strategy.

    :cvar kind: Always "max_requests"
    """

    kind: Literal["max_requests"] = Field(
        default="max_requests",
        description="Constraint type discriminator",
    )
    count: PositiveNumOrList = Field(
        description="Maximum number of requests before stopping execution",
    )

`OverSaturationConstraint`

Bases: Constraint

Constraint that detects and stops execution when over-saturation is detected.

This constraint implements the Over-Saturation Detection (OSD) algorithm to identify when a model becomes over-saturated (response rate doesn't keep up with request rate). When over-saturation is detected, the constraint stops request queuing and optionally stops processing of existing requests.

The constraint maintains internal state for tracking concurrent requests and time-to-first-token (TTFT) metrics, using statistical slope detection to identify performance degradation patterns.
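
Statistical slope detection here means fitting a line to (time, value) points and asking whether the slope is positive beyond its margin of error. The sketch below only illustrates that idea. It is not the library's `SlopeChecker`: `slope_with_margin` is a hypothetical name, the sample values are made up, and it uses a normal approximation from the standard library rather than the t-distribution that the constraint's `confidence` parameter refers to.

```python
import math
from statistics import NormalDist


def slope_with_margin(
    xs: list[float], ys: list[float], confidence: float = 0.95
) -> tuple[float, float]:
    """Least-squares slope and its margin of error (normal approximation)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate a margin of error")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    # Two-sided critical value for the requested confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return slope, z * standard_error


# TTFT in seconds sampled at increasing benchmark durations (made-up values)
durations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rising_ttft = [0.5, 0.9, 1.6, 2.1, 2.4, 3.1]
flat_ttft = [1.0, 1.2, 0.9, 1.1, 1.0, 1.05]

for label, series in (("rising", rising_ttft), ("flat", flat_ttft)):
    slope, margin = slope_with_margin(durations, series)
    print(label, "clearly positive:", slope - margin > 0)
```

Requiring the lower bound `slope - margin` to be positive keeps noisy but flat series from being flagged as degrading.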

Source code in src/guidellm/scheduler/constraints/saturation.py

class OverSaturationConstraint(Constraint):
    """
    Constraint that detects and stops execution when over-saturation is detected.

    This constraint implements the Over-Saturation Detection (OSD) algorithm to
    identify when a model becomes over-saturated (response rate doesn't keep up with
    request rate). When over-saturation is detected, the constraint stops request
    queuing and optionally stops processing of existing requests.

    The constraint maintains internal state for tracking concurrent requests and
    time-to-first-token (TTFT) metrics, using statistical slope detection to identify
    performance degradation patterns.
    """

    def __init__(
        self,
        minimum_duration: float = 30.0,
        minimum_ttft: float = 2.5,
        maximum_window_seconds: float = 120.0,
        moe_threshold: float = 2.0,
        maximum_window_ratio: float = 0.75,
        minimum_window_size: int = 5,
        confidence: float = 0.95,
        eps: float = 1e-12,
        mode: Literal["enforce", "monitor"] = "enforce",
    ) -> None:  # noqa: PLR0913
        """
        Initialize the over-saturation constraint.

        Creates a new constraint instance with specified detection parameters.
        The constraint will track concurrent requests and TTFT metrics, using
        statistical slope detection to identify when the model becomes
        over-saturated. All parameters have sensible defaults suitable for
        most benchmarking scenarios.

        :param minimum_duration: Minimum seconds before checking for over-saturation
            (default: 30.0)
        :param minimum_ttft: Minimum TTFT threshold in seconds for violation counting
            (default: 2.5)
        :param maximum_window_seconds: Maximum time window in seconds for data retention
            (default: 120.0)
        :param moe_threshold: Margin of error threshold for slope detection
            (default: 2.0)
        :param maximum_window_ratio: Maximum window size as ratio of total requests
            (default: 0.75)
        :param minimum_window_size: Minimum data points required for slope estimation
            (default: 5)
        :param confidence: Statistical confidence level for t-distribution (0-1)
            (default: 0.95)
        :param eps: Epsilon for numerical stability in calculations
            (default: 1e-12)
        :param mode: Whether to stop when over-saturation is detected, or only monitor
            (default: "enforce")
        """
        self.minimum_duration = minimum_duration
        self.minimum_ttft = minimum_ttft
        self.maximum_window_seconds = maximum_window_seconds
        self.maximum_window_ratio = maximum_window_ratio
        self.minimum_window_size = minimum_window_size
        self.moe_threshold = moe_threshold
        self.confidence = confidence
        self.eps = eps
        self.mode = mode
        self.reset()

    @property
    def info(self) -> dict[str, Any]:
        """
        Get the current constraint configuration.

        :return: Dictionary containing configuration parameters.
        """

        return {
            "type_": "over_saturation",
            "minimum_duration": self.minimum_duration,
            "minimum_ttft": self.minimum_ttft,
            "maximum_window_seconds": self.maximum_window_seconds,
            "maximum_window_ratio": self.maximum_window_ratio,
            "minimum_window_size": self.minimum_window_size,
            "moe_threshold": self.moe_threshold,
            "confidence": self.confidence,
            "mode": self.mode,
        }

    def reset(self) -> None:
        """
        Reset all internal state to initial values.

        Clears all tracked requests, resets counters, and reinitializes slope
        checkers. Useful for reusing constraint instances across multiple
        benchmark runs or resetting state after configuration changes.
        """
        self.duration = 0.0
        self.started_requests: list[dict[str, Any]] = []
        self.finished_requests: list[dict[str, Any]] = []
        self.ttft_violations_counter = 0
        self.total_finished_ever = 0
        self.total_started_ever = 0
        self._ttft_reported_request_ids: set[str] = set()
        self.concurrent_slope_checker = SlopeChecker(
            moe_threshold=self.moe_threshold, confidence=self.confidence, eps=self.eps
        )
        self.ttft_slope_checker = SlopeChecker(
            moe_threshold=self.moe_threshold, confidence=self.confidence, eps=self.eps
        )

    def _add_finished(self, request: dict[str, Any]) -> None:
        """
        Add a finished request to tracking.

        :param request: Dictionary containing request data with 'ttft' and
            'duration' keys.
        """
        ttft = request["ttft"]
        duration = request["duration"]
        if ttft is not None:
            self.total_finished_ever += 1
            self.finished_requests.append(request)
            if ttft > self.minimum_ttft:
                self.ttft_violations_counter += 1
            self.ttft_slope_checker.add_data_point(duration, ttft)

    def _remove_finished(self, request: dict[str, Any]) -> None:
        """
        Remove a finished request from tracking.

        :param request: Dictionary containing request data with 'ttft' and
            'duration' keys.
        """
        del self.finished_requests[0]
        ttft = request["ttft"]
        duration = request["duration"]
        if ttft > self.minimum_ttft:
            self.ttft_violations_counter -= 1
        self.ttft_slope_checker.remove_data_point(duration, ttft)

    def _add_started(self, request: dict[str, Any]) -> None:
        """
        Add a started request to tracking.

        :param request: Dictionary containing request data with
            'concurrent_requests' and 'duration' keys.
        """
        concurrent = request["concurrent_requests"]
        duration = request["duration"]
        if concurrent is not None:
            self.total_started_ever += 1
            self.started_requests.append(request)
            self.concurrent_slope_checker.add_data_point(duration, concurrent)

    def _remove_started(self, request: dict[str, Any]) -> None:
        """
        Remove a started request from tracking.

        :param request: Dictionary containing request data with
            'concurrent_requests' and 'duration' keys.
        """
        del self.started_requests[0]
        concurrent = request["concurrent_requests"]
        duration = request["duration"]
        self.concurrent_slope_checker.remove_data_point(duration, concurrent)

    def _update_duration(self, duration: float) -> None:
        """
        Update duration and prune old data points.

        Updates the current duration and removes data points that exceed the maximum
        window size (by ratio or time) to maintain bounded memory usage.

        :param duration: Current duration in seconds since benchmark start.
        """
        self.duration = duration

        maximum_finished_window_size = int(
            self.total_finished_ever * self.maximum_window_ratio
        )
        while len(self.finished_requests) > maximum_finished_window_size:
            self._remove_finished(self.finished_requests[0])

        while (len(self.finished_requests) > 0) and (
            (
                time_since_earliest_request := duration
                - self.finished_requests[0]["duration"]
            )
            > self.maximum_window_seconds
        ):
            self._remove_finished(self.finished_requests[0])

        maximum_started_window_size = int(
            self.total_started_ever * self.maximum_window_ratio
        )
        while len(self.started_requests) > maximum_started_window_size:
            self._remove_started(self.started_requests[0])

        while (len(self.started_requests) > 0) and (
            (
                time_since_earliest_request := duration  # noqa: F841
                - self.started_requests[0]["duration"]
            )
            > self.maximum_window_seconds
        ):
            self._remove_started(self.started_requests[0])

    def _check_alert(self) -> bool:
        """
        Check if over-saturation is currently detected.

        :return: True if over-saturation is detected, False otherwise.
        """
        # Cap n at the elapsed duration, since requests from the same second
        # are highly correlated. This is simple and good enough, given that
        # the MOE has a custom threshold anyway.
        concurrent_n = min(self.duration, self.concurrent_slope_checker.n)
        ttft_n = min(self.duration, self.ttft_slope_checker.n)

        if (
            (self.duration < self.minimum_duration)
            or (self.ttft_slope_checker.n > self.ttft_violations_counter * 2)
            or (self.duration < self.minimum_ttft)
            or (concurrent_n < self.minimum_window_size)
        ):
            return False

        is_concurrent_slope_positive = self.concurrent_slope_checker.check_slope(
            concurrent_n
        )

        if ttft_n < self.minimum_window_size:
            return is_concurrent_slope_positive

        is_ttft_slope_positive = self.ttft_slope_checker.check_slope(ttft_n)

        return is_concurrent_slope_positive and is_ttft_slope_positive

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against current scheduler state.

        :param state: Current scheduler state.
        :param request_info: Individual request information.
        :return: Action indicating whether to continue or stop operations.
        """
        duration = time.time() - state.start_time

        if request_info.status == "in_progress":
            concurrent_requests = state.processing_requests
            self._add_started(
                {"concurrent_requests": concurrent_requests, "duration": duration}
            )
        elif request_info.status in ("first_token", "completed"):
            if (
                request_info.request_id not in self._ttft_reported_request_ids
                and request_info.timings
                and request_info.timings.first_token_iteration
                and request_info.timings.request_start
            ):
                self._ttft_reported_request_ids.add(request_info.request_id)
                ttft = (
                    request_info.timings.first_token_iteration
                    - request_info.timings.request_start
                )
                self._add_finished({"ttft": ttft, "duration": duration})

        self._update_duration(duration)
        is_over_saturated = self._check_alert()

        ttft_slope = self.ttft_slope_checker.slope
        ttft_slope_moe = self.ttft_slope_checker.margin_of_error
        ttft_n = self.ttft_slope_checker.n
        ttft_violations = self.ttft_violations_counter
        concurrent_slope = self.concurrent_slope_checker.slope
        concurrent_slope_moe = self.concurrent_slope_checker.margin_of_error
        concurrent_n = self.concurrent_slope_checker.n

        should_stop = is_over_saturated and self.mode == "enforce"
        return SchedulerUpdateAction(
            request_queuing="stop" if should_stop else "continue",
            request_processing="stop_all" if should_stop else "continue",
            metadata={
                "ttft_slope": ttft_slope,
                "ttft_slope_moe": ttft_slope_moe,
                "ttft_n": ttft_n,
                "ttft_violations": ttft_violations,
                "concurrent_slope": concurrent_slope,
                "concurrent_slope_moe": concurrent_slope_moe,
                "concurrent_n": concurrent_n,
                "is_over_saturated": is_over_saturated,
            },
        )

`info` `property`

Get current constraint configuration and state information.

Returns:

Type	Description
`dict[str, Any]`	Dictionary containing configuration parameters.

`__call__(state, request_info)`

Evaluate constraint against current scheduler state.

Parameters:

Name	Type	Description	Default
`state`	`SchedulerState`	Current scheduler state.	required
`request_info`	`RequestInfo`	Individual request information.	required

Returns:

Type	Description
`SchedulerUpdateAction`	Action indicating whether to continue or stop operations.
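The stop/continue part of the returned action depends on only two inputs: whether over-saturation was detected and the configured `mode`. A minimal sketch of that decision follows, using a plain dict in place of `SchedulerUpdateAction`; the helper name `decide_action` is hypothetical and not part of guidellm.

```python
from typing import Literal


def decide_action(
    is_over_saturated: bool, mode: Literal["enforce", "monitor"]
) -> dict[str, str]:
    """Mirror the stop/continue decision made at the end of __call__."""
    # Only "enforce" mode turns a detection into a stop signal;
    # "monitor" mode keeps running and leaves the detection to the metadata.
    should_stop = is_over_saturated and mode == "enforce"
    return {
        "request_queuing": "stop" if should_stop else "continue",
        "request_processing": "stop_all" if should_stop else "continue",
    }


print(decide_action(True, "enforce"))
print(decide_action(True, "monitor"))
```

In both modes the real constraint still reports `is_over_saturated` in the action metadata, so a monitoring run can record the detection without ending the benchmark.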

Source code in src/guidellm/scheduler/constraints/saturation.py

def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against current scheduler state.

    :param state: Current scheduler state.
    :param request_info: Individual request information.
    :return: Action indicating whether to continue or stop operations.
    """
    duration = time.time() - state.start_time

    if request_info.status == "in_progress":
        concurrent_requests = state.processing_requests
        self._add_started(
            {"concurrent_requests": concurrent_requests, "duration": duration}
        )
    elif request_info.status in ("first_token", "completed"):
        if (
            request_info.request_id not in self._ttft_reported_request_ids
            and request_info.timings
            and request_info.timings.first_token_iteration
            and request_info.timings.request_start
        ):
            self._ttft_reported_request_ids.add(request_info.request_id)
            ttft = (
                request_info.timings.first_token_iteration
                - request_info.timings.request_start
            )
            self._add_finished({"ttft": ttft, "duration": duration})

    self._update_duration(duration)
    is_over_saturated = self._check_alert()

    ttft_slope = self.ttft_slope_checker.slope
    ttft_slope_moe = self.ttft_slope_checker.margin_of_error
    ttft_n = self.ttft_slope_checker.n
    ttft_violations = self.ttft_violations_counter
    concurrent_slope = self.concurrent_slope_checker.slope
    concurrent_slope_moe = self.concurrent_slope_checker.margin_of_error
    concurrent_n = self.concurrent_slope_checker.n

    should_stop = is_over_saturated and self.mode == "enforce"
    return SchedulerUpdateAction(
        request_queuing="stop" if should_stop else "continue",
        request_processing="stop_all" if should_stop else "continue",
        metadata={
            "ttft_slope": ttft_slope,
            "ttft_slope_moe": ttft_slope_moe,
            "ttft_n": ttft_n,
            "ttft_violations": ttft_violations,
            "concurrent_slope": concurrent_slope,
            "concurrent_slope_moe": concurrent_slope_moe,
            "concurrent_n": concurrent_n,
            "is_over_saturated": is_over_saturated,
        },
    )

`__init__(minimum_duration=30.0, minimum_ttft=2.5, maximum_window_seconds=120.0, moe_threshold=2.0, maximum_window_ratio=0.75, minimum_window_size=5, confidence=0.95, eps=1e-12, mode='enforce')`

Initialize the over-saturation constraint.

Creates a new constraint instance with specified detection parameters. The constraint will track concurrent requests and TTFT metrics, using statistical slope detection to identify when the model becomes over-saturated. All parameters have sensible defaults suitable for most benchmarking scenarios.

Parameters:

Name	Type	Description	Default
`minimum_duration`	`float`	Minimum seconds before checking for over-saturation (default: 30.0)	`30.0`
`minimum_ttft`	`float`	Minimum TTFT threshold in seconds for violation counting (default: 2.5)	`2.5`
`maximum_window_seconds`	`float`	Maximum time window in seconds for data retention (default: 120.0)	`120.0`
`moe_threshold`	`float`	Margin of error threshold for slope detection (default: 2.0)	`2.0`
`maximum_window_ratio`	`float`	Maximum window size as ratio of total requests (default: 0.75)	`0.75`
`minimum_window_size`	`int`	Minimum data points required for slope estimation (default: 5)	`5`
`confidence`	`float`	Statistical confidence level for t-distribution (0-1) (default: 0.95)	`0.95`
`eps`	`float`	Epsilon for numerical stability in calculations (default: 1e-12)	`1e-12`
`mode`	`Literal['enforce', 'monitor']`	Whether to stop when over-saturation is detected, or only monitor (default: "enforce")	`'enforce'`
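`maximum_window_ratio` and `maximum_window_seconds` jointly bound how much history is retained. The sketch below reproduces the two pruning passes on a plain list of sample dicts. `prune_window` is a hypothetical stand-alone helper; the class performs the equivalent steps through its own `_remove_started` and `_remove_finished` methods.

```python
from typing import Any


def prune_window(
    samples: list[dict[str, Any]],
    total_ever: int,
    now: float,
    maximum_window_ratio: float = 0.75,
    maximum_window_seconds: float = 120.0,
) -> list[dict[str, Any]]:
    """Drop the oldest samples until both window limits are satisfied."""
    window = list(samples)  # leave the caller's list untouched

    # Pass 1: cap the window at a fraction of everything ever recorded.
    max_size = int(total_ever * maximum_window_ratio)
    while len(window) > max_size:
        window.pop(0)

    # Pass 2: drop samples older than the time limit.
    while window and now - window[0]["duration"] > maximum_window_seconds:
        window.pop(0)

    return window


samples = [{"duration": float(t)} for t in (0, 50, 100, 150)]
kept = prune_window(samples, total_ever=4, now=200.0)
print([s["duration"] for s in kept])
```

With four samples and a ratio of 0.75 the size cap removes the oldest entry, and the 120-second limit then removes the sample recorded at 50 seconds.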

Source code in src/guidellm/scheduler/constraints/saturation.py

def __init__(
    self,
    minimum_duration: float = 30.0,
    minimum_ttft: float = 2.5,
    maximum_window_seconds: float = 120.0,
    moe_threshold: float = 2.0,
    maximum_window_ratio: float = 0.75,
    minimum_window_size: int = 5,
    confidence: float = 0.95,
    eps: float = 1e-12,
    mode: Literal["enforce", "monitor"] = "enforce",
) -> None:  # noqa: PLR0913
    """
    Initialize the over-saturation constraint.

    Creates a new constraint instance with specified detection parameters.
    The constraint will track concurrent requests and TTFT metrics, using
    statistical slope detection to identify when the model becomes
    over-saturated. All parameters have sensible defaults suitable for
    most benchmarking scenarios.

    :param minimum_duration: Minimum seconds before checking for over-saturation
        (default: 30.0)
    :param minimum_ttft: Minimum TTFT threshold in seconds for violation counting
        (default: 2.5)
    :param maximum_window_seconds: Maximum time window in seconds for data retention
        (default: 120.0)
    :param moe_threshold: Margin of error threshold for slope detection
        (default: 2.0)
    :param maximum_window_ratio: Maximum window size as ratio of total requests
        (default: 0.75)
    :param minimum_window_size: Minimum data points required for slope estimation
        (default: 5)
    :param confidence: Statistical confidence level for t-distribution (0-1)
        (default: 0.95)
    :param eps: Epsilon for numerical stability in calculations
        (default: 1e-12)
    :param mode: Whether to stop when over-saturation is detected, or only monitor
        (default: "enforce")
    """
    self.minimum_duration = minimum_duration
    self.minimum_ttft = minimum_ttft
    self.maximum_window_seconds = maximum_window_seconds
    self.maximum_window_ratio = maximum_window_ratio
    self.minimum_window_size = minimum_window_size
    self.moe_threshold = moe_threshold
    self.confidence = confidence
    self.eps = eps
    self.mode = mode
    self.reset()

`reset()`

Reset all internal state to initial values.

Clears all tracked requests, resets counters, and reinitializes slope checkers. Useful for reusing constraint instances across multiple benchmark runs or resetting state after configuration changes.

Source code in src/guidellm/scheduler/constraints/saturation.py

def reset(self) -> None:
    """
    Reset all internal state to initial values.

    Clears all tracked requests, resets counters, and reinitializes slope
    checkers. Useful for reusing constraint instances across multiple
    benchmark runs or resetting state after configuration changes.
    """
    self.duration = 0.0
    self.started_requests: list[dict[str, Any]] = []
    self.finished_requests: list[dict[str, Any]] = []
    self.ttft_violations_counter = 0
    self.total_finished_ever = 0
    self.total_started_ever = 0
    self._ttft_reported_request_ids: set[str] = set()
    self.concurrent_slope_checker = SlopeChecker(
        moe_threshold=self.moe_threshold, confidence=self.confidence, eps=self.eps
    )
    self.ttft_slope_checker = SlopeChecker(
        moe_threshold=self.moe_threshold, confidence=self.confidence, eps=self.eps
    )

`OverSaturationConstraintArgs`

Bases: ConstraintArgs

Arguments for over-saturation detection constraint.

Detects when a model becomes over-saturated using statistical slope analysis of concurrent requests and time-to-first-token metrics.

Attributes:

Name	Type	Description
`kind`	`Literal['over_saturation']`	Always "over_saturation"
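As an illustration of what "statistical slope analysis" can mean, the sketch below fits an ordinary least-squares line and reports the slope together with a margin of error. It is not the `SlopeChecker` implementation, which is not shown in this section. The function name `slope_with_margin` is hypothetical, a normal quantile stands in for the t-distribution so the example stays within the standard library, and the rule "slope minus margin is positive" is an assumed decision criterion.

```python
import math
from statistics import NormalDist


def slope_with_margin(
    xs: list[float], ys: list[float], confidence: float = 0.95
) -> tuple[float, float]:
    """Return (slope, margin_of_error) of a least-squares fit of ys on xs."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate a margin of error")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    standard_error = math.sqrt(sse / (n - 2) / sxx)

    # Assumption: two-sided normal quantile instead of a t quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return slope, z * standard_error


# TTFT (seconds) growing steadily over elapsed time (seconds).
elapsed = [0.0, 10.0, 20.0, 30.0, 40.0]
ttft = [1.0, 1.6, 1.9, 2.6, 3.0]
slope, margin = slope_with_margin(elapsed, ttft)
print(slope - margin > 0)
```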

Source code in src/guidellm/scheduler/constraints/saturation.py

@ConstraintArgs.register("over_saturation")
class OverSaturationConstraintArgs(ConstraintArgs):
    """
    Arguments for over-saturation detection constraint.

    Detects when a model becomes over-saturated using statistical slope analysis
    of concurrent requests and time-to-first-token metrics.

    :cvar kind: Always "over_saturation"
    """

    kind: Literal["over_saturation"] = Field(
        default="over_saturation",
        description="Constraint type discriminator",
    )
    mode: Literal["enforce", "monitor"] = Field(
        default="enforce",
        description=(
            "Whether to stop the benchmark if over-saturation is detected. "
            "Set to `enforce` to stop the benchmark if over-saturation is "
            "detected, and `monitor` to only report over-saturation."
        ),
    )
    min_seconds: int | float = Field(
        default=30.0,
        ge=0,
        description="Minimum seconds before checking for over-saturation",
    )
    max_window_seconds: int | float = Field(
        default=120.0,
        ge=0,
        description="Maximum over-saturation checking window size in seconds",
    )
    moe_threshold: float = Field(
        default=2.0,
        ge=0,
        description="Margin of error threshold for slope detection",
    )
    minimum_ttft: float = Field(
        default=2.5,
        ge=0,
        description="Minimum TTFT threshold for violation counting",
    )
    maximum_window_ratio: float = Field(
        default=0.75,
        ge=0,
        le=1.0,
        description="Maximum window size as ratio of total requests",
    )
    minimum_window_size: int = Field(
        default=5,
        ge=0,
        description="Minimum data points required for slope estimation",
    )
    confidence: float = Field(
        default=0.95,
        ge=0,
        le=1.0,
        description="Statistical confidence level for t-distribution",
    )

    @property
    def constraint_key(self) -> str:
        return "over_saturation"

`OverSaturationConstraintInitializer`

Bases: PydanticConstraintInitializer

Factory for creating OverSaturationConstraint instances from configuration.

Stores an OverSaturationConstraintArgs instance and delegates to OverSaturationConstraint in create_constraint().

Example:

    from guidellm.scheduler.constraints import (
        OverSaturationConstraintArgs,
        OverSaturationConstraintInitializer,
    )

    args = OverSaturationConstraintArgs(mode="enforce", min_seconds=60.0)
    initializer = OverSaturationConstraintInitializer(args=args)
    constraint = initializer.create_constraint()

Source code in src/guidellm/scheduler/constraints/saturation.py

@ConstraintsInitializerFactory.register("over_saturation")
class OverSaturationConstraintInitializer(PydanticConstraintInitializer):
    """
    Factory for creating OverSaturationConstraint instances from configuration.

    Stores an ``OverSaturationConstraintArgs`` instance and delegates to
    ``OverSaturationConstraint`` in ``create_constraint()``.

    Example:
    ::

        from guidellm.scheduler.constraints import (
            OverSaturationConstraintArgs,
            OverSaturationConstraintInitializer,
        )

        args = OverSaturationConstraintArgs(mode="enforce", min_seconds=60.0)
        initializer = OverSaturationConstraintInitializer(args=args)
        constraint = initializer.create_constraint()
    """

    type_: Literal["over_saturation"] = "over_saturation"  # type: ignore[assignment]
    args: OverSaturationConstraintArgs = Field(
        default_factory=OverSaturationConstraintArgs,
        description="Configuration arguments for over-saturation detection",
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Create an OverSaturationConstraint instance from stored args.

        :param _kwargs: Additional keyword arguments (unused)
        :return: Configured OverSaturationConstraint instance ready for use
        """
        return OverSaturationConstraint(
            minimum_duration=self.args.min_seconds,
            minimum_ttft=self.args.minimum_ttft,
            maximum_window_seconds=self.args.max_window_seconds,
            moe_threshold=self.args.moe_threshold,
            maximum_window_ratio=self.args.maximum_window_ratio,
            minimum_window_size=self.args.minimum_window_size,
            confidence=self.args.confidence,
            mode=self.args.mode,
        )

`create_constraint(**_kwargs)`

Create an OverSaturationConstraint instance from stored args.

Parameters:

Name	Type	Description	Default
`_kwargs`		Additional keyword arguments (unused)	`{}`

Returns:

Type	Description
`Constraint`	Configured OverSaturationConstraint instance ready for use
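Two argument names differ between the args model and the constraint constructor: `min_seconds` becomes `minimum_duration`, and `max_window_seconds` becomes `maximum_window_seconds`. The sketch below shows the same translation with a standard-library dataclass standing in for the Pydantic model. `ArgsSketch` and `to_constructor_kwargs` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ArgsSketch:
    """Stand-in for OverSaturationConstraintArgs (defaults copied from it)."""

    mode: str = "enforce"
    min_seconds: float = 30.0
    max_window_seconds: float = 120.0
    moe_threshold: float = 2.0
    minimum_ttft: float = 2.5
    maximum_window_ratio: float = 0.75
    minimum_window_size: int = 5
    confidence: float = 0.95


def to_constructor_kwargs(args: ArgsSketch) -> dict[str, Any]:
    """Translate args field names into OverSaturationConstraint keyword names."""
    return {
        "minimum_duration": args.min_seconds,  # renamed
        "minimum_ttft": args.minimum_ttft,
        "maximum_window_seconds": args.max_window_seconds,  # renamed
        "moe_threshold": args.moe_threshold,
        "maximum_window_ratio": args.maximum_window_ratio,
        "minimum_window_size": args.minimum_window_size,
        "confidence": args.confidence,
        "mode": args.mode,
    }


kwargs = to_constructor_kwargs(ArgsSketch(min_seconds=60.0))
print(kwargs["minimum_duration"])
```

`eps` is not a field of the args model and is not passed by `create_constraint()`, so the constraint keeps its constructor default of `1e-12`.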

Source code in src/guidellm/scheduler/constraints/saturation.py

def create_constraint(self, **_kwargs) -> Constraint:
    """
    Create an OverSaturationConstraint instance from stored args.

    :param _kwargs: Additional keyword arguments (unused)
    :return: Configured OverSaturationConstraint instance ready for use
    """
    return OverSaturationConstraint(
        minimum_duration=self.args.min_seconds,
        minimum_ttft=self.args.minimum_ttft,
        maximum_window_seconds=self.args.max_window_seconds,
        moe_threshold=self.args.moe_threshold,
        maximum_window_ratio=self.args.maximum_window_ratio,
        minimum_window_size=self.args.minimum_window_size,
        confidence=self.args.confidence,
        mode=self.args.mode,
    )

`PydanticConstraintInitializer`

Bases: StandardBaseModel, ABC, InfoMixin

Abstract base for Pydantic-based constraint initializers.

Provides standardized serialization, validation, and metadata handling for constraint initializers using Pydantic models. Subclasses implement specific constraint creation logic while inheriting validation and persistence support. Integrates with the constraint factory system for dynamic instantiation and configuration management.

Example:

    @ConstraintsInitializerFactory.register("max_duration")
    class MaxDurationConstraintInitializer(PydanticConstraintInitializer):
        type_: str = "max_duration"
        max_seconds: float = Field(description="Maximum duration in seconds")

        def create_constraint(self) -> Constraint:
            def evaluate(state, request):
                if time.time() - state.start_time > self.max_seconds:
                    return SchedulerUpdateAction(request_queuing="stop")
                return SchedulerUpdateAction(request_queuing="continue")
            return evaluate

Attributes:

Name	Type	Description
`type_`	`str`	Type identifier for the constraint initializer

Source code in src/guidellm/scheduler/constraints/constraint.py

class PydanticConstraintInitializer(StandardBaseModel, ABC, InfoMixin):
    """
    Abstract base for Pydantic-based constraint initializers.

    Provides standardized serialization, validation, and metadata handling for
    constraint initializers using Pydantic models. Subclasses implement specific
    constraint creation logic while inheriting validation and persistence support.
    Integrates with the constraint factory system for dynamic instantiation and
    configuration management.

    Example:
    ::
        @ConstraintsInitializerFactory.register("max_duration")
        class MaxDurationConstraintInitializer(PydanticConstraintInitializer):
            type_: str = "max_duration"
            max_seconds: float = Field(description="Maximum duration in seconds")

            def create_constraint(self) -> Constraint:
                def evaluate(state, request):
                    if time.time() - state.start_time > self.max_seconds:
                        return SchedulerUpdateAction(request_queuing="stop")
                    return SchedulerUpdateAction(request_queuing="continue")
                return evaluate

    :cvar type_: Type identifier for the constraint initializer
    """

    type_: str = Field(description="Type identifier for the constraint initializer")

    @property
    def info(self) -> dict[str, Any]:
        """
        Extract serializable information from this constraint initializer.

        :return: Dictionary containing constraint configuration and metadata
        """
        return self.model_dump()

    @abstractmethod
    def create_constraint(self, **kwargs) -> Constraint:
        """
        Create a constraint instance.

        Must be implemented by subclasses to return their specific constraint type
        with appropriate configuration and validation. The returned constraint should
        be ready for evaluation against scheduler state and requests.

        :param kwargs: Additional keyword arguments (usually unused)
        :return: Configured constraint instance
        :raises NotImplementedError: Must be implemented by subclasses
        """
        ...

`info` `property`

Extract serializable information from this constraint initializer.

Returns:

Type	Description
`dict[str, Any]`	Dictionary containing constraint configuration and metadata

`create_constraint(**kwargs)` `abstractmethod`

Create a constraint instance.

Must be implemented by subclasses to return their specific constraint type with appropriate configuration and validation. The returned constraint should be ready for evaluation against scheduler state and requests.

Parameters:

Name	Type	Description	Default
`kwargs`		Additional keyword arguments (usually unused)	`{}`

Returns:

Type	Description
`Constraint`	Configured constraint instance

Raises:

Type	Description
`NotImplementedError`	Must be implemented by subclasses
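With Python's standard `ABC` machinery, a subclass that does not override an abstract method cannot be instantiated at all, and the failure surfaces as a `TypeError` at construction time. A small standard-library sketch of that behavior follows. The class names are hypothetical, and the real base class additionally inherits from a Pydantic model, which this example does not reproduce.

```python
from abc import ABC, abstractmethod
from typing import Any


class InitializerBase(ABC):
    @abstractmethod
    def create_constraint(self, **kwargs: Any) -> Any:
        """Return a configured constraint."""


class Incomplete(InitializerBase):
    pass  # forgot to implement create_constraint


class Complete(InitializerBase):
    def create_constraint(self, **kwargs: Any) -> Any:
        # A constraint is just a callable taking (state, request).
        return lambda state, request: "continue"


try:
    Incomplete()
    outcome = "instantiated"
except TypeError:
    outcome = "TypeError"

print(outcome)
print(Complete().create_constraint()(None, None))
```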

Source code in src/guidellm/scheduler/constraints/constraint.py

@abstractmethod
def create_constraint(self, **kwargs) -> Constraint:
    """
    Create a constraint instance.

    Must be implemented by subclasses to return their specific constraint type
    with appropriate configuration and validation. The returned constraint should
    be ready for evaluation against scheduler state and requests.

    :param kwargs: Additional keyword arguments (usually unused)
    :return: Configured constraint instance
    :raises NotImplementedError: Must be implemented by subclasses
    """
    ...

`RequestsExhaustedConstraint`

Bases: StandardBaseModel, InfoMixin

Source code in src/guidellm/scheduler/constraints/request.py

class RequestsExhaustedConstraint(StandardBaseModel, InfoMixin):
    type_: Literal["requests_exhausted"] = "requests_exhausted"  # type: ignore[assignment]
    num_requests: int

    @property
    def info(self) -> dict[str, Any]:
        """
        Extract serializable information from this constraint initializer.

        :return: Dictionary containing constraint configuration and metadata
        """
        return self.model_dump()

    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        create_exceeded = state.created_requests >= self.num_requests
        processed_exceeded = state.processed_requests >= self.num_requests
        remaining_requests = max(0, self.num_requests - state.processed_requests)
        stop_time = (
            None if remaining_requests > 0 else request.completed_at or time.time()
        )

        return SchedulerUpdateAction(
            request_queuing="stop" if create_exceeded else "continue",
            request_processing="stop_local" if processed_exceeded else "continue",
            metadata={
                "num_requests": self.num_requests,
                "create_exceeded": create_exceeded,
                "processed_exceeded": processed_exceeded,
                "created_requests": state.created_requests,
                "processed_requests": state.processed_requests,
                "remaining_requests": remaining_requests,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(
                remaining_requests=remaining_requests,
                total_requests=self.num_requests,
                stop_time=stop_time,
            ),
        )

`info` `property`

Extract serializable information from this constraint initializer.

Returns:

Type	Description
`dict[str, Any]`	Dictionary containing constraint configuration and metadata
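The `__call__` body of `RequestsExhaustedConstraint` shows its intent: stop queuing once `num_requests` have been created, and stop local processing once that many have been processed. A minimal sketch of the counting logic follows, with a hypothetical `StateSketch` dataclass standing in for `SchedulerState` and a plain dict in place of `SchedulerUpdateAction`.

```python
from dataclasses import dataclass


@dataclass
class StateSketch:
    """Stand-in exposing only the two counters the constraint reads."""

    created_requests: int
    processed_requests: int


def requests_exhausted(state: StateSketch, num_requests: int) -> dict[str, object]:
    """Reproduce the counters and stop signals computed by the constraint."""
    create_exceeded = state.created_requests >= num_requests
    processed_exceeded = state.processed_requests >= num_requests
    # Clamp at zero so an overshoot never reports negative remaining work.
    remaining = max(0, num_requests - state.processed_requests)
    return {
        "request_queuing": "stop" if create_exceeded else "continue",
        "request_processing": "stop_local" if processed_exceeded else "continue",
        "remaining_requests": remaining,
    }


# All 10 requests were created, but only 7 have finished so far.
result = requests_exhausted(StateSketch(10, 7), num_requests=10)
print(result)
```

Queuing stops as soon as the creation count reaches the limit, while processing continues until the in-flight requests have drained.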

`SerializableConstraintInitializer`

Bases: Protocol

Protocol for serializable constraint initializers supporting persistence.

Extends ConstraintInitializer with serialization capabilities, enabling constraint configurations to be saved, loaded, and transmitted. Serializable initializers support validation, model-based configuration, and dictionary-based serialization for integration with configuration systems and persistence layers.

Example:

    class SerializableInitializer:
        @classmethod
        def model_validate(cls, data: dict) -> ConstraintInitializer:
            return cls(**data)

        def model_dump(self) -> dict[str, Any]:
            return {"type_": "max_requests", "max_requests": self.max_requests}

        def create_constraint(self) -> Constraint:
            # ... create constraint
            ...
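Because the protocol is decorated with `@runtime_checkable`, `isinstance` can be used to test whether an object offers the required methods. The sketch below uses a reduced, hypothetical protocol (`SerializableSketch`) rather than the guidellm one. Such checks only confirm that the methods exist, not that their signatures match.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SerializableSketch(Protocol):
    """Reduced stand-in for SerializableConstraintInitializer."""

    def model_dump(self) -> dict[str, Any]: ...

    def create_constraint(self, **kwargs: Any) -> Any: ...


class MaxRequestsSketch:
    """Satisfies the protocol structurally, without inheriting from it."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests

    def model_dump(self) -> dict[str, Any]:
        return {"type_": "max_requests", "max_requests": self.max_requests}

    def create_constraint(self, **kwargs: Any) -> Any:
        # Returns a callable that reports whether the limit has been reached.
        return lambda processed: processed >= self.max_requests


class NotAnInitializer:
    pass


print(isinstance(MaxRequestsSketch(5), SerializableSketch))
print(isinstance(NotAnInitializer(), SerializableSketch))
```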

Source code in src/guidellm/scheduler/constraints/constraint.py

@runtime_checkable
class SerializableConstraintInitializer(Protocol):
    """
    Protocol for serializable constraint initializers supporting persistence.

    Extends ConstraintInitializer with serialization capabilities, enabling constraint
    configurations to be saved, loaded, and transmitted. Serializable initializers
    support validation, model-based configuration, and dictionary-based serialization
    for integration with configuration systems and persistence layers.

    Example:
    ::
        class SerializableInitializer:
            @classmethod
            def model_validate(cls, data: dict) -> ConstraintInitializer:
                return cls(**data)

            def model_dump(self) -> dict[str, Any]:
                return {"type_": "max_requests", "max_requests": self.max_requests}

            def create_constraint(self) -> Constraint:
                # ... create constraint
    """

    @classmethod
    def model_validate(cls, **kwargs) -> ConstraintInitializer:
        """
        Create validated constraint initializer from configuration.

        :param kwargs: Configuration dictionary for initializer creation
        :return: Validated constraint initializer instance
        """

    def model_dump(self) -> dict[str, Any]:
        """
        Serialize constraint initializer to dictionary format.

        :return: Dictionary representation of constraint initializer
        """

    def create_constraint(self, **kwargs) -> Constraint:
        """
        Create constraint instance from this initializer.

        :param kwargs: Additional configuration parameters
        :return: Configured constraint evaluation function
        """

`create_constraint(**kwargs)`

Create constraint instance from this initializer.

Parameters:

Name	Type	Description	Default
`kwargs`		Additional configuration parameters	`{}`

Returns:

Type	Description
`Constraint`	Configured constraint evaluation function

Source code in src/guidellm/scheduler/constraints/constraint.py

def create_constraint(self, **kwargs) -> Constraint:
    """
    Create constraint instance from this initializer.

    :param kwargs: Additional configuration parameters
    :return: Configured constraint evaluation function
    """

`model_dump()`

Serialize constraint initializer to dictionary format.

Returns:

Type	Description
`dict[str, Any]`	Dictionary representation of constraint initializer

Source code in src/guidellm/scheduler/constraints/constraint.py

def model_dump(self) -> dict[str, Any]:
    """
    Serialize constraint initializer to dictionary format.

    :return: Dictionary representation of constraint initializer
    """

`model_validate(**kwargs)` `classmethod`

Create validated constraint initializer from configuration.

Parameters:

Name	Type	Description	Default
`kwargs`		Configuration dictionary for initializer creation	`{}`

Returns:

Type	Description
`ConstraintInitializer`	Validated constraint initializer instance

Source code in src/guidellm/scheduler/constraints/constraint.py

@classmethod
def model_validate(cls, **kwargs) -> ConstraintInitializer:
    """
    Create validated constraint initializer from configuration.

    :param kwargs: Configuration dictionary for initializer creation
    :return: Validated constraint initializer instance
    """

`UnserializableConstraintInitializer`

Bases: PydanticConstraintInitializer

Placeholder for constraints that cannot be serialized or executed.

Represents constraint initializers that failed serialization or contain non-serializable components. Cannot be executed and raises errors when invoked to prevent runtime failures from invalid constraint state. Used by the factory system to preserve constraint information even when full serialization is not possible.

Example:

    # Created automatically by factory when serialization fails
    unserializable = UnserializableConstraintInitializer(
        orig_info={"type_": "custom", "data": non_serializable_object}
    )

    # Attempting to use it raises RuntimeError
    constraint = unserializable.create_constraint()  # Raises RuntimeError

Attributes:

Name	Type	Description
`type_`	`Literal['unserializable']`	Always "unserializable" to identify placeholder constraints
`orig_info`	`dict[str, Any]`	Original constraint information before serialization failure
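Code that loads initializers from stored configuration may want to handle the placeholder gracefully rather than let the `RuntimeError` propagate. A minimal sketch of such a guard follows, using a hypothetical stand-in class (`PlaceholderSketch`) that raises in the same way; `build_or_report` is likewise an illustrative helper, not a guidellm function.

```python
from typing import Any


class PlaceholderSketch:
    """Stand-in that keeps the original info but refuses to build a constraint."""

    type_ = "unserializable"

    def __init__(self, orig_info: dict[str, Any]) -> None:
        self.orig_info = orig_info

    def create_constraint(self, **_kwargs: Any) -> Any:
        raise RuntimeError("Cannot create constraint from unserializable instance.")


def build_or_report(initializer: Any) -> str:
    """Try to build a constraint; fall back to describing the placeholder."""
    try:
        initializer.create_constraint()
    except RuntimeError:
        return f"skipped: {initializer.orig_info.get('type_', 'unknown')}"
    return "built"


print(build_or_report(PlaceholderSketch({"type_": "custom"})))
```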

Source code in src/guidellm/scheduler/constraints/constraint.py

class UnserializableConstraintInitializer(PydanticConstraintInitializer):
    """
    Placeholder for constraints that cannot be serialized or executed.

    Represents constraint initializers that failed serialization or contain
    non-serializable components. Cannot be executed and raises errors when
    invoked to prevent runtime failures from invalid constraint state. Used
    by the factory system to preserve constraint information even when full
    serialization is not possible.

    Example:
    ::
        # Created automatically by factory when serialization fails
        unserializable = UnserializableConstraintInitializer(
            orig_info={"type_": "custom", "data": non_serializable_object}
        )

        # Attempting to use it raises RuntimeError
        constraint = unserializable.create_constraint()  # Raises RuntimeError

    :cvar type_: Always "unserializable" to identify placeholder constraints
    :cvar orig_info: Original constraint information before serialization failure
    """

    type_: Literal["unserializable"] = "unserializable"  # type: ignore[assignment]
    orig_info: dict[str, Any] = Field(
        default_factory=dict,
        description="Original constraint information before serialization failure",
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Raise error for unserializable constraint creation attempt.

        :param _kwargs: Additional keyword arguments (unused)
        :raises RuntimeError: Always raised since unserializable constraints
            cannot be executed
        """
        raise RuntimeError(
            "Cannot create constraint from unserializable constraint instance. "
            "This constraint cannot be serialized and therefore cannot be executed."
        )

    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Raise error since unserializable constraints cannot be invoked.

        :param state: Current scheduler state (unused)
        :param request: Individual request information (unused)
        :raises RuntimeError: Always raised for unserializable constraints
        """
        _ = (state, request)  # Unused parameters
        raise RuntimeError(
            "Cannot invoke unserializable constraint instance. "
            "This constraint was not properly serialized and cannot be executed."
        )

`__call__(state, request)`

Raise error since unserializable constraints cannot be invoked.

Parameters:

Name	Type	Description	Default
`state`	`SchedulerState`	Current scheduler state (unused)	required
`request`	`RequestInfo`	Individual request information (unused)	required

Raises:

Type	Description
`RuntimeError`	Always raised for unserializable constraints

Source code in src/guidellm/scheduler/constraints/constraint.py

def __call__(
    self, state: SchedulerState, request: RequestInfo
) -> SchedulerUpdateAction:
    """
    Raise error since unserializable constraints cannot be invoked.

    :param state: Current scheduler state (unused)
    :param request: Individual request information (unused)
    :raises RuntimeError: Always raised for unserializable constraints
    """
    _ = (state, request)  # Unused parameters
    raise RuntimeError(
        "Cannot invoke unserializable constraint instance. "
        "This constraint was not properly serialized and cannot be executed."
    )

`create_constraint(**_kwargs)`

Raise error for unserializable constraint creation attempt.

Parameters:

Name	Type	Description	Default
`_kwargs`		Additional keyword arguments (unused)	`{}`

Raises:

Type	Description
`RuntimeError`	Always raised since unserializable constraints cannot be executed

Source code in src/guidellm/scheduler/constraints/constraint.py

def create_constraint(self, **_kwargs) -> Constraint:
    """
    Raise error for unserializable constraint creation attempt.

    :param kwargs: Additional keyword arguments (unused)
    :raises RuntimeError: Always raised since unserializable constraints
        cannot be executed
    """
    raise RuntimeError(
        "Cannot create constraint from unserializable constraint instance. "
        "This constraint cannot be serialized and therefore cannot be executed."
    )
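Since `create_constraint` fails in the same way, one option is to build all constraints up front and record which initializers could not produce one. The sketch below illustrates this under stated assumptions: `PlaceholderInitializer`, `AlwaysContinueInitializer`, and `build_constraints` are hypothetical names invented for the example and are not part of guidellm, and the string `"continue"` stands in for a real `SchedulerUpdateAction`.

```python
from typing import Any, Callable, Dict, List, Tuple


class PlaceholderInitializer:
    """Hypothetical stand-in mirroring the documented always-raise behavior."""

    def create_constraint(self, **_kwargs: Any) -> Callable[[Any, Any], Any]:
        raise RuntimeError(
            "Cannot create constraint from unserializable constraint instance. "
            "This constraint cannot be serialized and therefore cannot be executed."
        )


class AlwaysContinueInitializer:
    """Hypothetical initializer whose constraint never stops processing."""

    def create_constraint(self, **_kwargs: Any) -> Callable[[Any, Any], Any]:
        # The returned string stands in for a real update action object.
        return lambda state, request: "continue"


def build_constraints(
    initializers: Dict[str, Any],
) -> Tuple[Dict[str, Callable[[Any, Any], Any]], List[str]]:
    """Create every constraint that can be created and list the keys that failed."""
    built: Dict[str, Callable[[Any, Any], Any]] = {}
    failed: List[str] = []
    for key, initializer in initializers.items():
        try:
            built[key] = initializer.create_constraint()
        except RuntimeError:
            failed.append(key)
    return built, failed


built, failed = build_constraints(
    {
        "always_continue": AlwaysContinueInitializer(),
        "restored": PlaceholderInitializer(),
    }
)
```

Collecting the failed keys before a run starts surfaces the problem early, rather than at the first request the scheduler evaluates.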

