
guidellm.scheduler.constraints

Constraint system for scheduler behavior control and request processing limits.

Provides flexible constraints for managing scheduler behavior with configurable thresholds based on time, error rates, and request counts. Constraints evaluate scheduler state and individual requests to determine whether processing should continue or stop based on predefined limits. The constraint system enables sophisticated benchmark stopping criteria through composable constraint types.

Constraint

Bases: Protocol

Protocol for constraint evaluation functions that control scheduler behavior.

Defines the interface that all constraint implementations must follow. Constraints are callable objects that evaluate scheduler state and request information to determine whether processing should continue or stop. The protocol enables type checking and runtime validation of constraint implementations while allowing flexible implementation approaches (functions, classes, closures).

Example:

    def my_constraint(
        state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        if state.processing_requests > 100:
            return SchedulerUpdateAction(request_queuing="stop")
        return SchedulerUpdateAction(request_queuing="continue")
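
The runtime validation mentioned above can be illustrated with a minimal sketch. The `SchedulerState`, `RequestInfo`, and `SchedulerUpdateAction` classes below are simplified stand-ins (assumptions for illustration, not the guidellm definitions), so the snippet runs without the library installed. It shows that a plain function satisfies a `@runtime_checkable` protocol whose only member is `__call__`.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Hypothetical stand-ins for the real guidellm types, reduced to the
# fields this sketch needs.
@dataclass
class SchedulerState:
    processing_requests: int = 0


@dataclass
class RequestInfo:
    status: str = "queued"


@dataclass
class SchedulerUpdateAction:
    request_queuing: str = "continue"


@runtime_checkable
class Constraint(Protocol):
    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction: ...


def my_constraint(
    state: SchedulerState, request: RequestInfo
) -> SchedulerUpdateAction:
    # Stop queuing once more than 100 requests are in flight.
    if state.processing_requests > 100:
        return SchedulerUpdateAction(request_queuing="stop")
    return SchedulerUpdateAction(request_queuing="continue")


# A function has __call__, so it passes the structural check;
# a string instance does not.
is_constraint = isinstance(my_constraint, Constraint)
not_constraint = isinstance("not callable", Constraint)
action = my_constraint(SchedulerState(processing_requests=150), RequestInfo())
```

Note that `isinstance` against a runtime-checkable protocol only checks that the member exists; it does not validate the call signature or the return type.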

Source code in src/guidellm/scheduler/constraints/constraint.py
@runtime_checkable
class Constraint(Protocol):
    """
    Protocol for constraint evaluation functions that control scheduler behavior.

    Defines the interface that all constraint implementations must follow. Constraints
    are callable objects that evaluate scheduler state and request information to
    determine whether processing should continue or stop. The protocol enables type
    checking and runtime validation of constraint implementations while allowing
    flexible implementation approaches (functions, classes, closures).

    Example:
    ::
        def my_constraint(
            state: SchedulerState, request: RequestInfo
        ) -> SchedulerUpdateAction:
            if state.processing_requests > 100:
                return SchedulerUpdateAction(request_queuing="stop")
            return SchedulerUpdateAction(request_queuing="continue")
    """

    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against scheduler state and request information.

        :param state: Current scheduler state with metrics and timing information
        :param request: Individual request information and metadata
        :return: Action indicating whether to continue or stop scheduler operations
        """

__call__(state, request)

Evaluate constraint against scheduler state and request information.

Parameters:

Name     Type            Description                                                  Default
state    SchedulerState  Current scheduler state with metrics and timing information  required
request  RequestInfo     Individual request information and metadata                  required

Returns:

Type                   Description
SchedulerUpdateAction  Action indicating whether to continue or stop scheduler operations

Source code in src/guidellm/scheduler/constraints/constraint.py
def __call__(
    self, state: SchedulerState, request: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against scheduler state and request information.

    :param state: Current scheduler state with metrics and timing information
    :param request: Individual request information and metadata
    :return: Action indicating whether to continue or stop scheduler operations
    """

ConstraintArgs

Bases: PydanticClassRegistryMixin['ConstraintArgs']

Base class for constraint configuration arguments.

Uses PydanticClassRegistryMixin to enable polymorphic deserialization based on the kind field. Each registered subclass represents a specific constraint type with its own parameters.

Attributes:

Name                  Type  Description
schema_discriminator  str   Field name for polymorphic deserialization
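
The idea of dispatching on the `kind` field can be sketched with the standard library alone. This is not how `PydanticClassRegistryMixin` is implemented; the `ARGS_REGISTRY` dict, the `register` decorator, the `load_args` helper, and the default `window=30` are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical registry mapping a "kind" string to an args class.
ARGS_REGISTRY: dict[str, type] = {}


def register(kind: str):
    """Decorator that records a class under its discriminator value."""
    def decorator(cls: type) -> type:
        ARGS_REGISTRY[kind] = cls
        return cls
    return decorator


@register("max_duration")
@dataclass
class MaxDurationArgs:
    seconds: float
    kind: str = "max_duration"


@register("max_error_rate")
@dataclass
class MaxErrorRateArgs:
    rate: float
    window: int = 30  # illustrative default, not the library's setting
    kind: str = "max_error_rate"


def load_args(data: dict[str, Any]):
    """Pick the subclass from data['kind'] and build it from the dict."""
    kind = data.get("kind")
    if kind not in ARGS_REGISTRY:
        raise ValueError(f"Unknown constraint kind: {kind}")
    return ARGS_REGISTRY[kind](**data)


loaded = load_args({"kind": "max_duration", "seconds": 30})
```

The caller never names the concrete class: the discriminator value alone selects which registered type is constructed.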

Source code in src/guidellm/scheduler/constraints/args.py
class ConstraintArgs(PydanticClassRegistryMixin["ConstraintArgs"]):
    """
    Base class for constraint configuration arguments.

    Uses ``PydanticClassRegistryMixin`` to enable polymorphic deserialization
    based on the ``kind`` field. Each registered subclass represents a specific
    constraint type with its own parameters.

    :cvar schema_discriminator: Field name for polymorphic deserialization
    """

    model_config = ConfigDict(
        extra="forbid",
        serialize_by_alias=True,
        ser_json_bytes="base64",
        val_json_bytes="base64",
    )

    schema_discriminator: ClassVar[str] = "kind"

    @classmethod
    def __pydantic_schema_base_type__(cls) -> type[ConstraintArgs]:
        """
        Return base type for polymorphic validation hierarchy.

        :return: Base ConstraintArgs class for schema validation
        """
        if cls.__name__ == "ConstraintArgs":
            return cls

        return ConstraintArgs

    kind: str = Field(
        description="Constraint type discriminator for polymorphic serialization",
    )

    @property
    def constraint_key(self) -> str:
        """
        The key to use when inserting into the constraints dict.

        Defaults to ``kind``, but subclasses may override if the factory
        registry key differs from the args kind.

        :return: Registry key for this constraint type
        """
        return self.kind

constraint_key property

The key to use when inserting into the constraints dict.

Defaults to kind, but subclasses may override if the factory registry key differs from the args kind.

Returns:

Type  Description
str   Registry key for this constraint type

__pydantic_schema_base_type__() classmethod

Return base type for polymorphic validation hierarchy.

Returns:

Type                  Description
type[ConstraintArgs]  Base ConstraintArgs class for schema validation

Source code in src/guidellm/scheduler/constraints/args.py
@classmethod
def __pydantic_schema_base_type__(cls) -> type[ConstraintArgs]:
    """
    Return base type for polymorphic validation hierarchy.

    :return: Base ConstraintArgs class for schema validation
    """
    if cls.__name__ == "ConstraintArgs":
        return cls

    return ConstraintArgs

ConstraintInitializer

Bases: Protocol

Protocol for constraint initializer factory functions that create constraints.

Defines the interface for factory objects that create constraint instances from configuration parameters. Constraint initializers enable dynamic constraint creation and configuration, supporting both simple boolean flags and complex parameter dictionaries. The protocol allows type checking while maintaining flexibility for different initialization patterns.

Example:

    class MaxRequestsInitializer:
        def __init__(self, max_requests: int):
            self.max_requests = max_requests

        def create_constraint(self) -> Constraint:
            def evaluate(state, request):
                if state.total_requests >= self.max_requests:
                    return SchedulerUpdateAction(request_queuing="stop")
                return SchedulerUpdateAction(request_queuing="continue")
            return evaluate

Source code in src/guidellm/scheduler/constraints/constraint.py
@runtime_checkable
class ConstraintInitializer(Protocol):
    """
    Protocol for constraint initializer factory functions that create constraints.

    Defines the interface for factory objects that create constraint instances from
    configuration parameters. Constraint initializers enable dynamic constraint
    creation and configuration, supporting both simple boolean flags and complex
    parameter dictionaries. The protocol allows type checking while maintaining
    flexibility for different initialization patterns.

    Example:
    ::
        class MaxRequestsInitializer:
            def __init__(self, max_requests: int):
                self.max_requests = max_requests

            def create_constraint(self) -> Constraint:
                def evaluate(state, request):
                    if state.total_requests >= self.max_requests:
                        return SchedulerUpdateAction(request_queuing="stop")
                    return SchedulerUpdateAction(request_queuing="continue")
                return evaluate
    """

    def create_constraint(self, **kwargs) -> Constraint:
        """
        Create a constraint instance from configuration parameters.

        :param kwargs: Configuration parameters for constraint creation
        :return: Configured constraint evaluation function
        """

create_constraint(**kwargs)

Create a constraint instance from configuration parameters.

Parameters:

Name    Type  Description                                       Default
kwargs  -     Configuration parameters for constraint creation  {}

Returns:

Type        Description
Constraint  Configured constraint evaluation function

Source code in src/guidellm/scheduler/constraints/constraint.py
def create_constraint(self, **kwargs) -> Constraint:
    """
    Create a constraint instance from configuration parameters.

    :param kwargs: Configuration parameters for constraint creation
    :return: Configured constraint evaluation function
    """

ConstraintsInitializerFactory

Bases: RegistryMixin[ConstraintInitializer]

Registry factory for creating and managing constraint initializers.

Provides centralized access to registered constraint types with support for creating constraints from ConstraintArgs instances or pre-configured initializer instances. Handles constraint resolution and type validation for the scheduler constraint system.

Example:

    from guidellm.scheduler import ConstraintsInitializerFactory

    # Register new constraint type
    @ConstraintsInitializerFactory.register("new_constraint")
    class NewConstraint:
        def create_constraint(self, **kwargs) -> Constraint:
            return lambda state, request: SchedulerUpdateAction()

    # Create and use constraint
    args = NewConstraintArgs(kind="new_constraint")
    initializer = ConstraintsInitializerFactory.create(args)
    constraint = initializer.create_constraint()

Source code in src/guidellm/scheduler/constraints/factory.py
class ConstraintsInitializerFactory(RegistryMixin[ConstraintInitializer]):
    """
    Registry factory for creating and managing constraint initializers.

    Provides centralized access to registered constraint types with support for
    creating constraints from ``ConstraintArgs`` instances or pre-configured
    initializer instances. Handles constraint resolution and type validation
    for the scheduler constraint system.

    Example:
    ::
        from guidellm.scheduler import ConstraintsInitializerFactory

        # Register new constraint type
        @ConstraintsInitializerFactory.register("new_constraint")
        class NewConstraint:
            def create_constraint(self, **kwargs) -> Constraint:
                return lambda state, request: SchedulerUpdateAction()

        # Create and use constraint
        args = NewConstraintArgs(kind="new_constraint")
        initializer = ConstraintsInitializerFactory.create(args)
        constraint = initializer.create_constraint()
    """

    @classmethod
    def create(cls, args: ConstraintArgs) -> ConstraintInitializer:
        """
        Create a constraint initializer from a ``ConstraintArgs`` instance.

        :param args: Validated constraint arguments with kind discriminator
        :return: Configured constraint initializer instance
        :raises ValueError: If args.kind is not registered in the factory
        """
        if cls.registry is None or args.kind not in cls.registry:
            raise ValueError(f"Unknown constraint discriminator: {args.kind}")

        initializer_class = cls.registry[args.kind]
        return initializer_class(args=args)  # type: ignore[operator]

    @classmethod
    def deserialize(
        cls, initializer_dict: dict[str, Any]
    ) -> SerializableConstraintInitializer | UnserializableConstraintInitializer:
        """
        Deserialize constraint initializer from dictionary format.

        :param initializer_dict: Dictionary representation of constraint initializer
        :return: Reconstructed constraint initializer instance
        :raises ValueError: If constraint type is unknown or cannot be deserialized
        """
        if initializer_dict.get("type_") == "unserializable":
            return UnserializableConstraintInitializer.model_validate(initializer_dict)

        if (
            cls.registry is not None
            and initializer_dict.get("type_")
            and initializer_dict["type_"] in cls.registry
        ):
            initializer_class = cls.registry[initializer_dict["type_"]]
            if hasattr(initializer_class, "model_validate"):
                return initializer_class.model_validate(initializer_dict)  # type: ignore[return-value]
            else:
                return initializer_class(**initializer_dict)  # type: ignore[return-value,operator]

        raise ValueError(
            f"Cannot deserialize unknown constraint initializer: "
            f"{initializer_dict.get('type_', 'unknown')}"
        )

    @classmethod
    def resolve(
        cls,
        initializers: dict[
            str,
            Constraint | ConstraintInitializer,
        ],
    ) -> dict[str, Constraint]:
        """
        Resolve constraint initializers to callable constraints.

        :param initializers: Dictionary mapping constraint keys to specifications.
            Values must be Constraint instances or ConstraintInitializer instances.
        :return: Dictionary mapping constraint keys to callable functions
        :raises TypeError: If a value is not a supported type
        """
        constraints = {}

        for key, val in initializers.items():
            if isinstance(val, Constraint):
                constraints[key] = val
            elif isinstance(val, ConstraintInitializer):
                constraints[key] = val.create_constraint()
            else:
                raise TypeError(
                    f"Constraint '{key}' has unsupported value type "
                    f"{type(val).__name__}. Expected a Constraint instance or "
                    f"ConstraintInitializer instance."
                )

        return constraints

create(args) classmethod

Create a constraint initializer from a ConstraintArgs instance.

Parameters:

Name  Type            Description                                             Default
args  ConstraintArgs  Validated constraint arguments with kind discriminator  required

Returns:

Type                   Description
ConstraintInitializer  Configured constraint initializer instance

Raises:

Type        Description
ValueError  If args.kind is not registered in the factory

Source code in src/guidellm/scheduler/constraints/factory.py
@classmethod
def create(cls, args: ConstraintArgs) -> ConstraintInitializer:
    """
    Create a constraint initializer from a ``ConstraintArgs`` instance.

    :param args: Validated constraint arguments with kind discriminator
    :return: Configured constraint initializer instance
    :raises ValueError: If args.kind is not registered in the factory
    """
    if cls.registry is None or args.kind not in cls.registry:
        raise ValueError(f"Unknown constraint discriminator: {args.kind}")

    initializer_class = cls.registry[args.kind]
    return initializer_class(args=args)  # type: ignore[operator]

deserialize(initializer_dict) classmethod

Deserialize constraint initializer from dictionary format.

Parameters:

Name              Type            Description                                          Default
initializer_dict  dict[str, Any]  Dictionary representation of constraint initializer  required

Returns:

Type                                                                     Description
SerializableConstraintInitializer | UnserializableConstraintInitializer  Reconstructed constraint initializer instance

Raises:

Type        Description
ValueError  If constraint type is unknown or cannot be deserialized

Source code in src/guidellm/scheduler/constraints/factory.py
@classmethod
def deserialize(
    cls, initializer_dict: dict[str, Any]
) -> SerializableConstraintInitializer | UnserializableConstraintInitializer:
    """
    Deserialize constraint initializer from dictionary format.

    :param initializer_dict: Dictionary representation of constraint initializer
    :return: Reconstructed constraint initializer instance
    :raises ValueError: If constraint type is unknown or cannot be deserialized
    """
    if initializer_dict.get("type_") == "unserializable":
        return UnserializableConstraintInitializer.model_validate(initializer_dict)

    if (
        cls.registry is not None
        and initializer_dict.get("type_")
        and initializer_dict["type_"] in cls.registry
    ):
        initializer_class = cls.registry[initializer_dict["type_"]]
        if hasattr(initializer_class, "model_validate"):
            return initializer_class.model_validate(initializer_dict)  # type: ignore[return-value]
        else:
            return initializer_class(**initializer_dict)  # type: ignore[return-value,operator]

    raise ValueError(
        f"Cannot deserialize unknown constraint initializer: "
        f"{initializer_dict.get('type_', 'unknown')}"
    )

resolve(initializers) classmethod

Resolve constraint initializers to callable constraints.

Parameters:

Name          Type                                           Description                                                                                                                   Default
initializers  dict[str, Constraint | ConstraintInitializer]  Dictionary mapping constraint keys to specifications. Values must be Constraint instances or ConstraintInitializer instances.  required

Returns:

Type                   Description
dict[str, Constraint]  Dictionary mapping constraint keys to callable functions

Raises:

Type       Description
TypeError  If a value is not a supported type
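
The resolution rule described above (pass constraints through, call `create_constraint()` on initializers, reject everything else) can be sketched as follows. This is a simplified illustration, not the library code: `callable(...)` and `hasattr(..., "create_constraint")` stand in for the `isinstance` checks against the two protocols, states are plain dicts, and actions are plain strings.

```python
from typing import Any, Callable


class MaxRequestsInitializer:
    """Hypothetical initializer: not callable, only builds a constraint."""

    def __init__(self, max_requests: int):
        self.max_requests = max_requests

    def create_constraint(self, **_kwargs) -> Callable[[dict, dict], str]:
        def evaluate(state: dict, request: dict) -> str:
            reached = state["total_requests"] >= self.max_requests
            return "stop" if reached else "continue"

        return evaluate


def resolve(initializers: dict[str, Any]) -> dict[str, Callable]:
    constraints: dict[str, Callable] = {}
    for key, val in initializers.items():
        if callable(val):
            # Already a constraint: use it as-is.
            constraints[key] = val
        elif hasattr(val, "create_constraint"):
            # An initializer: ask it to build the constraint.
            constraints[key] = val.create_constraint()
        else:
            raise TypeError(
                f"Constraint '{key}' has unsupported value type "
                f"{type(val).__name__}."
            )
    return constraints


resolved = resolve(
    {
        "always_continue": lambda state, request: "continue",
        "max_requests": MaxRequestsInitializer(10),
    }
)
```

As in the source listing, the callable check comes first, so a value that is both callable and an initializer is used directly rather than being asked to create a new constraint.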

Source code in src/guidellm/scheduler/constraints/factory.py
@classmethod
def resolve(
    cls,
    initializers: dict[
        str,
        Constraint | ConstraintInitializer,
    ],
) -> dict[str, Constraint]:
    """
    Resolve constraint initializers to callable constraints.

    :param initializers: Dictionary mapping constraint keys to specifications.
        Values must be Constraint instances or ConstraintInitializer instances.
    :return: Dictionary mapping constraint keys to callable functions
    :raises TypeError: If a value is not a supported type
    """
    constraints = {}

    for key, val in initializers.items():
        if isinstance(val, Constraint):
            constraints[key] = val
        elif isinstance(val, ConstraintInitializer):
            constraints[key] = val.create_constraint()
        else:
            raise TypeError(
                f"Constraint '{key}' has unsupported value type "
                f"{type(val).__name__}. Expected a Constraint instance or "
                f"ConstraintInitializer instance."
            )

    return constraints

MaxDurationConstraint

Bases: PydanticConstraintInitializer

Constraint that limits execution based on maximum time duration.

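Stops both request queuing and processing once the elapsed time reaches the maximum duration. Elapsed time is measured from the start of requests when that timestamp is available, and from scheduler start otherwise. Provides progress tracking based on remaining time and completion fraction.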
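
The timing arithmetic can be sketched as a pure function. The function name `evaluate_duration` and the returned dict layout are hypothetical; the current time is passed in as `now` (an assumption made here for determinism) rather than read from `time.time()`.

```python
from typing import Optional


def evaluate_duration(start_time: float, now: float, max_duration: float) -> dict:
    """Compute the duration check for one evaluation."""
    elapsed = now - start_time
    # The limit counts as hit when elapsed time reaches it, not only after.
    exceeded = elapsed >= max_duration
    # Clamp remaining time into [0, max_duration].
    remaining = min(max(0.0, max_duration - elapsed), max_duration)
    stop_time: Optional[float] = start_time + max_duration if exceeded else None
    return {
        "elapsed": elapsed,
        "exceeded": exceeded,
        "remaining": remaining,
        "stop_time": stop_time,
    }


running = evaluate_duration(start_time=100.0, now=104.0, max_duration=10.0)
finished = evaluate_duration(start_time=100.0, now=112.5, max_duration=10.0)
```

Here `running` still has 6.0 seconds left, while `finished` reports the limit as hit with a `stop_time` of 110.0, the moment the limit was reached rather than the moment it was detected.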

Source code in src/guidellm/scheduler/constraints/request.py
@ConstraintsInitializerFactory.register("max_duration")
class MaxDurationConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on maximum time duration.

    Stops both request queuing and processing when the elapsed time since scheduler
    start exceeds the maximum duration. Provides progress tracking based on
    remaining time and completion fraction.
    """

    type_: Literal["max_duration"] = "max_duration"  # type: ignore[assignment]
    args: MaxDurationConstraintArgs = Field(
        description="Configuration arguments for max duration constraint",
    )
    current_index: int = Field(default=-1, description="Current index in duration list")

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Return self as the constraint instance.

        :param kwargs: Additional keyword arguments (unused)
        :return: Self instance as the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against current scheduler state and elapsed time.

        :param state: Current scheduler state with start time
        :param request_info: Individual request information (unused)
        :return: Action indicating whether to continue or stop operations
        """
        _ = request_info  # Unused parameters
        current_index = max(0, self.current_index)
        max_duration = (
            self.args.seconds
            if isinstance(self.args.seconds, int | float)
            else self.args.seconds[min(current_index, len(self.args.seconds) - 1)]
        )

        start_time = state.start_requests_time or state.start_time
        current_time = time.time()
        elapsed = current_time - start_time
        duration_exceeded = elapsed >= max_duration
        remaining_duration = min(max(0.0, max_duration - elapsed), max_duration)
        stop_time = None if not duration_exceeded else start_time + max_duration

        return SchedulerUpdateAction(
            request_queuing="stop" if duration_exceeded else "continue",
            request_processing="stop_local" if duration_exceeded else "continue",
            metadata={
                "max_duration": max_duration,
                "elapsed_time": elapsed,
                "duration_exceeded": duration_exceeded,
                "start_time": start_time,
                "current_time": current_time,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(
                remaining_duration=remaining_duration,
                total_duration=max_duration,
                stop_time=stop_time,
            ),
        )

__call__(state, request_info)

Evaluate constraint against current scheduler state and elapsed time.

Parameters:

Name          Type            Description                              Default
state         SchedulerState  Current scheduler state with start time  required
request_info  RequestInfo     Individual request information (unused)  required

Returns:

Type                   Description
SchedulerUpdateAction  Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/request.py
def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against current scheduler state and elapsed time.

    :param state: Current scheduler state with start time
    :param request_info: Individual request information (unused)
    :return: Action indicating whether to continue or stop operations
    """
    _ = request_info  # Unused parameters
    current_index = max(0, self.current_index)
    max_duration = (
        self.args.seconds
        if isinstance(self.args.seconds, int | float)
        else self.args.seconds[min(current_index, len(self.args.seconds) - 1)]
    )

    start_time = state.start_requests_time or state.start_time
    current_time = time.time()
    elapsed = current_time - start_time
    duration_exceeded = elapsed >= max_duration
    remaining_duration = min(max(0.0, max_duration - elapsed), max_duration)
    stop_time = None if not duration_exceeded else start_time + max_duration

    return SchedulerUpdateAction(
        request_queuing="stop" if duration_exceeded else "continue",
        request_processing="stop_local" if duration_exceeded else "continue",
        metadata={
            "max_duration": max_duration,
            "elapsed_time": elapsed,
            "duration_exceeded": duration_exceeded,
            "start_time": start_time,
            "current_time": current_time,
            "stop_time": stop_time,
        },
        progress=SchedulerProgress(
            remaining_duration=remaining_duration,
            total_duration=max_duration,
            stop_time=stop_time,
        ),
    )

create_constraint(**_kwargs)

Increment current_index and return a copy of this initializer to act as the constraint instance.

Parameters:

Name     Type  Description                            Default
_kwargs  -     Additional keyword arguments (unused)  {}

Returns:

Type        Description
Constraint  Copy of this initializer, used as the constraint

Source code in src/guidellm/scheduler/constraints/request.py
def create_constraint(self, **_kwargs) -> Constraint:
    """
    Return self as the constraint instance.

    :param kwargs: Additional keyword arguments (unused)
    :return: Self instance as the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())

MaxDurationConstraintArgs

Bases: ConstraintArgs

Arguments for maximum duration constraint.

Limits benchmark execution time per strategy.

Attributes:

Name  Type                     Description
kind  Literal['max_duration']  Always "max_duration"
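
In the source listing for `MaxDurationConstraint` on this page, `seconds` may be a single number or a list, and a list is indexed by `current_index`, which `create_constraint` advances on each call. A minimal sketch of that selection rule follows; the helper name `pick_for_index` is hypothetical.

```python
from typing import Sequence, Union

Number = Union[int, float]


def pick_for_index(value: Union[Number, Sequence[Number]], index: int) -> Number:
    """Return a scalar as-is, or the list entry for a clamped index."""
    if isinstance(value, (int, float)):
        return value
    # Negative indices (such as the initial -1) map to the first entry;
    # indices past the end reuse the last entry.
    clamped = min(max(0, index), len(value) - 1)
    return value[clamped]


scalar_limit = pick_for_index(30, 5)
first_limit = pick_for_index([10, 20, 30], -1)
second_limit = pick_for_index([10, 20, 30], 1)
overflow_limit = pick_for_index([10, 20, 30], 7)
```

Clamping means a list shorter than the number of runs does not raise an `IndexError`; later runs simply reuse the final value.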

Source code in src/guidellm/scheduler/constraints/request.py
@ConstraintArgs.register("max_duration")
class MaxDurationConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum duration constraint.

    Limits benchmark execution time per strategy.

    :cvar kind: Always "max_duration"
    """

    kind: Literal["max_duration"] = Field(
        default="max_duration",
        description="Constraint type discriminator",
    )
    seconds: PositiveNumOrList = Field(
        description="Maximum duration in seconds before stopping execution",
    )

MaxErrorRateConstraint

Bases: PydanticConstraintInitializer

Constraint that limits execution based on sliding window error rate.

Tracks error status of recent requests in a sliding window and stops all processing when the error rate exceeds the threshold. Only applies the constraint after processing enough requests to fill the minimum window size for statistical significance.
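
The sliding-window check can be sketched in a few lines. This is an illustration under stated assumptions, not the library implementation: the class name `ErrorWindow` is hypothetical, the window size is taken to be an integer, and a `collections.deque` with `maxlen` replaces the list-plus-`pop(0)` approach used in the source listing.

```python
from collections import deque


class ErrorWindow:
    """Track recent request outcomes and test them against a rate limit."""

    def __init__(self, max_rate: float, window: int):
        self.max_rate = max_rate
        self.window = window
        # deque(maxlen=...) drops the oldest entry automatically.
        self.results: deque = deque(maxlen=window)

    def record(self, status: str, processed_requests: int) -> bool:
        """Record one request and return True if processing should stop."""
        # Only finished requests contribute to the window.
        if status in ("completed", "errored", "cancelled"):
            self.results.append(status == "errored")
        error_rate = sum(self.results) / len(self.results) if self.results else 0.0
        # The limit applies only once enough requests have been processed.
        enough_processed = processed_requests >= self.window
        return enough_processed and error_rate >= self.max_rate


tracker = ErrorWindow(max_rate=0.5, window=4)
statuses = ["completed", "errored", "errored", "completed", "errored"]
decisions = [
    tracker.record(status, processed_requests=i + 1)
    for i, status in enumerate(statuses)
]
```

The first three calls return `False` even though the error rate is already at or above 0.5 after the second request, because fewer than `window` requests have been processed.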

Source code in src/guidellm/scheduler/constraints/error.py
@ConstraintsInitializerFactory.register("max_error_rate")
class MaxErrorRateConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on sliding window error rate.

    Tracks error status of recent requests in a sliding window and stops all
    processing when the error rate exceeds the threshold. Only applies the
    constraint after processing enough requests to fill the minimum window size
    for statistical significance.
    """

    type_: Literal["max_error_rate"] = "max_error_rate"  # type: ignore[assignment]
    args: MaxErrorRateConstraintArgs = Field(
        description="Configuration arguments for max error rate constraint",
    )
    error_window: list[bool] = Field(
        default_factory=list,
        description="Sliding window tracking error status of recent requests",
    )
    current_index: int = Field(
        default=-1, description="Current index in the error window"
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Create a new instance of MaxErrorRateConstraint (due to stateful window).

        :param kwargs: Additional keyword arguments (unused)
        :return: New instance of the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against sliding window error rate.

        :param state: Current scheduler state with request counts
        :param request_info: Individual request with completion status
        :return: Action indicating whether to continue or stop operations
        """
        current_index = max(0, self.current_index)
        max_error_rate = (
            self.args.rate
            if isinstance(self.args.rate, int | float)
            else self.args.rate[min(current_index, len(self.args.rate) - 1)]
        )

        if request_info.status in ["completed", "errored", "cancelled"]:
            self.error_window.append(request_info.status == "errored")
            if len(self.error_window) > self.args.window:
                self.error_window.pop(0)

        error_count = sum(self.error_window)
        window_requests = len(self.error_window)
        error_rate = (
            error_count / float(window_requests) if window_requests > 0 else 0.0
        )
        exceeded_min_processed = state.processed_requests >= self.args.window
        exceeded_error_rate = error_rate >= max_error_rate
        exceeded = exceeded_min_processed and exceeded_error_rate
        stop_time = None if not exceeded else request_info.completed_at or time.time()

        return SchedulerUpdateAction(
            request_queuing="stop" if exceeded else "continue",
            request_processing="stop_all" if exceeded else "continue",
            metadata={
                "max_error_rate": max_error_rate,
                "window_size": self.args.window,
                "error_count": error_count,
                "processed_count": state.processed_requests,
                "current_window_size": len(self.error_window),
                "current_error_rate": error_rate,
                "exceeded_min_processed": exceeded_min_processed,
                "exceeded_error_rate": exceeded_error_rate,
                "exceeded": exceeded,
                "stop_time": stop_time,
            },
        )

__call__(state, request_info)

Evaluate constraint against sliding window error rate.

Parameters:

Name          Type            Description                                  Default
state         SchedulerState  Current scheduler state with request counts  required
request_info  RequestInfo     Individual request with completion status    required

Returns:

Type                   Description
SchedulerUpdateAction  Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/error.py
def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against sliding window error rate.

    :param state: Current scheduler state with request counts
    :param request_info: Individual request with completion status
    :return: Action indicating whether to continue or stop operations
    """
    current_index = max(0, self.current_index)
    max_error_rate = (
        self.args.rate
        if isinstance(self.args.rate, int | float)
        else self.args.rate[min(current_index, len(self.args.rate) - 1)]
    )

    if request_info.status in ["completed", "errored", "cancelled"]:
        self.error_window.append(request_info.status == "errored")
        if len(self.error_window) > self.args.window:
            self.error_window.pop(0)

    error_count = sum(self.error_window)
    window_requests = len(self.error_window)
    error_rate = (
        error_count / float(window_requests) if window_requests > 0 else 0.0
    )
    exceeded_min_processed = state.processed_requests >= self.args.window
    exceeded_error_rate = error_rate >= max_error_rate
    exceeded = exceeded_min_processed and exceeded_error_rate
    stop_time = None if not exceeded else request_info.completed_at or time.time()

    return SchedulerUpdateAction(
        request_queuing="stop" if exceeded else "continue",
        request_processing="stop_all" if exceeded else "continue",
        metadata={
            "max_error_rate": max_error_rate,
            "window_size": self.args.window,
            "error_count": error_count,
            "processed_count": state.processed_requests,
            "current_window_size": len(self.error_window),
            "current_error_rate": error_rate,
            "exceeded_min_processed": exceeded_min_processed,
            "exceeded_error_rate": exceeded_error_rate,
            "exceeded": exceeded,
            "stop_time": stop_time,
        },
    )

create_constraint(**_kwargs)

Create a new instance of MaxErrorRateConstraint (due to stateful window).

Parameters:

Name     Type  Description                            Default
_kwargs  -     Additional keyword arguments (unused)  {}

Returns:

Type        Description
Constraint  New instance of the constraint

Source code in src/guidellm/scheduler/constraints/error.py
def create_constraint(self, **_kwargs) -> Constraint:
    """
    Create a new instance of MaxErrorRateConstraint (due to stateful window).

    :param kwargs: Additional keyword arguments (unused)
    :return: New instance of the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())
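
The method increments the initializer's index before copying, so each returned constraint keeps the index that was current when it was created. The following is a minimal sketch of that pattern, not the library's implementation: the `Initializer` dataclass is hypothetical, and `copy.copy` stands in for the Pydantic `model_copy()` call.

```python
import copy
from dataclasses import dataclass


@dataclass
class Initializer:
    """Hypothetical stand-in for a constraint initializer."""

    current_index: int = -1  # -1 means no constraint has been created yet

    def create_constraint(self):
        # Advance the counter on the initializer, then return a snapshot.
        self.current_index += 1
        return copy.copy(self)


initializer = Initializer()
first = initializer.create_constraint()
second = initializer.create_constraint()

# Each copy keeps its own index; the initializer holds the latest one.
print(first.current_index, second.current_index, initializer.current_index)
```

Because the copies are independent objects, later calls to `create_constraint` do not change the index seen by constraints that were handed out earlier.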

MaxErrorRateConstraintArgs

Bases: ConstraintArgs

Arguments for maximum error rate constraint (sliding window).

Stops execution once at least a full window of requests has been processed and the windowed error rate reaches or exceeds the threshold.

Attributes:

Name Type Description
kind Literal['max_error_rate']

Always "max_error_rate"

Source code in src/guidellm/scheduler/constraints/error.py
@ConstraintArgs.register("max_error_rate")
class MaxErrorRateConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum error rate constraint (sliding window).

    Stops execution once at least a full window of requests has been processed
    and the windowed error rate reaches or exceeds the threshold.

    :cvar kind: Always "max_error_rate"
    """

    kind: Literal["max_error_rate"] = Field(
        default="max_error_rate",
        description="Constraint type discriminator",
    )
    rate: ErrorRateOrList = Field(
        description="Maximum error rate (0.0 to 1.0) before stopping execution",
    )
    window: int | float = Field(
        default_factory=lambda: settings.constraint_error_window_size,
        gt=0,
        description="Size of sliding window for calculating error rate",
    )
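
The sliding-window check can be illustrated without the scheduler. The sketch below is an approximation under stated assumptions: `windowed_error_rate` and its parameter names are hypothetical, and a local counter stands in for `state.processed_requests`.

```python
from collections import deque


def windowed_error_rate(outcomes, window, max_rate):
    """Yield (error_rate, should_stop) after each finished request.

    outcomes: iterable of booleans, True when a request errored.
    window:   number of most recent requests to keep.
    max_rate: stop threshold in the range 0.0 to 1.0.
    """
    recent = deque(maxlen=window)  # oldest entries fall off automatically
    processed = 0
    for errored in outcomes:
        recent.append(errored)
        processed += 1
        rate = sum(recent) / len(recent)
        # Two gates: enough requests processed to fill a window, and the
        # windowed rate at or above the threshold.
        yield rate, processed >= window and rate >= max_rate


results = list(
    windowed_error_rate(
        [False, True, False, True, True, True], window=4, max_rate=0.75
    )
)
for rate, should_stop in results:
    print(f"rate={rate:.2f} stop={should_stop}")
```

With a window of 4, the first four results cannot trigger a stop with these inputs; the fifth request pushes the window to three errors out of four, which reaches the 0.75 threshold.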

MaxErrorsConstraint

Bases: PydanticConstraintInitializer

Constraint that limits execution based on absolute error count.

Stops both request queuing and all request processing when the total number of errored requests reaches the maximum threshold. Uses global error tracking across all requests for immediate constraint evaluation.

Source code in src/guidellm/scheduler/constraints/error.py
@ConstraintsInitializerFactory.register("max_errors")
class MaxErrorsConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on absolute error count.

    Stops both request queuing and all request processing when the total number
    of errored requests reaches the maximum threshold. Uses global error tracking
    across all requests for immediate constraint evaluation.
    """

    type_: Literal["max_errors"] = "max_errors"  # type: ignore[assignment]
    args: MaxErrorsConstraintArgs = Field(
        description="Configuration arguments for max errors constraint",
    )
    current_index: int = Field(default=-1, description="Current index in error list")

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Advance the list index and return a copy of this initializer.

        :param _kwargs: Additional keyword arguments (unused)
        :return: Copy of this instance to use as the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against current error count.

        :param state: Current scheduler state with error counts
        :param request_info: Request whose completed_at supplies the stop time
        :return: Action indicating whether to continue or stop operations
        """
        _ = request_info  # Unused parameters
        current_index = max(0, self.current_index)
        max_errors = (
            self.args.count
            if isinstance(self.args.count, int | float)
            else self.args.count[min(current_index, len(self.args.count) - 1)]
        )
        errors_exceeded = state.errored_requests >= max_errors
        stop_time = (
            None if not errors_exceeded else request_info.completed_at or time.time()
        )

        return SchedulerUpdateAction(
            request_queuing="stop" if errors_exceeded else "continue",
            request_processing="stop_all" if errors_exceeded else "continue",
            metadata={
                "max_errors": max_errors,
                "errors_exceeded": errors_exceeded,
                "current_errors": state.errored_requests,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(stop_time=stop_time),
        )

__call__(state, request_info)

Evaluate constraint against current error count.

Parameters:

Name Type Description Default
state SchedulerState

Current scheduler state with error counts

required
request_info RequestInfo

Request information; its completed_at timestamp supplies the stop time

required

Returns:

Type Description
SchedulerUpdateAction

Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/error.py
def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against current error count.

    :param state: Current scheduler state with error counts
    :param request_info: Request whose completed_at supplies the stop time
    :return: Action indicating whether to continue or stop operations
    """
    _ = request_info  # Unused parameters
    current_index = max(0, self.current_index)
    max_errors = (
        self.args.count
        if isinstance(self.args.count, int | float)
        else self.args.count[min(current_index, len(self.args.count) - 1)]
    )
    errors_exceeded = state.errored_requests >= max_errors
    stop_time = (
        None if not errors_exceeded else request_info.completed_at or time.time()
    )

    return SchedulerUpdateAction(
        request_queuing="stop" if errors_exceeded else "continue",
        request_processing="stop_all" if errors_exceeded else "continue",
        metadata={
            "max_errors": max_errors,
            "errors_exceeded": errors_exceeded,
            "current_errors": state.errored_requests,
            "stop_time": stop_time,
        },
        progress=SchedulerProgress(stop_time=stop_time),
    )

create_constraint(**_kwargs)

Advance the list index and return a copy of this initializer.

Parameters:

Name Type Description Default
kwargs

Additional keyword arguments (unused)

required

Returns:

Type Description
Constraint

Copy of this instance to use as the constraint

Source code in src/guidellm/scheduler/constraints/error.py
def create_constraint(self, **_kwargs) -> Constraint:
    """
    Advance the list index and return a copy of this initializer.

    :param _kwargs: Additional keyword arguments (unused)
    :return: Copy of this instance to use as the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())

MaxErrorsConstraintArgs

Bases: ConstraintArgs

Arguments for maximum error count constraint.

Stops execution when total errors reach the threshold.

Attributes:

Name Type Description
kind Literal['max_errors']

Always "max_errors"

Source code in src/guidellm/scheduler/constraints/error.py
@ConstraintArgs.register("max_errors")
class MaxErrorsConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum error count constraint.

    Stops execution when total errors reach the threshold.

    :cvar kind: Always "max_errors"
    """

    kind: Literal["max_errors"] = Field(
        default="max_errors",
        description="Constraint type discriminator",
    )
    count: PositiveNumOrList = Field(
        description="Maximum number of errors before stopping execution",
    )
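
Because `count` may be a single number or a list, the constraint has to pick the active threshold for the current run. The helper below is a hypothetical sketch of the selection rule visible in the `__call__` source: a negative index is treated as 0, and an index past the end of the list reuses the last entry.

```python
def resolve_threshold(value, index):
    """Pick the active threshold from a scalar or a per-run list."""
    if isinstance(value, (int, float)):
        return value
    # Clamp the index into the valid range of the list.
    return value[min(max(0, index), len(value) - 1)]


print(resolve_threshold(5, 3))
print(resolve_threshold([10, 20, 30], -1))
print(resolve_threshold([10, 20, 30], 7))
```

Clamping rather than raising means a list shorter than the number of runs still yields a usable limit for the later runs.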

MaxGlobalErrorRateConstraint

Bases: PydanticConstraintInitializer

Constraint that limits execution based on global error rate.

Calculates error rate across all processed requests and stops all processing when the rate exceeds the threshold. Only applies the constraint after processing the minimum number of requests to ensure statistical significance for global error rate calculations.

Source code in src/guidellm/scheduler/constraints/error.py
@ConstraintsInitializerFactory.register("max_global_error_rate")
class MaxGlobalErrorRateConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on global error rate.

    Calculates error rate across all processed requests and stops all processing
    when the rate exceeds the threshold. Only applies the constraint after
    processing the minimum number of requests to ensure statistical significance
    for global error rate calculations.
    """

    type_: Literal["max_global_error_rate"] = "max_global_error_rate"  # type: ignore[assignment]
    args: MaxGlobalErrorRateConstraintArgs = Field(
        description="Configuration arguments for max global error rate constraint",
    )
    current_index: int = Field(
        default=-1, description="Current index for list-based max_error_rate values"
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Advance the list index and return a copy of this initializer.

        :param _kwargs: Additional keyword arguments (unused)
        :return: Copy of this instance to use as the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against global error rate.

        :param state: Current scheduler state with global request and error counts
        :param request_info: Request whose completed_at supplies the stop time
        :return: Action indicating whether to continue or stop operations
        """
        _ = request_info  # Unused parameters
        current_index = max(0, self.current_index)
        max_error_rate = (
            self.args.rate
            if isinstance(self.args.rate, int | float)
            else self.args.rate[min(current_index, len(self.args.rate) - 1)]
        )

        exceeded_min_processed = (
            self.args.minimum is None or state.processed_requests >= self.args.minimum
        )
        error_rate = (
            state.errored_requests / float(state.processed_requests)
            if state.processed_requests > 0
            else 0.0
        )
        exceeded_error_rate = error_rate >= max_error_rate
        exceeded = exceeded_min_processed and exceeded_error_rate
        stop_time = None if not exceeded else request_info.completed_at or time.time()

        return SchedulerUpdateAction(
            request_queuing="stop" if exceeded else "continue",
            request_processing="stop_all" if exceeded else "continue",
            metadata={
                "max_error_rate": max_error_rate,
                "min_processed": self.args.minimum,
                "processed_requests": state.processed_requests,
                "errored_requests": state.errored_requests,
                "error_rate": error_rate,
                "exceeded_min_processed": exceeded_min_processed,
                "exceeded_error_rate": exceeded_error_rate,
                "exceeded": exceeded,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(stop_time=stop_time),
        )

__call__(state, request_info)

Evaluate constraint against global error rate.

Parameters:

Name Type Description Default
state SchedulerState

Current scheduler state with global request and error counts

required
request_info RequestInfo

Request information; its completed_at timestamp supplies the stop time

required

Returns:

Type Description
SchedulerUpdateAction

Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/error.py
def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against global error rate.

    :param state: Current scheduler state with global request and error counts
    :param request_info: Request whose completed_at supplies the stop time
    :return: Action indicating whether to continue or stop operations
    """
    _ = request_info  # Unused parameters
    current_index = max(0, self.current_index)
    max_error_rate = (
        self.args.rate
        if isinstance(self.args.rate, int | float)
        else self.args.rate[min(current_index, len(self.args.rate) - 1)]
    )

    exceeded_min_processed = (
        self.args.minimum is None or state.processed_requests >= self.args.minimum
    )
    error_rate = (
        state.errored_requests / float(state.processed_requests)
        if state.processed_requests > 0
        else 0.0
    )
    exceeded_error_rate = error_rate >= max_error_rate
    exceeded = exceeded_min_processed and exceeded_error_rate
    stop_time = None if not exceeded else request_info.completed_at or time.time()

    return SchedulerUpdateAction(
        request_queuing="stop" if exceeded else "continue",
        request_processing="stop_all" if exceeded else "continue",
        metadata={
            "max_error_rate": max_error_rate,
            "min_processed": self.args.minimum,
            "processed_requests": state.processed_requests,
            "errored_requests": state.errored_requests,
            "error_rate": error_rate,
            "exceeded_min_processed": exceeded_min_processed,
            "exceeded_error_rate": exceeded_error_rate,
            "exceeded": exceeded,
            "stop_time": stop_time,
        },
        progress=SchedulerProgress(stop_time=stop_time),
    )

create_constraint(**_kwargs)

Advance the list index and return a copy of this initializer.

Parameters:

Name Type Description Default
kwargs

Additional keyword arguments (unused)

required

Returns:

Type Description
Constraint

Copy of this instance to use as the constraint

Source code in src/guidellm/scheduler/constraints/error.py
def create_constraint(self, **_kwargs) -> Constraint:
    """
    Advance the list index and return a copy of this initializer.

    :param _kwargs: Additional keyword arguments (unused)
    :return: Copy of this instance to use as the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())

MaxGlobalErrorRateConstraintArgs

Bases: ConstraintArgs

Arguments for maximum global error rate constraint.

Stops execution when the overall error rate across all requests reaches or exceeds the threshold. Only applies once the configured minimum number of requests has been processed.

Attributes:

Name Type Description
kind Literal['max_global_error_rate']

Always "max_global_error_rate"

Source code in src/guidellm/scheduler/constraints/error.py
@ConstraintArgs.register("max_global_error_rate")
class MaxGlobalErrorRateConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum global error rate constraint.

    Stops execution when the overall error rate across all requests reaches or
    exceeds the threshold. Only applies once the configured minimum number of
    requests has been processed.

    :cvar kind: Always "max_global_error_rate"
    """

    kind: Literal["max_global_error_rate"] = Field(
        default="max_global_error_rate",
        description="Constraint type discriminator",
    )
    rate: ErrorRateOrList = Field(
        description="Maximum global error rate (0.0 to 1.0) before stopping",
    )
    minimum: int | float | None = Field(
        default_factory=lambda: settings.constraint_error_min_processed,
        gt=0,
        description="Minimum requests processed before applying error rate constraint",
    )
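
The interaction between `rate` and `minimum` is easiest to see with numbers. This is a sketch only: `global_error_rate_exceeded` is a hypothetical helper that takes plain counts in place of the scheduler state.

```python
def global_error_rate_exceeded(errored, processed, max_rate, minimum=None):
    """Return (error_rate, exceeded) for the given request counts."""
    # Guard against division by zero before any request has finished.
    rate = errored / processed if processed > 0 else 0.0
    # With no minimum configured, the rate check applies immediately.
    enough = minimum is None or processed >= minimum
    return rate, enough and rate >= max_rate


print(global_error_rate_exceeded(3, 10, 0.25, minimum=50))
print(global_error_rate_exceeded(15, 50, 0.25, minimum=50))
print(global_error_rate_exceeded(0, 0, 0.25))
```

In the first call the error rate is already above the threshold, but too few requests have been processed for the constraint to act; the second call has the same rate with enough requests and does trigger.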

MaxNumberConstraint

Bases: PydanticConstraintInitializer

Constraint that limits execution based on maximum request counts.

Stops request queuing when created requests reach the limit and stops local request processing when processed requests reach the limit. Provides progress tracking based on remaining requests and completion fraction.

Source code in src/guidellm/scheduler/constraints/request.py
@ConstraintsInitializerFactory.register("max_requests")
class MaxNumberConstraint(PydanticConstraintInitializer):
    """
    Constraint that limits execution based on maximum request counts.

    Stops request queuing when created requests reach the limit and stops local
    request processing when processed requests reach the limit. Provides progress
    tracking based on remaining requests and completion fraction.
    """

    type_: Literal["max_requests"] = "max_requests"  # type: ignore[assignment]
    args: MaxRequestsConstraintArgs = Field(
        description="Configuration arguments for max request count constraint",
    )
    current_index: int = Field(
        default=-1, description="Current index for list-based max_num values"
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Advance the list index and return a copy of this initializer.

        :param _kwargs: Additional keyword arguments (unused)
        :return: Copy of this instance to use as the constraint
        """
        self.current_index += 1

        return cast("Constraint", self.model_copy())

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against current scheduler state and request count.

        :param state: Current scheduler state with request counts
        :param request_info: Request whose completed_at supplies the stop time
        :return: Action indicating whether to continue or stop operations
        """
        _ = request_info  # Unused parameters
        current_index = max(0, self.current_index)
        max_num = (
            self.args.count
            if isinstance(self.args.count, int | float)
            else self.args.count[min(current_index, len(self.args.count) - 1)]
        )

        create_exceeded = state.created_requests >= max_num
        processed_exceeded = state.processed_requests >= max_num
        remaining_requests = min(max(0, max_num - state.processed_requests), max_num)
        stop_time = (
            None if remaining_requests > 0 else request_info.completed_at or time.time()
        )

        return SchedulerUpdateAction(
            request_queuing="stop" if create_exceeded else "continue",
            request_processing="stop_local" if processed_exceeded else "continue",
            metadata={
                "max_requests": max_num,
                "create_exceeded": create_exceeded,
                "processed_exceeded": processed_exceeded,
                "created_requests": state.created_requests,
                "processed_requests": state.processed_requests,
                "remaining_requests": remaining_requests,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(
                remaining_requests=remaining_requests,
                total_requests=max_num,
                stop_time=stop_time,
            ),
        )

__call__(state, request_info)

Evaluate constraint against current scheduler state and request count.

Parameters:

Name Type Description Default
state SchedulerState

Current scheduler state with request counts

required
request_info RequestInfo

Request information; its completed_at timestamp supplies the stop time

required

Returns:

Type Description
SchedulerUpdateAction

Action indicating whether to continue or stop operations

Source code in src/guidellm/scheduler/constraints/request.py
def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against current scheduler state and request count.

    :param state: Current scheduler state with request counts
    :param request_info: Request whose completed_at supplies the stop time
    :return: Action indicating whether to continue or stop operations
    """
    _ = request_info  # Unused parameters
    current_index = max(0, self.current_index)
    max_num = (
        self.args.count
        if isinstance(self.args.count, int | float)
        else self.args.count[min(current_index, len(self.args.count) - 1)]
    )

    create_exceeded = state.created_requests >= max_num
    processed_exceeded = state.processed_requests >= max_num
    remaining_requests = min(max(0, max_num - state.processed_requests), max_num)
    stop_time = (
        None if remaining_requests > 0 else request_info.completed_at or time.time()
    )

    return SchedulerUpdateAction(
        request_queuing="stop" if create_exceeded else "continue",
        request_processing="stop_local" if processed_exceeded else "continue",
        metadata={
            "max_requests": max_num,
            "create_exceeded": create_exceeded,
            "processed_exceeded": processed_exceeded,
            "created_requests": state.created_requests,
            "processed_requests": state.processed_requests,
            "remaining_requests": remaining_requests,
            "stop_time": stop_time,
        },
        progress=SchedulerProgress(
            remaining_requests=remaining_requests,
            total_requests=max_num,
            stop_time=stop_time,
        ),
    )

create_constraint(**_kwargs)

Advance the list index and return a copy of this initializer.

Parameters:

Name Type Description Default
kwargs

Additional keyword arguments (unused)

required

Returns:

Type Description
Constraint

Copy of this instance to use as the constraint

Source code in src/guidellm/scheduler/constraints/request.py
def create_constraint(self, **_kwargs) -> Constraint:
    """
    Advance the list index and return a copy of this initializer.

    :param _kwargs: Additional keyword arguments (unused)
    :return: Copy of this instance to use as the constraint
    """
    self.current_index += 1

    return cast("Constraint", self.model_copy())

MaxRequestsConstraintArgs

Bases: ConstraintArgs

Arguments for maximum request count constraint.

Limits the number of requests processed per strategy.

Attributes:

Name Type Description
kind Literal['max_requests']

Always "max_requests"

Source code in src/guidellm/scheduler/constraints/request.py
@ConstraintArgs.register("max_requests")
class MaxRequestsConstraintArgs(ConstraintArgs):
    """
    Arguments for maximum request count constraint.

    Limits the number of requests processed per strategy.

    :cvar kind: Always "max_requests"
    """

    kind: Literal["max_requests"] = Field(
        default="max_requests",
        description="Constraint type discriminator",
    )
    count: PositiveNumOrList = Field(
        description="Maximum number of requests before stopping execution",
    )
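
Queuing and processing are stopped by separate counters, so new requests stop being queued before the in-flight ones finish. The sketch below restates that arithmetic with a hypothetical `request_limits` helper that takes plain integers in place of the scheduler state.

```python
def request_limits(created, processed, max_requests):
    """Return (stop_queuing, stop_processing, remaining) for a request cap."""
    stop_queuing = created >= max_requests
    stop_processing = processed >= max_requests
    # Remaining is clamped to the range 0..max_requests.
    remaining = min(max(0, max_requests - processed), max_requests)
    return stop_queuing, stop_processing, remaining


print(request_limits(created=10, processed=0, max_requests=100))
print(request_limits(created=100, processed=80, max_requests=100))
print(request_limits(created=100, processed=100, max_requests=100))
```

The middle call shows the intermediate phase: all 100 requests have been created, so queuing stops, while 20 are still outstanding and processing continues.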

OverSaturationConstraint

Bases: Constraint

Constraint that detects and stops execution when over-saturation is detected.

This constraint implements the Over-Saturation Detection (OSD) algorithm to identify when a model becomes over-saturated (response rate doesn't keep up with request rate). When over-saturation is detected, the constraint stops request queuing and optionally stops processing of existing requests.

The constraint maintains internal state for tracking concurrent requests and time-to-first-token (TTFT) metrics, using statistical slope detection to identify performance degradation patterns.
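
The tracked data is kept bounded by two limits: a maximum fraction of all points ever seen and a maximum age in seconds. The following is a simplified sketch of that pruning, assuming points are stored as `(timestamp, value)` tuples; `prune_window` and its parameter names are hypothetical, and the real class also updates its slope checkers as points are removed.

```python
def prune_window(points, now, total_seen, max_ratio=0.75, max_age=120.0):
    """Drop the oldest points until both window limits are satisfied.

    points:     list of (timestamp, value) tuples, oldest first.
    now:        current time in seconds since the run started.
    total_seen: count of all points ever added, including pruned ones.
    """
    # Limit 1: keep at most a fixed fraction of everything seen so far.
    max_size = int(total_seen * max_ratio)
    while len(points) > max_size:
        points.pop(0)
    # Limit 2: drop points older than the maximum age.
    while points and now - points[0][0] > max_age:
        points.pop(0)
    return points


by_ratio = prune_window(
    [(0.0, 1), (50.0, 2), (100.0, 3), (130.0, 4)], now=130.0, total_seen=4
)
by_age = prune_window(
    [(10.0, 1), (90.0, 2), (150.0, 3)], now=200.0, total_seen=100
)
print(by_ratio)
print(by_age)
```

The ratio limit means the earliest part of a run is always discarded eventually, which keeps warm-up behavior from dominating the slope estimates.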

Source code in src/guidellm/scheduler/constraints/saturation.py
class OverSaturationConstraint(Constraint):
    """
    Constraint that detects and stops execution when over-saturation is detected.

    This constraint implements the Over-Saturation Detection (OSD) algorithm to
    identify when a model becomes over-saturated (response rate doesn't keep up with
    request rate). When over-saturation is detected, the constraint stops request
    queuing and optionally stops processing of existing requests.

    The constraint maintains internal state for tracking concurrent requests and
    time-to-first-token (TTFT) metrics, using statistical slope detection to identify
    performance degradation patterns.
    """

    def __init__(
        self,
        minimum_duration: float = 30.0,
        minimum_ttft: float = 2.5,
        maximum_window_seconds: float = 120.0,
        moe_threshold: float = 2.0,
        maximum_window_ratio: float = 0.75,
        minimum_window_size: int = 5,
        confidence: float = 0.95,
        eps: float = 1e-12,
        mode: Literal["enforce", "monitor"] = "enforce",
    ) -> None:  # noqa: PLR0913
        """
        Initialize the over-saturation constraint.

        Creates a new constraint instance with specified detection parameters.
        The constraint will track concurrent requests and TTFT metrics, using
        statistical slope detection to identify when the model becomes
        over-saturated. All parameters have sensible defaults suitable for
        most benchmarking scenarios.

        :param minimum_duration: Minimum seconds before checking for over-saturation
            (default: 30.0)
        :param minimum_ttft: Minimum TTFT threshold in seconds for violation counting
            (default: 2.5)
        :param maximum_window_seconds: Maximum time window in seconds for data retention
            (default: 120.0)
        :param moe_threshold: Margin of error threshold for slope detection
            (default: 2.0)
        :param maximum_window_ratio: Maximum window size as ratio of total requests
            (default: 0.75)
        :param minimum_window_size: Minimum data points required for slope estimation
            (default: 5)
        :param confidence: Statistical confidence level for t-distribution (0-1)
            (default: 0.95)
        :param eps: Epsilon for numerical stability in calculations
            (default: 1e-12)
        :param mode: Whether to stop when over-saturation is detected, or only monitor
            (default: "enforce")
        """
        self.minimum_duration = minimum_duration
        self.minimum_ttft = minimum_ttft
        self.maximum_window_seconds = maximum_window_seconds
        self.maximum_window_ratio = maximum_window_ratio
        self.minimum_window_size = minimum_window_size
        self.moe_threshold = moe_threshold
        self.confidence = confidence
        self.eps = eps
        self.mode = mode
        self.reset()

    @property
    def info(self) -> dict[str, Any]:
        """
        Get the constraint's configuration parameters.

        :return: Dictionary containing configuration parameters.
        """

        return {
            "type_": "over_saturation",
            "minimum_duration": self.minimum_duration,
            "minimum_ttft": self.minimum_ttft,
            "maximum_window_seconds": self.maximum_window_seconds,
            "maximum_window_ratio": self.maximum_window_ratio,
            "minimum_window_size": self.minimum_window_size,
            "moe_threshold": self.moe_threshold,
            "confidence": self.confidence,
            "mode": self.mode,
        }

    def reset(self) -> None:
        """
        Reset all internal state to initial values.

        Clears all tracked requests, resets counters, and reinitializes slope
        checkers. Useful for reusing constraint instances across multiple
        benchmark runs or resetting state after configuration changes.
        """
        self.duration = 0.0
        self.started_requests: list[dict[str, Any]] = []
        self.finished_requests: list[dict[str, Any]] = []
        self.ttft_violations_counter = 0
        self.total_finished_ever = 0
        self.total_started_ever = 0
        self._ttft_reported_request_ids: set[str] = set()
        self.concurrent_slope_checker = SlopeChecker(
            moe_threshold=self.moe_threshold, confidence=self.confidence, eps=self.eps
        )
        self.ttft_slope_checker = SlopeChecker(
            moe_threshold=self.moe_threshold, confidence=self.confidence, eps=self.eps
        )

    def _add_finished(self, request: dict[str, Any]) -> None:
        """
        Add a finished request to tracking.

        :param request: Dictionary containing request data with 'ttft' and
            'duration' keys.
        """
        ttft = request["ttft"]
        duration = request["duration"]
        if ttft is not None:
            self.total_finished_ever += 1
            self.finished_requests.append(request)
            if ttft > self.minimum_ttft:
                self.ttft_violations_counter += 1
            self.ttft_slope_checker.add_data_point(duration, ttft)

    def _remove_finished(self, request: dict[str, Any]) -> None:
        """
        Remove a finished request from tracking.

        :param request: Dictionary containing request data with 'ttft' and
            'duration' keys.
        """
        del self.finished_requests[0]
        ttft = request["ttft"]
        duration = request["duration"]
        if ttft > self.minimum_ttft:
            self.ttft_violations_counter -= 1
        self.ttft_slope_checker.remove_data_point(duration, ttft)

    def _add_started(self, request: dict[str, Any]) -> None:
        """
        Add a started request to tracking.

        :param request: Dictionary containing request data with
            'concurrent_requests' and 'duration' keys.
        """
        concurrent = request["concurrent_requests"]
        duration = request["duration"]
        if concurrent is not None:
            self.total_started_ever += 1
            self.started_requests.append(request)
            self.concurrent_slope_checker.add_data_point(duration, concurrent)

    def _remove_started(self, request: dict[str, Any]) -> None:
        """
        Remove a started request from tracking.

        :param request: Dictionary containing request data with
            'concurrent_requests' and 'duration' keys.
        """
        del self.started_requests[0]
        concurrent = request["concurrent_requests"]
        duration = request["duration"]
        self.concurrent_slope_checker.remove_data_point(duration, concurrent)

    def _update_duration(self, duration: float) -> None:
        """
        Update duration and prune old data points.

        Updates the current duration and removes data points that exceed the maximum
        window size (by ratio or time) to maintain bounded memory usage.

        :param duration: Current duration in seconds since benchmark start.
        """
        self.duration = duration

        maximum_finished_window_size = int(
            self.total_finished_ever * self.maximum_window_ratio
        )
        while len(self.finished_requests) > maximum_finished_window_size:
            self._remove_finished(self.finished_requests[0])

        while (len(self.finished_requests) > 0) and (
            (
                time_since_earliest_request := duration
                - self.finished_requests[0]["duration"]
            )
            > self.maximum_window_seconds
        ):
            self._remove_finished(self.finished_requests[0])

        maximum_started_window_size = int(
            self.total_started_ever * self.maximum_window_ratio
        )
        while len(self.started_requests) > maximum_started_window_size:
            self._remove_started(self.started_requests[0])

        while (len(self.started_requests) > 0) and (
            (
                time_since_earliest_request := duration  # noqa: F841
                - self.started_requests[0]["duration"]
            )
            > self.maximum_window_seconds
        ):
            self._remove_started(self.started_requests[0])

    def _check_alert(self) -> bool:
        """
        Check if over-saturation is currently detected.

        :return: True if over-saturation is detected, False otherwise.
        """
        # Cap n at the elapsed duration in seconds: requests from the same
        # second are highly correlated, so this is a simple, good-enough bound
        # given that the MOE has a custom threshold anyway.
        concurrent_n = min(self.duration, self.concurrent_slope_checker.n)
        ttft_n = min(self.duration, self.ttft_slope_checker.n)

        if (
            (self.duration < self.minimum_duration)
            or (self.ttft_slope_checker.n > self.ttft_violations_counter * 2)
            or (self.duration < self.minimum_ttft)
            or (concurrent_n < self.minimum_window_size)
        ):
            return False

        is_concurrent_slope_positive = self.concurrent_slope_checker.check_slope(
            concurrent_n
        )

        if ttft_n < self.minimum_window_size:
            return is_concurrent_slope_positive

        is_ttft_slope_positive = self.ttft_slope_checker.check_slope(ttft_n)

        return is_concurrent_slope_positive and is_ttft_slope_positive

    def __call__(
        self, state: SchedulerState, request_info: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Evaluate constraint against current scheduler state.

        :param state: Current scheduler state.
        :param request_info: Individual request information.
        :return: Action indicating whether to continue or stop operations.
        """
        duration = time.time() - state.start_time

        if request_info.status == "in_progress":
            concurrent_requests = state.processing_requests
            self._add_started(
                {"concurrent_requests": concurrent_requests, "duration": duration}
            )
        elif request_info.status in ("first_token", "completed"):
            if (
                request_info.request_id not in self._ttft_reported_request_ids
                and request_info.timings
                and request_info.timings.first_token_iteration
                and request_info.timings.request_start
            ):
                self._ttft_reported_request_ids.add(request_info.request_id)
                ttft = (
                    request_info.timings.first_token_iteration
                    - request_info.timings.request_start
                )
                self._add_finished({"ttft": ttft, "duration": duration})

        self._update_duration(duration)
        is_over_saturated = self._check_alert()

        ttft_slope = self.ttft_slope_checker.slope
        ttft_slope_moe = self.ttft_slope_checker.margin_of_error
        ttft_n = self.ttft_slope_checker.n
        ttft_violations = self.ttft_violations_counter
        concurrent_slope = self.concurrent_slope_checker.slope
        concurrent_slope_moe = self.concurrent_slope_checker.margin_of_error
        concurrent_n = self.concurrent_slope_checker.n

        should_stop = is_over_saturated and self.mode == "enforce"
        return SchedulerUpdateAction(
            request_queuing="stop" if should_stop else "continue",
            request_processing="stop_all" if should_stop else "continue",
            metadata={
                "ttft_slope": ttft_slope,
                "ttft_slope_moe": ttft_slope_moe,
                "ttft_n": ttft_n,
                "ttft_violations": ttft_violations,
                "concurrent_slope": concurrent_slope,
                "concurrent_slope_moe": concurrent_slope_moe,
                "concurrent_n": concurrent_n,
                "is_over_saturated": is_over_saturated,
            },
        )
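
The gating in `_check_alert` above can be read as a pure function. What follows is a minimal sketch, not the library's code: `check_alert` and its parameter names are hypothetical, and the two slope results are passed in as booleans instead of being computed by a `SlopeChecker`. The guard conditions are copied from the listing as written, including the comparison of the elapsed duration against `minimum_ttft`.

```python
def check_alert(
    duration: float,
    concurrent_n: int,
    ttft_n: int,
    ttft_violations: int,
    concurrent_slope_positive: bool,
    ttft_slope_positive: bool,
    minimum_duration: float = 30.0,
    minimum_ttft: float = 2.5,
    minimum_window_size: int = 5,
) -> bool:
    """Hypothetical stand-in for the gating logic shown in `_check_alert`."""
    # Cap the sample counts at the elapsed seconds, as the listing does.
    capped_concurrent_n = min(duration, concurrent_n)
    capped_ttft_n = min(duration, ttft_n)

    # Any failed precondition means "no alert", regardless of the slopes.
    if (
        duration < minimum_duration
        or ttft_n > ttft_violations * 2
        or duration < minimum_ttft
        or capped_concurrent_n < minimum_window_size
    ):
        return False

    # With too few TTFT samples, only the concurrency slope decides.
    if capped_ttft_n < minimum_window_size:
        return concurrent_slope_positive

    return concurrent_slope_positive and ttft_slope_positive
```

Read this way, the `ttft_n > ttft_violations * 2` guard suppresses the alert unless at least half of the tracked TTFT samples were counted as violations.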

info property

Get current constraint configuration and state information.

Returns:

Type Description
dict[str, Any]

Dictionary containing configuration parameters.

__call__(state, request_info)

Evaluate constraint against current scheduler state.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| state | SchedulerState | Current scheduler state. | required |
| request_info | RequestInfo | Individual request information. | required |

Returns:

Type Description
SchedulerUpdateAction

Action indicating whether to continue or stop operations.
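
How the detection result and `mode` combine into the returned action can be sketched in a few lines. This is an illustration under assumptions: `to_action` is a hypothetical helper, and a plain dict stands in for `SchedulerUpdateAction`.

```python
from typing import Literal

Mode = Literal["enforce", "monitor"]


def to_action(is_over_saturated: bool, mode: Mode) -> dict[str, str]:
    """Hypothetical helper mirroring the stop decision in `__call__`."""
    # Only "enforce" turns a detection into a stop; "monitor" keeps running.
    should_stop = is_over_saturated and mode == "enforce"
    return {
        "request_queuing": "stop" if should_stop else "continue",
        "request_processing": "stop_all" if should_stop else "continue",
    }
```

In `monitor` mode the scheduler keeps running, but per the source listing the `is_over_saturated` flag is still reported in the action metadata.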

Source code in src/guidellm/scheduler/constraints/saturation.py
def __call__(
    self, state: SchedulerState, request_info: RequestInfo
) -> SchedulerUpdateAction:
    """
    Evaluate constraint against current scheduler state.

    :param state: Current scheduler state.
    :param request_info: Individual request information.
    :return: Action indicating whether to continue or stop operations.
    """
    duration = time.time() - state.start_time

    if request_info.status == "in_progress":
        concurrent_requests = state.processing_requests
        self._add_started(
            {"concurrent_requests": concurrent_requests, "duration": duration}
        )
    elif request_info.status in ("first_token", "completed"):
        if (
            request_info.request_id not in self._ttft_reported_request_ids
            and request_info.timings
            and request_info.timings.first_token_iteration
            and request_info.timings.request_start
        ):
            self._ttft_reported_request_ids.add(request_info.request_id)
            ttft = (
                request_info.timings.first_token_iteration
                - request_info.timings.request_start
            )
            self._add_finished({"ttft": ttft, "duration": duration})

    self._update_duration(duration)
    is_over_saturated = self._check_alert()

    ttft_slope = self.ttft_slope_checker.slope
    ttft_slope_moe = self.ttft_slope_checker.margin_of_error
    ttft_n = self.ttft_slope_checker.n
    ttft_violations = self.ttft_violations_counter
    concurrent_slope = self.concurrent_slope_checker.slope
    concurrent_slope_moe = self.concurrent_slope_checker.margin_of_error
    concurrent_n = self.concurrent_slope_checker.n

    should_stop = is_over_saturated and self.mode == "enforce"
    return SchedulerUpdateAction(
        request_queuing="stop" if should_stop else "continue",
        request_processing="stop_all" if should_stop else "continue",
        metadata={
            "ttft_slope": ttft_slope,
            "ttft_slope_moe": ttft_slope_moe,
            "ttft_n": ttft_n,
            "ttft_violations": ttft_violations,
            "concurrent_slope": concurrent_slope,
            "concurrent_slope_moe": concurrent_slope_moe,
            "concurrent_n": concurrent_n,
            "is_over_saturated": is_over_saturated,
        },
    )
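
The listing above records a TTFT sample at most once per request, even though both the `first_token` and `completed` updates can reach the constraint. A minimal sketch of that de-duplication, assuming a hypothetical `record_ttft` helper and plain floats in place of the `timings` object:

```python
from typing import Optional


def record_ttft(
    seen_ids: set[str],
    request_id: str,
    request_start: Optional[float],
    first_token: Optional[float],
) -> Optional[float]:
    """Return the TTFT the first time a request reports it, else None."""
    # Skip requests already counted or missing either timestamp
    # (same truthiness checks as the listing).
    if request_id in seen_ids or not request_start or not first_token:
        return None
    seen_ids.add(request_id)
    return first_token - request_start
```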

__init__(minimum_duration=30.0, minimum_ttft=2.5, maximum_window_seconds=120.0, moe_threshold=2.0, maximum_window_ratio=0.75, minimum_window_size=5, confidence=0.95, eps=1e-12, mode='enforce')

Initialize the over-saturation constraint.

Creates a new constraint instance with specified detection parameters. The constraint will track concurrent requests and TTFT metrics, using statistical slope detection to identify when the model becomes over-saturated. All parameters have sensible defaults suitable for most benchmarking scenarios.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| minimum_duration | float | Minimum seconds before checking for over-saturation | 30.0 |
| minimum_ttft | float | Minimum TTFT threshold in seconds for violation counting | 2.5 |
| maximum_window_seconds | float | Maximum time window in seconds for data retention | 120.0 |
| moe_threshold | float | Margin of error threshold for slope detection | 2.0 |
| maximum_window_ratio | float | Maximum window size as ratio of total requests | 0.75 |
| minimum_window_size | int | Minimum data points required for slope estimation | 5 |
| confidence | float | Statistical confidence level for t-distribution (0-1) | 0.95 |
| eps | float | Epsilon for numerical stability in calculations | 1e-12 |
| mode | Literal['enforce', 'monitor'] | Whether to stop when over-saturation is detected, or only monitor | 'enforce' |

Source code in src/guidellm/scheduler/constraints/saturation.py
def __init__(
    self,
    minimum_duration: float = 30.0,
    minimum_ttft: float = 2.5,
    maximum_window_seconds: float = 120.0,
    moe_threshold: float = 2.0,
    maximum_window_ratio: float = 0.75,
    minimum_window_size: int = 5,
    confidence: float = 0.95,
    eps: float = 1e-12,
    mode: Literal["enforce", "monitor"] = "enforce",
) -> None:  # noqa: PLR0913
    """
    Initialize the over-saturation constraint.

    Creates a new constraint instance with specified detection parameters.
    The constraint will track concurrent requests and TTFT metrics, using
    statistical slope detection to identify when the model becomes
    over-saturated. All parameters have sensible defaults suitable for
    most benchmarking scenarios.

    :param minimum_duration: Minimum seconds before checking for over-saturation
        (default: 30.0)
    :param minimum_ttft: Minimum TTFT threshold in seconds for violation counting
        (default: 2.5)
    :param maximum_window_seconds: Maximum time window in seconds for data retention
        (default: 120.0)
    :param moe_threshold: Margin of error threshold for slope detection
        (default: 2.0)
    :param maximum_window_ratio: Maximum window size as ratio of total requests
        (default: 0.75)
    :param minimum_window_size: Minimum data points required for slope estimation
        (default: 5)
    :param confidence: Statistical confidence level for t-distribution (0-1)
        (default: 0.95)
    :param eps: Epsilon for numerical stability in calculations
        (default: 1e-12)
    :param mode: Whether to stop when over-saturation is detected, or only monitor
        (default: "enforce")
    """
    self.minimum_duration = minimum_duration
    self.minimum_ttft = minimum_ttft
    self.maximum_window_seconds = maximum_window_seconds
    self.maximum_window_ratio = maximum_window_ratio
    self.minimum_window_size = minimum_window_size
    self.moe_threshold = moe_threshold
    self.confidence = confidence
    self.eps = eps
    self.mode = mode
    self.reset()
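
The two window parameters, `maximum_window_ratio` and `maximum_window_seconds`, bound how many samples are retained. The sketch below illustrates the trimming order seen in the class source (ratio first, then age). `trim_started` is a hypothetical function; it works on a plain list and omits the slope-checker updates that the real `_remove_started` would also perform.

```python
from typing import Any


def trim_started(
    started: list[dict[str, Any]],
    total_started_ever: int,
    now: float,
    maximum_window_ratio: float = 0.75,
    maximum_window_seconds: float = 120.0,
) -> list[dict[str, Any]]:
    """Return the samples kept after applying both limits, oldest dropped first."""
    kept = list(started)

    # Limit 1: keep at most a fixed ratio of all requests ever started.
    maximum_size = int(total_started_ever * maximum_window_ratio)
    while len(kept) > maximum_size:
        kept.pop(0)

    # Limit 2: drop samples strictly older than the time window.
    while kept and now - kept[0]["duration"] > maximum_window_seconds:
        kept.pop(0)

    return kept
```

For example, with ten samples recorded at 0, 10, ..., 90 seconds and `now=151`, the ratio limit keeps the newest seven and the time limit then drops the sample recorded at 30 seconds.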

reset()

Reset all internal state to initial values.

Clears all tracked requests, resets counters, and reinitializes slope checkers. Useful for reusing constraint instances across multiple benchmark runs or resetting state after configuration changes.

Source code in src/guidellm/scheduler/constraints/saturation.py
def reset(self) -> None:
    """
    Reset all internal state to initial values.

    Clears all tracked requests, resets counters, and reinitializes slope
    checkers. Useful for reusing constraint instances across multiple
    benchmark runs or resetting state after configuration changes.
    """
    self.duration = 0.0
    self.started_requests: list[dict[str, Any]] = []
    self.finished_requests: list[dict[str, Any]] = []
    self.ttft_violations_counter = 0
    self.total_finished_ever = 0
    self.total_started_ever = 0
    self._ttft_reported_request_ids: set[str] = set()
    self.concurrent_slope_checker = SlopeChecker(
        moe_threshold=self.moe_threshold, confidence=self.confidence, eps=self.eps
    )
    self.ttft_slope_checker = SlopeChecker(
        moe_threshold=self.moe_threshold, confidence=self.confidence, eps=self.eps
    )

OverSaturationConstraintArgs

Bases: ConstraintArgs

Arguments for over-saturation detection constraint.

Detects when a model becomes over-saturated using statistical slope analysis of concurrent requests and time-to-first-token metrics.

Attributes:

Name Type Description
kind Literal['over_saturation']

Always "over_saturation"

Source code in src/guidellm/scheduler/constraints/saturation.py
@ConstraintArgs.register("over_saturation")
class OverSaturationConstraintArgs(ConstraintArgs):
    """
    Arguments for over-saturation detection constraint.

    Detects when a model becomes over-saturated using statistical slope analysis
    of concurrent requests and time-to-first-token metrics.

    :cvar kind: Always "over_saturation"
    """

    kind: Literal["over_saturation"] = Field(
        default="over_saturation",
        description="Constraint type discriminator",
    )
    mode: Literal["enforce", "monitor"] = Field(
        default="enforce",
        description=(
            "Whether to stop the benchmark if over-saturation is detected. "
            "Set to `enforce` to stop the benchmark if over-saturation is "
            "detected, and `monitor` to only report over-saturation."
        ),
    )
    min_seconds: int | float = Field(
        default=30.0,
        ge=0,
        description="Minimum seconds before checking for over-saturation",
    )
    max_window_seconds: int | float = Field(
        default=120.0,
        ge=0,
        description="Maximum over-saturation checking window size in seconds",
    )
    moe_threshold: float = Field(
        default=2.0,
        ge=0,
        description="Margin of error threshold for slope detection",
    )
    minimum_ttft: float = Field(
        default=2.5,
        ge=0,
        description="Minimum TTFT threshold for violation counting",
    )
    maximum_window_ratio: float = Field(
        default=0.75,
        ge=0,
        le=1.0,
        description="Maximum window size as ratio of total requests",
    )
    minimum_window_size: int = Field(
        default=5,
        ge=0,
        description="Minimum data points required for slope estimation",
    )
    confidence: float = Field(
        default=0.95,
        ge=0,
        le=1.0,
        description="Statistical confidence level for t-distribution",
    )

    @property
    def constraint_key(self) -> str:
        return "over_saturation"

OverSaturationConstraintInitializer

Bases: PydanticConstraintInitializer

Factory for creating OverSaturationConstraint instances from configuration.

Stores an OverSaturationConstraintArgs instance and delegates to OverSaturationConstraint in create_constraint().

Example:

    from guidellm.scheduler.constraints import (
        OverSaturationConstraintArgs,
        OverSaturationConstraintInitializer,
    )

    args = OverSaturationConstraintArgs(mode="enforce", min_seconds=60.0)
    initializer = OverSaturationConstraintInitializer(args=args)
    constraint = initializer.create_constraint()

Source code in src/guidellm/scheduler/constraints/saturation.py
@ConstraintsInitializerFactory.register("over_saturation")
class OverSaturationConstraintInitializer(PydanticConstraintInitializer):
    """
    Factory for creating OverSaturationConstraint instances from configuration.

    Stores an ``OverSaturationConstraintArgs`` instance and delegates to
    ``OverSaturationConstraint`` in ``create_constraint()``.

    Example:
    ::

        from guidellm.scheduler.constraints import (
            OverSaturationConstraintArgs,
            OverSaturationConstraintInitializer,
        )

        args = OverSaturationConstraintArgs(mode="enforce", min_seconds=60.0)
        initializer = OverSaturationConstraintInitializer(args=args)
        constraint = initializer.create_constraint()
    """

    type_: Literal["over_saturation"] = "over_saturation"  # type: ignore[assignment]
    args: OverSaturationConstraintArgs = Field(
        default_factory=OverSaturationConstraintArgs,
        description="Configuration arguments for over-saturation detection",
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Create an OverSaturationConstraint instance from stored args.

        :param _kwargs: Additional keyword arguments (unused)
        :return: Configured OverSaturationConstraint instance ready for use
        """
        return OverSaturationConstraint(
            minimum_duration=self.args.min_seconds,
            minimum_ttft=self.args.minimum_ttft,
            maximum_window_seconds=self.args.max_window_seconds,
            moe_threshold=self.args.moe_threshold,
            maximum_window_ratio=self.args.maximum_window_ratio,
            minimum_window_size=self.args.minimum_window_size,
            confidence=self.args.confidence,
            mode=self.args.mode,
        )

create_constraint(**_kwargs)

Create an OverSaturationConstraint instance from stored args.

Parameters:

Name Type Description Default
_kwargs

Additional keyword arguments (unused)

{}

Returns:

Type Description
Constraint

Configured OverSaturationConstraint instance ready for use

Source code in src/guidellm/scheduler/constraints/saturation.py
def create_constraint(self, **_kwargs) -> Constraint:
    """
    Create an OverSaturationConstraint instance from stored args.

    :param _kwargs: Additional keyword arguments (unused)
    :return: Configured OverSaturationConstraint instance ready for use
    """
    return OverSaturationConstraint(
        minimum_duration=self.args.min_seconds,
        minimum_ttft=self.args.minimum_ttft,
        maximum_window_seconds=self.args.max_window_seconds,
        moe_threshold=self.args.moe_threshold,
        maximum_window_ratio=self.args.maximum_window_ratio,
        minimum_window_size=self.args.minimum_window_size,
        confidence=self.args.confidence,
        mode=self.args.mode,
    )
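
Two of the configuration fields are renamed on their way into the constraint constructor (`min_seconds` and `max_window_seconds`), and `eps` is not forwarded at all, so the constructor default applies. The mapping can be sketched as a lookup table; `ARG_TO_CONSTRUCTOR` and `to_constructor_kwargs` are hypothetical names used only for this illustration.

```python
from typing import Any

# Args field name -> OverSaturationConstraint keyword, as in create_constraint().
ARG_TO_CONSTRUCTOR = {
    "min_seconds": "minimum_duration",
    "minimum_ttft": "minimum_ttft",
    "max_window_seconds": "maximum_window_seconds",
    "moe_threshold": "moe_threshold",
    "maximum_window_ratio": "maximum_window_ratio",
    "minimum_window_size": "minimum_window_size",
    "confidence": "confidence",
    "mode": "mode",
}


def to_constructor_kwargs(args: dict[str, Any]) -> dict[str, Any]:
    """Translate args-style keys to constructor keywords, dropping the rest."""
    return {
        ARG_TO_CONSTRUCTOR[name]: value
        for name, value in args.items()
        if name in ARG_TO_CONSTRUCTOR  # e.g. the `kind` discriminator is dropped
    }
```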

PydanticConstraintInitializer

Bases: StandardBaseModel, ABC, InfoMixin

Abstract base for Pydantic-based constraint initializers.

Provides standardized serialization, validation, and metadata handling for constraint initializers using Pydantic models. Subclasses implement specific constraint creation logic while inheriting validation and persistence support. Integrates with the constraint factory system for dynamic instantiation and configuration management.

Example:

    @ConstraintsInitializerFactory.register("max_duration")
    class MaxDurationConstraintInitializer(PydanticConstraintInitializer):
        type_: str = "max_duration"
        max_seconds: float = Field(description="Maximum duration in seconds")

        def create_constraint(self) -> Constraint:
            def evaluate(state, request):
                if time.time() - state.start_time > self.max_seconds:
                    return SchedulerUpdateAction(request_queuing="stop")
                return SchedulerUpdateAction(request_queuing="continue")
            return evaluate

Attributes:

Name Type Description
type_ str

Type identifier for the constraint initializer

Source code in src/guidellm/scheduler/constraints/constraint.py
class PydanticConstraintInitializer(StandardBaseModel, ABC, InfoMixin):
    """
    Abstract base for Pydantic-based constraint initializers.

    Provides standardized serialization, validation, and metadata handling for
    constraint initializers using Pydantic models. Subclasses implement specific
    constraint creation logic while inheriting validation and persistence support.
    Integrates with the constraint factory system for dynamic instantiation and
    configuration management.

    Example:
    ::
        @ConstraintsInitializerFactory.register("max_duration")
        class MaxDurationConstraintInitializer(PydanticConstraintInitializer):
            type_: str = "max_duration"
            max_seconds: float = Field(description="Maximum duration in seconds")

            def create_constraint(self) -> Constraint:
                def evaluate(state, request):
                    if time.time() - state.start_time > self.max_seconds:
                        return SchedulerUpdateAction(request_queuing="stop")
                    return SchedulerUpdateAction(request_queuing="continue")
                return evaluate

    :cvar type_: Type identifier for the constraint initializer
    """

    type_: str = Field(description="Type identifier for the constraint initializer")

    @property
    def info(self) -> dict[str, Any]:
        """
        Extract serializable information from this constraint initializer.

        :return: Dictionary containing constraint configuration and metadata
        """
        return self.model_dump()

    @abstractmethod
    def create_constraint(self, **kwargs) -> Constraint:
        """
        Create a constraint instance.

        Must be implemented by subclasses to return their specific constraint type
        with appropriate configuration and validation. The returned constraint should
        be ready for evaluation against scheduler state and requests.

        :param kwargs: Additional keyword arguments (usually unused)
        :return: Configured constraint instance
        :raises NotImplementedError: Must be implemented by subclasses
        """
        ...

info property

Extract serializable information from this constraint initializer.

Returns:

Type Description
dict[str, Any]

Dictionary containing constraint configuration and metadata

create_constraint(**kwargs) abstractmethod

Create a constraint instance.

Must be implemented by subclasses to return their specific constraint type with appropriate configuration and validation. The returned constraint should be ready for evaluation against scheduler state and requests.

Parameters:

Name Type Description Default
kwargs

Additional keyword arguments (usually unused)

{}

Returns:

Type Description
Constraint

Configured constraint instance

Raises:

Type Description
NotImplementedError

Must be implemented by subclasses

Source code in src/guidellm/scheduler/constraints/constraint.py
@abstractmethod
def create_constraint(self, **kwargs) -> Constraint:
    """
    Create a constraint instance.

    Must be implemented by subclasses to return their specific constraint type
    with appropriate configuration and validation. The returned constraint should
    be ready for evaluation against scheduler state and requests.

    :param kwargs: Additional keyword arguments (usually unused)
    :return: Configured constraint instance
    :raises NotImplementedError: Must be implemented by subclasses
    """
    ...
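
The closure pattern that `create_constraint` implementations can return is easy to try in isolation. The following is a self-contained sketch with stand-ins, not guidellm code: `FakeState`, `make_max_duration_constraint`, and the injectable `clock` argument are assumptions made for the example, and a dict replaces `SchedulerUpdateAction`.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeState:
    """Stand-in for SchedulerState; only the field used here is modeled."""

    start_time: float


def make_max_duration_constraint(
    max_seconds: float, clock: Callable[[], float] = time.time
) -> Callable[[FakeState, object], dict[str, str]]:
    """Build a constraint closure that stops queuing after max_seconds."""

    def evaluate(state: FakeState, request: object) -> dict[str, str]:
        # The request is ignored; only elapsed wall-clock time matters.
        if clock() - state.start_time > max_seconds:
            return {"request_queuing": "stop"}
        return {"request_queuing": "continue"}

    return evaluate
```

Injecting the clock keeps the example deterministic; a real constraint would read the time directly, as the docstring example does.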

RequestsExhaustedConstraint

Bases: StandardBaseModel, InfoMixin

Source code in src/guidellm/scheduler/constraints/request.py
class RequestsExhaustedConstraint(StandardBaseModel, InfoMixin):
    type_: Literal["requests_exhausted"] = "requests_exhausted"  # type: ignore[assignment]
    num_requests: int

    @property
    def info(self) -> dict[str, Any]:
        """
        Extract serializable information from this constraint initializer.

        :return: Dictionary containing constraint configuration and metadata
        """
        return self.model_dump()

    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        create_exceeded = state.created_requests >= self.num_requests
        processed_exceeded = state.processed_requests >= self.num_requests
        remaining_requests = max(0, self.num_requests - state.processed_requests)
        stop_time = (
            None if remaining_requests > 0 else request.completed_at or time.time()
        )

        return SchedulerUpdateAction(
            request_queuing="stop" if create_exceeded else "continue",
            request_processing="stop_local" if processed_exceeded else "continue",
            metadata={
                "num_requests": self.num_requests,
                "create_exceeded": create_exceeded,
                "processed_exceeded": processed_exceeded,
                "created_requests": state.created_requests,
                "processed_requests": state.processed_requests,
                "remaining_requests": remaining_requests,
                "stop_time": stop_time,
            },
            progress=SchedulerProgress(
                remaining_requests=remaining_requests,
                total_requests=self.num_requests,
                stop_time=stop_time,
            ),
        )

info property

Extract serializable information from this constraint initializer.

Returns:

Type Description
dict[str, Any]

Dictionary containing constraint configuration and metadata
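
The counter arithmetic in `RequestsExhaustedConstraint.__call__` separates two thresholds: queuing stops once enough requests have been created, while processing stops only once enough have been processed. A minimal sketch of that logic, using a hypothetical `requests_exhausted` function and a dict in place of `SchedulerUpdateAction`:

```python
def requests_exhausted(num_requests: int, created: int, processed: int) -> dict:
    """Hypothetical stand-in for the decision made in `__call__`."""
    create_exceeded = created >= num_requests
    processed_exceeded = processed >= num_requests
    # Never report a negative remainder if more were processed than requested.
    remaining = max(0, num_requests - processed)
    return {
        "request_queuing": "stop" if create_exceeded else "continue",
        "request_processing": "stop_local" if processed_exceeded else "continue",
        "remaining_requests": remaining,
    }
```

With 100 requests configured, 100 created, and 40 processed, queuing has already stopped while the 60 outstanding requests are still allowed to finish.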

SerializableConstraintInitializer

Bases: Protocol

Protocol for serializable constraint initializers supporting persistence.

Extends ConstraintInitializer with serialization capabilities, enabling constraint configurations to be saved, loaded, and transmitted. Serializable initializers support validation, model-based configuration, and dictionary-based serialization for integration with configuration systems and persistence layers.

Example:

    class SerializableInitializer:
        @classmethod
        def model_validate(cls, data: dict) -> ConstraintInitializer:
            return cls(**data)

        def model_dump(self) -> dict[str, Any]:
            return {"type_": "max_requests", "max_requests": self.max_requests}

        def create_constraint(self) -> Constraint:
            ...  # create constraint

Source code in src/guidellm/scheduler/constraints/constraint.py
@runtime_checkable
class SerializableConstraintInitializer(Protocol):
    """
    Protocol for serializable constraint initializers supporting persistence.

    Extends ConstraintInitializer with serialization capabilities, enabling constraint
    configurations to be saved, loaded, and transmitted. Serializable initializers
    support validation, model-based configuration, and dictionary-based serialization
    for integration with configuration systems and persistence layers.

    Example:
    ::
        class SerializableInitializer:
            @classmethod
            def model_validate(cls, data: dict) -> ConstraintInitializer:
                return cls(**data)

            def model_dump(self) -> dict[str, Any]:
                return {"type_": "max_requests", "max_requests": self.max_requests}

            def create_constraint(self) -> Constraint:
                ...  # create constraint
    """

    @classmethod
    def model_validate(cls, **kwargs) -> ConstraintInitializer:
        """
        Create validated constraint initializer from configuration.

        :param kwargs: Configuration dictionary for initializer creation
        :return: Validated constraint initializer instance
        """

    def model_dump(self) -> dict[str, Any]:
        """
        Serialize constraint initializer to dictionary format.

        :return: Dictionary representation of constraint initializer
        """

    def create_constraint(self, **kwargs) -> Constraint:
        """
        Create constraint instance from this initializer.

        :param kwargs: Additional configuration parameters
        :return: Configured constraint evaluation function
        """

create_constraint(**kwargs)

Create constraint instance from this initializer.

Parameters:

Name Type Description Default
kwargs

Additional configuration parameters

{}

Returns:

Type Description
Constraint

Configured constraint evaluation function

Source code in src/guidellm/scheduler/constraints/constraint.py
def create_constraint(self, **kwargs) -> Constraint:
    """
    Create constraint instance from this initializer.

    :param kwargs: Additional configuration parameters
    :return: Configured constraint evaluation function
    """

model_dump()

Serialize constraint initializer to dictionary format.

Returns:

Type Description
dict[str, Any]

Dictionary representation of constraint initializer

Source code in src/guidellm/scheduler/constraints/constraint.py
def model_dump(self) -> dict[str, Any]:
    """
    Serialize constraint initializer to dictionary format.

    :return: Dictionary representation of constraint initializer
    """

model_validate(**kwargs) classmethod

Create validated constraint initializer from configuration.

Parameters:

Name Type Description Default
kwargs

Configuration dictionary for initializer creation

{}

Returns:

Type Description
ConstraintInitializer

Validated constraint initializer instance

Source code in src/guidellm/scheduler/constraints/constraint.py
@classmethod
def model_validate(cls, **kwargs) -> ConstraintInitializer:
    """
    Create validated constraint initializer from configuration.

    :param kwargs: Configuration dictionary for initializer creation
    :return: Validated constraint initializer instance
    """
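
Because the protocol is decorated with `@runtime_checkable`, any object that provides the three methods passes an `isinstance` check without inheriting from it. The sketch below demonstrates this with a local copy of the protocol shape. `SerializableInitializerLike` and `MaxRequestsInitializer` are hypothetical names, and the returned constraint is simplified to a function of a single counter.

```python
from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class SerializableInitializerLike(Protocol):
    """Local stand-in with the same three members as the protocol above."""

    @classmethod
    def model_validate(cls, **kwargs: Any) -> Any: ...

    def model_dump(self) -> dict[str, Any]: ...

    def create_constraint(self, **kwargs: Any) -> Any: ...


class MaxRequestsInitializer:
    """Plain class; satisfies the protocol structurally, without inheritance."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests

    @classmethod
    def model_validate(cls, **kwargs: Any) -> "MaxRequestsInitializer":
        return cls(max_requests=int(kwargs["max_requests"]))

    def model_dump(self) -> dict[str, Any]:
        return {"type_": "max_requests", "max_requests": self.max_requests}

    def create_constraint(self, **kwargs: Any) -> Callable[[int], str]:
        def evaluate(created_requests: int) -> str:
            return "stop" if created_requests >= self.max_requests else "continue"

        return evaluate
```

Note that a runtime protocol check only verifies that the members exist; it does not validate their signatures or return types.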

UnserializableConstraintInitializer

Bases: PydanticConstraintInitializer

Placeholder for constraints that cannot be serialized or executed.

Represents constraint initializers that failed serialization or contain non-serializable components. Cannot be executed and raises errors when invoked to prevent runtime failures from invalid constraint state. Used by the factory system to preserve constraint information even when full serialization is not possible.

Example:

    # Created automatically by factory when serialization fails
    unserializable = UnserializableConstraintInitializer(
        orig_info={"type_": "custom", "data": non_serializable_object}
    )

    # Attempting to use it raises RuntimeError
    constraint = unserializable.create_constraint()  # Raises RuntimeError

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| type_ | Literal['unserializable'] | Always "unserializable" to identify placeholder constraints |
| orig_info | dict[str, Any] | Original constraint information before serialization failure |

Source code in src/guidellm/scheduler/constraints/constraint.py
class UnserializableConstraintInitializer(PydanticConstraintInitializer):
    """
    Placeholder for constraints that cannot be serialized or executed.

    Represents constraint initializers that failed serialization or contain
    non-serializable components. Cannot be executed and raises errors when
    invoked to prevent runtime failures from invalid constraint state. Used
    by the factory system to preserve constraint information even when full
    serialization is not possible.

    Example:
    ::
        # Created automatically by factory when serialization fails
        unserializable = UnserializableConstraintInitializer(
            orig_info={"type_": "custom", "data": non_serializable_object}
        )

        # Attempting to use it raises RuntimeError
        constraint = unserializable.create_constraint()  # Raises RuntimeError

    :cvar type_: Always "unserializable" to identify placeholder constraints
    :cvar orig_info: Original constraint information before serialization failure
    """

    type_: Literal["unserializable"] = "unserializable"  # type: ignore[assignment]
    orig_info: dict[str, Any] = Field(
        default_factory=dict,
        description="Original constraint information before serialization failure",
    )

    def create_constraint(self, **_kwargs) -> Constraint:
        """
        Raise error for unserializable constraint creation attempt.

        :param _kwargs: Additional keyword arguments (unused)
        :raises RuntimeError: Always raised since unserializable constraints
            cannot be executed
        """
        raise RuntimeError(
            "Cannot create constraint from unserializable constraint instance. "
            "This constraint cannot be serialized and therefore cannot be executed."
        )

    def __call__(
        self, state: SchedulerState, request: RequestInfo
    ) -> SchedulerUpdateAction:
        """
        Raise error since unserializable constraints cannot be invoked.

        :param state: Current scheduler state (unused)
        :param request: Individual request information (unused)
        :raises RuntimeError: Always raised for unserializable constraints
        """
        _ = (state, request)  # Unused parameters
        raise RuntimeError(
            "Cannot invoke unserializable constraint instance. "
            "This constraint was not properly serialized and cannot be executed."
        )

__call__(state, request)

Raise error since unserializable constraints cannot be invoked.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| state | SchedulerState | Current scheduler state (unused) | required |
| request | RequestInfo | Individual request information (unused) | required |

Raises:

Type Description
RuntimeError

Always raised for unserializable constraints

Source code in src/guidellm/scheduler/constraints/constraint.py
def __call__(
    self, state: SchedulerState, request: RequestInfo
) -> SchedulerUpdateAction:
    """
    Raise error since unserializable constraints cannot be invoked.

    :param state: Current scheduler state (unused)
    :param request: Individual request information (unused)
    :raises RuntimeError: Always raised for unserializable constraints
    """
    _ = (state, request)  # Unused parameters
    raise RuntimeError(
        "Cannot invoke unserializable constraint instance. "
        "This constraint was not properly serialized and cannot be executed."
    )

create_constraint(**_kwargs)

Raise error for unserializable constraint creation attempt.

Parameters:

Name Type Description Default
_kwargs

Additional keyword arguments (unused)

{}

Raises:

Type Description
RuntimeError

Always raised since unserializable constraints cannot be executed

Source code in src/guidellm/scheduler/constraints/constraint.py
def create_constraint(self, **_kwargs) -> Constraint:
    """
    Raise error for unserializable constraint creation attempt.

    :param _kwargs: Additional keyword arguments (unused)
    :raises RuntimeError: Always raised since unserializable constraints
        cannot be executed
    """
    raise RuntimeError(
        "Cannot create constraint from unserializable constraint instance. "
        "This constraint cannot be serialized and therefore cannot be executed."
    )
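
Callers that load persisted configurations may want to handle the placeholder gracefully instead of letting the `RuntimeError` propagate. A minimal sketch of that pattern follows; `PlaceholderInitializer` is a stand-in that only mimics the behavior documented above, and `safe_create` is a hypothetical helper, not part of guidellm.

```python
from typing import Any, Optional


class PlaceholderInitializer:
    """Stand-in: keeps the original info but refuses to build a constraint."""

    def __init__(self, orig_info: Optional[dict[str, Any]] = None) -> None:
        self.orig_info = dict(orig_info or {})

    def create_constraint(self, **_kwargs: Any) -> Any:
        raise RuntimeError(
            "Cannot create constraint from unserializable constraint instance."
        )


def safe_create(initializer: Any) -> Optional[Any]:
    """Return the constraint, or None when the initializer is a placeholder."""
    try:
        return initializer.create_constraint()
    except RuntimeError:
        # The preserved metadata is still available via initializer.orig_info.
        return None
```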