vLLM-Spyre-Next
Overview
The vllm-spyre-next plugin is the next iteration of vLLM-Spyre, built on the torch-spyre stack. It integrates with torch-spyre to compile models through the PyTorch Inductor backend, with the aim of improving performance for large language model inference.
Key Features
- Torch Inductor Backend: Utilizes PyTorch's native Inductor compiler for optimized model execution
- torch-spyre Integration: Built on the torch-spyre framework for advanced compilation and optimization
- vLLM Platform Plugin: Seamlessly integrates with vLLM's plugin architecture via the vllm.platform_plugins entry point
- CPU-Optimized: Configured for efficient CPU-based inference with vLLM 0.15.1+cpu
Architecture
The plugin registers itself as a vLLM platform plugin through the vllm.platform_plugins entry point group. This allows vLLM to automatically discover and load the plugin, enabling torch-spyre-based compilation and execution.
Getting Started
To get started with vllm-spyre-next, see the Installation Guide.
Documentation
- Installation - Setup and installation instructions
Requirements
- Python >= 3.11
- torch-spyre (built from source)
- vLLM 0.15.1+cpu
- PyTorch 2.10.0 (CPU version)
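A quick way to compare an environment against the list above is to read the installed package metadata. This is a rough sketch under stated assumptions: the distribution names torch, vllm, and torch-spyre are guesses at how the packages are registered (a source build of torch-spyre may use a different name), and the helper names are hypothetical. It reports versions only and does not decide whether they are compatible.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; adjust if your builds register differently.
REQUIRED = ("torch", "vllm", "torch-spyre")


def installed_versions(names=REQUIRED) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def python_ok(minimum=(3, 11)) -> bool:
    """Check the running interpreter against the minimum Python version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print(f"Python >= 3.11: {python_ok()}")
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```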
License
Apache 2.0