aigw module v0.0.0-...-89a2ad8
Published: Feb 12, 2026 License: Apache-2.0


AIGW
The Intelligent Inference Scheduler for Large-scale Inference Services


About

AIGW is an intelligent inference scheduler for large-scale inference services. It uses a global routing solution that is aware of load, KVCache, and LoRA to provide intelligent routing, overload protection, and multi-tenant QoS. This helps achieve higher throughput, lower latency, and more efficient use of resources.
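To make the routing idea more concrete, here is a minimal Go sketch of a candidate-filtering step that could run before endpoint scoring. It covers two of the signals mentioned above: overload protection and LoRA awareness. The `Endpoint` type, its fields, the `filterCandidates` function, and the queue-length threshold are all hypothetical and are not taken from AIGW's code. A real scheduler would populate these values from live backend metrics.

```go
package main

import "fmt"

// Endpoint is a hypothetical view of one inference backend as seen by the
// scheduler; the field names are illustrative, not AIGW's actual types.
type Endpoint struct {
	Addr           string
	QueueLen       int             // requests currently waiting on this backend
	LoadedAdapters map[string]bool // LoRA adapters resident on this backend
}

// filterCandidates drops overloaded endpoints (overload protection) and, when
// a LoRA adapter is requested, prefers endpoints that already hold it. If no
// healthy endpoint has the adapter, all healthy endpoints are returned so the
// request can still be served after an adapter load. An empty result means
// every backend is saturated.
func filterCandidates(eps []Endpoint, adapter string, maxQueue int) []Endpoint {
	var healthy, warm []Endpoint
	for _, ep := range eps {
		if ep.QueueLen >= maxQueue {
			continue // shed load away from saturated backends
		}
		healthy = append(healthy, ep)
		// Reading from a nil map is safe in Go and yields false.
		if adapter != "" && ep.LoadedAdapters[adapter] {
			warm = append(warm, ep)
		}
	}
	if len(warm) > 0 {
		return warm
	}
	return healthy
}

func main() {
	eps := []Endpoint{
		{Addr: "10.0.0.1:8000", QueueLen: 3, LoadedAdapters: map[string]bool{"sql-lora": true}},
		{Addr: "10.0.0.2:8000", QueueLen: 40},
		{Addr: "10.0.0.3:8000", QueueLen: 1},
	}
	for _, ep := range filterCandidates(eps, "sql-lora", 32) {
		fmt.Println(ep.Addr) // prints "10.0.0.1:8000"
	}
}
```

When every backend is over the threshold, the empty result lets the caller reject or queue the request instead of adding more work to saturated replicas.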

Status

Early stage and under rapid development.

Architecture

[Figure: AIGW architecture diagram]

Highlights

  1. A flexible, powerful, and easy-to-maintain Envoy Golang extension
  2. Near real-time load metric collection
  3. A balanced multi-factor composite decision-making algorithm
  4. A highly available architecture that supports horizontal scaling
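A common way to implement a multi-factor composite decision is to compute a weighted score for each endpoint. The Go sketch below assumes hypothetical metrics (queue length, KV cache utilization, and estimated prefix cache hit ratio) and hand-picked weights. The `Metrics`, `Weights`, `score`, and `pick` names are illustrative, and AIGW's actual factors and formula may differ.

```go
package main

import (
	"fmt"
	"math"
)

// Metrics is a hypothetical per-endpoint snapshot that a near real-time
// collector would refresh; field names and ranges are assumptions.
type Metrics struct {
	Addr        string
	QueueLen    int     // pending requests on the endpoint
	KVCacheUtil float64 // fraction of KV cache memory in use, 0..1
	PrefixHit   float64 // estimated fraction of the prompt already cached, 0..1
}

// Weights sets how strongly each factor counts; the values are tuning knobs.
type Weights struct {
	Queue, KVUtil, PrefixHit float64
}

// score is higher-is-better: cache hits add to it, while queueing and KV
// pressure subtract. Queue length is divided by the largest queue among the
// candidates so every term lies on a comparable 0..1 scale.
func score(m Metrics, maxQueue int, w Weights) float64 {
	queueTerm := 0.0
	if maxQueue > 0 {
		queueTerm = float64(m.QueueLen) / float64(maxQueue)
	}
	return w.PrefixHit*m.PrefixHit - w.Queue*queueTerm - w.KVUtil*m.KVCacheUtil
}

// pick returns the address of the best-scoring candidate, or "" if none.
func pick(cands []Metrics, w Weights) string {
	maxQueue := 0
	for _, c := range cands {
		if c.QueueLen > maxQueue {
			maxQueue = c.QueueLen
		}
	}
	best, bestScore := "", math.Inf(-1)
	for _, c := range cands {
		if s := score(c, maxQueue, w); s > bestScore {
			best, bestScore = c.Addr, s
		}
	}
	return best
}

func main() {
	cands := []Metrics{
		{Addr: "A", QueueLen: 10, KVCacheUtil: 0.9, PrefixHit: 0.8},
		{Addr: "B", QueueLen: 2, KVCacheUtil: 0.3, PrefixHit: 0.0},
		{Addr: "C", QueueLen: 4, KVCacheUtil: 0.5, PrefixHit: 0.6},
	}
	w := Weights{Queue: 1.0, KVUtil: 0.5, PrefixHit: 1.0}
	fmt.Println(pick(cands, w)) // prints "C"
}
```

With these sample weights, endpoint C wins (score -0.05, versus -0.35 for B and -0.65 for A). B has the shortest queue and A has the best cache hit ratio, but C offers the best balance, which is the kind of trade-off a composite score is meant to capture.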

Developer Guide

See the Developer Guide.

Community

AIGW is built on Envoy and Istio, and we sincerely thank both projects.

Roadmap

  1. Precise cache-awareness
  2. SLO-aware algorithm based on latency prediction
  3. Prefill/decode (PD) disaggregation scheduling
  4. Data-parallel (DP) level scheduling
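For the precise cache-awareness item, one widely used technique, similar in spirit to vLLM's automatic prefix caching, is to hash the prompt in fixed-size token blocks, chain each block's hash to the previous one, and count how many leading blocks an endpoint already holds. The Go sketch below illustrates only that general idea. The block size, the choice of FNV-1a, and the `blockHashes` and `matchedBlocks` helpers are assumptions, not AIGW's design.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// blockHashes splits a token sequence into fixed-size blocks and returns one
// chained hash per full block. Each hash covers the block's tokens plus the
// previous block's hash, so equal hashes imply equal prefixes (barring
// collisions). A trailing partial block is ignored.
func blockHashes(tokens []uint32, blockSize int) []uint64 {
	if blockSize <= 0 {
		return nil
	}
	var out []uint64
	var prev uint64
	var b [8]byte
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		h := fnv.New64a()
		binary.LittleEndian.PutUint64(b[:], prev)
		h.Write(b[:]) // chain in the parent block's hash
		for _, t := range tokens[start : start+blockSize] {
			binary.LittleEndian.PutUint32(b[:4], t)
			h.Write(b[:4])
		}
		prev = h.Sum64()
		out = append(out, prev)
	}
	return out
}

// matchedBlocks counts how many leading request blocks appear in an
// endpoint's cache index. It stops at the first miss because cached KV
// entries are only reusable up to the first divergent block.
func matchedBlocks(req []uint64, cached map[uint64]bool) int {
	n := 0
	for _, h := range req {
		if !cached[h] {
			break
		}
		n++
	}
	return n
}

func main() {
	const blockSize = 4
	cached := map[uint64]bool{}
	for _, h := range blockHashes([]uint32{1, 2, 3, 4, 5, 6, 7, 8}, blockSize) {
		cached[h] = true
	}
	req := blockHashes([]uint32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, blockSize)
	fmt.Printf("%d of %d blocks cached\n", matchedBlocks(req, cached), len(req)) // prints "2 of 3 blocks cached"
}
```

Chaining the hashes means a block with identical tokens but a different prefix produces a different hash, so a router cannot mistake a mid-prompt match for a reusable prefix.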

License

This project is licensed under the Apache 2.0 License.
