# omni_infer

**Repository Path**: omniai/omniinfer

## Basic Information

- **Project Name**: omni_infer
- **Description**: Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature set.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 141
- **Forks**: 466
- **Created**: 2025-06-06
- **Last Updated**: 2025-12-15

## Categories & Tags

**Categories**: ai

**Tags**: None

## README

# Omni_Infer: Inference Accelerators for Ascend NPU

Omni_Infer is a suite of inference accelerators tailored for the Ascend NPU platform. It is fully compatible with vLLM and designed to deliver high-performance, enterprise-grade inference with native support and a growing feature set.

## Key Features

- **Enterprise-Grade Low-Latency P/D Scheduling**: xPyD scheduling with scale-out support for large-scale, disaggregated prefill/decode (P/D) deployments, ensuring minimal latency. Refer to [Global Proxy Design](omni/accelerators/sched/global_proxy/README.md) for details.
- **Request-Level Load Balancing**: Optimizes the prefill and decode phases for maximum throughput and low latency across all sequence lengths.
- **Optimized MoE Expert Deployment**: Supports large-scale Mixture of Experts (MoE) models with EP144/EP288 configurations.
- **MoE Expert Load Balancing**: Provides layer-wise uneven redundancy and near-real-time dynamic expert placement for efficient resource utilization. Refer to [OmniPlacement Design](omni/accelerators/placement/README.md) for details.
- **Advanced Attention Optimizations**: Tailored for LLM, MLLM, and MoE models, enhancing performance and scalability.

## High-Level Architecture

![Omni_Infer high-level architecture](/docs/figures/omni_infer_arch.png)

## Getting Started

For an example of P/D disaggregation and rapid deployment, see the [quick start guide](docs/omni_infer_quick_start.md). To integrate Omni_Infer into your project, refer to the [installation guide](docs/omni_infer_installation_guide.md) and the [documentation](docs/) for detailed setup instructions and API references. A minimal vLLM usage sketch appears at the end of this README.

## Contributing

We welcome contributions to enhance Omni_Infer! Please check our [contributing guidelines](./CONTRIBUTION.md) and submit pull requests or issues via [Gitee Issues](https://gitee.com/omniai/omniinfer/issues/new?issue%5Bassignee_id%5D=0&issue%5Bmilestone_id%5D=0).

## License

Omni_Infer is licensed under the [MIT License](LICENSE).
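
## Appendix: Minimal Usage Sketch

As a point of reference for the vLLM compatibility described above, the sketch below shows a minimal vLLM offline-inference script. This is illustrative only: it assumes an environment where Omni_Infer's Ascend NPU support has been installed alongside vLLM per the installation guide, the model path is a placeholder, and no Omni_Infer-specific options (such as P/D scheduling or expert placement configuration) are shown.

```python
# Illustrative vLLM offline-inference sketch. Assumes Omni_Infer's
# Ascend NPU support is installed alongside vLLM per the installation
# guide; the model path below is a placeholder.
from vllm import LLM, SamplingParams

prompts = [
    "What is mixture-of-experts routing?",
    "Summarize the benefits of disaggregated prefill/decode serving.",
]

# Standard vLLM entry points; Omni_Infer-specific scheduling and
# placement options are configured separately and not shown here.
llm = LLM(model="/path/to/your-model")
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)
```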