Introduction

Kubernetes provides access to special hardware resources (such as Ascend NPUs) through device plugins. However, configuring and managing a node that has these resources requires multiple software components, such as drivers, container runtimes, and other libraries. Installing these components is complex and error-prone. The NPU Operator uses the Operator Framework in Kubernetes to automatically manage all software components required to configure Ascend devices. These components include the Ascend driver and firmware (which support the entire running process of clusters), as well as the MindCluster device plugin (which supports cluster operations such as job scheduling, O&M monitoring, and fault recovery). Installing these components lets you manage NPU resources, optimize workload scheduling, and containerize training and inference tasks, so that AI jobs can be deployed and run on NPU devices as containers.
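
To illustrate how a workload consumes hardware that a device plugin advertises, here is a minimal Python sketch that builds a Pod manifest requesting NPUs as a Kubernetes extended resource. The resource name huawei.com/Ascend910, the helper npu_pod_manifest, and the image name are assumptions made for illustration only; the actual resource name depends on what the device plugin registers on your nodes (it appears in the node's allocatable resources).

```python
import json

# Assumption: illustrative resource name. Check the node's allocatable
# resources to see what the device plugin on your cluster really advertises.
NPU_RESOURCE = "huawei.com/Ascend910"


def npu_pod_manifest(name: str, image: str, npu_count: int = 1) -> dict:
    """Build a Pod manifest that requests npu_count NPUs (hypothetical helper)."""
    if npu_count < 1:
        raise ValueError("npu_count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are whole integers. Setting only
                    # limits is enough: Kubernetes uses the limit as the request.
                    "resources": {"limits": {NPU_RESOURCE: npu_count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image name; replace with a real training or inference image.
    manifest = npu_pod_manifest("npu-demo", "example.com/train:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with kubectl apply -f. The scheduler places the Pod only on a node where the requested number of NPUs is still allocatable, which is why the driver, firmware, and device plugin must already be installed on that node.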

For more details, refer to NPU Operator.