Neural processing unit

A neural processing unit (NPU), also known as AI accelerator or deep learning processor, is a class of specialized hardware accelerator[1] or computer system[2][3] designed to accelerate artificial intelligence (AI) and machine learning applications, including artificial neural networks and computer vision.
Use
Their purpose is either to efficiently execute already trained AI models (inference) or to train AI models. Their applications include algorithms for robotics, the Internet of things, and data-intensive or sensor-driven tasks.[4] They are often manycore or spatial designs and focus on low-precision arithmetic, novel dataflow architectures, or in-memory computing capability. As of 2024, a widely used datacenter-grade AI integrated circuit, the Nvidia H100 GPU, contains tens of billions of MOSFETs.[5]
Consumer devices
AI accelerators are used in mobile devices such as Apple iPhones and Huawei and Google Pixel smartphones,[7] and in AMD AI engines[6] in Versal devices and NPUs. They appear in many Apple silicon, Qualcomm, Samsung, and Google Tensor smartphone processors.[8]
NPUs have more recently (circa 2022) been added to computer processors from Intel,[9] AMD,[10] and Apple silicon.[11] All models of Intel Meteor Lake processors have a built-in versatile processor unit (VPU) for accelerating inference for computer vision and deep learning.[12]
On consumer devices, the NPU is intended to be small and power-efficient, yet reasonably fast when running small models. To that end, NPUs are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common performance metric is trillions of operations per second (TOPS), though this metric alone does not indicate which kinds of operations are being performed.[13]
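The low-bitwidth support described above usually means that FP32 model weights are quantized before being run on the NPU. A minimal sketch of symmetric INT8 quantization, using NumPy and illustrative weight values (not taken from any particular model), shows the basic scale-and-round step and the resulting rounding error:

```python
import numpy as np

# Illustrative FP32 weights (hypothetical values).
w = np.array([0.31, -1.24, 0.07, 2.85, -0.66], dtype=np.float32)

# Symmetric quantization: map the largest magnitude to 127.
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)

# Dequantize to measure the rounding error introduced.
w_restored = q.astype(np.float32) * scale

print(q)                               # INT8 values the NPU would operate on
print(np.abs(w - w_restored).max())    # error is bounded by ~scale/2
```

The INT8 values can then be fed to the NPU's integer matrix units; the per-tensor `scale` factor is carried alongside and applied when converting results back to floating point.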
Datacenters
Accelerators are used in cloud computing servers: e.g., tensor processing units (TPU) for Google Cloud Platform,[14] and Trainium and Inferentia chips for Amazon Web Services.[15] Many vendor-specific terms exist for devices in this category, and it is an emerging technology without a dominant design.
Since the late 2010s, graphics processing units designed by companies such as Nvidia and AMD often include AI-specific hardware in the form of dedicated functional units for low-precision matrix-multiplication operations. These GPUs are commonly used as AI accelerators, both for training and inference.[16]
Scientific computation
Although NPUs are tailored for low-precision (e.g. FP16, INT8) matrix multiplication operations, they can be used to emulate higher-precision matrix multiplications in scientific computing. Because modern GPUs devote much of their design effort to making these low-precision units fast, emulated FP64 (the Ozaki scheme) on NPUs can potentially outperform native FP64: this has been demonstrated using FP16-emulated FP64 on the NVIDIA TITAN RTX and using INT8-emulated FP64 on NVIDIA consumer GPUs and the A100 GPU. (Consumer GPUs benefit especially from this scheme, as they have little native FP64 hardware capacity, showing a 6× speedup.)[17] Since CUDA Toolkit 13.0 Update 2, cuBLAS automatically uses INT8-emulated FP64 matrix multiplication of the equivalent precision when it is faster than native. This is in addition to the FP16-emulated FP32 feature introduced in version 12.9.[18]
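The core idea behind such emulation can be sketched in plain NumPy. This is a simplified two-slice illustration, not the full Ozaki scheme: each FP32 value is split into an FP16 "head" and an FP16 "tail", the partial products are computed from FP16 inputs with FP32 accumulation (mirroring how tensor cores operate), and their sum recovers far more accuracy than a direct FP16 dot product:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(256).astype(np.float32)
b = rng.standard_normal(256).astype(np.float32)

# Split each FP32 value into an FP16 head and an FP16 tail (the residual).
a_hi = a.astype(np.float16)
a_lo = (a - a_hi.astype(np.float32)).astype(np.float16)
b_hi = b.astype(np.float16)
b_lo = (b - b_hi.astype(np.float32)).astype(np.float16)

def dot16(x, y):
    # Models a tensor-core dot product: FP16 inputs, FP32 accumulation.
    return np.dot(x.astype(np.float32), y.astype(np.float32))

# Sum of the partial products reconstructs a high-precision result.
emulated = (dot16(a_hi, b_hi) + dot16(a_hi, b_lo)
            + dot16(a_lo, b_hi) + dot16(a_lo, b_lo))

naive = float(np.dot(a.astype(np.float16), b.astype(np.float16)))  # direct FP16
exact = float(np.dot(a.astype(np.float64), b.astype(np.float64)))  # FP64 reference

print(abs(emulated - exact))  # small: near-FP32 accuracy from FP16 hardware
print(abs(naive - exact))     # typically much larger
```

The production schemes use more slices and integer arithmetic to reach full FP64 accuracy, but the splitting-and-recombining structure is the same.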
Programming
Mobile NPU vendors typically provide their own application programming interface such as the Snapdragon Neural Processing Engine. An operating system or a higher-level library may provide a more generic interface such as TensorFlow Lite with LiteRT Next (Android) or CoreML (iOS, macOS).
Consumer CPU-integrated NPUs are accessible through vendor-specific APIs. AMD (Ryzen AI), Intel (OpenVINO), and Apple silicon (CoreML)[a] each provide their own APIs, which higher-level libraries can build upon.
GPUs generally use existing GPGPU pipelines such as CUDA and OpenCL adapted for lower precisions and specialized matrix-multiplication operations. Vulkan is also being used. Custom-built systems such as the Google TPU use private interfaces.
A large number of separate underlying acceleration APIs and compilers/runtimes are in use in the AI field, greatly increasing software development effort because of the many combinations involved. As of 2025, the open standards organization Khronos Group is pursuing standardization of AI-related interfaces to reduce the amount of work needed. Khronos is working on three separate fronts: expanding the data types and intrinsic operations in OpenCL and Vulkan, adding compute graphs to SPIR-V, and an NNEF/SkriptND file format for describing a neural network.[19]
Notes
- ^ MLX builds atop the CPU and GPU parts, not the Apple Neural Engine (ANE) part of Apple Silicon chips. The relatively good performance is due to the use of a large, fast unified memory design.
References
External links
- Nvidia Puts The Accelerator To The Metal With Pascal, The Next Platform
- Eyeriss Project, Massachusetts Institute of Technology