The NVIDIA A30 Tensor Core GPU delivers accelerated performance for every enterprise workload. With NVIDIA Ampere architecture Tensor Cores and Multi-Instance GPU (MIG), it provides secure speedups across diverse workloads, from AI inference at scale to high-performance computing (HPC) applications. By combining fast memory bandwidth with low power consumption in a PCIe form factor, optimal for mainstream servers, A30 enables an elastic data center and delivers maximum value for organizations.
The NVIDIA Ampere architecture is part of the unified NVIDIA EGX™ platform, which incorporates building blocks across hardware, networking, software, and libraries, along with optimized AI models and applications from the NVIDIA NGC™ catalog. As the most powerful end-to-end AI and HPC platform for data centers, it enables researchers to rapidly deliver real-world results and deploy solutions at scale.
Deep Learning Training
Tackling the next-level challenges of conversational AI requires massive compute capacity and scalability.
NVIDIA A30 Tensor Cores with Tensor Float 32 (TF32) deliver up to 10X higher performance over the NVIDIA T4 with zero code changes, and automatic mixed precision with FP16 adds a further 2X, for a combined 20X throughput gain. Combined with NVIDIA NVLink, PCIe Gen4, NVIDIA networking, and the NVIDIA Magnum IO SDK, A30 can scale to thousands of GPUs.
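To make the zero-code-change point concrete, here is a minimal PyTorch sketch (the framework choice is an assumption; the text names none) showing the explicit TF32 switches plus the automatic-mixed-precision path behind the additional 2X:

```python
import torch

# TF32 runs FP32 models on Ampere Tensor Cores with no change to the model
# code itself; these flags make the opt-in explicit (defaults vary across
# PyTorch versions).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # automatic mixed precision for the extra 2X

x = torch.randn(64, 1024, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).square().mean()  # matmuls dispatch to Tensor Cores
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```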
A30’s Tensor Cores and MIG let it handle diverse workloads dynamically: it can serve production inference during peak demand, and during off-peak hours a portion of the GPU can be repurposed to rapidly retrain those same models.
NVIDIA set multiple performance records in MLPerf, the industry-wide benchmark for AI training.
Deep Learning Inference
A30 introduces innovative features to accelerate inference workloads. It supports a full range of precisions, from FP64 down through TF32 to INT4. By allowing up to four MIG instances to run concurrently on each GPU, A30 lets multiple networks operate simultaneously, each with guaranteed quality of service (QoS). On top of those gains, A30’s structural sparsity support delivers up to 2X more inference performance.
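To illustrate what structural sparsity means here: Ampere’s sparse Tensor Cores accelerate weights pruned to a 2:4 pattern, at most two nonzeros in every group of four consecutive values. The NumPy sketch below enforces that pattern; it demonstrates the layout only, not the hardware execution path:

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Zero the two smallest-magnitude values in every group of four,
    yielding the 2:4 pattern that sparse Tensor Cores can accelerate."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| entries within each group of four.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.random.randn(8, 16).astype(np.float32)
sparse_w = prune_2_4(w)
# Every group of four now holds at most two nonzero weights.
assert (sparse_w.reshape(-1, 4) != 0).sum(axis=1).max() <= 2
```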
MLPerf Inference demonstrated NVIDIA’s market-leading AI performance. Combined with NVIDIA Triton Inference Server, which makes it simple to deploy AI at scale, A30 brings this game-changing performance to every enterprise.
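As a sketch of what serving through Triton looks like from the client side, the snippet below uses the tritonclient Python package against a server assumed to be running on localhost:8000; the model name and tensor names ("my_model", "INPUT0", "OUTPUT0") are hypothetical placeholders:

```python
import numpy as np
import tritonclient.http as httpclient

# Assumed setup: a Triton server on localhost:8000 serving a model named
# "my_model" with one FP32 input "INPUT0" of shape [1, 1024].
client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 1024).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))  # output tensor name is likewise assumed
```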
High-Performance Computing
Scientists use simulations to better understand the world around us and to unlock next-generation discoveries.
The NVIDIA A30 features FP64 NVIDIA Ampere architecture Tensor Cores, which deliver the largest leap in HPC performance since the introduction of GPUs. Paired with 24 gigabytes (GB) of GPU memory and 933 gigabytes per second (GB/s) of bandwidth, they let researchers complete double-precision calculations quickly. HPC applications can also leverage TF32 for higher throughput on single-precision, dense matrix-multiply operations.
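A small PyTorch sketch (again an assumed framework) of the two precision paths just described: a double-precision product that maps to FP64 Tensor Cores on A30, and the same product in FP32 with TF32 enabled for throughput:

```python
import torch

n = 4096
a = torch.randn(n, n, dtype=torch.float64, device="cuda")
b = torch.randn(n, n, dtype=torch.float64, device="cuda")
c64 = a @ b  # double-precision GEMM; runs on FP64 Tensor Cores on A30

# Single-precision dense matmuls can opt into TF32 for higher throughput.
torch.backends.cuda.matmul.allow_tf32 = True
c32 = a.float() @ b.float()

# TF32 keeps FP32 range but trims mantissa bits, so results differ slightly.
print((c64 - c32.double()).abs().max())
```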
Combining FP64 Tensor Cores with MIG lets research institutions securely partition a GPU so that multiple researchers get guaranteed QoS while GPU utilization stays at its maximum. During peak demand, enterprises can use A30 for inference, then repurpose the same compute servers for HPC and AI training workloads during off-peak periods.
High-Performance Data Analytics
Data scientists must be able to analyze, visualize, and interpret large datasets. However, scale-out solutions are often bogged down as datasets end up scattered across multiple servers.
To tackle these tasks, accelerated servers with A30 provide the requisite compute power, along with large HBM2 memory, 933GB/s of memory bandwidth, and scalability via NVLink. Combined with NVIDIA InfiniBand, NVIDIA Magnum IO, and the RAPIDS™ suite of open-source libraries, including the RAPIDS Accelerator for Apache Spark, the NVIDIA data center platform accelerates these massive workloads at unprecedented levels of performance and efficiency.
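As a minimal illustration of that workflow, the cuDF sketch below runs a group-by aggregation entirely on the GPU; the dataset and column names are invented for the example:

```python
import cudf
import numpy as np

# Hypothetical dataset: one million synthetic sales rows loaded onto the GPU.
df = cudf.DataFrame({
    "region": ["east", "west", "north", "south"] * 250_000,
    "amount": np.arange(1_000_000, dtype="float64"),
})

# The group-by aggregation executes on the GPU through RAPIDS cuDF.
summary = df.groupby("region")["amount"].agg(["sum", "mean"])
print(summary)
```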
Enterprise-Ready Utilization
A30 with MIG maximizes the utilization of GPU-accelerated infrastructure. With MIG, an A30 GPU can be partitioned into as many as four independent instances, giving more users access to GPU acceleration.
MIG works with Kubernetes, containers, and hypervisor-based server virtualization. It lets infrastructure managers offer a right-sized GPU with guaranteed QoS for every job, extending the reach of accelerated computing to every user.
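MIG instances are typically managed through the nvidia-smi CLI. The Python sketch below shells out to it to enable MIG mode and list the available profiles and created instances; it assumes an A30 with a MIG-capable driver and administrative privileges:

```python
import subprocess

def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, raising on a nonzero exit code."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Enable MIG mode on GPU 0 (requires admin rights and may need a GPU reset).
print(run(["nvidia-smi", "-i", "0", "-mig", "1"]))

# List the GPU-instance profiles the driver offers (e.g. four 1g.6gb slices
# on a 24GB A30), then the instances currently created.
print(run(["nvidia-smi", "mig", "-lgip"]))
print(run(["nvidia-smi", "mig", "-lgi"]))
```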