Google I/O 2024: An I/O for a new generation

Infrastructure for the AI era: Introducing Trillium

Training state-of-the-art models requires a lot of computing power. Industry demand for ML compute has grown by a factor of 1 million in the last six years. And every year, it increases tenfold.

Google was built for this. For 25 years, we’ve invested in world-class technical infrastructure, from the cutting-edge hardware that powers Search to the custom tensor processing units (TPUs) that power our AI advances.

Gemini was trained and served entirely on our fourth- and fifth-generation TPUs. And other leading AI companies, including Anthropic, have trained their models on TPUs as well.

Today, we’re excited to announce our sixth generation of TPUs, called Trillium. Trillium is our most performant and most efficient TPU to date, delivering a 4.7x improvement in compute performance per chip over the previous generation, TPU v5e.

We’ll make Trillium available to our Cloud customers in late 2024.

Alongside our TPUs, we’re proud to offer CPUs and GPUs to support any workload. That includes the new Axion processors we announced last month, our first custom Arm-based CPUs, which deliver industry-leading performance and energy efficiency.

We’re also proud to be one of the first Cloud providers to offer NVIDIA’s cutting-edge Blackwell GPUs, which will be available in early 2025. We’re fortunate to have a longstanding partnership with NVIDIA, and are excited to bring Blackwell’s breakthrough capabilities to our customers.

Chips are a foundational part of our integrated end-to-end system, which spans performance-optimized hardware, open software, and flexible consumption models. This all comes together in our AI Hypercomputer, a groundbreaking supercomputer architecture.

Businesses and developers are using it to tackle more complex challenges, with more than twice the efficiency relative to just buying the raw hardware and chips. Our AI Hypercomputer advancements are made possible in part because of our approach to liquid cooling in our data centers.

We’ve been doing this for nearly a decade, long before it became state-of-the-art for the industry. And today our total deployed fleet capacity for liquid cooling systems is nearly 1 gigawatt and growing, which is close to 70 times the capacity of any other fleet.

Underlying this is the sheer scale of our network, which connects our infrastructure globally. Our network spans more than 2 million miles of terrestrial and subsea fiber: over 10 times (!) the reach of the next leading cloud provider.

We will keep making the investments necessary to advance AI innovation and deliver state-of-the-art capabilities.
