Data centre investment boom: record cheques, but can we build – and run – them?
report, insights, operations Albert Wong

Australia's data centre gold rush is well underway, with Amazon, Microsoft, and Anthropic committing tens of billions to new facilities. But the real challenge isn't signing the cheques; it's securing the grid capacity, water for cooling, and the skilled operators needed to keep these complex facilities running for the next two decades.

Value is not extracted from a data centre by its architecture; it is extracted by its operations. Without a world-class operations team, even the most impressive facility is just an expensive, inert building. Building is the easy part. Running it for the next twenty years is where the real battle begins.

It’s not your UPS. It’s not your generator. So why is your data centre still going dark?
operations, uptime, insights Albert Wong

Uptime Institute's 2026 outage analysis delivers a clear warning: after years of steady gains, reliability improvements are stalling. Power remains the leading cause of impactful outages, but the biggest emerging threats now sit outside the fence line: fibre cuts, grid constraints, and third‑party failures are all on the rise. Human error is a factor in the vast majority of incidents, and most outages could have been prevented with better processes. Costs keep climbing, with a growing share of outages now reaching seven figures. Meanwhile, confidence in public cloud resiliency is falling, and AI workloads are introducing new, poorly understood risks. If your resilience strategy still focuses only on internal systems, you're already behind.

Beyond the Plumbing: Engineering Direct-to-Chip Cooling for AI Workloads
uptime, engineering, operations Albert Wong

The Hidden Engineering Challenge of Direct‑to‑Chip Cooling

AI workloads don’t just run hotter – they run differently. Training a large language model can ramp GPU utilisation from 60% to 100% and back down within milliseconds, pushing coolant temperatures above 45°C in closed loops. That rapid thermal cycling demands response times measured in seconds, not minutes.

Direct‑to‑Chip (D2C) liquid cooling is the industry’s answer, but it introduces new risks: fluid inches from $40,000 GPUs, hundreds of potential leak points, and coolant chemistry that can corrode piping from the inside out.

And if a cooling anomaly strikes? You have roughly 5–10 seconds before the silicon throttles – or crashes a multi‑day training job.

Traditional data centre operations weren't built for this. Managing D2C requires fluid chemistry expertise, concurrent maintenance procedures for live liquid loops, and unified IT‑facilities alarm chains.

That’s the new engineering reality of AI infrastructure.

From Static Inventory to Lifecycle Intelligence
uptime, operations, insights Albert Wong

In a recent post, Uptime Institute highlighted that critical spares management is no longer a static decision - it's a moving target. Operators are shifting toward hybrid strategies that blend on-site stock with vendor agreements, but the real gap is lifecycle intelligence: knowing where each asset stands in its service life so spares, maintenance, and replacement plans evolve accordingly.

From Air to Liquid Fire: Building the AI Factory - Why Old-School Data Centres Just Lost Their Cool
insights, design, maintenance, operations Albert Wong

Building an AI factory is nothing like building a traditional cloud data centre. Cloud racks run at 10-20 kW; AI racks scream past 120 kW. That changes everything: power, cooling, and especially the build process.

Forget stick-built construction. AI factories demand prefabricated Power Train Units (PTUs) - factory-assembled electrical vaults craned into place and operational in days, not months. Liquid cooling replaces air, forcing vendor lock-in and component-level compatibility testing.

On certification: pursue Tier III for concurrent maintainability (service without shutdown), but accept N+1 cooling rather than 2N fault tolerance. Pure Tier IV doubles your piping and leak points for marginal gain.
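
To make the N+1 versus 2N trade-off concrete, here is a minimal sketch that counts installed cooling units under each scheme. The load and unit capacity are illustrative assumptions, and the function names are hypothetical; actual Tier certification also depends on distribution paths and other criteria, not unit counts alone.

```python
import math


def units_required(load_kw: float, unit_capacity_kw: float) -> int:
    """N: the minimum number of units needed to carry the load."""
    return math.ceil(load_kw / unit_capacity_kw)


def installed_units(n: int, scheme: str) -> int:
    """Installed unit count for a redundancy scheme ("N+1" or "2N")."""
    if scheme == "N+1":
        return n + 1   # one spare unit on top of the required N
    if scheme == "2N":
        return 2 * n   # a fully duplicated set of units
    raise ValueError(f"unknown redundancy scheme: {scheme}")


if __name__ == "__main__":
    # Assumed example: 1,200 kW of heat load served by 300 kW cooling units.
    n = units_required(load_kw=1200, unit_capacity_kw=300)
    for scheme in ("N+1", "2N"):
        print(scheme, installed_units(n, scheme))
```

In this assumed case N is 4, so N+1 installs 5 units while 2N installs 8, with correspondingly more piping and connection points to maintain.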

Post-build, operations shift from IT to industrial engineering. Methods of Procedure (MOPs) govern every valve turn. Programmed maintenance runs every 2-3 weeks. Your technicians now need fluid dynamics literacy.

The verdict? The cloud was built on air. The AI factory runs on liquid, modular steel, and surgical precision. Build accordingly.

The Uptime Institute 2026 Vendor Survey: 3 Hard Truths About Data Centre Outages
survey, uptime, insights, operations Albert Wong

The Uptime Institute's 2026 Vendor Survey reveals three hard truths: AI is mostly used for monitoring (54%) and predictive maintenance (44%) - not fixing problems. Cost savings (56%) and energy efficiency (55%) are the top metrics, not uptime. And human error (30%) and power failures (25%) still cause most outages.

Outage frequency may be declining, but the cost of each outage is rising: one in five now exceeds $1 million.

Beyond PUE: Why Total Power Usage Effectiveness (TUE) Is the Metric Liquid Cooling Has Been Waiting For
insights, operations Albert Wong

PUE has been the data centre efficiency standard for years, but it was never designed for liquid cooling. Total Power Usage Effectiveness (TUE) goes deeper - capturing losses not just at the facility level, but inside the servers themselves. With the Uptime Institute’s 2025 survey showing a stagnant average PUE of 1.54, it’s time for a better metric. This article explains what a good TUE looks like, how it compares to PUE, and why liquid‑cooled data centres need TUE to measure what truly matters.
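
As a rough illustration of how the two metrics relate, the sketch below uses the commonly cited definition TUE = ITUE × PUE, where ITUE is the ratio of power delivered to IT equipment to power reaching the compute components. The function names are hypothetical, and apart from the 1.54 PUE quoted above, the power figures are assumptions chosen for illustration.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Facility-level overhead: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw


def itue(it_equipment_kw: float, compute_kw: float) -> float:
    """Server-level overhead: IT equipment power / power reaching compute
    components (losses include server fans, power supplies, regulators)."""
    return it_equipment_kw / compute_kw


def tue(total_facility_kw: float, it_equipment_kw: float, compute_kw: float) -> float:
    """TUE = ITUE x PUE, i.e. total facility power / compute power."""
    return pue(total_facility_kw, it_equipment_kw) * itue(it_equipment_kw, compute_kw)


if __name__ == "__main__":
    # Assumed example: 1,540 kW facility draw, 1,000 kW at the IT equipment,
    # of which 800 kW reaches the compute components.
    print(f"PUE  = {pue(1540, 1000):.3f}")
    print(f"ITUE = {itue(1000, 800):.3f}")
    print(f"TUE  = {tue(1540, 1000, 800):.3f}")
```

Under these assumed numbers, a facility reporting a PUE of 1.54 would have a TUE of about 1.93 once in-server losses are counted, which is the gap PUE alone does not show.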

Why Operations Can’t Be an Afterthought: The 20-Year Lesson from 2014
operations, insights Albert Wong

The 20‑Year Payoff: Why Operations Must Lead from Day One

A 2014 Uptime Institute article, “Best Practice Is to Start With the End in Mind,” made a deceptively simple argument: involve data centre operations at the very beginning of any capital project. More than a decade later, that advice is more urgent than ever.

Why? Because a data centre’s design and construction typically take 1‑2 years, but its operational life spans 20+ years. Yet operations is still often treated as an afterthought—brought in only after commissioning, when the biggest opportunities to shape maintainability, efficiency, and total cost of ownership have already passed.

As the article’s author, Lee Kirby, noted: if value engineering happens without operations input, “increased costs over the life of the data center may dwarf any initial savings.” In other words, value is extracted from a data centre by operations, not by the original build.

The solution is to embed operations expertise from day one—and to equip teams with the right frameworks. Turn a decade‑old insight into a 20‑year advantage.

Beyond the Headlines: Why the Iran War Makes Operational Sustainability a Strategic Imperative
operations Albert Wong

Energy Crisis & Operational Sustainability: Why the Iran War Changes the Game

Global energy markets are volatile, and the recent Iran‑related conflict has pushed price stability and supply security to the forefront for Australian critical infrastructure operators. In this environment, relying on carbon offsets alone is not enough—resilience starts with facility and IT efficiency.

As the Uptime Institute’s 2025 article, “The Two Sides of a Sustainability Strategy,” makes clear, operators must prioritise operational fundamentals: optimising PUE, reducing water and energy use, and building fuel flexibility. These measures directly insulate facilities from energy shocks, while ecosystem initiatives alone do not.

To build this capability, skilled teams and disciplined processes are essential. The Uptime Institute’s 2014 Operational Sustainability framework provides the proven blueprint.
