It’s not your UPS. It’s not your generator. So why is your data centre still going dark?
operations, uptime, insights Albert Wong operations, uptime, insights Albert Wong

It’s not your UPS. It’s not your generator. So why is your data centre still going dark?

Uptime's 2026 outage analysis delivers a clear warning: after years of steady gains, reliability improvements are stalling. Power remains the leading cause of impactful outages, but the biggest emerging threats now sit outside the fence line - fibre cuts, grid constraints, and third‑party failures are all on the rise. Human error is a factor in the vast majority of incidents, and most outages could have been prevented with better processes. Costs keep climbing, with a growing share of outages now reaching seven figures. Meanwhile, confidence in public cloud resiliency is falling, and AI workloads are introducing new, poorly understood risks. If your resilience strategy still focuses only on internal systems, you're already behind.

Read More
Operations & Management Strategy: Keeping AI Facilities Reliable, Safe, and Efficient
uptime, design, whitepaper Albert Wong uptime, design, whitepaper Albert Wong

Operations & Management Strategy: Keeping AI Facilities Reliable, Safe, and Efficient

Uptime Institute’s AI Infrastructure Advisory

Part 5: Operations & Management Strategy

A GPU can burn out in 30 seconds if coolant flow stops – that is the reality of operating an AI data center. Uptime Institute’s Part 5 covers staffing (experienced leaders are non‑negotiable), clear demarcation between IT and facilities for liquid cooling, safety in high‑current and medium‑voltage environments, shorter GPU lifecycles (three years vs. ten for CPUs), and the SOP/MOP/EOP documentation needed to run safely and reliably. Operations is not an afterthought - it is where value is made or lost.

Read More
Level 4 & 5 Commissioning: Testing AI Facilities for Real-World Workloads
uptime, design, whitepaper Albert Wong uptime, design, whitepaper Albert Wong

Level 4 & 5 Commissioning: Testing AI Facilities for Real-World Workloads

Uptime Institute’s AI Infrastructure Advisory

Part 4: Level 4 & 5 Commissioning

Standard load banks are just heaters – they cannot simulate the volatile power draw and heat output of real GPU workloads. In Part 4, Uptime Institute explains why AI facilities require specialised load banks, DLC‑specific fluid cleanliness and pressure testing, continuous cooling validation, and third‑party witnessed Level 5 integrated system testing. Commissioning is not complete until your facility can survive sub‑second cooling failures.

Read More
Construction Oversight & Validation: Preventing Design‑to‑Build Drift in AI Facilities
uptime, design, whitepaper Albert Wong uptime, design, whitepaper Albert Wong

Construction Oversight & Validation: Preventing Design‑to‑Build Drift in AI Facilities

Uptime Institute’s AI Infrastructure Advisory

Part 3: Construction Oversight & Validation

Fast AI builds are prone to design‑to‑build drift – small deviations that become costly remediation if caught late. Uptime Institute’s Part 3 details the physical demands of AI facilities: floor loading >2,000 kg per rack, multi‑story low‑latency designs, hybrid liquid/air cooling installation, and phased construction. Learn why independent milestone inspections are essential to protect your investment and schedule.

Read More
Technical Vendor Requirements & Evaluation: Selecting Cooling and Power Systems for AI
uptime, design, whitepaper Albert Wong uptime, design, whitepaper Albert Wong

Technical Vendor Requirements & Evaluation: Selecting Cooling and Power Systems for AI

Uptime Institute’s AI Infrastructure Advisory

Part 2: Technical Vendor Requirements & Evaluation

Choosing the wrong cooling or power technology can lock you into obsolete infrastructure for years. In Part 2, Uptime Institute compares direct‑to‑chip (DLC) vs. immersion cooling, explains why GPU power fluctuations demand high‑di/dt UPS systems, and provides a structured vendor evaluation framework – including RFP templates, weighted criteria, and the importance of delivery penalties. Maintain owner control while benefiting from independent, vendor‑neutral guidance.

Read More
Design Development & Review: Technical Considerations for High-Density AI Facilities
uptime, design, whitepaper Albert Wong uptime, design, whitepaper Albert Wong

Design Development & Review: Technical Considerations for High-Density AI Facilities

Uptime Institute’s AI Infrastructure Advisory

Part 1: Design Development & Review

Conventional data centers run at 5–15 kW per rack; AI training clusters routinely hit 40–130 kW. According to Uptime Institute, this density forces a complete rethink of cooling, power, and physical space. Part 1 covers direct liquid cooling (DLC), continuous cooling requirements, two reference resiliency topologies (concurrently maintainable and fault tolerant), and the structural must‑haves – from 2,000+ kg racks to taller ceilings and expanded gray space.

Read More
From Design to Operations: A Complete Guide to AI Data Centre Infrastructure
uptime, design, whitepaper Albert Wong uptime, design, whitepaper Albert Wong

From Design to Operations: A Complete Guide to AI Data Centre Infrastructure

Uptime Institute’s Guide to AI Data Center Infrastructure – A Five‑Stage Framework

AI data centers are not scaled‑up traditional facilities. Based on Uptime Institute’s five‑part advisory series, this condensed guide walks you through the entire infrastructure lifecycle: design, vendor selection, construction, commissioning, and operations. Learn why rack densities of 130 kW demand direct liquid cooling, why continuous cooling is non‑negotiable, and how to prevent design‑to‑build drift before it costs millions.

Read More
Beyond the Plumbing: Engineering Direct-to-Chip Cooling for AI Workloads
uptime, engineering, operations Albert Wong uptime, engineering, operations Albert Wong

Beyond the Plumbing: Engineering Direct-to-Chip Cooling for AI Workloads

The Hidden Engineering Challenge of Direct‑to‑Chip Cooling

AI workloads don’t just run hotter – they run differently. Training a large language model can ramp GPU utilisation from 60% to 100% and back down within milliseconds, pushing coolant temperatures above 45°C in closed loops. That rapid thermal cycling demands response times measured in seconds, not minutes.

Direct‑to‑Chip (D2C) liquid cooling is the industry’s answer, but it introduces new risks: fluid inches from $40,000 GPUs, hundreds of potential leak points, and coolant chemistry that can corrode piping from the inside out.

And if a cooling anomaly strikes? You have roughly 5–10 seconds before the silicon throttles – or crashes a multi‑day training job.

Traditional data centre operations weren't built for this. Managing D2C requires fluid chemistry expertise, concurrent maintenance procedures for live liquid loops, and unified IT‑facilities alarm chains.

That’s the new engineering reality of AI infrastructure.

Read More
From Static Inventory to Lifecycle Intelligence
uptime, operations, insights Albert Wong uptime, operations, insights Albert Wong

From Static Inventory to Lifecycle Intelligence

In a recent post, Uptime Institute highlighted that critical spares management is no longer a static decision - it's a moving target. Operators are shifting toward hybrid strategies that blend on-site stock with vendor agreements, but the real gap is lifecycle intelligence: knowing where each asset stands in its service life so spares, maintenance, and replacement plans evolve accordingly.

Read More
From Air to Liquid Fire: Building the AI Factory - Why Old-School Data Centres Just Lost Their Cool
insights, design, maintenance, operations Albert Wong insights, design, maintenance, operations Albert Wong

From Air to Liquid Fire: Building the AI Factory - Why Old-School Data Centres Just Lost Their Cool

Building an AI factory is nothing like a traditional cloud data centre. Cloud racks run at 10-20kW; AI racks scream past 120kW. That changes everything - power, cooling, and especially the build process.

Forget stick-built construction. AI factories demand prefabricated Power Train Units (PTUs) - factory-assembled electrical vaults craned into place and operational in days, not months. Liquid cooling replaces air, forcing vendor lock-in and component-level compatibility testing.

On certification: pursue Tier III for concurrent maintainability (service without shutdown), but accept N+1 cooling rather than 2N fault tolerance. Pure Tier IV doubles your piping and leak points for marginal gain.

Post-build, operations shift from IT to industrial engineering. Methods of Procedure (MOPs) govern every valve turn. Programmed maintenance runs every 2-3 weeks. Your technicians now need fluid dynamics literacy.

The verdict? The cloud was built on air. The AI factory runs on liquid, modular steel, and surgical precision. Build accordingly.

Read More
The Uptime Institute 2026 Vendor Survey: 3 Hard Truths About Data Centre Outages
survey, uptime, insights, operations Albert Wong survey, uptime, insights, operations Albert Wong

The Uptime Institute 2026 Vendor Survey: 3 Hard Truths About Data Centre Outages

The Uptime Institute's 2026 Vendor Survey reveals three hard truths: AI is mostly used for monitoring (54%) and predictive maintenance (44%) - not fixing problems. Cost savings (56%) and energy efficiency (55%) are the top metrics, not uptime. And human error (30%) and power failures (25%) still cause most outages.

Outage frequency may be declining, but the cost of each outage is rising - one in five now exceed $1 million.

Read More
Beyond PUE: Why Total Power Usage Effectiveness (TUE) Is the Metric Liquid Cooling Has Been Waiting For
insights, operations Albert Wong insights, operations Albert Wong

Beyond PUE: Why Total Power Usage Effectiveness (TUE) Is the Metric Liquid Cooling Has Been Waiting For

PUE has been the data centre efficiency standard for years, but it was never designed for liquid cooling. Total Power Usage Effectiveness (TUE) goes deeper - capturing losses not just at the facility level, but inside the servers themselves. With the Uptime Institute’s 2025 survey showing a stagnant average PUE of 1.54, it’s time for a better metric. This article explains what a good TUE looks like, how it compares to PUE, and why liquid‑cooled data centres need TUE to measure what truly matters.

Read More
HPC vs. AI: Same Roots, Different Branches (And Why Your Data Centre Needs to Know the Difference)
insights, design Albert Wong insights, design Albert Wong

HPC vs. AI: Same Roots, Different Branches (And Why Your Data Centre Needs to Know the Difference)

HPC and AI both love parallel processing and GPUs, but that’s where the family resemblance ends. HPC runs steady, precise simulations on air‑cooled racks with low‑latency networks. AI training spikes power to 150 kW per rack, needs liquid cooling, and demands massive bandwidth. AI inference? That’s bursty, auto‑scaling, and lives on cheaper hardware.

Read More
Monitoring, power, and the rising compliance tide: 5 takeaways from the 2026 Uptime Resiliency Survey
survey, uptime, insights Albert Wong survey, uptime, insights Albert Wong

Monitoring, power, and the rising compliance tide: 5 takeaways from the 2026 Uptime Resiliency Survey

What the 2026 data suggests: Uptime Institute’s latest resiliency survey finds that monitoring and analytics (53%) and electrical infrastructure upgrades (49%) remain the two most effective ways to improve data centre uptime - and the top areas for increased investment this year. Tellingly, the main justification for this spending is no longer ROI alone; operators are now citing design and operational standards as their lead argument. And 69% expect more resiliency regulations within three years.

Read More
AI Workloads Are Rewriting the Rules of Data Centre Design
insights, whitepaper Albert Wong insights, whitepaper Albert Wong

AI Workloads Are Rewriting the Rules of Data Centre Design

AI is forcing a fundamental redesign of data centre infrastructure

Rack power densities now exceed 100 kW and are still climbing. Liquid cooling is no longer optional. Grid connection timelines stretch to five years. In Australia, built‑out data centre capacity is set to double by 2030, with billions in new investment already underway – yet water scarcity and grid constraints pose serious risks.

The six AI attributes reshaping power, cooling and racks demand a strategic response. Organisations that act now – assessing electrical capacity, adopting liquid‑ready designs, and deploying digital twins – will secure a competitive advantage.

Read More
Why AI Won’t Make You an Expert - And What Actually Will
insights, education Albert Wong insights, education Albert Wong

Why AI Won’t Make You an Expert - And What Actually Will

Why AI Won’t Make You an Expert—And What Will

Generative AI can help your team work faster, but it can’t turn novices into true experts. Recent research shows that while AI accelerates competence, it does not close the performance gap between beginners and seasoned professionals. Tacit knowledge - the kind gained through experience, mentorship, and real‑world lessons - remains beyond AI’s reach.

Read More
AI Training Boom: Is Your Data Centre Ready for the New Rules?
insights, engineering Albert Wong insights, engineering Albert Wong

AI Training Boom: Is Your Data Centre Ready for the New Rules?

The New Rules of AI Data Centre Engineering

The race to build AI factories isn’t just a technological challenge - it’s redefining the physical limits of our data centres. A new Uptime Institute survey shows average rack density for AI training hardware has hit 56 kW, with the share of sites exceeding 100 kW more than doubling in just one year.

At the same time, the Australian Government has published five formal expectations for AI infrastructure - from underwriting renewable energy to using water‑efficient cooling. Projects that align will be prioritised; those that don’t will face significant headwinds.

For organisations building or upgrading AI infrastructure, the message is clear: business as usual is no longer an option. Specialised engineering is the difference between delay and delivery.

Read More
Why Operations Can’t Be an Afterthought: The 20-Year Lesson from 2014
operations, insights Albert Wong operations, insights Albert Wong

Why Operations Can’t Be an Afterthought: The 20-Year Lesson from 2014

The 20‑Year Payoff: Why Operations Must Lead from Day One

A 2014 Uptime Institute article, “Best Practice Is to Start With the End in Mind,” made a deceptively simple argument: involve data centre operations at the very beginning of any capital project. More than a decade later, that advice is more urgent than ever.

Why? Because a data centre’s design and construction typically take 1‑2 years, but its operational life spans 20+ years. Yet operations is still often treated as an afterthought—brought in only after commissioning, when the biggest opportunities to shape maintainability, efficiency, and total cost of ownership have already passed.

As the article’s author, Lee Kirby, noted: if value engineering happens without operations input, “increased costs over the life of the data center may dwarf any initial savings.” In other words, value is extracted from a data centre by operations, not by the original build.

The solution is to embed operations expertise from day one—and to equip teams with the right frameworks. Turn a decade‑old insight into a 20‑year advantage.

Read More
Is Your Data Centre Facility Ready for the AI Revolution?
whitepaper Albert Wong whitepaper Albert Wong

Is Your Data Centre Facility Ready for the AI Revolution?

Is Your Data Centre Facility Ready for the AI Revolution?

AI workloads are reshaping data center infrastructure—demanding higher power densities, advanced liquid cooling, and faster scalability. The latest Vertiv white paper confirms that traditional facilities must evolve to meet these challenges. As a trusted Vertiv partner, Ecanet delivers end-to-end engineering solutions across Australia and the APAC region, helping colocation providers transition to AI-ready power systems, hybrid cooling, and modular prefabricated designs. From concept to commissioning, we ensure your infrastructure is built to perform, scale, and last.

Read More