Design Development & Review: Technical Considerations for High-Density AI Facilities

Uptime Institute’s AI Infrastructure Advisory – Part 1 of 5

AI data centers are not your grandfather’s server rooms. They are not even typical enterprise data centers. According to Uptime Institute, AI workloads - particularly training - push infrastructure into entirely new territory: rack densities of 40–130 kW, hybrid liquid‑air cooling, and failure responses measured in seconds.

This is the first in a five‑part series based on Uptime Institute’s AI Infrastructure Advisory white papers. Here, we dive deep into Design Development and Review, focusing on the technical considerations that separate an AI‑ready facility from a costly retrofit.

The Density Reality: From 15 kW to 130 kW per Rack

Most conventional data centers operate at 5–15 kW per rack. Uptime Institute’s latest surveys show that even high‑end traditional facilities rarely exceed 20 kW.

AI training clusters break that ceiling. A single rack populated with NVIDIA Blackwell GPUs - up to 144 GPUs per rack - can draw 100 kW or more for the GPUs alone. Add networking, storage, and other components, and you are routinely looking at 40–130 kW per rack.

Technical takeaway:
If your design assumes air cooling alone, you will hit a hard wall. Uptime Institute places the upper limit for air‑only cooling at roughly 40–60 kW per rack, and that threshold depends on workload, economics, and ambient conditions. Above that, liquid cooling is not optional - it is mandatory.
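The density bands above can be read as a first-pass screening rule. The sketch below is illustrative only: the function name and band labels are hypothetical, and the 40 kW and 60 kW cut-offs are simply the approximate limits quoted above, not fixed design values.

```python
# Approximate air-only limits quoted in the text; the real cut-over point
# depends on workload, economics, and ambient conditions.
AIR_ONLY_LOW_KW = 40.0   # below this, air-only cooling is generally workable
AIR_ONLY_HIGH_KW = 60.0  # above this, liquid cooling is treated as mandatory


def cooling_approach(rack_kw: float) -> str:
    """Return a coarse cooling band for a rack density given in kW."""
    if rack_kw < 0:
        raise ValueError("rack_kw must be non-negative")
    if rack_kw < AIR_ONLY_LOW_KW:
        return "air"
    if rack_kw <= AIR_ONLY_HIGH_KW:
        return "site-dependent"  # inside the 40-60 kW grey zone
    return "liquid"


for kw in (15, 50, 130):
    print(f"{kw} kW per rack -> {cooling_approach(kw)}")
```

A rack in the grey zone needs a site-specific study rather than a lookup; the function only flags which racks fall into that zone.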

Cooling: Direct Liquid Cooling (DLC) Is the Workhorse

Uptime Institute recommends direct‑to‑chip liquid cooling (DLC) for the vast majority of high‑density AI facilities. Here is why:

  • Thermal design power (TDP): Modern GPUs have a TDP of 700 W or more per chip. DLC removes 70–80% of that heat directly via cold plates circulating a water/glycol mix.

  • Air cooling still needed: The remaining 20–30% of heat (from networking cards, storage, and auxiliary processors) requires conventional air cooling. A 130 kW rack often needs roughly 100 kW of liquid cooling plus roughly 30 kW of air cooling.

  • Retrofit compatibility: DLC can be deployed in standard or near‑standard racks, and even retrofitted into existing data halls—subject to floor loading and access for coolant distribution units (CDUs).
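The liquid/air split described in the bullets above is simple arithmetic, sketched here for a single rack. The function name is hypothetical, and the 70% and 80% capture fractions are the range quoted above rather than measured values for any particular product.

```python
def split_heat_load(rack_kw: float, liquid_fraction: float) -> tuple[float, float]:
    """Split a rack's heat load into (liquid_kw, air_kw).

    liquid_fraction is the share of heat captured by the cold plates (0 to 1).
    """
    if not 0.0 <= liquid_fraction <= 1.0:
        raise ValueError("liquid_fraction must be between 0 and 1")
    liquid_kw = rack_kw * liquid_fraction
    return liquid_kw, rack_kw - liquid_kw


# Evaluate both ends of the 70-80% capture range for a 130 kW rack.
for fraction in (0.70, 0.80):
    liquid_kw, air_kw = split_heat_load(130.0, fraction)
    print(f"capture {fraction:.0%}: liquid {liquid_kw:.0f} kW, air {air_kw:.0f} kW")
```

For a 130 kW rack this gives about 91 to 104 kW on the liquid loop and 26 to 39 kW left for air, which is why the air-side plant cannot simply be deleted from a DLC design.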

Critical technical nuance - continuous cooling:
In a conventional air‑cooled data center, loss of cooling gives operators minutes of safe runtime. In a DLC environment, a cooling failure can cause GPU shutdown in less than one second. That is why Uptime Institute recommends continuous cooling (cooling equipment backed by UPS) for all AI facilities using DLC, even those designed only for concurrent maintainability (Tier III). Historically, continuous cooling was reserved for fault‑tolerant (Tier IV) sites.

Power Distribution: Higher Voltages, Faster Dynamics

AI workloads are not steady. GPU‑based training clusters exhibit rapid, sharp fluctuations in power draw—far more volatile than traditional CPU workloads. Your electrical design must account for this.

Two specific challenges:

  1. Grid capacity and on‑site power
    In many regions, local grids cannot provision the required megawatts quickly. Some large AI facilities are deploying their own on‑site primary power (e.g., gas turbines) as a bridge until grid connections become available. This is capital‑intensive but can cut deployment time by years.

  2. Higher‑voltage distribution
    To minimise cable and busbar costs at scale, Uptime notes that facilities may need to distribute power at 800 VDC or higher, moving beyond the 415 VAC or 480 VAC common in conventional data centers. Equipment for these voltages is less mature and more expensive, so early vendor evaluation is critical.
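The cabling argument comes down to basic arithmetic: for a fixed load, current falls in proportion to voltage, and resistive loss in a given conductor falls with the square of the current. The sketch below is a simplified DC-only illustration. The 400 V comparison point and the function names are assumptions made for the example, and it ignores AC power factor, three-phase effects, and conversion losses.

```python
def dc_current_a(power_w: float, voltage_v: float) -> float:
    """Current in amperes drawn by a DC load: I = P / V."""
    if voltage_v <= 0:
        raise ValueError("voltage_v must be positive")
    return power_w / voltage_v


def relative_conductor_loss(voltage_a_v: float, voltage_b_v: float) -> float:
    """Ratio of I^2 * R loss at voltage B to the loss at voltage A,
    for the same delivered power and the same conductor resistance."""
    if voltage_a_v <= 0 or voltage_b_v <= 0:
        raise ValueError("voltages must be positive")
    return (voltage_a_v / voltage_b_v) ** 2


RACK_POWER_W = 130_000  # one rack at the top of the 40-130 kW range

for volts in (400, 800):  # 400 V is an illustrative baseline, not from the text
    print(f"{volts} VDC: {dc_current_a(RACK_POWER_W, volts):.1f} A")

print(f"loss at 800 V relative to 400 V: {relative_conductor_loss(400, 800):.2f}")
```

Halving the current for the same power is what allows smaller conductors or busbars, which is the cost lever the text refers to.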

Physical Space: Bigger, Heavier, Taller

High density changes the geometry of the data center. Uptime Institute outlines several non‑negotiable physical requirements:

Parameter                  | Conventional Facility     | AI High-Density Facility
Rack weight                | 680–1,000 kg              | > 2,000 kg
Rack height                | 42U                       | 48–52U
Ceiling height             | Standard                  | Increased for cabling, piping, and taller racks
Floor loading              | Standard concrete         | Reinforced slab or structurally braced raised floor
Grey space (power/cooling) | ~20–30% of IT white space | Often exceeds 50% of white space

Why more grey space?
DLC requires CDUs, heat exchangers, pumps, piping, and tanks. Power distribution for 100+ kW racks needs more switchgear and transformers. Some designs use “sidecar” racks adjacent to GPU racks to house CDUs and power gear, keeping the main IT aisle cleaner but consuming more floor area.

Aisle width and pipework:
Traditional hot‑aisle/cold‑aisle layouts may need wider spacing to accommodate liquid piping, manifolds, and maintenance access. This further reduces usable white space per square metre.
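To put the rack weights above into structural terms, the sketch below averages a rack's mass over its footprint. It is a rough screening calculation, not a structural check: the 0.6 m by 1.2 m footprint and the function name are assumptions, and real designs must consider point loads at the feet or castors, aisle area, and dynamic loads during installation.

```python
STANDARD_GRAVITY_M_S2 = 9.80665


def footprint_load(rack_mass_kg: float, width_m: float, depth_m: float) -> tuple[float, float]:
    """Return the static load averaged over the rack footprint
    as (kg per square metre, kilopascals)."""
    if width_m <= 0 or depth_m <= 0:
        raise ValueError("footprint dimensions must be positive")
    kg_per_m2 = rack_mass_kg / (width_m * depth_m)
    kpa = kg_per_m2 * STANDARD_GRAVITY_M_S2 / 1000.0
    return kg_per_m2, kpa


# Hypothetical footprint of 0.6 m x 1.2 m for both cases.
for label, mass_kg in (("conventional", 1000.0), ("AI high-density", 2000.0)):
    kg_per_m2, kpa = footprint_load(mass_kg, 0.6, 1.2)
    print(f"{label}: {kg_per_m2:.0f} kg/m^2 ({kpa:.1f} kPa)")
```

Comparing the result with the slab or raised-floor rating shows early whether reinforcement is likely to be needed.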

Operability and Maintainability Built Into Design

Uptime Institute stresses that a design is not complete until it answers three operational questions:

  1. Can staff physically access and replace a failed CDU or rack manifold without shutting down adjacent equipment?
    This requires adequate spacing, redundant components, and clear maintenance procedures.

  2. How is coolant quality maintained over time?
    Impurities can clog cold plates. The design must include filtration, testing ports, and cleaning protocols.

  3. What is the demarcation between facility and IT teams for liquid cooling?
    Who owns the coolant loop up to the quick‑disconnect connectors? Who responds to a leak inside the rack? These must be defined during design, not after a failure.

Final Design Validation

Before final approval, Uptime recommends evaluating the completed design against three feasibility criteria:

  • Constructability: Can it be built on schedule at the intended location, given local labour, materials, and grid constraints?

  • Resiliency: Do the proposed topologies (5/4N, 4/3N, continuous cooling, thermal storage) meet the business’s availability targets?

  • Efficiency: Are cooling and power systems right‑sized to avoid over‑design? Is waste‑heat recovery feasible?
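When thermal storage is part of the resiliency review, a first-order tank size follows from the energy balance V = P·t / (ρ·c_p·ΔT). The sketch below assumes plain water and a fully usable, perfectly mixed tank. The 1 MW load, 5-minute bridge, and 6 K temperature rise are hypothetical inputs, not Uptime Institute figures.

```python
WATER_DENSITY_KG_M3 = 997.0      # approximate, near room temperature
WATER_SPECIFIC_HEAT_J_KG_K = 4186.0


def storage_volume_m3(cooling_load_w: float, bridge_seconds: float, delta_t_k: float) -> float:
    """Water volume needed to absorb cooling_load_w for bridge_seconds
    while warming by delta_t_k (ideal mixing, no losses)."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    energy_j = cooling_load_w * bridge_seconds
    mass_kg = energy_j / (WATER_SPECIFIC_HEAT_J_KG_K * delta_t_k)
    return mass_kg / WATER_DENSITY_KG_M3


# Hypothetical case: 1 MW of heat, 5 minutes of ride-through, 6 K allowable rise.
volume = storage_volume_m3(1_000_000, 5 * 60, 6.0)
print(f"required storage: {volume:.1f} m^3")
```

With these inputs the result is about 12 m³. A glycol mix, stratification, and unusable tank volume would all push the real figure higher, so treat this as a lower bound for early space planning.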

Summary: Technical Must‑Haves for AI Facility Design

Based on Uptime Institute’s Paper 1, any AI data center design must include:

  • Rack densities clearly defined (40–130 kW or higher)

  • Direct liquid cooling (DLC) with redundant manifolds and continuous cooling (UPS‑backed)

  • Hybrid air cooling for non‑GPU components (20–30% of total heat load)

  • Floor loading rated for racks heavier than 2,000 kg, with reinforced slabs or raised floors rated for DLC equipment

  • Ceiling height sufficient for taller racks and overhead piping/cabling

  • Grey space allocation often exceeding 50% of white space for CDUs, pumps, and power gear

  • Resiliency topology (concurrently maintainable or fault tolerant) with documented N+1, 2N, or distributed redundant architectures

  • Thermal storage tanks to bridge power interruptions for cooling systems

  • Operability features such as dripless connectors, leak detection, and clear IT/facilities demarcation
