Enterprises across APAC are adopting cloud-first strategies to modernise IT environments. A growing number of organisations are starting to realise that the cloud isn’t built for everything, especially when it comes to high-performance computing (HPC) and AI workloads.
“I have seen companies that have adopted a 100 percent cloud-first strategy for enterprise workloads. They're going to get hit very hard when they try to run HPC workloads in the cloud,” HPE’s general manager of HPC and AI, APAC and India, Joseph Yang, told iTnews Asia.
Cloud-based HPC can be up to 10 times more expensive than on-premises deployments, leading many cloud-first companies to reconsider and seek ways to re-establish on-premises capabilities, Yang added.
Senior IT leaders must adopt a clear, data-driven approach to total cost of ownership (TCO) and long-term value when planning HPC or AI infrastructure, considering the high upfront costs and lengthy deployment timelines.
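The cost gap Yang describes can be sketched with a back-of-the-envelope comparison. All figures below are illustrative assumptions, not HPE or customer numbers; the point is simply that hourly cloud rates for always-on HPC capacity compound against a fixed on-premises outlay:

```python
# Illustrative TCO comparison for a fixed, always-on HPC demand.
# Every dollar figure here is a hypothetical assumption for illustration only.

def on_prem_tco(capex, annual_opex, years):
    """Total cost of an owned cluster: upfront capital plus yearly operations."""
    return capex + annual_opex * years

def cloud_tco(hourly_rate, hours_per_year, years):
    """Total cost of renting equivalent capacity by the hour, running 24/7."""
    return hourly_rate * hours_per_year * years

YEARS = 5
on_prem = on_prem_tco(capex=2_000_000, annual_opex=400_000, years=YEARS)  # assumed
cloud = cloud_tco(hourly_rate=900, hours_per_year=8_760, years=YEARS)     # assumed

print(f"On-prem 5-year TCO: ${on_prem:,.0f}")
print(f"Cloud 5-year TCO:   ${cloud:,.0f}")
print(f"Cloud / on-prem:    {cloud / on_prem:.1f}x")
```

Under these assumed rates the cloud path comes out roughly an order of magnitude more expensive, in line with the multiple Yang cites; with bursty rather than sustained usage, the comparison can of course tilt the other way.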
Start by identifying business problems
Before investing in HPC or AI infrastructure, CIOs should first assess whether their business challenges actually require such technologies.
Traditional HPC supports specialised use cases including R&D and simulation, so its value is well understood in those domains, said Yang.
He added that AI, built on HPC architecture, expands its relevance across various sectors. This calls for assessing how these technologies can fulfil specific business needs rather than investing for the sake of a trend.
Second, understand TCO clearly.
In traditional enterprise workloads, infrastructure accounts for only 15 to 20 percent of costs; in HPC and AI, it can exceed 50 percent. These systems utilise large-scale, integrated server environments to run intensive workloads, resulting in higher capital and operational costs. This shift forces organisations to budget and justify spending differently.
- Joseph Yang, General Manager of HPC and AI, APAC and India at HPE
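The budgeting shift Yang describes can be illustrated numerically. The infrastructure shares below mirror the ranges cited above; the overall budget figure is an arbitrary assumption:

```python
# Hypothetical budget split showing why HPC/AI forces different cost planning.
# Shares follow the article's cited ranges; the dollar total is assumed.

def infra_split(total_budget, infra_share):
    """Split a programme budget into infrastructure and everything else."""
    infra = total_budget * infra_share
    return infra, total_budget - infra

BUDGET = 10_000_000  # assumed annual programme budget

ent_infra, ent_other = infra_split(BUDGET, 0.20)      # enterprise: ~15-20% infra
hpc_infra, hpc_other = infra_split(BUDGET, 0.55)      # HPC/AI: can exceed 50%

print(f"Enterprise: infra ${ent_infra:,.0f}, other ${ent_other:,.0f}")
print(f"HPC/AI:     infra ${hpc_infra:,.0f}, other ${hpc_other:,.0f}")
```

The same nominal budget buys a very different cost structure, which is why capital approval and justification processes built for enterprise IT tend not to transfer directly.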
According to Yang, AI investments today often stem from FOMO, not firm TCO justification. With competitors moving ahead and a two-year lead time to realise value, waiting may now pose a bigger risk than acting early, even if the investment remains speculative.
Third, choose the right system architecture. Yang noted that HPC and AI performance depends on efficient design and integration.
Purpose-built systems outperform standard racks by maximising compute density, reducing latency with optimised cabling, and using advanced cooling such as direct liquid cooling, cutting energy use per unit of performance, he added.
When these principles are applied effectively, the returns can be substantial, as seen in industries that have embraced HPC for years.
HPC is moving from niche to enterprise-scale ROI
Traditional HPC use cases focus mainly on modelling and simulation - car and aircraft design, weather forecasting, and natural resource exploration.
These workloads replace expensive and slow physical testing with virtual simulations, providing major ROI through cost savings and faster turnaround.
Yang said that Japanese tire and rubber manufacturer Toyo Tire Corporation has upgraded to a seventh-generation HPC system from HPE, delivered through the HPE GreenLake cloud.
The system, powered by HPE Cray XD, has helped the company cut large-scale tire design simulation times by half or more, allowing more engineers to run simulations at once.
This has allowed Toyo to optimise its in-house TOYO-FEM application and enhance its deep learning tools to speed development of next-generation tire structures, shapes, and patterns, Yang added.
However, he noted that only a narrow band of R&D-intensive industries uses traditional HPC.
Yang added that the real growth now comes from artificial intelligence, which relies on HPC-class systems but runs fundamentally different algorithms.
AI mimics the brain’s neural networks by processing large datasets and training virtual neurons.
It produces results for uses ranging from language models like ChatGPT to border-control image recognition, often without a clear explanation of how.
According to Yang, companies are seeing productivity gains as AI automates routine work, generates up to half of all code, and speeds airport immigration through image recognition.
These examples show how AI, built on HPC systems, is no longer confined to R&D-heavy sectors but is improving productivity, efficiency, and operations across industries.
However, Yang cautions that organisations scaling up HPC or AI infrastructure often make three key missteps that hinder performance, user experience, and long-term success.
Starting small backfires in HPC and AI scaling
The first major mistake is treating HPC or AI workloads like traditional enterprise workloads, where it's common to start small and scale gradually.
This approach fails with generative AI, particularly large language models (LLMs), where performance and responsiveness are critical, Yang said.
“Underpowered systems either take too long to generate results or require using smaller, less capable models that deliver poor user experiences. A user isn't going to wait an hour to get an answer to the question that they ask. They expect the answer to come back in 15 seconds,” he added.
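The responsiveness gap Yang describes is simple arithmetic: answer length divided by generation rate. The token counts and rates below are hypothetical, chosen only to illustrate the scale of the difference between a well-provisioned and an underpowered system:

```python
# Hypothetical response-time arithmetic for a generative AI answer.
# Token count and generation rates are assumed values for illustration.

def response_time_s(answer_tokens, tokens_per_second):
    """Seconds to stream a complete answer at a given generation rate."""
    return answer_tokens / tokens_per_second

ANSWER_TOKENS = 600  # assumed typical answer length

fast = response_time_s(ANSWER_TOKENS, 40)   # well-provisioned system (assumed rate)
slow = response_time_s(ANSWER_TOKENS, 0.5)  # underpowered system (assumed rate)

print(f"Well-provisioned: {fast:.0f} s")
print(f"Underpowered:     {slow / 60:.0f} min")
```

At the assumed rates, the capable system lands inside the roughly 15-second expectation Yang mentions, while the underpowered one takes 20 minutes for the same answer, long enough that users simply stop using the tool.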
The second common misstep is assuming that meaningful results from generative AI require immediate customisation with proprietary data.
According to Yang, early use cases should focus on internal productivity enhancements using generic models.
Applications such as automating routine tasks, summarising content, and assisting with email triage can deliver efficiency gains without any custom training.
Letting users explore AI tools organically often reveals high-impact business use cases over time.
The third and most critical issue is not involving the right experts early.
This is especially risky for companies that went all-in on the cloud and are now facing financial and operational challenges, said Yang.
Bringing these workloads back requires not just capital investment but access to facilities with appropriate power, cooling, and compute capabilities – and this is something many organisations no longer have, he added.
A shortage of skilled HPC and AI specialists makes early engagement with the right partners essential to avoid costly mistakes.