High-performance computing (HPC) technologies: what does the future hold? [part 6]

by Jon Thor Kristinsson on 5 December 2022

In this blog, we will look towards the future of HPC technologies and where innovations in the HPC industry might be going.

This blog is part of a series that introduces you to the world of HPC.

It’s always difficult to predict the future, but we expect to see plenty of work on easing the deployment of applications and clusters for HPC. Composability is positioned to grow with the advent of new server components. Schedulers continue to advance. Quantum computing is getting closer each year, with new updates expected in the years to come. Alternative architectures are growing in use and becoming more mature. Ubuntu is positioned to work well everywhere, so it will be a great asset for organisations looking to use HPC. Let’s explore technological innovations surrounding:

  • Ease of use and containerisation
  • DPUs, GPUs and FPGAs
  • Bare metal clouds
  • Quantum computing
  • Alternative architectures
  • Workloads and scheduling

Ease of use and containerisation

User experience is key: users expect things to work easily, without hassle. As with other types of workloads, a lot of effort is going into making HPC easier to use and more automated. Users now depend on easy access to applications and want to run them without dealing with complex compilation pipelines and endless dependencies. They want application delivery to be rapid and timely, and they want access to both recent releases and multiple historical versions of the software.

We are sure to see more user-friendly build pipelines for HPC software delivery, such as those used in Spack. Spack is an HPC-specific package manager. Unlike most package managers, where everything is pre-built, Spack typically builds packages from source as part of the installation process. This benefits users by allowing them to build against hardware-specific library implementations, which yields better performance. Spack also makes it easier to compile multiple combinations of software and libraries with multiple compilers, which makes benchmarking one combination against another quite easy.
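To make the idea of benchmarking combinations concrete, here is a minimal Python sketch that generates one Spack spec string per compiler and MPI combination. It only prints `spack install` commands and does not call Spack itself. The package, compiler and MPI versions are illustrative assumptions, not recommendations; substitute whatever your site provides.

```python
from itertools import product

# Illustrative choices only (assumptions): pick what is available on your system.
PACKAGE = "hdf5@1.12.2 +mpi"
COMPILERS = ["gcc@12.1.0", "clang@14.0.0"]
MPI_PROVIDERS = ["openmpi@4.1.4", "mpich@4.0.2"]


def build_specs(package, compilers, mpis):
    """Return one Spack spec string per compiler/MPI combination.

    Spec syntax: '%' selects the compiler, '^' pins a dependency.
    """
    return [f"{package} %{c} ^{m}" for c, m in product(compilers, mpis)]


specs = build_specs(PACKAGE, COMPILERS, MPI_PROVIDERS)
for spec in specs:
    # Each printed line is a command you could run on a machine with Spack installed.
    print(f"spack install {spec}")
```

Each resulting installation lives side by side with the others, so the same benchmark can be run against every combination.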

Another trend we can expect to see is more application container usage. Applications can be rapidly built into runnable containers that bundle all relevant dependencies, so the same image can be executed across environments.

There is also a need to easily deploy these complex HPC clusters on anything from a single workstation for quick experimentation to large and expanding HPC clusters. We are even seeing a need to segment clusters, allowing for the rapid deployment of cluster parts with a slightly different configuration to cater to specific workload needs. Sometimes this allows a combined set of resources to be allocated in smaller, flexible segments that meet different departmental needs. Cluster delivery isn’t limited to on-premise clusters: clouds offer quick access to hardware without server purchasing and delivery. It’s important to be able to take advantage of both clouds and on-premise deployments with the same tooling, giving a unified experience.

DPUs, GPUs and FPGAs

Computing devices are becoming smarter and more functional, providing more extensibility and advanced use cases or interfaces that often deliver greater performance or insight into the overall processing.

For example, data processing units (DPUs) move data around the data centre while also processing it, and SmartNICs offer compute power on the network interface card (NIC) itself. Effectively, the NIC functions as a small server, allowing the host server to offload various workloads or subprocesses to it. For instance, MVAPICH has added capabilities to offload MPI processing from a compute node to Nvidia BlueField-2 networking cards as part of an overall workload. This allows the main workload to consume the host’s hardware resources without MPI processing interrupting it.

There are also efforts like OpenSNAPI, part of the Unified Communication Framework (UCF) consortium, which are looking to create a unified API for accessing compute engines on networking or smart networking devices. 

GPU usage in clusters continues to expand, driven by the ever-increasing number of workloads that can take advantage of the different operations and architectures of GPUs. GPUs continue to add new operations, providing flexibility when it comes to executing computations. They are also gaining more addressable memory and adding features that extend their capabilities. One example is GPUDirect, which allows GPUs to communicate with each other inside a single compute node or between compute nodes. This bypasses the CPU, allowing for direct communication with less overhead.

All these interfaces and compute engines are driving a need for a simplified, high-level application programming interface (API) to take advantage of them, so we are seeing more unified implementations of APIs built on high-level abstractions. Intel’s oneAPI is one such example: Intel offers a programming abstraction that can utilise various compute devices through a single implementation.

Bare metal clouds

The drive for on-demand computing is resulting in a need for public and private clouds with bare metal provisioning capabilities. Public clouds, for example, are taking advantage of DPUs to create bare metal instance types, where bare metal servers can effectively be spun up like virtualised instances while avoiding the overhead of virtualisation. This is done by taking advantage of several DPU features, such as network virtualisation: access control can be applied on the SmartNIC, limiting the server’s access to the customer’s network.

Another important feature is mounting network storage, such as ephemeral storage devices, as PCI-based storage devices exposed to the host. This effectively makes the bare metal server stateless: it only provides access to local resources for as long as the instance is spun up. Once the instance is shut down, there is nothing stopping the cloud provider from spinning it up elsewhere, as the state is external to the actual bare metal server.

Enterprises, governments and academics continue to look for more cloud-like functionality while remaining on-premise, either utilising their own or external data centres. This is often the case for HPC workloads, where performance is key. As a result, demand for private clouds with bare metal provisioning capabilities is increasing. To meet this need, companies can rely on private cloud solutions with bare metal provisioning capabilities or use bare metal provisioners with virtualisation management capabilities such as MAAS, depending on their multi-tenancy needs.

In recent years, rapidly growing data centres have taken advantage of the demand for compute-focused workloads. Many are looking to add more cloud-like functionality that improves delivery and automation overall. As a result, there has been huge growth in data centres looking to become local cloud providers, with a dedicated focus on providing on-demand bare metal resources for HPC and AI/ML, in response to the changing needs of enterprises, governments and the academic sector.

Quantum computing

Often described as being only five to ten years away, quantum computing currently remains out of reach for most. But there are a lot of ongoing initiatives to build the next generation of quantum computers. Some early quantum computing capabilities are already being deployed, and in some cases they are even available for experimentation in a cloud format. For example, the Google Quantum Computing Service makes quantum computing primitives available in the cloud.

With the amount of research being put into building the next generation of quantum computers, we are sure to see more use in an even more accessible format.

Alternative architectures

Composable computing, or composable hardware, is becoming a more prominent topic these days. There is active development around making the components of a server effectively pluggable. Need more memory? Just bring in a server. It’s a modern reincarnation of the old mainframes and supercomputers, where you have a single machine that can be endlessly extended. DPUs feed into this narrative. For example, they give you the option of attaching network-based resources as PCI devices on the host server. Think of network-based block storage or an NVMe over Fabrics storage device attached as a local storage device. With this approach, you effectively don’t need local storage.

Alternative architectures are growing in use. For example, ARM-based CPUs are seeing growing usage in data centres, and Ampere servers aimed at data centre usage are rapidly becoming available in the market. You can even easily get cloud instances running on ARM-based servers. This is likely to continue: Nvidia has an upcoming server equipped with its new Grace CPU, an ARM-based CPU, coming in the next generation of server releases.

Some supercomputing clusters are starting to consider RISC-V. For example, there is a European initiative called the European Processor Initiative that aims to get RISC-V based CPUs into future clusters. 

Workloads and scheduling

Large scale cluster-based scheduling is an actively developing area. No longer is scheduling limited to HPC. There have been developments around scheduling of cloud-native workloads that might not necessarily need the performance or specialised resource dependencies of HPC workloads.

The main scheduler in recent years for such cloud-native workloads has been Kubernetes. Canonical offers two distributions, Charmed Kubernetes and MicroK8s, for those who need to schedule cloud-native workloads. There are active initiatives to add more advanced scheduling capabilities to Kubernetes, often through the development of add-ons. One such example is the Volcano.sh add-on, which tries to bring to Kubernetes some of the additional capabilities needed to schedule compute-intensive workloads.
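As a rough illustration of what such an add-on brings, the Python sketch below builds a Volcano Job manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts. The field layout reflects our understanding of Volcano's `batch.volcano.sh/v1alpha1` Job resource, so check it against the Volcano documentation for your version. The job name, image and command are hypothetical placeholders.

```python
import json


def volcano_job(name, image, command, workers, queue="default"):
    """Build a Volcano Job manifest as a plain dict (no cluster access needed)."""
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": "volcano",
            # Gang scheduling: only start once all workers can be placed together,
            # which tightly coupled jobs such as MPI workloads usually require.
            "minAvailable": workers,
            "queue": queue,
            "tasks": [
                {
                    "name": "worker",
                    "replicas": workers,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": "worker", "image": image, "command": command}
                            ],
                        }
                    },
                }
            ],
        },
    }


# Hypothetical job: the image and command are placeholders, not real artefacts.
manifest = volcano_job("mpi-demo", "example.org/hpc/solver:latest", ["./solver"], workers=4)
print(json.dumps(manifest, indent=2))
```

The key difference from a default Kubernetes deployment is `minAvailable`: pods are treated as a group rather than scheduled one by one.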

Traditional HPC scheduling is becoming easier to use. For example, Open OnDemand adds a web UI and predefined workload definitions, available on demand, to traditional Slurm HPC clusters to ease overall cluster usage.
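The idea of a predefined workload definition can be sketched in a few lines of Python: a fixed Slurm batch script template is filled in from a handful of user-supplied values, much as a web form might do. This is not how Open OnDemand is implemented, only an illustration of the concept. The job name, resource values and command are hypothetical.

```python
from string import Template

# A predefined workload definition: users supply values, not a whole script.
BATCH_TEMPLATE = Template("""#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --nodes=$nodes
#SBATCH --ntasks-per-node=$tasks_per_node
#SBATCH --time=$walltime

srun $command
""")


def render_batch_script(job_name, nodes, tasks_per_node, walltime, command):
    """Fill the template; substitute() raises KeyError if a value is missing."""
    return BATCH_TEMPLATE.substitute(
        job_name=job_name,
        nodes=nodes,
        tasks_per_node=tasks_per_node,
        walltime=walltime,
        command=command,
    )


# Hypothetical job parameters, as a user might enter them in a form.
script = render_batch_script("cfd-run", 2, 32, "01:00:00", "./solver --input case.dat")
print(script)
```

The rendered text can be saved to a file and submitted with `sbatch` on a cluster running Slurm.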

Summary

This is a very exciting time for HPC, as we are seeing a lot of innovation in the space. Servers are taking on advanced capabilities in terms of compute and networking and we are seeing compatibility increase. Overall cluster deployment and usage is being driven by ease of use.

If you are interested in learning more, take a look at the previous blog in the series “Open Source in HPC” or at this video of how Scania is mastering multi cloud for HPC systems with Juju. Alternatively, dive into some of our other HPC content.
