Last Updated: 2021-07-26
In this article, we assume that you have already completed the "Create a Research Computing Cluster on Google Cloud" codelab and have an existing fluid-slurm-gcp cluster.
Some applications and algorithms in high performance computing exhibit near-perfect weak scaling. In these cases, compute processes are independent and require no inter-process communication or synchronization. This is typical of workflows where a large number of serial or multi-threaded applications are launched to independently process data (e.g. images and video, numerical simulation output, environmental data). This type of work is also known as high throughput computing (HTC).
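As a concrete illustration, an HTC workload on a Slurm cluster is often expressed as a job array in which each task processes one input independently. The script below is only a sketch; the process_frame executable and file paths are hypothetical.
#!/bin/bash
#SBATCH --job-name=htc-example
#SBATCH --array=0-99            # 100 independent tasks; no inter-task communication
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00

# Each array task processes its own input file (hypothetical paths and executable).
INPUT=/data/frames/frame_${SLURM_ARRAY_TASK_ID}.png
./process_frame "${INPUT}" > frame_${SLURM_ARRAY_TASK_ID}.log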
For some HTC workloads, you may find yourself needing to pool resources across multiple GCP regions to keep pace with data inflow rates. Additionally, having compute resources available across multiple regions further protects you from interruptions caused by service failures that can occur in a single GCP zone or region.
In this introductory section of the codelab, we will cover the basics of GCP regions and zones. We'll then use this information to design a compute partition configuration that is globally scalable. In the following sections, we will walk you through implementing this globally scalable design with fluid-slurm-gcp.
Google Cloud Platform (GCP) offers on-demand access to compute, storage, and networking resources worldwide. A region is a specific geographical location where you can host your resources. Each region has one or more zones that correspond to distinct data-centers in that geographical location. As an example, europe-west1-a and europe-west1-b are two different zones in the europe-west1 region. Geographically, europe-west1 is located in St. Ghislain, Belgium.
From Google Cloud's Compute Engine documentation:
"Putting resources in different zones in a region provides isolation from most types of physical infrastructure and infrastructure software service failures. Putting resources in different regions provides an even higher degree of failure independence. This allows you to design robust systems with resources spread across different failure domains."
Virtual Private Cloud (VPC) networks provide connectivity between compute instances, GCP services, and third-party systems. Subnetworks are regional resources; this means that you will need a subnetwork defined in each region where you wish to deploy compute instances. Traffic between subnetworks and on-premises systems is controlled by network firewall rules.
GCP projects come with a default network with pre-populated firewall rules and a subnetwork for each region.
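You can confirm that the default network in your project has a subnetwork in each region you plan to use; the command below is one way to check for the two regions used in this codelab:
$ gcloud compute networks subnets list --network=default --regions=us-west1,europe-west1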
Fluid-slurm-gcp provides a schema that you can use to describe which machines you want to use on GCP, which regions and zones to deploy them in, and which Slurm partitions to align them with. This allows you to have identical machines spread across multiple regions in a single globally scalable compute partition.
This schematic provides an example of a fluid-slurm-gcp deployment in the us-west1 and europe-west1 regions (The Dalles, Oregon, USA and St. Ghislain, Belgium). The login and controller instances reside in a single zone of us-west1.
The compute partition (called "globally-scalable") has two sets of machines: compute-us-w1-b-* and compute-eu-w1-b-*. Each machine set is deployed in its own subnetwork corresponding to the region it will be deployed to. In this configuration, users can submit jobs to the globally-scalable partition and the Slurm job scheduler can schedule jobs to run in either of these regions.
In this codelab, you are going to configure a globally scalable (multi-region) compute partition on an existing fluid-slurm-gcp HPC cluster on Google Cloud Platform.
Fluid Numerics' fluid-slurm-gcp comes with a command-line tool called cluster-services that is used to manage available compute nodes, compute partitions, Slurm user accounting, and network-attached storage. You can update your cluster configuration by providing cluster-services a .yaml file that defines a valid cluster configuration. By default, cluster-services looks for a cluster configuration file at /apps/cls/etc/cluster-config.yaml. Alternatively, you can use cluster-services to report your current cluster configuration, which you can then modify.
Compute partitions must be modified on the controller instance of your cluster with root privileges. This is required because the Slurm controller daemon is restarted when partitions are modified so that the changes take effect.
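If you are not already connected to the controller, you can reach it with gcloud compute ssh before switching to root as shown below; the instance name in brackets is a placeholder for your deployment's controller.
$ gcloud compute ssh [CONTROLLER-INSTANCE-NAME] --zone=us-west1-b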
In this section, you will use cluster-services to generate a valid cluster-configuration YAML file that you will modify in the next section.
$ sudo su
[root]# cluster-services list all > config.yaml
1  compute_image: projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-compute-centos-v2-3-0
2  compute_service_account: default
3  controller:
4    project: fluid-slurm-gcp-codelabs
5    region: us-west1
6    vpc_subnet: https://www.googleapis.com/compute/v1/projects/fluid-slurm-gcp-codelabs/regions/us-west1/subnetworks/default
7    zone: us-west1-b
8  controller_image: projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-controller-centos-v2-3-0
9  controller_service_account: default
10 default_partition: partition-1
11 login:
12 - project: fluid-slurm-gcp-codelabs
13   region: us-west1
14   vpc_subnet: https://www.googleapis.com/compute/v1/projects/fluid-slurm-gcp-codelabs/regions/us-west1/subnetworks/default
15   zone: us-west1-b
16 login_image: projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-login-centos-v2-3-0
17 login_service_account: default
18 mounts: []
19 munge_key: ''
20 name: fluid-slurm-gcp-1
21 partitions:
22 - labels:
23     goog-dm: fluid-slurm-gcp-1
24   machines:
25   - disable_hyperthreading: false
26     disk_size_gb: 15
27     disk_type: pd-standard
28     external_ip: false
29     gpu_count: 0
30     gpu_type: nvidia-tesla-v100
31     image: projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-compute-centos-v2-3-0
32     local_ssd_mount_directory: /scratch
33     machine_type: n1-standard-2
34     max_node_count: 10
35     n_local_ssds: 0
36     name: partition-1
37     preemptible_bursting: false
38     static_node_count: 0
39     vpc_subnet: https://www.googleapis.com/compute/v1/projects/fluid-slurm-gcp-codelabs/regions/us-west1/subnetworks/default
40     zone: us-west1-b
41   max_time: INFINITE
42   name: partition-1
43   project: fluid-slurm-gcp-codelabs
44 slurm_accounts: []
45 slurm_db_host: {}
46 suspend_time: 300
47 tags:
48 - default
The partitions definition in this cluster-configuration file is specified between lines 21-43. The partitions attribute is a list of objects. Each partitions object has the attributes labels, machines, max_time, name, and project.
The partitions.machines attribute is also a list of objects. Each element of partitions.machines defines a set of machines that you want to place in the Slurm partition defined by the parent partitions object. Take some time to review the partitions object schema before moving on to the next section.
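If you want to look at just the partitions block while reviewing the schema, one simple option is to slice those lines out of config.yaml (assuming the line numbering shown above matches your file):
$ sed -n '21,43p' config.yaml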
At this point, you have a configuration file, config.yaml, in your home directory. You will now add one more machine set to the first partitions object so that it defines identical machine types in different regions.
By the end of this section, you will have a globally scalable, multi-region compute partition. We walk you through building compute partitions that open access to any Google data-center worldwide.
We will build from the config.yaml file created in the previous section and then use cluster-services to apply the changes to your HPC cluster configuration.
Open config.yaml in a text editor.
Change the name of the partition (line 42) to globally-scalable:
42   name: globally-scalable
Change the default_partition (line 10) to globally-scalable.
Change the name of the first machine set (line 36) to gl-compute-us-w1-b:
36     name: gl-compute-us-w1-b
Copy the machines block in the first-partition specification (lines 25-40) and paste it below line 40.
In the new machines block, change zone to europe-west1-b, vpc_subnet to https://www.googleapis.com/compute/v1/projects/[PROJECT-ID]/regions/europe-west1/subnetworks/default, and name to gl-compute-eu-w1-b.
After completing these steps, your machines block should look like the following:
25   - disable_hyperthreading: false
26     disk_size_gb: 15
27     disk_type: pd-standard
28     external_ip: false
29     gpu_count: 0
30     gpu_type: nvidia-tesla-v100
31     image: projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-compute-centos-v2-3-0
32     local_ssd_mount_directory: /scratch
33     machine_type: n1-standard-2
34     max_node_count: 2
35     n_local_ssds: 0
36     name: gl-compute-us-w1-b
37     preemptible_bursting: false
38     static_node_count: 0
39     vpc_subnet: https://www.googleapis.com/compute/v1/projects/fluid-slurm-gcp-codelabs/regions/us-west1/subnetworks/default
40     zone: us-west1-b
41   - disable_hyperthreading: false
42     disk_size_gb: 15
43     disk_type: pd-standard
44     external_ip: false
45     gpu_count: 0
46     gpu_type: nvidia-tesla-v100
47     image: projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-compute-centos-v2-3-0
48     local_ssd_mount_directory: /scratch
49     machine_type: n1-standard-2
50     max_node_count: 2
51     n_local_ssds: 0
52     name: gl-compute-eu-w1-b
53     preemptible_bursting: false
54     static_node_count: 0
55     vpc_subnet: https://www.googleapis.com/compute/v1/projects/fluid-slurm-gcp-codelabs/regions/europe-west1/subnetworks/default
56     zone: europe-west1-b
Save config.yaml and return to the terminal.
Use cluster-services to preview the updates to your cluster configuration:
[root]# cluster-services update partitions --config=config.yaml --preview
~ default_partition = partition-1 -> globally-scalable
~ partitions[0].machines[0].name = partition-1 -> gl-compute-us-w1-b
~ partitions[0].machines[0].zone = us-west1-c -> us-west1-b
+ partitions[0].machines[1] = {'disable_hyperthreading': False, 'disk_size_gb': 15, 'disk_type': 'pd-standard', 'external_ip': False, 'gpu_count': 0, 'gpu_type': 'nvidia-tesla-v100', 'image': 'projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-compute-centos-v2-3-0', 'local_ssd_mount_directory': '/scratch', 'machine_type': 'n1-standard-2', 'max_node_count': 2, 'n_local_ssds': 0, 'name': 'gl-compute-eu-w1-b', 'preemptible_bursting': False, 'static_node_count': 0, 'vpc_subnet': 'https://www.googleapis.com/compute/v1/projects/fluid-shared-vpc-networking/regions/us-west1/subnetworks/fluid-cluster-subnet-usw1', 'zone': 'europe-west1-b'}
Apply the changes to your cluster configuration:
[root]# cluster-services update partitions --config=config.yaml
Verify that the new partition and compute nodes are available in Slurm:
[root]# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
globally-scalable up infinite 4 idle~ gl-compute-us-w1-b-[0-1],gl-compute-eu-w1-b-[0-1]
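You can also inspect the partition directly in Slurm to confirm that both machine sets are attached to it:
$ scontrol show partition globally-scalable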
In the last section, you added machines in us-west1-b and europe-west1-b to a single Slurm partition. We will now submit a test job to demonstrate that both regions are used in this configuration.
Use srun to submit a job step across all 4 nodes in the globally-scalable partition:
$ srun -N4 --partition=globally-scalable hostname
gl-compute-us-w1-b-1
gl-compute-us-w1-b-0
gl-compute-eu-w1-b-1
gl-compute-eu-w1-b-0
It can take anywhere from 1-3 minutes for the nodes to respond with their hostnames; when communicating between GCP regions, there is increased network latency between instances. If you monitor the Compute Engine UI, you will be able to see the compute nodes coming online in us-west1-b (The Dalles, Oregon, USA) and europe-west1-b (St. Ghislain, Belgium).
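If you prefer the command line over the Compute Engine UI, you can watch the instances come online with gcloud; the name filter below simply matches the machine-set names used in this codelab.
$ gcloud compute instances list --filter="name~gl-compute"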
Congratulations! You have just created and tested a globally scalable HPC computing partition on Google Cloud Platform!
In this codelab, you configured a globally scalable (multi-region) compute partition on an existing fluid-slurm-gcp cluster and verified it by running a job step across compute nodes in two GCP regions.
From here, you can:
Learn how to configure a high availability compute partition (multi-zone)
Submit your feedback and request new codelabs using our feedback form
Learn how to configure OS Login to ssh to your cluster with 3rd party ssh tools
Learn how to manage POSIX user information with the Directory API