About the HPE CPE installation guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing systems

The HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (S-8022) includes procedures to install HPE Cray Programming Environment (CPE) and Parallel Application Launch Service (PALS) with HPE Performance Cluster Manager (HPCM) on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems.

This publication is intended for system administrators receiving their first release of this product or upgrading from a previous release. The information assumes that the administrator understands Linux system administration and HPCM. For information on how to use the HPE Cray Programming Environment, see the HPE Cray Programming Environment web page.

Release information

This publication supports installing CPE 25.03 on HPE Cray Supercomputing EX systems with:

  • HPCM 1.13, HPE Cray Supercomputing Operating System (COS) 25.3 (COS Base 3.3/USS 1.3.X), and SLES 15 SP6 (X86 or AArch64);

  • HPCM 1.13, HPE Cray Supercomputing Operating System (COS) 24.7 (COS Base 3.1/USS 1.1.X), and SLES 15 SP5 (X86 or AArch64);

  • HPCM 1.13 and RHEL 9.4 or 9.5 (X86 or AArch64); or

  • HPCM 1.13 and RHEL 8.10 (X86)

COS 23.11 (and later) comprises:

  • COS Base

  • HPE Cray Supercomputing User Services Software (USS)

  • HPE SUSE Linux Enterprise Server

Variable substitutions

Use the following variable substitutions throughout the included procedures.

  • <CPE_RELEASE> = 25.03

  • <CPE_VERSION> = 25.03 (or, if applicable, 25.03.X, where X is the third digit of the actual three-digit, HPE-supported version number)

  • <spX> or <SPX> = sp5/SP5 or sp6/SP6 (matching the installed SLES service pack)

  • <RHELX-X> = rhel-9.5, rhel-9.4, or rhel-8.10 (matching the installed RHEL version)

Record of revision


Publication Title | Date
----------------- | ----
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (25.03-Rev. 1) S-8022 | April 2025
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (25.03) S-8022 | March 2025
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (24.11) S-8022 | January 2025
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (24.07) S-8022 | August 2024
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (24.03) S-8022 | May 2024
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.12) S-8022 | December 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.09) S-8022 | September 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.05) S-8022 | June 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.02-Rev A) S-8022 | March 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.02) S-8022 | February 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.12) S-8022 | December 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.11) S-8022 | November 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.10) S-8022 | October 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.09) S-8022 | September 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.08 Rev A) S-8022 | August 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.08) S-8022 | August 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.06) S-8022 | June 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.05) S-8022 | May 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.04) S-8022 | April 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.03) S-8022 | March 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.02) S-8022 | February 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (21.02 - 21.12) S-8022 | Feb - Dec 2021

Downloading HPE Cray Supercomputing EX software

To download HPE Cray Supercomputing EX software, refer to the HPE Support Center or download it directly from My HPE Software Center. The HPE Support Center contains a wealth of documentation, training videos, knowledge articles, and alerts for HPE Cray Supercomputing EX systems. It provides the most detailed information about a release as well as direct links to product firmware, software, and patches available through My HPE Software Center.

Downloading the software through the HPE Support Center

HPE recommends downloading software through the HPE Support Center because of the many other resources available on the website.

  1. Visit the HPE Cray Supercomputing EX product page on the HPE Support Center.

  2. Search for specific product info, such as the full software name or recipe name and version.

    For example, search for “Slingshot 2.1” or “Cray System Software with CSM 24.3.0.”

  3. Find the desired software in the search results and select it to review details.

  4. Select Obtain Software and select Sign in Now when prompted.

    If a customer’s Entitlement Order Number (EON) is tied to specific hardware rather than software, the software is available without providing account credentials. Access the software instead by selecting Download Software and skip the next step in this procedure.

  5. Enter account credentials when prompted and accept the HPE License Terms.

    To download software, customers must ensure their Entitlement Order Number (EON) is active under My Contracts & Warranties on My HPE Software Center. If customers have trouble with the EON or are not entitled to a product, they must contact their HPE contract administrator or sales representative for assistance.

  6. Choose the needed software and documentation files to download and select curl Copy to access the files.

    Just like the software files, the documentation files change with each release. In addition to the official documentation, valuable information for a release is often available in files that include the phrase README in their name. Be sure to select and review these files in detail.

    HPE recommends the curl Copy option, which downloads a single text file with curl commands to use on the desired system. The curl commands expire 24 hours after they are downloaded; if more time has passed, download a new set of commands.

    To validate the security of the downloads, you can later compare the files on the desired system against the checksums provided by HPE underneath each selected download.

  7. Save the text file to a central location.

  8. On the system where the software will be downloaded, use a shell to execute the text file that includes the curl commands.

    For example:

    ncn-m001# bash -x <TEXT_FILE_PATH>
    

    The -x option in this example prints each curl command as it executes so that you can track the download progress.
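
    After the downloads finish, you can verify each file against the checksum HPE displays beneath its download entry. A minimal sketch, assuming SHA-256 checksums and a hypothetical file name:

    ncn-m001# sha256sum cpe-25.03-sles15-sp6-x86_64.iso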

Downloading the software directly from My HPE Software Center

Users already familiar with a release can save time by downloading software directly from My HPE Software Center.

  1. Visit My HPE Software Center and select Sign in.

  2. Enter account credentials when prompted and select Software in the left navigation bar.

  3. Search for specific product info, such as the full software name or recipe name and version.

    For example, search for “Slingshot 2.1” or “Cray System Software with CSM 24.3.0.”

  4. Find the desired software in the search results and review details by selecting Product Details under the Action column.

    Image of the Product Details option

  5. Select Go To Downloads Page and accept the HPE License Terms.

    To download software, customers must ensure their Entitlement Order Number (EON) is active under My Contracts & Warranties. If customers have trouble with the EON or are not entitled to a product, they must contact their HPE contract administrator or sales representative for assistance.

  6. Choose the needed software and documentation files to download and select curl Copy to access the files.

    Just like the software files, the documentation files change with each release. In addition to the official documentation, valuable information for a release is often available in files that include the phrase README in their name. Be sure to select and review these files in detail.

    HPE recommends the curl Copy option, which downloads a single text file with curl commands to use on the desired system. The curl commands expire 24 hours after they are downloaded; if more time has passed, download a new set of commands.

    To validate the security of the downloads, you can later compare the files on the desired system against the checksums provided by HPE underneath each selected download.

  7. Save the text file to a central location.

  8. On the system where the software will be downloaded, use a shell to execute the text file that includes the curl commands.

    For example:

    ncn-m001# bash -x <TEXT_FILE_PATH>
    

    The -x option in this example prints each curl command as it executes so that you can track the download progress.

Installation Prerequisites

For systems that include Scalable Unit (SU) leaders, HPE supports NFS for CPE deployments only through leader aliases; HPCM does not support exporting the filesystem from the admin node when leaders are in place. Every node that subsequently deploys CPE must therefore have a leader IP alias assigned, and the scalable bittorrent transport must be used to accommodate these requirements. Compute nodes on an HPE Cray Supercomputing EX system are often already set up with SU leader aliases during auto-discovery; any other nodes must be configured manually. Note also that multiple nodes can be managed together by passing a list or range of nodes to -n (see the example after this procedure).

  1. Set up the transport for service nodes. Use bittorrent for the transport for service nodes.

    IMPORTANT: This setup is required, even if you are not reinstalling the node. The running node is not affected; the node is affected only if it is provisioned again in the future.

    admin# cm node set --transport bittorrent -n n1

  2. Create the bittorrent tarball if it is not available yet:

    admin# cm image refresh --bittorrent -i ubu-22.04.1

  3. Get the IP address of an SU leader node. The example below uses leader1, but any SU leader node can be used:

    admin# ssh leader1 ctdb ip

  4. Pick one of the IP addresses returned in the previous step, and assign it to the nodes:

    admin# cm node set -n n1 --su-leader 172.23.255.1
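
As noted above, the same settings can be applied to many nodes at once by passing a list or range to -n. A minimal sketch, assuming n[1-64] is a hypothetical node range in your HPCM node-naming scheme:

    admin# cm node set --transport bittorrent -n n[1-64]
    admin# cm node set -n n[1-64] --su-leader 172.23.255.1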

Install HPE Cray Programming Environment on SUSE Linux Enterprise Server

OBJECTIVE

This procedure provides instructions for installing Cray Programming Environment (CPE) and, optionally, setting Lmod as the default module handling system.

OPTIONAL

  • For HPE Cray Supercomputing EX or HPE Cray Supercomputing systems with GPU compute nodes that are not running the HPE Cray Supercomputing Operating System (COS), the rocm/x.x.x or cudatoolkit/x.x.x GPU toolkit modulefile is required. See the GPU Toolkit Modulefile Templates for Cray PE online for environment modulefiles and pkg-config file templates.

  • Systems running COS typically have GPU toolkit modulefiles pre-installed and ready for use.

PROCEDURE

Before you begin

  1. Obtain the required cpe-<CPE_VERSION>-sles15-<spX>-<ARCH>.iso ISO files.

  2. Enable repositories (for the installation of cpe-support), including:

    • SLE Module Basesystem

    • SLE Module HPC

    • USS

  3. Note that to use UCX with Cray-MPICH:

    • Cray-MPICH using the UCX netmod is supported on SLES 15 systems with the HPCM installer.

    • HPE does not distribute UCX directly.

    • Mellanox provides a UCX solution as a part of their HPC-X software toolkit. This solution is the recommended path. Open source and Linux distribution packages provide a functional, although not necessarily performant, alternative.

  4. Set the environment variables:

    OS_NAME=sles
    OS_MAJOR=15
    OS_MINOR=6
    OS_NAME_VERSION=${OS_NAME}${OS_MAJOR}-sp${OS_MINOR}
    ARCH=x86_64
    
    CPE_VERSION=25.3.X
    CPE_ISO=cpe-${CPE_VERSION}-${OS_NAME_VERSION}-${ARCH}.iso
    CPE_REPO=CPE-${CPE_VERSION}-${OS_NAME_VERSION}-${ARCH}
    CPE_REPO_GROUP=${CPE_REPO}-recipe
    CPE_IMAGE_NAME=cpe-${CPE_VERSION}-${OS_NAME}${OS_MAJOR}_${OS_MINOR}-${ARCH}
    
    BOOT_IMAGE_NAME=my-image
    

    Replace CPE_VERSION=25.3.X (above) with the actual three-digit version being installed.
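
    For reference, a sketch of how the derived names expand, assuming a hypothetical CPE_VERSION=25.3.2:

    # Illustrative expansions for CPE_VERSION=25.3.2 (hypothetical version):
    #   CPE_ISO        = cpe-25.3.2-sles15-sp6-x86_64.iso
    #   CPE_REPO       = CPE-25.3.2-sles15-sp6-x86_64
    #   CPE_REPO_GROUP = CPE-25.3.2-sles15-sp6-x86_64-recipe
    #   CPE_IMAGE_NAME = cpe-25.3.2-sles15_6-x86_64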

Building the CPE image

  1. Add the CPE repository from the ISO:

    admin# cm repo add ${CPE_ISO} --priority 99

  2. Create a repository group for the CPE recipe. This group should:

    • Include repositories for CPE and versions of HPCM, SLES, and either USS or USS-wlm-support, and

    • Match the compute nodes.

    Note: CPE 25.03 requires gcc14 packages from the SLES update repositories.

    admin# cm repo group add ${CPE_REPO_GROUP} --repos \
    ${CPE_REPO} \
    Cluster-Manager-1.13-sles15sp6-x86_64 \
    SLE-15-SP6-Full-x86_64 \
    USS-1.3.0-82-cos-base-3.3-x86_64 \
    HPE-SLE-Updates-Module-Basesystem-15-SP6-x86_64-24.11.250127-sles15-sp6-x86_64 \
    HPE-SLE-Updates-Module-Development-Tools-15-SP6-x86_64-24.11.250127-sles15-sp6-x86_64 \
    HPE-SLE-Updates-Module-HPC-15-SP6-x86_64-24.11.250127-sles15-sp6-x86_64 \
    HPE-SLE-Updates-Module-Legacy-15-SP6-x86_64-24.11.250127-sles15-sp6-x86_64 \
    HPE-SLE-Updates-Product-SLES-15-SP6-x86_64-24.11.250127-sles15-sp6-x86_64
    

    Note: Check supporting documentation for exact names and versions of SLE and USS repositories.

  3. Create a copy of cpe.rpmlist, located in the CPE repository. Skip this step if no modifications will be made to the default rpmlist.

    admin# cp /opt/clmgr/repos/cm/${CPE_REPO}/cpe.rpmlist ${HOME}/cpe-${CPE_VERSION}.rpmlist
    
  4. (Optional) Modify the rpmlist to include or exclude components as needed. Additional compiler programming environments can be added by uncommenting the existing subsections in the rpmlist.

    admin# vim ${HOME}/cpe-${CPE_VERSION}.rpmlist
    
    Subcomponents predefined in the provided `rpmlist` include (but are not limited to):
    
    # # --- Base ---
    cpe-gcc-native-12.2
    cpe-gcc-native-13.1
    ...
    
    # # --- CCE ---
    cce-19.0.0
    cce-19.0.0-binutils
    ...
    
    # # --- CSML ---
    cray-fftw-3.3.10.9
    cray-hdf5-1.14.3.3
    ...
    
    # # --- MPT ---
    cray-dsmml-0.3.0
    cray-mpich-8.1.31-cray180
    ...
    
    # # --- TOOLS ---
    atp-3.15.5
    cray-ccdb-5.0.5
    ...
    
    # # --- PrgEnv-amd ---
    #cpe-descriptive-manifest-amd
    #cpe-prgenv-amd
    ...
    
    # # --- PrgEnv-aocc ---
    #cpe-descriptive-manifest-aocc
    #cpe-prgenv-aocc
    ...
    
    # # --- PrgEnv-intel ---
    #cpe-descriptive-manifest-intel
    #cpe-prgenv-intel
    ...
    
    # # --- PrgEnv-nvidia ---
    #cpe-descriptive-manifest-nvidia
    #cpe-prgenv-nvidia
    ...
    
    # # --- HPCM (Do not remove) ---
    sgi-cluster
    

    Important: HPCM 1.13 (and later) supports non-bootable images, which do not require a kernel. For HPCM 1.12 (and older), kernel packages are required to build an image; in that case, make sure to add the appropriate kernel package for the target OS to the rpmlist.

  5. Add cpe-module and cpe-descriptive-manifest-base packages to the rpmlist:

    admin# printf "cpe-module-25.03\n" >> cpe-${CPE_VERSION}.rpmlist
    admin# printf "cpe-descriptive-manifest-base\n" >> cpe-${CPE_VERSION}.rpmlist
    
  6. Create a non-bootable CPE image using the modified CPE rpmlist and CPE recipe repository group. The default rpmlist from the CPE repository directory can also be used if no modifications were made.

    Important: The image name must follow the same pattern as the one shown in the example; otherwise, the image is not mounted later. Specifically, the image name must begin with cpe- and end with the OS and architecture. For example:

    • cpe-*-<os_name><os_major_version>_<os_minor_version>-<architecture>

    • cpe-25.3.3-sles15_6-x86_64

    • cpe-25.3.3-sles15_6-aarch64

    Note that only the version portion of the image name is flexible.

    Important: HPCM 1.13 introduced the --non-bootable option to build non-bootable images without a kernel. For HPCM 1.12 (and older), you must remove the --non-bootable option from the image create command.

    admin# cm image create --non-bootable \
    -i ${CPE_IMAGE_NAME} \
    --pkglist ${HOME}/cpe-${CPE_VERSION}.rpmlist \
    --repo-group ${CPE_REPO_GROUP}
    
  7. Install the CPE defaults package into the image. This package sets all appropriate packages for the release as the default in the image.

    admin# cm image zypper -i ${CPE_IMAGE_NAME} --repo-group \
    ${CPE_REPO_GROUP} install cpe-defaults
    
  8. Activate the CPE image to create a squashfs file, which will be visible to the compute nodes:

    Note: The kdump.server failure message can be safely ignored since the image is built without a kernel.

    admin# cm image activate -i ${CPE_IMAGE_NAME}
    

Enabling CPE in the boot image and provisioning nodes

  1. Enable the CPE repository:

    admin# cm repo select ${CPE_REPO}
    

    Note that adding the CPE repository to a repository group automatically adds the required CPE packages to the auto-generated rpmlist for the corresponding repository group. This method is useful when a new image is built.

  2. For each compute and service node image that will include CPE:

    Note: Use the cm image show command to display available images.

    1. Import the RPM public key:

      admin# rpm --root /opt/clmgr/image/images/${BOOT_IMAGE_NAME} \
      --import /opt/clmgr/repos/cm/${CPE_REPO}/*.public
      
    2. Install the cm-pe-integration and cpe-support RPMs, included in the CPE repository, into the image:

      admin# cm image zypper -i ${BOOT_IMAGE_NAME} install cm-pe-integration cpe-support
      
    3. (Optional) For CPE releases before 25.03, modify /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/pe_releases to indicate which CPE releases to install:

      21.09
      21.07
      

      Note: CPE version 25.03 (and later) should not be added to this file.

    4. (Optional) Set Lmod as the default module handling system in the image. The default CPE Module Environment system and Lmod are mutually exclusive and cannot both run on the same system, so the choice applies sitewide. To make the change (a scripted sketch also appears at the end of this procedure):

      1. In /opt/clmgr/image/images/${BOOT_IMAGE_NAME}/etc/cray-pe.d/cray-pe-configuration.csh, change the module_prog variable from:

        set module_prog="environment modules"

        to:

        set module_prog="lmod"

      2. In /opt/clmgr/image/images/${BOOT_IMAGE_NAME}/etc/cray-pe.d/cray-pe-configuration.sh, change the module_prog variable from:

        module_prog="environment modules"

        to:

        module_prog="lmod"
        
    5. (Optional) Update revision history with a comment if you are using HPCM image revision management:

      admin# cm image revision commit -i ${BOOT_IMAGE_NAME} -m "Update CPE to ${CPE_VERSION}"
      
    6. (For diskless nodes) Activate the image:

      admin# cm image activate -i ${BOOT_IMAGE_NAME}
      

      See instructions for initializing an image in the HPE Performance Cluster Manager Administration Guide for all other node configurations.

    7. Reboot one compute node, using the method appropriate for your system. For example:

      Diskless Node Example

      admin# cm power reboot
      

      or

      Diskful Node Example

      admin# cm node provision -i ${BOOT_IMAGE_NAME}
      

      See the HPE Performance Cluster Manager Administration Guide for reboot options for your specific environment.

    8. Connect to the booted compute node, and verify that CPE modules are loaded.

      Example:

      admin# ssh nid0001
      nid0001# module list
      Currently Loaded Modules:
      1) craype-x86-rome          5) cce/17.0.0           9) cray-libsci/23.12.5
      2) libfabric/1.15.2.0       6) craype/2.7.30       10) PrgEnv-cray/8.5.0
      3) craype-network-ofi       7) cray-dsmml/0.2.2
      4) perftools-base/23.12.0   8) cray-mpich/8.1.28
      

      For the product versions in the current CPE release, see the HPCM release announcement on the HPE Cray Programming Environment website.

    9. Reboot the remaining compute nodes on the system if the installation appears correct.

  3. Disable the CPE repository after all images are updated:

    admin# cm repo unselect ${CPE_REPO}
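
As referenced in the optional Lmod step above, the module_prog change can also be scripted. A minimal sketch, assuming the configuration files exist at the paths shown in that step:

    admin# sed -i 's/^module_prog=.*/module_prog="lmod"/' \
    /opt/clmgr/image/images/${BOOT_IMAGE_NAME}/etc/cray-pe.d/cray-pe-configuration.sh
    admin# sed -i 's/^set module_prog=.*/set module_prog="lmod"/' \
    /opt/clmgr/image/images/${BOOT_IMAGE_NAME}/etc/cray-pe.d/cray-pe-configuration.csh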
    

Install HPE Cray Programming Environment on Red Hat Enterprise Linux

OBJECTIVE

This procedure provides instructions for installing Cray Programming Environment (CPE) and, optionally, setting Lmod as the default module handling system.

OPTIONAL

For HPE Cray Supercomputing EX or HPE Cray Supercomputing systems with GPU compute nodes:

  • If the nodes are not running the HPE Cray Supercomputing Operating System (COS) and a rocm/x.x.x or cudatoolkit/x.x.x GPU toolkit modulefile is required, refer to the GPU Toolkit Modulefile Templates for Cray PE, which provides environment modulefiles and pkg-config file templates.

  • Systems running COS typically have GPU toolkit modulefiles pre-installed and ready for use.

IMPORTANT: Throughout this procedure, replace instances of:

  • <CPE_RELEASE>

  • <CPE_VERSION>

  • <RHELX-X>

with the values specified in Release Information.

PROCEDURE

Before you begin

  1. Obtain the cpe-<CPE_VERSION>-<RHELX-X>-<ARCH>.iso ISO files.

  2. Download and enable required OS repositories (for the installation of cpe-support), including:

    • RHEL BaseOS

    • RHEL AppStream

    • EPEL

    • USS WLM Support

  3. Note that to use UCX with HPE Cray-MPICH:

    • HPE Cray-MPICH using the UCX netmod is supported on RHEL 8.7 systems with the HPCM installer.

    • HPE does not distribute UCX directly.

    • Mellanox provides a UCX solution as a part of their HPC-X software toolkit. This solution is the recommended path. Open source and Linux distribution packages provide a functional, although not necessarily performant, alternative.

  4. Set the environment variables:

    OS_NAME=rhel
    OS_MAJOR=9
    OS_MINOR=5
    OS_NAME_VERSION=${OS_NAME}-${OS_MAJOR}.${OS_MINOR}
    ARCH=x86_64
    
    CPE_VERSION=25.3.X
    CPE_ISO=cpe-${CPE_VERSION}-${OS_NAME_VERSION}-${ARCH}.iso
    CPE_REPO=cpe-${CPE_VERSION}-${OS_NAME_VERSION}-${ARCH}
    CPE_REPO_GROUP=${CPE_REPO}-recipe
    CPE_IMAGE_NAME=cpe-${CPE_VERSION}-${OS_NAME}${OS_MAJOR}_${OS_MINOR}-${ARCH}
    
    BOOT_IMAGE_NAME=my-image
    

    Replace CPE_VERSION=25.3.X (above) with the actual three-digit version being installed.
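
    For reference, a sketch of how the derived names expand, assuming a hypothetical CPE_VERSION=25.3.2:

    # Illustrative expansions for CPE_VERSION=25.3.2 (hypothetical version):
    #   CPE_ISO        = cpe-25.3.2-rhel-9.5-x86_64.iso
    #   CPE_REPO       = cpe-25.3.2-rhel-9.5-x86_64
    #   CPE_REPO_GROUP = cpe-25.3.2-rhel-9.5-x86_64-recipe
    #   CPE_IMAGE_NAME = cpe-25.3.2-rhel9_5-x86_64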

Building the CPE image

  1. Add the CPE repository from the ISO:

    admin# cm repo add ${CPE_ISO} --priority 99

  2. Create a repository group for the CPE recipe. This group should:

    • Include repositories for CPE and versions of HPCM, RHEL, EPEL, and either USS or USS-wlm-support, and

    • Match the compute nodes.

    Note: RHEL AppStream updates may be required for gcc support.

    admin# cm repo group add ${CPE_REPO_GROUP} --repos \
    ${CPE_REPO} \
    Cluster-Manager-1.13-rhel95-x86_64 \
    Red-Hat-Enterprise-Linux-9.5.0-x86_64 \
    EPEL-9 \
    USS-1.3.0-84-rhel-9.5-rhel9.5-x86_64
    

    Note: Check supporting documentation for exact names and versions of USS repositories.

  3. Create a copy of cpe.rpmlist, located in the CPE repository. Skip this step if no modifications will be made to the default rpmlist.

    admin# cp /opt/clmgr/repos/cm/${CPE_REPO}/cpe.rpmlist ${HOME}/cpe-${CPE_VERSION}.rpmlist
    
  4. (Optional) Modify the rpmlist to include or exclude components as needed. Additional compiler programming environments can be added by uncommenting the existing subsections in the rpmlist.

    admin# vim ${HOME}/cpe-${CPE_VERSION}.rpmlist
    
    Subcomponents predefined in the provided `rpmlist` include (but are not limited to):
    
    # # --- Base ---
    cpe-gcc-native-12.2
    cpe-gcc-native-13.1
    ...
    
    # # --- CCE ---
    cce-19.0.0
    cce-19.0.0-binutils
    ...
    
    # # --- CSML ---
    cray-fftw-3.3.10.9
    cray-hdf5-1.14.3.3
    ...
    
    # # --- MPT ---
    cray-dsmml-0.3.0
    cray-mpich-8.1.31-cray180
    ...
    
    # # --- TOOLS ---
    atp-3.15.5
    cray-ccdb-5.0.5
    ...
    
    # # --- PrgEnv-amd ---
    #cpe-descriptive-manifest-amd
    #cpe-prgenv-amd
    ...
    
    # # --- PrgEnv-aocc ---
    #cpe-descriptive-manifest-aocc
    #cpe-prgenv-aocc
    ...
    
    # # --- PrgEnv-intel ---
    #cpe-descriptive-manifest-intel
    #cpe-prgenv-intel
    ...
    
    # # --- PrgEnv-nvidia ---
    #cpe-descriptive-manifest-nvidia
    #cpe-prgenv-nvidia
    ...
    
    # # --- HPCM (Do not remove) ---
    sgi-cluster
    

    Important: HPCM 1.13 (and later) supports non-bootable images, which do not require a kernel. For HPCM 1.12 (and older), kernel packages are required to build an image; in that case, make sure to add the appropriate kernel package for the target OS to the rpmlist.

  5. Add cpe-module and cpe-descriptive-manifest-base packages to the rpmlist:

    admin# printf "cpe-module-25.03\n" >> cpe-${CPE_VERSION}.rpmlist
    admin# printf "cpe-descriptive-manifest-base\n" >> cpe-${CPE_VERSION}.rpmlist
    
  6. Create a non-bootable CPE image using the modified CPE rpmlist and CPE recipe repository group. The default rpmlist from the CPE repository directory can also be used if no modifications were made.

    Important: The image name must follow the same pattern as the one shown in the example; otherwise, the image is not mounted later. Specifically, the image name must begin with cpe- and end with the OS and architecture. For example:

    • cpe-*-<os_name><os_major_version>_<os_minor_version>-<architecture>

    • cpe-25.3.3-rhel9_5-x86_64

    • cpe-25.3.3-rhel9_5-aarch64

    Note that only the version portion of the image name is flexible.

    Important: HPCM 1.13 introduced the --non-bootable option to build non-bootable images without a kernel. For HPCM 1.12 (and older), you must remove the --non-bootable option from the image create command.

    admin# cm image create --non-bootable \
    -i ${CPE_IMAGE_NAME} \
    --pkglist ${HOME}/cpe-${CPE_VERSION}.rpmlist \
    --repo-group ${CPE_REPO_GROUP}
    
  7. Install the CPE defaults package into the image. This package sets all appropriate packages for the release as the default in the image.

    admin# cm image dnf -i ${CPE_IMAGE_NAME} --repo-group \
    ${CPE_REPO_GROUP} install cpe-defaults
    
  8. Activate the CPE image to create a squashfs file, which will be visible to the compute nodes:

    Note: The kdump.server failure message can be safely ignored since the image is built without a kernel.

    admin# cm image activate -i ${CPE_IMAGE_NAME}
    

Enabling CPE in the boot image and provisioning nodes

  1. Enable the CPE repository:

    admin# cm repo select ${CPE_REPO}
    

    Note that adding the CPE repository to a repository group automatically adds the required CPE packages to the auto-generated rpmlist for the corresponding repository group. This method is useful when a new image is built.

  2. For each compute and service node image that will include CPE:

    Note: Use the cm image show command to display available images.

    1. Import the RPM public key:

      admin# rpm --root /opt/clmgr/image/images/${BOOT_IMAGE_NAME} \
      --import /opt/clmgr/repos/cm/${CPE_REPO}/*.public
      
    2. Install the cm-pe-integration and cpe-support RPMs, included in the CPE repository, into the image:

      admin# cm image dnf -i ${BOOT_IMAGE_NAME} install cm-pe-integration cpe-support
      
    3. (Optional) For CPE releases before 25.03, modify /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/pe_releases to indicate which CPE releases to install:

      21.09
      21.07
      

      Note: CPE version 25.03 (and later) should not be added to this file.

    4. (Optional) Set Lmod as the default module handling system in the image. The default CPE Module Environment system and Lmod are mutually exclusive and cannot both run on the same system, so the choice applies sitewide. To make the change:

      1. In /opt/clmgr/image/images/${BOOT_IMAGE_NAME}/etc/cray-pe.d/cray-pe-configuration.csh, change the module_prog variable from:

        set module_prog="environment modules"

        to:

        set module_prog="lmod"

      2. In /opt/clmgr/image/images/${BOOT_IMAGE_NAME}/etc/cray-pe.d/cray-pe-configuration.sh, change the module_prog variable from:

        module_prog="environment modules"

        to:

        module_prog="lmod"
        
    5. (Optional) Update revision history with a comment if you are using HPCM image revision management:

      admin# cm image revision commit -i ${BOOT_IMAGE_NAME} -m "Update CPE to ${CPE_VERSION}"
      
    6. (For diskless nodes) Activate the image:

      admin# cm image activate -i ${BOOT_IMAGE_NAME}
      

      See instructions for initializing an image in the HPE Performance Cluster Manager Administration Guide for all other node configurations.

    7. Reboot one compute node, using the method appropriate for your system. For example:

      Diskless Node Example

      admin# cm power reboot
      

      or

      Diskful Node Example

      admin# cm node provision -i ${BOOT_IMAGE_NAME}
      

      See the HPE Performance Cluster Manager Administration Guide for reboot options for your specific environment.

    8. Connect to the booted compute node, and verify that CPE modules are loaded.

      Example:

      admin# ssh nid0001
      nid0001# module list
      Currently Loaded Modules:
      1) craype-x86-rome          5) cce/17.0.0           9) cray-libsci/23.12.5
      2) libfabric/1.15.2.0       6) craype/2.7.30       10) PrgEnv-cray/8.5.0
      3) craype-network-ofi       7) cray-dsmml/0.2.2
      4) perftools-base/23.12.0   8) cray-mpich/8.1.28
      

      For the product versions in the current CPE release, see the HPCM release announcement on the HPE Cray Programming Environment website.

    9. Reboot the remaining compute nodes on the system if the installation appears correct.

  3. Disable the CPE repository after all images are updated:

    admin# cm repo unselect ${CPE_REPO}
    

Configuring the ATP Slurm SPANK plugin (Conditional)

ATP requires a Slurm plugin to start analysis tools alongside job launches. Before HPE CPE 22.10, ATP included a global Slurm plugin file that had to be recompiled against the updated Slurm plugin API whenever Slurm was updated. To eliminate this requirement, since HPE CPE 22.10 ATP builds and configures its plugin as part of the module loading process instead of relying on a single global plugin. If your Slurm system is configured to use the global ATP Slurm plugin and job launches are working as expected, it is not necessary to remove the plugin from the system configuration.

Note that the following error might occur:

srun: error: spank: /opt/cray/pe/atp/libAtpDispatch.so: Incompatible plugin version

If you see the above error when running Slurm jobs, remove the include /etc/plugstack.conf.d/* line from your Slurm plugin configuration file to disable the global ATP plugin. This modification disables the use of the potentially outdated ATP plugin. Users will still have the correct plugin built and configured when loading the ATP module.
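
For reference, a sketch of the change, assuming the Slurm SPANK configuration lives at the default /etc/slurm/plugstack.conf location (site paths may differ):

    # /etc/slurm/plugstack.conf
    # Remove or comment out this line to disable the global ATP plugin:
    #include /etc/plugstack.conf.d/*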

Create Modulefiles for third-party products

PREREQUISITES

Download and install third-party packages before initiating this procedure.

OBJECTIVE

These instructions describe how to create a modulefile for third-party products, using craypkg-gen to create a modulefile for a specific version of a supported third-party product. This usage allows a site to set a specific version as the default.

PROCEDURE

The following steps are required; they can also be embedded in a script that installs the third-party product.

  1. Load the craypkg-gen module:

    admin# source /opt/cray/pe/modules/default/init/bash
    admin# module use /opt/cray/pe/modulefiles
    admin# module load craypkg-gen
    
  2. Generate module and set default scripts for products:

    AMD Optimizing C/C++ Compiler (requires craypkg-gen >= 1.3.16)

    admin# craypkg-gen -m /opt/AMD/aocc-compiler-<MODULE_VERSION>/
    

    NVIDIA HPC SDK (requires craypkg-gen >= 1.3.16)

    admin# craypkg-gen -m /opt/nvidia/hpc_sdk/Linux_x86_64/<MODULE_VERSION>/
    

    Intel oneAPI

    admin# craypkg-gen -m /opt/intel/oneapi/compiler/<MODULE_VERSION>/
    

    Note: For craypkg-gen to create the Intel modulefiles, the Intel compiler component of the Intel oneAPI release must be installed in a directory, or reachable through a symbolic link, that follows the <PREFIX>/oneapi/compiler/<VERSION> format. The craypkg-gen utility creates the intel and intel-oneapi modulefiles after the process completes successfully.

    AMD ROCm

    admin# craypkg-gen -m /opt/rocm-<MODULE_VERSION>
    
  3. Run the generated set-default script:

    admin# /opt/admin-pe/set_default_craypkg/set_default_<MODULE_NAME>_<MODULE_VERSION>
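
    For example, assuming craypkg-gen generated a set-default script for a hypothetical AOCC 4.1.0 installation:

    admin# /opt/admin-pe/set_default_craypkg/set_default_aocc_4.1.0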
    

Lmod Custom Dynamic Hierarchy

Lmod enables a user to dynamically modify their user environment through Lua modules. The CPE implementation of Lmod capitalizes on its hierarchical structure, including the Lmod module auto-swapping functionality. This capability means that module dependencies determine the branches of the tree-like hierarchy. Lmod allows static and dynamic hierarchical module paths. Lmod provides full support for static paths, which build the hierarchy based on the current set of modules loaded. Alongside static paths, CPE implements dynamic paths for a subset of the Lmod hierarchy (compilers, networks, CPUs, and MPIs). Dynamic paths give an advanced level of flexibility for detecting multiple dependency paths and allow custom paths to join existing Lmod hierarchy in CPE without modifying customer modulefiles.

Static Lmod Hierarchy

Modules that depend on one or more other modules are not visible to a user until their prerequisite modules are loaded. Loading the prerequisites adds the static paths of the dependent modules to the MODULEPATH environment variable, thereby exposing the dependent modules to the user. For more detailed information on the Lmod static module hierarchy, consult the User Guide for Lmod.

Dynamic Lmod Hierarchy

The CPE Lmod custom dynamic hierarchy abbreviates the overall Lmod hierarchy tree by relying on compatibility and not directly on a prerequisite version. Therefore, dependent modules do not need to exist in a new branch every time their prerequisite modules change versions. Instead, dynamic paths use a compatibility version that increases when a new prerequisite module version breaks compatibility in some way. The number following the path alias of the module (for example, 1.0 in x86-rome/1.0 and ofi/1.0) identifies the compatible version.

Module Path Aliases and Current Compatibility Versions

The compatible versions listed in the following tables are the minimum supported versions.

Compiler | RHEL Module Alias/Compatible Version | SLES Module Alias/Compatible Version
-------- | ------------------------------------ | ------------------------------------
amd | amd/4.0 | amd/4.0
cce | crayclang/16.0 | crayclang/17.0
gcc | gnu/10.0 | gnu/12.0
aocc | aocc/4.1 | aocc/4.1
intel | intel/2023.2 | intel/2023.2
nvidia(x86) | nvidia/20 | nvidia/20

Network | Module Alias/Compatible Version
------- | -------------------------------
craype-network-none | none/1.0
craype-network-ofi | ofi/1.0
craype-network-ucx | ucx/1.0

CPU | Module Alias/Compatible Version
--- | -------------------------------
craype-x86-milan | x86-milan/1.0
craype-x86-rome | x86-rome/1.0
craype-x86-trento | x86-trento/1.0

MPI | Module Alias/Compatible Version
--- | -------------------------------
cray-mpich | cray-mpich/8.0
cray-mpich-abi | cray-mpich/8.0
cray-mpich-abi-pre-intel-5.0 | cray-mpich/8.0
cray-mpich-ucx | cray-mpich/8.0
cray-mpich-ucx-abi | cray-mpich/8.0
cray-mpich-ucx-abi-pre-intel-5.0 | cray-mpich/8.0

Custom Dynamic Hierarchy

The CPE custom dynamic hierarchy extension allows custom module paths to join the existing CPE Lmod hierarchy implementation without modifying customer modulefiles. The custom dynamic module types that CPE supports include:

  • Compiler

  • Network

  • CPU

  • MPI

  • Compiler/Network

  • Compiler/CPU

  • Compiler/Network/CPU/MPI

As each custom dynamic module type loads, a handshake occurs using special pre-defined environment variables. When all hierarchical prerequisites are met, the paths of the dependent modulefiles are added to the MODULEPATH environment variable, thereby exposing the dependent modules to the user.

For Lmod to assist a user optimally, load a compiler, a network, a CPU, and an MPI module; Lmod cannot detect modules hidden in dynamic paths unless one module of each type is loaded.
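
For example, one module of each type can be loaded in a single command (a sketch using module names from this guide; the PrgEnv metamodules typically load an equivalent set automatically):

    nid0001# module load cce craype-network-ofi craype-x86-rome cray-mpich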

Create a custom dynamic hierarchy

PREREQUISITES

Set Lmod as the default module handling system before initiating this procedure.

OBJECTIVE

For the CPE custom dynamic hierarchy to detect the desired Lmod module path, one or more custom dynamic environment variables must be created according to the requirements defined within this procedure.

PROCEDURE

To create a custom dynamic environment variable:

  1. Begin the environment variable name with LMOD_CUSTOM_.

  2. Append the descriptor of the module type that the environment variable will represent. The module types and descriptors are:

    Module Type | Descriptor
    ----------- | ----------
    Compiler | COMPILER_
    Network | NETWORK_
    CPU | CPU_
    MPI | MPI_
    Compiler/Network | COMNET_
    Compiler/CPU | COMCPU_
    Compiler/Network/CPU/MPI | CNCM_

    Example: The custom dynamic environment variable for the combined compiler and CPU module begins with LMOD_CUSTOM_COMCPU_.

  3. Following the descriptor, append all prerequisite module aliases along with their respective compatible versions. See Module Path Aliases and Current Compatibility Versions for more information. The format of the module path alias/compatible version string for each module type is:

    Module Type | Module Path Alias/Compatible Version String
    ----------- | -------------------------------------------
    Compiler | <compiler_name>/<compatible_version>
    Network | <network_name>/<compatible_version>
    CPU | <cpu_name>/<compatible_version>
    MPI | <compiler_name>/<compatible_version>/<network_name>/<compatible_version>/<mpi_name>/<compatible_version>
    Compiler/Network | <compiler_name>/<compatible_version>/<network_name>/<compatible_version>
    Compiler/CPU | <compiler_name>/<compatible_version>/<cpu_name>/<compatible_version>
    Compiler/Network/CPU/MPI | <compiler_name>/<compatible_version>/<network_name>/<compatible_version>/<cpu_name>/<compatible_version>/<mpi_name>/<compatible_version>

    To create an acceptably formatted environment variable name, replace all slashes and dots in the module alias/compatible version string with underscores, and write all letters in uppercase.

    Example Module Path Alias/Compatible Version Strings:

  • Compiler = cce

    The path alias/compatible version string (values found in Module Path Aliases and Current Compatibility Versions) is crayclang/10.0; therefore, the text added to the environment variable name is:

    CRAYCLANG_10_0

  • Network = craype-network-ofi

    The path alias/compatible version string is ofi/1.0; therefore, the environment variable text is:

    OFI_1_0

  • CPU = craype-x86-rome

    The path alias/compatible version string is x86-rome/1.0; therefore, the environment variable text is:

    X86_ROME_1_0

  • MPI = cray-mpich

    cray-mpich has two prerequisite module types (compiler and network). Therefore, the environment variable must include the alias/compatible version for the desired compiler, network, and MPI. For a cray-mpich module dependent on cce and craype-network-ofi, the path alias/compatible version string is crayclang/10.0/ofi/1.0/cray-mpich/8.0; therefore, the environment variable text is:

    CRAYCLANG_10_0_OFI_1_0_CRAY_MPICH_8_0

  • Compiler/Network = cce with craype-network-ofi

    The path alias/compatible version string is crayclang/10.0/ofi/1.0; therefore, the environment variable text is:

    CRAYCLANG_10_0_OFI_1_0

  • Compiler/CPU = cce with craype-x86-rome

    The path alias/compatible version string is crayclang/10.0/x86-rome/1.0; therefore, the environment variable text is:

    CRAYCLANG_10_0_X86_ROME_1_0

  • Compiler/Network/CPU/MPI = cce, craype-network-ofi, craype-x86-rome, and cray-mpich

    The path alias/compatible version string is crayclang/10.0/ofi/1.0/x86-rome/1.0/cray-mpich/8.0; therefore, the environment variable text is:

    CRAYCLANG_10_0_OFI_1_0_X86_ROME_1_0_CRAY_MPICH_8_0

  4. Append _PREFIX following the final module/compatibility text instance:

    Example: Network = craype-network-ofi

    The custom dynamic environment variable is LMOD_CUSTOM_NETWORK_OFI_1_0_PREFIX.

    Creation of the custom dynamic environment variable is now complete.

  5. Add the custom dynamic environment variable to the user environment by exporting it with its value set to the Lmod module path:

    # export LMOD_CUSTOM_NETWORK_OFI_1_0_PREFIX=<lmod_module_path>
    

    Example: Network = craype-network-ofi

    All modulefiles in <lmod_module_path> are shown to users whenever craype-network-ofi is loaded.
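
Putting the steps together for the Compiler/Network/CPU/MPI example above, the complete variable would be exported as follows (a sketch; /apps/site/modulefiles is a hypothetical site-local module path):

    # export LMOD_CUSTOM_CNCM_CRAYCLANG_10_0_OFI_1_0_X86_ROME_1_0_CRAY_MPICH_8_0_PREFIX=/apps/site/modulefiles

Modulefiles under that path become visible once cce, craype-network-ofi, craype-x86-rome, and cray-mpich are all loaded.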