Running h5bench
h5bench (recommended)
We provide a single script you can use to run the benchmarks available in the h5bench Benchmarking Suite. You can combine multiple benchmarks into a workflow with distinct configurations. If you prefer, you can also manually run each benchmark in h5bench. For more details, refer to the Manual Execution section.
usage: h5bench [-h] [-a] [-d] [-v] [-p PREFIX] [-f FILTER] [-V] setup
H5bench: a Parallel I/O Benchmark Suite for HDF5:
positional arguments:
setup JSON file with the benchmarks to run
options:
-h, --help Show this help message and exit
-a, --abort-on-failure Stop h5bench if a benchmark failed
-d, --debug Enable debug mode
-v, --validate-mode Validate whether the requested mode (async/sync) was run
-p PREFIX, --prefix PREFIX Prefix where all h5bench binaries were installed
-f FILTER, --filter FILTER Execute only filtered benchmarks
-V, --version Show program's version number and exit
You need to provide a JSON file with the configurations you want to run.
If you’re using h5bench, you should not call mpirun, srun, or any other parallel launcher on your own. Refer to the Manual Execution section if you want to follow that approach instead.
The main script handles setting and unsetting environment variables, and launches the benchmarks with the provided configuration and HDF5 VOL connectors.
h5bench configuration.json
If you run it with the --debug option, h5bench will also print log messages to stdout. By default, they are stored in a file.
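For example:
h5bench --debug configuration.json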
Warning
Make sure you do not call srun, mpirun, etc. directly; instead, define the launcher in the JSON configuration file. You should always call h5bench directly.
You can define a single .json file with a complete setup and a combination of kernels you want to run. You can filter which of those benchmarks h5bench should run by passing the --filter option. For instance, the following command will only run the read and openpmd kernels defined in the .json file; the remaining ones will be ignored.
h5bench --filter read,openpmd configuration.json
Configuration
The JSON configuration file has five main properties: mpi, vol, file-system, directory, and benchmarks. All should be defined, even if empty.
MPI
You can set the MPI launcher you want to use (e.g., mpirun, mpiexec, or srun) and provide the number of processes. For other launchers or fine-grained control over the job configuration, you can define the configuration property, which h5bench will use together with the command property you provided when launching the experiments. If the configuration option is defined, h5bench will ignore the ranks property.
"mpi": {
"command": "mpirun",
"ranks": "4",
"configuration": "-np 8 --oversubscribe"
}
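For example, on a Slurm system you could let the configuration property carry all the launcher flags (a sketch; the exact srun flags depend on your site):
"mpi": {
    "command": "srun",
    "configuration": "--cpu_bind=cores -n 4"
}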
VOL
You can use HDF5 VOL connectors (async, cache, etc.) with h5bench_write and h5bench_read. Because some benchmarks in h5bench do not support VOL connectors yet, you need to provide the necessary information in the configuration file so that h5bench can handle the VOL setup at runtime.
"vol": {
"library": "/vol-async/src:/hdf5-async-vol-register-install/lib:/argobots/install/lib:/hdf5-install/install:",
"path": "/vol-async/src",
"connector": "async under_vol=0;under_info={}"
}
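Conceptually, this setup corresponds to the standard HDF5 VOL environment variables. A rough sketch of the assumed mapping (not h5bench's literal internals):
export LD_LIBRARY_PATH="/vol-async/src:/hdf5-async-vol-register-install/lib:/argobots/install/lib:/hdf5-install/install:$LD_LIBRARY_PATH"  # library
export HDF5_PLUGIN_PATH="/vol-async/src"                     # path
export HDF5_VOL_CONNECTOR="async under_vol=0;under_info={}"  # connector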
You should provide the absolute paths of all libraries required by the VOL connector in the library property, the path of the VOL connector itself, and its configuration in connector. The example above shows how to configure the HDF5 async VOL connector.
Directory
h5bench will create a directory for the given execution workflow, where it will store all the generated files and logs. Additional options such as data striping for Lustre, if configured, will be applied to this directory.
"directory": "hdf5-output-directory"
File System
You can use this property to configure file system options. For now, it supports Lustre: you can define the stripe count and size that should be applied to the directory that will store all data generated by h5bench.
"file-system": {
"lustre": {
"stripe-size": "1M",
"stripe-count": "4"
}
}
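Behind the scenes, this maps to a Lustre lfs setstripe call on the output directory, as shown in the debug log later in this page:
lfs setstripe -S 1M -c 4 hdf5-output-directory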
Benchmarks
You can specify which benchmarks h5bench should run using this property, including their order and configuration.
You can choose between: write, write-unlimited, overwrite, append, read, metadata, exerciser, openpmd, amrex, and e3sm.
For each h5bench pattern, you should provide the file and the configuration:
{
"benchmark": "write",
"file": "test.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "CONTIG",
"NUM_PARTICLES": "16 M",
"TIMESTEPS": "5",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "NO",
"COLLECTIVE_METADATA": "NO",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "16777216",
"DIM_2": "1",
"DIM_3": "1",
"MODE": "SYNC",
"CSV_FILE": "output.csv"
}
}
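When this entry runs, h5bench materializes the configuration as a plain key=value file (the h5bench.cfg seen in the debug log below) and passes it to h5bench_write. Assuming a direct one-to-one translation, the generated file would look like:
MEM_PATTERN=CONTIG
FILE_PATTERN=CONTIG
NUM_PARTICLES=16 M
TIMESTEPS=5
DELAYED_CLOSE_TIMESTEPS=2
COLLECTIVE_DATA=NO
COLLECTIVE_METADATA=NO
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s
NUM_DIMS=1
DIM_1=16777216
DIM_2=1
DIM_3=1
MODE=SYNC
CSV_FILE=output.csv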
If you provide the same file name used in a previous write execution, it will read from that file. This way, you can configure a workflow with multiple interleaved files, e.g., write file-01, write file-02, read file-02, read file-01 (see the sketch after the following example):
{
"benchmark": "read": {
"file": "test.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "CONTIG",
"NUM_PARTICLES": "16 M",
"TIMESTEPS": "5",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "NO",
"COLLECTIVE_METADATA": "NO",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "16777216",
"DIM_2": "1",
"DIM_3": "1",
"MODE": "SYNC",
"CSV_FILE": "output.csv"
}
}
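As a sketch, the interleaved workflow mentioned above would order its benchmarks entries like this (the empty configuration objects stand in for the full settings shown earlier):
"benchmarks": [
    { "benchmark": "write", "file": "file-01.h5", "configuration": { } },
    { "benchmark": "write", "file": "file-02.h5", "configuration": { } },
    { "benchmark": "read",  "file": "file-02.h5", "configuration": { } },
    { "benchmark": "read",  "file": "file-01.h5", "configuration": { } }
]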
You can also provide the align settings for the GPFS file system inside the benchmark's configuration property (note: not in the file-system property). These settings are enabled only for the write and write-unlimited benchmarks.
{
"benchmark": "write",
"file": "test.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "CONTIG",
"TIMESTEPS": "5",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "YES",
"COLLECTIVE_METADATA": "YES",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "4194304",
"DIM_2": "1",
"DIM_3": "1",
"MODE": "ASYNC",
"CSV_FILE": "output.csv",
"ALIGN":"YES",
"ALIGN_THRESHOLD":"16777216",
"ALIGN_LEN":"16777216"
}
}
For the metadata stress benchmark, the file and configuration properties must be defined:
{
"benchmark": "metadata",
"file": "hdf5_iotest.h5",
"configuration": {
"version": "0",
"steps": "20",
"arrays": "500",
"rows": "100",
"columns": "200",
"process-rows": "2",
"process-columns": "2",
"scaling": "weak",
"dataset-rank": "4",
"slowest-dimension": "step",
"layout": "contiguous",
"mpi-io": "independent",
"csv-file": "hdf5_iotest.csv"
}
}
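These keys are forwarded to the underlying hdf5-iotest kernel, which reads an INI-style configuration file. Assuming the upstream hdf5-iotest format, the generated file would look roughly like:
[DEFAULT]
version = 0
steps = 20
arrays = 500
rows = 100
columns = 200
process-rows = 2
process-columns = 2
scaling = weak
dataset-rank = 4
slowest-dimension = step
layout = contiguous
mpi-io = independent
csv-file = hdf5_iotest.csv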
For the exerciser benchmark, you need to provide the required runtime options inside the configuration property of the JSON file:
{
"benchmark": "exerciser",
"configuration": {
"numdims": "2",
"minels": "8 8",
"nsizes": "3",
"bufmult": "2 2",
"dimranks": "8 4"
}
}
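Each key maps directly to a command-line flag of h5bench_exerciser. With the configuration above, the debug log later in this page shows the resulting invocation (the launcher prefix comes from your mpi settings):
srun --cpu_bind=cores -n 4 build/h5bench_exerciser --numdims 2 --minels 8 8 --nsizes 3 --bufmult 2 2 --dimranks 8 4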
You can find several sample configuration files with all the options in our [GitHub repository](https://github.com/hpc-io/h5bench/tree/master/samples). You can also refer to this sample of a complete configuration.json file that defines a workflow executing multiple benchmarks from the h5bench Suite:
{
"mpi": {
"command": "mpirun",
"ranks": "4",
"configuration": "-np 8 --oversubscribe"
},
"vol": {
"library": "/vol-async/src:/hdf5-async-vol-register-install/lib:/argobots/install/lib:/hdf5-install/install:",
"path": "/vol-async/src",
"connector": "async under_vol=0;under_info={}"
},
"file-system": {
"lustre": {
"stripe-size": "1M",
"stripe-count": "4"
}
},
"directory": "full-teste",
"benchmarks": [
{
"benchmark": "e3sm",
"file": "coisa.h5",
"configuration": {
"k": "",
"x": "blob",
"a": "hdf5",
"r": "25",
"o": "ON",
"netcdf": "../../e3sm/datasets/f_case_866x72_16p.nc"
}
},
{
"benchmark": "write",
"file": "test.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "CONTIG",
"NUM_PARTICLES": "16 M",
"TIMESTEPS": "5",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "NO",
"COLLECTIVE_METADATA": "NO",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "16777216",
"DIM_2": "1",
"DIM_3": "1",
"MODE": "SYNC",
"CSV_FILE": "output.csv"
}
},
{
"benchmark": "exerciser",
"configuration": {
"numdims": "2",
"minels": "8 8",
"nsizes": "3",
"bufmult": "2 2",
"dimranks": "8 4"
}
},
{
"benchmark": "read",
"file": "test.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "CONTIG",
"NUM_PARTICLES": "16 M",
"TIMESTEPS": "5",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "NO",
"COLLECTIVE_METADATA": "NO",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "16777216",
"DIM_2": "1",
"DIM_3": "1",
"MODE": "SYNC",
"CSV_FILE": "output.csv"
}
},
{
"benchmark": "write",
"file": "test-two.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "CONTIG",
"NUM_PARTICLES": "2 M",
"TIMESTEPS": "20",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "NO",
"COLLECTIVE_METADATA": "NO",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "16777216",
"DIM_2": "1",
"DIM_3": "1",
"MODE": "SYNC",
"CSV_FILE": "output.csv"
}
},
{
"benchmark": "metadata",
"file": "hdf5_iotest.h5",
"configuration": {
"version": "0",
"steps": "20",
"arrays": "500",
"rows": "100",
"columns": "200",
"process-rows": "2",
"process-columns": "2",
"scaling": "weak",
"dataset-rank": "4",
"slowest-dimension": "step",
"layout": "contiguous",
"mpi-io": "independent",
"csv-file": "hdf5_iotest.csv"
}
}
]
}
For a description of all the options available in each benchmark, please refer to their entries in the documentation.
When the --debug option is enabled, you can expect an output similar to:
2021-10-25 16:31:24,866 h5bench - INFO - Starting h5bench Suite
2021-10-25 16:31:24,889 h5bench - INFO - Lustre support detected
2021-10-25 16:31:24,889 h5bench - DEBUG - Lustre stripping configuration: lfs setstripe -S 1M -c 4 your-path
2021-10-25 16:31:24,903 h5bench - INFO - h5bench [write] - Starting
2021-10-25 16:31:24,903 h5bench - INFO - h5bench [write] - DIR: your-path/504fc233/
2021-10-25 16:31:24,904 h5bench - INFO - Parallel setup: srun --cpu_bind=cores -n 4
2021-10-25 16:31:24,908 h5bench - INFO - srun --cpu_bind=cores -n 4 build/h5bench_write your-path/504fc233/h5bench.cfg your-path/test.h5
2021-10-25 16:31:41,670 h5bench - INFO - SUCCESS
2021-10-25 16:31:41,754 h5bench - INFO - Runtime: 16.8505464 seconds (elapsed time, includes allocation wait time)
2021-10-25 16:31:41,755 h5bench - INFO - h5bench [write] - Complete
2021-10-25 16:31:41,755 h5bench - INFO - h5bench [exerciser] - Starting
2021-10-25 16:31:41,755 h5bench - INFO - h5bench [exerciser] - DIR: your-path/247659d1/
2021-10-25 16:31:41,755 h5bench - INFO - Parallel setup: srun --cpu_bind=cores -n 4
2021-10-25 16:31:41,756 h5bench - INFO - srun --cpu_bind=cores -n 4 build/h5bench_exerciser --numdims 2 --minels 8 8 --nsizes 3 --bufmult 2 2 --dimranks 8 4
2021-10-25 16:31:49,174 h5bench - INFO - SUCCESS
2021-10-25 16:31:49,174 h5bench - INFO - Finishing h5bench Suite
Perlmutter (NERSC)
On Perlmutter, you need to load Python and its libraries for the main h5bench script to work. For manual execution of each benchmark, that is not required.
module load python
If you are running on Cori and the benchmark fails with an MPI message indicating no support for multiple threads:
Assertion `MPI_THREAD_MULTIPLE == mpi_thread_lvl_provided' failed.
Please make sure you define the following:
export MPICH_MAX_THREAD_SAFETY="multiple"
Sunspot (ALCF)
On Sunspot, you need to export one additional environment variable related to ATS, the Address Translation Service, which supports using the IOMMU (Input-Output Memory Management Unit) for address translation. ATS is not supported on Intel processors at this time, so the default is NTA (NIC translation).
export FI_CXI_ATS=0
Otherwise, you will encounter the following error:
libfabric:36807:1674015247::cxi:core:cxip_fc_notify_cb():4366<warn>
x1922c0s5b0n0: TXC (0x1081:5:0):: Fatal, unexpected event rc: 26
Manual Execution
If you prefer, you can execute each benchmark manually. In this scenario, you are responsible for generating the input configuration file needed by each benchmark in the suite, ensuring it follows the pre-defined format, which is unique to each one.
If you want to use HDF5 VOL connectors or tune the file system configuration, h5bench will not take care of that for you. Remember that not all benchmarks in the suite support VOL connectors yet.
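For reference, a manual launch mirrors the commands the script issues internally (compare with the debug log above). The configuration file name and paths below are placeholders; you must generate the file yourself in the benchmark's expected format:
srun --cpu_bind=cores -n 4 ./h5bench_write my-config.cfg /path/to/test.h5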