H5bench

H5bench is a suite of parallel I/O benchmarks, or kernels, representing I/O patterns that are commonly used in HDF5 applications on high-performance computing systems. H5bench measures I/O performance from several angles, including I/O overhead and observed I/O rate.

Build Instructions

Benchmark Suite Usage

h5bench_patterns benchmark

Major refactoring is in progress, so this document may be out of date. Both h5bench_write and h5bench_read take a config file and a data file path as command-line arguments.

./h5bench_write my_config.cfg my_data.h5
./h5bench_read my_config.cfg my_data.h5

This set of benchmarks contains an I/O kernel based on the I/O pattern of a particle physics simulation (VPIC-IO, which writes data to an HDF5 file) and of a big data clustering algorithm (BDCATS-IO, which reads the HDF5 file that VPIC-IO wrote).
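
A typical workflow is therefore to run the write benchmark first and then read the resulting file back. The following is only a sketch: the rank count is arbitrary, and write_config.cfg and read_config.cfg are placeholder names for a write configuration and a matching read configuration.

mpirun -n 4 ./h5bench_write write_config.cfg my_data.h5
mpirun -n 4 ./h5bench_read read_config.cfg my_data.h5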

Settings in the Configuration File

h5bench_write and h5bench_read take their parameters from a plain-text config file. The content format is strict. The following formats are not supported:

  • Blank/empty lines, including trailing empty lines at the end of the file.

  • A comment symbol (#) immediately following a value:
    • TIMESTEPS=5# Not supported

    • TIMESTEPS=5 #This is supported

    • TIMESTEPS=5 # This is supported

  • Blank spaces around the assignment:
    • TIMESTEPS=5 # This is supported

    • TIMESTEPS = 5 # Not supported

    • TIMESTEPS =5 # Not supported

    • TIMESTEPS= 5 # Not supported

A config file template can be found at basic_io/sample_config/template.cfg:

#========================================================
#   General settings
NUM_PARTICLES=16 M #16 K/G
TIMESTEPS=5
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s #1 ms, 1 min
#========================================================
#   Benchmark data dimensionality
NUM_DIMS=1
DIM_1=16777216
DIM_2=1
DIM_3=1
#========================================================
#   IO pattern settings
IO_OPERATION=READ
#IO_OPERATION=WRITE
MEM_PATTERN=CONTIG
# INTERLEAVED STRIDED
FILE_PATTERN=CONTIG # STRIDED
#========================================================
#   Options for IO_OPERATION=READ
READ_OPTION=FULL # PARTIAL STRIDED
TO_READ_NUM_PARTICLES=4 M
#========================================================
#   Strided access parameters, required for strided access
#STRIDE_SIZE=
#BLOCK_SIZE=
#BLOCK_CNT=
#========================================================
# Collective data/metadata settings
#COLLECTIVE_DATA=NO #Optional, default for NO.
#COLLECTIVE_METADATA=NO #Optional, default for NO.
#========================================================
#    Compression, optional, default is NO.
#COMPRESS=NO
#CHUNK_DIM_1=1
#CHUNK_DIM_2=1
#CHUNK_DIM_3=1
#========================================================
#    Async related settings
DELAYED_CLOSE_TIMESTEPS=2
IO_MEM_LIMIT=5000 K
ASYNC_MODE=EXP #EXP IMP NON
#========================================================
#    Output performance results to a CSV file
#CSV_FILE=perf_write_1d.csv
#
#FILE_PER_PROC=
General Settings
  • IO_OPERATION: required, choose from READ and WRITE.

  • MEM_PATTERN: required, choose from CONTIG, INTERLEAVED, and STRIDED.

  • FILE_PATTERN: required, choose from CONTIG and STRIDED.

  • NUM_PARTICLES: required, the number of particles that each rank needs to process; can be an exact number (12345) or use units (formats like 16 K, 128 M, and 256 G are supported; formats like 16K, 128M, 256G are NOT supported).

  • TIMESTEPS: required, the number of iterations.

  • EMULATED_COMPUTE_TIME_PER_TIMESTEP: required, must include units (e.g., 10 s, 100 ms, or 5000 us). In each iteration, the same amount of data is written and the file size grows correspondingly. After each iteration, the program sleeps for $EMULATED_COMPUTE_TIME_PER_TIMESTEP to emulate the application's computation.

  • NUM_DIMS: required, the number of dimensions; valid values are 1, 2, and 3.

  • DIM_1, DIM_2, and DIM_3: required, the dimensionality of the source data. Always set these parameters in ascending order, set unused dimensions to 1, and remember that NUM_PARTICLES == DIM_1 * DIM_2 * DIM_3 MUST hold. For example, DIM_1=1024, DIM_2=256, DIM_3=1 is a valid setting for a 2D array when NUM_PARTICLES=262144 or NUM_PARTICLES=256 K, because 1024 * 256 * 1 = 262144, which is 256 K.

Example of using multi-dimensional array data
  • Using 2D as an example (3D cases are similar): the file is generated with 4 ranks, each rank writing 8 M elements organized as a 4096 * 2048 array; in total they form a (4 * 4096) * 2048 2D array. The file should be around 1 GB.

The dimensionality part of the config file:

NUM_DIMS=2
DIM_1=4096
DIM_2=2048
DIM_3=64        # Note: dimensions beyond NUM_DIMS are ignored
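
As a rough size check for the example above (an assumption on this author's part: the VPIC-IO layout of eight 4-byte variables per particle, i.e., 32 bytes of data per particle): 4 ranks * 8,388,608 particles * 32 bytes per particle ≈ 1 GiB of particle data per timestep, which is consistent with the ~1 GB figure quoted above.
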
Additional Settings for READ (h5bench_read)
  • READ_OPTION: required for IO_OPERATION=READ, not allowed for IO_OPERATION=WRITE.

    • FULL: read the whole file

    • PARTIAL: read the first $TO_READ_NUM_PARTICLES particles

    • STRIDED: read in a strided pattern

  • TO_READ_NUM_PARTICLES: required, the number of particles to attempt to read.

Compression Settings
  • COMPRESS: YES or NO, optional. Only applicable for WRITE (h5bench_write); it has no effect for READ. When compression is enabled, the chunk dimensions (CHUNK_DIM_1, CHUNK_DIM_2, CHUNK_DIM_3) are required. To enable the parallel compression feature for VPIC, add the following section to the config file and make sure the chunk dimension settings are compatible with the data dimensions: they must have the same rank (e.g., a 2D array dataset needs 2D chunk dimensions), and a chunk dimension size cannot be greater than the corresponding data dimension size.

COMPRESS=YES    # to enable parallel compression (chunking)
CHUNK_DIM_1=512 # chunk dimensions
CHUNK_DIM_2=256
CHUNK_DIM_3=1   # extra chunk dimensions have no effect

Attention

There is a known bug in HDF5 parallel compression that can cause the system to run out of memory when the number of chunks is large (a large number of particles combined with very small chunk sizes). On Cori Haswell nodes, a setting of 16 M particles per rank on 8 nodes (256 ranks total) with a 64 * 64 chunk size crashed the system by running out of memory; on a single node, the minimal workable chunk size was 4 * 4.
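
As a back-of-the-envelope illustration of why such settings are problematic: 16,777,216 elements per rank divided by 64 * 64 = 4,096 elements per chunk gives 4,096 chunks per rank, so 256 ranks produce over one million chunks whose metadata must be tracked during the collective write.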

Collective Operation Settings
  • COLLECTIVE_DATA: optional, set to “YES” for collective data operations; otherwise, and by default (when not set), independent operations are used.

  • COLLECTIVE_METADATA: optional, set to “YES” for collective metadata operations; otherwise, and by default (when not set), independent operations are used.
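
For example, to request both collective raw data and collective metadata operations, add the following lines to the config file (both keys are optional and default to NO):

COLLECTIVE_DATA=YES
COLLECTIVE_METADATA=YES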

Other Settings
  • CSV_FILE: optional CSV output file name; performance results will be printed to this file as well as to standard output.

Supported Patterns

Attention

Not every pattern combination is covered; the supported benchmark parameter settings are listed below.

Supported Write Patterns (h5bench_write): IO_OPERATION=WRITE

The I/O patterns include array of structures (AOS) and structure of arrays (SOA) layouts both in memory and in the file. The array dimensions are 1D, 2D, and 3D for the write benchmark. This defines the write access pattern, choosing among CONTIG (contiguous), INTERLEAVED, and STRIDED for the source (the data layout in memory) and the destination (the data layout in the resulting file). For example, MEM_PATTERN=CONTIG and FILE_PATTERN=INTERLEAVED is a write pattern where the in-memory data layout is contiguous (see the implementation of prepare_data_contig_2D() for details) and the file data layout is interleaved due to its compound data structure (see the implementation of data_write_contig_to_interleaved() for details).

4 patterns for both 1D and 2D array write (NUM_DIMS=1 or NUM_DIMS=2)
MEM_PATTERN=CONTIG, FILE_PATTERN=CONTIG
MEM_PATTERN=CONTIG, FILE_PATTERN=INTERLEAVED
MEM_PATTERN=INTERLEAVED, FILE_PATTERN=CONTIG
MEM_PATTERN=INTERLEAVED, FILE_PATTERN=INTERLEAVED
1 pattern for 3D array (NUM_DIMS=3)
MEM_PATTERN=CONTIG, FILE_PATTERN=CONTIG
1 strided pattern for 1D array (NUM_DIMS=1)
MEM_PATTERN=CONTIG, FILE_PATTERN=STRIDED
Supported Read Patterns (h5bench_read): IO_OPERATION=READ
1 pattern for 1D, 2D and 3D read (NUM_DIMS=1, 2, or 3)
MEM_PATTERN=CONTIG, FILE_PATTERN=CONTIG, READ_OPTION=FULL, contiguously read through the whole data file.
2 patterns for 1D read
MEM_PATTERN=CONTIG, FILE_PATTERN=CONTIG, READ_OPTION=PARTIAL, contiguously read the first $TO_READ_NUM_PARTICLES elements.

MEM_PATTERN=CONTIG, FILE_PATTERN=STRIDED, READ_OPTION=STRIDED

Sample Settings

The following setting reads 2048 particles from 128 blocks in total; each block consists of the first 16 of every 64 elements. See the HDF5 documentation for details on using strided access.

#   General settings
NUM_PARTICLES=16 M
TIMESTEPS=5
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s
#========================================================
#   Benchmark data dimensionality
NUM_DIMS=1
DIM_1=16777216
DIM_2=1
DIM_3=1
#========================================================
#   IO pattern settings
IO_OPERATION=READ
MEM_PATTERN=CONTIG
FILE_PATTERN=CONTIG
#========================================================
#    Options for IO_OPERATION=READ
READ_OPTION=PARTIAL # FULL PARTIAL STRIDED
TO_READ_NUM_PARTICLES=2048
#========================================================
#    Strided access parameters
STRIDE_SIZE=64
BLOCK_SIZE=16
BLOCK_CNT=128
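
With these parameters, each rank selects BLOCK_CNT * BLOCK_SIZE = 128 * 16 = 2048 elements, matching TO_READ_NUM_PARTICLES=2048; each 16-element block starts STRIDE_SIZE (64) elements after the start of the previous one.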

For more examples, see the config files and template.cfg in the basic_io/sample_config/ directory.

To Run h5bench_write and h5bench_read

Both h5bench_write and h5bench_read use the same command-line arguments:

Single process run:

./h5bench_write sample_write_cc1d_es1.cfg my_data.h5

Parallel run (replace mpirun with your system-provided launcher, for example, srun on Cori/NERSC and jsrun on Summit/OLCF):

mpirun -n 2 ./h5bench_write sample_write_cc1d_es1.cfg output_file

On Cori/NERSC or similar platforms that use the Cray-MPICH library, if you encounter a failed assertion regarding support for MPI_THREAD_MULTIPLE, define the following environment variable:

export MPICH_MAX_THREAD_SAFETY="multiple"
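
For example, a sketch of a Cray-MPICH launch combining the two (the rank count and config file are placeholders):

export MPICH_MAX_THREAD_SAFETY="multiple"
srun -n 2 ./h5bench_write sample_write_cc1d_es1.cfg my_data.h5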

Argobots on macOS

If you are trying to run the benchmark on macOS and get a segmentation fault (from ABT_thread_create), try setting the following environment variable:

ABT_THREAD_STACKSIZE=100000 ./h5bench_write sample_write_cc1d_es1.cfg my_data.h5

Understanding the Output

The metadata and raw data operations are timed separately; the observed time and rate are based on the total time.

Sample output of h5bench_write:

==================  Performance results  =================
Total emulated compute time 4000 ms
Total write size = 2560 MB
Data preparation time = 739 ms
Raw write time = 1.012 sec
Metadata time = 284.990 ms
H5Fcreate() takes 4.009 ms
H5Fflush() takes 14.575 ms
H5Fclose() takes 4.290 ms
Observed completion time = 6.138 sec
Raw write rate = 2528.860 MB/sec
Observed write rate = 1197.592 MB/sec

Sample output of h5bench_read:

=================  Performance results  =================
Total emulated compute time = 4 sec
Total read size = 2560 MB
Metadata time = 17.523 ms
Raw read time = 1.201 sec
Observed read completion time = 5.088 sec
Raw read rate = 2132.200 MB/sec
Observed read rate = 2353.605225 MB/sec
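
The rates can be cross-checked from the other fields. For instance, in the write sample above the raw rate is the total size over the raw write time, and the observed rate is consistent with the completion time minus the emulated compute time:

Raw write rate      = 2560 MB / 1.012 s             ≈ 2529 MB/sec
Observed write rate = 2560 MB / (6.138 s - 4.000 s) ≈ 1197 MB/sec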

h5bench_exerciser

We modified this benchmark slightly to be able to specify a writable file location. Except for the first argument, $write_file_prefix, it is identical to the original. A detailed README can be found in the source code directory; the original is available at https://xgitlab.cels.anl.gov/ExaHDF5/BuildAndTest/-/blob/master/Exerciser/README.md

Example run:

mpirun -n 8 ./h5bench_exerciser $write_file_prefix --numdims 2 --minels 8 8 --nsizes 3 --bufmult 2 --dimranks 8 4

The Metadata Stress Test: h5bench_hdf5_iotest

This is the same benchmark as originally found at https://github.com/HDFGroup/hdf5-iotest. We modified it slightly to be able to specify the config file location; everything else remains untouched.

Example run:

mpirun -n 4 ./h5bench_hdf5_iotest hdf5_iotest.ini

Streaming operation benchmark: h5bench_vl_stream_hl

This benchmark tests the performance of the append operation. It supports two types of appends, FIXED and VLEN, representing fixed-length and variable-length data, respectively. Note: this benchmark does not run in parallel mode.

To run the benchmark:

./h5bench_vl_stream_hl write_file_path FIXED/VLEN num_ops

Example runs:

./h5bench_vl_stream_hl here.dat FIXED 1000
./h5bench_vl_stream_hl here.dat VLEN 1000

HDF5 Exerciser Benchmark

Authors:

Updates: December 13th 2018 (Version 2.0)

Attention

For more detailed instructions on how to build and run the exerciser code on specific machines (at ALCF), see Exerciser/BGQ/VESTA_XL/README.md and Exerciser/BGQ/THETA/README.md in this repository. Those README files also include instructions for building the CCIO and develop versions of HDF5 for use with this benchmark.

Exerciser Overview

The HDF5 Exerciser Benchmark creates an HDF5 use case with some ideas/code borrowed from other benchmarks (namely IOR, VPICIO and FLASHIO). Currently, the algorithm does the following in parallel over all MPI ranks:

  • For each rank, a local data buffer (with dimensions given by numDims) is initialized with minNEls double-precision elements in each dimension.

  • If the --derivedtype flag is used, a second local dataset is also specified, with a derived data type assigned to each element.

  • For a given number of iterations (hardcoded as NUM_ITERATIONS):

    • Open a file, create a top group, set the MPI-IO transfer property, and (optionally) add a simple attribute string to the top group

    • Create memory and file dataspaces with hyperslab selections for simple rank-ordered offsets into the file. The -rshift option can be used to specify the number of rank positions by which to shift the write position in the file (the read will be shifted twice this amount to avoid client-side caching effects).

    • Write the data and close the file

    • Open the file, read in the data, and check correctness (if dataset is small enough)

    • Close the dataset (but not the file)

    • If the second (derived-type) data set is specified: (1) create a derived type, (2) open a new data set with the same number of elements and dimension, (3) write the data and (4) close everything.

  • Each dimension of curNEls is then multiplied by each dimension of bufMult, and the previous steps (the loop over NUM_ITERATIONS) are repeated. This outer loop over local buffer sizes is repeated a total of nsizes times.
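
For example (a hypothetical setting, assuming 8-byte double-precision elements): with --minels 8 8, --bufmult 2 2, and --nsizes 3, the outer loop covers local buffers of 8 x 8 (512 bytes), 16 x 16 (2 KiB), and 32 x 32 (8 KiB) elements.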

Command-line Arguments (Options)

Required
  • --numdims <x>: Dimension of the datasets to write to the HDF5 file

  • --minels <x> … <x>: Minimum number of double elements to write in each dimension of the dataset (one value for each dimension)

Optional
  • --nsizes <x>: How many buffer sizes to use (the code will start with minbuf and loop through nsizes iterations, with the buffer size multiplied by bufmult in each dimension at each iteration)

  • --bufmult <x> … <x>: Constant, for each dimension, used to multiply the buffer size [default: 2 2 … ]

  • --metacoll: Whether to use collective metadata operations [default: False]

  • --derivedtype: Whether to create a second dataset containing a derived type [default: False]

  • --addattr: Whether to add attributes to group 1 [default: False]

  • --indepio: Whether to use independent I/O (not MPI-IO) [default: False]

  • --keepfile: Whether to keep the file around after the program ends for further analysis; otherwise it is deleted [default: False]

  • --usechunked: Whether to chunk the data when reading/writing [default: False]

  • --maxcheck <x>: Maximum buffer size (in bytes) to validate. Note that all buffers will be validated if this option is not set [default: Inf]

  • --memblock <x>: Block size to use in the local memory buffer (the local buffer is always 1D for now; note: this currently applies to the ‘double’ dataset only) [default: local buffer size]

  • --memstride <x>: Stride of the local memory buffer (the local buffer is always 1D for now; note: this currently applies to the ‘double’ dataset only) [default: local buffer size]

  • --fileblocks <x> … <x> (one value for each dimension): Block sizes to use in the file for each dataset dimension (note: this currently applies to the ‘double’ dataset only) [default: 1 … 1]

  • --filestrides <x> … <x> (one value for each dimension): Stride distances to use in the file for each dataset dimension (note: this currently applies to the ‘double’ dataset only) [default: 1 … 1]

The exerciser also allows the MPI decomposition to be explicitly defined:

  • --dimranks <x> … <x> (one value for each dimension): MPI-rank division in each dimension. Note that, if not set, the decomposition will be in the first dimension only.

Exerciser Basics

In the simplest case, the Exerciser code will simply write and then read an n-dimensional double-precision dataset in parallel (with all the necessary HDF5 steps in between). At a minimum, the user must specify the number of dimensions to use for this dataset (using the --numdims flag) and the size of each dimension (using the --minels flag). By default, the maximum number of dimensions allowed by the code is set by MAX_DIM (currently 4, but it can be modified easily). Note that the user is specifying the number of elements to use in each dimension with --minels. Therefore, the local buffer size is the product of the dimension sizes and sizeof(double) (and the global dataset in the file is the product of the total number of MPI ranks and the local buffer size). As illustrated in Fig. 1, the mapping of ranks to hyperslabs in the global dataset can be specified with the --dimranks flag (here, Example 1 is the default decomposition, while Example 2 corresponds to --dimranks 2 2). This flag simply allows the user to list the number of spatial decompositions in each dimension of the global dataset, and it requires that the product of the inputs equal the total number of MPI ranks.
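
As a concrete check of these rules (a hypothetical case, assuming 8-byte double-precision elements): with --numdims 2 --minels 64 64 on 32 ranks, the local buffer is 64 * 64 * 8 bytes = 32 KiB, the global dataset for that size is 32 * 32 KiB = 1 MiB, and --dimranks 8 4 is a valid decomposition because 8 * 4 = 32 equals the total number of ranks.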

Fig. 1 - Illustration of different local-to-global dataset mapping options.

If the user wants to loop through a range of buffer sizes, the --nsizes flag can be used to specify how many sizes to measure, and the --bufmult flag can be used to specify the multiplication factor for each dimension between each loop. For example, if the user wanted to test 64x64, 128x128, and 256x256-element local datasets on 32 ranks, they could use the following command to run the code:

mpirun -np 32 ./hdf5Exerciser --numdims 2 --minels 64 64 --nsizes 3 --bufmult 2 2 --dimranks 8 4

When executed for a single local-buffer size (default), the Exerciser output will look something like this:

useMetaDataCollectives: 0 addDerivedTypeDataset: 0 addAttributes: 0 useIndependentIO: 0 numDims: 1 useChunked: 0 rankShift: 4096
Metric      Bufsize   H5DWrite    RawWrBDWTH    H5Dread    RawRdBDWTH    Dataset      Group  Attribute    H5Fopen   H5Fclose   H5Fflush OtherClose
Min           32768   0.134616   3058.154823   0.191049   2534.613015   0.361010   0.551608   0.000001   0.224550   0.127877   0.210821   0.000755
Med           32768   0.143874   3554.180478   0.191684   2670.829718   0.379858   0.612309   0.000001   0.236735   0.132450   0.228889   0.000761
Max           32768   0.167421   3803.418460   0.202003   2679.939135   0.405620   0.679779   0.000002   0.268622   0.138463   0.270188   0.000785
Avg           32768   0.146435   3506.598052   0.192068   2666.021346   0.379799   0.616157   0.000001   0.237219   0.132410   0.233730   0.000763
Std           32768   0.008055    185.366133   0.002090     27.665058   0.010248   0.026048   0.000000   0.008915   0.002650   0.017362   0.000006

Using NUM_ITERATIONS samples for each local buffer size (Bufsize), the minimum, median, maximum, average, and standard deviation of all metrics are reported in distinct rows of the output. The Bufsize values are reported in bytes, RawWrBDWTH and RawRdBDWTH are in MB/s, and all other metrics are in seconds.

Building Exerciser

H5bench’s make process builds the h5bench_exerciser.

In case the Exerciser needs to be built separately, building it is straightforward given the path to a parallel HDF5 installation. The following Makefile can be used as a reference:

default: hdf5Exerciser

HDF5_INSTALL_DIR=/Users/rzamora/hdf5-install

h5bench_exerciser.o: h5bench_exerciser.c
	mpicc -c -g -DMETACOLOK -I${HDF5_INSTALL_DIR}/include h5bench_exerciser.c -o h5bench_exerciser.o

hdf5Exerciser: h5bench_exerciser.o
	mpicc h5bench_exerciser.o -o hdf5Exerciser -L${HDF5_INSTALL_DIR}/lib -lhdf5 -lz

clean:
	rm -f h5bench_exerciser.o
	rm -f hdf5Exerciser
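
Assuming that Makefile is saved next to h5bench_exerciser.c and HDF5_INSTALL_DIR points at a parallel HDF5 installation, a sketch of building and running it is:

make
mpirun -np 8 ./hdf5Exerciser --numdims 2 --minels 8 8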

For more detailed instructions on how to build and run both HDF5 and the exerciser on specific machines (at ALCF), see the Exerciser/BGQ/VESTA_XL and Exerciser/BGQ/THETA directories of this repository.

h5bench_patterns

Here you can find some sample configuration files for h5bench patterns:

sample_2d_compressed.cfg

# this is a comment
# Benchmark mode can only be one of these: CC/CI/IC/II/CC2D/CI2D/IC2D/II2D/CC2D/CC3D
IO_OPERATION=READ
MEM_PATTERN=CONTIG
FILE_PATTERN=CONTIG
READ_OPTION=FULL
NUM_DIMS=2
NUM_PARTICLES=8 M
TIMESTEPS=5
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s
DIM_1=4096
DIM_2=2048
DIM_3=1
COMPRESS=YES  # to enable parallel compression(chunking)
CHUNK_DIM_1=512  # chunk dimensions
CHUNK_DIM_2=256
CHUNK_DIM_3=1

sample_read_cc1d.cfg

# this is a comment
# Benchmark mode can only be one of these: CC/CI/IC/II/CC2D/CI2D/IC2D/II2D/CC2D/CC3D
IO_OPERATION=READ
READ_OPTION=FULL
NUM_DIMS=1
NUM_PARTICLES=8 M
TIMESTEPS=5
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s
DIM_1=8 M
DIM_2=1
DIM_3=1

sample_read_cc1d_es1.cfg

# this is a comment
IO_OPERATION=READ
TO_READ_NUM_PARTICLES=16777216
READ_OPTION=READ_FULL
MEM_PATTERN=CONTIG
FILE_PATTERN=CONTIG
TIMESTEPS=5
DELAYED_CLOSE_TIMESTEPS=2
COLLECTIVE_DATA=NO #Optional, default for NO.
COLLECTIVE_METADATA=NO #Optional, default for NO.
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s
NUM_DIMS=1
DIM_1=16777216 # 16777216, 8388608
DIM_2=1
DIM_3=1
IO_MEM_LIMIT=1 G
ASYNC_MODE=EXP  #NON
#CSV_FILE=perf_read_1d.csv
#==================================

sample_read_cc2d.cfg

# this is a comment
# Benchmark mode can only be one of these: CC/CI/IC/II/CC2D/CI2D/IC2D/II2D/CC2D/CC3D
IO_OPERATION=READ
MEM_PATTERN=CONTIG
FILE_PATTERN=CONTIG
READ_OPTION=FULL
NUM_DIMS=2
NUM_PARTICLES=8 M
TIMESTEPS=5
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s
DIM_1=2048
DIM_2=4096
DIM_3=1

sample_read_strided.cfg

# this is a comment
# Benchmark mode can only be one of these: CC/CI/IC/II/CC2D/CI2D/IC2D/II2D/CC2D/CC3D
IO_OPERATION=READ
TO_READ_NUM_PARTICLES=16777216
READ_OPTION=FULL
MEM_PATTERN=CONTIG
FILE_PATTERN=STRIDED
TIMESTEPS=5
DELAYED_CLOSE_TIMESTEPS=10
COLLECTIVE_DATA=NO #Optional, default for NO.
COLLECTIVE_METADATA=NO #Optional, default for NO.
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s
NUM_DIMS=1
DIM_1=16777216 # 16777216, 8388608
DIM_2=1
DIM_3=1
STRIDE_SIZE=64
BLOCK_SIZE=16
BLOCK_CNT=128
ASYNC_MODE=NO  #NON
CSV_FILE=perf_read_1d.csv
#==================================

sample_write_cc1d.cfg

# this is a comment
# Benchmark mode can only be one of these: CC/CI/IC/II/CC2D/CI2D/IC2D/II2D/CC2D/CC3D
# Template cof include all options
IO_OPERATION=WRITE
MEM_PATTERN=CONTIG
FILE_PATTERN=CONTIG
NUM_PARTICLES=16 M #16 K/G
TIMESTEPS=5
#IO_OPERATION=READ #WRITE
#MEM_PATTERN=CONTIG #INTERLEAVED STRIDED
#FILE_PATTERN=CONTIG #STRIDED
DELAYED_CLOSE_TIMESTEPS=2
COLLECTIVE_DATA=NO #Optional, default for NO.
COLLECTIVE_METADATA=NO #Optional, default for NO.
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s #1 ms, 1 min
NUM_DIMS=1
DIM_1=16777216 #16777216 # 16777216, 8388608
DIM_2=1
DIM_3=1
ASYNC_MODE=NON #EXP #ASYNC_IMP ASYNC_NON ASYNC_EXP
CSV_FILE=perf_write_1d.csv
#===========================
#WRITE_PATTERN=CC

sample_write_cc1d_es1.cfg

# this is a comment
IO_OPERATION=WRITE
MEM_PATTERN=CONTIG
FILE_PATTERN=CONTIG
NUM_PARTICLES=16 M #K, M, G
TIMESTEPS=5
DELAYED_CLOSE_TIMESTEPS=2
COLLECTIVE_DATA=NO
#Optional, default for NO.
COLLECTIVE_METADATA=NO
#Optional, default for NO.
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s
#1 ms, 1 min
NUM_DIMS=1
DIM_1=16777216
#16777216 # 16777216, 8388608
DIM_2=1
DIM_3=1
IO_MEM_LIMIT=1 G
#ASYNC_MODE=ASYNC_EXP
ASYNC_MODE=EXP #IMP NON EXP
#CSV_FILE=perf_write_1d.csv
#===========================
#WRITE_PATTERN=CC

sample_write_cc1d_fileperproc.cfg

# this is a comment
# Benchmark mode can only be one of these: CC/CI/IC/II/CC2D/CI2D/IC2D/II2D/CC2D/CC3D
WRITE_PATTERN=CC
PARTICLE_CNT_M=8
TIME_STEPS_CNT=1
DATA_COLL=NO #Optional, default for NO.
META_COLL=NO #Optional, default for NO.
SLEEP_TIME=1
DIM_1=8388608
DIM_2=1
DIM_3=1
ASYNC_MODE=ASYNC_NON
CSV_FILE=perf_write_1d.csv
FILE_PER_PROC=YES #Optional, default is NO.

sample_write_cc2d.cfg

# this is a comment
# Benchmark mode can only be one of these: CC/CI/IC/II/CC2D/CI2D/IC2D/II2D/CC2D/CC3D
# Template cof include all options
IO_OPERATION=WRITE
MEM_PATTERN=CONTIG
FILE_PATTERN=CONTIG
NUM_PARTICLES=16 M #16 K/G
TIMESTEPS=5
#IO_OPERATION=READ #WRITE
#MEM_PATTERN=CONTIG #INTERLEAVED STRIDED
#FILE_PATTERN=CONTIG #STRIDED
DELAYED_CLOSE_TIMESTEPS=2
COLLECTIVE_DATA=NO #Optional, default for NO.
COLLECTIVE_METADATA=NO #Optional, default for NO.
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s #1 ms, 1 min
NUM_DIMS=2
DIM_1=4096 #16777216 # 16777216, 8388608
DIM_2=4096
DIM_3=1
ASYNC_MODE=NON #EXP #ASYNC_IMP ASYNC_NON ASYNC_EXP
CSV_FILE=perf_write_1d.csv
#===========================
#WRITE_PATTERN=CC

sample_write_strided.cfg

# this is a comment
# Benchmark mode can only be one of these: CC/CI/IC/II/CC2D/CI2D/IC2D/II2D/CC2D/CC3D
WRITE_PATTERN=CC
NUM_PARTICLES=16
TIMESTEPS=1
COLLECTIVE_DATA=NO #Optional, default for NO.
COLLECTIVE_METADATA=NO #Optional, default for NO.
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1
DIM_1=8388608
DIM_2=1
DIM_3=1
STRIDE_SIZE=2
BLOCK_SIZE=2
BLOCK_CNT=1048576

template.cfg

#========================================================
#   General settings
NUM_PARTICLES=16 M # 16 K  16777216
TIMESTEPS=5
EMULATED_COMPUTE_TIME_PER_TIMESTEP=1 s #1 ms, 1 min
#========================================================
#   Benchmark data dimensionality
NUM_DIMS=1
DIM_1=16777216 # 16777216, 16 M
DIM_2=1
DIM_3=1
#========================================================
#   IO pattern settings
IO_OPERATION=READ # WRITE
MEM_PATTERN=CONTIG # INTERLEAVED STRIDED
FILE_PATTERN=CONTIG # STRIDED
#========================================================
#    Options for IO_OPERATION=READ
READ_OPTION=FULL # PARTIAL STRIDED
TO_READ_NUM_PARTICLES=4 M
#========================================================
#    Strided access parameters
#STRIDE_SIZE=
#BLOCK_SIZE=
#BLOCK_CNT=
#========================================================
# Collective data/metadata settings
#COLLECTIVE_DATA=NO #Optional, default for NO.
#COLLECTIVE_METADATA=NO #Optional, default for NO.
#========================================================
#    Compression, optional, default is NO.
#COMPRESS=NO
#CHUNK_DIM_1=1
#CHUNK_DIM_2=1
#CHUNK_DIM_3=1
#========================================================
#    Async related settings
DELAYED_CLOSE_TIMESTEPS=2
IO_MEM_LIMIT=5000 K
ASYNC_MODE=EXP #EXP NON
#========================================================
#    Output performance results to a CSV file
#CSV_FILE=perf_write_1d.csv
#
#FILE_PER_PROC=

License

H5bench: a benchmark suite for parallel HDF5 (H5bench) Copyright (c) 2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and North Carolina State University. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

(3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy, North Carolina State University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code (“Enhancements”) to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.

LAWRENCE BERKELEY NATIONAL LABORATORY Software: PIOK: Parallel I/O Kernels Developers: Suren Byna and Mark Howison

* License Agreement * ” PIOK - Parallel I/O Kernels - VPIC-IO, VORPAL-IO, and GCRM-IO, Copyright (c) 2015, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.”

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

(3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code (“Enhancements”) to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.
