Read / Write
This set of benchmarks contains an I/O kernel developed based on a particle physics simulation's I/O pattern (VPIC-IO, for writing data to an HDF5 file) and on a big data clustering algorithm (BDCATS-IO, for reading the HDF5 file VPIC-IO wrote).
Configuration
You can configure the `h5bench_write` and `h5bench_read` benchmarks with the following options. Notice that if you use the `configuration.json` approach to define the runs for `h5bench`, we will automatically generate the final configuration file based on the options you provide in the JSON file. For standalone usage of this benchmark, you can check the input format at the end of this document and refer to its documentation.
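For reference, below is a minimal, illustrative excerpt of such a JSON entry for `h5bench_write`; the file name and all values are placeholders, and the parameters themselves are described in the tables that follow:

"benchmarks": [
    {
        "benchmark": "write",
        "file": "test.h5",
        "configuration": {
            "MEM_PATTERN": "CONTIG",
            "FILE_PATTERN": "CONTIG",
            "TIMESTEPS": "5",
            "EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
            "NUM_DIMS": "1",
            "DIM_1": "4194304",
            "DIM_2": "1",
            "DIM_3": "1",
            "MODE": "SYNC"
        }
    }
]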
| Parameter | Description |
|---|---|
| `MEM_PATTERN` | Options: `CONTIG`, `INTERLEAVED`, and `STRIDED` |
| `FILE_PATTERN` | Options: `CONTIG`, `INTERLEAVED`, and `STRIDED` |
| `TIMESTEPS` | The number of iterations |
| `EMULATED_COMPUTE_TIME_PER_TIMESTEP` | Sleeps after each iteration to emulate computation |
| `NUM_DIMS` | The number of dimensions; valid values are 1, 2, and 3 |
| `DIM_1` | The dimensionality of the source data |
| `DIM_2` | The dimensionality of the source data |
| `DIM_3` | The dimensionality of the source data |
| `BLOCK_SIZE` | Size of the block of data along `DIM_1` |
| `BLOCK_SIZE_2` | Size of the block of data along `DIM_2` |
| `STRIDE_SIZE` | Stride of the block of data along `DIM_1` |
| `STRIDE_SIZE_2` | Stride of the block of data along `DIM_2` |
For `MEM_PATTERN`, `CONTIG` represents arrays of basic data types (i.e., int, float, double, etc.); `INTERLEAVED` represents an array of structures (AOS) where each array element is a C struct; and `STRIDED` represents a few elements in an array of basic data types that are separated by a constant stride. `STRIDED` is supported only for 1D arrays.
For `FILE_PATTERN`, `CONTIG` represents an HDF5 dataset of basic data types (i.e., int, float, double, etc.), and `INTERLEAVED` represents a dataset of a compound datatype.
For `EMULATED_COMPUTE_TIME_PER_TIMESTEP`, you must provide the time unit (e.g., `10 s`, `100 ms`, or `5000 us`) to ensure correct behavior.
If `DIM_2` and `DIM_3` are unused, you should set both to `1`. Notice that the total number of particles will be given by `DIM_1 * DIM_2 * DIM_3`. For example, `DIM_1=1024`, `DIM_2=256`, `DIM_3=1` is a valid setting for a 2D array and it will generate `262144` particles.
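As an illustrative configuration excerpt for that 2D example (only the dimension-related keys are shown):

"configuration": {
    "NUM_DIMS": "2",
    "DIM_1": "1024",
    "DIM_2": "256",
    "DIM_3": "1"
}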
A set of sample configuration files can be found in the `samples/` directory in GitHub.
READ Settings (`h5bench_read`)
| Parameter | Description |
|---|---|
| `READ_OPTION` | Options: `FULL`, `PARTIAL`, `STRIDED`, `PRL`, `LDC`, `RDC`, and `CS` |
For the `PARTIAL` option, the benchmark will read only the first `TO_READ_NUM_PARTICLES` particles. The `PRL`, `LDC`, `RDC`, and `CS` options work with a single MPI process. In case multiple processes are used, only the root performs the read operations and all other processes skip the reads.
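For example, a partial 1D read could be configured as in the following illustrative excerpt (the particle counts are placeholders):

"configuration": {
    "MEM_PATTERN": "CONTIG",
    "FILE_PATTERN": "CONTIG",
    "READ_OPTION": "PARTIAL",
    "TO_READ_NUM_PARTICLES": "1048576",
    "NUM_DIMS": "1",
    "DIM_1": "4194304",
    "DIM_2": "1",
    "DIM_3": "1"
}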
Asynchronous Settings
| Parameter | Description |
|---|---|
| `MODE` | Options: `SYNC` and `ASYNC` |
| `IO_MEM_LIMIT` | Memory threshold to determine when to execute I/O |
| `DELAYED_CLOSE_TIMESTEPS` | Groups and datasets will be closed later |
The `IO_MEM_LIMIT` parameter is optional. Its default value is `0` and it requires `ASYNC`, i.e., it only works in asynchronous mode. This is the memory threshold used to determine when to actually execute the I/O operations. The actual I/O operations (data read/write) will not be executed until the memory associated with the accumulated timesteps reaches the threshold, or until the application runs to the end.
For the `ASYNC` mode to work, you must define the necessary HDF5 ASYNC-VOL connector. For more information about it, refer to its documentation.
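A minimal illustrative excerpt of an asynchronous run is sketched below; it assumes the async VOL connector is installed and loaded, and the `IO_MEM_LIMIT` value (written here as a plain byte count) is a placeholder; check the `samples/` directory for the exact accepted value format:

"configuration": {
    "MODE": "ASYNC",
    "IO_MEM_LIMIT": "1073741824",
    "DELAYED_CLOSE_TIMESTEPS": "2"
}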
Compression Settings
| Parameter | Description |
|---|---|
| `COMPRESS` | `YES` or `NO` (optional); enables parallel compression |
| `CHUNK_DIM_1` | Chunk dimension along `DIM_1` |
| `CHUNK_DIM_2` | Chunk dimension along `DIM_2` |
| `CHUNK_DIM_3` | Chunk dimension along `DIM_3` |
Compression is only applicable for `h5bench_write`. It has no effect for `h5bench_read`. When enabled, the chunk dimension parameters (`CHUNK_DIM_1`, `CHUNK_DIM_2`, `CHUNK_DIM_3`) are required. The chunk dimension settings should be compatible with the data dimensions, i.e., they must have the same rank of dimensions, and the chunk dimension size cannot be greater than the data dimension size. Extra chunk dimensions have no effect and should be set to `1`.
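For example, an illustrative excerpt enabling compression for a 2D write (the chunk sizes are placeholders and must not exceed the corresponding data dimensions):

"configuration": {
    "NUM_DIMS": "2",
    "DIM_1": "1024",
    "DIM_2": "256",
    "DIM_3": "1",
    "COMPRESS": "YES",
    "CHUNK_DIM_1": "64",
    "CHUNK_DIM_2": "64",
    "CHUNK_DIM_3": "1"
}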
Warning
There is a known bug in HDF5 parallel compression that could cause the system to run out of memory when the number of chunks is large (a large number of particles and very small chunk sizes). On Cori Haswell nodes, a setting of 16M particles per rank on 8 nodes (256 ranks total) with a 64 * 64 chunk size will crash the system by running out of memory; on single nodes, the minimal chunk size is 4 * 4.
Collective Operation Settings
| Parameter | Description |
|---|---|
| `COLLECTIVE_DATA` | Enables collective data operations (default is `NO`) |
| `COLLECTIVE_METADATA` | Enables collective HDF5 metadata operations (default is `NO`) |
Both the `COLLECTIVE_DATA` and `COLLECTIVE_METADATA` parameters are optional.
Subfiling Settings
| Parameter | Description |
|---|---|
| `SUBFILING` | Enables HDF5 subfiling (default is `NO`) |
Attention
In order to enable this option, your HDF5 must have been compiled with support for the HDF5 Subfiling Virtual File Driver (VFD), which was introduced in HDF5 1.14.0. For CMake you can use `-DHDF5_ENABLE_PARALLEL=ON -DHDF5_ENABLE_SUBFILING_VFD=ON`, and for autotools `--enable-parallel --enable-subfiling-vfd=yes`. Without this support, this parameter has no effect.
CSV Settings
Once a file name is provided, performance results will be written to that file as well as to standard output.
| Parameter | Description |
|---|---|
| `CSV_FILE` | CSV file name to store benchmark results |
Supported Patterns
Attention
Not every pattern combination is covered by the benchmark. Supported benchmark parameter settings are listed below.
Supported Write Patterns (`h5bench_write`)
The I/O patterns include array of structures (AOS) and structure of arrays (SOA) in memory as well as in file. The array dimensions are 1D, 2D, and 3D for the write benchmark. This defines the write access pattern, including `CONTIG` (contiguous), `INTERLEAVED`, and `STRIDED` for the source (the data layout in memory) and the destination (the data layout in the resulting file). For example, `MEM_PATTERN=CONTIG` and `FILE_PATTERN=INTERLEAVED` is a write pattern where the in-memory data layout is contiguous (see the implementation of `prepare_data_contig_2D()` for details) and the file data layout is interleaved due to its compound data structure (see the implementation of `data_write_contig_to_interleaved()` for details).
4 patterns for both 1D and 2D array write (`NUM_DIMS=1` or `NUM_DIMS=2`):

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'CONTIG'

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'INTERLEAVED'

'MEM_PATTERN': 'INTERLEAVED'
'FILE_PATTERN': 'CONTIG'

'MEM_PATTERN': 'INTERLEAVED'
'FILE_PATTERN': 'INTERLEAVED'
1 pattern for 3D array write (`NUM_DIMS=3`):

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'CONTIG'
1 strided pattern for 1D array (`NUM_DIMS=1`):

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'STRIDED'
Supported Read Patterns (`h5bench_read`)
1 pattern for 1D, 2D, and 3D read (`NUM_DIMS=1`, `NUM_DIMS=2`, or `NUM_DIMS=3`)

Contiguously read through the whole data file:

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'CONTIG'
'READ_OPTION': 'FULL'
2 patterns for 1D read

Contiguously read the first `TO_READ_NUM_PARTICLES` elements:

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'CONTIG'
'READ_OPTION': 'PARTIAL'

Read elements in a strided pattern:

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'STRIDED'
'READ_OPTION': 'STRIDED'
4 patterns for 2D read

1. PRL: Refers to the Peripheral data access pattern. Data is read from the periphery of the 2D dataset, which is a frame of fixed width and height around the dataset.

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'CONTIG'
'READ_OPTION': 'PRL'

2. RDC: Refers to the Right Diagonal Corner data access pattern. Data is read from two identical blocks of fixed sides, one in the top right corner and the other in the bottom left corner of the 2D HDF5 dataset.

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'CONTIG'
'READ_OPTION': 'RDC'

3. LDC: Refers to the Left Diagonal Corner data access pattern. Data is read from two identical blocks of fixed sides, one in the top left corner and the other in the bottom right corner of the 2D HDF5 dataset.

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'CONTIG'
'READ_OPTION': 'LDC'

4. CS: Refers to the Cross Stencil data access pattern. A block of fixed sides is used to read data from an HDF5 dataset. The block is given a fixed stride in each dimension, and data is read until the end of the dataset.

'MEM_PATTERN': 'CONTIG'
'FILE_PATTERN': 'CONTIG'
'READ_OPTION': 'CS'
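As an illustration, a cross-stencil (CS) read might be configured as in the excerpt below; the block and stride keys follow the parameter table at the top of this document, and all values are placeholders:

"configuration": {
    "MEM_PATTERN": "CONTIG",
    "FILE_PATTERN": "CONTIG",
    "READ_OPTION": "CS",
    "NUM_DIMS": "2",
    "DIM_1": "1024",
    "DIM_2": "1024",
    "DIM_3": "1",
    "BLOCK_SIZE": "64",
    "BLOCK_SIZE_2": "64",
    "STRIDE_SIZE": "128",
    "STRIDE_SIZE_2": "128"
}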
Understanding the Output
The metadata and raw data operations are timed separately, and the observed time and I/O rate are based on the total time. Note that the observed rate is computed over the observed completion time minus the emulated compute time (e.g., in the write sample below, 2560 MB / (6.138 s - 4 s) ≈ 1197.6 MB/sec).
Sample output of `h5bench_write`:
================== Performance results =================
Total emulated compute time 4000 ms
Total write size = 2560 MB
Data preparation time = 739 ms
Raw write time = 1.012 sec
Metadata time = 284.990 ms
H5Fcreate() takes 4.009 ms
H5Fflush() takes 14.575 ms
H5Fclose() takes 4.290 ms
Observed completion time = 6.138 sec
Raw write rate = 2528.860 MB/sec
Observed write rate = 1197.592 MB/sec
Sample output of `h5bench_read`:
================= Performance results =================
Total emulated compute time = 4 sec
Total read size = 2560 MB
Metadata time = 17.523 ms
Raw read time = 1.201 sec
Observed read completion time = 5.088 sec
Raw read rate = 2132.200 MB/sec
Observed read rate = 2353.605225 MB/sec
Supported Special Write Pattern (`h5bench_write_var_normal_dist`)
In `h5bench_write`, each process writes the same amount of local data. The `h5bench_write_var_normal_dist` program demonstrates a prototype in which each process writes a local data buffer of varying size, following a normal distribution based on the mean number of particles given by `DIM_1` and the standard deviation `STDEV_DIM_1` in the configuration file. This special benchmark currently supports only `DIM_1`. Check `samples/sync-write-1d-contig-contig-write-full_var_normal_dist.json`:
"benchmarks": [
{
"benchmark": "write_var_normal_dist",
"file": "test.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "CONTIG",
"TIMESTEPS": "5",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "YES",
"COLLECTIVE_METADATA": "YES",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "524288",
"STDEV_DIM_1":"100000",
"DIM_2": "1",
"DIM_3": "1",
"CSV_FILE": "output.csv",
"MODE": "SYNC"
        }
    }
]
Sample output of `h5bench_write_var_normal_dist`:
================== Performance results =================
metric, value, unit
operation, write,
ranks, 16,
Total number of particles, 8M,
Final mean particles, 550199,
Final standard deviation, 103187.169653,
collective data, YES,
collective meta, YES,
subfiling, NO,
total compute time, 4.000, seconds
total size, 1.849, GB
raw time, 17.949, seconds
raw rate, 105.509, MB/s
metadata time, 0.001, seconds
observed rate, 87.519, MB/s
observed time, 25.639, seconds
Known Issues
Warning
On Cori/NERSC or similar platforms that use the Cray-MPICH library, if you encounter a failed assertion regarding support for `MPI_THREAD_MULTIPLE`, you should define the following environment variable:
export MPICH_MAX_THREAD_SAFETY="multiple"
Warning
If you're trying to run the benchmark with the HDF5 VOL ASYNC connector on macOS and are getting a segmentation fault (from `ABT_thread_create`), please try to set the following environment variable:
export ABT_THREAD_STACKSIZE=100000