Scyld ClusterWare HPC: Programmer's Guide

Using Multi-Dimensional Blocking

The PVFS multi-dimensional block interface (MDBI) provides a slightly higher-level view of file data than the native PVFS interface. With the MDBI, file data is treated as an N-dimensional array of records. This array is divided into "blocks" of records by specifying the dimensions of the array and the size of the blocks in each dimension. The parameters used to describe the array are as follows:

D - number of dimensions

rs - record size

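nbn - number of blocks in dimension n (nb1 for the first dimension, and so on)

nen - number of elements in a block in dimension n (ne1 for the first dimension, and so on)

bfn - blocking factor in dimension n (bf1 for the first dimension, and so on), described later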

Once the programmer has defined the view of the data set, blocks of data can be read or written with single function calls, which greatly simplifies access to these types of data sets. A block is selected by specifying a set of index values, one per dimension.

There are five basic calls used for accessing files with MDBI:

	int open_blk(char *path, int flags, int mode);
	int set_blk(int fd, int D, int rs, int ne1, int nb1, ..., int nen, int nbn);
	int read_blk(int fd, char *buf, int index1, ..., int indexn);
	int write_blk(int fd, char *buf, int index1, ..., int indexn);
	int close_blk(int fd);

The open_blk() and close_blk() calls operate similarly to the standard UNIX open() and close(). set_blk() sets the blocking parameters for the array before reading or writing. It can be called as often as necessary and does not entail communication. read_blk() and write_blk() are used to read and write blocks of records once the blocking has been set.

Figure 5. MDBI Example

In Figure 5 we can see an example of blocking. Here a file has been described as a two-dimensional array of blocks, with each block consisting of a two-by-three array of records. Records are shown with dotted lines, with groups of records organized into blocks denoted with solid lines.

In this example, the array would be described with a call to set_blk() as follows:

	set_blk(fd, 2, 500, 2, 6, 3, 3);
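The buffer handed to read_blk() or write_blk() has to hold one whole block, so it can help to work out the block size from the same parameters passed to set_blk(). The following is a minimal sketch, not part of the MDBI itself: the helper names mdbi_block_bytes() and mdbi_block_count() are hypothetical, and the code only does arithmetic on the parameters without touching PVFS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MDBI call): bytes in one block.
 * A block holds ne[0] * ne[1] * ... * ne[D-1] records of rs bytes each,
 * where ne[] carries the ne1..neD values given to set_blk(). */
size_t mdbi_block_bytes(size_t rs, int D, const int ne[])
{
    assert(rs > 0 && D > 0);
    size_t bytes = rs;
    for (int d = 0; d < D; d++) {
        assert(ne[d] > 0);
        bytes *= (size_t)ne[d];
    }
    return bytes;
}

/* Hypothetical helper (not an MDBI call): total number of blocks,
 * the product of the nb1..nbD values given to set_blk(). */
size_t mdbi_block_count(int D, const int nb[])
{
    assert(D > 0);
    size_t count = 1;
    for (int d = 0; d < D; d++) {
        assert(nb[d] > 0);
        count *= (size_t)nb[d];
    }
    return count;
}
```

For the call above (rs = 500, ne = {2, 3}, nb = {6, 3}), this gives 3000 bytes per block and 18 blocks in total, so a 3000-byte buffer is enough for each read_blk() or write_blk().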

If we wanted to read block (2, 0) from the array, we could then:

	read_blk(fd, &buf, 2, 0);

Similarly, to write block (5, 2):

	write_blk(fd, &blk, 5, 2);

A final feature of the MDBI is block buffering. Sometimes multi-dimensional blocking is used to set the size of the data that the program reads from and writes to disk. Other times the block size has some physical meaning in the program and is set for other reasons. In the latter case, individual blocks may be rather small, resulting in poor I/O performance and underutilization of memory. MDBI provides a buffering mechanism that causes multiple blocks to be read from and written to disk together and stored in a buffer in the program's memory address space. Subsequent transfers using read_blk() and write_blk() result in memory-to-memory transfers unless a block outside of the current buffer is accessed.

Since it is difficult to predict ahead of time which blocks will be accessed and when, PVFS relies on user cues to determine what to buffer. This is done by defining "blocking factors" which group blocks together. A single function is used to define the blocking factor:

	int buf_blk(int fd, int bf1, ..., int bfn);

The blocking factor indicates how many blocks in the given dimension should be buffered.

Looking at Figure 5 again, we can see how blocking factors can be defined. In the example, the call:

	buf_blk(fd, 2, 1);

is used to specify the blocking factor. We denote the larger resulting buffered blocks as superblocks (a poor choice of terms in retrospect), one of which is shaded in the example.
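To see which blocks end up grouped together, the sketch below maps a block index to the superblock that contains it. It rests on an assumption the text does not state outright: that superblocks tile the block grid starting from block 0 in every dimension, so integer division by the blocking factor gives the superblock coordinate. The function names are hypothetical and are not MDBI calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an MDBI call).
 * Assumption: superblocks are aligned to block 0 in each dimension,
 * so dividing a block index by the blocking factor yields the
 * coordinate of the superblock that contains it. */
void mdbi_superblock_of(int D, const int idx[], const int bf[], int sb[])
{
    for (int d = 0; d < D; d++) {
        assert(idx[d] >= 0 && bf[d] > 0);
        sb[d] = idx[d] / bf[d];
    }
}

/* True if blocks a and b fall in the same superblock under the
 * same alignment assumption. */
bool mdbi_same_superblock(int D, const int a[], const int b[], const int bf[])
{
    for (int d = 0; d < D; d++) {
        assert(a[d] >= 0 && b[d] >= 0 && bf[d] > 0);
        if (a[d] / bf[d] != b[d] / bf[d])
            return false;
    }
    return true;
}
```

Under that assumption, with blocking factors (2, 1), blocks (2, 0) and (3, 0) share a superblock, while blocks (2, 0) and (2, 1) do not.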

Whenever a block is accessed, if its superblock is not in the buffer, the current superblock is written back to disk (if dirty) and the new superblock is read in its place; the desired block is then copied into the given buffer. The default blocking factor for all dimensions is 1, and any time the blocking factor is changed the buffer is written back to disk if dirty.
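The buffering rule just described can be modelled with a few counters, which is a cheap way to estimate how many disk transfers a given access pattern would cause before enabling buffering. This is a toy model for a two-dimensional array, not PVFS code: struct sb_model and its functions are hypothetical, and the model assumes superblocks are aligned to block 0 in each dimension.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not PVFS code) of a buffer that holds
 * exactly one superblock of a two-dimensional array. */
struct sb_model {
    int bf1, bf2;     /* blocking factors, as given to buf_blk() */
    int cur1, cur2;   /* coordinates of the buffered superblock */
    bool valid;       /* false until the first access fills the buffer */
    bool dirty;       /* buffered superblock has unwritten changes */
    int disk_reads;   /* superblock reads counted so far */
    int disk_writes;  /* superblock write-backs counted so far */
};

void sb_model_init(struct sb_model *m, int bf1, int bf2)
{
    assert(bf1 > 0 && bf2 > 0);
    m->bf1 = bf1;
    m->bf2 = bf2;
    m->cur1 = m->cur2 = 0;
    m->valid = false;
    m->dirty = false;
    m->disk_reads = m->disk_writes = 0;
}

/* Model one read_blk() (is_write == false) or write_blk() (true)
 * on block (i1, i2). Assumes superblocks start at block 0. */
void sb_model_access(struct sb_model *m, int i1, int i2, bool is_write)
{
    int s1 = i1 / m->bf1;
    int s2 = i2 / m->bf2;

    if (!m->valid || s1 != m->cur1 || s2 != m->cur2) {
        if (m->valid && m->dirty) {
            m->disk_writes++;   /* write back the old superblock */
            m->dirty = false;
        }
        m->disk_reads++;        /* read the new superblock */
        m->cur1 = s1;
        m->cur2 = s2;
        m->valid = true;
    }
    if (is_write)
        m->dirty = true;        /* block copy stays in memory for now */
}
```

With blocking factors (2, 1), reading block (2, 0), writing block (3, 0), and then reading block (5, 2) costs two superblock reads and one write-back in this model, because the first two accesses hit the same superblock.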

It is important to understand that no cache coherency is performed here; if application tasks are sharing superblocks, unexpected results will occur. It is up to the user to ensure that this does not happen. A good strategy for buffering is to develop your program without buffering turned on, and then enable it later in order to improve performance.
