introduced the ability to specify the required database block size: 1 KB, 2 KB, 4 KB, or 8 KB.
Typically, Linux and Windows use a 4 KB default filesystem block size.
On most UNIX platforms, 8 KB is the default filesystem block size when a filesystem is created with the newfs command and the -b option is not specified otherwise.
Before the introduction of this Progress feature, the database block size was limited to one Progress database block size specific to the supported platform. This block size frequently did not match the filesystem block size, because these values were chosen many years ago, before modern filesystems evolved.
Matching the database blocksize to the filesystem blocksize:
One should always consider using a database block size that matches the filesystem's page size or block size (depending on the type of filesystem). The best I/O performance is obtained when the database block size is the same as, or a multiple of, the filesystem block size; the advantage of matching the two is that database I/O is better optimized and torn pages cannot occur.
Using a smaller database blocksize than the filesystem blocksize:
When the database block size is smaller than the filesystem block size, I/O performance suffers, because unnecessary OS reads are made when a database block needs to be read from or written to disk. When the database block size (say 1 KB) is smaller than the filesystem block size (say 8 KB), the following happens:
- Progress requests a 1 KB block from the OS through a read() system call.
- The OS finds the correct filesystem block, reads 8 KB into a kernel buffer, and then copies 1 KB into the database buffer pool.
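The read amplification described above can be sketched with a small simulation (illustrative only; the block sizes and counters are assumptions for the example, not Progress internals):

```python
# Simulate how many bytes the OS must actually read when a database with
# 1 KB blocks sits on a filesystem with 8 KB blocks.
FS_BLOCK = 8192   # filesystem block size (bytes)
DB_BLOCK = 1024   # database block size (bytes)

def fs_read(offset, size, counter):
    """Simulate a read(): the OS fetches whole filesystem blocks,
    then copies only the requested bytes to the caller's buffer."""
    first = offset // FS_BLOCK               # first fs block touched
    last = (offset + size - 1) // FS_BLOCK   # last fs block touched
    counter["fs_blocks_read"] += last - first + 1
    counter["bytes_returned"] += size

counter = {"fs_blocks_read": 0, "bytes_returned": 0}
# Read 100 scattered 1 KB database blocks, one per filesystem block.
for offset in range(0, 100 * FS_BLOCK, FS_BLOCK):
    fs_read(offset, DB_BLOCK, counter)

print(counter["fs_blocks_read"] * FS_BLOCK)  # 819200 bytes fetched from disk
print(counter["bytes_returned"])             # 102400 bytes the database asked for
```

With these sizes the OS moves eight times more data than the database requested.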
I/O operations on writes can be doubled and, in addition, may have to wait for the disk to spin all the way around:
- Progress requests a write of 1 KB through the write() system call.
- The OS then has to read the appropriate 8 KB block from disk, because it is no longer in the kernel buffers.
- The OS copies the 1 KB from the write call into the kernel buffer.
- The OS writes the 8 KB kernel buffer back to disk.
Using a larger database blocksize than the filesystem blocksize:
If the OS block size is 4 KB and the database has an 8 KB database block size:
- The OS will need to perform two physical I/Os for every database block read.
- However, there will be a benefit regarding index blocks: with larger index blocks, more keys can be stored in a single block, so fewer block reads need to be performed to find specific keys.
- Similarly, more records can be stored in a single block when the area is configured with the best records per block for the content. This can have a dramatic benefit on system I/O performance.
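The index-block point can be made concrete with a rough estimate (the per-entry size and row count below are assumptions for illustration, not Progress figures): with k entries per index block, a lookup touches roughly ceil(log base k of N) blocks.

```python
import math

KEY_SIZE = 32              # assumed average index entry size, bytes
RECORDS = 1_000_000_000    # assumed number of rows indexed

def lookup_reads(block_size):
    """Estimate index block reads for one key lookup: the B-tree height."""
    fanout = block_size // KEY_SIZE              # entries per index block
    return math.ceil(math.log(RECORDS, fanout))  # levels ~ block reads

print(lookup_reads(4096))  # 4 KB blocks, 128-way fan-out -> 5 reads
print(lookup_reads(8192))  # 8 KB blocks, 256-way fan-out -> 4 reads
```

Doubling the block size doubles the fan-out, which can remove a whole level from the index and therefore one block read from every lookup.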
If the filesystem block size can be changed to 8 KB, there will be a benefit all around (assuming the database block size is also changed to 8 KB). An 8 KB block size will generally give better performance than 4 KB, provided the records per block have been correctly configured for the content. There are exceptions, for example when the average record size is very small, there are many such records, and the records per block setting is too low.
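As a rough sizing aid for the records-per-block trade-off, the following sketch picks a setting for an 8 KB block (the overhead figures are assumptions, and the power-of-two constraint on records per block should be verified against your OpenEdge release):

```python
BLOCK_SIZE = 8192
BLOCK_OVERHEAD = 100   # assumed per-block header/slot overhead, bytes
REC_OVERHEAD = 20      # assumed per-record overhead, bytes

def suggest_rpb(avg_record_size):
    """Largest power-of-two records-per-block (max 256) that still fits
    the average record without fragmenting records across blocks."""
    fit = (BLOCK_SIZE - BLOCK_OVERHEAD) // (avg_record_size + REC_OVERHEAD)
    rpb = 1
    while rpb * 2 <= min(fit, 256):
        rpb *= 2
    return rpb

print(suggest_rpb(90))   # many small records -> 64 per block
print(suggest_rpb(500))  # larger records -> 8 per block
```

If records per block were left at a low default (say 32) for the 90-byte records above, roughly half of each 8 KB block would sit empty, wasting buffer pool and disk bandwidth.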
Reliability is affected when the database block size is larger than the filesystem blocksize. It introduces a small possibility that a database block could be partially written to disk if a complete system crash (say a power failure) or a hardware problem occurs at the right moment.
In most cases database writes will be atomic, but not always. A database write will usually be performed as a single disk write by the OS even though two filesystem pages may be involved. But sometimes a database block can be split across a track boundary, and the write is then not atomic, causing a torn page.
The reliability problem is real. The probability of it happening is very low, but if it does, then whatever data block was partially written cannot be corrected by crash recovery.
- Crash recovery may fail if that block is involved in redo or undo BI recovery operations, and the database becomes unusable.
- If crash recovery does not fail, the torn page will contain logically incorrect data which is not detectable through block header integrity checks, only through block content checks (data, index).
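To illustrate why a torn page can pass a header-only check yet be caught by a whole-block checksum, here is a minimal sketch (the CRC-based scheme is illustrative, not the actual Progress block format):

```python
import zlib

BLOCK = 8192  # database block size in bytes

def make_block(fill: bytes) -> bytes:
    """Build a block whose first 4 bytes hold a CRC32 of the body."""
    body = (fill * (BLOCK // len(fill) + 1))[:BLOCK - 4]
    return zlib.crc32(body).to_bytes(4, "big") + body

def checksum_ok(block: bytes) -> bool:
    return int.from_bytes(block[:4], "big") == zlib.crc32(block[4:])

new = make_block(b"NEW")   # the image the database intended to write
old = make_block(b"OLD")   # the previous on-disk image of the same block
# Torn write: a crash lets only the first 4 KB reach disk, so the second
# half of the block on disk still holds the old image.
torn = new[:4096] + old[4096:]

print(checksum_ok(new))   # True
print(checksum_ok(torn))  # False: detected, though not prevented
```

The torn block's header region is intact (it came from the new image), so a header-only check sees nothing wrong; only a checksum over the whole block exposes the mixed halves.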
Torn pages can happen in both Type I and Type II areas when a block write is interrupted:
- With Type I areas there is no way to know that a torn page happened.
- With Type II areas a checksum of the block is stored in every data block header, enabling the detection of torn pages, though not preventing them from occurring.
To change the database block size:
The database blocksize can only be configured when the database is created. This means an existing database needs to be dumped and loaded in order to change it to a different blocksize.
- Dump the database.
- Create the void database structure:
  $ prostrct create dbname dbname.st -blocksize 8192
- Create an empty database to include the meta-schema:
  $ procopy $DLC/empty8 dbname
- Load the database.