Feedback
Did this article resolve your question/issue?

   

Article

Should the database block size always be 8K?

« Go Back

Information

 
TitleShould the database block size always be 8K?
URL Name18293
Article Number000123131
EnvironmentProduct : Progress
Version: 8.x, 9.x
Product: OpenEdge
Version: 10.x, 11.x
OS: All supported platforms
Question/Problem Description
Should the database block size always be 8 KB?
What to consider when deciding which database block size to use?
What is the best database block size to use in relation to the filesystem blocksize?
Steps to Reproduce
Clarifying Information
Error Message
Defect Number
Enhancement Number
Cause
Resolution
Progress 8.2A introduced the ability to specify the database block size required: 1KB, 2KB, 4KB, 8KB.
Typically Linux and Windows use a 4KB default filesystem block size.
On most UNIX platforms an 8 KB block size is the filesystem block size when creating a filesystem with the NEWFS command and not specifying the -b to be otherwise. 

Before the introduction of this Progress feature, the database block size was limited to one Progress database block size specific to the supported platform. This blocksize frequently did not match the filesystem block size because these values were chosen many years ago before modern file systems evolved.

Matching the database blocksize to the filesystem blocksize:

One should always consider using a database block size that matches the file system's page size or block size (depending on the type of filesystem). While the best I/O performance is obtained when the database block size is the same as, or a multiple of the filesystem blocksize, the advantage of matching the database block size with the filesystem block size, is that database I/O is better optimized and torn pages cannot occur:
Using a smaller database blocksize than the filesystem blocksize:

When the database block size is smaller than the filesystem blocksize, this results in bad i/o performance as unnecessary OS reads are made when a database block needs to be written to disk. When the database block size (say 1 KB) is smaller than the filesystem blocksize (say 8 KB), the following happens:

On Reads:
  • Progres requests a 1 KB block from the OS through a read() system call.
  • The OS finds the correct filesystem block and reads 8 KB into a kernel buffer then it copies 1 KB into the database buffer pool.
On Writes:
I/O operations on writes can be doubled and in addition, may have to wait for the disk to spin all the way around:
  • Progress requests a write of 1 KB through the write() system call
  • The OS then has to read the appropriate 8 KB block from disk because it is no longer in the kernel buffers
  • Copy in 1 KB from the write call 
  • Write the 8 KB kernel buffer

Using a larger database blocksize than the filesystem blocksize:

If the OS block size is 4K and the database has an 8 KB database block size:
  • The OS will need to perform 2 physical I/O's for every database block read.
  • However there will be a benefit regarding index blocks. With larger index blocks more keys can be stored in a single block, so less block reads need to be performed to find specific keys.
  • Similarly, more records can be stored in a single block when the area is configured with the best records per block for the content. This can have a dramatic performance benefit on system i/o performance.
If the filesystem block size can be changed to 8 KB, there will be a benefit all around (assuming the database block size is also changed to 8 KB).  An 8 KB block size will generally give better performance than 4 KB if the records per block have been correctly configured for the content. There are exceptions, such as if the average record size is very small and there are many of these records when the records per block are set too low.

Reliability is affected when the database block size is larger than the filesystem blocksize. It introduces a small possibility that a database block could be partially written to disk if a complete system crash (say a power failure) or a hardware problem occurs at the right moment.

In most cases, database writes will be atomic, but not always. Database writes will usually be performed as a single disk write by the OS even though two filesystem pages may be involved. But sometimes a database block could be split across a track boundary and the write not be atomic causing a torn page.

The reliability problem is real. The probability of it happening is very low, but if it does, then whatever data block was partially written cannot be corrected by crash recovery.
  • Crash recovery may fail if that block is involved in Redo or Undo bi recovery operations and the database becomes unusable.
  • If Crash Recovery does not fail, the torn page will contain logically incorrect data which will not be detectable through block header integrity checks, only block content checks (data, index).
Torn pages can happen to both Type I or II Areas when a block write is interrupted.
  • With Type I areas there is no way to know that a torn page happened.
  • With Type II areas a check sum of the block is stored in all data block headers, enabling the detection of torn pages, but will not prevent these from occurring.

To change the database block size:

The Database blocksize can only be configured when the database is created. This means an existing database needs to be dumped and loaded in order to change the existing database blocksize to a different database blocksize.
  • Dump the database
  • Create the void database structure:
  • $   prostrct create dbname dbname.st -blocksize 8192
  • Create an empty database to include the meta-schema:
  • $   procopy $DLC/empty8 dbname
  • Load the database
Workaround
Notes
Last Modified Date11/20/2020 7:24 AM
Attachment 
Files
Disclaimer The origins of the information on this site may be internal or external to Progress Software Corporation (“Progress”). Progress Software Corporation makes all reasonable efforts to verify this information. However, the information provided is for your information only. Progress Software Corporation makes no explicit or implied claims to the validity of this information.

Any sample code provided on this site is not supported under any Progress support program or service. The sample code is provided on an "AS IS" basis. Progress makes no warranties, express or implied, and disclaims all implied warranties including, without limitation, the implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample code is borne by the user. In no event shall Progress, its employees, or anyone else involved in the creation, production, or delivery of the code be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample code, even if Progress has been advised of the possibility of such damages.