1. Does the Progress RDBMS work with RAID?
The disk array's physical configuration details are largely transparent to the Progress database storage manager. The array looks to the database like one or more separate disk partitions. The database performs I/O operations in exactly the same way when using a set of independent disks and when using storage arrays. Performance will differ considerably among the various possible storage options.
Progress recommends using a combination of striping and mirroring (RAID 10) for best results. Striping enables the equitable distribution of the I/O workload among the available disks. Mirroring provides fault tolerance as well as the ability to read from any drive in the mirror set.
For the highest level of fault tolerance, you may want to consider triple mirroring. With triple mirroring, you can split off one set of disks for the purpose of making backups while you still have redundancy with the remaining mirrored pairs.
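As a rough illustration of how striping and mirroring combine, the following sketch maps logical blocks onto mirror sets. It is not how any particular array or the Progress storage manager lays out data; the function names, the disk numbering, and the round-robin read policy are all assumptions made for the example.

```python
def raid10_locate(block, num_sets, stripe_blocks=1):
    """Map a logical block to (mirror set index, block offset on each member disk)."""
    stripe = block // stripe_blocks          # which stripe unit the block falls in
    mirror_set = stripe % num_sets           # stripe units rotate across the mirror sets
    offset = (stripe // num_sets) * stripe_blocks + block % stripe_blocks
    return mirror_set, offset


def disks_for_write(mirror_set, copies=2):
    """A write must update every copy in the mirror set (hypothetical disk numbering)."""
    return [mirror_set * copies + c for c in range(copies)]


def disk_for_read(mirror_set, request_id, copies=2):
    """A read can be served by any copy; this sketch simply alternates between them."""
    return mirror_set * copies + request_id % copies


if __name__ == "__main__":
    # Eight consecutive blocks spread evenly over four mirrored pairs.
    for block in range(8):
        mirror_set, offset = raid10_locate(block, num_sets=4)
        print(block, mirror_set, offset, disks_for_write(mirror_set))
```

Passing copies=3 models the triple-mirroring case: each write touches three disks, and a read can be served by any of the three.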
2. Does the Progress database work with disk arrays in a RAID 5 configuration?
Yes. Assuming the storage system itself is reliable, there are no inherent reliability problems that occur purely because the storage system is configured for RAID 5. You do not run the risk of database corruption just by using RAID 5.
3. Does the Progress database work well with disk arrays in a RAID 5 configuration?
No. RAID 5 configurations should /never/ be used for database storage because you will experience poor performance -- possibly extremely poor performance. This is true for all database systems, not just the Progress RDBMS. Read on; test data supporting this assertion is provided below.
4. Why is RAID 5 performance poor for databases?
All the various RAID configurations involve tradeoffs and compromises among a variety of factors. From a database point of view, RAID 5 is optimized for the wrong thing -- cost instead of disk accesses per second. Disk accesses per second are a precious commodity. The number of disk accesses per second is largely determined by the number of separate disk spindles used for database storage.
There are several reasons RAID 5 is bad for database performance:
a) At a minimum, all write operations to RAID 5 arrays require writing the data to one disk and writing an equal amount of "parity" or error-correction information to a second disk. In many cases, a single write operation will actually require 4 disk I/O operations -- two reads to get the previous data and parity information, and two writes to store the new data and parity information.
b) Write operations always consume half, and sometimes more than half, of the total available disk bandwidth. For example, with a 4 disk array, at most two simultaneous writes are possible since each write operation always updates two disks. But when there is already a write taking place, there is at least a 50% probability that a second write operation will be delayed, because its data block alone falls on one of the two busy disks half the time, and its parity block may land on a busy disk as well.
c) Because write operations consume so much of the disk bandwidth, read performance will also be reduced.
d) The parity information recorded by the array enables recovery of lost data if one disk should fail. But the recovery process requires reading all the data from all the remaining disk drives while a failed disk is being reconstructed. This causes overall performance to be very bad during the recovery operation. The system may become unusable.
These performance disadvantages become worse as the system workload and disk activity increase.
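The write penalty in (a) and the rebuild cost in (d) can be illustrated with a toy model of a single stripe. This is a minimal sketch, assuming plain XOR parity, integers standing in for disk blocks, and no caching; the class and method names are invented for the example and do not correspond to any real controller.

```python
from functools import reduce
from operator import xor


class Raid5Stripe:
    """One stripe: several data blocks plus one XOR parity block, each on its own disk."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = reduce(xor, self.data, 0)
        self.io_count = 0  # physical disk operations performed so far

    def small_write(self, index, new_value):
        """Update one data block using the read-modify-write sequence."""
        old_data = self.data[index]      # read 1: previous data
        old_parity = self.parity         # read 2: previous parity
        self.io_count += 2
        # Remove the old data from the parity, then fold in the new data.
        self.parity = old_parity ^ old_data ^ new_value   # write 1: new parity
        self.data[index] = new_value                      # write 2: new data
        self.io_count += 2

    def rebuild(self, failed_index):
        """Reconstruct a lost block by reading every surviving block and the parity."""
        survivors = [b for i, b in enumerate(self.data) if i != failed_index]
        self.io_count += len(survivors) + 1
        return reduce(xor, survivors, self.parity)


if __name__ == "__main__":
    stripe = Raid5Stripe([0b1010, 0b0110, 0b1111])
    stripe.small_write(1, 0b0001)
    print("I/Os for one logical write:", stripe.io_count)
    print("rebuilt block:", bin(stripe.rebuild(1)))
```

In this model one logical write costs four disk operations, and rebuilding a single block touches every remaining disk in the stripe, which is why a reconstruction competes so heavily with normal database I/O.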
5. Doesn't caching solve these problems?
Caching helps to mitigate the disadvantages of RAID 5, but it does NOT eliminate them. And the need for large, reliable cache memories adds greatly to the overall complexity and expense of the disk array.
a) In order for the disk subsystem to be reliable in the face of power outages and other failures, the cache memory must be provided with battery power.
b) In order for the cache memory to be useful, it must be fairly large.
Certainly it will help performance a bit, but disk capacity has become so great that caches are generally a tiny percentage of the total storage capacity. Bulk operations such as disk-to-disk backups, dumps and loads, index rebuilds, and so on can generate such a large I/O workload that the cache will easily be overwhelmed and become temporarily useless.
c) Caching reads is fine. But caching writes is another matter.
When writes are cached, the database believes that data has been written to disk when it has not. If any of the cached data are lost due to some type of failure, for example a power failure, your database will be corrupted and cannot be recovered. Some disk subsystems have battery-backed cache memories. You have to be very, very certain that a failure will not cause the cache contents to be lost. The number of things that can go wrong is large. How sure are you that you can fix the problem before the batteries are used up? And what if you have to disassemble the computer to fix it?
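To put the cache-size point in (b) into numbers, here is a small back-of-the-envelope sketch. The cache size, array size, and transfer rates are made-up figures chosen only for illustration, and the helper names are hypothetical; substitute the values for your own hardware.

```python
MiB = 2**20
GiB = 2**30
TiB = 2**40


def cache_fraction(cache_bytes, array_bytes):
    """Fraction of the array's capacity that fits in the cache."""
    return cache_bytes / array_bytes


def seconds_until_full(cache_bytes, incoming_bytes_per_s, drain_bytes_per_s):
    """Time until a write cache fills when writes arrive faster than the disks drain them.

    Returns None if the disks keep up and the cache never fills.
    """
    net = incoming_bytes_per_s - drain_bytes_per_s
    if net <= 0:
        return None
    return cache_bytes / net


if __name__ == "__main__":
    # Assumed figures: 4 GiB of cache in front of a 10 TiB array.
    print(f"cache covers {cache_fraction(4 * GiB, 10 * TiB):.4%} of the array")
    # Assumed figures: a bulk load writes 200 MiB/s while the disks absorb 100 MiB/s.
    print("cache full after", seconds_until_full(4 * GiB, 200 * MiB, 100 * MiB), "seconds")
```

With these assumed figures the cache holds well under a tenth of a percent of the array and fills in under a minute of sustained bulk writing, after which writes proceed at raw disk speed.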
6. Can RAID 5 be used if the database is not being updated?
Yes, but if the database is not being updated, then the disks are not being written to, and there is not much benefit from using RAID 5 either.
7. If I shouldn't use RAID 5, why do the storage vendors recommend it?
Because they compete with each other on price and don't care about performance. They may also tout the labor savings from increased manageability, without bothering to mention the performance effects.
8. Do you have any data to support your assertions?
Network Appliance ran a benchmark called PostMark to test various filesystems. While this is not a database workload, the results are still worth examining. When comparing the performance of various configurations, they found that in every test RAID 5 performed worse than all the others.
During the fall of 2002, PSC conducted a series of benchmarks of the Progress RDBMS on Linux. Among other things, the benchmarks measured the performance of RAID 5 compared to a stripe set and to individual disks. The RAID 5 configuration throughput was only 56% of the throughput of the other two configurations.
9. What about software RAID?
a) RAID is always software; it is just a question of where the software is -- in the operating system, in the disk controller, or in a separate disk subsystem. You don't want it in the operating system because then it competes for the same processors that you want to use for database work. RAID in the OS can use anywhere from 2 to 10 percent of the CPU cycles, depending on processor speed, bus speed, number of disks, and other factors.
b) Aside from performance, regardless of level, RAID done in the OS tends to be a bit less reliable than in the disk subsystem. This is because the RAID software is mixed in with a lot of other stuff instead of isolated. If something goes wrong with the other stuff, it could affect the disk buffers.
10. What about vendor X's "magical wonderful mumbo-jumbo RAID"?
Carefully read the descriptions of what it does and how it works. If it turns out to be RAID 5 dressed up as something else, beware. Remember that YOU will suffer the consequences of choosing RAID 5, not the storage vendor. Caching mitigates the disadvantages of RAID 5, but it does NOT eliminate them. And the need for large, reliable cache memories adds greatly to the overall complexity and expense of the disk array.