Take a backup baseline
Before proceeding take an OS backup of all database files. Compress them an place in a secure location. This allows access to the current database state if needed. Restoring this baseline:
- Allows recovery to start-again in the originating state, should the initial crash recovery fail for unplanned reasons. Otherwise the bi filespace required will double if restarted from the failed baseline.
- Allows for different recovery plans to be undertaken in parallel by restoring this baseline to another location, assuring full paths are update in the structure file before running PROSTRCT REPAIR
- Allows application development access to uncommitted transaction data for data reinstatement once bi recovery has completed on the originating database. Alternatively up-to-date roll-forward hotspare databases can be accessed read-only or a replication target database can be used.
Database BI Recovery Options:
Option 1: Go through bi recovery
1. Ensure sufficient bi space
is available; typically twice the current bi size on disk is needed.
- If the database has not been enabled for largefiles, add additional bi extents beforehand.
- If the largefiles feature is enabled, assure there are no filesystem or OS limit/quotas that limit the variable extentsize and that this diskspace is available to extend the bi file for the entire duration. Otherwise reserving space is recommended by adding additional fixed bi extents.
- These can be removed once bi recovery completes with the PROSTRCT REMOVE utility.
2. Consider disabling After Imaging
beforehand (rfutil -C aimage end
Otherwise sufficient AI space will also be needed to record the bi redo undo notes. This adds additional diskspace and i/o overhead during bi recovery.
3. Access the database to allow bi recovery to complete:
a. Recommended: Truncate BI
Once bi recovery completes, this allows an offline maintenance window before starting the database to remove additional recovery extents, using bigrow to preformat clusters and adding or revising the existing -bithold value until the root cause of unexpected bi growth can be isolated:
$ proutil dbname -C truncate bi -crStatus 30 -crTXDisplay
b. Open the database single user
$ _progres -1 dbname -crStatus 30 -crTXDisplay >> recovery.txt
c. Open the database multi-user, without -bithold -bistall parameters
$ proserve dbname -crStatus 30 -crTXDisplay
Result: All open transactions will be rolled back - leaving the database in a state as though all uncommitted transactions and specifically the long transaction didn't run.
Prior to OpenEdge 11.6.1 where the -crStatus -crTXDisplay parameters were added, one cannot predict how long it will take for bi recovery to finalise the REDO and UNDO phases. Refer to Articles:
Option 2: Restore and roll-forward
- Restore the previous database backup
- Ensure sufficient bi space is available; add at least one additional variable bi file with largefiles enabled in an Enterprise database license is in use (as outlined above)
- Roll forward AI archives up to the point the long transaction began (see Option 3 below).
Result: After roll-forward operations complete, the Database is in a state prior to the runaway process.
However, any valid data\updates that were originally created in the application in separate transactions from this point onward will have to be re-done. While it is possible to roll forward to a specific transaction it, it is not possible to roll forward specific transactions.
Option 3: Analyse with aiscan to restore and roll-forward
While there's rarely enough time to force analysis under these conditions, in theory one could scan the ai files (RFUTIL - C aiscan verbose
) in order to apply as many closed transactions while the open transaction was running. It is recommended to proof this option in a test environment and document the site-specific procedure:
- Identify the long running transaction ID. Parsing the Transaction Identifier begin and end notes from aimage scan verbose output tracks a transaction's activity and the transaction duration, across ai files. This information could also have been found online at the time once the bithold value had been reached when bistall placed the database in a stalled mode for transaction processing, using the _trans VST.
- Identify how many other transactions OPEN / CLOSED while bad-tran was running. [RL_TBGN|RL_TEND]
- Find a sweet-point to roll forward, which will include some of bad-tran but as many good-trans that were committed while it was running.
Some of the open/closed transactions are preserved. The open transaction will still need to go through UNDO recovery when the database is next accessed and will require additional bi space
as outlined above.
For further information refer to Article How to view the contents of after image files?
Data Reinstatement for Recovery Options 1, 2 and 3:
Data recovery requires re-running online transactions (OLTP).
- Unless the application vendor has a documented data reinstatement procedure, this can be both onerous and time consuming task.
- When the application architecture uses multiple databases, site-specific transaction design may require other databases to also be restored to the same transaction-consistent state as the crashed database.
- Recovering transaction data requires application development engagement, where initially backend reporting against the recovered database provides a baseline on where to start.
Forcing into a copy of the existing database before recovery (hence skipping crash recovery and having a database with open transactions and logical corruption), allows access to extract information to apply to the current database which has completed bi recovery.
Option 4: Don’t shut the database down.
When the database is started with -bistall
, instead of shutting down:
- First raise the bi threshold. Refer to Article How can the -bithold -bistall and the proquiet bithreshold be implemented?
- Identify the bad-tran, gather as much information for later preventative actions then disconnect the client process. Assuming enough bi space, which can be added online if needed, at least most of what they were running are still in shared memory while transaction backout takes place online.
- If AI is enabled, the AI files will be recycled as BI notes are written to undo the transaction. Assure sufficient available AI space, otherwise AI can be disabled online with RFUTIL -C AIOFF.
- If Replication is enabled, ensure that the target is online to unlock AI extents. Otherwise ensure there are also enough AI file space to hold LOCKED AI extents which can be applied to the target database later.
- If AIMGT is enabled, re-set the switch interval to "on-demand" to use the available ai files to their full extent: rfutil -C aiarchiver setinterval 0
- Start disconnecting other users as soon as possible after expediently finishing off their current transactions, limiting further bi growth as a result of good-transactions.
Result: The database has the long running transaction reverted and all other transactions completed, at sometime the bi file should be truncated offline.
Option 5: Last resort - skip crash recovery
When the -F option is used, crash recovery is skipped which should only ever be used as a resource of last resort. For further information refer to Article:
Considerations and outstanding Actions:
1. In order to improve the performance of UNDO/REDO operations:
- Improve the environment subsystems for faster I/O
- For faster block finds during UNDO processing: Dump and load tables with high fragmentation; run idxcompact; move tables to their own Type II Storage Areas.
2. Review the -bithold
parameter. Then the bithold value is reached, it provides an excellent braking system when the unexpected happens rather than the consequences of unattended growth.
The bithold value should adhere to an acceptable 'undo
' when needed and ideally with -bistall
, allows earlier intervention while the -bithreshold
is raised online. It is recommended to at least use these parameters when the root cause of unexpected bi growth remains to be isolated. For further information refer to Articles: