Feedback
Did this article resolve your question/issue?

   

Article

Database recovery after runaway transactions

Information

 
TitleDatabase recovery after runaway transactions
URL NameDatabase-recovery-after-runaway-transactions-000063388
Article Number000136546
EnvironmentProduct: OpenEdge
Version: All supported versions
OS: All supported platforms
Question/Problem Description
Database recovery after runaway transactions cause abnormal bi growth.
Transaction runs until the BI file reaches -bithold or available space.
Considerations when long running transactions need to be undone
Can AI roll forward specific transactions?
What to do when the BI file grows unattended to GB size?
How to recover when the database crashes because there's no more BI space available?
How to run database bi recovery mode when the system runs out of disk space?
 
Steps to Reproduce
Clarifying Information
Error Message
Defect Number
Enhancement Number
Cause
Resolution

Take a backup baseline 

Before proceeding take an OS backup of all database files. Compress them an place in a secure location. This allows access to the current database state if needed. Restoring this baseline:

  • Allows recovery to start-again in the originating state, should the initial crash recovery fail for unplanned reasons. Otherwise the bi filespace required will double if restarted from the failed baseline.
  • Allows for different recovery plans to be undertaken in parallel by restoring this baseline to another location, assuring full paths are update in the structure file before running PROSTRCT REPAIR
  • Allows application development access to uncommitted transaction data for data reinstatement once bi recovery has completed on the originating database. Alternatively up-to-date roll-forward hotspare databases can be accessed read-only or a replication target database can be used.

Database BI Recovery Options:   

Option 1: Go through bi recovery   

1.  Ensure sufficient bi space is available; typically twice the current bi size on disk is needed.
  • If the database has not been enabled for largefiles, add additional bi extents beforehand.
  • If the largefiles feature is enabled, assure there are no filesystem or OS limit/quotas that limit the variable extentsize and that this diskspace is available to extend the bi file for the entire duration. Otherwise reserving space is recommended by adding additional fixed bi extents.
  • These can be removed once bi recovery completes with the PROSTRCT REMOVE utility.
2. Consider disabling After Imaging beforehand (rfutil -C aimage end),

Otherwise sufficient AI space will also be needed to record the bi redo undo notes. This adds additional diskspace and i/o overhead during bi recovery.

3. Access the database to allow bi recovery to complete:
          
a. Recommended: Truncate BI

Once bi recovery completes, this allows an offline maintenance window before starting the database to remove additional recovery extents, using bigrow to preformat clusters and adding or revising the existing -bithold value until the root cause of unexpected bi growth can be isolated:
$   proutil dbname -C truncate bi -crStatus 30 -crTXDisplay 

b. Open the database single user
$  _progres -1 dbname -crStatus 30 -crTXDisplay >> recovery.txt

c. Open the database multi-user, without -bithold -bistall parameters
$   proserve dbname -crStatus 30 -crTXDisplay

Result: All open transactions will be rolled back - leaving the database in a state as though all uncommitted transactions and specifically the long transaction didn't run.  

Prior to OpenEdge 11.6.1 where the -crStatus -crTXDisplay parameters were added, one cannot predict how long it will take for bi recovery to finalise the REDO and UNDO phases. Refer to Articles:

Option 2: Restore and roll-forward    

  1. Restore the previous database backup
  2. Ensure sufficient bi space is available; add at least one additional variable bi file with largefiles enabled in an Enterprise database license is in use (as outlined above)
  3. Roll forward AI archives up to the point the long transaction began (see Option 3 below).
Result: After roll-forward operations complete, the Database is in a state prior to the runaway process.
However, any valid data\updates that were originally created in the application in separate transactions from this point onward will have to be re-done. While it is possible to roll forward to a specific transaction it, it is not possible to roll forward specific transactions.

Option 3:  Analyse with aiscan to restore and roll-forward   

While there's rarely enough time to force analysis under these conditions, in theory one could scan the ai files (RFUTIL - C aiscan verbose) in order to apply as many closed transactions while the open transaction was running. It is recommended to proof this option in a test environment and document the site-specific procedure:
  1. Identify the long running transaction ID. Parsing the Transaction Identifier begin and end notes from aimage scan verbose output tracks a transaction's activity and the transaction duration, across ai files. This information could also have been found online at the time once the bithold value had been reached when bistall placed the database in a stalled mode for transaction processing, using the _trans VST.
  2. Identify how many other transactions OPEN / CLOSED while bad-tran was running. [RL_TBGN|RL_TEND]
  3. Find a sweet-point to roll forward, which will include some of bad-tran but as many good-trans that were committed while it was running.
Result: Some of the open/closed transactions are preserved. The open transaction will still need to go through UNDO recovery when the database is next accessed and will require additional bi space as outlined above.

For further information refer to Article  How to view the contents of after image files?  

Data Reinstatement for Recovery Options 1, 2 and 3:   

Data recovery requires re-running online transactions (OLTP).
  • Unless the application vendor has a documented data reinstatement procedure, this can be both onerous and time consuming task.
  • When the application architecture uses multiple databases, site-specific transaction design may require other databases to also be restored to the same transaction-consistent state as the crashed database.
  • Recovering transaction data requires application development engagement, where initially backend reporting against the recovered database provides a baseline on where to start.
Forcing into a copy of the existing database before recovery (hence skipping crash recovery and having a database with open transactions and logical corruption), allows access to extract information to apply to the current database which has completed bi recovery.

Option 4: Don’t shut the database down.   

When the database is started with -bistall, instead of shutting down:
  • First raise the bi threshold. Refer to Article How can the -bithold -bistall and the proquiet bithreshold be implemented?  
  • Identify the bad-tran, gather as much information for later preventative actions then disconnect the client process. Assuming enough bi space, which can be added online if needed, at least most of what they were running are still in shared memory while transaction backout takes place online.  
  • If AI is enabled, the AI files will be recycled as BI notes are written to undo the transaction. Assure sufficient available AI space, otherwise AI can be disabled online with RFUTIL -C AIOFF.
  • If Replication is enabled, ensure that the target is online to unlock AI extents. Otherwise ensure there are also enough AI file space to hold LOCKED AI extents which can be applied to the target database later.
  • If AIMGT is enabled, re-set the switch interval to "on-demand" to use the available ai files to their full extent: rfutil -C aiarchiver setinterval 0
  • Start disconnecting other users as soon as possible after expediently finishing off their current transactions, limiting further bi growth as a result of good-transactions.
Result: The database has the long running transaction reverted and all other transactions completed, at sometime the bi file should be truncated offline.

Option 5: Last resort - skip crash recovery    

When the -F option is used, crash recovery is skipped which should only ever be used as a resource of last resort. For further information refer to Article:

Considerations and outstanding Actions:

1. In order to improve the performance of UNDO/REDO operations:
  • Improve the environment subsystems for faster I/O 
  • For faster block finds during UNDO processing: Dump and load tables with high fragmentation; run idxcompact; move tables to their own Type II Storage Areas.
2. Review the -bithold parameter.  Then the bithold value is reached, it provides an excellent braking system when the unexpected happens rather than the consequences of unattended growth.

The bithold value should adhere to an acceptable 'undo' when needed and ideally with -bistall, allows earlier intervention while the -bithreshold is raised online. It is recommended to at least use these parameters when the root cause of unexpected bi growth remains to be isolated. For further information refer to Articles:    
Workaround
Notes

 
Last Modified Date11/20/2020 6:55 AM
Attachment 
Files
Disclaimer The origins of the information on this site may be internal or external to Progress Software Corporation (“Progress”). Progress Software Corporation makes all reasonable efforts to verify this information. However, the information provided is for your information only. Progress Software Corporation makes no explicit or implied claims to the validity of this information.

Any sample code provided on this site is not supported under any Progress support program or service. The sample code is provided on an "AS IS" basis. Progress makes no warranties, express or implied, and disclaims all implied warranties including, without limitation, the implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample code is borne by the user. In no event shall Progress, its employees, or anyone else involved in the creation, production, or delivery of the code be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample code, even if Progress has been advised of the possibility of such damages.