Article

ABL client hangs. Protrace shows malloc function making subcall to malloc on Linux.

Information

 
Article Number: 000079495
Environment
Product: OpenEdge
Version: 11.x
OS: Linux
Question/Problem Description
An ABL client (_progres, _proapsv) hangs and does not respond to PROSHUT commands after a HANGUP or USR1 signal is sent to the process.
The hung ABL client process no longer responds to a SIGUSR1 request to generate a protrace file.
There is no CPU usage for the client process.

The PROMON "Blocked Clients" screen shows this process is WAITING on a latch resource which is never released.
Multiple processes appear to be waiting and not processing anything while "queueing" on the latch.

Three stacks generated with pstack instead of SIGUSR1 show a deadlock in the malloc calls, where the malloc function makes a subcall to malloc on Linux:
malloc_consolidate
_int_malloc
malloc

 
Steps to Reproduce
Clarifying Information
Other client processes queue up waiting on the hung client's latch and also appear hung because they cannot process anything:

One user holding MTX and BIB for an extremely long time.
This user is not listed among Blocked clients below:

 Status: Active Transactions
26    user1          -4      SELF/ABL  01/24/19 18:09 01/24/19 18:14 2065125432  Phase 2 FWD 

 Activity: Latch Counts 
MTX 26 0 0 0 0.0 0 0 0 0 0 0 
BIB 26 0 0 0 0.0 0 0 0 0 0 0  


PROMON shows Blocked clients are waiting on BKSH or BKEX
  • Waiting for the MTX as they try to make updates to the database or start a new transaction
  • Blocked attempting to get a buffer lock held by user 26
Status: Blocked Clients

Usr:Ten   Name      Domain     Type     Wait  Wait Info Trans id  Login time     Schema Timestamp
   34     appuser     -4      SELF/APSV BKSH  192:6            0  01/24/19 18:10   1523990790   0
   36     appuser     -4      SELF/APSV BKSH  192:6            0  01/24/19 18:10   1523990790   0
   56     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   57     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   86     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   ...


 
Error Message
Defect/Enhancement Number
Defect PSC00354138
Cause
Asynchronous calls to malloc cause a deadlock in the OpenEdge client.

The ABL client received a SIGUSR1 request (signal handler called) which internally deadlocked on a malloc() call while unwinding the stack inside the signal handler.

The signal handler runs the same non-async-signal-safe function (malloc) that was already executing outside the signal handler when the signal arrived, so the second call waits on a lock that the interrupted call still holds. The stack trace generated will show malloc() or _int_malloc() more than once in the C stack.

Example:
A call to malloc, followed by _int_malloc, followed by a call to malloc_consolidate, indicates that the problem is being encountered. An example of the C stack in the protrace file is shown below.

#0  0x000000385c6f809e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x000000385c67d2df in _L_lock_10176 () from /lib64/libc.so.6
#2  0x000000385c67ab83 in malloc () from /lib64/libc.so.6
#3  0x0000000000a163bb in uttraceback ()
#4  0x0000000000a1992e in uttrace_withsigid ()
#5  0x00000000008eccc4 in drProTrace ()
#6  0x00000000008ebfe8 in drSigDo1 ()
#7  0x00000000008ec360 in drSigDispatch ()
#8  <signal handler called>
#9  0x000000385c6760c2 in malloc_consolidate () from /lib64/libc.so.6
#10 0x000000385c679c28 in _int_malloc () from /lib64/libc.so.6
#11 0x000000385c67ab1c in malloc () from /lib64/libc.so.6
#12 0x0000000000a24801 in stgetbk ()
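
The stack above shows malloc being re-entered from a signal handler while an interrupted malloc still holds its lock. One common way to avoid this class of deadlock is to have the handler only record that the signal arrived, then do the heap-using work later from normal code. The following is a minimal generic sketch of that pattern, not OpenEdge source code; the names `note_trace_request`, `install_trace_handler` and `service_trace_request` are hypothetical and chosen for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Set by the handler, read by normal code. sig_atomic_t can be written
 * safely from inside a signal handler. */
static volatile sig_atomic_t trace_requested = 0;

/* Hypothetical handler: records the request and nothing else.
 * No malloc, no stdio, so it cannot re-enter the allocator. */
static void note_trace_request(int signo)
{
    (void)signo;
    trace_requested = 1;
}

/* Install the handler for SIGUSR1. Returns 0 on success, -1 on failure. */
int install_trace_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_trace_request;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the main loop, outside signal context.
 * Returns 1 if a pending request was handled, 0 if none was pending,
 * -1 if the allocation failed. */
int service_trace_request(FILE *out)
{
    char *report;

    assert(out != NULL);
    if (!trace_requested)
        return 0;
    trace_requested = 0;

    /* malloc is safe here because no allocator call was interrupted. */
    report = malloc(64);
    if (report == NULL)
        return -1;
    snprintf(report, 64, "trace requested by signal %d\n", SIGUSR1);
    fputs(report, out);
    free(report);
    return 1;
}
```

A program using this sketch would call `install_trace_handler()` once at startup and `service_trace_request(stderr)` at a convenient point in its main loop. The trade-off is that a process blocked for a long time outside that loop will not report until it returns to it.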

 
Resolution
Upgrade to OpenEdge 10.2B0868, 11.3.3.041, 11.4.0.050, 11.6.3.014, 11.6.4, 11.7 or later, where the code was revised to avoid calling malloc/free inside the core signal handler.
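
As a general illustration of what a handler that avoids malloc/free can look like, the sketch below formats a message into storage reserved at load time and emits it with write(), which POSIX lists as async-signal-safe. This is an assumption-laden example, not the actual OpenEdge fix; `format_signal_line`, `report_signal`, `install_safe_handler` and `LINE_CAP` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define LINE_CAP 64

/* Storage reserved at load time, so the handler never needs the heap. */
static char line_buf[LINE_CAP];
static_assert(LINE_CAP >= 32, "buffer must hold the longest message");

/* Append src to buf starting at pos, never writing past cap.
 * Returns the new position; buf stays NUL-terminated. */
static size_t append_str(char *buf, size_t cap, size_t pos, const char *src)
{
    while (*src != '\0' && pos + 1 < cap)
        buf[pos++] = *src++;
    buf[pos] = '\0';
    return pos;
}

/* Build "signal <n> received\n" without stdio or the heap.
 * Returns the number of characters written, excluding the NUL. */
size_t format_signal_line(char *buf, size_t cap, int signo)
{
    char digits[12];
    size_t n = 0, pos = 0;
    unsigned int v = signo < 0 ? 0u : (unsigned int)signo;

    if (buf == NULL || cap == 0)
        return 0;

    /* Convert the number by hand; snprintf is not async-signal-safe. */
    do {
        digits[n++] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0u);

    pos = append_str(buf, cap, pos, "signal ");
    while (n > 0 && pos + 1 < cap)
        buf[pos++] = digits[--n];
    buf[pos] = '\0';
    return append_str(buf, cap, pos, " received\n");
}

/* Hypothetical handler: uses only the static buffer and write(). */
static void report_signal(int signo)
{
    int saved_errno = errno;   /* write() may change errno */
    size_t len = format_signal_line(line_buf, sizeof line_buf, signo);
    ssize_t ignored = write(STDERR_FILENO, line_buf, len);
    (void)ignored;
    errno = saved_errno;
}

/* Install the handler for SIGUSR1. Returns 0 on success, -1 on failure. */
int install_safe_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = report_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Because nothing in `report_signal` takes the allocator lock, the handler can run even when the signal interrupts a malloc call in progress.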

After Progress fixed all known issues with deadlocking in localtime and malloc, it was found that the backtrace() API from glibc, which is used to unwind the C-level stack trace, is also not async-signal-safe in a number of Linux builds. A new startup parameter, -cstackPrintopt, was added as a workaround for this external Linux defect. This is an undocumented workaround. This startup parameter controls which API is used to print the C-level stack trace when handling the SIGUSR1 signal on Linux. For further information refer to Articles:
 
Workaround
Notes
Attachment 
Last Modified Date: 11/11/2019 3:26 PM