Article

ABL client hangs. Protrace shows malloc function making subcall to malloc on Linux.

Information

 
Title: ABL client hangs. Protrace shows malloc function making subcall to malloc on Linux.
URL Name: abl-client-hangs-protrace-shows-malloc-function-making-subcall-to-malloc-on-linux
Article Number: 000181910
Environment
Product: OpenEdge
Version: 11.x
OS: Linux
Question/Problem Description
An ABL client (_progres, _proapsv) hangs and does not respond to PROSHUT commands after a HANGUP or USR1 signal is sent to the process.
The hung ABL client process no longer responds to SIGUSR1, so a USR1 request to generate a protrace file has no effect.
There is no CPU usage for the client process.

The PROMON "Blocked Clients" screen shows that this process is WAITING on a latch resource which is never released.
Multiple processes appear to be waiting and not processing anything while "queueing" on the latch.

3 stacks generated with pstack instead of SIGUSR1 show a deadlock in the malloc calls, where the malloc function makes a subcall to malloc on Linux:
malloc_consolidate
_int_malloc
malloc

 
Steps to Reproduce
Clarifying Information
Other client processes queue up waiting on the hung client's latch and also appear hung, because they cannot process anything:

One user holds MTX and BIB for an extremely long time.
This user is not listed among the Blocked Clients below:

 Status: Active Transactions
26    user1          -4      SELF/ABL  01/24/19 18:09 01/24/19 18:14 2065125432  Phase 2 FWD 

 Activity: Latch Counts 
MTX 26 0 0 0 0.0 0 0 0 0 0 0 
BIB 26 0 0 0 0.0 0 0 0 0 0 0  


PROMON shows Blocked clients are waiting on BKSH or BKEX
  • Waiting for the MTX as they try to make updates to the database or start a new transaction
  • Blocked attempting to get a buffer lock held by user 26
Status: Blocked Clients

Usr:Ten   Name      Domain     Type     Wait  Wait Info Trans id  Login time     Schema Timestamp
   34     appuser     -4      SELF/APSV BKSH  192:6            0  01/24/19 18:10   1523990790   0
   36     appuser     -4      SELF/APSV BKSH  192:6            0  01/24/19 18:10   1523990790   0
   56     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   57     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   86     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   ...
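
When the Blocked Clients screen is long, it can help to tally the waits per resource. The following is a minimal sketch, not a Progress utility: the function name `tally_waits` is hypothetical, and the column positions (Wait in the fifth field, Wait Info in the sixth) are assumed from the sample output in this article and may differ in other PROMON versions.

```python
from collections import Counter

# Sample rows copied from the Blocked Clients screen shown in this article.
SAMPLE_BLOCKED = """\
Usr:Ten   Name      Domain     Type     Wait  Wait Info Trans id  Login time     Schema Timestamp
   34     appuser     -4      SELF/APSV BKSH  192:6            0  01/24/19 18:10   1523990790   0
   36     appuser     -4      SELF/APSV BKSH  192:6            0  01/24/19 18:10   1523990790   0
   56     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   57     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   86     appuser     -4      SELF/ABL  BKEX  192:6            0  11/29/18 10:10   1523990790   0
   ...
"""


def tally_waits(screen_text):
    """Count blocked clients per (Wait, Wait Info) pair.

    Assumption: whitespace-separated columns laid out as in the sample
    above, with the user number first.
    """
    counts = Counter()
    for line in screen_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric user number; skip the header and "...".
        if len(fields) < 6 or not fields[0].split(":")[0].isdigit():
            continue
        wait, wait_info = fields[4], fields[5]
        counts[(wait, wait_info)] += 1
    return counts


if __name__ == "__main__":
    for (wait, wait_info), n in tally_waits(SAMPLE_BLOCKED).most_common():
        print(wait, wait_info, n)
```

Many clients sharing one Wait Info value, as in the sample, may point at a single holder such as the hung client described above.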


 
Error Message
Defect Number: Defect PSC00354138
Enhancement Number
Cause
Asynchronous calls to malloc cause a deadlock in the OpenEdge client.

The ABL client received a SIGUSR1 request (signal handler called) which internally deadlocked on a malloc() call while unwinding the stack inside the signal handler.

The signal handler calls the same function (malloc) that was already executing outside the handler when the signal arrived. malloc is not async-signal-safe, so the call inside the handler waits on a lock held by the interrupted call and never returns. The stack trace generated will show malloc() or _int_malloc() more than once in the C stack.

Example:
A call to malloc, followed by _int_malloc, followed by malloc_consolidate, indicates that the problem is being encountered. An example of the C stack in the protrace file is shown below.

#0  0x000000385c6f809e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x000000385c67d2df in _L_lock_10176 () from /lib64/libc.so.6
#2  0x000000385c67ab83 in malloc () from /lib64/libc.so.6
#3  0x0000000000a163bb in uttraceback ()
#4  0x0000000000a1992e in uttrace_withsigid ()
#5  0x00000000008eccc4 in drProTrace ()
#6  0x00000000008ebfe8 in drSigDo1 ()
#7  0x00000000008ec360 in drSigDispatch ()
#8  <signal handler called>
#9  0x000000385c6760c2 in malloc_consolidate () from /lib64/libc.so.6
#10 0x000000385c679c28 in _int_malloc () from /lib64/libc.so.6
#11 0x000000385c67ab1c in malloc () from /lib64/libc.so.6
#12 0x0000000000a24801 in stgetbk ()
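
The pattern in the stack above can be checked mechanically when many pstack outputs have to be reviewed. The sketch below is illustrative only and is not part of OpenEdge: the names `parse_frames` and `malloc_reentered_in_handler` are hypothetical, the input is assumed to be gdb/pstack-style frame lines like the ones above, and the set of allocator frame names is limited to those mentioned in this article.

```python
import re

# Frame lines such as "#2  0x... in malloc () from /lib64/libc.so.6".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")
SIGNAL_MARKER = "<signal handler called>"
# glibc allocator frames named in this article (not an exhaustive list).
ALLOCATOR_FRAMES = {"malloc", "_int_malloc", "malloc_consolidate"}

# Stack copied from the example in this article.
SAMPLE_TRACE = """\
#0  0x000000385c6f809e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x000000385c67d2df in _L_lock_10176 () from /lib64/libc.so.6
#2  0x000000385c67ab83 in malloc () from /lib64/libc.so.6
#3  0x0000000000a163bb in uttraceback ()
#4  0x0000000000a1992e in uttrace_withsigid ()
#5  0x00000000008eccc4 in drProTrace ()
#6  0x00000000008ebfe8 in drSigDo1 ()
#7  0x00000000008ec360 in drSigDispatch ()
#8  <signal handler called>
#9  0x000000385c6760c2 in malloc_consolidate () from /lib64/libc.so.6
#10 0x000000385c679c28 in _int_malloc () from /lib64/libc.so.6
#11 0x000000385c67ab1c in malloc () from /lib64/libc.so.6
#12 0x0000000000a24801 in stgetbk ()
"""


def parse_frames(trace):
    """Return (frame_number, function_name) tuples, innermost frame first."""
    frames = []
    for line in trace.splitlines():
        line = line.strip()
        match = FRAME_RE.match(line)
        if not match:
            continue
        number = int(match.group(1))
        # The marker line has no function name, so record the marker itself.
        name = SIGNAL_MARKER if SIGNAL_MARKER in line else match.group(2)
        frames.append((number, name))
    return frames


def malloc_reentered_in_handler(trace):
    """True if allocator frames appear both inside the signal handler
    (before the marker) and in the interrupted code (after the marker)."""
    names = [name for _, name in parse_frames(trace)]
    if SIGNAL_MARKER not in names:
        return False
    marker = names.index(SIGNAL_MARKER)
    in_handler = ALLOCATOR_FRAMES.intersection(names[:marker])
    interrupted = ALLOCATOR_FRAMES.intersection(names[marker + 1:])
    return bool(in_handler) and bool(interrupted)


if __name__ == "__main__":
    print(malloc_reentered_in_handler(SAMPLE_TRACE))
```

A positive result only flags a stack for closer inspection; it does not by itself confirm that the process is affected by this defect.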

 
Resolution
Upgrade to OpenEdge 10.2B0868, 11.3.3.041, 11.4.0.050, 11.6.3.014, 11.6.4, 11.7 or later, where the code was reviewed to avoid calling malloc/free inside the core signal handler.

After Progress fixed all known issues with deadlocking in localtime and malloc, it was found that the backtrace() API from glibc, which is used to unwind the C-level stack trace, is also not async-signal-safe in a number of Linux builds. A new startup parameter, -cstackPrintopt, was added as a workaround for this external Linux defect. This is an undocumented workaround. The startup parameter controls which API is used to print the C-level stack trace when handling the SIGUSR1 signal on Linux. For further information refer to Articles:
 
Workaround
Notes
Last Modified Date: 11/20/2020 6:58 AM