Detailed description:
We're using pgcluster on FreeBSD in a multimaster configuration
(e.g. master cluster node A, cluster node B, and pgreplicate on node A).
Nodes are configured to stay read-write on cluster failure.
After detaching a node (for example, node B), node A keeps working for several minutes and then the main process dumps core.
Backtrace:
#0 0x00000000 in ?? ()
#1 0x082b3b76 in fmgr_security_definer (fcinfo=0xbfbfd908) at fmgr.c:950
#2 0x0816a917 in ExecMakeFunctionResult (fcache=0x6a2daa90, econtext=0x6a2da930, isNull=0x6a33f6b3 "", isDone=0x6a33f71c) at execQual.c:1351
#3 0x08168d8f in ExecProject (projInfo=0x6a33f6c8, isDone=0xbfbfdbe8) at execQual.c:4610
#4 0x081766ae in ExecHashJoin (node=0x6a2da8a8) at nodeHashjoin.c:316
#5 0x081686e2 in ExecProcNode (node=0x6a2da8a8) at execProcnode.c:375
#6 0x08167900 in ExecutorRun (queryDesc=0x6a317440, direction=ForwardScanDirection, count=0) at execMain.c:1335
#7 0x0820adf8 in PortalRunSelect (portal=0x6a2e4018, forward=Variable "forward" is not available.
) at pquery.c:959
#8 0x0820bd83 in PortalRun (portal=0x6a2e4018, count=2147483647, isTopLevel=1 '\001', dest=0x6a31ca18, altdest=0x6a31ca18, completionTag=0xbfbfddda "") at pquery.c:813
#9 0x082069aa in exec_simple_query (
query_string=0x6a240828 "SELECT tr.trouble_id, tr.conn_id, tr.client_id, billing.client_name(tr.client_id) AS client_name,\n", ' ' <repeats 19 times>, "billing.personal_account(tr.client_id, conn.conn_id) AS personal_account,\n "...) at postgres.c:1252
#10 0x082083b9 in PostgresMain (argc=4, argv=0x28931448, username=0x28931428 "vml") at postgres.c:3976
#11 0x081dc80c in PostmasterMain (argc=5, argv=0xbfbfede8) at postmaster.c:3427
#12 0x08194825 in main (argc=5, argv=0xbfbfede8) at main.c:188
(gdb) l fmgr.c:950
945
946 PG_TRY();
947 {
948 fcinfo->flinfo = &fcache->flinfo;
949
950 result = FunctionCallInvoke(fcinfo);
951 }
952 PG_CATCH();
953 {
954 fcinfo->flinfo = save_flinfo;
(gdb) p *fcinfo->flinfo
$3 = {fn_addr = 0, fn_oid = 0, fn_nargs = 1, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x6a3016f8, fn_expr = 0x6a312f60}
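fn_addr is NULL, and this is what causes the core dump: FunctionCallInvoke calls through fn_addr, which matches frame #0 at address 0x00000000.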
Some digging points to the code at fmgr.c:236.
If PGR_Replicate_Function_Call fails, the function returns without setting fn_addr in the corresponding structure (the flinfo printed above).
Removing that return fixes the issue.
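
The failure pattern described above can be sketched in isolation. This is a minimal illustration, not the actual pgcluster or PostgreSQL source: DemoFmgrInfo, demo_replicate, demo_lookup_buggy and demo_lookup_fixed are hypothetical stand-ins, and the assumption is that the lookup routine clears fn_addr first and only fills it in after the replication call.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the flinfo structure shown in gdb. */
typedef int (*demo_fn)(int);

typedef struct DemoFmgrInfo {
    demo_fn  fn_addr;  /* stays NULL until the lookup fills it in */
    unsigned fn_oid;   /* stays 0 until the lookup completes */
} DemoFmgrInfo;

/* Hypothetical target function that the lookup is supposed to resolve. */
int demo_target(int x) { return x + 1; }

/* Hypothetical stand-in for PGR_Replicate_Function_Call:
 * returns 0 on success, nonzero on failure (assumed convention). */
int demo_replicate(int replication_ok) { return replication_ok ? 0 : -1; }

/* Buggy shape: an early return on replication failure leaves
 * fn_addr NULL, so a later call through it jumps to address 0. */
void demo_lookup_buggy(DemoFmgrInfo *finfo, unsigned oid, int replication_ok)
{
    finfo->fn_addr = NULL;
    finfo->fn_oid = 0;

    if (demo_replicate(replication_ok) != 0)
        return;                      /* fn_addr is never set on this path */

    finfo->fn_addr = demo_target;
    finfo->fn_oid = oid;
}

/* Fixed shape: the replication failure no longer aborts the lookup,
 * so fn_addr is always filled in before the caller invokes it. */
void demo_lookup_fixed(DemoFmgrInfo *finfo, unsigned oid, int replication_ok)
{
    finfo->fn_addr = NULL;
    finfo->fn_oid = 0;

    (void) demo_replicate(replication_ok);  /* result ignored, no early return */

    finfo->fn_addr = demo_target;
    finfo->fn_oid = oid;
}
```

With replication failing, demo_lookup_buggy leaves fn_addr = NULL and fn_oid = 0, the same state as in the gdb output, while demo_lookup_fixed still yields a callable pointer. A caller-side NULL check before invoking fn_addr would be an additional safeguard, but that is a design suggestion and not part of the reported fix.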