<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Replication while interface is brought down</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2900.3354" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>Hi,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>Further to the logs attached to my previous mails, I have been
debugging PgCluster and found the following behaviour behind the hang of
the test app.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>PQexec() calls poll() with a timeout of -1 (infinite), so it blocks
until there is something to read on the socket file descriptor or the syscall is
interrupted. This happens on the client application side (mainly in the library
../interfaces/libpq-fe). When PQexec(dbConn, "BEGIN") is called after the
network interface has been brought down, the poll() syscall blocks for about 15
minutes and then returns with the value 1.
My question is: who writes to this file descriptor, and what is written? Why
does it take 15 minutes for that write to happen?</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>Is this a known problem, and is there any workaround for it?
Please let me know.</FONT></SPAN></DIV>
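<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>One possible client-side workaround, as a minimal sketch only: the
application could avoid the infinite wait by using libpq's asynchronous calls
(PQsendQuery(), PQsocket(), PQconsumeInput(), PQisBusy(), PQgetResult()) instead
of PQexec(), and wait on the socket itself with a finite timeout. The helper
names wait_readable() and demo_socketpair() below are hypothetical, and a local
socketpair stands in for the descriptor that PQsocket(conn) would return. This
only bounds the wait in the client; it does not explain the 15-minute
delay.</FONT></SPAN></DIV>

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Wait until fd has something to read, or until timeout_ms elapses.
 * Returns 1 if poll() reported an event on fd (data, hang-up or error),
 * 0 on timeout, -1 if poll() itself failed.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        /* A finite timeout, unlike the -1 seen in the backtrace. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR); /* a retry restarts the full timeout */

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/*
 * Hypothetical demo: a local socketpair stands in for the descriptor
 * that PQsocket(conn) would return. Returns 0 if the helper timed out
 * while the peer was silent and reported readiness once a byte arrived.
 */
static int demo_socketpair(void)
{
    int sv[2];
    int silent, ready;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    silent = wait_readable(sv[0], 100);   /* peer has sent nothing yet */

    if (write(sv[1], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    ready = wait_readable(sv[0], 100);    /* one byte is now pending */

    close(sv[0]);
    close(sv[1]);
    return (silent == 0 && ready == 1) ? 0 : -1;
}
```

<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>In the test app, the idea would be to call PQsendQuery(dbConn,
"BEGIN"), then wait_readable(PQsocket(dbConn), timeout), and only call
PQconsumeInput() and PQgetResult() when the helper returns 1. On a return of 0
the application can treat the connection as dead instead of blocking. The
timeout value is an assumption to be tuned for the
deployment.</FONT></SPAN></DIV>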
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>On the server side, connection requests keep coming in
for the dsn 'template1'. </FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>gdb backtrace:</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=468265306-30062008><FONT face=Arial
color=#ff0000>#0 0x00a5b7a2 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2<BR>#1 0x00b33dbd in poll () from
/lib/tls/libc.so.6<BR>#2 0x00134be6 in pqSocketPoll (sock=6, forRead=1,
forWrite=0, end_time=-1) at fe-misc.c:1037<BR>#3 0x00134ab6 in
pqSocketCheck (conn=0x9eb1008, forRead=1, forWrite=0, end_time=-1) at
fe-misc.c:979<BR>#4 0x001349c7 in pqWaitTimed (forRead=1, forWrite=0,
conn=0x9eb1008, finish_time=-1) at fe-misc.c:911<BR>#5 0x0013499b in
pqWait (forRead=1, forWrite=0, conn=0x9eb1008) at fe-misc.c:894<BR>#6
0x001315ee in PQgetResult (conn=0x9eb1008) at fe-exec.c:1223<BR>#7
0x00131a75 in PQexecFinish (conn=0x9eb1008) at fe-exec.c:1452<BR>#8
0x001317c9 in PQexec (conn=0x9eb1008, query=0x8048c36 "BEGIN") at
fe-exec.c:1293<BR>#9 0x0804881b in main (argc=1, argv=0xbff2f794
"\220«ù¿") at pg_test_app.cpp:35</FONT></SPAN></DIV>
<DIV><FONT face=Arial color=#0000ff></FONT> </DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial color=#0000ff>I have
attached the log file again.</FONT></SPAN></DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial color=#0000ff>Please look
from line 75 onwards in the attached log.</FONT></SPAN></DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff></FONT></SPAN> </DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial color=#0000ff>Please let me
know what the problem could be, as I have to provide input for the
evaluation.</FONT></SPAN></DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff></FONT></SPAN> </DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff>Environment:</FONT></SPAN></DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial color=#0000ff>PgCluster -
1.9.0rc5</FONT></SPAN></DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial color=#0000ff>2 ClusterDB -
one on each server</FONT></SPAN></DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial color=#0000ff>1 Replicator
on Server 1 in active mode</FONT></SPAN></DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial color=#0000ff>1 Replicator
on Server 2 in cold standby mode</FONT></SPAN></DIV>
<DIV><SPAN class=468265306-30062008><FONT face=Arial
color=#0000ff></FONT></SPAN> </DIV>
<DIV><SPAN lang=en-us><FONT face=Arial color=#0000ff>regards,</FONT></SPAN>
<BR><SPAN lang=en-us><FONT face=Arial
color=#0000ff>Niranjan</FONT></SPAN> <BR> </DIV>
<DIV>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> K, Niranjan (NSN - IN/Bangalore)
<BR><B>Sent:</B> Thursday, June 26, 2008 8:17 PM<BR><B>To:</B>
mitani_nl@yahoo.co.jp<BR><B>Subject:</B> RE: [Pgcluster-general] Replication
while interface is brought down<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV dir=ltr align=left><SPAN class=792194414-26062008><FONT face=Arial
color=#0000ff>Hi,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=792194414-26062008><FONT face=Arial
color=#0000ff></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=792194414-26062008><FONT face=Arial
color=#0000ff>I have attached the logs of pgreplicate and postgres. I can see
that there is some problem with the lock
'ShareUpdateExclusiveLock'.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=792194414-26062008><FONT face=Arial
color=#0000ff>Do you have any idea why this happens, and is there any solution
or workaround for this problem?</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=792194414-26062008><FONT face=Arial
color=#0000ff></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN lang=en-us><FONT face=Arial><FONT
color=#0000ff><SPAN
class=792194414-26062008>r</SPAN>egards,</FONT></FONT></SPAN> <BR><SPAN
lang=en-us><FONT face=Arial color=#0000ff>Niranjan</FONT></SPAN> <BR><BR></DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> pgcluster-general-bounces@pgfoundry.org
[mailto:pgcluster-general-bounces@pgfoundry.org] <B>On Behalf Of </B>ext K,
Niranjan (NSN - IN/Bangalore)<BR><B>Sent:</B> Wednesday, June 25, 2008 10:54
AM<BR><B>To:</B> mitani_nl@yahoo.co.jp;
pgcluster-general@pgfoundry.org<BR><B>Subject:</B> [Pgcluster-general]
Replication while interface is brought down<BR></FONT><BR></DIV>
<DIV></DIV><!-- Converted from text/rtf format -->
<P><FONT face=Arial>Hi,</FONT> </P>
<P><FONT face=Arial>I was checking the synchronous replication scenarios. I have
a test application that reads a counter (the COUNTER column) from a table
(COUNTER_TABLE), increments it, and updates the table. This is
done in a loop. I have attached the test app for your reference.</FONT></P>
<P><FONT face=Arial color=#000000 size=2>&lt;&lt;test_app.cpp&gt;&gt;
</FONT><BR><FONT face=Arial>While the test_app is in the loop, I bring down the
standby node's interface (ifconfig eth0 down). The test_app on the
active node then hangs at the SELECT statement. The hang lasts for about 15
minutes, after which the updates resume. I have configured
'Replication_timeout' as 50 seconds.</FONT></P>
<P><FONT face=Arial>After that, I brought the interface back up on the
standby node, but replication did not resume.</FONT></P>
<P><FONT face=Arial>Are these known issues? And are there any workarounds to
deal with these problems?</FONT> </P>
<P><FONT face=Arial>regards,</FONT> <BR><FONT face=Arial>Niranjan</FONT>
</P></BODY></HTML>