[Pgcluster-general] pgcluster problems after ungraceful shutdown of a replicate server
Chris Price
cprice at its.to
Mon Jan 15 03:12:04 UTC 2007
Scenario setup:
(configs for all servers are at the end of this email)
2 cluster DBs: db01 & db02
2 replication servers in cascade config: app01 & app02 (app01 is
'upper', app02 is 'lower')
'testdb' created and initialized on db01 (pgcbench -i testdb),
viewable on db02.
Run pgcbench to put load on the cluster from db01 (./pgcbench -c 90 -t
1000 testdb); transactions are viewable in db02's 'testdb' database.
TEST GRACEFUL FAILOVER of repl servers:
Shut down pgreplicate on app01. app02's pgreplicate immediately
starts handling replication functions. No apparent issues in failover
from app01 (upper) to app02 (lower). Bring app01 (upper) back into
service, gracefully shut down app02 (lower), and app01 starts handling
replication functions again with no apparent issues.
TEST SYSTEM FAILURE of one repl server:
Physically unplug the ethernet cable on app01. app02 DOES NOT start
handling repl functions until 30-120 seconds later. pgcbench output
ceases on db01, yet app02 (lower repl) seems to get stuck in a
'replication loop', attempting to process the same set of transactions
over and over again. Further, the database instances on db01 and db02 no
longer respond to shutdown requests via the 'stop' command; the DBs must
be killed with 'kill -9'. In fact, from my moderate amount of testing, it
seems that all replication servers and DB servers must be shut down
(gracefully if they take it, kill -9 if they don't) and the entire setup
brought back up from scratch to restore correct cluster and repl server
operation.
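One way to put a number on the 30-120 second gap is to poll the replication port on both replicators from a third machine while the cable is pulled. The sketch below is only an illustration: the helper names `port_is_open` and `watch` are hypothetical, the hosts and port 8001 are taken from the pgreplicate.conf files at the end of this email, and a successful TCP connect only shows that something is listening, not that pgreplicate is actually replicating.

```python
import socket
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures, and timeouts.
        return False


def watch(targets, interval=1.0, rounds=180):
    """Print a timestamped line whenever a target's reachability changes."""
    last = {}
    for _ in range(rounds):
        for host, port in targets:
            state = port_is_open(host, port)
            if last.get((host, port)) != state:
                stamp = time.strftime("%H:%M:%S")
                print(f"{stamp} {host}:{port} {'up' if state else 'DOWN'}")
                last[(host, port)] = state
        time.sleep(interval)


# Example (assumed hosts/ports from the configs in this email):
# watch([("app01", 8001), ("app02", 8001)])
```

Comparing the timestamp at which app01 goes DOWN with the time pgcbench output resumes (if it does) gives a rough failover latency for each run.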
Thoughts?
Has anyone tested any sort of disaster-related scenario involving the
failure of one pgreplicate server in a cascade config? Graceful
shutdown is all fine and dandy, but I'm worried about what's going to
happen when I have a server crash, a failed network switch or some other
'bad' event.
Further, can some kind souls on this list please review my
configurations and help me shed some light on this problem?
Chris
db01 cluster.conf
<Replicate_Server_Info>
<Host_Name> app01 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Replicate_Server_Info>
<Host_Name> app02 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Host_Name> db01 </Host_Name>
<Recovery_Port> 7001 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<Pg_Dump_Path> /clientdata/pgcluster/bin/pg_dump</Pg_Dump_Path>
<When_Stand_Alone> read_only </When_Stand_Alone>
<Replication_Timeout> 1min </Replication_Timeout>
end db01 cluster.conf
db02 cluster.conf
<Replicate_Server_Info>
<Host_Name> app01 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Replicate_Server_Info>
<Host_Name> app02 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Host_Name> db02 </Host_Name>
<Recovery_Port> 7001 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<Pg_Dump_Path> /clientdata/pgcluster/bin/pg_dump</Pg_Dump_Path>
<When_Stand_Alone> read_only </When_Stand_Alone>
<Replication_Timeout> 1min </Replication_Timeout>
end db02 cluster.conf
app01 pgreplicate.conf (upper replicator)
<Cluster_Server_Info>
<Host_Name> db01 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> db02 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Host_Name> app01 </Host_Name>
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
<RLOG_Port> 8301 </RLOG_Port>
<Response_Mode> normal </Response_Mode>
<Use_Replication_Log> yes </Use_Replication_Log>
<Replication_Timeout> 1min </Replication_Timeout>
<Error_Log_File> /clientdata/pgcluster/var/log/pgreplicate.log </Error_Log_File>
<Log_File_Info>
<File_Name> /clientdata/pgcluster/var/log/pgreplicate.log </File_Name>
<File_Size> 10M </File_Size>
<Rotate> 3 </Rotate>
</Log_File_Info>
end app01 pgreplicate.conf
app02 pgreplicate.conf (lower replicator)
<Cluster_Server_Info>
<Host_Name> db01 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> db02 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Replicate_Server_Info>
<Host_Name> app01 </Host_Name>
<Port> 8002 </Port>
<Recovery_Port> 8102 </Recovery_Port>
</Replicate_Server_Info>
<Host_Name> app02 </Host_Name>
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
<RLOG_Port> 8301 </RLOG_Port>
<Response_Mode> normal </Response_Mode>
<Use_Replication_Log> yes </Use_Replication_Log>
<Replication_Timeout> 1min </Replication_Timeout>
<Log_File_Info>
<File_Name> /tmp/pgreplicate.log </File_Name>
<File_Size> 10M </File_Size>
<Rotate> 3 </Rotate>
</Log_File_Info>
end app02 pgreplicate.conf
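Since the request is for a config review, a mechanical cross-check may help as a first pass. The sketch below compares the ports a replicator declares for itself (`Replication_Port`, `Recovery_Port`) with the ports another file lists for that host under `Replicate_Server_Info`. The function names are hypothetical, the two sample strings are trimmed copies of the files above, and the assumption that these values are supposed to match is mine, not something taken from the PGCluster documentation.

```python
import xml.etree.ElementTree as ET


def parse_conf(text):
    """Parse a config fragment; a dummy root is added because the file has several top-level tags."""
    return ET.fromstring("<conf>" + text + "</conf>")


def field(node, tag):
    """Return the stripped text of a direct child tag, or None if absent."""
    value = node.findtext(tag)
    return value.strip() if value is not None else None


def own_ports(conf):
    """Host name and (Replication_Port, Recovery_Port) a replicator declares for itself."""
    return field(conf, "Host_Name"), (field(conf, "Replication_Port"), field(conf, "Recovery_Port"))


def referenced_replicators(conf):
    """Map host -> (Port, Recovery_Port) for each Replicate_Server_Info entry."""
    return {
        field(entry, "Host_Name"): (field(entry, "Port"), field(entry, "Recovery_Port"))
        for entry in conf.findall("Replicate_Server_Info")
    }


def mismatches(replicator_conf, other_conf):
    """List differences between a replicator's own ports and another file's reference to it."""
    host, declared = own_ports(parse_conf(replicator_conf))
    referenced = referenced_replicators(parse_conf(other_conf)).get(host)
    if referenced is None or referenced == declared:
        return []
    return [f"{host}: declares {declared}, referenced as {referenced}"]


# Trimmed copies of app01 and app02 pgreplicate.conf from this email.
APP01 = """
<Host_Name> app01 </Host_Name>
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
"""
APP02 = """
<Replicate_Server_Info>
<Host_Name> app01 </Host_Name>
<Port> 8002 </Port>
<Recovery_Port> 8102 </Recovery_Port>
</Replicate_Server_Info>
<Host_Name> app02 </Host_Name>
"""

print(mismatches(APP01, APP02))
```

On the files as posted, this flags that app02's pgreplicate.conf refers to app01 on 8002/8102 while app01 declares 8001/8101; both cluster.conf files list 8001/8101 for both replicators. Whether that difference is intentional, or related to the failover behaviour described above, is not something this check can tell.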