[Pgcluster-general] Failover weakness

"Tomás A. Rossi" tomas at mecon.gov.ar
Fri Jan 12 12:44:23 UTC 2007


So, did you get the same results? I'm sure this wouldn't be that hard to 
fix :)
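
For what it's worth, here is a minimal sketch of the kind of life-check
At.Mitani mentions below, assuming a plain TCP probe with a timeout is enough
to tell a dead node from a live one. This is not PGCluster code: the function
name is_peer_alive, the address, and the port are made up for illustration.
The point is only that a connect attempt with a timeout fails when the peer
is unplugged or powered off, instead of waiting forever on the socket.

```python
import socket


def is_peer_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection raises OSError on failure; this covers both a
        # refused connection and a timeout (unplugged or powered-off peer).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical address and port of the other node; adjust to your setup.
    print(is_peer_alive("127.0.0.1", 8001, timeout=1.0))
```

A node that fails a few probes in a row could then be treated as down, so
that writes on the surviving node are not left hanging.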

Regards,
--
Tom;

a.mitani at sra-europe.com wrote:
> Indeed.
>
> At any rate, a life-check is required on each server.
>
> At.Mitani
>
>   
>>  So to simulate a failure, just 'power off' the VMs you wish to appear as
>> failed, i.e. simulate a non-graceful shutdown/system crash.
>>
>> Chris
>>
>>
>>     
>>> Hi Tom,
>>>
>>> Whether an unplugged cable is detected as a network error depends on
>>> the OS.
>>> An application on Linux cannot detect it from a socket error.
>>>
>>> My multi-node test is running on VMware.
>>> Therefore, I cannot unplug a cable from it.
>>>
>>> Regards,
>>> -----------------
>>> At.Mitani
>>>
>>>       
>>>> Did you try to unplug the Ethernet cable of node-A instead of bringing
>>>> down node-A services?
>>>> Your test (with the exception of recovery start) works for me, except
>>>> that if I unplug node-A from the network, things don't work.
>>>> Also, I guess that the results will be pretty much the same if you cut
>>>> the power of node-A instead of unplugging it from the network.
>>>>
>>>> Regards,
>>>> --
>>>> Tom;
>>>>
>>>> a.mitani at sra-europe.com
>>>>         
>>>>> Hi Tom,
>>>>>
>>>>>
>>>>>           
>>>>>> Again, I have nodes A and B, both with a cluster DB and a replication
>>>>>> server (synchronized). When node A or B gets disconnected from the
>>>>>> network (whether the active replicator is in A or B), attempting to
>>>>>> execute a write query on the still-connected DB hangs the query
>>>>>> indefinitely.
>>>>>> The conclusion: failover doesn't work as one might expect, because I
>>>>>> have N fail-points for N nodes :( .
>>>>>>
>>>>>> Is this normal, or did it just happen to me?
>>>>>>
>>>>>>             
>>>>> In my test environment, cluster A is recovered normally.
>>>>> The test was done as follows.
>>>>>
>>>>> (1) start cluster DBs
>>>>> cluster db A started in node A.
>>>>> cluster db B started in node B.
>>>>>
>>>>> (2) start replication servers
>>>>> replicator A started in node A.
>>>>> replicator B started in node B.
>>>>>
>>>>> (3) create DB
>>>>> send "createdb" command to cluster A.
>>>>>
>>>>> (4) stop node A
>>>>> cluster db A and replicator A were shut down.
>>>>>
>>>>> (5) run pgbench
>>>>> run pgbench in cluster B.
>>>>>
>>>>> (6) re-start cluster db A
>>>>> start cluster A with "-R" option in node A.
>>>>> <<cluster db A was recovered normally.>>
>>>>>
>>>>> (7) re-start replicator A
>>>>> just start replicator A in node A.
>>>>>
>>>>> (8) run pgbench again
>>>>> run pgbench in cluster db B.
>>>>> <<it was replicated in cluster db A as well.>>
>>>>>
>>>>> I tried the same test with 3 nodes.
>>>>> The result is the same (the stopped cluster db is recovered normally).
>>>>>
>>>>> I believe that this recovery function works normally.
>>>>> However, it does not seem easy to find the problem with the present
>>>>> package.
>>>>>
>>>>> I'll send you a patch that adds debug messages.
>>>>>
>>>>>
>>>>> At.Mitani
>>>>>
>>>>> _______________________________________________
>>>>> Pgcluster-general mailing list
>>>>> Pgcluster-general at pgfoundry.org
>>>>> http://pgfoundry.org/mailman/listinfo/pgcluster-general
>>>>>
>>>>>           
