[Pgcluster-general] Failover weakness

cprice at its.to cprice at its.to
Thu Jan 11 21:31:30 UTC 2007


So to simulate a failure, just 'power off' the VMs you wish to appear as
failed, i.e. simulate a non-graceful shutdown/system crash.
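
After powering off a VM, it can help to confirm from a surviving node that
the peer really looks failed, and how long that takes to notice. A minimal
sketch, assuming the powered-off node exposed some TCP port; the function
name, host, port and timeout are hypothetical and are not part of pgcluster:

```python
import socket


def node_appears_down(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connect to host:port fails or times out.

    A powered-off machine sends neither FIN nor RST, so without an
    explicit timeout a connect attempt can block for a long time.
    """
    try:
        # create_connection applies timeout_s to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout_s):
            return False
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return True
```

For example, node_appears_down("node-a", 5432, 3.0) would give an answer
within roughly three seconds rather than hanging; "node-a" and 5432 are
placeholders.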

Chris


> Hi Tom,
>
> It depends on the OS whether an unplugged cable is detected as a network
> error. An application on Linux cannot detect it from a socket error.
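
One general way for an application to notice a silently vanished peer is TCP
keepalive. The sketch below is only an illustration of that socket option,
not what pgcluster does; the function name and the timing values are
assumptions, and the TCP_KEEP* options are guarded because they are not
available on every platform:

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 10,
                     interval_s: int = 5, probes: int = 3) -> None:
    """Turn on TCP keepalive so an idle connection to a dead peer errors out."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning (available on Linux); skipped where not defined.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With these example values an idle connection to an unplugged node would be
reported as broken after about idle_s + interval_s * probes seconds.
Keepalive probes only cover idle connections; if unacknowledged data is in
flight, the normal retransmission timeout applies instead.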
>
> My multi-node test is running on VMware.
> Therefore, I cannot unplug a cable from it.
>
> Regards,
> -----------------
> At.Mitani
>
>> Did you try to unplug ethernet cable of node-A instead of bringing down
>> node-A services?
>> Your test (with the exception of recovery start) works for me, except
>> that if I unplug node-A from the network, things don't work.
>> Also, I guess that the results will be pretty much the same if you cut
>> the power of node-A instead of unplugging it from the network.
>>
>> Regards,
>> --
>> Tom;
>>
>> a.mitani at sra-europe.com
>>> Hi Tom,
>>>
>>>
>>>> Again, I have nodes A and B both with cluster and replication server
>>>> (synchronized). When node A or B gets disconnected from the network
>>>> (whether the active replicator is in A or B), attempting to execute a
>>>> write query on the connected DB hangs the query indefinitely.
>>>> The conclusion: failover doesn't work as one might expect, because I
>>>> have N fail-points for N nodes :( .
>>>>
>>>> Is this normal, or did it just happen to me?
>>>>
>>>
>>> In my test environment, cluster A recovered normally.
>>> The test was done as follows.
>>>
>>> (1) start cluster DBs
>>> cluster db A started in node A.
>>> cluster db B started in node B.
>>>
>>> (2) start replication servers
>>> replicator A started in node A.
>>> replicator B started in node B.
>>>
>>> (3) create DB
>>> send "createdb" command to cluster A.
>>>
>>> (4) stop node A
>>> cluster db A and replicator A were stopped.
>>>
>>> (5) run pgbench
>>> run pgbench in cluster B.
>>>
>>> (6) re-start cluster db A
>>> start cluster A with "-R" option in node A.
>>> <<cluster db A recovered normally.>>
>>>
>>> (7) re-start replicator A
>>> just start replicator A in node A.
>>>
>>> (8) run pgbench again
>>> run pgbench in cluster db B.
>>> <<it was replicated in cluster db A as well.>>
>>>
>>> I tried the same test with 3 nodes.
>>> The result is the same (the stopped cluster db recovers normally).
>>>
>>> I believe that this recovery function works normally.
>>> However, it does not seem easy to find the problem with the present
>>> package.
>>>
>>> I'll send you a patch that adds debug messages.
>>>
>>>
>>> At.Mitani
>>>
>>> _______________________________________________
>>> Pgcluster-general mailing list
>>> Pgcluster-general at pgfoundry.org
>>> http://pgfoundry.org/mailman/listinfo/pgcluster-general
>>>



