[Pgcluster-general] Failover weakness
a.mitani at sra-europe.com
Thu Jan 11 22:05:31 UTC 2007
Indeed.
In any case, a life-check is required on each server.
At.Mitani
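A life-check in this sense can be sketched as a periodic probe of every server. The Python sketch below is a hypothetical illustration, not PGCluster's own life-check: it tries a TCP connection to each (host, port) with a timeout and reports a server as down after `max_failures` consecutive failed probes. The names `LifeCheck`, `is_alive`, `max_failures` and the default values are assumptions. The timeout matters because a powered-off or unplugged host usually does not refuse the connection, so the probe only fails once the timeout expires.

```python
import socket
import time
from typing import Dict, Iterable, List, Tuple

Server = Tuple[str, int]  # (host, port); hypothetical representation


def is_alive(server: Server, timeout: float) -> bool:
    """True if a TCP connection to the server succeeds within `timeout` seconds."""
    try:
        with socket.create_connection(server, timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable, or timed out
        return False


class LifeCheck:
    """Hypothetical life-check: a server is reported down after
    `max_failures` consecutive failed probes."""

    def __init__(self, servers: Iterable[Server], max_failures: int = 3,
                 timeout: float = 2.0) -> None:
        self.servers: List[Server] = list(servers)
        self.max_failures = max_failures
        self.timeout = timeout
        self.failures: Dict[Server, int] = {s: 0 for s in self.servers}

    def probe_once(self) -> None:
        for s in self.servers:
            if is_alive(s, self.timeout):
                self.failures[s] = 0  # any successful probe resets the counter
            else:
                self.failures[s] += 1

    def down_servers(self) -> List[Server]:
        return [s for s in self.servers if self.failures[s] >= self.max_failures]

    def run(self, rounds: int, interval: float) -> List[Server]:
        """Probe `rounds` times, sleeping `interval` seconds between rounds."""
        for i in range(rounds):
            self.probe_once()
            if i < rounds - 1:
                time.sleep(interval)
        return self.down_servers()
```

Requiring several consecutive failures trades detection speed for fewer false alarms on a briefly congested network; the worst-case detection time is roughly `max_failures * (interval + timeout)`.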
> So to simulate a failure, just 'power off' the VMs you wish to appear as
> failed, i.e. simulate a non-graceful shutdown/system crash.
>
> Chris
>
>
>> Hi Tom,
>>
>> Whether an unplugged cable is detected as a network error depends on
>> the OS.
>> On Linux, an application cannot detect it from a socket error.
>>
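One common way for an application to eventually notice a peer that vanished without closing its connection is TCP keepalive. On Linux sockets keepalive is off by default, and the system-wide idle time before the first probe is typically two hours. The Python sketch below is an illustration, not something PGCluster is known to do; the function name `enable_keepalive` and the timing values are assumptions. With these values a dead, idle connection would be reported after roughly `idle + interval * count` = 25 seconds. Keepalive only probes idle connections: data that was sent but never acknowledged is governed by the TCP retransmission timeout instead.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 10,
                     interval: int = 5, count: int = 3) -> None:
    """Turn on TCP keepalive so that a peer which vanished silently
    (unplugged cable, powered-off host) eventually causes a socket error.

    idle     -- seconds of inactivity before the first probe
    interval -- seconds between unanswered probes
    count    -- unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform-specific (Linux has all
    # three), so only set the ones this Python build exposes.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

After calling `enable_keepalive(conn)` on a connected socket, a blocked `recv()` on a dead idle connection should fail with an `OSError` (typically `ETIMEDOUT`) instead of waiting indefinitely.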
>> My multi-node test is running on VMware,
>> so I cannot unplug a cable.
>>
>> Regards,
>> -----------------
>> At.Mitani
>>
>>> Did you try unplugging the Ethernet cable of node-A instead of bringing
>>> down the node-A services?
>>> Your test (with the exception of the recovery start) works for me, except
>>> that if I unplug node-A from the network, things don't work.
>>> Also, I guess the results would be much the same if you cut the
>>> power to node-A instead of unplugging it from the network.
>>>
>>> Regards,
>>> --
>>> Tom;
>>>
>>> a.mitani at sra-europe.com
>>>> Hi Tom,
>>>>
>>>>
>>>>> Again, I have nodes A and B, both with a cluster and a replication server
>>>>> (synchronized). When node A or B gets disconnected from the network
>>>>> (whether the active replicator is on A or B), attempting to execute a
>>>>> write query on the connected DB hangs the query indefinitely.
>>>>> The conclusion: failover doesn't work as one might expect, because I
>>>>> have N failure points for N nodes :( .
>>>>>
>>>>> Is this normal, or did it just happen to me?
>>>>>
>>>>
>>>> In my test environment, cluster A recovers normally.
>>>> The test was done as follows.
>>>>
>>>> (1) start cluster DBs
>>>> cluster db A started in node A.
>>>> cluster db B started in node B.
>>>>
>>>> (2) start replication servers
>>>> replicator A started in node A.
>>>> replicator B started in node B.
>>>>
>>>> (3) create DB
>>>> send the "createdb" command to cluster A.
>>>>
>>>> (4) stop node A
>>>> cluster db A and replicator A were brought down.
>>>>
>>>> (5) run pgbench
>>>> run pgbench on cluster B.
>>>>
>>>> (6) restart cluster db A
>>>> start cluster A with the "-R" option on node A.
>>>> <<cluster db A recovered normally.>>
>>>>
>>>> (7) restart replicator A
>>>> just start replicator A on node A.
>>>>
>>>> (8) run pgbench again
>>>> run pgbench on cluster db B.
>>>> <<it was replicated to cluster db A as well.>>
>>>>
>>>> I tried the same test with 3 nodes.
>>>> The result was the same (the stopped cluster db recovered normally).
>>>>
>>>> I believe that this recovery function works correctly.
>>>> However, it does not seem easy to find the problem with the present package.
>>>>
>>>> I'll send you a patch that adds debug messages.
>>>>
>>>>
>>>> At.Mitani
>>>>
>>>> _______________________________________________
>>>> Pgcluster-general mailing list
>>>> Pgcluster-general at pgfoundry.org
>>>> http://pgfoundry.org/mailman/listinfo/pgcluster-general