[Pgcluster-general] Failover weakness
a.mitani at sra-europe.com
Thu Jan 11 21:19:47 UTC 2007
Hi Tom,
Whether an unplugged cable is detected as a network error depends on the OS.
An application on Linux cannot detect it from a socket error.
My multi-node test is running on VMware.
Therefore, I cannot physically unplug a cable from it.
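
To illustrate the point about socket errors, here is a minimal sketch in Python (not PGCluster code; the function names and the timing values are only assumptions for the example). When a cable is pulled, the peer sends no FIN or RST, so a blocking read simply waits. Two common workarounds are to ask the kernel for TCP keepalive probes and to put a timeout on the read:

```python
import socket


def enable_keepalive(sock, idle=10, interval=5, probes=3):
    """Ask the kernel to probe an idle TCP connection (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning options are platform-dependent, so set only those that exist.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before the connection is dropped
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


def peer_is_silent(sock, timeout):
    """Return True if no data arrives within `timeout` seconds (hypothetical helper)."""
    sock.settimeout(timeout)
    try:
        sock.recv(1)
    except socket.timeout:
        return True
    # Data arrived, or recv returned b"" because the peer closed cleanly.
    return False
```

Neither approach reports the failure immediately. Keepalive only fires after the idle time and the probes have elapsed, and the read timeout cannot tell an unplugged cable from a peer that is merely slow.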
Regards,
-----------------
At.Mitani
> Did you try to unplug the ethernet cable of node-A instead of bringing down
> node-A services?
> Your test (with the exception of recovery start) works for me, except
> that if I unplug node-A from the network, things don't work.
> Also, I guess that the results will be pretty much the same if you cut the
> power of node-A instead of unplugging it from the network.
>
> Regards,
> --
> Tom;
>
> a.mitani at sra-europe.com
>> Hi Tom,
>>
>>
>>> Again, I have nodes A and B both with cluster and replication server
>>> (synchronized). When node A or B gets disconnected from the network
>>> (whether the active replicator is in A or B), attempting to execute a
>>> write query on the connected DB hangs the query for, say, infinite
>>> time.
>>> The conclusion: failover doesn't work as one might expect, because I
>>> have N fail-points for N nodes :( .
>>>
>>> Is this normal, or did it just happen to me?
>>>
>>
>> In my test environment, cluster A recovered normally.
>> The test was done as follows.
>>
>> (1) start cluster DBs
>> cluster db A started in node A.
>> cluster db B started in node B.
>>
>> (2) start replication servers
>> replicator A started in node A.
>> replicator B started in node B.
>>
>> (3) create DB
>> send "createdb" command to cluster A.
>>
>> (4) stop node A
>> cluster db A and replicator A were shut down.
>>
>> (5) run pgbench
>> run pgbench in cluster B.
>>
>> (6) re-start cluster db A
>> start cluster A with "-R" option in node A.
>> <<cluster db A recovered normally.>>
>>
>> (7) re-start replicator A
>> just start replicator A in node A
>>
>> (8) run pgbench again
>> run pgbench in cluster db B.
>> <<it was replicated in cluster db A as well.>>
>>
>> I tried the same test with 3 nodes.
>> The result is the same (the stopped cluster db recovers normally).
>>
>> I believe that this recovery function works normally.
>> However, it does not seem easy to find the problem with the present package.
>>
>> I'll send you a patch that adds debug messages.
>>
>>
>> At.Mitani
>>
>> _______________________________________________
>> Pgcluster-general mailing list
>> Pgcluster-general at pgfoundry.org
>> http://pgfoundry.org/mailman/listinfo/pgcluster-general
>>