[Pgcluster-general] Failover weakness
a.mitani at sra-europe.com
Thu Jan 11 16:59:16 UTC 2007
Hi Tom,
> Again, I have nodes A and B both with cluster and replication server
> (synchronized). When node A or B gets disconnected from the network
> (whether the active replicator is in A or B), attempting to execute a
> write query on the connected DB hangs the query for, say, infinite time.
> The conclusion, failover doesn't work as one might expect because I have
> N fail-points for N nodes :( .
>
> Is this normal, or just happened to me?
In my test environment, cluster DB A recovers normally.
The test was done as follows.
(1) Start the cluster DBs.
Cluster DB A was started on node A.
Cluster DB B was started on node B.
(2) Start the replication servers.
Replicator A was started on node A.
Replicator B was started on node B.
(3) Create a DB.
Sent the "createdb" command to cluster DB A.
(4) Stop node A.
Cluster DB A and replicator A went down.
(5) Run pgbench.
Ran pgbench against cluster DB B.
(6) Restart cluster DB A.
Started cluster DB A with the "-R" option on node A.
<<Cluster DB A recovered normally.>>
(7) Restart replicator A.
Just started replicator A on node A.
(8) Run pgbench again.
Ran pgbench against cluster DB B.
<<It was replicated to cluster DB A as well.>>
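One rough way to confirm the result of step (8) is to compare row counts of the pgbench tables on every node once pgbench has finished. The following is only a sketch of such a check, not part of PGCluster: the host names "nodeA" and "nodeB", the database name "testdb", and the helper names are assumptions made for illustration, and matching row counts do not prove that the contents are identical.

```python
import subprocess

# Tables created by "pgbench -i" in PostgreSQL releases of this era
# (newer releases prefix them with "pgbench_").
PGBENCH_TABLES = ["accounts", "branches", "tellers", "history"]


def psql_count(host, dbname, table):
    """Return SELECT count(*) for one table by calling psql against one host."""
    result = subprocess.run(
        ["psql", "-h", host, "-d", dbname, "-At",
         "-c", f"SELECT count(*) FROM {table}"],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def find_mismatches(hosts, dbname, tables=PGBENCH_TABLES, count=psql_count):
    """Map each table whose row count differs between hosts to its per-host counts."""
    mismatches = {}
    for table in tables:
        counts = {host: count(host, dbname, table) for host in hosts}
        if len(set(counts.values())) > 1:
            mismatches[table] = counts
    return mismatches
```

Calling find_mismatches(["nodeA", "nodeB"], "testdb") would return an empty dict when the counts agree on both nodes, and the same call with three host names covers a 3-node setup. The count argument can be swapped for another function if psql is not the preferred way to reach the nodes.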
I tried the same test with 3 nodes.
The result was the same (the stopped cluster DB recovered normally).
I believe this recovery function works normally.
However, it does not seem easy to find the problem with the present package.
I'll send you a patch that adds debug messages.
At.Mitani