[Pgcluster-general] Failover weakness

a.mitani at sra-europe.com
Thu Jan 11 16:59:16 UTC 2007


Hi Tom,

> Again, I have nodes A and B, both with a cluster DB and a replication server
> (synchronized). When node A or B is disconnected from the network
> (whether the active replicator is on A or B), executing a
> write query on the connected DB hangs the query indefinitely.
> In conclusion, failover doesn't work as one might expect, because I have
> N points of failure for N nodes :( .
>
> Is this normal, or did it just happen to me?

In my test environment, cluster DB A recovered normally.
The test was done as follows.

(1) Start the cluster DBs.
Cluster DB A started on node A.
Cluster DB B started on node B.

(2) Start the replication servers.
Replicator A started on node A.
Replicator B started on node B.

(3) Create a DB.
Sent the "createdb" command to cluster DB A.

(4) Stop node A.
Cluster DB A and replicator A were shut down.

(5) Run pgbench.
Ran pgbench against cluster DB B.

(6) Restart cluster DB A.
Started cluster DB A on node A with the "-R" option.
<<Cluster DB A recovered normally.>>

(7) Restart replicator A.
Simply started replicator A on node A.

(8) Run pgbench again.
Ran pgbench against cluster DB B.
<<The changes were replicated to cluster DB A as well.>>
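As a rough illustration only, the expected outcome of the steps above can be modelled in a few lines of Python. This is a hypothetical toy, not PGCluster code: `Node`, `replicate` and `recover` are invented names, and recovery is simplified to copying the full data set from a live peer, which is only an assumption about what starting with "-R" does.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one node: a cluster DB plus its replicator."""
    name: str
    db_up: bool = True
    replicator_up: bool = True
    rows: list = field(default_factory=list)


def replicate(nodes, row):
    """Require at least one live replicator, then apply the write to every live DB."""
    if not any(n.replicator_up for n in nodes):
        raise RuntimeError("no replication server available")
    for n in nodes:
        if n.db_up:
            n.rows.append(row)


def recover(node, peers):
    """Toy analogue of starting a cluster DB with -R (assumption): copy data from a live peer."""
    source = next(p for p in peers if p.db_up and p is not node)
    node.rows = list(source.rows)
    node.db_up = True


def run_test():
    a, b = Node("A"), Node("B")                  # (1)+(2) DBs and replicators up
    nodes = [a, b]
    replicate(nodes, "createdb")                 # (3) create DB
    a.db_up = a.replicator_up = False            # (4) stop node A
    for i in range(3):                           # (5) pgbench against B
        replicate(nodes, f"txn-{i}")
    recover(a, nodes)                            # (6) restart cluster DB A with -R
    a.replicator_up = True                       # (7) restart replicator A
    for i in range(3, 6):                        # (8) pgbench against B again
        replicate(nodes, f"txn-{i}")
    return a, b


a, b = run_test()
print(a.rows == b.rows, len(a.rows))
```

In this model both nodes end with identical rows, matching the result reported in step (8). It does not reproduce the hang you describe; it only shows the behaviour I expected and observed.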

I tried the same test with 3 nodes.
The result was the same (the stopped cluster DB recovered normally).

I believe that this recovery function works normally.
However, it does not seem easy to track down your problem with the present package.

I'll send you a patch that adds debug messages.


At.Mitani


