[Pgcluster-general] what's happening here? As I recover it?

AJ Kertis akertis at gmail.com
Thu Mar 24 21:36:18 GMT 2005


I've looked into this some more. For me at least, most of these
errors seem to come from rsync itself: I can reproduce the same errors
with pgcluster disabled, just by running rsync manually. I don't know
whether it's a bug in rsync or a problem with the options we pass to
it; I don't know that much about rsync. It usually seems to expect some
pg_xlog file that isn't there, so I can't tell what goes wrong in the
transfer. But once a recovery fails like that, for example by not
getting a pg_xlog file, it's hard to start the postmaster: the xlog
files no longer match, while the base directory may already contain
something newer. That's why pg_resetxlog sometimes fixes the problem.
I'm currently running rsync 2.6.3 on both servers here.
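
For anyone who wants to try the same manual test, here is a rough
sketch of how I would script it. The subdirectory list is taken from
the recovery output quoted below, and the rsync path and "ssh -2" come
from the quoted cluster.conf. The -a and --delete flags and the
function names (build_rsync_cmd, sync_all) are my own assumptions for
illustration, not necessarily what pgcluster passes to rsync.

```python
import shlex
import subprocess

# Subdirectories the recovery output in this thread walks through.
DATA_SUBDIRS = ["global", "base", "pg_clog", "pg_xlog", "pg_subtrans", "pg_tblspc"]


def build_rsync_cmd(master, data_dir, subdir,
                    rsync_path="/usr/bin/rsync", remote_shell="ssh -2"):
    """Build one rsync command that pulls `subdir` from the master's data directory.

    -a and --delete are assumptions made for this sketch.
    """
    src = f"{master}:{data_dir}/{subdir}/"
    dst = f"{data_dir}/{subdir}/"
    return [rsync_path, "-a", "--delete", "-e", remote_shell, src, dst]


def sync_all(master, data_dir, dry_run=True):
    """Print (dry run) or execute the rsync for each subdirectory.

    Returns {subdir: exit code}, with None for commands that were only printed.
    """
    results = {}
    for subdir in DATA_SUBDIRS:
        cmd = build_rsync_cmd(master, data_dir, subdir)
        if dry_run:
            print(shlex.join(cmd))
            results[subdir] = None
        else:
            results[subdir] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    # Dry run by default: only prints the commands it would execute.
    sync_all("srv5.example.com", "/usr/local/pgsql/data")
```

Run with dry_run=False on the node being recovered, with its
postmaster stopped. A non-zero exit code for a subdirectory (the log
below shows code 23, "some files could not be transferred") points at
the step that fails without pgcluster involved.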

AJ Kertis


Jorge Enrique Gallegos Grunauer wrote:

>My English isn't good, but here I am.
>
>I think the installation and configuration of pgcluster on the
>test computers are correct. Here are the configuration files:
>
>**********************************
>hosts (and network)
>-------------------
>#
>#local
>#
>127.0.0.1         localhost.localdomain localhost
>
>#
># replicator
>#
>192.168.1.4       srv4.example.com 
>
>#
>#clusters
>#
>192.168.1.5       srv5.example.com
>192.168.1.6       srv6.example.com
>
>
>**********************************
>Cluster.conf
>------------
>#          Cluster DB Server configuration file
>#------------------------------------------------------------
># file: cluster.conf
>#------------------------------------------------------------
># This file controls:
>#       o which hosts & port are replication server
>#       o which port use for replication request to replication server
>#       o which command use for recovery function
>#============================================================
>#------------------------------------------------------------
># set Replication Server information
>#		o Host_Name : hostname
>#		o Port : connection for postmaster
>#		o Recovery_Port : connection for recovery process
>#------------------------------------------------------------
><Replicate_Server_Info>
>	<Host_Name> srv4.example.com </Host_Name>
>	<Port> 8001 </Port>
>	<Recovery_Port> 8101 </Recovery_Port>
>	<LifeCheck_Port> 8201 </LifeCheck_Port>
></Replicate_Server_Info>
>#-------------------------------------------------------------
># set Cluster DB Server information
>#		o Recovery_Port : connection for recovery
>#		o Rsync_Path : path of rsync command 
>#		o Rsync_Option : file transfer option for rsync
>#       o When_Stand_Alone : When all replication servers fell,
>#                            you can set up two kinds of permission,
>#                            "read_only" or "read_write".
>#-------------------------------------------------------------
><Recovery_Port> 7101 </Recovery_Port>
><LifeCheck_Port> 7201 </LifeCheck_Port>
><Rsync_Path> /usr/bin/rsync </Rsync_Path>
><Rsync_Option> ssh -2 </Rsync_Option>
><When_Stand_Alone> read_write  </When_Stand_Alone>
><Status_Log_File>  /tmp/cluster.sts </Status_Log_File>
><Error_Log_File> /tmp/cluster.log  </Error_Log_File>
>#-------------------------------------------------------------
># set partitional replicate control information
>#     set DB name and Table name to stop replication
>#       o DB_Name : DB name
>#       o Table_Name : table name
>#-------------------------------------------------------------
>#<Not_Replicate_Info>
>#	<DB_Name>     test_db      </DB_Name>
>#	<Table_Name>  log_table    </Table_Name>
>#</Not_Replicate_Info>
>
>**************************
>pgreplicate.conf
>----------------
>#=============================================================
>#  PGReplicate configuration file
>#                                     for  PGCluster-1.1.0a
>#-------------------------------------------------------------
># file: pgreplicate.conf
>#-------------------------------------------------------------
># This file controls:
>#       o which hosts & port are cluster server
>#       o which port use for replication request from cluster server
>#=============================================================
>#
>#-------------------------------------------------------------
># A setup of Cluster DB(s)
>#
>#		o Host_Name : The host name of Cluster DB.
>#		              -- please write a host name by FQDN.
>#		              -- do not write IP address.
>#		o Port : The connection port with postmaster.
>#		o Recovery_Port : The connection port at the time of 
>#		                  a recovery sequence .
>#		o LifeCheck_Port : connection for life check process
>#-------------------------------------------------------------
><Cluster_Server_Info>
>    <Host_Name>   srv5.example.com </Host_Name>
>    <Port>                5432        </Port>
>    <Recovery_Port>       7101        </Recovery_Port>
>    <LifeCheck_Port>      7201        </LifeCheck_Port>
></Cluster_Server_Info>
><Cluster_Server_Info>
>    <Host_Name>   srv6.example.com   </Host_Name>
>    <Port>                5432        </Port>
>    <Recovery_Port>       7101        </Recovery_Port>
>    <LifeCheck_Port>      7201        </LifeCheck_Port>
></Cluster_Server_Info>
>#
>#-------------------------------------------------------------
># A setup of Load Balance Server
>#
>#		o Host_Name : The host name of a load balance server.
>#		              -- please write a host name by FQDN.
>#		              -- do not write IP address.
>#		o Recovery_Port : The connection port at the time of 
>#		                  a recovery sequence .
>#		o LifeCheck_Port : connection for life check process
>#-------------------------------------------------------------
>#<LoadBalance_Server_Info>
>#    <Host_Name>   loadbalancer.postgres.jp  </Host_Name>
>#    <Recovery_Port>       6101        </Recovery_Port>
>#    <LifeCheck_Port>      6201        </LifeCheck_Port>
>#</LoadBalance_Server_Info>
>#
>#------------------------------------------------------------
># A setup of the upper replication server for cascade connection.
>#
>#		o Host_Name : The host name of Cluster DB.
>#		              -- please write a host name by FQDN.
>#		              -- do not write IP address.
>#		o Port : The connection port with postmaster.
>#		o Recovery_Port : The connection port at the time of 
>#		                  a recovery sequence .
>#		o LifeCheck_Port : connection for life check process
>#------------------------------------------------------------
>#<Replicate_Server_Info>
>#    <Host_Name> upper_replicate.postgres.jp </Host_Name>
>#    <Port>                   8001           </Port>
>#    <Recovery_Port>          8101           </Recovery_Port>
>#    <LifeCheck_Port>         8201           </LifeCheck_Port>
>#</Replicate_Server_Info>
>#
>#-------------------------------------------------------------
># A setup of a replication server
>#
>#		o Status_Log_File : logging file of cluster db's status
>#		o Error_Log_File : logging file of error and warning
>#		o Replicate_Port : connection for replication
>#		o Recovery_Port : connection for recovery
>#		o LifeCheck_Port : connection for life check process
>#		o Response_mode : timing which returns a response
>#		  normal   -- return result of DB which received the query
>#		  reliable -- return result after waiting for response of 
>#                      all Cluster DBs.
>#		o Use_Replication_Log : When this server hangs up without
>#                               being replicated to the end,
>#                               a remote server continues the
>#                               replication using this log. 
>#		  yes  --  use replication log
>#		  no   --  not use replication log
>#		o Reserved_Connections : The number of reserved connections
>#                                from this replication server
>#                                to each cluster dbs.
>#                                (default is 1).
>#-------------------------------------------------------------
><Status_Log_File>  /tmp/pgreplicate.sts  </Status_Log_File>
><Error_Log_File>   /tmp/pgreplicate.log  </Error_Log_File>
><Replication_Port>       8001            </Replication_Port>
><Recovery_Port>          8101            </Recovery_Port>
><LifeCheck_Port>         8201            </LifeCheck_Port>
><RLOG_Port>              8301            </RLOG_Port>
><Response_Mode>        normal            </Response_Mode>
><Use_Replication_Log>      no            </Use_Replication_Log>
><Reserved_Connections>      1            </Reserved_Connections>
>
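
Since the two files above have to agree on host names and ports, a
small helper for listing their settings side by side might be handy.
This is only a sketch: it assumes each setting sits on one line as
<Tag> value </Tag> and that lines starting with '#' are comments, which
is what the files quoted here look like. The name read_settings is
made up for illustration.

```python
import re

# One simple setting per line: <Tag> value </Tag>
TAG = re.compile(r"<(\w+)>\s*([^<>]*?)\s*</\1>")


def read_settings(conf_text):
    """Collect {tag: [values, ...]} from a pgcluster-style config text.

    Comment lines ('#') are skipped. Section tags that only open or close
    a block (e.g. <Cluster_Server_Info>) do not match and are ignored, so
    nesting is flattened.
    """
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue
        for tag, value in TAG.findall(line):
            settings.setdefault(tag, []).append(value)
    return settings


if __name__ == "__main__":
    with open("/usr/local/pgsql/etc/pgreplicate.conf") as fh:
        for tag, values in read_settings(fh.read()).items():
            print(tag, values)
```

Comparing the Host_Name and port lists from both files with the host
names that show up in the debug output is a quick way to spot a
mismatch.
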
>
>***********************************
>***********************************
>
>OK. In the tests, when I turn both clusters on and activate the
>replicator, they work correctly. But when a cluster needs to be
>restored, I get some errors:
>
>---- CLUSTER
>$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -o "-R" start
>postmaster starting
>Start in recovery mode!
>Please wait until a data synchronization finishes from Master DB...
>PGR_Get_Cluster_Conf_Data failed
>----
>
>Then I restart the replication service:
>
>---- REPLICATOR
># pgreplicate -l -n -v -D /usr/local/pgsql/etc/ -U postgres
>DEBUG:replicate_main():replicate main 8001 port bind OK
>DEBUG:PGRreplicate_packet_send():cmdSts=N
>DEBUG:PGRreplicate_packet_send():cmdType=
>DEBUG:PGRreplicate_packet_send():rlog=0
>DEBUG:PGRreplicate_packet_send():request_id=0
>DEBUG:PGRreplicate_packet_send():replicate_id=0
>DEBUG:PGRreplicate_packet_send():port=0
>DEBUG:PGRreplicate_packet_send():pid=0
>DEBUG:PGRreplicate_packet_send():from_host=srv4.exaple.com
>DEBUG:PGRreplicate_packet_send():dbName=template1
>DEBUG:PGRreplicate_packet_send():userName=postgres
>DEBUG:PGRreplicate_packet_send():recieve sec=0
>DEBUG:PGRreplicate_packet_send():recieve usec=0
>DEBUG:PGRreplicate_packet_send():query_size=79
>DEBUG:PGRreplicate_packet_send():query=SELECT
>PGR_SYSTEM_COMMAND_FUNCTION(1,'srv4.exaple.com',8001,8101,8201)
>DEBUG:sem_lock[1]
>DEBUG:pgr_createConn():PQsetdbLogin host[srv5.example.com]
>port[5432] db[template1] user[postgres]
>DEBUG:pgr_createConn():PQsetdbLogin host[srv6.example.com]
>port[5432] db[template1] user[postgres]
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin  timeout
>ERROR:setTransactionTbl():New Transaction but
>pgr_createConn5432 at srv6.example.com failed
>DEBUG:deleteTransactionTbl(): getTransactionTbl failed
>DEBUG:pgr_createConn():PQsetdbLogin ok
>DEBUG:sem_unlock[1]
>
>----
>
>And then I do:
>
>---- CLUSTER
>$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -o "-R" start
>postmaster starting
>Start in recovery mode!
>Please wait until a data synchronization finishes from Master DB...
>1st recovery step of [global] directory...NG
>PGR_Get_Cluster_Conf_Data failed
>
>----
>
>or
>
>---- CLUSTER
>$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -o "-R" start
>postmaster starting
>Start in recovery mode!
>Please wait until a data synchronization finishes from Master DB...
>1st recovery step of [global] directory...OK
>1st recovery step of [base] directory...OK
>1st recovery step of [pg_clog] directory...OK
>1st recovery step of [pg_xlog] directory...OK
>1st recovery step of [pg_subtrans] directory...OK
>1st recovery step of [pg_tblspc] directory...OK
>1st sync_table_space OK
>2nd recovery step of [global] directory...OK
>2ndt recovery step of [base] directory...OK
>2nd recovery step of [pg_clog] directory...OK
>2nd recovery step of [pg_xlog] directory...OK
>2nd recovery step of [pg_subtrans] directory...rsync: stat
>"/usr/local/pgsql/data/base/22637/.16691.a6gpRI" failed: No such
>file or directory (2)
>rsync: rename "/usr/local/pgsql/data/base/22637/.16691.a6gpRI"
>-> "base/22637/16691": No such file or directory (2)
>OK
>2nd recovery step of [pg_tblspc] directory...OK
>2nd sync_table_space OK
>LOG:  could not create IPv6 socket: Address family not supported
>by protocol
>LOG:  database system was interrupted at 2005-03-23 13:28:19 ECT
>LOG:  could not open file
>"/usr/local/pgsql/data/pg_xlog/000000010000000000000002" (log
>file 0, segment 2): No such file or directory
>LOG:  invalid primary checkpoint record
>LOG:  could not open file
>"/usr/local/pgsql/data/pg_xlog/000000010000000000000002" (log
>file 0, segment 2): No such file or directory
>LOG:  invalid secondary checkpoint record
>PANIC:  could not locate a valid checkpoint record
>LOG:  startup process (PID 10880) was terminated by signal 6
>LOG:  aborting startup due to startup process failure
>rsync error: some files could not be transferred (code 23) at
>main.c(1146)
>
>rsync: stat
>"/usr/local/pgsql/data/pg_xlog/.000000010000000000000002.pjRiT3"
>failed: No such file or directory (2)
>rsync: rename
>"/usr/local/pgsql/data/pg_xlog/.000000010000000000000002.pjRiT3"
>-> "pg_xlog/000000010000000000000002": No such file or directory (2)
>----
>
>and:
>
>---- REPLICATOR
>DEBUG:pgrecovery_loop():[8]receive packet no:1
>DEBUG:first_setup_recovery():1st setup target srv6.example.com
>DEBUG:first_setup_recovery():1st setup port 5432
>DEBUG:pgr_createConn():PQsetdbLogin host[srv5.example.com]
>port[5432] db[template1] user[postgres]
>DEBUG:pgr_createConn():PQsetdbLogin ok
>DEBUG:send_sync_data():sync_command(SELECT
>PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,0,1) )
>DEBUG:pgrecovery_loop():1st master srv5.example.com - 5432
>DEBUG:pgrecovery_loop():1st target srv6.example.com - 5432
>DEBUG:pgrecovery_loop():first_setup_recovery end :0
>DEBUG:pgrecovery_loop():[8]receive packet no:5
>DEBUG:send_sync_data():sync_command(SELECT
>PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,0,1) )
>DEBUG:pgrecovery_loop():2nd master srv5.example.com - 5432
>DEBUG:pgrecovery_loop():2nd target srv6.example.com - 5432
>DEBUG:pgrecovery_loop():second_setup_recovery end :1
>DEBUG:pgrecovery_loop():[8]receive packet no:9
>DEBUG:pgrecovery_loop():last master srv5.example.com - 5432
>DEBUG:pgrecovery_loop():last target srv6.example.com - 5432
>DEBUG:PGRsend_queue():master srv6.example.com - 5432
>
>DEBUG:PGRsend_queue():target srv6.example.com - 5432
>DEBUG:PGRget_recovery_queue_file_for_read(): read_queue_no[0]
>DEBUG:PGRget_recovery_queue_file_for_read():
>fopen[/usr/local/pgsql/etc//.pgr_recovery.1]
>ERROR:PGRget_recovery_queue_file_for_read():could not open
>recovery queue file as /usr/local/pgsql/etc//.pgr_recovery.1.
>reason: No such file or directory
>
>----
>
>What can I do now? And what is the .pgr_recovery.1 file?
>
>When the restoration does finish OK, I list the records in the
>databases and see some differences between them. Why?
>
>What are the limitations of pgcluster in read_write mode?
>
>
>
>Thanks in advance.
>
>


