[Pgcluster-general] what's happening here? As I recover it?
AJ Kertis
akertis at gmail.com
Thu Mar 24 21:36:18 GMT 2005
I've looked into this some more. For me at least, most of these errors
seem to come from rsync itself: I can reproduce them with pgcluster
disabled, just by running rsync by hand. I don't know whether it's a bug
in rsync or the options we pass to it; I don't know that much about
rsync. What I can see is that rsync usually expects some pg_xlog file to
be there and it isn't, so I can't say exactly what goes wrong in the
transfer. Once a recovery fails like that, for example by not getting a
pg_xlog file, it is hard to start the postmaster: the xlog files no
longer match, while the base directory may already contain newer data.
That's why running pg_resetxlog sometimes fixes the problem. I'm
currently running rsync 2.6.3 on both servers here.
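
In case it helps anyone reproduce this, here is a rough sketch of the
manual check, not what pgcluster itself runs. The host name, data
directory and segment name come from the quoted post below; the two
helper functions and the plain -a option are my own assumptions, and
pgcluster may well call rsync with different options. Exit code 23 is
rsync's "partial transfer due to error", which matches the "some files
could not be transferred (code 23)" line in the log.

```shell
#!/bin/sh
# Sketch only. Function names are made up for illustration; paths and
# host names are the ones from the quoted post.

# Translate the rsync exit codes most relevant here into plain words.
describe_rsync_exit() {
    case "$1" in
        0)  echo "ok" ;;
        23) echo "partial transfer: some files could not be transferred" ;;
        24) echo "partial transfer: source files vanished during the copy" ;;
        *)  echo "rsync failed with exit code $1" ;;
    esac
}

# Succeed only if the given WAL segment exists under DATA_DIR/pg_xlog.
has_xlog_segment() {
    data_dir=$1
    segment=$2
    [ -f "$data_dir/pg_xlog/$segment" ]
}

# Example, run as the postgres user on the node being recovered, with
# the local postmaster stopped (assumed invocation, adjust as needed):
#
#   rsync -a -e "ssh -2" \
#       srv5.example.com:/usr/local/pgsql/data/pg_xlog/ \
#       /usr/local/pgsql/data/pg_xlog/
#   describe_rsync_exit $?
#   has_xlog_segment /usr/local/pgsql/data 000000010000000000000002 \
#       || echo "segment missing: postmaster will not find a checkpoint"
```

If the segment really is missing, running pg_resetxlog with -n on the
data directory first should only print the values it would use, without
changing anything, which seems a safer first look than a real reset.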
AJ Kertis
Jorge Enrique Gallegos Grunauer wrote:
>My English isn't very good, but here goes.
>
>I think pgcluster is installed and configured correctly on the
>test machines. Here are the configuration files:
>
>**********************************
>hosts (and network)
>-------------------
>#
>#local
>#
>127.0.0.1 localhost.localdomain localhost
>
>#
># replicator
>#
>192.168.1.4 srv4.example.com
>
>#
>#clusters
>#
>192.168.1.5 srv5.example.com
>192.168.1.6 srv6.example.com
>
>
>**********************************
>Cluster.conf
>------------
># Cluster DB Server configuration file
>#------------------------------------------------------------
># file: cluster.conf
>#------------------------------------------------------------
># This file controls:
># o which hosts & port are replication server
># o which port use for replication request to replication server
># o which command use for recovery function
>#============================================================
>#------------------------------------------------------------
># set Replication Server information
># o Host_Name : hostname
># o Port : connection for postmaster
># o Recovery_Port : connection for recovery process
>#------------------------------------------------------------
><Replicate_Server_Info>
> <Host_Name> srv4.example.com </Host_Name>
> <Port> 8001 </Port>
> <Recovery_Port> 8101 </Recovery_Port>
> <LifeCheck_Port> 8201 </LifeCheck_Port>
></Replicate_Server_Info>
>#-------------------------------------------------------------
># set Cluster DB Server information
># o Recovery_Port : connection for recovery
># o Rsync_Path : path of rsync command
># o Rsync_Option : file transfer option for rsync
># o When_Stand_Alone : When all replication servers are down,
># you can set one of two kinds of permission,
># "read_only" or "read_write".
>#-------------------------------------------------------------
><Recovery_Port> 7101 </Recovery_Port>
><LifeCheck_Port> 7201 </LifeCheck_Port>
><Rsync_Path> /usr/bin/rsync </Rsync_Path>
><Rsync_Option> ssh -2 </Rsync_Option>
><When_Stand_Alone> read_write </When_Stand_Alone>
><Status_Log_File> /tmp/cluster.sts </Status_Log_File>
><Error_Log_File> /tmp/cluster.log </Error_Log_File>
>#-------------------------------------------------------------
># set partitional replicate control information
># set DB name and Table name to stop replication
># o DB_Name : DB name
># o Table_Name : table name
>#-------------------------------------------------------------
>#<Not_Replicate_Info>
># <DB_Name> test_db </DB_Name>
># <Table_Name> log_table </Table_Name>
>#</Not_Replicate_Info>
>
>**************************
>pgreplicate.conf
>----------------
>#=============================================================
># PGReplicate configuration file
># for PGCluster-1.1.0a
>#-------------------------------------------------------------
># file: pgreplicate.conf
>#-------------------------------------------------------------
># This file controls:
># o which hosts & port are cluster server
># o which port use for replication request from cluster server
>#=============================================================
>#
>#-------------------------------------------------------------
># A setup of Cluster DB(s)
>#
># o Host_Name : The host name of Cluster DB.
># -- please write a host name by FQDN.
># -- do not write IP address.
># o Port : The connection port with postmaster.
># o Recovery_Port : The connection port at the time of
># a recovery sequence .
># o LifeCheck_Port : connection for life check process
>#-------------------------------------------------------------
><Cluster_Server_Info>
> <Host_Name> srv5.example.com </Host_Name>
> <Port> 5432 </Port>
> <Recovery_Port> 7101 </Recovery_Port>
> <LifeCheck_Port> 7201 </LifeCheck_Port>
></Cluster_Server_Info>
><Cluster_Server_Info>
> <Host_Name> srv6.example.com </Host_Name>
> <Port> 5432 </Port>
> <Recovery_Port> 7101 </Recovery_Port>
> <LifeCheck_Port> 7201 </LifeCheck_Port>
></Cluster_Server_Info>
>#
>#-------------------------------------------------------------
># A setup of Load Balance Server
>#
># o Host_Name : The host name of a load balance server.
># -- please write a host name by FQDN.
># -- do not write IP address.
># o Recovery_Port : The connection port at the time of
># a recovery sequence .
># o LifeCheck_Port : connection for life check process
>#-------------------------------------------------------------
>#<LoadBalance_Server_Info>
># <Host_Name> loadbalancer.postgres.jp </Host_Name>
># <Recovery_Port> 6101 </Recovery_Port>
># <LifeCheck_Port> 6201 </LifeCheck_Port>
>#</LoadBalance_Server_Info>
>#
>#------------------------------------------------------------
># A setup of the upper replication server for cascade connection.
>#
># o Host_Name : The host name of Cluster DB.
># -- please write a host name by FQDN.
># -- do not write IP address.
># o Port : The connection port with postmaster.
># o Recovery_Port : The connection port at the time of
># a recovery sequence .
># o LifeCheck_Port : connection for life check process
>#------------------------------------------------------------
>#<Replicate_Server_Info>
># <Host_Name> upper_replicate.postgres.jp </Host_Name>
># <Port> 8001 </Port>
># <Recovery_Port> 8101 </Recovery_Port>
># <LifeCheck_Port> 8201 </LifeCheck_Port>
>#</Replicate_Server_Info>
>#
>#-------------------------------------------------------------
># A setup of a replication server
>#
># o Status_Log_File : logging file of cluster db's status
># o Error_Log_File : logging file of error and warning
># o Replicate_Port : connection for replication
># o Recovery_Port : connection for recovery
># o LifeCheck_Port : connection for life check process
># o Response_mode : timing which returns a response
># normal -- return result of DB which received the query
># reliable -- return result after waiting for response of
># all Cluster DBs.
># o Use_Replication_Log : When this server hangs up without
># being replicated to the end,
># a remote server continues the
># replication using this log.
># yes -- use replication log
># no -- not use replication log
># o Reserved_Connections : The number of reserved connections
># from this replication server
># to each cluster dbs.
># (default is 1).
>#-------------------------------------------------------------
><Status_Log_File> /tmp/pgreplicate.sts </Status_Log_File>
><Error_Log_File> /tmp/pgreplicate.log </Error_Log_File>
><Replication_Port> 8001 </Replication_Port>
><Recovery_Port> 8101 </Recovery_Port>
><LifeCheck_Port> 8201 </LifeCheck_Port>
><RLOG_Port> 8301 </RLOG_Port>
><Response_Mode> normal </Response_Mode>
><Use_Replication_Log> no </Use_Replication_Log>
><Reserved_Connections> 1 </Reserved_Connections>
>
>
>***********************************
>***********************************
>
>OK. In my tests, when I bring both cluster nodes up and start the
>replicator, everything works correctly. But when a cluster node
>needs to be restored, I get some errors:
>
>---- CLUSTER
>$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -o "-R" start
>postmaster starting
>Start in recovery mode!
>Please wait until a data synchronization finishes from Master DB...
>PGR_Get_Cluster_Conf_Data failed
>----
>
>Then I restart the replication service:
>
>---- REPLICATOR
># pgreplicate -l -n -v -D /usr/local/pgsql/etc/ -U postgres
>DEBUG:replicate_main():replicate main 8001 port bind OK
>DEBUG:PGRreplicate_packet_send():cmdSts=N
>DEBUG:PGRreplicate_packet_send():cmdType=
>DEBUG:PGRreplicate_packet_send():rlog=0
>DEBUG:PGRreplicate_packet_send():request_id=0
>DEBUG:PGRreplicate_packet_send():replicate_id=0
>DEBUG:PGRreplicate_packet_send():port=0
>DEBUG:PGRreplicate_packet_send():pid=0
>DEBUG:PGRreplicate_packet_send():from_host=srv4.exaple.com
>DEBUG:PGRreplicate_packet_send():dbName=template1
>DEBUG:PGRreplicate_packet_send():userName=postgres
>DEBUG:PGRreplicate_packet_send():recieve sec=0
>DEBUG:PGRreplicate_packet_send():recieve usec=0
>DEBUG:PGRreplicate_packet_send():query_size=79
>DEBUG:PGRreplicate_packet_send():query=SELECT
>PGR_SYSTEM_COMMAND_FUNCTION(1,'srv4.exaple.com',8001,8101,8201)
>DEBUG:sem_lock[1]
>DEBUG:pgr_createConn():PQsetdbLogin host[srv5.example.com]
>port[5432] db[template1] user[postgres]
>DEBUG:pgr_createConn():PQsetdbLogin host[srv6.example.com]
>port[5432] db[template1] user[postgres]
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin failed. close socket
>ERROR:pgr_createConn():PQsetdbLogin timeout
>ERROR:setTransactionTbl():New Transaction but
>pgr_createConn5432 at srv6.example.com failed
>DEBUG:deleteTransactionTbl(): getTransactionTbl failed
>DEBUG:pgr_createConn():PQsetdbLogin ok
>DEBUG:sem_unlock[1]
>
>----
>
>And then I do:
>
>---- CLUSTER
>$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -o "-R" start
>postmaster starting
>Start in recovery mode!
>Please wait until a data synchronization finishes from Master DB...
>1st recovery step of [global] directory...NG
>PGR_Get_Cluster_Conf_Data failed
>
>----
>
>or
>
>---- CLUSTER
>$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -o "-R" start
>postmaster starting
>Start in recovery mode!
>Please wait until a data synchronization finishes from Master DB...
>1st recovery step of [global] directory...OK
>1st recovery step of [base] directory...OK
>1st recovery step of [pg_clog] directory...OK
>1st recovery step of [pg_xlog] directory...OK
>1st recovery step of [pg_subtrans] directory...OK
>1st recovery step of [pg_tblspc] directory...OK
>1st sync_table_space OK
>2nd recovery step of [global] directory...OK
>2ndt recovery step of [base] directory...OK
>2nd recovery step of [pg_clog] directory...OK
>2nd recovery step of [pg_xlog] directory...OK
>2nd recovery step of [pg_subtrans] directory...rsync: stat
>"/usr/local/pgsql/data/base/22637/.16691.a6gpRI" failed: No such
>file or directory (2)
>rsync: rename "/usr/local/pgsql/data/base/22637/.16691.a6gpRI"
>-> "base/22637/16691": No such file or directory (2)
>OK
>2nd recovery step of [pg_tblspc] directory...OK
>2nd sync_table_space OK
>LOG: could not create IPv6 socket: Address family not supported
>by protocol
>LOG: database system was interrupted at 2005-03-23 13:28:19 ECT
>LOG: could not open file
>"/usr/local/pgsql/data/pg_xlog/000000010000000000000002" (log
>file 0, segment 2): No such file or directory
>LOG: invalid primary checkpoint record
>LOG: could not open file
>"/usr/local/pgsql/data/pg_xlog/000000010000000000000002" (log
>file 0, segment 2): No such file or directory
>LOG: invalid secondary checkpoint record
>PANIC: could not locate a valid checkpoint record
>LOG: startup process (PID 10880) was terminated by signal 6
>LOG: aborting startup due to startup process failure
>rsync error: some files could not be transferred (code 23) at
>main.c(1146)
>
>rsync: stat
>"/usr/local/pgsql/data/pg_xlog/.000000010000000000000002.pjRiT3"
>failed: No such file or directory (2)
>rsync: rename
>"/usr/local/pgsql/data/pg_xlog/.000000010000000000000002.pjRiT3"
>-> "pg_xlog/000000010000000000000002": No such file or directory (2)
>----
>
>and:
>
>---- REPLICATOR
>DEBUG:pgrecovery_loop():[8]receive packet no:1
>DEBUG:first_setup_recovery():1st setup target srv6.example.com
>DEBUG:first_setup_recovery():1st setup port 5432
>DEBUG:pgr_createConn():PQsetdbLogin host[srv5.example.com]
>port[5432] db[template1] user[postgres]
>DEBUG:pgr_createConn():PQsetdbLogin ok
>DEBUG:send_sync_data():sync_command(SELECT
>PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,0,1) )
>DEBUG:pgrecovery_loop():1st master srv5.example.com - 5432
>DEBUG:pgrecovery_loop():1st target srv6.example.com - 5432
>DEBUG:pgrecovery_loop():first_setup_recovery end :0
>DEBUG:pgrecovery_loop():[8]receive packet no:5
>DEBUG:send_sync_data():sync_command(SELECT
>PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,0,1) )
>DEBUG:pgrecovery_loop():2nd master srv5.example.com - 5432
>DEBUG:pgrecovery_loop():2nd target srv6.example.com - 5432
>DEBUG:pgrecovery_loop():second_setup_recovery end :1
>DEBUG:pgrecovery_loop():[8]receive packet no:9
>DEBUG:pgrecovery_loop():last master srv5.example.com - 5432
>DEBUG:pgrecovery_loop():last target srv6.example.com - 5432
>DEBUG:PGRsend_queue():master srv6.example.com - 5432
>
>DEBUG:PGRsend_queue():target srv6.example.com - 5432
>DEBUG:PGRget_recovery_queue_file_for_read(): read_queue_no[0]
>DEBUG:PGRget_recovery_queue_file_for_read():
>fopen[/usr/local/pgsql/etc//.pgr_recovery.1]
>ERROR:PGRget_recovery_queue_file_for_read():could not open
>recovery queue file as /usr/local/pgsql/etc//.pgr_recovery.1.
>reason: No such file or directory
>
>----
>
>What can I do now? And what is the .pgr_recovery.1 file?
>
>Even when the restoration finishes OK, if I list the records in
>the databases afterwards I see some differences between them. Why?
>
>What are the limitations of pgcluster in read_write mode?
>
>
>
>Thanks in advance.