Sunday, April 10, 2011

DB2 9.7 HADR with TSA - Part11 - DB2 HADR useful references

In this article, I list all the useful references I consulted for the DB2 HADR configuration. You may also look at my Twitter account for some useful DB2 links:
http://twitter.com/gilroygonsalves

Part 0 : DB2 9.7 HADR with TSA Part 00 - Introduction










Part 9 : Converting DB2 9.7 HADR no-read-access standby to DB2 9.7 HADR read-only standby database

Part 10 : Configuring DB2 9.7 Automatic Client re-route  (ACR)




Below are the references I used for writing this article.



--  Cleaning up a failed DB2 HADR with TSA configuration

Option 1 => db2haicu -delete
Option 2 =>
                   a) On any node, log in as the "root" user
                   b) Drop the domain by force
                         rmrpdomain -f {domain_name}
                   c) Unset the DBM CFG parameter CLUSTER_MGR

1) HADR with DB2 Express 9.5 and DB2 Control Center

2) HADR Wiki (Best Practice)

3) HADR using db2haicu in DB2 9.5

4) Configure HADR using IBM Data Studio

5) HADR Simulator

6) HA database environment using WebSphere Middleware

7) Improving HA in WebSphere Commerce using DB2 HADR

8) Implement DB2 HADR in a TSA domain

9) DB2 HADR setup with TSA using db2haicu and Virtual IP

10) Configure DB2 Universal Database for UNIX to use OpenSSH

11) Redbook - HADR option for DB2 on LUW (pg 259)

12) Password less SSH configuration

13) IBM TSA for DB2 HA (Blog)

14) Enable database high availability using DB2 HADR and Tivoli SA MP in an SAP environment

15) IBM DB2 High Availability Solution: IBM Tivoli System Automation for Multiplatforms (03/2008)

16) IBM DB2 High Availability Solution: IBM Tivoli System Automation for Multiplatforms

17) DB2 HADR - Case Study of Implementation

18) Build a highly available application platform for J2EE Part 5: Set up DB2 for HA using TSA

19) HA Configuration using IBM TSA with DB2 9.5 FP3 on IBM AIX v5.3

20) Integrating TSAMP with DB2 HADR v9.5

21) Restoring DB in a TSAMP automated DB2 HADR environment

22) DB2 HA configuration using TSA command - Page 113 - 118, Page 137 - 148

23) HA DB2 (Partitioned Database) using Tivoli System Automation

24) Introduction to Tivoli System Automation

25) DB2 Integrated Cluster Environment Deployment Guide Page 359 - 370

26) Clearing out IBM.RecoveryRM.log file

27) Startup/Shutdown Procedure for DB2 9.5/9.7 HADR in TSAMP environment for maintenance reason

28) DB2 HADR Performance Issue Monitoring

29) Setting up TSAMP cluster for maintenance, including node reboot

30) db2stop but no failover in DB2 HADR in TSAMP environment

31) No failover after db2_kill issued in DB2 HADR in TSAMP environment

32) DB2 HADR resource group remains locked after successful takeover

33) SuspendedPropagated for HADR group in TSAMP environment

34) ExcludedList for HADR resource group in TSAMP environment

35) How to stop RSCT (TSAMP) from rebooting a node

36) Nominal State (Desired State) vs Operational State

37) The "resetrsrc" command - A brief how to guide

38) Service IP shows "Failed Offline" state when failover attempted

39) "2612-023" error code when attempting "resetrsrc" on "Failed Offline" standby resource

40) Using DB2 HADR with TSAMP

DB2 9.7 HADR with TSA - Part10 - Configuring DB2 9.7 Automatic Client re-route (ACR)

In this article I provide references to the documents that describe the DB2 9.7 Automatic Client Reroute (ACR) configuration and its limitations.

Part 0 : DB2 9.7 HADR with TSA Part 00 - Introduction










Part 9 : Converting DB2 9.7 HADR no-read-access standby to DB2 9.7 HADR read-only standby database


A) Automatic Client Reroute (ACR) Configuration

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0011976.html


B) Automatic Client Reroute (ACR) Limitations

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0011977.html

DB2 9.7 HADR with TSA - Part 09 - Converting DB2 9.7 HADR no-read-access standby to DB2 9.7 HADR read-only standby database

DB2 9.7 HADR with TSA - Part 08 - Performing some DB2 9.7 HADR failover and failback test

In this article I provide the failover and failback commands for DB2 HADR with TSA, along with DB2 HADR test cases and the steps I followed for testing.

Part 0 : DB2 9.7 HADR with TSA Part 00 - Introduction







 
A) Normal Operation
When DB2 HADR is configured under TSA clustering, the command below shows the status of the TSA resources and resource groups during normal operation.

lssam -top



B) Controlled Failover


1. Current Primary Node => mumbai
2. Current Standby Node => london
3. Node on which failover command is executed => mumbai

Note:
For a failover operation, the TSA command must be executed as the "root" user on the existing primary node.
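
The command itself is not reproduced in the text above. As a hedged sketch, a controlled move of the HADR resource group is commonly requested with the TSA rgreq command, naming the node to move away from. The resource group name db2_db2inst1_db2inst1_SAMPLE-rg is an assumption based on the usual db2haicu naming, so confirm the real name with lssam first. The helper below only prints the command so it can be reviewed before it is run as root.

```shell
#!/bin/sh
# Sketch: print the TSA request that moves a resource group away from a node.
# Assumption: the HADR resource group is named db2_<inst>_<inst>_<DB>-rg;
# confirm the real name with lssam before running anything.
move_request() {
    node_to_leave=$1      # node currently hosting the primary, e.g. mumbai
    resource_group=$2     # HADR resource group name (hypothetical here)
    printf 'rgreq -o move -n %s %s\n' "$node_to_leave" "$resource_group"
}

# Review the printed command, then run it as root on the current primary.
move_request mumbai db2_db2inst1_db2inst1_SAMPLE-rg
```

For the failback test the same helper would be called with london as the node to leave.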





C) Controlled Failback

1. Current Primary Node => london
2. Current Standby Node => mumbai
3. Node on which failback command is executed => london

Note:
For a failback operation, the TSA command must be executed as the "root" user on the existing primary node.


 
D) Primary Instance Failure


In this case, TSA will try to start the Primary Instance Automatically.




E) Standby Instance Failure


In this case, TSA will try to start the Standby Instance Automatically.



F) Primary Node failure


This test scenario needs a third node that acts as a tie-breaker when communication between the HADR nodes is lost. When the communication link between the HADR nodes fails, the network tie-breaker is used to decide which node will own the cluster resources; the other node is rebooted.


To test this scenario:
1) Create the TSA domain with a network quorum that refers to the IP address of a third, non-HADR node.
2) Bring down the eth0 card of the primary node.
3) Bringing down the eth0 card on the primary node forces a hard reboot of the primary node, and TSA performs a forced takeover of HADR on the standby node.
4) After the primary node restarts, its database takes the new role of STANDBY.

G) Stopping TSA monitoring for Database


This stops TSA monitoring of the HADR databases. DB2 HADR itself is not terminated by this operation and continues to work as normal; only automatic failover of DB2 HADR is disabled.







H) Starting TSA Monitoring for Database


This operation starts TSA monitoring of the DB2 HADR database and re-enables automatic failover following a primary node failure.







I) Standby instance TSA Resource Group failure




Primary Enters into disconnected state



Standby Instance Resource Group restored
At this stage, if the standby resource was stopped because of an error, TSA will try to start it. If the standby resource was stopped manually, TSA will not try to start it.




J) Primary Instance TSA Resource Group Failure





Standby enters the DISCONNECTED PEER state because HADR_PEER_WINDOW=300 (seconds)
Standby enters the REMOTE CATCHUP PENDING state after HADR_PEER_WINDOW expires
Standby continues to stay in the REMOTE CATCHUP PENDING state
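
The sequence above can be sketched as a small helper. This is a simplified model rather than DB2's internal logic: it assumes the peer window is counted from the moment contact with the primary is lost, and it uses the single-word spellings DisconnectedPeer and RemoteCatchupPending that db2pd reports. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: which state the standby is expected to report, given the seconds
# elapsed since it lost contact with the primary and HADR_PEER_WINDOW.
# Simplified model; the function name is illustrative only.
expected_standby_state() {
    elapsed=$1
    peer_window=$2
    if [ "$elapsed" -lt "$peer_window" ]; then
        echo "DisconnectedPeer"        # still inside the peer window
    else
        echo "RemoteCatchupPending"    # peer window has expired
    fi
}

expected_standby_state 120 300   # prints DisconnectedPeer
expected_standby_state 301 300   # prints RemoteCatchupPending
```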

Restore the PRIMARY instance RESOURCE group


K) Failover using DB2 TAKEOVER command



L) Failback using DB2 TAKEOVER command


Monday, April 04, 2011

DB2 9.7 HADR with TSA - Part 07 : Configuring DB2 9.7 HADR to work with TSA

In this article I discuss some of the prerequisites for configuring DB2 HADR with TSA and provide some useful references.
a) Pre-Configuration Details
b) Configuring DB2 HADR with TSA
c) Monitoring DB2 HADR with TSA configuration

Part 0 : DB2 9.7 HADR with TSA Part 00 - Introduction






A) Pre-Configuration Details

System on which command will be executed => Primary and Standby (the same commands on both)

Item 1: Archive the current db2diag.log file
db2diag -A

Item 2: Set the database HADR_PEER_WINDOW configuration parameter to a non-zero value. The command below shows the current HADR settings.
db2 get db cfg for sample | grep -i hadr

Item 3: Prepare the environment for TSA configuration
a) Log in as root
su -

b) Execute the TSA command
preprpnode {node1} {node2}

For example,
preprpnode mumbai london

Item 4: Identify your Virtual IP (VIP), DB2 service port and network quorum IP address
a) Virtual IP => 192.168.5.55

b) DB2 Service Port
DBM CFG parameter svcename = 60000

c) Network Quorum IP address
In a production environment this should be the address of a third machine, which becomes the tie-breaker. For testing purposes I am using the IP address of one of the DB2 HADR nodes.
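
For the HADR_PEER_WINDOW item above, the check can be scripted. The sketch below assumes the usual "(HADR_PEER_WINDOW) = value" line in the db2 get db cfg output; the function names are hypothetical, and the value 300 in the comment is only an example.

```shell
#!/bin/sh
# Sketch: read `db2 get db cfg` output on stdin and print HADR_PEER_WINDOW.
# Assumption: the line looks like "... (HADR_PEER_WINDOW) = 300".
peer_window_value() {
    sed -n 's/.*(HADR_PEER_WINDOW) *= *\([0-9][0-9]*\).*/\1/p'
}

# Succeeds only when the value is present and non-zero.
peer_window_is_set() {
    value=$(peer_window_value)
    [ -n "$value" ] && [ "$value" -gt 0 ]
}

# Usage on a DB2 server (not executed here):
#   db2 get db cfg for sample | peer_window_is_set && echo "peer window set"
# If the value is zero it can be changed with, for example:
#   db2 update db cfg for sample using HADR_PEER_WINDOW 300
```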





B) Configuring DB2 HADR with TSA

System on which command will be executed => Primary, Standby

Item 1: Log in as the instance owner
Primary  => No operation
Standby  => su - db2inst1

Item 2: Configure DB2 HADR with TSA
Primary  => No operation
Standby  =>
a) Start the DB2 HA instance configuration utility
db2haicu

b) Refer to the link below for further step-by-step details
ftp://ftp.software.ibm.com/software/data/pubs/papers/HADR_db2haicu.pdf
Item 3: Problem resolution for errors when using virtual machines
The following error may be reported during creation of the cluster domain. This happens when the HADR node was cloned using an OS copy command.

Error
2632-044 The domain cannot be created due to the following errors that were detected while harvesting information from the target nodes:
london: 2632-068 This node has the same internal identifier as mumbai and cannot be included

Action

1) Identify the failing node from the above message in the db2diag.log file. For example, the above message shows that "london" is the node name.

2) Log in as the "root" user on that node

3) Execute the command
/usr/sbin/rsct/install/bin/recfgct

4) Prepare the nodes again on both PRIMARY and SECONDARY
preprpnode mumbai london

 
If you get the following error on the Standby
2011-10-06-17.53.36.820837-240 E11710207E627       LEVEL: Warning
PID     : 15534                TID  : 47621153264496       PROC : db2haicu
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure, sqlhaUICreateHADR, probe:1000
RETCODE : ECF=0x9000056F=-1879046801=ECF_SQLHA_HADR_VALIDATION_FAILED
          The HADR DB failed validation before being added to the cluster
MESSAGE : Standby Node not configured for HADR.
DATA #1 : String, 8 bytes
db2inst1
DATA #2 : String, 8 bytes
db2inst1
DATA #3 : String, 6 bytes
sydney
DATA #4 : String, 6 bytes
london
DATA #5 : String, 6 bytes
SAMPLE


Action
1) Refer to the links below
https://www-304.ibm.com/support/docview.wss?uid=swg21420060
https://www-304.ibm.com/support/docview.wss?uid=swg21443643
http://www.ibm.com/developerworks/data/tutorials/dm-1009db2hadr/section3.html



C) Monitoring DB2 HADR with TSA configuration

System on which command will be executed => Primary, Standby

Item 1: List the TSA resources and resource groups as the "Instance Owner" user
a) Point-in-time snapshot
lssam

b) Continuous snapshot
lssam -top

Item 2: Check network equivalency
lsequ -Ab

Item 3: Identify the communication group
lsrsrc -Ab IBM.NetworkInterface Name IPAddress CommGroup HeartbeatActive NodeNameList

Item 4: Active tie-breaker
lsrsrc -c IBM.PeerNode OpQuorumTieBreaker

Item 5: Service IP resource
lsrsrc -Ab IBM.ServiceIP
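
The lssam snapshot above can also be scanned from a script. As a rough sketch, the helper below reads lssam output on stdin and prints the resources reported as failed offline. It assumes plain-text output in which the state precedes the resource name on each line; the function name and the sample resource names in the comments are illustrative.

```shell
#!/bin/sh
# Sketch: read lssam output on stdin and print the resources whose state is
# "Failed offline". Assumption: plain-text lines such as
#   '- Failed offline IBM.Application:db2_db2inst1_london_0-rs:london
# where the last field is the resource name.
failed_offline_resources() {
    grep -i 'failed offline' | awk '{ print $NF }'
}

# Usage on a cluster node (not executed here):
#   lssam | failed_offline_resources
```

An empty result means no resource was reported in that state at the time of the snapshot.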

D) Some useful resolutions

A) Error "2612-023" when attempting "resetrsrc" on "Failed Offline" standby resource





Action
1) Identify the node on which the resource is shown as "Failed Offline".

2) Log in as the "root" user on the node where it is failing

3) Execute "export CT_MANAGEMENT_SCOPE=2"

4) Execute the following command
resetrsrc -s "Name='db2_db2inst1_db2inst1_SAMPLE-rs' and NodeNameList={'sydney'}" IBM.Application

5) The above command takes the resource out of the "Failed Offline" state.

B) Sometimes the "Resource Group" state is shown as "Offline" even though all the resources under the resource group are Online.
Action
1) Change the Nominal state of all the "Resource Groups" to "Offline" (repeat for each resource group)
chrg -o offline {resource_group}

2) Stop the HADR configuration on the database using below sequence
Primary  =>  db2 stop hadr on database sample
Primary  => db2 deactivate db sample
Primary  => db2stop force
Standby  =>  db2 deactivate db sample
Standby  => db2 stop hadr on db sample
Standby  => db2stop force

3) Log in as the "root" user on any one of the nodes in the cluster and stop the cluster domain
lsrpdomain
stoprpdomain {domain_name}

4) Start the cluster domain as the "root" user on any one node.
startrpdomain {domain_name}
lsrpdomain
lsrpnode

Notes:   The domain takes some time to bring all the services online, so please be patient

5) Change the Nominal state of the instance "Resource Groups" to "Online" using the below sequence
chrg -o online {primary_instance_resource_group}
chrg -o online {standby_instance_resource_group}

Notes:  After execution of the above commands, the instances on both servers are started automatically

6) Start the HADR on the database using below sequence
Primary  =>  db2 activate db sample
Standby  => db2 start hadr on db sample as standby
Primary  => db2 start hadr on db sample as primary
db2pd -db sample -hadr    ........... This shows the state as "Peer"

7) Change the Nominal state of the "HADR Resource Group" to "Online" using the below command
chrg -o online {hadr_resource_group}

8) Check to see if all the resources are showing state as "Online"
lssam
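
Step 6 above relies on db2pd showing the "Peer" state before the HADR resource group is brought online. A minimal sketch of that check follows. It assumes the DB2 9.7 db2pd layout, in which a header line beginning with "Role State" is followed by a line whose second column is the state; later DB2 releases print the information differently. The function names, retry count and sleep interval are illustrative.

```shell
#!/bin/sh
# Sketch: print the HADR state from `db2pd -db <db> -hadr` output on stdin.
# Assumption: DB2 9.7 layout, e.g.
#   Role    State   SyncMode HeartBeatsMissed   LogGapRunAvg (bytes)
#   Primary Peer    Sync     0                  0
hadr_state() {
    awk '$1 == "Role" && $2 == "State" { getline; print $2; exit }'
}

# Poll until the database reports Peer, or give up after some attempts.
wait_for_peer() {
    db=$1
    attempts=${2:-30}
    while [ "$attempts" -gt 0 ]; do
        [ "$(db2pd -db "$db" -hadr | hadr_state)" = "Peer" ] && return 0
        attempts=$((attempts - 1))
        sleep 10
    done
    return 1
}

# Usage as the instance owner (not executed here):
#   wait_for_peer sample && echo "HADR pair is in Peer state"
```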