High Availability is the backbone of any mission-critical database system, and Oracle RAC (Real Application Clusters) ensures seamless failover. But what happens when a node in the cluster fails? 🤔
Here's a step-by-step guide to reinstating a failed node in an Oracle RAC 19c environment.
✅ Steps to Reinstate the Failed Node
🔹 Step 1: Stop Cluster Services on the Failed Node
If the node is partially up, stop all cluster services on it (as root) before proceeding:
crsctl stop crs
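If the stop hangs because of unhealthy resources, a forced stop followed by a status check confirms the stack is actually down (an optional sanity check, run as root on the failed node):
crsctl stop crs -f
crsctl check crs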
🔹 Step 2: Reconfigure SSH Equivalency
Ensure passwordless SSH is set up between all cluster nodes for the Grid Infrastructure owner, and verify node connectivity:
cluvfy comp nodecon -n <node1>,<node2> -verbose
If required, reconfigure SSH user equivalency with the sshUserSetup.sh script shipped with the Oracle software, or set up key-based authentication manually.
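A sketch of both steps, assuming the Grid owner is grid and the nodes are node1 and node2 (replace with your own names), with the script path as typically found under the Grid home:
$GRID_HOME/oui/prov/resources/scripts/sshUserSetup.sh -user grid -hosts "node1 node2" -advanced -noPromptPassphrase   # grid, node1, node2 are example names
cluvfy comp admprv -n node1,node2 -o user_equiv -verbose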
🔹 Step 3: Verify Cluster Configuration
Before re-adding the node, run the node-addition prerequisite checks from a surviving node:
cluvfy stage -pre nodeadd -n <failed_node> -fixup -verbose
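It can also help to list the nodes the cluster currently knows about, along with their status, before proceeding:
olsnodes -s -t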
🔹 Step 4: Add the Failed Node Back to the Cluster
On a surviving node, use addnode.sh:
$GRID_HOME/addnode.sh -silent -responseFile /path/to/addnode.rsp
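If you do not maintain a response file, the new node can be passed inline instead, and the database home is extended the same way afterwards; a sketch assuming the reinstated node is node2 with VIP node2-vip:
$GRID_HOME/addnode.sh -silent "CLUSTER_NEW_NODES={node2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={node2-vip}"   # node2 and node2-vip are example names
$ORACLE_HOME/addnode.sh -silent "CLUSTER_NEW_NODES={node2}"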
🔹 Step 5: Run Root Scripts on the Failed Node
Execute these scripts as root:
$GRID_HOME/root.sh
$ORACLE_HOME/root.sh
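Once the root scripts complete, a quick cross-node check should show the Clusterware stack up everywhere:
crsctl check cluster -all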
🔹 Step 6: Verify Cluster Status
Check the cluster status and ensure the node is back online:
crsctl stat res -t
srvctl status nodeapps -n <failed_node>
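To confirm the database layer as well, check the database and the instance on the reinstated node (racdb and node2 are example names for the database unique name and the node):
srvctl status database -d racdb
srvctl status instance -d racdb -n node2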
🔹 Step 7: Restart Services if Needed
If the Clusterware stack or the database instance does not come up automatically on the reinstated node, start them manually:
crsctl start crs
srvctl start instance -d <db_unique_name> -n <failed_node>
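If the instance was deregistered during recovery, it may need to be added back to the OCR before it can be started; a sketch assuming database racdb, instance racdb2, and node node2:
srvctl add instance -d racdb -i racdb2 -n node2   # racdb, racdb2, node2 are example names
srvctl start instance -d racdb -i racdb2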
🎯 Key Takeaways
✅ Always ensure SSH equivalency before adding a node.
✅ Use cluvfy to verify cluster integrity.
✅ Check the Clusterware logs if issues arise: on 19c they live under $ORACLE_BASE/diag/crs/<node_name>/crs/trace/ (older releases used $GRID_HOME/log/<node_name>/), as shown in the example below.
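For example, to review the tail of the Clusterware alert log on the reinstated node (node2 is an example host name, and the path assumes the 19c ADR layout under the Grid owner's ORACLE_BASE):
tail -100 $ORACLE_BASE/diag/crs/node2/crs/trace/alert.log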