High Availability is the backbone of any mission-critical database system, and Oracle RAC (Real Application Clusters) ensures seamless failover. But what happens when a node in the cluster fails? 🤔
Here’s a step-by-step guide to reinstating a failed node in an Oracle RAC 19c environment.
✅ Steps to Reinstate the Failed Node
🔹 Step 1: Stop Cluster Services on the Failed Node
If the failed node is partially up, stop all cluster services on it (as root) before proceeding:
crsctl stop crs
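A minimal sketch of this step on the failed node, assuming you are logged in as root; the -f flag and the disable/check lines are optional extras, not part of the original command:
# Run as root on the failed node
crsctl stop crs -f       # -f forces the stop if resources are hanging
crsctl disable crs       # optional: prevent an automatic restart while the node is being rebuilt
crsctl check crs         # should now report that the stack is not running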
🔹 Step 2: Reconfigure SSH Equivalency
Ensure passwordless SSH is configured between the nodes, then verify node connectivity (as the Grid Infrastructure owner):
cluvfy comp nodecon -n <node1>,<node2> -verbose
If required, reconfigure SSH equivalency, e.g. with Oracle's sshUserSetup.sh script:
$GRID_HOME/oui/prov/resources/scripts/sshUserSetup.sh -user grid -hosts "<node1> <node2>" -noPromptPassphrase
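A quick sanity check before moving on, run as the Grid Infrastructure owner; the node names below are hypothetical placeholders:
# Each command should return a hostname without prompting for a password
ssh racnode1 hostname
ssh racnode2 hostname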
🔹 Step 3: Verify Cluster Configuration
Run a cluster verification check from a surviving node to confirm the remaining cluster is healthy:
cluvfy stage -post crsinst -n <surviving_node_list> -verbose
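cluvfy also provides a dedicated pre-check for node addition; a sketch of how it might be used from a surviving node is shown below (the node name and log path are hypothetical):
# Run as the Grid Infrastructure owner on a surviving node
cluvfy stage -pre nodeadd -n racnode2 -verbose | tee /tmp/cluvfy_nodeadd.log
grep -i "failed" /tmp/cluvfy_nodeadd.log   # review any failed checks before continuing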
🔹 Step 4: Add the Failed Node Back to the Cluster
On a surviving node, run addnode.sh as the Grid Infrastructure owner. If the failed node is still registered in the cluster, remove it first (crsctl delete node -n <failed_node> as root on a surviving node), then re-add it:
$GRID_HOME/addnode/addnode.sh -silent -responseFile /path/to/addnode.rsp
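If you do not maintain a response file, the same addition can be driven with installer variables on the command line; this is only a sketch, and the node and VIP names are hypothetical:
# Run as the Grid Infrastructure owner from a surviving node
$GRID_HOME/addnode/addnode.sh -silent \
  "CLUSTER_NEW_NODES={racnode2}" \
  "CLUSTER_NEW_VIRTUAL_HOSTNAMES={racnode2-vip}"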
🔹 Step 5: Run Root Scripts on the Failed Node
On the failed node, execute these scripts as root (Grid home first, then the database home):
$GRID_HOME/root.sh
$ORACLE_HOME/root.sh
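Once both scripts complete, you can confirm that the stack actually started on the re-added node; the node name below is a placeholder:
# Run from any node
crsctl check cluster -n racnode2   # clusterware stack status on the re-added node only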
🔹 Step 6: Verify Cluster Status
Check the cluster status and ensure the node is back online:
crsctl stat res -t
srvctl status nodeapps -n <node_name>
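A few additional health checks worth running at this point; the database name below is a hypothetical placeholder:
olsnodes -n -s -t                  # node numbers, Active/Inactive status, pinned state
crsctl check cluster -all          # clusterware stack status on every node
srvctl status database -d orclcdb  # all instances of the database should report as running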
🔹 Step 7: Restart Services if Needed
If the clusterware stack or the database instance did not come back automatically, start them manually:
crsctl start crs
srvctl start instance -d <db_name> -n <node_name>
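If you disabled automatic startup while rebuilding the node, remember to re-enable it; a short sketch (node name is a placeholder):
# As root on the re-added node
crsctl enable crs                  # restore automatic start of the stack on reboot
# As the grid owner, if the VIP/listener/ONS resources are still down
srvctl start nodeapps -n racnode2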
🎯 Key Takeaways
✅ Always ensure SSH equivalency before adding a node.
✅ Use cluvfy to verify cluster integrity.
✅ Check the clusterware logs if issues arise; in 19c most of them live under $ORACLE_BASE/diag/crs/<node_name>/crs/trace rather than the old $GRID_HOME/log/<node_name>/ location (quick log check below).
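A quick way to scan the clusterware alert log on the re-added node; the paths assume the default ADR layout and a hypothetical node name:
tail -50 $ORACLE_BASE/diag/crs/racnode2/crs/trace/alert.log
grep -iE "error|fail" $ORACLE_BASE/diag/crs/racnode2/crs/trace/alert.log | tail -20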