I was working fairly late last night, trying to get a new install working. I ran into a bug where the permissions on ohasd were incorrect after patching GI. I went to a working server to see what the permissions should be, built my chown and chmod statements, and pasted them into my terminal window. Unfortunately, I pasted them into the wrong terminal, and I had also managed to copy the wrong permissions. I changed the ownership on ohasd on the first node of my production RAC cluster. Apparently the permissions are really important, because the whole node went down.
A little bit of panic set in, and I wasn’t sure what I had done. I didn’t realize I had pasted the permission statements into the wrong window, and the error messages weren’t very helpful.
[root@node1 bin]# ./crsctl stop crs -f
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
[root@node1 bin]# ps -ef | grep d.bin
root 33615 1 0 16:53 ? 00:00:00 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
root 53203 1 0 16:57 ? 00:00:00 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
root 79744 1 0 17:07 ? 00:00:00 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
root 105598 103980 0 17:17 pts/2 00:00:00 grep d.bin
[root@node1 bin]# kill -9 33615 53203 79744
[root@node1 bin]# ps -ef | grep d.bin
root 106623 103980 0 17:17 pts/2 00:00:00 grep d.bin
[root@node1 bin]# date
Thu Jan 5 17:18:02 EST 2017
[root@node1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
I was getting nothing in the logs.
I figured it must be a permission issue, but I wasn’t quite sure what to reset them to.
Apparently I am not the first person to do this, since Oracle has a document for fixing it!
How to check and fix file permissions on Grid Infrastructure environment (Doc ID 1931142.1)
I ran
./rootcrs.pl -init
rebooted the node, and all was right with the world, except for my ego being kind of damaged from making such a silly mistake.
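For comparing permissions between a working node and a suspect one, a read-only snapshot is safer than hand-building chown and chmod statements. The sketch below is only an illustration, not something from the Oracle note: snapshot_perms is a hypothetical helper name, and it assumes GNU find (Linux), since -printf is not available in every find implementation.

```shell
#!/usr/bin/env bash
# snapshot_perms is a hypothetical helper, not part of any Oracle tooling.
# It prints "mode owner:group name" for every entry directly inside a
# directory, sorted by name, so two nodes can be compared with diff.
# Assumes GNU find (the -printf option is GNU-specific).
snapshot_perms() {
    local dir=$1
    # %m = octal mode, %u = owner, %g = group, %f = file name without path
    find "$dir" -mindepth 1 -maxdepth 1 -printf '%m %u:%g %f\n' | LC_ALL=C sort -k3
}

# Example use (run the first two lines on each node, then diff the results):
#   snapshot_perms /u01/app/12.1.0.2/grid/bin > /tmp/perms.$(hostname).txt
#   diff /tmp/perms.node1.txt /tmp/perms.node2.txt
```

Because the helper only reads metadata, pasting it into the wrong terminal does no harm, and the diff output shows exactly which files differ before anything is changed.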