Exadata Database Machine consists of a storage grid, a compute grid, and a network grid. Each grid, or hardware layer, is built from multiple high-performance, industry-standard Oracle servers to provide hardware and system fault tolerance. Hardware components can still fail, and the most common failure on Exadata is a hard disk failure in a storage cell. With the latest generations of Exadata, hardware failures are rare and less troublesome.
The Exadata storage cells and compute nodes consist of several hardware components, such as:
- Hard disks
- Flash disks
- Physical memory
- Processors
- InfiniBand (IB) ports
- Motherboard
- Batteries
- Power supplies
In this article, we will demonstrate how to view a hardware fault and clear it using the ILOM fault management shell (faultmgmt).
Steps to display and clear a hardware fault using faultmgmt:
Step 1: Log in to the ILOM of the compute node where the fault occurred (dm01db02 in this example)
[root@dm01db01 ~]# ssh dm01db02-ilom
Password:
Oracle(R) Integrated Lights Out Manager
Version 4.0.0.24 r121523
Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: dm01db02-ilom
Step 2: Check whether the fault management shell is supported. If you see output like the following, the fault manager is supported.
-> show /SP/faultmgmt/shell
/SP/faultmgmt/shell
Targets:
Properties:
Commands:
cd
show
start
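If you save the output of Step 2 (for example, by copying it from the terminal), the support check can be scripted. The following Python sketch assumes the output layout shown above, with a Commands: section that lists one command per line; the function name faultmgmt_shell_supported is hypothetical and not part of ILOM.

```python
def faultmgmt_shell_supported(show_output: str) -> bool:
    """Return True if 'show /SP/faultmgmt/shell' output lists a 'start' command.

    Assumes the layout shown above: the target path, then Targets:,
    Properties: and Commands: sections, one command name per line.
    """
    # Strip indentation and drop blank lines.
    lines = [line.strip() for line in show_output.splitlines() if line.strip()]
    if "/SP/faultmgmt/shell" not in lines or "Commands:" not in lines:
        return False
    # Treat every line after "Commands:" as a command name.
    commands = lines[lines.index("Commands:") + 1:]
    return "start" in commands


SAMPLE = """\
/SP/faultmgmt/shell
    Targets:
    Properties:
    Commands:
        cd
        show
        start
"""

print(faultmgmt_shell_supported(SAMPLE))  # True
```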
Step 3: Start the fault manager shell
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
Step 4: Run the following command to display the fault. Here we can see that there is no problem with the server hardware itself; instead, an ILOM file system has exceeded its capacity limit (problem class defect.ilom.fs.full).
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- --------
Time UUID msgid Severity
------------------- ------------------------------------ -------------- --------
2018-06-17/15:55:32 2a854ad2-4a31-e829-e26c-c84ba212d7f2 ILOM-8000-JV Major
Problem Status : open
Diag Engine : fdd 1.0
System
Manufacturer : Oracle Corporation
Name : Exadata X5-2
Part_Number : Exadata X5-2
Serial_Number : AK00XXXXXX
System Component
Manufacturer : Oracle Corporation
Name : ORACLE SERVER X5-2
Part_Number : 7090664
Serial_Number : 15XXXXXXXX
Firmware_Manufacturer : Oracle Corporation
Firmware_Version : (ILOM)4.0.0.24
Firmware_Release : (ILOM)2017.09.23
----------------------------------------
Suspect 1 of 1
Problem class : defect.ilom.fs.full
Certainty : 100%
Affects : /SYS/SP
Status : faulted
FRU
Status : faulty
Location : /SYS/SP
Manufacturer : Oracle Corporation
Name : SP
Part_Number : PILOT3
Chassis
Manufacturer : Oracle Corporation
Name : ORACLE SERVER X5-2
Part_Number : 7090664
Serial_Number : 1547NM10CX
Description : An ILOM filesystem has exceeded the filesystem capacity
limit.
Response : The chassis wide service-required LED will be illuminated.
Impact : ILOM commands may fail, especially those which make
configuration changes.
Action : Please refer to the associated reference document at
http://support.oracle.com/msg/ILOM-8000-JV for the latest
service procedures and policies regarding this diagnosis.
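When several faults are listed, it can help to pull out the key fields from the fmadm faulty output. The Python sketch below is a minimal parser that assumes the summary-row layout shown above (Time, UUID, msgid and Severity on one line) and attaches the first Problem class line that follows each row. The names parse_fmadm_faulty and Fault are hypothetical, and other ILOM releases may format this output differently.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Summary row: Time, UUID, msgid, Severity (layout as in the sample above).
SUMMARY_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msgid>\S+)\s+(?P<severity>\S+)\s*$"
)
PROBLEM_CLASS_RE = re.compile(r"^\s*Problem class\s*:\s*(?P<cls>\S+)")


@dataclass
class Fault:
    time: str
    uuid: str
    msgid: str
    severity: str
    problem_class: Optional[str] = None


def parse_fmadm_faulty(output: str) -> List[Fault]:
    """Return one Fault per summary row found in 'fmadm faulty' output."""
    faults: List[Fault] = []
    for line in output.splitlines():
        row = SUMMARY_RE.match(line.strip())
        if row:
            faults.append(Fault(**row.groupdict()))
            continue
        cls = PROBLEM_CLASS_RE.match(line)
        # Only the first suspect's problem class is kept for each fault.
        if cls and faults and faults[-1].problem_class is None:
            faults[-1].problem_class = cls.group("cls")
    return faults


SAMPLE = """\
Time                UUID                                 msgid          Severity
2018-06-17/15:55:32 2a854ad2-4a31-e829-e26c-c84ba212d7f2 ILOM-8000-JV   Major
Problem Status    : open
Suspect 1 of 1
   Problem class  : defect.ilom.fs.full
"""

for fault in parse_fmadm_faulty(SAMPLE):
    print(fault.uuid, fault.msgid, fault.severity, fault.problem_class)
```

For the sample above, this prints the UUID, ILOM-8000-JV, Major and defect.ilom.fs.full on one line. Output containing only "No faults found" yields an empty list.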
Step 5: Run the following command to clear the fault
faultmgmtsp> fmadm acquit UUID --> Take the UUID from the output of the previous command.
faultmgmtsp> fmadm acquit 2a854ad2-4a31-e829-e26c-c84ba212d7f2
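To avoid typing the UUID by hand, the acquit commands can be generated from the saved fmadm faulty output. This Python sketch assumes the UUIDs appear in the standard 8-4-4-4-12 hexadecimal form, as in the sample; acquit_commands is a hypothetical helper. Acquit only faults you have reviewed, because acquitting a genuine hardware fault removes it from the list without repairing the component.

```python
import re
from typing import List

# Standard 8-4-4-4-12 hexadecimal UUID, as printed in the fmadm faulty summary row.
UUID_RE = re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b")


def acquit_commands(fmadm_faulty_output: str) -> List[str]:
    """Build one 'fmadm acquit <UUID>' command per distinct UUID, in order of appearance."""
    uuids = UUID_RE.findall(fmadm_faulty_output.lower())
    # dict.fromkeys() removes duplicates while preserving the original order.
    return [f"fmadm acquit {uuid}" for uuid in dict.fromkeys(uuids)]


SAMPLE = "2018-06-17/15:55:32 2a854ad2-4a31-e829-e26c-c84ba212d7f2 ILOM-8000-JV Major"

for command in acquit_commands(SAMPLE):
    print(command)  # fmadm acquit 2a854ad2-4a31-e829-e26c-c84ba212d7f2
```

The generated lines can then be pasted at the faultmgmtsp> prompt one at a time.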
Step 6: Verify that the fault is cleared
faultmgmtsp> fmadm faulty
No faults found
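The Step 6 verification can be scripted in the same way. This sketch assumes that an empty fault list is reported as "No faults found", as shown above, and additionally checks that no fault UUID remains in the output; faults_cleared is a hypothetical name.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID, as printed for each open fault.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def faults_cleared(fmadm_faulty_output: str) -> bool:
    """Return True if fmadm faulty reports no faults and lists no fault UUIDs."""
    return ("No faults found" in fmadm_faulty_output
            and UUID_RE.search(fmadm_faulty_output) is None)


print(faults_cleared("No faults found"))  # True
print(faults_cleared(
    "2018-06-17/15:55:32 2a854ad2-4a31-e829-e26c-c84ba212d7f2 ILOM-8000-JV Major"
))  # False
```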
Step 7: Exit from the fault manager
faultmgmtsp> exit
Step 8: Reset the ILOM service processor
-> reset /SP
Are you sure you want to reset /SP (y/n)? y
Performing reset on /SP
Step 9: Exit from the ILOM
-> exit
Connection to dm01db02-ilom closed.
Step 10: Connect to the ILOM and verify that the ILOM SP has restarted. A low uptime value confirms the reset.
[root@dm01db01 ~]# ssh dm01db02-ilom
Password:
Oracle(R) Integrated Lights Out Manager
Version 4.0.0.24 r121523
Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: dm01db02-ilom
-> show -d properties /SP/clock uptime
/SP/clock
Properties:
uptime = 0 days, 00:08:02
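The uptime value can also be checked programmatically to confirm that the SP was reset recently. This Python sketch assumes the "uptime = N days, HH:MM:SS" format shown above; the function names and the 30-minute threshold are arbitrary assumptions, not ILOM settings.

```python
import re
from datetime import timedelta

# Matches the uptime property as printed by 'show -d properties /SP/clock uptime'.
UPTIME_RE = re.compile(r"uptime\s*=\s*(\d+)\s+days?,\s*(\d{1,2}):(\d{2}):(\d{2})")


def parse_sp_uptime(show_output: str) -> timedelta:
    """Convert 'uptime = N days, HH:MM:SS' into a timedelta."""
    match = UPTIME_RE.search(show_output)
    if match is None:
        raise ValueError("no 'uptime = N days, HH:MM:SS' property found")
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def sp_recently_reset(show_output: str, within: timedelta = timedelta(minutes=30)) -> bool:
    """True if the SP uptime is shorter than 'within' (an arbitrary threshold)."""
    return parse_sp_uptime(show_output) < within


SAMPLE = """\
 /SP/clock
    Properties:
        uptime = 0 days, 00:08:02
"""

print(parse_sp_uptime(SAMPLE))    # 0:08:02
print(sp_recently_reset(SAMPLE))  # True
```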
Conclusion
In this article, we learned how to display and clear a fault using the ILOM fault management shell (faultmgmt). The Fault Management Shell is the preferred method for displaying the details of a diagnosed fault. Support for the faultmgmt command shell varies depending on the ILOM release and the server model, so run the check in Step 2 first.