Introduction
I was working on a hardware (processor) failure on an Exadata X5-2 compute node. An automatic SR was generated for the failure, and an Oracle Field Engineer contacted us and replaced the faulty hardware. Everything went smoothly up to this point, but we noticed that the fault was not cleared automatically even after the replacement, so we ended up clearing it manually.
In this article I will demonstrate how to clear a hardware (processor) fault manually. The same steps can be used to clear faults on other types of hardware by substituting the hardware name/path.
- To identify faulty hardware, execute the following ILOM command:
[root@dm01db01 ~]# ipmitool sunoem cli "show -d properties -level all /SYS/MB fault_state==Faulted"
Connected. Use ^D to exit.
-> show -d properties -level all /SYS/MB fault_state==Faulted
/SYS/MB/P1
Properties:
type = Host Processor
ipmi_name = MB/P1
fru_name = Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
fru_version = 02
fru_part_number = 060F
fault_state = Faulted
clear_fault_action = (none)
-> Session closed
Disconnected
From the output above we can see that Processor P1 (/SYS/MB/P1) is faulty and needs replacement.
You can also check for hardware failures using Web ILOM.
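When more than one component is faulted, it can help to reduce the query output to just the component paths. The following is a minimal sketch, assuming the output has the same layout as the transcript above; `list_faulted_paths` is a hypothetical helper name, not an ILOM or ipmitool command.

```shell
#!/bin/sh
# Hypothetical helper: print the ILOM path of every component listed in the
# output of the "fault_state==Faulted" query. Reads the query output on stdin.
list_faulted_paths() {
    # Strip a trailing carriage return and leading whitespace, then keep
    # only lines that begin with an ILOM target path such as /SYS/MB/P1.
    awk '{ sub(/\r$/, ""); sub(/^[ \t]+/, "") } /^\/SYS\// { print $1 }'
}

# Example usage on the compute node (assumes ipmitool is installed there):
#   ipmitool sunoem cli "show -d properties -level all /SYS/MB fault_state==Faulted" | list_faulted_paths
```

The echoed command line in the output starts with `->`, so it is not mistaken for a component path.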
Steps to clear a hardware fault after hardware replacement:
- Identify the hardware fault
[root@dm01db01 ~]# ipmitool sunoem cli "show -d properties -level all /SYS/MB fault_state==Faulted"
Connected. Use ^D to exit.
-> show -d properties -level all /SYS/MB fault_state==Faulted
/SYS/MB/P1
Properties:
type = Host Processor
ipmi_name = MB/P1
fru_name = Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
fru_version = 02
fru_part_number = 060F
fault_state = Faulted
clear_fault_action = (none)
-> Session closed
Disconnected
- Connect to the ILOM of the problematic compute node
[root@dm01db01 ~]# ssh dm01db01-ilom
The authenticity of host 'dm01db01-ilom (10.10.10.11)' can't be established.
RSA key fingerprint is 52:45:af:c4:08:29:c4:6a:15:d9:5f:6d:14:cb:23:b1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'dm01db01-ilom,10.10.10.11' (RSA) to the list of known hosts.
Password:
Oracle(R) Integrated Lights Out Manager
Version 3.2.8.24 r114580
Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: dm01db01-ilom
-> show -d properties -level all /SYS/MB fault_state==Faulted
/SYS/MB/P1
Properties:
type = Host Processor
ipmi_name = MB/P1
fru_name = Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
fru_version = 02
fru_part_number = 060F
fault_state = Faulted
clear_fault_action = (none)
- Execute the following command to clear the fault
-> set /SYS/MB/P1 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P1 (y/n)? y
Set 'clear_fault_action' to 'true'
- Verify the fault is cleared
-> show -d properties -level all /SYS/MB fault_state==Faulted
show: Query found no matches.
No faulty hardware found.
You can also verify this from Web ILOM.
- Exit from ILOM
-> exit
Connection to dm01db01-ilom closed.
[root@dm01db01 ~]#
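The clear and verify steps above can be sketched as two small shell helpers. This is an illustration only: the function names are hypothetical, the first function merely prints the `set` commands so they can be reviewed and then typed at the ILOM prompt (it does not run them), and the second assumes ILOM reports a clean result with the "Query found no matches" message shown in the transcript.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of ILOM or ipmitool.

# Print the ILOM command that clears the fault for each path read from stdin.
# Lines that do not look like an ILOM target path are ignored.
build_clear_commands() {
    while IFS= read -r path; do
        case "$path" in
            /SYS/*) printf 'set %s clear_fault_action=true\n' "$path" ;;
        esac
    done
}

# Succeed (exit status 0) when the fault query output on stdin
# reports that no faulted components were found.
fault_query_is_clean() {
    grep -q 'Query found no matches'
}

# Example usage:
#   echo /SYS/MB/P1 | build_clear_commands
#   ipmitool sunoem cli "show -d properties -level all /SYS/MB fault_state==Faulted" | fault_query_is_clean && echo "No faulty hardware found"
```

Printing the commands instead of executing them keeps the confirmation prompt (y/n) in the operator's hands, as in the manual session above.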
Conclusion
In this article we have learned how to identify a hardware fault and clear it manually after the hardware has been replaced.