Introduction
We had a fan failure (FAN2) on an Exadata InfiniBand switch and scheduled the hardware replacement with Oracle. The Oracle Field Engineer came to the customer data center and replaced the faulty fan on the InfiniBand switch. The replacement was successful; however, the fault was not cleared automatically, and the fan was still marked as faulted in both the InfiniBand BUI and the CLI.
[Screenshot from the InfiniBand Browser User Interface]
In this article we will demonstrate how to clear the fault on an InfiniBand switch after a hardware replacement.
- Log in to the InfiniBand switch as the root user (for example, with PuTTY) and check the switch health with env_test. The output below shows that all the fans are healthy.
Environment test started:
Starting Environment Daemon test:
Environment daemon running
Environment Daemon test returned OK
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.39 V
Measured 12V = 11.97 V
Measured 5V = 5.02 V
Measured VBAT = 3.14 V
Measured 2.5V = 2.49 V
Measured 1.8V = 1.79 V
Measured I4 1.2V = 1.22 V
Voltage test returned OK
Starting PSU test:
PSU 0 present OK
PSU 1 present OK
PSU test returned OK
Starting Temperature test:
Back temperature 40
Front temperature 41
SP temperature 57
Switch temperature 55, maxtemperature 59
Temperature test returned OK
Starting FAN test:
Fan 0 not present
Fan 1 running at rpm 17004
Fan 2 running at rpm 15696
Fan 3 running at rpm 17004
Fan 4 not present
FAN test returned OK
Starting Connector test:
Connector test returned OK
Starting Onboard ibdevice test:
Switch OK
All Internal ibdevices OK
Onboard ibdevice test returned OK
Starting SSD test:
SSD test returned OK
Starting Auto-link-disable test:
Auto-link-disable test returned OK
Environment test PASSED
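When this health check is run on several switches, reading the output by eye gets tedious. The following is a minimal Python sketch, not part of the original procedure, that scans captured health-check text for sub-tests that did not return OK. The function name `failed_env_tests` is hypothetical, and the wording of a failing result line is an assumption: the sketch simply treats any result other than OK as a failure.

```python
import re

def failed_env_tests(env_test_output):
    """Return the names of sub-tests whose result line is not 'returned OK'."""
    failed = []
    for line in env_test_output.splitlines():
        # Result lines look like "FAN test returned OK".
        match = re.match(r"^(.*) test returned (\S+)\s*$", line.strip())
        # Assumption: any result word other than "OK" means the sub-test failed.
        if match and match.group(2) != "OK":
            failed.append(match.group(1))
    return failed

# Shortened sample taken from the output above.
sample = """\
Starting PSU test:
PSU 0 present OK
PSU test returned OK
Starting FAN test:
Fan 2 running at rpm 15696
FAN test returned OK
Environment test PASSED
"""
print(failed_env_tests(sample))  # prints []
```

An empty list means every sub-test in the captured text reported OK, which matches the output shown above.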
- Check the fan speeds with getfanspeed. The fans look good.
Fan 0 not present
Fan 1 running at rpm 17004
Fan 2 running at rpm 15478
Fan 3 running at rpm 17004
Fan 4 not present
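The fan speed listing above is easy to turn into structured data if you want to log or compare it over time. This is a small Python sketch under the assumption that the output always uses the two line shapes shown above ("running at rpm N" and "not present"); `parse_fan_speeds` is a hypothetical helper name, not a switch command.

```python
import re

def parse_fan_speeds(output):
    """Map fan index -> rpm, or None when the fan is reported as not present."""
    fans = {}
    pattern = r"^Fan (\d+) (?:running at rpm (\d+)|not present)\s*$"
    for line in output.splitlines():
        match = re.match(pattern, line.strip())
        if match:
            rpm = match.group(2)
            fans[int(match.group(1))] = int(rpm) if rpm else None
    return fans

# Sample copied from the output above.
sample = """Fan 0 not present
Fan 1 running at rpm 17004
Fan 2 running at rpm 15478
Fan 3 running at rpm 17004
Fan 4 not present"""
speeds = parse_fan_speeds(sample)
print(speeds)  # {0: None, 1: 17004, 2: 15478, 3: 17004, 4: None}
```

The sketch deliberately does not judge whether a speed is acceptable, since the article gives no rpm thresholds.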
- Switch to the ilom-admin user
Oracle(R) Integrated Lights Out Manager
Version 2.2.9-3 ILOM 3.2.11 r124039
Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: dm01sw-iba01.netsoftmate.com
->
- Now check the fault table for any faulty components. FAN2 is still reported as Faulted even though the fan was replaced with a new one.
Target | Property | Value
----------------------------------------+----------------------------------------------+--------------------------------------------------------------------
/SYS | fault_state | OK
/SYS/MB | fault_state | OK
/SYS/PSU0 | fault_state | OK
/SYS/PSU1 | fault_state | OK
/SYS/FAN1 | fault_state | OK
/SYS/FAN2 | fault_state | Faulted
/SYS/FAN3 | fault_state | OK
->
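To pick the faulted components out of a fault table like the one above without reading every row, the table text can be split on its "|" separators. The Python sketch below assumes the three-column layout shown above (Target, Property, Value) with one target per row; `faulted_targets` is a hypothetical name used only for illustration.

```python
def faulted_targets(table_output):
    """Return targets whose fault_state value is anything other than OK."""
    faulted = []
    for line in table_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # The header row has "Property" in the middle column and the
        # separator row contains no "|", so both are skipped here.
        if len(cells) == 3 and cells[1] == "fault_state" and cells[2] != "OK":
            faulted.append(cells[0])
    return faulted

# Sample rows in the same layout as the table above.
sample = """Target | Property | Value
-------+----------+------
/SYS | fault_state | OK
/SYS/FAN1 | fault_state | OK
/SYS/FAN2 | fault_state | Faulted
/SYS/FAN3 | fault_state | OK
"""
print(faulted_targets(sample))  # ['/SYS/FAN2']
```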
- You can also run show -d targets /SP/faultmgmt to identify the fault
/SP/faultmgmt
Targets:
shell
0 (/SYS/FAN2)
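- Clear the fault by setting clear_fault_action to true on the faulted target (/SYS/FAN2), and answer y when prompted, as shown below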
Are you sure you want to clear /SYS/FAN2 (y/n)? y
Set 'clear_fault_action' to 'true'
- Verify the fault is cleared
Target | Property | Value
----------------------------------------+----------------------------------------------+--------------------------------------------------------------------
/SYS | fault_state | OK
/SYS/MB | fault_state | OK
/SYS/PSU0 | fault_state | OK
/SYS/PSU1 | fault_state | OK
/SYS/FAN1 | fault_state | OK
/SYS/FAN2 | fault_state | OK
/SYS/FAN3 | fault_state | OK
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
shell
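The difference between the two /SP/faultmgmt listings in this article is the numbered entry "0 (/SYS/FAN2)", which is present before the fault is cleared and absent afterwards. A short Python sketch of that check follows, assuming open faults always appear as a number followed by the component path in parentheses; `open_faults` is a hypothetical helper name.

```python
import re

def open_faults(faultmgmt_output):
    """Return components listed as numbered fault targets, e.g. '0 (/SYS/FAN2)'."""
    pattern = r"^[ \t]*\d+ \((\S+)\)[ \t]*$"
    return re.findall(pattern, faultmgmt_output, flags=re.MULTILINE)

# Listing captured before the fault was cleared.
before = """/SP/faultmgmt
Targets:
shell
0 (/SYS/FAN2)
"""
# Listing captured after the fault was cleared.
after = """/SP/faultmgmt
Targets:
shell
"""
print(open_faults(before))  # ['/SYS/FAN2']
print(open_faults(after))   # []
```

An empty result for the second listing corresponds to the cleared state shown above, where only the shell target remains.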
- Verify from the InfiniBand BUI
Conclusion
In this article we have learned how to identify a fault and clear it manually on an Exadata InfiniBand switch. The ILOM commands come in handy for clearing the fault. You can also clear the fault using the Browser User Interface (BUI).