Sometimes we encounter InfiniBand port-related issues. Alerts for these issues can be triggered from OEM or any other monitoring tool.
Sample alerts from OEM 12c:
Example 1: Port xx on dm01sw-ib3.netsoftmate.com is disconnected from port xx
Example 2: Cable is present on Port xx but the port is disabled.
This document provides the steps to resolve the InfiniBand switch port-related issues mentioned above.
Unless otherwise stated, run all the commands from compute node 1.
Identify the Problematic InfiniBand Switch Port
Using OEM 12c
Log in to OEM 12c using a web browser of your choice
Click Targets → Exadata
From the list, select the appropriate Exadata Cluster
From the left pane, expand “IB Network”
Select the InfiniBand switch that has the problem.
The switch status is now displayed. Any port with an issue is marked in RED
From the picture above we can see that there is an issue with Port 35 on InfiniBand Switch “dm01sw-ib3”
- Using IB Switch Commands
Verify-Topology
Oracle supplies a utility with Exadata, /opt/oracle.SupportTools/ibdiagtools/verify-topology, which is used to validate the InfiniBand network layout.
Verify the InfiniBand topology using the following commands from a database server or Exadata Storage Server:
[root@dm01db01]# cd /opt/oracle.SupportTools/ibdiagtools/
[root@dm01db01]# ./verify-topology
The verify-topology utility can be used to identify the following network connection problems:
- Missing InfiniBand cable
- Missing InfiniBand connection
- Incorrectly-seated cable
- Cable connected to the wrong endpoint
[root@dm01db01]# cd /opt/oracle.SupportTools/ibdiagtools/
[root@dm01db01]# ./verify-topology
[ DB Machine Infiniband Cabling Topology Verification Tool ]
[Version IBD VER 2.d ]
External non-Exadata-image nodes found:
...will check for ZFS if on SSC - else ignore
Found 2 leaf, 1 spine, 0 top spine switches
Check if all hosts have 2 HCAs to different switches...............[SUCCESS]
Leaf switch check: cardinality and even distribution..............[SUCCESS]
Spine switch check: Are any Exadata nodes connected ..............[SUCCESS]
Spine switch check: Any inter spine switch links..................[SUCCESS]
Spine switch check: Any inter top-spine switch links..............[SUCCESS]
Spine switch check: Correct number of spine-leaf links............[SUCCESS]
Leaf switch check: Inter-leaf link check..........................[SUCCESS]
Leaf switch check: Correct number of leaf-spine links.............[SUCCESS]
In the example above, there are NO ERRORS reported.
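When the report is long, it can help to filter it mechanically instead of reading every line. The following is a minimal sketch, assuming the verify-topology output has been saved as text and that every check line ends with a bracketed upper-case result, as in the sample above. The function name failed_checks is hypothetical, and treating any result other than SUCCESS as a failure is an assumption, since the source only shows passing checks.

```python
import re

# A check line is "<description>....[RESULT]"; banner lines have no dot run.
RESULT_RE = re.compile(r"^(?P<check>.+?)\.{2,}\s*\[(?P<result>[A-Z]+)\]\s*$")


def failed_checks(report):
    """Return (check, result) pairs whose result is not SUCCESS."""
    failures = []
    for line in report.splitlines():
        match = RESULT_RE.match(line.strip())
        if match and match.group("result") != "SUCCESS":
            failures.append((match.group("check").strip(), match.group("result")))
    return failures


sample = """\
[ DB Machine Infiniband Cabling Topology Verification Tool ]
Found 2 leaf, 1 spine, 0 top spine switches
Check if all hosts have 2 HCAs to different switches...............[SUCCESS]
Leaf switch check: cardinality and even distribution..............[SUCCESS]
Leaf switch check: Inter-leaf link check..........................[SUCCESS]
"""

print(failed_checks(sample))  # prints [] because every check passed
```

An empty list corresponds to the "NO ERRORS" case shown in this example.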
Listlinkup
Run the listlinkup command on the problematic InfiniBand switch to check whether each port is up or down and enabled or disabled:
[root@dm01db01]# ssh root@dm01sw-ib3
[root@dm01sw-ib3 ~]# listlinkup
Connector 0A Not present
Connector 1A Not present
Connector 2A Not present
Connector 3A Not present
Connector 4A Not present
Connector 5A Not present
Connector 6A Present <-> Switch Port 35 is down (AutomaticHighErrorRate)
Connector 7A Present <-> Switch Port 33 is up (Enabled)
Connector 8A Present <-> Switch Port 31 is up (Enabled)
Connector 9A Present <-> Switch Port 14 is up (Enabled)
Connector 10A Present <-> Switch Port 16 is up (Enabled)
Connector 11A Present <-> Switch Port 18 is up (Enabled)
Connector 12A Not present
Connector 13A Not present
Connector 14A Present <-> Switch Port 07 is up (Enabled)
Connector 15A Not present
Connector 16A Not present
Connector 17A Present <-> Switch Port 01 is up (Enabled)
Connector 0B Not present
Connector 1B Not present
Connector 2B Not present
Connector 3B Not present
Connector 4B Not present
Connector 5B Present <-> Switch Port 29 is up (Enabled)
Connector 6B Not present
Connector 7B Present <-> Switch Port 34 is up (Enabled)
Connector 8B Not present
Connector 9B Present <-> Switch Port 13 is up (Enabled)
Connector 10B Present <-> Switch Port 15 is up (Enabled)
Connector 11B Present <-> Switch Port 17 is up (Enabled)
Connector 12B Not present
Connector 13B Present <-> Switch Port 10 is up (Enabled)
Connector 14B Not present
Connector 15B Not present
Connector 16B Present <-> Switch Port 04 is up (Enabled)
Connector 17B Present <-> Switch Port 02 is up (Enabled)
There is an issue with port 35 (connector 6A) on the InfiniBand switch “dm01sw-ib3”: the port is down with the reason AutomaticHighErrorRate.
This needs to be addressed.
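With 36 connectors in the listing, the one bad port is easy to miss. As a rough sketch, assuming the listlinkup output has been captured as text in the format shown above, the cabled ports that are not up can be extracted like this. The function name down_ports is hypothetical and is not part of the switch firmware.

```python
import re

# One cabled connector: "Connector 6A Present <-> Switch Port 35 is down (Reason)"
LINK_RE = re.compile(
    r"Connector\s+(?P<connector>\d+[AB])\s+Present\s+<->\s+"
    r"Switch Port\s+(?P<port>\d+)\s+is\s+(?P<state>up|down)\s+\((?P<reason>[^)]*)\)"
)


def down_ports(listlinkup_output):
    """Return (connector, port, reason) for every cabled port that is down."""
    # Collapse all whitespace first so entries wrapped across lines still match.
    text = " ".join(listlinkup_output.split())
    return [
        (m.group("connector"), int(m.group("port")), m.group("reason"))
        for m in LINK_RE.finditer(text)
        if m.group("state") == "down"
    ]


sample = """\
Connector 5A Not present
Connector 6A Present <-> Switch Port 35 is down (AutomaticHighErrorRate)
Connector 7A Present <-> Switch Port 33 is up (Enabled)
"""

print(down_ports(sample))  # [('6A', 35, 'AutomaticHighErrorRate')]
```

Connectors reported as "Not present" are skipped, because they have no cable and therefore no switch port state to report.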
Ibswitches
Use this command to get the InfiniBand switch LID number.
[root@dm01sw-ib3 ~]# ibswitches
Switch : 0x002128469deca0a0 ports 36 "SUN DCS 36P QDR dm01sw-ib3 10.213.23.85" enhanced port 0 lid 3 lmc 0
Switch : 0x002128469e45a0a0 ports 36 "SUN DCS 36P QDR dm01sw-ib2 10.213.23.84" enhanced port 0 lid 1 lmc 0
Here the LID number for dm01sw-ib3 is 3.
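The LID lookup can also be done programmatically when several switches are listed. This is only a sketch, assuming the ibswitches output format shown above and assuming the quoted description always ends with the hostname followed by the IP address. The function name switch_lids is hypothetical.

```python
import re

# Captures the GUID, the quoted node description and the LID from one line.
SWITCH_RE = re.compile(
    r'Switch\s*:\s*(?P<guid>0x[0-9a-fA-F]+)\s+ports\s+\d+\s+'
    r'"(?P<desc>[^"]*)".*?\blid\s+(?P<lid>\d+)'
)


def switch_lids(ibswitches_output):
    """Map each switch hostname to its LID."""
    lids = {}
    for line in ibswitches_output.splitlines():
        match = SWITCH_RE.search(line)
        if not match:
            continue
        # Assumption: description is "SUN DCS 36P QDR <hostname> <ip>".
        words = match.group("desc").split()
        hostname = words[-2] if len(words) >= 2 else match.group("desc")
        lids[hostname] = int(match.group("lid"))
    return lids


sample = """\
Switch : 0x002128469deca0a0 ports 36 "SUN DCS 36P QDR dm01sw-ib3 10.213.23.85" enhanced port 0 lid 3 lmc 0
Switch : 0x002128469e45a0a0 ports 36 "SUN DCS 36P QDR dm01sw-ib2 10.213.23.84" enhanced port 0 lid 1 lmc 0
"""

print(switch_lids(sample)["dm01sw-ib3"])  # 3
```

The resulting LID is the first argument that ibportstate expects.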
Ibportstate
Use this command, passing the switch LID and the port number, to identify the port state.
[root@dm01sw-ib3 ~]# ibportstate 3 35
PortInfo:
# Port info: Lid 3 port 35
LinkState:.......................Down
PhysLinkState:...................Disabled
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................2.5 Gbps
From the output above we can see that the port is disabled (PhysLinkState is Disabled) and the link speed is reduced: the active speed is 2.5 Gbps although up to 10.0 Gbps is enabled.
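The same reading of the output (link state, physical state, and active speed versus enabled speed) can be expressed as a small check. This is a sketch under stated assumptions: the dotted "Key:....Value" layout is taken from the sample above, each field is on a single line, and the helper names parse_port_info, gbps_values and port_problems are hypothetical.

```python
import re

# Matches dotted "Key:......Value" lines as printed in the port output.
FIELD_RE = re.compile(r"^(?P<key>\w+):\.+(?P<value>.+)$")


def parse_port_info(output):
    """Return the first value seen for each field (the local port block)."""
    fields = {}
    for line in output.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields.setdefault(match.group("key"), match.group("value").strip())
    return fields


def gbps_values(text):
    """Extract every '<number> Gbps' figure from a field value."""
    return [float(v) for v in re.findall(r"(\d+(?:\.\d+)?)\s*Gbps", text)]


def port_problems(fields):
    """List readable problems for one parsed port; empty means healthy."""
    problems = []
    if fields.get("LinkState") != "Active":
        problems.append("LinkState is %s" % fields.get("LinkState", "unknown"))
    if fields.get("PhysLinkState") != "LinkUp":
        problems.append("PhysLinkState is %s" % fields.get("PhysLinkState", "unknown"))
    active = gbps_values(fields.get("LinkSpeedActive", ""))
    enabled = gbps_values(fields.get("LinkSpeedEnabled", ""))
    if active and enabled and max(active) < max(enabled):
        problems.append("active speed %s Gbps is below enabled maximum %s Gbps"
                        % (max(active), max(enabled)))
    return problems


sample = """\
PortInfo:
# Port info: Lid 3 port 35
LinkState:.......................Down
PhysLinkState:...................Disabled
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................2.5 Gbps
"""

for problem in port_problems(parse_port_info(sample)):
    print(problem)
```

Because only the first occurrence of each field is kept, a "Peer PortInfo" block that follows the local block does not overwrite the local port values.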
Getportstatus
Use this command to get the port status.
[root@dm01sw-ib3 ~]# getportstatus 35
Port status for connector 6A Switch port 35
Adminstate:......................Disabled (AutomaticHighErrorRate)
LinkWidthEnabled:................1X or 4X
LinkWidthSupported:..............1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkState:.......................Down
PhysLinkState:...................Disabled
LinkSpeedActive:.................2.5 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
NeighborMTU:.....................4096
OperVLs:.........................VL0
- Steps to resolve the IB Port Issue
Autodisable is a feature that can disable connectors in the presence of high error rates or suboptimal link speed or width.
This feature does not cause any issues by itself; it alerts the customer to the abnormal status of connectors.
The Autodisable feature was introduced in firmware 2.1 and does not apply to firmware 1.3.
The correct way to account for this is to check whether any auto-disabled ports exist and, if so, re-enable them using enableswitchport --automatic before upgrading or downgrading the firmware to a different version. This ensures compatible settings when moving between firmware versions.
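The pre-upgrade check described above can be sketched as a helper that turns a captured listlinkup listing into the re-enable commands to review. This is an illustration only: it assumes that autodisabled ports are the ones reported as down with a reason beginning with "Automatic" (only AutomaticHighErrorRate appears in this article), it reuses the "enableswitchport --automatic Switch <port>" form shown in this article, and the function name reenable_commands is hypothetical. It prints commands and does not run anything on the switch.

```python
import re

# Assumption: an autodisabled port shows "is down (Automatic...)" in listlinkup.
AUTODISABLED_RE = re.compile(r"Switch Port (\d+) is down \((Automatic\w*)\)")


def reenable_commands(listlinkup_output):
    """Build one enableswitchport command per autodisabled switch port."""
    # Collapse whitespace so entries wrapped across lines still match.
    text = " ".join(listlinkup_output.split())
    return [
        "enableswitchport --automatic Switch %d" % int(port)
        for port, _reason in AUTODISABLED_RE.findall(text)
    ]


sample = """\
Connector 6A Present <-> Switch Port 35 is down (AutomaticHighErrorRate)
Connector 7A Present <-> Switch Port 33 is up (Enabled)
Connector 8A Not present
"""

for command in reenable_commands(sample):
    print(command)  # enableswitchport --automatic Switch 35
```

Printing the commands rather than executing them keeps an administrator in the loop, since a port disabled for a high error rate may point to a cable fault that should be checked first.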
Problematic InfiniBand switch details:
Switch name : dm01sw-ib3
Firmware version : 2.1.3-4
Port number : 35
LID number : 3
This solution applies to InfiniBand switch firmware version “2.1.3-4”.
To re-enable an autodisabled connector or IB switch port, run the following on the leaf switch dm01sw-ib3:
[root@dm01sw-ib3 ~]# enableswitchport --automatic Switch 35
Enable connector 6A Switch port 35
Adminstate:......................Enabled
LinkWidthEnabled:................1X or 4X
LinkWidthSupported:..............1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkState:.......................Down
PhysLinkState:...................PortConfigurationTraining
LinkSpeedActive:.................2.5 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
NeighborMTU:.....................4096
OperVLs:.........................VL0
- Verify
Now verify the port status using each of the following commands.
Ibportstate command
[root@dm01sw-ib3 ~]# ibportstate 3 35
PortInfo:
# Port info: Lid 3 port 35
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
Peer PortInfo:
# Port info: Lid 3 DR path slid 65535; dlid 65535; 0,35 port 2
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
Getportstatus command
[root@dm01sw-ib3 ~]# getportstatus 35
Port status for connector 6A Switch port 35
Adminstate:......................Enabled
LinkWidthEnabled:................1X or 4X
LinkWidthSupported:..............1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkSpeedActive:.................10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
NeighborMTU:.....................4096
OperVLs:.........................VL0
Listlinkup command
[root@dm01sw-ib3 ~]# listlinkup
Connector 0A Not present
Connector 1A Not present
Connector 2A Not present
Connector 3A Not present
Connector 4A Not present
Connector 5A Not present
Connector 6A Present <-> Switch Port 35 is up (Enabled)
Connector 7A Present <-> Switch Port 33 is up (Enabled)
Connector 8A Present <-> Switch Port 31 is up (Enabled)
Connector 9A Present <-> Switch Port 14 is up (Enabled)
Connector 10A Present <-> Switch Port 16 is up (Enabled)
Connector 11A Present <-> Switch Port 18 is up (Enabled)
Connector 12A Not present
Connector 13A Not present
Connector 14A Present <-> Switch Port 07 is up (Enabled)
Connector 15A Not present
Connector 16A Not present
Connector 17A Present <-> Switch Port 01 is up (Enabled)
Connector 0B Not present
Connector 1B Not present
Connector 2B Not present
Connector 3B Not present
Connector 4B Not present
Connector 5B Present <-> Switch Port 29 is up (Enabled)
Connector 6B Not present
Connector 7B Present <-> Switch Port 34 is up (Enabled)
Connector 8B Not present
Connector 9B Present <-> Switch Port 13 is up (Enabled)
Connector 10B Present <-> Switch Port 15 is up (Enabled)
Connector 11B Present <-> Switch Port 17 is up (Enabled)
Connector 12B Not present
Connector 13B Present <-> Switch Port 10 is up (Enabled)
Connector 14B Not present
Connector 15B Not present
Connector 16B Present <-> Switch Port 04 is up (Enabled)
Connector 17B Present <-> Switch Port 02 is up (Enabled)
Conclusion
In this article we have learned various InfiniBand switch commands to identify the port status and resolve port-related issues.