[Pkg-ofed-devel] ofa-kernel: ib_query_gid() failed
Mario Lang
mlang at debian.org
Mon Oct 12 18:15:34 UTC 2009
Guy Coates <gmpc at sanger.ac.uk> writes:
>> However, opensm fails to start which can be traced down
>> to ibstat -p hanging. dmesg produces the following output upon
>> /etc/init.d/opensm start:
>>
>> Oct 12 12:52:48 node1 kernel: [ 78.170077] ib0: ib_query_gid() failed
>> Oct 12 12:52:58 node1 kernel: [ 89.272789] ib0: ib_query_port failed
>>
>> We don't get any other obvious dmesg errors.
> Does opensm run if you explicitly tell it to start on the port?
Yes, but the log is full of errors. I attach it at the end of this mail.
> Does ibstat hang if you do not give it the -p option?
node1:~# ibstat
ibpanic: [5764] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
> That sounds like a bug that should be reported to the openfabrics
> people.
> (I can't reproduce it on my setup, so it sounds like this is a
> hardware-dependent bug.)
Is there a good way to verify that all the modules required are
really loaded from /lib/modules/$(uname -r)/updates/ to make sure
the ofa-kernel install really did what it was supposed to do?
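One rough way to check this, as a sketch only: ask modinfo which file
each module resolves to and see whether it lies under updates/, then
compare the srcversion of the copy in memory (from sysfs) with the one
in that file. The helper names is_under and check_modules are made up
for illustration, and the module names you pass in are assumptions to
adjust for the hardware. The srcversion comparison only works if the
kernel exports that attribute for the module.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; module names are supplied by the caller.
updates_dir="/lib/modules/$(uname -r)/updates"

# Succeeds if path $1 lies somewhere below directory $2.
is_under() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# For each named module, print the file modinfo resolves it to, whether
# that file is under updates/, and whether the copy in memory has the
# same srcversion as the file on disk.
check_modules() {
    for mod in "$@"; do
        if ! path=$(modinfo -n "$mod" 2>/dev/null); then
            echo "$mod: no module file found"
            continue
        fi
        if is_under "$path" "$updates_dir"; then
            where="updates"
        else
            where="NOT updates"
        fi
        loaded=$(cat "/sys/module/$mod/srcversion" 2>/dev/null)
        ondisk=$(modinfo -F srcversion "$mod" 2>/dev/null)
        if [ -z "$loaded" ]; then
            state="not loaded (or no srcversion exported)"
        elif [ "$loaded" = "$ondisk" ]; then
            state="loaded copy matches file"
        else
            state="loaded copy differs from file"
        fi
        echo "$mod: $where ($path), $state"
    done
}
```

It could be run as, for example, "check_modules mlx4_core mlx4_ib
ib_core ib_mad ib_umad". A module reported as "loaded copy differs from
file" would suggest an older copy was still in memory when ofa-kernel
was installed.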
Here is the content of /var/log/opensm.0x0003ba0001007209.log:
Oct 12 20:06:50 911824 [54CCB6E0] 0x03 -> OpenSM 3.2.6_20090317
Oct 12 20:06:50 911913 [54CCB6E0] 0x80 -> OpenSM 3.2.6_20090317
Oct 12 20:06:50 913069 [54CCB6E0] 0x02 -> osm_vendor_init: 1000 pending umads specified
Oct 12 20:06:50 930612 [54CCB6E0] 0x80 -> Entering DISCOVERING state
Oct 12 20:06:50 930715 [54CCB6E0] 0x02 -> osm_vendor_bind: Binding to port 0x3ba0001007209
Oct 12 20:06:50 976668 [54CCB6E0] 0x02 -> osm_vendor_bind: Binding to port 0x3ba0001007209
Oct 12 20:06:50 977941 [52310950] 0x80 -> SM port is down
Oct 12 20:06:50 977990 [52310950] 0x01 -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
Oct 12 20:06:53 259032 [BA7346E0] 0x03 -> OpenSM 3.2.6_20090317
Oct 12 20:06:53 259097 [BA7346E0] 0x80 -> OpenSM 3.2.6_20090317
Oct 12 20:06:53 260185 [BA7346E0] 0x02 -> osm_vendor_init: 1000 pending umads specified
Oct 12 20:06:53 260364 [BA7346E0] 0x80 -> Entering DISCOVERING state
Oct 12 20:06:53 260442 [BA7346E0] 0x02 -> osm_vendor_bind: Binding to port 0x3ba0001007209
Oct 12 20:06:53 315287 [BA7346E0] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Oct 12 20:06:53 315321 [BA7346E0] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Oct 12 20:06:53 315334 [BA7346E0] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Oct 12 20:06:53 315351 [BA7346E0] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Oct 12 20:06:53 316938 [BA7346E0] 0x80 -> Exiting SM
Oct 12 20:07:00 933847 [52310950] 0x80 -> Entering MASTER state
Oct 12 20:07:10 941969 [51B0F950] 0x01 -> osm_vendor_send: ERR 5430: Send p_madw = 0xf50bd0 of size 256 failed -5 (Invalid argument)
Oct 12 20:07:10 942012 [51B0F950] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_ERROR)
Oct 12 20:07:10 942019 [51B0F950] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3119: Set method failed
Oct 12 20:07:10 942076 [51B0F950] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x2 (SubnSet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x0
trans_id................0x127e
attr_id.................0x15 (PortInfo)
resv....................0x0
attr_mod................0x1
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0
Return path: 0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 FE 80 00 00 00 00 00 00
00 03 00 03 02 51 08 6A 00 00 00 00 01 03 03 02
30 02 00 23 40 40 04 08 08 04 1F 40 00 00 00 00
00 00 80 92 10 88 00 00 00 00 00 00 00 00 00 00
Oct 12 20:07:10 942095 [51B0F950] 0x01 -> vl15_send_mad: ERR 3E03: MAD send failed (IB_UNKNOWN_ERROR)
--
CYa,
⡍⠁⠗⠊⠕