[Pkg-ofed-devel] ofa-kernel: ib_query_gid() failed

Mario Lang mlang at debian.org
Mon Oct 12 18:15:34 UTC 2009


Guy Coates <gmpc at sanger.ac.uk> writes:

>> However, opensm fails to start which can be traced down
>> to ibstat -p hanging.  dmesg produces the following output upon
>> /etc/init.d/opensm start:
>>
>> Oct 12 12:52:48 node1 kernel: [   78.170077] ib0: ib_query_gid() failed
>> Oct 12 12:52:58 node1 kernel: [   89.272789] ib0: ib_query_port failed
>>
>> We don't get any other obvious dmesg errors.

> Does opensm run if you explicitly tell it to start on the port?

Yes, but the log is full of errors.  I attach it at the end of this mail.

> Does ibstat hang if you do not give it the -p option?

node1:~# ibstat
ibpanic: [5764] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)

> That sounds like a bug that should be reported to the openfabrics
> people.

> (I can't reproduce it on my setup, so it sounds like this is a
> hardware-dependent bug.)

Is there a good way to verify that all the required modules are
really loaded from /lib/modules/$(uname -r)/updates/, to make sure
the ofa-kernel install really did what it was supposed to do?
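
One possible check, offered as a rough sketch rather than a definitive
answer: ask modinfo which file modprobe would pick for each loaded
InfiniBand-related module and see whether it lives under updates/.
The helper names and the module-name prefixes (ib_, mlx4_, rdma_, iw_)
are assumptions for illustration. Note that modinfo -n reports the file
that would be loaded now, which may differ from what is already in
memory if the modules were loaded before the install.

```shell
#!/bin/sh
# Sketch only: helper names and module-name prefixes are assumptions.

# Print "updates" if the module file path lies under an updates/
# directory, "unknown" if no path was given, otherwise "stock".
classify_path() {
    case "$1" in
        "")          echo "unknown" ;;
        */updates/*) echo "updates" ;;
        *)           echo "stock" ;;
    esac
}

# Print the names of loaded modules matching the assumed prefixes.
# Reads /proc/modules by default; another file can be passed for testing.
loaded_ib_modules() {
    awk '{print $1}' "${1:-/proc/modules}" | grep -E '^(ib_|mlx4_|rdma_|iw_)'
}

# For each matching loaded module, print its name, the classification,
# and the file that modinfo resolves for the running kernel.
check_ib_modules() {
    loaded_ib_modules | while read -r mod; do
        path=$(modinfo -n "$mod" 2>/dev/null)
        printf '%-20s %-8s %s\n' "$mod" "$(classify_path "$path")" "$path"
    done
}
```

After sourcing the snippet, running check_ib_modules prints one line per
module. A line marked "stock" would suggest that the module resolves to
the distribution kernel's copy rather than the one from ofa-kernel.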

Here is the content of /var/log/opensm.0x0003ba0001007209.log:
Oct 12 20:06:50 911824 [54CCB6E0] 0x03 -> OpenSM 3.2.6_20090317
Oct 12 20:06:50 911913 [54CCB6E0] 0x80 -> OpenSM 3.2.6_20090317
Oct 12 20:06:50 913069 [54CCB6E0] 0x02 -> osm_vendor_init: 1000 pending umads specified
Oct 12 20:06:50 930612 [54CCB6E0] 0x80 -> Entering DISCOVERING state
Oct 12 20:06:50 930715 [54CCB6E0] 0x02 -> osm_vendor_bind: Binding to port 0x3ba0001007209
Oct 12 20:06:50 976668 [54CCB6E0] 0x02 -> osm_vendor_bind: Binding to port 0x3ba0001007209
Oct 12 20:06:50 977941 [52310950] 0x80 -> SM port is down
Oct 12 20:06:50 977990 [52310950] 0x01 -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
Oct 12 20:06:53 259032 [BA7346E0] 0x03 -> OpenSM 3.2.6_20090317
Oct 12 20:06:53 259097 [BA7346E0] 0x80 -> OpenSM 3.2.6_20090317
Oct 12 20:06:53 260185 [BA7346E0] 0x02 -> osm_vendor_init: 1000 pending umads specified
Oct 12 20:06:53 260364 [BA7346E0] 0x80 -> Entering DISCOVERING state
Oct 12 20:06:53 260442 [BA7346E0] 0x02 -> osm_vendor_bind: Binding to port 0x3ba0001007209
Oct 12 20:06:53 315287 [BA7346E0] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Oct 12 20:06:53 315321 [BA7346E0] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Oct 12 20:06:53 315334 [BA7346E0] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Oct 12 20:06:53 315351 [BA7346E0] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Oct 12 20:06:53 316938 [BA7346E0] 0x80 -> Exiting SM
Oct 12 20:07:00 933847 [52310950] 0x80 -> Entering MASTER state
Oct 12 20:07:10 941969 [51B0F950] 0x01 -> osm_vendor_send: ERR 5430: Send p_madw = 0xf50bd0 of size 256 failed -5 (Invalid argument)
Oct 12 20:07:10 942012 [51B0F950] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_ERROR)
Oct 12 20:07:10 942019 [51B0F950] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3119: Set method failed
Oct 12 20:07:10 942076 [51B0F950] 0x01 -> SMP dump:
				base_ver................0x1
				mgmt_class..............0x81
				class_ver...............0x1
				method..................0x2 (SubnSet)
				D bit...................0x0
				status..................0x0
				hop_ptr.................0x0
				hop_count...............0x0
				trans_id................0x127e
				attr_id.................0x15 (PortInfo)
				resv....................0x0
				attr_mod................0x1
				m_key...................0x0000000000000000
				dr_slid.................65535
				dr_dlid.................65535

				Initial path: 0
				Return path:  0
				Reserved:     [0][0][0][0][0][0][0]

				00 00 00 00 00 00 00 00   FE 80 00 00 00 00 00 00

				00 03 00 03 02 51 08 6A   00 00 00 00 01 03 03 02

				30 02 00 23 40 40 04 08   08 04 1F 40 00 00 00 00

				00 00 80 92 10 88 00 00   00 00 00 00 00 00 00 00

Oct 12 20:07:10 942095 [51B0F950] 0x01 -> vl15_send_mad: ERR 3E03: MAD send failed (IB_UNKNOWN_ERROR)

-- 
CYa,
  ⡍⠁⠗⠊⠕


