[Pkg-ofed-devel] ofa-kernel: ib_query_gid() failed

Guy Coates gmpc at sanger.ac.uk
Mon Oct 12 19:18:53 UTC 2009


Mario Lang wrote:
> Guy Coates <gmpc at sanger.ac.uk> writes:
> 
>>> However, opensm fails to start which can be traced down
>>> to ibstat -p hanging.  dmesg produces the following output upon
>>> /etc/init.d/opensm start:
>>>
>>> Oct 12 12:52:48 node1 kernel: [   78.170077] ib0: ib_query_gid() failed
>>> Oct 12 12:52:58 node1 kernel: [   89.272789] ib0: ib_query_port failed
>>>
>>> We don't get any other obvious dmesg errors.
> 
>> Does opensm run if you explicitly tell it to start on the port?
> 
> Yes, but the log is full of errors.  I attach it at the end of this mail.
> 
>> Does ibstat hang if you do not give it the -p option?
> 
> node1:~# ibstat
> ibpanic: [5764] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
> 
>> That sounds like a bug that should be reported to the openfabrics
>> people.
> 
>> (I can't reproduce it on my setup, so it sounds like this is a
>> hardware-dependent bug.)
> 
> Is there a good way to verify that all the modules required are
> really loaded from /lib/modules/$(uname -r)/updates/ to make sure
> the ofa-kernel install really did what it was supposed to do?

You can run modinfo on all the loaded modules:

/sbin/lsmod | awk 'NR > 1 {print $1}' | xargs modinfo | grep filename

That will tell you which file modprobe would load for each module.
However, that is not necessarily the module that was *actually* loaded.
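
One way to narrow the gap between what modprobe would load and what is
actually loaded, as a rough sketch: compare the srcversion of the on-disk
file with the one the kernel reports for the loaded module in
/sys/module/<name>/srcversion. This assumes the modules were built with a
srcversion (not all are, in which case the result is "unknown"). The
helper names verdict and check_module are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the on-disk and loaded srcversion of a module.
# Assumption: /sys/module/<name>/srcversion exists, which is only true
# for modules that were built with a srcversion.

# verdict DISK LOADED -> prints match, MISMATCH or unknown
verdict() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo unknown
    elif [ "$1" = "$2" ]; then
        echo match
    else
        echo MISMATCH
    fi
}

# check_module NAME -> prints "NAME <verdict> <file modprobe would load>"
check_module() {
    disk=$(modinfo -F srcversion "$1" 2>/dev/null)
    loaded=$(cat "/sys/module/$1/srcversion" 2>/dev/null)
    file=$(modinfo -F filename "$1" 2>/dev/null)
    echo "$1 $(verdict "$disk" "$loaded") $file"
}

# Usage on the affected node:
#   /sbin/lsmod | awk 'NR > 1 {print $1}' |
#       while read -r m; do check_module "$m"; done
```

A MISMATCH on an InfiniBand module would suggest that the running module
did not come from the file under updates/, for example because it was
loaded from a stale initrd.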

You should also regenerate your initrd, to make sure that the new modules
are installed there as well.

 update-initramfs -u
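
To check what ended up in the regenerated initrd, something along these
lines might help. It is only a sketch: it assumes lsinitramfs (from
initramfs-tools) is available, that the image lives at
/boot/initrd.img-$(uname -r), and that the InfiniBand modules are included
in the initrd at all. The helper names initrd_paths and from_updates are
invented here.

```shell
#!/bin/sh
# Sketch: look for a module in a saved initrd file listing.

# initrd_paths MODULE LISTING_FILE -> prints the listing lines for MODULE
# (lsmod shows underscores, file names may use hyphens, so accept both)
initrd_paths() {
    pat=$(printf '%s' "$1" | sed 's/[_-]/[_-]/g')
    grep -E "/${pat}\.ko" "$2"
}

# from_updates MODULE LISTING_FILE -> succeeds if a copy is under updates/
from_updates() {
    initrd_paths "$1" "$2" | grep -q '/updates/'
}

# Usage (paths are assumptions, adjust for your system):
#   lsinitramfs "/boot/initrd.img-$(uname -r)" > /tmp/initrd.lst
#   for m in mlx4_core mlx4_ib ib_umad; do
#       from_updates "$m" /tmp/initrd.lst && echo "$m: updates/" \
#           || echo "$m: not under updates/ (or not in the initrd)"
#   done
```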

This sounds like the same problem that jobic at polytech.univ-mrs.fr
reported back in August; we were not able to get to the bottom of it,
but your testing certainly points to a fairly fundamental bug in the
ofa-kernel modules.


Can you try on a node with a minimal InfiniBand configuration?

Deconfigure the IPoIB interfaces and load only the following modules:

mlx4_ib
ib_umad

That should be enough to get ibstat working.
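
As a sketch, the minimal test could look something like the following.
The interface name ib0 is taken from the dmesg lines above. Bringing the
link down with ip and unloading ib_ipoib is my assumption of what
"deconfigure" would involve on your node. The run and minimal_ib helpers
are invented for this example. Set DRY_RUN=1 to print the commands
instead of running them.

```shell
#!/bin/sh
# Sketch: bring up a minimal InfiniBand stack and try ibstat.

# run CMD... -> runs CMD, or only prints it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# minimal_ib [IFACE] -> deconfigure IPoIB, load the two modules, run ibstat
minimal_ib() {
    run ip link set dev "${1:-ib0}" down   # assumption: one IPoIB interface
    run modprobe -r ib_ipoib               # assumption: nothing else uses it
    run modprobe mlx4_ib
    run modprobe ib_umad
    run ibstat
}

# Usage (as root):
#   DRY_RUN=1 minimal_ib ib0    # show what would be run
#   minimal_ib ib0              # actually run it
```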

I will have another go at trying to reproduce this on my hardware.

Guy

-- 
Dr Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 ex 6925
Fax: +44 (0)1223 496802




