[Pkg-ofed-devel] ofa-kernel: ib_query_gid() failed
Guy Coates <gmpc at sanger.ac.uk>
Mon Oct 12 19:18:53 UTC 2009

Mario Lang wrote:
> Guy Coates <gmpc at sanger.ac.uk> writes:
> 
>>> However, opensm fails to start which can be traced down
>>> to ibstat -p hanging.  dmesg produces the following output upon
>>> /etc/init.d/opensm start:
>>>
>>> Oct 12 12:52:48 node1 kernel: [   78.170077] ib0: ib_query_gid() failed
>>> Oct 12 12:52:58 node1 kernel: [   89.272789] ib0: ib_query_port failed
>>>
>>> We don't get any other obvious dmesg errors.
> 
>> Does opensm run if you explicitly tell it to start on the port?
> 
> Yes, but the log is full of errors.  I attach it at the end of this mail.
> 
>> Does ibstat hang if you do not give it the -p option?
> 
> node1:~# ibstat
> ibpanic: [5764] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
> 
>> That sounds like a bug that should be reported to the openfabrics
>> people.
> 
>> (I can't reproduce it on my setup, so it sounds like this is a
>> hardware dependent bug.)
> 
> Is there a good way to verify that all the modules required are
> really loaded from /lib/modules/$(uname -r)/updates/ to make sure
> the ofa-kernel install really did what it was supposed to do?
You can run modinfo on all the loaded modules (NR>1 skips the lsmod
header line):
/sbin/lsmod | awk 'NR>1 {print $1}' | xargs modinfo | grep filename
That will tell you which file modprobe would load for each module.
However, that is not necessarily the module that was *actually* loaded.
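
A rough sketch of how that check could be scripted. It flags modules whose
on-disk file is outside the updates/ tree and, where the kernel exposes
/sys/module/<name>/srcversion, compares it with the srcversion of the file on
disk to catch a stale module that is still loaded. The function names are made
up for illustration, and the srcversion attribute is only present if the module
was built with one, so treat "unknown" as "cannot tell" rather than as an error.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ofa-kernel.

# module_origin PATH KERNEL_VERSION
# Prints "updates" if PATH lies under /lib/modules/KERNEL_VERSION/updates/,
# otherwise "other". Pure string matching, no filesystem access.
module_origin() {
    case "$1" in
        "/lib/modules/$2/updates/"*) echo updates ;;
        *) echo other ;;
    esac
}

# srcversion_status ON_DISK LOADED
# Compares the srcversion of the file on disk with that of the loaded module.
srcversion_status() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo unknown
    elif [ "$1" = "$2" ]; then
        echo match
    else
        echo MISMATCH
    fi
}

# report_loaded_modules
# For every loaded module, print its name, origin, srcversion status and path.
report_loaded_modules() {
    kver=$(uname -r)
    lsmod | awk 'NR > 1 { print $1 }' | while read -r mod; do
        # modinfo -n prints the file modprobe would load; skip on failure.
        path=$(modinfo -n "$mod" 2>/dev/null) || continue
        disk=$(modinfo -F srcversion "$mod" 2>/dev/null)
        loaded=$(cat "/sys/module/$mod/srcversion" 2>/dev/null)
        printf '%-20s %-8s %-9s %s\n' "$mod" \
            "$(module_origin "$path" "$kver")" \
            "$(srcversion_status "$disk" "$loaded")" "$path"
    done
}
```

Source the file and call report_loaded_modules; any mlx4 or ib_ module that
shows "other" or "MISMATCH" would be worth a closer look.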
You should also regenerate your initrd, to make sure that the new
modules are installed there as well:
 update-initramfs -u
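
To confirm the initrd picked up the new modules, something along these lines
might help. It assumes lsinitramfs is available (it ships with newer
initramfs-tools) and that the image lives at the usual Debian location
/boot/initrd.img-$(uname -r); the function names are only illustrative.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# filter_ib_modules
# Reads a file listing on stdin and keeps only mlx4_* and ib_* kernel modules.
filter_ib_modules() {
    grep -E '/(mlx4_|ib_)[^/]*\.ko$'
}

# list_initrd_ib_modules [IMAGE]
# Lists the InfiniBand-related modules inside the given initrd image.
# Defaults to the image for the running kernel (assumed Debian layout).
list_initrd_ib_modules() {
    image=${1:-/boot/initrd.img-$(uname -r)}
    lsinitramfs "$image" | filter_ib_modules
}
```

If list_initrd_ib_modules prints paths under kernel/ rather than updates/,
the initrd still carries the stock modules.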
This sounds like the same problem that jobic at polytech.univ-mrs.fr
reported back in August; we were not able to get to the bottom of it,
but your testing certainly points to a fairly fundamental bug in the
ofa-kernel modules.
Can you try on a node with a minimal InfiniBand configuration?
Deconfigure the IPoIB interfaces and load just the following modules:
mlx4_ib
ib_umad
That should be enough to get ibstat working.
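
As a sketch, the minimal test could look like the following. It assumes the
IPoIB interface is ib0 (as in the dmesg output above) and that the IPoIB
driver is loaded as ib_ipoib; adjust both if your setup differs. The run and
minimal_ib_test names are made up for the example. Setting DRY_RUN=1 only
prints the commands, which is handy for checking them before running as root.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# run CMD [ARGS...]
# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# minimal_ib_test [IFACE]
# Takes the IPoIB interface down, unloads IPoIB, loads only the two
# modules needed for ibstat, then runs ibstat. Does not stop on errors.
minimal_ib_test() {
    iface=${1:-ib0}              # assumed interface name
    run ip link set "$iface" down
    run modprobe -r ib_ipoib     # assumed IPoIB module name
    run modprobe mlx4_ib
    run modprobe ib_umad
    run ibstat
}
```

For example, "DRY_RUN=1; minimal_ib_test ib0" prints the five commands
without touching the system.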
I will have another go at trying to reproduce this on my hardware.
Guy
-- 
Dr Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 ex 6925
Fax: +44 (0)1223 496802