[Pkg-ofed-devel] ofa-kernel: ib_query_gid() failed
Guy Coates
gmpc at sanger.ac.uk
Mon Oct 12 19:18:53 UTC 2009
Mario Lang wrote:
> Guy Coates <gmpc at sanger.ac.uk> writes:
>
>>> However, opensm fails to start, which can be traced down
>>> to ibstat -p hanging. dmesg produces the following output upon
>>> /etc/init.d/opensm start:
>>>
>>> Oct 12 12:52:48 node1 kernel: [ 78.170077] ib0: ib_query_gid() failed
>>> Oct 12 12:52:58 node1 kernel: [ 89.272789] ib0: ib_query_port failed
>>>
>>> We don't get any other obvious dmesg errors.
>
>> Does opensm run if you explicitly tell it to start on the port?
>
> Yes, but the log is full of errors. I attach it at the end of this mail.
>
>> Does ibstat hang if you do not give it the -p option?
>
> node1:~# ibstat
> ibpanic: [5764] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
>
>> That sounds like a bug that should be reported to the openfabrics
>> people.
>
>> (I can't reproduce it on my setup, so it sounds like this is a
>> hardware-dependent bug.)
>
> Is there a good way to verify that all the modules required are
> really loaded from /lib/modules/$(uname -r)/updates/ to make sure
> the ofa-kernel install really did what it was supposed to do?
You can run modinfo on all the loaded modules:
/sbin/lsmod | awk 'NR > 1 {print $1}' | xargs /sbin/modinfo | grep filename
That will tell you what modprobe would load. However, that is not quite
the same as which module was *actually* loaded.
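Since modinfo only reports what modprobe would load, one rough way to check what is actually running is to compare the srcversion checksum the kernel exports under /sys/module/<name>/srcversion with the one recorded in the on-disk .ko file. The following is a minimal sketch under those assumptions, not a tested tool: classify_module and report_loaded_modules are hypothetical helper names, it expects to run as root with lsmod and modinfo on the PATH, and not every module exports srcversion in sysfs, in which case it reports "unknown".

```shell
#!/bin/sh
# Hypothetical helper: classify one module.
#   $1 = file modinfo resolves to, $2 = srcversion from sysfs (may be empty),
#   $3 = srcversion of the on-disk file, $4 = updates/ directory prefix.
classify_module() {
    case "$1" in
        "$4"*) where="updates" ;;
        *)     where="stock" ;;
    esac
    if [ -z "$2" ]; then
        match="unknown"    # module does not export srcversion in sysfs
    elif [ "$2" = "$3" ]; then
        match="match"      # loaded code matches the file modprobe would load
    else
        match="MISMATCH"   # loaded code differs from the file on disk
    fi
    echo "$where $match"
}

# Hypothetical helper: report every loaded module (run as root).
report_loaded_modules() {
    updates="/lib/modules/$(uname -r)/updates/"
    # NR > 1 skips the "Module Size Used by" header printed by lsmod.
    for mod in $(lsmod | awk 'NR > 1 {print $1}'); do
        file=$(modinfo -F filename "$mod" 2>/dev/null) || continue
        loaded=$(cat "/sys/module/$mod/srcversion" 2>/dev/null)
        ondisk=$(modinfo -F srcversion "$mod" 2>/dev/null)
        printf '%-20s %-18s %s\n' "$mod" \
            "$(classify_module "$file" "$loaded" "$ondisk" "$updates")" "$file"
    done
}
```

Running `report_loaded_modules | grep -E '^(mlx4|ib_|rdma)'` narrows the output to the InfiniBand stack; a "stock" location or a MISMATCH for any of those modules would suggest the ofa-kernel modules are not the ones in use.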
You should regenerate your initrd as well, to make sure that the new
modules are installed there:
update-initramfs -u
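To confirm that the regenerated initrd really contains the updated modules, its file list can be searched for an updates/ directory. This is a minimal sketch, assuming the installed initramfs-tools provides lsinitramfs (older releases may not) and the default /boot/initrd.img-<version> naming; initrd_updates_modules is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list .ko files in an initramfs image that live under
# an updates/ directory. Assumes lsinitramfs (initramfs-tools) is installed.
#   $1 = image path (defaults to the running kernel's initrd)
initrd_updates_modules() {
    img="${1:-/boot/initrd.img-$(uname -r)}"
    # Also accept compressed modules (.ko.gz/.ko.xz/.ko.zst) used by newer kernels.
    lsinitramfs "$img" | grep -E '/updates/.*\.ko(\.(gz|xz|zst))?$'
}
```

After update-initramfs -u, running initrd_updates_modules should list mlx4_ib.ko and the other ofa-kernel modules. An empty result suggests either that the initrd was not rebuilt with the updated modules or that it does not include the InfiniBand drivers at all.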
This sounds like the same problem that jobic at polytech.univ-mrs.fr
reported back in August; we were not able to get to the bottom of it,
but your testing certainly points to a fairly fundamental bug in the
ofa-kernel modules.
Can you try on a node with a minimal InfiniBand configuration?
Deconfigure the IPoIB interfaces and load only the following modules:
mlx4_ib
ib_umad
That should be enough to get ibstat working.
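The minimal test described above could be scripted roughly as follows. This sketch relies on assumptions not stated in the original: it runs as root, opensm has already been stopped, the IPoIB interface is named ib0 and is managed by ifupdown, and minimal_ib_setup and module_loaded are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: true if a module is currently loaded.
module_loaded() {
    [ -d "/sys/module/$1" ]
}

# Hypothetical helper: reduce the node to mlx4_ib + ib_umad, then run ibstat.
#   $1 = IPoIB interface name (defaults to ib0)
minimal_ib_setup() {
    iface="${1:-ib0}"
    ifdown "$iface" 2>/dev/null || true    # deconfigure IPoIB; may already be down
    if module_loaded ib_ipoib; then
        modprobe -r ib_ipoib || return 1   # unload IPoIB so it cannot interfere
    fi
    modprobe mlx4_ib || return 1           # ConnectX driver; pulls in mlx4_core and ib_core
    modprobe ib_umad || return 1           # userspace MAD access used by libibumad tools
    ibstat
}
```

Usage: minimal_ib_setup ib0. If ibstat still fails with "Device or resource busy" on this reduced stack, that would point at mlx4_ib or ib_umad themselves rather than at the IPoIB or opensm configuration.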
I will have another go at trying to reproduce this on my hardware.
Guy
--
Dr Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 ex 6925
Fax: +44 (0)1223 496802