[Pkg-ofed-commits] r487 - /trunk/ofed-docs/trunk/DEBIAN-HOWTO/
gmpc-guest at alioth.debian.org
Tue Oct 13 14:22:05 UTC 2009
Author: gmpc-guest
Date: Tue Oct 13 14:22:04 2009
New Revision: 487
URL: http://svn.debian.org/wsvn/pkg-ofed/?sc=1&rev=487
Log:
Add latest howto
Added:
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-12.html
Modified:
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-1.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-10.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-11.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-2.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-3.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-4.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-5.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-6.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-7.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-8.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-9.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.html
trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.txt
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-1.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-1.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-1.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-1.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: Introduction</TITLE>
<LINK HREF="infiniband-howto-2.html" REL=next>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-10.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-10.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-10.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-10.html Tue Oct 13 14:22:04 2009
@@ -1,8 +1,8 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
- <TITLE>Infiniband HOWTO: Network Troubleshooting</TITLE>
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
+ <TITLE>Infiniband HOWTO: Troubleshooting</TITLE>
<LINK HREF="infiniband-howto-11.html" REL=next>
<LINK HREF="infiniband-howto-9.html" REL=previous>
<LINK HREF="infiniband-howto.html#toc10" REL=contents>
@@ -12,18 +12,56 @@
<A HREF="infiniband-howto-9.html">Previous</A>
<A HREF="infiniband-howto.html#toc10">Contents</A>
<HR>
-<H2><A NAME="s10">10.</A> <A HREF="infiniband-howto.html#toc10">Network Troubleshooting</A></H2>
+<H2><A NAME="s10">10.</A> <A HREF="infiniband-howto.html#toc10">Troubleshooting</A></H2>
-
-<H2><A NAME="ss10.1">10.1</A> <A HREF="infiniband-howto.html#toc10.1">ibdiagnet</A>
+<P>This section covers general troubleshooting and commonly reported problems.</P>
+<H2><A NAME="ss10.1">10.1</A> <A HREF="infiniband-howto.html#toc10.1">General fabric troubleshooting</A>
</H2>
-<P>The ibdiagnet program can be used to troubleshoot potential issues with your infiniband fabric.</P>
-<P>
+<P>The ibdiagnet program can be used to troubleshoot potential issues with your infiniband fabric.
<BLOCKQUOTE><CODE>
ibdiagnet -r
</CODE></BLOCKQUOTE>
</P>
+
+<H2><A NAME="ss10.2">10.2</A> <A HREF="infiniband-howto.html#toc10.2">ib_query_gid() failed errors on mlx4 platforms</A>
+</H2>
+
+<P>ibstat or opensm hangs and the following kernel messages are printed:</P>
+<P>
+<BLOCKQUOTE><CODE>
+<PRE>
+kernel: [ 78.170077] ib0: ib_query_gid() failed
+kernel: [ 89.272789] ib0: ib_query_port failed
+</PRE>
+</CODE></BLOCKQUOTE>
+</P>
+<P>Fix: Load the mlx4_core module with the msi_x=0 option.</P>
+<P>
+<BLOCKQUOTE><CODE>
+<PRE>
+cat > /etc/modprobe.d/mlx4_core <<EOF
+options mlx4_core msi_x=0
+EOF
+
+update-initramfs -u
+</PRE>
+</CODE></BLOCKQUOTE>
+</P>
+
+<H2><A NAME="ss10.3">10.3</A> <A HREF="infiniband-howto.html#toc10.3">Missing XRC support</A>
+</H2>
+
+<P>If you see error messages pertaining to missing support for XRC, it means you have mismatched kernel modules and userspace libraries.
+<BLOCKQUOTE><CODE>
+<PRE>
+mlx4: There is a mismatch between the kernel and the userspace
+libraries: Kernel does not support XRC. Exiting.
+</PRE>
+</CODE></BLOCKQUOTE>
+
+Fix: Make sure that you build and install the OFED kernel modules as described in section X.</P>
+
<HR>
<A HREF="infiniband-howto-11.html">Next</A>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-11.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-11.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-11.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-11.html Tue Oct 13 14:22:04 2009
@@ -1,42 +1,33 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
- <TITLE>Infiniband HOWTO: Further Information</TITLE>
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
+ <TITLE>Infiniband HOWTO: Tips and Tricks</TITLE>
+ <LINK HREF="infiniband-howto-12.html" REL=next>
<LINK HREF="infiniband-howto-10.html" REL=previous>
<LINK HREF="infiniband-howto.html#toc11" REL=contents>
</HEAD>
<BODY>
-Next
+<A HREF="infiniband-howto-12.html">Next</A>
<A HREF="infiniband-howto-10.html">Previous</A>
<A HREF="infiniband-howto.html#toc11">Contents</A>
<HR>
-<H2><A NAME="s11">11.</A> <A HREF="infiniband-howto.html#toc11">Further Information</A></H2>
+<H2><A NAME="s11">11.</A> <A HREF="infiniband-howto.html#toc11">Tips and Tricks</A></H2>
-<P>Extensive documentation on the OFED software is present in the ofed-docs package.</P>
-<P>The openfabrics alliance webpage can be found here:</P>
-<P>
-<A HREF="http://www.openfabrics.org/">http://www.openfabrics.org/</A></P>
+<P>This section details an assortment of miscellaneous tips.</P>
+<H2><A NAME="ss11.1">11.1</A> <A HREF="infiniband-howto.html#toc11.1">Descriptive node names</A>
+</H2>
-<P>The following mailing lists are also useful:</P>
-<P>
-<A HREF="http://lists.alioth.debian.org/mailman/listinfo/pkg-ofed-devel">http://lists.alioth.debian.org/mailman/listinfo/pkg-ofed-devel</A>:
-pkg-ofed-devel: Discussion of debian specific problem or issues.</P>
+<P>You can give your hosts descriptive names by echoing text to the following file:
+<BLOCKQUOTE><CODE>
+<PRE>
+echo `uname -n` > /sys/class/infiniband/<driver>/node_desc
+</PRE>
+</CODE></BLOCKQUOTE>
+</P>
-<P>
-<A HREF="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general</A>:
-ofa-general: General discussion of the OFED software.</P>
-<P>Books:
-<PRE>
-Infiniband Network Architecture
-by MindShare, Inc.; Tom Shanley
-Publisher: Addison-Wesley Professional
-Pub Date: October 31, 2002
-Print ISBN-10: 0-321-11765-4
-</PRE>
-</P>
<HR>
-Next
+<A HREF="infiniband-howto-12.html">Next</A>
<A HREF="infiniband-howto-10.html">Previous</A>
<A HREF="infiniband-howto.html#toc11">Contents</A>
</BODY>
Added: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-12.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-12.html?rev=487&op=file
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-12.html (added)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-12.html Tue Oct 13 14:22:04 2009
@@ -1,0 +1,43 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
+<HTML>
+<HEAD>
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
+ <TITLE>Infiniband HOWTO: Further Information</TITLE>
+ <LINK HREF="infiniband-howto-11.html" REL=previous>
+ <LINK HREF="infiniband-howto.html#toc12" REL=contents>
+</HEAD>
+<BODY>
+Next
+<A HREF="infiniband-howto-11.html">Previous</A>
+<A HREF="infiniband-howto.html#toc12">Contents</A>
+<HR>
+<H2><A NAME="s12">12.</A> <A HREF="infiniband-howto.html#toc12">Further Information</A></H2>
+
+<P>Extensive documentation on the OFED software is present in the ofed-docs package.</P>
+<P>The OpenFabrics Alliance webpage can be found here:</P>
+<P>
+<A HREF="http://www.openfabrics.org/">http://www.openfabrics.org/</A></P>
+
+<P>The following mailing lists are also useful:</P>
+<P>
+<A HREF="http://lists.alioth.debian.org/mailman/listinfo/pkg-ofed-devel">http://lists.alioth.debian.org/mailman/listinfo/pkg-ofed-devel</A>:
+pkg-ofed-devel: Discussion of Debian-specific problems or issues.</P>
+
+<P>
+<A HREF="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general</A>:
+ofa-general: General discussion of the OFED software.</P>
+<P>Books:
+<PRE>
+Infiniband Network Architecture
+by MindShare, Inc.; Tom Shanley
+Publisher: Addison-Wesley Professional
+Pub Date: October 31, 2002
+Print ISBN-10: 0-321-11765-4
+</PRE>
+</P>
+<HR>
+Next
+<A HREF="infiniband-howto-11.html">Previous</A>
+<A HREF="infiniband-howto.html#toc12">Contents</A>
+</BODY>
+</HTML>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-2.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-2.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-2.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-2.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: Installing the OFED Software</TITLE>
<LINK HREF="infiniband-howto-3.html" REL=next>
<LINK HREF="infiniband-howto-1.html" REL=previous>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-3.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-3.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-3.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-3.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: Install the kernel modules</TITLE>
<LINK HREF="infiniband-howto-4.html" REL=next>
<LINK HREF="infiniband-howto-2.html" REL=previous>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-4.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-4.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-4.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-4.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: Setting up a basic infiniband network </TITLE>
<LINK HREF="infiniband-howto-5.html" REL=next>
<LINK HREF="infiniband-howto-3.html" REL=previous>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-5.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-5.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-5.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-5.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: IP over Infiniband (IPoIB)</TITLE>
<LINK HREF="infiniband-howto-6.html" REL=next>
<LINK HREF="infiniband-howto-4.html" REL=previous>
@@ -101,7 +101,7 @@
<P>In order to obtain maximum IPoIB throughput you may need to tweak the MTU and various kernel TCP buffer and window settings.
See the details in the ipoib_release_notes.txt document in the ofed-docs package.</P>
-<H2><A NAME="ss5.5">5.5</A> <A HREF="infiniband-howto.html#toc5.5">ARP and dual ported cards.</A>
+<H2><A NAME="ss5.5">5.5</A> <A HREF="infiniband-howto.html#toc5.5">ARP and dual ported cards</A>
</H2>
<P>If you have a dual ported card with both ports on the same IB subnet, but different IP subnets, you
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-6.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-6.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-6.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-6.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: OpenMPI</TITLE>
<LINK HREF="infiniband-howto-7.html" REL=next>
<LINK HREF="infiniband-howto-5.html" REL=previous>
@@ -74,7 +74,7 @@
<P>OpenMPI uses ssh to spawn jobs on remote hosts. You should configure a public/private keypair to ensure that you
can ssh between hosts without entering a password. You should also ensure that your login process is silent.</P>
-<H2><A NAME="ss6.6">6.6</A> <A HREF="infiniband-howto.html#toc6.6">Run the MPI PingPong benchmark.</A>
+<H2><A NAME="ss6.6">6.6</A> <A HREF="infiniband-howto.html#toc6.6">Run the MPI PingPong benchmark</A>
</H2>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-7.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-7.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-7.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-7.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: SDP</TITLE>
<LINK HREF="infiniband-howto-8.html" REL=next>
<LINK HREF="infiniband-howto-6.html" REL=previous>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-8.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-8.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-8.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-8.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: SRP</TITLE>
<LINK HREF="infiniband-howto-9.html" REL=next>
<LINK HREF="infiniband-howto-7.html" REL=previous>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-9.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-9.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-9.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto-9.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO: Building Lustre against OFED</TITLE>
<LINK HREF="infiniband-howto-10.html" REL=next>
<LINK HREF="infiniband-howto-8.html" REL=previous>
@@ -31,7 +31,7 @@
It is required for the next step.</P>
-<H2><A NAME="ss9.3">9.3</A> <A HREF="infiniband-howto.html#toc9.3">Build OFED modules for the lustre patched kernel.</A>
+<H2><A NAME="ss9.3">9.3</A> <A HREF="infiniband-howto.html#toc9.3">Build OFED modules for the lustre patched kernel</A>
</H2>
<P>Build OFED modules against the newly built lustre patched kernel.</P>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.html
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.html?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.html (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.html Tue Oct 13 14:22:04 2009
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
- <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.50">
+ <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.65">
<TITLE>Infiniband HOWTO</TITLE>
<LINK HREF="infiniband-howto-1.html" REL=next>
@@ -15,6 +15,9 @@
<H1>Infiniband HOWTO</H1>
<H2>Guy Coates </H2>
+<HR>
+<EM>This document describes how to install and configure the OFED infiniband software on Debian.</EM>
+<HR>
<P>
<H2><A NAME="toc1">1.</A> <A HREF="infiniband-howto-1.html">Introduction</A></H2>
@@ -57,7 +60,7 @@
<LI><A NAME="toc5.2">5.2</A> <A HREF="infiniband-howto-5.html#ss5.2">IP Configuration</A>
<LI><A NAME="toc5.3">5.3</A> <A HREF="infiniband-howto-5.html#ss5.3">Connected vs Unconnected Mode</A>
<LI><A NAME="toc5.4">5.4</A> <A HREF="infiniband-howto-5.html#ss5.4">TCP tuning</A>
-<LI><A NAME="toc5.5">5.5</A> <A HREF="infiniband-howto-5.html#ss5.5">ARP and dual ported cards.</A>
+<LI><A NAME="toc5.5">5.5</A> <A HREF="infiniband-howto-5.html#ss5.5">ARP and dual ported cards</A>
</UL>
<P>
<H2><A NAME="toc6">6.</A> <A HREF="infiniband-howto-6.html">OpenMPI</A></H2>
@@ -68,7 +71,7 @@
<LI><A NAME="toc6.3">6.3</A> <A HREF="infiniband-howto-6.html#ss6.3">Check permissions and limits</A>
<LI><A NAME="toc6.4">6.4</A> <A HREF="infiniband-howto-6.html#ss6.4">Install the mpi test programs</A>
<LI><A NAME="toc6.5">6.5</A> <A HREF="infiniband-howto-6.html#ss6.5">Configure Host Access</A>
-<LI><A NAME="toc6.6">6.6</A> <A HREF="infiniband-howto-6.html#ss6.6">Run the MPI PingPong benchmark.</A>
+<LI><A NAME="toc6.6">6.6</A> <A HREF="infiniband-howto-6.html#ss6.6">Run the MPI PingPong benchmark</A>
</UL>
<P>
<H2><A NAME="toc7">7.</A> <A HREF="infiniband-howto-7.html">SDP</A></H2>
@@ -91,17 +94,25 @@
<UL>
<LI><A NAME="toc9.1">9.1</A> <A HREF="infiniband-howto-9.html#ss9.1">Check Compatibility</A>
<LI><A NAME="toc9.2">9.2</A> <A HREF="infiniband-howto-9.html#ss9.2">Build a lustre patched kernel</A>
-<LI><A NAME="toc9.3">9.3</A> <A HREF="infiniband-howto-9.html#ss9.3">Build OFED modules for the lustre patched kernel.</A>
+<LI><A NAME="toc9.3">9.3</A> <A HREF="infiniband-howto-9.html#ss9.3">Build OFED modules for the lustre patched kernel</A>
<LI><A NAME="toc9.4">9.4</A> <A HREF="infiniband-howto-9.html#ss9.4">Configure lustre</A>
</UL>
<P>
-<H2><A NAME="toc10">10.</A> <A HREF="infiniband-howto-10.html">Network Troubleshooting</A></H2>
+<H2><A NAME="toc10">10.</A> <A HREF="infiniband-howto-10.html">Troubleshooting</A></H2>
<UL>
-<LI><A NAME="toc10.1">10.1</A> <A HREF="infiniband-howto-10.html#ss10.1">ibdiagnet</A>
+<LI><A NAME="toc10.1">10.1</A> <A HREF="infiniband-howto-10.html#ss10.1">General fabric troubleshooting</A>
+<LI><A NAME="toc10.2">10.2</A> <A HREF="infiniband-howto-10.html#ss10.2">ib_query_gid() failed errors on mlx4 platforms</A>
+<LI><A NAME="toc10.3">10.3</A> <A HREF="infiniband-howto-10.html#ss10.3">Missing XRC support</A>
</UL>
<P>
-<H2><A NAME="toc11">11.</A> <A HREF="infiniband-howto-11.html">Further Information</A></H2>
+<H2><A NAME="toc11">11.</A> <A HREF="infiniband-howto-11.html">Tips and Tricks</A></H2>
+
+<UL>
+<LI><A NAME="toc11.1">11.1</A> <A HREF="infiniband-howto-11.html#ss11.1">Descriptive node names</A>
+</UL>
+<P>
+<H2><A NAME="toc12">12.</A> <A HREF="infiniband-howto-12.html">Further Information</A></H2>
<HR>
<A HREF="infiniband-howto-1.html">Next</A>
Modified: trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.txt
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.txt?rev=487&op=diff
==============================================================================
--- trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.txt (original)
+++ trunk/ofed-docs/trunk/DEBIAN-HOWTO/infiniband-howto.txt Tue Oct 13 14:22:04 2009
@@ -1,7 +1,10 @@
- Infiniband Howto
+ Infiniband HOWTO
Guy Coates
- ____________________________________________________________
+
+ This document describes how to install and configure the OFED infini-
+ band software on Debian.
+ ______________________________________________________________________
Table of Contents
@@ -38,15 +41,15 @@
5.2 IP Configuration
5.3 Connected vs Unconnected Mode
5.4 TCP tuning
- 5.5 ARP and dual ported cards.
+ 5.5 ARP and dual ported cards
6. OpenMPI
6.1 Configure IPoIB
6.2 Load the modules
6.3 Check permissions and limits
6.4 Install the mpi test programs
- 6.5 Configure Hosts
- 6.6 Run the MPI PingPong benchmark.
+ 6.5 Configure Host Access
+ 6.6 Run the MPI PingPong benchmark
7. SDP
7.1 Configuration
@@ -62,13 +65,18 @@
9. Building Lustre against OFED
9.1 Check Compatibility
9.2 Build a lustre patched kernel
- 9.3 Build OFED modules for the lustre patched kernel.
+ 9.3 Build OFED modules for the lustre patched kernel
9.4 Configure lustre
- 10. Network Troubleshooting
- 10.1 ibdiagnet
-
- 11. Further Information
+ 10. Troubleshooting
+ 10.1 General fabric troubleshooting
+ 10.2 ib_query_gid() failed errors on mlx4 platforms
+ 10.3 Missing XRC support
+
+ 11. Tips and Tricks
+ 11.1 Descriptive node names
+
+ 12. Further Information
______________________________________________________________________
@@ -132,12 +140,11 @@
If you wish to build the OFED packages from the alioth svn repository,
use the following procedure.
-
2.2.1. Install the prerequisites development packages
- aptitude install svn-buildpackage build-essential devscripts
+ aptitude install svn-buildpackage build-essential devscripts
@@ -159,10 +166,19 @@
- Populate the tarballs with the *.orig.tar.gz files available form the
- "upstream source" release on
- https://alioth.debian.org/frs/?group_id=100311
- <https://alioth.debian.org/frs/?group_id=100311>
+ Original source tarballs can be downloaded from the repository:
+
+
+ apt-get source libibverbs
+
+
+
+ Alternatively, you can grab the source code directly from upstream.
+
+ http://www.openfabrics.org/downloads/OFED/
+
+ Upstream source is distributed via SRPMS; you can use alien to convert
+ them into tarballs.
2.2.4. Build the packages.
@@ -191,29 +207,35 @@
build order is:
-
- libibcm
- libibcommon
- libibumad
- libibmad
- libnes
- libsdp
- dapl
- opensm
- infiniband-diags
- ibutils
- mstflint
- perftest
- qlvnictools
- qpert
- rds-tools
- sdpnetstat
- srptools
- tvflash
- ibsim
- ofed-docs
- ofa_kernel
- ofed
+ libibverbs
+ libnes
+ libcxgb3
+ libipathverbs
+ libmlx4
+ libmthca
+ librdmacm
+ libibcm
+ libibcommon
+ libibumad
+ libibmad
+ libsdp
+ dapl
+ opensm
+ infiniband-diags
+ ibutils
+ mstflint
+ perftest
+ qlvnictools
+ qperf
+ rds-tools
+ sdpnetstat
+ srptools
+ tvflash
+ ibsim
+ mpitests
+ ofed-docs
+ ofa_kernel
+ ofed
@@ -241,6 +263,8 @@
set of modules rather than relying on the modules shipped with the
kernel.
+
+
3.1. Building new kernel modules
You can build new kernel modules using module-assistant.
@@ -253,17 +277,27 @@
Ensure you have the ofa-kernel-source package installed, and then run:
-
- module-assistant prepare
- module-assistant clean ofa-kernel
- module-assistant build ofa-kernel
-
-
-
- This will create a deb which you can then install. As the deb contains
- replacements for existing kernel modules you will need to either manu-
- ally remove any infiniband modules which have already been loaded, or
- reboot the machine, before you can use the new modules.
+ module-assistant prepare
+ module-assistant clean ofa-kernel
+ module-assistant build ofa-kernel
+
+
+
+ This procedure will create an ofa-kernel-modules deb in /usr/src. You
+ can then install the deb using dpkg or by running:
+
+
+ module-assistant install ofa-kernel
+
+
+
+ The deb can also be copied to your other infiniband hosts and
+ installed using dpkg.
+
+ As the deb contains replacements for existing kernel modules you will
+ need to either manually remove any infiniband modules which have
+ already been loaded, or reboot the machine, before you can use the new
+ modules.
The new kernel modules will be installed into /usr/lib/<kernel-
version>/updates. They will not overwrite the original kernel modules,
@@ -282,14 +316,16 @@
- Note that if you wish to rebuild the kernel modules (eg for a new
- kernel version) then you must issue the module-assistant clean command
- before trying a new build.
+ Note that if you wish to rebuild the kernel modules for any reason
+ (e.g. for a new kernel version or to continue an interrupted build),
+ you must issue the "module-assistant clean" command before trying a
+ new build.
4. Setting up a basic infiniband network
This section describes how to set up a basic infiniband network and
test its functionality.
+
4.1. Upgrade your Infiniband card and switch firmware
@@ -355,9 +391,10 @@
You can find the port GUIDs of your cards with the ibstat -p command:
- # ibstat -p
- 0x0002c9030002fb05
- 0x0002c9030002fb06
+
+ # ibstat -p
+ 0x0002c9030002fb05
+ 0x0002c9030002fb06
@@ -398,32 +435,32 @@
- # ibstat
- CA 'mlx4_0'
- CA type: MT25418
- Number of ports: 2
- Firmware version: 2.3.0
- Hardware version: a0
- Node GUID: 0x0002c9030002fb04
- System image GUID: 0x0002c9030002fb07
- Port 1:
- State: Active
- Physical state: LinkUp
- Rate: 20
- Base lid: 2
- LMC: 0
- SM lid: 1
- Capability mask: 0x02510868
- Port GUID: 0x0002c9030002fb05
- Port 2:
- State: Down
- Physical state: Polling
- Rate: 10
- Base lid: 0
- LMC: 0
- SM lid: 0
- Capability mask: 0x02510868
- Port GUID: 0x0002c9030002fb06
+ # ibstat
+ CA 'mlx4_0'
+ CA type: MT25418
+ Number of ports: 2
+ Firmware version: 2.3.0
+ Hardware version: a0
+ Node GUID: 0x0002c9030002fb04
+ System image GUID: 0x0002c9030002fb07
+ Port 1:
+ State: Active
+ Physical state: LinkUp
+ Rate: 20
+ Base lid: 2
+ LMC: 0
+ SM lid: 1
+ Capability mask: 0x02510868
+ Port GUID: 0x0002c9030002fb05
+ Port 2:
+ State: Down
+ Physical state: Polling
+ Rate: 10
+ Base lid: 0
+ LMC: 0
+ SM lid: 0
+ Capability mask: 0x02510868
+ Port GUID: 0x0002c9030002fb06
@@ -447,6 +484,7 @@
Ca : 0x0002c9030002fc10 ports 2 "MT25408 ConnectX Mellanox Technologies"
+
ibswitches will display all of the switches in the network.
@@ -459,32 +497,33 @@
network.
- #iblinkinfo.pl
- Switch 0x0008f104004121fa ISR9024D-M Voltaire:
- 1 1[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 2 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 2[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 13 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 3[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 4 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 4[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 26 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 5[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 27 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 6[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 24 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 7[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 28 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 8[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 25 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 9[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 31 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 10[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 32 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 11[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 33 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 12[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 29 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 1 13[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 30 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
- 14[ ] ==( 4X 2.5 Gbps Down / Polling)==> [ ] "" ( )
- 1 15[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 3 1[ ] "Voltaire HCA400Ex-D" ( )
- 1 16[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 10 1[ ] "Voltaire HCA400Ex-D" ( )
- 17[ ] ==( 4X 2.5 Gbps Down / Polling)==> [ ] "" ( )
- 18[ ] ==( 4X 2.5 Gbps Down / Polling)==> [ ] "" ( )
- 1 19[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 7 2[ ] "Voltaire HCA400Ex-D" ( )
- 1 20[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 6 2[ ] "Voltaire HCA400Ex-D" ( )
- 1 21[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 5 2[ ] "Voltaire HCA400Ex-D" ( )
- 1 22[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 21 1[ ] "Voltaire HCA400Ex-D" ( )
- 1 23[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 9 2[ ] "Voltaire HCA400Ex-D" ( )
- 1 24[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 8 1[ ] "Voltaire HCA400Ex-D" ( )
+
+ #iblinkinfo.pl
+ Switch 0x0008f104004121fa ISR9024D-M Voltaire:
+ 1 1[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 2 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 2[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 13 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 3[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 4 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 4[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 26 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 5[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 27 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 6[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 24 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 7[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 28 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 8[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 25 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 9[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 31 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 10[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 32 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 11[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 33 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 12[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 29 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 1 13[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 30 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
+ 14[ ] ==( 4X 2.5 Gbps Down / Polling)==> [ ] "" ( )
+ 1 15[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 3 1[ ] "Voltaire HCA400Ex-D" ( )
+ 1 16[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 10 1[ ] "Voltaire HCA400Ex-D" ( )
+ 17[ ] ==( 4X 2.5 Gbps Down / Polling)==> [ ] "" ( )
+ 18[ ] ==( 4X 2.5 Gbps Down / Polling)==> [ ] "" ( )
+ 1 19[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 7 2[ ] "Voltaire HCA400Ex-D" ( )
+ 1 20[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 6 2[ ] "Voltaire HCA400Ex-D" ( )
+ 1 21[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 5 2[ ] "Voltaire HCA400Ex-D" ( )
+ 1 22[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 21 1[ ] "Voltaire HCA400Ex-D" ( )
+ 1 23[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 9 2[ ] "Voltaire HCA400Ex-D" ( )
+ 1 24[ ] ==( 4X 5.0 Gbps Active / LinkUp)==> 8 1[ ] "Voltaire HCA400Ex-D" ( )
@@ -523,12 +562,12 @@
server.
- #ib_rdma_lat hostname-of-server
- local address: LID 0x0d QPN 0x18004a PSN 0xca58c4 RKey 0xda002824 VAddr 0x00000000509001
- remote address: LID 0x02 QPN 0x7c004a PSN 0x4b4eba RKey 0x82002466 VAddr 0x00000000509001
- Latency typical: 1.15193 usec
- Latency best : 1.13094 usec
- Latency worst : 5.48519 usec
+ #ib_rdma_lat hostname-of-server
+ local address: LID 0x0d QPN 0x18004a PSN 0xca58c4 RKey 0xda002824 VAddr 0x00000000509001
+ remote address: LID 0x02 QPN 0x7c004a PSN 0x4b4eba RKey 0x82002466 VAddr 0x00000000509001
+ Latency typical: 1.15193 usec
+ Latency best : 1.13094 usec
+ Latency worst : 5.48519 usec
@@ -572,27 +611,28 @@
#modprobe ib_ipoib
- You will now have an "ib" network interface for each of your
- infiniband cards.
-
-
- #ifconfig -a
-
- <snip>
- ib0 Link encap:UNSPEC HWaddr 80-06-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
- BROADCAST MULTICAST MTU:2044 Metric:1
- RX packets:0 errors:0 dropped:0 overruns:0 frame:0
- TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:256
- RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
-
- ib1 Link encap:UNSPEC HWaddr 80-06-00-49-FE-80-00-00-00-00-00-00-00-00-00-00
- BROADCAST MULTICAST MTU:2044 Metric:1
- RX packets:0 errors:0 dropped:0 overruns:0 frame:0
- TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:256
- RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
- <snip>
+ You will now have an "ib" network interface for each of your
+ infiniband cards.
+
+
+
+ #ifconfig -a
+
+ <snip>
+ ib0 Link encap:UNSPEC HWaddr 80-06-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
+ BROADCAST MULTICAST MTU:2044 Metric:1
+ RX packets:0 errors:0 dropped:0 overruns:0 frame:0
+ TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
+ collisions:0 txqueuelen:256
+ RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
+
+ ib1 Link encap:UNSPEC HWaddr 80-06-00-49-FE-80-00-00-00-00-00-00-00-00-00-00
+ BROADCAST MULTICAST MTU:2044 Metric:1
+ RX packets:0 errors:0 dropped:0 overruns:0 frame:0
+ TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
+ collisions:0 txqueuelen:256
+ RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
+ <snip>
@@ -648,10 +688,10 @@
details in the ipoib_release_notes.txt document in the ofed-docs
package.
- 5.5. ARP and dual ported cards.
-
- If you have a dual ported card with both ports on the same IB subnet
- but a different IP subnet, you will need to tweak the ARP settings for
+ 5.5. ARP and dual ported cards
+
+ If you have a dual ported card with both ports on the same IB subnet,
+ but different IP subnets, you will need to tweak the ARP settings for
the IPoIB interfaces. See ipoib_release_notes.txt in the ofed-docs
package for a full discussion of this issue.
@@ -702,7 +742,7 @@
OpenMPI will need to pin memory. Edit /etc/security/limits.conf and
add the line:
- * hard memlock unlimited
+ * hard memlock unlimited
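Changes to limits.conf only apply to new login sessions, so it is worth confirming the limit from a fresh shell before running MPI jobs. The following is a minimal sketch; the function name check_memlock is hypothetical and not part of any package.

```shell
#!/bin/bash
# Report whether the locked-memory hard limit is high enough for OpenMPI
# to pin memory. With no argument, the hard limit of the current shell
# is inspected; a value can be passed in explicitly for testing.
check_memlock() {
    local limit="${1:-$(ulimit -H -l)}"
    if [ "$limit" = "unlimited" ]; then
        echo "memlock hard limit: unlimited (ok)"
    else
        echo "memlock hard limit: ${limit} kB (expected unlimited)"
        return 1
    fi
}
```

Log in again on each test host and run check_memlock with no arguments. A non-zero exit status suggests the limits.conf entry has not taken effect for that session.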
6.4. Install the mpi test programs
@@ -712,36 +752,30 @@
aptitude install mpitests
- 6.5. Configure Hosts
+ 6.5. Configure Host Access
OpenMPI uses ssh to spawn jobs on remote hosts. You should configure a
public/private keypair to ensure that you can ssh between hosts
without entering a password. You should also ensure that your login
process is silent.
- Choose two hosts on which to test the program and put their hostnames
- into a file called hostfile:
-
-
-
- hostA slots=1
- hostB slots=1
-
-
-
- 6.6. Run the MPI PingPong benchmark.
+ 6.6. Run the MPI PingPong benchmark
We will use the MPI PingPong benchmark for our testing. By default,
openmpi should use infiniband networks in preference to any tcp
- networks it finds. However, we will force mpi to be extra-chatty
- during the test to ensure that we are really using the infiniband
- interfaces.
-
- (ADDME: Is there a better way to confirm which networks openmpi is
- using?)
-
-
- mpirun --mca btl_openib_verbose 1 --mca btl ^tcp -n 2 -hostfile /path/to/hostfile IMB-MPI1 PingPong
+ networks it finds. However, we will force mpi to ignore tcp networks
+ to ensure that it is using the infiniband network.
+
+
+ #!/bin/bash
+ #Infiniband MPI test program
+ #Edit the hosts below to match your test hosts
+ cat > /tmp/hostfile.$$.mpi <<EOF
+ hostA slots=1
+ hostB slots=1
+ EOF
+
+ mpirun --mca btl_openib_verbose 1 --mca btl ^tcp -n 2 -hostfile /tmp/hostfile.$$.mpi IMB-MPI1 PingPong
@@ -798,7 +832,19 @@
are connected via eth0).
- mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 --hostfile hostfile -n 2 IMB-MPI1 -benchmark PingPong
+ #!/bin/bash
+ #TCP MPI test program
+ #Edit the hosts below to match your test hosts
+ cat > /tmp/hostfile.$$.mpi <<EOF
+ hostA slots=1
+ hostB slots=1
+ EOF
+ mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 --hostfile /tmp/hostfile.$$.mpi -n 2 IMB-MPI1 -benchmark PingPong
+
+
+
+ You should notice significantly higher latencies than for the
+ infiniband test.
@@ -815,6 +861,7 @@
SDP uses IPoIB for address resolution, so you must configure IPoIB
before using SDP.
+
You should also ensure the ib_sdp kernel module is installed.
modprobe ib_sdp
@@ -867,23 +914,22 @@
123: 8388611 bytes 3 times --> 2941.76 Mbps in 21755.66 usec
-
Now repeat the test, but force netpipe to use SDP rather than TCP.
- nodeA# LD_PRELOAD=libsdp.so NPtcp
- nodeB# LD_PRELOAD=libsdp.so NPtcp -h 10.0.0.1
- Send and receive buffers are 16384 and 87380 bytes
- (A bug in Linux doubles the requested buffer sizes)
- Now starting the main loop
- 0: 1 bytes 9765 times --> 1.45 Mbps in 5.28 usec
- 1: 2 bytes 18946 times --> 2.80 Mbps in 5.46 usec
- 2: 3 bytes 18323 times --> 4.06 Mbps in 5.63 usec
- <snip>
- 121: 8388605 bytes 5 times --> 7665.51 Mbps in 8349.08 usec
- 122: 8388608 bytes 5 times --> 7668.62 Mbps in 8345.70 usec
- 123: 8388611 bytes 5 times --> 7629.04 Mbps in 8389.00 usec
+ nodeA# LD_PRELOAD=libsdp.so NPtcp
+ nodeB# LD_PRELOAD=libsdp.so NPtcp -h 10.0.0.1
+ Send and receive buffers are 16384 and 87380 bytes
+ (A bug in Linux doubles the requested buffer sizes)
+ Now starting the main loop
+ 0: 1 bytes 9765 times --> 1.45 Mbps in 5.28 usec
+ 1: 2 bytes 18946 times --> 2.80 Mbps in 5.46 usec
+ 2: 3 bytes 18323 times --> 4.06 Mbps in 5.63 usec
+ <snip>
+ 121: 8388605 bytes 5 times --> 7665.51 Mbps in 8349.08 usec
+ 122: 8388608 bytes 5 times --> 7668.62 Mbps in 8345.70 usec
+ 123: 8388611 bytes 5 times --> 7629.04 Mbps in 8389.00 usec
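To compare the two runs, you can pull the final throughput figure out of each log. The following is a minimal sketch, assuming each client run was saved to a file; the function names and the log file names tcp.log and sdp.log are hypothetical.

```shell
#!/bin/bash
# Print the throughput (Mbps) from the last result line of a NetPIPE log.
# Result lines look like:
#   123: 8388611 bytes 5 times --> 7629.04 Mbps in 8389.00 usec
final_mbps() {
    awk '/-->/ { for (i = 1; i < NF; i++) if ($(i + 1) == "Mbps") v = $i }
         END   { print v }' "$1"
}

# Print the ratio of SDP to TCP throughput.
# $1: log of the TCP run, $2: log of the SDP run
sdp_speedup() {
    awk -v tcp="$(final_mbps "$1")" -v sdp="$(final_mbps "$2")" \
        'BEGIN { printf "%.2f\n", sdp / tcp }'
}
```

For example, capture the client side of each run with "NPtcp -h 10.0.0.1 2>&1 | tee tcp.log" and "LD_PRELOAD=libsdp.so NPtcp -h 10.0.0.1 2>&1 | tee sdp.log", then run "sdp_speedup tcp.log sdp.log". With the figures shown above (2941.76 Mbps for TCP, 7629.04 Mbps for SDP) the ratio is about 2.59.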
@@ -1002,7 +1048,7 @@
wiki. Once you have built the kernel, keep the configured source tree.
It is required for the next step.
- 9.3. Build OFED modules for the lustre patched kernel.
+ 9.3. Build OFED modules for the lustre patched kernel
Build OFED modules against the newly built lustre patched kernel.
@@ -1030,18 +1076,70 @@
- 10. Network Troubleshooting
-
- 10.1. ibdiagnet
+ 10. Troubleshooting
+
+ This section covers general troubleshooting and commonly reported
+ problems.
+
+ 10.1. General fabric troubleshooting
The ibdiagnet program can be used to troubleshoot potential issues
with your infiniband fabric.
-
ibdiagnet -r
- 11. Further Information
+ 10.2. ib_query_gid() failed errors on mlx4 platforms
+
+ ibstat or opensm hangs and the following kernel messages are printed:
+
+
+
+ kernel: [ 78.170077] ib0: ib_query_gid() failed
+ kernel: [ 89.272789] ib0: ib_query_port failed
+
+
+
+ Fix: Load the mlx4_core module with the msi_x=0 option.
+
+
+ cat > /etc/modprobe.d/mlx4_core <<EOF
+ options mlx4_core msi_x=0
+ EOF
+
+ update-initramfs -u
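After rebooting (or reloading the module) you may want to confirm that the option was picked up. The following is a minimal sketch, assuming the driver exposes its parameters under /sys/module/mlx4_core/parameters; the function name show_msi_x is hypothetical.

```shell
#!/bin/bash
# Print the msi_x value the loaded mlx4_core module is using.
# $1: module parameter directory (defaults to the usual sysfs location;
#     this path is an assumption and may differ on your system)
show_msi_x() {
    local dir="${1:-/sys/module/mlx4_core/parameters}"
    if [ -r "$dir/msi_x" ]; then
        echo "msi_x=$(cat "$dir/msi_x")"
    else
        echo "msi_x not readable (is mlx4_core loaded?)" >&2
        return 1
    fi
}
```

Running show_msi_x with no arguments should print msi_x=0 once the fix above is in effect.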
+
+
+
+ 10.3. Missing XRC support
+
+ If you see error messages pertaining to missing support for XRC, it
+ means you have mismatched kernel modules and userspace libraries.
+
+
+ mlx4: There is a mismatch between the kernel and the userspace
+ libraries: Kernel does not support XRC. Exiting.
+
+
+
+ Fix: Make sure that you build and install the OFED kernel modules as
+ described in section X.
+
+ 11. Tips and Tricks
+
+ This section details an assortment of miscellaneous tips.
+
+ 11.1. Descriptive node names
+
+ You can give your hosts descriptive names by echoing text to the
+ following file:
+
+
+ echo `uname -n` > /sys/class/infiniband/<driver>/node_desc
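On hosts with more than one HCA, the same write can be applied to every device in a loop. The following is a minimal sketch; the function name set_node_desc and its optional arguments are hypothetical, and writing to sysfs will normally require root.

```shell
#!/bin/bash
# Write a description to the node_desc file of every infiniband device.
# $1: description text (defaults to the host name)
# $2: sysfs directory to scan (defaults to /sys/class/infiniband)
set_node_desc() {
    local desc="${1:-$(uname -n)}"
    local root="${2:-/sys/class/infiniband}"
    local f
    for f in "$root"/*/node_desc; do
        [ -e "$f" ] || continue    # glob did not match: no devices found
        echo "$desc" > "$f"
    done
}
```

Calling set_node_desc with no arguments names every device after the host, which matches the single-device command above.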
+
+
+
+ 12. Further Information
Extensive documentation on the OFED software is present in the
ofed-docs package.
@@ -1061,8 +1159,6 @@
general: General discussion of the OFED software.
Books:
-
-
Infiniband Network Architecture
by MindShare, Inc.; Tom Shanley