[Pkg-lustre-maintainers] Add fix for bug 13614

Niklas Edmundsson nikke at acc.umu.se
Thu Sep 27 14:28:43 UTC 2007


Hi again!

I would suggest adding the fix for bug 13614. Really, it's a kludge 
until the core problem is fixed, but it mitigates the problem enough 
to be useful.

We easily triggered it when rebooting both our OSTs at the same time. 
When Lustre went into recovery it more or less hammered itself to 
death, resulting in soft lockups and related nastiness. With the patch 
applied we haven't been able to trigger it.
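
To illustrate the idea behind the kludge, here is a minimal sketch in 
plain C. It is not Lustre code: only the field name rq_no_resend is 
taken from the attached patch, while struct sketch_request, 
sketch_send() and sketch_send_with_resend() are hypothetical names made 
up for this example. It assumes a sender that keeps retrying a callback 
to a peer that never answers, and shows how a no-resend flag caps that 
at a single attempt.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a server-to-client callback request.
 * Only the rq_no_resend name comes from the patch; the rest is invented. */
struct sketch_request {
    unsigned int rq_no_resend : 1; /* 1 = give up after the first failure */
    int sends;                     /* number of send attempts so far */
};

/* Pretend transport (assumption): the peer is busy recovering and
 * never answers, so every send fails. */
static bool sketch_send(struct sketch_request *req)
{
    req->sends++;
    return false;
}

/* Send once, then resend up to max_resends times on failure,
 * unless the request is marked rq_no_resend.
 * Returns the total number of send attempts. */
static int sketch_send_with_resend(struct sketch_request *req, int max_resends)
{
    for (int attempt = 0; attempt <= max_resends; attempt++) {
        if (sketch_send(req))
            break;              /* delivered, nothing more to do */
        if (req->rq_no_resend)
            break;              /* flagged: do not add to the load */
    }
    return req->sends;
}
```

With max_resends set to 5, an unflagged request is sent six times in 
this model, while a request with rq_no_resend = 1 is sent once. Whether 
the real resend path behaves like this loop is an assumption; the sketch 
only shows why fewer resends would mean less load during recovery.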

Attached is a version adapted for pkg-lustre trunk (mainly 
formatting).

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke at acc.umu.se
---------------------------------------------------------------------------
  "Mr Garibaldi would be delighted."--Garibaldi
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-------------- next part --------------
Index: trunk/debian/patches/bug13614-kludge-stop-resend.dpatch
===================================================================
--- trunk/debian/patches/bug13614-kludge-stop-resend.dpatch	(revision 0)
+++ trunk/debian/patches/bug13614-kludge-stop-resend.dpatch	(revision 0)
@@ -0,0 +1,25 @@
+#! /bin/sh /usr/share/dpatch/dpatch-run
+##
+## All lines beginning with `## DP:' are a description of the patch.
+## DP: CFS bug 13614 - Reduce problem by stopping resend.
+
+@DPATCH@
+diff -u -p -r1.189.34.11 ldlm_lockd.c
+--- ./lustre/ldlm/ldlm_lockd.c	16 Aug 2007 01:22:37 -0000	1.189.34.11
++++ ./lustre/ldlm/ldlm_lockd.c	12 Sep 2007 01:35:58 -0000
+@@ -566,6 +566,7 @@ int ldlm_server_blocking_ast(struct ldlm
+         req->rq_async_args.pointer_arg[0] = arg;
+         req->rq_async_args.pointer_arg[1] = lock;
+         req->rq_interpret_reply = ldlm_cb_interpret;
++        req->rq_no_resend = 1;
+ 
+         lock_res(lock->l_resource);
+         if (lock->l_granted_mode != lock->l_req_mode) {
+@@ -664,6 +665,7 @@ int ldlm_server_completion_ast(struct ld
+         req->rq_async_args.pointer_arg[0] = arg;
+         req->rq_async_args.pointer_arg[1] = lock;
+         req->rq_interpret_reply = ldlm_cb_interpret;
++        req->rq_no_resend = 1;
+ 
+         body = lustre_msg_buf(req->rq_reqmsg, DLM_LOCKREQ_OFF, sizeof(*body));
+         body->lock_handle[0] = lock->l_remote_handle;

