[Pkg-iscsi-maintainers] Bug#687619: Bug#687619: iscsitarget restart fail if more than 32 session try reconnect

Fri Sep 14 21:48:27 UTC 2012

On 2012. September 15. 01:06:27 Ritesh Raj Sarraf wrote:
> On Friday 14 September 2012 09:07 PM, Laszlo Fekete wrote:
> >> Is there an error message/code ?
> > 
> > This is in the initiator logs:
> > Sep 13 14:40:09 mail01 iscsid: Kernel reported iSCSI connection 4:0 error
> > (1020) state (3)
> > Sep 13 14:40:20 mail01 iscsid: connection4:0 is operational after recovery
> > (2 attempts)
> 
> So the connection did recover.
Yes, it recovers, but only because I restarted the iSCSI target another 1-5
times after the first failed restart. The initiator simply doesn't see any
change when a target restart fails.
The connection recovers only after a successful restart, and not every restart
succeeds if more than 32 sessions try to recover within a short time.

> 
> >> Why do you change it to 1 ? That's a very low value and will just flood
> >> the target.
> > 
> > As I said, using multipath, so want a fast response if there is a
> > connection/session error to change to the other path. That's why I'm using
> 
> The multipath path checker loop triggers every 5 seconds.
> 
> > these values:
> > node.session.timeo.replacement_timeout = 5
> > node.session.err_timeo.abort_timeout = 5
> > node.session.err_timeo.lu_reset_timeout = 5
> > node.session.err_timeo.host_reset_timeout = 60
> > node.session.iscsi.FastAbort = Yes
> > node.session.iscsi.InitialR2T = No
> > node.session.iscsi.ImmediateData = Yes
> > node.session.iscsi.FirstBurstLength = 262144
> > node.session.iscsi.MaxBurstLength = 16776192
> > node.conn[0].timeo.logout_timeout = 5
> > node.conn[0].timeo.login_timeout = 5
> > node.conn[0].timeo.auth_timeout = 45
> > node.conn[0].timeo.noop_out_interval = 1
> > node.conn[0].timeo.noop_out_timeout = 1
> > 
> > But as I said, this also affected to that initiators which don't use
> > multipath and had the default open-iscsi values.
> > 
> > 
> > There is an INCOMING_MAX 32 limit in the source, that wrote few minutes
> > before your last mail, hope you got that, I think that will be the
> > problem and will check it next week.
> 
> Okay!! Let me know what your findings are. From what you have shared up
> till now, I don't see much a problem with IET or open-iscsi.

The problem is that if there are more than 32 active connections when the
iSCSI target is restarted, the restart may fail without any error in the logs;
the initiators just keep trying to reconnect.

You can tell me to raise the timeouts, but that is still a lottery. If I have
80 sessions when restarting the target and 35 of them try to reconnect at the
same time, it will also fail, and there is no error message.
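
To illustrate the arithmetic behind that "lottery", here is a toy model. It is
only a sketch under the assumption that pending logins are kept in a fixed
table of INCOMING_MAX slots and that anything beyond it is turned away; the
function name is hypothetical and nothing here is taken from the IET source.

```python
# Toy model, NOT the IET implementation: a fixed table of pending-login
# slots, assumed to behave like an INCOMING_MAX-sized array.
INCOMING_MAX = 32  # the compile-time limit mentioned in this thread


def admit_reconnects(n_reconnecting, incoming_max=INCOMING_MAX):
    """Return (accepted, dropped) for a burst of simultaneous reconnects."""
    accepted = min(n_reconnecting, incoming_max)
    dropped = n_reconnecting - accepted
    return accepted, dropped


# Whether a restart works depends on how many initiators retry at once,
# not on the total number of sessions.
for burst in (20, 32, 35, 80):
    accepted, dropped = admit_reconnects(burst)
    print(f"{burst} reconnects: {accepted} accepted, {dropped} without a slot")
```

Raising the timeouts only changes how likely it is that more than 32
initiators retry in the same window; it does not remove the limit.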

I hope increasing the default INCOMING_MAX 32 setting in the source code will 
solve the problem. (Next week I'm going to test this.)

If you say this isn't a bug, that's fine, because it is a limit in the source
code (if that really is the problem) and can't be configured dynamically.
But this wasn't clear to me, and I spent 4 days debugging only to suspect
that maybe there is a limit of 32 somewhere.

So maybe a warning message in the init script would be helpful if there are
more than 32 active sessions, or an error log entry saying that the
INCOMING_MAX limit was reached.
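
A rough sketch of the kind of check I mean, written in Python rather than as
part of the real init script. It assumes the session list is readable at
/proc/net/iet/session and that each session is a line beginning with "sid:"
after its indentation; both are assumptions that should be verified against
the installed version. The function names are made up for this example.

```python
# Sketch of a pre-restart warning; not part of the iscsitarget package.
INCOMING_MAX = 32  # compile-time limit discussed in this thread
SESSION_FILE = "/proc/net/iet/session"  # assumed location of the session list


def count_sessions(text):
    """Count lines that describe a session (start with 'sid:' once stripped)."""
    return sum(1 for line in text.splitlines() if line.strip().startswith("sid:"))


def warning_for(n_sessions, limit=INCOMING_MAX):
    """Return a warning string if n_sessions exceeds limit, else None."""
    if n_sessions > limit:
        return (f"WARNING: {n_sessions} active sessions exceed INCOMING_MAX "
                f"({limit}); reconnects after a restart may fail")
    return None


def check(path=SESSION_FILE):
    """Read the session list and return a warning string or None."""
    with open(path) as fh:
        return warning_for(count_sessions(fh.read()))
```

An init script could call check() before stopping the daemon and print the
returned message, so the administrator at least sees why a restart might not
bring every initiator back.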