[Pkg-iscsi-maintainers] Bug#687619: Bug#687619: iscsitarget restart fails if more than 32 sessions try to reconnect

Laszlo Fekete blackluck at ktk.bme.hu
Tue Sep 18 09:48:21 UTC 2012


Hello!

Changing the INCOMING_MAX parameter in the source code solved the problem.

http://blog.wpkg.org/2007/09/09/solving-reliability-and-scalability-problems-with-iscsi/

I think it would be very helpful if this parameter could be set with a global
variable when the iscsitarget daemon starts, or if an error were logged when
the limit is reached. Even a simple warning in the init script when there are
more than 32 active sessions, saying that the restart may fail, would help.
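The suggestion above (a limit chosen when the daemon starts, plus an error log entry when it is reached) can be sketched roughly as follows. This is a hypothetical Python model, not iscsitarget's C code: the class IncomingSlots, its capacity argument and simulate_burst() are illustrative names. It assumes the limit behaves like a fixed table of slots for connections that are still logging in, which matches the 35-of-80 reconnect scenario described in this thread.

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
log = logging.getLogger("incoming-sketch")

# Compiled-in default discussed in this thread; here it is only a default
# that a (hypothetical) start-up option could override.
DEFAULT_INCOMING_MAX = 32


class IncomingSlots:
    """Hypothetical fixed-size table of connections that are still logging in."""

    def __init__(self, capacity: int = DEFAULT_INCOMING_MAX) -> None:
        self.capacity = capacity
        self.slots = [None] * capacity  # None marks a free slot

    def accept(self, conn: str) -> bool:
        """Store conn in the first free slot; log an error if none is free."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = conn
                return True
        log.error("incoming connection limit (%d) reached, %s must retry",
                  self.capacity, conn)
        return False

    def release(self, conn: str) -> None:
        """Free the slot once the login has finished (or failed)."""
        self.slots[self.slots.index(conn)] = None


def simulate_burst(capacity: int, reconnecting: int) -> int:
    """Return how many simultaneous logins find no free slot."""
    table = IncomingSlots(capacity)
    return sum(not table.accept(f"conn{i}") for i in range(reconnecting))


if __name__ == "__main__":
    print(simulate_burst(32, 35))   # 3 with the default limit
    print(simulate_burst(128, 35))  # 0 with a larger limit
```

In this model, with the default of 32, three of 35 simultaneous logins find no free slot, and each one produces a log line instead of failing silently. Raising the capacity at start-up removes the overflow.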

Regards, blackluck

On 2012. September 14. 23:48:27 Laszlo Fekete wrote:
> On 2012. September 15. 01:06:27 Ritesh Raj Sarraf wrote:
> > On Friday 14 September 2012 09:07 PM, Laszlo Fekete wrote:
> > >> Is there an error message/code?
> > > 
> > > This is in the initiator logs:
> > > Sep 13 14:40:09 mail01 iscsid: Kernel reported iSCSI connection 4:0 error (1020) state (3)
> > > Sep 13 14:40:20 mail01 iscsid: connection4:0 is operational after recovery (2 attempts)
> > 
> > So the connection did recover.
> 
> Yes, it recovers, but only because I restarted the iSCSI target another 1-5
> times after the first failed restart; the initiator just doesn't see any
> change when a target restart fails.
> The connection recovers only after a successful restart, but not every
> restart succeeds if more than 32 sessions try to recover within a short time.
> 
> > >> Why do you change it to 1? That's a very low value and will just flood
> > >> the target.
> > > 
> > > As I said, I'm using multipath, so I want a fast response to switch to
> > > the other path if there is a connection/session error. That's why I'm
> > > using
> > 
> > The multipath path checker loop triggers every 5 seconds.
> > 
> > > these values:
> > > node.session.timeo.replacement_timeout = 5
> > > node.session.err_timeo.abort_timeout = 5
> > > node.session.err_timeo.lu_reset_timeout = 5
> > > node.session.err_timeo.host_reset_timeout = 60
> > > node.session.iscsi.FastAbort = Yes
> > > node.session.iscsi.InitialR2T = No
> > > node.session.iscsi.ImmediateData = Yes
> > > node.session.iscsi.FirstBurstLength = 262144
> > > node.session.iscsi.MaxBurstLength = 16776192
> > > node.conn[0].timeo.logout_timeout = 5
> > > node.conn[0].timeo.login_timeout = 5
> > > node.conn[0].timeo.auth_timeout = 45
> > > node.conn[0].timeo.noop_out_interval = 1
> > > node.conn[0].timeo.noop_out_timeout = 1
> > > 
> > > But as I said, this also affected the initiators that don't use
> > > multipath and have the default open-iscsi values.
> > > 
> > > 
> > > There is an INCOMING_MAX limit of 32 in the source code, which I
> > > mentioned in a mail a few minutes before your last one; I hope you got
> > > it. I think that is the problem, and I will check it next week.
> > 
> > Okay!! Let me know what your findings are. From what you have shared up
> > till now, I don't see much of a problem with IET or open-iscsi.
> 
> The problem is that if there are more than 32 active connections when the
> iSCSI target is restarted, the restart may fail without any error in the
> logs; the initiators just keep trying to reconnect.
> 
> You may say I should raise the timeouts, but that's still a lottery. If I
> have 80 sessions when restarting the target and 35 of them try to reconnect
> at the same time, the restart will also fail, and there is no error message.
> 
> 
> I hope increasing the default INCOMING_MAX setting of 32 in the source code
> will solve the problem. (I'm going to test this next week.)
> 
> If you say this isn't a bug, that's fine, because this is a limit in the
> source code (if it really is the problem) and can't be configured
> dynamically. But this wasn't clear to me, and I spent 4 days debugging
> before I even suspected that there might be a 32 limit somewhere.
> 
> So maybe a warning message in the init script would be helpful when there
> are more than 32 active sessions, or an error log entry when the
> INCOMING_MAX limit is reached.
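The init-script check suggested above could look roughly like the following. It is a Python sketch, not part of the iscsitarget package. It assumes IET's /proc/net/iet/session layout, where each session appears as an indented "sid:" line under its target. The helper names count_sessions() and restart_warning() are hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional

INCOMING_MAX = 32  # compiled-in limit discussed in this thread
SESSION_FILE = Path("/proc/net/iet/session")  # assumed IET proc file


def count_sessions(text: str) -> int:
    """Count sessions, assuming one indented 'sid:...' line per session."""
    return sum(1 for line in text.splitlines() if line.lstrip().startswith("sid:"))


def restart_warning(sessions: int, limit: int = INCOMING_MAX) -> Optional[str]:
    """Return a warning if a restart could exceed the incoming-login limit."""
    if sessions > limit:
        return (f"warning: {sessions} active iSCSI sessions exceed "
                f"INCOMING_MAX={limit}; the restart may fail while they reconnect")
    return None


def main() -> int:
    try:
        text = SESSION_FILE.read_text()
    except OSError:
        return 0  # target not running or file absent: nothing to check
    message = restart_warning(count_sessions(text))
    if message:
        print(message, file=sys.stderr)
    return 0  # only warn; never block the restart


if __name__ == "__main__":
    sys.exit(main())
```

An init script could run this before stopping the daemon. Because it only prints a warning, it never prevents the restart itself.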
