[Pkg-haproxy-maintainers] Issue with Haproxy Reload

Joseph Lynch joe.e.lynch at gmail.com
Fri Sep 11 07:42:28 UTC 2015


Hi Rajeev,

> We are using HAProxy on top of a Mesos cluster, and we do dynamic reloads of HAProxy based on Marathon events (50-100 times a day). We have nearly 300 applications running on Mesos (300 virtual hosts in HAProxy).

That should be very doable; for context, we reload HAProxy thousands
of times per day and have around the same number of services. We do
leverage our improvements to https://github.com/airbnb/synapse to
minimize the number of reloads we have to do, but Marathon is good
at making us reload. Just curious: how do you have HAProxy deployed?
Is it running on a centralized machine somewhere, or on every host?

> When we do dynamic reloads, HAProxy takes a long time to reload; we observed that for 50 applications it takes 30-40 seconds to reload HAProxy.

This seems very surprising to me unless you're doing something like
SSL. Can you post a portion of your config?
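
If you want to narrow down where that time goes, one quick check is
to time a bare config parse; -c only validates the configuration and
exits, so if this alone takes tens of seconds the problem is config
parsing (e.g. loading SSL certificates) rather than the socket
handoff itself:

  time /usr/sbin/haproxy -c -f /etc/haproxy/haproxy.cfg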

> We have a single config file for HAProxy; when we reload, all the applications (frontends) get reloaded, and this causes downtime for all applications. Is there any way to reduce the downtime and the impact on end users?
>
> We tried this scenario,
> "http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html"
>
> With this, if a user sends requests during a reload, the requests are queued and served after the reload.

Full disclaimer: I wrote that post, and I'm not sure it will be all
that useful to you if your clients are external or your reloads take
more than 30s. As the post says, "The largest drawback is that this
works only for outgoing links and not for incoming traffic." It
would theoretically not be hard to extend it to incoming traffic
using ifb, but I haven't worked on actually proving out that
solution. If the reload takes more than 30s, that technique simply
won't work (you'll be buffering connections for 30s, and likely
dropping them). If the 30s reloads are unavoidable, you will likely
want to consider one of the alternative strategies mentioned in the
post. For example, you can just drop SYNs during the reload, since
the ~1s retransmit penalty isn't that big of a deal (you will still
see 30s+ of unavailability); put nginx/haproxy in front of haproxy
to route around reloads (can be a bit confusing and hard to work
with); or build something similar to
http://inside.unbounce.com/product-dev/haproxy-reloads/ (be wary
that you pay the conntrack cost with a solution like that). A rough
sketch of the SYN-drop option is below.

> But if we do multiple reloads one after another, old HAProxy processes persist even after reloading the HAProxy service, and this is causing a serious issue.
>
> root      7816  0.1  0.0  20024  3028 ?        Ss   03:52   0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 6778
> root      7817  0.0  0.0  20024  3148 ?        Ss   03:52   0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 6778
>
> Is there any solution to stop the previous process once it has finished serving its requests?

That is expected behaviour afaik. Those processes are likely still
alive because there are still open connections held against them.
How long is the longest timeout on your backend servers? This is
common with long-lived TCP mode backends, but those apps are often
resilient to losing the TCP connection, so you may just be able to
kill the old haproxy instances (it's what we do); a sketch of that
is below.
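
If you do go the kill-them route, a rough sketch, assuming a
single-process setup where -p /var/run/haproxy.pid holds only the
pid from the most recent reload (with nbproc > 1 the file holds
several pids, so adjust accordingly):

  # Pid written by the latest reload.
  CURRENT=$(cat /var/run/haproxy.pid)
  # SIGTERM every other haproxy process, i.e. the old ones still
  # draining connections.
  for pid in $(pidof haproxy); do
      [ "$pid" != "$CURRENT" ] && kill "$pid"
  done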

> Can we separate the configurations based on frontends, like in Nginx, so that only those apps are affected if there is a change in a backend?

There is nothing that stops you from running multiple haproxy
instances that bind to different ports, as sketched below. I think
the right place to start, though, is figuring out why reloading
takes so long, which can probably be determined by looking at the
config.
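
For completeness, a minimal sketch of the split-instance approach
(the config paths, pid files, and grouping are assumptions): each
group of frontends gets its own config and pid file, so reloading
one group never touches the others:

  /usr/sbin/haproxy -f /etc/haproxy/group-a.cfg \
      -p /var/run/haproxy-a.pid -D
  /usr/sbin/haproxy -f /etc/haproxy/group-b.cfg \
      -p /var/run/haproxy-b.pid -D
  # Reload only group A; group B's frontends keep running.
  /usr/sbin/haproxy -f /etc/haproxy/group-a.cfg \
      -p /var/run/haproxy-a.pid -D -sf $(cat /var/run/haproxy-a.pid)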

Good luck,
-Joey


