autopkgtest-build-lxd failing with bionic

Steve Langasek steve.langasek at ubuntu.com
Wed Feb 21 00:43:19 UTC 2018


Hi all,

On Tue, Feb 20, 2018 at 10:44:42PM +0100, Martin Pitt wrote:
> Steve Langasek [2018-02-16 11:12 -0800]:
> > > >   [ -n "$(ip route show to 0/0)" ]

> > > This is better though, and works too. Please take a look at the attached
> > > patch. Thanks! :-)

> > Actually no, this is racy, because the route comes up before DNS resolution
> > is in place.

> I'm not actually sure if network-online.target would actually guard against
> that with all implementations.

Then to be blunt, the definition of the target should be fixed in those
implementations so that it's not useless.

I understand and agree with the argument that modern services should be
robust in the face of intermittent networks.  But I don't agree that
network-online is "legacy" only for sysvinit compatibility, or that its
definition is too mushy to be useful.  For oneshot-style operations (such
as... things you want to do on a one-time basis on first boot of an
autopkgtest runner VM, without having to write a daemon around them that
listens to netlink), network-online.target is precisely the right semantic.

autopkgtest is *not* the only thing that cares about this.  The problem
should be solved once, well, in the systemd network stack, not pushed onto
the consumers to repeatedly reimplement poorly.

> But in practice, in most cases you'll get DNS either via static
> configuration (in which case there's nothing further to wait for) or via
> DHCP (in which case your address and DNS resolvers ought to arrive at the
> same time).

With systemd-networkd and systemd-resolved, we have genuinely seen races
in autopkgtests because the time between networkd applying the routes from
DHCP, and resolved applying the DNS settings from the same DHCP source, is
measurable.

Maybe our catching this race points to missing optimizations, but the race
will always remain: the route is always going to be configured before the
DNS in this setup, so if you're only watching for the route, there is a race.
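
To make the window concrete, here is a minimal shell sketch of the two
states. Only the `ip route show to 0/0` test comes from this thread; the
function names and the use of `getent hosts` as a resolver probe are
assumptions for illustration. It shows the race, not a recommended fix.

```sh
# Sketch only: all helper names below are hypothetical.

# True once an IPv4 default route exists (the check proposed earlier
# in this thread).
have_default_route() {
    [ -n "$(ip route show to 0/0)" ]
}

# True once the given host name can actually be resolved.
# getent looks the name up through NSS.
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Print which state we are in; "route-only" is the racy window in which
# a route-based check says "go" but name resolution still fails.
network_state() {
    if ! have_default_route; then
        echo no-route
    elif ! can_resolve "$1"; then
        echo route-only
    else
        echo ready
    fi
}
```

Calling `network_state` with one of the configured apt mirror host names
right after the route appears would be expected to print `route-only` for a
short time in the networkd + resolved setup described above.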

> And there's still the "apt retries several times" fallback (which is why I
> do see the initial apt failure, but the retry works).

But we have all the tools at our disposal to run apt at the /right/ time,
without polling or retrying, for maximum efficiency :)

> > It's also not forwards-compatible with ipv6-only deploys.

> Right now the container network config created by lxc/lxd/netplan assumes
> IPv4 only, so let's cross that bridge when we get to it.  Indeed adding an
> alternative `ip -6 show...` would easily rectify that.

But any way you slice it, you're encoding network policy information in the
autopkgtest runner that is appropriately the domain of the network
configuration manager.  You can't know, without evil introspection, whether
you're *supposed* to have a default route on ipv4, ipv6, or both.
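
A sketch of what a dual-stack version of the route check might look like,
which also shows where the policy decision leaks in. `ip -4` and `ip -6`
with `route show default` are ordinary iproute2 invocations; the function
names and the "either family is enough" rule are assumptions picked purely
for illustration.

```sh
# Sketch only: helper names are hypothetical.

have_default_route_v4() {
    [ -n "$(ip -4 route show default)" ]
}

have_default_route_v6() {
    [ -n "$(ip -6 route show default)" ]
}

# The policy choice is baked in right here: is one address family
# enough, or are both required? The script has no way to know what the
# network configuration intended, so "either" is just a guess.
have_default_route_any() {
    have_default_route_v4 || have_default_route_v6
}
```

Whichever rule is chosen in `have_default_route_any`, it is a guess about
configuration that the network manager already knows for certain.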

> > I think the network-online.target is the better thing to key on.

> I still don't like that much, though:
>   -  there is no requirement that this actually gets "implemented" or even
>      started (it's a passive target)

Right, which is addressed by the explicit call to 'systemctl start'
(granted, not pretty)

>   - it's supposed to be a SysV backwards compat shim for LSB's "network"
>     dependency, and not well-defined

From my POV, the sane definition is:

 - DNS setup is complete
 - all "required" network interfaces (implementation-defined) have completed
   their configuration
 - if no network interfaces are defined to be "required", then at least one
   interface is up

This is broad enough to encompass everything from VPNs to captive portals to
proxy-only networks, and provides a clear separation of responsibilities.

I'd be happy to discuss this somewhere more on-topic than autopkgtest-devel
:-)

>   - These tools should also work with Debian containers, which in theory
>     could also run sysvinit.  This is also the reason why they still use
>     `runlevel` instead of `systemctl is-system-running` or something
>     similar.

Sure, but in principle, once you've reached runlevel 2 under sysvinit you
can rely on the network being up because that's part of the definition of
the runlevel.  So the systemd code doesn't need to have a sysvinit
equivalent.
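
Roughly, the split I have in mind could be sketched as below. This is not
the actual autopkgtest code: the helper names are made up, the
`/run/systemd/system` test is the usual "is systemd PID 1" check, and
treating runlevels 2 to 5 as "up" is an assumption.

```sh
# Sketch only: helper names are hypothetical.

# True when systemd is the running init.
is_systemd() {
    [ -d /run/systemd/system ]
}

# True when `runlevel` reports a normal multi-user runlevel.
# Its output is "<previous> <current>", for example "N 2".
in_numbered_runlevel() {
    case "$(runlevel 2>/dev/null)" in
        *[2-5]) return 0 ;;
        *) return 1 ;;
    esac
}

wait_for_network() {
    if is_systemd; then
        # Nothing may have pulled the target into the boot transaction,
        # so start it explicitly; this blocks until the job finishes.
        systemctl start network-online.target
    else
        # sysvinit: reaching the runlevel implies networking was started.
        # A real script would want a timeout around this loop.
        until in_numbered_runlevel; do
            sleep 1
        done
    fi
}
```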

On Mon, Feb 19, 2018 at 09:42:25AM +0000, Iain Lane wrote:
> On Fri, Feb 16, 2018 at 08:15:35PM +0100, Julian Andres Klode wrote:
> > On Fri, Feb 16, 2018 at 11:12:32AM -0800, Steve Langasek wrote:
> > > [ … ]
> > > Actually no, this is racy, because the route comes up before DNS resolution
> > > is in place.
> > > 
> > > It's also not forwards-compatible with ipv6-only deploys.

> Fair point. I could add an `ip -6' equivalent.

> > > I think the network-online.target is the better thing to key on.

> Ho hum. Well then, I've now made patches for both ways. Can you and
> pitti please decide what is actually better between you? I'll not bother
> writing any more code until then. :-)

> BTW, while I experience a network-is-not-up race most times in current
> autopkgtest, I didn't experience it at all with pitti's suggestion so at
> least for me the race is won all of the time.

Martin's approach definitely *narrows* the race, but it's still there.

> I'm not sure if network-online.target's semantics are enough for this
> either?

network-online.target's semantics are either a) enough or b) critically
buggy, since other software depends on them the same way I propose that
autopkgtest would.

But notably, /lib/systemd/system/systemd-resolved.service declares
'Before=network-online.target' - because of issues found and fixed when it
did not.

(That's in Ubuntu.  It appears the Debian package has this as
'Before=network.target', which is strictly before network-online.target.)
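
For anyone who wants to check the ordering on their own system, a small
sketch follows. `systemctl show --property=Before` is a real invocation and
prints a single `Before=...` line; the `orders_before` helper name is made
up for this example.

```sh
# Sketch only: orders_before is a hypothetical helper name.
# Succeeds if unit $1 is ordered before unit $2 according to systemd's
# view of the unit (which includes implicit and default dependencies,
# not only what the unit file spells out).
orders_before() {
    systemctl show --property=Before "$1" \
        | tr ' =' '\n\n' \
        | grep -qxF "$2"
}

# Usage:
#   orders_before systemd-resolved.service network-online.target \
#       && echo "resolved is ordered before network-online.target"
```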

> > I think we should just grep the apt output and retry if it fails with
> > connection error messages. This should be fine until I have an improved
> > solution in apt itself, one of

> > (1) "there are no transient errors"
> > (2) one source must have updated
> > (3) all sources must have updated

> > Not sure on details. Could be an option for all three.

> autopkgtest calls apt all over the place. Some of them are covered by
> retry loops already, but I'm not super excited about hunting down the
> rest until we can do it in a clean way or, even better from my POV, if
> apt were to retry itself a few times before giving up.

> It seems sensible to me to try not to do work until we think the network is
> "up" enough to contact our apt sources anyway.

+1.

Cheers,
-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek at ubuntu.com                                     vorlon at debian.org