[buildd-tools-devel] Bug#604268: QEMU linux-user support

Loïc Minier lool at dooz.org
Mon Jan 24 00:20:34 UTC 2011


On Sun, Jan 23, 2011, Roger Leigh wrote:
> This should be sufficiently portable for Linux usage.  I am a little
> concerned that it might be fragile though.  How does the binfmt-misc
> code know which file to pick?  Can't we use the same mechanism and
> avoid this?

 binfmt-support ships update-binfmts, which imports the format
 descriptions under /usr/share/binfmts/* into /var/lib/binfmts/*; these
 end up enabled in the running kernel (see /proc/sys/fs/binfmt_misc) via
 update-binfmts --enable, which is run by /etc/init.d/binfmt-support.

 While we could try parsing binfmt-support's format or binfmt_misc's
 format, I think the best thing here would be to check with the
 binfmt-support maintainer.  I wonder whether binfmt-support is
 Debian/Ubuntu specific.  If it is, then poking the kernel format or
 trying to run a binary might be best.  If it's also used on other
 distros, perhaps we can get some command which tells us what the
 interpreter is for a specific binary.  If that makes sense to you, I
 can poke Colin about it, and perhaps open a bug report for the
 binfmt-support changes.
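
 To make the "poking the kernel format" option concrete, here is a
 rough sketch in Python.  It assumes the usual layout of the per-format
 files under /proc/sys/fs/binfmt_misc (an "enabled" line, then
 "interpreter", "offset", "magic" and "mask" lines); the function names
 and the 128-byte header size are made up for illustration.  It ignores
 extension-based entries and does not reproduce the kernel's matching
 order, so treat it as an approximation rather than the real lookup.

```python
from pathlib import Path

BINFMT_DIR = Path("/proc/sys/fs/binfmt_misc")
HEADER_SIZE = 128  # assumption: enough bytes to cover offset + magic


def parse_entry(text):
    """Parse the text of one binfmt_misc format file into a dict."""
    entry = {"enabled": False}
    for line in text.splitlines():
        if line == "enabled":
            entry["enabled"] = True
        elif line.startswith("interpreter "):
            entry["interpreter"] = line.split(" ", 1)[1]
        elif line.startswith("offset "):
            entry["offset"] = int(line.split()[1])
        elif line.startswith("magic "):
            entry["magic"] = bytes.fromhex(line.split()[1])
        elif line.startswith("mask "):
            entry["mask"] = bytes.fromhex(line.split()[1])
    return entry


def matches(entry, header):
    """Return True if the file header matches the entry's masked magic."""
    if not entry.get("enabled") or "magic" not in entry:
        return False
    magic = entry["magic"]
    mask = entry.get("mask", b"\xff" * len(magic))
    offset = entry.get("offset", 0)
    chunk = header[offset:offset + len(magic)]
    if len(chunk) < len(magic):
        return False
    # A byte matches when it equals the magic byte on every masked bit.
    return all((c ^ g) & m == 0 for c, g, m in zip(chunk, magic, mask))


def find_interpreter(binary, binfmt_dir=BINFMT_DIR):
    """Guess which registered interpreter would run `binary`, or None."""
    with open(binary, "rb") as f:
        header = f.read(HEADER_SIZE)
    for path in sorted(Path(binfmt_dir).iterdir()):
        if path.name in ("register", "status"):
            continue  # control files, not format entries
        entry = parse_entry(path.read_text())
        if matches(entry, header):
            return entry.get("interpreter")
    return None
```

 Calling find_interpreter() on a binary inside the chroot would then
 give the interpreter path to copy in, or None if nothing is
 registered for it.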

 (ptrace() might also allow finding out which interpreter is run, but
 that seems fragile too.)

> > > If you're unhappy with any of the names used, that's also trivial to
> > > change if you like.  (I'm fairly rubbish at naming things!)
> > 
> >  That's probably as good as what I would think of; qemu syscall
> >  emulation is usually named "CPU transparency" because it's basically a
> >  mapping of a flow of CPU instructions to another one, with syscall
> >  translation.  This is different from qemu machine emulation which
> >  emulates hardware; this is sometimes called simulation.  Upstream, the
> >  syscall emulation stuff is called "qemu-user emulation".
> 
> Would it be better to use "emulation=qemu-user" rather than just
> "qemu"?  It would allow addition of "qemu-system" at a later point,
> and also makes the distinction between the two.

 I'm happy with either; I don't know of any other implementation of this
 feature than qemu, and I can't think of any other name that we would
 use than "emulation".  So emulation=qemu is fine!  You could also name
 it emulation=binfmt-misc if we manage to get the information without
 special casing qemu  ;-)


 qemu-system-*: ah, I hope you don't mind if I share some thoughts
 here:
 * qemu-system-* can be managed much like a remote system over TCP/IP,
   with some commands to set up and tear down; I've seen software like
   Hudson deal with this by having two ways of controlling slaves:
   a) install a piece of software on the slave which connects to the
      master and gets orders (intrusive but clean);
   b) have the master connect to the slave over SSH and send commands /
      receive responses over that link.
   I think having simple pre-/post-commands would be fine to deal with
   such use.

 * There are some specificities that you could exploit to tune:
   - qemu-system-* often allows you to interact with consoles of the vm;
     I think that's what qemubuilder and "rootstock" use to interact
     with the guest.  That's nifty, but very specific to this use case.
   - if your guest has virtio support, you have more efficient network
     and block devices, can share memory, but you can also share
     filesystems!  This might be very efficient (more than NFS or scp),
     but again, very specific.

 * The complexity is really in getting a good qemu-system-foo
   environment working and up-to-date, including kernel, rootfs, cmdline
   opts, serial line setup etc.  This is a problem which can be solved
   separately, much like creating a chroot is a separate problem from
   using it.  In fact, even for qemu-system-arm, each machine might have
   a different boot mechanism: this board only supports booting from an
   SD image, this other one only from flash, that one does not have
   networking, etc.

 I found libvirt to be a really nice abstraction to control vms of
 various types; it worked fine for kvm, virtualbox and qemu based vms
 for me, and has a quite complete stack with some language bindings,
 higher level software, UIs etc.  libvirt allows defining additional
 types of vms, and I'm convinced we could define new types of vms to
 start qemu-system-arm with this arm kernel and these command-line flags
 etc.

> >  Ah it's actually qemu-kvm-extras-static in Ubuntu, not qemu-kvm-extras;
> >  I can see I'm at the origin of this bug
> > > +# Depends: file, qemu-user-static | qemu-kvm-extras
> >  Should be | qemu-kvm-extras-static
> If the package is not in Debian, maybe this one should be patched
> in when put in Ubuntu?

 It's not too good though, because it means a manual merge every time
 schroot is uploaded, which adds work and delays.  I could propose a
 dpkg-vendor based test to generate a ${qemu-user-suggests} or similar
 to use in control, but I think you'll find that uglier in your package
 than the above control snippet  ;-)
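
 For reference, a dpkg-vendor based test could look roughly like the
 Python sketch below.  It relies on "dpkg-vendor --derives-from", which
 exits 0 when the current vendor is or derives from the named one; the
 function names, the choice of exactly one package per vendor and the
 debian/schroot.substvars file name are assumptions for illustration
 only.

```python
import subprocess


def derives_from(vendor):
    """True if dpkg-vendor reports that we are, or derive from, `vendor`."""
    result = subprocess.run(["dpkg-vendor", "--derives-from", vendor])
    return result.returncode == 0


def qemu_user_substvar(on_ubuntu):
    """Build the substvars line defining ${qemu-user-suggests}."""
    package = "qemu-kvm-extras-static" if on_ubuntu else "qemu-user-static"
    return f"qemu-user-suggests={package}\n"


def write_substvar(path="debian/schroot.substvars"):
    """Append the vendor-specific line; the path is a hypothetical default."""
    with open(path, "a") as f:
        f.write(qemu_user_substvar(derives_from("Ubuntu")))
```

 debian/control would then reference ${qemu-user-suggests} in place of
 a hardcoded package name.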

 (NB: ideally, Ubuntu would use the same package layout as Debian; it's
 quite complex here and involves many packages and a long story; I'm
 happy to share the details)

>                                If the path wasn't hardcoded, we could
> have put it into /usr/local or even some non-standard name or location
> if we could adjust the interpreter.  It's a shame it's looking inside
> the chroot, rather than the main root, but that's probably the only
> sane thing to do now a system can have multiple namespaces.

 Yeah, it's a complex security problem: it's a kernel service, so
 looking up things in the PATH already sounds scary; the interpreter
 also ends up in userspace memory, so I would be scared if it were
 allowed to load stuff from outside the chroot (although that does seem
 useful).

> I think that clearly documenting this limitation is the most
> pragmatic approach here.  I think the chance of installing and
> using the same qemu-$arch-static binary inside the chroot is small,
> but using a diversion will be a good improvement.

 Makes complete sense.

-- 
Loïc Minier




