Bug#800845: autopkgtest: Add support for nested VMs

Christian Seiler christian at iwakd.de
Mon Oct 5 15:36:22 UTC 2015


Hi Martin,

On 2015-10-05 16:20, Martin Pitt wrote:
> Christian Seiler [2015-10-04 13:51 +0200]:
>> as per our discussion on the autopkgtest mailing list [1], I'd like to
>> be able to have autopkgtest support nested VMs for testing of network
>> clients in the kernel such as NFS, CIFS, iSCSI, NBD, etc.
>
> Back then I actually had that mostly working, except that the nested
> qemu was really unstable. But in later releases QEMU got much better,
> and your trick to set the emulated CPU to the host CPU might just be
> the remaining brick :-)

It's not really necessary to make QEMU work. If you don't set the CPU
type properly, QEMU will emulate a generic CPU without the instruction
set that's required for KVM, so the kernel inside the VM will simply
think the CPU doesn't support KVM and /dev/kvm won't be available. Then
QEMU will simply fall back to non-KVM usage, which is slower, but still
works.

Note that QEMU-inside-KVM is bad albeit still acceptable, while
QEMU-inside-QEMU is horrendous when it comes to performance.
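
A minimal sketch of that fallback, assuming a hypothetical helper
accel_args() (the name and the injectable "exists" parameter are made up
for illustration and are not part of adt-virt-qemu). It only requests KVM
when the device node is present; otherwise QEMU is left to plain
emulation:

```python
import os


def accel_args(kvm_device="/dev/kvm", exists=os.path.exists):
    """Return the QEMU acceleration arguments for this environment.

    Hypothetical helper: if the KVM device node is missing (for example
    because the emulated CPU lacks the virtualization extensions), return
    no arguments so that QEMU runs with slower, software-only emulation.
    """
    if exists(kvm_device):
        return ["-enable-kvm"]
    return []
```

Passing "exists" as a parameter is only there so the logic can be
exercised without a real /dev/kvm.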

>>  - have a copy of the qcow2 base image in the test environment
>
> OOI, is this merely an implementation detail, or do you really need
> qcow2? If we'd export this as a block device (symlink /dev/qemu_image
> -> /dev/xdb or whatever it will actually end up like), you avoid
> introducing an assumption about the "outside" image format. This block
> device would be readonly, so that you can only use it in overlay mode
> for the nested QEMU.

Ok, this is completely my bad. I thought that the base image for a
qcow2 overlay had to be a qcow2 image itself - now that I've checked,
that's obviously not the case. Sorry about that.

So yes, I agree that we should just let QEMU interpret the base image
and export the drive directly.
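
For illustration, a sketch of how a test could build the qemu-img call
that puts a qcow2 overlay on top of a non-qcow2 base. The function name
and the example paths are assumptions; the backing_file and backing_fmt
creation options are the usual qemu-img ones. The command is only built
here, not executed:

```python
def overlay_create_command(base_image, overlay_path, base_format="raw"):
    """Build a qemu-img command that creates a qcow2 overlay.

    Hypothetical helper. The overlay itself is qcow2, while the backing
    image may be in another format (raw by default here). Paths that
    contain commas would need escaping in the -o option string; this
    sketch does not handle that.
    """
    return [
        "qemu-img", "create", "-f", "qcow2",
        "-o", "backing_file=%s,backing_fmt=%s" % (base_image, base_format),
        overlay_path,
    ]
```

The resulting list could be handed to subprocess.check_call() inside the
test, e.g. overlay_create_command("/dev/vdb", "/tmp/overlay.qcow2").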

>>     Note that nested KVM will also require a module option to be set.
>>     (nested=1 for either the kvm_intel or kvm_amd module.) Setting
>>     -cpu host has no negative side effects even if nested=0 is set on
>>     the host - then the kvm_* modules will not load in the VM and KVM
>>     will simply be not available - same as without -cpu host.
>
> Where does this need to be set, on the host, or in the outer QEMU
> instance? The patch doesn't cover this bit, and it should at least be
> documented.

On the bare metal host. Just create e.g. /etc/modprobe.d/nested_kvm.conf
with the following contents:

   options kvm_intel nested=1
   options kvm_amd   nested=1
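
Whether the option took effect can be read back from sysfs once the
module is loaded. A small sketch, assuming the usual
/sys/module/<module>/parameters/nested location; the function name and
the sysfs_root parameter are made up for illustration:

```python
import os


def nested_kvm_enabled(sysfs_root="/sys/module"):
    """Report whether a loaded KVM vendor module has nesting enabled.

    Hypothetical helper. Depending on the kernel version the parameter
    reads as "1" or "Y" when enabled, so both spellings are accepted.
    Returns False if neither module is loaded.
    """
    for module in ("kvm_intel", "kvm_amd"):
        path = os.path.join(sysfs_root, module, "parameters", "nested")
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:
            continue  # module not loaded, try the other vendor
        return value in ("1", "Y", "y")
    return False
```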

I don't know about s390 virtualization (the other supported KVM
platform), nor about ARM (where people were talking about supporting
KVM eventually, but that isn't complete yet as far as I know).

>> --- a/doc/README.package-tests.rst
>> +++ b/doc/README.package-tests.rst
>> @@ -202,6 +202,26 @@ needs-recommends
>>      Enable installation of recommended packages in apt for the test
>>      dependencies. This does not affect build dependencies.
>>
>> +needs-qcow2-baseimage
>> +    The test needs to have a read-only qcow2 base image available so it
>> +    may create an overlay and start a qemu/KVM virtual machine inside
>> +    the test environment.
>
> This is way too specific a restriction for me to want to make it
> official and unchangeable API for eternity. If you want to do the same
> with LXC or nspawn etc., we'd have to come up with similarly
> complicated and specific names.
>
> Something like "needs-nesting" would be more general, but it's still
> not sufficient to tell the testbed everything it needs to know, in
> particular where the base image is. With QEMU and LXC exporting a
> block device to the testbed should work. But unless you are
> super-careful, tests which make use of that will be written for one
> specific adt runner (i. e. -qemu).
>
> Can we start small with not declaring a particular restriction (and
> corresponding test bed capability), and instead the test just skips
> itself if it doesn't find $ADT_QCOW2_BASEIMAGE, or /dev/qemu_image,
> etc.?

Ok, so skip the QCOW2, see above. But having a restriction is a good
thing here, because unless I'm completely mistaken, autopkgtest will
only consider a test skipped if a restriction can't be fulfilled. This
means that if I just do "yay, exit 0" in the outer KVM if the base
image is not present, then the test will be marked as succeeded and we
won't actually get the information that the test was in fact skipped.
(If I'm wrong about that, sorry, please correct me.)

But what about "needs-qcow2-baseimage" -> "needs-baseimage" (+ not
qcow2 but raw format, see above)? This guarantees that either a file or
a block device is present that holds a disk image (including partition
table) usable as a base image for VMs and/or containers. With
systemd-nspawn you can actually boot disk images as containers, so I
think this is sufficiently generic.

Btw. technically you wouldn't even need to skip the test if the image
is not there, you could just install vmdebootstrap and create a new
image inside the VM - the only problem is that that will take forever.

>> --- a/virt-subproc/adt-virt-qemu
>> +++ b/virt-subproc/adt-virt-qemu
>> @@ -80,6 +80,8 @@ def parse_args():
>>                          help='Enable debugging output')
>>      parser.add_argument('--qemu-options',
>>                          help='Pass through arguments to QEMU command.')
>> +    parser.add_argument('--nested-qcow2-baseimage',
>> +                        help='qcow2 VM base image for use inside the VM (nested VMs)')
>
> I don't think we need this. This should be really cheap to set up, so
> just always make it available.

Ok, so the reason the option is there is not what you think.

Default behavior with my patch:

   adt-virt-qemu /path/to/image

That will always export /path/to/image as /dev/vdb (second drive) in
addition to the standard overlay being /dev/vda.

The problem that I tried to address with this option is that you can
specify multiple images with adt-virt-qemu, and all further images are
mounted read-only. And if there are additional images specified, my
patch disables the base image exporting unless one is explicitly
specified on the command line, because I don't know how those
additional images are required for the system - but I think it's a bad
idea to make it too complicated for the test, so the way I see it is
that a test should always see just a single image.
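
Roughly, the selection rule described above looks like the following
sketch. The function names are made up for illustration and this is not
the code from the patch; only the /dev/vd%c naming follows the hunk
quoted further down:

```python
def nested_baseimage(images, explicit=None):
    """Pick the image to export read-only for nested VMs, or None.

    Hypothetical helper: an explicitly given base image always wins; with
    exactly one image on the command line that image is exported; with
    several images and no explicit choice, nothing is exported.
    """
    if explicit is not None:
        return explicit
    if len(images) == 1:
        return images[0]
    return None


def nested_device(images):
    """Guest device name of the extra drive, one letter past the last image."""
    return "/dev/vd%c" % chr(ord("a") + len(images))
```

With a single image this yields /dev/vdb, matching the default behavior
described above.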

>>  def prepare_overlay():
>> @@ -274,6 +278,10 @@ EOF
>>  def make_auxverb(shared_dir):
>>      '''Create auxverb script'''
>>
>> +    envvars = ''
>> +    if args.nested_qcow2_baseimage != None:
>> +        envvars += 'export ADT_QCOW2_BASEIMAGE=/dev/vd%c ; ' % chr(ord('a') + len(args.image))
>
> As I said above, could this instead be added as an extra readonly
> -drive, and you add an extra command to provide a symlink
> /dev/qemu_image to it?

Yes, although I'd rather use a more permanent path, so that
reboots (you might need them) don't get rid of the symlink. So maybe
we could have /lib/autopkgtest/baseimage be the canonical path? And it
would be either a file, a block device, or a symlink to either of
those.
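
A test could then check what it got with something like the sketch
below. The path is only the proposal from this mail, not an existing
interface, and the function name is made up for illustration:

```python
import os
import stat

# Proposed canonical location from this thread (an assumption, not an
# established autopkgtest interface).
BASEIMAGE_PATH = "/lib/autopkgtest/baseimage"


def baseimage_kind(path=BASEIMAGE_PATH):
    """Return "block", "file", or None for whatever path points at.

    os.stat() follows symlinks, so a symlink to a block device or to a
    regular file is classified by its target. A missing or dangling path
    gives None.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return None
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISREG(mode):
        return "file"
    return None
```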

>> --- a/virt-subproc/adt-virt-qemu
>> +++ b/virt-subproc/adt-virt-qemu
>> @@ -495,6 +495,10 @@ def hook_open():
>>
>>      if os.path.exists('/dev/kvm'):
>>          argv.append('-enable-kvm')
>> +        # emulate host CPU so that nested KVM might work (if it's
>> +        # enabled)
>> +        argv.append('-cpu')
>> +        argv.append('host')
>
> We've seen test regressions/hangs/failures because of changing CPU
> models, in things like mesa or LLVM. I wouldn't like to do this by
> default as it introduces additional
> unpredictability/unreproducibility. But turning that into an option
> and documenting that it is recommended for nesting is okay for me. (I
> don't think it's strictly necessary -- with current QEMU and kernels
> nesting seems to work reasonably well with the default emulated CPU).

Could we perhaps do it the other way round? Since Squeeze I've never
seen any problems inside a VM that wouldn't otherwise occur outside. I
also didn't have any trouble with Mesa and/or LLVM. (Although I never
did run their testsuite in a VM.) But I've been live-migrating VMs
with libvirt and KVM+QEMU between different hosts over the network
since Squeeze without so much as a hitch, so from my own experience,
QEMU and KVM are really solid. You yourself said that these issues are
(at the very least mostly) things of the past. So the default behavior
should be to have more features - and those who specifically want
to restrict the emulated CPU may do so with an option, e.g.
adt-virt-qemu --qemu-cpu kvm64 (plus documentation for that).

Would that be OK?
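
As a sketch of what I have in mind (the --qemu-cpu option does not exist
yet, and the helper names are made up for illustration): default to the
host CPU whenever KVM is usable, and let an explicit option override
that.

```python
import argparse


def build_parser():
    """Argument parser with the proposed (hypothetical) --qemu-cpu option."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--qemu-cpu", default=None,
                        help="Emulated CPU model (default: host if KVM is "
                             "available, otherwise the QEMU default)")
    return parser


def cpu_args(args, kvm_available):
    """Build the KVM and CPU related part of the QEMU command line."""
    argv = []
    if kvm_available:
        argv.append("-enable-kvm")
    # "-cpu host" is only meaningful with KVM, so it is the default just
    # in that case; an explicit --qemu-cpu is always passed through.
    cpu = args.qemu_cpu or ("host" if kvm_available else None)
    if cpu:
        argv += ["-cpu", cpu]
    return argv
```

Somebody who wants reproducible CPU features would then run with
--qemu-cpu kvm64, everybody else gets nesting-capable VMs by default.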





To summarize:

  - Append -cpu host in KVM mode: I would like to make that the default
    regardless, but I'm fine with an option to override it.

  - QCOW2 format: after looking at it, I now see that you don't need to
    make sure the image is in QCOW2 format, so I'm fine with not
    specifying the format

  - symlink: completely fine with me, I just wouldn't put it in /dev,
    because that's usually a devtmpfs and won't survive a reboot

  - restriction: I would really like to have it there, but since we can
    now drop the qcow2, this could easily be generic enough also for
    other runners

  - option for specifying the base image: as I said above, if only a
    single image is passed to adt-virt-qemu, my patch already does what
    you would want - what's unclear to me is if more than a single image
    is passed, what to do then... (my current patch just disables the
    functionality unless the base image is specified explicitly via an
    option)

Regards,
Christian


