[Pkg-ganeti-devel] Bug#782073: ganeti: Refuses to work with (extstorage) disk devices if they have a high minor number

Joerg Jaspert joerg at debian.org
Tue Apr 7 11:54:45 UTC 2015


Package: ganeti
Version: 2.11.6-1~bpo70+1
Severity: important

Dear Maintainer,

I know my setup isn't the most common one, but I don't see why it
should fail as it does.

Setup:
The hosts are blades with loads of RAM and CPU cores, attached to SAN
storage (HP 3PAR, 2 units for redundancy). Redundancy must be provided
by the blades, as the SAN does not mirror data between its two
devices.

Each VM gets a minimum of 2 disks, where each disk translates into an
MD raid1 array on the host, which itself goes to the SAN via multipath.
That translates into loads of devices for sure, but it all works nicely
and keeps the data of the VMs in 2 separate datacentres.

Now, we do have loads of VMs running, but a single blade is unable to
take more than 64 VMs - due to a limitation in ganeti (through its
usage of python).

The md arrays are all created (using ganeti's extstorage interface) by
exclusively talking to mdadm create/assemble/... using
/dev/md/$VOL_NAME syntax (VOL_NAME being the WWN), so there is no need
to manually dig out a /dev/mdXYZ. "Something does that for us".

This leads to "the md subsystem" starting with /dev/md127 and counting
down. Down to md0 (so 128 mds, AKA 64 VMs) this works perfectly.

Now, run gnt-instance migrate to move the next VM onto the blade (which
has plenty of spare resources), and ganeti refuses to work:

Tue Apr  7 12:58:17 2015 Pre-migration failed, aborting
Failure: command execution error:
Could not pre-migrate instance VMNAMEHERE: Error
while executing backend function: signed integer is greater than maximum

What happened is that the extstorage scripts nicely set up the new md
device. The automagic numbering made /dev/md/$VOL_NAME a link to
/dev/md1048575, a device which has a major of 9 and a minor of 1048575:

brw-rw---T 1 root disk   9, 1048575 Apr  7 12:58 /dev/md1048575
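
To see why this particular minor number is a problem for a signed
32-bit integer, the combined device number can be worked out by hand.
The sketch below assumes the glibc 64-bit dev_t layout (low 8 minor
bits, then 12 major bits, then the remaining minor bits) and a 32-bit
C int. The names linux_makedev and INT_MAX are made up for this
illustration and are not part of ganeti or Python.

```python
# Sketch only: assumes the glibc/Linux dev_t bit layout described above.
INT_MAX = 2**31 - 1  # largest value a signed 32-bit C int can hold


def linux_makedev(major, minor):
    """Combine major and minor into one device number (assumed layout)."""
    return (
        (minor & 0xFF)                # low 8 bits of the minor
        | ((major & 0xFFF) << 8)      # low 12 bits of the major
        | ((minor & ~0xFF) << 12)     # remaining minor bits
        | ((major & ~0xFFF) << 32)    # remaining major bits
    )


low = linux_makedev(9, 127)       # /dev/md127, the first array handed out
high = linux_makedev(9, 1048575)  # /dev/md1048575 from the listing above

print(hex(low), low <= INT_MAX)    # 0x97f True
print(hex(high), high <= INT_MAX)  # 0xfff009ff False
```

Under these assumptions every array from md127 down to md0 yields a
device number that fits into a signed 32-bit integer, while md1048575
does not, which would be consistent with the "signed integer is
greater than maximum" message if the combined number is handed to a
function that expects a C int.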


ganeti's storage scripts now seem to do an "os.minor()" on the device,
and that emits the error. Whatever the reason for doing this, I don't
see why it should be more picky than the entire system around it. The
kernel is happy, the block layer is happy, md loves it, just ganeti
hates it. While it may be argued that this is Python's fault for being
silly here, I think fixing ganeti to be able to still work is a good
thing.
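
One possible direction for such a fix, as a minimal sketch only: read
st_rdev via os.stat() and split it with plain integer arithmetic
instead of calling os.major()/os.minor(). This again assumes the glibc
64-bit dev_t layout. The helper name block_dev_numbers and its stat_fn
parameter are hypothetical and do not come from the ganeti source, and
whether ganeti really calls os.minor() at this point is only the guess
made above.

```python
import os
import stat


def block_dev_numbers(path, stat_fn=os.stat):
    """Return (major, minor) for a block device node.

    Decodes st_rdev with Python integers only, so no value has to be
    squeezed into a C int. stat_fn is injectable to ease testing.
    """
    st = stat_fn(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError("%s is not a block device" % path)
    dev = st.st_rdev
    # Assumed glibc layout: inverse of makedev().
    major = ((dev >> 8) & 0xFFF) | ((dev >> 32) & 0xFFFFF000)
    minor = (dev & 0xFF) | ((dev >> 12) & 0xFFFFFF00)
    return major, minor
```

Called as block_dev_numbers("/dev/md/" + vol_name), with vol_name
standing in for the volume's WWN, this would return the pair without
going through os.minor().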

Note: The cluster has
enabled disk templates: ext, diskless
allowed disk templates: ext, diskless


-- System Information:
Debian Release: 7.8
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-0.bpo.4-amd64 (SMP w/2 CPU cores)

-- 
bye Joerg


