Bug#851749: autopkgtest: machine-readable sub-tests within an autopkgtest-level test

Wed Jan 18 12:56:15 UTC 2017

Package: autopkgtest
Version: 4.3
Severity: wishlist

autopkgtest currently has one level of hierarchy: a test is either an
executable script in debian/tests/ named in debian/tests/control, or a
command in debian/tests/control.

A finer-grained result than that is often available. Because
debian/tests/control is a static file in the source package, many source
packages (including those that use pkg-perl-autopkgtest, and those
that use the GNOME installed-tests convention) have a single autopkgtest
that encapsulates multiple upstream tests. For example, see src:flatpak
(GNOME-style) and src:ikiwiki (Perl-style). There is interest in
reporting the results of these upstream tests individually.

Ian Jackson writes:

<https://lists.debian.org/debian-devel/2017/01/msg00481.html>
> autopkgtest can report individual test failures without "failing the
> whole test suite".
>
> There is new functionality needed to be able to do this in cases where
> there are many test results run by one upstream script.

and <https://lists.debian.org/debian-devel/2017/01/msg00423.html>
> You should help enhance autopkgtest so that a single test script can
> report results of multiple test.  This will involve some new protocol
> for those test scripts.

At an even finer granularity, many test frameworks report individual
assertions within an upstream test. GNOME and Perl both conventionally
do this via TAP <http://testanything.org/>, which has producers and
consumers in multiple languages.

I would like to propose TAP as autopkgtest's protocol for finer-grained
test result reporting, something like this:

* Tests in debian/tests/ may declare "Features: TAP". If they do, their
  stdout is expected to be TAP, and the TAP results ("ok" and "not ok"
  lines) are treated as sub-tests of the autopkgtest. If they do not,
  their stdout is assumed to be unstructured. stderr is always
  unstructured.

* Optionally, a TAP test may output sub-tests in the syntax produced by
  Test::More:

      1..3
      ok 1 - first test
          # the detailed output of the sub-test comes *first* so that
          # we can stream incomplete output
          1..2
          ok 1 - first part of second test
          ok 2 - second part of second test
      ok 2 - overall result of second test
      ok 3 - third test

  (This notation is non-standard but widely supported, for example in
  Perl Test::More, node.js node-tap, and the Jenkins TAP consumer.
  TAP consumers that do not support it will typically ignore it.
  I'm deliberately ignoring the bikeshedding about alternatives on
  https://github.com/TestAnything/Specification/issues/2 because the
  protocol that Test::More has supported since at least 2009 is
  a perfectly reasonable one.)

* A failing TAP autopkgtest must still exit nonzero or write to stderr;
  it is not correct for it to write "not ok" or "Bail out!"
  and subsequently exit 0. In practice most TAP producers seem to
  do this correctly, including Perl and GLib.

    * Optionally, we could permit exiting 0 and relying on TAP
      parsing if it declares "Restrictions: TAP" (which would be
      short for "requires TAP parsing for correctness").

* Optionally, the autopkgtest runner could have a mode to output
  TAP itself. It would have to indent TAP tests' output by 4 spaces
  to make them into sub-tests, and escape non-TAP tests' output
  (by either prepending "#" or writing it to autopkgtest's stderr)
  to avoid it invalidating the structured syntax on stdout.
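
To make the points above more concrete, here is a rough Python sketch of
how a runner might consume such output. It is an illustration under
stated assumptions, not autopkgtest code: the names parse_tap, verdict
and as_subtest, and the "INCONSISTENT" label, are hypothetical. It
assumes sub-test lines are indented with spaces and appear before their
parent result line, and it does not handle TAP directives such as
"# TODO" or "# SKIP".

```python
import re
from dataclasses import dataclass, field

# "ok" / "not ok", an optional test number, an optional "- ", a description.
RESULT_RE = re.compile(r"^(not ok|ok)\b(?:\s+(\d+))?(?:\s+-\s*|\s+)?(.*)$")

SAMPLE = """\
1..3
ok 1 - first test
    # the detailed output of the sub-test comes *first*
    1..2
    ok 1 - first part of second test
    ok 2 - second part of second test
ok 2 - overall result of second test
ok 3 - third test
"""


@dataclass
class Result:
    name: str
    ok: bool
    children: list = field(default_factory=list)


def parse_tap(text):
    """Return top-level results; more-indented results become children
    of the next less-indented result line that follows them."""
    pending = {}  # indentation width -> results still waiting for a parent
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        match = RESULT_RE.match(stripped)
        if match is None:
            continue  # plan lines, comments and free-form output
        result = Result(name=match.group(3), ok=match.group(1) == "ok")
        # Adopt any deeper results seen since the last line at this level.
        for level in sorted(lvl for lvl in pending if lvl > indent):
            result.children.extend(pending.pop(level))
        pending.setdefault(indent, []).append(result)
    return pending.get(0, [])


def verdict(stdout, exit_status, stderr=""):
    """Combine the TAP stream with the exit status and stderr.

    A test whose TAP says "not ok" or "Bail out!" but which exits 0 with
    empty stderr is flagged (hypothetically) as INCONSISTENT.
    """
    bailed = any(l.startswith("Bail out!") for l in stdout.splitlines())
    tap_failed = bailed or any(not r.ok for r in parse_tap(stdout))
    process_failed = exit_status != 0 or bool(stderr)
    if tap_failed and not process_failed:
        return "INCONSISTENT"
    return "FAIL" if process_failed else "PASS"


def as_subtest(stdout, is_tap):
    """Nest TAP output by 4 spaces; turn anything else into TAP comments."""
    prefix = "    " if is_tap else "# "
    return "\n".join(prefix + line for line in stdout.splitlines())
```

With the sample above, parse_tap(SAMPLE) yields three top-level results,
the second of which carries two children. Keeping the exit status as the
authority in verdict() means a runner without TAP support would still
reach the same pass/fail decision.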

Separately but somewhat relatedly, I've proposed patches for
gnome-desktop-testing (GNOME's test-runner, as used by src:flatpak for
autopkgtests) to make it output TAP; currently it has unstructured output,
and the individual tests that it runs are usually TAP. This could give us
a large number of tests with structured output relatively quickly.

Thoughts?
    S