[Reproducible-commits] [reproducible-builds-howto] 01/01: Let's get the party started!

Jérémy Bobbio lunar at moszumanska.debian.org
Thu Jul 16 10:37:39 UTC 2015


This is an automated email from the git hooks/post-receive script.

lunar pushed a commit to branch master
in repository reproducible-builds-howto.

commit 456b891c1f70548f6374c6d944d9d8bff93a10df
Author: Jérémy Bobbio <lunar at debian.org>
Date:   Thu Jul 16 12:35:22 2015 +0200

    Let's get the party started!
---
 .gitignore                     |   1 +
 Makefile                       |   7 ++
 reproducible-builds-howto.mdwn | 148 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 156 insertions(+)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..ece0978
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+reproducible-builds-howto.html
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..153491e
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,7 @@
+all: reproducible-builds-howto.html
+
+reproducible-builds-howto.html: reproducible-builds-howto.mdwn
+	pandoc --standalone -f markdown -t html \
+		--number-sections \
+		--table-of-contents \
+		--output $@ $<
diff --git a/reproducible-builds-howto.mdwn b/reproducible-builds-howto.mdwn
new file mode 100644
index 0000000..67a68cf
--- /dev/null
+++ b/reproducible-builds-howto.mdwn
@@ -0,0 +1,148 @@
+% How to make your software build reproducibly?
+% Debian reproducible builds squad <reproducible-builds at lists.alioth.debian.org>
+% 2015-07-16
+
+The idea of “reproducible” builds is to empower anyone to verify that no flaws have been introduced during the build process by reproducing byte-for-byte identical binary packages from a given source.
+
+Achieving reproducible builds requires cooperation from multiple roles involved in software production. On small projects, all these roles might be filled by a single person, but it helps to differentiate the responsibilities.
+
+In order for software to allow reproducible builds, the source code must not introduce uncontrollable variations in the build output. To enable meaningful comparison of different builds, the build environment must be reproducible, although it is not required that the toolchain[^toolchain] itself be byte-for-byte identical, as long as its output stays the same. When software is distributed in binary form, the build environment used to transform the source should also be distributed, ideally in a [...]
+
+[^toolchain]: By *toolchain*, we mean any piece of software needed to create the build output.
+
+Common issues affecting source code and build systems
+=====================================================
+
+Software cannot easily be built reproducibly if its build varies depending on factors that are hard or impossible to control, such as the ordering of files on a filesystem or the current time. What follows is some advice on common issues in source code or build systems that make multiple builds from the exact same source differ.
+
+Volatile inputs can disappear
+-----------------------------
+
+A file on a webserver could go away anytime.
+
+If you need to rely on something fetched from the network, save a copy in a fallback location you control, and use a cryptographic hash to verify that the content has stayed the same.
+
+Stable order for inputs
+-----------------------
+
+If building your software requires processing several inputs at once, make sure the order is stable across builds.
+
+A typical example is creating an archive from the content of a directory. Most filesystems do not guarantee that listing files in a directory will always result in the same order.
+
+Bad example[^sorted-wildcard]:
+
+    SRCS = $(wildcard *.c)
+    tool: $(SRCS:.c=.o)
+            $(CC) -o $@ $^
+
+Solutions:
+
+a) List all inputs explicitly and ensure they will be processed in that order.
+
+        SRCS = util.c helper.c main.c
+        tool: $(SRCS:.c=.o)
+                $(CC) -o $@ $^
+
+b) Sort inputs:
+
+        SRCS = $(sort $(wildcard *.c))
+        tool: $(SRCS:.c=.o)
+                $(CC) -o $@ $^
+
+[^sorted-wildcard]: GNU Make sorted the output of the [wildcard](https://www.gnu.org/software/make/manual/html_node/Wildcard-Function.html#Wildcard-Function) function up to version 3.81; since version 3.82, the result is no longer sorted.
+
+When sorting inputs, one must ensure that the sorting order is not affected by
+the system locale settings. For example, some locales do not distinguish
+between uppercase and lowercase.
+
+Bad example:
+
+    $ tar -cf archive.tar src
+
+Solution:
+
+    $ find src -print0 | LC_ALL=C sort -z | tar --no-recursion --null -T - -cf archive.tar
+
+Stable order for outputs
+------------------------
+
+Data structures such as [Perl hashes](http://perldoc.perl.org/functions/keys.html), [Python dictionaries](https://docs.python.org/2/library/stdtypes.html#mapping-types-dict), or [Ruby Hash objects](http://ruby-doc.org/core/Hash.html) can list their keys in a different order from one run to the next, a side effect of measures taken to limit [algorithmic complexity attacks](http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-Attacks).
+
+To get deterministic output, the easiest way is to explicitly sort the keys. Again, watch out for the locale affecting the sort order.
+
+Bad:
+
+    foreach my $package (keys %deps) {
+        print MANIFEST "$package: $deps{$package}\n";
+    }
+
+Good:
+
+    foreach my $package (sort keys %deps) {
+        print MANIFEST "$package: $deps{$package}\n";
+    }
+
+Controlled value initialization
+-------------------------------
+
+In languages that do not initialize values automatically, initialize them explicitly to avoid capturing whatever bytes happen to be in memory at run time.
+
+[Example fix](http://review.coreboot.org/gitweb?p=coreboot.git;a=commitdiff;h=2d119a3f01eee6c4e86248b17b4c9ce14ab77836)
+
+Store compilation information separately
+----------------------------------------
+
+Any information related to the compilation process should be stored in files separate from the software binary distribution. This includes information such as the date and time of the build, build system hostname, path, network configuration, CPU type, memory size, and environment variables.
+
+XXX: unclear
+
+Sadly, build paths are often recorded in debug information by compilers in order to locate the associated source files. This is currently hard to sanitize.
+
+Use deterministic version information
+-------------------------------------
+
+If the software needs to know version information, make it deterministic. A version number can come from a dedicated source file, a changelog, or a version control system. If a date and time are needed, extract them from a changelog or the version control system.
+
+The date and time of the build is not useful information as one can always build an old version long after it has been released.
+
+XXX: add examples
+
+Use a known date and time
+-------------------------
+
+If you need timestamps, extract a date and time from a changelog or the version control system instead of using the current date and time.
+
+Another option is to implement support for the `SOURCE_DATE_EPOCH` environment variable. When set, its value (a number of seconds since January 1st 1970, 00:00 UTC) should be used instead of the current date and time.
+
+Timestamps can creep into many unexpected places, but file modification times in archives might be the most common trap.
+
+Avoid true randomness
+---------------------
+
+Don't ship any data created using truly random sources. Avoid using random data if possible. If that is not possible, use a known value as the PRNG seed. For example, derive the seed from a hash of some file content, a changelog, or the version control system.
+
+Define environment variables affecting outputs
+----------------------------------------------
+
+Some environment variables might affect the output of some tools. A common example is the `LC_TIME` environment variable, which affects the format of dates.
+
+While someone building your software might want to see build error messages in their preferred language, it is better to have the output always be in the same format.
+
+Reproducible build environment
+==============================
+
+The output of a tool building software is likely to be different from one version to another. Typically, better optimizations are integrated into compilers all the time. To allow different builds to be easily compared, there must be a way to know which software versions to (re)use.
+
+Other aspects of the environment affecting the output of the compilation tools must be either set or recorded so they can be reproduced as well.
+
+XXX: explain how
+
+Distributing the build environment
+==================================
+
+XXX: explain how to tie source+build env+resulting binary and distributing it.
+
+Acknowledgements
+================
+
+David A. Wheeler and his fantastic work on [Diverse Double-Compilation](http://www.dwheeler.com/trusting-trust/).

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/reproducible-builds-howto.git


