[Reproducible-commits] [source-date-epoch-spec] 02/02: Large number of updates

Chris Lamb lamby at moszumanska.debian.org
Thu Aug 27 13:13:08 UTC 2015


This is an automated email from the git hooks/post-receive script.

lamby pushed a commit to branch master
in repository source-date-epoch-spec.

commit fb0281f52ec55e0b0c7662c2d1809ee560240080
Author: Chris Lamb <lamby at debian.org>
Date:   Thu Aug 27 15:10:21 2015 +0200

    Large number of updates
---
 source-date-epoch-spec.xml | 355 +++++++++++++++++++++++----------------------
 1 file changed, 185 insertions(+), 170 deletions(-)

diff --git a/source-date-epoch-spec.xml b/source-date-epoch-spec.xml
index ba1ec71..f4b3b96 100644
--- a/source-date-epoch-spec.xml
+++ b/source-date-epoch-spec.xml
@@ -15,42 +15,37 @@
 		</author>
 	</authorgroup>
 
-	<title>SOURCE_DATE_EPOCH specification (DRAFT)</title>
-	<pubdate>22 August 2015</pubdate>
+	<title><envar>SOURCE_DATE_EPOCH</envar> specification (DRAFT)</title>
+	<pubdate>27 August 2015</pubdate>
 </articleinfo>
 
 <sect1>
-	<title>Warning: this specification is unfinished work at the moment</title>
-	<para>
-		This specification was started to be written at the just
-		finished DebConf15 conference in Heidelberg in late August
-		2015. We expect it to be finished in a few days or maybe weeks.
-		Please don't refer to it yet.
-	</para>
-</sect1>
-
-<sect1>
-	<title>Reproducible builds</title>
-	<para>
-		Whilst anyone can inspect the source code of free software for
-		malicious flaws, for reasons of convenience most distributions
-		provide binary or "compiled" packages to their end users.
-	</para>
-	<para>
-		The idea of "reproducible" or "deterministic" builds is to
-		empower anyone to verify that no flaws have been introduced
-		during the build process by reproducing byte-for-byte identical
-		binary packages from a given source.
-	</para>
+	<title>Background</title>
+	<sect2>
+		<title>Reproducible builds</title>
+		<para>
+			Whilst anyone can inspect the source code of free
+			software for malicious flaws, for reasons of
+			convenience most distributions provide binary or
+			"compiled" packages to their end users.
+		</para>
+		<para>
+			The motivation behind "reproducible" or "deterministic"
+			builds is to empower anyone to verify that no flaws
+			have been introduced during the build process by
+			promising that byte-identical binary packages are
+			always generated from a given source.
+		</para>
+	</sect2>
 	<sect2>
 		<title>Why they matter</title>
 		<para>
 			Build processes that are reproducible help prevent
-			against backdoor-introducing malware being installed on
-			developers' machines. Not only does an attacker need
+			against backdoor-introducing malware being used on
+			developers' machines. Not only would an attacker need
 			to insert the same backdoor on the machines of the
-			developers who are attempting to reproduce the build,
-			the malware is now almost certain to be widely exposed
+			developers who attempt to reproduce the build, the
+			malware is now almost certain to be widely exposed
 			which dramatically increases the risk to the attacker.
 			Combined with diverse cross-compiling, reproducible
 			builds can also detect most variations of the "Trusting
@@ -58,26 +53,29 @@
 		</para>
 		<para>
 			Privacy and security conscious projects such as Tor and
-			Bitcoin have a clear interest in allowing their
-			users to verify that the available binaries correspond
-			to the published source code. Core system utilities
-			such as Coreboot have similar reasons for wishing to
-			provide such assurances to their users in this way.
+			Bitcoin have a clear interest in allowing their users
+			to verify that the available binaries correspond to the
+			published source code. Core system utilities such as
+			Coreboot have similar reasons for wishing to provide
+			such assurances to their users in this way.
 		</para>
 	</sect2>
 	<sect2>
 		<title>Technical advantages</title>
 		<para>
-			There are other—technical—reasons for
-			adopting reproducible builds:
+			There are other, technical, reasons for adopting
+			reproducible builds:
 		</para>
 		<itemizedlist>
 		<listitem><para>
-			Encourages the removal of unreliable and/or
-			non-deterministic software behaviour.
+			Detects tainted, corrupted or out-dated
+			build-environments.
 		</para></listitem>
 		<listitem><para>
-			Detects tainted or corrupted build-environments.
+			Typically requires the removal of any non-deterministic
+			and/or unsafe behaviour, such as interacting with the
+			internet to obtain build-dependencies or reading from
+			uninitialised memory.
 		</para></listitem>
 		<listitem><para>
 			Removes many configuration-specific issues (eg. locale
@@ -86,170 +84,167 @@
 			particular environment.
 		</para></listitem>
 		<listitem><para>
-			Provides a convenient means to show that changes or
-			improvements to the source or packaging have no impact
-			on the generated binaries.
+			Provides a transparent method to show that a proposed
+			change to either the source or packaging toolchain has
+			no impact on generated binaries.
 		</para></listitem>
 		<listitem><para>
-			Implicitly provides an audit trail from a binary back
-			to the source.
+			Reduces the detection time of a build host compromise
+			as its results can be externally validated.
 		</para></listitem>
 		<listitem><para>
-			Validation of cross-architecture built packages.
+			Provides an audit trail from a binary back to its
+			source.
 		</para></listitem>
 		<listitem><para>
-			Reduces the impact of a centralised build host
-			compromise as their results can be externally
-			validated.
+			Validation of packages built on foreign architectures.
 		</para></listitem>
 		</itemizedlist>
 	</sect2>
-</sect1>
-
-<sect1>
-	<title>Current problems with timestamps</title>
-	<para>
-		Whilst there are a multitude technical obstacles to a fully
-		reproducible software distribution, many software packages are
-		unreprodubible simply because they embed the current build-time
-		timestamp into the generated binaries. As the current time is
-		implicitly unstable across different builds, this ensures that
-		the generated binaries contain different contents depending on
-		when they were built.
-	</para>
-	<para>
-		There are several rationales for embedding the build date:
-	</para>
-	<itemizedlist>
-	<listitem><para>
-		FIXME: it gives "some indication" of the age of the software.
-		However, this becomes basically redundant with reproducible
-		builds, as the whole point of reproducible builds is that the
-		build result will be exactly the same no matter when it was
-		built. To phrase this differently: if the only difference in
-		the build result is the embedded build date, then this
-		difference is meaningless and should be removed, or replaced
-		with a meaningful date.
-	</para></listitem>
-	<listitem><para>
-		FIXME: it gives "some indication" of the build environment
-		(e.g. age of the build dependencies?). But with reproducible
-		builds, there is no need to guess which build environment has
-		been used, based on a timestamp. To allow users to reproduce
-		binaries, the build environment is either known in advance, or
-		recorded (e.g. in .buildinfo files).
-	</para></listitem>
-	</itemizedlist>
-	<para>
-		FIXME: why timestamps become meaningless and/or misleading if
-		the software is reproducible.
-	</para>
-	<para>
-		Generally, a better solution is to embed the date of
-		the last modification to the source code. This proposal
-		attempts to define some standards for tools to operate,
-		based on this principle.
-	</para>
+	<sect2>
+		<title>Build timestamps</title>
+		<para>
+			Whilst there are a large number of obstacles to a fully
+			reproducible GNU/Linux or BSD distribution, many
+			software packages are only unreproducible because they
+			embed a build-time timestamp into generated files.
+		</para>
+		<para>
+			As the current time is inherently unstable across
+			different builds, the generated binaries contain
+			different contents and are thus unreproducible.
+			This embedding occurs in a wide variety
+			of locations but particularly in generated documentation,
+			manpages, output from <command>--help</command>, etc.
+			It is also common in the metadata or headers of file
+			formats such as PNG or GZip.
+		</para>
+		<para>
+			Historically, there have been several rationales for
+			embedding the build date:
+		</para>
+		<itemizedlist>
+		<listitem><para>
+			It provides the age of the software.
+		</para></listitem>
+		<listitem><para>
+			It suggests the environment that was used based on the
+			availability of the build-dependencies available at the
+			specified time.
+		</para></listitem>
+		</itemizedlist>
+		<para>
+			However, these are not only unreliable indicators of
+			age given that software can be arbitrarily rebuilt,
+			they are redundant or meaningless in a build that is
+			reproducible given it will always build identically. In
+			any case, more-specific information about the build
+			environment is required if users wish to reliably
+			reproduce the binaries.
+		</para>
+		<para>
+			The current timestamp is therefore not only an
+			impediment to a reproducible build, it is incomplete
+			and misleading, and offers little useful information
+			to end users or developers.
+		</para>
+	</sect2>
+	<sect2>
+		<title>Source timestamps</title>
+		<para>
+			A more reliable, stable and ultimately useful value to
+			embed is the timestamp representing the <emphasis>last
+			modification time of the source</emphasis>.  If the
+			source is modified, the generated binaries will change
+			by design. Additionally, this date is more informative
+			for end users as it reflects the "true age" of the
+			software and not merely when it was last compiled.
+		</para>
+		<para>
+			In the context of a distribution, the last modification
+			time is not a property of upstream source itself but
+			rather of a distribution's packaging that encapsulates
+			it. Ensuring this outer timestamp is used by the
+			underlying build system often requires cumbersome and
+			distribution-specific changes.
+		</para>
+		<para>
+			This proposal therefore attempts to define a
+			distribution-agnostic standard for build systems to
+			exchange such a timestamp in a uniform manner.
+		</para>
+	</sect2>
 </sect1>
 
 <sect1>
 	<title>Proposal</title>
 	<para>
-		We propose the following build-time environment variable to be
-		consumed by build systems and for it to be used in place of the
-		"current" time and date.
+		We propose the following environment variable to be consumed by
+		build systems, tools and wrappers and for it to be used in
+		place of the "current" date and time.
 	</para>
 	<para>
 		It is intended to be a universal standard and not specific to
 		any particular project or distribution.
 	</para>
 	<sect2>
-		<title>SOURCE_DATE_EPOCH</title>
+		<title><envar>SOURCE_DATE_EPOCH</envar></title>
 		<para>
-			A UNIX timestamp, defined as the number of seconds
-			(excluding leap seconds) since 01 Jan 1970 00:00 UTC.
+			A UNIX timestamp defined as the number of seconds
+			(excluding leap seconds) since <computeroutput>01 Jan
+			1970 00:00 UTC</computeroutput> exposed through the
+			system's usual environment mechanism.
 		</para>
 		<para>
-			The value is an integer with no fractional component,
-			similar to that output by "date +%s". The value is
-			independent of timezones.
+			The value is an ASCII representation of an integer
+			with no fractional component, similar to the output of
+			<command>date +%s</command>.
 		</para>
 		<para>
-			The actual value should be set to the time of the last
+			The value should be set to the last
 			modification time of the source. For example, in
-			Debian, this is automatically set to the same time as
-			the latest entry in debian/changelog.
+			Debian, this would be set to the timestamp associated
+			with the latest entry in
+			<filename>debian/changelog</filename>.
 		</para>
 		<para>
 			Upstream build processes are encouraged to read and use
 			this variable in place of any embedded timestamps.
 		</para>
-	</sect2>
 
-	<sect2>
-		<title>Rationale</title>
-		<para>
-			We deliberate do not specify anything resembling a
-			"time zone". Developing such a standard would require
-			consideration of various issues:
-		</para>
-		<itemizedlist>
-		<listitem><para>
-			Intuitive and naive ways of handling human-readable
-			dates, such as the POSIX date functions are highly
-			flawed and freely mix implicit not-well-defined
-			calendars with absolute time. For example, they don't
-			specify they mean the Gregorian calendar, and/or don't
-			specify what to do with dates before when the Gregorian
-			calendar was introduced, or use named time zones that
-			require an up-to-date timezone database (e.g. with
-			historical DST definitions) to parse properly.
-		</para></listitem>
-		<listitem><para>
-			Since this is meant to be a universal standard that all
-			tools and distributions can support, we need to keep
-			things simple and precise, so that different groups of
-			people cannot accidentally interpret it in different
-			ways. So it is probably unwise to try to standardise
-			anything that resembles a named time zone, since that
-			is very very complex.
-		</para></listitem>
-		</itemizedlist>
+		<warning>
+			<para>
+				Code should not require the presence of the
+				variable so that it can be built outside of a
+				context where it is provided. Falling back to
+				the "current" time may be acceptable behaviour
+				if the variable is missing or malformed.
+			</para>
+			<para>
+				In addition, care should be taken to avoid
+				timezone and locale-specific formatting of the
+				value of <envar>SOURCE_DATE_EPOCH</envar>. If
+				it is deemed essential that an end-user can
+				see this timestamp in their own locale or
+				timezone, this formatting must be delayed until
+				run-time.
+			</para>
+		</warning>
 	</sect2>
 </sect1>
 
 <sect1>
-	<title>Implementation issues</title>
-	<itemizedlist>
-	<listitem><para>
-		Care should be taken to avoid timezone and locale-specific
-		formatting of SOURCE_DATE_EPOCH during build-time. If it is
-		deemed essential that an end-user can read this timestamp in
-		their own locale and timezone, this formatting must be delayed
-		until run-time.
-	</para></listitem>
-	<listitem><para>
-		Code should avoid requiring the presence of the variable so
-		that it can still be built outside of a context where this
-		variable is available. Depending on the context, falling back
-		to the "current" time may be acceptable behaviour if the
-		variable is missing.
-	</para></listitem>
-	</itemizedlist>
-
-	<sect2>
-		<title>Examples</title>
-		<para>
-			A number of examples are available at: <ulink
-			url="https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples">https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples</ulink>.
-		</para>
-	</sect2>
+	<title>Examples</title>
+	<para>
+		A number of examples are available at <ulink
+		url="https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples">https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples</ulink>.
+	</para>
 </sect1>
 
 <sect1>
 	<title>Copyright</title>
 	<para>
-		Copyright (C) 2014, 2015 See Contributors List
+		Copyright © 2014, 2015 See Contributors List
 	</para>
 	<para>
 		Permission is hereby granted, free of charge, to any person
@@ -280,15 +275,35 @@
 <sect1>
 	<title>Contributors</title>
 
-	<para>Axel Beckert</para>
-	<para>Chris Lamb (lamby)</para>
-	<para>Chris West (Faux)</para>
-	<para>Dmitry Shachnev</para>
-	<para>Eduard Sanou</para>
-	<para>Holger Levsen</para>
-	<para>Jérémy Bobbio (Lunar)</para>
-	<para>Mattia Rizzolo</para>
-	<para>Ximin Luo</para>
+	<itemizedlist>
+	<listitem><para>
+		Axel Beckert
+	</para></listitem>
+	<listitem><para>
+		Chris Lamb (lamby)
+	</para></listitem>
+	<listitem><para>
+		Chris West (Faux)
+	</para></listitem>
+	<listitem><para>
+		Dmitry Shachnev
+	</para></listitem>
+	<listitem><para>
+		Eduard Sanou
+	</para></listitem>
+	<listitem><para>
+		Holger Levsen
+	</para></listitem>
+	<listitem><para>
+		Jérémy Bobbio (Lunar)
+	</para></listitem>
+	<listitem><para>
+		Mattia Rizzolo
+	</para></listitem>
+	<listitem><para>
+		Ximin Luo
+	</para></listitem>
+	</itemizedlist>
 </sect1>
 
 </article>
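
The behaviour described in the <warning> added by this commit can be
sketched in Python as follows. This is only an illustration under stated
assumptions, not part of the specification: the function names
build_timestamp and format_build_date are hypothetical, and the fallback
to the current time is the optional behaviour the text says "may be
acceptable" when the variable is missing or malformed.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp to embed, as integer seconds since the epoch.

    Hypothetical helper: uses SOURCE_DATE_EPOCH when it is a plain run of
    ASCII digits, otherwise falls back to the current time.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("SOURCE_DATE_EPOCH", "")
    # The value is an ASCII integer with no fractional component, so
    # anything else (empty, signs, decimals, text) counts as malformed.
    if raw.isascii() and raw.isdigit():
        return int(raw)
    # Assumption: falling back to "now" is acceptable for this build.
    return int(time.time())


def format_build_date(epoch):
    """Format the timestamp in UTC using numeric fields only.

    Avoiding local time and locale-dependent names (month, weekday)
    keeps the embedded string identical on every build machine.
    """
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(format_build_date(build_timestamp()))
```

Passing the environment as a parameter is a design choice made here so
that the helper can be exercised without modifying the real process
environment.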

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/source-date-epoch-spec.git


