[Reproducible-commits] [source-date-epoch-spec] 01/01: SOURCE_DATE_EPOCH specification 1.0.

Chris Lamb lamby at moszumanska.debian.org
Wed Sep 2 14:16:55 UTC 2015


This is an automated email from the git hooks/post-receive script.

lamby pushed a commit to branch master
in repository source-date-epoch-spec.

commit e8ba01d29843130250a5fa0c696509246b4c413a
Author: Chris Lamb <lamby at debian.org>
Date:   Wed Sep 2 15:16:36 2015 +0100

    SOURCE_DATE_EPOCH specification 1.0.
    
    Utterly ruthless editing; best to stick to specifying, not justifying.
---
 source-date-epoch-spec.xml | 362 ++++++++++++++-------------------------------
 1 file changed, 108 insertions(+), 254 deletions(-)

diff --git a/source-date-epoch-spec.xml b/source-date-epoch-spec.xml
index bf04b3c..25eb93a 100644
--- a/source-date-epoch-spec.xml
+++ b/source-date-epoch-spec.xml
@@ -1,302 +1,156 @@
 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
-"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
 ]>
-<article id="index">
+<article>
 <articleinfo>
-	<authorgroup>
-		<author>
-			<firstname>Chris</firstname>
-			<surname>Lamb</surname>
-			<affiliation>
-				<address>
-					<email>lamby at debian.org</email>
-				</address>
-			</affiliation>
-		</author>
-		<author>
-			<firstname>Ximin</firstname>
-			<surname>Luo</surname>
-			<affiliation>
-				<address>
-					<email>infinity0 at debian.org</email>
-				</address>
-			</affiliation>
-		</author>
-	</authorgroup>
-
-	<title><envar>SOURCE_DATE_EPOCH</envar> specification (DRAFT)</title>
-	<pubdate>27 August 2015</pubdate>
+	<title><envar>SOURCE_DATE_EPOCH</envar> specification</title>
+	<revhistory>
+		<revision>
+			<revnumber>1.0</revnumber>
+			<date>01 September 2015</date>
+			<revremark>Initial version.</revremark>
+		</revision>
+	</revhistory>
+	<author>
+		<firstname>Chris</firstname>
+		<surname>Lamb</surname>
+		<affiliation>
+			<address><email>lamby at debian.org</email></address>
+		</affiliation>
+	</author>
 </articleinfo>
 
 <sect1>
 	<title>Introduction</title>
 	<para>
-		This specification defines a distribution-agnostic standard
-		for build systems to convey information about the date and
-		time in their build result in a reproducible manner.
+		This specification defines a distribution-agnostic standard for
+		build systems to exchange a timestamp.
+	</para>
+	<para>
+		The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+		NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+		"OPTIONAL" in this document are to be interpreted as described
+		in RFC 2119.
 	</para>
 	<para>
-		The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
-		"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
-		"MAY", and "OPTIONAL" in this document are to be
-		interpreted as described in RFC 2119.
+		The canonical URI for this document is: <ulink
+		url="https://reproducible-builds.org/specs/source-date-epoch/">https://reproducible-builds.org/specs/source-date-epoch/</ulink>.
 	</para>
 </sect1>
 
 <sect1>
-	<title>Background</title>
-	<sect2>
-		<title>Reproducible builds</title>
-		<para>
-			Whilst anyone can inspect the source code of free
-			software for malicious flaws, for reasons of
-			convenience most distributions provide binary or
-			"compiled" packages to their end users.
-		</para>
-		<para>
-			The motivation behind "reproducible" or "deterministic"
-			builds is to empower anyone to verify that no flaws
-			have been introduced during the build process by
-			promising that byte-identical binary packages are
-			always generated from a given source.
-		</para>
-	</sect2>
-	<sect2>
-		<title>Why they matter</title>
-		<para>
-			Build processes that are reproducible help prevent
-			against backdoor-introducing malware being used on
-			developers' machines. Not only would an attacker need
-			to insert the same backdoor on the machines of the
-			developers who attempt to reproduce the build, the
-			malware is now almost certain to be widely exposed
-			which dramatically increases the risk to the attacker.
-			Combined with diverse cross-compiling, reproducible
-			builds can also detect most variations of the "Trusting
-			Trust" Thompson attack.
-		</para>
-		<para>
-			Privacy and security conscious projects such as Tor and
-			Bitcoin have a clear interest in allowing their users
-			to verify that the available binaries correspond to the
-			published source code. Core system utilities such as
-			Coreboot have similar reasons for wishing to provide
-			such assurances to their users in this way.
-		</para>
-	</sect2>
-	<sect2>
-		<title>Technical advantages</title>
-		<para>
-			A reproducible build has other, technical, advantages:
-		</para>
-		<itemizedlist>
-		<listitem><para>
-			Detects tainted, corrupted or out-dated
-			build-environments.
-		</para></listitem>
-		<listitem><para>
-			Typically requires the removal of any non-deterministic
-			and/or unsafe behaviour, such as interacting with the
-			internet to obtain build-dependencies or reading from
-			uninitialised memory.
-		</para></listitem>
-		<listitem><para>
-			Removes many configuration-specific issues (eg. locale
-			or timezone-related changes to behaviour) eliminating
-			hard to debug problems that can be specific to a user's
-			particular environment.
-		</para></listitem>
-		<listitem><para>
-			Provides a transparent method to show that a proposed
-			change to either the source or packaging toolchain has
-			no impact on generated binaries.
-		</para></listitem>
-		<listitem><para>
-			Reduces the time-to-detection of a build host
-			compromise as its results can be externally validated.
-		</para></listitem>
-		<listitem><para>
-			Provides an audit trail from a binary back to its
-			source.
-		</para></listitem>
-		<listitem><para>
-			Packages built on foreign architectures can be
-			trivially validated.
-		</para></listitem>
-		</itemizedlist>
-	</sect2>
+	<title>Motivation</title>
+	<para>
+		Whilst anyone can inspect the source code of free software for
+		malicious flaws, most distributions provide binary (or
+		"compiled") packages to end users. The motivation behind
+		"reproducible" builds is to allow verification that no flaws
+		have been introduced during this compilation process by
+		promising that identical binary packages are always generated
+		from a given source.
+	</para>
+	<para>
+		This protects against the installation of backdoor-introducing
+		malware on developers' machines, as an attacker would need to
+		simultaneously infect all the developers attempting to
+		reproduce the build. In addition, a reproducible build has
+		other technical advantages:
+	</para>
+	<itemizedlist>
+	<listitem><para>
+		Requires the removal of any non-deterministic and/or unsafe
+		behaviour, e.g. connecting to the internet to download
+		build-dependencies or reading from uninitialised memory
+	</para></listitem>
+	<listitem><para>
+		Detects corrupted or outdated build environments
+	</para></listitem>
+	<listitem><para>
+		Provides validation of packages built on foreign
+		architectures
+	</para></listitem>
+	<listitem><para>
+		Reduces time-to-detection of a build host compromise
+	</para></listitem>
+	<listitem><para>
+		Can show that proposed changes have no impact on binaries
+	</para></listitem>
+	</itemizedlist>
 	<sect2>
 		<title>Build timestamps</title>
 		<para>
-			Whilst there are a large number of obstacles to a fully
-			reproducible GNU/Linux or BSD distribution, many
-			software packages are only unreproducible because they
-			embed a build-time timestamp into generated files.
+			Software packages are often unreproducible because they
+			embed compile-time timestamps into generated files. As
+			the current time changes between builds, this results
+			in the binaries containing different contents.
+			Furthermore, these dates are unreliable indicators of
+			the software's age given that software can be
+			arbitrarily rebuilt.
 		</para>
 		<para>
-			As the current time is inherently unstable across
-			different builds, this results in the generated
-			binaries containing different contents and are thus
-			unreproducible. This embedding occurs in a wide variety
-			of locations but particularly in generated documentation,
-			manpages, output from <command>--help</command>, etc.
-			It is also common in the metadata or headers of file
-			formats such as PNG or gzip.
+			An improvement is to use the last modification time of
+			the source; if the source is then modified, the
+			binaries will change by design. This timestamp is also
+			more informative as it reflects the actual age of the
+			software and not when it was last compiled.
 		</para>
 		<para>
-			One suggestion that is sometimes raised, is to have diff
-			programs detect and ignore embedded timestamps. However,
-			it is not feasible to develop an algorithm to do this for
-			arbitrary data formats, and <emphasis>computationally
-			impossible</emphasis> in the case of Turing-complete data
-			formats such as executables - since the real behaviour of
-			the result can easily change based on a piece of data
-			embedded in the file, even if the data is itself static
-			or immutable. The only way to algorithmically verify
-			identical behaviour in the general case, is to enforce
-			bit-for-bit identical build results, and eliminate build
-			time variations even in data that is static at run time.
+			However, in the context of a distribution, the last
+			modification time is not a property of the upstream
+			source, but rather of the packaging that encapsulates
+			it.
 		</para>
 		<para>
-			Historically, there have been several rationales for
-			embedding the build date:
-		</para>
-		<itemizedlist>
-		<listitem><para>
-			Provides a rough indication of the age of the software.
-		</para></listitem>
-		<listitem><para>
-			Provides some indication of the environment that was
-			used for the build based on the availability of the
-			build-dependencies available at that particular moment.
-		</para></listitem>
-		</itemizedlist>
-		<para>
-			However, such hints are misleading indicators of the
-			information they intend to convey, since software can
-			be arbitrarily rebuilt. Notably, the inaccuracy becomes
-			more and more severe as time passes, which is not a good
-			property to have for any program or process.
-		</para>
-		<para>
-			Furthermore, the information is redundant in a build
-			that is reproducible: if the only difference in the build
-			result is the embedded build date, then this difference
-			<emphasis>is</emphasis> meaningless and should be
-			removed, or replaced with a meaningful date.
-		</para>
-		<para>
-			In any case, more specific information about the build
-			environment is required if users wish to reliably
-			reproduce the binaries. Indeed, standards for conveying
-			such metadata <emphasis>precisely</emphasis> are being
-			developed elsewhere at the time of writing; but they are
-			outside of the scope of this particular document.
-		</para>
-	</sect2>
-	<sect2>
-		<title>Source timestamps</title>
-		<para>
-			A more reliable, stable and ultimately useful value to
-			embed is the timestamp representing the <emphasis>last
-			modification time of the source</emphasis>.  If the
-			source is modified, the generated binaries will change
-			by design. Additionally, this date is more informative
-			for end users as it reflects the "true age" of the
-			software and not merely when it was last compiled.
-		</para>
-		<para>
-			In the context of a distribution, the last modification
-			time is not a property of the upstream source itself but
-			rather of the distribution's packaging that encapsulates
-			it. Distributions typically have a standard repository
-			where this information may be accessed easily.
-		</para>
-		<para>
-			Many upstream build processes embed the time of the build
-			since that is an easy option that approximates the more
-			informative source timestamp. Many also offer no way for
-			an external source to override this. This specification
-			offers a solution to both of these problems.
+			This specification therefore defines a
+			distribution-agnostic standard for upstream build
+			processes to consume this timestamp from packaging
+			systems.
 		</para>
 	</sect2>
 </sect1>
 
 <sect1>
-	<title>Environment variables</title>
-	<para>
-		We propose the following environment variables to be consumed by
-		build systems, tools and wrappers.
-	</para>
-	<para>
-		It is intended to be a universal standard and not specific to
-		any particular project or distribution.
-	</para>
+	<title>Specification</title>
 	<sect2>
 		<title><envar>SOURCE_DATE_EPOCH</envar></title>
 		<para>
 			A UNIX timestamp, defined as the number of seconds
 			(excluding leap seconds) since <computeroutput>01 Jan
-			1970 00:00:00 UTC</computeroutput> exposed through the
+			1970 00:00:00 UTC</computeroutput>.
+		</para>
+		<para>
+			The value MUST be exported through the operating
 			system's usual environment mechanism.
 		</para>
 		<para>
-			The value is an ASCII representation of an integer with
-			no fractional component, similar to the output of
-			<command>date +%s</command> in GNU coreutils.
+			The value MUST be an ASCII representation of an integer
+			with no fractional component, identical to the output
+			of <command>date +%s</command>.
 		</para>
 		<para>
-			The value should be set to the time of the last
-			modification time of the source. For example, in
-			Debian, this would be set to the timestamp associated
-			with the latest entry in
+			The value SHOULD be set to the last modification time
+			of the source, incorporating any packaging-specific
+			modifications. For example, in Debian, this is the
+			timestamp of the latest entry in
 			<filename>debian/changelog</filename>.
 		</para>
 		<para>
-			To adhere to this specification, upstream build processes
-			MUST read and use this variable for embedded timestamps,
-			<emphasis>in place of</emphasis> the "current" date and
-			time of when the process is being run. Upstream MAY also
-			patch any relevant descriptive text so that it refers to
-			the source code's modification time instead of the build
-			time, but this is not necessary for the purposes of this
-			specification.
+			Upstream build processes MUST use this variable for
+			embedded timestamps in place of the "current" date and
+			time.
 		</para>
-		<warning>
-			<para>
-				In addition, care should be taken to avoid timezone
-				and locale-specific formatting of the value of
-				<envar>SOURCE_DATE_EPOCH</envar>. Any embedded
-				timezone MUST be constant at build time and SHOULD
-				refer to UTC. If is deemed essential that an end user
-				read this value in their own locale or timezone, this
-				formatting MUST be delayed until run time.
-			</para>
-		</warning>
 		<para>
-			Upstream build processes MUST NOT overwrite this variable
-			(e.g. for child processes to consume) if it is already
-			set (e.g. by a parent process or the user themselves).
+			Build systems MUST NOT overwrite this variable for
+			child processes to consume if it is already present.
 		</para>
 		<para>
-			If the value is missing or empty, the upstream build
-			process chooses its own behaviour; this situation is
-			indistinguishable from one that is not following this
-			specification. However, we RECOMMEND that the behaviour
-			should more closely approximate the date of the last
-			modification to the source code. Falling back to the
-			"current" date and time of the build is NOT RECOMMENDED.
-			For example, the upstream build process MAY attempt to
-			automatically detect an appropriate value to set this
-			variable to, by reading the time of the latest VCS commit
-			or filesystem entry if there are uncommitted changes, or
-			from a hard-coded value in an official release tarball.
-			Child processes may then consume this variable as if
-			they were following this specification themselves.
+			Formatting MUST be deferred until runtime if an end
+			user should observe the value in their own locale or
+			timezone.
 		</para>
 		<para>
-			If the value is malformed, the upstream build process
-			SHOULD exit with a non-zero error code.
+			If the value is malformed, the build process SHOULD
+			exit with a non-zero error code.
 		</para>
 	</sect2>
 </sect1>
@@ -304,7 +158,7 @@
 <sect1>
 	<title>Examples</title>
 	<para>
-		A number of examples are available at <ulink
+		Examples are available at <ulink
 		url="https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples">https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples</ulink>.
 	</para>
 </sect1>
@@ -312,7 +166,7 @@
 <sect1>
 	<title>Copyright</title>
 	<para>
-		Copyright © 2014, 2015 See Contributors List
+		Copyright © 2015 See Contributors List
 	</para>
 	<para>
 		Permission is hereby granted, free of charge, to any person

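For illustration only, and not part of the commit above: a minimal Python
sketch of how a build tool might follow the "Specification" section of the
new text. The function names (parse_source_date_epoch, build_datetime,
child_environment) are hypothetical. Two choices are assumptions of this
sketch rather than requirements of the spec: negative values are rejected,
and the current UTC time is used when the variable is unset or empty (the
1.0 text does not define that case).

```python
import datetime
import os
import sys


def parse_source_date_epoch(environ):
    """Return SOURCE_DATE_EPOCH as an int, or None if unset or empty."""
    raw = environ.get("SOURCE_DATE_EPOCH", "")
    if raw == "":
        return None
    # Assumption of this sketch: accept ASCII decimal digits only, so
    # fractional, signed, or non-numeric values count as malformed.
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"malformed SOURCE_DATE_EPOCH: {raw!r}")
    return int(raw)


def build_datetime(environ):
    """Timestamp to embed, used in place of the "current" date and time."""
    epoch = parse_source_date_epoch(environ)
    if epoch is None:
        # Assumption: fall back to the current time; the spec text in the
        # diff does not say what to do when the variable is missing.
        return datetime.datetime.now(datetime.timezone.utc)
    try:
        # Keep the value in UTC; any locale or timezone formatting is
        # left to run time.
        return datetime.datetime.fromtimestamp(epoch, tz=datetime.timezone.utc)
    except (OverflowError, OSError, ValueError) as exc:
        raise ValueError(f"SOURCE_DATE_EPOCH out of range: {epoch}") from exc


def child_environment(environ, fallback_epoch):
    """Environment for child processes; never overwrites an existing value."""
    env = dict(environ)
    env.setdefault("SOURCE_DATE_EPOCH", str(int(fallback_epoch)))
    return env


if __name__ == "__main__":
    try:
        stamp = build_datetime(os.environ)
    except ValueError as exc:
        # A malformed value ends the build with a non-zero exit status.
        sys.exit(f"error: {exc}")
    print(stamp.strftime("%Y-%m-%d %H:%M:%S UTC"))
```

A packaging system would export the variable before invoking the build
(for Debian, derived from the latest debian/changelog entry); the sketch
only covers the consuming side and the no-overwrite rule for children.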
-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/source-date-epoch-spec.git


