[Reproducible-commits] [source-date-epoch-spec] 01/01: SOURCE_DATE_EPOCH specification 1.0.
Chris Lamb
lamby at moszumanska.debian.org
Wed Sep 2 14:16:55 UTC 2015
This is an automated email from the git hooks/post-receive script.
lamby pushed a commit to branch master
in repository source-date-epoch-spec.
commit e8ba01d29843130250a5fa0c696509246b4c413a
Author: Chris Lamb <lamby at debian.org>
Date: Wed Sep 2 15:16:36 2015 +0100
SOURCE_DATE_EPOCH specification 1.0.
Utterly ruthless editing; best to stick to specifying, not justifying.
---
source-date-epoch-spec.xml | 362 ++++++++++++++-------------------------------
1 file changed, 108 insertions(+), 254 deletions(-)
diff --git a/source-date-epoch-spec.xml b/source-date-epoch-spec.xml
index bf04b3c..25eb93a 100644
--- a/source-date-epoch-spec.xml
+++ b/source-date-epoch-spec.xml
@@ -1,302 +1,156 @@
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
-"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
-<article id="index">
+<article>
<articleinfo>
- <authorgroup>
- <author>
- <firstname>Chris</firstname>
- <surname>Lamb</surname>
- <affiliation>
- <address>
- <email>lamby at debian.org</email>
- </address>
- </affiliation>
- </author>
- <author>
- <firstname>Ximin</firstname>
- <surname>Luo</surname>
- <affiliation>
- <address>
- <email>infinity0 at debian.org</email>
- </address>
- </affiliation>
- </author>
- </authorgroup>
-
- <title><envar>SOURCE_DATE_EPOCH</envar> specification (DRAFT)</title>
- <pubdate>27 August 2015</pubdate>
+ <title><envar>SOURCE_DATE_EPOCH</envar> specification</title>
+ <revhistory>
+ <revision>
+ <revnumber>1.0</revnumber>
+ <date>01 September 2015</date>
+ <revremark>Initial version.</revremark>
+ </revision>
+ </revhistory>
+ <author>
+ <firstname>Chris</firstname>
+ <surname>Lamb</surname>
+ <affiliation>
+ <address><email>lamby at debian.org</email></address>
+ </affiliation>
+ </author>
</articleinfo>
<sect1>
<title>Introduction</title>
<para>
- This specification defines a distribution-agnostic standard
- for build systems to convey information about the date and
- time in their build result in a reproducible manner.
+ This specification defines a distribution-agnostic standard for
+ build systems to exchange a timestamp.
+ </para>
+ <para>
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+ NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described
+ in RFC 2119.
</para>
<para>
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
- "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
- "MAY", and "OPTIONAL" in this document are to be
- interpreted as described in RFC 2119.
+ The canonical URI for this document is: <ulink
+ url="https://reproducible-builds.org/specs/source-date-epoch/">https://reproducible-builds.org/specs/source-date-epoch/</ulink>.
</para>
</sect1>
<sect1>
- <title>Background</title>
- <sect2>
- <title>Reproducible builds</title>
- <para>
- Whilst anyone can inspect the source code of free
- software for malicious flaws, for reasons of
- convenience most distributions provide binary or
- "compiled" packages to their end users.
- </para>
- <para>
- The motivation behind "reproducible" or "deterministic"
- builds is to empower anyone to verify that no flaws
- have been introduced during the build process by
- promising that byte-identical binary packages are
- always generated from a given source.
- </para>
- </sect2>
- <sect2>
- <title>Why they matter</title>
- <para>
- Build processes that are reproducible help prevent
- against backdoor-introducing malware being used on
- developers' machines. Not only would an attacker need
- to insert the same backdoor on the machines of the
- developers who attempt to reproduce the build, the
- malware is now almost certain to be widely exposed
- which dramatically increases the risk to the attacker.
- Combined with diverse cross-compiling, reproducible
- builds can also detect most variations of the "Trusting
- Trust" Thompson attack.
- </para>
- <para>
- Privacy and security conscious projects such as Tor and
- Bitcoin have a clear interest in allowing their users
- to verify that the available binaries correspond to the
- published source code. Core system utilities such as
- Coreboot have similar reasons for wishing to provide
- such assurances to their users in this way.
- </para>
- </sect2>
- <sect2>
- <title>Technical advantages</title>
- <para>
- A reproducible build has other, technical, advantages:
- </para>
- <itemizedlist>
- <listitem><para>
- Detects tainted, corrupted or out-dated
- build-environments.
- </para></listitem>
- <listitem><para>
- Typically requires the removal of any non-deterministic
- and/or unsafe behaviour, such as interacting with the
- internet to obtain build-dependencies or reading from
- uninitialised memory.
- </para></listitem>
- <listitem><para>
- Removes many configuration-specific issues (eg. locale
- or timezone-related changes to behaviour) eliminating
- hard to debug problems that can be specific to a user's
- particular environment.
- </para></listitem>
- <listitem><para>
- Provides a transparent method to show that a proposed
- change to either the source or packaging toolchain has
- no impact on generated binaries.
- </para></listitem>
- <listitem><para>
- Reduces the time-to-detection of a build host
- compromise as its results can be externally validated.
- </para></listitem>
- <listitem><para>
- Provides an audit trail from a binary back to its
- source.
- </para></listitem>
- <listitem><para>
- Packages built on foreign architectures can be
- trivially validated.
- </para></listitem>
- </itemizedlist>
- </sect2>
+ <title>Motivation</title>
+ <para>
+ Whilst anyone can inspect the source code of free software for
+ malicious flaws, most distributions provide binary (or
+ "compiled") packages to end users. The motivation behind
+ "reproducible" builds is to allow verification that no flaws
+ have been introduced during this compilation process by
+ promising that identical binary packages are always generated from a
+ given source.
+ </para>
+ <para>
+ This protects against the installation of backdoor-introducing
+ malware on developers' machines as an attacker would need to
+ simultaneously infect all the developers attempting to
+ reproduce the build. In addition, a reproducible build has
+ other technical advantages:
+ </para>
+ <itemizedlist>
+ <listitem><para>
+ Requires the removal of any non-deterministic and/or unsafe
+ behaviour, e.g. connecting to the internet to download
+ build-dependencies or reading from uninitialised memory
+ </para></listitem>
+ <listitem><para>
+ Detects corrupted or outdated build environments
+ </para></listitem>
+ <listitem><para>
+ Provides validation of packages built on foreign
+ architectures
+ </para></listitem>
+ <listitem><para>
+ Reduces time-to-detection of a build host compromise
+ </para></listitem>
+ <listitem><para>
+ Can show that proposed changes have no impact on binaries
+ </para></listitem>
+ </itemizedlist>
<sect2>
<title>Build timestamps</title>
<para>
- Whilst there are a large number of obstacles to a fully
- reproducible GNU/Linux or BSD distribution, many
- software packages are only unreproducible because they
- embed a build-time timestamp into generated files.
+ Software packages are often unreproducible because they
+ embed compile-time timestamps into generated files. As
+ the current time changes between builds, this results
+ in the binaries containing different contents.
+ Furthermore, these dates are unreliable indicators of
+ the software's age given that software can be
+ arbitrarily rebuilt.
</para>
<para>
- As the current time is inherently unstable across
- different builds, this results in the generated
- binaries containing different contents and are thus
- unreproducible. This embedding occurs in a wide variety
- of locations but particularly in generated documentation,
- manpages, output from <command>--help</command>, etc.
- It is also common in the metadata or headers of file
- formats such as PNG or gzip.
+ An improvement is to use the last modification time of
+ the source; if the source is then modified, the
+ binaries will change by design. This timestamp is also
+ more informative as it reflects the actual age of the
+ software and not when it was last compiled.
</para>
<para>
- One suggestion that is sometimes raised, is to have diff
- programs detect and ignore embedded timestamps. However,
- it is not feasible to develop an algorithm to do this for
- arbitrary data formats, and <emphasis>computationally
- impossible</emphasis> in the case of Turing-complete data
- formats such as executables - since the real behaviour of
- the result can easily change based on a piece of data
- embedded in the file, even if the data is itself static
- or immutable. The only way to algorithmically verify
- identical behaviour in the general case, is to enforce
- bit-for-bit identical build results, and eliminate build
- time variations even in data that is static at run time.
+ However, in the context of a distribution, the last
+ modification time is not a property of the upstream
+ source, but rather of the packaging that encapsulates
+ it.
</para>
<para>
- Historically, there have been several rationales for
- embedding the build date:
- </para>
- <itemizedlist>
- <listitem><para>
- Provides a rough indication of the age of the software.
- </para></listitem>
- <listitem><para>
- Provides some indication of the environment that was
- used for the build based on the availability of the
- build-dependencies available at that particular moment.
- </para></listitem>
- </itemizedlist>
- <para>
- However, such hints are misleading indicators of the
- information they intend to convey, since software can
- be arbitrarily rebuilt. Notably, the inaccuracy becomes
- more and more severe as time passes, which is not a good
- property to have for any program or process.
- </para>
- <para>
- Furthermore, the information is redundant in a build
- that is reproducible: if the only difference in the build
- result is the embedded build date, then this difference
- <emphasis>is</emphasis> meaningless and should be
- removed, or replaced with a meaningful date.
- </para>
- <para>
- In any case, more specific information about the build
- environment is required if users wish to reliably
- reproduce the binaries. Indeed, standards for conveying
- such metadata <emphasis>precisely</emphasis> are being
- developed elsewhere at the time of writing; but they are
- outside of the scope of this particular document.
- </para>
- </sect2>
- <sect2>
- <title>Source timestamps</title>
- <para>
- A more reliable, stable and ultimately useful value to
- embed is the timestamp representing the <emphasis>last
- modification time of the source</emphasis>. If the
- source is modified, the generated binaries will change
- by design. Additionally, this date is more informative
- for end users as it reflects the "true age" of the
- software and not merely when it was last compiled.
- </para>
- <para>
- In the context of a distribution, the last modification
- time is not a property of the upstream source itself but
- rather of the distribution's packaging that encapsulates
- it. Distributions typically have a standard repository
- where this information may be accessed easily.
- </para>
- <para>
- Many upstream build processes embed the time of the build
- since that is an easy option that approximates the more
- informative source timestamp. Many also offer no way for
- an external source to override this. This specification
- offers a solution to both of these problems.
+ This specification therefore defines a
+ distribution-agnostic standard for upstream build
+ processes to consume this timestamp from packaging
+ systems.
</para>
</sect2>
</sect1>
<sect1>
- <title>Environment variables</title>
- <para>
- We propose the following environment variables to be consumed by
- build systems, tools and wrappers.
- </para>
- <para>
- It is intended to be a universal standard and not specific to
- any particular project or distribution.
- </para>
+ <title>Specification</title>
<sect2>
<title><envar>SOURCE_DATE_EPOCH</envar></title>
<para>
A UNIX timestamp, defined as the number of seconds
(excluding leap seconds) since <computeroutput>01 Jan
- 1970 00:00:00 UTC</computeroutput> exposed through the
+ 1970 00:00:00 UTC</computeroutput>.
+ </para>
+ <para>
+ The value MUST be exported through the operating
system's usual environment mechanism.
</para>
<para>
- The value is an ASCII representation of an integer with
- no fractional component, similar to the output of
- <command>date +%s</command> in GNU coreutils.
+ The value MUST be an ASCII representation of an integer
+ with no fractional component, identical to the output
+ of <command>date +%s</command>.
</para>
<para>
- The value should be set to the time of the last
- modification time of the source. For example, in
- Debian, this would be set to the timestamp associated
- with the latest entry in
+ The value SHOULD be set to the last
+ modification time of the source, incorporating any
+ packaging-specific modifications. For example, in
+ Debian, the timestamp of the latest entry in
<filename>debian/changelog</filename>.
</para>
<para>
- To adhere to this specification, upstream build processes
- MUST read and use this variable for embedded timestamps,
- <emphasis>in place of</emphasis> the "current" date and
- time of when the process is being run. Upstream MAY also
- patch any relevant descriptive text so that it refers to
- the source code's modification time instead of the build
- time, but this is not necessary for the purposes of this
- specification.
+ Upstream build processes MUST use this variable for
+ embedded timestamps in place of the "current" date and
+ time.
</para>
- <warning>
- <para>
- In addition, care should be taken to avoid timezone
- and locale-specific formatting of the value of
- <envar>SOURCE_DATE_EPOCH</envar>. Any embedded
- timezone MUST be constant at build time and SHOULD
- refer to UTC. If it is deemed essential that an end user
- read this value in their own locale or timezone, this
- formatting MUST be delayed until run time.
- </para>
- </warning>
<para>
- Upstream build processes MUST NOT overwrite this variable
- (e.g. for child processes to consume) if it is already
- set (e.g. by a parent process or the user themselves).
+ Build systems MUST NOT overwrite this variable for
+ child processes to consume if it is already present.
</para>
<para>
- If the value is missing or empty, the upstream build
- process chooses its own behaviour; this situation is
- indistinguishable from one that is not following this
- specification. However, we RECOMMEND that the behaviour
- should more closely approximate the date of the last
- modification to the source code. Falling back to the
- "current" date and time of the build is NOT RECOMMENDED.
- For example, the upstream build process MAY attempt to
- automatically detect an appropriate value to set this
- variable to, by reading the time of the latest VCS commit
- or filesystem entry if there are uncommitted changes, or
- from a hard-coded value in an official release tarball.
- Child processes may then consume this variable as if
- they were following this specification themselves.
+ Formatting MUST be deferred until runtime if an end
+ user should observe the value in their own locale or
+ timezone.
</para>
<para>
- If the value is malformed, the upstream build process
- SHOULD exit with a non-zero error code.
+ If the value is malformed, the build process SHOULD
+ exit with a non-zero error code.
</para>
</sect2>
</sect1>
@@ -304,7 +158,7 @@
<sect1>
<title>Examples</title>
<para>
- A number of examples are available at <ulink
+ Examples are available at <ulink
url="https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples">https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples</ulink>.
</para>
</sect1>
@@ -312,7 +166,7 @@
<sect1>
<title>Copyright</title>
<para>
- Copyright © 2014, 2015 See Contributors List
+ Copyright © 2015 See Contributors List
</para>
<para>
Permission is hereby granted, free of charge, to any person
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/source-date-epoch-spec.git
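
The consumer-side requirements in the Specification section of the diff above (use the variable in place of the current time, treat a malformed value as an error, keep build-time formatting independent of timezone and locale) could be sketched roughly as follows. This is an illustration only, not part of the commit: the function names are hypothetical, and the handling of an unset or empty variable is an assumption, since the 1.0 text does not define it.

```python
import os
import sys
import time
from datetime import datetime, timezone


def source_date_epoch(environ=os.environ):
    """Return the build timestamp in whole seconds since the Unix epoch.

    Raises ValueError if SOURCE_DATE_EPOCH is set but malformed.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # Assumption: the quoted text does not say what to do when the
        # variable is unset, so this sketch falls back to the current time.
        return int(time.time())
    # Accept only ASCII decimal digits, as `date +%s` would print them.
    # Assumption: empty and negative values are treated as malformed.
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"malformed SOURCE_DATE_EPOCH: {raw!r}")
    return int(raw)


def format_build_date(epoch):
    """Format the timestamp in UTC so the output does not depend on the
    build machine's timezone or locale."""
    moment = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


def main(environ=os.environ):
    """Print the date to embed; return a non-zero status on bad input."""
    try:
        epoch = source_date_epoch(environ)
    except ValueError as exc:
        print(exc, file=sys.stderr)
        return 1
    print(format_build_date(epoch))
    return 0
```

A build script would typically end with `sys.exit(main())`, so that a malformed value stops the build with a non-zero status. The sketch only reads the variable and never writes it, so a value set by a parent process reaches child processes unchanged.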