[Reproducible-commits] [source-date-epoch-spec] 02/02: Large number of updates
Chris Lamb
lamby at moszumanska.debian.org
Thu Aug 27 13:13:08 UTC 2015
This is an automated email from the git hooks/post-receive script.
lamby pushed a commit to branch master
in repository source-date-epoch-spec.
commit fb0281f52ec55e0b0c7662c2d1809ee560240080
Author: Chris Lamb <lamby at debian.org>
Date: Thu Aug 27 15:10:21 2015 +0200
Large number of updates
---
source-date-epoch-spec.xml | 355 +++++++++++++++++++++++----------------------
1 file changed, 185 insertions(+), 170 deletions(-)
diff --git a/source-date-epoch-spec.xml b/source-date-epoch-spec.xml
index ba1ec71..f4b3b96 100644
--- a/source-date-epoch-spec.xml
+++ b/source-date-epoch-spec.xml
@@ -15,42 +15,37 @@
</author>
</authorgroup>
- <title>SOURCE_DATE_EPOCH specification (DRAFT)</title>
- <pubdate>22 August 2015</pubdate>
+ <title><envar>SOURCE_DATE_EPOCH</envar> specification (DRAFT)</title>
+ <pubdate>27 August 2015</pubdate>
</articleinfo>
<sect1>
- <title>Warning: this specification is unfinished work at the moment</title>
- <para>
- This specification was started to be written at the just
- finished DebConf15 conference in Heidelberg in late August
- 2015. We expect it to be finished in a few days or maybe weeks.
- Please don't refer to it yet.
- </para>
-</sect1>
-
-<sect1>
- <title>Reproducible builds</title>
- <para>
- Whilst anyone can inspect the source code of free software for
- malicious flaws, for reasons of convenience most distributions
- provide binary or "compiled" packages to their end users.
- </para>
- <para>
- The idea of "reproducible" or "deterministic" builds is to
- empower anyone to verify that no flaws have been introduced
- during the build process by reproducing byte-for-byte identical
- binary packages from a given source.
- </para>
+ <title>Background</title>
+ <sect2>
+ <title>Reproducible builds</title>
+ <para>
+ Whilst anyone can inspect the source code of free
+ software for malicious flaws, for reasons of
+ convenience most distributions provide binary or
+ "compiled" packages to their end users.
+ </para>
+ <para>
+ The motivation behind "reproducible" or "deterministic"
+ builds is to empower anyone to verify that no flaws
+ have been introduced during the build process by
+ promising that byte-identical binary packages are
+ always generated from a given source.
+ </para>
+ </sect2>
<sect2>
<title>Why they matter</title>
<para>
Build processes that are reproducible help prevent
- against backdoor-introducing malware being installed on
- developers' machines. Not only does an attacker need
+ against backdoor-introducing malware being used on
+ developers' machines. Not only would an attacker need
to insert the same backdoor on the machines of the
- developers who are attempting to reproduce the build,
- the malware is now almost certain to be widely exposed
+ developers who attempt to reproduce the build, the
+ malware is now almost certain to be widely exposed
which dramatically increases the risk to the attacker.
Combined with diverse cross-compiling, reproducible
builds can also detect most variations of the "Trusting
@@ -58,26 +53,29 @@
</para>
<para>
Privacy and security conscious projects such as Tor and
- Bitcoin have a clear interest in allowing their
- users to verify that the available binaries correspond
- to the published source code. Core system utilities
- such as Coreboot have similar reasons for wishing to
- provide such assurances to their users in this way.
+ Bitcoin have a clear interest in allowing their users
+ to verify that the available binaries correspond to the
+ published source code. Core system utilities such as
+ Coreboot have similar reasons for wishing to provide
+ such assurances to their users in this way.
</para>
</sect2>
<sect2>
<title>Technical advantages</title>
<para>
- There are other—technical—reasons for
- adopting reproducible builds:
+ There are other, technical, reasons for adopting
+ reproducible builds:
</para>
<itemizedlist>
<listitem><para>
- Encourages the removal of unreliable and/or
- non-deterministic software behaviour.
+ Detects tainted, corrupted or out-dated
+ build-environments.
</para></listitem>
<listitem><para>
- Detects tainted or corrupted build-environments.
+ Typically requires the removal of any non-deterministic
+ and/or unsafe behaviour, such as interacting with the
+ internet to obtain build-dependencies or reading from
+ uninitialised memory.
</para></listitem>
<listitem><para>
Removes many configuration-specific issues (eg. locale
@@ -86,170 +84,167 @@
particular environment.
</para></listitem>
<listitem><para>
- Provides a convenient means to show that changes or
- improvements to the source or packaging have no impact
- on the generated binaries.
+ Provides a transparent method to show that a proposed
+ change to either the source or packaging toolchain has
+ no impact on generated binaries.
</para></listitem>
<listitem><para>
- Implicitly provides an audit trail from a binary back
- to the source.
+ Reduces the time needed to detect a build host
+ compromise, as its results can be externally validated.
</para></listitem>
<listitem><para>
- Validation of cross-architecture built packages.
+ Provides an audit trail from a binary back to its
+ source.
</para></listitem>
<listitem><para>
- Reduces the impact of a centralised build host
- compromise as their results can be externally
- validated.
+ Validates packages built on foreign architectures.
</para></listitem>
</itemizedlist>
</sect2>
-</sect1>
-
-<sect1>
- <title>Current problems with timestamps</title>
- <para>
- Whilst there are a multitude technical obstacles to a fully
- reproducible software distribution, many software packages are
- unreprodubible simply because they embed the current build-time
- timestamp into the generated binaries. As the current time is
- implicitly unstable across different builds, this ensures that
- the generated binaries contain different contents depending on
- when they were built.
- </para>
- <para>
- There are several rationales for embedding the build date:
- </para>
- <itemizedlist>
- <listitem><para>
- FIXME: it gives "some indication" of the age of the software.
- However, this becomes basically redundant with reproducible
- builds, as the whole point of reproducible builds is that the
- build result will be exactly the same no matter when it was
- built. To phrase this differently: if the only difference in
- the build result is the embedded build date, then this
- difference is meaningless and should be removed, or replaced
- with a meaningful date.
- </para></listitem>
- <listitem><para>
- FIXME: it gives "some indication" of the build environment
- (e.g. age of the build dependencies?). But with reproducible
- builds, there is no need to guess which build environment has
- been used, based on a timestamp. To allow users to reproduce
- binaries, the build environment is either known in advance, or
- recorded (e.g. in .buildinfo files).
- </para></listitem>
- </itemizedlist>
- <para>
- FIXME: why timestamps become meaningless and/or misleading if
- the software is reproducible.
- </para>
- <para>
- Generally, a better solution is to embed the date of
- the last modification to the source code. This proposal
- attempts to define some standards for tools to operate,
- based on this principle.
- </para>
+ <sect2>
+ <title>Build timestamps</title>
+ <para>
+ Whilst there are a large number of obstacles to a fully
+ reproducible GNU/Linux or BSD distribution, many
+ software packages are only unreproducible because they
+ embed a build-time timestamp into generated files.
+ </para>
+ <para>
+ As the current time is implicitly unstable across
+ different builds, the generated binaries contain
+ different contents and are thus unreproducible.
+ This embedding occurs in a wide variety
+ of locations but particularly in generated documentation,
+ manpages, output from <command>--help</command>, etc.
+ It is also common in the metadata or headers of file
+ formats such as PNG or GZip.
+ </para>
+ <para>
+ Historically, there have been several rationales for
+ embedding the build date:
+ </para>
+ <itemizedlist>
+ <listitem><para>
+ It provides the age of the software.
+ </para></listitem>
+ <listitem><para>
+ It suggests the environment that was used, based on
+ the build-dependencies that were available at the
+ specified time.
+ </para></listitem>
+ </itemizedlist>
+ <para>
+ However, embedded build dates are not only unreliable
+ indicators of age, given that software can be rebuilt
+ at any time; they are also redundant or meaningless in
+ a reproducible build, which always produces identical
+ output. In any case, more-specific information about
+ the build environment is required if users wish to
+ reliably reproduce the binaries.
+ </para>
+ <para>
+ The current timestamp is therefore not only an
+ impediment to a reproducible build, it is also
+ incomplete and misleading, and offers little useful
+ information to end users or developers.
+ </para>
+ </sect2>
+ <sect2>
+ <title>Source timestamps</title>
+ <para>
+ A more reliable, stable and ultimately useful value to
+ embed is the timestamp representing the <emphasis>last
+ modification time of the source</emphasis>. If the
+ source is modified, the generated binaries will change
+ by design. Additionally, this date is more informative
+ for end users as it reflects the "true age" of the
+ software and not merely when it was last compiled.
+ </para>
+ <para>
+ In the context of a distribution, the last modification
+ time is not a property of the upstream source itself but
+ rather of the distribution's packaging that encapsulates
+ it. Ensuring this outer timestamp is used by the
+ underlying build system often requires cumbersome and
+ distribution-specific changes.
+ </para>
+ <para>
+ This proposal therefore attempts to define a
+ distribution-agnostic standard for build systems to
+ exchange such a timestamp in a uniform manner.
+ </para>
+ </sect2>
</sect1>
<sect1>
<title>Proposal</title>
<para>
- We propose the following build-time environment variable to be
- consumed by build systems and for it to be used in place of the
- "current" time and date.
+ We propose the following environment variable to be consumed by
+ build systems, tools and wrappers and for it to be used in
+ place of the "current" date and time.
</para>
<para>
It is intended to be a universal standard and not specific to
any particular project or distribution.
</para>
<sect2>
- <title>SOURCE_DATE_EPOCH</title>
+ <title><envar>SOURCE_DATE_EPOCH</envar></title>
<para>
- A UNIX timestamp, defined as the number of seconds
- (excluding leap seconds) since 01 Jan 1970 00:00 UTC.
+ A UNIX timestamp, defined as the number of seconds
+ (excluding leap seconds) since <computeroutput>01 Jan
+ 1970 00:00 UTC</computeroutput>, exposed through the
+ system's usual environment mechanism.
</para>
<para>
- The value is an integer with no fractional component,
- similar to that output by "date +%s". The value is
- independent of timezones.
+ The value is an ASCII representation of an integer with no
+ fractional component, similar to the output of
+ <command>date +%s</command>.
</para>
<para>
- The actual value should be set to the time of the last
+ The value should be set to the last
modification time of the source. For example, in
- Debian, this is automatically set to the same time as
- the latest entry in debian/changelog.
+ Debian, this would be set to the timestamp associated
+ with the latest entry in
+ <filename>debian/changelog</filename>.
</para>
<para>
Upstream build processes are encouraged to read and use
this variable in place of any embedded timestamps.
</para>
- </sect2>
- <sect2>
- <title>Rationale</title>
- <para>
- We deliberate do not specify anything resembling a
- "time zone". Developing such a standard would require
- consideration of various issues:
- </para>
- <itemizedlist>
- <listitem><para>
- Intuitive and naive ways of handling human-readable
- dates, such as the POSIX date functions are highly
- flawed and freely mix implicit not-well-defined
- calendars with absolute time. For example, they don't
- specify they mean the Gregorian calendar, and/or don't
- specify what to do with dates before when the Gregorian
- calendar was introduced, or use named time zones that
- require an up-to-date timezone database (e.g. with
- historical DST definitions) to parse properly.
- </para></listitem>
- <listitem><para>
- Since this is meant to be a universal standard that all
- tools and distributions can support, we need to keep
- things simple and precise, so that different groups of
- people cannot accidentally interpret it in different
- ways. So it is probably unwise to try to standardise
- anything that resembles a named time zone, since that
- is very very complex.
- </para></listitem>
- </itemizedlist>
+ <warning>
+ <para>
+ Code should not require the presence of the
+ variable so that it can be built outside of a
+ context where it is provided. Falling back to
+ the "current" time may be acceptable behaviour
+ if the variable is missing or malformed.
+ </para>
+ <para>
+ In addition, care should be taken to avoid
+ timezone and locale-specific formatting of the
+ value of <envar>SOURCE_DATE_EPOCH</envar>. If
+ it is deemed essential that an end-user can
+ see this timestamp in their own locale or
+ timezone, this formatting must be delayed until
+ run-time.
+ </para>
+ </warning>
</sect2>
</sect1>
<sect1>
- <title>Implementation issues</title>
- <itemizedlist>
- <listitem><para>
- Care should be taken to avoid timezone and locale-specific
- formatting of SOURCE_DATE_EPOCH during build-time. If it is
- deemed essential that an end-user can read this timestamp in
- their own locale and timezone, this formatting must be delayed
- until run-time.
- </para></listitem>
- <listitem><para>
- Code should avoid requiring the presence of the variable so
- that it can still be built outside of a context where this
- variable is available. Depending on the context, falling back
- to the "current" time may be acceptable behaviour if the
- variable is missing.
- </para></listitem>
- </itemizedlist>
-
- <sect2>
- <title>Examples</title>
- <para>
- A number of examples are available at: <ulink
- url="https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples">https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples</ulink>.
- </para>
- </sect2>
+ <title>Examples</title>
+ <para>
+ A number of examples are available at <ulink
+ url="https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples">https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Examples</ulink>.
+ </para>
</sect1>
<sect1>
<title>Copyright</title>
<para>
- Copyright (C) 2014, 2015 See Contributors List
+ Copyright © 2014, 2015 See Contributors List
</para>
<para>
Permission is hereby granted, free of charge, to any person
@@ -280,15 +275,35 @@
<sect1>
<title>Contributors</title>
- <para>Axel Beckert</para>
- <para>Chris Lamb (lamby)</para>
- <para>Chris West (Faux)</para>
- <para>Dmitry Shachnev</para>
- <para>Eduard Sanou</para>
- <para>Holger Levsen</para>
- <para>Jérémy Bobbio (Lunar)</para>
- <para>Mattia Rizzolo</para>
- <para>Ximin Luo</para>
+ <itemizedlist>
+ <listitem><para>
+ Axel Beckert
+ </para></listitem>
+ <listitem><para>
+ Chris Lamb (lamby)
+ </para></listitem>
+ <listitem><para>
+ Chris West (Faux)
+ </para></listitem>
+ <listitem><para>
+ Dmitry Shachnev
+ </para></listitem>
+ <listitem><para>
+ Eduard Sanou
+ </para></listitem>
+ <listitem><para>
+ Holger Levsen
+ </para></listitem>
+ <listitem><para>
+ Jérémy Bobbio (Lunar)
+ </para></listitem>
+ <listitem><para>
+ Mattia Rizzolo
+ </para></listitem>
+ <listitem><para>
+ Ximin Luo
+ </para></listitem>
+ </itemizedlist>
</sect1>
</article>
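
Editorial illustration (not part of the commit): the consumer behaviour described in the new warning block could look roughly like the following Python sketch. It assumes a build tool that embeds a date string in its output. The function names `build_timestamp` and `format_build_date` are hypothetical, and rejecting negative or non-ASCII values is a choice made for this sketch rather than something the draft states.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the epoch seconds a build should embed.

    Uses SOURCE_DATE_EPOCH when it holds a plain ASCII integer and falls
    back to the current time when the variable is missing or malformed.
    (Assumption of this sketch: negative values count as malformed.)
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("SOURCE_DATE_EPOCH", "")
    # isascii() guards against non-ASCII digit characters that isdigit()
    # would otherwise accept.
    if raw.isascii() and raw.isdigit():
        return int(raw)
    return int(time.time())


def format_build_date(epoch):
    """Format the timestamp in UTC using only numeric fields.

    Avoiding local time and locale-dependent names keeps the string
    identical on every build machine.
    """
    moment = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    print(format_build_date(build_timestamp()))
```

Run with `SOURCE_DATE_EPOCH=1440681021` in the environment, the script prints `2015-08-27 13:10:21 UTC`, which corresponds to the commit's author date above. With the variable unset it prints the current time instead.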
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/source-date-epoch-spec.git