NoMoreSourcePackages
Launchpad Entry: https://launchpad.net/distros/ubuntu/+spec/no-more-source-packages
Created: Date(2006-06-09T11:42:21Z) by ScottJamesRemnant
Contributors: ScottJamesRemnant
Packages affected:
Introduction
When we talk about programming languages, we have a test we can employ to tell us whether or not something is actually a programming language: is it Turing complete? Through this test we know that C is a programming language and HTML isn't. I'm sure you know that the definition of the machine is actually a little bit abstract, but it nonetheless suffices for the problem.
We don't have anything like that to define what a revision control system is, and I think that it might be useful, so let's define some basic tenets that a system must fulfil to be one.
- The system must store or be able to construct on demand a snapshot of a file or tree (a revision).
- Each revision of a file or tree must be uniquely identifiable within the available set.
- Revisions should be annotated with useful meta-data, e.g. the author, time of commit, reason for change, etc.
- The system must always be able to provide the most recent revision, and optionally historical revisions.
- Revisions may be ordered to indicate a line of history, and these lines may diverge into multiple branches.
These five principles allow RCS, CVS, Subversion, Bazaar and Arch to be all considered revision control systems despite their individual quirks.
Interestingly, there's something else that passes: the Debian-style source archive. Every day, Debian and Ubuntu developers are working with a revision control system, albeit an initially non-obvious one. Let me prove it; here's how it passes.
1) For a given source package, the archive is able to produce a .tar.gz and optional .diff.gz that can be combined to give a revision of the source. This is no more unusual than, e.g., Arch retrieving a .tar.gz and some patches to apply to it.
2) Each revision of a source package is uniquely identifiable by its version number, which is required to be unique within the archive.
3) Each revision carries a debian/changelog file which identifies the author, date, "log message" and some other meta-data. This being inside the snapshot is no more unusual than Arch storing them in {arch}/.../log.
Indeed, like Arch the archive also copies some of this data into an external log file for easier access (the .dsc file).
4) The most recent revisions (HEADs, if you will) are always available, and if the archive admin wishes, history can be retained. By default there's a high history drop, but then Arch also drops its history, and this is configurable. An admin with plenty of disk space can choose to retain history as done on snapshot.debian.net, for example.
Note that Arch retains the line of history within the repository, and just loses the ability to reconstruct prior revisions. This is also true for the Debian-style source: the line of history remains in debian/changelog.
5) The debian/changelog file also lists the previous revisions in order, providing a history of development. This history can diverge wherever the source is published in different places, provided the version numbers remain unique.
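To make point 5 concrete, here is a minimal sketch of reading the line of history out of a debian/changelog. It relies only on the standard entry header format, "package (version) distribution; urgency=...". The function name `revisions`, the regular expression and the sample changelog text are illustrative assumptions, not part of any existing tool.

```python
import re

# Header line of a debian/changelog entry:
#   package (version) distribution(s); urgency=...
HEADER = re.compile(
    r"^(?P<package>[a-z0-9][a-z0-9+.-]*) \((?P<version>[^)]+)\) [^;]+;"
)


def revisions(changelog_text):
    """Return (package, version) pairs in changelog order, newest first."""
    found = []
    for line in changelog_text.splitlines():
        match = HEADER.match(line)
        if match:
            found.append((match.group("package"), match.group("version")))
    return found


# Invented sample data, for illustration only.
SAMPLE = """\
hello (2.1-2ubuntu1) dapper; urgency=low

  * Merge from Debian unstable.

 -- Example Developer <dev@example.com>  Fri, 09 Jun 2006 11:42:21 +0000

hello (2.1-2) unstable; urgency=low

  * Fix a typo in the manual page.

 -- Example Developer <dev@example.com>  Thu, 01 Jun 2006 10:00:00 +0000
"""

if __name__ == "__main__":
    for package, version in revisions(SAMPLE):
        print(package, version)
```

Each header line is effectively a revision identifier, and the order of the entries is the line of history.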
We can even go as far as providing analogies to the commands:
apt-get source => checkout
dupload/dput => commit
This goes some way to explaining why we've found it so hard to get Ubuntu developers using a revision control system every day: they already are.
The Problem
Using bzr to develop software, and then placing it in a source package and uploading it to the archive is exactly akin to developing the software with bzr, and then committing it to CVS to make it available to others!
Sooner or later it becomes too easy to make the quick fix in CVS, and forget to copy it to bzr later.
And when other people get involved, they may be more familiar with CVS, entirely unaware of the bzr repository, or just in a hurry; and thus you need to keep track of two repositories simultaneously.
Developers make things worse too. Not content with their revision control system only offering a single "changes from upstream" collection (which is already a bit like CVS and its vendor branches), they've begun implementing internal branches using debian/patches (a bit like git?).
This is the model HCT historically tried to retain compatibility with. In effect it was using bzr to store revisions that are then exported to another revision control system (a primitive one made of patches) which are themselves stored in yet another revision control system (the Debian archive).
No wonder it feels wrong.
Solution?
So given this new way of looking at reality, let's rethink for a few seconds what the world should look like.
Why do we need source packages? What are they for?
Building
First answer: that's what we build to make the binaries, and those are what we _really_ care about.
Given the definition that a given version of a source package is just a revision in the Debian-style archive revision control system, why not drop the archive entirely and just replace it with a different revision control system?
Rather than the buildd calling apt-get source, why doesn't it just call bzr checkout?
We can throw away all the source package handling. Instead of all that tedious mucking around with uploads, queueing, publishing and domination, all a developer needs to do is say "I want to build revision R of repository S, and have it published in distro D."
Depending on the request and their permissions, that might result in a reject (not allowed, or too old a revision), manual approval (D is closed or new) or acceptance (and into the build queue).
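The three outcomes could be sketched roughly as follows. Everything here is hypothetical: the `BuildRequest` fields, the `decide` function and the idea of comparing integer revision numbers are assumptions made for illustration, and "D is closed or new" is collapsed into a single flag because the exact policy is not specified.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REJECT = "reject"
    MANUAL_APPROVAL = "manual approval"
    ACCEPT = "accept"


@dataclass
class BuildRequest:
    revision: int               # revision R, modelled here as an integer revno
    repository: str             # repository S
    distro: str                 # distro D
    requester_allowed: bool     # does the requester have permission?
    published_revision: int     # newest revision already published (0 if none)
    distro_closed_or_new: bool  # "D is closed or new"


def decide(request):
    """Map a build request onto reject, manual approval or acceptance."""
    if not request.requester_allowed:
        return Outcome.REJECT            # not allowed
    if request.revision < request.published_revision:
        return Outcome.REJECT            # too old a revision
    if request.distro_closed_or_new:
        return Outcome.MANUAL_APPROVAL   # needs a human to approve it
    return Outcome.ACCEPT                # straight into the build queue


if __name__ == "__main__":
    request = BuildRequest(42, "S", "D", True, 40, False)
    print(decide(request).value)
```

Whether a request for an already published revision should count as "too old" is a policy choice; this sketch lets it through.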
We could even do away with debian/changelog and have dpkg-gencontrol and dpkg-genchanges get the information directly from bzr -- this is almost supported in the tools (a custom changelog "parser" and a small patch would do the trick).
That's equivalent to a native tarball upload; developers will probably still want to be able to do non-native uploads... well, those turn out to be easy too. "Take upstream revision U, and then merge my branch." If the merge fails, it's no different to a diff.gz failing to apply.
We can even do debian/patches style uploads!
"Take the upstream revision, and merge the following branches, and then build it."
Obviously this information will need to be recorded so that it's known how to rebuild the binaries again later, but it's nowhere near as complex as the current SourcePackage* table set.
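As a rough idea of how little needs recording, the sketch below stores an upstream branch, an upstream revision and an ordered list of branches to merge, and turns that into the bzr commands a buildd might run. The `SourceRecipe` name, its fields and the example URLs are invented for illustration. The sketch only constructs the command lines (`bzr branch -r`, `bzr merge`, `bzr commit -m`) and does not run them. Committing after each merge is an assumption made here so that the next merge starts from a clean tree.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecipe:
    upstream_branch: str     # location of the upstream import
    upstream_revision: str   # upstream revision U
    merge_branches: list = field(default_factory=list)  # merged in order


def build_steps(recipe, workdir="source"):
    """Return (cwd, argv) pairs describing how to assemble the source tree."""
    steps = [
        (".", ["bzr", "branch", "-r", recipe.upstream_revision,
               recipe.upstream_branch, workdir]),
    ]
    for branch in recipe.merge_branches:
        # A failed merge here plays the same role as a diff.gz failing to apply.
        steps.append((workdir, ["bzr", "merge", branch]))
        steps.append((workdir, ["bzr", "commit", "-m", "Merge " + branch]))
    return steps


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    recipe = SourceRecipe(
        upstream_branch="http://example.com/upstream/hello",
        upstream_revision="120",
        merge_branches=["http://example.com/ubuntu/hello-packaging",
                        "http://example.com/ubuntu/hello-fix-manpage"],
    )
    for cwd, argv in build_steps(recipe):
        print(cwd, " ".join(argv))
```

With an empty `merge_branches` list this reduces to a plain checkout of one revision, the native-upload case; with one branch it is the non-native case; with several it is the debian/patches case.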
The upstream branches are just our bzr imports of their own revision control, or their tarballs.
GPL compliance
There is a second use for source packages: GPL and other licence compliance. That's easy to satisfy though: the buildds can generate a .tar.gz before they build it (either including or excluding the .bzr, depending on the mood).
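Generating that tarball is a small job. The following is a minimal sketch using Python's standard `tarfile` module; the function name `make_source_tarball` and its `include_bzr` switch are assumptions for illustration, not an existing buildd interface.

```python
import os
import tarfile


def make_source_tarball(tree_dir, tarball_path, include_bzr=False):
    """Pack tree_dir into a .tar.gz, optionally leaving out the .bzr directory."""
    top = os.path.basename(os.path.abspath(tree_dir))

    def keep(member):
        # Returning None drops the member; for a directory this also
        # skips everything underneath it.
        if not include_bzr and ".bzr" in member.name.split("/"):
            return None
        return member

    with tarfile.open(tarball_path, "w:gz") as tar:
        tar.add(tree_dir, arcname=top, filter=keep)
```

Calling `make_source_tarball("hello-2.1", "hello_2.1.tar.gz")` on a checked-out tree would produce an archive of the working files only; passing `include_bzr=True` ships the revision history as well.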
We could even still generate .tar.gz and .diff.gz if we felt like it. The point is that to us these are just a result of building, like debs or rosetta translation tarballs; they go in the archive, but we don't use them for anything ourselves.
Other Concerns
The development tools in the distribution would simply grab the source from bzr, and to "upload" would just push it and request the build.
What would we do about Debian? Well, we'd just use Sourcerer to import their sources into our bzr branches -- it wouldn't be much of a tweak to make it format them differently, it was designed to be flexible and maintainable like this.
Getting There
So, controversial, source packages are obsolete! Obviously this is all blue-sky, and a long way off, right? I mean, how long would this all take to implement?
1) bzr imports of upstream ... coming along nicely?
2) somewhere to upload things to ... super mirror's working ok for me
3) drop source packages from soyuz and replace with "source branch lists" ... this is probably the long bit, requires some spec
4) buildds to checkout and merge the branches, and generate the "source" ... can't see this being hard
5) modify sourcerer to import Debian in this way ... mostly just deleting code that HCTifies things
The scary thing is that the only reason I can see this taking a while to be implemented is politics. A lot of code has been written to handle source packages, and having them vanish may upset people. Also some people may still be somewhat attached to the idea.
Outstanding issues