Hadoop

Revision 2 as of 2012-01-24 17:18:15

Overview

During the 12.04 cycle, hadoop and a selection of related components will be packaged for upload to the Partner archive:

  • hadoop 0.20.205.0
  • hive 0.7.1
  • pig
  • hbase
  • hcatalog
  • zookeeper 3.3.x

PPAs

A team (http://launchpad.net/~hadoop-ubuntu) and three PPAs have been set up on Launchpad; all three PPAs have been enabled for armel, armhf and powerpc, so please check before adding anyone else to the team.

dev

ppa:hadoop-ubuntu/dev

Development versions of packages: new versions that are not yet ready for full testing in the testing PPA.

testing

ppa:hadoop-ubuntu/testing

Versions of packages which are ready for testing.

stable

ppa:hadoop-ubuntu/stable

Stable versions of packages which have been tested.

Packaging Details

Packages should be based on the source packages built by bigtop (http://incubator.apache.org/bigtop http://github.com/apache/bigtop). However, most of those packages are based on an older debhelper, so d/rules etc. should be rationalised to use newer features. The bigtop packaging can be used to seed a debian directory:

git clone https://github.com/apache/bigtop
mkdir -p hadoop/debian
cp bigtop/bigtop-packages/src/deb/hadoop/* hadoop/debian
cp bigtop/bigtop-packages/src/common/hadoop/* hadoop/debian

Packages should ship upstart configurations; see hadoop for examples.

Packages should be based on upstream binary distributions; Java components should not be rebuilt, but native components will need to be rebuilt with appropriate patches for precise and the official ports.

Packaging-only branches (just the debian folder) should be created for all packages and stored under the hadoop-ubuntu team, e.g. lp:~hadoop-ubuntu/+junk/hadoop

Packages should define a target in debian/rules called get-orig-source which pulls the correct version of the upstream distribution from an appropriate upstream source. Ideally checksums should be validated.
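
The checksum validation mentioned above could be sketched as a small shell helper that a get-orig-source target calls after fetching the tarball. The function name verify_sha256 and the choice of sha256sum are assumptions for illustration; this page does not specify a digest algorithm or a download tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing package):
# succeed only if the SHA-256 digest of a file matches the expected value.
verify_sha256() {
    file="$1"       # path to the downloaded upstream tarball
    expected="$2"   # expected digest as a lowercase hex string
    # Read the file on stdin so the output is always "<digest>  -".
    actual=$(sha256sum < "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}
```

A get-orig-source target might run something like `verify_sha256 "$TARBALL" "$EXPECTED_SHA256" || exit 1` after the download step; TARBALL and EXPECTED_SHA256 are placeholders, not names defined by the packaging.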

Packages should use source format 3.0 (quilt) (set in debian/source/format) and should have packaging version numbers of the form 0.20.205.0-0ubuntu1~hadoopX. The ~hadoopX suffix should be used to support multiple upload iterations to the PPAs.
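
As an illustration of the version scheme, the sketch below composes a version string from its parts. The function name ppa_version and its argument order are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Hypothetical helper: build a PPA version string such as
# 0.20.205.0-0ubuntu1~hadoop1 from its three components.
ppa_version() {
    upstream="$1"    # upstream release, e.g. 0.20.205.0
    ubuntu_rev="$2"  # Ubuntu packaging revision, e.g. 1
    iteration="$3"   # PPA upload iteration (the X in ~hadoopX)
    printf '%s-0ubuntu%s~hadoop%s\n' "$upstream" "$ubuntu_rev" "$iteration"
}

# Example call for the hadoop version listed in the overview.
ppa_version 0.20.205.0 1 1
```

Because a tilde sorts before everything else in Debian version ordering, each ~hadoopX upload compares lower than the plain 0.20.205.0-0ubuntu1, so a later archive upload without the suffix supersedes the PPA builds. On a Debian or Ubuntu system this can be checked with `dpkg --compare-versions 0.20.205.0-0ubuntu1~hadoop2 lt 0.20.205.0-0ubuntu1`.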