Succinct

Differences between revisions 1 and 2
Revision 1 as of 2006-02-08 18:35:38
Size: 1854
Editor: host86-132-155-103
Comment: Psync braindump
Revision 2 as of 2006-02-08 19:02:25
Size: 4154
Editor: host86-132-155-103
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
Line 32: Line 31:
 * patch, reversibly transform b2 into b1.  * patch, reversibly transform b2 into b1.  Destination checksum, but unknown source checksum---can give hint though.
Line 35: Line 34:
 * literal, l1, c1, string[l1]="..."
Line 36: Line 36:
When contructing the z-map, if it is not possible to reverse-engineer the compression parameters needed to reconstruct the block, it is not possible to call it a zlib block! Call it linear instead. When contructing the z-map, if it is not possible to reverse-engineer the compression parameters needed to reconstruct the block, it is not possible to call it a zlib block! Call it linear instead, same applies to bzip2.

Not that a zlib mapping is also a linear mapping. This saves the round-tripping problem if also latter blocks of a zlib stream are required.
Line 39: Line 41:

More important in the case of Ubuntu than Debian as the Packages.gz gets pushed more often.

If a p

== Relation to 'patch' approach ==

Focus on using patch to get from block b1 to b2, ...but there are multiple checksums of of b2 that can produce b1 when applied. If we have a pre-known corpus of previous packages hints can be providers for those checksums that the patch is known to apply to.

Additionally a hint can be provided for which unpacked file this particular section applies to with the hint to try applying the patch to that.

When the range that the patch affects is known, the zlib source stream can be repacked to insert additional reset markers before and after the range that is changed. This allows just that altered range to be fetched rather than the whole thing.

== Linear ranges ==

Contigious linear ranges can be merged.

== deb file as receipes ==

{{{
 linear arch header

 linear arch debian-binary header
 linear debian-binary

 linear arch control.tar.gz header
 linear control.tar.gz gz header
 zlib control.tar
 linear control.tar.gz gz footer

 linear arch data.tar.bz2 header
 linear data.tar.bz2 bzip2 header
 bzip2 data.tar
 linear data.tar.bz2 bzip2 footer
}}}

 1. Above linear sections can all be concatrenated as they are unlikely to appear in part elsewhere and are very short in length.
 2. The zlib and bzip2 are actually in multiple depending on the size.

== Depths ==

Multiple depths of non-linear transforms (stacking) should be supported but it seems excessive and doesn't bring an immediate benefit.

== Tarball case ==

The 512 byte tar headers are going to cause massive zlib section fetches even if 100% of the files either side of them can be found and constructed locally on the disk. To remain backwards (not modifying the original tarball in any way) it might be best to provide these as literal ranges to make them directly addressible (since they are mostly zeros they will compress)

+ Wikis are good because everything you write stays in one place. - Firefox and X crashing take out all your hard work.

PSync was suggested by mvo, but it'll likely to have to be named something else as there's a few crashes from Googling. Not zsync though, even if the ideas and some of the base code is from there.

Rational

I should concentrate on psync and stop playing with making more laptop/tablet buttons work out of the box because:

  • Can do the buttons for Dapper + 1
  • PSync makes a real world difference, now
    • Saves users bandwidth
      • Saves money
      • Saves time
      • Saves frustration
    • Saves mirror bandwidth
      • Saves money
      • Saves CPU as comparison is done client side
    • Only depends on HTTP and now rsync.

Because Ubuntu runs cron.daily 48 times per day that means that each update requires a fresh 5MB download of Packages.gz for a single package change.

Receipes

Receipes take a block b1 of length l1 and checksum c1 and transfer it into block b2 with length l2 and checksum c2 using parmeters p0..n

  • linear, c1 == c2, l1 == l2
  • zlib, l2 normally shorter than l1, c1 != c2 using compression parameters p1,2,3
    • special rules as fetches can end on partial flushes, but not start on one, can simplify the common case by having multiple overlapping ranges of gradually extending lengths.
  • bzip2, l2 normally shorter than l1, c1 != c2, using compression parameters p1
  • patch, reversibly transform b2 into b1. Destination checksum, but unknown source checksum---can give hint though.
  • ed, reversibly transform b2 into b1
  • i386jmp, transform block b2 into b1 of l1 == l2 using parameter p0, base-offset
  • literal, l1, c1, string[l1]="..."

When contructing the z-map, if it is not possible to reverse-engineer the compression parameters needed to reconstruct the block, it is not possible to call it a zlib block! Call it linear instead, same applies to bzip2.

Not that a zlib mapping is also a linear mapping. This saves the round-tripping problem if also latter blocks of a zlib stream are required.

Packages.gz case

More important in the case of Ubuntu than Debian as the Packages.gz gets pushed more often.

If a p

Relation to 'patch' approach

Focus on using patch to get from block b1 to b2, ...but there are multiple checksums of of b2 that can produce b1 when applied. If we have a pre-known corpus of previous packages hints can be providers for those checksums that the patch is known to apply to.

Additionally a hint can be provided for which unpacked file this particular section applies to with the hint to try applying the patch to that.

When the range that the patch affects is known, the zlib source stream can be repacked to insert additional reset markers before and after the range that is changed. This allows just that altered range to be fetched rather than the whole thing.

Linear ranges

Contigious linear ranges can be merged.

deb file as receipes

 linear   arch header

 linear   arch debian-binary header
 linear   debian-binary

 linear   arch control.tar.gz header
 linear   control.tar.gz gz header
 zlib     control.tar
 linear   control.tar.gz gz footer

 linear   arch data.tar.bz2 header
 linear   data.tar.bz2 bzip2 header
 bzip2    data.tar
 linear   data.tar.bz2 bzip2 footer
  1. Above linear sections can all be concatrenated as they are unlikely to appear in part elsewhere and are very short in length.
  2. The zlib and bzip2 are actually in multiple depending on the size.

Depths

Multiple depths of non-linear transforms (stacking) should be supported but it seems excessive and doesn't bring an immediate benefit.

Tarball case

The 512 byte tar headers are going to cause massive zlib section fetches even if 100% of the files either side of them can be found and constructed locally on the disk. To remain backwards (not modifying the original tarball in any way) it might be best to provide these as literal ranges to make them directly addressible (since they are mostly zeros they will compress)

PaulSladen/Succinct (last edited 2008-08-06 16:22:24 by localhost)