PhasedUpdates
Rationale
We want to be able to slowly phase updates of software packages. If we increase the percentage of systems that can install such an update and it becomes clear from data gathered by errors.ubuntu.com that the update is less reliable than the one which preceded it, we should programmatically stop the phasing.
When do we stop a phased update?
There are three conditions under which we want to consider stopping the phasing of an update:
- New buckets since the previous version of a package.
- The rate of crashes for the package increases.
- Updates that purport to fix an issue, which do not.
Updates that do not actually fix an issue are worth notifying a developer about, but it is not worth stopping the phasing of an update over them. They will not be considered further here.
New buckets since the previous version of a package
- We will create an AMQP pub/sub queue for modified bucket notifications.
- We will keep track of the systems that report into a bucket.
Every time a bucket is changed, we'll first check to see if this problem is new with this version of the package. If it is, we'll get the first X+1 columns in the BucketSystems column family for the key of the bucket. If we get exactly X+1 results back, an alert will be fired, provided the bucket is for an official version of an Ubuntu package.
Thus, if the number of systems reporting any new problem exceeds X, where X is initially set to 3, an alert will be fired.
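The check described above can be sketched as follows. This is a minimal illustration, not the errors.ubuntu.com implementation: the BucketSystems column family is replaced by a hypothetical in-memory dictionary, and the names `record_report` and `should_alert` are assumptions made for the example. It relies on the fact that asking for at most X+1 columns is enough to tell whether more than X systems have reported.

```python
from itertools import islice

THRESHOLD_X = 3  # initial value of X given in the text

# Hypothetical in-memory stand-in for the BucketSystems column family:
# bucket key -> {system identifier: placeholder value}, one column per system.
bucket_systems = {}


def record_report(bucket_key, system_id):
    """Record that a system reported into a bucket (idempotent per system)."""
    bucket_systems.setdefault(bucket_key, {})[system_id] = ""


def should_alert(bucket_key, is_new_in_version, is_official, x=THRESHOLD_X):
    """Return True if a new bucket has been reported by more than x systems."""
    if not (is_new_in_version and is_official):
        return False
    columns = bucket_systems.get(bucket_key, {})
    # Fetch at most x + 1 columns; we never need to count beyond that.
    first_columns = list(islice(columns, x + 1))
    return len(first_columns) == x + 1
```

Because each system occupies a single column, repeated reports from the same system do not move the bucket closer to the threshold.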
The rate of crashes for the package increases
This is a periodic check. At every interval, the system will check if the standard deviation of the number of crashes for each day in the past two weeks is less than the number of crashes seen today. If it is, an alert will be fired.
This algorithm will likely require modification, as today's count will likely only exceed the standard deviation at or near the end of the day. One possible way to deal with this is by dividing both quantities of crashes by the number of hours that have passed in the day. However, this assumes that errors receives crashes at a roughly even rate throughout the day.
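A rough sketch of the periodic check, including one reading of the hourly adjustment, might look like this. The function name `crash_rate_alert`, the use of the population standard deviation, and the choice to compare per-hour rates (today's count over the hours elapsed against the two-week standard deviation over 24 hours) are all assumptions; the text does not fix any of them.

```python
from statistics import pstdev


def crash_rate_alert(daily_counts, today_count, hours_elapsed=24):
    """Return True if today's crash rate exceeds the historical threshold.

    daily_counts:  crash totals for each day in the past two weeks.
    today_count:   crashes seen so far today.
    hours_elapsed: hours that have passed today (24 means a full day).
    """
    if not 0 < hours_elapsed <= 24:
        raise ValueError("hours_elapsed must be in the range (0, 24]")
    # Threshold from the text: the standard deviation of the daily counts,
    # expressed here as a per-hour figure so partial days are comparable.
    threshold_per_hour = pstdev(daily_counts) / 24
    today_per_hour = today_count / hours_elapsed
    return today_per_hour > threshold_per_hour
```

With `hours_elapsed=24` this reduces to the unadjusted comparison of today's total against the standard deviation.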
This algorithm may require tuning to account for the population size increasing dramatically around milestones and release day.
This will require a new counter-based column family where we increment a counter for the row of release:package and the column name of the YYYYMMDD date.
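The layout of that column family can be illustrated with an in-memory stand-in. The dictionary `crash_counters` and the helper names below are hypothetical; only the release:package row key and the YYYYMMDD column name come from the text, and the release and package values in the usage note are placeholders.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical stand-in for the counter column family:
# row key "release:package" -> Counter of {"YYYYMMDD": crash count}.
crash_counters = defaultdict(Counter)


def row_key(release, package):
    """Build the row key in the release:package form."""
    return f"{release}:{package}"


def increment(release, package, day, amount=1):
    """Increment the counter for one package on one day."""
    crash_counters[row_key(release, package)][day.strftime("%Y%m%d")] += amount


def counts_for_days(release, package, days):
    """Return the counts for the given dates, using 0 for days with no data."""
    row = crash_counters.get(row_key(release, package), Counter())
    return [row[d.strftime("%Y%m%d")] for d in days]
```

For example, after `increment("Ubuntu 12.10", "apport", date(2012, 10, 18))`, the row `Ubuntu 12.10:apport` holds a count of 1 under the column `20121018`.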
This may also be useful for packages not undergoing phased updates, e.g. for finding out if a new version of a package is crashier than the previous one.
Stopping of a phased update
An API call will need to be provided by errors.ubuntu.com for Launchpad to check so that it can determine whether or not to increment the Update Percentage.
Additionally, when errors recommends stopping a phased update, an e-mail should be sent to the ubuntu-release team so that they are notified that the phasing of the update has stopped. There should be a way for members of ubuntu-release or ubuntu-archive to override the stopping of a phased update.
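How Launchpad might combine the answer from errors.ubuntu.com with a manual override can be sketched as below. Everything here is hypothetical: the function name, the `step` size of 10, and the decision to hold the percentage at its current value when phasing is stopped are assumptions for illustration, and neither the real API nor the Launchpad logic is specified in this document.

```python
def next_update_percentage(current, errors_recommends_stop, override=False, step=10):
    """Return the Update Percentage to use after this check.

    current:                the current Update Percentage (0 to 100).
    errors_recommends_stop: the answer from the errors.ubuntu.com check.
    override:               True if ubuntu-release or ubuntu-archive has
                            overridden the stop.
    step:                   assumed increment per check (not from the spec).
    """
    if errors_recommends_stop and not override:
        # Phasing is stopped: do not increment.
        return current
    return min(100, current + step)
```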