PhasedUpdates
Rationale
TODO
When do we stop a phased update?
There are three conditions under which we want to consider stopping the phasing of an update:
- Updates that claim to fix an issue but do not.
- New buckets since the previous version of a package.
- The rate of crashes increases.
Updates that do not actually fix an issue are worth notifying a developer about, but they are not grounds for stopping the phasing of an update.
New buckets since the previous version of a package
- We will create an AMQP pub/sub queue for modified bucket notifications.
- We will keep track of the systems that report into a bucket.
If the number of systems reporting into any new bucket exceeds X, where X is initially set to 3, an alert will be fired. Every time a bucket changes, we'll first check whether the bucket is new in this version of the package. If it is, we'll read the first X+1 columns of the bucket's row in the BucketSystems column family. If exactly X+1 results come back, more than X systems have reported the bucket, and an alert will be fired.
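The new-bucket check can be sketched as follows. This is a minimal illustration, not the real implementation: the BucketSystems column family is replaced by a hypothetical in-memory dict (row key = bucket id, column names = system ids), and the hypothetical `previous_version_buckets` set stands in for however "new in this version" is actually determined.

```python
from itertools import islice

# "X" from the text; 3 is the proposed initial value.
ALERT_THRESHOLD = 3


def record_report(bucket_systems, bucket_id, system_id):
    """Note that system_id reported into bucket_id (one column per system).

    bucket_systems is a hypothetical in-memory stand-in for the
    BucketSystems column family: {bucket_id: {system_id: value}}.
    """
    # Column values are unused; only the column names (system ids) matter.
    bucket_systems.setdefault(bucket_id, {})[system_id] = b""


def first_columns(row, count):
    """Mimic a column-slice read that returns at most `count` column names."""
    return list(islice(row, count))


def should_alert(bucket_id, bucket_systems, previous_version_buckets,
                 x=ALERT_THRESHOLD):
    # Only buckets that did not exist for the previous package version count.
    # previous_version_buckets is a hypothetical set of known bucket ids.
    if bucket_id in previous_version_buckets:
        return False
    # Fetch at most X+1 columns; exactly X+1 means more than X systems reported it.
    columns = first_columns(bucket_systems.get(bucket_id, {}), x + 1)
    return len(columns) == x + 1
```

Reading only X+1 columns keeps the check cheap regardless of how many systems eventually report into the bucket. Because the check runs on every bucket change, it keeps returning True once the threshold has been crossed, so a real implementation would probably need to suppress repeat alerts.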
The rate of crashes increases
This is a periodic check. At every interval, the system will compare the number of crashes seen today with the standard deviation of the daily crash counts over the past two weeks. If today's count is greater than that standard deviation, an alert will be fired.
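A minimal sketch of the comparison as worded above, assuming the population standard deviation (`statistics.pstdev`) is meant; the text does not say whether the population or sample form is intended, and `crash_rate_alert` is a hypothetical name.

```python
from statistics import pstdev


def crash_rate_alert(previous_days, today):
    """Return True if today's crash count exceeds the standard deviation
    of the daily crash counts in previous_days (two weeks in the text).

    Assumes the population standard deviation; the proposal does not
    specify population vs. sample.
    """
    if not previous_days:
        # No history yet, so there is nothing to compare against.
        return False
    return pstdev(previous_days) < today
```

For example, with alternating daily counts of 0 and 10 over fourteen days the standard deviation is 5, so a day with 6 crashes would trigger an alert while a day with 3 would not.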
This algorithm may require tuning to account for the population size increasing dramatically around milestones and release day.
This will require a new counter-based column family in which we increment a counter whose row key is release:package and whose column name is the date in YYYYMMDD format.
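The row and column layout of that counter column family can be illustrated with an in-memory stand-in. This is only a sketch: the class and method names are hypothetical, and a real deployment would increment Cassandra counter columns rather than a Python dict. It also shows how the previous two weeks of counts for the periodic check could be read back.

```python
from collections import defaultdict
from datetime import date, timedelta


class DailyCrashCounter:
    """Hypothetical in-memory stand-in for the counter column family.

    Row key: "release:package"; column name: date as YYYYMMDD; value: count.
    """

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _row_key(release, package):
        return f"{release}:{package}"

    def increment(self, release, package, day, amount=1):
        # One counter per (release:package, YYYYMMDD) pair.
        self._rows[self._row_key(release, package)][day.strftime("%Y%m%d")] += amount

    def get(self, release, package, day):
        row = self._rows.get(self._row_key(release, package), {})
        return row.get(day.strftime("%Y%m%d"), 0)

    def previous_days(self, release, package, today, days=14):
        """Counts for the `days` days before `today`, oldest first."""
        return [self.get(release, package, today - timedelta(days=n))
                for n in range(days, 0, -1)]
```

For instance, `DailyCrashCounter().increment("precise", "foo", date(2012, 4, 26))` bumps the column `20120426` in row `precise:foo`; the release and package names here are made up for illustration.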