Summary

With automated bug reporting from apport, kerneloops and checkbox, we frequently receive duplicate bug reports. Duplicate reports usually provide little value, so there should be mechanisms in place to prevent them from being filed. Additionally, tools for finding and consolidating duplicates already reported in Launchpad should be developed and distributed to bug triaging teams.

Rationale

During the Karmic release cycle we had lots of duplicate bugs reported: for example, bug 422536 has 107 duplicates and bug 429322 has 1097 duplicates. The high quantity of duplicates consumes extra resources for very little value and results in a poor user experience for bug reporters.

User stories

Arnold is an Ubuntu user who wants to help make Ubuntu better and is testing the Beta version of the next release of Ubuntu. Unfortunately, while he is using pornview it crashes on him and he receives an apport dialog notifying him of the crash. He proceeds to file the crash report in Launchpad, only to have it marked as a duplicate hours later. Since he is subscribed to the master report, he continues to receive e-mail for every bug that is made a duplicate of it, and becomes rather annoyed.

Franco is an Ubuntu developer and is subscribed to package bug reports for the package pytrainer which he maintains. There happens to be a bug in pytrainer which affects every user of it - and there are lots! Franco becomes inundated with bug mail regarding duplicates of this bug and ends up unsubscribing from bug reports about pytrainer.

Implementation

Code Changes

The apport-retracer shall be modified to subscribe ubuntu-bugcontrol to crashes with more than 10 duplicates. This will act as a notification system so the team knows which bugs should have a pattern written for them. Additionally, in the event that apport finds a matching pattern for an already reported bug, it will 'metoo' that bug report. apport-collect shall also be modified to encourage collectors to report a new bug instead of adding their information to someone else's bug report. Apport will also be modified so that more files are searchable by bug patterns.
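The threshold rule above can be illustrated with a minimal sketch. It assumes the duplicate counts have already been collected into a plain dictionary; the function name and the data layout are hypothetical and are not part of apport-retracer.

```python
# Hypothetical sketch: pick the bugs that exceed the duplicate threshold.
# Nothing here talks to Launchpad; the counts are supplied by the caller.
DUPLICATE_THRESHOLD = 10  # "more than 10 duplicates", as stated above


def bugs_needing_pattern(duplicate_counts, threshold=DUPLICATE_THRESHOLD):
    """Return bug numbers with more than `threshold` duplicates, worst first."""
    flagged = [bug for bug, count in duplicate_counts.items() if count > threshold]
    return sorted(flagged, key=lambda bug: duplicate_counts[bug], reverse=True)


if __name__ == "__main__":
    # The first two counts come from the Rationale; 100001 is a made-up bug number.
    counts = {422536: 107, 429322: 1097, 100001: 3}
    print(bugs_needing_pattern(counts))  # [429322, 422536]
```

In a real retracer the flagged bugs would then have ubuntu-bugcontrol subscribed to them; that step is left out here because it depends on how the retracer talks to Launchpad.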

Bughugger will be extended to provide an easy way to search for duplicate bug reports and merge them.

The ubuntu-qa-tools bzr branch includes a few scripts for working with duplicate bug reports, but these could use some improvement. The is-duplicate script shall be modified to deal with a bug report that has duplicates. A single launchpadlib script shall be written for moving all the duplicates from one bug to another (similar to move_duplicates.py from the examples folder in python-launchpad-bugs). A launchpadlib script will also be written that uses searchTasks.findSimilarBugs() to help find duplicate bug reports.
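As a rough illustration of the duplicate-moving script, the sketch below re-points every duplicate of one bug at another. It assumes bug objects that behave like launchpadlib bug entries, that is, with an `id`, a `duplicates` collection, a writable `duplicate_of` attribute and an `lp_save()` method. The function name is hypothetical and this is not the move_duplicates.py script mentioned above.

```python
def move_duplicates(old_master, new_master):
    """Re-point old_master and all of its duplicates at new_master.

    Both arguments are assumed to be launchpadlib-style bug objects.
    Returns the number of bugs whose duplicate_of link was changed.
    """
    if old_master.id == new_master.id:
        raise ValueError("old and new master are the same bug")

    changed = 0
    # Copy into a list first: the collection may change while we re-point bugs.
    for dup in list(old_master.duplicates):
        dup.duplicate_of = new_master
        dup.lp_save()  # push the change to the server
        changed += 1

    # Finally mark the old master itself as a duplicate of the new one.
    old_master.duplicate_of = new_master
    old_master.lp_save()
    return changed + 1
```

With launchpadlib, the two bugs might be obtained along the lines of `lp = Launchpad.login_with("dup-mover", "production")` followed by `lp.bugs[number]`; the application name is made up, and the login details should be checked against the launchpadlib version in use.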

The search-bugs script from ubuntu-bugpatterns will be modified to consolidate duplicate bug reports that it finds.

Documentation / Process Changes

Documentation regarding how to write and test bug patterns will be written in the Ubuntu wiki and linked to from relevant Bug Squad pages. The documentation will also be blogged about to get more people writing bug patterns.

BoF agenda and discussion

With automated bug reporting from apport, kerneloops and checkbox(?) we frequently have duplicate bug reports submitted. Duplicate reports usually provide little value, so patterns should be written to prevent duplicates from being filed. There should also be facilities for identifying which bugs would most benefit from a pattern. Additionally, tools for finding and consolidating duplicates already reported should be developed and deployed.

Finding duplicates

Consolidating duplicates

Avoiding duplicates

Issues

ACTIONS

How to write a pattern

ACTION: document this better for the Ubuntu Bug Control team (and developers) and notify (Brian Murray)

https://wiki.ubuntu.com/Apport/DeveloperHowTo#Bug patterns
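The idea behind a bug pattern can be sketched in a few lines: a pattern names some report fields and gives a regular expression for each, and a report matches only if every expression matches. The sketch below assumes a pattern represented as a plain dictionary and a report represented as a dictionary of strings. It illustrates the matching logic only; it is not apport's own implementation or pattern file format, and the example values are made up.

```python
import re


def matches_pattern(report, pattern):
    """Return True if every field regex in `pattern` matches the report.

    `report` maps field names to string values; `pattern` maps field names
    to regular expressions. A missing field counts as a non-match.
    """
    for key, regex in pattern.items():
        value = report.get(key)
        if value is None or re.search(regex, value) is None:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical pattern and report, for illustration only.
    pattern = {"Package": r"^pytrainer ", "Traceback": r"ImportError"}
    report = {
        "Package": "pytrainer 1.0-1",
        "Traceback": "ImportError: No module named example",
    }
    print(matches_pattern(report, pattern))  # True
```

Requiring all fields to match keeps a pattern narrow, which matters because a pattern that is too broad would redirect unrelated crashes to the wrong master bug.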


CategorySpec

QATeam/Specs/DuplicateBugConsolidation (last edited 2009-12-02 21:25:32 by c-24-21-43-9)