CrashReporting
Launchpad entry: https://launchpad.net/distros/ubuntu/+spec/crash-reporting
Created: Date(2006-07-31T12:05:45Z) by MatthewPaulThomas
Packages affected: apport, apport-gtk
Summary
When a program crashes, an alert should appear that explains what just happened, makes it easy to report the problem to Ubuntu developers, and makes it easy to reopen the crashed program if appropriate. Because a few bugs cause most crashes, this system should eventually involve a database of crash reports, automatically aggregated by type so that developers can allocate their time to the top crashers.
Rationale
We want to improve Ubuntu's reliability. See also [:UbuntuDownUnder/BOFs/AutomatedCrashReporting:AutomatedCrashReporting], BugReportingTool.
Use cases
- Willy was creating a logo for his soccer club in Inkscape when it crashed. Being the family's Ubuntu expert, he feels a responsibility to help improve the system. He's reported one or two bugs in Malone before, though it wasn't a particularly enjoyable experience.
- Aunt Martha was adding to her genealogical records in Gramps when it crashed. She doesn't know anything about reporting bugs, and has no desire ever to report any.
- Millie was logging in to her bank account when Firefox crashed. She's used to clicking the "Send" button for Windows Error Reporting, but it would be a bad idea for her to report this problem since anyone could find her banking password in the crash information.
- Thunderbird has just crashed on Billie's machine for the third time in three minutes. She angrily blats away the error reporting alert within 0.8 seconds of it opening.
Design
The crash reporting interface is an interruption; it is not at all related to the user's goal (designing a logo, recording genealogy, online banking, etc). To make things worse, the crash itself has probably just eaten some of the victim's work. They will likely be angry with computers in general, and Ubuntu in particular. Therefore the crash reporting interface must be very simple and apologetic.
Another design problem is that most people who have come from Windows XP or later, or Mac OS X 10.3 or later, will be used to crash reports that are confidential to Microsoft, Apple, or trusted ISVs. For Edgy, Ubuntu crash reports will not be like this: if someone reports a bug and attaches their crash report, anyone will be able to see it. Bug reports can be marked private after the fact, but the crash reporting interface itself must take some responsibility for discouraging leaking of sensitive information. (Mozilla.org can [http://www.mozilla.org/quality/qfa.html say that] "Sensitive data, such as passwords, Web sites visited, and e-mail addresses will not be collected". We can't guarantee the same, because our crash reporting system is package-agnostic and can't make assumptions about the data involved.)
Comparisons
[http://windowsdevcenter.com/pub/a/windows/2004/03/16/wer.html Windows Error Reporting under the covers]
[http://developer.apple.com/technotes/tn2004/tn2123.html Apple TN2123: CrashReporter]
[http://ramikayyali.com/archives/2005/07/26/krash Kool Krashing] (see also [http://amarok.kde.org/blog/archives/4-amaroK-1.2-It-Crashes-Somewhat-Less.html amaroK 1.2 - it crashes somewhat less])
[http://flickr.com/photos/jfpoole/143205824/ Adium Crash Reporter]
Edgy
For Edgy, it will not be possible to report a bug without using Launchpad's Web interface. So we should (a) apologize for the error, (b) make it easy to report a bug if that would help, and (c) make it easy to reopen the program if appropriate.
The crash reporter should determine whether the crashed program generated a useful backtrace, then wait until three seconds have passed since the crash, and determine whether the crashed program is now running (meaning that it restarted automatically, that multiple copies were running, or that the user restarted it quickly).
| | There is a useful backtrace | There is not a useful backtrace |
| The program is not running | attachment:edgy.jpg | Same, but without the secondary text, and without the "Report a Bug…" button. |
| The program is running | attachment:edgy-no-reopen.jpg | No alert at all. |
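The decision described above can be sketched in a few lines of Python. This is only an illustration, not apport's actual code: the names `Alert`, `remaining_delay` and `choose_alert` are hypothetical, and the sketch assumes the caller has already worked out whether the backtrace is useful and whether the program is running again.

```python
from enum import Enum


class Alert(Enum):
    """The four cells of the table above (names are illustrative)."""
    FULL = "edgy.jpg"                      # not running, useful backtrace
    WITHOUT_REPORT = "edgy.jpg, reduced"   # not running, no useful backtrace
    WITHOUT_REOPEN = "edgy-no-reopen.jpg"  # running, useful backtrace
    NONE = "no alert"                      # running, no useful backtrace


def remaining_delay(crash_time: float, now: float, delay: float = 3.0) -> float:
    """Seconds still to wait so that `delay` seconds have passed since the crash."""
    return max(0.0, delay - (now - crash_time))


def choose_alert(has_useful_backtrace: bool, program_running: bool) -> Alert:
    """Pick the alert once the three-second wait is over."""
    if program_running:
        return Alert.WITHOUT_REOPEN if has_useful_backtrace else Alert.NONE
    return Alert.FULL if has_useful_backtrace else Alert.WITHOUT_REPORT
```

A caller would sleep for `remaining_delay(...)` seconds, check whether the program is running, and then call `choose_alert`.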
If a human-readable name can be found for the program (from a .desktop file), the primary text of the alert should be "Sorry, Name of Program closed unexpectedly." Otherwise, it should be "Sorry, the program “binary-name” closed unexpectedly."
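As a rough sketch of that rule, the following Python reads the `Name` key from the text of a .desktop file and builds the primary text. The function names are hypothetical, and how the matching .desktop file is located for a crashed binary is left out; localized `Name[xx]` keys are also ignored here.

```python
import configparser
from typing import Optional


def desktop_name(desktop_text: str) -> Optional[str]:
    """Return the Name entry of a .desktop file's contents, if present."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keys in .desktop files are case-sensitive
    parser.read_string(desktop_text)
    return parser.get("Desktop Entry", "Name", fallback=None)


def primary_text(binary_name: str, display_name: Optional[str] = None) -> str:
    """Build the alert's primary text, preferring the human-readable name."""
    if display_name:
        return f"Sorry, {display_name} closed unexpectedly."
    return f"Sorry, the program \u201c{binary_name}\u201d closed unexpectedly."
```

For example, `primary_text("inkscape", desktop_name(text))` falls back to the binary name whenever the file has no `Name` entry.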
The keyboard equivalent for the "Reopen" or "OK" button should be Enter, not a letter.
Clicking "Report a Bug…" should open both a Web browser at Ubuntu's Bugs page in Launchpad and a floating window, near the top left corner of the screen, containing the bug information.
attachment:edgy-report.jpg
In the floating window, the icon should be draggable into the browser's filepicker to select that file, and the pathname should also be copyable text. The "What does the file contain?" expander should disclose a read-only text field containing the crash log as wrapped text.
Later Ubuntu versions
Eventually, crash reports should be stored in a separate database in LP (like [http://talkback-public.mozilla.org/search/start.jsp Mozilla Talkback] or [https://sodium.ubuntu.com/~jamesh/oops.cgi Launchpad Oops]). Crash victims should no longer fill out bug reports manually, because it's time-consuming, complicated, and usually not something they're interested in, and because [http://www.microsoft.com/whdc/maintain/WERHelp.mspx 80 percent of crashes come from 20 percent of the bugs].
This simplifies the initial crash alert a little ...
attachment:funky.jpg
... And it simplifies the resulting reporting interface a lot.
attachment:funky-report.jpg
The window should now be a normal window, not a floating window or a dialog. When the "Send" button is clicked, it should be made insensitive, and a progress animation should be shown in the bottom left corner of the window until the transmission succeeds or fails. If the transmission succeeds, the window should disappear. If it fails, the progress animation should disappear, an error alert should be shown explaining the problem (for example, "The error report could not be sent because there is no Internet connection. Try again later."), and the "Send" button should be made available again. A report that could not be sent is kept for at most 7 days, so that users can still send it later by clicking on the bomb icon in the panel.
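The seven-day retention rule could be checked along these lines. This is a minimal sketch under one assumption the text does not settle: the seven days are counted from the time the report was created. The function names and the mapping of report names to creation times are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

RETENTION = timedelta(days=7)  # "at most 7 days" from the text above


def is_sendable(created: datetime, now: datetime) -> bool:
    """True while an unsent report is still within the retention period."""
    return now - created <= RETENTION


def unsent_reports(reports: Dict[str, datetime], now: datetime) -> List[str]:
    """Names of stored reports that can still be offered for sending."""
    return sorted(name for name, created in reports.items()
                  if is_sendable(created, now))
```

Reports that fall outside the period would be deleted rather than offered through the panel icon.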
Implementation
Web user interface mockup
Start page and query form. This also shows the most recent crashes, top crashers first:
attachment:crashdb-query.jpg
The search form results are displayed in a list:
attachment:crashdb-result.jpg
Crash report details:
attachment:crashdb-details.jpg
Code
The crash database needs to offer an XML-RPC or HTTP POST interface for anonymous crash report submission (it might be possible to reuse the Malone cloakroom for this). apport uses this interface to send the report to the crash database.
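If the HTTP POST variant is chosen, an anonymous submission might be prepared as follows. This is a sketch only: the URL is a placeholder, the content type is an assumption, and the real interface (or the cloakroom, if reused) may expect a different encoding.

```python
import urllib.request

# Placeholder endpoint; the real submission URL is not defined in this spec.
SUBMIT_URL = "https://crashdb.example.invalid/submit"


def build_submission(report_bytes: bytes,
                     url: str = SUBMIT_URL) -> urllib.request.Request:
    """Prepare an anonymous POST request carrying one crash report.

    No credentials or cookies are attached, so the submission is anonymous.
    """
    return urllib.request.Request(
        url,
        data=report_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; an exception raised there corresponds to the "transmission fails" case in the reporting window.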
Gnome's crash database has a fairly good duplicate finder even without full symbols in the stack trace; we need to do the same to avoid human work for duplicate elimination.
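One simple way to approximate duplicate detection without full symbols is to build a signature from the package and the topmost named frames, skipping frames that gdb could not resolve (printed as `??`). The sketch below is not Gnome's algorithm, which is not described here; the function names, the frame depth of 5 and the report tuple layout are all assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def crash_signature(package: str, frames: Iterable[str], depth: int = 5) -> str:
    """Hash the package name and the topmost resolved function names."""
    named = [frame for frame in frames if frame and frame != "??"]
    key = "\n".join([package] + named[:depth])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def group_duplicates(
    reports: Iterable[Tuple[str, str, List[str]]]
) -> Dict[str, List[str]]:
    """Group (report_id, package, frames) tuples by crash signature."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for report_id, package, frames in reports:
        groups[crash_signature(package, frames)].append(report_id)
    return dict(groups)
```

Two reports whose traces differ only in unresolved frames end up in the same group, so a triager sees them as one candidate bug.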
In a later stage of implementation, the crash database should automatically invoke apport-retrace to get symbolic, useful stack traces for crash reports. However, this requires root access to a sandbox system in order to install the package in which the crash occurred.
Access to the raw crash reports should be very limited, since they potentially contain sensitive information. Thus the web interface needs to ask for LP authentication, and limit access to a trusted crash report triage team (initially, ubuntu-core-dev). Bug reports (without a core dump, just with textual data) can be created from crash reports for wider triaging and solving.
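The two rules in this paragraph can be sketched as follows, treating a crash report as a dictionary of fields. The function names are hypothetical, and the `CoreDump` field name and the idea that a user's teams arrive as a list of names are assumptions. Note that dropping the core dump alone does not remove sensitive data that may appear in textual fields.

```python
from typing import Dict, Iterable, Set

TRUSTED_TEAMS: Set[str] = {"ubuntu-core-dev"}  # initial triage team named above


def may_view_raw_report(user_teams: Iterable[str]) -> bool:
    """Only members of a trusted team may see raw crash reports."""
    return not TRUSTED_TEAMS.isdisjoint(user_teams)


def bug_report_from_crash(report: Dict[str, object]) -> Dict[str, str]:
    """Keep only textual fields and drop the core dump (assumed field name)."""
    return {key: value for key, value in report.items()
            if key != "CoreDump" and isinstance(value, str)}
```

The resulting dictionary is what would be attached to a bug for wider triaging, while the raw report stays behind the team check.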
When the system works, we want to disable bug-buddy by default and use apport to intercept and report Gnome-related crashes, too.
Discussion with Sebastien revealed that email notifications about crashes are not wanted. Developers prefer to check the crash database regularly for new issues, so it should provide good search options and sensible default filters.
Discussion
Telling [people] they will not receive a reply is awful, worse than awful even. One of the best things about open source is the fact that we have an open bug tracking system. -- CoreyBurger
It's unfortunate, but nowhere close to awful. This is about improving the quality of Ubuntu, not about giving support (compare [http://hendrix.mozilla.org/ hendrix.mozilla.org]). The openness of the bug tracking system is not relevant to this issue, especially as the user base becomes less geeky on average. -- mpt