CrashReporting
Launchpad entry: https://launchpad.net/distros/ubuntu/+spec/crash-reporting
Created: 2006-07-31T12:05:45Z by MatthewPaulThomas
Packages affected: apport, apport-gtk
Summary
When a program crashes, an alert should appear that explains what just happened, makes it easy to report the problem to Ubuntu developers, and makes it easy to reopen the crashed program if appropriate. Because a few bugs cause most crashes, this system should eventually involve a database of crash reports, automatically aggregated by type so that developers can allocate their time to the top crashers.
Rationale
We want to improve Ubuntu's reliability. See also [:UbuntuDownUnder/BOFs/AutomatedCrashReporting:AutomatedCrashReporting], BugReportingTool.
Use cases
- Willy was creating a logo for his soccer club in Inkscape when it crashed. Being the family's Ubuntu expert, he feels a responsibility to help improve the system. He's reported one or two bugs in Malone before, though it wasn't a particularly enjoyable experience.
- Aunt Martha was adding to her genealogical records in Gramps when it crashed. She doesn't know anything about reporting bugs, and has no desire ever to report any.
- Millie was logging in to her bank account when Firefox crashed. She's used to clicking the "Send" button for Windows Error Reporting, but it would be a bad idea for her to report this problem since anyone could find her banking password in the crash information.
- Thunderbird has just crashed on Billie's machine for the third time in three minutes. She angrily blats away the error reporting alert within 0.8 seconds of it opening.
Design
The crash reporting interface is an interruption; it is not at all related to the user's goal (designing a logo, recording genealogy, online banking, etc). To make things worse, the crash itself has probably just eaten some of the victim's work. They will likely be angry with computers in general, and Ubuntu in particular. Therefore the crash reporting interface must be very simple and apologetic.
Another design problem is that most people who have come from Windows XP or later, or Mac OS X 10.3 or later, will be used to crash reports that are confidential to Microsoft, Apple, or trusted ISVs. For Edgy, Ubuntu crash reports will not be like this: if someone reports a bug and attaches their crash report, anyone will be able to see it. Bug reports can be marked private after the fact, but the crash reporting interface itself must take some responsibility for discouraging leaking of sensitive information. (Mozilla.org can [http://www.mozilla.org/quality/qfa.html say that] "Sensitive data, such as passwords, Web sites visited, and e-mail addresses will not be collected". We can't guarantee the same, because our crash reporting system is package-agnostic and can't make assumptions about the data involved.)
Comparisons
- [http://windowsdevcenter.com/pub/a/windows/2004/03/16/wer.html Windows Error Reporting under the covers]
- [http://developer.apple.com/technotes/tn2004/tn2123.html Apple TN2123: CrashReporter]
- [http://ramikayyali.com/archives/2005/07/26/krash Kool Krashing] (see also [http://amarok.kde.org/blog/archives/4-amaroK-1.2-It-Crashes-Somewhat-Less.html amaroK 1.2 - it crashes somewhat less])
- [http://flickr.com/photos/jfpoole/143205824/ Adium Crash Reporter]
Edgy
For Edgy, it will not be possible to report a bug without using Launchpad's Web interface. So we should (a) apologize for the error, (b) make it easy to report a bug if that would help, and (c) make it easy to reopen the program if appropriate.
The crash reporter should determine whether the crashed program generated a useful backtrace, then wait until three seconds have passed since the crash, and determine whether the crashed program is now running (meaning that it restarted automatically, that multiple copies were running, or that the user restarted it quickly).
|| ||There is a useful backtrace||There is not a useful backtrace||
||The program is not running||attachment:edgy.jpg||Same, but without the secondary text, and without the "Report a Bug…" button.||
||The program is running||attachment:edgy-no-reopen.jpg||No alert at all.||
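The decision described above (backtrace check, three-second wait, running check) can be sketched as a small lookup. This is only an illustration, not apport's code: the names `Alert`, `seconds_left_to_wait`, and `choose_alert` are hypothetical, and how a backtrace is judged useful or a process is detected as running is left to the caller.

```python
from enum import Enum


class Alert(Enum):
    """Alert variants from the table above (hypothetical names)."""
    FULL = "edgy.jpg"
    FULL_WITHOUT_REPORT = "edgy.jpg minus secondary text and Report a Bug"
    NO_REOPEN = "edgy-no-reopen.jpg"
    NONE = "no alert"


def seconds_left_to_wait(crash_time: float, now: float, delay: float = 3.0) -> float:
    """Time still to wait so that `delay` seconds have passed since the crash."""
    return max(0.0, delay - (now - crash_time))


def choose_alert(has_useful_backtrace: bool, program_is_running: bool) -> Alert:
    """Pick the alert for one cell of the backtrace/running table."""
    if program_is_running:
        # The program is already back, so offering "Reopen" makes no sense.
        return Alert.NO_REOPEN if has_useful_backtrace else Alert.NONE
    return Alert.FULL if has_useful_backtrace else Alert.FULL_WITHOUT_REPORT
```

A caller would sleep for `seconds_left_to_wait(...)`, check whether the program is running, and only then call `choose_alert`.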
If a human-readable name can be found for the program (from a .desktop file), the primary text of the alert should be "Sorry, Name of Program closed unexpectedly." Otherwise, it should be "Sorry, the program “binary-name” closed unexpectedly."
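As a rough sketch of this rule, the following reads the `Name` key from the text of a .desktop file and falls back to the binary name. The function names are hypothetical, localized `Name[xx]` keys are ignored, and locating the right .desktop file for a binary is assumed to happen elsewhere.

```python
import configparser
from typing import Optional


def name_from_desktop_entry(desktop_text: str) -> Optional[str]:
    """Return the Name= value of the [Desktop Entry] group, or None."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    try:
        parser.read_string(desktop_text)
    except configparser.Error:
        return None  # not parseable as a .desktop file
    return parser.get("Desktop Entry", "Name", fallback=None)


def primary_text(binary_name: str, display_name: Optional[str] = None) -> str:
    """Build the alert's primary text as specified above."""
    if display_name:
        return f"Sorry, {display_name} closed unexpectedly."
    return f"Sorry, the program “{binary_name}” closed unexpectedly."
```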
The keyboard equivalent for the "Reopen" or "OK" button should be Enter, not a letter.
Clicking "Report a Bug…" should open both a Web browser at Ubuntu's Bugs page in Launchpad and a floating window near the top left corner of the screen, containing the bug information.
attachment:edgy-report.jpg
In the floating window, the icon should be draggable into the browser's filepicker to select that file, and the pathname should also be copyable text. The "What does the file contain?" expander should disclose a read-only text field containing the crash log as wrapped text.
Later Ubuntu versions
Eventually, crash reports should be stored in a separate database in LP (like [http://talkback-public.mozilla.org/search/start.jsp Mozilla Talkback] or [https://sodium.ubuntu.com/~jamesh/oops.cgi Launchpad Oops]). Crash victims should no longer fill out bug reports manually, because it's time-consuming, complicated, and usually not something they're interested in, and because [http://www.microsoft.com/whdc/maintain/WERHelp.mspx 80 percent of crashes come from 20 percent of the bugs].
This simplifies the initial crash alert a little ...
attachment:funky.jpg
... And it simplifies the resulting reporting interface a lot.
attachment:funky-report.jpg
The window should now be a normal window, not a floating window or a dialog. When the "Send" button is clicked, it should be made insensitive, and a progress animation should be shown in the bottom left corner of the window until the transmission succeeds or fails. If it fails, the progress animation should disappear, an error alert should be shown explaining the problem (for example, "The error report could not be sent because there is no Internet connection. Try again later."), and the "Send" button should be made available again. If the transmission succeeds, the window should disappear. If transmission fails, the report is kept for at most 7 days, so that users can still send it later by clicking on the bomb icon in the panel.
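The "Send" behaviour above amounts to a small state machine, sketched here without any toolkit code. The class and function names are hypothetical, and the sketch assumes the 7-day limit is measured from the failed send attempt, which the text above does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(days=7)  # "kept for at most 7 days"


@dataclass
class ReportWindow:
    send_enabled: bool = True
    progress_visible: bool = False
    window_open: bool = True
    error_message: Optional[str] = None

    def click_send(self) -> None:
        """Make "Send" insensitive and show the progress animation."""
        self.send_enabled = False
        self.progress_visible = True
        self.error_message = None

    def finish(self, error: Optional[str] = None) -> None:
        """Handle the end of a transmission; `error` is None on success."""
        self.progress_visible = False
        if error is None:
            self.window_open = False
        else:
            self.error_message = error  # text for the error alert
            self.send_enabled = True


def is_still_kept(failed_at: datetime, now: datetime) -> bool:
    """Assumption: the retention period starts at the failed send attempt."""
    return now - failed_at <= RETENTION
```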
Implementation
Web user interface mockup
Start page and query form. This also shows the most recent crashes, top crashers first:
attachment:crashdb-query.jpg
The search form results are displayed in a list:
attachment:crashdb-result.jpg
Crash report details:
attachment:crashdb-details.jpg
Code
The crash database needs to offer an XML-RPC or HTTP POST interface for anonymous crash report submission (it might be possible to reuse the Malone cloakroom for this). apport uses this interface to send the report to the crash database.
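To make the HTTP POST option concrete, here is a minimal sketch of how a client could build an anonymous submission with Python's standard library. The endpoint URL and the field names are placeholders invented for illustration; the actual interface of the crash database is not defined in this spec.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; the real crash database URL is not specified.
SUBMIT_URL = "https://crashdb.example.invalid/submit"


def build_submission(report: dict) -> urllib.request.Request:
    """Build a form-encoded POST request carrying the report fields.

    No cookies or credentials are attached, so the submission is anonymous.
    """
    body = urllib.parse.urlencode(report).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the actual transmission. The sketch stops short of that so it can be inspected offline.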
Gnome's crash database has a fairly good duplicate finder even without full symbols in the stack trace; we need to do the same to avoid human work for duplicate elimination.
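One simple way to group duplicates without full symbols is to hash the top few stack frames while ignoring addresses. The sketch below illustrates that idea only; it is not the algorithm Gnome's crash database uses, and `Frame`, `crash_signature`, and the default depth of 5 are assumptions made for the example.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Frame:
    address: int             # varies between runs, so it is ignored
    function: Optional[str]  # None when no symbol is available
    module: str              # path of the binary or library


def crash_signature(frames: Sequence[Frame], depth: int = 5) -> str:
    """Hash the top `depth` frames into a key for duplicate grouping."""
    parts = []
    for frame in frames[:depth]:
        name = frame.function or "??"
        parts.append(f"{os.path.basename(frame.module)}!{name}")
    return hashlib.sha1("\n".join(parts).encode("utf-8")).hexdigest()
```

Reports with equal signatures become duplicate candidates. A coarse key like this will sometimes merge distinct bugs or split one bug, so it only reduces the human work and does not remove it.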
In a later stage of implementation, the crash database should automatically invoke apport-retrace to get symbolic, useful stack traces for crash reports. However, this requires root access to a sandbox system in order to install the package in which the crash occurred.
Access to the raw crash reports should be very limited, since they potentially contain sensitive information. Thus the web interface needs to ask for LP authentication and limit access to a trusted crash report triage team (initially, ubuntu-core-dev). Bug reports (without a core dump, just with textual data) can be created from crash reports for wider triaging and solving.
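The two rules in this paragraph can be sketched as follows. The function names are hypothetical, and the sketch assumes a report is a dictionary in which textual fields are `str` and binary data such as the core dump is `bytes`. Only the team name `ubuntu-core-dev` comes from the text above.

```python
from typing import Iterable

TRIAGE_TEAMS = frozenset({"ubuntu-core-dev"})  # initial triage team


def may_view_raw_report(user_teams: Iterable[str]) -> bool:
    """Allow raw report access only to members of a triage team."""
    return not TRIAGE_TEAMS.isdisjoint(user_teams)


def bug_report_fields(report: dict) -> dict:
    """Keep only textual fields, dropping the core dump and other binary data."""
    return {key: value for key, value in report.items() if isinstance(value, str)}
```

Dropping binary fields does not scrub the remaining text, so the textual data may still need review before a bug report is made widely visible.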
When the system works, we want to disable bug-buddy by default and use apport to intercept and report Gnome-related crashes, too.
Discussion with Sebastien revealed that email notifications about crashes are not requested. The preferred workflow is to check the crash database regularly for new issues, so it should provide good search options and good default filters.
Discussion
- Telling [people] they will not receive a reply is awful, worse than awful even. One of the best things about open source is the fact that we have an open bug tracking system. -- CoreyBurger
  - It's unfortunate, but nowhere close to awful. This is about improving the quality of Ubuntu, not about giving support (compare [http://hendrix.mozilla.org/ hendrix.mozilla.org]). The openness of the bug tracking system is not relevant to this issue, especially as the user base becomes less geeky on average. -- mpt