AutomatedProblemReportsTagging

Differences between revisions 1 and 2
Revision 1 as of 2006-07-29 06:17:26
Size: 4464
Editor: c-68-33-112-13
Comment: creat()
Revision 2 as of 2006-07-29 07:25:28
Size: 6010
Editor: c-68-33-112-13
Comment: clean up.

Summary

This spec describes a method for categorizing and sorting AutomatedProblemReports to help developers identify similar or identical bugs.

Rationale

Collecting AutomatedProblemReports can result in DrinkingFromTheFirehose due to the massive influx of problem reports. We must provide a way for developers to wade through the inundation without getting lost in sudden influxes of thousands of copies of the same problem.

Use cases

Two representative use cases:

  • Rhythmbox spontaneously crashes. On six thousand machines.
  • Totem and Nautilus both crash, due to a bug in the gstreamer XviD plug-in being triggered during playback in Totem or thumbnailing in Nautilus.

In each of these cases the AutomatedProblemReports daemon would report a problem back to Ubuntu. On the server, these reports would be analyzed, categorized, and tagged using the resulting information.

Scope

The scope of this spec includes all problems reported by AutomatedProblemReports.

Design

We will need a crash handler and reporter, both of which AutomatedProblemReports provides.

The server handling the AutomatedProblemReports should tag problems that exhibit known characteristics so that they fall into categories.

The interface developers use to view AutomatedProblemReports would be capable of displaying all characteristics of a report and allowing developers to select a subset of these characteristics. This subset would then be cross-checked against all reports to generate a list of matching reports.

From a list of reports, developers should be able to review reports and tag them as being related to any other reports. For our purposes the only "relation" we care about is whether two reports are the same bug.

Reports marked as the same bug should always be marked as the same bug as the earliest reported instance. The earliest known instance of any bug will be marked as being the same as "itself". This may complicate the marking process in some cases, but searching will be much faster because every report of a bug points to the same report: listing reports that are "the same bug as this report" simply means looking for reports whose "the same bug as" field matches the current report's.
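The marking rule above can be sketched in Python as follows. This is a minimal in-memory illustration, not a proposed implementation: the `Report` and `ReportStore` classes and the `same_bug_as` field name are assumptions made for the example, and a real server would keep this in a database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Report:
    """Hypothetical report record; the field names are illustrative only."""
    report_id: int
    reported_at: datetime
    same_bug_as: int  # id of the earliest known report of this bug


class ReportStore:
    """In-memory stand-in for the report server's storage."""

    def __init__(self) -> None:
        self.reports: Dict[int, Report] = {}

    def add(self, report_id: int, reported_at: datetime) -> Report:
        # A new report is the earliest known instance of its own bug,
        # so it starts out marked as the same bug as "itself".
        report = Report(report_id, reported_at, same_bug_as=report_id)
        self.reports[report_id] = report
        return report

    def mark_same_bug(self, a_id: int, b_id: int) -> None:
        # Every report already points directly at its earliest instance.
        root_a = self.reports[self.reports[a_id].same_bug_as]
        root_b = self.reports[self.reports[b_id].same_bug_as]
        if root_a.report_id == root_b.report_id:
            return  # already known to be the same bug
        earliest, other = sorted(
            (root_a, root_b), key=lambda r: (r.reported_at, r.report_id)
        )
        # This is the "complicated" part: every report in the later group
        # is re-pointed at the earliest instance.
        for report in self.reports.values():
            if report.same_bug_as == other.report_id:
                report.same_bug_as = earliest.report_id

    def same_bug_reports(self, report_id: int) -> List[int]:
        # The cheap part: a plain equality check on one field.
        canonical = self.reports[report_id].same_bug_as
        return sorted(
            r.report_id
            for r in self.reports.values()
            if r.same_bug_as == canonical
        )


store = ReportStore()
store.add(1, datetime(2006, 7, 29, 6, 0))
store.add(2, datetime(2006, 7, 29, 7, 0))
store.add(3, datetime(2006, 7, 29, 8, 0))
store.mark_same_bug(3, 2)  # report 3 now points at report 2
store.mark_same_bug(2, 1)  # reports 2 and 3 now both point at report 1
```

Marking costs a pass over the reports in the later group, while listing duplicates needs only the single field comparison, which is the trade-off described above.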

Implementation

The design above needs to be implemented on top of the facilities of AutomatedProblemReports.

Code

We will need AutomatedProblemReports working first.

The server handling the AutomatedProblemReports must tag problems exhibiting known characteristics, allowing developers to sort and examine them.

Data preservation and migration

No issues exist.

Unresolved issues

Characteristics to tag

There are several characteristics we can use to identify crashes, including:

  • Crash occurs at the same point
    • The same function
      • e.g. we found a stack smash because the program called __stack_chk_fail(), and the calling function was some_vuln_function()

    • The same module (library or program)
      • e.g. some_vuln_function() is from some_vuln_lib.so

  • Crashes share SOME_NUMBER of backtrace frames, counted from the crash point
    • e.g. we found a double free(), determined that some_stupid_function() called free(), and the last 3 calls up to there were good_function(), fine_function(), some_stupid_function().

  • Crashes fault on the same problem
    • e.g. a SIGILL crash.

    • SIGSEGV detected attempting to execute unmapped memory
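As a rough illustration of the "common backtrace" characteristic, the Python sketch below counts how many frames two backtraces share, starting at the crash point. The representation (a list of function names, innermost frame first) and both helper names are assumptions made for this example; the spec does not define a backtrace format.

```python
from typing import Dict, List, Sequence


def common_frames_from_crash_point(a: Sequence[str], b: Sequence[str]) -> int:
    """Count leading frames shared by two backtraces.

    Both backtraces are assumed to list function names innermost first,
    so index 0 is the crash point.
    """
    count = 0
    for frame_a, frame_b in zip(a, b):
        if frame_a != frame_b:
            break
        count += 1
    return count


def crashes_sharing_backtrace(
    target: Sequence[str],
    candidates: Dict[int, Sequence[str]],
    min_common: int,
) -> List[int]:
    """Return ids of candidate reports sharing at least min_common frames."""
    return [
        report_id
        for report_id, backtrace in candidates.items()
        if common_frames_from_crash_point(target, backtrace) >= min_common
    ]


# The double free() example: free() was called by some_stupid_function(),
# which was reached through fine_function() and good_function().
target = ["free", "some_stupid_function", "fine_function", "good_function", "main"]
candidates = {
    101: ["free", "some_stupid_function", "fine_function", "good_function", "other_main"],
    102: ["free", "some_stupid_function", "unrelated_function"],
}
similar = crashes_sharing_backtrace(target, candidates, min_common=4)
```

Here `min_common` plays the role of SOME_NUMBER: raising it narrows the match to crashes reached by the same call path, lowering it widens the match to anything that faults in the same function.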

There are several more characteristics that are not discussed here; we need a complete list.

Adding new characteristics later is fine because the only characteristics that help us are those that can be detected heuristically. When a new characteristic is added, the system can therefore rescan every report ever made and tag any matches.
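A minimal Python sketch of this rescan idea follows. It assumes each characteristic is a named predicate over a report, and that a report is a plain dictionary with hypothetical `signal` and `crash_function` keys; none of these names come from AutomatedProblemReports itself.

```python
from typing import Callable, Dict, List, Set

Report = Dict[str, object]           # hypothetical report representation
Detector = Callable[[Report], bool]  # heuristic: does the report match?


class Tagger:
    """Tags reports on arrival and rescans them when a detector is added."""

    def __init__(self) -> None:
        self.detectors: Dict[str, Detector] = {}
        self.reports: List[Report] = []
        self.tags: Dict[int, Set[str]] = {}  # report index -> tag names

    def add_report(self, report: Report) -> int:
        # New reports are checked against every known characteristic.
        index = len(self.reports)
        self.reports.append(report)
        self.tags[index] = {
            name for name, detect in self.detectors.items() if detect(report)
        }
        return index

    def add_characteristic(self, name: str, detector: Detector) -> int:
        # Register the detector, then rescan every report ever made.
        self.detectors[name] = detector
        matched = 0
        for index, report in enumerate(self.reports):
            if detector(report):
                self.tags[index].add(name)
                matched += 1
        return matched


tagger = Tagger()
tagger.add_characteristic("fault:SIGILL", lambda r: r.get("signal") == "SIGILL")
first = tagger.add_report({"signal": "SIGILL", "crash_function": "some_function"})
second = tagger.add_report({"signal": "SIGABRT", "crash_function": "__stack_chk_fail"})

# A characteristic added later still reaches the older reports.
newly_tagged = tagger.add_characteristic(
    "stack-smash", lambda r: r.get("crash_function") == "__stack_chk_fail"
)
```

Because detectors only read data already stored in the report, the rescan needs nothing from the machine that crashed. The cost is a full pass over the stored reports for each new characteristic.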

Use of tags

AutomatedProblemReports would be tagged with information such as the above. Developers could later use these tags as search criteria to find similar bugs. For example, pitti may select a crash and "search for similar crashes," then choose (through check boxes) which similarities to match on. Taking the example of a SIGSEGV raised while attempting to execute the stack, the options below may be available to him:

  • Similar fault
    • SIGSEGV

      • Attempt to execute
        • Attempt to execute non-executable area
  • Same module
    • Previous executing function was very_broken_function(), so we assume we are in module very_broken_lib.so

      • The version was 0.1.19 (i.e. very_broken_lib.so.0.1.19)

  • Same program
    • Running /usr/bin/program_that_crashed

  • Similar backtrace
    • Fault occurred in very_broken_function()

    • Previous N calls (you enter a value for N)
  • Related (as manually tagged by the developers)
    • Yes
    • No
    • Unknown

The range of options is limitless, but some example searches include:

  • SIGSEGV in /usr/bin/program_that_crashed in module very_broken_lib.so (any version) apparently from very_broken_function()

    • This search ignores the rest of the backtrace and whether the fault was an attempt to execute.
  • SIGSEGV on attempt to execute related to very_broken_lib.so.

  • Any fault in very_broken_lib.so.

This would prove to be a powerful tool for taking characteristics of an automatically detected problem and matching it with other problems.

The problems listed by a search may or may not be related. Developers would manually tag those that are the same bug, grouping them together for quick and easy future reference to all the different reports.
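The three example searches can be sketched in Python as a filter over tagged reports. The tag fields used here (`signal`, `access`, `program`, `module`, `module_version`, `function`) are hypothetical names chosen for the example, and a real server would presumably run this as a database query rather than a linear scan.

```python
from typing import Dict, List

Report = Dict[str, str]  # hypothetical: one string value per tag field

REPORTS: List[Report] = [
    {"id": "r1", "signal": "SIGSEGV", "access": "execute",
     "program": "/usr/bin/program_that_crashed",
     "module": "very_broken_lib.so", "module_version": "0.1.19",
     "function": "very_broken_function"},
    {"id": "r2", "signal": "SIGSEGV", "access": "read",
     "program": "/usr/bin/program_that_crashed",
     "module": "very_broken_lib.so", "module_version": "0.1.20",
     "function": "very_broken_function"},
    {"id": "r3", "signal": "SIGILL",
     "program": "/usr/bin/other_program",
     "module": "very_broken_lib.so", "module_version": "0.1.19",
     "function": "another_function"},
]


def search(reports: List[Report], criteria: Dict[str, str]) -> List[str]:
    """Return ids of reports matching every selected criterion.

    Fields left out of criteria are the unchecked boxes: they are ignored.
    """
    return [
        report["id"]
        for report in reports
        if all(report.get(field) == wanted for field, wanted in criteria.items())
    ]


# SIGSEGV in the program, in very_broken_lib.so (any version),
# apparently from very_broken_function().
by_function = search(REPORTS, {
    "signal": "SIGSEGV",
    "program": "/usr/bin/program_that_crashed",
    "module": "very_broken_lib.so",
    "function": "very_broken_function",
})

# SIGSEGV on attempt to execute, related to very_broken_lib.so.
by_execute = search(REPORTS, {
    "signal": "SIGSEGV",
    "access": "execute",
    "module": "very_broken_lib.so",
})

# Any fault in very_broken_lib.so.
by_module = search(REPORTS, {"module": "very_broken_lib.so"})
```

The result of each search is only a candidate list. Deciding which candidates really are the same bug remains a manual step for the developer.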

BoF agenda and discussion

Comments


CategorySpec

AutomatedProblemReportsTagging (last edited 2008-08-06 16:36:12 by localhost)