Debugging

Differences between revisions 2 and 39 (spanning 37 versions)
Revision 2 as of 2007-10-18 23:12:49
Size: 16561
Editor: c-67-169-207-142
Comment: Initial writeup + some wikification
Revision 39 as of 2008-04-07 18:35:25
Size: 17355
Editor: c-67-168-235-241
Comment:
Line 2: Line 2:
||<tablestyle="float:right; font-size: 0.9em; background:#F1F1ED;margin: 0 0 1em 1em;" style="padding:2.0em;">'''Contents'''[[BR]][[TableOfContents]]||

= I. X and Ubuntu =
||<tablestyle="float:right; font-size: 0.9em; background:#F1F1ED;margin: 0 0 1em 1em;" style="padding:2.0em;">'''Contents'''[[BR]][[TableOfContents(2)]]||

= X and Ubuntu =
Line 16: Line 16:
folks a toolset for rendering these bugs easily solvable. By making
Ubuntu's X strong, we can help drive Open Source to world domination!


= II. Bug Reporting =
folks a toolset for rendering these bugs easily solvable.

Also see:
* ["X/Troubleshooting"] - Tips for analyzing and troubleshooting bugs


= Bug Reporting =
Line 27: Line 29:
=== Choosing a Good Title: ===
== Choosing a Good Title ==
Line 33: Line 35:
Examples:
  
BAD: Crazy screen issues on boot
  
BAD: Multiple problems with CD today
  
BAD: Randomly doesn't work
  
GOOD: [Feisty] Screen briefly corrupts during boot with -nv (NVidia 6100)
  
GOOD: [Hardy Alpha-3] Alt-CD (only) selected wrong driver (Matrox / BenQ FP91+)
  
GOOD: [Gutsy] Periodic crashes w/ high CPU on Dell Latitude D505 (-intel 855GM)
  
GOOD: [Dapper,Edgy] Wrong default refresh rates on 16:10 LCD panels
''' Examples: '''
|| '''BAD''':|| Crazy screen issues on boot ||
|| '''BAD''':|| Multiple problems with CD today ||
|| '''BAD''':|| Not able to login or start X after updating ||
|| '''BAD''':|| Randomly doesn't work ||
|| '''GOOD''':|| [Feisty] Screen briefly corrupts during boot with -nv (NVidia 6100) ||
|| '''GOOD''':|| [Hardy Alpha-3] Alt-CD (only) selected wrong driver (Matrox / BenQ FP91+) ||
|| '''GOOD''':|| [Gutsy] Periodic crashes w/ high CPU on Dell Latitude D505 (-intel 855GM) ||
|| '''GOOD''':|| [Dapper,Edgy] Wrong default refresh rates on 16:10 LCD panels ||
|| '''GOOD''':|| After update to -intel 2.0-0ubuntu3, X fails with 'Invalid mode' error ||
Line 45: Line 48:
 DON'T: Assume "they must already know about this"
 DO: Look for existing bug reports that match your problem

 DON'T: Assume a "similar" bug is exactly what you're seeing
 DO: File a new bug, but mention the ID's of all bugs that sound
         similar. Someone can dupe them together later.

DON'T: Add "me too" responses. Wastes everyone's time.
 DO: Add missing data (photos, logs) to add to an existing bug's
         "knowledge base". Or if you just wish to be notified, then
         Subscribe yourself to the bug.

DON'T: Post bugs with only a brief description of the problem
 DO: Post relevant logs, config files, and data (see table 1)
         ALWAYS ATTACH YOUR /var/log/Xorg.0.log!!

 DON'T: Assume "everyone" is seeing this same bug
 DO: Consider what is unique about your system

 DON'T: Assume others will "just know" how the bug occurs
 DO: Itemize the exact steps that result in the issue.
         Can you reproduce it at will?

DON'T: Fire and forget. Abandoned bugs rarely get fixed.
 DO: Follow up on your bug from time to time, even if it seems
         ignored. Report if the issue goes away or remains when new
         Ubuntu's come out.
||<tablestyle="width: 80%;" style="background-color: #FF8080;"> DON'T: || Assume "they must already know about this" ||
||<style="background-color: lightgreen;"> DO: || Look for existing bug reports that match your problem ||
||||||
||<style="background-color: #FF8080;"> DON'T: || Assume a "similar" bug is exactly what you're seeing ||
||<style="background-color: lightgreen;"> DO:  || File a new bug, but mention the IDs of all bugs that sound similar. Someone can dupe them together later. ||
||||||
||<style="background-color: #FF8080;"> DON'T: || Add "me too" responses. Wastes everyone's time. ||
||<style="background-color: lightgreen;"> DO: || Add missing data (photos, logs) to an existing bug's "knowledge base". Or, if you just wish to be notified, subscribe yourself to the bug. ||
||||||
||<style="background-color: #FF8080;"> DON'T: || Post bugs with only a brief description of the problem ||
||<style="background-color: lightgreen;"> DO: || Post relevant logs, config files, and data (see table below). '''ALWAYS ATTACH YOUR /var/log/Xorg.0.log''' ||
||||||
||<style="background-color: #FF8080;"> DON'T: || Assume "everyone" is seeing this same bug ||
||<style="background-color: lightgreen;"> DO: || Consider what is unique about your system ||
||||||
||<style="background-color: #FF8080;"> DON'T: || Assume others will "just know" how the bug occurs ||
||<style="background-color: lightgreen;"> DO: || Itemize the exact steps that result in the issue. Can you reproduce it at will? ||
||||||
||<style="background-color: #FF8080;"> DON'T: || Fire and forget. Abandoned bugs rarely get fixed. ||
||<style="background-color: lightgreen;"> DO: || Follow up on your bug from time to time, even if it seems ignored. Report whether the issue goes away or remains when new Ubuntu releases come out. ||
Line 76: Line 72:
|| Problem class:  || Things to Include: ||
|| General X bug || * Description of problem ||
|| || * Paste in output of `lspci -nn | grep "VGA comp"` ||
|| || * Attach /etc/X11/xorg.conf ||
|| || * Attach /var/log/Xorg.0.log ||
|| || * Attach output of `lspci -vvnn` ||
|| Wrong resolutions, refresh rates, or monitor specs || * Resolution, rate, or other parameter expected ||
|| || * Resolutions, rates, or other parameters actually obtained ||
|| || * /etc/X11/xorg.conf ||
|| || * /var/log/Xorg.0.log ||
|| || * output of `lspci -vvnn` ||
|| || * output of `sudo ddcprobe` ||
|| || * output of `xrandr` ||
||  Wrong font dpi or size || * Are you running GNOME, KDE, XFCE, or ...? ||
|| || * Affected (and unaffected) applications ||
|| || * /var/log/Xorg.0.log ||
|| || * output of `sudo ddcprobe` ||
|| || * Screenshot showing font differences ||
|| X crash, lockup, freeze, exit, or doesn't start/shutdown || * Detailed description of problem ||
|| || * List any versions you tried that did not have this issue ||
|| || * Detailed list of steps to reproduce ||
|| || * How complete is the X failure? ||
|| || + Does ctrl+alt+f1 take you to a console? ||
|| || + Does ctrl+alt+backspace restart X? ||
|| || + Does mouse pointer still move? ||
|| || + Does the keyboard LED come on when hitting the CAPSLOCK key? ||
|| || * /etc/X11/xorg.conf ||
|| || * /var/log/Xorg.0.log ||
|| || * /var/log/Xorg.0.log.old ||
|| || * ~/.xsession-errors ||
|| || * output of `lspci -vvnn` ||
|| Keyboard, touchpad, and mouse issues || * Description of the problem ||
|| || * /var/log/Xorg.0.log ||
|| || * output of `xprop -root` ||
|| || * output of `gconftool-2 -R /desktop/gnome/peripherals` ||
|| Screen display corruption || * Photo of the screen ||
|| || * Description of the problem ||
|| || * Does it also occur if DRI is disabled? ||
|| || * /var/log/Xorg.0.log ||
|| Bad video playback || * /etc/X11/xorg.conf ||
|| || * /var/log/Xorg.0.log ||
|| || * output of `lspci -vvnn` ||


= III. Bug Triage =
|| '''Problem class:''' || '''Things to Include:''' ||
||<|5(^> '''General X bug''' || Description of problem ||
|| Paste in output of {{{lspci -nn | grep VGA}}} ||
|| Attach /etc/X11/xorg.conf ||
|| Attach /var/log/Xorg.0.log ||
|| Attach output of `lspci -vvnn` ||
||<|7(^> '''Wrong resolutions, refresh rates, or monitor specs''' || Resolution, rate, or other parameter expected ||
|| Resolutions, rates, or other parameters actually obtained ||
|| /etc/X11/xorg.conf ||
|| /var/log/Xorg.0.log ||
|| output of `lspci -vvnn` ||
|| output of `sudo ddcprobe` ||
|| output of `xrandr` ||
||<|5(^> '''Wrong font dpi or size''' || Are you running GNOME, KDE, XFCE, or ...? ||
|| Affected (and unaffected) applications ||
|| /var/log/Xorg.0.log ||
|| output of `sudo ddcprobe` ||
|| Screenshot showing font differences ||
||<|11(^> '''X crash, lockup, freeze, exit, or doesn't start/shutdown''' || Detailed description of problem ||
|| List any versions you tried that did not have this issue ||
|| Detailed list of steps to reproduce ||
|| How complete is the X failure? [[BR]] + Does ctrl+alt+f1 take you to a console? [[BR]] + Does ctrl+alt+backspace restart X? [[BR]] + Does mouse pointer still move? [[BR]] + Does the keyboard LED come on when hitting the CAPSLOCK key? ||
|| /etc/X11/xorg.conf ||
|| /var/log/Xorg.0.log ||
|| /var/log/Xorg.0.log.old ||
|| ~/.xsession-errors ||
|| output of `lspci -vvnn` ||
|| output of `cat /proc/acpi/video/*/DOS` ||
|| output of `sudo cat /proc/acpi/dsdt` ||
||<|4(^> '''Keyboard, touchpad, and mouse issues''' || Description of the problem ||
|| /var/log/Xorg.0.log ||
|| output of `xprop -root` ||
|| output of `gconftool-2 -R /desktop/gnome/peripherals` ||
||<|4(^> '''Screen display corruption''' || Photo of the screen ||
|| Description of the problem ||
|| Does it also occur if DRI is disabled? ||
|| /var/log/Xorg.0.log ||
||<|3(^> '''Bad video playback''' || /etc/X11/xorg.conf ||
|| /var/log/Xorg.0.log ||
|| output of `lspci -vvnn` ||
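
The table above can double as a checklist when reviewing a report. The following is only a minimal sketch in Python: the short problem-class keys and the `missing_items` helper are hypothetical names invented for this illustration, and only the files and command outputs from the table are listed (descriptions, photos, and screenshots still need a human eye).

```python
# Files and command outputs taken from the "Things to Include" table.
# The dictionary keys are hypothetical short names for the problem classes.
REQUIRED_ITEMS = {
    "general": ["lspci -nn | grep VGA", "/etc/X11/xorg.conf",
                "/var/log/Xorg.0.log", "lspci -vvnn"],
    "resolution": ["/etc/X11/xorg.conf", "/var/log/Xorg.0.log",
                   "lspci -vvnn", "sudo ddcprobe", "xrandr"],
    "font": ["/var/log/Xorg.0.log", "sudo ddcprobe"],
    "crash": ["/etc/X11/xorg.conf", "/var/log/Xorg.0.log",
              "/var/log/Xorg.0.log.old", "~/.xsession-errors",
              "lspci -vvnn", "cat /proc/acpi/video/*/DOS",
              "sudo cat /proc/acpi/dsdt"],
    "input": ["/var/log/Xorg.0.log", "xprop -root",
              "gconftool-2 -R /desktop/gnome/peripherals"],
    "corruption": ["/var/log/Xorg.0.log"],
    "video": ["/etc/X11/xorg.conf", "/var/log/Xorg.0.log", "lspci -vvnn"],
}


def missing_items(problem_class, attached):
    """Return the required items not yet attached, in table order."""
    return [item for item in REQUIRED_ITEMS[problem_class]
            if item not in attached]


if __name__ == "__main__":
    # A font report that only has the X log still needs ddcprobe output.
    print(missing_items("font", ["/var/log/Xorg.0.log"]))
```

The returned list can be pasted into the request for more information.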


= Bug Triage =
Line 138: Line 132:
 1. Is it definitely an X bug? Sometimes things get misfiled, and
     sometimes reports are just invalid, or are really just support
     requests and should be directed to Launchpad Answers instead. If
     unsure, leave it as is.

 2. Is it clearly a dupe of an already known bug? Ideally reporters
     should do a cursory scan of existing bug reports to see if it's
     obviously already in the system, but not all reporters do. If
     unsure, don't dupe it - someone can handle this later.

 3. Is there at least the basic minimum amount of data present? If
     not, mark it invalid and see below for a table of what kinds of
     files and command output is needed. Once you think the basic
     required info is present, move it to the Confirmed state.
      
 4. Review log files for error messages or other obvious anomalies.
     Highlight these in the bug report, and search launchpad for other
     reports of that same error message. Mention these as potential
     dupes, or dupe them where obvious.

 5. Tidy up the bug report. This may involve improving the bug's
     title or wordsmithing the description to clarify it.

Since step 3 requires waiting on replies from bug reporters, you can't
go through all five of the above steps in one bug triaging session.
Instead, when doing bulk triage work, you can consider dividing the
workflow into two types of sessions:

INITIAL TRIAGE: (steps 1->2->3)
 * Do a query for NEW bugs.
 * For each bug, review according to steps 1, 2, and 3 above
 * Post a request for more information and set to INCOMPLETE

FINAL TRIAGE: (step 3->4->5)
 * Do a query for INCOMPLETE-WITH-RESPONSE bugs.
 * For each bug, read the reporter's reply and information posted
 * If still insufficient info, ask for more info and leave INCOMPLETE
 * Otherwise, do steps 4 and 5, and mark CONFIRMED

Once this basic triage work is in place, a reviewer (generally a
developer or official bug master) reviews CONFIRMED bugs and just
doublechecks that all the necessary stuff has been done. They then set
the bug to TRIAGED state.


= IV. Bug Research =

For many bugs, a little googling and searching in upstream bug trackers
can reveal important additional info.

 1. Review all attached log files for error messages.

 2. Look for other similar/duplicate bug reports to gain additional
     perspectives and look for obvious commonalities, like same error
     messages, driver, hardware, etc. Places to search:
== INITIAL TRIAGE: ==
Do a query for NEW bugs, and for each bug try to move its state to Incomplete, Invalid, etc.:
1. '''Is it definitely an X bug?''' Sometimes things get misfiled, and sometimes reports are just invalid, or are really just support requests and should be directed to '''Launchpad Answers''' instead. If unsure, leave it as is.

2. '''Is it clearly a dupe of an already known bug?''' Ideally reporters should do a cursory scan of existing bug reports to see if it's obviously already in the system, but not all reporters do. If unsure, don't dupe it - someone can handle this later.

3. '''Is the title descriptive enough?''' Watch out for generic titles like "Randomly crashes", "X won't start", or "Corrupted graphics after suspend/resume". These are just common symptoms, and such reports tend to accumulate me-too replies from people with the same symptom but actually a different bug.

4. '''Is there at least the basic minimum amount of data present?''' If not, mark it '''Incomplete''' and see below for a table of what kinds of files and command output are needed. Once you think the basic required info is present, move it to the Confirmed state.

== FINAL TRIAGE: ==
Do a query for INCOMPLETE-WITH-RESPONSE bugs, and for each bug try to move its state to Completed or another appropriate state:
5. '''Take action if now obvious, or ask for additional information.''' Often by this point, additional information has come to light indicating the bug is resolved, a dupe, or invalid, so you can just set the state accordingly at this point.

6. '''Handle 'me-too' replies.''' Often, additional users will report that they too have the "same" problem, yet give no evidence of it, so it's hard to say. As a general rule, users should report bugs independently unless they present evidence that the issue is the same (e.g. identical error messages, steps to reproduce, hardware, etc.). In particular, third parties confirming an issue should post evidence (log files, screenshots, etc.) that they're seeing the same thing. Often the original information request was not answered by the original reporter, but another user has answered in their place. In this case, be extremely careful that the second reporter has exactly the same issue and is not simply piggybacking on a report that 'sounds roughly similar'; gently encourage them to file a new report if their issue is not the same.

7. '''Review log files for error messages or other obvious anomalies.''' Highlight these in the bug report, and search launchpad for other reports of that same error message. Mention these as potential dupes, or dupe them where obvious.

8. '''Tidy up the bug report.''' This may involve improving the bug's title or wordsmithing the description to clarify it. Also, make sure it has an Importance assigned to it (see below).

Also do a query for INCOMPLETE-WITHOUT-RESPONSE. In some cases, the bugs actually do have a response, so the above procedure can be used. In most cases, we're still waiting on a response. If a long period (say, 60 days) has passed since the first unanswered request, the bug can be closed as out of date, usually with a request to please test against the latest development version of Ubuntu, and reopen the bug with the requested info if the problem still exists.
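
The out-of-date rule can be expressed as a small check. This is a minimal sketch in Python, assuming the 60-day figure suggested above; the function name and its parameters are hypothetical and do not correspond to any Launchpad API.

```python
from datetime import date, timedelta

# "Say, 60 days" in the text above; adjust to taste.
STALE_AFTER = timedelta(days=60)


def is_out_of_date(first_unanswered_request, today, has_response):
    """True if an Incomplete bug has waited long enough to be closed.

    first_unanswered_request: date of the first info request nobody answered.
    has_response: True if the bug actually did receive a reply, in which
    case the normal final-triage steps apply instead.
    """
    if has_response:
        return False
    return today - first_unanswered_request >= STALE_AFTER


if __name__ == "__main__":
    print(is_out_of_date(date(2008, 1, 1), date(2008, 4, 7), False))
```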

== Marking Bugs Ready to be Upstreamed ==

Often bugs are best solved upstream, so many of our bugs should also be filed there. However, it's important to ensure we send upstream only quality reports that have a high chance of being solved. It's our job to filter out reports that lack information or are otherwise inappropriate for upstream. This work can be divided into two steps: marking bugs for upstreaming, and filing the upstream reports.

A bug is ready for upstreaming if it meets all the following criteria:
* The original reporter (or a secondary reporter who has proven to have *exactly* the same issue) is active on the Launchpad bug and can follow up on requests
* The issue has been verified using upstream's git-head of xserver and/or the relevant driver
* The issue is not Ubuntu-specific, and not likely to be a kernel issue
* The bug has a complete set of log files, backtraces with debug symbols (if it's a crash), config files, screenshots, etc. as appropriate.
* The bug has a solid set of steps to reproduce it on demand (if it occurs randomly or intermittently, upstream won't accept it.)

To mark a bug as ready for upstream, click "Also affects project", then leave the URL empty and click "Add to bug report", and "Confirm". This will leave a blank upstream task to search on later.
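
The readiness criteria listed above amount to a checklist in which every item must hold. As a minimal sketch in Python (the class, field, and function names are hypothetical, invented for this illustration), a triager's notes could be evaluated like this:

```python
from dataclasses import dataclass


@dataclass
class UpstreamChecklist:
    """One field per criterion from the list above (hypothetical names)."""
    reporter_active: bool
    verified_on_git_head: bool
    ubuntu_specific: bool
    likely_kernel_issue: bool
    evidence_complete: bool
    reproducible_on_demand: bool


def unmet_criteria(c):
    """Return a list of reasons the bug is not yet ready for upstream."""
    problems = []
    if not c.reporter_active:
        problems.append("no active reporter to follow up on requests")
    if not c.verified_on_git_head:
        problems.append("not verified against upstream git-head")
    if c.ubuntu_specific:
        problems.append("issue is Ubuntu-specific")
    if c.likely_kernel_issue:
        problems.append("likely a kernel issue")
    if not c.evidence_complete:
        problems.append("logs, backtraces, or config files are incomplete")
    if not c.reproducible_on_demand:
        problems.append("no steps to reproduce on demand")
    return problems


if __name__ == "__main__":
    bug = UpstreamChecklist(True, False, False, False, True, True)
    print(unmet_criteria(bug))
```

An empty list means all criteria are met and the blank upstream task can be added.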

== Reporting X issues upstream ==

To get issues addressed effectively, it's important to provide upstream with complete information and clean, high-quality reports. Indeed, much of the above bug work is geared toward gathering sufficient information that upstream will be able to deal effectively with the problem.

Do a query on "Bugs that need reported upstream". This searches for bugs with an empty project task. For each bug, do a search to see if the same bug is already reported upstream. Take care though - many times a bug *looks* similar but isn't; if there is *any* doubt, file a new bug and just mention the possible dupe, and let the experts decide.

Here are some guidelines for getting best results from bugs upstreamed to bugzilla.freedesktop.org:

 * Always ask the original reporter to join the discussion at bugzilla, and also subscribe them to the bug report after filing it.
 * Focus each bug report on a single issue. Even if the Launchpad bug has multiple "me too" comments, only focus the upstream bug on one of these.
 * Attach complete logs, config files, screenshots, photos, and all other collected evidence
 * Be prepared to follow through on additional information requests, testing, and so on
 * Include a prominent link back to the launchpad bug (not only for upstream; this gives you a convenient back-link to the original bug).


== Bug Importance ==

Bug "Importance" is not the same thing as Development Priority. Importance is an indication of the severity of the issue, not an indication of when it will be fixed (although a bug's importance is a factor to consider when prioritizing).

'''Low''': These bugs are merely cosmetic or make things inconvenient, or occur only rarely.

'''Medium''': Most bugs are medium importance. They hamper use of the system in some fashion, sometimes requiring an inconvenient workaround or other unusual steps (like disabling hardware or software features or reverting to older versions) to get around it.

'''High''': These are serious bugs that are preventing users from using the system, either with no known workaround or an extremely cumbersome one.

'''Critical''': This importance level is not often used, and is saved for widespread catastrophic failures, like X failing to start for all Ubuntu users.

A bug that affects a lot of users may deserve one bump up from where it would be otherwise. A bug which was not well reported, that can't be reproduced, or that only occurs in obscure situations may have its importance bumped down one step.

The bug triager should make an attempt to set an appropriate importance, but don't worry about getting it perfect; it can be adjusted later.
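
The bump-up and bump-down guidance can be sketched as follows. This is a minimal illustration in Python with hypothetical function and parameter names; clamping the result to the Low..Critical range is an assumption of this sketch, and the text above still reserves Critical for widespread catastrophic failures.

```python
# Importance levels from lowest to highest, as described above.
LEVELS = ["Low", "Medium", "High", "Critical"]


def adjust_importance(base, affects_many_users=False, poorly_reported=False):
    """Apply the one-step bumps to a base importance.

    poorly_reported covers bugs that were not well reported, cannot be
    reproduced, or only occur in obscure situations.
    """
    index = LEVELS.index(base)
    if affects_many_users:
        index += 1  # one bump up
    if poorly_reported:
        index -= 1  # one bump down
    # Assumption of this sketch: never leave the Low..Critical range.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


if __name__ == "__main__":
    print(adjust_importance("Medium", affects_many_users=True))
```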

== Bug Priority ==

The priority for a bug is determined by the developers themselves, based on a variety of factors, so the bug triager usually does not need to do anything with regard to priority.

One factor that can drive a bug to a high priority is the existence of a known, tested fix that simply needs to be integrated into the development version of Ubuntu.

There are two ways priority is indicated in Launchpad:

1. '''Milestones''': Bugs that are assigned to a milestone will gain priority attention during that development cycle. Do not milestone bugs until after they've been fully triaged and have all necessary information available to troubleshoot them.

2. '''Assignees''': Bugs that are assigned to a particular individual will be priorities for them to work on. Generally, ask before assigning bugs to a developer, unless you're that developer's manager.


= Bug Research =

For many bugs, a little googling and searching in upstream bug trackers can reveal important additional info.

1. Review all attached log files for error messages.

2. Look for other similar/duplicate bug reports to gain additional perspectives and look for obvious commonalities, like same error messages, driver, hardware, etc. Places to search:
Line 200: Line 225:
     If you find the same issue reported in Launchpad, mark the less
     complete and/or newer bug as a dupe of the other.

     If you find the same issue reported in debian or xorg, mark it as "Also affects project..."

     With google, it often helps to include "ubuntu" in the search
     string. Also, you can use "site:freedesktop.org" or
     "site:debian.org" to narrow the search to a specific domain.

 3. Try reproducing the issue, especially if you have similar
     hardware.

 4. Look for a newer version of the package, and review its changes to
     see if there's a fix for this issue.

     If so, check `apt-get update; apt-cache madison $pkgname` to see
     if the new version is already packaged. If not, ask a packager to
     produce a test package of the new release to test for this bug.

 5. Upload any patches you run across directly to Launchpad, and be
     sure to tick the "patch" checkbox, so patches can be queried for
     later.

 6. Have them try an older Ubuntu Live CD, or have them downgrade a
     specific package. For example, to downgrade the xserver:
 If you find the same issue reported in Launchpad, mark the less complete and/or newer bug as a dupe of the other.

 If you find the same issue reported in debian or xorg, mark it as "Also affects project..."

 With Google, it often helps to include "ubuntu" in the search string. Also, you can use "site:freedesktop.org" or "site:debian.org" to narrow the search to a specific domain.

3. Try reproducing the issue, especially if you have similar hardware.

4. Look for a newer version of the package, and review its changes to see if there's a fix for this issue.

 If so, check `apt-get update; apt-cache madison $pkgname` to see if the new version is already packaged. If not, ask a packager to produce a test package of the new release to test for this bug.

5. Upload any patches you run across directly to Launchpad, and be sure to tick the "patch" checkbox, so patches can be queried for later.

6. Have them try an older Ubuntu Live CD, or have them downgrade a specific package. For example, to downgrade the xserver:

{{{
Line 228: Line 243:

     If an older version fixes the issue, then possibly you can bisect
     things down to find a specific patch causing the issue. See the
     Analysis section for how to do this.

 7. Unless you've been lucky and found the fix already, finish up
     the research phase by doing the following:

     * Summarize your findings. Restate the problem, describe progress
       made, outline remaining suspicions or questions.

     * If appropriate, report the bug upstream to Debian and/or Xorg,
       attaching all relevant files and a link to the Ubuntu bug
       report. Summarize the research you did, patches that were
       tested, and any other details that may be relevant.



= V. Analyzing X Problems =

For hard bugs the analysis phase is the most important, and most
challenging part of bug work. It often requires both strength of
insight and skill with code.

Depending on how the bug is behaving, there are multiple directions to
investigate the issue. Here's some different approaches:


=== Problem manifested only recently ===

If the issue has been narrowed to occur only after (or before) a given
point in time or software version, then it is possible to narrow in on
the specific cause of the issue through a "Bi-Section" strategy.

Essentially, if you know it occurred in Version 1, but not Version 8,
have a person able to replicate the issue try Version 4. If it's there,
then have them try Version 6, otherwise Version 2.

If the problem is in the current Ubuntu, but not in the prior Ubuntu, it
can be useful to have them test the intermediate Alpha versions of the
new release.

Once you have bracketed it down to a specific version of something, you
can then go through the individual patches included in that version
compared with the prior one. Sometimes the patch descriptions can give
a strong clue to this. If there are a number of changes, then rather
than trying each patch one-by-one you may want to simply disable the
latter half of patches, and bisect that way.

If you've narrowed it to an upstream version change, then you may wish
to use git's bisecting functionality to assist with this.


=== Problem manifests only with specific configuration options ===

TODO


=== Problem manifests only with a particular driver ===

If the research found that most people with this problem were all using
the same driver, then obviously it makes sense to explore it from that
aspect.

Note that for most graphics hardware, there are at least two different
drivers. It can be worthwhile to test the alternate driver to verify
it's a driver issue.

 * NVidia: -nv (open) and -nvidia (proprietary)
 * ATI: -ati (open) and -fglrx (proprietary)
 * Intel: -intel (current) and -i810 (legacy).

Each driver has its own source code package, which can be retrieved via
xserver-xorg-video-<driver>. The open source drivers also have git
repositories at http://gitweb.freedesktop.org.

Resolving these issues will generally require patching the driver code,
although some driver-specific issues end up requiring changes to other
pieces of code, like the xserver.


=== Problem manifests only with particular kind of hardware ===

Many issues are highly specific to a particular kind of hardware, such
as only Intel 855, or only a particular monitor model. Sometimes these
end up being general bugs, but often they require adding
hardware-specific quirks to the driver or to xserver.

TODO: Process for adding these changes


=== Problem manifests under seemingly random conditions ===

Few bugs are truly random; usually this just means more data is needed.
Things to consider:

 * Resource utilization over time
 * Specific to one piece of hardware? If so, is that HW faulty?
 * How is the system being used when the fault occurs? If it's idle,
   could it be a screensaver, power savings, or something?
 * Fluctuating Network/power conditions?


=== Problem manifests itself during video playback ===

TODO


=== Problem manifests itself when using 3D software (compiz, games, GL...) ===

TODO


=== Problem manifests as a performance degradation issue ===

TODO


=== Problem results in screen display corruption ===

Nearly all screen corruption issues will be due to a bug in a driver.
Identify the driver and the specific steps to produce the corruption.
Then run the xserver through gdb to identify the line or lines
immediately prior to the corruption.

From here, things to try could include checking for invalid/undefined
values, adding usleep() calls to add delay, or even disabling the lines
in question.

Once a preliminary patch exists, post it to the upstream xorg list for
feedback. Often they can suggest a better patch.


=== Problem results in X crash, lockup, freeze, or exit ===

In some cases, an error message will be printed before the fault; these
can be used to identify where in the codebase the fault occurred, and
often give an explanation as to why.

Otherwise, use gdb to get a backtrace. Once the issue is found, step
through the code leading up to the line where the fault occurred.
Look for invalid/undefined values, or questionable logic. Try disabling
the line or lines where the fault occurred, adding usleep() before it,
and so on.

Once a preliminary patch exists, post it to the upstream xorg list for
feedback. Often they can suggest a better patch.


=== Problem involves wrong resolutions, refresh rates, or monitor specs ===

These issues can be narrowed down by checking a few things:

 * Does xorg.conf have the correct values? If so, then something is
   wrong in how xserver is interpreting them. Review Xorg.0.log.

 * Is the hardware new? If so, is its pciid registered properly?

 * xresprobe - Is ddcprobe outputting the right parameters? Is
   xresprobe selecting the correct one from this set?


=== Problem involves wrong font dpi or size ===

TODO


=== Problem involves buggy EDID from monitor ===

If the monitor is clearly advertising an incorrect mode (such as not
advertising a preferred mode), a quirk can be added to the xserver to
prefer a specific mode.

TODO: What's the code change to do this?


=== Problem involves startup/shutdown, hibernate, suspend, or tty switch ===

TODO


=== Problem involves missing support for some keyboard keys ===

TODO


=== Problem involves missing support for mouse or touchpad functions ===

TODO


=== Problem involves GUI application that crashes with an X error message ===

TODO


=== Debugging Memory Issues ===

top
xrestop

Limiting the ram X uses to 80% to prevent memory leaks
via ulimit -m in the X startup script
}}}

 If an older version fixes the issue, then possibly you can bisect things down to find a specific patch causing the issue. See the Analysis section for how to do this.

7. Unless you've been lucky and found the fix already, finish up the research phase by doing the following:

 * Summarize your findings. Restate the problem, describe progress made, outline remaining suspicions or questions.

 * If appropriate, report the bug upstream to Debian and/or Xorg, attaching all relevant files and a link to the Ubuntu bug report. Summarize the research you did, patches that were tested, and any other details that may be relevant.
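
The bisecting idea mentioned in step 6 can be sketched in a few lines. This is only an illustration in Python under stated assumptions: the versions are ordered oldest to newest, the oldest is known to work, the newest is known to fail, and the bug appeared exactly once in between. The `find_first_bad` name and the `is_bad` callback (standing in for "ask someone who can reproduce the bug to test this version") are hypothetical.

```python
def find_first_bad(versions, is_bad):
    """Return the first version in which the bug appears.

    versions: ordered oldest to newest; versions[0] is assumed known good
    and versions[-1] known bad, so neither is tested again.
    is_bad: callable that reports whether a given version shows the bug.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid    # bug already present: look earlier
        else:
            good = mid   # still fine: look later
    return versions[bad]


if __name__ == "__main__":
    # Eight versions; pretend the regression landed in version 6.
    print(find_first_bad(list(range(1, 9)), lambda v: v >= 6))
```

Each round halves the remaining range, so eight versions need at most three test runs. For upstream code in git, `git bisect` automates the same procedure.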

X and Ubuntu

The X Window System is a critical component in the Ubuntu operating system. X is not without its bugs, but fortunately debugging X issues is not rocket science.

The vast majority of Ubuntu X issues fall into one of several distinct categories, and based on the way they manifest, there are several different tactics that can be employed in a nearly paint-by-numbers fashion to isolate them.

Even non-developers can help! The goal of this handbook is to give folks a toolset for rendering these bugs easily solvable.

Also see:

  • ["X/Troubleshooting"] - Tips for analyzing and troubleshooting bugs

Bug Reporting

The lifecycle of a bug report begins, unsurprisingly, with the preliminary report. How a bug is initially reported can have a huge effect on how it's handled and how quickly it gets fixed.

Choosing a Good Title

Your title should communicate two things: The symptom you're seeing, and whatever is unique or unusual about your system. Otherwise, your bug may not get proper attention.

Examples:

BAD: Crazy screen issues on boot
BAD: Multiple problems with CD today
BAD: Not able to login or start X after updating
BAD: Randomly doesn't work
GOOD: [Feisty] Screen briefly corrupts during boot with -nv (NVidia 6100)
GOOD: [Hardy Alpha-3] Alt-CD (only) selected wrong driver (Matrox / BenQ FP91+)
GOOD: [Gutsy] Periodic crashes w/ high CPU on Dell Latitude D505 (-intel 855GM)
GOOD: [Dapper,Edgy] Wrong default refresh rates on 16:10 LCD panels
GOOD: After update to -intel 2.0-0ubuntu3, X fails with 'Invalid mode' error

Do's and Don't's

DON'T: Assume "they must already know about this"
DO: Look for existing bug reports that match your problem

DON'T: Assume a "similar" bug is exactly what you're seeing
DO: File a new bug, but mention the IDs of all bugs that sound similar. Someone can dupe them together later.

DON'T: Add "me too" responses. Wastes everyone's time.
DO: Add missing data (photos, logs) to an existing bug's "knowledge base". Or, if you just wish to be notified, subscribe yourself to the bug.

DON'T: Post bugs with only a brief description of the problem
DO: Post relevant logs, config files, and data (see table below). ALWAYS ATTACH YOUR /var/log/Xorg.0.log

DON'T: Assume "everyone" is seeing this same bug
DO: Consider what is unique about your system

DON'T: Assume others will "just know" how the bug occurs
DO: Itemize the exact steps that result in the issue. Can you reproduce it at will?

DON'T: Fire and forget. Abandoned bugs rarely get fixed.
DO: Follow up on your bug from time to time, even if it seems ignored. Report whether the issue goes away or remains when new Ubuntu releases come out.

What to Include in Bug Reports

For each problem class, these are the things to include:

General X bug:

  • Description of problem
  • Paste in output of lspci -nn | grep VGA
  • Attach /etc/X11/xorg.conf
  • Attach /var/log/Xorg.0.log
  • Attach output of lspci -vvnn

Wrong resolutions, refresh rates, or monitor specs:

  • Resolution, rate, or other parameter expected
  • Resolutions, rates, or other parameters actually obtained
  • /etc/X11/xorg.conf
  • /var/log/Xorg.0.log
  • output of lspci -vvnn
  • output of sudo ddcprobe
  • output of xrandr

Wrong font dpi or size:

  • Are you running GNOME, KDE, XFCE, or ...?
  • Affected (and unaffected) applications
  • /var/log/Xorg.0.log
  • output of sudo ddcprobe
  • Screenshot showing font differences

X crash, lockup, freeze, exit, or doesn't start/shutdown:

  • Detailed description of problem
  • List any versions you tried that did not have this issue
  • Detailed list of steps to reproduce
  • How complete is the X failure?
      • Does ctrl+alt+f1 take you to a console?
      • Does ctrl+alt+backspace restart X?
      • Does mouse pointer still move?
      • Does the keyboard LED come on when hitting the CAPSLOCK key?
  • /etc/X11/xorg.conf
  • /var/log/Xorg.0.log
  • /var/log/Xorg.0.log.old
  • ~/.xsession-errors
  • output of lspci -vvnn
  • output of cat /proc/acpi/video/*/DOS
  • output of sudo cat /proc/acpi/dsdt

Keyboard, touchpad, and mouse issues:

  • Description of the problem
  • /var/log/Xorg.0.log
  • output of xprop -root
  • output of gconftool-2 -R /desktop/gnome/peripherals

Screen display corruption:

  • Photo of the screen
  • Description of the problem
  • Does it also occur if DRI is disabled?
  • /var/log/Xorg.0.log

Bad video playback:

  • /etc/X11/xorg.conf
  • /var/log/Xorg.0.log
  • output of lspci -vvnn
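
The "General X bug" items can be gathered in one pass. The following is only a sketch: the function name collect_x_bug_info, the default output directory, and the NOTES.txt file are illustrative assumptions, not part of any Ubuntu tool. It uses only the files and lspci invocations listed above, and it records anything it could not find rather than failing.

```shell
#!/bin/sh
# Sketch: gather the "General X bug" attachments into one directory.
# The function name, directory layout, and NOTES.txt are hypothetical.
collect_x_bug_info() {
    outdir=${1:-./xbug-info}
    mkdir -p "$outdir" || return 1

    # Copy the config and log files, noting any that are absent.
    for f in /etc/X11/xorg.conf /var/log/Xorg.0.log; do
        if [ -r "$f" ]; then
            cp "$f" "$outdir/"
        else
            echo "missing: $f" >> "$outdir/NOTES.txt"
        fi
    done

    # Save both lspci outputs, if lspci is installed.
    if command -v lspci >/dev/null 2>&1; then
        lspci -nn | grep VGA > "$outdir/lspci-vga.txt" || true
        lspci -vvnn > "$outdir/lspci-vvnn.txt" 2>&1 || true
    else
        echo "missing: lspci" >> "$outdir/NOTES.txt"
    fi
    return 0
}
```

Running collect_x_bug_info ~/xbug would leave the files in ~/xbug, ready to attach to the report one by one.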

Bug Triage

Ubuntu receives a huge number of bug reports, many of which are important and valid issues needing attention. Even so, nearly all X bugs are initially reported without the information necessary for classification and analysis. This is where the bug triaging role comes in.

Bug triaging for Ubuntu's Xorg components does not require any particular expertise with X; regular Linux know-how should be sufficient. As a bug triager, your role is twofold: first, as a coach who helps bug reporters maximize their chances of getting the bug addressed by providing complete information; and second, as a filter that helps developers focus their time on important and/or easy-to-fix bugs.

After a bug is initially reported, the bug triager reviews it and checks several basic things:

INITIAL TRIAGE:

Do a query for NEW bugs, and for each bug try to move its state to Incomplete, Invalid, etc.:

1. Is it definitely an X bug? Sometimes things get misfiled, and sometimes reports are just invalid, or are really just support requests and should be directed to Launchpad Answers instead. If unsure, leave it as is.

2. Is it clearly a dupe of an already known bug? Ideally reporters should do a cursory scan of existing bug reports to see if it's obviously already in the system, but not all reporters do. If unsure, don't dupe it; someone can handle this later.

3. Is the title descriptive enough? Watch out for generic titles like "Randomly crashes", "X won't start", or "Corrupted graphics after suspend/resume". These are just common symptoms, and they tend to accumulate "me too" reports from people who have the same symptom but actually a different bug.

4. Is there at least the basic minimum amount of data present? If not, mark it Incomplete and see the table under "What to Include in Bug Reports" for the kinds of files and command output that are needed. Once you think the basic required info is present, move it to the Confirmed state.

FINAL TRIAGE:

Do a query for INCOMPLETE-WITH-RESPONSE bugs, and for each bug try to move its state to Completed, etc.:

5. Take action if now obvious, or ask for additional information. Often by this point, additional information has come to light indicating the bug is resolved, a dupe, or invalid, so you can just set the state accordingly.

6. Handle "me too" replies. Often, additional users will report that they too have the "same" problem but give no evidence of it, so it's hard to say. As a general rule, it is preferred for users to report bugs independently unless they present evidence that it is the same bug (e.g. identical error messages, steps to reproduce, hardware, etc.). In particular, for third-party confirmation of an issue it's best for them to post evidence (log files, screenshots, etc.) that they're seeing the same thing. Often the original information request was not answered by the original reporter, but another user has answered in their place. In this case, be extremely careful that the second reporter has exactly the same issue and is not simply piggybacking on a report that "sounds roughly similar"; gently encourage them to file a new report on their issue if it's not the same.

7. Review log files for error messages or other obvious anomalies. Highlight these in the bug report, and search Launchpad for other reports of that same error message. Mention these as potential dupes, or dupe them where obvious.

8. Tidy up the bug report. This may involve improving the bug's title or wordsmithing the description to clarify it. Also, make sure it has an Importance assigned to it (see below).
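
For step 7, a quick filter over the attached log is often enough to find the lines worth highlighting. The sketch below relies on the X server's own log markers, (EE) for errors and (WW) for warnings; the function name xorg_log_problems is a hypothetical helper, and the optional bracketed timestamp prefix is an assumption about how some server versions format their log lines.

```shell
# Sketch: print the error (EE) and warning (WW) lines of an X server log,
# prefixed with their line numbers. The function name is hypothetical.
# Lines may carry an optional "[ timestamp ] " prefix before the marker.
# Exits non-zero when no such lines are found (grep's usual behaviour).
xorg_log_problems() {
    logfile=${1:-/var/log/Xorg.0.log}
    grep -n -E '^(\[[ 0-9.]+\] )?\((EE|WW)\)' "$logfile"
}
```

Anchoring the pattern at the start of the line skips the legend near the top of the log, which mentions both markers without reporting a problem. The matched message text can then be pasted into a Launchpad search to look for potential dupes.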

Also do a query for INCOMPLETE-WITHOUT-RESPONSE. In some cases, the bugs actually do have a response, so the above procedure can be used. In most cases, we're still waiting on a response. If a long period (say, 60 days) has passed since the first unanswered request, the bug can be closed as out of date, usually with a request to please test against the latest development version of Ubuntu, and reopen the bug with the requested info if the problem still exists.
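
The waiting-period check can be done by hand, but a small helper avoids date arithmetic mistakes. This is a sketch only: request_is_stale is a hypothetical name, the 60-day default simply mirrors the "say, 60 days" suggestion above, and the -d option assumes GNU date as shipped with Ubuntu.

```shell
# Sketch: succeed (exit 0) if at least max_days have passed between the
# date of the first unanswered request and today. Hypothetical helper.
# Dates are YYYY-MM-DD; relies on the -d option of GNU date.
request_is_stale() {
    asked=$1
    today=${2:-$(date -u +%Y-%m-%d)}
    max_days=${3:-60}
    asked_s=$(date -u -d "$asked" +%s) || return 2
    today_s=$(date -u -d "$today" +%s) || return 2
    [ $(( (today_s - asked_s) / 86400 )) -ge "$max_days" ]
}
```

For example, request_is_stale 2008-01-01 2008-04-07 succeeds, so a bug whose only information request dates from 1 January could be closed as out of date on 7 April.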

Marking Bugs Ready to be Upstreamed

Often bugs are best solved upstream, so many of our bugs should also be filed there. However, it's important to ensure we send upstream only quality reports that have a high chance of being solved. It's our job to filter out reports that lack information or are otherwise inappropriate for upstream. This work can be divided into two steps: marking bugs for upstreaming, and filing the upstream reports.

A bug is ready for upstreaming if it meets all the following criteria:

  • The original reporter (or a secondary reporter who has proven to have *exactly* the same issue) is active on the Launchpad bug and can follow up on requests.
  • The issue has been verified using upstream's git-head of xserver and/or the relevant driver.
  • The issue is not Ubuntu-specific, and not likely to be a kernel issue.
  • The bug has a complete set of log files, backtraces with debug symbols (if it's a crash), config files, screenshots, etc., as appropriate.
  • The bug has a solid set of steps to reproduce it on demand. (If it occurs randomly or intermittently, upstream won't accept it.)

To mark a bug as ready for upstream, click "Also affects project", then leave the URL empty and click "Add to bug report", and "Confirm". This will leave a blank upstream task to search on later.

Reporting X issues upstream

To get issues addressed effectively, it's important to provide complete information and clean, high-quality reports. Indeed, much of the above bug work is geared toward gathering enough information for upstream to deal effectively with the problem.

Do a query on "Bugs that need reported upstream". This searches for bugs with an empty project task. For each bug, do a search to see if the same bug is already reported upstream. Take care though - many times a bug *looks* similar but isn't; if there is *any* doubt, file a new bug and just mention the possible dupe, and let the experts decide.

Here are some guidelines for getting best results from bugs upstreamed to bugzilla.freedesktop.org:

  • Always ask the original reporter to join the discussion at bugzilla, and also subscribe them to the bug report after filing it.
  • Focus each bug report on a single issue. Even if the Launchpad bug has multiple "me too" comments, only focus the upstream bug on one of these.
  • Attach complete logs, config files, screenshots, photos, and all other collected evidence
  • Be prepared to follow through on additional information requests, testing, and so on
  • Include a prominent link back to the Launchpad bug (this helps not only upstream; it also gives you a convenient back-link to the original bug).

Bug Importance

Bug "Importance" is not the same thing as Development Priority. Importance is an indication of the severity of the issue, not an indication of when it will be fixed (although a bug's importance is a factor to consider when prioritizing).

Low: These bugs are merely cosmetic or make things inconvenient, or occur only rarely.

Medium: Most bugs are medium importance. They hamper use of the system in some fashion, sometimes requiring an inconvenient workaround or other unusual steps (like disabling hardware or software features or reverting to older versions) to get around them.

High: These are serious bugs that are preventing users from using the system, either with no known workaround or an extremely cumbersome one.

Critical: This importance level is not often used, and is saved for widespread catastrophic failures, like X failing to start for all Ubuntu users.

A bug that affects a lot of users may deserve one bump up from where it would be otherwise. A bug which was not well reported, that can't be reproduced, or that only occurs in obscure situations may have its importance bumped down one step.

The bug triager should make an attempt to set an appropriate importance, but don't worry about getting it perfect; it can be adjusted later.
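
The one-step adjustment described above can be written down mechanically. The sketch below is purely illustrative: adjust_importance is a hypothetical helper, not a Launchpad feature, and whether a bump is deserved remains the triager's judgment call. It moves a level one step along the Low, Medium, High, Critical scale and stops at either end.

```shell
# Sketch: shift an importance level one step up or down, clamped to the
# Low..Critical scale. Hypothetical helper; usage:
#   adjust_importance LEVEL up|down
adjust_importance() {
    case $1 in
        Low) n=0 ;;
        Medium) n=1 ;;
        High) n=2 ;;
        Critical) n=3 ;;
        *) echo "unknown level: $1" >&2; return 1 ;;
    esac
    case $2 in
        up)   if [ "$n" -lt 3 ]; then n=$((n + 1)); fi ;;
        down) if [ "$n" -gt 0 ]; then n=$((n - 1)); fi ;;
        *) echo "unknown direction: $2" >&2; return 1 ;;
    esac
    case $n in
        0) echo Low ;;
        1) echo Medium ;;
        2) echo High ;;
        3) echo Critical ;;
    esac
}
```

So a Medium bug that turns out to affect many users becomes High (adjust_importance Medium up), while a Low bug that cannot be reproduced stays Low.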

Bug Priority

The priority for a bug is determined by the developers themselves, based on a variety of factors, so the bug triager usually does not need to do anything with regard to priority.

One factor that can drive a bug to a high priority is the existence of a known, tested fix that simply needs to be integrated into the development version of Ubuntu.

There are two ways priority is indicated in Launchpad:

1. Milestones: Bugs that are assigned to a milestone will gain priority attention during that development cycle. Do not milestone bugs until after they've been fully triaged and have all necessary information available to troubleshoot them.

2. Assignees: Bugs that are assigned to a particular individual will be priorities for them to work on. Generally, ask before assigning bugs to a developer, unless you're that developer's manager.

Bug Research

For many bugs, a little googling and searching in upstream bug trackers can reveal important additional info.

1. Review all attached log files for error messages.

2. Look for other similar/duplicate bug reports to gain additional perspectives and look for obvious commonalities, like same error messages, driver, hardware, etc. Places to search:

  • google.com
  • bugs.launchpad.net/ubuntu
  • bugs.debian.org/
  • bugs.freedesktop.org/
  • ubuntuforums.org

If you find the same issue reported in Launchpad, mark the less complete and/or newer bug as a dupe of the other. If you find the same issue reported in Debian or Xorg, mark it as "Also affects project...". With Google, it often helps to include "ubuntu" in the search string. Also, you can use "site:freedesktop.org" or "site:debian.org" to narrow the search to a specific domain.

3. Try reproducing the issue, especially if you have similar hardware.

4. Look for a newer version of the package, and review its changes to see if there's a fix for this issue.

  • If so, run apt-get update; apt-cache madison $pkgname to see if the new version is already packaged. If it is not, ask a packager to produce a test package of the new release to test for this bug.

5. Upload any patches you run across directly to Launchpad, and be sure to tick the "patch" checkbox, so patches can be queried for later.

6. Have the reporter try an older Ubuntu Live CD, or have them downgrade a specific package. For example, to downgrade the xserver:

         apt-get install xserver-xorg-core=2:1.3.0.0.dfsg-4ubuntu
  • If an older version fixes the issue, then possibly you can bisect things down to find a specific patch causing the issue. See the Analysis section for how to do this.

7. Unless you've been lucky and found the fix already, finish up the research phase by doing the following:

  • Summarize your findings. Restate the problem, describe progress made, outline remaining suspicions or questions.
  • If appropriate, report the bug upstream to Debian and/or Xorg, attaching all relevant files and a link to the Ubuntu bug report. Summarize the research you did, patches that were tested, and any other details that may be relevant.
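
The version check in step 4 can be scripted around apt-cache madison, which prints one line per available version with fields separated by "|". The helpers below are a sketch; the function names are hypothetical, and apt-get update should be run first so the package lists are current.

```shell
# Sketch: helpers around `apt-cache madison`. Function names are hypothetical.

# Extract the version column (second "|"-separated field) from
# madison-style output on stdin, one unique version per line.
madison_versions() {
    awk -F'|' 'NF >= 2 { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }' | sort -u
}

# List the versions apt currently knows about for a package.
packaged_versions() {
    apt-cache madison "$1" | madison_versions
}

# Succeed (exit 0) if the exact version string is already packaged.
version_is_packaged() {
    packaged_versions "$1" | grep -q -x -F -- "$2"
}
```

For instance, packaged_versions xserver-xorg-core lists the candidates for a downgrade in step 6, and version_is_packaged tells you whether a fix you found upstream still needs a test package.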

X/Debugging (last edited 2016-01-10 22:13:08 by penalvch)