CrashdumpRecipe

Differences between revisions 2 and 15 (spanning 13 versions)
Revision 2 as of 2010-07-26 17:44:56
Size: 4838
Editor: x1-6-00-0e-a6-3f-d7-8d
Comment: added 10.04 section
Revision 15 as of 2015-01-21 15:29:59
Size: 6287
Editor: 63
Comment:
Line 2: Line 2:
"The LKCD (Linux Kernel Crash Dump) project is a set of kernel patches and utilities to allow a copy of the kernel memory to be saved in the event of a kernel panic. The saved kernel image makes forensics on the kernel panic possible with utilities included in the package. Most commercial Unix operating systems come with similar crash utilities, but this package is fairly new to Linux and has to be added on manually. The LKCD utility is not designed to gather helpful information in the case of a hardware caused panic or a segment violation. The complete LKCD package is available for download at http://lkcd.sourceforge.net/."
Line 4: Line 3:
For convenience, the kernel crash dump utility has been packaged in Ubuntu. It can be installed with the following command: ||<tablestyle="float:right; font-size: 0.9em; width:40%; background:#F1F1ED; margin: 0 0 1em 1em;" style="padding:0.5em;"><<TableOfContents>>||

= Introduction =

The Ubuntu Kernel Crash Dump is a mechanism that enables enterprise-style post-mortem crash analysis on Linux systems. It uses a special mode of kexec that automatically boots a secondary kernel whenever a crash (Oops/panic) occurs. This secondary kernel then saves the state and memory of the primary kernel to a location on the filesystem (''/var/crash'' on newer releases). The resulting dump file can then be loaded into '''crash''' to gather detailed information about the problem.

= Installation =

For convenience, the kernel crash dump utility has been packaged in Ubuntu. It can be installed with the following command: {{{
sudo apt-get install linux-crashdump }}}

Newer versions of the package automatically add the entry ''crashkernel=384M-2G:64M,2G-:128M'' to the kernel command line in grub. However, this may cause problems on systems with less than 2G of memory (see [[#Troubleshooting|troubleshooting]]).

= Verifying linux-crashdump installation =

For Trusty, please see [[https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html|here]].

= Inspecting the crash dump using crash =

In order to use the generated crash dump with '''crash''', one needs the ''vmlinux'' file that contains the debugging information. It is part of the kernel ddeb package, which can be found at:

[[http://ddebs.ubuntu.com/pool/main/l/linux/]]
Line 7: Line 27:
 apt-get install linux-crashdump sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-proposed main restricted universe multiverse
EOF

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ECDCAD72428D7C01
sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym
Line 10: Line 39:
/!\ Be aware that those packages are huge! (~600 MB)
Line 11: Line 41:
== Ubuntu 10.04 "Lucid Lynx" ==
When installed, the debug kernel can be found under ''/usr/lib/debug/boot/'' and '''crash''' is started by: {{{
crash <debug kernel> <crash dump> }}}
Line 14: Line 44:
Same as 9.10, except note that `apport-retrace` is broken - see:
 * [[https://bugs.launchpad.net/ubuntu/+source/apport/+bug/533565|Bug #533565 in apport (Ubuntu): "Strings missing from the apport template"]] - "''This bug was fixed in the package python-distutils-extra - 2.19''" (in lucid it's 2.18bzr1)
 * [[https://bugs.launchpad.net/ubuntu/+source/apport/+bug/592239|Bug #592239 in apport (Ubuntu): "apport-retrace - IndexError: list index out of range"]]
Unfortunately, the tool cannot open a 32-bit dump on a 64-bit system, or vice versa. It also tends to be quite picky about the debug kernel matching the kernel that produced the dump.
Line 18: Line 46:
On the other hand, `apport-update`+`crash` seem to work fine, as described for 9.10. = Inspecting the crash dump using apport-retrace =
Line 20: Line 48:
To get a local retrace, you need apport-retrace and then run: {{{
apport-retrace --stdout --rebuild-package-info /var/crash/linux-image*.crash }}}
Line 21: Line 51:
== Ubuntu 9.10 "Karmic Koala" == /!\ Again, this can take a while because it needs to download the kernel debug package.
Line 23: Line 53:
In Karmic all that is needed is to install the "linux-crashdump" package. After a reboot the system should be able to catch crash dumps automatically and provide them to apport. = Troubleshooting =
Line 25: Line 55:
For example, to test you can force a kernel oops:
{{{
 echo 1 > /proc/sys/kernel/panic_on_oops
 echo c > /proc/sysrq-trigger
}}}
This should force a kernel oops and automatic reboot. Then watch for an apport prompt in the notification area on the next login.
== Allocated memory for the crash kernel ==
Line 32: Line 57:
To get a local retrace, you need apport-retrace and then run:
{{{
# apport-retrace --stdout --rebuild-package-info /var/crash/linux-image*.crash
}}}
(this can take a while because it needs to download the linux-image-debug package, and that file is several hundred megabytes).
When testing crash dumps, the system sometimes just seems to lock up. The usual cause is the amount of memory assigned to the crash kernel. When kexec starts the crash kernel, the reserved memory has to fit the unpacked kernel, the compressed initrd and the uncompressed initrd (at least while unpacking). If too little memory is allocated, things usually go wrong without any hint. There are two ways to solve this:
Line 38: Line 59:
To do the backtrace manually, you have to install "crash" (i.e. linux-crashdump) and the linux-image-debug-`uname -r` kernel debug deb package from ddebs.ubuntu.com. Note that the apport-retrace command above also unpacks and installs the linux-image-debug-`uname -r` kernel debug deb package. Then get the VmCore from apport again and use "crash" with all its power. Try the following commands:
{{{
# apport-unpack /var/crash/linux-image*.crash /tmp/unpacked
# crash /usr/lib/debug/boot/vmlinux-`uname -r` /tmp/unpacked/vmcore
crash> bt -a
}}}
 1. Increase the allocation by changing ''crashkernel='' on the grub command line or in ''/boot/grub/grub.cfg'' (for grub2) or ''/boot/grub/menu.lst'' (for old grub). To avoid losing the setting when '''update-grub''' runs, make the change in ''/etc/grub.d/10_linux''.
 1. Reduce the size of the initrd. By default this is set to include all the modules and firmware ever needed. This allows using the same initrd on any system but increases its size a lot. In order to limit it to the modules really required to boot on the current hardware, change the following in ''/etc/initramfs-tools/initramfs.conf'': {{{
 ...
 MODULES=dep
 ... }}}
Line 45: Line 65:
Note: the linux-image-debug-* packages do not exist in the usual repositories; you have to download the packages from http://ddebs.ubuntu.com/pool/main/l/linux/. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/289087, https://lists.ubuntu.com/archives/kernel-team/2009-February/004310.html, https://lists.ubuntu.com/archives/kernel-team/2009-March/004570.html, https://lists.ubuntu.com/archives/kernel-team/2009-June/005931.html == Crash kernel fails to load: Hang ==
Line 47: Line 67:
== Ubuntu 9.04 "Jaunty Jackalope" == This can be frustrating to debug, especially if you're unable to record the console messages from the new kexec kernel. A serial console attached to the system is best here to continue debugging. An easy troubleshooting step is to systematically eliminate the additional kernel parameters passed to the crash kernel, retrying after each change. These arguments are kept in '''/etc/init.d/kdump''': {{{
...
        # Append kdump_needed for initramfs to know what to do, and add
        # maxcpus=1 to keep things sane.
        APPEND="$APPEND kdump_needed maxcpus=1 irqpoll reset_devices"
Line 49: Line 73:
This page describes a recipe for enabling crash dump vmcore analysis on your Jaunty x86/x86_64 platform. Much of the information was gleaned from the kernel source tree files in Documentation/kdump.         # --elf32-core-headers is needed for 32-bit systems (ok
        # for 64-bit ones too).
        log_action_begin_msg "Loading crashkernel"
        kexec -p "$KERNEL_IMAGE" --initrd="$INITRD" --append="$APPEND"
        log_action_end_msg $?
... }}}
Line 51: Line 80:
  * 'apt-get install linux-crashdump'
    This is a meta package that installs all of the tools necessary to acquire and analyse a crash-dump vmcore.
Leave '''$APPEND''' and '''kdump_needed'''. Start by removing '''reset_devices''' and then
install the new kexec crash kernel configuration: {{{
sudo service kdump start }}}
Line 54: Line 84:
  * Add 'crashkernel=64M@16M' to the kernel command line in /boot/grub/menu.lst.
    You'll also probably want to remove 'quiet splash'.
Then retest; if that doesn't work, remove the next argument, rinse and repeat.
Line 57: Line 86:
  * Reboot the system (into the ordinary kernel). The section of RAM above will now be reserved for the crashkernel (and not available to the normal system). = Release specific notes =
Line 59: Line 88:
  * Make note of your root partition, e.g., /dev/sda1
    'kexec -p /boot/vmlinuz-{{{`uname -r`}}} --initrd=/boot/initrd.img-{{{`uname -r`}}} --append="root=<ROOT_PARTITION> irqpoll maxcpus=1"'
    This loads the crash-dump kernel into the reserved memory, in preparation for a panic.
== Ubuntu 12.04 "Precise Pangolin" ==
Line 63: Line 90:
  Now your kernel is ready to acquire a post-crash vmcore. You can test the process by simulating a crash-dump:  * [[https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/785394|Bug 785394: Hard-coded crashkernel=... memory reservation in /etc/grub.d/10_linux is insufficient]]<<BR>>
 The default allocation for systems below 2G is not enough for the current initrd size. Manually adjusting the size makes the crash kernel usable.
 * The current (1.3.7-2) version of makedumpfile reports that it is incompatible with the 3.2 kernel. The dumps it creates seem to be fine.
Line 65: Line 94:
  'echo c > /proc/sysrq-trigger' == Ubuntu 14.10 "Utopic Unicorn" ==
Line 67: Line 96:
  What you should see is a boot sequence, which is the crash dump kernel loading. Login as root and copy /proc/vmcore to a location of your choice, e.g. cp /proc/vmcore /var/log/vmcore.
  Reboot back to the normal kernel and use crash to analyse the vmcore:

  'crash /boot/System.map-{{{`uname -r`}}} /lib/modules/{{{`uname -r`}}}/vmlinux /var/log/vmcore'

  The methods used for examining the vmcore using crash are left as an exercise for the user.
  * [[https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/1359980|Bug #1359980: [Hyper-V] Unable to perform a full kernel crash on Ubuntu 14.10]] <<BR>>
  With the default crashkernel=128M@64M, triggering a crash produces a kernel panic, but no vmcore/crash file is written to /var/crash and the VM hangs instead of rebooting.
  If the kernel parameter is changed to crashkernel=384M-:256M as suggested there, the VM reboots after a triggered panic and a vmcore/crash file is written to /var/crash.

Introduction

The Ubuntu Kernel Crash Dump is a mechanism that enables enterprise-style post-mortem crash analysis on Linux systems. It uses a special mode of kexec that automatically boots a secondary kernel whenever a crash (Oops/panic) occurs. This secondary kernel then saves the state and memory of the primary kernel to a location on the filesystem (/var/crash on newer releases). The resulting dump file can then be loaded into crash to gather detailed information about the problem.

Installation

For convenience, the kernel crash dump utility has been packaged in Ubuntu. It can be installed with the following command:

sudo apt-get install linux-crashdump 

Newer versions of the package automatically add the entry crashkernel=384M-2G:64M,2G-:128M to the kernel command line in grub. However, this may cause problems on systems with less than 2G of memory (see troubleshooting).
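
The ranged crashkernel= value can be read as a list of start-[end]:size entries, where the first range containing the amount of system RAM decides how much memory is reserved. The following is a minimal sketch of that lookup, not the kernel's own parser. The function names are made up for illustration, only the M and G suffixes are handled, and the RAM figure the kernel actually sees may be slightly lower than the installed amount.

```shell
# Sketch: which reservation does a ranged crashkernel= value select for a
# given amount of RAM? All names here are illustrative, not part of any tool.

# Convert a size such as 384M or 2G to MiB (only M and G suffixes handled).
to_mib() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# crashkernel_for_ram SPEC RAM_MIB
# SPEC looks like 384M-2G:64M,2G-:128M (start-[end]:size, comma separated).
# Prints the size of the first range that contains RAM_MIB, or "none".
crashkernel_for_ram() {
    local spec ram entry range size start end
    spec="$1"
    ram="$2"
    local IFS=,
    for entry in $spec; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_mib "${range%%-*}") || return 1
        end=${range#*-}
        if [ -n "$end" ]; then
            end=$(to_mib "$end") || return 1
        fi
        # start is inclusive, end is exclusive; an empty end means no upper limit
        if [ "$ram" -ge "$start" ] && { [ -z "$end" ] || [ "$ram" -lt "$end" ]; }; then
            echo "$size"
            return 0
        fi
    done
    echo none
}

crashkernel_for_ram 384M-2G:64M,2G-:128M 1024   # prints 64M
crashkernel_for_ram 384M-2G:64M,2G-:128M 4096   # prints 128M
crashkernel_for_ram 384M-2G:64M,2G-:128M 256    # prints none
```

With the default entry, a machine with 1G of RAM therefore gets only 64M for the crash kernel, which is the case the troubleshooting section is concerned with.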

Verifying linux-crashdump installation

For Trusty, please see https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html.
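
As a quick check on other releases, one can look for the crashkernel= parameter on the running kernel's command line. The helper below is a hypothetical sketch (the name crashkernel_param is not part of any package). It accepts a command line string so that it can be tried without rebooting, and falls back to /proc/cmdline when called without arguments.

```shell
# Print the crashkernel= parameter found in a kernel command line string.
# With no argument, the running kernel's command line is read from /proc/cmdline.
crashkernel_param() {
    local cmdline
    if [ $# -gt 0 ]; then
        cmdline="$1"
    else
        cmdline=$(cat /proc/cmdline)
    fi
    # One parameter per line, then keep only the crashkernel= entry.
    printf '%s\n' "$cmdline" | tr ' ' '\n' | grep '^crashkernel=' \
        || echo "crashkernel= not set"
}

# Example with a literal command line:
crashkernel_param "BOOT_IMAGE=/vmlinuz ro quiet crashkernel=384M-2G:64M,2G-:128M"

# On a live system (not run here):
#   crashkernel_param
#   cat /sys/kernel/kexec_crash_loaded   # should read 1 once a crash kernel is loaded
```

If the parameter is present but /sys/kernel/kexec_crash_loaded reads 0, the memory was reserved but no crash kernel has been loaded into it yet.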

Inspecting the crash dump using crash

In order to use the generated crash dump with crash, one needs the vmlinux file that contains the debugging information. It is part of the kernel ddeb package, which can be found at:

http://ddebs.ubuntu.com/pool/main/l/linux/

The package can also be installed through apt by adding the ddebs repository:

sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)          main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-updates  main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-proposed main restricted universe multiverse
EOF

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ECDCAD72428D7C01
sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym

Warning: be aware that those packages are huge (~600 MB).

When installed, the debug kernel can be found under /usr/lib/debug/boot/ and crash is started by:

crash <debug kernel> <crash dump> 

Unfortunately, the tool cannot open a 32-bit dump on a 64-bit system, or vice versa. It also tends to be quite picky about the debug kernel matching the kernel that produced the dump.

Inspecting the crash dump using apport-retrace

To get a local retrace, you need apport-retrace and then run:

apport-retrace --stdout --rebuild-package-info /var/crash/linux-image*.crash 

Warning: again, this can take a while because the kernel debug package has to be downloaded.

Troubleshooting

Allocated memory for the crash kernel

When testing crash dumps, the system sometimes just seems to lock up. The usual cause is the amount of memory assigned to the crash kernel. When kexec starts the crash kernel, the reserved memory has to fit the unpacked kernel, the compressed initrd and the uncompressed initrd (at least while unpacking). If too little memory is allocated, things usually go wrong without any hint. There are two ways to solve this:

  1. Increase the allocation by changing crashkernel= on the grub command line or in /boot/grub/grub.cfg (for grub2) or /boot/grub/menu.lst (for old grub). To avoid losing the setting when update-grub runs, make the change in /etc/grub.d/10_linux.

  2. Reduce the size of the initrd. By default this is set to include all the modules and firmware ever needed. This allows using the same initrd on any system but increases its size a lot. In order to limit it to the modules really required to boot on the current hardware, change the following in /etc/initramfs-tools/initramfs.conf:

     ...
     MODULES=dep
     ... 
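
To judge whether either option is needed, it helps to know how much room the initrd takes both packed and unpacked. The sketch below adds the two figures up. It assumes the initrd is a single gzip-compressed archive, which is not true for every release (other compressors, or images with prepended microcode, need a different unpack command), and the function name is illustrative only. The unpacked kernel still has to fit on top of the total it prints.

```shell
# initrd_sizes FILE: print the compressed and unpacked size of an initrd in bytes.
# Assumption: FILE is one gzip-compressed archive.
initrd_sizes() {
    local file compressed unpacked
    file="$1"
    compressed=$(wc -c < "$file") || return 1
    unpacked=$(gzip -dc "$file" | wc -c) || return 1
    # $(( )) also strips the padding some wc implementations emit
    echo "compressed=$((compressed)) unpacked=$((unpacked)) total=$((compressed + unpacked))"
}

# On a live system (not run here):
#   initrd_sizes "/boot/initrd.img-$(uname -r)"
# After setting MODULES=dep the initrd has to be regenerated before the
# numbers change, typically with:
#   sudo update-initramfs -u
```

Comparing the total before and after switching to MODULES=dep shows how much of the crash kernel reservation the smaller initrd frees up.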

Crash kernel fails to load: Hang

This can be frustrating to debug, especially if you're unable to record the console messages from the new kexec kernel. A serial console attached to the system is best here to continue debugging. An easy troubleshooting step is to systematically eliminate the additional kernel parameters passed to the crash kernel, retrying after each change. These arguments are kept in /etc/init.d/kdump:

...
        # Append kdump_needed for initramfs to know what to do, and add
        # maxcpus=1 to keep things sane.
        APPEND="$APPEND kdump_needed maxcpus=1 irqpoll reset_devices"

        # --elf32-core-headers is needed for 32-bit systems (ok
        # for 64-bit ones too).
        log_action_begin_msg "Loading crashkernel"
        kexec -p "$KERNEL_IMAGE" --initrd="$INITRD" --append="$APPEND"
        log_action_end_msg $?
... 

Leave $APPEND and kdump_needed. Start by removing reset_devices and then install the new kexec crash kernel configuration:

sudo service kdump start 

Then retest; if that doesn't work, remove the next argument, rinse and repeat.
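
The elimination procedure can be written down as a list of APPEND values to try in turn. This is only a sketch: the helper name is hypothetical, and dropping the arguments from the right (reset_devices first, then irqpoll, then maxcpus=1) is an assumed order, since the text only fixes the first step.

```shell
# Print the extra arguments to try in APPEND, dropping one optional argument
# per round. kdump_needed is always kept, as advised above.
kdump_append_candidates() {
    local optional
    optional="maxcpus=1 irqpoll reset_devices"
    while [ -n "$optional" ]; do
        echo "kdump_needed $optional"
        case "$optional" in
            *" "*) optional=${optional% *} ;;   # drop the last argument
            *)     optional="" ;;               # nothing optional left
        esac
    done
    echo "kdump_needed"
}

kdump_append_candidates
```

Each printed line corresponds to one edit of the APPEND assignment in /etc/init.d/kdump, followed by sudo service kdump start and a retest.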

Release specific notes

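Ubuntu 12.04 "Precise Pangolin"

  • Bug 785394: Hard-coded crashkernel=... memory reservation in /etc/grub.d/10_linux is insufficient
    The default allocation for systems below 2G is not enough for the current initrd size. Manually adjusting the size makes the crash kernel usable.

  • The current (1.3.7-2) version of makedumpfile reports that it is incompatible with the 3.2 kernel. The dumps it creates seem to be fine.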

Ubuntu 14.10 "Utopic Unicorn"

  • Bug #1359980: [Hyper-V] Unable to perform a full kernel crash on Ubuntu 14.10
    With the default crashkernel=128M@64M, triggering a crash produces a kernel panic, but no vmcore/crash file is written to /var/crash and the VM hangs instead of rebooting. If the kernel parameter is changed to crashkernel=384M-:256M as suggested there, the VM reboots after a triggered panic and a vmcore/crash file is written to /var/crash.
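
The parameter change itself is a simple substitution on the kernel command line. The sketch below performs it on a string rather than on any system file, because the file that holds the command line differs between releases. The function name is illustrative, and the new value must not contain / or & since it is passed straight into sed.

```shell
# set_crashkernel CMDLINE VALUE: replace the crashkernel= value in a kernel
# command line string. A command line without crashkernel= is left unchanged.
set_crashkernel() {
    printf '%s\n' "$1" | sed -E "s/(^| )crashkernel=[^ ]*/\1crashkernel=$2/"
}

set_crashkernel "ro quiet crashkernel=128M@64M" "384M-:256M"
# prints: ro quiet crashkernel=384M-:256M
```

After the real command line has been changed and the grub configuration regenerated, a reboot is needed before the new reservation takes effect.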

Kernel/CrashdumpRecipe (last edited 2021-11-04 14:04:59 by tomreyn)