How to get meaningful backtraces in Gentoo
1. Backtraces with Gentoo
What are backtraces?
A backtrace (sometimes also called bt, trace, or stack trace) is a
human-readable report of the call stack of a program. It tells you where in
the program you are and how you got there, through every function call back
up to main() (at least in theory).
When a program hits an error condition such as a segmentation fault or an
abort, its backtrace is usually analyzed with a debugger like gdb (the GNU
debugger) to find the cause of the error.
A meaningful backtrace contains not only the shared objects where the call was
generated, but also the name of the function, the filename and the line where
it stopped. Unfortunately, on a system optimised for performance and low disk
usage, backtraces are useless: they show only the pointers on the stack
and a series of ?? instead of function names and positions.
This guide will show how it's possible to get useful, meaningful backtraces in
Gentoo, by using Portage features.
Compiler flags
By default gcc does not build debug information inside the objects
(libraries and programs) it builds, as that creates larger objects. Also,
many optimisations interfere with how the debug information is saved. For these
reasons, the first thing to pay attention to is that the CFLAGS are set to
generate useful debug information.
The basic flag to add in this case is -g. That tells the compiler to
include extra information in objects, such as filenames and line numbers.
This is usually enough to have basic backtraces, but the flag -ggdb
adds more information. There is actually another flag (-g3), but its use
is not recommended. It seems to break binary interfaces and might lead to extra
crashes. For instance, glibc breaks when built with that flag. If
you want to provide as much information as possible, you should use the
-ggdb flag.
Code Listing 1.1: Example of CFLAGS with debug information |
CFLAGS="-march=k8 -O2 -ggdb"
CXXFLAGS="${CFLAGS}"
|
High optimisation levels, such as -O3, might cause the backtrace
to be less faithful, or incorrect. Generally speaking, -O2 and
-Os can be used safely to get an approximate backtrace, down to
the function called and the area of the source file where the crash
happened. For more precise backtraces, you should instead use -O1.
Note:
The use of -O0 is often suggested when trying to produce a
complete backtrace. Unfortunately this does not always play fair with
the software itself, as disabling all optimisations changes the
implementation of functions in the GNU C library (sys-libs/glibc), to
the point that it can be considered two different libraries, one
for optimised and one for non-optimised builds. Also, some software
will fail to build entirely when using -O0 because of the
changes in header includes and the lack of features such as constant
propagation and dead code elimination.
|
Note for x86 architecture users: x86 users frequently have
-fomit-frame-pointer in their CFLAGS. The x86 architecture has a limited
set of general registers, and this flag can make an extra register available,
which improves performance. However there is a cost: it makes it impossible for
gdb to "walk the stack" — in other words, to generate a backtrace reliably.
Remove this flag from CFLAGS to build something easier for gdb to
understand. Most other platforms do not have to worry; either they generally
don't set -fomit-frame-pointer anyway, or the code generated by
gcc does not confuse gdb (in which case the flag is already
enabled by -O2 optimisation level).
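As a rough illustration of the advice in this section, the checks can be sketched as a small shell function. The name audit_debug_cflags is hypothetical and not part of Portage; it only inspects a flags string passed as its argument, and the -fomit-frame-pointer finding matters mainly on x86.

```shell
# Hypothetical helper, not part of Portage: review a CFLAGS string for
# settings that get in the way of useful backtraces.
# Prints one line per finding and nothing when the flags look fine.
audit_debug_cflags() {
    local flag have_debug=no
    for flag in $1; do    # unquoted on purpose: split the string into flags
        case "$flag" in
            -g0) ;;       # -g0 switches debug information off
            -g3) have_debug=yes; echo "-g3 is not recommended, prefer -ggdb" ;;
            -g*) have_debug=yes ;;    # -g, -ggdb, ...
            -fomit-frame-pointer) echo "remove -fomit-frame-pointer" ;;
            -O3) echo "-O3 may make the backtrace less faithful" ;;
        esac
    done
    [ "$have_debug" = yes ] || echo "add -g or -ggdb"
}
```

Called as audit_debug_cflags "-march=k8 -O2 -ggdb" it prints nothing, while audit_debug_cflags "-O3" reports both the optimisation level and the missing debug flag.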
Hardened users have other things to worry about. The hardened FAQ provides
the extra hints and tips you need to know.
Stripping
Just changing your CFLAGS and re-emerging world won't give you meaningful
backtraces anyway, as you have to solve the stripping problem. By default
Portage strips binaries. In other words, it removes the sections
unneeded to run them to reduce the size of the installed files. This is a good
thing for an average user not needing useful backtraces, but removes all the
debug information generated by -g* flags, and also the symbol tables
that are used to find the base information to show a backtrace in human readable
form.
There are two ways to stop stripping from interfering with debugging and useful
backtraces. The first is to tell Portage to not strip binaries at all, by adding
nostrip to FEATURES. This will leave the installed files exactly as gcc
created them, with all the debug information and symbol tables, which
increases the disk space occupied by executables and libraries. To avoid this
problem, in Portage version 2.0.54-r1 and the 2.1 series, it's possible to use the
splitdebug FEATURE instead.
With splitdebug enabled, Portage will still strip the binaries installed
in the system. But before doing that, all the useful debug information is copied
to a ".debug" file, which is then installed inside /usr/lib/debug
(the complete name of the file would be given by appending to that the path
where the file is actually installed). The path to that file is then saved
in the original file inside an ELF section called ".gnu_debuglink", so that
gdb knows which file to load the symbols from.
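The naming rule just described can be written down as a one-line shell function. This is only a sketch of the rule as stated here; debug_file_for is a hypothetical name, and it does not check that the file actually exists.

```shell
# Hypothetical helper: print where splitdebug is expected to install the
# debug file for a binary, i.e. /usr/lib/debug + installed path + ".debug".
debug_file_for() {
    printf '/usr/lib/debug%s.debug\n' "$1"
}
```

For instance, debug_file_for /bin/ls prints /usr/lib/debug/bin/ls.debug. On a system with binutils installed, readelf -p .gnu_debuglink /bin/ls should show what was recorded in the ".gnu_debuglink" section of the stripped binary.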
Important:
If you enable both nostrip and splitdebug features, Portage won't
strip binaries at all, so you have to pay attention to what you want.
|
Another advantage of splitdebug is that it doesn't require you to rebuild
the package to get rid of the debug information. This is helpful when you build
some packages with debugging to get a backtrace of a single error. Once it's
fixed, you just need to remove the /usr/lib/debug directory.
To be sure to not strip binaries, you must also be sure you don't have
the -s flag set in your LDFLAGS. That tells the linker to strip the
resulting binaries in the link phase. Also note that using that flag might lead
to further problems. It won't respect the strip restrictions imposed by some
packages that stop working when entirely stripped.
Note:
Some packages unfortunately handle stripping by themselves, inside the upstream
provided makefiles. This is an error and should be reported. All packages
should leave Portage the task of the stripping or simply restrict stripping
entirely. The main exception to this are binary packages. They are usually
stripped by upstream, outside of Portage control.
|
debug USE flag
Some ebuilds provide a debug USE flag. While some mistakenly use it to
provide debug information and play with compiler flags when it is enabled, that
is not its purpose.
If you're trying to debug a reproducible crash, you want to leave this USE flag
alone, as enabling it builds different code from what you had before. It is
more efficient to first get a backtrace without changing the code, by simply
emitting symbol information, and only afterwards enable debug features to track
the issue further down.
Debug features that are enabled by the USE flag include assertions, debug logs
on screen, debug files, leak detection and extra-safe operations (such as
scrubbing memory before use). Some of them might be taxing, especially for
complex software or software where performance is an important issue.
For these reasons, please exercise caution when enabling the debug USE
flag, and only consider it a last-chance card.
Introducing gdb
Once your packages are built with debug information and are not stripped, you
just need to get the backtrace. To do so you need the sys-devel/gdb
package. It contains the GNU debugger (gdb). After installing that, you
can proceed with getting the backtrace. The simplest way to get one is to run
the program from inside gdb. To do so, you need to point gdb to
the path of the program to run, give it the arguments it will need, and then
run it:
Code Listing 1.2: Running ls through gdb |
$ gdb /bin/ls
GNU gdb 6.4
[...]
(gdb) set args /usr/share/fonts
(gdb) run
Starting program: /bin/ls /usr/share/fonts
[Thread debugging using libthread_db enabled]
[New Thread 47467411020832 (LWP 11100)]
100dpi aquafont baekmuk-fonts cyrillic dejavu fonts.cache-1 kochi-substitute misc xdtv
75dpi arphicfonts CID default encodings fonts.dir mikachan-font util
Program exited normally.
(gdb)
|
The message "Program exited normally" means that the program exited with
code 0, so no errors were reported. You shouldn't trust that too
much, as there are programs that exit with status 0 even when they reach error
conditions. Another common message is "Program exited with code nn". That
simply tells you which non-zero status code the program returned, which might
imply a handled or expected error condition. For segmentation faults and
aborts, you instead get a "Program received signal SIGsomething" message.
A program might receive a signal for many different reasons. In the case
of SIGSEGV and SIGABRT (respectively segmentation fault and abort), it usually
means the code is doing something wrong, like making an invalid system call or
trying to access memory through a broken pointer. Other common signals are
SIGTERM, SIGQUIT and SIGINT (the latter is received when CTRL-C is sent to the
program, and usually gets caught by gdb and ignored by the program).
Finally there is the series of "Real-Time events". They are named SIGnn
with nn being a number greater than 31. The pthread implementation
usually uses them to synchronise the different threads of the program, and
thus they don't represent error conditions of any sort. It's easy to provide
meaningless backtraces when confusing the Real-Time signals with error
conditions. To prevent this, you can tell gdb not to stop the program when
they are received, and instead pass them directly to the program, as in the
following example.
Code Listing 1.3: Running xine-ui through gdb, ignoring real-time signals. |
$ gdb /usr/bin/xine
GNU gdb 6.4
[...]
(gdb) run
Starting program: /usr/bin/xine
[...]
Program received signal SIG33, Real-time event 33.
[Switching to Thread 1182845264 (LWP 11543)]
0x00002b661d87d536 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
(gdb) handle SIG33 nostop noprint noignore pass
Signal Stop Print Pass to program Description
SIG33 No No Yes Real-time event 33
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) run
|
The handle command tells gdb what it should do when the given signal is
sent to the program. In this case the flags are nostop (don't stop the
program and return control to the debugger), noprint (don't bother
printing the reception of such a signal), noignore (don't ignore the signal;
ignoring signals is dangerous, as it means discarding them without passing them
to the program) and pass (pass the signal to the debugged program).
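If the same signals have to be passed through on every run, the handle lines can be kept in a gdb command file instead of being typed each time. The following is a minimal sketch under that assumption; write_rt_signal_rules and the file name rt-signals.gdb are hypothetical, and the signal names you need depend on what gdb reports for your program.

```shell
# Hypothetical helper: write a gdb command file that makes gdb pass the
# listed signals straight to the program without stopping or printing.
# Usage: write_rt_signal_rules OUTPUT_FILE SIGNAL...
write_rt_signal_rules() {
    local out=$1 sig
    shift
    for sig in "$@"; do
        printf 'handle %s nostop noprint noignore pass\n' "$sig"
    done > "$out"
}

# Example (not run here):
#   write_rt_signal_rules rt-signals.gdb SIG33
#   gdb -x rt-signals.gdb /usr/bin/xine
```

The -x option tells gdb to execute the commands from the given file at startup, so the rule is in place before the first run.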
Once gdb is passing any Real-Time events straight through to the program, you
should try to reproduce the crash you want to report. If you can reproduce it
systematically, it's quite easy. When gdb tells you that the program
received the SIGSEGV or SIGABRT signal (or whatever other signal represents
the error condition for the program), you have to ask for the
backtrace explicitly, possibly saving it somewhere. The basic command to do that is
bt, short for backtrace, which shows you the backtrace of
the current thread (if the program is single-threaded, there's only one thread).
An alternative command to get a more detailed backtrace is bt full. That
also gives you the parameters and local variables of each
function in the call chain (when they are available and not
removed by optimisations). This makes the trace longer but also more useful
when trying to find, for example, why a pointer is uninitialised.
Nowadays even simple programs are often written with multiple threads,
which makes the output of a plain bt, albeit meaningful, quite useless:
it might describe the state of a thread other than the one in which the
signal was raised, or other than the one where the error condition manifested
(in case another thread is responsible for raising signals). For this reason,
you should instead get the trace with the longer command thread apply all bt full,
which tells the debugger to print the full backtrace of every thread currently
running.
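For crashes that reproduce reliably, the run and the all-threads backtrace can also be chained non-interactively. This is a sketch rather than a recommended procedure: gdb_run_backtrace is a hypothetical name, and the GDB variable exists only so the command line can be previewed (for example with GDB=echo) without starting the debugger.

```shell
# Hypothetical wrapper: run a program under gdb in batch mode and, once it
# stops on a signal, print the full backtrace of every thread.
# Usage: gdb_run_backtrace PROGRAM [ARGS...]
gdb_run_backtrace() {
    "${GDB:-gdb}" --batch \
        -ex "run" \
        -ex "thread apply all bt full" \
        --args "$@"
}

# Example (not run here):
#   gdb_run_backtrace /usr/bin/xine > backtrace.log 2>&1
```

Everything after --args is taken as the program and its arguments, so no separate set args step is needed.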
If the backtrace is short, it's easy to copy and paste it out of the terminal
(unless the failure happens on a terminal without X), but sometimes it's just
too long to be copied easily, because it spans multiple pages. To
get the backtrace into a file that you can attach to a bug, use the
logging feature:
Code Listing 1.4: Using logging feature to save the backtrace to file |
$ gdb /usr/bin/xine
GNU gdb 6.5
[...]
(gdb) run
[...]
(gdb) set logging file backtrace.log
(gdb) set logging on
Copying output to backtrace.log.
(gdb) bt
#0 0x0000003000eb7472 in __select_nocancel () from /lib/libc.so.6
...
(gdb) set logging off
Done logging to backtrace.log.
(gdb) quit
|
Now you can get the backtrace in the backtrace.log file, and just
send it via email or attach that file to the related bug.
Core dumps
Sometimes a crash is difficult to reproduce, the program is heavily threaded,
it's too slow to run in gdb, or it misbehaves when run through it
(it shouldn't surprise anybody that running inside the debugger can trigger bugs
that don't show up without it). In these cases, there is one
tool that comes in useful: the core dump.
A core dump is a file that contains the whole memory area of a program when it
crashed. Using that file, it's possible to extract the stack backtrace even
if the program has crashed outside gdb, assuming core dumps are
enabled. By default core dumps are not enabled on Gentoo Linux (they are,
however, enabled by default on Gentoo/FreeBSD),
so you have to enable them.
The core dump files are generated directly by the kernel; for this
reason, the kernel needs to have the feature enabled at build time to
work properly. While all the default configurations enable core dump
files, if you're running an embedded kernel, or you have changed
the standard kernel features, you should verify the following
option:
Note:
You can skip this step if you haven't enabled the “Configure standard
kernel features” option at all, which you shouldn't have if you don't
know whether you did.
|
Code Listing 1.5: Kernel options to enable core dumps |
General Setup --->
Configure standard kernel features --->
Enable ELF core dumps
|
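The same check can be made against an existing kernel configuration file without opening the menu. The sketch below assumes the option is stored under the symbol CONFIG_ELF_CORE and that the configuration lives in /usr/src/linux/.config unless another path is given; kernel_has_core_dumps is a hypothetical name.

```shell
# Hypothetical helper: succeed if the given kernel configuration file
# enables ELF core dumps (assumed symbol: CONFIG_ELF_CORE).
# Usage: kernel_has_core_dumps [CONFIG_FILE]
kernel_has_core_dumps() {
    grep -q '^CONFIG_ELF_CORE=y' "${1:-/usr/src/linux/.config}"
}

# Example (not run here):
#   kernel_has_core_dumps && echo "core dumps supported by this kernel"
```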
Core dumps can be enabled at the system level or the shell session level. In the
first case, everything in the system that crashes and does not already have a
crash handler (see later for more notes about KDE's crash handler) will dump.
When enabled at the shell session level, only the programs started from that
session will leave behind a dump.
To enable core dumps on a system level, you have to edit either
/etc/security/limits.conf (if you're using PAM, as is the default)
or /etc/limits.conf. In the first case, you must define a limit
(whether hard or, most commonly, soft; for core files, that might be anywhere from
0 to no limit). In the latter case, you just need to set the variable C to the
size limit of a core file (here there's no "unlimited").
Code Listing 1.6: Example of rule to get unlimited core files when using PAM |
# /etc/security/limits.conf
* soft core unlimited
|
Code Listing 1.7: Example of rule to get core files up to 20MB when not using PAM |
# /etc/limits.conf
* C20480
|
To enable core files in a single shell session you can use the ulimit
command with the -c option. 0 means disabled; any other positive number
is the maximum size in KB of the generated core file, while unlimited simply
removes the limit on core file size. From that point on, all the programs
that exit because of a signal like SIGABRT or SIGSEGV will leave behind a
core file that might be called either "core" or "core.pid" (where pid is
replaced with the actual pid of the program that died).
Code Listing 1.8: Example of ulimit use |
$ ulimit -c unlimited
$ crashing-program
[...]
Abort (Core Dumped)
$
|
Note:
The ulimit command is an internal command in bash and zsh. On other
shells it might be called in other ways or might even not be available at all.
|
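Before trying to reproduce a crash, it can be worth confirming that the current session is actually allowed to write core files. A minimal sketch for bash, with core_dumps_enabled as a hypothetical name:

```shell
# Hypothetical helper: succeed if the current shell session may write core
# files, i.e. the core file size limit is anything other than 0.
core_dumps_enabled() {
    [ "$(ulimit -c)" != "0" ]
}

# Example (not run here):
#   core_dumps_enabled || ulimit -c unlimited
```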
After you get a core dump, you can run gdb on it, specifying both the path to the
executable that generated the core dump (it has to be the exact same binary, so
if you recompile, the core dump is useless) and the path to the core file. Once
you have gdb open on it, you can follow the same instructions given above, as if
the program had just received the signal that killed it.
Code Listing 1.9: Starting gdb on a core file |
$ gdb $(which crashing-program) --core core
|
As an alternative, you can use gdb's command-line capabilities to get the
backtrace without entering the interactive mode. This also makes it easier to save
the backtrace in a file or to send it to a pipe of any kind. The trick lies in
the --batch and -ex options that are accepted by gdb. You
can use the following bash function to get the full backtrace of a core dump
(including all threads) on the standard output stream.
Code Listing 1.10: Function to get the whole backtrace out of a core dump |
gdb_get_backtrace() {
    local exe=$1
    local core=$2
    # Quote the paths so that names containing spaces are passed intact.
    gdb "${exe}" \
        --core "${core}" \
        --batch \
        --quiet \
        -ex "thread apply all bt full" \
        -ex "quit"
}
|
KDE crash handler's notes
KDE-based applications run by default with their own crash handler, which is
presented to the user by means of "Dr. Konqi" if it's installed (the
package is either kde-base/kdebase or kde-base/drkonqi,
included in kdebase-meta). This crash handler shows the user
an informative dialog saying that the program has crashed. On this dialog
there is a "Backtrace" tab that, when loaded, calls gdb and makes it load the
data and generate the full backtrace on behalf of the user, showing it in
the main text box and allowing it to be saved directly to a file. That backtrace is
usually good enough for reporting a problem.
When drkonqi is not installed, the crashes won't generate a core dump anyway,
and the user will receive no information by default. To avoid this, it's
possible to use the --nocrashhandler argument on all the KDE-based
applications. That disables the crash handler entirely and leaves the signals
to be handled by the operating system as usual. This is useful to generate core
files when drkonqi is not available or when wanting to inspect stack frames by
hand.
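Combining this with the session-level limit from the core dump section, a launch could be sketched as follows. run_without_crash_handler is a hypothetical name, and kwrite in the example merely stands in for whichever KDE-based application is crashing.

```shell
# Hypothetical wrapper: start a KDE-based application with its crash handler
# disabled and core files allowed. The subshell keeps the changed limit from
# leaking into the calling session.
# Usage: run_without_crash_handler PROGRAM [ARGS...]
run_without_crash_handler() {
    (
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise the core file limit" >&2
        "$@" --nocrashhandler
    )
}

# Example (not run here):
#   run_without_crash_handler kwrite
```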
The contents of this document, unless otherwise expressly stated, are licensed under the CC-BY-SA-2.5 license. The Gentoo Name and Logo Usage Guidelines apply.