An introduction to crash reporting

Despite our best intentions, our programs will likely crash at one point or another. If this has happened on your own machine then you have your debugger at hand, but if it has happened on a user’s device this becomes more problematic.

A solution for this is crash reporting: a crash dump, along with any relevant log files, is sent to you for debugging.

These crash dumps are usually minidumps: they leave out most of the process memory, but still contain information about all the running threads and their callstacks.

The basic blocks

For simplicity, let’s assume for now that we’re talking about a PC or Mac application. At a high level, crash reporting is composed of four building blocks:

The source code and symbols for your executable.
Your program, which spawns a crash handler process when launched.
The crash handler executable, which monitors your program.
A remote service which the crash handler communicates a crash to.

Although a program can monitor itself for a crash, it is deep in the land of undefined behaviour once a thread crashes. As such, it is better to rely on a dedicated crash handler executable which runs in parallel and does the heavy lifting.

Your program therefore needs to do the following: start the crash handler when it launches (the sooner the better!), intercept the unhandled exception which is raised by a crash, generate a minidump and pass this data to the crash handler.
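
As a rough illustration of the "intercept the unhandled exception and generate a dump" step, here is a minimal sketch. Python is used only for brevity, and a JSON report stands in for a real minidump; a native program would rely on platform facilities and a library such as Crashpad instead. The names build_crash_report, install_crash_hook and crash_report.json are made up for this example.

```python
import json
import sys
import threading
import traceback
from pathlib import Path


def build_crash_report(exc_type, exc_value, exc_tb):
    """Collect the crashing exception plus the call stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    threads = []
    # sys._current_frames() is CPython-specific: thread id -> current frame.
    for thread_id, frame in sys._current_frames().items():
        threads.append({
            "id": thread_id,
            "name": names.get(thread_id, "unknown"),
            "stack": traceback.format_stack(frame),
        })
    return {
        "exception": traceback.format_exception(exc_type, exc_value, exc_tb),
        "threads": threads,
    }


def install_crash_hook(report_dir):
    """Write a report into report_dir whenever an exception goes unhandled."""
    report_dir = Path(report_dir)

    def hook(exc_type, exc_value, exc_tb):
        report_dir.mkdir(parents=True, exist_ok=True)
        report = build_crash_report(exc_type, exc_value, exc_tb)
        # Hypothetical file name; a crash handler process would pick this up.
        (report_dir / "crash_report.json").write_text(json.dumps(report))
        # Still print the usual traceback.
        sys.__excepthook__(exc_type, exc_value, exc_tb)

    sys.excepthook = hook
    return hook
```

Calling install_crash_hook("crashes") as early as possible at startup mirrors the "the sooner the better" advice: anything that fails before the hook is installed goes unreported.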

The crash handler takes all of this data, collects any attachments you might be interested in (for example, log files) and sends everything to some server controlled by the developer, normally via an HTTP POST request.
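
To make the upload step more concrete, the sketch below packs a dump and a log file into a single multipart/form-data POST request using only the Python standard library. The field names, file names and URL are placeholders invented for this example; each crash ingestion service documents its own endpoint and expected fields.

```python
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode text fields and file attachments as a multipart/form-data body.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    for name, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    # Closing boundary marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(url, fields, files):
    """Build (but do not send) the POST request carrying the crash data."""
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


# Hypothetical usage; urllib.request.urlopen(request) would actually send it.
request = build_upload_request(
    "https://crashes.example.invalid/upload",
    {"version": "1.2.3"},
    {"minidump": ("crash.dmp", b"MDMP"), "log": ("game.log", b"last log lines")},
)
```

Building the request separately from sending it keeps the encoding easy to test without a network connection.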

The remote service receives this data, and depending on your integration, it can show you the exact callstack and the linked source code. Two such services are BugSplat and Backtrace, both of which have free tiers for smaller projects and extensive documentation.

This is where you need the source code you’ve used to generate the program you’ve published, as well as the matching debug symbols. The crash dump you receive is mapped against those symbols, providing you with a functional callstack. Without the debug symbols, all you’ll have is a collection of memory addresses.
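
The mapping from addresses to names can be pictured with a toy example. This is only a sketch of the idea: real debug symbols carry far more (modules, source files, line numbers, inlined frames), and the SymbolTable class and the addresses below are invented for illustration.

```python
import bisect


class SymbolTable:
    """Toy symbol table: function start addresses mapped to names."""

    def __init__(self, symbols):
        # symbols: iterable of (start_address, function_name)
        ordered = sorted(symbols)
        self._starts = [start for start, _ in ordered]
        self._names = [name for _, name in ordered]

    def resolve(self, address):
        """Return 'function+0xoffset', or the raw address if no symbol starts at or before it."""
        # Find the last function whose start address is <= address.
        # Simplification: function sizes are not tracked, so any address past
        # the last start is attributed to the last function.
        index = bisect.bisect_right(self._starts, address) - 1
        if index < 0:
            return hex(address)
        return f"{self._names[index]}+{hex(address - self._starts[index])}"


def symbolicate(callstack, table):
    """Turn a list of raw return addresses into readable frames."""
    return [table.resolve(address) for address in callstack]


table = SymbolTable([
    (0x1000, "main"),
    (0x1200, "Game::Update"),
    (0x1800, "Renderer::Draw"),
])
frames = symbolicate([0x1810, 0x1234, 0x1004], table)
# frames == ["Renderer::Draw+0x10", "Game::Update+0x34", "main+0x4"]
```

If the table came from a different build than the one that crashed, the same addresses would resolve to the wrong functions, which is why the symbols have to match the published executable exactly.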

Integrations

If you are using a commercial engine, such as Unreal or Unity, then your life is greatly simplified: the engine developers have done the heavy lifting for you and integrated a crash handler. You’ll still need to do some work, such as setting up the crash ingestion service and configuring your engine to submit crashes to it.

If you are writing your own engine, you can integrate Crashpad (if you’re using Backtrace, they have their own fork): it will provide you with a crash handler, set up everything necessary for your engine to generate minidumps and submit the resulting data to your crash ingestion service of choice.

Symbols

As mentioned before, to be able to resolve a minidump you’ll need matching debug symbols. Ideally you’d upload the debug symbols automatically to the crash ingestion service as part of your release pipeline, but you can also upload the symbols manually.

The last thing you want is to start getting crash reports and not be able to resolve the callstacks, so definitely spend time making sure this is working as expected.

Edge cases

If you are working on Microsoft or Sony consoles, they have built-in crash reporters which will provide some information to the developer. These can be temperamental and need some time to set up, as well as a thorough read of the documentation.
If for some reason you don’t want to (or can’t) have a separate process to handle a crash, you can write out the minidump to a temporary directory or drive, and upload it to your crash ingestion service the next time the process is booted.
Don’t try to submit the crash report from the crashed process unless you have a great deal of time on your hands to debug your crash reporting code. Particularly with Unreal, it’s not going to have a happy ending.
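
The "write the minidump to disk and upload it on the next launch" approach can be sketched roughly as follows. The .dmp extension, the directory layout and the upload callable are assumptions made for this example, not the API of any particular crash reporting library.

```python
from pathlib import Path


def upload_pending_dumps(dump_dir, upload):
    """Upload every *.dmp file left behind by earlier runs.

    upload is a callable taking a Path and returning True on success.
    Returns the list of dump paths that were uploaded and then removed.
    """
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        return []
    uploaded = []
    for dump_path in sorted(dump_dir.glob("*.dmp")):
        try:
            ok = upload(dump_path)
        except OSError:
            # For example a network failure: keep the dump and retry next launch.
            ok = False
        if ok:
            dump_path.unlink()
            uploaded.append(dump_path)
    return uploaded
```

This would be called once at startup, from a healthy process. Dumps are only deleted after a successful upload, so a user who is offline does not lose the report.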

Final words

Integrating a proper crash reporting system can greatly improve the health of your program, giving you considerably more visibility over crashes and situations you can’t reproduce locally for one reason or another. It does take time to set everything up properly, but it is vastly superior to having to figure out crashes from log files alone.