As much as we hate to admit it, programs will inevitably crash in the field. When they do, you need to gather as much information about the crash as you can to diagnose it and prevent it from happening again.
For Spotkin’s latest game, Contraption Maker, I found an excellent crash reporting service called HockeyApp. It has great support for OS X out of the box. Unfortunately, it does not support Microsoft Windows, which accounts for over 90% of our customer base.
For those who want to see the end result without reading any further, here is a diagram of the crash reporting architecture we ended up with. I’ll describe in more detail how we arrived at this point.
I first looked into the Microsoft Windows Error Reporting Service. From what I read, that requires a Verisign certificate that costs around $400 a year. HockeyApp, on the other hand, is only $100 a year.
Our game is on Steam, and Steam has built-in crash reporting, but that would not help us if we released a version outside of Steam. Also, Steam only reports crashes after eight crashes have occurred, and I wanted to get reports for all crashes as soon as they happened.
So, I went back to HockeyApp. It turns out that HockeyApp has a REST API that lets you upload custom crash reports via an HTTP POST. I thought I could use this to upload Windows crash reports, but first I would need to generate crash files for Windows.
I decided to use Google’s Breakpad project for generating the Windows crash reports. I thought Breakpad was a good choice as I could eventually use it for Linux crash reporting as well. Hooking up Breakpad in Contraption Maker was very simple, and I was soon writing crash dumps to the filesystem. Now I needed to get them symbolicated and into HockeyApp.
I could have uploaded the crash report directly from the app to HockeyApp, but the crash report would then not be symbolicated. I’d have to download the dump files and process them to find out where crashes were happening. Intermediate processing was needed.
For intermediate storage, we set up an Amazon S3 bucket that was write-only from the application. When Contraption Maker starts up, it looks for the latest crash file and puts up a dialog asking the user what happened to cause the crash. It then uploads a zipfile to the S3 bucket containing the following: the user text describing the crash, the crash dump file generated by Breakpad, and the last Contraption Maker log file.
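To make the bundle concrete, here is a minimal Python sketch of packaging those three items into one zip archive in memory. It only illustrates the idea and is not the code in the game; the function name and the file names inside the archive (description.txt, crash.dmp, game.log) are hypothetical, and the upload to S3 is left out.

```python
import io
import zipfile


def build_crash_bundle(description: str, dump_bytes: bytes, log_text: str) -> bytes:
    """Return the bytes of a zip archive holding the three crash artifacts.

    The entry names below are illustrative assumptions, not the names
    Contraption Maker actually uses.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("description.txt", description)  # what the user typed
        archive.writestr("crash.dmp", dump_bytes)         # Breakpad minidump
        archive.writestr("game.log", log_text)            # last log file
    return buffer.getvalue()
```

Building the archive in memory keeps the upload to a single object per crash, which makes the later polling step simpler.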
At this point I could download the crashes from S3 and process them on a Windows machine that had the symbols. I started out using the Windows command-line debugger cdb to do this, but its output was not what HockeyApp needed. I thought about writing a script to transform the stack output from cdb into the correct format for HockeyApp. However, I then discovered that I could actually process the crash dump on Linux, using the minidump_stackwalk utility that comes with Breakpad.
With some small modifications, I was able to get the stackwalker tool to output the crash stack in the format HockeyApp expects. Here is the modified file I ended up with: https://gist.github.com/keithjohnston/8308469
Finally, I set up a cron job on the AWS Linux instance to run a script that polls the S3 bucket of crashes. When a new crash is found, it runs the modified stackwalker on it and uploads the crash report to HockeyApp. It also attaches the log file and the dump file, as well as a full stack trace from minidump_stackwalk.
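The core of such a polling script could look roughly like the Python sketch below. It is written under assumptions: the bucket listing and the record of already processed crashes are passed in as plain collections (fetching them from S3 is not shown), the helper names are hypothetical, and the tool is invoked in its usual form of a dump file followed by a symbol directory. The upload to HockeyApp is omitted.

```python
import subprocess


def new_crash_keys(bucket_keys, processed_keys):
    """Return the zip keys not handled yet, in sorted order."""
    return sorted(
        key for key in bucket_keys
        if key.endswith(".zip") and key not in processed_keys
    )


def stackwalk_command(dump_path, symbols_dir, binary="minidump_stackwalk"):
    """Build the argument list: the dump file first, then the symbol directory."""
    return [binary, dump_path, symbols_dir]


def run_stackwalk(dump_path, symbols_dir):
    """Run the stackwalker and return the stack trace it prints to stdout."""
    result = subprocess.run(
        stackwalk_command(dump_path, symbols_dir),
        capture_output=True,
        text=True,
        check=True,  # raise if the tool exits with a non-zero status
    )
    return result.stdout
```

Keeping the selection of new crashes separate from the subprocess call makes the script easy to test without touching S3 or the stackwalker binary.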
One step I left out is the generation of a symbol file in a format that the Breakpad minidump_stackwalk tool can read. These symbols are created using the dump_syms utility that comes with Breakpad. The dump_syms command must be run on Windows, as it uses Windows DLLs to break apart the pdb file and generate the Breakpad symbol file. You will need a version that matches the compiler you are using. The Mozilla project provides pre-compiled versions here: http://hg.mozilla.org/mozilla-central/file/8f1c9cdedba5/toolkit/crashreporter/tools/win32.
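One detail that is easy to trip over is where the stackwalker looks for that symbol file. As far as I understand Breakpad's default symbol lookup, it expects the layout `<root>/<debug file>/<debug id>/<name>.sym`, where the debug file and id come from the `MODULE` line at the top of the dump_syms output. The Python sketch below derives that path from a MODULE line; the function name and the sample id are made up for illustration, and the layout should be checked against your own Breakpad build.

```python
from pathlib import PurePosixPath


def symbol_store_path(module_line: str, root: str = "symbols") -> str:
    """Derive the expected .sym location from a Breakpad MODULE line.

    A MODULE line has the form: MODULE <os> <arch> <debug id> <debug file>
    """
    parts = module_line.strip().split(" ", 4)
    if len(parts) != 5 or parts[0] != "MODULE":
        raise ValueError("not a Breakpad MODULE line")
    _, _os_name, _arch, debug_id, debug_file = parts
    # For Windows symbols, the .pdb extension is replaced by .sym.
    if debug_file.lower().endswith(".pdb"):
        stem = debug_file[:-4]
    else:
        stem = debug_file
    return str(PurePosixPath(root) / debug_file / debug_id / (stem + ".sym"))
```

For example, a line such as `MODULE windows x86 5A9832E5287241C1838ED98914E9B7FF1 game.pdb` would map to `symbols/game.pdb/5A9832E5287241C1838ED98914E9B7FF1/game.sym`.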
To sum up, here is the process we now use:
- Build the app on Windows in release mode, but with debug symbols.
- Process the app pdb file on Windows with Breakpad’s dump_syms.
- Upload the Breakpad symbol file to S3.
- The Breakpad client in the Windows app uploads the dump file to S3.
- A cron job on a Linux AWS instance reads the symbols and dump files from S3, runs a modified minidump_stackwalk to output a symbolicated stack trace for HockeyApp, and uploads the final crash report to HockeyApp with curl.