This guest blog post is by Steve Thomas. He is a senior consultant at Microsoft, and his interests include cloud/virtualization technologies, AppCompat, Azure, Xbox and the New Orleans Saints.
Steve covers a broad set of topics on his website.
Productive virtual application debugging requires an understanding of the basic fundamentals of debugging compiled software code. For this part of my series on debugging virtual applications, I will be focusing exclusively on these fundamentals. If you are already familiar with these concepts, please allow me to quickly recap these to those readers which may be either not familiar, or only somewhat and looking to solidify these concepts.
Types of Debugging
There are several categories of debugging and the descriptions will vary by vendor, publication, and academic degrees of description. There is almost a guaranteed point of view when it comes to applying it to a specific product or series of products. Being that my discussion primarily revolves around products that run on top of the Windows operating system, my point of view, or slant, is obviously geared towards the types and toolsets that come with Windows.
Live debugging refers to the mechanism of attaching to a running program or process either invasively or non-invasively. A debugger may attach to a process and wait for exceptions or set a specific breakpoint. The debugger can insert those breakpoints in once attached to the process. The easiest way to think of a breakpoint is to understand its most basic definition: a breakpoint is a place or time at which an interruption or change is made. More information on breakpoints and different breakpoint types within the Windows context can be found here: link. In addition, live debugging is also commonly used to troubleshoot and analyze code within the developer environment. In those situations the types of breakpoints will vary. For example, you can refer to the examples of breakpoints that are available within the Visual Studio development environment here: link. Once attached to the process, a debugger can then step through threads and functions as the application is live.
Print or Trace Debugging
This is the most common method for troubleshooting software applications and operating systems as technically, this can cover a wide scope of methods. An application can run at specific diagnostic levels generating additional output and information that can be collected into a file or database that can be used to isolate and issue. Event traces, log files, debug output all fall into this category. Strictly speaking within Windows, applications can leverage the OutputDebugString or ODS to have an application, service, or operating system component generate what is referred to as “debug spew” and you can use various tools to collect or view this debug trace information. The most popular tool for viewing ODS traces is the Debug View utility (DBGVIEW) from the Sysinternals suite (link) although this is not the only one. More information on the OutPutDebugString can be found here: link.
In addition, there are tools that can hook into the Windows operating system to capture Win32 API and other application functions through the use of a simple user mode monitor (like the API Monitor tool) are even deeper through the use of a kernel-level filter driver (Like Process Monitor.) Literally troubleshooting outside the box and on to the wire – you can use network traffic\protocol analysis tools like Wireshark or Message Monitor (link)to capture network traces. These are all forms of trace debugging.
Windows Integrated Tracing and Instrumentation
Prior to Windows Vista, there were Event Logs, ODS tracing, text-based log files, etc. all within Windows each requiring their own tools and APIs. Starting with Windows 2000, Microsoft began incorporating Event Tracing for Windows (ETW) into the operating system and soon, applications and windows components were using this common engine for enabling diagnostics and collecting detailed debug tracing. Viewing of these traces was soon integrated into the Windows Event Viewer and users of App-V 5 are able to often resolve issues using this very mechanism.
Remote Debugging is a form of live debugging where the process of debugging occurs on a system that is different from the debugger. In most Windows cases, this is where there is an issue that needs to be debugged at the kernel mode level prior to the completion of an operating system boot or a system level crash. To start remote debugging, a debugger connects to a remote computer over a network or via a serial cable. The debugger can then control the execution of the program on the remote system and retrieve information about its state. In Windows, this is often done serially or via Firewire.
Post Mortem Debugging
Post Mortem debugging is a very common method of troubleshooting problems within software because it involves viewing a historical point-in-time snapshot of a hang, system, or application crash. This is where a debugger will read in a snapshot of debugging data called a “dump file” which contains existing memory and instruction pointers. The degrees of debugging depend on how much data is collected in the dump file as dump files can vary in what they collect. When it comes to application and system dumps, these can be controlled by the operating system’s default handlers (once called Dr. Watson for user mode applications) as to what information is collected in the dump file.
I was first introduced to the concept of post-mortem debugging reading an article by Matt Pietrek way back in 1992 in Dr. Dobbs Journal. Matt historically is one of the earliest writers on the subject of Windows debugging going back nearly three decades. The amazing thing is you can still read this article I am citing as it is available online: link
Execution Modes of Debugging in Windows
When we are speaking of execution modes in Windows, were talking about code that runs either in user mode or kernel mode. The execution mode affects the methodologies and tools you will leverage in order to properly debug the issue. Software is ultimately driven by the processor (CPU.) For a computer running Windows, the CPU runs in two different modes – user mode and kernel mode. The CPU switches between the two depending on the code.
The kernel and other operating system components run in kernel mode, hence the term. Rather than a macrokernel like other operating systems, Windows runs a smaller microkernel that runs as process SYSTEM. Like an application loads and uses DLL (dynamically linked libraries) the kernel also loads special modules called executive components and/or filter drivers alongside device drivers. There is essentially only one process running and that is what shows up in the Windows task Manager as “System” and if this application crashes . . . well . . . so does the entire computer. With debugging, when we are debugging in kernel mode, we are essentially debugging this process – however, it also serves as the governor of all of the other processes running on the system in user mode. All code that runs in kernel mode runs in a single virtual address space. This means that a faulty kernel-mode driver is not isolated from other drivers and the operating system itself.
Regular applications, middleware, plug-ins, and most services run in user mode. When you start a user-mode application, Windows creates a process for the application. This process will execute one or more threads. I use the description of the process itself being innate in nature. It just owns a private virtual address space, a private handle table, and contains at least one primary thread for execution. This description of a process comes from Jeffrey Richter who has written many books on Win32 programming. Because these processes are isolated from each other, an application is unable to screw up the operation of another separated process if it crashes. Other applications and the operating system are not affected by the crash. Data can be exchanged between these processes through interprocess communication mechanisms but they cannot directly write to address spaces directly. Limiting the virtual address space of a user-mode application prevents the application from altering, and possibly damaging, critical operating system data.
The App-V product is especially complex when using it as an example because it contains code at both kernel and user mode. The App-V client engine consist of kernel level drivers, a primary service, and user-mode DLL’s that are injected into the processes of virtualized applications.
Next Up Part 3: Debugging Misbehaving Application Scenarios