This week I was assigned a bug with a single-line stack trace:
Telerik.Windows.Controls.RadDocking.'<>c__DisplayClass1b'::'<DropWindow>b__19'
The exception was a NullReferenceException. The issue could be reproduced by repeatedly docking and undocking a window in the application for about 30 seconds. The result was an unhandled exception that took down the application.
The single line indicated that the exception originated somewhere in Telerik’s RadControls for Silverlight, probably in a compiler-generated class for a closure.
Ildasm
Ildasm is a tool that lets you look at the .NET IL code generated by the compiler. Looking at the Telerik Docking dll with Ildasm, the generated class and method can be seen:
.method public hidebysig instance void '<DropWindow>b__19'() cil managed
{
  // Code size 18 (0x12)
  .maxstack 8
  IL_0000: ldarg.0
  IL_0001: ldfld class Telerik.Windows.Controls.RadPane
           Telerik.Windows.Controls.RadDocking/'<>c__DisplayClass1b'::activePane
  IL_0006: callvirt instance class Telerik.Windows.Controls.RadPaneGroup
           Telerik.Windows.Controls.RadPane::get_PaneGroup()
  IL_000b: callvirt instance bool [System.Windows]
           System.Windows.Controls.Control::Focus()
  IL_0010: pop
  IL_0011: ret
} // end of method '<>c__DisplayClass1b'::'<DropWindow>b__19'
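The IL code shows the PaneGroup property being read, followed by a call to the Focus method on the result. A null return from get_PaneGroup() would make that second callvirt throw. The <>c__DisplayClass class name indicates a compiler-generated closure class.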
Source Code
Telerik’s source code contains a RadDocking class with a DropWindow method that contains a closure that calls SetFocus on PaneGroup. Bingo!
Dispatcher.BeginInvoke(() => activePane.PaneGroup.SetFocus());
Workaround
The workaround is a common one in C#: add a null check on the property (PaneGroup) before calling the method (SetFocus).
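A minimal sketch of such a null check follows. It is not Telerik's code: Pane and PaneGroup are hypothetical stand-ins for the real types, and a plain Queue<Action> takes the place of Dispatcher.BeginInvoke. The check sits inside the closure so that it happens when the action runs, and the property is copied to a local so it is read only once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the Telerik types; only the shape matters here.
class PaneGroup
{
    public bool Focused { get; private set; }
    public void SetFocus() { Focused = true; }
}

class Pane
{
    public PaneGroup PaneGroup { get; set; }
}

static class NullCheckSketch
{
    // Stand-in for the dispatcher queue: actions run later, not when queued.
    static readonly Queue<Action> queue = new Queue<Action>();

    public static void QueueFocus(Pane activePane)
    {
        queue.Enqueue(() =>
        {
            // Read the property once, at execution time, then test the local.
            var group = activePane.PaneGroup;
            if (group != null) group.SetFocus();
        });
    }

    public static void RunQueued()
    {
        while (queue.Count > 0) queue.Dequeue()();
    }

    static void Main()
    {
        var pane = new Pane { PaneGroup = new PaneGroup() };
        QueueFocus(pane);
        pane.PaneGroup = null;   // the pane is undocked before the action runs
        RunQueued();             // completes without a NullReferenceException
        Console.WriteLine("survived");
    }
}
```

Checking before the call to BeginInvoke instead would not help, because the property can still change between queuing and execution.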
What can we learn?
This fatal exception was found in a third-party framework, thankfully during development. Let's examine how this happened and what can be done about it.
Null checks
Tony Hoare, inventor of QuickSort, speaking at a conference in 2009:
I call it my billion-dollar mistake.
The billion-dollar mistake is the invention of the null reference in 1965.
C# references are null by default, and nullability is implicit.
Are null references really a bad thing? – Top Answer on Stack Overflow:
The problem is that because in theory any object can be a null and toss an exception when you attempt to use it, your object-oriented code is basically a collection of unexploded bombs.
How could this be done differently?
In F# references are not nullable by default, and nullability is explicit via the Option type, i.e. this issue could be removed by design.
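To keep the examples in one language, here is a rough C# approximation of the idea. The Option<T> struct below is a hypothetical, hand-rolled type for illustration only; F# ships its own Option, and this sketch does not try to match it. The point it shows is that the caller cannot reach the value without also saying what happens when it is absent.

```csharp
using System;

// Hypothetical minimal Option type, written for this example only.
struct Option<T>
{
    readonly bool hasValue;
    readonly T value;

    Option(T value) { this.value = value; hasValue = true; }

    public static Option<T> Some(T value) { return new Option<T>(value); }
    public static Option<T> None { get { return default(Option<T>); } }

    // The only way to the value: both the present and absent cases are required.
    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none)
    {
        return hasValue ? some(value) : none();
    }
}

static class OptionSketch
{
    public static string Describe(Option<string> group)
    {
        return group.Match(name => "focus " + name, () => "nothing to focus");
    }

    static void Main()
    {
        Console.WriteLine(Describe(Option<string>.Some("PaneGroup"))); // focus PaneGroup
        Console.WriteLine(Describe(Option<string>.None));              // nothing to focus
    }
}
```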
Mutation
The PaneGroup property is most likely initialized with a valid reference before the call to BeginInvoke. The BeginInvoke method adds the Action to a queue and calls it some time in the future.
C# objects are mutable by default.
This means that the state of the PaneGroup property may be mutated (set to null) before the closure is called.
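That sequence can be reproduced in a few lines. This is a sketch under assumptions: Pane and PaneGroup are hypothetical stand-ins for the Telerik types, and a Queue<Action> plays the part of the dispatcher queue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the Telerik types.
class PaneGroup { public void SetFocus() { } }
class Pane { public PaneGroup PaneGroup { get; set; } }

static class MutationSketch
{
    // Returns true if the queued closure threw NullReferenceException.
    public static bool QueuedCallThrows()
    {
        var queue = new Queue<Action>();
        var activePane = new Pane { PaneGroup = new PaneGroup() };

        // The reference is valid at the time of queuing...
        queue.Enqueue(() => activePane.PaneGroup.SetFocus());

        // ...but is set to null before the queue is drained.
        activePane.PaneGroup = null;

        try
        {
            queue.Dequeue()();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(QueuedCallThrows()); // prints True
    }
}
```

The closure captures the activePane variable, not a snapshot of its PaneGroup, so it sees whatever value the property holds when it finally runs.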
F# objects are immutable by default, i.e. this issue could be removed by design.
BeginInvoke
It looks like SetFocus is being called asynchronously as UI duct tape, to work around another issue where focus cannot be set until the control is initialized:
It’s a standing joke on my current Silverlight project that when something isn’t working, just try Dispatcher.BeginInvoke.
This issue would require a framework fix where you could specify the control that receives focus by default.
Asynchronous calls
Because the call to the closure was asynchronous, it was added to a queue and processed later. Adding the closure to the queue strips its calling context, which makes debugging hard.
Conclusions
Just this single-line stack trace demonstrates a cacophony of language and framework design issues. Nullability by default in C# makes code look like a collection of unexploded bombs. Add asynchronous calls to the mix and you have even more chances of triggering one of those bombs. Worse, working around the framework often forces you to make asynchronous calls to work around other issues. Finally, when a bomb does go off you are left with very little information to diagnose it.
Is OOP really a good paradigm for modern asynchronous UI programming?