Monday, November 14, 2011

A taste of hardware programming

Have you ever tried hardware programming with Verilog or VHDL? During University I had to program FPGAs with Verilog, and it struck me that hardware programming was a lot like software programming, but at the same time much different also. Instead of consuming plentiful RAM, hardware programming consumes a much more scarce resource (transistors) instead. Physics can cause your program to malfunction, but aside from that, I really enjoyed my one little hardware programming course.

The fundamental difference between hardware and software is that every logic gate inherently operates in parallel, unlike software which inherently operates sequentially, and requires a special mechanism--threads--in order to do tasks in parallel. Traditionally, writing multithreaded programs has been quite difficult, due to the danger of race conditions, the danger of deadlocks, forgetting to lock data structures, and so on. So it was a little surprising that such problems don't necessarily happen in hardware programming. In fact I found that the biggest assignment I was given (to make an "alarm system") was relatively fun and easy. And what made it relatively fun and easy was (1) that the solution was small enough to fit in our FPGAs without much effort, and (2) I didn't have to "update" variables.

For instance, when the user activates the alarm system, she has 15 seconds to exit the building before the device is "armed". To make this work in a software program, you might write an "event" method and configure some sort of timer to call the method each second. When the event happens, you would check whether we're in a "countdown mode", and if so, decrement a variable representing the number of seconds left, then display the new number of seconds on the screen by explicitly calling a method to update the screen.

In a hardware program, it works a bit differently. It's been a few years since I did this in hardware, but I'll give you the gist of it. You don't need a timer "event"; instead, there is a hardware clock running at a known speed, which I'll say is 10 MHz, i.e. 10 million clock cycles per second or 0.1 microsecond per tick. So I created a counter that takes the hardware clock as input and restarts every 1/60 of a second (because part of the user interface updates 60 times per second), and when it restarts, I programmed a particular wire to be "on" (1 bit) during that clock cycle only (IIRC). I used that signal as input to a second counter that restarts every second and similarly sets a particular wire to "on" during one clock cycle per second (out of 10 million). I defined a third counter to represent the countdown, which decreases each second if the countdown is active. The counter is kept in BCD format (binary-coded decimal, 4 bits per decimal digit) so that no math is required to convert the digits into a format suitable for display to the user.

The "screen" was a pair of 7-segment displays, i.e. a display that can show two characters (the numbers 0123456789 and, if you're creative, the letters AbCdEFgHIJLnoPqrStUy.) The "screen" showed only 16 bits of information total (two digits and two dots), so I simply connected 16 of the FPGA's metal pins directly to the 7-segment display. To draw something on the screen, my hardware program merely had to set those pins to "on" or "off" as appropriate.

I didn't have to execute any "commands" to make a number appear on the seven-segment (numeric) display. Instead, I wrote "functions" (not really functions, but blocks of code that represent circuits or sets of circuits) that did the following:
  • Examine the state of the device and, if a countdown is in progress, produce the countdown value as output (the output is 5 bits per digit while the input is 4 bits per digit, so I just set the extra bit to 0). If a countdown is not in progress then something else is selected as output, such as part of a message (e.g. "rEAdy" might be scrolling across the screen. Actually, displaying anything but numbers was not part of the assignment; I just added the ability to display messages for fun.)
  • Those two 5-bit numbers become the input to an output mapper, which converts each 5-bit number to a 7-bit character.
  • The 7-bit character is passed directly to the output pins, unless it is modified to display an edit cursor (another visual flourish that was not part of the assignment, and not relevant to this blog post either, for that matter).
One cool thing about hardware programming was how "natural" it was to maintain the correct program state and show the desired output on the screen. I just wired the various "functions" together (I honestly don't remember what they call the functions in Verilog) and it all "just worked". I didn't have to do anything to explicitly propagate state from one place to another; for instance I didn't have to issue a "command" to make the screen update.

Instead I simply declared that
  • the thing that does the countdown is wired to the thing that selects a 5-bit output;
  • that thing is wired to a pair of things that select 7-bit outputs; and
  • the two 7-bit outputs are wired to the thing that chooses the 16-bit output.
Every "function" in the program is continuously "computing" its value, so when the value of the counter changes, the new count shows up on the screen automatically, less than one clock cycle later. The screen updated automatically because it was physically wired to the output of a function, and that function updated automatically because it's really a circuit, and circuits are always updating because they exist in the real world and (as the laws of physics require) run continuously as long as electricity is flowing. It actually takes extra effort to create a circuit that does not respond immediately.

The fact that stuff happens automatically, without you having to remember to issue "commands", makes programming easier, and thus hardware programming is at times easier than software programming... until you run into the physical limitations of your device. My "alarm system" had a two-character display, so I simply made a duplicate copy of the "function" that converts the 5-bit representation to 7 bits. That way, the two copies could run in parallel. This approach was viable because the display was only two digits; if the display had 10 digits or more, 10 copies of the function might not fit on the chip. If I wanted to save space, I would create only one instance of the function, and then add a special circuit to "call" the function on each digit in sequence (e.g. a different digit every clock cycle). In that case I would need to create registers to save the result of running the function on each digit. Moreover, the chip doesn't have enough metal pins to represent 10 digits, so the screen would define some communication protocol that the chip would have to follow. Potentially, it could get messy. But, the tasks that students were given were fairly easy and fun.

Ever since I tasted hardware programming (and probably before then, actually), I have wished that software could offer the same kind of easy, automatic updates, so that variables inside the program automatically propagate to the screen, or to whereever they are supposed to go. Data binding provides part of the answer, but the problem is that you don't usually want to display your program's variables directly on the screen; you want to filter, sort, and format those variables first. If the user modifies what's on the screen, reverse transformations are needed to figure out how the internal variables should be changed. Recently, I discovered that there is a .NET library that (if used correctly) will provide these wonderful automatic updates. That library is called Update Controls (at CodePlex).

Functional languages like Haskell and F# also make programming easier, generally speaking, and like hardware programming, functional languages are sufficiently different from popular "imperative" languages that it feels like something altogether different. However, functional languages don't tend to represent graphical user interfaces (GUIs) in a natural way. For GUIs, all hail Update Controls.

Wednesday, July 27, 2011

I want a conditional dot operator

I was writing earlier about features I'd like to have in the .NET Framework and C#, but I forgot a couple. Earlier I wrote about the "slide operator". Now, I'd like to propose the conditional dot or "null dot" operator.

Some people will say this idea makes the language too complex, but I can't seem to get enough language features. I use just about all of C#'s existing features, including the "hidden" ones. And still I can think of many unsupported features that I know would improve productivity, code clarity, or performance... at least for me.

I think it's okay for C# to have tons of features, but the IDE needs to help teach people what they mean. For example, Visual Studio should show the name of an operator in a tooltip when mousing over it, and show a help page for it if the user puts the cursor on it and presses F1.

C# already has a handy "??" operator which lets you choose a default value in case the "first choice" is null:
Console.WriteLine("Your name is {0}.", firstName ?? "(unknown)");
I use this feature somewhat often. But something's missing. I would like, in addition, a "conditional dot" operator, another kind of null guard that deals with cases where an object you want to access might be null. For example, let's say your class has a reference to another class, and you'd like to inform it when something happened:
if (referenceToAnotherClass != null)
referenceToAnotherClass.OnSomethingHappened(info);
Of course you can't call the method if the referenceToAnotherClass is null. It would be nice if we could shorten this to something like
referenceToAnotherClass??.OnSomethingHappened(info);

If the method you want to call returns a value, the "??." operator would substitute null for the return value if the class reference is null. For example,
// firstName might be null; length will be null if firstName is null.
int? length = firstName??.Length;
It would be very natural to combine the "??." operator with the existing "??" operator:
// Equivalent to firstName != null ?  firstName.Length : 0
int length = firstName??.Length ?? 0;
This operator would be most powerful when it is chained together, or used to avoid creating temporary variables:
// If "DatabaseConnection", "PersonTable", and "FirstRow" can all 
// return null, chaining "??." simplifies your code a lot.
var firstName = DatabaseConnection??.Tables.PersonTable??.FirstRow??.Name;

// Equivalent to:
string firstName = null;
var dbc = DatabaseConnection;
if (dbc != null) {
var pt = dbc.Tables.PersonTable;
if (pt != null) {
var fr = pt.FirstRow;
if (fr != null)
firstName = fr.Name;
}
}

The operator should also help invoke events:
public event EventHandler Click;
protected void OnClick()
{
Click??.(this, EventArgs.Empty);
// equivalent to:
if (Click != null)
Click(this, EventArgs.Empty);

// Note: the dot in "??." is still required because "X??(Y)"
// would be indistinguishable from the null coalescing operator.
}
Somebody implemented a "null dot" extension method that provides this kind of "operator" in C#, except that it only supports one out of the four cases I just described, as it requires that the function you want to call return a reference type; it doesn't support void or struct return values. It's also slightly clumsy, and since it relies on a lambda, it hurts performance. To work well, this feature needs language support.

Right now I am working with code that often converts objects to strings (the objects are usually strings, but may be something else). The object is sometimes null, so I write
string s = (obj ?? "").ToString();
This works fine, but it's less efficient than it could be, because if obj == null, a virtual call to ToString() will be called on the empty string "". If the "null dot" or "conditional dot" operator existed, I would write this code as
string s = obj??.ToString() ?? "";
or even
string s = obj??.ToString();
if a null result is acceptable.

I know that an operator like this exists in some other languages, but I don't which ones at the moment. Anybody remember?

P.S. Microsoft somehow forgot to include a compound assignment operator, which should work like the other compound assignment operators.

twosies += 2; // equivalent to twosies = twosies + 2
doubled *= 2; // equivalent to doubled = doubled * 2
// ensure myList is not null
myList ??= new List<int>(); // myList = myList ?? new List<int>()

Tuesday, July 5, 2011

C++ vs C# Speed Benchmark, v.2

I just finished the second edition of my detailed benchmark comparing C++ and .NET performance. Hope you like it.
 

Why WPF sucks: an overview

I was just reading this article detailing someone's frustrations with WPF, and thought I should add to the chorus.

I agree with Mr. Mitchell, I'm fairly sure I will never like WPF. It has a steep learning curve because it's undiscoverable. WinForms was relatively easy to understand; its most complex feature was data binding. But WPF relies heavily on XAML, a language in which MS somehow expects you to "just know" what to type to get what you want, and when you somehow figure out what to type, half the time it doesn't work and you get a mysterious unhelpful exception (or no exception, but it still doesn't work.)

In WinForms, there was an easy-to-use designer and anything you did was translated to easy-to-understand C# (or VB) code; in fact that was the native representation! XAML, however, can't be translated to anything except BAML, so nobody can easily understand what a piece of XAML does, nor can we step through it in the debugger. To make matters worse, XAML relies heavily on dynamic typing (and even the C# part constantly makes you cast "object"s). This prevents IntelliSense from working very well, thwarts the refactoring tools (ever tried renaming a property that was bound in XAML?), and shifts tons of errors from compile-time to run-time.

In short, WPF programs look pretty, and offer better options for data visualization than WinForms (DataTemplates in ListBoxes are a major improvement), but beyond that they are a huge step in the wrong direction. How could Microsoft think this design was a good idea? If any company but Microsoft or Apple made a GUI architecture that was this difficult to use, no one would want to use it. For my company I started to develop some ideas for a GUI architecture that would have been much leaner and more elegant than WPF, but it looks like I may never write that code. The point is, I have some sense of how a GUI framework should work, and it's not like WPF.

I remember my first WPF app. I had an ItemsControl with a ControlTemplate containing a ScrollViewer with a Grid inside (note the learning curve: you can't use WPF very well without some idea what those terms mean.) I wanted to know: "How do I attach mouse event handlers to an ItemsControl to detect mouse clicks and drags on blank space?" The answer? In order to get mouse events with meaningful mouse coordinates (i.e. coordinates in scrollable space), it was necessary to obtain a reference to the grid using a strange incantation:

Grid grid = (Grid)_itemsControl.Template.FindName("Panel", _itemsControl);

Then I attached event handlers to the grid, and inside the mouse event handlers, get the mouse coordinates w.r.t. the grid using

Point p = e.GetPosition((IInputElement)sender);

Plus, in order to get mouse events on the entire surface, the control (actually the grid) must have a background.

In another case I had to use this incantation:

var r = VisualTreeHelper.HitTest(_panel, new Point(e.X, e.Y));
var hit = r == null ? null : r.VisualHit as FrameworkElement;

In WinForms, pretty much all the methods you need are members of the control class you are using. But this example shows that in WPF, sometimes you have to call a static method of some other class to accomplish what you want. Then consider the minimalist documentation (so that you really need to buy a book to learn WPF)... that's undiscoverability at its finest.

Update: Also, WPF is shockingly inefficient, as you may learn if you need to use a non-virtualized ListBox in order to get MVVM to work and the list has more than about 1000 items. Trivial items (short text strings, no DataTemplate) consume about 8 KB of memory per row, or 80 MB for 10000 items, and ListBox takes 4 seconds to handle a single arrow key pressed in a list with this many items.

Saturday, April 9, 2011

The story of IIterable

I like the .NET Framework. To me, C# is a much better programming language than Java or C++, and it offers good performance at the same time. However, some of the fundamental design decisions in the .NET framework bother me. Which brings me to the subject of this post: I'm remaking some of the .NET collections library, built on top of a replacement for IEnumerable<T> called IIterable<T>.

You see, I'm a bit of a performance nut. So it bothers me that the IEnumerator<T> interface requires two interface calls to retrieve each element of a collection: MoveNext() and Current. Therefore, I invented a replacement and named it Iterator<T>:
   public delegate T Iterator<out T>(ref bool ended);

It had a corresponding interface called IIterable<T>, which is analogous to IEnumerable<T>:
   public interface IIterable<out T>
{
Iterator<T> GetIterator();
}

Microbenchmarks show that this iterator has (at most) half the overhead of IEnumerator, as you would expect. More on that later.

When an iterator is called, it returns the next value in the sequence, and sets ended to true if there are no more elements. It should be noted that Iterator<T> was originally defined as follows:
   public delegate bool Iterator<out T>(out T current);

This version was supposed to behave like IEnumerator.MoveNext(), returning true on success and false if there were no more elements in the sequence. It also happened to return the next value in the sequence, eliminating the need for IEnumerator.Current. Code that used the original Iterator<T> might look like this:

T current;
for (var moveNext = list.GetIterator(); moveNext(out current);)
{
...
}

This was very similar to code that uses IEnumerator directly (instead of using a foreach loop):
   for (var e = list.GetEnumerator(); e.MoveNext();)
{
T current = e.Current;
...
}

Unfortunately, the .NET Framework did not allow this signature because the CLR does not support true "out" arguments--"out" arguments are the same as "ref" arguments but with a special attribute on them, so at the CIL level they still permit the caller to supply an input value. That makes them incompatible with the <out T> part of the declaration: since they technically accept an input value, they must be invariant, not covariant. Thus I had to change the definition as written above:
   public delegate T Iterator<out T>(ref bool ended);

That's unfortunate, because using the iterator is much clumsier this way. for-loops like the one above must be written like this instead:
   bool ended = false;
for (var it = list.GetIterator();;)
{
T current = it(ref ended);
if (ended)
break;
...
}

This clumsiness is avoided using an extension method:
   public static bool MoveNext<T>(this Iterator<T> it, out T value)
{
bool ended = false;
value = it(ref ended);
return !ended;
}

Unfortunately, there is a performance penalty for calling MoveNext(), which eliminates most of the performance gain you get from using Iterator in the first place.

Anyway, these "Iterators" are returned from IIterable<T>.GetIterator. There's more to it than this--I actually created a whole series of collection interfaces to fix other things I don't like about the design of .NET collection interfaces--but for this post I'm just going to focus on IIterable<T> and Iterator<T>.

I spent the last two days working on an implementation of LINQ-to-Iterators, and then discovered that there is a problem. A rather serious problem.

Iterator<T> and IIterable<T> and various other interfaces derived from it are covariant: Iterator<out T> and IIterable<out T>. This makes it possible to write pointless code such as this:
   IIterable<object> Combine(IIterable<string> a, IIterable<Uri> b)
{
return a.Concat<object>(b);
}

That is, IIterable<string> and IIterable<Uri> is implicitly convertible to IIterable<object>. It requires C# 4.0, but in theory you can do this with any version of the .NET Framework since 2.0. With a helper method written in C# 4.0, it's even possible to write code that does the same thing in C# 2.0.

However, as I began to implement LINQ-to-Iterable I realized that there is one major problem. In practice, any collection that implements IIterable<T> will also implement IEnumerable<T>, for fairly obvious reasons: Microsoft and third-party libraries expect IEnumerable<T>, and the foreach loop expects a GetEnumerator method.

But if a class implements both IIterable<T> and IEnumerable<T>, that class would suddenly have weird problems with LINQ. Why? Well, IEnumerable<T> supports LINQ, and once I finish LINQ-to-Iterable, IIterable<T> will support LINQ too, and that's precisely the problem. Let's consider what happens if I try to run a LINQ query on my Deque<T> collection that implements IIterable<T> and IEnumerable<T>:
   Deque<int> list = ...;
var odds = Iterable.CountForever(3, 2);
var primes = from p in list
where (p > 2 && (p & 1) == 1) || p == 2
where !odds.TakeWhile(n => n * n <= p).Any(n => p % n == 0)
select p;

The compiler complains because it doesn't know which LINQ implementation to use: "Multiple implementations of the query pattern were found for source type 'Loyc.Runtime.Deque<int>'. Ambiguous call to 'Where'." The problem disappears if I remove "using System.Linq" from the top of my source file, or if I do the query against "(IIterable<int>)list" instead of just "list". However, it's an inconvenience at best. At worst, if the user doesn't understand why the error occurred and how to solve it, it's a showstopper.

I solved this problem by changing the definition of IIterable as follows:
   public interface IIterable<out T> : IEnumerable<T>

This forces all implementations of IIterable to also implement IEnumerable, but it solves the problem because the compiler no longer considers the choice ambiguous: if a class implements IIterable<T>, the compiler will choose LINQ-to-iterable without complaint. The reason is that IIterable is now considered more specific, and the compiler prefers more specific method signatures.

The variance dilemma

Unfortunately, this causes a new problem: it refuses to compile for any .NET Framework version before 4.0! This is because Microsoft made a very strange decision: even though CLR version 2.0 supports generic variance, IEnumerable<T> is defined as invariant before version 4.0.

Interesting, is it not, how the interaction of seemingly unrelated issues--the way extension method resolution works in C# 3.0 (2006) and the decision Microsoft made in 2005 to mark IEnumerable<T> as invariant--combine to screw up my LINQ implementation?

I could only think of two ways to work around this problem.

(1) Restrict covariance to .NET 4.0, i.e. in .NET 2.0, make IIterable<T> invariant.

If I choose this option, not only do we lose covariance in .NET 2.0 and .NET 3.5 (which is sad because it's a cool feature), but it also forces me to release at least two DLLs, one for .NET 2.0 (which relies on LinqBridge for some features), and one for .NET 4.0. Probably a .NET 3.0 version is needed too, for those that don't want a dependency on LinqBridge.

If this problem didn't exist then you could have at least used the same DLL for .NET 3 and 4. Technically you could reference the .NET 3 DLL in .NET 4 and lose variance, but this would have the side effect of breaking interoperability with any program that uses the .NET 4 DLL instead.

(2) Don't derive IIterable<T> from IEnumerable<T>.

This option allows IIterable<T> to remain covariant, but causes conflicts with LINQ-to-Enumerable as I already described. I fear that a lot of developers, when confronted with the "Ambiguous call to 'Where'" error, will immediately say "screw this" and refuse to use my collection classes.

Therefore, my decision is to keep the following definition of IIterable<T>:
public interface IIterable<out T> : IEnumerable<T> { ... }

Blah, blah, blah

I have a couple of comments about the design of Iterator<T>. Firstly, unlike IEnumerator, Iterator<T> does not implement IDisposable (indeed, it can't, since it's a delegate). My main rationale for this is that if your collection's iterator (as opposed to the collection itself) needs to implement IDisposable, you may as well continue using IEnumerator<T>. Certainly if Microsoft had chosen to use my "fast iterator" implementation when it first gave birth to .NET, it should have included IDisposable:
   interface IEnumerator<out T> : IDisposable
{
bool MoveNext(out T current);
}

Of course, back in the original .NET Framework of 2002, they weren't especially concerned with performance yet. And they hadn't invented generics yet. Plus, to make this interface variance-ready, they would have had to formally define "out" parameters as equivalent to return values. And while they were at it they could have maybe implemented return value inheritance covariance. But I digress...

Also, the C# language makes Iterator<T> much easier to implement as a delegate. For example, the implementation of Iterator.CountForever is very simple:
   public static Iterator<int> CountForever(int start, int step)
{
start -= step;
return (ref bool ended) => start += step;
}

Although it would be possible to write a helper class that converts a lambda function to the hypothetical "IIterator<T>" interface, this would incur an extra interface invocation above the cost of invoking the lambda, not to mention an extra memory allocation. Therefore, IIterator<T> would generally not be any faster than IEnumerator<T> (without special compiler support to eliminate the overhead).

Secondly, you may have noticed that "ref bool ended" is "ref" instead of "out". My reasoning, again, is that if you're using Iterator instead of IEnumerator it's because you want to squeeze out every last drop of performance. Therefore, to save time, Iterators are not required to clear "ended" on every iteration; they only need to set it once, at the end.

One more thing. After I release this library, if you want to implement IIterable<T>, you can make your job easier by using IterableBase<T> or ListSourceBase<T> as your base class, which provides implementations of GetEnumerator(). If your collection class already has GetEnumerator, and you want to add IIterable<T>, you could add this method:
   public Iterator<T> GetIterator()
{
return GetEnumerator().AsIterator();
}

But that's not necessarily a good idea, because the iterator itself will be slightly slower than the IEnumerator it was derived from. Only if a multi-step LINQ-to-Iterable query is built on top of this would there be any performance benefit. (I'll get around to some benchmarks later).

The performance of Iterator

I wrote some simple loops that call IEnumerator<T> (MoveNext and Current) on the one hand, and Iterator<T> on the other. The collections being enumerated just count numbers:
   public static IEnumerator<int> Counter1(int limit)
{
for (int i = 0; i < limit; i++)
yield return i;
}
public static Iterator<int> Counter2(int limit)
{
int i = -1;
return (ref bool ended) =>
{
if (++i < limit)
return i;
ended = true;
return 0;
};
}

On my machine, in a Release build, it basically takes 7.1 seconds to count to a billion if I use IEnumerator<int>, 3.552 if I invoke Iterator<int> directly, and 6.1 seconds if I call the Iterator<int>.MoveNext extension method. The raw Iterator is thus about twice as fast as IEnumerator, although my test is imperfect since it ignores loop overhead and the content of Counter1 and Counter2. (It should also be noted that results vary from run to run, and seemingly irrelevant changes to the benchmark code also change the results. Sometimes my benchmark reports that IEnumerator requires 250% as much time, and sometimes only 170%, depending on the weather and the test PC.)

Conclusion

With a performance difference of only 3.5 seconds in a billion iterations, it's fair to ask whether IIterable is really worth it. Well, in most circumstances it probably isn't. But LINQ queries tend to be composed of many simple components, which means that the time spent invoking MoveNext, Current, and your lambdas may be a large fraction of the total runtime. In the future I'll try to quantify the difference.

If you do find that your LINQ-to-objects queries are slow, it often means you're doing it wrong: maybe you're using an O(N^2) algorithm when you should have picked a O(N log N) algorithm. But I believe in "the pit of success": developers shouldn't have to work much harder to write fast code. Microoptimizations, like writing out a LINQ query longhand with "foreach" or "for" loops, and calling through a List<T> reference instead of IList<T>, should virtually never be necessary. Instead, the low-level code they use as their foundation should be as fast as possible.

It's a hobby of mine to explore how the most basic components of the .NET Framework can be adjusted for better performance, in order to slide high-level code closer to that "performance pit of success". I don't really expect you to go out of your way to use IIterable<T>, because most of the time the performance difference is quite small. However, those who want to finally write a video game, compiler, simulation or other computationally intensive program in a truly high-level language like C# with LINQ--instead of juggling pointers in C++ and writing "for (int i = 0; i < (int)collection.size(); i++)" on the chalkboard another thousand times--will appreciate libraries like the one I'm developing.

Microsoft, if you're reading this: yes, I would consider a job offer. Put me with the cool kids that are designing C# 5.0. I'll tack on support for IIterable via yield return, and needless to say I have many other ideas for improvement. Also, if you pay me to write Loyc and rename it C# 6.0, I promise not to complain.

Thanks for reading!

Monday, April 4, 2011

Update Controls

I've felt for a long time that something was wrong with the way we do GUI development. For instance, suppose we have a form with a list of people and some information about them:

A pointless person editor
(This used to be an HTML form but CodeProject wouldn't render it.)

Suppose that the "serial number" box should be disabled if the user does not have a bike. Traditionally, we would write an event handler for when the value of the checkbox changes, and update the Enabled property when it does. Some code bases would also have code when the current person changes, and we must also handle the situation that the list of persons is empty.

GUI-specific data binding can solve this particular problem most of the time; typically you can bind the Enabled property on the serial number box to the value of the checkbox. However, it might not handle the case that no person is selected. And there are many other scenarios where data binding is insufficient:
  • Suppose we would like the ListBox of persons to show a summary of each person (based on multiple properties of the person)--well, traditional data binding can't aggregate different properties of an object.
  • Suppose we have an options dialog box that controls the appearance of the main window, we would like any changes to the options to immediately alter the main window's appearance--well, traditional data binding doesn't work across different windows.
  • Suppose we are showing a shopping cart and we'd like to keep a label up-to-date that shows the total number of items and the total price--well, data binding can't aggregate properties of multiple objects.
  • Suppose we'd like to display a certain panel only if the application is running in "administrator" mode, AND an item is selected, AND a check box is checked--well, traditional data binding can't derive a value from multiple unrelated sources of information.
  • Suppose we'd like to filter a list based on a user's search string--in general, data binding doesn't help with filtering.

In these cases, if data binding helps at all, it requires you to jump through hoops, or add event handlers for edge cases, or at least keep dependencies in mind when you invoke your INotifyPropertyChanged events. For example, suppose you're using WPF and your ListBox is bound to a Person.ListLabel property that returns a summary string with the name and rank of a Person. If nothing else, you'll have to make sure you raise the PropertyChanged event for "ListLabel" when the name or rank is changed. That is, the code for the Name and Rank properties must explicitly state that ListLabel depends on them. (Now, you can avoid creating the ListLabel property by using a DataTemplate, but if you've programmed WPF then you have probably encountered the need to fire multiple PropertyChanged events for a single property change.)

Ideally, the actual code of a program should be closely related to the behavior rules it is supposed to follow. More generally, the solution space should resemble the problem space; your code should express your business rules. Usually, behavior rules can be described in a simple way:
  • "The serial number field is enabled only if 'Has a bike' is checked and a person is selected".
  • "The list box will show the name and rank of each person."
  • "The font size of the main dialog is to match the font size selected on the options dialog."
  • "In the shopping cart, the total price is shown under the list of items."
  • "The list only shows items that contain the text typed in the 'Filter' box."
Notice how all of these requirements are declarative. They describe the desired results, without considering how to keep the user interface up-to-date when changes occur. What must the program do when the user selects a different person or a different invoice? What must it do when the server tells us there a new person, or that a new item was created remotely in the shopping cart? The requirements don't say anything about that. Wouldn't it be nice if we didn't have to write code to handle these circumstances either?

Introducing Update Controls

In a University undergrad project I experimented with starting to solve this problem, using a concept I called "propagating variables". I didn't entirely figure out how to make it work, though. So I was delighted to find that someone had solved this problem in .NET land, with a framework called Update Controls. Update Controls is a different way of programming, so it takes some practice, but I am delighted with how it streamlines my code.

UpdateControls not only keeps your GUI up-to-date, but it also separates the GUI from the rest of the code better than any framework I have seen before. UpdateControls supports not only WPF but WinForms and Silverlight, too, and it encourages you to write your code in such a way that the view is replacable. I'm using UpdateControls for WinForms right now, because I find it easier, but if I ever want to upgrade to WPF, I will be able to do so with minimal changes to my Model and ViewModel code. Moreover, I could easily design a program that supports both a WinForms user interface and a Silverlight one at the same time: two views, one ViewModel. If somebody decides to create "UpdateControls for MonoTouch" someday, I could add an iPhone version, sharing the same viewmodel between Silverlight, WinForms and MonoTouch.

By strictly separating the ViewModel from the View, and placing all nontrivial logic in the ViewModel, you can effectively write unit tests for your GUI by writing tests for the ViewModel. Just make sure that the code in the View is so simple that no tests are needed, and that the output you want to test for is computed by the ViewModel instead of the View. You still have to manually test the view to make sure it looks right, but its logic and behavior can be unit-tested.

The Presentation and Navigation Models

Moreover, I am finding that the "Update Controls" way of doing things leads to clean, maintainable code. I especially like the Presentation Model concept. I think of the presentation model as a kind of "file system" of GUI state. That is, the presentation model is the root of a tree structure from which you can reach all of your application's important state information: the list of open documents, the currently selected item in each listbox, the options in your Options dialog, and so forth. The view (Forms and Controls) merely reflects the state of the presentation model.

Conceptually the presentation model (PM) is broken down into two parts: the model, and the navigation model. I am not sure where the view-models fit into this pattern; heck, maybe this "file system" idea isn't exactly what Micheal Perry (author of Update Controls) was thinking of when he described the pattern.

But I don't think it's necessary to follow the pattern exactly as described in his blog. The key idea is that you can put all important state information in a big tree (analogous to a file system, or to the DOM in web programming), filled with the two basic Update Controls primitives, Independents and Dependents (known collectively as Precedents). Placing state in the UC-enabled presentation model makes your code very maintainable, because
  • You no longer need to answer questions like "what events do I have to fire when I change this?" or "what properties change when this property changes?"
  • If your state is in the presentation model, you no longer have to worry about what physical control(s) display a piece of state information--because controls don't depend on each other--and you can even move controls between different dialog boxes without breaking anything (other than the controls being moved).
  • A tree structure is a good basis for a mental model of the application. In applications that hold state in many different ways, in separate "islands", it may be hard to keep track of everything. But the PM concept gives you a way to mentally visualize your program state and to communicate about state variables (because everything important has a named location in the tree.)

Have you ever fiddled with the console variables (CVars) in a video game such as Quake or Crysis? It's cool how you can change a variable and the game's output instantly matches the changes you make. Update Controls and the presentation model bring this kind of magic to your own code. You just define the state variables, stick them in a tree, and write simple snippets of code that describe how a GUI control depends on the tree.

The Presentation Model tree

The tree I'm describing isn't anything fancy. It's just an ordinary collection of classes, organized and nested in whatever way is most natural for the application. For instance, I'm writing an application that displays a collection of objects on a form. Well, actually there might be more than one collection of objects, with each collection shown on a different tab on the user interface; but I decided that my presentation model would only represent one data set (I create extra instances for multiple data sets). This data set has two fundamentally different kinds of objects, but for this discussion it doesn't matter what they are, so let's call them Apples and Bananas.

The user can have either a "live view" which shows the current state of the objects, or a historical view, which shows the state of the objects at some point in the past. Plus, there is an options dialog, and the Presentation Model has a reference to the options. Note that there are no GUI objects in the presentation model; it provides access to the options, not to the dialog! It has four main properties (HModel, ApplesVM, BananasVM, Options) which, in turn, contain other properties, forming a tree:

PresentationModel (class)
.HModel (HistoryBrowserModel class: represents the model at one instant in time)
.Model (Model class: holds the core data model)
.Apples (list of FruitObject with state and history of each apple)
.Bananas (list of FruitObject with state and history of each banana)
(properties of FruitObject are not shown because they are not relevant
to this blog entry. In reality, my app is not fruit-related.)
.StartTime (DateTime: minimum value of the time slider in the GUI)
.EndTime (DateTime: maximum value of the time slider in the GUI;
derived automatically from time stamps on the FruitObjects)
.ViewTime (DateTime: current time being viewed in the GUI)
.LiveMode (bool: if true, ViewTime will be locked to EndTime)
.Apples (a projection of Model.Apples at the current ViewTime)
.Bananas (a projection of Model.Bananas at the current ViewTime)
.ApplesVM (FruitListViewModel class)
.CompleteList (list of FruitViewModels derived from HModel.Apples)
.Filter (user-defined filter that reduces the set of objects to display)
.FilteredList (list of FruitViewModels limited by the Filter)
.SelectedItem (currently selected FruitViewModel, or if multiple apples are
selected, a fake ViewModel that provides a combined view)
.BananasVM (another instance of FruitListViewModel)
.CompleteList
.Filter
.FilteredList
.SelectedItem
.Options (shared between all instances of PresentationModel)
.ServerAddress
... (more options)

What I'm most proud of is the time slider. Basically, the time slider is bound to the value of _pm.HModel.ViewTime (where _pm is the presentation model). When the user moves the slider, an event written by me updates _pm.HModel.ViewTime. UpdateControls figures out the rest:
  1. It knows that _pm.HModel.Apples and _pm.HModel.Bananas depend on ViewTime and automatically updates them.
  2. It also knows that _pm.ApplesVM.CompleteList and _pm.BananasVM.CompleteList depend on _pm.HModel.Apples and _pm.HModel.Bananas so it updates the CompleteLists too.
  3. In turn, the FilteredList properties depend on the CompleteList properties so they are updated.
  4. Finally, the Update Controls in the tab (ListBoxes, etc.) know that they depend on FilteredList, so they update themselves too. Like magic!

All I had to do was correctly associate my data with Update Controls primitives such as Independent, Dependent, Independent<T>, Dependent<T>, DependentList<T> and IndependentList<T>; and to implement GetHashCode and Equals on wrapper objects such as ViewModels (it's important to tell UpdateControls that two wrappers are equal if the wrapped objects are equal.) UpdateControls automatically detects dependencies between properties that use these primitives, and updates all Dependents on-demand (i.e. when their values are requested).

Has a bike?

So let's say you've designed a presentation model "PModel" for your application and it represents the form shown at the beginning of this blog entry. You're using Windows Forms. So, first, you need to give the form a reference to the presentation model:
PModel _pm;
public Form1(PModel pm)
{
InitializeComponent();
_pm = pm;
}

// Somewhere else
Application.Run(new Form1(new PModel()));

Then, to ensure that the "serial number" checkbox (an instance of UpdateCheckBox) is enabled at the right time, use the WinForms designer to create an event handler for the "GetEnabled" event, and put the following code in it:
return _pm.SelectedPerson != null && _pm.SelectedPerson.HasBike;

Similar code snippets must be written for the control's Text property and for properties of all the other controls on the dialog. I'm not going to explain right now how to actually use Independents and Dependents to create the presentation model; I would refer you to the Update Controls blog for that information. Right now I just want to point out the two key advantages of this approach over conventional event-driven code:
  1. The code closely follows the requirement: "The serial number field is enabled only if 'Has a bike' is checked and a person is selected". If the requirements change, it's easy to update the code to match them.
  2. If you used Update Controls correctly (assuming no bugs in UC itself), there is no way the program can malfunction with respect to this requirement. It will always behave as that one line of code describes. The UC Way automatically leads to fewer bugs--although if you're new to UC, bugs that do occur tend to be serious head-scratchers.

I should point out one disadvantage to UpdateControls: it will permeate your codebase, changing the way you write both your ViewModels and your Models, so you'll be making a major commitment. And while UpdateControls integrates with WPF very closely, the WinForms version requires a special "Update" version of every control that you want to stay up-to-date automatically, and only a few properties currently support automatic updates. If you need to use 3rd-party Model classes in an UpdateControls app, you'll have to write decorator classes to augment the model classes with "Independent sentries". Similarly, to use 3rd-party controls in a WinForms app, you'll need to create a derived class or a special container that has private "Dependent sentries" and updates them during Application.Idle events. For example, I had to write an UpdateTrackBar class to represent my time slider, because UpdateControls didn't come with one.

Another issue, currently, is that the way UpdateControls handles lists is not scalable. Normally lists have a single "independent sentry" that controls updates to the lists. Adding, removing, or modifying a single item will cause each dependency to scan the entire list. If you are filtering a million-item list down to 100 items that you show on-screen, that million items will be scanned and filtered every time any item is added, removed, or changed, even if that item is not one of the 100 items displayed.

I'm trying to think of a way to make it more scalable, but I have no solutions to offer yet. There is one mitigating factor: UpdateControls updates dependencies in a kind of "lazy" fashion. If you change several items at once (e.g. in a single method or for-loop) in the million-item list, each of its dependencies is typically updated only once (in WinForms, dependencies will be updated in the next Application.Idle event; in WPF they will be updated in a posted message. Early updates occur on-demand, however, e.g. if your code explicitly accesses the filtered list.)

Still, I am quite certain that using a framework like UpdateControls will become an industry standard practice someday.

To get started with UpdateControls yourself, read its blog, check out examples such as Noteworthy, and write a small app to try it out. Noteworthy is odd--a "blog editor" that doesn't have a body text field?--but it seems to be the best example available at the moment. If you have any questions about UpdateControls, it's probably best to ask them on the discussions list.

Thursday, September 16, 2010

I want a "slide" operator

I've been writing imperative code for a long time and I noticed that I often extract a value from a variable, then change the variable immediately afterward. This is evident in numerous iterative algorithms, such as the fibonnaci algorithm:
{
int a = 0, b = 1, tmp;
for(;;) {
print(u);
tmp = b;
b = a + b;
a = tmp;
}
}
Notice how "a" gets the old value of b, and "b" gets the old value of a + b.

Another pattern is that you want to do something the first time some code is reached:
if (!initialized) {
initialized = true;
DoSomething();
}
Again, we read the value of the variable and then change it.

What if the user clicks a column on a ListView, then clicks a second column? Then the list should be sorted first by the second column, then by the column that was clicked earlier. I wrote some code for this earlier today:
public void SortByColumn(int column)
{
if (_primarySortColumn == column) {
_secondarySortAscending = !_secondarySortAscending;
_primarySortAscending = !_primarySortAscending;
} else {
_secondarySortColumn = _primarySortColumn;
_secondarySortAscending = _primarySortAscending;
_primarySortColumn = column;
_primarySortAscending = true;
}
}
Here, the pattern occurs in the "else" clause.

Since this pattern happens so much, I think there should be an operator for it, let's say "<-". A natural name for it would be the "shift" operator, but I'll call it the "slide" operator or maybe the "rotate" operator, to distinguish it from the bitwise left shift "<<". Using this operator, fibonnaci becomes
{
int a = 0, b = 1;
for(;;) {
print(a);
a <- b <- a + b;
}
}
The operator is the same as the assignment operator, except that the result of the expression is the old value of the left-hand side, not the new value. So "a <- b <- a + b" is equivalent to "a <- (b <- a + b)", which is equivalent to "temp = b, b = a + b, a = temp". Note that "a <- b <- a + b" is also equivalent to "a = b <- a + b"; since the old value of "a" is discarded, it doesn't matter whether we use "=" or "<-" to change it.

The initialization code could be expressed more briefly with the slide operator:
if (!(initialized <- true))
DoSomething();
And similarly for the sorting code:
_secondarySortColumn <- _primarySortColumn <- column;
_secondarySortAscending <- primarySortAscending <- true;

The semantics of the slide operator imply creation of a temporary variable, but probably in some cases the temporary can be eliminated if the compiler can determine that the order of operations doesn't matter, or that the temporary variable isn't being used.

The operator could also be used for swapping values:
a <- b <- a

A second temporary variable is needed in case the left-hand side is expensive or has side-effects, in which case it should only be evaluated once. For example, if the code is
savedValue <- GetActiveItem().Value <- savedValue
then GetActiveItem() shouldn't be called twice, so another temporary must be introduced (the exact details depend on the programming language):
temp1 = GetActiveItem();
temp2 = temp1.Value;
temp1.Value = savedValue;
savedValue = temp2;
Now when can I get my slide operator in a real language?

Wednesday, August 11, 2010

DI and Pervasive services

I have been learning recently about Dependency Injection for loose coupling (see for example these articles).

DI is really very simple. When a class X is designed for DI, it means that
  1. X itself avoids specifying what other classes it depends on. For example, it avoids calling "new Y()" or using a singleton class, and it accesses the services it needs through interfaces (or, occasionally, abstract classes) rather than references to concrete types.
  2. X's dependencies are exposed explicitly in the constructor, or by properties of X, so that whoever creates/manages X can choose the dependencies.
Because X doesn't choose its own dependencies, a third party can do so on its behalf. This makes it possible to swap out a service that X needs for some other service without changing X itself, which may allow X to be used in more situations than it was designed for. For example, if X needs a database service, it would take a reference to some database-related interface (e.g. IQueryable<T> or whatever, I don't know, I don't use databases much myself) in its constructor. Now X may have initially been intended to use a MySQL database, but because X doesn't open the database connection itself, you may be able to switch to PostgreSQL or Oracle without changing X at all. For unit testing you might not want to use a real database, so you could use a "mock object" that contains simulated data, based on LINQ-to-XML or something.

Often, there are often many levels of dependency. For example, a program for editing documents may have a main window, which contains a document editor, and toolbars/menus that depend on the document editor. The editor in turn depends on a document object and various user interface services. The document may depend on services for undo/redo, various data structures, and disk access, while the user interface services may depend on other data structures, drawing services, spatial analysis algorithms... who knows.

Often, in an app designed with DI, all the different components are wired together in a single place if possible, so that you can look in one place to see how components are connected and dependent on each other.

As the application grows more complex, the code that initializes all these objects via dependency injection also grows more complex. Eventually, you may get to a point where an IoC/DI framework like Ninject or Windsor or (my tentative favorite) Autofac can simplify all that initialization work.

But there are some services that are pervasive, services that you would have to pass to a hundred different constructors if you want to use DI "properly". Some examples are localization (to provide French and Spanish translations), logging (almost any component might want to write a diagnostic message), profiling (to gather performance statistics), and possibly "config options" (so end-users or admins can configure multiple components through a command-line, xml file or other source). In a compiler, a service for error/warning messages might be used all over the place. In Loyc, there might ultimately be hundreds of components that need to create AST Nodes or use other "pervasive" services.

It looks to me like passing such common services to constructors is more trouble than it's worth. If a component of Loyc may produce a warning message, is user-configurable, creates new AST nodes, and needs localization support, that's 4 constructor arguments just for "pervasive" services, never mind the more important, "meaty" services that it might need, like a parser, graph algorithm or whatever. If you use constructor injection, dozens of constructors are starting to smell.

How can we avoid the burden of passing around lots of pervasive service references? I'm not sure what the best way is. I have an idea in mind, but it is not appropriate for all cases. My idea involves a global singleton that can be swapped out temporarily while performing a specific task. I tentatively call it the Ambient Service Pattern.

For instance, in a compiler, the error/warning service might print to the console by default, or to an output window, but certain types of analysis might be "transactional" or "tentative": if an error occurs, the operation is aborted, and no error is printed, although if a warning occurs, it is buffered and printed if the operation succeeds. A concrete example of this scenario is the C++ rule known as SFINAE. Template substitution may produce an error, but if that error occurs during overload resolution, it is not really an error and no message should be printed. Given a global error service, we can model this rule by switching to a special error service, performing the operation, and then switching back afterward.

To implement this pattern, use a thread-local variable alongside the interface to manage the service:
interface IService { ... }

class Service
{
static ThreadLocalVariable<IService> _cur =
new ThreadLocalVariable<IService>();

// Current service
public static Cur { get { return _cur.Value; } }

// Use to install a new service temporarily, or to install the
// initial service when the program begins
public static PushedTLV<IService> Push(IService newValue)
{ return new PushedTLV<IService>(_cur, newValue); }
}
Depending on the app, there could be different threads doing independent work, which is why the thread-local variable is needed.

Note: if you need to support threads that fork, there is a problem because .NET thread-local variables cannot inherit their value from the parent thread. To work around this problem in Loyc I made a whole infrastructure for thread creation, with a ThreadEx to wrap the standard Thread class, and a ThreadLocalVariable<T> class to be used instead of [ThreadStatic], which registers itself in a global weak reference collection so that when a thread is created with ThreadEx, the values of all ThreadLocalVariables can be propagated from the parent thread to the child thread (custom propagation behavior is also supported). Obviously, this workaround is a huge pain in the ass since all code must agree to use ThreadEx and the new .NET stuff like the "Parallel Extensions" won't propagate the thread local variables. I guess to properly support .NET 4, I should try to find a new solution based on ThreadLocal<T>.

PushedTLV is a helper class that changes the value of a thread-local variable until it is disposed. Use it like this:
using (var old = Service.Push(newService)) {
// Perform an operation that will use the new service
}
// old service is automatically restored

An unfortunate consequence of this pattern is that the interface appears to be coupled to the thread-local variable. Sure, you could still write a class X that has a constructor-injected IService, but this might confuse those that indirectly use X: if someone changes Service.Cur, they might assume all code that needs an IService will use Service.Cur, but X could be using a different IService.

While the Ambient Service Pattern doesn't work like traditional dependency injection, it still follows the spirit of DI because components remain independent from a specific implementation of the pervasive service.

In the version of the pattern seen here, the service provided acts as a singleton. Of course, if the service is something that can be instantiated on-demand (e.g. a data structure), the thread-local variable would hold a factory instead of a singleton. The "Push()" method would switch to a different factory instead of a different instance, and there would be a "New()" method instead of a "Cur" property.

I wonder if the .NET Framework itself should adopt this kind of pattern. Consider what you do to open a file: you call File.Open() or make a FileStream, right? Now what if management decides "we need our app to be able to open files from an ftp site". Rather than change the code that calls File.Open() (and what if a third-party library is calling it?), wouldn't it be more elegant if you could swap in a new file management service that understands FTP sites (and perhaps maintains a local disk cache)? And hey, what if you want to change your console app to use a graphical console with icons and proportional fonts? What if you decide Debug.WriteLine needs to store a log somewhere?

Here's my implementation of PushedTLV, but as I mentioned, it's based on my own special ThreadLocalVariable class.
/// <summary>Designed to be used in a "using" statement to alter a 
/// thread-local variable temporarily.</summary>
public class PushedTLV<T> : IDisposable
{
T _oldValue;
ThreadLocalVariable<T> _variable;

public PushedTLV(ThreadLocalVariable<T> variable, T newValue)
{
_variable = variable;
_oldValue = variable.Value;
variable.Value = newValue;
}
public void Dispose()
{
_variable.Value = _oldValue;
}

public T OldValue { get { return _oldValue; } }
public T Value { get { return _variable.Value; } }
}
Do you have another idea for managing pervasive services with minimum fuss? Do tell.

Thursday, June 10, 2010

Cω stream flattening

The Microsoft research language Cω (C-omega), which predates C# 3.0 (and even 2.0?), extended C# with a number of interesting features, of which C# 3.0 includes several comparable features. To me the most interesting thing in Cω was streams, which are roughly just lists of things in IEnumerables. For instance, the stream "int*" basically means "IEnumerable<int>", but they allowed you to manipulate lists of things without using explicit loops:
int* FromTo(int from, int to)
{
for (i = from; i <= to; i++) yield return i;
}
void Example()
{
Console.WriteLine(string.Join(", ",
ToArray(FromTo(1, 10)[it % 2 == 0].ToString())));
// Output: 2, 4, 6, 8, 10
}
Luckily, LINQ provides similar features in C# 3.0, just with a different syntax:
IEnumerable<int> FromTo(int from, int to)
{
for (int i = from; i <= to; i++) yield return i;
}
void Ex()
{
Console.WriteLine(string.Join(", ",
(from i in FromTo(1, 10) where i % 2 == 0 select i).ToString()).ToArray()));
}

There's one thing missing from C# 3 that I would really like, though: flattening. Quite simply, you can combine Cω streams easily with no performance hit. The Cω guys illustrated this in a paper with a silly, but illustrative, alternative to FromTo():
int* FromToB(int b, int e)
{
if (b>e) yield break;
yield return b;
yield return FromTo2(b+1,e);
}
They explain tersely:
Without flattening we would be forced to copy the stream produced by the recursive invocation, leading to a quadratic instead of a linear number of yields:
int* FromToC(int b, int e)
{
if (b>e) yield break;
yield return b;
foreach (int i in FromTo3(b+1,e)) yield return i;
}

Basically, FromToC does the same thing as FromToB, except that it is extremely slow. The reason is that when you enumerate over FromToC, a new enumerator is created on every iteration, and because the enumerators are nested, each enumerator that exists must be called in sequence to enumerate a single value.

This is perhaps better illustrated by the following practical example, which gives you a list of all subfolders of a folder:
public IEnumerable SubFolders(string path)
{
yield return path;
foreach (string dir in Directory.GetDirectories(path))
foreach (string subfolder in SubFolders(dir))
yield return subfolder;
}

Suppose you write
var folders = SubFolders(@"C:\").GetEnumerator();
while (folders.MoveNext())
Console.WriteLine(folders.Current);

Imagine that somewhere in the middle of this process, the enumerator returns "C:\Windows\System32\drivers". This does not happen directly. First the MoveNext() method of the enumerator for "C:\" is called. That enumerator calls MoveNext() on its child enumerator, "C:\Windows". The child enumerator then calls MoveNext() on its child, "C:\Windows\System", which advances to "C:\Windows\System32\drivers". Thus, for all subfolders that are 3 levels deep, the MoveNext() method is actually called 3 times. I hope you can see that the code will get slower as the directory structure gets deeper--not just because the strings are longer, but because the MoveNext() gets called one extra time for every level of nesting.

Stream flattening is a feature of Cω that removes all the intermediate calls to MoveNext(), so that MoveNext() and Current are only called once for every folder returned, while at the same time making the code simpler:
public IEnumerable SubFolders(string path)
{
yield return path;
foreach (string dir in Directory.GetDirectories(path))
yield return SubFolders(dir);
}

Stream flattening and the "*" syntax make it more convenient to write coroutines. Coroutines are useful for numerous purposes, but perhaps the most obvious is game programming. In games you need to program numerous "actors", each with its own behavior. Actors' behavior can be described in a linear way. For instance an enemy might have logic like this:
- Wander around until player is within 10 metres
- Attack the player while my health > 25%
- Retreat until distance from player exceeds 50 metres
- Regenerate health at 1% per second
- If at any time my health reaches 0%,
- run death animation
- wait 10 seconds
- despawn the body

Logic like this could be expressed straightforwardly using streams... something like this:
class Enemy : AbstractEnemy
{
public override IAction* EnemyLogic()
{
// The game engine resumes this method every frame
foreach(var a in NormalBehavior()) {
yield return a;
if (Health <= 0)
break;
}
yield return DeathAndUnspawn();
}
IAction* NormalBehavior()
{
for(;;) {
while(Distance > 10) {
yield return Wander();
Health = Math.Min(Health + 0.01 * TimePerFrame, 1.0);
}
while(Health > 0.25)
yield return Attack();
while(Distance < 50)
yield return Retreat();
}
}
IAction Attack() { ... }
IAction Retreat() { ... }
IAction Wander() { ... }

IAction* DeathAndUnspawn()
{
yield return DeathAnimation();
yield return Wait(10.0);
yield return Despawn();
}
IAction* DeathAnimation() { ... }
...
}

While you can certainly do this in regular C#, streams improve performance and make the syntax more convenient.

Of course, neither Cω streams nor C# generators were actually designed for coroutines; any function that contains code that can be "paused" must return some sort of IEnumerable, which can be inconvenient sometimes. For one thing, a pausable function cannot easily return a value to its immediate caller, only to its "highest-level" caller (such as the game engine).

But, since the .NET framework doesn't support genuine coroutines, this is the only option right now. Mono recently introduced continuations that it calls "tasklets", which can also be used to write coroutines (but only under Mono, not Microsoft .NET); Since they apparently duplicate and restore the entire call stack, though, I am concerned that Tasklets might have poor performance.

I don't actually know how the flattening is implemented; I'll be sure to update this entry if I figure it out.

Friday, May 21, 2010

New features the .NET framework should have

I recently wrote a class called CPTrie that stores a sorted collection of strings or integers in less space than a Dictionary or SortedDictionary. It took a long time to develop this data structure in .NET while minimizing memory and CPU usage. I am fairly convinced that without some of .NET's restrictions, I could have made an even more compact data structure.

Allow me to suggest a few features that would make .NET better by allowing it to do certain tasks more efficiently, or by giving coders more flexible design options.

It is already easier to write efficient code in .NET than Java, of course, thanks to features like structs (value types), Generics and compiled expression trees. But it could be better.

Slices

I love the type safety that .NET provides, but people seem to forget that type safety doesn't have to be a straightjacket. They think, for instance, that pointers automatically kill type safety, apparently forgetting that "references" are nothing but neutered pointers. Pointers aren't inherently dangerous, rather it is certain operations that are dangerous, like reading or writing past the end of an array. If your language can prevent you from doing dangerous things with pointers then they suddenly become safe.

Google Go recognizes this, and provides a safe form of pointer called a slice. A slice is a pointer to an array that stores the array size with the pointer. This is not the same as a .NET array reference, which always points to the beginning of the array. A slice can point to a subset of an array, and if .NET had slices, all functions of the form
int Sum(int[] array, int startIndex, int endIndex) {...}

could be replaced simply with
int Sum(int[..] array) {...}

And you would probably call the function with a nice syntax like
int result = Sum(array[startIndex..endIndex]);
// or Sum(array) to pass the whole array
Plus, these functions often come with a "convenience overload" to process the entire array...
void Foo(int[] array) { Foo(array, 0, array.Length); }
...which would not be necessary with slices.

Slices can be more efficient than the traditional triplet (array, start, end) for three reasons:
  1. A slice is two values (pointer + length). Passing it around is faster than passing around three values.

  2. Triplets use 50% more memory (relevant if you need to store many of them)

  3. All functions that access an array through a triplet must check the array index on every access. For instance, in today's .NET, Sum() might be written like this:
    void Sum(int[] array, int startIndex, int endIndex)
    {
    // I avoid using the "less than" sign because Blogger is still
    // terribly buggy after several years. Lazy bums.
    if (0>startIndex || startIndex>endIndex || endindex>array.Length)
    throw new ArgumentException();

    int total = 0;
    for (int i = startIndex; array.Length > i; i++)
    total += array[i];
    return total;
    }

    With slices, the code is shorter and the JIT optimizer could do a better job:
    void Sum(int[..] slice)
    {
    int total = 0;
    for (int i = 0; slice.Length > i; i++)
    total += slice[i];
    return total;
    }

    Microsoft .NET can detect that a for loop goes from 0 to the array length, and optimize out the range check. It should be easy to do the same for slices, which could make a method like this significantly faster, yet still perfectly type-safe. Meanwhile, the JIT is unable to optimize the first loop (the one that uses an array) because it is too complex, and there may be additional overhead because the JIT can only hold 2 or 3 variables in registers.

Slices would be even more useful if you could cast them in type-safe ways: for instance you should be able to cast a int[4] slice to a short[8] slice, and you should be able to cast a short[7] to an int[3] (there is the possibility of an alignment exception in the latter case on certain processors, but it is still "safe" in the sense that memory corruption is impossible). Slice casts would be extremely useful for parsing various binary data blobs (e.g. binary files, network packets), which are difficult to deal with in .NET right now.

Such casts are perfectly safe, provided that casts involving reference types are prohibited.

There's one disadvantage of allowing the slice cast: the optimizer would have to assume that a value-type slice may point to the same physical memory as a different kind of slice or array. But C has the same problem and it still wins the speed race.

One roadblock to implementing slices in .NET may be a challenge to the garbage collector. In .NET today all references point to the beginning of an object. Slices can point to the middle of an object, which means that the garbage collector needs a way to find the beginning of an object given a pointer to the middle. If this is not possible, slices will have to be expanded to a triplet: a third value is required, pointing to the beginning of the object being kept alive by the slice.

A slice in triplet form would still have a performance advantage over normal (array, index, length) triplets, because while the pointer to the beginning of the object would have to be passed around, it never has to be accessed except by the garbage collector. On x86 this means it can be held only on the stack and not in a register, saving scarce registers for other things.

Programmers should be able to get the slice for an entire array given a slice of just part of the array. However, if this is allowed, note the security implication: anyone returning a slice to a partial array must be aware that the receiver can get access to the entire array.

Slices that point to null are also possible. .NET would have to guarantee that null slices always have a Length of 0. Note that null slices are advantageous over null array references, because you can access the Length field of a slice without risking a NullReferenceException, so you need not write code to handle both "null" slices and "zero-length" slices.

Fixed-size arrays

In some cases it would be really nice to have zero-overhead fixed-size arrays. For instance, I have seen people representing points like this:
class Point {
private float[] _coords = new float[3];
public float X { get { return _coords[0]; } set { _coords[0] = value; } }
public float Y { get { return _coords[1]; } set { _coords[1] = value; } }
public float Z { get { return _coords[2]; } set { _coords[2] = value; } }
public float this[int n] {
get { return _coords[n]; } set { _coords[n] = value; }
}
}
This is very inefficient if you need to keep track of large numbers of points, since the 12 bytes of coordinate data is accompanied by 20 bytes of overhead: 8 bytes for the class and 12 bytes for the array. Moreover, all access to the X, Y and Z properties is guarded by an array range check, even though the subscripts and array size are constants. Only the indexer should require a range check.

To solve these problems, .NET should offer "safe" fixed-size arrays that hold the array directly in the container class or struct. This would save 12 bytes of overhead and allow faster access to the array, as well as faster object construction.

You can already use fixed-size arrays in .NET, but they are considered "unsafe" and have the following disadvantages:
  • They require you to sprinkle the "unsafe" keyword all over the place.
  • You have to set a compiler option to allow "unsafe" code.
  • Your code cannot run in secure environments, such as inside a web browser.
  • Fixed-size arrays cannot hold references to classes, or structs (only primitives are allowed).

Provided that there are array index range checks, there is really nothing unsafe about a fixed-size array, and no reason they can't hold references or structs. The reason for these silly restrictions (I think) is that the CLR doesn't have built-in support for fixed-size arrays, so C# resorts to pointer arithmetic, which is unverifiable to the CLR (and in that sense "unsafe").

Fixed-size arrays would work very nicely with slices, so that you could offer access to the array via a property:
class Point {
private fixed float _coords[3];
...
public float[..] Coords { get { return _coords; } }
}
However, this would only be legal inside classes. A struct could not return a slice of itself, since a struct could cease to exist before the slice does.

Go interfaces

In Go, classes do not have to explicitly say what interfaces they implement, and anybody that notices a pattern in the methods of two otherwise unrelated classes can declare that they both implement a certain interface.

Go's interfaces provide compile-time type checking, and since Go is built heavily around these "implicit" interfaces, I assume Google found a way to give them high performance similar to .NET interfaces. As such I think Microsoft would be smart to consider adding support for Go-style interfaces, which in turn would allow a fast implementation of the Go language on the CLR.

Go-style interfaces would allow developers to work around limitations in the way Microsoft designed interfaces. For instance, there are a lot of read-only collection classes (or collections like Stack where modifications are constrained), but Microsoft didn't bother to make a IReadOnlyCollection interface, so we're stuck writing four unsupported methods (Add, Clear, Remove, CopyTo) in addition to the three that matter (GetEnumerator, Count, Contains).

With Go interfaces, you can define your own private IReadOnlyCollection interface which all the BCL collections, and any collections made by third parties, automatically implement.

(I am not, of course, suggesting that traditional interfaces be abandoned. They are too well entrenched and, I have learned, are faster in some cases anyhow.)

Update: it turns out this feature would be insufficient to support Go. In Go you can define new methods for classes defined elsewhere (or for primitive types, pointers, etc.) These could be compared to extension methods, but Go is not object-oriented and it's not really the same. In Go, as I understand it, an interface can bundle together methods defined in multiple modules that happen to operate on the same data type. I don't understand how it works - I haven't been able to find information on their dispatch mechanism. Anyway, the point is, if MS really wanted to support Go on .NET they'd have to study it more carefully than I have.

Go interfaces are comparable to Visual Basic 9's new "dynamic interfaces", except that in Go you get a error when you try to cast an object to an interface that the object does not support. In VB an exception occurs when you call an unsupported method on the interface, not during the cast.

Return type covariance

If class Circle is derived from class Shape, then the following code is perfectly safe and should be legal:

abstract class ShapeFactory {
abstract public Shape New();
}
abstract class CircleFactory {
override public Circle New() { ... }
}
I have encountered many situations where covariance would be useful, and it pains me to have to find a different design or implement a workaround just because Microsoft won't add this trivial feature.

Thread-local variable inheritance

Just a small pet peeve. When you create a child thread, it loses all the values of thread-local variables from the parent. There should be an option to copy values from the parent thread. I heard a rumor that somebody has a patent on this trivial idea, though.

Update: My own Ambient Service Pattern cannot work reliably without this feature.

Microsoft doesn't actually have to infringe that ridiculous patent to support the feature, either: they can simply (1) fire an event on each child thread when it starts, while blocking the parent thread, and (2) provide a way for the child thread to access thread-local variables of the parent thread. This way, Microsoft is merely giving developers the tools they need to make it possible to infringe the patent. Possible is a big improvement over impossible, friends.

Binary-comparing two value types

Another trivial feature that Microsoft left out of .NET is the ability to bitwise-compare two values. While developing the VList data structures for .NET, I really wanted this feature. There's no reason the 'ceq' opcode shouldn't support it, and as far as I can tell the ECMA-335 standard does not state whether 'ceq' can be used on arbitrary value types. However, C# doesn't allow it so I assume it's not supported.

It's true that bitwise equality isn't "perfect". For instance, floating point has a "positive zero" and a "negative zero" that compare equal, but are bitwise unequal. However, if two value types ARE bitwise equal then they are the same for all practical purposes, and it's hard to think why you would waste time calling Equals() (which, in structs, uses reflection by default!)

In the case of reference types you can call object.ReferenceEquals to find out if two references are the same, but what if you are writing generic code that handles both values and references? In that case you cannot use object.ReferenceEquals, which is why a bitwise comparison operator would be so useful.

A concrete example of this occurs in my VList code, which has a "Smart Select" method:

public FVList<T> SmartSelect(Func<T, T> map)
This method of FVList passes each element of the FVList to the map() function. If, for every element of the list, map() returns the same value as it was given, SmartSelect() can simply return the original list instead of a new one. The ability to do this is very important for efficient functional programming; unfortunately, .NET screws it up by making it difficult to tell whether map() returned its argument unchanged or not. We have to use virtual function calls, and maybe even reflection, to tell whether the output is the same as the input. That's stupid.

Integers in pointers

Okay, what I'm about to suggest may sound crazy. You have been warned.

In .NET, all references have 0s in the bottom two bits of the pointer. This is also true for C++ pointers to heap objects, and the Ruby language interpreter and the Judy array library use this fact to store a kind of safe union in a pointer:
union {
Object* pointer;
int integer;
}
If the two lowest bits are 0, the value is a pointer, otherwise it is an integer of some kind. A similar technique is to use the lowest bit to discriminate a union between two pointer types:
union {
Object0* pointer0; // if low bit is 0
Object1* pointer1; // if low bit is 1
}
While abusing pointers like this in C is dangerous (since you could easily dereference an integer by mistake, etc.), formalizing the technique can make it safe (just as the Ruby language is safe from memory corruption despite using this kind of union extensively).

In C#, the second union would be almost pointless since objects are self-typed, but it might be useful instead if you could use the low bit of a pointer as a boolean flag. For instance, .NET's SortedDictionary uses the following node structure:
internal class Node
{
private bool isRed;
private T item;
private Node left;
private Node right;
}
There's something to be said for a straightforward implementation like this, but if somebody's using this class extensively with a large data set, they might appreciate the memory savings (up to 17%) that would come from hiding the "isRed" flag inside the 'left' or 'right' reference.

Of course, this technique is pretty obscure, and is rarely needed, yet occasionally it is the only efficient technique to do the job. In Ruby this feature is key to the type system that unifies integers and objects. In Ruby "everything is an object", but storing integers on the heap is very inefficient. Although all Ruby variables are implemented as pointers, if you store a small integer in a variable, rather than allocating a heap object, Ruby just encodes the integer right into the pointer. Can you think of an approach that is remotely as efficient? Similarly, the Judy array's ability to store data compactly is heavily reliant on the fact that it can change the meaning of a "pointer" based on its low bits. I wanted to port Judy to .NET, and struggled for quite some time before concluding that approximating Judy is impossible in .NET.

To minimize changes to the CLR, I would propose a single pointer union be defined, in which the 2 low bits are given the following meanings:
  • 00: A pointer with a "false flag"
  • 10: A pointer with a "true flag"
  • 01 and 11: An integer (shift right 1 to get its value)
To store these values, the .NET BCL would define a special structure:
struct PointerOrInteger<T> where T:class
{
object _ptr; // not directly accessible
bool IsInteger { get; } // returns ((int)_ptr & 1) != 0
T Pointer { get; set; } // getter returns IsInteger ? null : _ptr & ~3
IntPtr Integer { get; set; } // getter returns IsInteger ? (IntPtr)_ptr >> 1 : 0
int Int32 { get; set; } // getter returns IsInteger ? (int) _ptr >> 1 : 0
bool Flag { get; set; } // getter returns (_ptr & 2) != 0
}

The garbage collector could use reflection/metadata to detect the presence of PointerOrInteger, in order to single it out for special treatment. However, it might be simpler if the GC assumed any pointer could be PointerOrInteger and therefore tested low two bits of every pointer before following it. If the two bits are 0, it can be followed normally; otherwise it is presumably a PointerOrInteger and the pointer must be retrieved through the Pointer property.

The disadvantage of this feature would be that the garbage collector would have to be aware of these special pointer types; code to deal with them might slow down the GC slightly, even in code that doesn't use the special pointers. However, this feature could probably be added without modifying CIL or the JIT, as the code to manipulate them could be neatly tucked away in PointerOrInteger.

SIMD Vector types

Microsoft ought to copy SIMD support from Mono. It's all explained here.

New features C# should have

The above features would need CLR changes, but many useful features need only language support.

Default interface implementations

Interfaces should be able to have default implementations. For instance, ICollection.Contains(T) could have a default implementation like this:

bool Contains(T item) default {
var c = EqualityComparer<T>.Default;
foreach(T t in this)
if (item.Equals(t))
return true;
return false;
}
Half the time this is exactly how one implements the Contains() method. Why require every collection author to write the same code? Of course, traits could also help us avoid code duplication and would be more powerful, so let me stick that on my list:

Traits

Read about them in this PDF.

Units

The traditional type system (roughly speaking) classifies data by its structure: an array holds data in a certain way; a linked list holds data in a different way. There is an orthogonal concept of type, the "unit" type, which says something about the data itself. For example, an integer may contain a quantity measured in bytes, kilobytes, kilometres, sectors or pixels. A double has a different physical structure than an int, but it could have the same unit type.

It is safe to add or subtract two values as long as they have the same units. I can add 10 bytes to 5 bytes and get 15 bytes, but if I add 5 bytes to 10 DWORDs there is probably an error in my code. Every engineer and scientist knows that if you multiply "2 kg" times "2 m/s^2", the result is "4 N", not "4 lbf" and code that computes the latter result might have destroyed the Mars Climate Orbiter.

A unit type checker like the one I wrote for boo would semi-automatically detect such errors and emit a warning message. Unit checking would work best if it had some support from the Microsoft BCL (Base Class Library). That's because a unit checker needs to track units as they pass through the BCL. For instance, if I write "Math.Abs(-7`pixels`), the unit checker has to be told that the return value of Math.Abs() has the same units as its argument. It would be nice if the BCL had attributes on its methods to provide this kind of information.

Tuples

C# should contain language support for tuples and "don't care" return values:

Tuple<int,int> DivMod(int n, int d)
{
return (n/d, n%d);
}
void Example()
{
int div, mod, five;
(div, mod) = DivMod(10, 3);
(five, _) = DivMod(25, 5); // "_" means "Don't care" (provided "_" is undefined)
}

Note that Tuple classes come with .NET Framework 4.0. For unpacking tuples, the compiler should not require the official Tuple class; instead it should allow any data type that has properties named Item1, Item2, etc. This would allow small tuples to be returned more efficiently in a structure:
struct Pair<A,B> {
public A Item1 { get; set; }
public B Item2 { get; set; }
}
Pair<int,int> DivMod(int n, int d)
{
return new Pair<int,int> { Item1=n/d, Item2=n%d };
}
void Example()
{
int div, mod;
(div, mod) = DivMod(10, 3);
}

Update: there are various other ways this idea could work, too. Some have proposed extending the concept of anonymous classes so they can be returned from methods, but this approach is problematic when independent DLLs are involved. Suppose one DLL return a (string Key, string Value) pair and an independent DLL accepts a pair with exactly the same signature. You should be able to pass the tuple directly from one DLL to the other, but that wouldn't work in practice because .NET doesn't support any kind of "structural typing".

For that reason I think an approach based primarily on the standard Tuple type would be best, with support for user-defined types when they are needed.

Inline variable declarations

The tuple feature would be more elegant if C# allowed variables to be declared where they are first used:
void Example()
{
(var div, var mod) = DivMod(10, 3);
}

When I first wrote this article, I said I would like to be able to just create variables implicitly like in boo:
int ReadNumber()
{
do { line = Console.ReadLine(); }
while (!int.TryParse(line, out value));
return value;
}

No doubt, however, this feature would create even more uproar than the "var" statement did. I think, though, that syntax like "int.TryParse(line, out var value)" is a reasonable compromise.

I have another idea for making code more concise and easier to optimize, namely, a ":" or ":=" operator to create "cached" values. You see, it's really easy to write code that repeats work unnecessarily:
int GetFoo(object key, int seed)
{
if (BTree[key].Record != null)
return BTree[key].Record.GetFoo(seed);
else
return seed;
}

Sure, the expression BTree[key].Record might be expensive to compute twice, but you'd have to do extra work to refactor it into a special local variable. Nobody wants that! So many/most developers will take the performance hit, and not worry unless it starts to get slow on their 3 GHz workstation with 8 GB of RAM. Too bad half of the users are on netbooks.

I strongly believe C# should make it easy to write fast code. This kind of case is the most common optimization I can think of that cannot be automated: only the developer knows if a result is safe to cache. So I propose this syntax for caching values:
int GetFoo(object key, int seed)
{
if (r:BTree[key].Record != null)
return r.GetFoo(seed);
else
return seed;
}

The "x:y" operator would simply be a more compact form of "var x = y" with a higher precedence than most other operators, to make it easy to use inside expressions.
Regrettably, "x:" cannot be allowed at the beginning of a statement because it could be mistaken for a goto-label (you'd have to use var x = y in that case.)

I chose this syntax because it minimizes clutter. It looks good without any whitespace around the colon, which is important to convey the idea that the operator has high precedence. High precedence, in turn, is important to reduce the number of cases where you need parenthesis, since parenthesis increase typing effort and reduce readability.

When used in an if-statement or loop expression, it's hard to say if the scope of the new variable (or value--maybe it should be read-only?) should extend outside the if-statement or loop expression. Perhaps if the user writes "var x = y" the scope should be limited to the loop (because the for and foreach loops already work that way), but if the user writes "x:y" then the scope should extend to the containing block. After all, there are many cases where you might want to use the cached value later on.

Cω Streams

This I'll make into a separate post.