Thursday, September 16, 2010

I want a "slide" operator

I've been writing imperative code for a long time and I noticed that I often extract a value from a variable, then change the variable immediately afterward. This is evident in numerous iterative algorithms, such as the fibonnaci algorithm:
int a = 0, b = 1, tmp;
for(;;) {
tmp = b;
b = a + b;
a = tmp;
Notice how "a" gets the old value of b, and "b" gets the old value of a + b.

Another pattern is that you want to do something the first time some code is reached:
if (!initialized) {
initialized = true;
Again, we read the value of the variable and then change it.

What if the user clicks a column on a ListView, then clicks a second column? Then the list should be sorted first by the second column, then by the column that was clicked earlier. I wrote some code for this earlier today:
public void SortByColumn(int column)
if (_primarySortColumn == column) {
_secondarySortAscending = !_secondarySortAscending;
_primarySortAscending = !_primarySortAscending;
} else {
_secondarySortColumn = _primarySortColumn;
_secondarySortAscending = _primarySortAscending;
_primarySortColumn = column;
_primarySortAscending = true;
Here, the pattern occurs in the "else" clause.

Since this pattern happens so much, I think there should be an operator for it, let's say "<-". A natural name for it would be the "shift" operator, but I'll call it the "slide" operator or maybe the "rotate" operator, to distinguish it from the bitwise left shift "<<". Using this operator, fibonnaci becomes
int a = 0, b = 1;
for(;;) {
a <- b <- a + b;
The operator is the same as the assignment operator, except that the result of the expression is the old value of the left-hand side, not the new value. So "a <- b <- a + b" is equivalent to "a <- (b <- a + b)", which is equivalent to "temp = b, b = a + b, a = temp". Note that "a <- b <- a + b" is also equivalent to "a = b <- a + b"; since the old value of "a" is discarded, it doesn't matter whether we use "=" or "<-" to change it.

The initialization code could be expressed more briefly with the slide operator:
if (!(initialized <- true))
And similarly for the sorting code:
_secondarySortColumn <- _primarySortColumn <- column;
_secondarySortAscending <- primarySortAscending <- true;

The semantics of the slide operator imply creation of a temporary variable, but probably in some cases the temporary can be eliminated if the compiler can determine that the order of operations doesn't matter, or that the temporary variable isn't being used.

The operator could also be used for swapping values:
a <- b <- a

A second temporary variable is needed in case the left-hand side is expensive or has side-effects, in which case it should only be evaluated once. For example, if the code is
savedValue <- GetActiveItem().Value <- savedValue
then GetActiveItem() shouldn't be called twice, so another temporary must be introduced (the exact details depend on the programming language):
temp1 = GetActiveItem();
temp2 = temp1.Value;
temp1.Value = savedValue;
savedValue = temp2;
Now when can I get my slide operator in a real language?

Wednesday, August 11, 2010

DI and Pervasive services

I have been learning recently about Dependency Injection for loose coupling (see for example these articles).

DI is really very simple. When a class X is designed for DI, it means that
  1. X itself avoids specifying what other classes it depends on. For example, it avoids calling "new Y()" or using a singleton class, and it accesses the services it needs through interfaces (or, occasionally, abstract classes) rather than references to concrete types.
  2. X's dependencies are exposed explicitly in the constructor, or by properties of X, so that whoever creates/manages X can choose the dependencies.
Because X doesn't choose its own dependencies, a third party can do so on its behalf. This makes it possible to swap out a service that X needs for some other service without changing X itself, which may allow X to be used in more situations than it was designed for. For example, if X needs a database service, it would take a reference to some database-related interface (e.g. IQueryable<T> or whatever, I don't know, I don't use databases much myself) in its constructor. Now X may have initially been intended to use a MySQL database, but because X doesn't open the database connection itself, you may be able to switch to PostgreSQL or Oracle without changing X at all. For unit testing you might not want to use a real database, so you could use a "mock object" that contains simulated data, based on LINQ-to-XML or something.

Often, there are often many levels of dependency. For example, a program for editing documents may have a main window, which contains a document editor, and toolbars/menus that depend on the document editor. The editor in turn depends on a document object and various user interface services. The document may depend on services for undo/redo, various data structures, and disk access, while the user interface services may depend on other data structures, drawing services, spatial analysis algorithms... who knows.

Often, in an app designed with DI, all the different components are wired together in a single place if possible, so that you can look in one place to see how components are connected and dependent on each other.

As the application grows more complex, the code that initializes all these objects via dependency injection also grows more complex. Eventually, you may get to a point where an IoC/DI framework like Ninject or Windsor or (my tentative favorite) Autofac can simplify all that initialization work.

But there are some services that are pervasive, services that you would have to pass to a hundred different constructors if you want to use DI "properly". Some examples are localization (to provide French and Spanish translations), logging (almost any component might want to write a diagnostic message), profiling (to gather performance statistics), and possibly "config options" (so end-users or admins can configure multiple components through a command-line, xml file or other source). In a compiler, a service for error/warning messages might be used all over the place. In Loyc, there might ultimately be hundreds of components that need to create AST Nodes or use other "pervasive" services.

It looks to me like passing such common services to constructors is more trouble than it's worth. If a component of Loyc may produce a warning message, is user-configurable, creates new AST nodes, and needs localization support, that's 4 constructor arguments just for "pervasive" services, never mind the more important, "meaty" services that it might need, like a parser, graph algorithm or whatever. If you use constructor injection, dozens of constructors are starting to smell.

How can we avoid the burden of passing around lots of pervasive service references? I'm not sure what the best way is. I have an idea in mind, but it is not appropriate for all cases. My idea involves a global singleton that can be swapped out temporarily while performing a specific task. I tentatively call it the Ambient Service Pattern.

For instance, in a compiler, the error/warning service might print to the console by default, or to an output window, but certain types of analysis might be "transactional" or "tentative": if an error occurs, the operation is aborted, and no error is printed, although if a warning occurs, it is buffered and printed if the operation succeeds. A concrete example of this scenario is the C++ rule known as SFINAE. Template substitution may produce an error, but if that error occurs during overload resolution, it is not really an error and no message should be printed. Given a global error service, we can model this rule by switching to a special error service, performing the operation, and then switching back afterward.

To implement this pattern, use a thread-local variable alongside the interface to manage the service:
interface IService { ... }

class Service
static ThreadLocalVariable<IService> _cur =
new ThreadLocalVariable<IService>();

// Current service
public static Cur { get { return _cur.Value; } }

// Use to install a new service temporarily, or to install the
// initial service when the program begins
public static PushedTLV<IService> Push(IService newValue)
{ return new PushedTLV<IService>(_cur, newValue); }
Depending on the app, there could be different threads doing independent work, which is why the thread-local variable is needed.

Note: if you need to support threads that fork, there is a problem because .NET thread-local variables cannot inherit their value from the parent thread. To work around this problem in Loyc I made a whole infrastructure for thread creation, with a ThreadEx to wrap the standard Thread class, and a ThreadLocalVariable<T> class to be used instead of [ThreadStatic], which registers itself in a global weak reference collection so that when a thread is created with ThreadEx, the values of all ThreadLocalVariables can be propagated from the parent thread to the child thread (custom propagation behavior is also supported). Obviously, this workaround is a huge pain in the ass since all code must agree to use ThreadEx and the new .NET stuff like the "Parallel Extensions" won't propagate the thread local variables. I guess to properly support .NET 4, I should try to find a new solution based on ThreadLocal<T>.

PushedTLV is a helper class that changes the value of a thread-local variable until it is disposed. Use it like this:
using (var old = Service.Push(newService)) {
// Perform an operation that will use the new service
// old service is automatically restored

An unfortunate consequence of this pattern is that the interface appears to be coupled to the thread-local variable. Sure, you could still write a class X that has a constructor-injected IService, but this might confuse those that indirectly use X: if someone changes Service.Cur, they might assume all code that needs an IService will use Service.Cur, but X could be using a different IService.

While the Ambient Service Pattern doesn't work like traditional dependency injection, it still follows the spirit of DI because components remain independent from a specific implementation of the pervasive service.

In the version of the pattern seen here, the service provided acts as a singleton. Of course, if the service is something that can be instantiated on-demand (e.g. a data structure), the thread-local variable would hold a factory instead of a singleton. The "Push()" method would switch to a different factory instead of a different instance, and there would be a "New()" method instead of a "Cur" property.

I wonder if the .NET Framework itself should adopt this kind of pattern. Consider what you do to open a file: you call File.Open() or make a FileStream, right? Now what if management decides "we need our app to be able to open files from an ftp site". Rather than change the code that calls File.Open() (and what if a third-party library is calling it?), wouldn't it be more elegant if you could swap in a new file management service that understands FTP sites (and perhaps maintains a local disk cache)? And hey, what if you want to change your console app to use a graphical console with icons and proportional fonts? What if you decide Debug.WriteLine needs to store a log somewhere?

Here's my implementation of PushedTLV, but as I mentioned, it's based on my own special ThreadLocalVariable class.
/// <summary>Designed to be used in a "using" statement to alter a 
/// thread-local variable temporarily.</summary>
public class PushedTLV<T> : IDisposable
T _oldValue;
ThreadLocalVariable<T> _variable;

public PushedTLV(ThreadLocalVariable<T> variable, T newValue)
_variable = variable;
_oldValue = variable.Value;
variable.Value = newValue;
public void Dispose()
_variable.Value = _oldValue;

public T OldValue { get { return _oldValue; } }
public T Value { get { return _variable.Value; } }
Do you have another idea for managing pervasive services with minimum fuss? Do tell.

Thursday, June 10, 2010

Cω stream flattening

The Microsoft research language Cω (C-omega), which predates C# 3.0 (and even 2.0?), extended C# with a number of interesting features, of which C# 3.0 includes several comparable features. To me the most interesting thing in Cω was streams, which are roughly just lists of things in IEnumerables. For instance, the stream "int*" basically means "IEnumerable<int>", but they allowed you to manipulate lists of things without using explicit loops:
int* FromTo(int from, int to)
for (i = from; i <= to; i++) yield return i;
void Example()
Console.WriteLine(string.Join(", ",
ToArray(FromTo(1, 10)[it % 2 == 0].ToString())));
// Output: 2, 4, 6, 8, 10
Luckily, LINQ provides similar features in C# 3.0, just with a different syntax:
IEnumerable<int> FromTo(int from, int to)
for (int i = from; i <= to; i++) yield return i;
void Ex()
Console.WriteLine(string.Join(", ",
(from i in FromTo(1, 10) where i % 2 == 0 select i).ToString()).ToArray()));

There's one thing missing from C# 3 that I would really like, though: flattening. Quite simply, you can combine Cω streams easily with no performance hit. The Cω guys illustrated this in a paper with a silly, but illustrative, alternative to FromTo():
int* FromToB(int b, int e)
if (b>e) yield break;
yield return b;
yield return FromTo2(b+1,e);
They explain tersely:
Without flattening we would be forced to copy the stream produced by the recursive invocation, leading to a quadratic instead of a linear number of yields:
int* FromToC(int b, int e)
if (b>e) yield break;
yield return b;
foreach (int i in FromTo3(b+1,e)) yield return i;

Basically, FromToC does the same thing as FromToB, except that it is extremely slow. The reason is that when you enumerate over FromToC, a new enumerator is created on every iteration, and because the enumerators are nested, each enumerator that exists must be called in sequence to enumerate a single value.

This is perhaps better illustrated by the following practical example, which gives you a list of all subfolders of a folder:
public IEnumerable SubFolders(string path)
yield return path;
foreach (string dir in Directory.GetDirectories(path))
foreach (string subfolder in SubFolders(dir))
yield return subfolder;

Suppose you write
var folders = SubFolders(@"C:\").GetEnumerator();
while (folders.MoveNext())

Imagine that somewhere in the middle of this process, the enumerator returns "C:\Windows\System32\drivers". This does not happen directly. First the MoveNext() method of the enumerator for "C:\" is called. That enumerator calls MoveNext() on its child enumerator, "C:\Windows". The child enumerator then calls MoveNext() on its child, "C:\Windows\System", which advances to "C:\Windows\System32\drivers". Thus, for all subfolders that are 3 levels deep, the MoveNext() method is actually called 3 times. I hope you can see that the code will get slower as the directory structure gets deeper--not just because the strings are longer, but because the MoveNext() gets called one extra time for every level of nesting.

Stream flattening is a feature of Cω that removes all the intermediate calls to MoveNext(), so that MoveNext() and Current are only called once for every folder returned, while at the same time making the code simpler:
public IEnumerable SubFolders(string path)
yield return path;
foreach (string dir in Directory.GetDirectories(path))
yield return SubFolders(dir);

Stream flattening and the "*" syntax make it more convenient to write coroutines. Coroutines are useful for numerous purposes, but perhaps the most obvious is game programming. In games you need to program numerous "actors", each with its own behavior. Actors' behavior can be described in a linear way. For instance an enemy might have logic like this:
- Wander around until player is within 10 metres
- Attack the player while my health > 25%
- Retreat until distance from player exceeds 50 metres
- Regenerate health at 1% per second
- If at any time my health reaches 0%,
- run death animation
- wait 10 seconds
- despawn the body

Logic like this could be expressed straightforwardly using streams... something like this:
class Enemy : AbstractEnemy
public override IAction* EnemyLogic()
// The game engine resumes this method every frame
foreach(var a in NormalBehavior()) {
yield return a;
if (Health <= 0)
yield return DeathAndUnspawn();
IAction* NormalBehavior()
for(;;) {
while(Distance > 10) {
yield return Wander();
Health = Math.Min(Health + 0.01 * TimePerFrame, 1.0);
while(Health > 0.25)
yield return Attack();
while(Distance < 50)
yield return Retreat();
IAction Attack() { ... }
IAction Retreat() { ... }
IAction Wander() { ... }

IAction* DeathAndUnspawn()
yield return DeathAnimation();
yield return Wait(10.0);
yield return Despawn();
IAction* DeathAnimation() { ... }

While you can certainly do this in regular C#, streams improve performance and make the syntax more convenient.

Of course, neither Cω streams nor C# generators were actually designed for coroutines; any function that contains code that can be "paused" must return some sort of IEnumerable, which can be inconvenient sometimes. For one thing, a pausable function cannot easily return a value to its immediate caller, only to its "highest-level" caller (such as the game engine).

But, since the .NET framework doesn't support genuine coroutines, this is the only option right now. Mono recently introduced continuations that it calls "tasklets", which can also be used to write coroutines (but only under Mono, not Microsoft .NET); Since they apparently duplicate and restore the entire call stack, though, I am concerned that Tasklets might have poor performance.

I don't actually know how the flattening is implemented; I'll be sure to update this entry if I figure it out.

Friday, May 21, 2010

New features the .NET framework should have

I recently wrote a class called CPTrie that stores a sorted collection of strings or integers in less space than a Dictionary or SortedDictionary. It took a long time to develop this data structure in .NET while minimizing memory and CPU usage. I am fairly convinced that without some of .NET's restrictions, I could have made an even more compact data structure.

Allow me to suggest a few features that would make .NET better by allowing it to do certain tasks more efficiently, or by giving coders more flexible design options.

It is already easier to write efficient code in .NET than Java, of course, thanks to features like structs (value types), Generics and compiled expression trees. But it could be better.


I love the type safety that .NET provides, but people seem to forget that type safety doesn't have to be a straightjacket. They think, for instance, that pointers automatically kill type safety, apparently forgetting that "references" are nothing but neutered pointers. Pointers aren't inherently dangerous, rather it is certain operations that are dangerous, like reading or writing past the end of an array. If your language can prevent you from doing dangerous things with pointers then they suddenly become safe.

Google Go recognizes this, and provides a safe form of pointer called a slice. A slice is a pointer to an array that stores the array size with the pointer. This is not the same as a .NET array reference, which always points to the beginning of the array. A slice can point to a subset of an array, and if .NET had slices, all functions of the form
int Sum(int[] array, int startIndex, int endIndex) {...}

could be replaced simply with
int Sum(int[..] array) {...}

And you would probably call the function with a nice syntax like
int result = Sum(array[startIndex..endIndex]);
// or Sum(array) to pass the whole array
Plus, these functions often come with a "convenience overload" to process the entire array...
void Foo(int[] array) { Foo(array, 0, array.Length); }
...which would not be necessary with slices.

Slices can be more efficient than the traditional triplet (array, start, end) for three reasons:
  1. A slice is two values (pointer + length). Passing it around is faster than passing around three values.

  2. Triplets use 50% more memory (relevant if you need to store many of them)

  3. All functions that access an array through a triplet must check the array index on every access. For instance, in today's .NET, Sum() might be written like this:
    void Sum(int[] array, int startIndex, int endIndex)
    // I avoid using the "less than" sign because Blogger is still
    // terribly buggy after several years. Lazy bums.
    if (0>startIndex || startIndex>endIndex || endindex>array.Length)
    throw new ArgumentException();

    int total = 0;
    for (int i = startIndex; array.Length > i; i++)
    total += array[i];
    return total;

    With slices, the code is shorter and the JIT optimizer could do a better job:
    void Sum(int[..] slice)
    int total = 0;
    for (int i = 0; slice.Length > i; i++)
    total += slice[i];
    return total;

    Microsoft .NET can detect that a for loop goes from 0 to the array length, and optimize out the range check. It should be easy to do the same for slices, which could make a method like this significantly faster, yet still perfectly type-safe. Meanwhile, the JIT is unable to optimize the first loop (the one that uses an array) because it is too complex, and there may be additional overhead because the JIT can only hold 2 or 3 variables in registers.

Slices would be even more useful if you could cast them in type-safe ways: for instance you should be able to cast a int[4] slice to a short[8] slice, and you should be able to cast a short[7] to an int[3] (there is the possibility of an alignment exception in the latter case on certain processors, but it is still "safe" in the sense that memory corruption is impossible). Slice casts would be extremely useful for parsing various binary data blobs (e.g. binary files, network packets), which are difficult to deal with in .NET right now.

Such casts are perfectly safe, provided that casts involving reference types are prohibited.

There's one disadvantage of allowing the slice cast: the optimizer would have to assume that a value-type slice may point to the same physical memory as a different kind of slice or array. But C has the same problem and it still wins the speed race.

One roadblock to implementing slices in .NET may be a challenge to the garbage collector. In .NET today all references point to the beginning of an object. Slices can point to the middle of an object, which means that the garbage collector needs a way to find the beginning of an object given a pointer to the middle. If this is not possible, slices will have to be expanded to a triplet: a third value is required, pointing to the beginning of the object being kept alive by the slice.

A slice in triplet form would still have a performance advantage over normal (array, index, length) triplets, because while the pointer to the beginning of the object would have to be passed around, it never has to be accessed except by the garbage collector. On x86 this means it can be held only on the stack and not in a register, saving scarce registers for other things.

Programmers should be able to get the slice for an entire array given a slice of just part of the array. However, if this is allowed, note the security implication: anyone returning a slice to a partial array must be aware that the receiver can get access to the entire array.

Slices that point to null are also possible. .NET would have to guarantee that null slices always have a Length of 0. Note that null slices are advantageous over null array references, because you can access the Length field of a slice without risking a NullReferenceException, so you need not write code to handle both "null" slices and "zero-length" slices.

Fixed-size arrays

In some cases it would be really nice to have zero-overhead fixed-size arrays. For instance, I have seen people representing points like this:
class Point {
private float[] _coords = new float[3];
public float X { get { return _coords[0]; } set { _coords[0] = value; } }
public float Y { get { return _coords[1]; } set { _coords[1] = value; } }
public float Z { get { return _coords[2]; } set { _coords[2] = value; } }
public float this[int n] {
get { return _coords[n]; } set { _coords[n] = value; }
This is very inefficient if you need to keep track of large numbers of points, since the 12 bytes of coordinate data is accompanied by 20 bytes of overhead: 8 bytes for the class and 12 bytes for the array. Moreover, all access to the X, Y and Z properties is guarded by an array range check, even though the subscripts and array size are constants. Only the indexer should require a range check.

To solve these problems, .NET should offer "safe" fixed-size arrays that hold the array directly in the container class or struct. This would save 12 bytes of overhead and allow faster access to the array, as well as faster object construction.

You can already use fixed-size arrays in .NET, but they are considered "unsafe" and have the following disadvantages:
  • They require you to sprinkle the "unsafe" keyword all over the place.
  • You have to set a compiler option to allow "unsafe" code.
  • Your code cannot run in secure environments, such as inside a web browser.
  • Fixed-size arrays cannot hold references to classes, or structs (only primitives are allowed).

Provided that there are array index range checks, there is really nothing unsafe about a fixed-size array, and no reason they can't hold references or structs. The reason for these silly restrictions (I think) is that the CLR doesn't have built-in support for fixed-size arrays, so C# resorts to pointer arithmetic, which is unverifiable to the CLR (and in that sense "unsafe").

Fixed-size arrays would work very nicely with slices, so that you could offer access to the array via a property:
class Point {
private fixed float _coords[3];
public float[..] Coords { get { return _coords; } }
However, this would only be legal inside classes. A struct could not return a slice of itself, since a struct could cease to exist before the slice does.

Go interfaces

In Go, classes do not have to explicitly say what interfaces they implement, and anybody that notices a pattern in the methods of two otherwise unrelated classes can declare that they both implement a certain interface.

Go's interfaces provide compile-time type checking, and since Go is built heavily around these "implicit" interfaces, I assume Google found a way to give them high performance similar to .NET interfaces. As such I think Microsoft would be smart to consider adding support for Go-style interfaces, which in turn would allow a fast implementation of the Go language on the CLR.

Go-style interfaces would allow developers to work around limitations in the way Microsoft designed interfaces. For instance, there are a lot of read-only collection classes (or collections like Stack where modifications are constrained), but Microsoft didn't bother to make a IReadOnlyCollection interface, so we're stuck writing four unsupported methods (Add, Clear, Remove, CopyTo) in addition to the three that matter (GetEnumerator, Count, Contains).

With Go interfaces, you can define your own private IReadOnlyCollection interface which all the BCL collections, and any collections made by third parties, automatically implement.

(I am not, of course, suggesting that traditional interfaces be abandoned. They are too well entrenched and, I have learned, are faster in some cases anyhow.)

Update: it turns out this feature would be insufficient to support Go. In Go you can define new methods for classes defined elsewhere (or for primitive types, pointers, etc.) These could be compared to extension methods, but Go is not object-oriented and it's not really the same. In Go, as I understand it, an interface can bundle together methods defined in multiple modules that happen to operate on the same data type. I don't understand how it works - I haven't been able to find information on their dispatch mechanism. Anyway, the point is, if MS really wanted to support Go on .NET they'd have to study it more carefully than I have.

Go interfaces are comparable to Visual Basic 9's new "dynamic interfaces", except that in Go you get a error when you try to cast an object to an interface that the object does not support. In VB an exception occurs when you call an unsupported method on the interface, not during the cast.

Return type covariance

If class Circle is derived from class Shape, then the following code is perfectly safe and should be legal:

abstract class ShapeFactory {
abstract public Shape New();
abstract class CircleFactory {
override public Circle New() { ... }
I have encountered many situations where covariance would be useful, and it pains me to have to find a different design or implement a workaround just because Microsoft won't add this trivial feature.

Thread-local variable inheritance

Just a small pet peeve. When you create a child thread, it loses all the values of thread-local variables from the parent. There should be an option to copy values from the parent thread. I heard a rumor that somebody has a patent on this trivial idea, though.

Update: My own Ambient Service Pattern cannot work reliably without this feature.

Microsoft doesn't actually have to infringe that ridiculous patent to support the feature, either: they can simply (1) fire an event on each child thread when it starts, while blocking the parent thread, and (2) provide a way for the child thread to access thread-local variables of the parent thread. This way, Microsoft is merely giving developers the tools they need to make it possible to infringe the patent. Possible is a big improvement over impossible, friends.

Binary-comparing two value types

Another trivial feature that Microsoft left out of .NET is the ability to bitwise-compare two values. While developing the VList data structures for .NET, I really wanted this feature. There's no reason the 'ceq' opcode shouldn't support it, and as far as I can tell the ECMA-335 standard does not state whether 'ceq' can be used on arbitrary value types. However, C# doesn't allow it so I assume it's not supported.

It's true that bitwise equality isn't "perfect". For instance, floating point has a "positive zero" and a "negative zero" that compare equal, but are bitwise unequal. However, if two value types ARE bitwise equal then they are the same for all practical purposes, and it's hard to think why you would waste time calling Equals() (which, in structs, uses reflection by default!)

In the case of reference types you can call object.ReferenceEquals to find out if two references are the same, but what if you are writing generic code that handles both values and references? In that case you cannot use object.ReferenceEquals, which is why a bitwise comparison operator would be so useful.

A concrete example of this occurs in my VList code, which has a "Smart Select" method:

public FVList<T> SmartSelect(Func<T, T> map)
This method of FVList passes each element of the FVList to the map() function. If, for every element of the list, map() returns the same value as it was given, SmartSelect() can simply return the original list instead of a new one. The ability to do this is very important for efficient functional programming; unfortunately, .NET screws it up by making it difficult to tell whether map() returned its argument unchanged or not. We have to use virtual function calls, and maybe even reflection, to tell whether the output is the same as the input. That's stupid.

Integers in pointers

Okay, what I'm about to suggest may sound crazy. You have been warned.

In .NET, all references have 0s in the bottom two bits of the pointer. This is also true for C++ pointers to heap objects, and the Ruby language interpreter and the Judy array library use this fact to store a kind of safe union in a pointer:
union {
Object* pointer;
int integer;
If the two lowest bits are 0, the value is a pointer, otherwise it is an integer of some kind. A similar technique is to use the lowest bit to discriminate a union between two pointer types:
union {
Object0* pointer0; // if low bit is 0
Object1* pointer1; // if low bit is 1
While abusing pointers like this in C is dangerous (since you could easily dereference an integer by mistake, etc.), formalizing the technique can make it safe (just as the Ruby language is safe from memory corruption despite using this kind of union extensively).

In C#, the second union would be almost pointless since objects are self-typed, but it might be useful instead if you could use the low bit of a pointer as a boolean flag. For instance, .NET's SortedDictionary uses the following node structure:
internal class Node
private bool isRed;
private T item;
private Node left;
private Node right;
There's something to be said for a straightforward implementation like this, but if somebody's using this class extensively with a large data set, they might appreciate the memory savings (up to 17%) that would come from hiding the "isRed" flag inside the 'left' or 'right' reference.

Of course, this technique is pretty obscure, and is rarely needed, yet occasionally it is the only efficient technique to do the job. In Ruby this feature is key to the type system that unifies integers and objects. In Ruby "everything is an object", but storing integers on the heap is very inefficient. Although all Ruby variables are implemented as pointers, if you store a small integer in a variable, rather than allocating a heap object, Ruby just encodes the integer right into the pointer. Can you think of an approach that is remotely as efficient? Similarly, the Judy array's ability to store data compactly is heavily reliant on the fact that it can change the meaning of a "pointer" based on its low bits. I wanted to port Judy to .NET, and struggled for quite some time before concluding that approximating Judy is impossible in .NET.

To minimize changes to the CLR, I would propose a single pointer union be defined, in which the 2 low bits are given the following meanings:
  • 00: A pointer with a "false flag"
  • 10: A pointer with a "true flag"
  • 01 and 11: An integer (shift right 1 to get its value)
To store these values, the .NET BCL would define a special structure:
struct PointerOrInteger<T> where T:class
object _ptr; // not directly accessible
bool IsInteger { get; } // returns ((int)_ptr & 1) != 0
T Pointer { get; set; } // getter returns IsInteger ? null : _ptr & ~3
IntPtr Integer { get; set; } // getter returns IsInteger ? (IntPtr)_ptr >> 1 : 0
int Int32 { get; set; } // getter returns IsInteger ? (int) _ptr >> 1 : 0
bool Flag { get; set; } // getter returns (_ptr & 2) != 0

The garbage collector could use reflection/metadata to detect the presence of PointerOrInteger, in order to single it out for special treatment. However, it might be simpler if the GC assumed any pointer could be PointerOrInteger and therefore tested low two bits of every pointer before following it. If the two bits are 0, it can be followed normally; otherwise it is presumably a PointerOrInteger and the pointer must be retrieved through the Pointer property.

The disadvantage of this feature would be that the garbage collector would have to be aware of these special pointer types; code to deal with them might slow down the GC slightly, even in code that doesn't use the special pointers. However, this feature could probably be added without modifying CIL or the JIT, as the code to manipulate them could be neatly tucked away in PointerOrInteger.

SIMD Vector types

Microsoft ought to copy SIMD support from Mono. It's all explained here.

New features C# should have

The above features would need CLR changes, but many useful features need only language support.

Default interface implementations

Interfaces should be able to have default implementations. For instance, ICollection.Contains(T) could have a default implementation like this:

bool Contains(T item) default {
var c = EqualityComparer<T>.Default;
foreach(T t in this)
if (item.Equals(t))
return true;
return false;
Half the time this is exactly how one implements the Contains() method. Why require every collection author to write the same code? Of course, traits could also help us avoid code duplication and would be more powerful, so let me stick that on my list:


Read about them in this PDF.


The traditional type system (roughly speaking) classifies data by its structure: an array holds data in a certain way; a linked list holds data in a different way. There is an orthogonal concept of type, the "unit" type, which says something about the data itself. For example, an integer may contain a quantity measured in bytes, kilobytes, kilometres, sectors or pixels. A double has a different physical structure than an int, but it could have the same unit type.

It is safe to add or subtract two values as long as they have the same units. I can add 10 bytes to 5 bytes and get 15 bytes, but if I add 5 bytes to 10 DWORDs there is probably an error in my code. Every engineer and scientist knows that if you multiply "2 kg" times "2 m/s^2", the result is "4 N", not "4 lbf" and code that computes the latter result might have destroyed the Mars Climate Orbiter.

A unit type checker like the one I wrote for boo would semi-automatically detect such errors and emit a warning message. Unit checking would work best if it had some support from the Microsoft BCL (Base Class Library). That's because a unit checker needs to track units as they pass through the BCL. For instance, if I write "Math.Abs(-7`pixels`), the unit checker has to be told that the return value of Math.Abs() has the same units as its argument. It would be nice if the BCL had attributes on its methods to provide this kind of information.


C# should contain language support for tuples and "don't care" return values:

Tuple<int,int> DivMod(int n, int d)
return (n/d, n%d);
void Example()
int div, mod, five;
(div, mod) = DivMod(10, 3);
(five, _) = DivMod(25, 5); // "_" means "Don't care" (provided "_" is undefined)

Note that Tuple classes come with .NET Framework 4.0. For unpacking tuples, the compiler should not require the official Tuple class; instead it should allow any data type that has properties named Item1, Item2, etc. This would allow small tuples to be returned more efficiently in a structure:
struct Pair<A,B> {
public A Item1 { get; set; }
public B Item2 { get; set; }
Pair<int,int> DivMod(int n, int d)
return new Pair<int,int> { Item1=n/d, Item2=n%d };
void Example()
int div, mod;
(div, mod) = DivMod(10, 3);

Update: there are various other ways this idea could work, too. Some have proposed extending the concept of anonymous classes so they can be returned from methods, but this approach is problematic when independent DLLs are involved. Suppose one DLL return a (string Key, string Value) pair and an independent DLL accepts a pair with exactly the same signature. You should be able to pass the tuple directly from one DLL to the other, but that wouldn't work in practice because .NET doesn't support any kind of "structural typing".

For that reason I think an approach based primarily on the standard Tuple type would be best, with support for user-defined types when they are needed.

Inline variable declarations

The tuple feature would be more elegant if C# allowed variables to be declared where they are first used:
void Example()
(var div, var mod) = DivMod(10, 3);

When I first wrote this article, I said I would like to be able to just create variables implicitly like in boo:
int ReadNumber()
do { line = Console.ReadLine(); }
while (!int.TryParse(line, out value));
return value;

No doubt, however, this feature would create even more uproar than the "var" statement did. I think, though, that syntax like "int.TryParse(line, out var value)" is a reasonable compromise.

I have another idea for making code more concise and easier to optimize, namely, a ":" or ":=" operator to create "cached" values. You see, it's really easy to write code that repeats work unnecessarily:
int GetFoo(object key, int seed)
if (BTree[key].Record != null)
return BTree[key].Record.GetFoo(seed);
return seed;

Sure, the expression BTree[key].Record might be expensive to compute twice, but you'd have to do extra work to refactor it into a special local variable. Nobody wants that! So many/most developers will take the performance hit, and not worry unless it starts to get slow on their 3 GHz workstation with 8 GB of RAM. Too bad half of the users are on netbooks.

I strongly believe C# should make it easy to write fast code. This kind of case is the most common optimization I can think of that cannot be automated: only the developer knows if a result is safe to cache. So I propose this syntax for caching values:
int GetFoo(object key, int seed)
if (r:BTree[key].Record != null)
return r.GetFoo(seed);
return seed;

The "x:y" operator would simply be a more compact form of "var x = y" with a higher precedence than most other operators, to make it easy to use inside expressions.
Regrettably, "x:" cannot be allowed at the beginning of a statement because it could be mistaken for a goto-label (you'd have to use var x = y in that case.)

I chose this syntax because it minimizes clutter. It looks good without any whitespace around the colon, which is important to convey the idea that the operator has high precedence. High precedence, in turn, is important to reduce the number of cases where you need parenthesis, since parenthesis increase typing effort and reduce readability.

When used in an if-statement or loop expression, it's hard to say if the scope of the new variable (or value--maybe it should be read-only?) should extend outside the if-statement or loop expression. Perhaps if the user writes "var x = y" the scope should be limited to the loop (because the for and foreach loops already work that way), but if the user writes "x:y" then the scope should extend to the containing block. After all, there are many cases where you might want to use the cached value later on.

Cω Streams

This I'll make into a separate post.

Thursday, May 6, 2010

Why C# is better than C++

I could phrase this as "why I hate C++" or "why C++ sucks", but let's try the more positive spin, "why C# is better". The reasons are so numerous and compelling that there is only one reason to use C++ instead, and that is better performance.
  1. C# compiles much faster
  2. IntelliSense is much more reliable and faster (press F12 in Visual Studio to see the definition of any symbol)
  3. Automatic memory management cuts your development time in half all by itself - not just because you write less code, but also because you'll never have to track down "double free" problems and you'll very rarely have memory leaks (and if do you have memory leaks, CLR profiler can help track them down)
  4. No weird errors caused by #include order or #defines
  5. No more buffer overflows or other C-related security vulnerabilities
  6. Debugging is much easier; you can execute arbitrary expressions and call your own functions and properties from within Visual Studio's debugger (and SharpDevelop, I expect)
  7. Strings are handled the same way in all code (no more converting between various string representations)
  8. C# has anonymous inner functions with type inference (but the newest version of C++ has so-called "lambdas" too)
  9. GUIs are easier to make in C# (at least if you use WinForms. I found WPF very hard to learn)
  10. LINQ (Language INtegrated Query)
  11. "yield return" statement for writing generators and coroutines (approximate C equivalent)
  12. You can create and compile code at run-time using Reflection.Emit, Dynamic Methods or (easiest) LambdaExpression.Compile. These use the JIT engine to produce new machine code at run-time.
  13. The .NET standard libraries ("BCL") have more capabilities than those of C++, and the STL is more cumbersome (object->my_vector.erase(object->my_vector.begin() + index), anyone?)
  14. Unlike C++ templates, C# generics are guaranteed to work for all type parameters, do not bloat your code size, and can be used by modules linked dynamically (mind you, C++ templates can do some things that are hard/impossible for generics, but advanced use of templates is difficult)
  15. The MS C# compiler gives much better error messages than the MS C++ compiler
  16. You can mix different programming languages much more easily in .NET. Bjarne Stroustrup said, "I consider the idea of one language, one programming tool, as the one and only best tool for everyone and for every problem infantile"--yet C++ isn't designed to inter-operate with any language other than C. You can use SWIG if necessary, but it's got a big learning curve and C++ can't take credit for it anyway. On Windows, COM is a possible solution, but it's a huge pain to write COM classes in C++.
  17. There are various tools for analyzing and modifying .NET assemblies/programs after they are compiled, e.g. PostSharp provides aspect-oriented programming and Microsoft Code Contracts let you specify preconditions, postconditions and invariants in your classes.
  18. You can easily see how the standard libraries work in their binary form, using Reflector (the source code of the BCL is also available).
  19. You can write "safe" code that can run directly in a web browser (Silverlight)
  20. It is possible to write a C# program that targets a mobile device (ARM) and run the same binary on your desktop PC
  21. You'll no longer have to manage *.h files and write every function declaration twice. Well, you can save yourself work by leaving short"inline" functions in the header file, but you'll pay for it later with slower compile times.

    The need to use a different syntax for member functions in the header file than the implementation is a huge pet peeve of mine. Consider the difference between the header file declaration "virtual std::string Name(bool longForm = false);" and the cpp file equivalent "string ClassName::Name(bool longForm) { ... }". In C++ I'm expected to manually remove "virtual", "= false" and the semicolon, but add "ClassName::". Plus you might want to remove "std::" if you're "using namespace std" in the cpp file. Doing all this a few dozen times in a day can drive me mad, and of course the two copies make maintenance harder too.
  22. Dynamic linking and reflection make it easy to support plug-in architectures, and to use 3rd party libraries without compiling them yourself.
  23. No more dependency-detection glitches where you change a struct or virtual function table but not all dependencies are recompiled, leading to bizarre and unpredictable run-time behavior. In C# I never see such glitches, and if they did happen you would get a run-time exception rather than weird crashes or strange behavior.
  24. C# IDEs support automatic refactoring and Visual C# underlines syntax and semantic errors as soon as you make them. No such luck in C++!
Unfortunately, I still have to use C++! It is still the performance king and the standard on Windows platforms, and I maintain a performance-critical project for WinCE, on which .NET performs poorly.

Some say C# is slower, but I find that if you write code carefully (more like you would in C), you can make C# almost as fast as C. The Microsoft .NET JIT compiler does not optimize nearly as well as a C compiler, but I find that C# code generally runs faster than a debug build of equivalent C/C++ code.

Unfortunately, whereas C++ code may be 20 times slower when it runs on mobile (ARM-based) devices compared to a desktop PC, C# on the Compact Framework seems to be closer to 100 times slower than the same C# code running on a desktop PC. Microsoft needs to put a lot more work into their ARM version! Also note that Compact Framework has fewer features (e.g. no support for run-time code generation).

Update: Long after writing this blog post I made a C++ vs C# benchmark. The conclusions were exactly as I expected/feared: C# is often almost as fast as C++ (but about half as fast in certain cases), while the Compact Framework is 3-11 times slower than C++.

Now as much as I hate C++, and prefer C# over Java, the .NET Framework is far from perfect. Some parts of the BCL were badly designed, and by now it is getting to be extremely bloated. Also, I think the .NET framework needs some new features. Chief among my requests would be Go-style slices and interfaces, and return type covariance (of course). This would bring a lot of C's big-fiddling prowess to C# without compromising type safety. You know, that deserves its own blog entry.

Friday, February 26, 2010

The Compact Patricia Trie

I am not quite sure why I did it, but after learning about Judy, I had this obsession with bringing something comparable to .NET. I basically failed, so then instead I made a crazy data structure called a CPTrie. It's like a normal trie, but really compact! And much slower, I imagine. So I spent the last 3 days working on a CodeProject article about it. Enjoy! It is included in Loyc.Utilities.dll.