Tuesday, July 17, 2012

Constructors Considered Harmful

Ahh, constructors, constructors. In my opinion the constructor design in most languages including C++, C#, D and Java is flawed, because it exposes an implementation detail that should not be exposed, namely, which class gets allocated and when.

I offer you as "exhibit L" the Lazy<T> class in the .NET framework. This class's main purpose is to compute a value the first time you access it. For example:
 int x;
 Lazy<int> lazy = new Lazy<int>(() => 7 * x);
 x = 3;
 x = lazy.Value; // 21
 x = lazy.Value; // still 21 (Value is initialized only once)}
There is an annoying issue, though: Lazy<T> operates in some different modes, and it also contains a member and extra code to aid debugging. So in addition to holding the value itself and a reference to the initializer delegate, it's got a couple of other member variables and the Value property has to check a couple of things before it returns the value. Thus, Lazy<T> is less efficient than it should be.

So what if, in the future, MS decided to optimize Lazy<T> for its default mode of operation, and factor out other modes into derived class(es)? Well, they can't ever do that. If MS wants to return a LazyThreadSafe<T> object (derived from Lazy<T>) when the user requests the thread-safe mode, they can't do that because all the clients are saying "new Lazy", which can only return the exact class Lazy and nothing else.

MS could add a static function, Lazy.New(...) but it's too late now that the interface is already defined with all those public constructors.

As "exhibit 2", a.k.a. "exhibit Foo", I recently wrote a library where I needed to provide a constructor that does a bunch of initialization work (that may fail) before it actually creates the object. By far the most natural implementation was a static member function:
 // Constructor with dependency injection
 public Foo(A arg1, B arg2, C arg3)
 {
 }
 // Static helper method provides an easier way to create Foo
 public static Foo LoadFrom(string filename, ...)
 {
     ... // do some work
     arg1 = ...; arg2 = ...; arg3 = ...;
     return new Foo(arg1, arg2, arg3);
 }
However, the client didn't like that, and insisted that LoadFrom() should be a constructor "for consistency" with the other constructors. I was able to rearrange things to make LoadFrom() into a constructor, but my code was a little clunkier that way.

So what I'm saying is, when clients directly allocate memory themselves with new(), it constrains the class implementation to work a certain way, and this constraint is unnecessary.

A final problem with constructors is that they can't have names.

For D I suggested a simple solution, which in principle could be added to any language that uses "new". The solution is to treat constructors as equivalent to static methods named "new". So when the user writes "new Foo(...)", this call might go to a real constructor, or it could go to a static method named "new" that returns a Foo. Likewise, if the user writes "Foo.new(...)", this call could go to a static method or to an actual constructor.

This tweak would allow the class Foo to change the implementation of a constructor to return an object derived from Foo, instead of Foo itself. It would also allow Foo to do some work before actually creating a Foo object. Finally, this proposal doesn't allow named constructors per se, but it does provide a similar syntax for named and unnamed creation methods, because the syntax to call an unnamed constructor, "Foo.new(...)" is similar to the syntax to call a named creation method, "Foo.newFromFile(...)".

I don't expect D or any other language to accept my proposal, but it's a good idea so I'm blogging it anyway. In .NET, this feature would be more practical if .NET compilers were to automatically create a static "new" method for every real constructor, and to prefer to emit calls to the static method (if present) instead of the real constructor. This rule would provide for binary compatibility, which improves upon the source-level compatibility provided by my syntax proposal.