Saturday, February 21, 2009

Failure

Unfortunately, I have lost enthusiasm about Loyc. I just can't bring myself to do the incredibly large amount of work necessary to make a new compiler infrastructure. Besides that, I'm running into severe indecision trying to design the AST. I wish I could completely separate the implementation of the AST from its public interface, so that I could change it later if desired, but that's not entirely possible in C#.

I'm sure that if I had supportive friends and another programmer that shares my vision, I could do it, but I am so very alone in this endeavor. If you are reading this article, please leave a comment, otherwise I'll have to assume that not one damn person read it.

Boo recently added some very cool metaprogramming features in v0.9, the kind of thing that I would have liked to put in Loyc. Also it sounds like eventually boo will move to an extensible (PEG-based) syntax, which will theoretically give it a lot of the power that I wanted to give Loyc (albeit boo will still only be powerful enough to compile boo code, not C#--a key feature of Loyc is supposed to be multi-language support). However, the boo developers are terrible at documenting their language. I wonder how Rodrigo managed to find other developers to work on boo given his reluctance to communicate. Maybe it was that boo manifesto--it certainly won me over.

In the boo google group they recently called for people to write examples to showcase boo's new features, but they were unwilling to tell people how to actually USE the new features!

I asked:
Where is the documentation for the "macro" macro? Where is the documentation for using the AST and those cool [| AST expressions |] with $interpolation [...and...] where is the documentation for the AST classes?
No one responded.

I wrote:
I would like to write a macro in which you could write something
like...
x = 12
y = 7.0
z = "11"
total = 0.0
witheach Var in x, y, z:
    total += Convert.ToDouble(Var)

and the macro would expand this to

x = 12
y = 7.0
z = "11"
total = 0.0
total += Convert.ToDouble(x)
total += Convert.ToDouble(y)
total += Convert.ToDouble(z)

But I don't know how to get started.
No one responded.

I asked:

can macros have memory? I think it would be cool to
have a pair of macros, let's call them "define" and "expand". define
would be used something like this:
define PointClass(P, T):
    class P:
        public constructor(x as T, y as T):
            X=x; Y=y
        public X as T
        public Y as T
        static def op_Addition(a as P, b as P):
            return P(a.X+b.X, a.Y+b.Y)
        static def op_Subtraction(a as P, b as P):
            return P(a.X-b.X, a.Y-b.Y)
        static def op_Multiply(a as P, b as P):
            return a.X*b.X + a.Y*b.Y

and expand would be used like this to define three different kinds of
points:
expand PointClass(PointF, float)
expand PointClass(PointD, double)
expand PointClass(PointI, int)

Is this even possible with the current macro architecture?
No one responded.

A brief history of Loyc

I actually wrote a complete unit-inference engine for boo around two years ago, including small changes to the parser, so that you could write, for example,
_weight as double
def GetAcceleration(force as double`N`) `m/s^2`:
    return force/_weight

And the boo compiler would automatically determine that _weight is measured in kilograms. Or, you could specify instead that _weight is `kg` and the engine would automatically infer that GetAcceleration returns `m/s^2`. In fact it was not even necessary to specify any units on the GetAcceleration method; it was sufficient to include units in any call to the method:
_weight as double`kg`
def GetAcceleration(force as double):
    return force/_weight
def Foo():
    _weight = 3
    a = GetAcceleration(2`N`)
    x = 3
    a2 = GetAcceleration(x)

There are only two unit annotations in this code, but the engine has already inferred that the local variable x has units of `kg m/s^2`, i.e. newtons, while a and a2 have units of `m/s^2`.

If you later wrote code that contradicted the units that had been inferred, the compiler would give you a warning. I was very happy with my work and looked forward to using units in my everyday boo programming. I don't always use many physical units like kilograms and metres in my code, mind you, but I would certainly use a lot of other units like bits, bytes, dwords, records, pixels and percentages.

Unfortunately, as my engine required a parser change, it could not be used with standard boo unless boo's author accepted a parser patch. Worse, when I announced my completed work on the boo group, there seemed to be absolutely no interest in it, and boo's author, Rodrigo B. de Oliveira, never even commented on it.

That's when I earnestly began to work on Loyc. I decided that the set of features a language supports should not be under the control of a single person (boo), corporation (C#), or committee (C++). Instead, I felt, a compiler should exist that allowed anybody to add new features. Soon I came up with a name for this idea: Loyc, or Language of Your Choice, because my compiler would support multiple languages and the user could choose what the language would support or prohibit.

But without even a single other person rallying behind my cause, I feel at this point it has been a failure. I would certainly consider working on boo instead, except that boo's developers don't seem interested in nurturing their community by helping people use boo. When boo finally has some half-decent documentation, I may start using it again. Hell, I'd write the documentation myself if I had any clue how to use boo's advanced features, but I don't, so I won't.

Friday, November 21, 2008

Using ICSharpCode.TextEditor

I wrote an article about using SharpDevelop's ICSharpCode.TextEditor on CodeProject.

Friday, October 3, 2008

Visual Studio dialogs are modal—for OpenOffice

This is really irritating. I use Visual Studio 2008 and OpenOffice 2.2.0 Writer at the same time at work. When Visual Studio shows certain modal dialogs, such as a wizard for a new project, or dialogs of the SourceSafe plugin, OpenOffice freezes up completely. It won't even redraw itself, let alone respond to mouse clicks.

Update: I thought Visual Studio was unaffected by this quirk until I started OpenOffice.org Writer at the same time as a SourceSafe "differences" dialog was already open. A message box appeared saying "Unable to complete operation" and then Visual Studio crashed (disappeared instantly). Hmm.

Friday, September 26, 2008

Bitstream Vera Sans Mono Bold

My favorite font for programming. Use a black background for vibrant colors.

In this color scheme I use an off-white (not full intensity) rather than pure white, otherwise the white text seems brighter than everything else. Interestingly, this font is typically bundled with Linux, but IMO it looks significantly better in Windows.

Friday, September 12, 2008

Symbols in .NET

I'm a big fan of Ruby's "symbols". Symbols are sort of like strings or enums, but different. Their syntax is an identifier with a colon in front, e.g. :Foo. See here for details.

I love using symbols in place of enums, because if they are implemented properly, comparing two symbols is as fast as comparing two integers (enums). Enums have the problem of non-extensibility; library B can't define new values for an enum in library A. Meanwhile, anybody can define a new symbol at any time.

Via Loyc I would like to add symbol support to C# and boo, but Loyc is a long way off as long as I have nobody to help me. In the meantime, see here for my current implementation of Symbols in C#.

To simulate enums using Symbols in C#, I just define a static class full of Symbols. For example:
public static class Tokens {
    static public readonly Symbol WS = Symbol.Get("WS"); // whitespace
    static public readonly Symbol NEWLINE = Symbol.Get("NEWLINE");
    static public readonly Symbol ID = Symbol.Get("ID"); // identifier
    static public readonly Symbol PUNC = Symbol.Get("PUNC");
    static public readonly Symbol EOS = Symbol.Get("EOS");
    static public readonly Symbol ML_COMMENT = Symbol.Get("ML_COMMENT");
    static public readonly Symbol SL_COMMENT = Symbol.Get("SL_COMMENT");
    ...
}
Enjoy!
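
The reason symbol comparison can be as cheap as integer comparison is interning: every call with the same name returns the same object, so equality is just a reference check. Here is a minimal sketch of that idea, assuming a single global pool guarded by a lock. It is an illustration only, not the implementation linked above; of the names below, only Symbol.Get appears in the example, and the Name field and the pool are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: one global pool maps each name to a single
// Symbol instance, so two symbols are equal exactly when they are
// the same object.
public sealed class Symbol
{
    // Hypothetical global pool; a real implementation might use
    // weak references or several pools.
    private static readonly Dictionary<string, Symbol> _pool =
        new Dictionary<string, Symbol>();

    public readonly string Name;   // hypothetical field

    // Private constructor: the only way to obtain a Symbol is Get().
    private Symbol(string name) { Name = name; }

    public static Symbol Get(string name)
    {
        if (name == null) throw new ArgumentNullException("name");
        lock (_pool)
        {
            Symbol sym;
            if (!_pool.TryGetValue(name, out sym))
            {
                sym = new Symbol(name);
                _pool.Add(name, sym);
            }
            return sym;
        }
    }

    // Print in Ruby style, e.g. :Foo
    public override string ToString() { return ":" + Name; }
}
```

With this sketch, Symbol.Get("ID") == Symbol.Get("ID") is true because both calls return the same instance, and any library can mint a new symbol simply by calling Get with a new name.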

Simulating covariant return types in C#

For several years, Microsoft engineers have refused to add support for covariant return types, a trivially simple feature that should have been in the CLR from the beginning.

Suppose you want to write a Clone() method that returns a copy of the current object. Naturally you want to write the following, but it is illegal:
class MyStuff : ICloneable {
    public MyStuff Clone() { ... }
}

Since you are implementing an interface, you can use this workaround that uses explicit interface implementation:
class MyStuff : ICloneable {
    public MyStuff Clone() { ... }
    object ICloneable.Clone() { return Clone(); }
}

The above workaround is okay for implementing an interface, but what if you are writing a class hierarchy, and you want a Clone() method that is virtual but has the appropriate return type?
class BaseNode : ICloneable
{
    object ICloneable.Clone() { return Clone(); }
    public virtual BaseNode Clone() { ... }
}
class ComplexNode : BaseNode
{
    override BaseNode BaseNode.Clone() { return Clone(); } // Error!
    public ComplexNode Clone() { ... }
}

Oops, the workaround that you use for interfaces is illegal for class inheritance. There is still a solution, though:
class BaseNode : ICloneable
{
    object ICloneable.Clone() { return Clone(); }
    public BaseNode Clone() { BaseNode c; Clone(out c); return c; }
    protected virtual void Clone(out BaseNode clone) { ... }
}
class ComplexNode : BaseNode
{
    public new ComplexNode Clone() { ComplexNode c; Clone(out c); return c; }
    protected override void Clone(out BaseNode clone) { clone = Clone(); }
    protected virtual void Clone(out ComplexNode clone) { ... }
}

That's right. You need six Clone() methods. The last method is virtual in case you want to make a class derived from ComplexNode, e.g. VeryComplexNode:
class VeryComplexNode : ComplexNode
{
    public new VeryComplexNode Clone() { VeryComplexNode c; Clone(out c); return c; }
    protected override void Clone(out BaseNode clone) { clone = Clone(); }
    protected override void Clone(out ComplexNode clone) { clone = Clone(); }
    protected virtual void Clone(out VeryComplexNode clone) { ... }
}

Without covariant return types, you have to define an additional virtual function for each additional derived class.
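
To see the pattern work end to end, here is a compilable sketch of the two-level hierarchy with the "..." bodies filled in. The Name and Weight fields and the copying logic are assumptions added purely for illustration; the method signatures are the ones shown above.

```csharp
using System;

class BaseNode : ICloneable
{
    public string Name = "";                  // hypothetical payload

    object ICloneable.Clone() { return Clone(); }

    // Non-virtual public entry point with the precise return type.
    public BaseNode Clone() { BaseNode c; Clone(out c); return c; }

    // The virtual worker; derived classes override this one.
    protected virtual void Clone(out BaseNode clone)
    {
        clone = new BaseNode();
        clone.Name = Name;
    }
}

class ComplexNode : BaseNode
{
    public int Weight;                        // hypothetical payload

    // Hides BaseNode.Clone() and returns the derived type, no cast needed.
    public new ComplexNode Clone() { ComplexNode c; Clone(out c); return c; }

    // Redirect the base-typed worker to the derived-typed one.
    protected override void Clone(out BaseNode clone) { clone = Clone(); }

    // Virtual so that a further derived class can continue the chain.
    protected virtual void Clone(out ComplexNode clone)
    {
        clone = new ComplexNode();
        clone.Name = Name;
        clone.Weight = Weight;
    }
}
```

Calling Clone() through a BaseNode reference that actually points at a ComplexNode still produces a ComplexNode, because the base-typed worker is virtual; calling it through a ComplexNode reference gives a ComplexNode statically, so the caller never has to cast.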

Sunday, June 1, 2008

How to get started with Subversion on SourceForge

To get started with TortoiseSVN as the admin of a new project:

  1. On your SourceForge project page, choose "Subversion" from the "Admin" menu.
  2. Check the box beside "The following box should be checked to enable Subversion:" and click Update. (Note: you may also want to disable CVS from the Admin | CVS page).
  3. Prepare the folder that has your source code in it by removing binary files that are generated by your compiler (e.g. remove "bin", "Debug", "Release" folders), so you are only left with files you want to put in the repository.
  4. Install TortoiseSVN if you haven't already.
  5. Assuming your root folder is called "MyProject" and the "unix name" of your project on SourceForge is "myproject", right-click on the MyProject folder (on your hard drive) and choose TortoiseSVN | Import...
  6. As documented here, use https://myproject.svn.sourceforge.net/svnroot/myproject as the URL of the repository. Then click OK and your files will be uploaded.
  7. Unfortunately, the MyProject folder is still not associated with the repository; you are required to "check out" the files, which means downloading them, in order to complete the association between the files on your computer and the repository. To accomplish this, first rename the MyProject folder to MyProject2--or just delete it, as you probably don't need it anymore. Then make a new MyProject folder, right-click on it and choose "SVN Checkout...". Use the same repository URL as before: https://myproject.svn.sourceforge.net/svnroot/myproject. Click OK and the files are downloaded.
  8. Now when you recompile your project, the object files and stuff will come back (e.g. in "Debug", "Release" folders). You can ensure that TortoiseSVN won't upload a file/folder by right-clicking on the file/folder and choosing TortoiseSVN | Add to ignore list | filename.
  9. After making changes to your program, right click the MyProject folder and choose "SVN Commit..." to upload the changes. TortoiseSVN will update existing files in the repository automatically, but you need to check the check box beside new ("unversioned") files. Then click OK.
  10. When sharing a repository with other people, you must frequently right click the MyProject folder and choose "SVN Update" to download changes made by others.
  11. To learn more about Subversion and TortoiseSVN, read the truly excellent TortoiseSVN manual, by right-clicking on any file and choosing TortoiseSVN | Help.
  12. When inviting other developers to your project, point them to this other handy tutorial.

Sunday, May 18, 2008

VList data structure in C# (.NET Framework 2.0)

I've made an implementation of Phil Bagwell's VList data structure in C#, with a fairly comprehensive test suite. It comes in two flavors: VList(of T), where you normally add/remove items at the beginning of the list, and RVList(of T), to which you normally add/remove items at the end. It implements the complete IList(of T) interface plus quite a few additional members including AddRange, InsertRange, RemoveRange, Push, and Pop. Converting a VList to an RVList and vice versa is a trivial O(1) operation that appears to reverse the order of the elements. VList and RVList are value types (structs) that contain a reference to the underlying linked list of arrays (VListBlock(of T)). Small lists (0 to 2 items) are optimized with a specialized block class (VListBlockOfTwo(of T)).

License: Lesser GPL. Contact me at qwertie256, at, gmail.com if you would like the source code. Here's an example usage:

void Example()
{
    VList<int> oneTwo = new VList<int>(1, 2);
    VList<int> threeFour = new VList<int>(3, 4);
    VList<int> list = oneTwo;
    VList<int> list2 = threeFour;

    ExpectList(list, 1, 2);
    list.InsertRange(1, threeFour);
    ExpectList(list, 1, 3, 4, 2);
    list2.InsertRange(2, oneTwo);
    ExpectList(list2, 3, 4, 1, 2);

    // oneTwo and threeFour are unchanged:
    ExpectList(oneTwo, 1, 2);
    ExpectList(threeFour, 3, 4);
}
static void ExpectList<T>(IList<T> list, params T[] expected)
{
    Assert.AreEqual(expected.Length, list.Count);
    for (int i = 0; i < expected.Length; i++)
        Assert.AreEqual(expected[i], list[i]);
}

I thought that I would use the RVList to implement Loyc's AST to help make it possible to take AST snapshots easily, but I now suspect it's not a good idea. I am still working on the problem.

Performance characteristics

Similarly to a persistent linked list,
  • Adding an item to the front of a VList or the end of an RVList is always O(1) in time, and often O(1) in space (though, unlike a linked list, it may be much more)
  • Removing an item from the front of a VList or the end of an RVList is O(1) in time, although space is not necessarily reclaimed.
  • Adding or removing an item at the end of a VList or the front of an RVList is O(N) and requires making a copy of the entire list.
  • Inserting or removing a list of M items at the end of a VList or the front of an RVList is O(N + M).
  • Changing an item at an arbitrary position should be avoided, as it performs as poorly as inserting or removing an item at that position.
VLists, however, offer some operations that singly-linked lists cannot provide efficiently:
  • Access by index averages O(1) in ideal conditions
  • Getting the list length is typically O(log N), but O(1) in my version
  • If a sublist points somewhere within a larger list, its index within the larger list can be obtained in between O(1) and O(log N) time. Consequently, reverse enumeration is possible without creating a temporary stack or list.
Also, VLists can (in the best cases) store data almost as compactly as ordinary arrays.
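
To make the "linked list of arrays" idea concrete, here is a heavily simplified sketch of a front-growing persistent list. It is not the VList(of T) code described above: MiniVList, Block and all member names are hypothetical, Add returns a new list rather than mutating, there is no removal or insertion in the middle, and it is not thread-safe. It only shows why adding at the front is O(1), why two lists can share storage, and why the count is available without walking the chain.

```csharp
using System;

// One block in the chain: an array plus a link to the older, smaller block.
sealed class Block<T>
{
    public readonly T[] Items;
    public int Used;                  // slots filled so far (shared by every list using this block)
    public readonly Block<T> Prior;   // older block, or null
    public readonly int PriorLocal;   // how many items of Prior belong to this chain
    public readonly int PriorTotal;   // total number of items below this block

    public Block(int capacity, Block<T> prior, int priorLocal, int priorTotal)
    {
        Items = new T[capacity];
        Prior = prior;
        PriorLocal = priorLocal;
        PriorTotal = priorTotal;
    }
}

// A list is just (block, number of items used in that block).
struct MiniVList<T>
{
    private readonly Block<T> _block;
    private readonly int _local;

    private MiniVList(Block<T> block, int local) { _block = block; _local = local; }

    // O(1): the total below the current block is cached in the block.
    public int Count { get { return _block == null ? 0 : _block.PriorTotal + _local; } }

    // Adds to the front and returns the new list; existing lists are unaffected.
    public MiniVList<T> Add(T item)
    {
        Block<T> b = _block;
        // A new block is needed if there is no block yet, if another list
        // already claimed the next slot, or if the block is full.
        if (b == null || _local != b.Used || b.Used == b.Items.Length)
        {
            int capacity = b == null ? 2 : b.Items.Length * 2;
            b = new Block<T>(capacity, _block, _local, Count);
            b.Items[0] = item;
            b.Used = 1;
            return new MiniVList<T>(b, 1);
        }
        // Otherwise the next slot is free: write in place, no copying.
        b.Items[b.Used++] = item;
        return new MiniVList<T>(b, _local + 1);
    }

    // Index 0 is the most recently added item (the front).
    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException("index");
            Block<T> b = _block;
            int local = _local;
            while (index >= local)       // walk to older blocks as needed
            {
                index -= local;
                local = b.PriorLocal;
                b = b.Prior;
            }
            return b.Items[local - 1 - index];
        }
    }
}
```

Because block sizes double in this sketch, most items live in the newest one or two blocks, which is where the "averages O(1)" indexing claim comes from. If two lists both extend the same shorter list, the first extension reuses the free slot and the second is forced to start a new block, so neither disturbs the other.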

Monday, September 3, 2007

Loyc design issues

It's been fun designing Loyc, but boy, I've got a lot left to think about.

Right now I'm trying to figure out how to allow extensions to activate and deactivate statements based on arbitrary contextual criteria. One unanswered question is whether statements should have access to their parent node (ICodeNode) during parsing. The main problem with allowing it is that the parent nodes are, in general, not yet fully parsed when the child nodes are parsed, and it may be tricky to design convenient, reasonable, non-cumbersome semantics for the incomplete parent nodes. I'm leaning toward requiring only that the type Symbol of parent nodes be made available. Probably some other kind of context than the parent node ought to be available, such as symbol tables. In some languages, notably C++, symbol tables are considered necessary for correct parsing, although there are usually ways around such problems; for example I think FOG can parse C++ without them. Still, even if symbol tables aren't needed to parse, it often makes sense to build symbol tables during parsing. But in Loyc I also want to separate concerns as much as possible in order to maximize code re-use. By separating out the code for building symbol tables,
  1. it should be easier to add artificial (aka synthetic) nodes to symbol tables
  2. people can parse code without building symbol tables, which is nice if, for whatever reason, the symbol tables are not needed.
But I digress. There's lots of unresolved issues and I'd just like to summarize the ones I can think of...
  • There may be a lot of statements allowed from a lot of different extensions, perhaps hundreds, and the set of available statements may vary with every new block that opens. I'm planning to give statements full control over parsing their contents, including nested statements, but there will be a conventional way that statements can give control back to the language style. So the questions are
    • How to efficiently modify the set of available statements (I decree the split infinitive to be perfect English :P).
    • How to allow statements to specify when they are available. Arbitrary criteria should be possible but the most common case(s) should be easy for the user (i.e. extension writer) to use and should perform well. Or maybe the problem should be reconsidered as follows: how can block statements (that contain other statements) specify what categories of other statements they can contain?
    • How to provide the language style with enough control over how parsing operates that the original language spec can be supported under the Loyc extensible parsing model.
  • Similar concerns apply to operators. There may be hundreds of operators available in a program, but not all at once. Availability may be moderated by the parent statements and parent expressions.
  • Note to self: I need to introduce a new kind of OneOperatorPart that represents the edge of the expression. This would be a prerequisite to custom-syntax function calls such as Line(from x, y to x+10, y+10).
  • What kind of context information should be available during statement and expression parsing? Certainly the type Symbol of parent and grandparent nodes... but some statements may only be available if a certain custom attribute was used on the statement or a parent statement, so I think the set of attributes for parent/grandparents should be available too. And maybe availability based on attributes should be a standard feature, a criterion upon which Loyc activates/deactivates the statements automatically. But as I've said, providing the parent ICodeNode seems like too much to ask. I suppose it could be provided optionally.
  • As I mentioned above, there are two ways to look at how statements are allowed to be nested inside other statements. You can either have the substatements specify what they can be located inside of; or, the parent statement can say what kind of substatements it can contain. Should Loyc support both approaches?

Now consider this. Suppose somebody writes an "unless-else statement" extension:
unless (x < 0)
return new StringBuilder(x);
else
return new StringBuilder();
and somebody else writes an extension for "macro methods" which can be "instantiated" as normal methods:
macro(T) T Abs(T x) {
unless (x < 0) return x;
else return -x;
}

instantiate(int) Abs; // create method int Abs(int)
instantiate(long) Abs; // create method long Abs(long)
instantiate(float) Abs; // create method float Abs(float)
instantiate(double) Abs; // create method double Abs(double)
You can see that the macro method statement should be able to parse all statements that belong inside a method, and the "unless" statement should be allowed in the same places an "if" statement is allowed. You can see that if the "macro method" had to specify explicitly the kind of statements it supports, or if the "unless" statement had to specify explicitly the allowable parent nodes, then there is no way the two extensions could work together if neither author knew about the other's extension.

Therefore, I think statements should be grouped by "category", where categories are classes of statements like "method body statements", "class body statements", "loop statements", "block statements", "conditional statements", etc. I suspect categories will be important for extensibility because they can allow statements to work together that are not aware of each other.
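
As a rough illustration of the category idea, a registry could let each extension announce which category its statement belongs to, while block statements ask only for a category and never for specific statements. The sketch below is not Loyc's design: StatementRegistry, Register and StatementsIn are made-up names, and plain strings stand in for whatever would really identify categories and statements (Symbols would be a natural fit).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: statements are filed under a category name,
// and block statements look up their allowed contents by category.
public sealed class StatementRegistry
{
    private readonly Dictionary<string, List<string>> _byCategory =
        new Dictionary<string, List<string>>();

    // Called by an extension to announce a statement and its category.
    public void Register(string category, string statementName)
    {
        List<string> list;
        if (!_byCategory.TryGetValue(category, out list))
            _byCategory[category] = list = new List<string>();
        if (!list.Contains(statementName))
            list.Add(statementName);
    }

    // Called by a block statement to find out what may appear inside it.
    // Returns a read-only view; unknown categories yield an empty list.
    public IList<string> StatementsIn(string category)
    {
        List<string> list;
        return _byCategory.TryGetValue(category, out list)
            ? list.AsReadOnly()
            : new List<string>().AsReadOnly();
    }
}
```

In the scenario above, the "unless" extension would call Register("method body statements", "unless"), and the "macro method" extension would call StatementsIn("method body statements") when parsing its body. Neither author needs to know the other exists; they only need to agree on the category name.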

Hello, no one!

At this point I have no readership and no one has the foggiest idea what Loyc is, except maybe my non-programmer best friend, Ivan. (In case some random guy reads this, Loyc is the Language of Your Choice, a multi-language compiler that will allow anybody to add new features to existing programming languages.)

Before I try to get anybody on board the project, I need to work out enough design issues to present a reasonable description of its design. I've got some design documents written but not posted on the web yet--I need to re-learn how to upload to sourceforge.net, and the usability of their services is truly awful, at least if one is not a Unix guy. For now I should probably make pages on my local crappy WEBrick server. Then I won't have to upload anything, just move stuff to another folder on my hard drive.

For now you can read the incomplete doc about my extensible expression parser called ONEP. If anyone clicks on that link I'll be shocked, shocked! I'm excited to announce (to my zero readers) that the C# code for BasicOneParser is complete and you can post a message if you want a copy.

Anyway, I bought the domain name loyc.net a few months ago; in fact that's how I chose the name of the project: domains for most acronyms are already taken.

Saturday, August 11, 2007

Microsoft sure knows how to foil search engines

I have been trying to learn COM and .NET development lately, but every time I try to search for COM and .NET related stuff on Google, it seems like I get any website that has .com or .net in the domain name. Plus, search engines are hardly able to tell the difference between C, C++, C# and pages that are indexed by their first letter. For .NET I've got around this problem by searching for ".NET framework" or CLR, but for COM there seems to be no way to find information about it.

What TLA will they come up with next? THE? FOO?

P.S. My 80-gig secondary hard drive has died. It's only a matter of time before my 200-gig explodes...

Tuesday, August 7, 2007

Microsoft C++ doing what Loyc is doing?

I came across this MSDN article today, which says
...when the project is built, the compiler parses each C++ source file, producing an object file. However, when the compiler encounters an attribute, it is parsed and syntactically verified. The compiler then dynamically calls an attribute provider to insert code or make other modifications at compile time. The implementation of the provider differs depending on the type of attribute. For example, ATL-related attributes are implemented by Atlprov.dll.

So, boo and Loyc aren't the only compilers doing binary-compatible compiler extensions. I wonder who else is doing it.