Monday, September 3, 2007

Loyc design issues

It's been fun designing Loyc, but boy, I've got a lot left to think about.

Right now I'm trying to figure out how to allow extensions to activate and deactivate statements based on arbitrary contextual criteria. One unanswered question is whether statements should have access to their parent node (ICodeNode) during parsing. The main problem with allowing it is that the parent nodes are, in general, not yet fully parsed when the child nodes are parsed, and it may be tricky to design convenient, reasonable, non-cumbersome semantics for the incomplete parent nodes. I'm leaning toward requiring only that the type Symbol of parent nodes be made available. Probably some other kind of context than the parent node ought to be available, such as symbol tables. In some languages, notably C++, symbol tables are considered necessary for correct parsing, although there are usually ways around such problems; for example I think FOG can parse C++ without them. Still, even if symbol tables aren't needed to parse, it often makes sense to build symbol tables during parsing. But in Loyc I also want to separate concerns as much as possible in order to maximize code re-use. By separating out the code for building symbol tables,
  1. it should be easier to add artificial (aka synthetic) nodes to symbol tables
  2. people can parse code without building symbol tables, which is nice if, for whatever reason, the symbol tables are not needed.
But I digress. There's lots of unresolved issues and I'd just like to summarize the ones I can think of...
  • There may be a lot of statements allowed from a lot of different extensions, perhaps hundreds, and the set of available statements may vary with every new block that opens. I'm planning to give statements full control over parsing their contents, including nested statements, but there will be a conventional way that statements can give control back to the language style. So the questions are
    • How to efficiently modify the set of available statements (I decree the split infinitive to be perfect English :P).
    • How to allow statements to specify when they are available. Arbitrary criteria should be possible but the most common case(s) should be easy for the user (i.e. extension writer) to use and should perform well. Or maybe the problem should be reconsidered as follows: how can block statements (that contain other statements) specify what categories of other statements they can contain?
    • How to provide the language style with enough control over how parsing operates that the original language spec can be supported under the Loyc extensible parsing model.
  • Similar concerns apply to operators. There may be hundreds of operators available in a program, but not all at once. Availability may be moderated by the parent statements and parent expressions.
  • Note to self: I need to introduce a new kind of OneOperatorPart that represents the edge of the expression. This would be a prerequisite to custom-syntax function calls such as Line(from x, y to x+10, y+10).
  • What kind of context information should be available during statement and expression parsing? Certainly the type Symbol of parent and grandparent nodes... but some statements may only be available if a certain custom attribute was used on the statement or a parent statement, so I think the set of attributes for parent/grandparents should be available too. And maybe availability based on attributes should be a standard feature, a criterion upon which Loyc activates/deactivates the statements automatically. But as I've said, providing the parent ICodeNode seems like too much to ask. I suppose it could be provided optionally.
  • As I mentioned above, there are two ways to look at how statements are allowed to be nested inside other statements. You can either have the substatements specify what they can be located inside of; or, the parent statement can say what kind of substatements it can contain. Should Loyc support both approaches?

Now consider this. Suppose somebody writes an "unless-else statement" extension:
unless (x < 0)
return new StringBuilder(x);
else
return new StringBuilder();
and somebody else writes an extension for "macro methods" which can be "instantiated" as normal methods:
macro(T) T Abs(T x) {
unless (x < 0) return x;
else return -x;
}

instantiate(int) Abs; // create method int Abs(int)
instantiate(long) Abs; // create method long Abs(long)
instantiate(float) Abs; // create method float Abs(float)
instantiate(double) Abs; // create method double Abs(double)
You can see that the macro method statement should be able to parse all statements that belong inside a method, and the "unless" statement should be allowed in the same places an "if" statement is allowed. You can see that if the "macro method" had to specify explicitly the kind of statements it supports, or if the "unless" statement had to specify explicitly the allowable parent nodes, then there is no way the two extensions could work together if neither author knew about the other's extension.

Therefore, I think statements should grouped by "category", where categories are classes of statements like "method body statements", "class body statements", "loop statements", "block statements", "conditional statements", etc. I suspect categories will be important for extensibility because they can allow statements to work together that are not aware of each other.

No comments: