Saturday, April 20, 2013

The Loyc tree and prefix notation in EC#

Update: the Loyc tree code has been rewritten and the original concept has been modified from the concept described in this post (for example, the red/green tree concept has been dropped for the sake of simplicity and other reasons).

As I designed the macro system for Enhanced C#, I thought a lot about what kind of syntax tree would be appropriate--not just for EC# but for tools that could process multiple languages. One of Loyc's goals is conversion between languages, and another goal is to support multiple language syntaxes (perhaps each with a macro system!) in a single project. A third goal is to assist IDE tools (in particular, to be compatible with incremental parsing). And, as always, I wanted a data structure that is fast and uses memory efficiently.

I considered using LISP-style lists, but they didn't fit into C# very well. In particular I didn't feel that they were the right way to model complex declarations such as
 [Serializable] struct X<T> : IX<T> where T : class, Z { ... }
Besides, LISP lists don't hold any metadata such as the original source file and line number of a piece of code. And how would I address the needs of IDEs--incremental parsing, for example? So I fiddled around with different ideas until I found something that felt right for C# and other languages descended from Algol. Eventually I decided to call my idea the "Loyc node" or "Loyc tree". The Loyc tree is really two independent ideas that I developed concurrently: one idea for the concept of the tree as viewed by an end-user (i.e. other developers), and another idea for how the implementation works.

The concept of a Loyc node involves three main parts: a "head", an "argument list" and "attributes". The implementation involves two parallel trees, "red" and "green", inspired by a similar idea in Microsoft Roslyn.

A Loyc tree is basically "XML for code"--but wait a minute, I hate XML. Maybe "JSON for code"? Well, just as all XML elements have the same structure, all Loyc nodes have the same structure too. And just as there is a clear and obvious mapping from XML to the code representation of XML (such as an XML DOM or XElement), the mapping from EC# to a Loyc tree is similarly transparent, when the EC# code is written in prefix notation.

In EC# prefix notation, the above struct "X<T>" looks like this:
 [Serializable]
 #struct(#of(X, [#where(#class, Z)] T),
         #(#of(IX, T)),
         #{}( ... ));
But wait, let's start with some simpler examples:
 x += percent * 0.01;       // normal EC#
 #+=(x, #*(percent, 0.01)); // prefix notation
 
 public int X;              // normal EC#
 [#public] #var(#int, X);   // prefix notation
 
 using System.Collections.Generic;              // normal
 #import(#.(#.(System, Collections), Generic)); // prefix
     // (#using is reserved for the using(...) statement)

 if (condition) { y(); }    // normal EC#
 @#if(condition, #{}(y())); // prefix notation
 
 Foo(x, y, z);              // normal EC#
 Foo(x, y, z);              // prefix notation
In the pairs above, the EC# parser produces the same syntax tree for the two lines in each pair. EC# accepts prefix notation anywhere that a normal statement or expression is allowed. Prefix notation is based on function-call notation, as you can clearly see in the last example.

Note: the "@" in "@#if" avoids confusion between if-statements and the preprocessor directive "#if". The preprocessor in EC# is obsolete and not necessary, but it still exists. I wanted to use the "#" character to indicate a "special name" in prefix notation, but somehow we must avoid conflicts with the preprocessor. The parser converts the identifier "@#if" to "#if" internally, just as "@foo" actually means "foo" in plain C#.

The Loyc syntax tree

In most compilers, the syntax tree is very strongly typed, with separate classes or data structures for, say, variable declarations, binary operators, method calls, method declarations, unary operators, and so forth. Loyc, however, only has a single data type, Node, for all nodes*. There are several reasons for this:
  • Simplicity. Many projects have thousands of lines of code dedicated to the AST (abstract syntax tree) data structure itself, because each kind of AST node has its own class. Simplicity means I write less code and users learn to use it faster.
  • Serializability. Loyc nodes can always be serialized to a plain text "prefix tree" and deserialized back to objects, even by programs that are not designed to handle the language that the tree represents**. This makes it easy to visualize syntax trees or exchange them between programs.
  • Extensibility. Loyc nodes can represent any programming language imaginable, and they are suitable for embedded DSLs (domain-specific languages). Since nodes do not enforce a particular structure, they can be used in different ways than originally envisioned. For example, most languages only have "+" as a binary operator, that is, with two arguments. If Loyc had a separate class for each kind of AST node, there would probably be a PlusOperator class derived from BinaryOperator, with a LeftChild and a RightChild. But since there is only one node class, a "+" operator with three arguments is easy; this is denoted by #+(a, b, c) in EC# source code. The EC# compiler won't understand it, but it might be meaningful to another compiler or to a macro.
* In fact, there are a family of node classes, but this is just an implementation detail.
** Currently, the only supported syntax for plain-text Loyc trees is EC# syntax, either normal EC# or prefix-tree notation.

EC# syntax trees are stored in a universal format that I call a "Loyc tree". All nodes in a Loyc tree consist of up to four parts:
  1. An attribute list (the Attrs property)
  2. A Value
  3. A Head or a Name (if a node has a Head, Name refers to Head.Name)
  4. An argument list (the Args property)
The EC# language does not allow (2) and (3) to appear together (specifically, a Value can only be represented in source code if the Name is "#literal"). Therefore, you can think of Value, Head and Name as a discriminated union known as "the head part of a node". There is no easy and efficient way to represent a discriminated union in .NET, so all five properties (Attrs, Value, Head, Name, Args) are present on all nodes.
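To make this concrete, here is a little sketch in C# of what a node class along these lines might look like. This is just an illustration written for this post--not the real Loyc code--and the factory methods are names I made up:

 using System.Collections.Generic;

 // Illustrative sketch only -- not the real Loyc implementation.
 class Node
 {
     public readonly List<Node> Attrs = new List<Node>(); // 1. attribute list
     public object Value;                                 // 2. literal value, if any
     public Node Head;                                    // 3. head node, if any
     public readonly List<Node> Args = new List<Node>();  // 4. argument list

     readonly string _name;
     Node(string name) { _name = name; }

     // If the node has a Head rather than its own Name, Name forwards to Head.Name.
     public string Name { get { return Head != null ? Head.Name : _name; } }

     public static Node Symbol(string name) { return new Node(name); }
     public static Node Literal(object value)
     {
         var n = new Node("#literal");
         n.Value = value;
         return n;
     }
     public static Node Call(string name, params Node[] args)
     {
         var n = new Node(name);
         n.Args.AddRange(args);
         return n;
     }
     public static Node Call(Node head, params Node[] args)
     {
         var n = new Node(null) { Head = head };
         n.Args.AddRange(args);
         return n;
     }
 }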

As you've seen, almost any Loyc node can be expressed in EC# using either "prefix notation" or ordinary code. The basic syntax of prefix notation is
 [attributes] head(argument_list)
where the [attributes] and (argument_list) are both optional, and the head part could be a simple name. For example, the EC# statement
 [Foo] Console.WriteLine("Hello");
is a single Node object with three children: Foo, Console.WriteLine, and "Hello". Foo is an attribute, Console.WriteLine is a Head, and "Hello" is an argument. Each of these children is a Node too, but neither Foo nor "Hello" has children of its own. The Head, Console.WriteLine, is a Node named "#." with two arguments, Console and WriteLine. The above statement could be expressed equivalently as
 [Foo] #.(Console, WriteLine)("Hello");
This makes its structure explicit, but the infix dot notation is preferred. Finally, Console and WriteLine are nodes that only have a Name (no Attrs, no Args, no Head, no Value).

Conceptually, Loyc trees have either a Head node or a Name symbol but not both. Foo, Console, WriteLine, and #. are all node names, while Console.WriteLine is a head node. However, you can always ask a node what its Name is; if the node has a Head rather than a Name, Name returns Head.Name. Thus, #. is the Name of the entire statement.
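In terms of the sketch above, the whole WriteLine statement could be built like this (same hypothetical API):

 // [Foo] #.(Console, WriteLine)("Hello")
 var target = Node.Call("#.", Node.Symbol("Console"), Node.Symbol("WriteLine"));
 var stmt = Node.Call(target, Node.Literal("Hello"));
 stmt.Attrs.Add(Node.Symbol("Foo"));
 string name = stmt.Name; // "#." -- forwarded from stmt.Head.Name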

Attributes can only appear at the beginning of an expression or statement. Use parentheses to clarify your intention if necessary, but please note that parentheses are represented explicitly in the syntax tree, not discarded by the parser. Parentheses cause a node to be inserted into the head of another node, so
 (x())
is a node with no arguments, that has a Head that points to another node that represents x(). Attributes have lower precedence than everything else, so
 [Attr] x = y;
associates the attribute Attr with the "#=" node, not with the "x" node.

Unlike C# attributes, EC# attributes can be any list of expressions, and do not imply any particular semantics. You can attach any expression as an attribute to any other statement or expression, e.g.
 [4 * y << z()]
 Console.WriteLine("What is this attribute I see before me?");
When the time comes to generate code, the compiler will warn you that it does not understand what the hell "4 * y << z()" is supposed to mean, but otherwise this statement is legal. Attributes serve as an information side-channel, used for instructions to macros or to the compiler. Macros can use attributes to receive information from users, to store information in a syntax tree temporarily, or to communicate with other macros.

You can mix prefix notation with "normal" EC# in various ways. For example, all of the following versions of "MulDiv" are equivalent:
 int MulDiv(int a, int b, int c) { 
     return (int)((long)a * b / c); 
 }
 int MulDiv(int a, int b, int c) { 
     #return((int)(#cast(a, long) * b / c)); 
 }
 #def(#int, MulDiv, #(int a, int b, int c), { 
     return (int)((long)a * b / c); 
 });
 #def(#int, MulDiv, #(#var(#int, a), #var(#int, b), #var(#int, c)), 
     #{}(#return(
         #cast((#/(#*(#cast(a, #long), b), c)), #int)
     )) 
 );

Statement/expression equivalence

An important feature of EC# is that the difference between "statements" and "expressions" is only syntactic, not semantic, i.e. the difference is only skin-deep. Any EC# expression can be a statement, and any statement can be written in the form of an expression. The EC# parser needs to know whether to expect expressions or statements in a given location, but once a syntax tree is built, the distinction between statements and expressions disappears. Here is an example that mixes statements and expressions:
 if (foo.Bar > 0 && #{
     var stats = GatherStatistics(foo);
     stats.Avg > foo.Bar && stats.StdDev < 1.0
 })
 {
     Frobnicate(foo, stats);
 }
Roughly, "#{" causes the parser to switch to statement mode, allowing you to write statements where an expression is expected. Of course, plain C# doesn't work this way. Therefore, the conversion to plain C# can sometimes be messy and difficult.
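To give you an idea of why, here is one plausible hand-written expansion of the example above into plain C# (my own sketch, not actual compiler output; I'm assuming GatherStatistics returns some class called Statistics):

 Statistics stats = null;    // hoisted so that Frobnicate can still see it
 bool cond = foo.Bar > 0;
 if (cond) {                 // preserves the short-circuit behavior of &&
     stats = GatherStatistics(foo);
     cond = stats.Avg > foo.Bar && stats.StdDev < 1.0;
 }
 if (cond)
 {
     Frobnicate(foo, stats);
 }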

The EC# parser also does not distinguish between "executable places" and "places for declarations". For example, this code can be parsed:
    Console.WriteLine("This statement is illegal!");
    using System.Collections.Generic;

    void foo() {
        WriteLine("This statement makes more sense!");

        int Ecks {
            int x = 0;
            get { x }
            set { x = value; }
            namespace WTF {
                Console.WriteLine("This is pretty weird!");
            }
        }
    }
Although the parser understands this code, the EC# compiler (as a whole) would reject it, because code cannot run outside of methods (except macros), and namespaces cannot be located inside methods or properties.

Note: EC# is not usable yet. This post exists because I want to publish the idea of Loyc trees. I'll update this when the parser is ready. But once the parser is ready, you will still not be able to write code in EC#. Making a programming language by myself is a long process!

Thursday, April 18, 2013

Loyc and Nemerle

I created this blog to be about Loyc, the Language of Your Choice project, which is about making general tools for creating and extending programming languages, converting code between languages, and creating tools for IDEs. However, quite simply, Loyc is too big a project for one person. I couldn't find the time for it, and I couldn't quite figure out how to make an extensible programming system.

Last August I decided to reboot with something much more modest: to design and implement a series of improvements to C#. That way I hoped to gain enough experience to do something bigger, later. To increase my chance of success, I switched to part-time at work, leaving more time for Loyc.

I was going to start by modifying NRefactory, and my "Enhanced C#" would simply compile down to plain C#. I couldn't figure out how to accomplish my end goal--a fully extensible language--so I decided to simply improve the existing C# language with a series of specific and modest features such as the "null dot" or safe navigation operator (now denoted ??.), the quick-binding operator (::), forwarding clauses, return value covariance, C++-style templates, "static if", blah blah blah.

However, once I was done drafting the new language, I noticed that despite all the work I'd put into the draft alone, the new language still didn't address one of the most requested features from C# users: "Provide a way for INotifyPropertyChanged to be implemented for you automatically on classes".

INotifyPropertyChanged is a simple interface for allowing code to subscribe to an object to find out when it changes. For example:
interface INotifyPropertyChanged {
 event PropertyChangedEventHandler PropertyChanged;
}
class Person : INotifyPropertyChanged {
 public event PropertyChangedEventHandler PropertyChanged;
 
 string _name;
 public string Name
 {
  get { return _name; }
  set {
   if (_name != value) {
    _name = value;
    Changed("Name");
   }
  }
 }
 string _address;
 public string Address
 {
  ... same thing ...
 }
 DateTime _dateOfBirth;
 public DateTime DateOfBirth
 {
  ... same thing ...
 }
 void Changed(string prop)
 {
  if (PropertyChanged != null)
   // The .NET "EventArgs" concept is stupid, but I digress
   PropertyChanged(this, new PropertyChangedEventArgs(prop));
 }
}
The problem is that you need 10 new lines of code for every new property in your class. Couldn't the language have some kind of feature to make this easier? But I didn't want to have a feature that was specifically designed for INotifyPropertyChanged and nothing else. Besides, there are different ways that users might want the feature to work: maybe some people want the event to fire even if the value is not changing. Maybe some people want to do a reference comparison while others want to do object.Equals(). Maybe it can't be fully automatic--some properties may need some additional processing besides just firing the event. How could a feature built into the language be flexible enough for all scenarios?
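For comparison, roughly the best you can do in plain C# is a generic helper like the following (a sketch of a well-known pattern; the names are mine, and it needs using System.Collections.Generic). It shrinks each property to about four lines, but notice that it hard-codes one answer to every question above--fire only on change, compare with object.Equals--which is exactly the inflexibility I'm complaining about:

 protected bool Set<T>(ref T field, T value, string prop)
 {
     // Fixed policy: fire only when the value actually changes,
     // as decided by EqualityComparer<T> (i.e. object.Equals).
     if (EqualityComparer<T>.Default.Equals(field, value))
         return false;
     field = value;
     Changed(prop);
     return true;
 }

 string _name;
 public string Name
 {
     get { return _name; }
     set { Set(ref _name, value, "Name"); }
 }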

I decided at that point that it was time to learn LISP. After studying it briefly, I figured out that what would really help in this case is macros. Macros would allow users to provide a code "template" that is expanded for each property. To make a medium-length story short, I came up with an interchange format for syntax trees called "Loyc trees", and I dropped my EC# 1.0 draft--which was never implemented--to create EC# 2.0; many of the features of EC# 1.0 could be accomplished with macros anyhow. But since I wasn't satisfied with any of the readily-available parser generators, I also decided to make my own LL(k) parser generator. The initial version is about 75% complete right now (no, no, just the parser generator, sorry); the main missing features are syntactic "and" predicates (an optional feature that provides unlimited lookahead), and the ability for the parser generator to parse its own input (I'm using C# operator overloading for now).

So the other day I finally started writing the parser for EC# 2.0, which would also be the parser generator's native language... when I rediscovered Nemerle.

I'd looked at Nemerle a few years back, but it didn't seem very impressive at the time; it just appeared to be a C#-like language with some type inference, algebraic data types, and pattern matching (and my memory is bad but I think it mainly targeted Linux (Mono)). Those are all good things, but my interest has always been in extensibility. I never knew until now that in fact Nemerle already has LISP-style macros (well, not quite LISP-style: a Nemerle macro cannot exist in the same project as the code to which it is applied; but that's okay because EC# wasn't going to support such an advanced feature either (not at first, although I want to support it eventually).)

I don't know whether Nemerle has always been extensible, and I simply didn't realize it, or whether macros are a new feature. In either case, this discovery will certainly impact my development of Loyc and EC#. In principle, Nemerle is so similar to EC# that I should perhaps even halt development of EC# and just use Nemerle.

My first impression is that Nemerle's macros seem to lack the simplicity of my design, but they are clearly more powerful. So far, I have only figured out how to support syntactic macros in EC#. I haven't figured out how to allow a macro to access semantic knowledge of a program--to look up types and members, or to introspect the "meaning" of some code and then to modify that code. Nemerle seems to have multiple "flavors" of macros that can accomplish different things, which is very exciting. Another neat fact is that Nemerle's people and I both decided to use what I call a "tree parser"--the lexer output is grouped by parentheses, square brackets and braces before it is sent to the main parser. This makes it easier to support extensibility (although EC#'s parser will not be extensible), and makes it easy for the parser to look beyond anything in parentheses.
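To illustrate the "tree parser" idea, here is a little sketch of the bracket-grouping step (my own illustration in C#, not Loyc's or Nemerle's actual code; error handling omitted):

 using System.Collections.Generic;

 class Token
 {
     public string Text;
     public List<Token> Children; // filled in for open-bracket tokens
 }

 static class TreeLexer
 {
     static readonly Dictionary<string, string> Closers =
         new Dictionary<string, string> { { "(", ")" }, { "[", "]" }, { "{", "}" } };

     // Consumes tokens from 'flat' starting at 'i' until 'closer' is found
     // (or input ends), nesting each bracket pair's contents under the
     // opening token. Returns the index just past the closer.
     public static int Group(IList<Token> flat, int i, List<Token> output, string closer)
     {
         while (i < flat.Count)
         {
             Token t = flat[i++];
             if (t.Text == closer)
                 return i;
             string expected;
             if (Closers.TryGetValue(t.Text, out expected))
             {
                 t.Children = new List<Token>();
                 i = Group(flat, i, t.Children, expected);
             }
             output.Add(t);
         }
         return i;
     }
 }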

So where do I go from here? I'm not sure yet. I'll see what I can learn about the Nemerle compiler and macro system, before I can decide on a course of action. I am particularly interested in the base language--the language before any macros are added--but I have found only one page on that topic, and the beginning of that article is hard to understand. Oh well. (Argh! They keep saying that basic features of the language such as the "if" expression are defined by macros, but since Nemerle has no "goto" statement, I can't imagine how the "if" expression could decompose into anything simpler.) I also found this "extended course" in macros, and other information.

Wikipedia's Nemerle page claims that "the core developers of Nemerle were hired by the Czech software development company JetBrains." However, the core developers appeared to be Michał Moskal and Kamil Skalski, and Google seemed to indicate that those two worked for Microsoft and Google respectively, not JetBrains. So I contacted Michał, who told me that neither of them were working on Nemerle! He said there "were a few people, mostly from Russia, still working on it." But who exactly is left? I have no idea.

The webpage is in disarray. The first time I tried to download Nemerle, I found the download link on the front page of the wiki, but this points to downloads that are 2 years old, making me wonder if Nemerle was dead (I found the correct download link later). I wanted to talk to Nemerle's developers, but the link to the "Forum" on Nemerle's front page was broken, so next I found this contact page, but the links to the mailing list archives do not work, and the subscription link requires login credentials. Finally I found this "news" page, which hasn't been updated since 2004! Eventually, Michał pointed me to Nemerle's Google Group, but clicking "New Topic" caused the error "An error occurred while communicating with the server." Yikes!

Finally, I was able to ask a couple of questions about Nemerle--but so far, the only response is from someone that cannot understand my English. It turns out that the first language of Nemerle is Russian! Could it be that I won't be able to work with the Nemerle people due to a language barrier?

In conclusion, we should all learn Esperanto or something.
Por finiĝi, ni ĉiuj devus lerni Esperanton aŭ ion.

(I used to be very interested in the language boo, but the boo people wouldn't talk to me, and in general did not document boo's advanced features at all. I don't know if the Nemerle people will talk to me, but at least Nemerle offers far more articles about its features than boo ever did, even if you only count the English articles (note, it's been a couple of years now since I checked on boo).)

Tuesday, October 30, 2012

Lisp, here's why the mainstream ignores you.

While I currently don't know how you write a "hello world" in LISP, I am keenly aware that the LISP language, with its powerful macro system, could be very instructive on the topic of metaprogramming. However, it is proving difficult to quickly learn LISP because the community uses terminology, technology and notations that are far outside the mainstream.

The most obvious thing is the parenthesis. While a prefix notation such as
    (eqv? (* (+ 2 3) x) 20)
    meaning (2 + 3) * x == 20
gives LISP its power as a programming language, it's also a huge turnoff. I can't figure out why they haven't standardized on a syntactic sugar such as sweet expressions that maps in a predictable way back to S-expressions. Even sweet expressions look a lot different than popular languages, but it's clearly an improvement.

I went looking for information on the macro system and the first search result for "LISP macros" is some university lecture notes. It suggests "macroexpand-1" as a debugging tool so I Googled that next, and came upon this page from Common Lisp the Language, 2nd Edition, which states:
A form is considered to be a macro call only if it is a cons whose car is a symbol that names a macro.
Ahh... say what? Well, I guess I can handle learning what a "cons" is (Wikipedia implies that a list like (1 2 3) is shorthand for the singly-linked list (cons 1 (cons 2 (cons 3 nil))), which is a bit surprising since everything in LISP is made out of lists, and the performance and memory consumption of singly-linked lists is known to be rather bad on modern 64-bit machines, but I digress...)
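For reference, a cons cell is trivial to write down in C#, and doing so makes the overhead easy to see:

    // A cons cell. On 64-bit .NET each cell costs a 16-byte object header
    // plus two 8-byte fields -- roughly 32 bytes (plus a boxed int!) to
    // hold a single element of (1 2 3).
    class Cons
    {
        public object Car; // the first item
        public Cons Cdr;   // the rest of the list
        public Cons(object car, Cons cdr) { Car = car; Cdr = cdr; }
    }
    // (1 2 3) is: new Cons(1, new Cons(2, new Cons(3, null)))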

Anyway it's slightly amazing that the terms "car" and "cdr" are still in use:
Lisp was originally implemented on the IBM 704 computer, in the late 1950s. The 704 hardware had special support for splitting a 36-bit machine word into four parts, an "address part" and "decrement part" of 15 bits each and a "prefix part" and "tag part" of three bits each.
...
  • car (short for "Contents of the Address part of Register number"),
  • cdr ("Contents of the Decrement part of Register number")
Considering that LISP was a potentially revolutionary idea, I can forgive it for using these terms... in 1959. But to be using them in the 21st century? It's a boneheaded way to alienate potential users.

I'm not totally new to car and cdr, I've seen it before, but I've had to look up car and cdr about three times now because I keep forgetting which one is which. I think I'll just remember that "a" is the first letter of the alphabet so "car" fetches the first item in a list, while "d"... well, "d" is a different letter of the alphabet. The Wikipedia article even seems to think these names are advantageous:
However, car and cdr have the advantage that short compositions of the functions can be given short and more or less pronounceable names of the same form. In Lisp, (cadr '(1 2 3)) is the equivalent of (car (cdr '(1 2 3))); its value is 2 (the first item of the rest of (1 2 3)). Similarly, (caar '((1 2) (3 4))) is the same as (car (car '((1 2) (3 4)))); its value is 1.
Riiiiight. So you could use "cadr" to mean "first of the rest" and you could use "caar" for "first of the first". Or, why not use something logical like "second" and "first.first". Or if brevity is really that important, "r.f" and "f.f" - but it's probably not that important.

So then I figured I'd download LISP and play around with it. I knew Common Lisp was a ... well, a common LISP, so I Googled that. The first four results were sites about common lisp without downloads; the third was the book Practical Common Lisp which suggested Lisp in a Box, linklessly. Searching for Lisp in a Box finds "LispBox" instead, although it might be the same thing: the book says Lisp in a Box uses emacs as its editor, while Lispbox's page says the same thing.

But I don't want to use emacs! I used that bastard a bit in university, but ugh, it's just an ugly text window with a weird interface that is different from every editor that isn't emacs. Almost every other editor is standardized on Ctrl+S for Save, Ctrl+F for Find, Tab for indenting text, Ctrl+C for copy, etc. But not ugly old emacs, oh no. (Yeah, okay, macs prefer the ⌘ key... if I had a mac I'd probably have to remap it to Ctrl.)

They even have a different notation for keyboard shortcuts than everybody else: "C-h t" where the rest of the world would probably say "Ctrl+H, T" (except that the rest of the world wouldn't use multiple-key shortcuts for anything important.) Even worse, referring to the Alt key as "M-"? It's crazy, like they're waiting for the 80s LISP machines to come back.

Of course, if you're used to emacs then it must be the rest of the world that seems weird. But there's an easy solution: on first startup, emacs could offer a choice of "Traditional emacs keyboard interface" and "Normal keyboard interface (recommended for new users)", the latter being based on Windows and Mac programs. To decide how to assign other keys, they could copy bindings from the most popular IDEs, Visual Studio and Eclipse.

The Lispbox downloads are oddly headlined "Lispbox test builds", but whatever. So I download the Windows one--90 MB compressed? Son of a bitch! That's what happens when you choose emacs as your editor--okay, okay, it's only half of the total size, and maybe I'm being unfair since some IDEs are bigger, but wasn't LISP supposed to be lightweight?

Thankfully, emacs offers a "C-x/C-c/C-v Cut and Paste" option, but "Delete" still doesn't delete selected text, "Ctrl+F" doesn't work and "Ctrl+Z" minimizes the damn window, and maybe worst of all, "Tab" either doesn't do anything or indents by a seemingly random amount, depending on the context (I'll figure that out later).

It's getting late, maybe I'll finish this later... if anyone can name a non-emacs LISP environment, I'd like to know.
----

The key problem I'm facing right now in my exploratory design of macros for EC# is the question of how to order macro expansions with respect to each other and with respect to substitution operations (which, for all I know, should themselves be implemented with macros) and what the rules for name resolution should be. However, reading about LISP has provided little insight so far, because (1) I would have to understand LISP fairly well in order to understand the documentation about macros, or extremely well in order to understand any technical specifications; and (2) in tutorial-level material, the ordering of operations, the name resolution rules, and the compilation model are implicit and assumed to be of secondary importance. Quite reasonably they assume you want to use LISP, whereas I just want to shamelessly copy the ideas behind LISP and the lessons that were learned in the course of developing modern LISPs.

Perhaps this will help me. Since much of LISP is defined with macros, I had also been wondering which parts of LISP were built-in and which were macros. The key to finding this out seems to be the term "special operator" aka "special form", which is like a keyword or built-in statement in other languages. So far I haven't found a complete list of them though.

Wednesday, September 12, 2012

Opus, the codec to end all codecs

Opus is a new all-purpose low-latency audio codec with better quality than MP3, and better quality than Vorbis at low bitrates. Plus it's an open standard, open source, and patent-free! Not only that, it operates at bitrates from 6 kbps (for low-quality speech) up to 510 kbps. Thus it is suitable to replace virtually every audio codec in the world!* As a bonus, it's an official IETF standard and it has been chosen as a mandatory-to-implement (MTI) codec for WebRTC, an upcoming standard for real-time communication on the web.

Well, I had a look at the RFC. The technical details are largely impenetrable… for example the introduction to Range Coding should really say something about what fl[k], fh[k], and ft represent (and what’s [k], an array subscript?), and they really should not have waited until section 4.1.3.3 to introduce the concept of a Probability Density Function, since it appears to be a fundamental idea...

...Anyway, Opus is basically two codecs in one: a linear prediction (LP) codec for speech (based on a standalone codec called ‘SILK’) and an MDCT-based codec for music (based on a standalone codec called ‘CELT’). An Opus encoder can instantly and seamlessly switch between the two codecs at any time according to the bitrate and the dominant character of the input, and there is a hybrid mode where SILK is used for low frequencies and CELT for high frequencies.

CELT is built on the same psychoacoustic principles as other modern codecs (e.g. Vorbis), but adds innovations to permit low latency without hurting overall audio quality. SILK is designed for low bandwidth and low-latency speech (though the principles behind it remain totally opaque to me), while CELT is better for high bandwidth audio (music or speech or anything else) but can still operate at very low bitrates when the sound is not speech or speech-like and therefore unsuitable for SILK.

The operating modes of Opus are designed so that the two codecs “fit together” nicely, and it can switch between the two on 10ms boundaries (the sampling rate can similarly change among 8/12/16/24/48 kHz, ditto for mono/stereo). Opus also offers other real-time and VOIP features, such as CBR and forward error correction and other tricks to compensate for packet loss.

Some final niceties of Opus: the encoder and decoder can operate at independent sampling rates (or one can be mono and the other stereo), and the (continuously variable) bitrate is independent of the operating mode (e.g. want stereo at 48 kHz at 8 kbps? It’s probably a bad idea, but it’s not outright prohibited). So basically Opus covers virtually all scenarios. The codec to end all codecs, indeed. (correct me if I’m wrong about any of this, dear experts).

Now can I just get a codec that achieves very low bitrates by observing long-term audio patterns and making backward references to similar audio many seconds in the past? kthxbye :) Codec2 (presentation) is really fascinating too. It was developed for radio applications, and is not really useful for VOIP because Codec2 packets are much smaller than UDP headers, i.e. the overhead of the Internet itself outweighs the space savings of Codec2. But what I find interesting about it is the potential for cheap archiving. For instance, with Codec2 you could record your entire life: at 2400 bits/second it requires just 25.9 MB (non-MiB) per day and 9.46 GB per year or under 1 TB for a lifetime (or much less, assuming you eliminate regions of silence and reduce the bitrate in nearly-silent situations.) Coupled with voice recognition and a search index, you'd finally be able to resolve any arguments with your wife about who said what last week. I bet some sociologists and linguists would find interesting uses for such a corpus...

Tuesday, July 17, 2012

Constructors Considered Harmful

Ahh, constructors, constructors. In my opinion the constructor design in most languages including C++, C#, D and Java is flawed, because it exposes an implementation detail that should not be exposed, namely, which class gets allocated and when.

I offer you as "exhibit L" the Lazy<T> class in the .NET framework. This class's main purpose is to compute a value the first time you access it. For example:
 int x = 0; // must be definitely assigned before the lambda can capture it
 Lazy<int> lazy = new Lazy<int>(() => 7 * x);
 x = 3;
 x = lazy.Value; // 21
 x = lazy.Value; // still 21 (Value is initialized only once)
There is an annoying issue, though: Lazy<T> operates in some different modes, and it also contains a member and extra code to aid debugging. So in addition to holding the value itself and a reference to the initializer delegate, it's got a couple of other member variables and the Value property has to check a couple of things before it returns the value. Thus, Lazy<T> is less efficient than it should be.

So what if, in the future, MS decided to optimize Lazy<T> for its default mode of operation, and factor out other modes into derived class(es)? Well, they can't ever do that. If MS wants to return a LazyThreadSafe<T> object (derived from Lazy<T>) when the user requests the thread-safe mode, they can't do that because all the clients are saying "new Lazy", which can only return the exact class Lazy and nothing else.

MS could add a static function, Lazy.New(...) but it's too late now that the interface is already defined with all those public constructors.
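What they could have done from the start is expose only static factory methods, which are free to return a subclass. A sketch of the familiar factory pattern (LazyThreadSafe<T> is hypothetical):

 public static class Lazy
 {
     // Free to return Lazy<T> itself or anything derived from it:
     public static Lazy<T> New<T>(Func<T> initializer, bool threadSafe)
     {
         return threadSafe
             ? (Lazy<T>)new LazyThreadSafe<T>(initializer) // hypothetical subclass
             : new Lazy<T>(initializer);
     }
 }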

As "exhibit 2", a.k.a. "exhibit Foo", I recently wrote a library where I needed to provide a constructor that does a bunch of initialization work (that may fail) before it actually creates the object. By far the most natural implementation was a static member function:
 // Constructor with dependency injection
 public Foo(A arg1, B arg2, C arg3)
 {
 }
 // Static helper method provides an easier way to create Foo
 public static Foo LoadFrom(string filename, ...)
 {
     ... // do some work
     arg1 = ...; arg2 = ...; arg3 = ...;
     return new Foo(arg1, arg2, arg3);
 }
However, the client didn't like that, and insisted that LoadFrom() should be a constructor "for consistency" with the other constructors. I was able to rearrange things to make LoadFrom() into a constructor, but my code was a little clunkier that way.

So what I'm saying is, when clients directly allocate memory themselves with new(), it constrains the class implementation to work a certain way, and this constraint is unnecessary.

A final problem with constructors is that they can't have names.

For D I suggested a simple solution, which in principle could be added to any language that uses "new". The solution is to treat constructors as equivalent to static methods named "new". So when the user writes "new Foo(...)", this call might go to a real constructor, or it could go to a static method named "new" that returns a Foo. Likewise, if the user writes "Foo.new(...)", this call could go to a static method or to an actual constructor.

This tweak would allow the class Foo to change the implementation of a constructor to return an object derived from Foo, instead of Foo itself. It would also allow Foo to do some work before actually creating a Foo object. Finally, this proposal doesn't allow named constructors per se, but it does provide a similar syntax for named and unnamed creation methods, because the syntax to call an unnamed constructor, "Foo.new(...)" is similar to the syntax to call a named creation method, "Foo.newFromFile(...)".
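In C# terms, the proposal amounts to letting these two shapes be interchangeable behind the "new Foo(...)" syntax (a sketch; I write New because "new" is not a legal method name in C#):

 class Foo
 {
     protected Foo() { }

     // Under the proposal, "new Foo()" could bind here, and this method
     // is free to return a cached object or something derived from Foo:
     public static Foo New() { return new Foo(); }

     // A named creation method gets nearly identical call syntax:
     public static Foo NewFromFile(string filename)
     {
         var foo = new Foo();
         // ... read the file and initialize foo ...
         return foo;
     }
 }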

I don't expect D or any other language to accept my proposal, but it's a good idea so I'm blogging it anyway. In .NET, this feature would be more practical if .NET compilers were to automatically create a static "new" method for every real constructor, and to prefer to emit calls to the static method (if present) instead of the real constructor. This rule would provide for binary compatibility, which improves upon the source-level compatibility provided by my syntax proposal.

Tuesday, June 19, 2012

NDK Docs

Google has not placed its NDK documentation online. Consequently, any NDK-related Google searches that might be answered by the NDK documentation will come up empty. I'm tired of this, so I've decided to put up some of these docs, at least the ones that are not mirrored elsewhere already.

Friday, June 15, 2012

Android ListView: Make Selection Stay

The behavior of Android's ListView is surprising: when the user clicks an item, it doesn't stay selected! It looks selected for a brief instant and then fades away.

Apparently the "disappearing selection" is by design; it's something called "touch mode". I read through that document and still I have no idea why they thought it was a good idea. My guess is that, since Android was originally designed for small-screen devices, they expected that you would fill the screen with a list and then, when the user clicks an item, move to a new list on a different screen. Thus, the user wouldn't be aware that Android lost track of the selected item.

But this behavior is quite annoying if, for example, you want the user to select an item and then show information about that item on the same screen. If the selection disappears, how is the user supposed to know what they clicked (assuming of course that users have the attention span of a goldfish)?

One possible solution is to change all the list items into radio buttons. I don't really like that solution because it wastes screen real estate. I'd rather just use the background color to show which item is selected. I have seen one solution so far but it is not quite complete or general. So here's my solution:

1. In your XML layout file,

Go to your ListView element and add the following attribute: android:choiceMode="singleChoice". I'm not entirely sure what this does (by itself, it doesn't allow the user to select anything) but without this attribute, the code below doesn't work.

2. Define the following class.

It is used to keep track of the selected item, and also allows you to simulate pass-by-reference in Java:

public class IntHolder {
    public int value;
    public IntHolder() {}
    public IntHolder(int v) { value = v; }
}

3. Put the following code somewhere

I'll assume you put it in your Activity, but it could go in any class really:
static void setListItems(Context context, AdapterView listView, List<CharSequence> listItems, final IntHolder selectedPosition)
{
    setListItems(context, listView, listItems, selectedPosition, 
                 android.R.layout.simple_list_item_1, 
                 android.R.layout.simple_spinner_dropdown_item);
}
static void setListItems(Context context, AdapterView listView, List<CharSequence> listItems, final IntHolder selectedPosition, 
                         int list_item_id, int dropdown_id)
{
    listView.setOnItemClickListener(new AdapterView.OnItemClickListener() {
        public void onItemClick(AdapterView<?> list, View lv, int position, long id) {
            selectedPosition.value = position;
        }
    });
    ArrayAdapter<CharSequence> adapter = new ArrayAdapter<CharSequence>(context, list_item_id, listItems) { 
        @Override
        public View getView(int position, View convertView, ViewGroup parent) {
            View itemView = super.getView(position, convertView, parent);
            if (selectedPosition.value == position)
                itemView.setBackgroundColor(0xA0FF8000); // orange
            else
                itemView.setBackgroundColor(Color.TRANSPARENT);
            return itemView;
        }
    };
    adapter.setDropDownViewResource(dropdown_id);
    listView.setAdapter(adapter);
}

That's all! The above assumes you want single selection. With some small modifications to getView(), you could support multi-selection too, I guess, but you should probably use checkboxes instead.

Warning: this solution needs further development. If the user uses arrow keys or buttons to select an item, that item will not be selected from the IntHolder's perspective. If the user presses the unlabeled button or the Enter key then the item will become "officially" selected, but then you have another problem because if the user uses the arrow keys again, it will sort of look like two items are selected. Leave a comment if you figure out how to keep the "internal selection" in the IntHolder synchronized with the "keyboard selection" or whatever it's called. What is it called, anyway?

Friday, June 8, 2012

The Delightful D Programming Language

D versus C# and C++

I became a D enthusiast yesterday, when I learned how much better it is than C++, and I have been studying D for two days straight out of sheer love. Oh, it's not perfect, but compared to C++? No contest. Ditto for Java. C# was my language of choice as of 3 days ago, but today I think it has moved down a rank.

Not having used D for any serious work yet, I could be mistaken. But D has an answer to every major criticism commonly raised against C++, from slow compilation, to poor type safety, to the headache of maintaining header files. D isn't just an evolutionary improvement; it has innovations that don't exist in any of the world's popular languages*:
  • It is said to have one of the world's fastest compilers
  • You can use try/catch/finally and RAII, but scope(exit) makes exception-safe code easier to read and write
  • You can define closures that the compiler can inline (do any C++11 compilers do that? I'm not sure, I'm stuck on Visual C++ 2008 due to a need to support Windows CE)
  • Garbage Collection is standard but optional, so you can write programs with low-latency guarantees by avoiding GC allocations (but how to manage memory instead? I suspect one could use alias this to make smart pointers a la C++?)
  • Slices and ranges, a collection access mechanism that is far safer than C++ iterators and also far more convenient, no need to repeat yourself as in lower_bound(blobCollection.begin(), blobCollection.end(), blob)
  • Generics are more flexible than in C++ and don't produce pages of error messages
  • Compile-time metaprogramming that vastly outclasses C++ (and obviously C# too)
  • Compile-time reflection which (I hope, but can't confirm) one could use to build a run-time reflection system if one wanted
  • A well-designed, multi-paradigm approach to concurrency with interesting features for both shared-memory and message-passing architectures
  • Built-in support for unit tests
  • Array-wise expressions, e.g. a[] = (b[] + c[]) / 2  (MATLAB does this more tersely, but among general-purpose languages this kind of feature is rare)
  • Superior floating-point features (e.g. nextUp()/nextDown()/ulp(), hex floats, control of hardware exceptions)
* (Other less-popular languages have some of these features, but certainly not all of them)

And since D is so similar to C++ and C#, you wouldn't have to spend a lot of time learning it, so why not? Plus, it shouldn't be that hard to port small programs and libraries from C++.

Admittedly, I'm not happy with everything. They are still catering to the C crowd, so you still have to fill your "switch" statements with "breaks", the "static" keyword is confusingly overused, lambdas require braces and a "return" statement (cf. C#'s quick x => x+1 syntax), all functions and try/catches require braces, pass-by-reference is implicit at the call site, the operator precedence of (&, |, ^) is foolish, there's no pattern matching or algebraic data types (but at least there are tuples), if statements are not expressions... still, what D offers is too good to pass up.

But of course, while the D language is clearly terrific, and the standard library has apparently matured, the surrounding tools might not be so good: IDEs, support for smartphone platforms, etc. The only IDE I tried, Visual D (IDE plugin for Visual Studio) works pretty well, including debugging which seems to work as well as the Visual C++ debugger, and which can step into the standard library (fun!). However, Code Completion doesn't work very well yet.

Compared to C#, D is better in most areas but seems weak when it comes to dynamic linking and reflection. For example, an IDE written in .NET or Java could easily have a plug-in system, but I'm not so sure about D. .NET also offers runtime code generation while D does not. However, a research compiler exists to compile D to .NET code. Given that C++/CLI already compiles to .NET, perhaps someday one will be able to use D equally well for managed and native code (with a small performance hit in managed land, of course.)

Interoperability with C/C++ and .NET are pretty decent. D is supposed to interoperate with C++ functions and singly-inherited classes via extern (C++) and C++ name mangling (but which compiler's name mangling?), while you can easily create COM interfaces callable from .NET and other languages.

Still don't know D? Go forth and learn it! Note: okay, to be honest the documentation isn't that great. It really helps to read The D Programming Language.

Thursday, June 7, 2012

Smart Tabs

Man, I sure wish programming text editors would support those smart tabs known as "elastic tabstops".


Use Visual Studio? Vote for elastic tabs.

Thursday, April 5, 2012

Shouldn't you learn that new Microsoft API?

I recently responded to someone trying to justify learning all those newfangled Microsoft APIs. It sounded like a blog entry so I'm copy-pasting it here.

While you should definitely learn some of the new MS technologies, my recent experience learning about things like WPF and WCF has made me a bit more cautious about learning the newest MS APIs.

Now, LINQ is a major boon to productivity that you should definitely be using; the nice thing is that you can introduce LINQ-to-objects piecemeal in all sorts of random situations, with or without the actual query syntax (I commonly call `Where()` without the `from-select` syntax, as it is often shorter.) The mathematical dual of LINQ, Reactive Extensions, is something you should be aware of, although I am still struggling to find a really good use case. Likewise, all features of C# 3/4/5 are useful so you should study them and watch for places where they will be useful, even as you continue using the "old" BCL stuff.
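For example, these two lines are equivalent, and the second is shorter (orders and Total are placeholder names):

 var big1 = from o in orders where o.Total > 1000 select o;
 var big2 = orders.Where(o => o.Total > 1000);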

However, I'm going to take devil's advocate and suggest that the very newest and biggest MS libraries, such as WCF and WPF are not necessarily worth learning at all.

The primary reason is that they are huge, and not especially well-designed (the former being a symptom of the latter). I briefly blogged recently about why WPF sucks. As for WCF, the whitepaper makes it sound like it will easily interoperate nicely with "Java EE server running on a non-Windows system" and "Partner applications running on a variety of platforms", but the truth is that WCF APIs are very specifically SOAP-oriented and have very limited support for non-SOAP protocols. MS could have easily designed a general system that allows pluggable protocols, and maybe the capability is hidden (undocumented) in there somewhere, but as far as I can tell they chose to design a much more limited system that can only do SOAP and limited HTTP (as long as your message body is a serialized .NET object, IIRC). I researched Entity Framework more briefly, but noticed some complaints that it was incapable of supporting some scenarios that the (much simpler) LINQ-to-SQL can handle out-of-the-box.

IMO the designs of all of these libraries are fundamentally flawed, because they use a lot of components that are tightly coupled to each other; a dependency graph of the classes in each framework would probably be huge and look like a jumbled mess of scribbly lines. And even if the design were good, we would not be able to tell, because there aren't public architectural documents that delve into the lower-level details, and the MSDN docs for the most part aren't very good (they tend to get less and less helpful as you look at lower and lower-level classes.)

The sheer size of the libraries also seems like a flaw; I have learned over 20 years of programming that simplicity is a virtue, one that Microsoft has never valued.

But you might ask, "so what"? Well, with such big libraries you might never feel like you really understand them. That means that when you want to do something outside the use cases for which Microsoft specifically designed WCF/WPF/EF, you're not going to know how, and it is possible that no one outside Redmond will know how, either. And when something goes wrong, you will have a hard time figuring out what went wrong. And 15 years from now when Microsoft has moved on to their next next-generation API, no one is going to enjoy maintaining software built on a foundation that is so poorly-understood.

Also because of the largeness and complexity of these new APIs, the cross-platform alternative to .NET, Mono, has poor support or no support for them. You'll probably have little difficulty using your typed data tables on Linux or Mac, but Entity Framework? Forget about it. I wouldn't be surprised if Mono never supports it.

I have used LINQ-to-SQL in a new project and it's not bad. In some ways it could be better, but I think the developer experience is substantially better than ADO.NET. One significant limitation: L2S is easiest by far if you modify tables in "connected" mode, unlike the old ADO.NET which was specifically designed to work without an active database connection. Anyway, since LINQ-to-SQL is (according to Mono people) a quarter the size of Entity Framework, it's a shame that MS decided to quit working on it.