Sunday, April 28, 2013

static and dynamic dispatch


Cecil is a very cool language from the University of Washington. Cecil has had a big impact on my view of classes, types and dispatching, some of which I’ll discuss here and some of which will be left for a future post. In this post I’ll talk about static and dynamic dispatch, starting with the venerable concept of overloaded operators.

Many languages have overloaded operator symbols. Typically, the symbol + can refer to either an integer operation or a floating-point operation depending on the types of the operands (I’m going to ignore the difference between single-precision and double-precision in this discussion). In the expression x+y, if both x and y are int, then the compiler generates an integer arithmetic instruction. If both are float, then it generates a floating-point arithmetic instruction. Most languages also handle mixed operands; iIf one operand is an int and the other is a float then the compiler generates a floating-point instruction and generates code to convert the int to a float.

There are two ways to look at such operators. Either we can say that there are two different operations with the same operator symbol, or we can say that there is one operation with two different implementations or branches. I’ll use the second terminology because it makes clear that there is a special relationship between the different branches --they aren’t independent things that just happen to have the same name.

Some languages such as C++ also allow you to declare overloaded functions such as:

float power(float,float);
int power(int,int);

Given these declarations, when C++ sees an expression power(x,y) it will chose an implementation based on the signatures of of x and y.

The decision of which branch of a multiply-defined function to use is called dispatch. When dispatching is done at compile time based on the statically-defined signatures of the arguments, then this is called static dispatch.

Dynamic dispatch is possible as well. With dynamic dispatch you choose the branch of the function based on the runtime types of the arguments. Some object-oriented languages such as Cecil have multi-methods rather than overloaded methods. Multi-methods are similar to overloaded methods but they are dynamically dispatched. The choice of which branch to call is based on the run-time types of the arguments rather than the signatures. In a strongly, statically typed language where signatures (the types of expressions) correspond directly to sorts (the types of values) this just means that you choose the branch at compile-time rather than run-time. You pick the same branch either way, you just choose it at a different time.

But in languages where a signatures can match several sorts (including any language with subclassing), this is no longer true. Consider the following example: there are two classes X and Y where Y is a subclass of X. There is a function defined with two branches:

f(a:X);
f(a:Y);

Now consider

x:X = new Y;
f(x);

where x is declared with signature X but signature. The signature of x doesn’t tell you what it’s runtime sort will be because the signature X matches two different sorts: X and Y. At runtime x will actually contain a value of sort Y. So what branch will the compiler call for f(x)? If it uses static dispatching then it will choose the first branch f(a:X) because the statically-declared signature of x is X. If it uses dynamic dispatching then it will choose the second branch f(a:Y) because the dynamic type of x is Y at the point of the call. C++ has static dispatching; Cecil has dynamic dispatching.

Cecil also has optional type declarations. The type declarations are primarily there to provide type safety rather than optimization because in general the declarations do not allow the Cecil compiler to do static dispatching. Cecil has inheritance, and as discussed above, in a language with inheritance (or other ways that a signature can match multiple sorts) static dispatch does not chose the same branches as dynamic dispatch.

Unobtainabol has both dynamicness and verifiability as goals, so it uses dynamic dispatching with optional type declarations like Cecil. However, Cecil treats all parameters symmetrically and treats the message receiver as just another parameter. This violates another goal of Unobtainabol --the principle of least surprise. In a method call like this:

obj.foo(x, y)

doesn’t it look like obj should be treated differently from x and y in deciding what branch gets executed? Doesn’t it look like obj is a more important determiner? I’m not sure exactly how Unobtainabol is asymmetric because branch-choosing algorithms are a complex subject. But I’m pretty sure that I want the message receiver to be a dominating factor in the algorithm because that is what most users will expect.

Unobtainabol has another goal to consider here --low-levelness. The user ought to be able to control whether dynamic or static dispatch is used. For a clue on how to do this, let’s look at the partial support that C++ has for dynamic dispatch. In C++ you can declare a method (i.e.: member function) as virtual. When a method is virtual the compiler uses dynamic dispatch on the message receiver. For example:

struct X {virtual f(int i); virtual f(float i);};
struct Y : public X {virtual f(int i); virtual f(float i);};
...
X* x = new Y;
int i = 0;
x->f(i);

Even though the static type of x is X, the compiler will call the branch of f associated with Y because f is virtual and the dynamic type of x is Y.

Dynamic dispatch only works for the message receiver, x. C++ still uses static dispatching on all of the other arguments (such as i in this example).

Unobtainabol has a similar feature except that it is more flexible, potentially applying to any collection of parameters. Also, the Unobtainabol feature takes dynamic dispatch as the default rather than static dispatch. In Unobtainabol, you can declare a method as static which means that it uses static dispatch on the message receiver. Similarly, for any routine you can declare any collection of arguments as static, meaning that the compiler uses static dispatch on that argument.

Unobtainbol has another goal: predictability. Since dynamic dispatch is the default, a function that uses static dispatch has to be called with the curly-brace syntax: f{x,y}. This syntax is used in Unobtainabol to warn that user that something unusual is going on and he better look at the signature of the function.

As I mentioned above, dispatch algorithms are complex and an algorithm that selects a branch in a way that is human-understandable and human-predictable based on an arbitrary combination of static and dynamic dispatch may be a bit hard to come up with. But since Unobtainabol is a perfect language rather than a real one, this is not a problem; it’s a research topic.

No comments:

Post a Comment