Const when you need it

Infernu uses row-type polymorphism to propagate read/write capabilities on record fields. Using row-type polymorphism to describe more than just which fields are present bears a vague resemblance to polymorphic constraints.

In C, a pointer to a field in a const struct is automatically const’ed:

struct foo { int field; };

void useInt(const int *);

int main(void) {
    const struct foo x;
    useInt(&x.field); // no warnings because &x.field is 'const int *'
    return 0;
}

Thus, a function that extracts a pointer to a (possibly deep) field from a const struct will also return a const pointer:

const int *getField(const struct foo *x) {
    return &x->field;
}

(All the code compiles with `-Wall` and `-Wextra`)

But what if I want to use `getField` on a non-const struct, to get an accessor to a field within it? It almost works:

struct foo y;
int *bla = getField(&y);
*bla = 2;

Uh oh. We get a warning:

warning: initialization discards ‘const’ qualifier from pointer target type [enabled by default]
     int *bla = getField(&y);
                ^

The compiler is angry because `int *bla` should be `const int *bla`. But we don’t want that! We just want to get an accessor – a writable accessor – to a field in our non-const struct value.

C++ (not C) does have a non-solution: const_cast. That isn’t what we want: it’s unsafe. What we want is this: if a function isn’t given a const struct, the ‘non-constness’ should propagate to the returned field accessor (and vice versa: if the given struct was const, so should the accessor be).

In fancier words, we need const polymorphism, which I imagine would be written with a ‘constness type variable’ C, in made-up syntax like this:

const<C> int *getField(const<C> struct foo *x) {
    return &x->field;
}

And then we would expect this to compile with no problems:

    struct foo y;
    int *bla = getField(&y);

…because ‘y’ is not const, so the pointer returned from getField is not pointing at a const.

Unfortunately, C has no such thing. We could represent this in a type system in a number of ways. One simple way is to say that constness is a constraint on a type (using something like Haskell’s type classes). Another way is to make ‘write into a field’ a kind of capability that’s part of the type.
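
To get a feel for it, here’s a minimal Haskell sketch (my own, not from any real compiler) where the constness variable C becomes a phantom type parameter that flows from the struct pointer to the field pointer:

{-# LANGUAGE DataKinds, KindSignatures #-}

-- The 'constness type variable' encoded as a phantom type parameter.
data Mutability = Const | Mutable

-- A stand-in for a pointer, tagged with its mutability.
newtype Ptr (m :: Mutability) a = Ptr a

newtype Foo = Foo { fooField :: Int }

-- The struct's constness 'm' propagates to the returned field pointer,
-- which is exactly what the C version could not express.
getFieldPtr :: Ptr m Foo -> Ptr m Int
getFieldPtr (Ptr foo) = Ptr (fooField foo)

-- Writing requires the Mutable capability; a Ptr 'Const Int is rejected.
writePtr :: Ptr 'Mutable a -> a -> Ptr 'Mutable a
writePtr _ x = Ptr x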

The second, write-capability approach is what I use in Infernu. Here there are no structs (it’s JavaScript), but there are polymorphic records. The type system includes two flavors for each field label: Get and Set. If a field is only being read, the record (or object, or row) that contains it only needs to have the ‘Get’ accessor for that field. Here’s Infernu’s output for a simple example:

//  obj :  { subObj:  { val: Number } }
var obj = { subObj: { val: 3 } };

Our object is simple. The comment is what Infernu infers: a reasonably simple type.

In the notation I (shamelessly) invented, read-only fields have a prefix ‘get’ in the type, and read/write fields don’t have any prefix. So a read-only field ‘bla’ would be: { get bla : t }. If ‘bla’ is required to be writable, the type is written as { bla : t }. So in the above ‘obj’ example, we see that literal objects are by default inferred to be writable (type annotations would allow you to control that).

Next let’s make a function that only reads ‘subObj’:

//       readSubObj : ( { get subObj: h | g} -> h)
function readSubObj(x) { return x.subObj; }

The inferred type says: readSubObj is a function that takes an object with a readable field subObj (hence “get subObj” – it doesn’t require the ‘Set’ capability!). subObj has some type ‘h’, and the function returns that same type, ‘h’. (By the way, the ‘| g‘ means the passed object is allowed to contain other fields too; we don’t care about them.)

Example of a nested read:

//       readVal : ( { get subObj:  { get val: d | c} | b} -> d)
function readVal(x) { return x.subObj.val; }

Now we need to ‘get subObj’, but subObj itself is an object with a readable field ‘val’ of some type d. The function returns a ‘d’.

We can use readSubObj on a writable object with no problems:

//  sub :  { val: Number }
var sub = readSubObj(obj);

When Infernu supports type annotations (eventually), one could take advantage of this type-system feature by marking certain fields ‘get’.

While this isn’t exactly the same as the problem we discussed with C const pointers, the same idea could be used to implement polymorphic constness.

The main idea here is that ideas from row-type polymorphism can be used to implement a certain kind of ‘capabilities’ over types, constraints that are propagated. This may be a nicer way to implement (some kind of) polymorphic constraints.

(For example, in a language that supports row extension/reduction, a function { x : Int | r } -> { | r } would retain the unspecified constraints from ‘r’. I’m sure there are more interesting examples.)

If you can refer me to something like this, please do!


infernu news

In the past month, most of the work on infernu went into stabilizing the type system and making decisions about some trade-offs. Here’s a short summary:

  • Colors! I refactored all the pretty printing code to use the ansi-wl-pprint package, which fixes indentation and formatting issues and also adds ansi colors to the output. One neat thing is that identical type variables have the same color:
    [screenshot: infernu’s colorized type output]
  • Changed the way constructor functions are treated. Until now, constructor functions were treated at definition site as regular functions. The difference between constructors and non-constructors only happened at “new” calls, where the expected “this” value was forced to be a closed row type. Unfortunately this breaks if you call “new Foo()” inside the definition of “Foo”.
    To avoid this issue, functions with uppercase names are now treated specially, and the “this” row-type is forced to be closed when the function name is added to the environment. This allows maximum flexibility while defining Foo, yet ensures Foo’s type is closed outside the constructor, preventing junk code like var x = new Foo(); x.myWrongProperty = 2;
  • Explored the idea of “Maybe” (or “optional”) types, including having common JS APIs use them for stronger safety. For example, array access should return a Maybe value.
    Unfortunately, since there is no syntax to construct a Maybe-typed value (no “Just”), all values can be implicitly “upgraded” to Maybe. In other words, there is an ambiguity that breaks type inference. So for now, Maybe types are not implemented (perhaps they can come back with explicit annotations). For contrast, see the Haskell sketch just after this list.
  • Decided to disable generalization of row fields (object properties). This decision means that user-defined objects will by default not have polymorphic methods, although the object itself could be polymorphic (and thus different occurrences of the object in the program syntax will allow instantiation to different types). The reason for this decision is that overly-polymorphic fields cause unexpected type errors, such as when passing objects that contain them as parameters to functions (contravariance can surprise you).
  • Started a document sketching out denotational semantics of JS, as infernu decides they should be, which helped clarify a few issues in the JS -> LC translator. The next step is to change all translations to preserve semantics; currently they only preserve types.
  • Bug fixes: polymorphic subsumption checking, unification of recursive types.
  • Increased compatibility: now using base-compat and a custom prelude to increase compatibility with different GHC versions (thanks to RyanGlScott for submitting a fix to use base-orphans, which prompted me to do this).
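
Here’s the contrast promised in the Maybe bullet above: in Haskell (Infernu’s implementation language) the construction of a Maybe value is explicit, so no ambiguity arises. A minimal sketch:

-- 'Just' is explicit, so a value of type t is never silently also a
-- Maybe t; array access can return a Maybe with no inference ambiguity.
safeIndex :: [a] -> Int -> Maybe a
safeIndex xs i
  | i >= 0 && i < length xs = Just (xs !! i)
  | otherwise               = Nothing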

A simple problem with recursive types and subtyping

Here’s a simple example of recursive types interacting badly with subtyping:

T = { foo: T -> A }
U = { foo: U -> B }

Consider T <: U. For the foo fields, this requires:

(T -> A) <: (U -> B)

which, by contravariance of function arguments, implies:

U <: T

So T <: U but also U <: T, which is true iff A <: B and B <: A.

In my case, the subtyping relation is polymorphic subsumption: T is subsumed by U iff U is “more polymorphic” – intuitively, if it can be instantiated to all types that T can.

This situation arises in rather simple code, involving polymorphic vs. non-polymorphic row fields. For example, A is a row with a polymorphic method, whereas B is a row with a monomorphic (but compatible) method, such as:

A = { method: forall a. a -> () }
B = { method: String -> () }

In this case subsumption (the form of subtyping in play) fails.

One way around this is to avoid subsumption issues altogether by keeping things rank-1, and not using higher-rank row fields. Unfortunately, throwing away polymorphic methods is very bad: consider a non-polymorphic array.map (in JS).

A slightly better workaround is to push the foralls all the way out, keeping all types (including row types) rank-1. Every time an object method is accessed via the property syntax obj.method, we end up instantiating the object’s row type, and get a “fresh” type for the method. We get practically polymorphic methods. That’s the approach I’m investigating for infernu.
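
To illustrate (my own Haskell sketch, not Infernu code), compare a higher-rank record field, the kind that triggers the subsumption trouble above, with the rank-1 workaround where the foralls are pushed out:

{-# LANGUAGE RankNTypes #-}

-- Higher-rank: the field itself is polymorphic. Rows like this are what
-- make the subsumption check fail in the recursive case above.
newtype ArrayHR = ArrayHR { mapHR :: forall a b. [a] -> (a -> b) -> [b] }

-- Rank-1 workaround: the foralls live outside the record type.
newtype ArrayR1 a b = ArrayR1 { mapR1 :: [a] -> (a -> b) -> [b] }

mkArray :: ArrayR1 a b
mkArray = ArrayR1 { mapR1 = \xs f -> map f xs }

-- Each access to mapR1 instantiates 'a' and 'b' freshly, giving
-- practically-polymorphic methods.
demo :: ([Int], [String])
demo = ( mapR1 mkArray [1, 2, 3] (+ 1)
       , mapR1 mkArray [1, 2, 3] show )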


Identity Crisis

Compared to other tools adding static types to JavaScript, Infernu’s main strengths are full type inference and strong type safety. Here are a few examples.

Identity Function

Here is the simplest possible function:

function id(x) {
  return x;
}

TypeScript

TypeScript‘s compiler tsc can generate a .d.ts from that:

declare function id(x: any): any;

That’s no good! We can pass any type we want to our function, but the return type is not tied to the argument type – it is treated as anything, basically untyped. So we can do this:

var n = 'hi'; n = id(5);

And TypeScript will output the following .d.ts:

declare var n: string;

That seems wrong: n is assigned a number via id(5). But wait – there is a way to turn off inference of any types (with --noImplicitAny). If we try that on our identity function, we get:

id.ts(1,13): error TS7006: Parameter 'x' implicitly has an 'any' type.

Explicit Generics

Oops. Ok, but TypeScript has generics! Let’s try that: the TypeScript handbook gives exactly the example we need – we just write the type out explicitly, like so:

function identity<T>(arg: T): T {
    return arg;
}

Great! We got what we needed, but without type inference.

Flow

Facebook’s Flow has a type system that’s (slightly?) different from TypeScript’s, and apparently works differently. Let’s try it. We can use the flow suggest command to get suggested typing (I’m using version 0.7). Here’s what we get for a single file containing only the identity function above: nothing. It doesn’t suggest any type. Ok, let’s try using our id in a way that makes no sense, to induce an error (after all, type checkers are used to find errors). Here’s bad_id.js:

/* @flow */
function id(x) { return x;}
var n = 'hi'; n = id(5);
var z = n; // added so we can see what flow says the type of n is here.

(Note: The /* @flow */ header is used to tell flow that it should look at this file.)
Run flow suggest bad_id.js and you get a diff-style output. I’ve ‘applied’ it to make it easier to read – here’s what flow suggests:

function id(x: number) : number{ return x;}
var n: string | number = 'hi'; n = id(5);
var z: number = n;

Interesting! We managed to get something without reverting to explicit type annotations. But we didn’t get an error!

First, id was inferred to take and return number, apparently because that’s the only way we’ve used it. It would be interesting to see what happens when we use id several times with different types – we’ll try that soon.

Second, n was given a union type string | number, because it takes on both types during its lifetime. It may be a matter of taste, but I would rather not have the type checker deduce implicit union types in this case (n = 'hi'; n = 5;) – instead I would expect that to be an error.

The unique (and impressive) part is that flow is able to tell that z is only ever going to have number values, and so it’s safe to assign it that type. That’s probably a result of the flow analysis they do.

Now let’s try calling id several times, with different types:

/* @flow */
function id(x) { return x;}
id(5); id('hi');

Flow suggests:

function id(x: string | number) : string | number{ return x;}

Uh oh – does this mean the argument and result types are no longer tied to each other? If I pass in a number, will the compiler check that I use the result only as a number (and not as a string)? Let’s try using it, doing var n = id(5); flow suggests:

var n: string | number = id(5);

Despite n only ever being assigned a number, it now has type string | number. So apparently, union types propagate implicitly, infecting everything on the way.

Explicit Generics

Fortunately, flow too has generics, and again straight out of the manual we find:

/* @flow */
function foo<X>(x: X): X { return x; }

Great! We got what we needed, but without type inference.

Infernu

Let’s get down to business. Infernu says:

//       id : a.(b -> b)
function id(x) { return x; }

Cool! Without any help from us, Infernu figured out the most generic type. Take a type b, return the same type b. The magic of polymo…Wait, what’s that a. thing?

Well, JavaScript has this nice keyword called this which is dynamically scoped, meaning that this is bound to different things depending on how your function is invoked and not on how it’s defined. For example:

var obj = { hat: { type: 'top' }, getHatType: function() { return this.hat.type; } };
obj.getHatType(); // ok.
var f = obj.getHatType;
f(); // oops! TypeError: Cannot read property 'type' of undefined

Nasty stuff. Every JavaScript programmer should know this.

Fortunately, Infernu is here to save you. It infers not only what arguments and return types a function has, but also what this must be for the function call to work, and verifies that you use it correctly.

Infernu type signatures for functions have the following format (subject to change without notice, bla bla bla):

this.(arguments -> result)

So for our var f = obj.getHatType example, Infernu says:

//  f : {hat: {type: d, ..f}, ..e}.(() -> d)
var f = obj.getHatType;

Decomposing that type signature, we read that this is expected to be an object containing at least a property called ‘hat’ which is an object with at least a property called ‘type’ of some type d. The function takes no arguments (hence the empty ()) and returns the same type d that was taken from the hat.type property of this. (The ellipsis stuff ..f and ..e is due to row-type polymorphism, which will be elaborated upon in a future blog post.)

Back to our identity function, we examine the signature again: a.(b -> b) – the type of this is given an unconstrained type parameter a – so Infernu is telling us explicitly that this is allowed to be anything for our identity function. Yippy!

Summary

We saw that both TypeScript and Flow (and Google Closure too, which I haven’t shown) support generics that can express the identity function properly. They also offer weak forms of type inference that sometimes yield weakly-typed results. Infernu, on the other hand, will infer generic types automatically, and prefers to fail over giving weak typings.

There are many known discussions about subtyping (inheritance)-based type systems, represented here by TypeScript and Flow, vs. parametric polymorphism (represented here by Infernu). There are known pros and cons to both sides, but one important result is that type inference is just easier when there is no subtyping involved.

Infernu is designed to take advantage of that.


Ad-hoc polymorphism, type families, ambiguous types, and Javascript

Imagine a language, where all we have are strings and numbers, and where + is a built-in function that can either add numbers or concatenate strings. Consider the following code:

a + b

What are the types of a, b and +?

Using a Haskell-like type signature we can say that + has either of these types:

+ :: (Number, Number) -> Number

or

+ :: (String, String) -> String

(Currying avoided intentionally.)

This is a classic case of ad-hoc polymorphism. With type classes one could say:

class Plus a where
  (+) :: (a, a) -> a

instance Plus Number where
  x + y = ... implement addition ...

instance Plus String where
  x + y = ... implement string concatenation ...

That’s great! Now we can type our a + b:

a + b :: Plus t => t

Where a :: t and b :: t.
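
As a sanity check, this transliterates to runnable Haskell (my rendering: Int stands in for Number, and the operator is named plus to avoid clashing with Prelude’s curried +):

{-# LANGUAGE FlexibleInstances #-}

class Plus a where
  plus :: (a, a) -> a

instance Plus Int where
  plus (x, y) = x + y

instance Plus String where
  plus (x, y) = x ++ y

main :: IO ()
main = do
  print (plus (1 :: Int, 2))     -- 3
  putStrLn (plus ("con", "cat")) -- "concat"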

Yes, there are also Scala-style implicits (recently discussed here in a proposal for OCaml), and probably other solutions I’m less aware of.

Notice that in the + example, a constraint on a single type (expressed through type class requirements) was enough to solve the problem.

Constraints on multiple types

Now, let’s look at a more complicated problem, involving multiple types at once.

Consider a language with only parameterized lists [a] and maps from strings to some type, Map a. Now, throw in a terribly ambiguous syntax where brackets denote either accessing a list at some index, or accessing a map at some key (wat):

   x[i]

That syntax means “access the container x at index/key i”. Now, what are the types of x, i and the brackets? There are two possibilities: if x is an array, then i must be a number; otherwise if x is a map, then i is a string.

Type families can be used to encode the above constraints:

class Indexable a where
  type Index a
  type Result a
  atIndex :: (a, Index a) -> Result a

The syntax means that any instance of the type class Indexable a “comes with” two accompanying types, Index a and Result a which are uniquely determined by the appropriate choice of a. For [t], Index = Number and Result = t. For Map t, Index = String and Result = t.

Now we just need syntax sugar where x[i] = x `atIndex` i. We could then define instances for our two types, [a] and Map a (remember, in our language the map keys are always Strings):

instance Indexable [a] where
  type Index [a] = Number
  type Result [a] = a
  atIndex = ... implement list lookup by index ...

instance Indexable (Map a) where
  type Index (Map a) = String
  type Result (Map a) = a
  atIndex = ... implement map lookup by key ...

Nice. Now, to type our original expression x[i]:

x[i] :: Indexable a => Result a

Where x :: a and i :: Index a.
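
A runnable Haskell rendering of this sketch (mine: Int stands in for Number, Data.Map for the string-keyed map; the implementation linked below is the fuller version):

{-# LANGUAGE TypeFamilies, FlexibleInstances #-}

import qualified Data.Map as Map

class Indexable a where
  type Index a
  type Result a
  atIndex :: (a, Index a) -> Result a

instance Indexable [t] where
  type Index [t] = Int
  type Result [t] = t
  atIndex (xs, i) = xs !! i

-- Keys are always strings in our toy language, so the instance fixes them.
instance Indexable (Map.Map String t) where
  type Index (Map.Map String t) = String
  type Result (Map.Map String t) = t
  atIndex (m, k) = m Map.! k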

Great! Type families (or rather, “type dependencies”) provide a way to represent inter-type constraints and can be used to resolve ambiguous expressions during type inference. (I’ve heard that type families are equivalent to functional dependencies or even GADTs for some value of “equivalent”, maybe where “equivalent = not equivalent at all”, but that’s off-topic.) See also a valid Haskell implementation of the above example (thanks to Eyal Lotem).

I don’t know if Scala-style implicits can be used to such effect – let me know if you do.

Ambiguous Types

Now, here’s an altogether different way to approach the ad-hoc polymorphism problem. This was my idea for a solution to ad-hoc polymorphism with inter-type constraints, before I realized type families could do that too.

Define an “ambiguous type” as a type that represents a disjunction between a set of known types. For example, for + we could say:

+ :: (a = (String | Number)) => (a, a) -> a

The type variable a is constrained to be either a String or a Number, explicitly, without a type class. I see a few main differences with type classes, from a usage point of view. First, this type is closed: because there is no class, you can’t define new instances. It must be a Number or a String. Second, you don’t need to add + to some class, and if we have more operators that require a similar constraint, or even user-defined functions that require some kind of ad-hoc constraint, we don’t need to define a type class and add functions/operators to it. Lastly, the type is straightforward, and is easier to explain to people familiar with types but not with type classes.

By the way, I have no idea (yet) if this leads to a sound type system.

Ambiguous types involving multiple type variables

Let’s continue to our more complicated, ambiguous “atIndex” function that can either index into lists (using a number) or into maps (using a string). A simple disjunction (or, |) is not enough: we must express the dependencies between the container type and the index type. To that end we add conjunctions (and, &) so we can express arbitrary predicates on types, such as (t & s) | u. The type of atIndex will now be:

atIndex :: (a = [t] & i = Number) | (a = Map t & i = String) => a -> i -> t

This definitely does the job. Is it sound? Will it blend? I don’t know (yet).

The main drawback of this system is the combinatorial explosion of the predicate when combining ambiguous (overloaded) functions, such as in the following program:

x[i] + y[j]

x could be either a list or map of either numbers or strings, and so can y, so i and j can be either numbers or strings (to index into the lists or maps). We quickly get to a very large predicate expression, that is slow to analyze and more importantly, very hard to read and understand.

Nevertheless, I implemented it.

Implementing ambiguous types

Infernu is a type inference engine for JavaScript. All of the examples described above are “features” of the JavaScript language. One of the goals of Infernu is to infer the safest, most general type. JS expressions such as x[i] + d should be allowed to be used in a way that makes sense. To preserve safety, Infernu doesn’t allow implicit coercions anywhere, including in +, or when indexing into a map (JS objects can act as string-keyed maps). To retain a pseudo-dynamic behavior safely, polymorphism with fancy type constraints, as discussed above, is required.

To properly implement ambiguous types I had to solve a bunch of implementation problems, such as:

  1. How to unify two types that have constraints on them? The problem is one of finding the intersection of two arbitrary boolean expressions. My solution was to convert both equations to DNF, which is just a big top-level OR between many AND (conjunction) expressions (and no further ORs or ANDs). Then compare every combination pair of conjunctions and rule out ones that don’t match. The surviving set of conjunctions is used to build a new DNF expression. (See the sketch just after this list.)
  2. How to simplify the resulting predicate? An optimal solution exists but is NP-hard to compute, and more importantly, I was too lazy to implement it. Instead I ended up using a heuristic constructive approach: while building expressions, I try to find patterns that can be simplified.
  3. A software engineering challenge: when building the constraint unifier, how to design an API that allows effectful unification at the leaf nodes when we are testing conjunctions? I ended up using some Applicative/Traversable magic (but the code isn’t pretty). Rumor has it that lenses may make it much nicer.
  4. How to represent constraints in the types AST? I followed what is allegedly GHC’s approach by defining a wrapper data type, which wraps a type with an accompanying predicate, something like: data QualType t = QualType (Pred t) (Type t). Then, the unification function is left unchanged: when doing inference, I also call the constraint unifier separately, using its result to “annotate” the inferred type with a constraint predicate.
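
A toy sketch of item 1’s DNF intersection (my own simplification; a pure match function stands in for Infernu’s effectful leaf-level unifier):

-- A DNF formula: an OR-list of conjunctions, each an AND-list of atoms.
type Conj atom = [atom]
type DNF atom = [Conj atom]

-- Try every pair of conjunctions; 'match' either merges a pair into a new
-- conjunction or rules it out. The survivors form the new DNF.
intersectDNF :: (Conj atom -> Conj atom -> Maybe (Conj atom))
             -> DNF atom -> DNF atom -> DNF atom
intersectDNF match xs ys =
  [ merged | x <- xs, y <- ys, Just merged <- [match x y] ]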

Conclusions

Apparently, the ambiguous types are not a good solution due to the complex type signatures. I’ll either leave the ambiguous types in (having already implemented them) or just get rid of them and implement type families, which will require another crusade on my code.

I’m still thinking about whether or not type families cover all cases that my ambiguous types can. One example is the “type guard” pattern common in JavaScript:

if (typeof obj == 'string') {
    // here obj should be treated as a string!
}

Can ambiguous types and type families both be used coherently to implement compile-time checking inside type guards? (I haven’t given it much thought – so far I think yes for both.)

Also, since ambiguous types are closed, they may offer some features that type families can’t, such as warnings about invalid or incomplete guards, which can be seen as type pattern matching. Maybe closed type families are a solution: I don’t know much about them yet.

I also don’t know if ambiguous types lead to a sound type system, or whether there are pitfalls I haven’t considered. Remember that these ambiguous types may also interact with many features: parametric polymorphism, row-type polymorphism, the requirement for principal types and full type inference without annotations, etc.

Lastly, I’ve probably re-invented the wheel and somebody has written a bunch of papers in 1932, and there’s some well-accepted wisdom I’ve somehow missed.

Acknowledgement

Thanks to Eyal Lotem for a short, but very fruitful conversation where he showed me how type families may obsolete my idea.
