Ad-hoc polymorphism, type families, ambiguous types, and Javascript

Imagine a language, where all we have are strings and numbers, and where + is a built-in function that can either add numbers or concatenate strings. Consider the following code:

a + b

What are the types of a, b and +?

Using a Haskell-like type signature we can say that + has either of these types:

+ :: (Number, Number) -> Number

or

+ :: (String, String) -> String

(Currying avoided intentionally.)

This is a classic case of ad-hoc polymorphism. With type classes one could say:

class Plus a where
  (+) :: (a, a) -> a

instance Plus Number where
  x + y = ... implement addition ...

instance Plus String where
  x + y = ... implement string concatenation ...

That’s great! Now we can type our a + b:

a + b :: Plus t => t

Where a :: t and b :: t.

Yes, there are also Scala-style implicits (recently discussed here in a proposal for OCaml), and probably other solutions I’m less aware of.

Notice that in the + example, a constraint on a single type (expressed through type class requirements) was enough to solve the problem.

Constraints on multiple types

Now, let’s look at a more complicated problem, involving multiple types at once.

Consider a language with only parameterized lists [a] and maps from strings to some type, Map a. Now, throw in a terribly ambiguous syntax where brackets denote either accessing a list at some index, or accessing a map at some key (wat):

   x[i]

That syntax means “access the container x at index/key i”. Now, what are the types of x, i and the brackets? There are two possibilities: if x is an array, then i must be a number; otherwise if x is a map, then i is a string.

Type families can be used to encode the above constraints:

class Indexable a where
  type Index a
  type Result a
  atIndex :: (a, Index a) -> Result a

The syntax means that any instance of the type class Indexable a “comes with” two accompanying types, Index a and Result a which are uniquely determined by the appropriate choice of a. For [t], Index = Number and Result = t. For Map t, Index = String and Result = t.

Now we just need syntax sugar where x[i] = x `atIndex` i. We could then define instances for our two types, [a] and Map a (remember, in our language the map keys are always Strings):

instance Indexable [a] where
  type Index [a] = Number
  type Result [a] = a
  atIndex = ... implement list lookup by index ...

instance Indexable (Map a) where
  type Index (Map String a) = String
  type Result (Map String a) = a
  atIndex = ... implement map lookup by key ...

Nice. Now, to type our original expression x[i]:

x[i] :: Indexable a => Result a

Where x :: a and i :: Index a.

Great! Type families (or rather, “type dependencies”) provide a way to represent inter-type constraints and can be used to resolve ambiguous expressions during type inference. (I’ve heard that type families are equivalent to functional dependencies or even GADTs for some value of “equivalent” , maybe where “equivalent = not equivalent at all”, but that’s off-topic.) See also
a valid Haskell implementation of the above example (thanks to Eyal Lotem).

I don’t know if Scala-style implicits can be used to such effect – let me know if you do.

Ambiguous Types

Now, here’s an altogether different way to approach the ad-hoc polymorphism problem. This was my idea for a solution to ad-hoc polymorphism with inter-type constraints, before I realized type families could do that too.

Define an “ambiguous type” as a type that represents a disjunction between a set of known types. For example, for + we could say:

+ :: (a = (String | Number)) => (a, a) -> a

The type variable a is constrained to be either a String or a Number, explicitly, without a type class. I see two main differences with type classes, from a usage point of view. First, this type is closed: because there is now class, you can’t define new instances. It must be a Number or a String. Second, you don’t need to add + to some class, and if we have more operators that require a similar constiant, or even user-defined functions that require some kind of ad-hoc constraint, we don’t need to define a type class and add functions/operators to it. Lastly, the type is straightforward, and is easier to explain to people familiar with types but not with type classes.

By the way, I have no idea (yet) if this leads to a sound type system.

Ambiguous types involving multiple type variables

Let’s continue to our more complicated, ambiguous “atIndex” function that can either index into lists (using a number) or into maps (using a string). A simple disjunction (or, |) is not enough: we must express the dependencies between the container type and the index type. To that end we add conjunctions (and, &) so we can express arbitrary predicates on types, such as (t & s) | u. The type of atIndex will now be:

atIndex :: (a = [t] & i = Number) | (a = Map t & i = String) => a -> i -> t

This definitely does the job. Is it sound? Will it blend? I don’t know (yet).

The main drawback of this system is the combinatorial explosion of the predicate when combining ambiguous (overloaded) functions, such as in the following program:

x[i] + y[j]

x could be either a list or map of either numbers or strings, and so can y, so i and j can be either numbers or strings (to index into the lists or maps). We quickly get to a very large predicate expression, that is slow to analyze and more importantly, very hard to read and understand.

Nevertheless, I implemented it.

Implementing ambiguous types

Infernu is a type inference engine for JavaScript. All of the examples described above are “features” of the JavaScript language. One of the goals of Infernu is to infer the safest most general type. JS expressions such as x[i] + d should be allowed to be used in a way that makes sense. To preserve safety, Infernu doesn’t allow implicit coercions anywhere, including in +, or when indexing into a map (JS objects can act as string-keyed maps). To retain a pseudo-dynamic behavior safely, polymorphism with fancy type constraints as discussed above are required.

To properly implement ambiguous types I had to solve a bunch of implementation problems, such as:

  1. How to unify two types that have constraints on them? The problem is of finding the intersection of two arbitrary boolean expressions. My solution was to convert both equations to DNF, which is just a big top-level OR between many AND (conjunction) expressions (and no further OR or ANDs). Then compare every combination pair of conjunctions and rule out ones that don’t match. The surviving set of conjunctions is used to build a new DNF expression.
  2. How to simplify the resulting predicate? While an optimal solution exists, it is NP-hard, and more importantly, I was too lazy to implement it. Instead I ended up using a heuristic constructive approach where while building expressions, I try to find patterns that can be simplified.
  3. A software engineering challenge: when building the constraint unifier, how to design an API that allows effectful unification at the leaf nodes when we are testing conjunctions? I ended up using some Applicative/Traversable magic (but the code isn’t pretty). Rumor has it that lenses may make it much nicer.
  4. How to represent constraints in the types AST? I followed what is allegedly GHC’s approach by defining a wrapper data type, which wraps a type with an accompanying predicate, something like: data QualType t = QualType (Pred t) (Type t). Then, the unification function is left unchanged: when doing inference, I also call the constraint unifier separately, using its result to “annotate” the inferred type with a constraint predicate.

Conclusions

Apparently, the ambiguous types are not a good solution due to the complex type signatures. I’ll either leave the ambiguous types in (having already implemented them) or just get rid of them and implement type families, which will require another crusade on my code.

I’m still thinking about whether or not type families cover all cases that my ambiguous types can. One example is the “type guard” pattern common in JavaScript:

if (typeof obj == 'string') {
    .. here obj should be treated as a string! ...
}

Can ambiguous types and type families both be used coherently to implement compile-type checking inside type guards? (Haven’t given it much thought – so far I think yes for both.)

Also, since ambiguous types are closed, they may offer some features that type families can’t, such as warnings about invalid or incomplete guards, which can be seen as type pattern matching. Maybe closed type families are a solution: I don’t know much about them yet.

I also don’t know if ambiguous types lead to a sound type system or are there pitfalls I haven’t considered.
Remember that these ambiguous types may also interact with many features: parameteric polymorphism, row-type polymorphism, the requirement for prinicpal types and full type inference without annotations, etc.

Lastly, I’ve probably re-invented the wheel and somebody has written a bunch of papers in 1932, and there’s some well-accepted wisdom I’ve somehow missed.

Acknowledgement

Thanks to Eyal Lotem for a short, but very fruitful conversation where he showed me how type families may obsolete my idea.

Ad-hoc polymorphism, type families, ambiguous types, and Javascript

Inference of ‘new’ statements, and ambiguous types

To prevent possible name clashes, the name of my type inference engine for JavaScript is now Inferny Infernu . I hope I don’t have to change the name again!

In other news, here is some recent progress:

‘new’ now requires ‘this’ to be a row type

I changed inference of ‘new’ calls, so that the constructed function must have a row type as it’s “this” implicit parameter (nothing else makes sense).

The change to “new” typing made it possible to define the built in String, Number, Boolean type coercion functions in a way that disallows constructing them (e.g. “new String(3)”) which is considered bad practice in JavaScript. The reason is that the constructed values are “boxed” (kind of) so that they don’t equate by reference as normal strings, booleans and numbers do. For example, new String(3) == '3' but at the same time, new String(3) !== '3'.

Ambiguous Types

I added an initial implementation of what I call ambiguous types. These are simple type constraints that limit a type variable to a set of specific types.

The motivation for type constraints is that some JavaScript operators make sense for certain types, but not all types. So having a fully polymorphic type variable would be too weak. For example, the + operator has weird outputs when using all sorts of different input types (NaNNaNNaNNaNNaNNaN….batman!). I would like to constrain + to work only between strings or between numbers.

With the new type constraints, + has the following type:

a = (TNumber | TString) => ((a, a) -> a)

The syntax is reminiscent of Haskell’s type classes, and means: given a type variable “a” that is either a TNumber or a TString, the type of + is: (a, a) -> a

I’m thinking about implementing full-blown type classes, or alternatively, a more powerful form of ambiguous types, to deal with some other more complicated examples.

Inference of ‘new’ statements, and ambiguous types

SJS is now known as Inferno

Inferno is a type inference engine for a safe subset of JS. It was formerly known as SJS.

Happy happy joy joy. The poll results are in. The interwebz has said it’s word, and that word is Inferno.

Inferno can stand for “Inference, Noam!” but who cares.

Here are the top results for those of you waiting for something to compile:

9 inferno
8 Safe JS (current name)
8 infer.js
7 saner.js
6 type.js
6 honest
6 antic (A Noam’s Type Inference and Checker :)
5 sane.js
5 entypen
4 inferity
4 entype
4 confer

 

SJS is now known as Inferno

Ok, Internet, I heard you. I need a new name for the JS type inferer/checker.

I knew a new name is needed for this JS type inferer/checker. Many people have pointed out that sweet.js uses the same short name (sjs). I could just stick with “Safe JS” but not make it short. Or I could pick a new name, to make it less boring.

Help me help you!

Please take this quick poll.

Thanks!

Ok, Internet, I heard you. I need a new name for the JS type inferer/checker.

Introducing SJS, a type inferer and checker for JavaScript (written in Haskell)

TL;DR: SJS is a type inference and checker for JavaScript, in early development. The core inference engine is working, but various features and support for the full browser JS environment and libraries are in the works.

SJS (Haskell source on github) is an ongoing effort to produce a practical tool for statically verifying JavaScript code. The type system is designed to support a safe subset of JS, not a superset of JS. That is, sometimes, otherwise valid JS code will not pass type checking with SJS. The reason for not allowing the full dynamic behavior of JS, is to guarantee more safety and (as a bonus) allows fully unambiguous type inference.

The project is still in early development, but the core inference engine is more or less feature complete and the main thing that’s missing is support for all of JS’s builtin functions / methods and those that are present in a browser environment.

Compare to:

  • Google Closure Compiler, whose primary goal is “making JavaScript download and run faster”, but also has a pretty complex type-annotation centric type-checking feature. The type system is rather Java-like, with “shallow” or local type inference. Generics are supported at a very basic level. I should write a blog post about the features and limitations of closure. It’s a very stable project in production use at Google for several years now. I’ve used it myself on a few production projects. Written in Java. They seem to be working on a new type inference engine, but I don’t know what features it will have.
  • Facebook Flow, which was announced a few weeks ago (just as I was putting a finishing touch on my core type checker code!), has a much more advanced (compared to closure) type checker, and seems to be based on data flow analysis. I haven’t gotten around to exploring what exactly flow does, but it seems to be much closer in design to SJS, and obviously as a project has many more resources. There are certain differences in the way flow infers types, I’ll explore those in the near future.
  • TypeScript: a superset of JS that translates into plain JS. Being a superset of JS means that it includes all of the awful parts of JS! I’ve asked about disabling those bad features a while back (around version 0.9); from what I’ve checked, version 1.4 still seems to include them.
  • Other something-to-JS languages, such as PureScript, Roy, Haste, and GHCJS (a full Haskell to JS compiler). These all have various advantages. SJS is aimed at being able to run the code you wrote in plain JS. There are many cases where this is either desired or required.

Of all existing tools, Flow seems to be the closest to what I aim to achieve with SJS. However, SJS supports type system features such as polymorphism which are not currently supported by Flow. On the other hand, Flow has Facebook behind it, and will surely evolve in the near future.

Closure seems to be designed for adapting an existing JS code base. They include features such as implicit union types and/or a dynamic “any” type, and as far as I know don’t infer polymorphic types. The fundamental difference between SJS and some of the alternatives is that I’ve designed SJS for more safety, by supporting a (wide) subset of JS and disallowing certain dynamic typing idioms, such as assigning values of different types to the same variable (in the future this may be relaxed a bit when the types are used in different scopes, but that’s a whole other story).

Ok, let’s get down to the good stuff:

Features:

  • Full type inference: no type annotations necessary.
  • Parametric polymorphism (aka “generics”), based on Hindley-Milner type inference.
  • Row-type polymorphism, otherwise known as “static duck typing”.
  • Recursive types for true representation of object-oriented methods.
  • Correct handling of JS’s this dynamic scoping rules.

Support for type annotations for specifically constraining or for documentation is planned.

Polymorphism is value restricted, ML-style.

Equi-recursive types are constrained to at least include a row type in the recursion to prevent inference of evil recursive types.

Examples

Note: An ongoing goal is to improve readability of type signatures and error messages.

Basic

JavaScript:

var num = 2;
var arrNums = [num, num];

SJS infers (for arrNums):

[TNumber]

That is, an array of numbers.

Objects:

var obj = { something: 'hi', value: num };

Inferred type:

{something: TString, value: TNumber}

That is, an object with two properties: ‘something’, of type string, and ‘value’ of type number.

Functions and this

In JS, this is one truly awful part. this is a dynamically scoped variable that takes on values depending on how the current function was invoked. SJS knows about this (pun intended) and infers types for functions indicating what this must be.

For example:

function useThisData() {
    return this.data + 3;
}

SJS infers:

(this: {data: TNumber, ..l} -> TNumber)

In words: a function which expects this to be an object with at least one property, “data” of type number. It returns a number.

If we call a function that needs this incorrectly, SJS will be angry:

> useThisData();
Error: Could not unify: {data: TNumber, ..a} with TUndefined

Because we called useThisData without a preceding object property access (e.g. obj.useThisData), it will get undefined for this. SJS is telling us that our expected type for this is not unifiable with the type undefined.

Polymorphism

Given the following function:

function makeData(x) {
    return {data: x};
}

SJS infer the following type:

((this: a, b) -> {data: b})

In words: A function that takes anything for its this, and an argument of any type, call it b. It returns an object containing a single field, data of the same type b as the argument.

Row-type polymorphism (static duck typing)

Given the following function:

function getData(obj) {
    return obj.data;
}

SJS infers:

((this: h, {data: i, ..j}) -> i)

In words: a function taking any type for this, and a parameter that contains at least one property, named “data” that has some type i (could be any type). The function returns the same type i as the data property.

SJS is an ongoing project – I hope to blog about specific implementation concerns or type system features soon.

Introducing SJS, a type inferer and checker for JavaScript (written in Haskell)

xml-to-json – new version released, constant memory usage

I’ve released a new version (1.0.0) of xml-to-json, which aims to solve memory issues encountered when converting large XML files. The new version includes two executables: the regular (aka “classic”) version, xml-to-json, which includes the various features, and the newly added executable xml-to-json-fast, which runs with constant memory usage and can process files of arbitrary size. It does this by not validating the input xml, and by basically streaming json output as it encounters xml elements (tags) in the input. The implementation takes advantage of the cool tagsoup library for processing XML.

Check the README.md for more details. Hackage is updated.

xml-to-json – new version released, constant memory usage

Generate Javascript classes for your .NET types

We open-sourced another library: ClosureExterns.NET (on github and nuget). It generates Javascript classes from .NET-based backend types, to preserve type “safety” (as safe as Javascript gets) across front- and backend. As a bonus you get Google closure annotations. The type annotations are understood by WebStorm (and other editors) and improve your development experience. Also, if you use Google Closure to compile or verify your code, it will take these types into account. We use it extensively with C#. We haven’t tried it with F#, but it’s supposed to work with any .NET type.

ClosureExterns.NET makes it easier to keep your frontend models in sync with your backend. The output is customizable – you can change several aspects of the generated code. For example you can change the constructor function definition, to support inheritance from some other Javascript function. For more details see ClosureExternOptions.

Getting Started

First, install it. Using nuget, install the package ClosureExterns.

Then, expose a method that generates your externs. For example, a console application:

public static class Program
{
    public static void Main()
    {
        var types = ClosureExternsGenerator.GetTypesInNamespace(typeof(MyNamespace.MyType));
        var output = ClosureExternsGenerator.Generate(types);
        Console.Write(output);
    }
}

You can also customize the generation using a ClosureExternsOptions object.

Example input/output

Input

class B
{
    public int[] IntArray { get; set; }
}

Output

var Types = {};

// ClosureExterns.Tests.ClosureExternsGeneratorTest+B
/** @constructor
*/
Types.B = function() {};
/** @type {Array.<number>} */
Types.B.prototype.intArray = null;

For a full example see the tests.

Generate Javascript classes for your .NET types