Generate Javascript classes for your .NET types

We open-sourced another library: ClosureExterns.NET (on github and nuget). It generates Javascript classes from .NET-based backend types, to preserve type “safety” (as safe as Javascript gets) across front- and backend. As a bonus you get Google Closure annotations. The type annotations are understood by WebStorm (and other editors) and improve your development experience. Also, if you use Google Closure to compile or verify your code, it will take these types into account. We use it extensively with C#. We haven’t tried it with F#, but it’s supposed to work with any .NET type.

ClosureExterns.NET makes it easier to keep your frontend models in sync with your backend. The output is customizable – you can change several aspects of the generated code. For example, you can change the constructor function definition to support inheritance from some other Javascript function. For more details see ClosureExternsOptions.

Getting Started

First, install it. Using nuget, install the package ClosureExterns.

Then, expose a method that generates your externs. For example, a console application:

public static class Program
{
    public static void Main()
    {
        // Collect the types in the same namespace as MyNamespace.MyType
        var types = ClosureExternsGenerator.GetTypesInNamespace(typeof(MyNamespace.MyType));
        // Generate the Closure extern definitions and print them
        var output = ClosureExternsGenerator.Generate(types);
        Console.Write(output);
    }
}

You can also customize the generation using a ClosureExternsOptions object.

Example input/output

Input

class B
{
    public int[] IntArray { get; set; }
}

Output

var Types = {};

// ClosureExterns.Tests.ClosureExternsGeneratorTest+B
/** @constructor
*/
Types.B = function() {};
/** @type {Array.<number>} */
Types.B.prototype.intArray = null;

For a full example see the tests.

Jammed, the simple traffic simulator

See it.


I spend too much time in traffic.

So much time, that I start to see things. I see patterns.

You can get stuck in a slow lane because the others are going so fast you never manage to merge. So on average, how much time do you spend stuck in inescapable lanes? Does it make a difference in the cosmic scheme? (No, it doesn’t.)

I decided to build a simulator and see how driving styles affect traffic.

Source is on github.

xml-to-json is now a library

A while back I needed to convert a ton (millions) of small xml files to json, so I could store them in MongoDB. To that end I wrote a teensy-tiny tool called xml-to-json (github, Hackage). Originally it was just a command-line tool with all the code thrown in a single file.

So, I did a quick refactor this week to split it into a library + executable, and pushed it to github (to deafening cries of joy).

Features

First, a non-feature: xml-to-json is “optimized” for many small xml files, so it can easily take advantage of multiple cores/CPUs. Be aware that for large files (over 10MB of xml data in a single file) something starts to eat up RAM – around 50 times the size of the file.

Other features:

  • You can filter which xml subtrees to convert by element name regex (and, if you wish, skip the matching root element and convert only its descendants).
  • Output either a top-level json object or json array.
  • (Optionally) simplify representation of xml text nodes in attribute-less elements (e.g. “<elem>test</elem>” -> { elem: “test” })

Packages used

For XML decoding I’m using hxt (over expat, using hxt-expat). I tried a few of the xml packages on Hackage, and hxt + expat was the only combination that parsed quickly while avoiding nasty memory issues. Apparently tagsoup can be used with ByteStrings to avoid the same issue, but I didn’t try.

JSON is encoded using aeson.
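For example, the simplified form of “<elem>test</elem>” from the list above could be built with aeson roughly like this (a standalone illustration, not the actual xml-to-json code):

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (encode, object, (.=))
import qualified Data.ByteString.Lazy.Char8 as BL

-- Standalone illustration: build the simplified JSON form of <elem>test</elem>
-- and encode it with aeson. Prints: {"elem":"test"}
main :: IO ()
main = BL.putStrLn (encode (object ["elem" .= ("test" :: String)]))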

 

C# Pitfalls, Compiler Bugs & .NET gotchas

Rarely, but not rarely enough, I find myself banging my head against some unexpected semantics of the C# language. Sometimes I’m simply misunderstanding the language, other times it’s a limitation, or even a compiler bug.

To document my findings I’ve started a repository on github for a list of C# pitfalls, compiler bugs, and various other .NET framework gotchas. These are not “beginner tips” – they are things that may surprise a seasoned developer (or is it just me?).

In any case, all are welcome to join this effort – don’t hesitate to fork & send your pull requests.

Flat design sucks.

Recently Microsoft updated the design of its Dynamics CRM to a “flat” look. Since then, we can’t figure out what is clickable or editable, and what is just content.

My plea to designers: make that clickable thingy look clickable. Make that editable input box look like an input box, and not just when I hover over it.

As put by Raluca Budiu from the Nielsen Norman Group in this article about iOS 7:

Buttons and interface widgets, when present, need to be easily distinguishable from content. They need to have good affordances that invite users to action. In the absence of strong signifiers, they can get ignored, and users may find themselves lost and disoriented.

 

Not all “flat” design makes chrome indistinguishable from content, as the article above points out.


Brackets – a great editor for Javascript, HTML and CSS

http://brackets.io/


Besides being great, it also has a fast release cycle (2.5 weeks) and is open source. It is being developed by folks at Adobe (plus others from the OSS community).

Favorite features:

  • Nice hinting and inline code helpers for CSS.
  • Live editing of HTML and CSS – you can fire up a browser (currently only Chrome) that is updated as soon as you change the code, no need for F5. It also highlights the HTML element your cursor is on.
  • Very lightweight. Yet, it has everything you need to start working on a web project. You just install it.
  • It comes with JSLint built-in that immediately bugs you about your Javascript pitfalls. Other IDEs I’ve used leave it to you to set up that kind of stuff.
  • Extensible. The extensions I’ve installed include CSS color hinting, and a context-menu to open a file’s folder in the OS’s file system thingy (explorer, in Windows).

Weaknesses: a few basic editor features (such as code folding or more intelligent search) are lacking, but the basics are good enough to be very productive. Extension management is currently done via manual file copying, but according to the dev blogs, they’re working on an extension manager.

Platform support: currently built only for Windows and Mac OSX.

Tip: if you’re using Live Development on a site that uses AJAX to access its own web server, you need to either enable cross-site requests on your development web server, or (easier) tell Brackets where your development web server is serving your html/js/css from – it’s under File > Project Settings. See here for more.

Windows 8 is the worst.

Don’t install Windows 8. Don’t buy a laptop that has it.

I tried. I installed windows 8 on my personal laptop. It was so cheap – $15 to upgrade – I decided to give it a shot.

What a mistake.

If you do anything at all with your computer, don’t use Windows 8. For a tablet – maybe, I don’t know. For a computer – no way. Just say no.

I’m not a hater or a flamer, I use almost only Microsoft products all day long (Windows 7 and Visual Studio most of my work day). I like Windows 7.

Here’s a very short list of stuff that kills me (there are many other problems):

1. Windows 8 restarts your machine WHILE YOU ARE USING IT to finish installing updates. It warns you but never gives you the chance to prevent it. You could be in the middle of reviewing an emergency patient’s diagnosis, and *BOOM* machine shuts down and is unusable for at least 5 minutes. You have no option to save your work. You have no option to stop this from happening.

2. The “metro UI” or “new start screen” or “tile area” is horribly confusing because it hides everything else. You can’t see what’s running at a glance. You can’t change this behavior.

3. The “charms” bar requires you to point your mouse to a magical location. You can’t change this. You can’t disable this (and make the freakin’ bar always visible).

4. There is no bloody CLOCK or calendar on the default screen, that annoying “start screen”. You can’t add one without installing a third-party “windows store app”.

5. The control panel is split between the “classic” control panel and some super-simplified – yet non-overlapping – set of options in the “charms” settings area. Go figure where to fix something.

Too bad, Microsoft really screwed this one up.

End of rant. Here’s a more scientific approach that analyzes Windows 8’s usability issues.

p.s. some search engine food: windows 8 sucks, windows 8 is bad, windows 8 is horrible, terrible, nasty, don’t use windows 8, don’t install windows 8, keep windows 7, windows 7 is better than windows 8. Google, you get the point.

Penalty search, part 4 and finale: the x-section algorithm

What’s the problem?

This is the last post in a series dedicated to exploring the problem of “penalty search”. Penalty search involves an ordered array: the first part of the array has values that do not satisfy some property, while the second part – the rest of the values – do satisfy the property. Testing a value for the property carries an index-dependent penalty, given by a cost function. We want to find the first value (lowest index) in the array that satisfies the property. The “total cost” of a search attempt is the total penalty spent looking for the value until it is found.

The goal is to devise an algorithm that searches the array with minimal expected total cost.

Why is this interesting?

Besides being fun to wrestle with, this problem is interesting because it deals with a realistic physical situation: you need to find the ideal timing for some process. Any time-dependent experiment can be modeled by this problem. Every time you test, you spend time – the cost of a test is the time spent – and the goal is to find the solution as quickly as possible, so we want to minimize the expected time spent searching for it.

Linear cost – recap

The last post explored how a linear cost function affects the recursive difference equation that calculates the expected cost. Linear cost is interesting, first because it is easier to analyze, and second because it fits problems where the penalty is the amount of time spent waiting to check a time-sensitive process.

Analytically, the nice property of linear cost is that the size of the array being searched is what determines the cost (up to a constant), not the specific range of indexes. In that last post we took advantage of this translational quasi-invariance of a linear cost function to simplify the problem. After defining a couple of helper functions we ended up with the following equation:

h(N)=N(at_N+b)+h(t_N+1)+h(N - t_N - 1)+a(N - t_N - 1)(t_N+1)

Where:

  • N is the size of the region (array) that’s being searched
  • h(N) represents the expected cost of a region (almost – you need to divide by N to get the actual cost)
  • t_N is the index that our algorithm decides to test in the current step
  • and at_N+b is the cost of testing at index t_N

Constant ratio search

For linear cost, we expect the solution – the optimal search algorithm – to be somewhat similar to binary section search: we always use a constant ratio (“section”) of the current range to choose the next search target. Although I can’t prove that this is a valid assumption, I think it’s interesting enough to see what the ratio should be if we do assume constant ratio search.

To begin, we define t_N := \lfloor rN \rfloor where r \in [0,1) (remember we are already assuming that the region is [0,N-1]). To make things simpler (though admittedly less precise) we’ll drop the floor notation and treat rN as if it were an integer. Substituting in the last equation, we get:

h(N)=N(arN +b)+h(rN +1)+h(N - rN - 1)+a(N- rN - 1)(rN +1)

With a little simplification (saving time using Maxima) we get:

h(N)=h(rN+1)+h((1-r)N-1)+(2ar-a r^2)N^2+(-2ar+b+a)N-a

Notice that the two recursive references to the function h nicely represent testing the left and right sub-regions around the split point rN. I’m personally not any closer to solving that equation, so here’s where we use approximations.

In the last post we saw that for N=1 the value must be zero. We can use this information to solve N=2:

h(2)=h(2r+1)+h(2(1-r)-1)+4(2ar-ar^2)+2(a+b-2ar)-a

Following the same logic as in that last post, we know that we can only split a region of size 2 into two subregions of size 1. This means both recursive references must resolve to h(1). This limits the range of r:

\lfloor 2r \rfloor +1=1 and 2- \lfloor 2r \rfloor -1=1

Implying:

\lfloor 2r \rfloor =0 \implies r < 1/2

Here we have a first result: we must split somewhere below the middle. That makes sense if our linear cost is increasing, that is, a>0. I didn’t have time to go back and check, but I’m guessing I made that assumption implicitly somewhere along the way.

It also makes sense: if the cost is increasing linearly, the upper bound for a constant split ratio is the middle – which is exactly binary section when cost is constant (not increasing at all, a=0).

It’s an interesting exercise to plug values for N and see where the bounds on r go. I’m leaving that for another day, or for you to try.
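If you do want to try, here’s a small Haskell sketch to start from – my own helper, not part of the derivation above. Instead of solving the equation, it scores a fixed split ratio r by computing the expected cost of the resulting search tree directly from the definition (uniform prior over the region, linear cost c(t)=at+b), so you can plug in region sizes and see which ratios do well:

-- Expected cost of a constant-ratio search strategy over [x,y], computed
-- directly from the definition: uniform prior for the target, test cost
-- c(t) = a*t + b. The split index is a fixed fraction r of the current
-- region size (clamped so both sub-regions stay non-empty).
costRatio :: Double -> Double -> Double -> Int -> Int -> Double
costRatio a b r = go
  where
    c t = a * fromIntegral t + b
    go x y
      | x >= y    = 0  -- a single cell must be the target: no test, no cost
      | otherwise =
          let n = y - x + 1
              t = min (y - 1) (x + floor (r * fromIntegral n))
          in c t + ( fromIntegral (t - x + 1) * go x t
                   + fromIntegral (y - t)     * go (t + 1) y )
                   / fromIntegral n

main :: IO ()
main = do
  let (a, b, size) = (1.0, 1.0, 100)
  mapM_ (\r -> putStrLn (show r ++ " -> " ++ show (costRatio a b r 0 (size - 1))))
        [0.05, 0.10 .. 0.50]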

That’s all. If you find interesting instances of this problem, let me know!

Penalty search, part 3: exploring linear cost

Recap

The previous post – the second in a series discussing optimal search under test penalty – concluded with an open question. We can write an equation that describes the expected cost, which we want to minimize by picking the best t_i for each interval [x_i,y_i]:

E[cost(x_i,y_i)] = c(t_i) + E[cost(x_{i+1},y_{i+1})]

The open question is: how?

c(t_i) is the test cost (penalty) function. Different values have different costs, so we expect that the optimal search will take into account the different costs and be smart about picking which values to test. As we saw previously, for constant cost the algorithm is simply binary section search. We have a recursive function of two variables (the range – x_i, y_i) which we want to minimize by picking yet another function (the one that picks t_i given the current range).

I don’t know how to solve this generally, which is why I resorted to manual calculations and pretty graphs. In that same previous post I used dynamic programming – a natural fit for Haskell – to find the optimal decision tree for a specific range and cost function. Thanks to Control.Parallel it was easy to improve performance by taking advantage of today’s multi-core processors. What I would really like, though, is to actually find a closed-form solution, one that defines the optimal search algorithm.
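For reference, the core of such a dynamic program is small. Here’s a fresh sketch – not the code from that post – assuming a uniform prior over the range (the same simplification I’ll make explicitly below) and an arbitrary per-index cost function:

import Data.Array

-- Sketch: minimal expected cost of searching the range [lo,hi], assuming a
-- uniform prior for the target and an arbitrary per-index test cost c.
-- The recursion (uniform prior) is:
--   cost(x,y) = min over t in [x..y-1] of
--       c(t) + ((t-x+1)*cost(x,t) + (y-t)*cost(t+1,y)) / (y-x+1)
optimalCost :: (Int -> Double) -> Int -> Int -> Double
optimalCost c lo hi = table ! (lo, hi)
  where
    -- memoise every sub-range in a lazy array (plain dynamic programming)
    table = array ((lo, lo), (hi, hi))
                  [ ((x, y), go x y) | x <- [lo .. hi], y <- [lo .. hi] ]
    go x y
      | x >= y    = 0      -- a single cell must be the target: no test, no cost
      | otherwise = minimum [ branch x y t | t <- [x .. y - 1] ]
    branch x y t =
      c t + ( fromIntegral (t - x + 1) * table ! (x, t)
            + fromIntegral (y - t)     * table ! (t + 1, y) )
            / fromIntegral (y - x + 1)

main :: IO ()
main = do
  let linearCost t = 2 * fromIntegral t + 1   -- example: c(t) = a*t + b with a=2, b=1
  mapM_ (\n -> print (n, optimalCost linearCost 0 (n - 1))) [1 .. 8]

The lazy array does the memoisation; the parallel evaluation from the original experiment is left out to keep the sketch short.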

Disclaimer

I’m no mathematician (nor a computer science wizard). If you think there’s a mistake anywhere, let me know.

When you don’t know, make the problem simpler

So what to do when you don’t know how to solve a problem?

Make more assumptions.™

Hopefully, we’ll learn something from the solution of the simpler problem.

Uniform Probability

The first assumption we can make is that of uniform probability (of the search target, in the range). Here’s our equation:

E[cost(x_i,y_i)] = c(t_i) + E[cost(x_{i+1},y_{i+1})]

The range [x_{i+1},y_{i+1}] is whatever our algorithm decides to search in the next step, and t_i is the value currently being tested. We expand the expectation E[] of the next test (the next level in the decision tree). For clarity I’ll drop the indexes since we don’t need them now:

E[cost(x,y)] = c(t) + P_{x,y}(s \le t)E[cost(x,t)] + P_{x,y}(s > t)E[cost(t+1,y)]

Where P_{x,y}(A) := P(A | s \in [x,y]) is a conditional probability. From now on I’ll skip writing the expectation E. Using our assumption of uniform probability we can write:

cost(x,y) = c(t) + \frac{1}{y-x+1} \left ((t-x+1)cost(x,t) + (y-t)cost(t+1,y) \right )

Notice that now our problem is mostly a function of range size, so we can define N_{x,y}:=(y-x+1) and write:

cost(x,y) = c(t) + \frac{1}{N_{x,y}}(N_{x,t}cost(x,t) + N_{t+1,y}cost(t+1,y))

Multiply by N_{x,y} to get:

N_{x,y}cost(x,y) = N_{x,y}c(t) + N_{x,t}cost(x,t) + N_{t+1,y}cost(t+1,y)

Being a non-mathematician, here’s where I said “Aha!”. We can make things simpler if we define:

g(x,y) := N_{x,y}cost(x,y)

Since g is just cost multiplied by y-x+1, minimizing g will also minimize cost. Using this definition the equation becomes:

g(x,y) = N_{x,y}c(t) + g(x,t) + g(t+1,y)

…which is beginning to look a lot more manageable. And remember – so far, the only assumption we’ve made is that of uniform probability. But how to proceed? We still have a recursive equation with two variables, x and y.

From two variables to one

Someone must have once said: “one is better than two.” Variables. In a recursive function.

Let’s add another assumption – that the cost function is linear:

c(t) = at+b

Where a and b are arbitrary constants. Why linear? This cost function has the nice property that the cost of any search tree is invariant under translation (of the searched range) up to a constant addition. That is,

cost(x,y)+a u=cost(x+u,y+u)

or

g(x,y)+N_{x,y} a u = g(x+u,y+u)

How do I know this? Here’s an informal proof. A search tree consists of nodes representing the values being tested (and cost being spent). If we translate the underlying range, we’re in effect changing the linear cost function by adding a constant that depends on the amount of translation. So the cost of each node in the search tree will vary by the same constant, leaving the distribution of cost throughout the tree equivalent and keeping the expected cost minimal compared to other trees on the same range.

This property means that given a range size N we can find the best search tree for just one translation of that range (say, x=0) and it will be correct for any other range of the same size.

Plugging in the linear cost in the equation from before gives us:

g(x_i,y_i) = N_i a t_i + N_i b + g(x_i,t_i) + g(t_i+1,y_i)

I’m explicitly writing that the choice of t depends on x,y, just to make things clearer. We’ll use the indexed notation t_i to signify that t depends on the ith range, and use it also for the range size, that is N_i := N_{x_i,y_i}.

Now we can use the translation invariance to replace x_i everywhere with 0. At the same time we can get rid of y_i, since with x_i=0 it is equivalent to make the function depend on N_i=y_i+1. To that end, define:

h(N) := g(0,N-1) = g(x,x+N-1)-Nax = N(cost(x,x+N-1)-ax)

Then, the equation above for g(x_i,y_i) becomes:

h(N)=Nat_N+Nb+h(t_N+1)+h(N - t_N - 1)+a(N - t_N - 1)(t_N+1)

That last term is due to the required translation of h(N - t_N - 1) from the origin to t_N+1 (the original value was g(t_i+1,y_i)).

Results please…

Let’s look at the actual value of the expected cost for a few small N’s (region sizes):

N=1

When the size of the region N is 1, there is nothing to check – by assumption the target value is in the region, and there is only one cell, so that must be it. Since there is nothing to check, there is no cost, so cost(0,0)=0=h(1). Also, we don’t care what the offset (translation) of this one-celled region is – so we don’t need to add the additive term when calculating translated calls such as cost(x,x), because the value is always 0.

N=2

Let’s move on to N=2. For a region of two cells, the only logical algorithm is to test the first cell (index 0). If it yields the value 0, then the target (the first post-step cell) is the second cell (index 1); otherwise (the value at the first cell is 1) the target is the first cell itself. So the expected cost is just the cost of testing index 0, which is 0a+b=b=cost(0,1).

That’s what simple logic dictates. Now let’s see how our equation fares:

h(2) = 2at+2b+h(t+1)+(h(2-t-1)+a(2-t-1)(t+1))

Remember that t \in [0,N-2], so in this case t can only have the value 0 (matching our logic above that we can only test cell 0). Since the second recursion (h(2-t-1)) turns out to be over a 1-celled region, we drop the additive term (it is always 0 for a one-celled region):

h(2) = 2b+h(1)+h(1)=2b

Since h(2)=2cost(0,1)=2b the result matches our “manual logic” from above.

N=3

Here’s where things start to get interesting. N=3 is the first case where we have to compare different values for t to find out which one yields the minimal expected cost. The possible values are 0 and 1:

  • t=0 \Rightarrow h(3)=3b+h(1)+h(2)+2a=3b+0+2b+2a=2a+5b
  • t=1 \Rightarrow h(3)=3a+3b+h(2)+h(1)+0=3a+3b+2b+0=3a+5b

So which is less? Comparing 2a+5b with 3a+5b, picking t=0 is at least as good whenever a \ge 0, and strictly better whenever a > 0; the two choices tie only when a=0.

This is interesting because it means the slope of the cost is what shapes the search: as soon as the cost is increasing at all (a > 0), we should bias our search towards the cheap end and test index 0 first. Only with constant cost (a=0) does the middle cell with index 1 do equally well – which is consistent with plain binary section.
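To double-check the N=3 numbers without any of the machinery above, here’s a tiny sketch that just enumerates the three equally likely targets and adds up the test costs along each path:

-- A direct check of the N=3 case: expected cost of the two candidate first
-- tests over [0,2], computed by listing the total test cost for each of the
-- three equally likely targets (uniform prior, c(t) = a*t + b).
expectedCost3 :: Double -> Double -> Int -> Double
expectedCost3 a b firstTest = sum pathCosts / 3
  where
    c t = a * fromIntegral t + b
    pathCosts = case firstTest of
      -- test index 0 first: target 0 needs {0}; targets 1 and 2 need {0,1}
      0 -> [c 0, c 0 + c 1, c 0 + c 1]
      -- test index 1 first: targets 0 and 1 need {1,0}; target 2 needs {1}
      _ -> [c 1 + c 0, c 1 + c 0, c 1]

main :: IO ()
main = do
  let (a, b) = (2.0, 3.0)
  putStrLn ("t=0: " ++ show (expectedCost3 a b 0))   -- (2a+5b)/3
  putStrLn ("t=1: " ++ show (expectedCost3 a b 1))   -- (3a+5b)/3

For any a and b it prints (2a+5b)/3 for t=0 and (3a+5b)/3 for t=1.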

What’s next

We could spend the rest of our lives calculating larger and larger values of N. What we really want, though, is a general, closed-form solution for any N. In my next post I’ll make another simplifying assumption that will make some progress possible.