The previous post – a second in a series discussing optimal search under test penalty – concluded with an open question. We can write an equation to describe the expected cost, which we want to minimize the following by picking the best for each interval :
The open question is: how?
is the test cost (penalty) function. Different values have different costs, so we expect that the optimal search will take into account the different costs and be smart about picking which values to test. As we saw previously, for constant cost the algorithm is simply binary section search. We have a recursive function of two variables (the range – ) which we want to minimize by picking yet another function (the one that picks given the current range).
I don’t know how to solve this generally, which is why I resorted to manual calculations and pretty graphs. In that same previous post I used dynamic programming – a natural problem for Haskell – to find the optimal decision tree for a specific range and cost function. Thanks to Control.Parallel it was easy to improve performance by taking advantage of today’s multi-core processors. What I would really like, though, is to actually find a closed-form solution, one that defines the optimal search algorithm.
I’m no mathematician (nor a computer science wizard). If you think there’s a mistake anywhere, let me know.
When you don’t know, make the problem simpler
So what to do when you don’t know how to solve a problem?
Make more assumptions.TM
Hopefully, we’ll learn something from the solution of the simpler problem.
The first assumption we can make is that of uniform probability (of the search target, in the range). Here’s our equation:
The range is what our algorithm decides to search in the next step, and is the current searched value. We expand the expectation of the next test (next level in the decision tree). For clarity I’ll dump the indexes since we don’t need them now:
Where is a conditional probability. From now on I’ll skip writing the expectation . Using our assumption of uniform probability we can write:
Notice that now our problem is mostly a function of range size, so we can define and write:
Multiply by to get:
Being a non-mathematician, here’s where I said “Aha!”. We can make things simpler if we define:
Since $g$ is just $cost$ multiplied by $y-x+1$, minimizing $g$ will also minimize $cost$. Using this definition the equation becomes:
…which is beginning to look a lot more manageable. And remember – so far, the only assumption we’ve made is that of uniform probability. But how to proceed? We still have a recursive equation with two variables, x and y.
From two variables to one
Someone must have once said: “one is better than two.” Variables. In a recursive function.
Let’s add another assumption – that the cost function is linear:
Where and are arbitrary constants. Why linear? This cost function has the nice property that the cost of any search tree is invariant under translation (of the searched range) up to a constant addition. That is,
How do I know this? Here’s an informal proof. A search tree consists of nodes representing the values being tested (and cost being spent). If we translate the underlying range, we’re in effect changing the linear cost function by adding a constant that depends on the amount of translation. So the cost of each node in the search tree will vary by the same constant, leaving the distribution of cost throughout the tree equivalent and keeping the expected cost minimal compared to other trees on the same range.
This property means that given a range size we can find the best search tree for just one translation of that range (say, ) and it will be correct for any other range of the same size.
Plugging in the linear cost in the equation from before gives us:
I’m explicitly writing that the choice of depends on , just to make things clearer. We’ll use the indexed notation to signify that depends on the th range, and use it also for the range size, that is .
Now we can use the translation invariance to replace everywhere with 0. At the same time we can get rid of since it will be equivalent to make the function depend on . To that end, define:
Then, the equation above for becomes:
That last term is due to the required translation of the from the origin to (the original value was .
Let’s look at the actual value of the expected cost for a few small N’s (region sizes):
When the size of the region, is 1, there is nothing to check – by assumption the target value is in the region and there is only one cell, which must be it. Since there is nothing to check, there is no cost, so $cost(0,1)=0=h(1)$. Also, we don’t care what the offset (translation) of this one-celled region is – so we don’t need to add the additive term when calculating translated calls such as cost(x,x+1) because the value is always 0.
Let’s move on to . For a region of two cells, the only logical algorithm is to check the first cell. If it yields the value 0, then the target (first post-step cell) is cell #2, otherwise (value at first cell is 1) – the target is cell #1. So the expected cost is just the cost of the first cell – which has index 0 – which is $0a+b=b=cost(0,1)$.
That’s what simple logic dictates. Now let’s see how our equation fairs:
Remember that , so in this case t can only have the value 0 (matches our logic that we can only test cell 0). Since the second recursion () turns out to be a 1-celled region, we delete the additive term and replace it with 0:
Since the result matches our “manual logic” from above.
Here’s where things start to get interesting. is the first case where we have to compare different values for to find out which one yields the minimal expected cost. The possible values are 0 and 1:
So which is less? We should pick iff . Otherwise, pick .
This is interesting because it means that the optimal search algorithm depends on how our linear cost is defined. If the slope is “twice more dominant” than the constant , then we should bias our search towards 0. Otherwise, we go with binary section and take the middle cell with index 1.
We could spend the rest of our lives calculating larger and larger values of N. What we really want, though, is a general, closed-form solution for any N. In my next post I’ll make another weakening assumption that will make it possible.