# What’s this about?

I’ve been thinking about ways to lay firm, logical foundations for a model of FRP. Rather than trying to cook up the “next best thing”, we should perhaps review what we really want and try to infer what we need. Unfortunately I’ve very little time to think about this and I’ve hardly discussed this. Hopefully this blog post will give my effort another small push.

# Goal

The idea is to define a set of building blocks that allow one to express only systems that satisfy a few basic properties. Specifically we want systems that *don’t* satisfy these properties, to be inexpressible within our model. It makes sense to start by defining the properties we want to satisfy, and then to try and deduce the most general model that satisfies them and only them.

The basic concept is that of a “system”, which we define as a function:

`f :: (T -> A) -> (T -> B)`

Where *T* is an ordered, measurable set, usually denoting time. This is what Conal Elliott called an “interactive behavior“, and what is known in Yampa as a “signal function”. I use the term “system” because that’s how this is called in engineering, and I see no reason to invent new names.

# Desired properties of systems

**Edit: the properties as defined below are not exactly what we want. See comments for discussion.**

So, here are the requirements I’d like to propose. Every expressible system must satisfy the following:

**Insensitivity:**If two inputs are almost the same (differ at most at a set of measure zero), the output will be*(almost?)*the same.**Causality**: For all t in T, output(t) can only depend on input(t-s) where*s ≥ 0*. Meaning, for every two inputs x1, x2 that have the same past, the output for the present is the same. And to be more precise, for every*t*, if*x1(t-s) = x2(t-s)*for almost all*s ≥ 0*, the output at*t*will be the same.**Markovity:**Same as (2), but with the condition that*|s| < ∞*. In words, the output does not depend on inputs from the infinitely remote past or future.

**Finite memory:**given the current input (a single value), the current output depends additionally at most on a finite-length binary string (that may have been computed from past inputs).**Time invariance**(*optional*):*x(t)*produces output*y(t)*, then*x(t – s)*produces output*y(t – s)*.

## Discussion

The first property means that the system is not overly sensitive. Integration is one example of a system that satisfies this property: changing the value of a function at finite number of points does not affect the value of its integral. I don’t know how to call this – if anyone knows a name for this property, please tell me. For now I’ll call it *insensitivity*.

The second property is* causality.* It stems from the physical intuition that we can’t know the future, and every value that we (or our systems, or any other natural thing) compute at the present, should depend only on inputs from the past (or at most, the present). Otherwise, we’d have to wait until the future “arrives” to complete the computation.

Causality is one of the properties that are easily violated if we consider a model that allows us to arbitrarily sample our input, one that allows us to actually treat the input as a function of time. Mathematically, a function is just a mapping and has no rules regarding which part of the domain can be evaluated and when. So, the first realization from our requirements is that we can’t have signals (inputs, outputs, etc.) as directly accessible values in our model, or that they shouldn’t be modeled as functions. Yampa (or in it’s other name, AFRP) imposes the no-access rule and doesn’t have signals as values that can be passed around. Instead, they are indirectly manipulated by constructing systems that transform them – a sort of “wiring up”. There’s a big debate on the pros and cons of an Arrow-based FRP, but I’ve yet to understand the full consequences.

As for a non-functional description of signals, I haven’t though about it enough to know if there’s any obviously useful alternative that solves the causality problem. So in my case, I choose to go the Yampa way – I won’t provide any means to access signals directly.

Besides causality, we also want to make sure the system does not depend on the infinitely remote past. Physical intuition says that we’ll never be able to know what happened that far back. This is the defining property of Markov processes (of any order), so I’ll call this *Markovity*. Note that the requirement is defined as going both ways, but our previous demand for causality means that it only has meaning regarding past inputs. Perhaps the more general formulation (that includes both past and future) is redundant, but to me it seems nicer to have it symmetric even though only one direction (past) matters in our case.

The fourth property is *finite memory*, which is physically intuitive. Especially in the context of computers, we don’t want our systems to endlessly accumulate information until we run out of memory. Mathematically, this requirement needs elaboration to be made precise. The formulation I’ve given above (depending on a finite-length binary string) is simple and describes what I want, but it will require extra work to figure out what implications it has on our systems.

Finally, *time invariance* can be interpreted as not being able to tell the absolute “wall clock” time. Physically this is true: we don’t know about any absolute time. Although in reality engineered systems do in fact evolve over time (wear and tear, for one) it is not some intrinsic property of the system that causes this change – rather it is that the entire observable universe serves as inputs to our systems, which can never be fully isolated. Thus the only thing that should matter for our system’s output is how the input evolves over time, not the exact position in global time of the input. We forbid sensitivity to input time shifts. I noted this requirement as optional because I’m not sure we really want it as a limitation. I do have a gut feeling that we don’t miss out any desirable systems by adding this limitation – please correct me if you have any ideas against this.

# What’s next?

In this blog post I’ve identified the five basic properties that we desire in a system. The next step should be to define a set of operations that can serve as universal building blocks for any system that satisfies the requirements. Perhaps more importantly, our building blocks should *prevent *us from accidentally expressing systems that violate the requirements. I’ll even say that it’s ok if a few valid systems are inexpressible (our “systems filter” has false positives) as long as no violating system is expressible (no false negatives).

I’m waiting for your comments.

Your “insensitivity” property has a few issues. First, without the “almost” on the output you would disallow the identity “system” which I highly doubt you want to do. Second, your “insensitivity” property disallows discrete events (specifically the combination of discrete and continuous.) For example, a “system” that produces true only after the mouse is clicked behaves very differently given the “mouse is never clicked” input and the “mouse is ever clicked” input even though those inputs differ potentially only at one point. You may not be trying to handle this. Either way, it’s not clear to me that in a computational context “insensitivity” is a desirable global property. What you are probably going for is that the inevitable unhideable implementation details won’t change the meaning of the program. It’s probably not feasible to guarantee this and also provide a reasonably comprehensive set of primitives.

Your “finite memory” property doesn’t say what I think you think it says. It is equivalent to “The output function is total.” It would take an infinite amount of time to “depend on” (access say) an infinite amount of memory. If you want to say there is some guaranteed bound on the amount of memory used, then that is a strong and limiting restriction. It’s appropriate in some cases, particularly hard real-time scenarios, but not appropriate in general. For example, integrating a constant function fails to be bounded.

Finally, your last property, “time invariance” could probably be formulated more clearly by making f more explicit (and likely some of your other properties as well.) Let T_s be the time shift operator, T_s(x)(t) = x(t-s). Time invariance is then f(T_s(x)) = T_s(f(x)) It’s hard to see that this is what you are saying. Alternatively, simply slapping a λt. and so forth on the input/output terms would help. The input is the whole function x, as you correctly have it in “causality”, but then you start using the bad mathematician habit of writing f(x) to mean f in time invariance (or at least it’s particularly bad in this case as you don’t do one or the other consistently.)

As far as limitations from “time invariance,” I don’t think it causes any meaningful limitation. You can still make a clock, you just need to be given a start time, but that’s true anyway. Basically, “time invariance” means you can’t depend on the implementation notion of time, but that is, of course, an implementation detail, to the extent that it might not even exist. One consequence of time invariance is that the output of a “system” always depends on its input or is a constant function since the only other thing to depend on is time and that is ruled out.

That’s a lot to think about :)

Regarding infinite memory, you’re right, this is not exactly what I want.

What I DO want is related to “BIBO” (bounded input bounded output), but here I want it only in terms of memory. I want the system to consume an additional finite amount of memory, over every finite time span.

A nice example someone at #math (napping) gave was of a stack machine, with inputs being Push x | Pop and outputs being Pushed | Popped x. Over an infinite time interval this machine consumes infinite memory, which violates what I’ve stated in the post. With the new requirement (bounded memory over bounded times) this is a valid system.

This is a nice summary of the state of our desires, I think.

As for no. 1, it appears that while integral is insensitive, the identity function is not. Also, what about if the input is only a teeny bit earlier or later, should that also produce no/a small change in the output.

I think that for no. 1 we are looking for a notion of continuity. If we use real numbers for time and the scott topology the codomain types, then this means that we may switch abruptly but only if there is a _|_ at a single point while it switches. I am wondering whether we can endow these types with a different topology that captures more closely what we are looking for (doesn’t have _|_s floating around)… but as I think about it I am leaning toward “probably not”.

However, I note that nowhere did you define that our systems should be in continuous time — this description admits a stepwide, discrete model. I think there is no continuity condition then; i.e. I can think of no reasonable analogue to no. 1.

Time invariance is important to me. While you call it a “limitation”, it adds the feature that we may translate systems in time, increasing their composability.

No. 4 is easy to state precisely in a discrete model:

Define SF A B = (T -> A) -> (T -> B)

Every system f :: SF A B can be written as an iteration of some function g :: 2^c × A -> 2^c × B, where c :: Nat.

I am having a very hard time defining it for the continuous case, though.

Thanks for keeping the candle lit! :-)

> I think that for no. 1 we are looking for a notion of continuity. If we use real numbers for time and the scott topology the codomain types…

No, I think Noam’s whole point is that what we really want is not continuity, it’s measurability (in the mathematical sense of measure spaces). That is a really neat solution to the problem of tension between continuous and discrete models – it handles both, and any combination.

The same thing happened with integration. Everyone was stuck for a long time; they felt the need for some notion of continuity, except somehow discrete things didn’t fit in. It was only after Lebesgue changed the focus to measurability instead of continuity that a nice theory was finally achieved.

Regarding inputs that are shifted in time (even a very small amount) – that sort of change is actually more than “almost nowhere” because *every* point in time is different than what it was before the shift. About the rest I’m trying to write a new post.

Check out my new post for a little more discussion on that first property (insensitivity). I should check out what the “scott topology and codomain types” are.

> However, I note that nowhere did you define that our systems should be in continuous time

That’s right, I’m trying to define it in a more general way. Ultimately I want something that at least for the basic properties is defined for both continuous and discrete times.

> Time invariance is important to me. While you call it a “limitation”, it adds the feature that we may translate systems in time, increasing their composability.

That’s cool! I haven’t thought about it that way. Good point.

> No. 4 is easy to state precisely in a discrete model:

>

> Define SF A B = (T -> A) -> (T -> B)

>

> Every system f :: SF A B can be written as an iteration of some function g :: 2^c × A -> 2^c × B, where c :: Nat.

>

> I am having a very hard time defining it for the continuous case, though.

I didn’t get that. Can you explain more?

> Thanks for keeping the candle lit! :-)

I wish it was more like a torch, but yeah :)

I think your “insensitivity” corresponds pretty closely to the mathematical notion of continuity.

Another problem with my formulation: no. 3, “Markovity” is missing the crucial bit that outputs are allowed to depend on previous outputs. So for example a discrete-time system that sums (discrete integration) depends on the previous output and the current input. The condition can still be not to depend on infinitely remote past outputs, as well as inputs.

Does “markovity” cover hidden-markov? i.e it depends on a finite hidden state..

I’m not even sure how to define the first requirement if we allow events, i.e. some form of Dirac deltas. We surely want to be able to describe entities coming to life and dying, but does it count as a major response or not whether the life of an entity depends on some infinitesimal change elsewhere? It seems to depend on the context.

As for direct access to the signals, they are perfectly fine as long as your building blocks and combinators don’t make random sampling possible. For instance, Elerea gives you the signals, but you are forced to sample them at increasing times. You can access the past through delays, and you cannot reach into the future at all.

I can give you a sixth requirement though: scalability. More specifically, doing nothing should cost nothing. This is where both Yampa and Elerea fail, since they always have to explicitly propagate the fact that nothing has happened.

See my next post for how I mean to handle events.

Regarding direct access to signals, if a signal is modeled as a function, direct access to it means you can use it as a function (and therefore sample it). So either use a different model or don’t make it directly available. Forcing sampling on them at increasing times sounds to me like not really exposing the signals, only exposing certain operations on them. I think it’s the same as having a “placeholder” for signals to be used with those operations, which from what I understand – is what you do in Elerea.

Regarding scalability, it’s definitely something we want but it’s an implementation thing, not part of the model, if I understand what you mean.

I don’t really understand the objection to use functions as a model. The primary purpose of the model is to ease understanding, both for the developer and the user of the library. If you add the requirement that the implementation should expose all the operations possible in the model’s domain, then your implementation will be the only possible model of itself.

For instance, a straightforward choice to model discrete streams would be using infinite lists, aka cons streams. A basic operation on lists is tail, which represents a peek into the future, therefore breaks causality. But does that makes cons streams a bad model?

Note that there are two classes of operations that are dual to each other: constructors and destructors. In the case of ADTs, the former means applying data constructors to their arguments, while the latter is pattern matching. However, in the case of functions, lambda abstraction plays the role of the constructor while function application is the destructor (you look inside the function by sampling it).

When you provide an abstract data type, it’s pretty customary to provide a set of combinators to create this type of data, i.e. a rich set of constructors, while you severely limit the ways it can be observed by providing only a small selection of destructors. Elerea gives you a single destructor that can only be used at the top level: it’s essentially a combination of head and tail (or applying to zero and composing with succ in the function model) in the IO monad, which guarantees that it won’t be used within the network.

As for scalability, it’s not just an implementation detail. Some models inherently require you to propagate the fact that there was no event. For instance, you might want to provide a disambiguation operator of the following type:

disamb :: (NonemptyMultiset a -> a) -> Event a -> Event a

We are assuming that simultaneous occurrences are allowed in this event model, and disamb can see these occurrences as multisets (basically lists with undefined order) in order to be able to resolve them. However, to be able to tell what goes in the multiset at any given time, you have to go through all the sources contributing to the event at some point, because you have to synchronise them.

I completely agree about exposing an interface for e.g. functions so that the model includes which primitives are exposed, but still uses functions in the model.

I wasn’t saying that it isn’t a good way to model, just pointing out that there are two options: not use functions at all, or use them and expose a certain interface.

In fact that is exactly what I want to do: use measurable functions (for both event and continuous functions) and expose only causal operations (something akin to scanl) on them.

About scalability, that’s a good point. The model should not make scalability a problem. I don’t know if the model I’m trying to design has this problem or not.

You seem to be allowing the ordered measure space to vary, and that’s nice. We always require causality and insensitivity, but not every application requires a numerical notion of time, for example.

Even if you say bounded input produces bounded output, that means you are still limiting yourself to total functions. I would suggest leaving out this requirement altogether. It’s similar to the issue of Integer vs. Int – really it’s nicest to use Integer always, but sometimes we need Int for optimization.

I’m not sure markovity is needed as a separate condition either – each application will use a measure for which everything outside of its range of consideration has measure zero.

You don’t lose anything by requiring time invariance when T is the real numbers. But you do lose something by requiring it in general, because that means you are requiring T to have a notion of affine translation for every application.

Your Causality property is likely insufficient: For all t in T, output(t) can only depend on input(t-s) where s ≥ 0.

This doesn’t preclude atemporal cycles in the dependency graph. We don’t have to worry about these in the real world due to relativity, since the light cone collapses to a point.

Perhaps a more correct definition would be something like forall t in T, output(t) can only depend on input (t-s) where either s > 0 or s == 0 and input(t) does not transitively depend on output(t).

Good point, but what exactly does “not transitively depend” mean? What is the exact condition we want to avoid?

Pingback: Axioms for FRP: Discussing insensitivity « Enough Blogging!