Into CPS, Never to Return

133 points by g0xA52A2A 4 days ago | 58 comments

In case anyone is wondering, "when would I EVER use this (in hand-written code)?", it's a trick that makes DSL (domain specific language) and small language implementation much easier. A great reference for this is Peter Norvig's Paradigms of Artificial Intelligence Programming, when he implements a subset of Prolog and bolsters the functionality using CPS[1].

The second, although more obscure, is that you can use it in languages that do not have "non-local exits" to terminate a deeply nested computation early or return to an earlier point in the call stack. For example, Clojure does not have nonlocal exits, as only the final form of the function is returned. However, using CPS, you can terminate the expression early and return to the original caller without executing the rest of the function. You probably only want to use this in specialized cases though or it may upset your team, they are tricky to debug.

Lastly and probably most controversially, you can make an extensible "if" statement using CPS if you are in a dynamic language and you have no other tools to do so. Admittedly I do sometimes use this in ClojureScript. This allows you to write "append only" code without continually growing the "cond". Again, most teams don't like this, so it depends on the circumstances, but might be nice to have in your back pocket if you need it.

[1]: https://github.com/norvig/paip-lisp/blob/main/docs/chapter12...

cryptonector 3 days ago | root | parent | next |

I've used something similar to CPS in async I/O programs, where at each point that a function wants more I/O it queues up a continuation for the event and then returns to the event loop. This forces one to represent state in a compact way and to use very little stack space for state representation, all of which reduces per-client memory footprint a great deal.

fweimer 3 days ago | root | parent | next |

What's the alternative? Update the state object associated with the I/O event in place (and unregister/re-register it with the event loop as needed)?

cryptonector 3 days ago | root | parent |

> What's the alternative?

Alternatives include:

  - thread per-client
  - thread per-client (but
    green threads)
  - co-routines

Co-routines are very similar to what I'm already doing, really. For fairly linear code one can easily end up with co-routine functions that are multi-thousand lines of code long -- see PuTTY for example.

> Update the state object associated with the I/O event in place (and unregister/re-register it with the event loop as needed)?

I have a simple data structure to represent the continuation, which is really more like a closure, consisting of `{fd, data, handler_function}` (this being C). Events are typically one-shots. When queuing I/O I just update the handler_function of that struct and then register the one-shot event with epoll, re-using this "closure" object. When it comes time to close a connection and we're done handling events we free (or, rather, put on a free-list) the closure, and destruct the client data structure.

upghost 2 days ago | root | parent |

To be fair, continuations are also the classic way to implement coroutines in languages that don't have them. Probably should've mentioned that in my original comment...

cryptonector 2 days ago | root | parent |

For sure.

upghost 3 days ago | root | parent | prev |

hey that's really clever. was this for open source or commercial? was it received well? what language? Very interesting technique

cryptonector 3 days ago | root | parent |

> was this for open source or commercial?

Proprietary, though I have permission to open source it (but I need to make sure that $WORK picks a license for it).

This was specifically for a poor man's Kafka as an HTTP server that implements "tail -f" semantics for GET when using `Range:` with the end offset left unspecified, and with `{URI, weak-ETag, offset}` resembling the Kafka `{topic, partition, offset}`. Non-range requests end when EOF is reached in the file being fetched. Range requests with the end offset specified end as soon as the requested range has been sent back. Range requests with the end offset left unspecified do not end until the file is unlinked or renamed away, using inotify to detect those cases, and every time data is written to the file that data is immediately sent to the clients that are waiting "at the tail".

The connection acceptor in the event loop creates a client record and queues reads to call the reader continuation which is just `{client, reader}`, then when a request is fully read the reader continuation will queue up a writer continuation to write the response, and so on.

The fact that the underlying abstraction is "just a file" makes this very easy to use because `write(2)` is how you publish data and `rename(2)` and `unlink(2)` is how you "commit transactions" / roll logs in your application, and clients immediately find out. All that completely asynchronously with how the data gets served. You can even script an application, like using bash, since `echo foo >> some_file` is how you publish data.

It's extremely useful.

> what language?

C, specifically for Linux, using epoll and inotify. The server is multi-processed, with each process single-threaded, and it spawns NPROC workers (or however many you specify). But naturally this would work very well in Rust, C++, etc., and it could be multi-threaded instead of multi-processed.

> Very interesting technique

It's just C10K with a dash of CPS :)

In this particular program the sequence of events is pretty linear. Since the progression is fairly linear all the continuation functions are one next to the other and one can just read them linearly. This makes the code quite readable.

upghost 2 days ago | root | parent |

That's cool. So I suppose you are using function pointers for the continuations? That's pretty wild, I hadn't even considered that!

It's also really helpful with defunctionalization; search "defunctionalize the continuation" for stuff on that

I've also used it to make a function tail recursive, which is useful in languages with TCO.

codebje 3 days ago | root | parent |

If your continuation has no state to hold your recursion should already be tail called.

If it does have state, it’s probably a stack allocated closure…

In case anyone is wondering, "when would I EVER use this (in hand-written code)?", it's a trick that makes DSL (domain specific language) and small language implementation much easier.

Sure but that's still saying "this only gets used implementing a language". And that's OK because the continuation construct (A half-executed function passed to another function, wtf) seems like something I'd be horrified to find in the code of a normal application.

codebje 3 days ago | root | parent | next |

Callbacks with closures are continuations with bad syntax. Admittedly one could call those horrifying without being accused widely of hyperbole, so it’s not a great counter to your point.

upghost 3 days ago | root | parent | prev |

I'm not disagreeing with you. The three examples I mentioned were specifically about overcoming the limitations presented by some runtimes. If you have to reach for these, typically you know you are already swimming upstream. But sometimes the price of adopting a new tool is more expensive than using a specialized (hopefully well documented) application of CPS on the existing infrastructure.

In the second example, for instance, you could throw an exception that gets caught higher up the stack. And in fact, that's often how call/cc with lexically scoped continuations are implemented in languages that do not support first class continuations.

In that case, it really comes down to how you and your team feel about "exceptions as flow control".

Of course, exception-based flow control doesn't help in situations where you are deeply recursing and have a limited stack. This is where "trampolining" using CPS is very effective.

Typically languages implementing this go out of their way to hide the CPS though, because most people react to seeing it the way you did. And I don't blame them, it's pretty horrifying as you said!

CPS continuation-passing style

mattalex 3 days ago | root | parent | prev |

This is essentially the principle behind algebraic effects (which, in practice, do get implemented as delimited continuations):

When you have an impure effect (e.g. check a database, generate a random number, write to a file, nondeterministic choices,...), instead of directly implementing the impure action, you instead have a symbol e.g "read", "generate number", ...

When executing the function, you also provide a context of "interpreters" that map the symbol to whatever action you want. This is very useful, since the actual business logic can be analyzed in an isolated way. For instance, if you want to test your application you can use a dummy interpreter for "check database" that returns whatever values you need for testing, but without needing to go to an actual SQL database. It also allows you to switch backends rather easily: If your database uses the symbols "read", "write", "delete" then you just need to implement those calls in your backend. If you want to formally prove properties of your code, you can also do that by noting the properties of your symbols, e.g. `∀ key. read (delete key) = None`.

Since you always capture the symbol using an interpreter, you can also do fancy things like dynamically overriding the interpreter: To implement a seeded random number generator, you can have an interpreter that always overrides itself using the new seed. The interpreter would look something like this

```

Pseudorandom_interpreter(seed)(argument, continuation):

  rnd, new_seed <- generate_pseudorandom(seed, argument)
  with Pseudorandom_interpreter(new_seed):
       continuation(rnd)

```

You can clearly see the continuation passing style and the power of self-overriding your own interpreter. In fact, this is a nice way of handeling state in a pure way: Just put something other than new_seed into the new interpreter.

If you want to debug a state machine, you can use an interpreter like this

``` replace_state_interpreter(state)(new_state, continuation):

  with replace_state_interpreter(new_state ++ state):
       continuation(head state)

```

To trace the state. This way the "state" always holds the entire history of state changes, which can be very nice for debugging. During deployment, you can then replace use a different interpreter

```

replace_state_interpreter(state)(new_state, continuation):

  with replace_state_interpreter(new_state):
       continuation(state)

```

which just holds the current state.

upghost 3 days ago | root | parent |

That's really interesting. This seems like it would be a really good approach to combine something like an otherwise pure finite state machine, but with state transitions that rely on communicating with external systems.

Normally I emit tokens to a stack which are consumed by an interpreter but then it's a bit awkward to feed the results back into the FSM, it feels like decoupling just for the sake of decoupling even though the systems need to be maintained in parallel.

I'll have to explore this approach, thank you!

Joker_vD 3 days ago | prev | next |

There is actually a way to produce CPS without lots of administrative redexes, which uses the same main trick that "higher-order transform" utilizes: tracking the whether the expression being transformed is in tail context or general context, and translating accordingly — but without using meta-continuations because boy are those mind-boggling.

Instead, the normal loops are used, and the subresults are accumulated in some builder structure which is then finalized to produce a complete tree of a CPS expression. The main trick is to figure out just what this builder structure should be, and it turns out it's a tree zipper of sorts!

The margins of this comment are not big enough to sketch the whole thing but it's actually not as formidable as it sounds ("tree zipper", wooooh): it's mostly just a stack of tree-building instructions which can be easily extended/completed. An alternative would be to have a mutable CPS expression as the result, and keep appending things into it, but it requires lots of tree traversal and lot of care to not accidentally leave the thing in not quite constructed state which is way too easy in my experience.

I think I'll make and publish a gist with it when I get home because I don't believe I've ever seen it done this way anywhere.

upghost 3 days ago | root | parent | next |

Sounds like this is for a syntactic or metaprogramming transformation over source code? I assume this implies you are either working with a homoiconic language or you have already converted the source to AST?

However it sounds like a very portable technique, would be interested to see it.

tekknolagi 3 days ago | root | parent | prev |

Please show this gist. I would enjoy linking to it

Joker_vD 3 days ago | root | parent |

Okay, there it is: [0]. It mostly implements a version of CPS transform from "Compiling with Continuations, Continued" by A. Kennedy [1], figure 8. Two main differences in the output language from the one in your post is that, first, this version binds everything to temporaries (including literals and continuations), and second, it uses "let" (with multiple definitions, so it's actually "let*") as the primary name-binding feature instead of immediate application. And I use name "$ret" instead of "$call-cont".

The main point of interest is the use of CpsExprBuilder as an accumulator and explicit looping over sub-expressions instead of using meta-continuations, as is popular in most papers on CPS transformations. This is not the only possible way to implement such CpsExprBuilder: another option would be to have a mutable CPS expression with a pointer into it, and gradually grow it. But since you can't just take a pointer to a list's element in Python, you would probably have to keep a whole path (as, say, an array of indexes), and re-traverse the tree every time you append to it, like this:

    result = {
        'expr': ['let', [['#t0', 2], ['@k0', ['cont', ['#t1'], None]]], ['f', '#t0', '@k0']],
        'path': [1, 1, 1, 2],
    }

    cursor = result['expr']
    for index in result['path'][:-1]:
        cursor = cursor[index]
    cursor[result['path'][-1]] = ['g', '@halt', '#t1']

I've tried this approach, and even when it's encapsulated in a class, it's way too easy to accidentally end up with a random None somewhere deep inside your resulting expression. So instead, the expression's tree is turned inside out, so to speak. It's similar to using an explicit stack instead of using recursion.

[0] https://gist.github.com/Joker-vD/7cddcfada042ed486dd74690ae9...

[1] https://www.microsoft.com/en-us/research/wp-content/uploads/...

tekknolagi 2 days ago | root | parent |

Taking a look now. A couple of questions:

* Why give literal ints their own temporaries? * Why not support lambda expressions? Is it tricky in this approach or just accidentally missing?

Joker_vD a day ago | root | parent |

> Why give literal ints their own temporaries?

Just following the paper's convention (their justification being that now they only need to substitute variables for variables which is simpler), plus this gist is a trimmed down version of my hobby project which had full-blown algebraic data types: the integers were handled by that case as well. You absolutely can merge "int" case with the "var" case above, and use ints as themselves:

    if isinstance(expr, str) or isinstance(expr, int):
        result = CpsExprBuilder()
        return maybe_tail(result, expr)

The second argument in maybe_tail would need a better name though...

> Why not support lambda expressions?

They were missing in your write-up as well :) I just went by your text, and look at the code snippets in it. But adding them looks like this:

    if what == 'lambda':
        args, body = rest
        result = CpsExprBuilder()
        tmp_name, ret = gen_tmp(), gen_cont()
        fun = ['fun', [*args, ret], cps(body, ret)]
        result.add_let_without_body([[tmp_name, fun]])
        return maybe_tail(result, tmp_name)

Some test cases I've tried look like this then:

    ['lambda', ['x'], 'x'] =>
    let
      #t0 = fun(x, @k1):
        $ret @k1(x)
    in $ret @halt(#t0)
    
    [['lambda', ['x'], 'x'], 2] =>
    let
      #t0 = fun(x, @k1):
        $ret @k1(x)
      #t2 = 2
    in #t0(#t2, @halt)
    
    ['let', ['f', ['lambda', ['x'], 'x']], ['f', 2]] =>
    let
      @k0 = cont(f):
        let
          #t3 = 2
        in f(#t3, @halt)
      #t1 = fun(x, @k2):
        $ret @k2(x)
    in $ret @k0(#t1)

By the way, if you don't like creating contiunations just to immediately invoke them, you can adjust make_LET adjusted to flatten that pattern as well: check if the body is '$ret' with the continuation name that's in the defns list, replace body with this continuation's body (with variables substituted!) and throw its definition away.

    [['lambda', ['x'], 'x'], 2] =>
    let
      #t0 = fun(x, @k1):
        $ret @k1(x)
      #t2 = 2
    in #t0(#t2, @halt)
    
    ['let', ['f', ['lambda', ['x'], 'x']], ['f', 2]] =>
    let
      #t1 = fun(x, @k2):
        $ret @k2(x)
      #t3 = 2
    in #t1(#t3, @halt)

But this also throws the programmer-supplied names away during the substitution, sadly. Preferring to keep programmer-supplied names is possible, of course, but it requires both tracking them and making sure they don't collide with temporaries or other programmer-supplied names so... this simplification step is better performed in the optimizer after the CPS-conversion.

purplesyringa 3 days ago | prev | next |

> If you don’t want to do any of this trampoline stuff, you can also do the One Big Switch approach which stuffs each of the funs and conts into a case in a massive switch statement. Calls become gotos.

Just wanted to add that this is very similar to how async/await JavaScript code was compiled for runtimes that didn't support generators back in the day: http://facebook.github.io/regenerator/

assbuttbuttass 4 days ago | prev | next |

CPS is a cool technique, and makes advanced constructs like call/cc trivial to implement. Using CPS techniques in code can feel like magic.

But CPS isn't really the state of the art for functional compilers anymore. It's complex for a human to read and hard to optimize. Something like A-normal form is much easier to work with for a compiler intermediate representation

https://en.wikipedia.org/wiki/A-normal_form

maplant 3 days ago | root | parent | next |

I wouldn’t call either “state of the art”, it all depends on what you’re doing and what you want to achieve. Many compilers use several different IRs to achieve different things

Mathnerd314 3 days ago | root | parent | prev |

If you think ANF is great then explain how to deal with the transformation from "E (if x then a else b)" to "if x then E a else E b".

codebje 3 days ago | root | parent | next |

ANF transform will rewrite that to “let a0 = if x then an else b in E a0”.

This isn’t identical to floating the call to E inside the “if”, obviously, but would have (within predictable boundaries) the same performance. If this code went through LLVM to produce machine code I suspect it would wind up identical as LLVM will likely rewrite the duplicated call to be outside the “if”.

Mathnerd314 2 days ago | root | parent |

What?

Let's say E[h] is "if h == 1 then 2 else 3", a is 1, and b is 2. Then before:

if (if x then 1 else 2) == 1 then 2 else 3

After:

if x then (if 1 == 1 then 2 else 3) else (if 2 == 1 then 2 else 3)

which trivially simplifies to if x then 2 else 3

Your proposed rewrite is "let a0 = if x then 1 else 2 in if a0 == 1 then 2 else 3" which is not a simplification at all - it makes the expression longer and actually introduces a variable indirection, making the expression harder to analyze. The call will require some sort of non-local analysis to pierce through the variable and do the duplication and elimination.

codebje 2 days ago | root | parent |

The ANF form is amenable to rewriting the “if” outside the “let”.

    if x then
      let a0 = a in E a0
    else
      let a0 = b in E a0

If a and b are simple variables or constants the new “let”s will be eliminated by ANF.

What happens to E isn’t so straightforward. In your example, though, it’s a small expression tree and would just be duplicated inline with the same obvious constant reductions available.

If it’s ‘large’ (heuristically) it might instead be assigned to a “let” binding outside the “if”, to avoid duplicating a large block of code.

CPS should be making that same determination on E.

Mathnerd314 14 hours ago | root | parent |

Well, with CPS it just works. You start with

  let
    x k = if x then k 1 else k 2
    E k h = if h == 1 then k 2 else k 3
  in
    \k -> x (E k)

and then when you inline E into x it becomes

  let
    E k h = if h == 1 then k 2 else k 3
  in
    \k -> if x then E k 1 else E k 2

which can again be locally simplified to

  \k -> if x then k 2 else k 3

Unlike ANF, no blind duplication is needed, it is obvious from inspection that E can be reduced. That's why ANF doesn't handle this transformation well - determining which expressions can be duplicated is not straightforward.

mrkeen 3 days ago | root | parent | prev |

See join points: https://dl.acm.org/doi/pdf/10.1145/3062341.3062380

Mathnerd314 2 days ago | root | parent | next |

If there was a clear distinction between let-bindings and join points, I would be happy. But there is not - there is this contification process and, rather than proving that their algorithm finds all join points, they just say "it covers most of the ground [in practice]", and then they cite (Section 4) two other papers that suggest contification is undecidable - whether a function returns to a specific continuation or function is a behavioral property, not syntactic. Even if you accept that the dominator-based analysis is optimal, it is a global property. not a local property like in a proper IR. So what I see is a muddy mess.

codebje 2 days ago | root | parent | prev |

Last I looked, join points hadn't made it into GHC Core directly - do you know if the ideas in the join points paper were introduced to (mainline) GHC in the end?

edit: it looks like there's distinct Core join points now, never mind!

childintime 4 days ago | prev | next |

The author references [scrapscript](https://scrapscript.org/), hopefully we'll hear more about it.

tekknolagi 4 days ago | root | parent | next |

Now that I'm back at a computer:

* https://bernsteinbear.com/blog/scrapscript/

* https://bernsteinbear.com/blog/scrapscript-baseline/

* https://bernsteinbear.com/blog/scrapscript-tricks/

* https://bernsteinbear.com/blog/type-inference/

* https://bernsteinbear.com/blog/row-poly/

and the repo for the main implementation: https://github.com/tekknolagi/scrapscript

Scripted Pipelines in Jenkins are also using CPS in Groovy! https://plugins.jenkins.io/workflow-cps/#plugin-content-tech...

maratc 3 days ago | root | parent |

Declarative Pipelines do that too (i.e. not only Scripted ones.)

tekknolagi 4 days ago | root | parent | prev |

See my other blog posts and GitHub if you would like some more!

upghost 4 days ago | root | parent |

I realize this is something I should probably know but I've never been able to pin down a definition nor have I used a language with this feature -- could you maybe explain what a "content addressable language" is and why that is important? It seems like it's really important but I have no idea why. You explain it here[1] but it's somewhat brief. I realize it probably should be self-evident why it's important but it's not clicking for me.

It's somewhat like when Jack Skelington is trying to explain the meaning of Christmas to the residents of Halloween Town[2] and somehow it hasn't stuck yet.

[1]: https://scrapscript.org/#ContentAddressibleEverything

[2]: https://youtu.be/l6QZUcRJw-A

tekknolagi 3 days ago | root | parent |

To be clear, I am not Taylor and I did not come up with scrapscript! That website belongs to him and I'm just a random guy on the internet who emailed him and built an implementation.

The way I think about it is kind of like software versions. Say you want to do some kind of reproducible software. You can write that your software depends on jazzinator==3.0.0 but that's kind of unsatisfying because it relies either on tools or people to make sure that versions are immutable. It also relies on jazzinator to specify its own dependencies pretty strictly. So maybe you end up inventing the concept of a lockfile to partially solve that problem, but you still have the issue of relying on a third party to not change things up on you.

However, if you can address a library by a name that is computed from its code (a hash), things get interesting. You end up with this tree/DAG of dependencies where every dependency is immutable--you can't change the code without also changing the name. It's a Merkle DAG like Git. You do lose the ability to say things like "above version 2.5"... sort of.

If you also build a ref or tag system (similar to Git refs), you can kinda do normal style versioning on top of this content addressable world. But again, that relies on someone or something (maybe your hash-based store) to keep a map point-version names to hash names.

The other thing that's interesting with scrapscript in particular is since everything is addressable, you can use (sort of implicitly import/download) any value from any other program in a repository readable by you. But no scrapscript implementation fully does this yet, as far as I'm aware. For a fully working system, check out Unison.

upghost 3 days ago | root | parent |

Oh that is really fascinating, thank you. That has some really interesting implications for dependency management that I will have to think through. It sounds like that implies some really useful tree properties on dependency checking, like if the part of the program you are relying on matches a certain hash you don't need to check down the chain because you can be sure the rest of the nodes are as you expect.

Much appreciated!

_zoltan_ 4 days ago | prev | next |

I thought I'm going to read a piece about US's Child Protective Services, and clicked. Was a letdown :-)

MobiusHorizons 4 days ago | root | parent | next |

I was pleased to see something properly technical, given that that is the supposed focus of this forum. I Hadn't ever heard of CPS as an IR, only SSA, and it was very interesting to learn about it.

tekknolagi 3 days ago | root | parent | next |

Funnily enough, both CPS and SSA are US government services

cryptonector 3 days ago | root | parent | prev |

CPS is a predecessor technique to SSA, essentially.

TapamN 4 days ago | root | parent | prev |

I was hoping, but not expecting, for it to be about the Capcom arcade board.

fovc 4 days ago | prev | next |

It’s also useful in typed languages to introduce an existentially quantified type

guerrilla 2 days ago | root | parent |

Where do you recommend reading more on this? I've always wanted to understand that better. From what I understand continuations are dual to functions (of universally quantified types). It would be cool to complete the symmetry, for the aesthetics.

fovc a day ago | root | parent |

Maybe this? https://www.cis.upenn.edu/~bcpierce/tapl/

yapyap 3 days ago | prev | next |

not gonna lie this title made me think this was gonna be some child protective services horror story or something

draw_down 3 days ago | root | parent |

[dead]

psyclobe 4 days ago | prev |

[flagged]

tekknolagi 4 days ago | root | parent | next |

No, and in fact internet searching for continuation-passing style stuff is quite tricky for this reason :(

smitty1e 4 days ago | root | parent |

Ambiguity is heaven for humor and hell for coding.

dylan604 4 days ago | root | parent |

my favorite symbol is ~= for this very reason. it's so ambiguous it means different things in different language

4 days ago | root | parent | prev |

[deleted]