Blue Sky
Revision as of 19:21, 22 March 2011
Yesterday in our meeting, Jacob and I went briefly off on a tangent before being hauled back to earth by more sober-minded folks. Nevertheless, it was a cool idea and I spent a little quality time (in bed at 4am) thinking some more about it.
I don't recall how it arose, but the basic idea was to convert programs to continuation-passing style (CPS) and send one-way messages (closures, with context + continuation address) in place of shared memory references. So all threads migrate all the time, chasing down the values of shared variables to build up context, then performing some computation whenever adequate values have been collected (in whatever core happened to be managing the last memory accessed), and continuing.
Perhaps like Active Messages, though I don't know much about them.
I can't explain CPS in a short note, but I think it's pretty familiar to most folks. Here's a trivial example of what I mean.
Imagine we want to add a couple of integers stored in shared memory, say A and B. First, let's think about what we'll do with the sum when we have it. In best Lisp style, we'll call a function named "continuation" and pass the sum as an argument. OK, now let's begin by gen'ing up a message to the processor that stores A (or rather, the core that manages the memory where A is stored; we'll call it procA). The message will have the address of A plus a pointer to some code.
In procA, the message comes along and procA executes a little code loading the value of A from the memory it controls and gen'ing up a message to procB. The message will have the address of B, the value of A, and a pointer to some code.
In procB, the message comes along and procB executes a little code loading the value of B from the memory it controls. Since procB has the values of both A and B handy, we can add those values, then jump to "continuation", passing in the sum we just computed.
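The three steps above can be sketched in ordinary Python, simulating each core as an object that owns a slice of memory and a mailbox of one-way messages. All the names here (`Core`, `add_shared`, the mailbox mechanics) are hypothetical scaffolding for illustration; closures stand in for the "context + continuation address" payload:

```python
from collections import deque

class Core:
    """A simulated core that manages some shared variables and a mailbox."""
    def __init__(self, name):
        self.name = name
        self.memory = {}        # the shared variables this core manages
        self.mailbox = deque()  # incoming one-way messages (closures)

    def send(self, closure):
        self.mailbox.append(closure)

    def run(self):
        # Drain the mailbox; each message executes entirely locally.
        while self.mailbox:
            self.mailbox.popleft()()

def add_shared(procA, procB, continuation):
    # Message to procA: read A locally, then forward A's value to procB.
    def at_procA():
        a = procA.memory['A']          # local read on procA
        # Message to procB: A's value rides along as captured context.
        def at_procB():
            b = procB.memory['B']      # local read on procB
            continuation(a + b)        # compute the sum where B lives
        procB.send(at_procB)
    procA.send(at_procA)

procA, procB = Core('procA'), Core('procB')
procA.memory['A'] = 3
procB.memory['B'] = 4
result = []
add_shared(procA, procB, result.append)
procA.run()
procB.run()
print(result[0])  # 7
```

Note that neither core ever replies: the computation simply moves to wherever the next memory reference lives, carrying its context with it.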
Note that this "gen'ing up a message" phrase is an entirely local computation, involving no references to shared memory, so it should run pretty fast on an x86 core.
No stacks are maintained, there are no calls, reads and writes are always performed locally, and the computation continues on the core where the memory reference was handled. Same for AMOs (and since the delegate manages all references to its share of global memory, sequential consistency is easy, even without fences and such). A core never replies to a message.
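A hedged sketch of why AMOs fall out for free under this scheme: if the owning core drains its mailbox one message at a time, a fetch-and-add is just an ordinary message, and every access to that core's memory is serialized by the mailbox with no fences. The names (`Delegate`, `fetch_and_add`) are made up for illustration:

```python
from collections import deque

class Delegate:
    """The sole manager of its slice of shared memory."""
    def __init__(self):
        self.memory = {'counter': 0}
        self.mailbox = deque()

    def run(self):
        # One message at a time: this loop IS the memory-consistency model.
        while self.mailbox:
            self.mailbox.popleft()()

def fetch_and_add(core, var, n, continuation):
    def msg():
        old = core.memory[var]        # read and write happen locally,
        core.memory[var] = old + n    # with no other access interleaved
        continuation(old)
    core.mailbox.append(msg)

core = Delegate()
seen = []
for i in range(3):
    fetch_and_add(core, 'counter', 1, seen.append)
core.run()
print(seen, core.memory['counter'])  # [0, 1, 2] 3
```

Atomicity costs nothing here because the read-modify-write never leaves the delegate; the mailbox order is the memory order.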
I expect we could hack up an example pretty quickly, compiling a Scheme-like language. When I say Scheme-like, I don't mean lists; I mean Scheme's control structures (nested functions, calls, if-then-else, global variables, ...) plus something like futures for parallelism.
It's perhaps a weird idea, passing the context along to the variable, but if messages have high overhead, then saving half the messages could make it worthwhile.
Active Messages are more of an implementation technique. They seem like a natural way to implement this idea.
A lot of old Cilk ideas seem related. Need to read more.
If we have full/empty bits, it's pretty simple to manipulate queues of tasks waiting at a sync variable, since the complete context is available locally. Note that both the context and the queue are in private memory.
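The queue manipulation above can be sketched as follows, assuming full/empty bits in the style of the Tera/Cray MTA: a read of an empty sync variable parks the message's continuation in a queue that is private to the owning core, and the write that fills the variable drains the queue. `SyncVar` and its methods are hypothetical names:

```python
from collections import deque

class SyncVar:
    """A shared variable with a full/empty bit, owned by one core."""
    def __init__(self):
        self.full = False
        self.value = None
        self.waiters = deque()  # continuations parked here, all in private memory

    def read(self, continuation):
        if self.full:
            continuation(self.value)
        else:
            # The message's whole context is in the closure, so parking
            # it is just a local queue append.
            self.waiters.append(continuation)

    def write(self, value):
        self.value = value
        self.full = True
        # Filling the variable resumes every waiting task locally.
        while self.waiters:
            self.waiters.popleft()(value)

x = SyncVar()
results = []
x.read(results.append)  # arrives before the write: parks in the queue
x.read(results.append)
x.write(42)             # fills the variable and wakes both readers
x.read(results.append)  # arrives after: runs immediately
print(results)  # [42, 42, 42]
```

No remote interaction is needed to block or wake a task: because the continuation already carries its context, "waiting" is nothing more than sitting in a local deque.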