Discussion forum for David Beazley

Concurrency, async etc


#1

Hi David/

If you did a concurrency course I would definitely be ready for the trip to Chicago. Do others feel this would be a good idea?

/Rene


#2

Years ago, I ran a “Concurrency and Distributed Systems” course, but it kind of fell by the wayside. One of the reasons for it falling away was my compilers course. Not so much the contents of compilers, but the structure of running it. Basically, compilers is set up in the form of a week-long project. There’s a bit of lecture and traditional teaching, but mostly it’s about building a compiler. I’ve since been wanting to follow this same project-based structure with some other topics (including networks/concurrency).

Lately (quite recently), I’ve been thinking that such a course might work for a project involving distributed consensus (e.g., Raft or Paxos). I’ve messed around with Raft and found it to be significantly more difficult to implement than your standard sort of “echo server” or even an “HTTP server” project. Something like that might work really well for a concurrency/async/networks kind of course.

I’d be curious to get thoughts on that.


#3

Both lxd (for its distributed database) and juju (for its distributed log) use hashicorp’s raft. Although it is in go, it may be handy to see how raft has been implemented and used in either case.

Definitely interested in what a python implementation of a raft would look like!


#4

^ hashicorp’s raft in go https://github.com/hashicorp/raft


#5

Building a concurrent, distributed and reliable system would be awesome. I would suggest it include:

  • Concurrent local Node (built on curio, trio, …)
  • State/Event model to implement non-blocking architecture
  • Message Queue implementation
  • How to make all events (Net, Disk, Key, Mouse, Internal messages, Timeout, Exceptions, …) appear on Queue
  • Distributed state (raft, paxos, …)
  • Maybe some Erlang inspired stuff: fail fast, reload, restart of Nodes
  • Reliable handling of external web-services/microservices
  • Handling a DB in this environment
  • Handling hot load of code and config
  • Replay eventstream to re-establish state
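
As a rough illustration of the last item, replaying an event stream to re-establish state can be sketched as a fold over a log of events. The `Event` type, the "set"/"delete" event kinds, and the `replay` function below are hypothetical names chosen for this example, not part of curio, trio, or any other library.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str             # hypothetical event kinds: "set" or "delete"
    key: str
    value: object = None


def apply_event(state: dict, event: Event) -> None:
    """Mutate the state according to a single event."""
    if event.kind == "set":
        state[event.key] = event.value
    elif event.kind == "delete":
        state.pop(event.key, None)
    else:
        raise ValueError(f"unknown event kind: {event.kind!r}")


def replay(events) -> dict:
    """Rebuild the state from scratch by applying every event in order."""
    state = {}
    for event in events:
        apply_event(state, event)
    return state


# A made-up event log, as it might be read back from disk after a restart.
log = [
    Event("set", "x", 1),
    Event("set", "y", 2),
    Event("delete", "x"),
    Event("set", "y", 3),
]
state = replay(log)
print(state)
```

Because the state is derived purely from the log, a restarted node that reads the same events in the same order ends up with the same state.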

Just say when I should be in Chicago…

/Rene

PS. Maybe throw in a GIL removal lecture and make it a two week course :slight_smile:


#6

Maybe some Erlang inspired stuff

That reminds me of this.

It would be epic if we had even a broken implementation of this beauty in CPython


#7

I think the trick in a course like this is making sure the project has enough of a well-defined scope to be doable in a week. I also wouldn’t want to do it in a way where there was too much reliance on third-party libraries (I’d want it to be more of a ground-up building of everything needed so that we could explore all of the underlying issues that arise).

I definitely think implementing Raft would be a hard challenge for everyone. Not only is the problem itself hard, the implementation involves a wide variety of extremely tricky edge cases. I’ve been working on an implementation that uses ZeroMQ and I’ve found it to be pretty challenging–and it still doesn’t work (I think I need another few days).

In the big picture, I definitely think there are enough “issues” surrounding just Raft all by itself to make for a pretty good course.


#8

Hi! I also emailed you about a Chicago course on concurrency and related topics. I am definitely waiting to sign up! (It would be the most wonderful thing in the world if the course ran during my spring break in March, too.)


#9

implementation that uses ZeroMQ and I’ve found it to be pretty challenging

Are you referring to difficulties in using zeromq itself, or the consensus algorithms?


#10

The difficulties are all on the consensus side, not with ZeroMQ (at least not so far). There are a lot of very tricky interactions concerning time (heartbeats, timeouts, retries, etc.). For example, the leader sending out concurrent requests to followers–some of which might be dead or unresponsive. Or the fact that forward progress is made if a simple majority of nodes agree (not necessarily all nodes). Frankly, there is a whole lot of juggling going on with all of the parts and all of the possible failure modes. Holding it all in my head has proven to be challenging.

If I were to do a course, I wouldn’t base it on a ZeroMQ implementation. That’s more of an incidental detail of me playing around right now–you need to have some kind of messaging layer, but it’s not terribly hard to make one from scratch. From a pure teaching perspective, I’d want to build all of the layers from the ground up (including the messaging).
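
As a sketch of how small such a layer can start out, the following sends JSON messages over a stream socket with a 4-byte length prefix. The framing format and the helper names (`send_message`, `recv_exactly`, `recv_message`) are assumptions made for this example; a real layer would also need timeouts, reconnects, and message size limits.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")   # 4-byte big-endian length prefix (assumed format)


def send_message(sock: socket.socket, obj) -> None:
    """Encode obj as JSON and send it with a length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exactly(sock: socket.socket, nbytes: int) -> bytes:
    """Read exactly nbytes, since recv() may return fewer bytes than asked."""
    chunks = []
    while nbytes > 0:
        chunk = sock.recv(nbytes)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket):
    """Read one length-prefixed JSON message and decode it."""
    (size,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return json.loads(recv_exactly(sock, size).decode("utf-8"))


# Demonstrate with a connected pair of local sockets instead of a network.
left, right = socket.socketpair()
send_message(left, {"type": "heartbeat", "term": 3})
send_message(left, {"type": "vote_request", "term": 4})
first = recv_message(right)
second = recv_message(right)
left.close()
right.close()
print(first, second)
```

The length prefix is what turns a byte stream into discrete messages: without it, two back-to-back sends could arrive glued together in a single `recv()`.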


#11

it’s not terribly hard to make one from scratch

<3. This would be really awesome to witness. I have zero low-level message passing experience beyond zeromq, and building one from scratch would be an awesome learning experience!

the leader sending out concurrent requests to followers–some of which might be dead or unresponsive.

I also had this problem of unresponsive clients and handling disconnects, and it was eventually solved by replacing PUB-SUB / PAIR with an async socket pattern, ROUTER-DEALER. Combined with zmq.select, it provided a very ergonomic model for this kind of stuff. I don’t know what kind of approach you are taking.

Frankly, there is a whole lot of juggling going on with all of the parts and all of the possible failure modes.

Isn’t this a general problem with distributed systems? It seems quite tiring and cumbersome to do all the manual wiring…


#12

I have been working on distributed software for a while now, and I have found it quite tricky to introduce junior devs (or senior managers) to the various non-trivial aspects of distributed computing, because there is always so much boilerplate to deal with before getting to the actual thing that matters for them (and even more for the mind-bending bits, distribution fallacies, etc.).

I would love to know of some resources to bring junior devs to distributed computing from scratch, step by step. One weekly project after another would be a nice way to structure it, with a long term overarching goal.

What I had in mind as a first goal would be a kind of distributed chat / shell / repl interface, using a very simple custom command/programming language, maybe with logic relying on interval tree clocks, CRDTs, a distributed hash table, blockchain, etc. In the DIY open spirit, I started a few slow-moving projects on github, toying with this idea :

  • pyros-dev/pyzmp : a kind of communicating processes framework (using zmq with pickle or protobuf). Currently used for ROS/web applications and might need to be rethought/redesigned/rewritten eventually, maybe using something like autobahn.

  • asmodehn/replator : importing custom code file, using a lalr parser in python, and providing a REPL running your custom DSL implemented in usual python.

  • Recently I have been thinking about writing something around https://github.com/python-trio/trio-click to have async repl to do network communication / remote interaction directly from a repl. Obviously something like that is also doable around curio :wink:

  • Another thing could be hardware : developing/interacting with distributed software on IoT devices can be motivating in itself, because the usual “computer” environment is stripped down, and we can adjust student/user expectations, and focus on the bits that actually matter. Input/output is the only thing needed.
    And we can link it with “real world” experiment/entertainment : “Destroy this device, it still works ! Cut this cable, it still works !”
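
Of the building blocks mentioned above, a CRDT is probably the easiest to show in a few lines. The sketch below is a grow-only counter (G-Counter), one of the simplest CRDTs; the class and method names are made up for this example and do not come from pyzmp or replator.

```python
class GCounter:
    """Grow-only counter: each node only ever increments its own slot."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}               # node id -> highest count seen so far

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Take the per-node maximum; safe to repeat and in any order."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas update independently, then exchange state in both directions.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)                             # merging again changes nothing
print(a.value, b.value)
```

Because `merge` is commutative, associative, and idempotent, replicas converge regardless of how often or in what order they exchange state, which is what makes this kind of structure a gentle first project.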