diff options
| author | Garrett D'Amore <garrett@damore.org> | 2017-10-18 14:06:32 -0700 |
|---|---|---|
| committer | Garrett D'Amore <garrett@damore.org> | 2017-10-18 14:06:32 -0700 |
| commit | 6638b37c5e4aa5e1357d2ba06b9953df4d6d87f3 (patch) | |
| tree | e317f09900f2c672b7785bf60550b0b6a080f377 | |
| parent | 21c69e7398d7dad6bd2cc4bb4e766770d75320bd (diff) | |
| download | nng-6638b37c5e4aa5e1357d2ba06b9953df4d6d87f3.tar.gz nng-6638b37c5e4aa5e1357d2ba06b9953df4d6d87f3.tar.bz2 nng-6638b37c5e4aa5e1357d2ba06b9953df4d6d87f3.zip | |
fixes #114 docs: Major differences and motivation vs. nanomsg
| -rw-r--r-- | RATIONALE.adoc | 260 |
1 files changed, 260 insertions, 0 deletions
diff --git a/RATIONALE.adoc b/RATIONALE.adoc new file mode 100644 index 00000000..9c21ce3f --- /dev/null +++ b/RATIONALE.adoc @@ -0,0 +1,260 @@ +Rational: Or why am I bothering to rewrite nanomsg? +=================================================== +Garrett D'Amore <garrett@damore.org> +v0.1, October 18, 2017 + + +NOTE: You might want to review + http://nanomsg.org/documentation-zeromq.html[Martin Sustrik's rationale] + for nanomsg vs. ZeroMQ. + + +Background +---------- + +I became involved in the nanomsg community back in 2014, when +I wrote https://github.com/go-mangos/mangos[mangos] as a pure Golang +implementation of the wire protocols behind nanomsg. I did that work +because I was dissatisfied with ZeroMQ's licensing and C++ baggage, and +I needed something that would work with Golang on Solaris, which at the time +lacked support for cgo (so I could not just use an FFI binding.) This gave +me a lot of detail about the internals of nanomsg and the SP protocols. +At the time, it was the only alternate implementation those protocols. + +It would not be wrong to say that one of the goals of mangos was to teach +me about golang. It was my first non-trivial golang project. + +While working with mangos, I wound up implementing a number of additional +features, such as a TLS transport, the ability to bind to wild card ports, +and the ability to determine more information about the sender of a message. +This was incredibly useful in a number of projects. + +I initially looked at nanomsg itself, as I wanted to add a TLS transport +to it, and I needed to make some bug fixes (for protocol bugs for example), +and so forth. + + +State Machine Madness +--------------------- + +What I ran into was a challenging mess of state machines. +nanomsg has dozens of state machines, many of which feed into others, +such that tracking flow through the state machines is incredibly painful. + +Worse, these state machines are designed to be run from a single worker +thread. This means that a given socket is entirely single theaded; you +could in theory have dozens, hundreds, or even thousands of connections +open, but they would be serviced only by a single thread. (Admittedly +non-blocking I/O is used to let the OS kernel calls run asynchronously +perhaps on multiple cores, but nanomsg itself runs all socket code on +a single worker thread.) + +There is another problem too -- the inproc code that moves messages +between one socket and another was incredibly racy. This is because the +two sockets have different locks, and so dealing with the different +contexts was tricky (and consequently buggy). (I've since, I think, fixed +the worst of the bugs here, but not after many hours of pulling out hair.) + +The state machines also make fairly linear flow really difficult to follow. +For example, there is a state machine to read the header information. This +may come a byte a time, and the state machine has to add the bytes, check +for completion, and possibly change state, even if it is just reading a +single 32-bit word. This is a lot more complex than most programmers are +used to, such as `read(fd, &val, 4)`. + +Now to be fair, Martin Sustrik had the best intentions when he created the +state machine model around which nanomsg is built. I do think that from +experience this is one of the most dense and unapproachable parts of nanomsg, +in spite of the fact that Martin's goal was precisely the opposite. I +consider this a "failed experiment" -- but hey failed experiments are the +basis of all great science. + + +Thread Challenges +----------------- + +While nanomsg is mostly internally single threaded, I decided to try to +emulate the simple architecture of mangos using system threads. (mangos +has the benefit of goroutines.) Having been well and truly spoiled by +Solaris/illumos threading (and especially Solaris/illumos kernel threads), +I thought this would be a reasonable architecture. + +Sadly, this initial effort, while it worked, scaled incredibly poorly -- +even so-called "modern" operating systems like macOS 10.12 and Windows 8.1 +simply melted or failed entirely when creating any non-trivial number of +threads. (To me, creating 100 threads should be a no-brainer, especially if +one limits the stack size appropriately. I'm used to be able to create +thousands of threads without concern. As I said, I've been spoiled. +If your system falls over at a mere 200 threads I consider it a toy +implementation of threading. Unfortunately most of the mainstream operating +systems are therefore toy implementations.) + +Chalk up another failed experiment. + +I did find another approach which is discussed further. + + +File Descriptor Driven +---------------------- + +Most of the underlying I/O in nanomsg is built around file descriptors, +and it's internal usock structure, which is also state machine driven. +This means that implementing new transports which might need something +other than a file descriptor, is really non-trivial. This stymied my +first attempt to add OpenSSL support to get TLS added -- OpenSSL has +it's own struct BIO for this stuff, and I could not see an easy way to +convert nanomsg's usock stuff to accomodate the struct BIO. + + +Poll +---- + +In order to support use in event driven programming, asynchronous +situations, etc. nanomsg offers non-blocking I/O. In order to make +this work for end-users, a notification mechanism is required, and +nanomsg, in the spirit of following POSIX, offers a notification method +based on poll(2) or select(2). + +In order for this to work, it offers up a selectable file descriptor +for send and another one for receive. When events occur, these are +written to, and the user application "clears" these by reading from +them. (This is done on behalf of the application by nanomsg's API calls.) + +This means that in addition to the context switch code, there are not +fewer than 2 extra system calls executed per message sent or received, and +on a mostly idle system as many as 3. This means that to send a message +from one process to another you may have to execute up to 6 extra system +calls, beyond the 2 required to actually send and receive the message. + +There are cases where this file descriptor logic is easier for existing +applications to integrate into event loops (e.g. they already have a thread +blocked in poll().) + +But for many cases this is not necessary. A simple callback mechanism +would be far better, with the FDs available only as an option for code +that needs them. This is the approach that we have taken with nng. + +As another consequence of our approach, we do not require file descriptors +for sockets at all, so it is possible to create applications containing +many thousands of inproc sockets with no files open at all. (Obviously +if you're going to perform real I/O to other processes or other systems, +you're going to need to have the underlying transport file descriptors +open.) + + +POSIX APIs +---------- + +Another of Martin's goals, which seems worthwhile at first, was the +attempt to provide a familiar BSD POSIX API. As a C programmer, this +really attracted me. + +The problem is that the POSIX APIs are actually really horrible. In +particular the semantics around cmsg are about as arcane and painful as +one can imagine. Largely, this has meant that extensions to the cmsg +API simply have not occurred in nanomsg. + +The cmsg API specified by POSIX is as bad as it is because POSIX had +requirements not to break APIs that already existed, and they needed to +shim something that would work with existing implementations, including +getting across a system call boundary. nanomsg has never had such +constraints. + +Attempting to retain low numbered "socket descriptors" had its own +problems -- a huge source of use-after-close bugs, which made the +use of nn_close() incredibly dangerous for multithreaded sockets. +(If one thread closes and opens a new socket, other threads still using +the old socket might wind up accessing the "new" socket without realizing +it.) + +The other thing is that BSD socket APIs are super familiar to UNIX C +programmers -- but experience with nanomsg has taught us already that these +are actually in the minority of nanomsg's users. Most of our users are +coming to us from C++ (object oriented), Java, and Python backgrounds. +For them the BSD sockets API is frankly somewhat bizarre and alien. + +With nng, we realized that constraining ourselves to the mistakes of the +POSIX API was hurting rather than helping. So nng provides a much friendlier +interface for getting properties associated with messages. + +In nng we also generally try hard to avoid reusing +an identifier until no other option exists. This generally means most +applications won't see socket reuse until billions of other sockets +have been opened. + + +Compatibility +------------- + +Of course, there are a number of existing nanomsg consumers "in the wild" +already. It is important to continue to support them. So I decided from +the get go to implement a "compatibility" layer, that provides the same +API, and as much as possible the same ABI, as legacy nanomsg. However, +new features and capabilities would not necessarily be exposed to the +the legacy API. + +Today nng offers this. You can relink an existing nanomsg binary against +nng instead of nanomsg, and it usually Just Works. Source compatibility +is almost as easy, although the application code needs to be modified +to use different header files. (Note: I am considering changing this +in the future.) + + +Asynchronous IO +--------------- + +As a consequence of our experience with threads being so unscalable, +we decided to create a new underlying abstraction modeled largely on +Windows IO completion ports. (As bad as so many of the Windows APIs +are, the IO completion port stuff is actually pretty nice.) Under the +hood in nng all I/O is asynchronous, and we have `struct nni_aio` objects +for each pending I/O. These have an associated completion routine. + +The completion routines are _usually_ run on a separate worker thread +(we have many such workers; in theory the number should be tuned to the +available number of CPU cores to ensure that we never wait while a CPU +core is available for work), but they can be run "synchronously" if +the I/O provider knows it is safe to do so (for example the completion +is occuring in a context where no locks are held.) + +There is still a lot of performance tuning work to do, and there is a +real need to expose the AIO structures to user code, which is something +that I will be doing in the coming days. This should enable a much +faster and lighter weight I/O model, especially for event driven programs. + + +New Transports (ZeroTier) +------------------------- + +The other, most critical, motivation behind nng was to enable an easier +creation of new transports. In particular, one client (Capitar IT Group BV) +contracted the creation of a ZeroTier transport for nanomsg. + +After beating my head against the state machines some more, I finally asked +myself if it would not be easier just to rewrite nanomsg using the model +I had created for mangos. + +In retrospect, I'm not sure that the answer was a clear and definite yes +in favor of nng, but for the other things I want to do, it has enabled a +lot of new work. The ZeroTier transport was created with a relatively +modest amount of effort, in spite of being based upon a connectionless +transport. I do not believe I could have done this easily in the existing +nanomsg. + +In the coming weeks I expect to create transports for TLS, websocket, +and websocket over TLS. I expect the websocket transport to support +multiple sockets with different path components bound to the same TCP port. +This might be possible with nanomsg, but it wouldn't be easy. With nng, +it looks fairly straight-forward. + + +Towards nanomsg 2.0 +------------------- + +It is my intention that nng ultimately replace nanomsg. I do think of it +as "nanomsg 2.0". In fact "nng" stands for "nanomsg next generation" in +my mind. Some day before too long I'm hoping that the various website +references to nanomsg my simply be updated to point at nng. It is not +clear to me whether at that time I will simply rename the existing +code to nanomsg, nanomsg2, or leave it as nng. (One possible issue is +that Martin Sustrik retains the trademark for "nanomsg".) |
