| author | Garrett D'Amore <garrett@damore.org> | 2018-03-02 10:16:02 -0800 |
|---|---|---|
| committer | Garrett D'Amore <garrett@damore.org> | 2018-03-02 10:16:02 -0800 |
| commit | 05aba898cedc8c2c1d9f1a21f6963e450b3f127c | |
| tree | d6cc34b19da6342477103155d64af6e93faf922e /RATIONALE.adoc | |
| parent | 7f2b2f174a796132b61e2d0cf7aed94f69e24d88 | |
Move some docs to docs directory, add CONTRIBUTING and templates.
Diffstat (limited to 'RATIONALE.adoc')
| -rw-r--r-- | RATIONALE.adoc | 316 |
1 files changed, 0 insertions, 316 deletions
diff --git a/RATIONALE.adoc b/RATIONALE.adoc
deleted file mode 100644
index b4ef5468..00000000
--- a/RATIONALE.adoc
+++ /dev/null
@@ -1,316 +0,0 @@
-= Rationale: Or why am I bothering to rewrite nanomsg?
-Garrett D'Amore <garrett@damore.org>
-v0.2, February 22, 2018
-
-
-NOTE: You might want to review
-  http://nanomsg.org/documentation-zeromq.html[Martin Sustrik's rationale]
-  for nanomsg vs. ZeroMQ.
-
-
-== Background
-
-I became involved in the
-http://www.nanomsg.org[nanomsg] community back in 2014, when
-I wrote https://github.com/go-mangos/mangos[mangos] as a pure
-http://www.golang.org[Go] implementation of the wire protocols behind
-_nanomsg_. I did that work because I was dissatisfied with the
-http://zeromq.org[_ZeroMQ_] licensing model
-and the {cpp} baggage that came with it. I also needed something that would
-work with _Go_ on http://www.illumos.org[illumos], which at the time
-lacked support for `cgo` (so I could not just use an FFI binding).
-
-
-At the time, it was the only alternate implementation of those protocols.
-Writing _mangos_ gave me a lot of detail about the internals of _nanomsg_ and
-the SP protocols.
-
-It would not be wrong to say that one of the goals of _mangos_ was to teach
-me about _Go_. It was my first non-trivial _Go_ project.
-
-While working with _mangos_, I wound up implementing a number of additional
-features, such as a TLS transport, the ability to bind to wildcard ports,
-and the ability to determine more information about the sender of a message.
-This was incredibly useful in a number of projects.
-
-I initially looked at _nanomsg_ itself, as I wanted to add a TLS transport
-to it, and I needed to make some bug fixes (for example, protocol bugs),
-and so forth.
-
-== Lessons Learned
-
-Perhaps it might be better to state that there were a number of opportunities
-to learn from the lessons of _nanomsg_, as well as lessons we learned while
-building _nng_ itself.
-
-=== State Machine Madness
-
-What I ran into in _nanomsg_, when attempting to improve it, was a
-challenging mess of state machines. _nanomsg_ has dozens of state machines,
-many of which feed into others, such that tracking flow through the state
-machines is incredibly painful.
-
-Worse, these state machines are designed to be run from a single worker
-thread. This means that a given socket is entirely single threaded; you
-could in theory have dozens, hundreds, or even thousands of connections
-open, but they would be serviced only by a single thread. (Admittedly,
-non-blocking I/O is used to let the OS kernel calls run asynchronously,
-perhaps on multiple cores, but nanomsg itself runs all socket code on
-a single worker thread.)
-
-There is another problem too -- the `inproc` code that moves messages
-between one socket and another was incredibly racy. This is because the
-two sockets have different locks, and so dealing with the different
-contexts was tricky (and consequently buggy). (I've since, I think, fixed
-the worst of the bugs here, but only after many hours of pulling out hair.)
-
-The state machines also make fairly linear flow really difficult to follow.
-For example, there is a state machine to read the header information. This
-may come a byte at a time, and the state machine has to add the bytes, check
-for completion, and possibly change state, even if it is just reading a
-single 32-bit word. This is a lot more complex than what most programmers are
-used to, such as `read(fd, &val, 4)`.
-
-Now to be fair, Martin Sustrik had the best intentions when he created the
-state machine model around which _nanomsg_ is built. I do think that from
-experience this is one of the densest and most unapproachable parts of _nanomsg_,
-in spite of the fact that Martin's goal was precisely the opposite. I
-consider this a "failed experiment" -- but hey, failed experiments are the
-basis of all great science.
-
-=== Thread Challenges
-
-While _nanomsg_ is mostly internally single threaded, I decided to try to
-emulate the simple architecture of _mangos_ using system threads. (_mangos_
-benefits greatly from _Go_'s excellent coroutine facility.) Having been well
-and truly spoiled by _illumos_ threading (and especially _illumos_ kernel
-threads), I thought this would be a reasonable architecture.
-
-Sadly, this initial effort, while it worked, scaled incredibly poorly --
-even so-called "modern" operating systems like _macOS_ 10.12 and _Windows_ 8.1
-simply melted or failed entirely when creating any non-trivial number of
-threads. (To me, creating 100 threads should be a no-brainer, especially if
-one limits the stack size appropriately. I'm used to being able to create
-thousands of threads without concern. As I said, I've been spoiled.
-If your system falls over at a mere 200 threads, I consider it a toy
-implementation of threading. Unfortunately most of the mainstream operating
-systems are therefore toy implementations.)
-
-Chalk up another failed experiment.
-
-I did find another approach, which is discussed further below.
-
-=== File Descriptor Driven
-
-Most of the underlying I/O in _nanomsg_ is built around file descriptors,
-and its internal `usock` structure, which is also state machine driven.
-This means that implementing new transports which might need something
-other than a file descriptor is really non-trivial. This stymied my
-first attempt to add http://www.openssl.org[OpenSSL] support to get TLS
-added -- _OpenSSL_ has its own `struct BIO` for this stuff, and I could
-not see an easy way to convert _nanomsg_'s `usock` stuff to accommodate the
-`struct BIO`.
-
-In retrospect, _OpenSSL_ wasn't the ideal choice for an SSL/TLS library,
-and we have since chosen another (https://tls.mbed.org[mbed TLS]).
-Still, we needed an abstraction model that was better than just file
-descriptors for I/O.
-
-=== Poll
-
-In order to support use in event-driven programming, asynchronous
-situations, etc., _nanomsg_ offers non-blocking I/O. In order to make
-this work for end-users, a notification mechanism is required, and
-nanomsg, in the spirit of following POSIX, offers a notification method
-based on `poll(2)` or `select(2)`.
-
-In order for this to work, it offers up a selectable file descriptor
-for send and another one for receive. When events occur, these are
-written to, and the user application "clears" these by reading from
-them. (This is done on behalf of the application by _nanomsg_'s API calls.)
-
-This means that in addition to the context switch code, there are no
-fewer than 2 extra system calls executed per message sent or received, and
-on a mostly idle system as many as 3. This means that to send a message
-from one process to another you may have to execute up to 6 extra system
-calls, beyond the 2 required to actually send and receive the message.
-
-NOTE: It's even more hideous to support this on Windows, where there is no
-  `pipe(2)` system call, so we have to cobble up a loopback TCP connection
-  just for this event notification, in addition to the system call
-  explosion.
-
-There are cases where this file descriptor logic is easier for existing
-applications to integrate into event loops (e.g. they already have a thread
-blocked in `poll()`).
-
-But for many cases this is not necessary. A simple callback mechanism
-would be far better, with the FDs available only as an option for code
-that needs them. This is the approach that we have taken with _nng_.
-
-As another consequence of our approach, we do not require file descriptors
-for sockets at all, so it is possible to create applications containing
-_many_ thousands of `inproc` sockets with no files open at all. (Obviously
-if you're going to perform real I/O to other processes or other systems,
-you're going to need to have the underlying transport file descriptors
-open, but then the only real limit should be the number of files that you
-can open on your system. And the number of active connections you can maintain
-should ideally approach that system limit closely.)
-
-=== POSIX APIs
-
-Another of Martin's goals, which seemed worthwhile at first, was the
-attempt to provide a familiar POSIX API (based upon the BSD socket API).
-As a C programmer coming from UNIX systems, this really attracted me.
-
-The problem is that the POSIX APIs are actually really horrible. In
-particular the semantics around `cmsg` are about as arcane and painful as
-one can imagine. Largely, this has meant that extensions to the `cmsg`
-API simply have not occurred in _nanomsg_.
-
-The `cmsg` API specified by POSIX is as bad as it is because POSIX had
-requirements not to break APIs that already existed, and they needed to
-shim something that would work with existing implementations, including
-getting across a system call boundary. _nanomsg_ has never had such
-constraints.
-
-Oh, and there was that whole "design by committee" aspect.
-
-Attempting to retain low-numbered "socket descriptors" had its own
-problems -- a huge source of use-after-close bugs, which made the
-use of `nn_close()` incredibly dangerous for multithreaded sockets.
-(If one thread closes and opens a new socket, other threads still using
-the old socket might wind up accessing the "new" socket without realizing
-it.)
-
-The other thing is that BSD socket APIs are super familiar to UNIX C
-programmers -- but experience with _nanomsg_ has taught us already that such
-programmers are actually a minority of _nanomsg_'s users. Most of our users are
-coming to us from {cpp} (object-oriented), _Java_, and _Python_ backgrounds.
-For them the BSD sockets API is frankly somewhat bizarre and alien.
-
-With _nng_, we realized that constraining ourselves to the mistakes of the
-POSIX API was hurting rather than helping. So _nng_ provides a much friendlier
-interface for getting properties associated with messages.
-
-In _nng_ we also generally try hard to avoid reusing
-an identifier until no other option exists. This generally means most
-applications won't see a socket identifier reused until billions of other sockets
-have been opened. There is little chance for accidental reuse.
-
-
-== Compatibility
-
-Of course, there are a number of existing _nanomsg_ consumers "in the wild"
-already. It is important to continue to support them. So I decided from
-the get-go to implement a "compatibility" layer that provides the same
-API, and as much as possible the same ABI, as legacy _nanomsg_. However,
-new features and capabilities would not necessarily be exposed to the
-legacy API.
-
-Today _nng_ offers this. You can relink an existing _nanomsg_ binary against
-_libnng_ instead of _libnn_, and it usually Just Works(TM). Source
-compatibility is almost as easy, although the application code needs to be
-modified to use different header files.
-
-NOTE: I am considering changing the include file in the future so that
-it exactly matches the _nanomsg_ include path, so that only a compiler
-flag change would be needed.
-
-== Asynchronous IO
-
-As a consequence of our experience with threads being so unscalable,
-we decided to create a new underlying abstraction modeled largely on
-Windows IO completion ports. (As bad as so many of the Windows APIs
-are, the IO completion port stuff is actually pretty nice.) Under the
-hood in _nng_ all I/O is asynchronous, and we have `nni_aio` objects
-for each pending I/O. These have an associated completion routine.
-
-The completion routines are _usually_ run on a separate worker thread
-(we have many such workers; in theory the number should be tuned to the
-available number of CPU cores to ensure that we never wait while a CPU
-core is available for work), but they can be run "synchronously" if
-the I/O provider knows it is safe to do so (for example, when the completion
-is occurring in a context where no locks are held).
-
-The `nni_aio` structures are accessible to user applications as well, which can
-lead to much more efficient and easier-to-write asynchronous applications,
-and can aid integration into event-driven systems and runtimes, without
-the extra system calls required by the legacy _nanomsg_ approach.
-
-There is still performance tuning work to do, especially optimization for
-specific pollers like `epoll()` and `kqueue()` to address the C10K problem,
-but that work is already in progress.
-
-== Portability & Embeddability
-
-A significant goal of _nng_ is to be portable to many different
-kinds of systems, and to be embedded in systems that do not support POSIX or Win32
-APIs. To that end we have a clear platform portability layer. We do require
-that platforms supply entry points for certain networking, synchronization,
-threading, and timekeeping functions, but these are fairly straightforward
-to implement on any reasonable 32-bit or 64-bit system, including most
-embedded operating systems.
-
-Additionally, this portability layer may be used to build other kinds of
-experiments -- for example it should be relatively straightforward to provide
-a "platform" based on one of the various coroutine libraries such as Martin's
-http://libdill.org[libdill] or https://swtch.com/libtask/[libtask].
-
-TIP: If you want to write a coroutine-based platform, let me know!
-
-== New Transports
-
-The other, and most critical, motivation behind _nng_ was to make it easier
-to create new transports. In particular, one client
-(http://www.capitar.com[Capitar IT Group BV])
-contracted the creation of a http://www.zerotier.com[ZeroTier] transport for
-_nanomsg_.
-
-After beating my head against the state machines some more, I finally asked
-myself if it would not be easier just to rewrite _nanomsg_ using the model
-I had created for _mangos_.
-
-In retrospect, I'm not sure that the answer was a clear and definite yes
-in favor of _nng_, but for the other things I want to do, it has enabled a
-lot of new work. The ZeroTier transport was created with a relatively
-modest amount of effort, in spite of being based upon a connectionless
-transport. I do not believe I could have done this easily in the existing
-_nanomsg_.
-
-I've since added a rich TLS transport, and have implemented a WebSocket
-transport that is far more capable than that in _nanomsg_, as it can
-support TLS and sharing the TCP port across multiple _nng_ sockets (using
-the path to discriminate) or even other HTTP services.
-
-There are already plans afoot for other kinds of transports using QUIC,
-KCP, or SSH, as well as a pure UDP transport. The new _nng_ transport
-layer makes implementing all of these fairly straightforward.
-
-== HTTP and Other Services
-
-As part of implementing a real WebSocket transport, it was necessary to
-implement at least some HTTP capabilities. Rather than just settle for a toy
-implementation, _nng_ has a very capable HTTP server and client framework.
-The server can be used to build real web services, so it becomes possible
-for example to serve static content, REST APIs, and _nng_-based services
-all from the same TCP port using the same program.
-
-We've also made the WebSocket services fairly generic, which may support
-a plethora of other kinds of transports and services.
-
-There is also a portability layer -- so some common services (threading,
-timing, etc.) are provided in the _nng_ library to help make writing
-portable _nng_ applications easier.
-
-It will not surprise me if developers start finding uses for _nng_ that
-have nothing to do with Scalability Protocols.
-
-== Towards _nanomsg_ 2.0
-
-It is my intention that _nng_ ultimately replace _nanomsg_. I do think of it
-as "nanomsg 2.0". In fact "nng" stands for "nanomsg next generation" in
-my mind. Some day before too long I'm hoping that the various website
-references to nanomsg may simply be updated to point at _nng_. It is not
-clear to me whether at that time I will simply rename the existing
-code to _nanomsg_, nanomsg2, or leave it as _nng_.
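The "State Machine Madness" section of the deleted rationale contrasts a byte-at-a-time header state machine with a plain `read(fd, &val, 4)`. The C sketch below illustrates that contrast under stated assumptions. It is not code from _nanomsg_ or _nng_. The names `read_full`, `hdr_reader`, `hdr_feed`, and `hdr_value` are hypothetical, and decoding the 4-byte header in network (big-endian) byte order is an assumption made for the example. Because `read(2)` may return fewer bytes than requested, even the linear version needs a short loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Linear (blocking) style: keep calling read(2) until all len bytes arrive.
 * Returns 0 on success, -1 on error or if EOF arrives before len bytes. */
static int read_full(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: just retry */
            return -1;
        }
        if (n == 0)
            return -1; /* peer closed before the header was complete */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical state-machine style: an event loop hands over whatever bytes
 * happen to be available, so progress must be remembered between calls. */
enum hdr_state { HDR_READING, HDR_DONE };

struct hdr_reader {
    enum hdr_state state;
    uint8_t buf[4];
    size_t have; /* header bytes accumulated so far */
};

/* Consume up to n bytes from data; returns how many bytes were used. */
static size_t hdr_feed(struct hdr_reader *r, const uint8_t *data, size_t n)
{
    size_t want, take;

    if (r->state == HDR_DONE)
        return 0; /* header already complete; bytes belong to the body */
    assert(r->have < sizeof(r->buf));
    want = sizeof(r->buf) - r->have;
    take = n < want ? n : want;
    memcpy(r->buf + r->have, data, take);
    r->have += take;
    if (r->have == sizeof(r->buf))
        r->state = HDR_DONE; /* state transition on completion */
    return take;
}

/* Decode the 4 accumulated bytes as a big-endian 32-bit value (assumed). */
static uint32_t hdr_value(const struct hdr_reader *r)
{
    return ((uint32_t)r->buf[0] << 24) | ((uint32_t)r->buf[1] << 16) |
           ((uint32_t)r->buf[2] << 8) | (uint32_t)r->buf[3];
}
```

In the sketch, `read_full` keeps its progress in local variables of a single blocking call. `hdr_feed` has to store the same progress in `struct hdr_reader` between calls, because the event loop may deliver the header one byte at a time. Repeat that bookkeeping across dozens of interacting state machines and you get the complexity the rationale describes.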
