nng - A mirror of https://github.com/nanomsg/nng

	Commit message	Author	Age
*	fixes #352 aio lock is burning hot	Garrett D'Amore	2018-05-14
	fixes #326 consider nni_taskq_exec_synch()
	fixes #410 kqueue implementation could be smarter
	fixes #411 epoll_implementation could be smarter
	fixes #426 synchronous completion can lead to panic
	fixes #421 pipe close race condition/duplicate destroy

	This is a major refactoring of two significant parts of the code base, which are closely interrelated.

	First, the aio and taskq framework have undergone a number of simplifications and improvements. We have ditched a few parts of the internal API (for example, tasks no longer support cancellation) that weren't terribly useful but added a lot of complexity, and we've made aio_schedule something that now checks for cancellation or other "premature" completions. The aio framework now uses the tasks more tightly, so that aio wait can devolve into just nni_task_wait(). We did have to add a "task_prep()" step to prevent race conditions.

	Second, the entire POSIX poller framework has been simplified, and made more robust and more scalable. There were some fairly inherent race conditions around the shutdown/close code, where we thought we were synchronizing against the other thread but weren't doing so adequately. With a cleaner design, we've been able to tighten up the implementation to remove these race conditions, while substantially reducing the chance for lock contention, thereby improving scalability. The illumos poller also got a performance boost by polling for multiple events.

	In highly "busy" systems, we expect to see vast reductions in lock contention, and therefore greater scalability, in addition to overall improved reliability.

	One area where we can still do better is that only a single poller thread runs. Scaling this out has to be done differently for each poller, and carefully, to ensure that close conditions are safe on all pollers and that there is no chance of deadlock or livelock while waiting for pfd finalizers.
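The claim that "aio wait can devolve into just nni_task_wait()" once a prep step exists can be illustrated with a small sketch. Everything below (`task`, `task_prep`, `task_exec`, `task_wait`, `aio`) is a hypothetical reconstruction, not nng's actual code: the task is marked busy *before* it is handed to a worker, so a waiter that arrives before the worker has started cannot mistake the task for idle, and an aio that embeds its completion task can wait simply by waiting on that task.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical task object; the field layout is illustrative, not nng's. */
typedef struct task {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int             busy; /* prepared or running, not yet finished */
    void          (*cb)(void *);
    void           *arg;
} task;

static void task_init(task *t, void (*cb)(void *), void *arg)
{
    pthread_mutex_init(&t->mtx, NULL);
    pthread_cond_init(&t->cv, NULL);
    t->busy = 0;
    t->cb   = cb;
    t->arg  = arg;
}

/* Mark the task busy before it is handed to another thread, so a waiter
 * arriving between scheduling and execution cannot see it as idle. */
static void task_prep(task *t)
{
    pthread_mutex_lock(&t->mtx);
    t->busy++;
    pthread_mutex_unlock(&t->mtx);
}

/* Worker-thread entry point: run the callback, then mark it finished. */
static void *task_exec(void *arg)
{
    task *t = arg;
    if (t->cb != NULL) {
        t->cb(t->arg);
    }
    pthread_mutex_lock(&t->mtx);
    assert(t->busy > 0); /* task_prep() must have been called first */
    t->busy--;
    pthread_cond_broadcast(&t->cv);
    pthread_mutex_unlock(&t->mtx);
    return NULL;
}

static void task_wait(task *t)
{
    pthread_mutex_lock(&t->mtx);
    while (t->busy > 0) {
        pthread_cond_wait(&t->cv, &t->mtx);
    }
    pthread_mutex_unlock(&t->mtx);
}

/* An aio that embeds its completion task: waiting on the aio is
 * nothing more than waiting on that task. */
typedef struct aio {
    task t;
    int  result; /* completion status */
} aio;

static void aio_wait(aio *a) { task_wait(&a->t); }

/* Demo callback: count completions. */
static void count_cb(void *arg) { (*(int *) arg)++; }
```

Without the `task_prep()` call, `busy` would only be raised inside the worker, and `aio_wait()` could return before the worker had even started.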
*	fix a number of cppcheck complaints (not all)	Garrett D'Amore	2018-04-24
*	fixes #45 expose aio to applications	Garrett D'Amore	2017-10-25
	While here we added a test for the aio stuff, and cleaned up some dead code for the old fd notifications. There were a few improvements to shorten & clean code elsewhere, such as short-circuiting task wait when the task has no callback. The legacy sendmsg() and recvmsg() APIs are still in the socket core until we convert the device code to use the aios.
*	Provide versions of mutex, condvar, and aio init that never fail.	Garrett D'Amore	2017-08-16
	If the underlying platform fails (FreeBSD is the only one I'm aware of that does this!), we use a global lock or condition variable instead. This means that our lock initializers never ever fail. Probably we could eliminate most of this for Linux and Darwin, since on those platforms, mutex and condvar initialization reasonably never fails. Initial benchmarks show little difference either way, so we can revisit (optimize) later.

	This removes a lot of otherwise untested code in error cases and so forth, improving coverage and resilience in the face of allocation failures. Platforms other than POSIX should follow a similar pattern if they need this. (VxWorks, I'm thinking of you.) Most sane platforms won't have an issue here, since normally these initializations do not need to allocate memory. (Reportedly, even FreeBSD has plans to "fix" this in libthr2.)

	While here, some bugs were fixed in initialization & teardown. The fallback code is properly tested with dedicated test cases.
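The never-fail mutex initializer described above can be sketched as follows, assuming POSIX threads. The `mtx` type, `fallback_mtx`, and the `mtx_force_fallback` test hook are hypothetical names, not nng's; only the mutex half of the pattern is shown (the commit also mentions a condition-variable fallback). If `pthread_mutex_init()` fails, the wrapper silently points at a statically initialized global lock, so callers never see an error.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Shared fallback lock; static initialization cannot fail. */
static pthread_mutex_t fallback_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical test hook: pretend pthread_mutex_init() failed. */
static bool mtx_force_fallback = false;

typedef struct mtx {
    pthread_mutex_t  own;
    pthread_mutex_t *m; /* &own on success, &fallback_mtx otherwise */
} mtx;

/* Returns void: from the caller's point of view, init cannot fail. */
static void mtx_init(mtx *x)
{
    if (!mtx_force_fallback && pthread_mutex_init(&x->own, NULL) == 0) {
        x->m = &x->own;
    } else {
        x->m = &fallback_mtx;
    }
}

static void mtx_lock(mtx *x)
{
    int rv = pthread_mutex_lock(x->m);
    assert(rv == 0);
    (void) rv;
}

static void mtx_unlock(mtx *x) { pthread_mutex_unlock(x->m); }

static void mtx_fini(mtx *x)
{
    if (x->m == &x->own) {
        pthread_mutex_destroy(&x->own); /* never destroy the shared lock */
    }
    x->m = NULL;
}
```

One trade-off of this sketch: every fallback mutex shares one non-recursive lock, so code that nests two fallback mutexes would deadlock against itself.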
*	Idempotent taskq finalizers.	Garrett D'Amore	2017-08-14
*	Thundering herd kills performance.	Garrett D'Amore	2017-08-10
	A little benchmarking showed that we were encountering far too many wakeups, leading to severe performance degradation; we had a bunch of threads all sleeping on the same condition variable (taskqs) and this woke them all up, resulting in heavy mutex contention. Since we only need one of the threads to wake, and we don't care which one, let's just wake only one. This reduced RTT latency from about 240 us down to about 30 us (1/8 of the former cost). There's still a bunch of tuning to do; performance remains worse than we would like.
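The fix described above amounts to signaling one sleeper instead of broadcasting to all of them when a single job is queued. The minimal task queue below is a hypothetical sketch (`taskq`, `tq_task`, `tq_dispatch`, and friends are illustrative names, not nng's API). Dispatch uses `pthread_cond_signal()`, which is safe because each worker re-checks the queue before sleeping. Only shutdown uses `pthread_cond_broadcast()`, since every worker must observe it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TQ_MAX_THREADS 8

/* Hypothetical task queue; names are illustrative, not nng's API. */
typedef struct tq_task {
    struct tq_task *next;
    void          (*cb)(void *);
    void           *arg;
} tq_task;

typedef struct taskq {
    pthread_mutex_t mtx;
    pthread_cond_t  cv; /* idle workers sleep here */
    tq_task        *head;
    tq_task        *tail;
    bool            closing;
    int             nthr;
    pthread_t       thr[TQ_MAX_THREADS];
} taskq;

static void *tq_worker(void *arg)
{
    taskq *tq = arg;
    pthread_mutex_lock(&tq->mtx);
    for (;;) {
        tq_task *t = tq->head;
        if (t != NULL) {
            tq->head = t->next;
            if (tq->head == NULL) {
                tq->tail = NULL;
            }
            pthread_mutex_unlock(&tq->mtx);
            t->cb(t->arg); /* run without holding the queue lock */
            pthread_mutex_lock(&tq->mtx);
            continue;
        }
        if (tq->closing) {
            break; /* queue drained and shutting down */
        }
        pthread_cond_wait(&tq->cv, &tq->mtx);
    }
    pthread_mutex_unlock(&tq->mtx);
    return NULL;
}

static void tq_init(taskq *tq, int nthr)
{
    assert(nthr > 0 && nthr <= TQ_MAX_THREADS);
    pthread_mutex_init(&tq->mtx, NULL);
    pthread_cond_init(&tq->cv, NULL);
    tq->head = tq->tail = NULL;
    tq->closing = false;
    tq->nthr    = nthr;
    for (int i = 0; i < nthr; i++) {
        pthread_create(&tq->thr[i], NULL, tq_worker, tq);
    }
}

static void tq_dispatch(taskq *tq, tq_task *t)
{
    pthread_mutex_lock(&tq->mtx);
    t->next = NULL;
    if (tq->tail != NULL) {
        tq->tail->next = t;
    } else {
        tq->head = t;
    }
    tq->tail = t;
    /* One new job needs only one worker, and any worker will do:
     * signal rather than broadcast to avoid a thundering herd. */
    pthread_cond_signal(&tq->cv);
    pthread_mutex_unlock(&tq->mtx);
}

static void tq_fini(taskq *tq)
{
    pthread_mutex_lock(&tq->mtx);
    tq->closing = true;
    pthread_cond_broadcast(&tq->cv); /* shutdown must reach every worker */
    pthread_mutex_unlock(&tq->mtx);
    for (int i = 0; i < tq->nthr; i++) {
        pthread_join(tq->thr[i], NULL);
    }
    pthread_cond_destroy(&tq->cv);
    pthread_mutex_destroy(&tq->mtx);
}

/* Demo callback: count completed jobs across all workers. */
static atomic_int tq_done = 0;
static void tq_count(void *arg) { (void) arg; atomic_fetch_add(&tq_done, 1); }
```

Workers drain the queue before honoring `closing`, so every job dispatched before `tq_fini()` still runs.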
*	Subsystem initialize is idempotent; simplify cleanup.	Garrett D'Amore	2017-08-07
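An idempotent initializer makes cleanup simpler because teardown and error paths can call the finalizer unconditionally. The following is a hypothetical, single-threaded sketch (`sub_init`, `sub_fini`, and the `sub_init_calls` counter are illustrative names, not nng's). Concurrent callers would additionally need a lock or `pthread_once()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subsystem state; not thread-safe as written. */
static bool sub_inited     = false;
static int  sub_init_calls = 0; /* counts real initializations, for the demo */

static int sub_init(void)
{
    if (sub_inited) {
        return 0; /* already initialized: repeated calls are harmless */
    }
    sub_init_calls++;
    /* ... allocate subsystem resources here ... */
    sub_inited = true;
    return 0;
}

static void sub_fini(void)
{
    if (!sub_inited) {
        return; /* safe to call unconditionally during cleanup */
    }
    /* ... release subsystem resources here ... */
    sub_inited = false;
}
```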
*	Refactor AIO logic to close numerous races and reduce complexity.	Garrett D'Amore	2017-08-04
	This passes valgrind 100% clean for both helgrind and deep leak checks. This represents a complete rethink of how the AIOs work, and much simpler synchronization; the provider API is a bit simpler to boot, as a number of failure modes have been simply eliminated. While here a few other minor bugs were squashed.
*	More reliable taskq fini; avoids deadlock during shutdown.	Garrett D'Amore	2017-08-02
*	Eliminate the separate AIO wake callback, making nni_aio_wait block for any AIO completion.	Garrett D'Amore	2017-07-21
*	Simpler taskq API.	Garrett D'Amore	2017-07-21
	The queue is bound at initialization time of the task, and we call entries just tasks, so we don't have to pass around a taskq pointer across all the calls. Further, nni_task_dispatch is now guaranteed to succeed.
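Binding the queue at task initialization, together with an intrusive link, is one way to make dispatch infallible: no taskq pointer is passed and nothing is allocated. The sketch below is hypothetical (`task_init`, `task_dispatch`, and `taskq_run_pending` are illustrative names, not nng's functions). Locking and worker threads are omitted so the API shape stays visible, and `taskq_run_pending()` merely stands in for a worker. Ignoring a dispatch of an already-queued task is an assumption of this sketch, made because an intrusive node can only sit on one list at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded sketch; a real taskq needs locking. */
typedef struct taskq taskq;

typedef struct task {
    struct task *next;   /* intrusive link: no allocation at dispatch */
    taskq       *tq;     /* bound once, at init */
    bool         queued;
    void       (*cb)(void *);
    void        *arg;
} task;

struct taskq {
    task *head;
    task *tail;
};

static void task_init(task *t, taskq *tq, void (*cb)(void *), void *arg)
{
    t->next   = NULL;
    t->tq     = tq;
    t->queued = false;
    t->cb     = cb;
    t->arg    = arg;
}

/* No taskq argument and no return value: nothing here can fail. */
static void task_dispatch(task *t)
{
    taskq *tq = t->tq;
    if (t->queued) {
        return; /* already pending; the node cannot be linked twice */
    }
    t->queued = true;
    t->next   = NULL;
    if (tq->tail != NULL) {
        tq->tail->next = t;
    } else {
        tq->head = t;
    }
    tq->tail = t;
}

/* Stand-in for a worker thread: run everything currently queued. */
static int taskq_run_pending(taskq *tq)
{
    int   n = 0;
    task *t;
    while ((t = tq->head) != NULL) {
        tq->head = t->next;
        if (tq->head == NULL) {
            tq->tail = NULL;
        }
        t->queued = false;
        t->cb(t->arg);
        n++;
    }
    return n;
}

/* Demo callback. */
static void bump(void *arg) { (*(int *) arg)++; }
```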
*	Yet more race condition fixes.	Garrett D'Amore	2017-07-20
	We need to remember that protocol stops can run synchronously, and therefore we need to wait for the aio to complete. Further, we need to break apart shutting down aio activity from deallocation, as we need to shut down all async activity before deallocating anything. Noticed that we had a pipe race in the surveyor pattern too.
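Separating "stop" from "deallocate" matters whenever one object's completion callback can touch another object. In the hypothetical sketch below (`obj`, `obj_stop`, and `teardown` are illustrative, not nng's pipe or socket code), stopping an object completes its outstanding operation, and that completion writes into the peer. If each object were stopped and freed in turn, stopping the second one would write into the already-freed first one. Stopping everything first and freeing afterwards avoids that use-after-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical objects; not nng's pipe or socket structures. */
typedef struct obj {
    struct obj *peer;    /* the completion callback touches the peer */
    bool        pending; /* an async operation is outstanding */
    int         events;  /* completions delivered by the peer */
} obj;

static obj *obj_alloc(void)
{
    obj *o = calloc(1, sizeof(*o));
    assert(o != NULL);
    return o;
}

/* Phase 1: stop async activity. Outstanding work completes (here,
 * synchronously, as if canceled), and its callback still touches the
 * peer, so the peer must not have been freed yet. */
static bool obj_stop(obj *o)
{
    if (!o->pending) {
        return false;
    }
    o->pending = false;
    o->peer->events++;
    return true;
}

/* Stop every object first; only then deallocate any of them.
 * Returns the number of completions delivered during phase 1. */
static size_t teardown(obj **objs, size_t n)
{
    size_t completions = 0;
    for (size_t i = 0; i < n; i++) {
        if (obj_stop(objs[i])) {
            completions++;
        }
    }
    for (size_t i = 0; i < n; i++) {
        free(objs[i]); /* phase 2: no callbacks can run anymore */
    }
    return completions;
}
```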
*	Always run the AIO completion logic.	Garrett D'Amore	2017-07-19
	We have seen yet another weird situation where we had an orphaned pipe, which was caused by not completing the callback. If we are going to run nni_aio_fini, we should still run the callback (albeit with a return value of NNG_ECANCELED or somesuch) to be sure that we can't orphan stuff.
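The rule "always run the completion callback, even from fini" can be sketched in a few lines. This is a hypothetical, single-threaded illustration (`aio_start`, `aio_finish`, `aio_fini`, and `SKETCH_ECANCELED` are made-up names; `SKETCH_ECANCELED` merely stands in for `NNG_ECANCELED`, and its value is arbitrary). Finalizing a pending aio completes it with a canceled status, so a callback that releases a reference on a pipe still runs and the pipe is not orphaned. Completion happens at most once per start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NNG_ECANCELED; the value is arbitrary. */
#define SKETCH_ECANCELED 1

/* Single-threaded sketch; a real aio would need locking. */
typedef struct aio {
    bool   pending;
    int    result;
    void (*cb)(void *);
    void  *arg;
} aio;

static void aio_init(aio *a, void (*cb)(void *), void *arg)
{
    a->pending = false;
    a->result  = 0;
    a->cb      = cb;
    a->arg     = arg;
}

static void aio_start(aio *a) { a->pending = true; }

/* Completes at most once per start, and always runs the callback. */
static void aio_finish(aio *a, int rv)
{
    if (!a->pending) {
        return;
    }
    a->pending = false;
    a->result  = rv;
    if (a->cb != NULL) {
        a->cb(a->arg);
    }
}

/* Tearing down a pending aio still runs the callback (with a canceled
 * status), so whatever the callback owns is released, not orphaned. */
static void aio_fini(aio *a) { aio_finish(a, SKETCH_ECANCELED); }

/* Demo callback: drop the reference held on a "pipe". */
static void release_ref(void *arg) { (*(int *) arg)--; }
```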
*	Give up on uncrustify; switch to clang-format.	Garrett D'Amore	2017-07-10
*	Refactor stop again, closing numerous races (thanks valgrind!)	Garrett D'Amore	2017-06-28
*	Fix taskq_cancel race.	Garrett D'Amore	2017-06-08
*	Fix leaking taskq data.	Garrett D'Amore	2017-03-12
*	Pipeline protocol now entirely callback driven.	Garrett D'Amore	2017-03-04
*	Taskq implementation.	Garrett D'Amore	2017-02-18