This further limits some of the thread counts, but principally it
offers a new runtime facility, nng_init_set_parameter(), which can
be used to set certain runtime parameters governing the number of
threads, provided it is called before the rest of the application
starts up.
This facility is quite intentionally "undocumented", at least for now,
as we want to limit our commitment to it. Still, it should be helpful
for applications that need to reduce the number of threads that are
created.
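As an illustration only (the facility is undocumented, and the
parameter names below, such as NNG_INIT_NUM_TASK_THREADS and
NNG_INIT_MAX_EXPIRE_THREADS, are assumptions that may change), an
application might do something like this before touching any other
NNG API:

    #include <nng/nng.h>
    #include <nng/protocol/reqrep0/rep.h>

    int
    main(void)
    {
        // These calls must happen before anything else initializes
        // the NNG runtime.  Parameter names are assumptions.
        nng_init_set_parameter(NNG_INIT_NUM_TASK_THREADS, 2);
        nng_init_set_parameter(NNG_INIT_MAX_EXPIRE_THREADS, 1);

        nng_socket s;
        nng_rep0_open(&s); // runtime starts here, limits applied
        nng_close(s);
        return (0);
    }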
|
Credit goes to Wu Xuan (@willwu1217) for diagnosing the problem and
proposing a fix as part of #1695. This commit takes a revised approach
that avoids adding extra memory; by reusing the reap node it is also
slightly faster, since we do not need to update both pointers in the
linked list.
As part of this, a new internal API, nni_aio_completions, is
introduced. In all likelihood we will be able to use it to solve
similar crashes in other areas of the code.
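The intended usage pattern is sketched below (the provider structure
and list names are hypothetical, and the init/add/run entry points
are assumptions): completions are accumulated while holding the
provider lock, and the callbacks fire only after the lock is dropped.

    static void
    my_pipe_close(my_pipe *p) // hypothetical provider type
    {
        nni_aio_completions list;
        nni_aio            *aio;

        nni_aio_completions_init(&list);

        nni_mtx_lock(&p->mtx);
        while ((aio = nni_list_first(&p->recv_aios)) != NULL) {
            nni_aio_list_remove(aio);
            nni_aio_completions_add(&list, aio, NNG_ECLOSED, 0);
        }
        nni_mtx_unlock(&p->mtx);

        // Callbacks run here, with no locks held.
        nni_aio_completions_run(&list);
    }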
|
This change makes the number of expire threads tunable, following the
same strategy as the taskq thread tunables.
Add NNG_NUM_EXPIRE_THREADS to override the default behavior (`n_cpu`
expire threads).
The NNG_MAX_EXPIRE_THREADS limit is always applied if > 0, even if you
specify the desired number of threads using NNG_NUM_EXPIRE_THREADS.
NNG_EXPIRE_THREADS is not used anymore; it was only referenced in the
code but never defined in CMake.
The logic to cap expire threads between 1 and 256 was removed. If
users set no limits, whatever value they choose will be used rather
than being silently overridden by us.
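For example, a build wanting exactly four expire threads, with the
upper cap disabled, might configure like this (hypothetical
invocation):

    cmake -DNNG_NUM_EXPIRE_THREADS=4 -DNNG_MAX_EXPIRE_THREADS=0 ..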
|
This reverts commit 8461c7207b440f5ba8c51b2236fcfa178f415a6f.
|
fixes #1543 by aborting tasks that may have been prepped, but not yet started.
|
This introduces a new API, nng_aio_busy(), that can be used
to query the status of the aio without blocking.
Some minor documentation fixes are included.
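A minimal sketch of the intended use, assuming sock is an open
nng_socket and do_other_work() is a stand-in for application code:

    nng_aio *aio;

    nng_aio_alloc(&aio, NULL, NULL); // no callback; we poll instead
    nng_recv_aio(sock, aio);

    while (nng_aio_busy(aio)) {
        do_other_work(); // anything except blocking in nng_aio_wait()
    }
    if (nng_aio_result(aio) == 0) {
        nng_msg_free(nng_aio_get_msg(aio));
    }
    nng_aio_free(aio);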
|
This takes one fewer parameter, and is simpler. It will also let us
reclaim the aio_prov_extra data space, so that we can use it for
other purposes.
|
There were several problems with the array implementation, in both
performance and correctness. This corrects those errors (hopefully)
and restores the expiration lists as linked lists.
|
fixes #1048 nng_aio reuse error messages are unhelpful
|
fixes #1317 IPv6 listener get port is incorrect
fixes #1319 Want symbolic service names
This is phase 1 of reducing the memory footprint of aios, and also
of pipes. It removes the largest consumer, the socket address
information, from the aio; that information was only used by a few
consumers.
|
This also exposes an nng_thread_set_name() function for
applications to use. All NNG thread names start with "nng:".
Note that support is highly dependent on the operating system.
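A minimal sketch of application usage (the thread body and the name
"myapp:worker" are arbitrary):

    #include <nng/nng.h>
    #include <nng/supplemental/util/platform.h>

    static void
    worker(void *arg)
    {
        (void) arg;
        // ... application work ...
    }

    int
    main(void)
    {
        nng_thread *thr;

        nng_thread_create(&thr, worker, NULL);
        nng_thread_set_name(thr, "myapp:worker"); // best effort
        nng_thread_destroy(thr); // waits for worker to return
        return (0);
    }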
|
fixes #1219 nng_close occasionally hang on Windows
|
This only does it for rep, but it also includes changes that should
increase the overall test coverage for the REP protocol.
|
fixes #1097 aio prov_data not used at all
|
This is a major change, and includes changes to use a polymorphic
stream API for all transports. Related bugs have been fixed along
the way. Additionally, the man pages have changed.
The old non-polymorphic APIs are removed now. This is a breaking
change, but the old APIs were never part of any released public API.
|
This changes the signature of the aio cancellation routines to take
the argument for cancellation directly, so we do not need to look it
up using nni_aio_get_prov_data.
We should probably consider eliminating nni_aio_get_prov_data and
company, and changing prov_extra to reflect prov_data. Later.
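A sketch of the revised shape (the provider type and locking are
hypothetical):

    // Before: the routine had to recover its argument itself, via
    // nni_aio_get_prov_data(aio).  After: the argument registered
    // at schedule time is passed in directly.
    static void
    my_cancel(nni_aio *aio, void *arg, int rv)
    {
        my_provider *p = arg; // hypothetical provider type

        nni_mtx_lock(&p->mtx);
        if (nni_aio_list_active(aio)) {
            nni_aio_list_remove(aio);
            nni_aio_finish_error(aio, rv);
        }
        nni_mtx_unlock(&p->mtx);
    }

    // Registered when the operation is scheduled:
    //     nni_aio_schedule(aio, my_cancel, p);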
|
This changeset needs work. We are seeing errors described by
This reverts commit d7f7c896c0ede24249ef63b1e45b1878bf4bd473.
|
fixes #208 pipe start should occur before connect / accept
fixes #616 Race condition closing between header & body
This refactors the transports to handle their own connection
handshaking before passing the pipe to the socket. This
changes and simplifies the setup. This also fixes a rather
challenging race condition described by #616.
|
fixes #179 DNS resolution should be done at connect time
fixes #586 Windows IO completion port work could be better
fixes #339 Windows iocp could use synchronous completions
fixes #280 TCP abstraction improvements
This is a rather monstrous set of changes, which refactors TCP, and
the underlying Windows I/O completion path logic, in order to obtain
a cleaner, simpler API, with support for asynchronous DNS lookups
performed at connect rather than at initialization time, the ability
to have multiple connects or accepts pending, and fewer extraneous
function calls.
The Windows code also benefits from greatly reduced context switching,
fewer lock operations, and a reduced number of system calls on the
hot code path. (We use automatic event resetting instead of manual.)
Some dead code was removed as well, and a few potential edge case
leaks on failure paths (in the websocket code) were plugged.
Note that all TCP based transports benefit from this work. The IPC
code on Windows still uses the legacy IOCP for now, as does the UDP
code (used for ZeroTier). We will be converting those soon too.
|
This changes nni_aio_begin so that it terminates immediately when it
encounters aio->a_closed, much as it does for aio->a_stop.
The semantics of nni_aio_close() are supposed to be like those of
nni_aio_stop(), but without blocking.
I suspect that this might be responsible for the use-after-free bugs
that seem to have been rearing their heads lately.
|
Essentially, if we're destroying an aio, and we are doing so from the
thread that is running its callback, then we should defer the
destruction of the task until the callback returns.
Note that calling nni_aio_wait(), or anything else that calls it,
from the callback is still verboten and will result in a single-party
deadlock.
|
This changes the array of flags, which was confusing, brittle, and
racy, into a much simpler reference (busy) count on the task structures.
This allows us to support certain kinds of "reentrant" dispatching,
where either a synchronous or asynchronous task can reschedule / dispatch
itself. The new code also helps reduce certain lock pressure, as a bonus.
|
fixes #429 async websocket reap leads to crash
This tightens up the code for shutdown, ensuring that transport
callbacks are completely stopped before advancing to the next step
of tearing down transport pipes or endpoints.
It also fixes a problem where task_wait would sometimes get "stuck"
as tasks transitioned between asynchronous and synchronous
completions.
Finally, it saves a few cycles by only calling a cancellation
callback once during cancellation of an aio.
|
* fixes #419 want to nni_aio_stop without blocking
This actually introduces an nni_aio_close() API that causes
nni_aio_begin to return NNG_ECLOSED, while scheduling a callback
on the AIO to complete with NNG_ECLOSED as well. This should be
called in non-blocking close() contexts instead of nni_aio_stop(),
and the cases where we call nni_aio_fini() multiple times are
updated to add nni_aio_stop() calls on all "interlinked" aios before
finalizing them.
Furthermore, we call nni_aio_close() as soon as practical in the
close path. This closes an annoying race condition where the
callback from a lower subsystem could wind up rescheduling an
operation that we wanted to abort.
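The resulting teardown pattern looks roughly like this (the endpoint
and its aios are hypothetical):

    // Non-blocking close path: abort whatever is in flight.  After
    // this, nni_aio_begin on these aios fails with NNG_ECLOSED.
    nni_aio_close(ep->acc_aio);
    nni_aio_close(ep->tmo_aio);

    // Later, in the finalizer (allowed to block): stop all of the
    // interlinked aios before finalizing any of them.
    nni_aio_stop(ep->acc_aio);
    nni_aio_stop(ep->tmo_aio);
    nni_aio_fini(ep->acc_aio);
    nni_aio_fini(ep->tmo_aio);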
|
fixes #326 consider nni_taskq_exec_synch()
fixes #410 kqueue implementation could be smarter
fixes #411 epoll implementation could be smarter
fixes #426 synchronous completion can lead to panic
fixes #421 pipe close race condition/duplicate destroy
This is a major refactoring of two significant parts of the code base,
which are closely interrelated.
First, the aio and taskq framework have undergone a number of
simplifications and improvements. We have ditched a few parts of the
internal API (for example, tasks no longer support cancellation) that
weren't terribly useful but added a lot of complexity, and we've made
aio_schedule something that now checks for cancellation or other
"premature" completions. The aio framework now uses the tasks more
tightly, so that aio wait can devolve into just nni_task_wait(). We
did have to add a "task_prep()" step to prevent race conditions.
Second, the entire POSIX poller framework has been simplified, and
made more robust and more scalable. There were some fairly inherent
race conditions around the shutdown/close code, where we *thought* we
were synchronizing against the other thread, but weren't doing so
adequately. With a cleaner design, we've been able to tighten up the
implementation to remove these race conditions, while substantially
reducing the chance for lock contention, thereby improving
scalability. The illumos poller also got a performance boost by
polling for multiple events.
In highly "busy" systems, we expect to see vast reductions in lock
contention, and therefore greater scalability, in addition to overall
improved reliability.
One area where we can still do better is that only a single poller
thread is run. Scaling this out is a task that has to be done
differently for each poller, and carefully, to ensure that close
conditions are safe on all pollers, and that there is no chance of
deadlock/livelock waiting for pfd finalizers.
|
This closes a fundamental flaw in the way aio structures were
handled. In particular, aio expiration could race ahead, and
fire before the aio was properly registered by the provider.
This ultimately led to the possibility of duplicate completions
on the same aio.
The solution involved breaking up nni_aio_start into two functions.
nni_aio_begin (which can be run outside of external locks) simply
validates that nni_aio_fini() has not been called, and clears certain
fields in the aio to make it ready for use by the provider.
nni_aio_schedule does the work to register the aio with the expiration
thread, and should only be called when the aio is actually scheduled
for asynchronous completion. nni_aio_schedule_verify does the same
thing, but returns NNG_ETIMEDOUT if the aio has a zero length timeout.
This change has a small negative performance impact. We have plans to
rectify that by converting nni_aio_begin to use a lockless flag for
the aio->a_fini bit.
While we were here, we fixed some error paths in the POSIX subsystem,
which would have returned incorrect error codes, and we made some
optimizations in the message queues to reduce conditionals while
holding locks in the hot code path.
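On the provider side the pattern looks roughly like this (the
provider type, queue, and cancel routine are hypothetical, and we
assume nni_aio_begin reports failure through its return value):

    static void
    my_recv(my_provider *p, nni_aio *aio)
    {
        int rv;

        // May be called without the provider lock held; bails out
        // if nni_aio_fini() has already been called on this aio.
        if (nni_aio_begin(aio) != 0) {
            return;
        }
        nni_mtx_lock(&p->mtx);
        // Register with the expiration thread only once we know the
        // operation will complete asynchronously; the _verify form
        // also fails with NNG_ETIMEDOUT on a zero-length timeout.
        if ((rv = nni_aio_schedule_verify(aio, my_cancel, p)) != 0) {
            nni_mtx_unlock(&p->mtx);
            nni_aio_finish_error(aio, rv);
            return;
        }
        nni_list_append(&p->recv_aios, aio);
        nni_mtx_unlock(&p->mtx);
    }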
|
This technically doesn't need the lock, as we're only trying to catch
a possible problem during development rather than in the field (and
this should never occur), but the false positive in valgrind obscures
other possible errors, so we leave it under the lock.