| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This closes a fundamental flaw in the way aio structures were
handled. In paticular, aio expiration could race ahead, and
fire before the aio was properly registered by the provider.
This ultimately led to the possibility of duplicate completions
on the same aio.
The solution involved breaking up nni_aio_start into two functions.
nni_aio_begin (which can be run outside of external locks) simply
validates that nni_aio_fini() has not been called, and clears certain
fields in the aio to make it ready for use by the provider.
nni_aio_schedule does the work to register the aio with the expiration
thread, and should only be called when the aio is actually scheduled
for asynchronous completion. nni_aio_schedule_verify does the same thing,
but returns NNG_ETIMEDOUT if the aio has a zero length timeout.
This change has a small negative performance impact. We have plans to
rectify that by converting nni_aio_begin to use a locklesss flag for
the aio->a_fini bit.
While we were here, we fixed some error paths in the POSIX subsystem,
which would have returned incorrect error codes, and we made some
optmizations in the message queues to reduce conditionals while holding
locks in the hot code path.
|
| |
|
|
|
|
|
| |
This technically doesn't need the lock, as we're only trying to catch
a possible problem during development rather than in the field (and
this should never occur), but the false positive in valgrind obscures
other possible errors, so leave it in the lock.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This provides context support for REQ and REP sockets.
More discussion around this is in the issue itself.
Optionally we would like to extend this to the surveyor pattern.
Note that we specifically do not support pollable descriptors
for non-default contexts, and the results of using file descriptors
for polling (NNG_OPT_SENDFD and NNG_OPT_RECVFD) is undefined.
In the future, it might be nice to figure out how to factor in
optional use of a message queue for users who want more buffering,
but we think there is little need for this with cooked mode.
|
| |
|
|
|
|
|
|
| |
fixes #325 synchronous aio completion crash
fixes #327 move nni_clock() operations to outside the nni_aio_lk.
This work was done for the context tree, and is necessary to properly
enable that branch.
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first problem was that using nng_sleep_aio
was found to reset the timeout, and this caused subsequent
operations to start failing with timeouts when reusing
the AIO for other operations.
The second thing is that we think it would be nicer if the
presence of real aio timeouts were still honored, so that
if the timeout is shorter than the sleep time, then we get
back an NNG_ETIMEDOUT like every other operation, and we
get back a 0 if the logical sleep operation completes
normally.
|
| | |
|
| |
|
|
|
|
|
| |
We enabled verbose compiler warnings, and found a lot of issues.
Some of these were even real bugs. As a bonus, we actually save
some initialization steps in the compat layer, and avoid passing
some variables we don't need.
|
| |
|
|
|
|
|
|
| |
This addresses the use of the pipe special field, and eliminates it.
The message APIs (recvmsg, sendmsg) need to be updated as well still,
but I want to handle that as part of a separate issue.
While here we fixed various compiler warnings, etc.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces enough of the HTTP API to support fully server
applications, including creation of websocket style protocols,
pluggable handlers, and so forth.
We have also introduced scatter/gather I/O (rudimentary) for
aios, and made other enhancements to the AIO framework. The
internals of the AIOs themselves are now fully private, and we
have eliminated the aio->a_addr member, with plans to remove the
pipe and possibly message members as well.
A few other minor issues were found and fixed as well.
The HTTP API includes request, response, and connection objects,
which can be used with both servers and clients. It also defines
the HTTP server and handler objects, which support server applications.
Support for client applications will require a client object to be
exposed, and that should be happening shortly.
None of this is "documented" yet, bug again, we will follow up shortly.
|
| | |
|
| |
|
|
|
| |
fixes #210 Want NNG_OPT_TLS_* options for TLS transport
fixes #212 Eliminate a_endpt member of aio
|
| |
|
|
|
|
|
| |
I'm pretty sure I need to go back and review the handling of
send messages for websocket too. We still have a receive leak
in websocket and leaks caused by the new URL parsing code which
needs to be refactored.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a rather large changeset -- it fundamentally adds websocket
transport, but as part of this changeset we added a generic framework
for both HTTP and websocket. We also made some supporting changes to
the core, such as changing the way timeouts work for AIOs and adding
additional state keeping for AIOs, and adding a common framework for
deferred finalization (to avoid certain kinds of circular deadlocks
during resource cleanup). We also invented a new initialization framework
so that we can avoid wiring in knowledge about them into the master
initialization framework.
The HTTP framework is not yet complete, but it is good enough for simple
static serving and building additional services on top of -- including
websocket. We expect both websocket and HTTP support to evolve
considerably, and so these are not part of the public API yet.
Property support for the websocket transport (in particular address
properties) is still missing, as is support for TLS.
The websocket transport here is a bit more robust than the original
nanomsg implementation, as it supports multiple sockets listening at
the same port sharing the same HTTP server instance, discriminating
between them based on URI (and possibly the virtual host).
Websocket is enabled by default at present, and work to conditionalize
HTTP and websocket further (to minimize bloat) is still pending.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We introduced richer, deeper tests for UDP functionality.
These tests uncovered a number of issues which this commit fixes.
The Windows IOCP code needs to support multiple aios on a single
nni_win_event. A redesign of the IOCP handling addresses that.
The POSIX UDP code also needed fixes; foremost among them is the
fact that the UDP file descriptor is not placed into non-blocking
mode, leading to potential hangs.
A number of race conditions and bugs along the implementation of
the above items were uncovered and fixed. To the best of our knowledge
the current code is bug-free.
|
| | |
|
| |
|
|
|
|
|
|
|
| |
We add a flag (auto-clearing) that can be set on an AIO to indicate
that the AIO should not processed asynchronously on a taskq. This
can be used to enhance performance in some cases, but it can also
be used to permit an AIO be destroyed from a completion callback.
(For the latter, the callback must execute the new nni_aio_fini_cb()
routine, which destroys the AIO without waiting for it to finish.)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
We allocate AIO structures dynamically, so that we can use them
abstractly in more places without inlining them. This will be used
for the ZeroTier transport to allow us to create operations consisting
of just the AIO. Furthermore, we provide accessors for some of the
aio members, in the hopes that we will be able to wrap these for
"safe" version of the AIO capability to export to applications, and
to protocol and transport implementors.
While here we cleaned up the protocol details to use consistently
shorter names (no nni_ prefix for static symbols needed), and we
also fixed a bug in the surveyor code.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This eliminates all the old #define's or enum values, making all
option IDs now totally dynamic, and providing well-known string
values for well-behaved applications.
We have added tests of some of these options, including lookups, and
so forth. We have also fixed a few problems; including at least
one crasher bug when the timeouts on reconnect were zero.
Protocol specific options are now handled in the protocol. We will
be moving the initialization for a few of those well known entities
to the protocol startup code, following the PAIRv1 pattern, later.
Applications must therefore not depend on the value of the integer IDs,
at least until the application has opened a socket of the appropriate
type.
|
| |
|
|
|
|
|
|
|
| |
This moves the DNS related functionality into common code, and also
removes all the URL parsing stuff out of the platform specific code
and into the transports. Now the transports just take sockaddr's on
initialization. (We may want to move this until later.)
We also add UDP resolution as another separate API.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the underlying platform fails (FreeBSD is the only one I'm aware
of that does this!), we use a global lock or condition variable instead.
This means that our lock initializers never ever fail.
Probably we could eliminate most of this for Linux and Darwin, since
on those platforms, mutex and condvar initialization reasonably never
fails. Initial benchmarks show little difference either way -- so we
can revisit (optimize) later.
This removes a lot of otherwise untested code in error cases and so forth,
improving coverage and resilience in the face of allocation failures.
Platforms other than POSIX should follow a similar pattern if they need
this. (VxWorks, I'm thinking of you.) Most sane platforms won't have
an issue here, since normally these initializations do not need to allocate
memory. (Reportedly, even FreeBSD has plans to "fix" this in libthr2.)
While here, some bugs were fixed in initialization & teardown.
The fallback code is properly tested with dedicated test cases.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
A little benchmarking showed that we were encountering far too many
wakeups, leading to severe performance degradation; we had a bunch
of threads all sleeping on the same condition variable (taskqs)
and this woke them all up, resulting in heavy mutex contention.
Since we only need one of the threads to wake, and we don't care which
one, let's just wake only one. This reduced RTT latency from about
240 us down to about 30 s. (1/8 of the former cost.)
There's still a bunch of tuning to do; performance remains worse than
we would like.
|
| |
|
|
|
|
| |
The finish routine can race against an asynchronous cancellation,
so we must not clear the data pointer, or we can wind up with a
NULL pointer dereference.
|
| | |
|
| |
|
|
|
|
|
|
|
| |
This passes valgrind 100% clean for both helgrind and deep leak
checks. This represents a complete rethink of how the AIOs work,
and much simpler synchronization; the provider API is a bit simpler
to boot, as a number of failure modes have been simply eliminated.
While here a few other minor bugs were squashed.
|
| |
|
|
| |
block for any AIO completion.
|
| |
|
|
|
|
|
| |
The queue is bound at initialization time of the task, and we call
entries just tasks, so we don't have to pass around a taskq pointer
across all the calls. Further, nni_task_dispatch is now guaranteed
to succeed.
|
| |
|
|
|
|
|
|
|
| |
We need to remember that protocol stops can run synchronously, and
therefore we need to wait for the aio to complete. Further, we need
to break apart shutting down aio activity from deallocation, as we need
to shut down *all* async activity before deallocating *anything*.
Noticed that we had a pipe race in the surveyor pattern too.
|
| |
|
|
|
|
|
|
| |
We have seen some yet another weird situation where we had an orphaned
pipe, which was caused by not completing the callback. If we are going
to run nni_aio_fini, we should still run the callback (albeit with a
return value of NNG_ECANCELED or somesuch) to be sure that we can't
orphan stuff.
|
| |
|
|
|
|
| |
Apparently there are circumstances when a pipedesc may get orphaned form the
pollq. This triggers an assertion failure when it occurs. I am still
trying to understand how this can occur. Stay tuned.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
| |
We have seen leaks of pipes causing test failures (e.g. the Windows
IPC test) due to EADDRINUSE. This was caused by a case where we
failed to pass the pipe up because the AIO had already been canceled,
and we didn't realize that we had oprhaned the pipe. The fix is to
add a return value to nni_aio_finish, and verify that we did finish
properly, or if we did not then we must free the pipe ourself. (The
zero return from nni_aio_finish indicates that it accepts ownership
of resources passed via the aio.)
|
| |
|
|
|
|
|
|
| |
We closed a few subtle races in the AIO subsystem as well, and now
we were able to eliminate the separate timer handling the MQ code.
There appear to be some opportunities to further enhance the code
for MQs as well -- eventually probably the only access to MQs will
be with AIOs.
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
|
|
We will still need some kind of specific handling of cancellation for
msg queues, but it will be simpler to just implement that for the queues,
and not worry about cancellation in the general case around poll etc.
(The low level poll and I/O routines will get notified by their underlying
transport pipes/descriptors closing.)
|