* Fixes compiling on Windows with Mingw
Fixes the build error "InterlockedDecrementAcquire64 not defined" on Mingw.
* Fixes the semantics of InterlockedDecrementRelease64 on Mingw
Per the Microsoft docs, InterlockedDecrementRelease64 returns the resulting decremented value.
The equivalent builtin on Mingw is `__atomic_sub_fetch`, not `__atomic_fetch_sub` (which returns the previous value).
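A minimal sketch of the mapping this implies (illustrative only, not the exact patch):

```c
/* Illustrative MinGW shim: InterlockedDecrementRelease64 must return the
 * decremented value, so __atomic_sub_fetch is the matching builtin;
 * __atomic_fetch_sub would return the value *before* the subtraction. */
#if defined(__MINGW32__) || defined(__MINGW64__)
#ifndef InterlockedDecrementRelease64
#define InterlockedDecrementRelease64(p) \
	__atomic_sub_fetch((p), 1, __ATOMIC_RELEASE)
#endif
#endif
```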
(#1591)
Note that one of these warnings reflects a real bug that would prevent
TLS from functioning properly on Windows.
* use the correct LONG type for nni_atomic_flag on Win32
* use InterlockedExchangeAdd for nni_atomic_get_bool
- this is equivalent to InterlockedAdd for the purposes of this call (since it is adding 0)
- this allows the code to compile on 32-bit Windows
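A hedged sketch of the pattern described (the wrapper name here is illustrative, not nng's exact code):

```c
#include <windows.h>
#include <stdbool.h>

/* Reading an atomic value by adding zero: InterlockedExchangeAdd(p, 0)
 * leaves *p unchanged and returns its current value with a full barrier,
 * and unlike InterlockedAdd it is available on 32-bit Windows. */
static bool
example_atomic_get_bool(volatile LONG *p)
{
	return (InterlockedExchangeAdd(p, 0) != 0);
}
```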
This eliminates some run-time initialization, moving it to compile time.
Follow-up work will expand on this to simplify initialization further
and reduce the need for certain locks.
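As a rough illustration of the idea (a generic POSIX example, not nng's actual internals), a statically initialized lock needs no run-time setup:

```c
#include <pthread.h>

/* Compile-time initialization: the mutex is usable immediately, with no
 * run-time init call and therefore no failure path to handle. */
static pthread_mutex_t example_lock = PTHREAD_MUTEX_INITIALIZER;
```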
This is initially used for TLS to make loading the engine pointer
faster, eliminating a much more expensive lock operation.
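A minimal sketch of the pattern, using C11 atomics for illustration (the variable and function names here are assumed, not nng's):

```c
#include <stdatomic.h>

/* Lock-free load of a shared "engine" pointer.  The real code uses nng's
 * internal atomic pointer type; this is only the shape of the idea. */
static _Atomic(void *) tls_engine_ptr;

static void *
load_tls_engine(void)
{
	/* acquire ordering so the engine's fields are visible after the load */
	return atomic_load_explicit(&tls_engine_ptr, memory_order_acquire);
}
```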
This provides the initial implementation, and converts the
transport lookup routines to use it. This is probably of limited
performance benefit, but rwlocks may be useful in future work.
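A sketch of the usage pattern (POSIX rwlock shown for illustration; the lookup function and lock names are invented):

```c
#include <pthread.h>

static pthread_rwlock_t tran_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Transport lookups are read-mostly, so many readers may hold the lock
 * concurrently; registration (rare) would take the write lock instead. */
void *
example_tran_find(const char *scheme)
{
	void *t = NULL;
	pthread_rwlock_rdlock(&tran_lock);
	/* ... search the registered transport list for `scheme` ... */
	pthread_rwlock_unlock(&tran_lock);
	(void) scheme;
	return (t);
}
```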
This also exposes an nng_thread_set_name() function for
applications to use. All NNG thread names start with "nng:".
Note that support is highly dependent on the operating system.
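Example usage of the new function (a sketch; the worker function is assumed, and the header path may vary by version):

```c
#include <nng/nng.h>
#include <nng/supplemental/util/platform.h> /* nng_thread_* declarations */

static void
worker(void *arg)
{
	(void) arg;
	/* ... do work ... */
}

void
example(void)
{
	nng_thread *thr;
	if (nng_thread_create(&thr, worker, NULL) == 0) {
		/* The name shows up in debuggers/top where the OS supports it. */
		nng_thread_set_name(thr, "my-worker");
	}
}
```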
The TTL in these cases should have been atomic, so we introduce an
atomic int type for this purpose. We also add convenience functions
nni_msg_must_append_u32() and nni_msg_header_must_append_u32(), so
that we can eliminate some failure tests for conditions that can
never happen. Combined with a new test for xreq, we have 100%
coverage for xreq and improved coverage for the other REQ/REP
protocols.
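A rough sketch of the TTL idea using C11 atomics (nng uses its own atomic int wrapper; the struct and field names here are invented):

```c
#include <stdatomic.h>

/* The TTL option can be read on the hot receive path while another thread
 * updates it via a setopt call, so it must be atomic rather than a plain int. */
struct example_sock {
	atomic_int ttl;
};

static void
example_set_ttl(struct example_sock *s, int ttl)
{
	atomic_store(&s->ttl, ttl);
}

static int
example_get_ttl(struct example_sock *s)
{
	return (atomic_load(&s->ttl));
}
```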
This also introduces an nni_atomic_cas64 to help with lock-free designs.
Some mechanical renaming was also done in some of the protocols to fix spelling.
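An illustrative compare-and-swap loop of the sort a 64-bit CAS enables (shown with C11 atomics; the counter and limit here are made up):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Lock-free "increment but do not exceed limit": retry the CAS until we
 * either install old+1 or observe that the limit has been reached. */
static bool
example_bounded_incr(_Atomic uint64_t *v, uint64_t limit)
{
	uint64_t old = atomic_load(v);
	do {
		if (old >= limit) {
			return (false);
		}
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));
	return (true);
}
```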
When using the 32-bit Windows compiler, the functions
InterlockedIncrementAcquire64() and InterlockedDecrementRelease64() are not
defined, so we fall back to the more generic InterlockedIncrement64() and
InterlockedDecrement64() on 32-bit Windows.
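A minimal sketch of the fallback (illustrative; the guard macro choice is an assumption):

```c
/* On 32-bit Windows the Acquire/Release variants are not declared, so map
 * them to the full-barrier forms, which are always available and strictly
 * stronger (correct, if slightly more expensive). */
#if defined(_WIN32) && !defined(_WIN64)
#define InterlockedIncrementAcquire64 InterlockedIncrement64
#define InterlockedDecrementRelease64 InterlockedDecrement64
#endif
```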
This also introduces a new atomic boolean type, which we use to
track whether the HTTP handler has been added or not.
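A sketch of how an atomic boolean gates the one-time registration (C11 atomics for illustration; names assumed):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool http_handler_added;

/* atomic_exchange returns the previous value, so exactly one caller sees
 * `false` and performs the registration; every other caller skips it. */
static void
example_maybe_add_handler(void)
{
	if (!atomic_exchange(&http_handler_added, true)) {
		/* ... register the HTTP handler exactly once ... */
	}
}
```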
This also introduces a more efficient reference counting scheme based
on atomics rather than locks.
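A minimal sketch of atomic reference counting (C11 atomics for illustration; the object type is invented):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
	atomic_int refcnt;
	/* ... payload ... */
} example_obj;

static void
example_hold(example_obj *o)
{
	atomic_fetch_add(&o->refcnt, 1); /* no lock needed to take a reference */
}

static void
example_rele(example_obj *o)
{
	/* fetch_sub returns the old value; the caller dropping it to zero frees */
	if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
		free(o);
	}
}
```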
Define an InterlockedAddNoFence64() function using gcc's atomic builtins on
mingw(32|64)
(https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).
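A sketch of the shim described above (illustrative; the real definition may differ):

```c
/* InterlockedAddNoFence64 returns the resulting sum and imposes no ordering,
 * which __atomic_add_fetch with relaxed ordering provides. */
#if defined(__MINGW32__) || defined(__MINGW64__)
#ifndef InterlockedAddNoFence64
#define InterlockedAddNoFence64(p, v) \
	__atomic_add_fetch((p), (v), __ATOMIC_RELAXED)
#endif
#endif
```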
fixes #622 incorrect assumptions about malloc(0)
Windows actually allocates a zero-size object when malloc() is called
with a size of zero. This is unusual behavior, so we add logic to behave
more like malloc on POSIX systems. Other systems may return non-NULL
pointers (e.g. to fixed pages) here. We think the best option is to
uniformly return NULL from our APIs in these circumstances, and to
include testing to validate that.
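A minimal sketch of the policy described (the wrapper name is hypothetical):

```c
#include <stdlib.h>

/* Uniform behavior across platforms: a zero-size request yields NULL rather
 * than a platform-dependent zero-size object. */
void *
example_alloc(size_t sz)
{
	return (sz == 0 ? NULL : malloc(sz));
}
```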
fixes #179 DNS resolution should be done at connect time
fixes #586 Windows IO completion port work could be better
fixes #339 Windows iocp could use synchronous completions
fixes #280 TCP abstraction improvements
This is a rather monstrous set of changes, which refactors TCP and
the underlying Windows I/O completion path logic in order to obtain
a cleaner, simpler API, with support for asynchronous DNS lookups performed
at connect time rather than at initialization time, the ability to have
multiple connects or accepts pending, and fewer extraneous function calls.
The Windows code also benefits from greatly reduced context switching,
fewer lock operations, and a reduced number of system calls on the hot
code path. (We use automatic event resetting instead of manual resetting.)
Some dead code was removed as well, and a few potential edge-case leaks
on failure paths (in the websocket code) were plugged.
Note that all TCP-based transports benefit from this work. The IPC code
on Windows still uses the legacy IOCP path for now, as does the UDP code
(used for ZeroTier). We will be converting those soon too.
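For reference, the event-reset change mentioned above comes down to the second argument of CreateEvent (a sketch, not the actual code):

```c
#include <windows.h>

/* bManualReset = FALSE creates an auto-reset event: it clears itself when a
 * single waiter is released, removing the explicit ResetEvent() call (and an
 * extra kernel transition) from the hot path. */
HANDLE ev = CreateEventW(NULL, FALSE, FALSE, NULL);
```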
fixes #573 atomic flags could help
This introduces a new atomic flag type, and reduces some of the global
locking. The lock refactoring work is not yet complete, but this is
a positive step forward and should help reduce contention.
While here we also fixed a compile warning caused by incorrect types.
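A sketch of the atomic-flag idea (C11's atomic_flag shown for illustration; nng's own flag type has its own API):

```c
#include <stdatomic.h>

static atomic_flag busy = ATOMIC_FLAG_INIT;

/* test_and_set returns the previous state: false means we claimed the flag
 * without touching any global lock; true means someone else already has it. */
static int
example_try_claim(void)
{
	return (!atomic_flag_test_and_set(&busy));
}
```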
This should work on both Windows and the most common POSIX
variants. We will create at least two threads for running
completions, but there are numerous other threads in the code.
fixes #326 consider nni_taskq_exec_synch()
fixes #410 kqueue implementation could be smarter
fixes #411 epoll implementation could be smarter
fixes #426 synchronous completion can lead to panic
fixes #421 pipe close race condition/duplicate destroy
This is a major refactoring of two significant, closely interrelated
parts of the code base.
First, the aio and taskq framework has undergone a number of
simplifications and improvements. We have ditched a few parts of the
internal API (for example, tasks no longer support cancellation) that
weren't terribly useful but added a lot of complexity, and we've made
aio_schedule something that now checks for cancellation or other
"premature" completions. The aio framework now uses tasks more tightly,
so that aio wait can devolve into just nni_task_wait(). We did have to
add a "task_prep()" step to prevent race conditions.
Second, the entire POSIX poller framework has been simplified and made
more robust and more scalable. There were some fairly inherent race
conditions around the shutdown/close code, where we *thought* we were
synchronizing against the other thread, but weren't doing so adequately.
With a cleaner design, we've been able to tighten up the implementation
to remove these race conditions, while substantially reducing the chance
for lock contention, thereby improving scalability. The illumos poller
also got a performance boost by polling for multiple events at once.
In highly "busy" systems, we expect to see vast reductions in lock
contention, and therefore greater scalability, in addition to overall
improved reliability.
One area where we can still do better is that there is only a single
poller thread running. Scaling this out has to be done differently for
each poller, and carefully, to ensure that close conditions are safe on
all pollers and that there is no chance of deadlock/livelock while
waiting for pfd finalizers.
We enabled verbose compiler warnings, and found a lot of issues.
Some of these were even real bugs. As a bonus, we actually save
some initialization steps in the compat layer, and avoid passing
some variables we don't need.
This addresses a number of problems that were found on Windows,
including one bug that actually turned up in testing on POSIX.
There is now a public nng_duration type. We have also updated the
ZeroTier work to use the signed int64_t values that the latest ZeroTier
dev branch is using.
This implements the basic UDP functionality for Windows (required
for ZeroTier for example). We have also introduced a UDP test suite
to validate that this actually works. While here a few Windows
compilation warnings / nits were fixed.
We only compile files that are appropriate for the platform. (We
still have guards in place, to allow for a future single .c file
to be built from all the sources.) We also remove the subsystem defines;
if a new platform needs to deviate from POSIX in ways beyond what we
intended here, then that platform should just copy those parts into
a new platform directory, rather than cross-including portions from
POSIX.
If the underlying platform's lock or condition variable initialization
fails (FreeBSD is the only one I'm aware of that does this!), we use a
global lock or condition variable instead. This means that our lock
initializers never, ever fail.
We could probably eliminate most of this for Linux and Darwin, since
on those platforms mutex and condvar initialization essentially never
fails. Initial benchmarks show little difference either way -- so we
can revisit (optimize) later.
This removes a lot of otherwise untested code in error cases and so forth,
improving coverage and resilience in the face of allocation failures.
Platforms other than POSIX should follow a similar pattern if they need
this. (VxWorks, I'm thinking of you.) Most sane platforms won't have
an issue here, since normally these initializations do not need to
allocate memory. (Reportedly, even FreeBSD has plans to "fix" this in
libthr2.)
While here, some bugs were fixed in initialization & teardown.
The fallback code is properly tested with dedicated test cases.
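A hypothetical sketch of the fallback idea, with assumed names (not the actual nng internals):

```c
#include <pthread.h>

/* If per-object mutex initialization fails (e.g. FreeBSD returning ENOMEM),
 * fall back to a single statically initialized global mutex, so that "init"
 * itself can never fail from the caller's perspective. */
static pthread_mutex_t fallback_mtx = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
	pthread_mutex_t mtx;
	int             fallback; /* nonzero if we are sharing fallback_mtx */
} example_mtx;

void
example_mtx_init(example_mtx *m)
{
	m->fallback = (pthread_mutex_init(&m->mtx, NULL) != 0);
}

void
example_mtx_lock(example_mtx *m)
{
	pthread_mutex_lock(m->fallback ? &fallback_mtx : &m->mtx);
}
```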
This fixes one major problem, which was that if nni_fini() was called
once on Windows, it was no longer possible to call nni_init() again.
While here, we fixed a few compilation issues.
A little benchmarking showed that we were encountering far too many
wakeups, leading to severe performance degradation; we had a bunch
of threads all sleeping on the same condition variable (taskqs)
and this woke them all up, resulting in heavy mutex contention.
Since we only need one of the threads to wake, and we don't care which
one, let's just wake only one. This reduced RTT latency from about
240 us down to about 30 us (1/8 of the former cost).
There's still a bunch of tuning to do; performance remains worse than
we would like.
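The change amounts to the difference between two calls (POSIX shown for illustration; the condition variable name is assumed):

```c
#include <pthread.h>

extern pthread_cond_t taskq_cv; /* assumed name for the task queue's condvar */

void
example_wake_worker(void)
{
	/* Before: pthread_cond_broadcast(&taskq_cv) woke every sleeping worker,
	 * and all but one immediately contended on the mutex and went back to
	 * sleep.  Waking a single waiter avoids that thundering herd. */
	pthread_cond_signal(&taskq_cv);
}
```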
This is only lightly tested, and I expect that there remain
some race conditions. Endpoint logic in particular needs
work.
Modern Windows (Vista and later) has lightweight Slim Reader/Writer (SRW)
locks, which occupy only a single pointer-sized word and don't require any
memory allocation to create.
While here, we also cleaned up a few more unreferenced variables found with
the Microsoft compilers.
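For reference, a minimal usage sketch of the primitive in question:

```c
#include <windows.h>

/* An SRWLOCK is a single pointer-sized word, statically initializable, and
 * needs no corresponding destroy call. */
static SRWLOCK example_lock = SRWLOCK_INIT;

void
example_critical_section(void)
{
	AcquireSRWLockExclusive(&example_lock);
	/* ... exclusive access; shared (read) access would use
	 * AcquireSRWLockShared/ReleaseSRWLockShared instead ... */
	ReleaseSRWLockExclusive(&example_lock);
}
```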
This compiles correctly, but doesn't actually deliver events yet.
As part of this, I've made most of the initializable objects in nng
safe to tear down when uninitialized (or zeroed, e.g. via calloc).
This makes it much easier to write error-path teardown code, since
I can deinitialize everything without worrying about which things
have been initialized and which have not.
Test code needs to use the static libraries so that it can access
the entire set of symbols, including private ones that are not exported.
Since we use the tick counter to sleep, we should use the same clock
for validation. The problem is that the high-performance tick counter
on the CPU may disagree slightly with the Windows clock.
Furthermore, the tick counter is probably much faster to retrieve, since
it is already updated and needn't be recalculated each time.
(We should consider just switching to millisecond clock resolution
internally as well; timers shorter than 1 ms don't seem very useful.)
There are lots of changes here, mostly work done in support of
Windows TCP. However, some bugs were fixed, some new error codes were
added, and the handling of certain failures during accept was
generalized. Windows IPC (named pipes) is still missing.
Windows is getting there. It needs a couple more hours to enable
everything, especially IPC, and most of the work at this point is probably
some combination of debugging and tweaking things like error handling.