We can't use ecl_disable_interrupts, because often writes in the
thread local environment happen while we hold the locks (e.g.
env->packages_to_be_created is written in find_pending_package
while the lock is held in ecl_make_package). Therefore we use the
lisp interrupt blocking mechanism. For this, the order of
operations in cl_boot has to be modified a bit.
Checking process.phase without holding the start_stop_spinlock
looks dangerous: the thread may exit after the check but before we
interrupt it. Also, we can't call mp_process_kill while interrupts
are disabled, so we have to use the lower level ecl_interrupt_process.
Previously, the dummy tag was written behind the stack
boundary. Also added race condition protection to non-inlined
ecl_bds_bind/push. The memory barriers have been reworked,
too. AO_store_full has been replaced by AO_nop_full. This is
sufficient to insert the required memory barrier instructions and
is implemented in a simpler way by libatomic_ops in some cases.
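
The role of a standalone full barrier can be sketched with plain C11
atomics. This is only an illustration: atomic_thread_fence stands in
for the libatomic_ops barrier call, and the slot layout and the names
publish, slots and top are hypothetical, not taken from ECL.

```c
#include <assert.h>
#include <stdatomic.h>

#define NSLOTS 8

/* Hypothetical two-field record standing in for a stack slot. */
struct slot { int symbol; int value; };

static struct slot slots[NSLOTS];
static _Atomic(struct slot *) top = slots;   /* last published slot */

/* Fill the next slot completely, issue a full fence, then publish it.
   The fence followed by the store to `top` orders both slot writes
   before the slot becomes reachable through `top`. */
static void publish(int symbol, int value)
{
    struct slot *next = atomic_load_explicit(&top, memory_order_relaxed) + 1;
    assert(next < slots + NSLOTS);
    next->symbol = symbol;
    next->value = value;
    atomic_thread_fence(memory_order_seq_cst);   /* standalone full barrier */
    atomic_store_explicit(&top, next, memory_order_relaxed);
}
```

A barrier that is separate from the store keeps the store itself an
ordinary write, which is the property the message above relies on.
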
Due to the use of mprotect() for fast interrupt dispatch it is
not possible to write in the thread local environment when
interrupts are disabled. We need to use sigprocmask to block
interrupts in this case.
If ecl_unwind is interrupted with another call to ecl_unwind
before it has decremented env->frs_top, the second call of
ecl_unwind may stop too early with its unwinding, leading to
potential segfaults.
We don't need to save/restore outside of signal handlers. Also,
bignum_registers were not saved. Allocation of the values array
has been changed to heap allocation, since this array is quite
large and we may overflow the C stack if we allocate it there.
If ecl_bds_push or ecl_bds_bind were interrupted by a call to
ecl_bds_unwind, segmentation faults could occur, because
env->bds_top->symbol may not have pointed to a valid symbol.
Also, memory corruption was possible if the functions were
interrupted after setting slot->symbol but before setting
slot->value.
Interrupting a thread during setjmp with a call to ecl_unwind
leads to segmentation faults, since we try to call longjmp
before the corresponding setjmp has finished. Thus, we also need
to wait until setjmp has finished before we can set frs_val of
the frame.
If by chance env->frs_top->frs_val has the value ECL_PROTECT_TAG,
ecl_unwind will stop and call longjmp. However, at this point
setjmp has not yet been called, leading to a segmentation fault.
We have to make sure that the stack pointers always point to a
valid object. This means that we have to increase env->stack_top
before we change things in the stack.
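
The ordering can be sketched on a miniature stack. The type
mini_stack and its functions are hypothetical, and the sketch assumes
that the top pointer points one past the last live element.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define STACK_SIZE 16

/* Hypothetical miniature value stack; `top` is one past the last
   live element (an assumption of this sketch). */
struct mini_stack {
    long data[STACK_SIZE];
    long *top;
};

static void stack_init(struct mini_stack *s)
{
    memset(s->data, 0, sizeof s->data);
    s->top = s->data;
}

/* Reserve the slot first, then fill it. A handler that pushes and
   pops between the two steps works above the reserved slot and
   cannot overwrite the value being stored. */
static void stack_push(struct mini_stack *s, long v)
{
    long *slot = s->top;
    assert(slot < s->data + STACK_SIZE);
    s->top = slot + 1;                           /* reserve */
    atomic_signal_fence(memory_order_seq_cst);   /* keep compiler order */
    *slot = v;                                   /* fill */
}

static long stack_pop(struct mini_stack *s)
{
    assert(s->top > s->data);
    s->top -= 1;
    return *s->top;
}
```
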
If a thread is interrupted directly after a call to
ecl_function_dispatch, env->function may be overwritten before
it is used. Thus we need to save and restore env->function when
we execute queued signals.
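
The save and restore can be sketched as follows. The names mini_env,
run_queued and clobbering_handler are hypothetical; only the pattern
of preserving a scratch field around a handler is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature environment with a single scratch field. */
struct mini_env { const void *function; };

typedef void (*queued_handler)(struct mini_env *);

/* Run a queued handler and preserve the scratch field around it, so
   the interrupted code still finds the value it stored earlier. */
static void run_queued(struct mini_env *env, queued_handler h)
{
    const void *saved;
    assert(env != NULL && h != NULL);
    saved = env->function;
    h(env);
    env->function = saved;
}

/* Example handler that does its own dispatch and clobbers the field. */
static void clobbering_handler(struct mini_env *env)
{
    env->function = "handler";
}
```
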
The logic in mp_barrier_wait is wrong. decrement_counter returns
the value of the counter __before__ it is decremented. Before
the fix, the counter decremented until it reached 0 and then the
next arriving thread would get stuck in decrement_counter. Also,
interrupts were not reenabled in all cases.
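
The off-by-one can be illustrated with C11 atomics, whose
atomic_fetch_sub likewise returns the value held before the
subtraction. The mini_barrier type and barrier_arrive are hypothetical
and do not reproduce the real mp_barrier_wait.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical barrier state: number of threads still expected. */
struct mini_barrier { atomic_int remaining; };

/* Returns 1 for the thread that completes the barrier, 0 otherwise.
   atomic_fetch_sub yields the value held before the subtraction, so
   the last arriving thread observes 1, not 0. */
static int barrier_arrive(struct mini_barrier *b)
{
    int before = atomic_fetch_sub(&b->remaining, 1);
    assert(before > 0);
    return before == 1;
}
```

Comparing the returned value against 0 instead of 1 would let the
counter reach 0 without any thread noticing that the barrier is full.
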
If mp_process_enable is interrupted after pthread_create, but
before its exit code is examined, the cleanup code may be run
even when pthread_create did not fail, so we need to disable
interrupts in this region.
If a thread is killed while it holds a spinlock, the lock will
never be released, leading to deadlocks. Hence we have to clean
up spinlocks in ECL_WITH_SPINLOCK_END. In mp_process_enable,
other cleanup (deallocating the environment, unlisting the
process) has to be performed too.
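
A rough sketch of releasing a spinlock on a non-local exit, using
setjmp/longjmp and a C11 atomic_flag. All names are hypothetical, and
unlike a real unwind-protect the sketch stops the exit at the lock
frame instead of continuing to unwind.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static jmp_buf *unwind_target;   /* innermost active protect frame */

/* Hypothetical stand-in for a non-local exit out of the body. */
static void unwind(void)
{
    assert(unwind_target != NULL);
    longjmp(*unwind_target, 1);
}

/* Run `body` with the lock held. Returns 0 if the body returned
   normally and 1 if it left through unwind(); the lock is released
   on both paths. */
static int with_spinlock(void (*body)(void))
{
    jmp_buf here;
    jmp_buf *outer = unwind_target;
    int unwound;

    while (atomic_flag_test_and_set(&lock))
        ;                         /* spin until the lock is free */
    unwind_target = &here;
    if (setjmp(here) == 0) {
        body();
        unwound = 0;
    } else {
        unwound = 1;
    }
    unwind_target = outer;        /* cleanup runs on both paths */
    atomic_flag_clear(&lock);
    return unwound;
}

static void exits_early(void) { unwind(); }
static void returns_normally(void) { }
```
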
This is important to prevent race conditions. If interrupts are
left disabled, the environment may be wrongly write protected by
an interrupting thread and completely harmless writes in the
environment can lead to segmentation faults.
If a process that has already unwound its whole frame stack
(after ECL_CATCH_ALL_END in thread_entry_point) is interrupted by
a call to mp_exit_process, ECL will crash with a segmentation
fault. We thus need to acquire the start_stop_spinlock before we
unwind the frame stack.
If a thread is interrupted while interrupts are disabled by C,
then the signal is queued and the environment is write protected
by mprotect. If another thread then calls queue_signal, it will
try to write in the protected environment, leading to a
segmentation fault. Since mprotect can only protect whole memory
pages, we need to allocate the pending interrupts and the signal
queue in a separate struct.
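
The layout can be sketched with mmap and mprotect on a POSIX system.
The structs mini_env and mini_queue and the three functions are
hypothetical; the sketch only shows that a write through a pointer
into separately allocated memory still works while the environment
page is read-only.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical layout: the queue lives outside the protected page. */
struct mini_queue { int pending; };
struct mini_env { int scratch; struct mini_queue *queue; };

/* Allocate the environment in its own page and the queue on the heap. */
static struct mini_env *env_create(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct mini_env *env = mmap(NULL, page, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (env == MAP_FAILED)
        return NULL;
    env->queue = calloc(1, sizeof *env->queue);
    if (env->queue == NULL) {
        munmap(env, page);
        return NULL;
    }
    return env;
}

/* Switch write protection of the environment page on or off. */
static int env_protect(struct mini_env *env, int on)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(env != NULL);
    return mprotect(env, page, on ? PROT_READ : PROT_READ | PROT_WRITE);
}

/* Only reads the environment page and writes to the separate queue,
   so it does not fault while the environment is protected. */
static void queue_pending(struct mini_env *env, int signo)
{
    env->queue->pending = signo;
}
```
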
It didn't wake up all processes to check the condition, which caused an n+1 lag
in the condition check for signal-process (when called with n>1). Fixes #421. No
regression test, because this is already tested in sem-signal-* tests (they were
failing).