Signals and interrupts

Problems associated with signals

POSIX defines the notion of "signals", which are events that cause a process or a thread to be interrupted. Windows uses the term "exception", which also covers a more general class of errors. In both cases the consequence is that a thread or process may be interrupted at any time, either by causes intrinsic to it (synchronous signals), such as floating point exceptions, or by extrinsic causes (asynchronous signals), such as the process being aborted by the user.

Of course, those interruptions are not always welcome. When the interrupt is delivered and a handler is invoked, the thread or even the whole program may be in an inconsistent state. For instance, the thread may have acquired a lock, or it may be in the middle of filling the fields of a structure. Furthermore, sometimes the signal that a thread receives is not even related to it, as when a user presses Ctrl-C and a SIGINT signal is delivered to an arbitrary thread, or when the process receives the Windows exception CTRL_CLOSE_EVENT, denoting that the terminal window is being closed.

For this reason, POSIX severely restricts which functions can be called from a signal handler, thereby limiting its usefulness. However, Common Lisp users expect to be able to handle floating point exceptions and to gracefully manage user interrupts, program exits, etc. To solve this seemingly impossible problem, &ECL; takes a pragmatic approach that works and is reasonably safe, but that demands some work from the &ECL; maintainers and also from users who want to embed &ECL; as a library.
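To make the restriction on handlers concrete, the following is a minimal POSIX sketch, not code taken from &ECL;, of a handler that does nothing except record the signal number, leaving the real work to ordinary code that runs when the program state is consistent. The names pending_signal, record_signal(), install_recording_handler() and take_pending_signal() are hypothetical and chosen only for this illustration.

```c
/* Request POSIX declarations even when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag for this sketch. An object of type
   volatile sig_atomic_t can be written safely from a signal handler. */
static volatile sig_atomic_t pending_signal = 0;

/* The handler only records which signal arrived; it calls no functions. */
static void record_signal(int signo)
{
    pending_signal = signo;
}

/* Install the recording handler for SIGINT.
   Returns 0 on success and -1 on failure, like sigaction(). */
int install_recording_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Called from ordinary code at a point where the state is consistent.
   Returns the recorded signal number (0 if none) and clears the flag.
   A signal arriving between the read and the clear would be lost;
   this sketch accepts that limitation for brevity. */
int take_pending_signal(void)
{
    int signo = pending_signal;
    pending_signal = 0;
    return signo;
}
```

A program would call install_recording_handler() once at startup and then poll take_pending_signal() at points where it is safe to react, for example between iterations of its main loop.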

Kinds of signals

Synchronous signals

The name derives from POSIX and denotes interrupts that occur as a result of the code that a particular thread executes. They are largely equivalent to C++ and Java exceptions, and in Windows they are called "unchecked exceptions."

Common Lisp programs generate mainly three kinds of synchronous signals: floating point exceptions, which result from overflows in computations, division by zero, and so on; access violations, such as dereferencing NULL pointers or writing into protected regions of memory; and process interrupts.

The first family of signals is generated by the floating point hardware of the computer. These signals typically happen when code is compiled with low safety settings and performs mathematical operations without checks.

The second family may seem rare, but unfortunately these signals still happen quite often. One scenario is incorrect code that handles memory directly via the FFI. Another is undetected stack overflows, which typically result in accesses to protected memory regions. Finally, a very common cause of this kind of exception is invoking a function that has been compiled with very low safety settings on arguments that are not of the expected type, for instance passing a float when a structure is expected.

The third family is related to the multiprocessing capabilities of Common Lisp systems, and more precisely to the function which is used to kill, interrupt and inspect arbitrary threads. In POSIX systems, &ECL; informs a given thread about the need to interrupt its execution by sending it a particular signal from the set which is available to the user.

Note that in none of these cases should we let the signal pass unnoticed. Access violations and floating point exceptions may propagate through the program, causing more harm than expected, and without process interrupts we would not be able to stop and cancel threads. The only question that remains is whether such signals can be handled by the thread in which they were generated, and how.

Asynchronous signals

In addition to the set of synchronous signals or "exceptions", there is a set of signals that denote "events", things that happen while the program is being executed, and "requests". Some typical examples are requests for program termination (SIGKILL, SIGTERM), the indication that a child process has finished, and requests for program interruption (SIGINT), typically as a consequence of pressing a key combination such as Ctrl-C.

The important difference from synchronous signals is that no thread causes the interrupt, and thus there is no preferred way of handling it. Moreover, the operating system will typically dispatch these signals to an arbitrary thread, unless we set up mechanisms to prevent it. This can have nasty consequences if the incoming signal interrupts a system call or leaves the interrupted thread in an inconsistent state.

Signals and interrupts in &ECL;

The signal handling facilities in &ECL; are constrained by three needs. First of all, we cannot ignore the synchronous signals described in the section on synchronous signals. Second, all other signals should cause the least possible harm to the running threads. Third, when a signal is handled synchronously using a signal handler, the handler should do almost nothing unless we are completely sure that we are in an interruptible region, that is, outside system calls and in code that &ECL; knows and controls.

The solution relies on the existence of both synchronous and asynchronous signal handling code, as explained in the following two sections.

Handling of asynchronous signals

In systems in which this is possible, &ECL; creates a signal handling thread to detect and process asynchronous signals (see the section on asynchronous signals). This thread is a trivial one and does not process the signals itself: it communicates with, or launches, other signal handling threads that act according to the denoted events.

The use of a separate thread has some nice consequences. The first is that those signals will not interrupt any sensitive code. The second is that the signal handling thread is able to execute arbitrary Lisp or C code, since it does not run in a sensitive context. Most importantly, this is the style of signal handling that the POSIX standards recommend, and it is the one that Windows uses.

The installation of the signal handling thread is dictated by a boot time option, ECL_OPT_SIGNAL_HANDLING_THREAD, and it is only possible in systems that support either POSIX or Windows threads.

Systems which embed &ECL; as an extension language may wish to deactivate the signal handling thread using the previously mentioned option. If this is the case, they should take appropriate measures to avoid interrupting the code in &ECL; when such signals are delivered.

Systems which embed &ECL; and do not mind having a separate signal handling thread can control the set of asynchronous signals that this thread handles. This is done, again, using the appropriate boot options, such as ECL_OPT_TRAP_SIGINT, ECL_OPT_TRAP_SIGTERM, etc.

Note that in order to detect and handle those signals, &ECL; must block them from delivery to any other thread. This means changing the signal mask with sigprocmask() in POSIX systems, or setting up a custom handler with SetConsoleCtrlHandler() in Windows.
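The general POSIX pattern behind a dedicated signal handling thread can be sketched as follows. This is an illustration under stated assumptions, not the code that &ECL; uses: the names handled_set, last_signal, signal_thread_main(), start_signal_thread() and run_signal_thread_demo() are hypothetical, and the thread waits for a single signal only so that the demonstration terminates.

```c
/* Request POSIX declarations even when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static sigset_t handled_set;   /* signals reserved for the handling thread */
static int last_signal = 0;    /* written by the thread, read after joining */

/* Body of the signal handling thread. sigwait() returns in ordinary
   thread context, so arbitrary code could safely run at this point. */
static void *signal_thread_main(void *arg)
{
    int signo = 0;
    (void)arg;
    if (sigwait(&handled_set, &signo) == 0)
        last_signal = signo;
    return NULL;
}

/* Block SIGINT and SIGTERM in the calling thread and start the handling
   thread. Threads created afterwards inherit the blocked mask, so the
   signals can only be consumed by sigwait(). Returns 0 on success. */
int start_signal_thread(pthread_t *thread)
{
    sigemptyset(&handled_set);
    sigaddset(&handled_set, SIGINT);
    sigaddset(&handled_set, SIGTERM);
    if (pthread_sigmask(SIG_BLOCK, &handled_set, NULL) != 0)
        return -1;
    return pthread_create(thread, NULL, signal_thread_main, NULL) == 0 ? 0 : -1;
}

/* Demonstration: send SIGTERM to the whole process and report which
   signal the handling thread received (or -1 on error). */
int run_signal_thread_demo(void)
{
    pthread_t thread;
    if (start_signal_thread(&thread) != 0)
        return -1;
    if (kill(getpid(), SIGTERM) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)
        return -1;
    return last_signal;
}
```

The mask must be set before any other thread is created; a thread that starts with the signals unblocked could still receive them directly. A long-running program would call sigwait() in a loop rather than once.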

Handling of synchronous signals

We have already mentioned that certain synchronous signals and exceptions cannot be ignored, and yet the corresponding signal handlers are not able to execute arbitrary code. To resolve this apparent contradiction, &ECL; uses a simple solution, which is to mark the sections of code which are interruptible, and in which it is safe for the handler to run arbitrary code. All other regions are considered "unsafe" and are protected from signals and exceptions.

In principle this "marking" of safe areas can be done using POSIX functions such as pthread_sigmask() or sigprocmask(). In practice, however, this is slow, as it involves at least a function call, resolving thread-local variables, and so on, and it does not work in Windows.

Furthermore, sometimes we want signals to be detected but not immediately processed. For instance, when reading from the terminal we want to be able to interrupt the process, but we cannot execute the code from the handler, since the C function which is used to read from the terminal, read(), may have left the input stream in an inconsistent, or even locked, state.

The approach in &ECL; is more lightweight: we install our own signal handler and use a thread-local variable as a flag that determines whether the thread is executing interrupt-safe code or not. More precisely, if the variable ecl_process_env()->disable_interrupts is set, signals and exceptions are postponed and the information about the signal is queued. Otherwise the appropriate code is executed: for instance invoking the debugger, jumping to a condition handler, quitting, etc.

Systems that embed &ECL; may wish to deactivate these signal handlers completely. This is done using the boot options ECL_OPT_TRAP_SIGFPE, ECL_OPT_TRAP_SIGSEGV, ECL_OPT_TRAP_SIGBUS and ECL_OPT_TRAP_INTERRUPT_SIGNAL.

Systems that embed &ECL; and want to allow handling of synchronous signals should take care to also trap the associated Lisp conditions that may arise. This is automatically taken care of by functions such as si_safe_eval(), and in all other cases it can be solved by enclosing the unsafe code in a CL_CATCH_ALL_BEGIN frame.
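The idea of a thread-local flag that postpones signal processing can be sketched in plain POSIX C as follows. This is a simplified illustration and not the actual &ECL; implementation: interrupts_disabled stands in for the disable_interrupts field, the queue holds a single signal, and all names (queued_signal, handled_count, process_signal(), deferring_handler(), and the other functions) are hypothetical.

```c
/* Request POSIX declarations even when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical per-thread state, standing in for fields of a per-thread
   environment. Thread-local access from a handler is assumed to be safe
   here because the variables live in the main executable. */
static _Thread_local volatile sig_atomic_t interrupts_disabled = 0;
static _Thread_local volatile sig_atomic_t queued_signal = 0;
static _Thread_local volatile sig_atomic_t handled_count = 0;

/* Placeholder for the real work (debugger, condition handler, ...). */
static void process_signal(int signo)
{
    (void)signo;
    handled_count++;
}

/* If the thread is in an unsafe region, only queue the signal;
   otherwise process it immediately. */
static void deferring_handler(int signo)
{
    if (interrupts_disabled)
        queued_signal = signo;
    else
        process_signal(signo);
}

/* Install the deferring handler for signo. Returns 0 on success. */
int install_deferring_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = deferring_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Mark the start of a region that must not be interrupted. */
void disable_interrupts(void)
{
    interrupts_disabled = 1;
}

/* Mark the end of the region and process a signal queued meanwhile. */
void enable_interrupts(void)
{
    interrupts_disabled = 0;
    if (queued_signal) {
        int signo = queued_signal;
        queued_signal = 0;
        process_signal(signo);
    }
}

/* Number of signals fully processed so far in the calling thread. */
int processed_signal_count(void)
{
    return handled_count;
}
```

Setting and clearing the flag costs only a thread-local store, which is the reason this scheme is cheaper than calling pthread_sigmask() around every unsafe region. A one-slot queue loses all but the last signal received while interrupts are disabled; a fuller implementation would keep a proper queue.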

Considerations when embedding &ECL;

There are several approaches to handling signals and interrupts in a program that uses &ECL;. One is to install your own signal handlers. This is perfectly fine, but you should respect the same restrictions as &ECL;. Namely, you may not execute arbitrary code from those signal handlers, and in particular it will not always be safe to execute Common Lisp code from there. If you want to use your own signal handlers, then you should set the appropriate options before invoking cl_boot(), as explained in the preceding sections. Note that in this case &ECL; will not always be able to detect floating point exceptions, especially if your compiler does not support C99 and the corresponding floating point flags.

The other option is to let &ECL; handle signals itself. This is safer when the dominant part of the code is Common Lisp, but you may need to protect the code that embeds &ECL; from being interrupted, using either the interrupt-control macros that &ECL; provides or the POSIX functions pthread_sigmask() and sigprocmask().
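As an illustration of the POSIX route for protecting embedding code, the following minimal sketch blocks SIGINT around a critical region with pthread_sigmask() and lets the pending signal be delivered once the previous mask is restored. It is independent of &ECL;, and every name in it (interrupted, on_interrupt(), run_protected(), critical_section(), seen_inside and so on) is hypothetical.

```c
/* Request POSIX declarations even when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t interrupted = 0;

/* Value of the interrupted flag as seen from inside the critical
   section; -1 until the section has run. */
int seen_inside = -1;

/* Handler that only records that SIGINT was delivered. */
static void on_interrupt(int signo)
{
    (void)signo;
    interrupted = 1;
}

/* Install the handler for SIGINT. Returns 0 on success. */
int install_interrupt_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Run critical() with SIGINT blocked in the calling thread, then
   restore the previous mask. A SIGINT that arrived in the meantime
   stays pending and is delivered when the mask is restored.
   Returns 0 on success and -1 on failure. */
int run_protected(void (*critical)(void))
{
    sigset_t block, previous;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    if (pthread_sigmask(SIG_BLOCK, &block, &previous) != 0)
        return -1;
    critical();
    return pthread_sigmask(SIG_SETMASK, &previous, NULL) == 0 ? 0 : -1;
}

/* Example critical region: the signal raised here remains pending,
   so the handler has not run when the flag is sampled. */
void critical_section(void)
{
    raise(SIGINT);
    seen_inside = interrupted;
}

/* Whether the handler has run at least once. */
int was_interrupted(void)
{
    return interrupted;
}
```

Calling run_protected(critical_section) after install_interrupt_handler() leaves seen_inside at 0, because the handler cannot run inside the region, while was_interrupted() reports 1 afterwards, because the pending signal is delivered when the mask is restored.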