* doc/lispref/internals.texi (Garbage Collection):
* src/alloc.c (syms_of_alloc) <gc-cons-threshold>
<gc-cons-percentage>: Advise against enlarging the GC thresholds
more than needed and for prolonged periods of time.
Get rid of the global iterator object and instead allocate
a separate iterator for every loop. This still uses the "duplicate
iterator" code, including the old iterator which needs a stack,
make ITREE_FOREACH a bit more expensive than we'd like.
* src/itree.h (init_itree, forget_itree, itree_iterator_busy_p):
Delete declarations.
(itree_iterator_start): Add iterator arg and remove `line` and `file` args.
(struct itree_iterator): Move from `itree.c`. Remove `line` and
`file` fields.
(ITREE_FOREACH): Stack allocate an iterator object and pass it to
`itree_iterator_start`.
* src/itree.c (struct itree_iterator): Move to itree.h.
(iter): Delete global variable.
(itree_iterator_create, init_itree, forget_itree, itree_iterator_busy_p):
Delete functions.
(itree_contains): Adjust assertion.
(itree_iterator_finish): Deallocate the iterator's stack.
(itree_iterator_start): Take the (uninitialized) iterator as argument.
Allocate a fresh new stack. Remove `file` and `line` arguments.
Don't check `running` any more since the iterator is not expected to be
initialized at all.
* src/eval.c (signal_or_quit):
* src/alloc.c (garbage_collect): Don't check `itree_iterator_busy_p`
any more.
* src/emacs.c (main): No need to `init_itree` any more.
(Fdump_emacs): No need to `forget_itree` any more.
Some names under the interval_* namespace were renamed under the
itree_* namespace in commits:
0. f421b58db5 of 2022-10-19
"Prefix all itree.h type names with itree_".
1. 37a1145410 of 2022-10-19
"Rename all exported itree.h functions with the itree_ prefix"
Further, some values still referenced in commentary were removed in
commits:
2. 258e618364 of 2022-10-17
"Delete the itree_null sentinel node, use NULL everywhere."
3. 2c4a3910b3 of 2022-10-02
"itree: Use a single iterator object"
* src/emacs.c (main): Allocate global itree iterator once and for
all.
* src/alloc.c (mark_overlay):
* src/buffer.c (set_overlays_multibyte):
* src/itree.c (itree_destroy): Update commentary.
(interval_stack_ensure_space, itree_insert_gap): Prefer
unsigned-to-unsigned comparisons over signed-to-unsigned.
(interval_stack_push_flagged, interval_tree_insert)
(interval_tree_contains, itree_iterator_start)
(itree_iterator_finish, itree_iterator_next, itree_iterator_narrow):
Improve assertions.
(itree_init): Rename...
(init_itree): ...to this, for consistency with other global init
functions.
(itree_create): Stop leaking a global iterator allocation on each
call.
(interval_tree_init): Complete renames of
interval_tree -> itree_tree and interval_tree_clear -> itree_clear.
(interval_tree_remove_fix): Fix indentation.
* src/itree.h: Declare init_itree.
(ITREE_FOREACH): Fix typo in commentary.
* src/pdumper.c [CHECK_STRUCTS]
(dump_interval_node): Use the correct name in the HASH condition
and #error message.
(dump_overlay, dump_buffer): Update HASH (bug#58975).
For the most part, I replaced the interval_tree_ prefix with itree_,
interval_node_ with itree_node_, etc.
* src/alloc.c: Rename everywhere as appropriate.
* src/alloc.c: ditto.
* src/buffer.c: ditto.
* src/buffer.h: ditto.
* src/itree.c: ditto.
* src/itree.h: ditto.
This reverts commit b8fbd42f0a,
with edits.
* src/alloc.c (mark_overlays): restore function.
(mark_buffer): Call it, not ITREE_FOREACH.
(garbage_collect): eassert (!itree_busy_p ()).
* src/itree.h: Comment tweak: explain why GC is considered risky. It
isn't that GC itself is risky, it is that GC can call ELisp by way of
a hook, and running ELisp during iteration is risks nested iteration.
* src/treesit.c [WINDOWSNT]: Add MS-Windows boilerplate for
dynamically-loaded optional libraries.
(init_treesit_functions) [WINDOWSNT]: New function.
(load_tree_sitter_if_necessary): New function.
(ts_initialize): Call 'load_tree_sitter_if_necessary'.
(ts_delete_parser, ts_delete_query, ts_named_node_p): Wrapper
functions for TS calls from outside treesit.c.
(Ftreesit_parser_root_node, Ftreesit_parser_set_included_ranges)
(Ftreesit_parser_included_ranges, Ftreesit_node_type)
(Ftreesit_node_start, Ftreesit_node_end, Ftreesit_node_string)
(Ftreesit_node_parent, Ftreesit_node_child, Ftreesit_node_check)
(Ftreesit_node_field_name_for_child, Ftreesit_node_child_count)
(Ftreesit_node_next_sibling, Ftreesit_node_prev_sibling)
(Ftreesit_node_first_child_for_pos)
(Ftreesit_node_descendant_for_range, Ftreesit_node_eq)
(Ftreesit_query_compile, Ftreesit_query_capture)
(Ftreesit_search_subtree, Ftreesit_search_forward)
(Ftreesit_induce_sparse_tree): Call 'ts_initialize' before any
other TS functions.
(Ftreesit_available_p): 'treesit-available-p' implemented in C, so
that on WINDOWSNT the library could be loaded dynamically.
* src/treesit.h (ts_delete_parser, ts_delete_query)
(ts_named_node_p): Add prototypes.
* src/print.c (print_vectorlike):
* src/alloc.c (cleanup_vector): Call tree-sitter function via
wrappers defined in treesit.c, not directly, because WINDOWSNT
redefines the TS functions to be called via function pointers.
* src/Makefile.in (base_obj): Add treesit.o
(TREE_SITTER_OBJ): Remove.
* lisp/treesit.el (treesit-available-p): Remove: now implemented
in C.
* lisp/term/w32-win.el (dynamic-library-alist): Add libtree-sitter
DLLs.
* configure.ac (TREE_SITTER): Support the MinGW build.
(TREE_SITTER_OBJ): Remove.
Move args between `build_overlay` and `add_buffer_overlay`,
to try and keep buffer positions together with their buffer.
Be more strict in the `otick` values passed to `interval_tree_insert`.
Move a few things around to try and reduce dependencies through `.h` files.
Fix a thinko bug in `check_tree`.
* src/alloc.c (build_overlay): Remove `begin` and `end` args.
* src/buffer.c (add_buffer_overlay): Move from `buffer.h`.
Add `begin` and `end` args.
(copy_overlays): Adjust accordingly.
(Fmake_overlay): Use BUF_BEG and BUF_Z; adjust call to `build_overlay`
and `add_buffer_overlay`.
(Fmove_overlay): Use BUF_BEG and BUF_Z; Use the new `begin` and `end`
args of `add_buffer_overlay` so we don't need to use
`interval_node_set_region` when moving to a new buffer.
(remove_buffer_overlay, set_overlay_region): Move from `buffer.h`.
* src/buffer.h (set_overlay_region, add_buffer_overlay)
(remove_buffer_overlay): Move to `buffer.c`.
(build_overlay): Move from `lisp.h`.
(maybe_alloc_buffer_overlays): Delete function (inline into its only
caller).
* src/itree.c (interval_tree_insert): Move declaration `from buffer.h`.
(check_tree): Fix initial offset in call to `recurse_check_tree`.
Remove redundant check of the `limit` value.
(interval_node_init): Remove `begin` and `end` args.
(interval_tree_insert): Mark it as static.
Assert that the new node's `otick` should already be uptodate and its
new parent as well.
(itree_insert_node): New function.
(interval_tree_insert_gap): Assert the otick of the removed+added nodes
were uptodate and mark them as uptodate again after adjusting
their positions.
(interval_tree_inherit_offset): Check that the parent is at least as
uptodate as the child.
* src/lisp.h (build_overlay): Move to `buffer.h`.
* src/itree.h (interval_node_init): Adjust accordingly.
(interval_tree_insert): Remove declaration.
(itree_insert_node): New declaration.
This commit basically reverts commit 5b954f8f9. The problem of nested
iterations hasn't been fixed in the mean time, but since the GC can
run arbitrary ELisp code (via `post-gc-hook`), running the GC from
within an itree iteration is already unsafe anyway :-(
* src/alloc.c (mark_overlays): Delete function.
(mark_buffer): Use ITREE_FOREACH.
* src/alloc.c (mark_overlay): Add sanity check.
* src/buffer.c (next_overlay_change, previous_overlay_change):
Tweak code to keep the same vars for the bounds.
* src/itree.c (interval_tree_clear, interval_tree_insert)
(interval_tree_remove, interval_tree_insert_fix, interval_tree_remove_fix):
Adjust to the `color` -> `red` change.
(interval_tree_clear): Prefer `true/false` for booleans.
(interval_generator_create): Use an actual `interval_tree_order` value
rather than 0.
(interval_generator_next): Simplify a tiny bit. Add comment.
(interval_generator_narrow): Add sanity check.
* src/itree.h (struct interval_node): Replace `color` field with
boolean `red` field.
(enum interval_tree_order): Remove unused `ITREE_DEFLT_ORDER` value.
* src/pdumper.c (dump_interval_node): Adjust to the
`color` -> `red` change.
* lib-src/emacsclient.c (main): Use right length modifier when
printing uintmax_t.
* src/alloc.c (check_pure_size): Use right length modifier when
printing ptrdiff_t.
No intergration/interaction with the new type, just adding it.
* lisp/emacs-lisp/cl-preloaded.el (cl--typeof-types): Add new type.
* src/alloc.c (cleanup_vector): Add gc for the new type.
* src/data.c (Ftype_of): Add switch case for the new type.
(syms_of_data): Add symbols for the new type.
* src/lisp.h (DEFINE_GDB_SYMBOL_BEGIN): Add new type.
* src/treesit.c (Ftreesit_compiled_query_p): New function.
(syms_of_treesit): Add symbol for the new type.
* src/treesit.h (struct Lisp_TS_Query): New struct.
(TS_COMPILED_QUERY_P, XTS_COMPILED_QUERY, CHECK_TS_COMPILED_QUERY):
New macros.
* src/print.c (print_vectorlike): Add printing for the new type.
Restructure the reader to be nonrecursive so that it is not limited by
the C stack or crashes Emacs when reading deeply nested data.
This also improves performance.
A few minor bugs were fixed:
- (a .{NBSP}b) where {NBSP} is a non-breaking space (U+00A0) is now
the dotted pair (a . b), not the 3-element list (a \. b), since U+00A0
is treated as whitespace everywhere else.
- #_ with no symbol following is now equivalent to ## (empty interned
symbol), not #: (empty uninterned symbol).
* src/alloc.c (garbage_collect): Call mark_lread.
* src/lread.c (readevalloop): Use read0 instead of read_list.
(stackbufsize): Increase to 1024, now that read0 isn't recursive.
(invalid_radix_integer): Buffer overflow check.
(read1, read_list, read_vector): Remove.
(read_char_literal, read_string_literal)
(hash_table_from_plist, record_from_list, vector_from_rev_list)
(bytecode_from_rev_list, char_table_from_rev_list)
(sub_char_table_from_rev_list, string_props_from_rev_list)
(read_bool_vector, skip_lazy_string, symbol_char_span)
(skip_space_and_comments)
(enum read_entry_type, struct read_stack_entry, struct read_stack)
(rdstack, mark_lread, read_stack_top, read_stack_pop)
(read_stack_empty_p, grow_read_stack, read_stack_push): New.
(read0): Rewrite to be nonrecursive.
* test/src/lread-tests.el (lread-deeply-nested, lread-misc): New tests.
This lets ‘./configure; make’ work on Fedora 36 x86-64 from a Git
checkout without generating false-alarm warnings.
* lib-src/etags.c (main): There appeared to be false alarm with
GCC 12. However, the code was wrong anyway, as it mishandled file
names containing "'" so fix that bug. This pacifies GCC.
(mercury_decl): Omit tests ‘s + pos != NULL’ that were apparently
intended to be ‘s[pos] != '\0'’ but which were miscoded to always
be true and which were mostly not needed anyway. In one place,
though, a test was needed, so fix that by using strchr instead.
* src/alloc.c (lisp_free) [!GC_MALLOC_CHECK]:
* src/term.c (Fsuspend_tty): Don’t look at a pointer after freeing
it, even just to test it for equality with some other pointer, as
this has undefined behavior in C and GCC 12 diagnoses this.
* src/dbusbind.c (xd_read_message_1): Rework the code a bit
so that it has fewer tests. This pacifies GCC 12 which was
complaining incorrectly about dereferencing a null pointer.
* src/intervals.c (copy_properties): Remove an eassume that should
no longer be needed even to pacify older GCCs, due to ...
* src/intervals.h (split_interval_left): ... this addition of
ATTRIBUTE_RETURNS_NONNULL to pacify a GCC 12 warning about
dereferencing a null pointer.
* src/regex-emacs.c (EXTEND_BUFFER): Use negative values rather
than auxiliary booleans to indicate null pointers. This pacifies
GCC 12 false alarms about using uninitialized variables.
* src/xdisp.c (clear_position): New function.
(append_space_for_newline, extend_face_to_end_of_line):
Use it to work around false alarms from GCC 12.
(display_and_set_cursor): Add an UNINIT to pacify GCC 12.
* src/xterm.c (x_draw_glyphless_glyph_string_foreground):
Defend against hypothetical bad code elsewhere;
this also pacifies GCC 12.
(x_term_init): Use fixed-size auto array rather than alloca,
as the array is small; this also pacifies GCC 12.
* configure.ac (HAVE_TREE_SITTER, TREE_SITTER_OBJ): New variables.
(DYNAMIC_LIB_SUFFIX): new variable, I copied code from MODULES_SUFFIX
so the diff looks this way.
* doc/lispref/elisp.texi (Top): Add tree-sitter manual.
* doc/lispref/modes.texi (Font Lock Mode): mention tree-sitter.
(Parser-based Font Lock): New section.
(Auto-Indentation): Mention tree-sitter.
(Parser-based Indentation): New section.
* doc/lispref/parsing.texi (Parsing Program Source): New chapter.
* lisp/emacs-lisp/cl-preloaded.el (cl--typeof-types): Add
treesit-parser and treesit-node type.
* lisp/treesit.el: New file.
* src/Makefile.in (TREE_SITTER_LIBS, TREE_SITTER_FLAGS,
TREE_SITTER_OBJ): New variables.
* src/alloc.c:
(cleanup_vector): Add cleanup code for treesit-parser and
treesit-node.
* src/casefiddle.c (casify_region): Notify tree-sitter parser of
buffer change.
* src/data.c (Ftype_of): Add treesit-parser and treesit-node type
(Qtreesit_parser, Qtreesit_node): New symbol.
* src/emacs.c (main): Add symbols in treesit.c.
* src/eval.c (define_error): Move the function to here.
* src/insdel.c (insert_1_both, insert_from_string_1, insert_from_gap,
insert_from_buffer_1, replace_range, del_range_2): Notify tree-sitter
parser of buffer change.
* src/json.c (define_error): Move this function out.
* src/lisp.h (DEFINE_GDB_SYMBOL_BEGIN): Add treesit-parser and
treesit-node.
* src/lread.c (Vdynamic_library_suffixes): New variable.
* src/print.c (print_vectorlike): Add code for printing
treesit-parser and treesit-node.
* src/treesit.c: New file.
* src/treesit.h: New file.
* test/src/treesit-tests.el: New file.
* src/lisp.h (lisp_h_Qni): New macro.
(DEFUN): Use it.
* src/alloc.c (syms_of_alloc): Use it.
* src/bytecode.c (Fbyte_code): Fix Lisp_Object/int mixup.
This is the function that marks the C stack. Avoid confusion with the
new mark stack, a stack used in the GC mark phase.
* src/alloc.c (SETJMP_WILL_LIKELY_WORK, SETJMP_WILL_NOT_WORK)
(mark_stack, mark_c_stack):
* src/lisp.h:
* src/thread.c (mark_one_thread): Rename mark_stack to mark_c_stack.
It doesn't make sense to prevent the return frame or movement
frame from being deleted, but we should at least protect them
from garbage collection.
* src/alloc.c (garbage_collect): Call mark_xterm.
* src/xterm.c (x_dnd_begin_drag_and_drop)
(x_dnd_cleanup_drag_and_drop): Clear movement and return frames
upon DND completion.
(mark_xterm): Mark those frames.
* src/xterm.h: Update prototypes.
An explict stack of objects to be traversed for marking replaces
recursion for most common object types: conses, vectors, records, hash
tables, symbols, functions etc. Recursion is still used for other
types but those are less common and thus not as likely to cause a
problem.
The stack grows dynamically as required which eliminates almost all C
stack overflow crashes in the GC. There is also a nontrivial GC
performance improvement.
* src/alloc.c (GC_REMEMBER_LAST_MARKED, GC_CDR_COUNT): New.
(mark_char_table, struct mark_entry):
Remove (subsumed into process_mark_stack).
(struct mark_entry, struct mark_stack, mark_stk)
(mark_stack_empty_p, mark_stack_pop, grow_mark_stack)
(mark_stack_push_value, mark_stack_push_values)
(process_mark_stack): New.
(mark_object, mark_objects):
Just push the object(s) and let process_mark_stack do the work.
* src/alloc.c: Omit unnecessary static function declarations.
Don’t use ‘inline static’ as the C standard says that keyword
order is obsolescent. Anyway, no need for ‘inline’ as compilers
inline without it well enough.
* lisp/emacs-lisp/comp.el (comp-func): Add a command-modes slot.
(comp-spill-lap-function, comp-intern-func-in-ctxt): Fill it.
(comp-emit-for-top-level, comp-emit-lambda-for-top-level): Use it.
* src/alloc.c (mark_object): Mark the command_modes slot.
* src/comp.c (make_subr): Add a command_modes parameter.
(Fcomp__register_lambda): Use it.
(Fcomp__register_subr): Ditto.
* src/data.c (Fcommand_modes): Output the command_modes data for subrs
(bug#54437).
* src/lisp.h (GCALIGNED_STRUCT): Add a command_modes slot.
* src/pdumper.c (dump_subr): Update hash.
(dump_subr): Dump the command_modes slot.
Use a dedicated stack for bytecode, instead of using the C stack.
Stack frames are managed explicitly and we stay in the same
exec_byte_code activation throughout bytecode function calls and
returns. In other words, exec_byte_code no longer uses recursion
for calling bytecode functions.
This results in better performance, and bytecode recursion is no
longer limited by the size of the C stack. The bytecode stack is
currently of fixed size but overflow is handled gracefully by
signalling a Lisp error instead of the hard crash that we get now.
In addition, GC marking of the stack is now faster and more precise.
Full precision could be attained if desired.
* src/alloc.c (ATTRIBUTE_NO_SANITIZE_ADDRESS): Make non-static.
* src/bytecode.c (enum stack_frame_index, BC_STACK_SIZE)
(sf_get_ptr, sf_set_ptr, sf_get_lisp_ptr, sf_set_lisp_ptr)
(sf_get_saved_pc, sf_set_saved_pc, init_bc_thread, free_bc_thread)
(mark_bytecode, Finternal_stack_stats, valid_sp): New.
(exec_byte_code): Adapt to use the new bytecode stack.
(syms_of_bytecode): Add defsubr.
* src/eval.c (unwind_to_catch): Restore saved stack frame.
(push_handler_nosignal): Save stack frame.
* src/lisp.h (struct handler): Add act_rec member.
(get_act_rec, set_act_rec): New.
* src/thread.c (mark_one_thread): Call mark_bytecode.
(finalize_one_thread): Free bytecode thread state.
(Fmake_thread, init_threads): Set up bytecode thread state.
* src/thread.h (struct bc_thread_state): New.
(struct thread_state): Add bytecode thread state.
Avoid making a copy (in the interpreter C stack frame) of the bytecode
string by making sure it won't be moved by the GC. This is done by
reallocating it to the heap normally only used for large strings,
which isn't compacted.
This requires that we retain an explicit reference to the bytecode
string object (`bytestr`) lest it be GCed away should all other
references vanish during execution. We allocate an extra stack slot
for that, as we already do for the constant vector object.
* src/alloc.c (allocate_string_data): Add `immovable` argument.
(resize_string_data, make_clear_multibyte_string): Use it.
(pin_string): New.
* src/pdumper.c (dump_string): Fix incorrect comment.
Update hash for Lisp_String (only comments changed, not contents).
* src/lread.c (read1):
* src/alloc.c (Fmake_byte_code, purecopy):
* src/bytecode.c (Fbyte_code): Pin bytecode on object creation.
(exec_byte_code): Don't copy bytecode. Retain `bytestr` explicitly.
* src/lisp.h (Lisp_String): Explain special size_byte values.
(string_immovable_p): New.