Add line-column tracking for tree-sitter parsers. Copied from
comments in treesit.c:
Technically, we are supposed to send tree-sitter the line and
column position of each edit. But in practice we just send it
dummy values, because tree-sitter doesn't use them for parsing;
it mostly just carries the line and column positions around and
returns them when, e.g., reporting node positions[1]. This
worked fine until we encountered grammars that actually
use the line and column information for
parsing (Haskell)[2].
[1] https://github.com/tree-sitter/tree-sitter/issues/445
[2] https://github.com/tree-sitter/tree-sitter/issues/4001
So now we have to keep track of line and column positions and
pass valid values to tree-sitter. (It adds quite a bit of
complexity, but only linearly; one can ignore all the linecol
stuff when trying to understand treesit code and then come
back to it later.) Eli convinced me to disable tracking by
default, and only enable it for languages that need it. So
the buffer starts out not tracking linecol. And when a
parser is created, if the language is in
treesit-languages-require-line-column-tracking, we enable
tracking in the buffer, and enable tracking for the parser.
To simplify things, once a buffer starts tracking linecol, it
never stops, even if all parsers that need tracking are
deleted. For parsers, tracking is determined at creation time:
a parser that starts out tracking (or not tracking) stays that
way, regardless of later changes to
treesit-languages-require-line-column-tracking.
To make calculating line/column positions fast, we store
linecol caches for begv, point, and zv in the
buffer (buf->ts_linecol_cache_xxx); and in the parser object,
we store linecol cache for visible beg/end of that parser.
In buffer editing functions, we need the linecol for
start/old_end/new_end, those can be calculated by scanning
newlines (treesit_linecol_of_pos) from the buffer point
cache, which should always be near point. And we usually
set the calculated linecol of new_end back to the buffer
point cache.
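A minimal sketch of scanning from a cached position, assuming a plain in-memory byte string instead of an Emacs buffer (no gap, no multibyte handling). `struct linecol` and `linecol_from_cache` are hypothetical stand-ins for struct ts_linecol and treesit_linecol_of_pos; this sketch uses 1-based lines and 0-based byte columns, which is an assumption about the cache's convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a linecol cache entry.  */
struct linecol
{
  ptrdiff_t pos;   /* 0-based byte offset into the text.  */
  ptrdiff_t line;  /* 1-based line number.  */
  ptrdiff_t col;   /* 0-based byte column within the line.  */
};

/* Return the linecol of TARGET in TEXT by scanning only the bytes
   between CACHE.pos and TARGET, instead of counting from the start.
   CACHE must be a valid linecol for TEXT.  */
static struct linecol
linecol_from_cache (const char *text, struct linecol cache,
                    ptrdiff_t target)
{
  struct linecol lc = cache;

  if (target >= cache.pos)
    {
      /* Forward: a newline starts a new line at column 0;
         any other byte advances the column.  */
      for (ptrdiff_t i = cache.pos; i < target; i++)
        {
          if (text[i] == '\n')
            {
              lc.line++;
              lc.col = 0;
            }
          else
            lc.col++;
        }
    }
  else
    {
      /* Backward: every newline between TARGET and the cache
         moves us up one line.  */
      for (ptrdiff_t i = target; i < cache.pos; i++)
        if (text[i] == '\n')
          lc.line--;
      /* The column is the distance back to the previous newline.  */
      ptrdiff_t bol = target;
      while (bol > 0 && text[bol - 1] != '\n')
        bol--;
      lc.col = target - bol;
    }

  lc.pos = target;
  return lc;
}
```

Scanning forward needs no extra work for the column because the cache already knows it; scanning backward must also look for the start of the target's line.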
We also need to calculate linecol for each parser's
visible_beg/end and for the buffer's begv/zv. These positions
are usually far from point, so we keep caches for all of them
(in either the parser object or the buffer). Because they are
far from point, scanning newlines from point to get up-to-date
linecol for them would be inefficient; but at the same time,
because they are outside the changed region, we can calculate
their change in line and column number by simply counting how
many newlines are added/removed in the changed
region (compute_new_linecol_by_change).
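The shortcut can be sketched as follows. `struct linecol` (byte position, 1-based line, 0-based byte column) and `shift_linecol` are hypothetical and do not reproduce the signature of compute_new_linecol_by_change. Given the linecols of an edit's start, old end and new end, a cached position outside the edit is updated arithmetically, without scanning any text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct linecol
{
  ptrdiff_t pos;   /* 0-based byte offset.  */
  ptrdiff_t line;  /* 1-based line number.  */
  ptrdiff_t col;   /* 0-based byte column.  */
};

/* Update *CACHED for an edit that replaced the text between START
   and OLD_END with text ending at NEW_END.  Returns false if
   *CACHED lies strictly inside the replaced text, where no shortcut
   applies and the caller must rescan.  A position equal to OLD_END
   is treated as following the edit (as an end position like ZV
   would be).  */
static bool
shift_linecol (struct linecol start, struct linecol old_end,
               struct linecol new_end, struct linecol *cached)
{
  if (cached->pos >= old_end.pos)
    {
      /* Only a position on OLD_END's line has its column changed:
         it now sits the same distance after NEW_END.  */
      if (cached->line == old_end.line)
        cached->col = new_end.col + (cached->col - old_end.col);
      /* Line and byte offsets shift by the net change.  */
      cached->line += new_end.line - old_end.line;
      cached->pos += new_end.pos - old_end.pos;
      return true;
    }
  if (cached->pos <= start.pos)
    return true;   /* Before the edit: unchanged.  */
  return false;    /* Inside the replaced text.  */
}
```

For example, replacing "b" in "ab\ncd" with "x\ny" moves "d" from line 2 to line 3 with its column unchanged, while the old newline, which shared a line with the edit's end, gets a new column as well.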
* doc/lispref/parsing.texi (Using Parser): Mention line-column
tracking in manual.
* etc/NEWS: Add news.
* lisp/treesit.el:
(treesit-languages-need-line-column-tracking): New variable.
* src/buffer.c: Include treesit.h (for TREESIT_EMPTY_LINECOL).
(Fget_buffer_create):
(Fmake_indirect_buffer): Initialize new buffer fields.
(Fbuffer_swap_text): Add new buffer fields.
* src/buffer.h (ts_linecol): New struct.
(buffer): New buffer fields.
(BUF_TS_LINECOL_BEGV):
(BUF_TS_LINECOL_POINT):
(BUF_TS_LINECOL_ZV):
(SET_BUF_TS_LINECOL_BEGV):
(SET_BUF_TS_LINECOL_POINT):
(SET_BUF_TS_LINECOL_ZV): New inline functions.
* src/casefiddle.c (casify_region): Record linecol info.
* src/editfns.c (Fsubst_char_in_region):
(Ftranslate_region_internal):
(Ftranspose_regions): Record linecol info.
* src/insdel.c (insert_1_both):
(insert_from_string_1):
(insert_from_gap_1):
(insert_from_buffer):
(replace_range):
(del_range_2): Record linecol info.
* src/treesit.c (TREESIT_BOB_LINECOL):
(TREESIT_EMPTY_LINECOL):
(TREESIT_TS_POINT_1_0): New constants.
(treesit_debug_print_linecol):
(treesit_buf_tracks_linecol_p):
(restore_restriction_and_selective_display):
(treesit_count_lines):
(treesit_debug_validate_linecol):
(treesit_linecol_of_pos):
(treesit_make_ts_point):
(Ftreesit_tracking_line_column_p):
(Ftreesit_parser_tracking_line_column_p): New functions.
(treesit_tree_edit_1): Accept real TSPoint and pass to
tree-sitter.
(compute_new_linecol_by_change): New function.
(treesit_record_change_1): Rename from treesit_record_change,
handle linecol if tracking is enabled.
(treesit_linecol_maybe): New function.
(treesit_record_change): New wrapper around
treesit_record_change_1 that handles some boilerplate and sets
buffer state.
(treesit_sync_visible_region): Handle linecol if tracking is
enabled.
(make_treesit_parser): Setup parser's linecol cache if tracking
is enabled.
(Ftreesit_parser_create): Enable tracking if the parser's
language requires it.
(Ftreesit__linecol_at):
(Ftreesit__linecol_cache_set):
(Ftreesit__linecol_cache): New functions for debugging and
testing.
(syms_of_treesit): New variable
Vtreesit_languages_require_line_column_tracking.
* src/treesit.h (Lisp_TS_Parser): New fields.
(TREESIT_BOB_LINECOL):
(TREESIT_EMPTY_LINECOL): New constants.
* test/src/treesit-tests.el (treesit-linecol-basic):
(treesit-linecol-search-back-across-newline):
(treesit-linecol-col-same-line):
(treesit-linecol-enable-disable): New tests.
* src/lisp.h: Declare display_count_lines.
* src/xdisp.c (display_count_lines): Remove static keyword.
Work around a bug in GnuTLS 3.7.11 and earlier: when built
statically, it mistakenly exports symbols hash_lookup and
hash_string, which collide with Emacs symbols of the same name,
preventing temacs from linking statically. Problem reported by
Greg A. Woods (Bug#77476).
Because GnuTLS never uses hash_lookup or hash_string, this issue
ordinarily doesn’t seem to prevent temacs from linking to GnuTLS
on GNU/Linux, as it’s linked dynamically and the dynamic linker
never needs to resolve references to either symbol. However, I
suppose a clash or bug could occur even with dynamic linking if
Emacs later loads a module that uses either symbol.
Although GnuTLS should be fixed, Emacs should link statically to
current and older GnuTLS versions in the meantime, and it should
avoid potential problems with dynamic linking. Renaming the two
clashing names is an easy way to do this. For consistency with
the new name for hash_lookup, also rename hash_lookup_with_hash
and hash_lookup_get_hash.
* src/fns.c (hash_find_with_hash): Rename from hash_lookup_with_hash.
(hash_find): Rename from hash_lookup.
(hash_find_get_hash): Rename from hash_lookup_get_hash.
(hash_char_array): Rename from hash_string.
All uses changed.
argv as left after main has processed the command line can differ
from the original command-line arguments in both order and contents,
which can lead to surprising results when restarting Emacs with the
cooked argv through `kill-emacs'.
Starting from that observation, consistently use the variables
'initial_cmdline' on Windows, 'initial_argc' and 'initial_argv' on
non-Windows, and 'initial_argv0' in all ports.
* src/lisp.h: Declare 'initial_argv0', limit declaration of
'initial_argv' and 'initial_argc' to non-Windows ports.
* src/emacs.c: Likewise, but for the definitions.
(init_cmdargs): Move initialization of 'initial_argv' and
'initial_argc' ...
(copy_args) [!WINDOWSNT]: ... to this new function ...
(main): ... and call that in 'main', also initializing
'initial_argv0' before the command-line processing.
* src/emacs.c (Fkill_emacs):
* src/pgtkterm.c (pgtk_term_init):
* src/sysdep.c (emacs_perror):
* src/xterm.c (x_term_init): Use 'initial_argv0' where only that
is required. (Bug#77389)
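The idea behind copy_args can be sketched as a snapshot taken before any option processing runs. This is a hypothetical sketch: it copies only the pointer array, which is enough as long as later processing reorders or drops argv entries without modifying the strings themselves. The real function's body, and how 'initial_cmdline' is filled on Windows, are not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical globals mirroring the names in the commit message.  */
static int initial_argc;
static char **initial_argv;
static const char *initial_argv0;

/* Save the command line before anything reorders or drops entries
   of ARGV.  Only the pointer array is duplicated; the strings are
   shared with ARGV.  */
static void
copy_args (int argc, char **argv)
{
  initial_argv = malloc ((argc + 1) * sizeof *initial_argv);
  if (!initial_argv)
    abort ();
  memcpy (initial_argv, argv, argc * sizeof *argv);
  initial_argv[argc] = NULL;
  initial_argc = argc;
  initial_argv0 = argv[0];
}
```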
* configure.ac (ALIGNOF_INT, ALIGNOF_LONG, ALIGNOF_LONG_LONG):
New variables.
(emacs_cv_alignas_unavailable): Define if alignas and structure
alignment primitives are unavailable. In such an environment,
the MSB tagging scheme must be enabled, as must the GNU malloc.
* msdos/sed2v2.inp: Adjust correspondingly.
* src/alloc.c (union emacs_align_type): Remove types which
contain flexible array members. The address of a field
subsequent to an aggregate with flexible array members cannot
validly be taken.
(mark_memory) [!USE_LSB_TAG && !WIDE_EMACS_INT]: Strip type bits
before scanning memory.
* src/emacs.c (main):
* src/eval.c (Fautoload_do_load):
* src/fns.c (Frequire): Rename a number of illogically named
fields.
* src/lisp.h (ALIGNOF_EMACS_INT): Define to the natural
alignment of EMACS_INT.
(IDEAL_GCALIGNMENT): New macro.
(USE_LSB_TAG): Disable if no alignment specifiers are available,
WIDE_EMACS_INT is undefined, and the natural alignment of
EMACS_INT falls short of LSB tagging's requirements.
(gflags): Rename illogically named fields and don't define them
as bitfields, which runs afoul of certain compiler issues.
(will_dump_p, will_bootstrap_p, will_dump_with_pdumper_p)
(dumped_with_pdumper_p): Adjust accordingly.
* src/pdumper.c (VM_SUPPORTED): Define to 0 when !USE_LSB_TAG.
It is better to read dump files into the heap by hand than to be
supplied with an address that is not representable.
(_dump_object_start_pseudovector): Rename to
dump_object_start_pseudovector, to avoid encroaching on reserved
names.
(START_DUMP_PVEC): Adjust correspondingly.
(dump_mmap_contiguous_vm): Preserve errno around failure
cleanup.
(dump_bitset_bit_set_p): Work around certain compiler issues.
(pdumper_load) [!USE_LSB_TAG]: Reject dump file allocations
that are not representable as Lisp_Objects.
Tested on i386-unknown-solaris2.10, sparc-sun-solaris2.10.
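Why alignment and representability matter can be sketched with generic pointer tagging. This is a simplified model, not Emacs's actual Lisp_Object layout. `TAG_BITS` is assumed to be 3, and every function name here is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed number of type-tag bits.  */
enum { TAG_BITS = 3 };
#define TAG_MASK ((uintptr_t) (1u << TAG_BITS) - 1)
#define VALBITS (sizeof (uintptr_t) * CHAR_BIT - TAG_BITS)

/* LSB tagging keeps the tag in the low bits, so it only works if
   every object is aligned to at least 1 << TAG_BITS bytes.  */
static bool
lsb_tagging_possible (size_t alignment)
{
  return alignment >= ((size_t) 1 << TAG_BITS);
}

static uintptr_t
lsb_tag (void *p, unsigned tag)
{
  assert (((uintptr_t) p & TAG_MASK) == 0);  /* Needs alignment.  */
  return (uintptr_t) p | tag;
}

static void *
lsb_untag (uintptr_t word)
{
  return (void *) (word & ~TAG_MASK);
}

/* MSB tagging keeps the tag in the high bits instead, so any
   alignment works, but an address must fit in VALBITS bits.  */
static bool
msb_representable (uintptr_t addr)
{
  return (addr >> VALBITS) == 0;
}

static uintptr_t
msb_tag (uintptr_t addr, unsigned tag)
{
  return addr | ((uintptr_t) tag << VALBITS);
}

/* What a conservative scanner must do before treating a tagged
   word as a candidate address.  */
static uintptr_t
msb_strip (uintptr_t word)
{
  return word & (((uintptr_t) 1 << VALBITS) - 1);
}
```

Under this model, a host that only guarantees 4-byte alignment cannot use LSB tagging. Under MSB tagging, a scanner must strip the tag bits before comparing a word against heap addresses, and an allocation at an address with high bits set cannot be represented at all.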
* config.bat (mvOk): Protoize.
(djgppOk): Include sys/version.h for _DJGPP_MINOR.
* lisp/loadup.el: If system-type is ms-dos, dump bootstrap-emacs
as b-emacs.dmp.
* msdos/INSTALL: Document new versions of tools that have been
verified successfully to compile Emacs.
* msdos/emacs.djl: New linker script that arranges to link
symbols in `.subrs' in a contiguous part of data, as the DJGPP
runtime appears to treat any non-data and non-text section as
allocatable.
* msdos/mainmake.v2 (install): Install emacs.dmp alongside
emacs.exe.
* msdos/sed1v2.inp (CFLAGS): Define to -O2 -g3.
(LDFLAGS): Provide the said linker script.
(HAVE_PDUMPER): Define to yes.
(UNEXEC_OBJ, PAXCTL_dumped, PAXCTL_notdumped): Delete.
(DUMPING): Set to pdumper.
(MAKE_PDUMPER_FINGERPRINT): Don't erase this variable.
Don't stubify or set minstack. Remove native-comp specific
directives. Don't remove temacs prior to copying and replace
`pdmp' extension with DOS-conformant `dmp'.
* msdos/sed2v2.inp (HAVE_UNEXEC): Remove definition.
(HAVE_PDUMPER): Define to 1.
* msdos/sed6.inp (top_srcdir): Define appropriately.
* msdos/sedlibmk.inp (HAVE_BLKCNT_T): Define to 1.
* src/emacs.c (load_pdump) [MSDOS]: Use `dmp' suffix.
* src/pdumper.c (Fdump_emacs_portable) [MSDOS]: Replace ".pdmp"
suffixes with ".dmp".
From a suggestion by Pip Cet.
* src/alloc.c (make_formatted_string): Omit first argument,
to simplify the calling convention. All callers changed.
* src/doprnt.c (doprnt): Also support %u. Update doc.
The case_Lisp_Int macro was originally introduced with different
definitions depending on USE_2_TAGS_FOR_INTS. However, since commit
2b57012478, we have assumed that USE_2_TAGS_FOR_INTS is always
defined, and the macro has only a single definition. As a result, the
macro is now unnecessary, and replacing it with standard C case labels
improves readability and understanding.
* src/lisp.h (case_Lisp_Int): Delete macro.
* src/alloc.c (process_mark_stack, survives_gc_p):
* src/data.c (Fcl_type_of):
* src/fns.c (value_cmp, sxhash_obj):
* src/pdumper.c (dump_object):
* src/print.c (print_object):
* src/xfaces.c (face_attr_equal_p): Remove uses of above macro.
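What the replacement looks like at a use site can be sketched with a hypothetical tag enumeration. It assumes the two fixnum tags are named Lisp_Int0 and Lisp_Int1. The real tag values depend on the build configuration and are not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type tags; fixnums use two of them.  */
enum Lisp_Type
{
  Lisp_Symbol, Lisp_Int0, Lisp_Int1, Lisp_String,
  Lisp_Vectorlike, Lisp_Cons, Lisp_Float
};

static const char *
type_category (enum Lisp_Type type)
{
  switch (type)
    {
    /* Two plain case labels in place of a single macro that
       expanded to both of them.  */
    case Lisp_Int0:
    case Lisp_Int1:
      return "fixnum";
    case Lisp_Float:
      return "float";
    default:
      return "other";
    }
}
```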
Since the introduction of the 'calln' macro, the 'call1', 'call2', ...,
'call8' macros are just aliases for the former. This is slightly
misleading and potentially unhelpful. The number of arguments N can
also easily go out of sync with the alias callN used. There is no
reason not to use 'calln' directly instead.
To reduce the risk of mistakes, the tool Coccinelle was used to make
these changes. See <https://coccinelle.gitlabpages.inria.fr/website/>.
* src/alloc.c, src/androidvfs.c, src/androidfns.c, src/buffer.c:
* src/callint.c, src/callproc.c, src/casefiddle.c, src/charset.c:
* src/chartab.c, src/cmds.c, src/coding.c, src/composite.c:
* src/data.c, src/dbusbind.c, src/dired.c, src/doc.c:
* src/emacs.c, src/eval.c, src/fileio.c, src/filelock.c:
* src/fns.c, src/frame.c, src/gtkutil.c, src/haikufns.c:
* src/haikumenu.c, src/image.c, src/insdel.c, src/intervals.c:
* src/keyboard.c, src/keymap.c, src/lisp.h, src/lread.c:
* src/minibuf.c, src/nsfns.m, src/nsselect.m, src/pgtkfns.c:
* src/pgtkselect.c, src/print.c, src/process.c, src/sort.c:
* src/syntax.c, src/textconv.c, src/textprop.c, src/undo.c:
* src/w32fns.c, src/window.c, src/xfaces.c, src/xfns.c:
* src/xmenu.c, src/xselect.c, src/xterm.c:
Replace all uses of 'call1', 'call2', ..., 'call8' with 'calln'.
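Why one variadic macro is enough can be sketched with a compound literal whose size yields the argument count. The sketch is modeled loosely on the technique behind Emacs's CALLN. The `Lisp_Object` typedef and `toy_funcall` are hypothetical stand-ins for the real object type and Ffuncall.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Lisp_Object.  */
typedef long Lisp_Object;

/* Hypothetical funcall: ARGS[0] is the function and the rest are its
   arguments.  For illustration it returns the argument count.  */
static ptrdiff_t
toy_funcall (ptrdiff_t nargs, Lisp_Object *args)
{
  (void) args;
  return nargs - 1;
}

/* Build an array from the arguments and derive the count from its
   size, so the count cannot drift out of sync with the argument
   list the way a hand-picked call1..call8 could.  */
#define CALLN(f, ...)                                               \
  f (sizeof ((Lisp_Object[]) {__VA_ARGS__}) / sizeof (Lisp_Object), \
     (Lisp_Object[]) {__VA_ARGS__})
#define calln(...) CALLN (toy_funcall, __VA_ARGS__)
```

Here `calln (fn, a, b)` passes three objects, the function plus two arguments, and the callee sees two arguments.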
Since native subrs can have either etc/DOC indexes or vector indexes,
we use the sign bit of the 'doc' field to distinguish the two cases.
* src/comp.c (native_function_doc, make_subr): Use one's complement of
doc index for native subrs.
* src/doc.c (store_function_docstring): Add assertion.
* src/lisp.h (struct Lisp_Subr): Document 'doc' sign bit.
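The sign-bit encoding can be sketched with hypothetical helpers. Non-negative values are etc/DOC indexes, and a native subr's vector index I is stored as ~I. The real 'doc' field's type and accessors are not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Store a native subr's vector index as its one's complement,
   which is negative for every index >= 0.  */
static int
encode_native_doc (int vector_index)
{
  assert (vector_index >= 0);
  return ~vector_index;
}

/* A negative value marks a vector index; otherwise it is an
   etc/DOC index.  */
static bool
doc_is_native (int doc)
{
  return doc < 0;
}

static int
decode_native_doc (int doc)
{
  assert (doc < 0);
  return ~doc;
}
```

One's complement is used rather than negation because ~0 is -1, whereas -0 would be indistinguishable from etc/DOC index 0.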
* src/lisp.h (define_error): Move declaration to its proper place, make
external, and move its docstring...
* src/eval.c (define_error): ...to its function definition.
This is brittle, as evinced by the recent problem with lib/stdlib.c.
* src/conf_post.h: Move potential inclusion of stdlib.h and
redefinitions of malloc, free, realloc, aligned_alloc, and calloc
from here ...
* src/lisp.h: ... to here. Do not redefine the symbols
if UNEXMACOS_C is defined.
* src/unexmacosx.c: Do not undef malloc, realloc, free.
(UNEXMACOS_C): New symbol, to prevent lisp.h from defining them.
Outside compilation 'symbols_with_pos_enabled' is always false, so ask
the compiler to organize the most likely execution path in a sequential
fashion in order to favor run-time performance.
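A minimal sketch of such a hint, assuming GCC or Clang's `__builtin_expect`. The message above does not say which mechanism the commit uses, and `toy_eq` and `bare_symbol` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag standing in for symbols_with_pos_enabled.  */
static bool symbols_with_pos_enabled;

/* Toy objects: the low 16 bits identify a symbol; higher bits hold
   a position that only matters during compilation.  */
static long
bare_symbol (long obj)
{
  return obj & 0xffff;
}

static bool
toy_eq (long a, long b)
{
  /* Mark the flag as usually false, so the compiler can lay out the
     fast path as the fall-through and move the rare branch out of
     line.  */
  if (__builtin_expect (symbols_with_pos_enabled, false))
    return bare_symbol (a) == bare_symbol (b);
  return a == b;
}
```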
* src/lisp.h: Remove comment about ‘volatile’ that was mistakenly
left behind when 2013-10-03T04:58:56!monnier@iro.umontreal.ca
(adf2aa6140) removed all the volatile
members of struct handler.
6a512ab032 Fix a typo in Eglot manual
7b752a93a4 Fix dumping of Lisp profiles
bfe07eca59 Fix 'apropos-library' for 'define-symbol-props'
5c1bd99139 Fix 'forward-comment' in 'toml-ts-mode'
e966dd5ee2 Document spell-checking of multiple languages
8a072d1f05 Apply --display kluge for PGTK too
This makes comparison functions (=, /=, <, <=, >, >=, min, max)
roughly 10-20% faster. Bytecode ops on fixnums are not affected,
nor is `value<`.
* src/data.c (arithcompare): Simplify the code to reduce the number of
branches. Remove the comparison code argument; instead, return the
relation encoded as bits, which can be tested cheaply. All callers
adapted.
* src/lisp.h (enum Arith_Comparison): Remove.
(Cmp_Bit_*, cmp_bits_t): New.
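The bit-encoding idea can be sketched for doubles. The names Cmp_Bit_* and cmp_bits_t come from the entry above, but their values here, and the functions, are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit assignments.  */
enum { Cmp_Bit_EQ = 1, Cmp_Bit_LT = 2, Cmp_Bit_GT = 4 };
typedef int cmp_bits_t;

/* Compute every relation at once.  Each C comparison yields 0 or 1,
   so the bits are combined without conditional statements.  With a
   NaN operand all three comparisons are false and the result is 0.  */
static cmp_bits_t
compare_doubles (double a, double b)
{
  return ((a == b) * Cmp_Bit_EQ
          | (a < b) * Cmp_Bit_LT
          | (a > b) * Cmp_Bit_GT);
}

/* Each Lisp-level predicate becomes a cheap mask test.  */
static bool num_eq (double a, double b)
{ return compare_doubles (a, b) & Cmp_Bit_EQ; }
static bool num_ne (double a, double b)
{ return !(compare_doubles (a, b) & Cmp_Bit_EQ); }
static bool num_lt (double a, double b)
{ return compare_doubles (a, b) & Cmp_Bit_LT; }
static bool num_le (double a, double b)
{ return compare_doubles (a, b) & (Cmp_Bit_LT | Cmp_Bit_EQ); }
```

Encoding "unordered" as no bits set gives IEEE semantics for free: every predicate except /= is false when a NaN is involved.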
* src/alloc.c (make_clear_bool_vector): It’s now the caller’s
responsibility to make sure the bool vector length is in range.
Add an eassert to double-check this. This lets some locals be
ptrdiff_t not EMACS_INT.
(Fmake_bool_vector, Fbool_vector):
Check that bool vector lengths are in range.
* src/lisp.h (BOOL_VECTOR_LENGTH_MAX): New macro.
(bool_vector_words, bool_vector_bytes): Avoid undefined
behavior if size == EMACS_INT_MAX - (BITS_PER_BITS_WORD - 1).
This is mostly theoretical but it’s easy to do it right.
* src/lread.c (read_bool_vector): Use EMACS_INT, not just ptrdiff_t.
Check that length doesn’t exceed BOOL_VECTOR_LENGTH_MAX.
This fixes an unlikely integer overflow where the calculated size
went negative.
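One overflow-free way to round a bit count up to whole words is shown below as a hypothetical `words_for_bits`. Whether bool_vector_words uses exactly this expression is not stated above.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical word width in bits.  */
enum { WORD_BITS = 64 };

/* Number of words needed for NBITS bits, rounding up.  The textbook
   (NBITS + WORD_BITS - 1) / WORD_BITS can overflow when NBITS is near
   the type's maximum; dividing first and then adding 1 for a partial
   word cannot.  */
static long long
words_for_bits (long long nbits)
{
  assert (nbits >= 0);
  return nbits / WORD_BITS + (nbits % WORD_BITS != 0);
}
```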
Although loading uninitialized words from memory and then ignoring
the result works fine on conventional architectures, it
technically has undefined behavior in C, so redo bool_vector
allocation so that the code never does that. This can improve
performance when allocating large vectors of nil, since calloc can
clear the memory lazily.
* src/alloc.c (make_clear_bool_vector): New function,
a generalization of make_uninit_bool_vector.
(make_uninit_bool_vector): Use it.
(Fmake_bool_vector): If !INIT, rely on make_clear_bool_vector.
* src/alloc.c (Fbool_vector):
* src/fns.c (Freverse): Don’t access uninitialized bool_vector words.
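The allocation choice can be sketched with a hypothetical bool-vector type. `make_clear_bool_vec` and `struct bool_vec` are illustrative only. Zeroing the final word in the uninitialized case is this sketch's own choice for keeping padding bits defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef size_t bits_word;  /* Hypothetical word type.  */
enum { BITS_PER_WORD = sizeof (bits_word) * CHAR_BIT };

/* Hypothetical bool-vector representation.  */
struct bool_vec
{
  size_t nbits;
  bits_word *data;
};

/* Allocate storage for NBITS bits.  With CLEAR, use calloc, which can
   hand back memory that is already zero without writing to it.
   Without CLEAR, the caller promises to fill every bit, but the last
   word is still zeroed so its unused padding bits never hold garbage
   that later code might read.  */
static bool
make_clear_bool_vec (struct bool_vec *v, size_t nbits, bool clear)
{
  size_t nwords = nbits / BITS_PER_WORD + (nbits % BITS_PER_WORD != 0);
  size_t alloc = nwords ? nwords : 1;  /* Avoid zero-size requests.  */
  v->nbits = nbits;
  v->data = (clear
             ? calloc (alloc, sizeof *v->data)
             : malloc (alloc * sizeof *v->data));
  if (!v->data)
    return false;
  if (!clear && nwords > 0)
    v->data[nwords - 1] = 0;
  return true;
}
```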
Like locale.h, it was standardized by C89, is universally
available now, and some code already assumes it.
* configure.ac: Do not check for setlocale.
* src/emacs.c (setlocale) [!HAVE_SETLOCALE]: Remove.
(fixup_locale, synchronize_locale, Vprevious_system_time_locale)
(synchronize_system_time_locale): Define even if !HAVE_SETLOCALE.
* src/sysdep.c (emacs_setlocale): Simplify by assuming HAVE_SETLOCALE.
* src/coding.c (setup_coding_system): Initialize the new
insert_before_markers field.
(produce_chars, encode_coding, decode_coding_gap):
Obey it in insert_from_gap calls.
(encode_string_utf_8, decode_string_utf_8): Update the other calls
to insert_from_gap to have one new argument (false).
* src/coding.h: New field insert_before_markers.
* src/decompress.c (Fzlib_decompress_region): Here too.
* src/insdel.c (insert_from_gap):
Accept new argument BEFORE_MARKERS (bug#71525) and pass it through
to adjust_markers_for_insert.
* src/lisp.h: Update prototype.
* src/process.c (read_and_insert_process_output):
Set process_coding->insert_before_markers instead of calling
adjust_markers_for_insert.
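The following is a simplified model of the marker rule that BEFORE_MARKERS feeds into, assuming the usual Emacs semantics. Markers after the insertion point always move. A marker exactly at the insertion point moves only if it is an advancing (insertion-type) marker or the text is inserted before markers. `struct marker` and `adjust_markers` are hypothetical. The real adjust_markers_for_insert also tracks byte positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker: a character position and its insertion
   type.  */
struct marker
{
  ptrdiff_t pos;
  bool insertion_type;  /* True if the marker advances on insertion.  */
};

/* Adjust N markers after LEN characters are inserted at FROM.  */
static void
adjust_markers (struct marker *m, size_t n, ptrdiff_t from,
                ptrdiff_t len, bool before_markers)
{
  for (size_t i = 0; i < n; i++)
    if (m[i].pos > from
        || (m[i].pos == from && (before_markers || m[i].insertion_type)))
      m[i].pos += len;
}
```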