mirror of git://git.sv.gnu.org/emacs.git synced 2026-02-25 17:31:04 -08:00

Resolve FIXME's in tree-sitter manual sections

Pattern vs query: a query consists of many patterns.  I tightened up
the use of pattern vs query in the manual, now there shouldn't be
ambiguities.

* doc/lispref/modes.texi (Parser-based Font Lock):
* doc/lispref/parsing.texi (Language Definitions): Resolve FIXME's.
Yuan Fu 2022-10-22 18:44:11 -07:00
parent 6f28810f6b
commit 773cce640f
GPG key ID: 56E19BC57664A442
2 changed files with 103 additions and 110 deletions


@ -3904,10 +3904,17 @@ variables with regexp-based font lock, it uses similar customization
schemes. The tree-sitter counterpart of @var{font-lock-keywords} is
@var{treesit-font-lock-settings}.
@c FIXME: The ``query'' part here and thereafter comes ``out of the
@c blue''. There should be some text here explaining what those
@c ``queries'' are and how are they related to fontifications, or a
@c cross-reference to another place with such an explanation.
In general, tree-sitter fontification works like the following: a Lisp
program provides a @dfn{query} consisting of @dfn{patterns} with
@dfn{capture names}. Tree-sitter finds the nodes in the parse tree
that match these patterns, tags the corresponding capture names onto
the nodes, and returns them to the Lisp program. The Lisp program
takes these nodes and highlights the corresponding buffer text of
each node depending on the tagged capture name of the node. For
example, a node tagged @code{font-lock-keyword} would simply be
highlighted in @code{font-lock-keyword} face. For more information on
queries, patterns and capture names, see @ref{Pattern Matching}.
@defun treesit-font-lock-rules :keyword value query...
This function is used to set @var{treesit-font-lock-settings}. It
takes care of compiling queries and other post-processing, and outputs
@ -3948,9 +3955,10 @@ Other keywords are optional:
@item @tab @code{keep} @tab Fill-in regions without an existing face
@end multitable
@c FIXME: The ``capture names'' part should be explained before it is
@c first used: what it is and how it's related to fontifications.
Capture names in @var{query} should be face names like
Lisp programs mark patterns in the query with capture names (names
that start with @code{@@}), and tree-sitter will return matched nodes
with capture names tagged onto them. For the purpose of
fontification, capture names in @var{query} should be face names like
@code{font-lock-keyword-face}. The captured node will be fontified
with that face. Capture names can also be function names, in which
case the function is called with 3 arguments: @var{start}, @var{end},
@ -3966,9 +3974,8 @@ is a list that represents a decoration level.
@code{font-lock-maximum-decoration} controls which levels are
activated.
@c FIXME: This should be rewritten using our style: ``each element of
@c the list is a list of the form (FOO BAR BAZ), where FOO...'' etc.
Inside each sublist are feature symbols, which correspond to the
Each element of the list is a list of the form @w{@code{(@var{feature}
@dots{})}}, where each @var{feature} corresponds to the
@code{:feature} value of a query defined in
@code{treesit-font-lock-rules}. Removing a feature symbol from this
list disables the corresponding query during font-lock.
@ -3992,40 +3999,18 @@ For example, the value of this variable could be:
Major modes should set this variable before calling
@code{treesit-font-lock-enable}.
@c FIXME: ``for further changes''? This should clarify when this
@c function has to be called.
@findex treesit-font-lock-recompute-features
In addition, for further changes to this variable to take effect, call
@code{treesit-font-lock-recompute-features}.
For this variable to take effect, a Lisp program should call
@code{treesit-font-lock-recompute-features} (which resets
@code{treesit-font-lock-settings} accordingly).
@end defvar
@defvar treesit-font-lock-settings
A list of settings for tree-sitter based font lock. The exact format
of this variable is considered internal. One should always use
@code{treesit-font-lock-rules} to set this variable.
@c FIXME: If the format is considered ``internal'', why do we need to
@c describe it here?
Each @var{setting} is of form
@example
(@var{query} @var{enable} @var{feature} @var{override})
@end example
@var{query} must be a compiled query (@pxref{Pattern Matching}).
For @var{setting} to be activated for font-lock, @var{enable} must be
@code{t}. To disable this @var{setting}, set @var{enable} to
@code{nil}.
@var{feature} is the ``feature name'' of the query, users can control
which features are enabled with @code{font-lock-maximum-decoration}
and @code{treesit-font-lock-feature-list}.
@var{override} is the override flag for this query. Its value can be
@code{t}, @code{nil}, @code{append}, @code{prepend}, or @code{keep}.
@c FIXME: See where?
See more in @code{treesit-font-lock-rules}.
@c Because the format is internal, we don't document them here.
@c Though we do have explanations in the docstring.
@end defvar
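As a rough illustration of how these variables fit together, a major
mode for C might set up fontification along the following lines. This
is only a sketch: it assumes that a tree-sitter grammar for C is
installed and that its node types include @code{comment}; the feature
names @code{comment} and @code{keyword} are arbitrary choices made for
this example.

@example
@group
;; Each query is tagged with a language and a feature name; the
;; capture names are face names.
(setq-local treesit-font-lock-settings
            (treesit-font-lock-rules
             :language 'c
             :feature 'comment
             '((comment) @@font-lock-comment-face)
             :language 'c
             :feature 'keyword
             '(["return" "if" "else"] @@font-lock-keyword-face)))
;; One sublist per decoration level; each lists feature names.
(setq-local treesit-font-lock-feature-list
            '((comment) (keyword)))
@end group
@end example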
Multi-language major modes should provide range functions in
@ -4790,27 +4775,26 @@ a list of the form: @w{@code{(@var{language} . @var{rules})}}, where
@var{language} is a language symbol, and @var{rules} is a list of the
form @w{@code{(@var{matcher} @var{anchor} @var{offset})}}.
@c FIXME: ``node''?
First, Emacs passes the node at point to @var{matcher}; if it returns
non-@code{nil}, this rule is applicable. Then Emacs passes the node
to @var{anchor}, which returns a buffer position. Emacs takes the
column number of that position, adds @var{offset} to it, and the
result is the indentation column for the current line.
First, Emacs passes the smallest tree-sitter node at the beginning of
the current line to @var{matcher}; if it returns non-@code{nil}, this
rule is applicable. Then Emacs passes the node to @var{anchor}, which
returns a buffer position. Emacs takes the column number of that
position, adds @var{offset} to it, and the result is the indentation
column for the current line.
The @var{matcher} and @var{anchor} are functions, and Emacs provides
convenient defaults for them.
@c FIXME: Clarify the following description. In particular, how to
@c find/compute ``the largest node'' and its ``parent''?
Each @var{matcher} or @var{anchor} is a function that takes three
arguments: @var{node}, @var{parent}, and @var{bol}. The argument
@var{bol} is the buffer position whose indentation is required: the
position of the first non-whitespace character after the beginning of
the line. The argument @var{node} is the largest (highest-in-tree)
node that starts at that position; and @var{parent} is the parent of
@var{node}. @var{matcher} should return non-@code{nil} if the rule is
applicable, and @var{anchor} should return a buffer position that is
the basis of the indentation.
@var{node}. Emacs finds @var{bol}, @var{node} and @var{parent} and
passes them to each @var{matcher} and @var{anchor}. @var{matcher}
should return non-@code{nil} if the rule is applicable, and
@var{anchor} should return a buffer position.
@end defvar
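To make the calling convention concrete, here is a minimal sketch of a
hand-written matcher and anchor. The function names, the language
symbol @code{my-lang} and the node type @code{"block"} are all
hypothetical and serve only as an illustration; the trailing
@code{&rest _} simply tolerates extra arguments.

@example
@group
;; Hypothetical matcher: applies when NODE exists and the type of
;; PARENT is "block" (an assumed node type).
(defun my-parent-is-block-p (node parent _bol &rest _)
  (and node parent
       (equal (treesit-node-type parent) "block")))

;; Hypothetical anchor: the start position of PARENT.
(defun my-parent-start (_node parent _bol &rest _)
  (treesit-node-start parent))

;; `my-lang' is a placeholder language symbol.
(setq-local treesit-simple-indent-rules
            '((my-lang
               (my-parent-is-block-p my-parent-start 2))))
@end group
@end example

With this rule, a line whose node has a @code{"block"} parent would be
indented 2 columns past the column where that parent starts.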
@defvar treesit-simple-indent-presets
@ -4821,63 +4805,69 @@ available default functions are:
@ftable @code
@item no-node
This matcher is a symbol that matches the case where @var{node} is
This matcher is a function that matches the case where @var{node} is
@code{nil}, i.e., there is no node that starts at @var{bol}. This is
the case when @var{bol} is on an empty line or inside a multi-line
string, etc.
@item parent-is
This matcher is a function of one argument, @var{type}; it matches if
the type of the parent node is @var{type}.
This matcher is a function of one argument, @var{type}; it returns a
function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
matches if @var{parent}'s type is @var{type}.
@item node-is
This matcher is a function of one argument, @var{type}; it matches if
the node's type is @var{type}.
This matcher is a function of one argument, @var{type}; it returns a
function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
matches if @var{node}'s type is @var{type}.
@c FIXME: The description of this matcher is unclear. What is
@c ``parent'' and what does it mean ``captures NODE''?
@item query
This matcher is a function of one argument, @var{query}; it matches if
querying @var{parent} with @var{query} captures @var{node}. The
capture name does not matter. @c Why is this bit important?
This matcher is a function of one argument, @var{query}; it returns a
function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
matches if querying @var{parent} with @var{query} captures @var{node}
(@pxref{Pattern Matching}).
@item match
This matcher is a function of 5 arguments: @var{node-type},
@var{parent-type}, @var{node-field}, @var{node-index-min}, and
@var{node-index-max}. It matches if @var{node}'s type is @var{node-type},
@var{parent}'s type is @var{parent-type}, @var{node}'s field name in
@var{parent} is @var{node-field}, and @var{node}'s index among its
siblings is between @var{node-index-min} and @var{node-index-max}. If
@c FIXME: ``constraint''?
the value of a constraint is nil, this matcher doesn't check for that
constraint. For example, to match the first child where parent is
@var{node-index-max}. It returns a function that given
@w{@code{(@var{node} @var{parent} @var{bol})}}, matches if
@var{node}'s type is @var{node-type}, @var{parent}'s type is
@var{parent-type}, @var{node}'s field name in @var{parent} is
@var{node-field}, and @var{node}'s index among its siblings is between
@var{node-index-min} and @var{node-index-max}. If the value of an
argument is @code{nil}, this matcher doesn't check for that argument.
For example, to match the first child where parent is
@code{argument_list}, use
@example
(match nil "argument_list" nil 0 0)
@end example
@c FIXME: ``PARENT''? is that an argument of the anchor function
@item first-sibling
This anchor returns the start of the first child of @var{parent}.
This anchor is a function that given @w{@code{(@var{node} @var{parent}
@var{bol})}}, returns the start of the first child of @var{parent}.
@item parent
This anchor returns the start of @var{parent}. @c FIXME: Likewise.
This anchor is a function that given @w{@code{(@var{node} @var{parent}
@var{bol})}}, returns the start of @var{parent}.
@item parent-bol
This anchor returns the first non-space character on the line of
This anchor is a function that given @w{@code{(@var{node} @var{parent}
@var{bol})}}, returns the first non-space character on the line of
@var{parent}.
@c FIXME: ``NODE''?
@item prev-sibling
This anchor returns the start of the previous sibling of @var{node}.
This anchor is a function that given @w{@code{(@var{node} @var{parent}
@var{bol})}}, returns the start of the previous sibling of @var{node}.
@item no-indent
This anchor returns the start of @var{node}, i.e., no indent. @c ???
This anchor is a function that given @w{@code{(@var{node} @var{parent}
@var{bol})}}, returns the start of @var{node}.
@item prev-line
This anchor returns the first non-whitespace character on the previous
line.
This anchor is a function that given @w{@code{(@var{node} @var{parent}
@var{bol})}}, returns the first non-whitespace character on the
previous line.
@end ftable
@end defvar
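The presets above can be combined into rules as in the following
sketch. It assumes a C grammar whose node types include
@code{compound_statement} and @code{argument_list}, and the offsets
are arbitrary values chosen for illustration.

@example
@group
(setq-local treesit-simple-indent-rules
            '((c
               ;; A closing brace lines up with its parent's line.
               ((node-is "@}") parent-bol 0)
               ;; Statements in a block are indented by 2 columns.
               ((parent-is "compound_statement") parent-bol 2)
               ;; The first child of an argument list goes 1
               ;; column after the start of the list.
               ((match nil "argument_list" nil 0 0) parent 1)
               ;; Lines without a node follow the previous line.
               (no-node prev-line 0))))
@end group
@end example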


@ -95,7 +95,7 @@ This means Emacs could not find the language definition library.
@item (symbol-error @var{error-msg})
This means Emacs could not find in the library the expected function
that every language definition library should export.
@item (version_mismatch @var{error-msg})
@item (version-mismatch @var{error-msg})
This means the version of the language definition library is incompatible
with that of the tree-sitter library.
@end table
@ -253,7 +253,7 @@ syntax tree effectively, you need to consult the @dfn{grammar file}.
The grammar file is usually @file{grammar.js} in a language
definition's project repository. The link to a language definition's
home page can be found on
@uref{https://tree-sitter.github.io/tree-sitter, the tree-sitter's
@uref{https://tree-sitter.github.io/tree-sitter, tree-sitter's
homepage}.
The grammar definition is written in JavaScript. For example, the
@ -405,11 +405,11 @@ returns non-@code{nil} if it is, @code{nil} otherwise.
@end defun
There is no need to explicitly parse a buffer, because parsing is done
automatically and lazily. A parser only parses when the mode queries
for a node in its syntax tree. Therefore, when a parser is first
created, it doesn't parse the buffer; it waits until the mode queries
for a node for the first time. Similarly, when some change is made in
the buffer, a parser doesn't re-parse immediately.
automatically and lazily. A parser only parses when a Lisp program
queries for a node in its syntax tree. Therefore, when a parser is
first created, it doesn't parse the buffer; it waits until the Lisp
program queries for a node for the first time. Similarly, when some
change is made in the buffer, a parser doesn't re-parse immediately.
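The following sketch illustrates this lazy behavior. It assumes that
a C grammar is installed and that @code{treesit-parser-create} accepts
the language symbol as its first argument.

@example
@group
;; Creating the parser does not parse the buffer yet.
(let ((parser (treesit-parser-create 'c)))
  ;; Asking for the root node is what triggers the first parse.
  (treesit-parser-root-node parser))
@end group
@end example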
@vindex treesit-buffer-too-large
When a parser does parse, it checks for the size of the buffer.
@ -510,7 +510,7 @@ Example:
@group
;; Find the node at point in a C parser's syntax tree.
(treesit-node-at (point) 'c)
@result{} #<treesit-node from 1 to 4 in *scratch*>
@result{} #<treesit-node (primitive_type) in *scratch*>
@end group
@end example
@end defun
@ -606,7 +606,7 @@ This function finds the child of @var{node} whose field name is
@group
;; Get the child that has "body" as its field name.
(treesit-child-by-field-name node "body")
@result{} #<treesit-node from 3 to 11 in *scratch*>
@result{} #<treesit-node (compound_statement) in *scratch*>
@end group
@end example
@end defun
@ -644,20 +644,24 @@ does.
By default, this function only traverses named nodes, but if @var{all}
is non-@code{nil}, it traverses all the nodes. If @var{backward} is
@c FIXME: What does it mean to ``traverse backward''?
non-@code{nil}, it traverses backwards. If @var{limit} is non-@code{nil}, it
non-@code{nil}, it traverses backwards (meaning it visits the last child first
when traversing down the tree). If @var{limit} is non-@code{nil}, it
must be a number that limits the tree traversal to that many levels
down the tree.
@end defun
@defun treesit-search-forward start predicate &optional all backward up
@c FIXME: Explain better what is the difference between this function
@c and the previous one.
This function is somewhat similar to @code{treesit-search-subtree}.
It also traverses the parse tree and matches each node with
@var{predicate} (except for @var{start}), where @var{predicate} can be
a (case-insensitive) regexp or a function. For a tree like the below
where @var{start} is marked 1, this function traverses as numbered:
While @code{treesit-search-subtree} traverses the subtree of a node,
this function usually starts with a leaf node and traverses every node
that comes after it in terms of buffer position. It is useful for
answering questions like ``what is the first node after @var{start} in
the buffer that satisfies some condition?''
Like @code{treesit-search-subtree}, this function also traverses the
parse tree and matches each node with @var{predicate} (except for
@var{start}), where @var{predicate} can be a (case-insensitive) regexp
or a function. For a tree like the one below, where @var{start} is
marked 1, this function traverses as numbered:
@example
@group
@ -830,7 +834,7 @@ is not yet in its final form.
@cindex tree-sitter extra node
@cindex extra node, tree-sitter
A node can be ``extra'': extra nodes represent things like comments,
A node can be ``extra'': such nodes represent things like comments,
which can appear anywhere in the text.
@cindex tree-sitter node that has changes
@ -1007,9 +1011,9 @@ root node with @var{query}, and returns the result.
@heading More query syntax
Besides node type and capture, tree-sitter's query syntax can express
anonymous node, field name, wildcard, quantification, grouping,
alternation, anchor, and predicate.
Besides node type and capture, tree-sitter's pattern syntax can
express anonymous node, field name, wildcard, quantification,
grouping, alternation, anchor, and predicate.
@subheading Anonymous node
@ -1022,9 +1026,9 @@ pattern matching (and capturing) keyword @code{return} would be
@subheading Wild card
In a query pattern, @samp{(_)} matches any named node, and @samp{_}
matches any named or anonymous node. For example, to capture any
named child of a @code{binary_expression} node, the pattern would be
In a pattern, @samp{(_)} matches any named node, and @samp{_} matches
any named or anonymous node. For example, to capture any named child
of a @code{binary_expression} node, the pattern would be
@example
(binary_expression (_) @@in_biexp)
@ -1032,10 +1036,10 @@ named child of a @code{binary_expression} node, the pattern would be
@subheading Field name
It is possible to capture child nodes that have specific field names:
It is possible to capture child nodes that have specific field names.
In the pattern below, @code{declarator} and @code{body} are field
names, indicated by the colon following them.
@c FIXME: The significance of ``:'' should be explained, and also what
@c are ``declarator'' and ``body''.
@example
@group
(function_definition
@ -1059,7 +1063,6 @@ Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and
@samp{*} matches the preceding pattern zero or more times, @samp{+}
matches one or more times, and @samp{?} matches zero or one time.
@c FIXME: ``pattern'' or :''query''? Or maybe ``query pattern''?
For example, the following pattern matches @code{type_declaration}
nodes that have @emph{zero or more} @code{long} keywords.
@ -1087,9 +1090,9 @@ express a comma separated list of identifiers, one could write
@subheading Alternation
Again, similar to regular expressions, we can express ``match any one
from this group of patterns'' in the query pattern. The syntax is a
list of patterns enclosed in square brackets. For example, to capture
some keywords in C, the query pattern would be
from this group of patterns'' in a pattern. The syntax is a list of
patterns enclosed in square brackets. For example, to capture some
keywords in C, the pattern would be
@example
@group
@ -1136,7 +1139,7 @@ nodes.
@subheading Predicate
It is possible to add predicate constraints to a pattern. For
example, with the following query pattern:
example, with the following pattern:
@example
@group
@ -1170,11 +1173,11 @@ names in other patterns.
@heading S-expression patterns
@cindex query patterns as sexps
@cindex patterns as sexps
@cindex patterns, tree-sitter, in sexp form
Besides strings, Emacs provides an s-expression based syntax for query
Besides strings, Emacs provides an s-expression based syntax for
patterns. It largely resembles the string-based syntax. For example,
the following pattern
the following query
@example
@group