mirror of git://git.sv.gnu.org/emacs.git synced 2026-01-30 04:10:54 -08:00

Add tree-sitter admin notes

starter-guide: Guide on writing major mode features.
build-module: Script for building official language definitions.
html-manual: HTML version of the manual for easy access.

* admin/notes/tree-sitter/build-module/README: New file.
* admin/notes/tree-sitter/build-module/batch.sh: New file.
* admin/notes/tree-sitter/build-module/build.sh: New file.
* admin/notes/tree-sitter/starter-guide: New file.
* admin/notes/tree-sitter/html-manual/Accessing-Node.html: New file.
* admin/notes/tree-sitter/html-manual/Language-Definitions.html: New file.
* admin/notes/tree-sitter/html-manual/Multiple-Languages.html: New file.
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html:
New file.
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html:
New file.
* admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html: New
file.
* admin/notes/tree-sitter/html-manual/Pattern-Matching.html: New file.
* admin/notes/tree-sitter/html-manual/Retrieving-Node.html: New file.
* admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html: New
file.
* admin/notes/tree-sitter/html-manual/Using-Parser.html: New file.
* admin/notes/tree-sitter/html-manual/build-manual.sh: New file.
* admin/notes/tree-sitter/html-manual/manual.css: New file.
Yuan Fu 2022-10-05 14:11:33 -07:00
parent 1ea503ed4b
commit cb183f6467
No known key found for this signature in database
GPG key ID: 56E19BC57664A442
16 changed files with 3444 additions and 0 deletions


@ -0,0 +1,17 @@
To build the language definition for a particular language, run
./build.sh <language>
e.g.,
./build.sh html
The dynamic module will be in the dist directory.
To build all modules at once, run
./batch.sh
This gives you C, JSON, Go, HTML, JavaScript, CSS, Python, TypeScript,
C#, C++, and Rust. More languages can be added to batch.sh, as long as
their repositories follow the standard directory structure.


@ -0,0 +1,20 @@
#!/bin/bash
languages=(
'c'
'cpp'
'css'
'c-sharp'
'go'
'html'
'javascript'
'json'
'python'
'rust'
'typescript'
)
for language in "${languages[@]}"
do
./build.sh "$language"
done


@ -0,0 +1,62 @@
#!/bin/bash
lang=$1
if [ "$(uname)" == "Darwin" ]
then
soext="dylib"
else
soext="so"
fi
echo "Building ${lang}"
# Retrieve sources.
git clone "https://github.com/tree-sitter/tree-sitter-${lang}.git" \
--depth 1 --quiet
if [ "${lang}" == "typescript" ]
then
lang="typescript/tsx"
fi
cp tree-sitter-lang.in "tree-sitter-${lang}/src"
cp emacs-module.h "tree-sitter-${lang}/src"
cp "tree-sitter-${lang}/grammar.js" "tree-sitter-${lang}/src"
cd "tree-sitter-${lang}/src"
if [ "${lang}" == "typescript/tsx" ]
then
lang="typescript"
fi
# Build.
cc -fPIC -c -I. parser.c
# Compile scanner.c.
if test -f scanner.c
then
cc -fPIC -c -I. scanner.c
fi
# Compile scanner.cc.
if test -f scanner.cc
then
c++ -fPIC -I. -c scanner.cc
fi
# Link.
if test -f scanner.cc
then
c++ -fPIC -shared *.o -o "libtree-sitter-${lang}.${soext}"
else
cc -fPIC -shared *.o -o "libtree-sitter-${lang}.${soext}"
fi
# Copy out.
if [ "${lang}" == "typescript" ]
then
cp "libtree-sitter-${lang}.${soext}" ..
cd ..
fi
mkdir -p ../../dist
cp "libtree-sitter-${lang}.${soext}" ../../dist
cd ../../
rm -rf "tree-sitter-${lang}"


@ -0,0 +1,206 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Accessing Node (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Accessing Node (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Accessing Node (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Pattern-Matching.html" rel="next" title="Pattern Matching">
<link href="Retrieving-Node.html" rel="prev" title="Retrieving Node">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="section" id="Accessing-Node">
<div class="header">
<p>
Next: <a href="Pattern-Matching.html" accesskey="n" rel="next">Pattern Matching Tree-sitter Nodes</a>, Previous: <a href="Retrieving-Node.html" accesskey="p" rel="prev">Retrieving Node</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Accessing-Node-Information"></span><h3 class="section">37.4 Accessing Node Information</h3>
<p>Before going further, make sure you have read the basic conventions
about tree-sitter nodes in the previous node.
</p>
<span id="Basic-information"></span><h3 class="heading">Basic information</h3>
<p>Every node is associated with a parser, and that parser is associated
with a buffer. The following functions let you retrieve them.
</p>
<dl class="def">
<dt id="index-treesit_002dnode_002dparser"><span class="category">Function: </span><span><strong>treesit-node-parser</strong> <em>node</em><a href='#index-treesit_002dnode_002dparser' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns <var>node</var>&rsquo;s associated parser.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dbuffer"><span class="category">Function: </span><span><strong>treesit-node-buffer</strong> <em>node</em><a href='#index-treesit_002dnode_002dbuffer' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns <var>node</var>&rsquo;s parser&rsquo;s associated buffer.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dlanguage"><span class="category">Function: </span><span><strong>treesit-node-language</strong> <em>node</em><a href='#index-treesit_002dnode_002dlanguage' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns <var>node</var>&rsquo;s parser&rsquo;s associated language.
</p></dd></dl>
<p>Each node represents a piece of text in the buffer. The functions below
find relevant information about that text.
</p>
<dl class="def">
<dt id="index-treesit_002dnode_002dstart"><span class="category">Function: </span><span><strong>treesit-node-start</strong> <em>node</em><a href='#index-treesit_002dnode_002dstart' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Return the start position of <var>node</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dend"><span class="category">Function: </span><span><strong>treesit-node-end</strong> <em>node</em><a href='#index-treesit_002dnode_002dend' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Return the end position of <var>node</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dtext"><span class="category">Function: </span><span><strong>treesit-node-text</strong> <em>node &amp;optional object</em><a href='#index-treesit_002dnode_002dtext' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the buffer text that <var>node</var> represents. (If <var>node</var> was
retrieved from parsing a string, it will be text from that string.)
</p></dd></dl>
<p>Here are some basic checks on tree-sitter nodes.
</p>
<dl class="def">
<dt id="index-treesit_002dnode_002dp"><span class="category">Function: </span><span><strong>treesit-node-p</strong> <em>object</em><a href='#index-treesit_002dnode_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Checks if <var>object</var> is a tree-sitter syntax node.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002deq"><span class="category">Function: </span><span><strong>treesit-node-eq</strong> <em>node1 node2</em><a href='#index-treesit_002dnode_002deq' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Checks if <var>node1</var> and <var>node2</var> are the same node in a syntax
tree.
</p></dd></dl>
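<p>As a rough sketch of how these functions fit together, the
hypothetical helper below (<code>my-treesit-describe-node</code> is an
illustrative name, not part of Emacs) gathers the basic information
about a node. It assumes <var>node</var> was retrieved beforehand.
</p>
<div class="example">
<pre class="example">(defun my-treesit-describe-node (node)
  &quot;Return a plist of basic information about NODE, or nil.&quot;
  ;; Reject anything that is not a tree-sitter node.
  (when (treesit-node-p node)
    (list :language (treesit-node-language node)
          :buffer (treesit-node-buffer node)
          :start (treesit-node-start node)
          :end (treesit-node-end node)
          :text (treesit-node-text node))))
</pre></div>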
<span id="Property-information"></span><h3 class="heading">Property information</h3>
<p>In general, nodes in a concrete syntax tree fall into two categories:
<em>named nodes</em> and <em>anonymous nodes</em>. Whether a node is named
or anonymous is determined by the language definition
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p>
<span id="index-tree_002dsitter-missing-node"></span>
<p>Apart from being named/anonymous, a node can have other properties. A
node can be &ldquo;missing&rdquo;: missing nodes are inserted by the parser in
order to recover from certain kinds of syntax errors, i.e., something
should probably be there according to the grammar, but is not there.
</p>
<span id="index-tree_002dsitter-extra-node"></span>
<p>A node can be &ldquo;extra&rdquo;: extra nodes represent things like comments,
which can appear anywhere in the text.
</p>
<span id="index-tree_002dsitter-node-that-has-changes"></span>
<p>A node &ldquo;has changes&rdquo; if the buffer has changed since the node was
retrieved, i.e., the node is outdated.
</p>
<span id="index-tree_002dsitter-node-that-has-error"></span>
<p>A node &ldquo;has error&rdquo; if the text it spans contains a syntax error. Either
the node itself has an error, or one of its descendants (children,
grandchildren, and so on) has an error.
</p>
<dl class="def">
<dt id="index-treesit_002dnode_002dcheck"><span class="category">Function: </span><span><strong>treesit-node-check</strong> <em>node property</em><a href='#index-treesit_002dnode_002dcheck' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function checks if <var>node</var> has <var>property</var>. <var>property</var>
can be <code>'named</code>, <code>'missing</code>, <code>'extra</code>,
<code>'has-changes</code>, or <code>'has-error</code>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dtype"><span class="category">Function: </span><span><strong>treesit-node-type</strong> <em>node</em><a href='#index-treesit_002dnode_002dtype' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Named nodes have &ldquo;types&rdquo; (see <a href="Language-Definitions.html#tree_002dsitter-node-type">node type</a>).
For example, a named node can be a <code>string_literal</code> node, where
<code>string_literal</code> is its type.
</p>
<p>This function returns <var>node</var>&rsquo;s type as a string.
</p></dd></dl>
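<p>The property check and the type lookup can be combined. The sketch
below is only an illustration: <code>my-treesit-clean-node-type</code> is
a made-up name, and <var>node</var> is assumed to have been retrieved
already.
</p>
<div class="example">
<pre class="example">(defun my-treesit-clean-node-type (node)
  &quot;Return the type of NODE if it is named and free of errors, else nil.&quot;
  (and (treesit-node-check node 'named)
       ;; Skip nodes whose text contains a syntax error.
       (not (treesit-node-check node 'has-error))
       (treesit-node-type node)))
</pre></div>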
<span id="Information-as-a-child-or-parent"></span><h3 class="heading">Information as a child or parent</h3>
<dl class="def">
<dt id="index-treesit_002dnode_002dindex"><span class="category">Function: </span><span><strong>treesit-node-index</strong> <em>node &amp;optional named</em><a href='#index-treesit_002dnode_002dindex' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the index of <var>node</var> as a child node of its
parent. If <var>named</var> is non-nil, it only counts named nodes
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dfield_002dname"><span class="category">Function: </span><span><strong>treesit-node-field-name</strong> <em>node</em><a href='#index-treesit_002dnode_002dfield_002dname' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>A child of a parent node could have a field name (see <a href="Language-Definitions.html#tree_002dsitter-node-field-name">field name</a>). This function returns the field name
of <var>node</var> as a child of its parent.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dfield_002dname_002dfor_002dchild"><span class="category">Function: </span><span><strong>treesit-node-field-name-for-child</strong> <em>node n</em><a href='#index-treesit_002dnode_002dfield_002dname_002dfor_002dchild' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the field name of the <var>n</var>&rsquo;th child of
<var>node</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dchild_002dcount"><span class="category">Function: </span><span><strong>treesit-child-count</strong> <em>node &amp;optional named</em><a href='#index-treesit_002dchild_002dcount' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the number of children of <var>node</var>. If
<var>named</var> is non-nil, it only counts named children (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p></dd></dl>
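<p>As an illustration of the child-related functions, the hypothetical
helper below (<code>my-treesit-child-label</code> is not part of Emacs)
describes where a node sits under its parent. It assumes <var>node</var>
has a parent.
</p>
<div class="example">
<pre class="example">(defun my-treesit-child-label (node)
  &quot;Return a string giving the field name and named index of NODE.&quot;
  (format &quot;%s[%s]&quot;
          ;; Not every child has a field name.
          (or (treesit-node-field-name node) &quot;-&quot;)
          ;; Count only named siblings when computing the index.
          (treesit-node-index node t)))
</pre></div>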
</div>
<hr>
<div class="header">
<p>
Next: <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>, Previous: <a href="Retrieving-Node.html">Retrieving Node</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>


@ -0,0 +1,326 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Language Definitions (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Language Definitions (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Language Definitions (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Using-Parser.html" rel="next" title="Using Parser">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="section" id="Language-Definitions">
<div class="header">
<p>
Next: <a href="Using-Parser.html" accesskey="n" rel="next">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Tree_002dsitter-Language-Definitions"></span><h3 class="section">37.1 Tree-sitter Language Definitions</h3>
<span id="Loading-a-language-definition"></span><h3 class="heading">Loading a language definition</h3>
<p>Tree-sitter relies on language definitions to parse text in that
language. In Emacs, a language definition is represented by a symbol.
For example, the C language definition is represented as <code>c</code>, and
<code>c</code> can be passed to tree-sitter functions as the <var>language</var>
argument.
</p>
<span id="index-treesit_002dextra_002dload_002dpath"></span>
<span id="index-treesit_002dload_002dlanguage_002derror"></span>
<span id="index-treesit_002dload_002dsuffixes"></span>
<p>Tree-sitter language definitions are distributed as dynamic libraries.
In order to use a language definition in Emacs, you need to make sure
that the dynamic library is installed on the system. Emacs looks for
language definitions under load paths in
<code>treesit-extra-load-path</code>, <code>user-emacs-directory</code>/tree-sitter,
and system default locations for dynamic libraries, in that order.
Emacs tries each extension in <code>treesit-load-suffixes</code>. If Emacs
cannot find the library or has a problem loading it, Emacs signals
<code>treesit-load-language-error</code>. The signal data is a list of
specific error messages.
</p>
<dl class="def">
<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function checks whether the dynamic library for <var>language</var> is
present on the system, and returns non-nil if it is.
</p></dd></dl>
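<p>The lookup described above might be exercised as sketched below.
The directory name is a made-up example, and the sketch assumes an
Emacs built with tree-sitter support.
</p>
<div class="example">
<pre class="example">;; Hypothetical directory that holds the dynamic libraries.
(add-to-list 'treesit-extra-load-path
             (expand-file-name &quot;tree-sitter&quot; &quot;~/.local/lib&quot;))

;; Report whether the C language definition can be found.
(message (if (treesit-language-available-p 'c)
             &quot;C language definition is available&quot;
           &quot;C language definition is missing&quot;))
</pre></div>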
<span id="index-treesit_002dload_002dname_002doverride_002dlist"></span>
<p>By convention, the dynamic library for <var>language</var> is
<code>libtree-sitter-<var>language</var>.<var>ext</var></code>, where <var>ext</var> is the
system-specific extension for dynamic libraries. Also by convention,
the function provided by that library is named
<code>tree_sitter_<var>language</var></code>. If a language definition doesn&rsquo;t
follow this convention, you should add an entry
</p>
<div class="example">
<pre class="example">(<var>language</var> <var>library-base-name</var> <var>function-name</var>)
</pre></div>
<p>to <code>treesit-load-name-override-list</code>, where
<var>library-base-name</var> is the base filename for the dynamic library
(conventionally <code>libtree-sitter-<var>language</var></code>), and
<var>function-name</var> is the function provided by the library
(conventionally <code>tree_sitter_<var>language</var></code>). For example,
</p>
<div class="example">
<pre class="example">(cool-lang &quot;libtree-sitter-coool&quot; &quot;tree_sitter_cooool&quot;)
</pre></div>
<p>for a language too cool to abide by conventions.
</p>
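<p>Such an entry might be registered from an init file as sketched
below, assuming an Emacs built with tree-sitter support;
<code>cool-lang</code> and its library names are the made-up example
from above.
</p>
<div class="example">
<pre class="example">;; Register the unconventional names for the hypothetical cool-lang.
(add-to-list 'treesit-load-name-override-list
             '(cool-lang &quot;libtree-sitter-coool&quot; &quot;tree_sitter_cooool&quot;))
</pre></div>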
<dl class="def">
<dt id="index-treesit_002dlanguage_002dversion"><span class="category">Function: </span><span><strong>treesit-language-version</strong> <em>&amp;optional min-compatible</em><a href='#index-treesit_002dlanguage_002dversion' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>The tree-sitter library has a <em>language version</em>; a language
definition&rsquo;s version needs to match this version to be compatible.
</p>
<p>This function returns the tree-sitter library&rsquo;s language version. If
<var>min-compatible</var> is non-nil, it returns the minimal compatible
version.
</p></dd></dl>
<span id="Concrete-syntax-tree"></span><h3 class="heading">Concrete syntax tree</h3>
<p>A syntax tree is what a parser generates. In a syntax tree, each node
represents a piece of text, and nodes are connected to each other by
parent-child relationships. For example, if the source text is
</p>
<div class="example">
<pre class="example">1 + 2
</pre></div>
<p>its syntax tree could be
</p>
<div class="example">
<pre class="example"> +--------------+
| root &quot;1 + 2&quot; |
+--------------+
|
+--------------------------------+
| expression &quot;1 + 2&quot; |
+--------------------------------+
| | |
+------------+ +--------------+ +------------+
| number &quot;1&quot; | | operator &quot;+&quot; | | number &quot;2&quot; |
+------------+ +--------------+ +------------+
</pre></div>
<p>We can also represent it as an s-expression:
</p>
<div class="example">
<pre class="example">(root (expression (number) (operator) (number)))
</pre></div>
<span id="Node-types"></span><h4 class="subheading">Node types</h4>
<span id="index-tree_002dsitter-node-type"></span>
<span id="tree_002dsitter-node-type"></span><span id="index-tree_002dsitter-named-node"></span>
<span id="tree_002dsitter-named-node"></span><span id="index-tree_002dsitter-anonymous-node"></span>
<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>, and
<code>operator</code> are the nodes&rsquo; <em>types</em>. However, not all nodes in a
syntax tree have a type. Nodes that don&rsquo;t are <em>anonymous nodes</em>,
and nodes with a type are <em>named nodes</em>. Anonymous nodes are
tokens with fixed spellings, including punctuation characters like the
bracket &lsquo;<samp>]</samp>&rsquo;, and keywords like <code>return</code>.
</p>
<span id="Field-names"></span><h4 class="subheading">Field names</h4>
<span id="index-tree_002dsitter-node-field-name"></span>
<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to
analyze, many language definitions assign <em>field names</em> to child
nodes. For example, a <code>function_definition</code> node could have a
<code>declarator</code> and a <code>body</code>:
</p>
<div class="example">
<pre class="example">(function_definition
declarator: (declaration)
body: (compound_statement))
</pre></div>
<dl class="def">
<dt id="index-treesit_002dinspect_002dmode"><span class="category">Command: </span><span><strong>treesit-inspect-mode</strong><a href='#index-treesit_002dinspect_002dmode' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This minor mode displays the node that <em>starts</em> at point in the
mode-line. The mode-line will display
</p>
<div class="example">
<pre class="example"><var>parent</var> <var>field-name</var>: (<var>child</var> (<var>grand-child</var> (...)))
</pre></div>
<p><var>child</var>, <var>grand-child</var>, <var>grand-grand-child</var>, etc., are
nodes that begin at point, and <var>parent</var> is the
parent of <var>child</var>.
</p>
<p>If there is no node that starts at point, i.e., point is in the middle
of a node, then the mode-line only displays the smallest node that
spans point, and its immediate parent.
</p>
<p>This minor mode doesn&rsquo;t create parsers on its own. It simply uses the
first parser in <code>(treesit-parser-list)</code> (see <a href="Using-Parser.html">Using Tree-sitter Parser</a>).
</p></dd></dl>
<span id="Reading-the-grammar-definition"></span><h3 class="heading">Reading the grammar definition</h3>
<p>Authors of language definitions define the <em>grammar</em> of a
language, and this grammar determines how a parser constructs a
concrete syntax tree out of the text. In order to use the syntax
tree effectively, we need to read the <em>grammar file</em>.
</p>
<p>The grammar file is usually <code>grammar.js</code> in a language
definition&rsquo;s project repository. The link to a language definition&rsquo;s
home page can be found on tree-sitter&rsquo;s homepage
(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>).
</p>
<p>The grammar is written in JavaScript syntax. For example, the rule
matching a <code>function_definition</code> node looks like
</p>
<div class="example">
<pre class="example">function_definition: $ =&gt; seq(
$.declaration_specifiers,
field('declarator', $.declaration),
field('body', $.compound_statement)
)
</pre></div>
<p>The rule is represented by a function that takes a single argument
<var>$</var>, representing the whole grammar. The function itself is
constructed by other functions: the <code>seq</code> function puts together a
sequence of children; the <code>field</code> function annotates a child with
a field name. If we write the above definition in BNF syntax, it
would look like
</p>
<div class="example">
<pre class="example">function_definition :=
&lt;declaration_specifiers&gt; &lt;declaration&gt; &lt;compound_statement&gt;
</pre></div>
<p>and the node returned by the parser would look like
</p>
<div class="example">
<pre class="example">(function_definition
(declaration_specifier)
declarator: (declaration)
body: (compound_statement))
</pre></div>
<p>Below is a list of functions that one will see in a grammar
definition. Each function takes other rules as arguments and returns
a new rule.
</p>
<ul>
<li> <code>seq(rule1, rule2, ...)</code> matches each rule one after another.
</li><li> <code>choice(rule1, rule2, ...)</code> matches one of the rules in its
arguments.
</li><li> <code>repeat(rule)</code> matches <var>rule</var> <em>zero or more</em> times.
This is like the &lsquo;<samp>*</samp>&rsquo; operator in regular expressions.
</li><li> <code>repeat1(rule)</code> matches <var>rule</var> <em>one or more</em> times.
This is like the &lsquo;<samp>+</samp>&rsquo; operator in regular expressions.
</li><li> <code>optional(rule)</code> matches <var>rule</var> <em>zero or one</em> time.
This is like the &lsquo;<samp>?</samp>&rsquo; operator in regular expressions.
</li><li> <code>field(name, rule)</code> assigns field name <var>name</var> to the child
node matched by <var>rule</var>.
</li><li> <code>alias(rule, alias)</code> makes nodes matched by <var>rule</var> appear as
<var>alias</var> in the syntax tree generated by the parser. For example,
<div class="example">
<pre class="example">alias(preprocessor_call_exp, call_expression)
</pre></div>
<p>makes any node matched by <code>preprocessor_call_exp</code> appear as
<code>call_expression</code>.
</p></li></ul>
<p>Below are grammar functions less interesting for a reader of a
language definition.
</p>
<ul>
<li> <code>token(rule)</code> marks <var>rule</var> to produce a single leaf node.
That is, instead of generating a parent node with individual child
nodes under it, everything is combined into a single leaf node.
</li><li> Normally, grammar rules ignore preceding whitespace;
<code>token.immediate(rule)</code> changes <var>rule</var> to match only when
there is no preceding whitespace.
</li><li> <code>prec(n, rule)</code> gives <var>rule</var> a level <var>n</var> precedence.
</li><li> <code>prec.left([n,] rule)</code> marks <var>rule</var> as left-associative,
optionally with level <var>n</var>.
</li><li> <code>prec.right([n,] rule)</code> marks <var>rule</var> as right-associative,
optionally with level <var>n</var>.
</li><li> <code>prec.dynamic(n, rule)</code> is like <code>prec</code>, but the precedence
is applied at runtime instead.
</li></ul>
<p>The tree-sitter project talks about writing a grammar in more detail:
<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">https://tree-sitter.github.io/tree-sitter/creating-parsers</a>.
See especially the &ldquo;The Grammar DSL&rdquo; section.
</p>
</div>
<hr>
<div class="header">
<p>
Next: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>


@ -0,0 +1,255 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Multiple Languages (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Multiple Languages (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Multiple Languages (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Tree_002dsitter-C-API.html" rel="next" title="Tree-sitter C API">
<link href="Pattern-Matching.html" rel="prev" title="Pattern Matching">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="section" id="Multiple-Languages">
<div class="header">
<p>
Next: <a href="Tree_002dsitter-C-API.html" accesskey="n" rel="next">Tree-sitter C API Correspondence</a>, Previous: <a href="Pattern-Matching.html" accesskey="p" rel="prev">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Parsing-Text-in-Multiple-Languages"></span><h3 class="section">37.6 Parsing Text in Multiple Languages</h3>
<p>Sometimes, source code written in one programming language contains
code in other languages; HTML with embedded CSS and JavaScript is one
example. In that case, we need to assign individual parsers to text
segments written in different languages. Traditionally this is achieved
by using narrowing. While tree-sitter works with narrowing (see <a href="Using-Parser.html#tree_002dsitter-narrowing">narrowing</a>), the recommended way is to set ranges in which
a parser will operate.
</p>
<dl class="def">
<dt id="index-treesit_002dparser_002dset_002dincluded_002dranges"><span class="category">Function: </span><span><strong>treesit-parser-set-included-ranges</strong> <em>parser ranges</em><a href='#index-treesit_002dparser_002dset_002dincluded_002dranges' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function sets the ranges of <var>parser</var> to <var>ranges</var>. After
that, <var>parser</var> only reads the text covered by those ranges.
<var>ranges</var> is a list of cons cells of the form <code>(<var>beg</var>
. <var>end</var>)</code>, one per range.
</p>
<p>The ranges in <var>ranges</var> must come in order and must not overlap. That
is, the following form must return non-nil:
</p>
<div class="example">
<pre class="example">(cl-loop for idx from 1 to (1- (length ranges))
         for prev = (nth (1- idx) ranges)
         for next = (nth idx ranges)
         always (&lt;= (car prev) (cdr prev)
                    (car next) (cdr next)))
</pre></div>
<span id="index-treesit_002drange_002dinvalid"></span>
<p>If <var>ranges</var> violates this constraint, or something else goes
wrong, this function signals a <code>treesit-range-invalid</code> error. The
signal data contains a specific error message and the ranges that the
function was trying to set.
</p>
<p>This function can also be used for disabling ranges. If <var>ranges</var>
is nil, the parser is set to parse the whole buffer.
</p>
<p>Example:
</p>
<div class="example">
<pre class="example">(treesit-parser-set-included-ranges
parser '((1 . 9) (16 . 24) (24 . 25)))
</pre></div>
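<p>As a minimal sketch of the ordering constraint described above, the
following helper checks a list of ranges before it is handed to a parser,
and a second helper turns the <code>treesit-range-invalid</code> signal into
a nil return value. The names <code>my-ranges-valid-p</code> and
<code>my-set-ranges-safely</code> are hypothetical and not part of Emacs; the
second helper assumes an Emacs built with tree-sitter.
</p>
<div class="example">
<pre class="example">(defun my-ranges-valid-p (ranges)
  &quot;Return non-nil if RANGES is ordered and non-overlapping.
RANGES is a list of cons cells (BEG . END).&quot;
  (let ((prev-end nil)
        (ok t))
    (dolist (range ranges ok)
      (let ((beg (car range))
            (end (cdr range)))
        ;; Each range must be well-formed and must start no earlier
        ;; than the end of the previous range.
        (unless (and (&lt;= beg end)
                     (or (null prev-end) (&lt;= prev-end beg)))
          (setq ok nil))
        (setq prev-end end)))))

(defun my-set-ranges-safely (parser ranges)
  &quot;Set the ranges of PARSER to RANGES.
Return t on success and nil if the ranges are rejected.&quot;
  (condition-case err
      (progn
        (treesit-parser-set-included-ranges parser ranges)
        t)
    (treesit-range-invalid
     (message &quot;Invalid ranges: %S&quot; (cdr err))
     nil)))
</pre></div>
<p>Checking the ranges up front is optional, since the function signals an
error by itself; the check is mainly useful when the ranges are computed
by code that is still being debugged.
</p>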
</dd></dl>
<dl class="def">
<dt id="index-treesit_002dparser_002dincluded_002dranges"><span class="category">Function: </span><span><strong>treesit-parser-included-ranges</strong> <em>parser</em><a href='#index-treesit_002dparser_002dincluded_002dranges' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the ranges set for <var>parser</var>. The return
value has the same form as the <var>ranges</var> argument of
<code>treesit-parser-set-included-ranges</code>: a list of cons cells
<code>(<var>beg</var> . <var>end</var>)</code>. If <var>parser</var> doesn&rsquo;t have any
ranges, the return value is nil.
</p>
<div class="example">
<pre class="example">(treesit-parser-included-ranges parser)
&rArr; ((1 . 9) (16 . 24) (24 . 25))
</pre></div>
</dd></dl>
<dl class="def">
<dt id="index-treesit_002dset_002dranges"><span class="category">Function: </span><span><strong>treesit-set-ranges</strong> <em>parser-or-lang ranges</em><a href='#index-treesit_002dset_002dranges' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Like <code>treesit-parser-set-included-ranges</code>, this function sets
the ranges of <var>parser-or-lang</var> to <var>ranges</var>. For convenience,
<var>parser-or-lang</var> can be either a parser or a language. If it is
a language, this function looks for the first parser in
<code>(treesit-parser-list)</code> for that language in the current buffer,
and sets the ranges for it.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dget_002dranges"><span class="category">Function: </span><span><strong>treesit-get-ranges</strong> <em>parser-or-lang</em><a href='#index-treesit_002dget_002dranges' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the ranges of <var>parser-or-lang</var>, like
<code>treesit-parser-included-ranges</code>. And like
<code>treesit-set-ranges</code>, <var>parser-or-lang</var> can be a parser or
a language symbol.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dquery_002drange"><span class="category">Function: </span><span><strong>treesit-query-range</strong> <em>source query &amp;optional beg end</em><a href='#index-treesit_002dquery_002drange' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function matches <var>source</var> with <var>query</var> and returns the
ranges of captured nodes. The return value has the same form as that of
the other range functions: a list of <code>(<var>beg</var> . <var>end</var>)</code>.
</p>
<p>For convenience, <var>source</var> can be a language symbol, a parser, or a
node. If a language symbol, this function matches in the root node of
the first parser using that language; if a parser, this function
matches in the root node of that parser; if a node, this function
matches in that node.
</p>
<p>Parameter <var>query</var> is the query used to capture nodes
(see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>). The capture names don&rsquo;t matter. Parameters
<var>beg</var> and <var>end</var>, if both non-nil, limit the range in which
this function queries.
</p>
<p>Like other query functions, this function signals a
<var>treesit-query-error</var> error if <var>query</var> is malformed.
</p></dd></dl>
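<p>A small sketch of how the optional arguments and the error signal might
be used together: the hypothetical helper <code>my-css-ranges</code> below
limits the query to a region and returns nil instead of signaling when the
query is malformed. It assumes that an HTML parser already exists in the
current buffer and that the HTML grammar names its nodes
<code>style_element</code> and <code>raw_text</code>.
</p>
<div class="example">
<pre class="example">(defun my-css-ranges (&amp;optional beg end)
  &quot;Return the ranges of CSS text found in the HTML parse tree.
If BEG and END are both non-nil, only query between them.
Return nil if the query is rejected.&quot;
  (condition-case nil
      (treesit-query-range
       'html &quot;(style_element (raw_text) @capture)&quot; beg end)
    (treesit-query-error nil)))
</pre></div>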
<dl class="def">
<dt id="index-treesit_002dlanguage_002dat"><span class="category">Function: </span><span><strong>treesit-language-at</strong> <em>point</em><a href='#index-treesit_002dlanguage_002dat' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function tries to figure out which language is responsible for
the text at <var>point</var>. It goes over each parser in
<code>(treesit-parser-list)</code> and checks whether that parser&rsquo;s ranges cover
<var>point</var>.
</p></dd></dl>
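<p>The lookup described above can be modeled in plain Lisp. The following
is only an illustration of the idea, not the code Emacs uses:
<code>my-language-covering</code> is a hypothetical name, the parsers are
represented by an alist of <code>(<var>language</var> . <var>ranges</var>)</code>
instead of real parser objects, and the treatment of the end position as
exclusive is an assumption. The sketch also ignores parsers that have no
ranges set and therefore cover the whole buffer.
</p>
<div class="example">
<pre class="example">(defun my-language-covering (pos parsers)
  &quot;Return the first language in PARSERS whose ranges cover POS.
PARSERS is an alist of (LANGUAGE . RANGES), used for illustration.&quot;
  (catch 'found
    (dolist (entry parsers)
      (dolist (range (cdr entry))
        ;; Assumption: a range covers the positions from BEG up to,
        ;; but not including, END.
        (when (and (&lt;= (car range) pos) (&lt; pos (cdr range)))
          (throw 'found (car entry)))))
    nil))

(my-language-covering 17 '((css (37 . 63)) (javascript (15 . 20))))
    &rArr; javascript
</pre></div>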
<dl class="def">
<dt id="index-treesit_002drange_002dfunctions"><span class="category">Variable: </span><span><strong>treesit-range-functions</strong><a href='#index-treesit_002drange_002dfunctions' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>A list of range functions. Font-locking and indentation code uses the
functions in this list to set correct ranges for a language parser
before using it.
</p>
<p>The signature of each function should be
</p>
<div class="example">
<pre class="example">(<var>start</var> <var>end</var> &amp;rest <var>_</var>)
</pre></div>
<p>where <var>start</var> and <var>end</var> mark the region that is about to be
used. A range function only needs to update ranges in that region,
although it may update ranges elsewhere as well.
</p>
<p>The functions in the list are called in order.
</p></dd></dl>
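<p>As a sketch of what a range function could look like for HTML with
embedded CSS, the following uses only functions described in this
section. The names <code>my-html-css-range-function</code> and
<code>my-html-mode-setup-ranges</code> are hypothetical, the node names are
assumed to match the HTML grammar, and setting the variable buffer-locally
from a mode&rsquo;s setup code is an assumption about how a major mode would
install it.
</p>
<div class="example">
<pre class="example">(defun my-html-css-range-function (_start _end &amp;rest _)
  &quot;Point the CSS parser at the contents of style elements.&quot;
  ;; Setting ranges replaces the ranges previously set for the
  ;; parser, so this sketch queries the whole HTML tree instead of
  ;; only the region between _START and _END.
  (treesit-set-ranges
   'css
   (treesit-query-range
    'html &quot;(style_element (raw_text) @capture)&quot;)))

(defun my-html-mode-setup-ranges ()
  &quot;Install the range function in the current buffer.&quot;
  (setq-local treesit-range-functions
              (list #'my-html-css-range-function)))
</pre></div>
<p>Note that if the document contains no style element, the query returns
nil, and setting nil ranges makes the CSS parser read the whole buffer; a
real mode would probably want to handle that case separately.
</p>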
<dl class="def">
<dt id="index-treesit_002dupdate_002dranges"><span class="category">Function: </span><span><strong>treesit-update-ranges</strong> <em>&amp;optional start end</em><a href='#index-treesit_002dupdate_002dranges' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function is used by font-lock and indentation to update ranges before
using any parser. Each range function in
<var>treesit-range-functions</var> is called in order. Arguments
<var>start</var> and <var>end</var> are passed to each range function.
</p></dd></dl>
<span id="An-example"></span><h3 class="heading">An example</h3>
<p>Normally, in a set of languages that can be mixed together, there is a
major language and several embedded languages. We first parse the
whole document with the major language&rsquo;s parser, set ranges for the
embedded languages, then parse the embedded languages.
</p>
<p>Suppose we want to parse a very simple document that mixes HTML, CSS
and JavaScript:
</p>
<div class="example">
<pre class="example">&lt;html&gt;
&lt;script&gt;1 + 2&lt;/script&gt;
&lt;style&gt;body { color: &quot;blue&quot;; }&lt;/style&gt;
&lt;/html&gt;
</pre></div>
<p>We first parse with HTML, then set ranges for CSS and JavaScript:
</p>
<div class="example">
<pre class="example">;; Create parsers.
(setq html (treesit-get-parser-create 'html))
(setq css (treesit-get-parser-create 'css))
(setq js (treesit-get-parser-create 'javascript))
;; Set CSS ranges.
(setq css-range
      (treesit-query-range
       'html
       &quot;(style_element (raw_text) @capture)&quot;))
(treesit-parser-set-included-ranges css css-range)
;; Set JavaScript ranges.
(setq js-range
      (treesit-query-range
       'html
       &quot;(script_element (raw_text) @capture)&quot;))
(treesit-parser-set-included-ranges js js-range)
</pre></div>
<p>We use a query pattern <code>(style_element (raw_text) @capture)</code> to
find CSS nodes in the HTML parse tree. For how to write query
patterns, see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>.
</p>
</div>
<hr>
<div class="header">
<p>
Next: <a href="Tree_002dsitter-C-API.html">Tree-sitter C API Correspondence</a>, Previous: <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>


@ -0,0 +1,160 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Parser-based Font Lock (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Parser-based Font Lock (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Parser-based Font Lock (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Font-Lock-Mode.html" rel="up" title="Font Lock Mode">
<link href="Multiline-Font-Lock.html" rel="prev" title="Multiline Font Lock">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="subsection" id="Parser_002dbased-Font-Lock">
<div class="header">
<p>
Previous: <a href="Multiline-Font-Lock.html" accesskey="p" rel="prev">Multiline Font Lock Constructs</a>, Up: <a href="Font-Lock-Mode.html" accesskey="u" rel="up">Font Lock Mode</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Parser_002dbased-Font-Lock-1"></span><h4 class="subsection">24.6.10 Parser-based Font Lock</h4>
<p>Besides simple syntactic font lock and regexp-based font lock, Emacs
also provides complete syntactic font lock with the help of a parser,
currently provided by the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>).
</p>
<dl class="def">
<dt id="index-treesit_002dfont_002dlock_002denable"><span class="category">Function: </span><span><strong>treesit-font-lock-enable</strong><a href='#index-treesit_002dfont_002dlock_002denable' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function enables parser-based font lock in the current buffer.
</p></dd></dl>
<p>Parser-based font lock and other font lock mechanisms are not mutually
exclusive. By default, if enabled, parser-based font lock runs first,
then the simple syntactic font lock (if enabled), then regexp-based
font lock.
</p>
<p>Although parser-based font lock doesn&rsquo;t share customization
variables with regexp-based font lock, it uses a
similar customization scheme. The tree-sitter counterpart of
<var>font-lock-keywords</var> is <var>treesit-font-lock-settings</var>.
</p>
<dl class="def">
<dt id="index-treesit_002dfont_002dlock_002drules"><span class="category">Function: </span><span><strong>treesit-font-lock-rules</strong> <em>:keyword value query...</em><a href='#index-treesit_002dfont_002dlock_002drules' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function is used to set <var>treesit-font-lock-settings</var>. It
takes care of compiling queries and other post-processing and outputs
a value that <var>treesit-font-lock-settings</var> accepts. An example:
</p>
<div class="example">
<pre class="example">(treesit-font-lock-rules
 :language 'javascript
 :override t
 '((true) @font-lock-constant-face
   (false) @font-lock-constant-face)
 :language 'html
 &quot;(script_element) @font-lock-builtin-face&quot;)
</pre></div>
<p>This function takes a list of text or s-exp queries. Before each
query, there are <var>:keyword</var> and <var>value</var> pairs that configure
that query. The <code>:language</code> keyword sets the query&rsquo;s language, and
every query must specify the language. Other keywords are optional:
</p>
<table>
<thead><tr><th width="15%">Keyword</th><th width="15%">Value</th><th width="60%">Description</th></tr></thead>
<tr><td width="15%"><code>:override</code></td><td width="15%">nil</td><td width="60%">If the region already has a face, discard the new face</td></tr>
<tr><td width="15%"></td><td width="15%">t</td><td width="60%">Always apply the new face</td></tr>
<tr><td width="15%"></td><td width="15%"><code>append</code></td><td width="60%">Append the new face to existing ones</td></tr>
<tr><td width="15%"></td><td width="15%"><code>prepend</code></td><td width="60%">Prepend the new face to existing ones</td></tr>
<tr><td width="15%"></td><td width="15%"><code>keep</code></td><td width="60%">Fill in regions without an existing face</td></tr>
</table>
<p>Capture names in <var>query</var> should be face names like
<code>font-lock-keyword-face</code>. The captured node will be fontified
with that face. Capture names can also be function names, in which
case the function is called with (<var>start</var> <var>end</var> <var>node</var>),
where <var>start</var> and <var>end</var> are the start and end positions of the
node in the buffer, and <var>node</var> is the node itself. If a capture name
is both a face and a function, the face takes priority. If a capture
name is neither a face name nor a function name, it is ignored.
</p></dd></dl>
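<p>To illustrate a function used as a capture name, here is a minimal
sketch that follows the calling convention described above, (<var>start</var>
<var>end</var> <var>node</var>). The names <code>my-fontify-as-warning</code> and
<code>my-js-font-lock-setup</code> are hypothetical, and the
<code>comment</code> node type is assumed to exist in the JavaScript grammar.
</p>
<div class="example">
<pre class="example">(defun my-fontify-as-warning (start end _node)
  &quot;Fontify the text between START and END with a warning face.&quot;
  (put-text-property start end 'face 'font-lock-warning-face))

(defun my-js-font-lock-setup ()
  &quot;Install a rule that passes JavaScript comments to the function above.&quot;
  (setq-local treesit-font-lock-settings
              (treesit-font-lock-rules
               :language 'javascript
               '((comment) @my-fontify-as-warning))))
</pre></div>
<p>Because <code>my-fontify-as-warning</code> is not a face name, the capture
is treated as a function call rather than as a face to apply.
</p>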
<dl class="def">
<dt id="index-treesit_002dfont_002dlock_002dsettings"><span class="category">Variable: </span><span><strong>treesit-font-lock-settings</strong><a href='#index-treesit_002dfont_002dlock_002dsettings' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>A list of <var>setting</var>s for tree-sitter font lock. The exact format
of this variable is considered internal. One should always use
<code>treesit-font-lock-rules</code> to set this variable.
</p>
<p>Each <var>setting</var> is of the form
</p>
<div class="example">
<pre class="example">(<var>language</var> <var>query</var>)
</pre></div>
<p>Each <var>setting</var> controls one parser (often for a different language).
Here <var>language</var> is the language symbol (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>); <var>query</var> is the query (see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>).
</p></dd></dl>
<p>Multi-language major modes should provide range functions in
<code>treesit-range-functions</code>, and Emacs will set the ranges
accordingly before fontifying a region (see <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>).
</p>
</div>
<hr>
<div class="header">
<p>
Previous: <a href="Multiline-Font-Lock.html">Multiline Font Lock Constructs</a>, Up: <a href="Font-Lock-Mode.html">Font Lock Mode</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>


@ -0,0 +1,244 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Parser-based Indentation (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Parser-based Indentation (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Parser-based Indentation (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Auto_002dIndentation.html" rel="up" title="Auto-Indentation">
<link href="SMIE.html" rel="prev" title="SMIE">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="subsection" id="Parser_002dbased-Indentation">
<div class="header">
<p>
Previous: <a href="SMIE.html" accesskey="p" rel="prev">Simple Minded Indentation Engine</a>, Up: <a href="Auto_002dIndentation.html" accesskey="u" rel="up">Automatic Indentation of code</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Parser_002dbased-Indentation-1"></span><h4 class="subsection">24.7.2 Parser-based Indentation</h4>
<p>When built with the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>), Emacs can parse program source and produce a syntax tree,
and this syntax tree can be used for indentation. For maximum
flexibility, we could write a custom indent function that queries the
syntax tree and indents accordingly for each language, but that would
be a lot of work. It is more convenient to use the simple indentation
engine described below: we only need to write some indentation rules
and the engine takes care of the rest.
</p>
<p>To enable the indentation engine, set the value of
<code>indent-line-function</code> to <code>treesit-indent</code>.
</p>
<dl class="def">
<dt id="index-treesit_002dindent_002dfunction"><span class="category">Variable: </span><span><strong>treesit-indent-function</strong><a href='#index-treesit_002dindent_002dfunction' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This variable stores the actual function called by
<code>treesit-indent</code>. By default, its value is
<code>treesit-simple-indent</code>. In the future we might add other
more complex indentation engines.
</p></dd></dl>
<span id="Writing-indentation-rules"></span><h3 class="heading">Writing indentation rules</h3>
<dl class="def">
<dt id="index-treesit_002dsimple_002dindent_002drules"><span class="category">Variable: </span><span><strong>treesit-simple-indent-rules</strong><a href='#index-treesit_002dsimple_002dindent_002drules' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This local variable stores indentation rules for every language. It is
a list of
</p>
<div class="example">
<pre class="example">(<var>language</var> . <var>rules</var>)
</pre></div>
<p>where <var>language</var> is a language symbol, and <var>rules</var> is a list
of
</p>
<div class="example">
<pre class="example">(<var>matcher</var> <var>anchor</var> <var>offset</var>)
</pre></div>
<p>First, Emacs passes the node at point to <var>matcher</var>; if it returns
non-nil, this rule applies. Then Emacs passes the node to
<var>anchor</var>, which returns a point. Emacs takes the column number of
that point and adds <var>offset</var> to it; the result is the indentation of
the current line.
</p>
<p>The <var>matcher</var> and <var>anchor</var> are functions, and Emacs provides
convenient presets for them. You can skip ahead to
<code>treesit-simple-indent-presets</code> below; those presets should be
more than enough.
</p>
<p>A <var>matcher</var> or an <var>anchor</var> is a function that takes three
arguments (<var>node</var> <var>parent</var> <var>bol</var>). Argument <var>bol</var> is
the position at which we are indenting: the position of the first
non-whitespace character from the beginning of the line; <var>node</var> is the
largest (highest-in-tree) node that starts at that position; <var>parent</var>
is the parent of <var>node</var>. A <var>matcher</var> returns nil or non-nil, and
an <var>anchor</var> returns a point.
</p></dd></dl>
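<p>The following sketch shows a hand-written anchor with the three-argument
signature described above, together with a small model of the column
arithmetic (anchor column plus offset). Both names,
<code>my-anchor-line-start</code> and <code>my-indent-column</code>, are
hypothetical; the second function only illustrates the calculation and is
not the engine&rsquo;s actual code.
</p>
<div class="example">
<pre class="example">(defun my-anchor-line-start (_node _parent bol)
  &quot;Anchor: return the beginning of the line that contains BOL.&quot;
  (save-excursion
    (goto-char bol)
    (line-beginning-position)))

(defun my-indent-column (anchor-pos offset)
  &quot;Return the column of ANCHOR-POS plus OFFSET.&quot;
  (+ (save-excursion
       (goto-char anchor-pos)
       (current-column))
     offset))
</pre></div>
<p>With this anchor the anchor column is always 0, so the resulting
indentation equals the offset of the rule.
</p>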
<dl class="def">
<dt id="index-treesit_002dsimple_002dindent_002dpresets"><span class="category">Variable: </span><span><strong>treesit-simple-indent-presets</strong><a href='#index-treesit_002dsimple_002dindent_002dpresets' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This is a list of presets for <var>matcher</var>s and <var>anchor</var>s in
<code>treesit-simple-indent-rules</code>. Each of them represents a function
that takes <var>node</var>, <var>parent</var> and <var>bol</var> as arguments.
</p>
<div class="example">
<pre class="example">no-node
</pre></div>
<p>This matcher matches the case where <var>node</var> is nil, i.e., there is
no node that starts at <var>bol</var>. This is the case when <var>bol</var> is
at an empty line or inside a multi-line string, etc.
</p>
<div class="example">
<pre class="example">(parent-is <var>type</var>)
</pre></div>
<p>This matcher matches if <var>parent</var>&rsquo;s type is <var>type</var>.
</p>
<div class="example">
<pre class="example">(node-is <var>type</var>)
</pre></div>
<p>This matcher matches if <var>node</var>&rsquo;s type is <var>type</var>.
</p>
<div class="example">
<pre class="example">(query <var>query</var>)
</pre></div>
<p>This matcher matches if querying <var>parent</var> with <var>query</var>
captures <var>node</var>. The capture name does not matter.
</p>
<div class="example">
<pre class="example">(match <var>node-type</var> <var>parent-type</var>
<var>node-field</var> <var>node-index-min</var> <var>node-index-max</var>)
</pre></div>
<p>This matcher checks if <var>node</var>&rsquo;s type is <var>node-type</var>,
<var>parent</var>&rsquo;s type is <var>parent-type</var>, <var>node</var>&rsquo;s field name in
<var>parent</var> is <var>node-field</var>, and <var>node</var>&rsquo;s index among its
siblings is between <var>node-index-min</var> and <var>node-index-max</var>. If
the value of a constraint is nil, this matcher doesn&rsquo;t check for that
constraint. For example, to match the first child where parent is
<code>argument_list</code>, use
</p>
<div class="example">
<pre class="example">(match nil &quot;argument_list&quot; nil 0 0)
</pre></div>
<div class="example">
<pre class="example">first-sibling
</pre></div>
<p>This anchor returns the start of the first child of <var>parent</var>.
</p>
<div class="example">
<pre class="example">parent
</pre></div>
<p>This anchor returns the start of <var>parent</var>.
</p>
<div class="example">
<pre class="example">parent-bol
</pre></div>
<p>This anchor returns the position of the first non-space character on the
line where <var>parent</var> starts.
</p>
<div class="example">
<pre class="example">prev-sibling
</pre></div>
<p>This anchor returns the start of the previous sibling of <var>node</var>.
</p>
<div class="example">
<pre class="example">no-indent
</pre></div>
<p>This anchor returns the start of <var>node</var>, i.e., no indent.
</p>
<div class="example">
<pre class="example">prev-line
</pre></div>
<p>This anchor returns the first non-whitespace character on the previous
line.
</p></dd></dl>
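<p>Putting the pieces together, a mode might install rules along the
following lines. This is a sketch under stated assumptions rather than a
complete rule set: <code>my-c-indent-setup</code> is a hypothetical name, the
node types <code>&quot;}&quot;</code> and <code>&quot;compound_statement&quot;</code> are
assumed to match the C grammar, and the offset of 2 is arbitrary.
</p>
<div class="example">
<pre class="example">(defun my-c-indent-setup ()
  &quot;Install a few indentation rules for C in the current buffer.&quot;
  (setq-local treesit-simple-indent-rules
              '((c
                 ;; A closing brace lines up with the line on which
                 ;; its parent starts.
                 ((node-is &quot;}&quot;) parent-bol 0)
                 ;; Anything else inside a block is indented two
                 ;; columns relative to that line.
                 ((parent-is &quot;compound_statement&quot;) parent-bol 2))))
  (setq-local indent-line-function #'treesit-indent))
</pre></div>
<p>The rule for the closing brace is listed first on the assumption that
rules are tried in order, since the brace&rsquo;s parent is also a
<code>compound_statement</code>.
</p>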
<span id="Indentation-utilities"></span><h3 class="heading">Indentation utilities</h3>
<p>Here are some utility functions that can help with writing indentation
rules.
</p>
<dl class="def">
<dt id="index-treesit_002dcheck_002dindent"><span class="category">Function: </span><span><strong>treesit-check-indent</strong> <em>mode</em><a href='#index-treesit_002dcheck_002dindent' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function checks the current buffer&rsquo;s indentation against major mode
<var>mode</var>. It indents the current buffer in <var>mode</var> and compares
the result with the current indentation. Then it pops up a diff
buffer showing the difference. Correct indentation (target) is in
green, current indentation is in red.
</p></dd></dl>
<p>It is also helpful to use <code>treesit-inspect-mode</code> when writing
indentation rules.
</p>
</div>
<hr>
<div class="header">
<p>
Previous: <a href="SMIE.html">Simple Minded Indentation Engine</a>, Up: <a href="Auto_002dIndentation.html">Automatic Indentation of code</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>


@ -0,0 +1,125 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Parsing Program Source (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Parsing Program Source (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Parsing Program Source (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="index.html" rel="up" title="Top">
<link href="Abbrevs.html" rel="next" title="Abbrevs">
<link href="Syntax-Tables.html" rel="prev" title="Syntax Tables">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="chapter" id="Parsing-Program-Source">
<div class="header">
<p>
Next: <a href="Abbrevs.html" accesskey="n" rel="next">Abbrevs and Abbrev Expansion</a>, Previous: <a href="Syntax-Tables.html" accesskey="p" rel="prev">Syntax Tables</a>, Up: <a href="index.html" accesskey="u" rel="up">Emacs Lisp</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Parsing-Program-Source-1"></span><h2 class="chapter">37 Parsing Program Source</h2>
<p>Emacs provides various ways to parse program source text and produce a
<em>syntax tree</em>. In a syntax tree, text is no longer a
one-dimensional stream but a structured tree of nodes, where each node
represents a piece of text. Thus a syntax tree can enable
interesting features like precise fontification, indentation,
navigation, structured editing, etc.
</p>
<p>Emacs has a simple facility for parsing balanced expressions
(see <a href="Parsing-Expressions.html">Parsing Expressions</a>). There is also the SMIE library for generic
navigation and indentation (see <a href="SMIE.html">Simple Minded Indentation Engine</a>).
</p>
<p>Emacs also provides integration with the tree-sitter library
(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>) if Emacs is compiled with
it. The tree-sitter library implements an incremental parser and
supports a wide range of programming languages.
</p>
<dl class="def">
<dt id="index-treesit_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-available-p</strong><a href='#index-treesit_002davailable_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns non-nil if tree-sitter features are available
for this Emacs instance.
</p></dd></dl>
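<p>As a minimal sketch of how a package might guard its tree-sitter
setup, the check can be done before calling any other
<code>treesit-</code> function. The function name
<code>my-mode-setup</code> and the messages are hypothetical and only
serve as illustration:
</p>
<div class="example">
<pre class="example">;; Hypothetical setup function; only `treesit-available-p' comes
;; from this manual.
(defun my-mode-setup ()
  (if (and (fboundp 'treesit-available-p)
           (treesit-available-p))
      (message &quot;tree-sitter is available&quot;)
    (message &quot;tree-sitter is not available, falling back&quot;)))
</pre></div>
<p>The <code>fboundp</code> check is included because Emacs releases that
predate tree-sitter support do not define the function at all.
</p>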
<p>For tree-sitter integration with existing Emacs features,
see <a href="Parser_002dbased-Font-Lock.html">Parser-based Font Lock</a>, <a href="Parser_002dbased-Indentation.html">Parser-based Indentation</a>, and
<a href="List-Motion.html">Moving over Balanced Expressions</a>.
</p>
<p>To access the syntax tree of the text in a buffer, we need to first
load a language definition and create a parser with it. Next, we can
query the parser for specific nodes in the syntax tree. Then, we can
access various information about the node, and we can pattern-match a
node with a powerful syntax. Finally, we explain how to work with
source files that mix multiple languages. The following sections
explain how to do each of the tasks in detail.
</p>
<ul class="section-toc">
<li><a href="Language-Definitions.html" accesskey="1">Tree-sitter Language Definitions</a></li>
<li><a href="Using-Parser.html" accesskey="2">Using Tree-sitter Parser</a></li>
<li><a href="Retrieving-Node.html" accesskey="3">Retrieving Node</a></li>
<li><a href="Accessing-Node.html" accesskey="4">Accessing Node Information</a></li>
<li><a href="Pattern-Matching.html" accesskey="5">Pattern Matching Tree-sitter Nodes</a></li>
<li><a href="Multiple-Languages.html" accesskey="6">Parsing Text in Multiple Languages</a></li>
<li><a href="Tree_002dsitter-C-API.html" accesskey="7">Tree-sitter C API Correspondence</a></li>
</ul>
</div>
<hr>
<div class="header">
<p>
Next: <a href="Abbrevs.html">Abbrevs and Abbrev Expansion</a>, Previous: <a href="Syntax-Tables.html">Syntax Tables</a>, Up: <a href="index.html">Emacs Lisp</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>


@ -0,0 +1,430 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Pattern Matching (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Pattern Matching (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Pattern Matching (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Multiple-Languages.html" rel="next" title="Multiple Languages">
<link href="Accessing-Node.html" rel="prev" title="Accessing Node">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="section" id="Pattern-Matching">
<div class="header">
<p>
Next: <a href="Multiple-Languages.html" accesskey="n" rel="next">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html" accesskey="p" rel="prev">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Pattern-Matching-Tree_002dsitter-Nodes"></span><h3 class="section">37.5 Pattern Matching Tree-sitter Nodes</h3>
<p>Tree-sitter lets us pattern match with a small declarative language.
Pattern matching consists of two steps: first tree-sitter matches a
<em>pattern</em> against nodes in the syntax tree, then it <em>captures</em>
specific nodes in that pattern and returns the captured nodes.
</p>
<p>We first describe how to write the most basic query pattern and how to
capture nodes in a pattern, then the pattern-matching function, and finally
more advanced pattern syntax.
</p>
<span id="Basic-query-syntax"></span><h3 class="heading">Basic query syntax</h3>
<span id="index-Tree_002dsitter-query-syntax"></span>
<span id="index-Tree_002dsitter-query-pattern"></span>
<p>A <em>query</em> consists of multiple <em>patterns</em>. Each pattern is an
s-expression that matches a certain node in the syntax tree. A
pattern has the following shape:
</p>
<div class="example">
<pre class="example">(<var>type</var> <var>child</var>...)
</pre></div>
<p>For example, a pattern that matches a <code>binary_expression</code> node that
contains <code>number_literal</code> child nodes would look like
</p>
<div class="example">
<pre class="example">(binary_expression (number_literal))
</pre></div>
<p>To <em>capture</em> a node in the query pattern above, append
<code>@capture-name</code> after the node pattern you want to capture. For
example,
</p>
<div class="example">
<pre class="example">(binary_expression (number_literal) @number-in-exp)
</pre></div>
<p>captures <code>number_literal</code> nodes that are inside a
<code>binary_expression</code> node with capture name <code>number-in-exp</code>.
</p>
<p>We can capture the <code>binary_expression</code> node too, with capture
name <code>biexp</code>:
</p>
<div class="example">
<pre class="example">(binary_expression
(number_literal) @number-in-exp) @biexp
</pre></div>
<span id="Query-function"></span><h3 class="heading">Query function</h3>
<p>Now we can introduce the query functions.
</p>
<dl class="def">
<dt id="index-treesit_002dquery_002dcapture"><span class="category">Function: </span><span><strong>treesit-query-capture</strong> <em>node query &amp;optional beg end node-only</em><a href='#index-treesit_002dquery_002dcapture' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function matches patterns in <var>query</var> in <var>node</var>.
Parameter <var>query</var> can be a string, an s-expression, or a
compiled query object. For now, we focus on the string syntax;
s-expression syntax and compiled query are described at the end of the
section.
</p>
<p>Parameter <var>node</var> can also be a parser or a language symbol. A
parser means using its root node; a language symbol means finding or
creating a parser for that language in the current buffer and using its
root node.
</p>
<p>The function returns all captured nodes in a list of
<code>(<var>capture_name</var> . <var>node</var>)</code>. If <var>node-only</var> is
non-nil, a list of nodes is returned instead. If <var>beg</var> and
<var>end</var> are both non-nil, this function only pattern matches nodes
in that range.
</p>
<span id="index-treesit_002dquery_002derror"></span>
<p>This function raises a <var>treesit-query-error</var> if <var>query</var> is
malformed. The signal data contains a description of the specific
error. You can use <code>treesit-query-validate</code> to debug the query.
</p></dd></dl>
<p>For example, suppose <var>node</var>&rsquo;s content is <code>1 + 2</code>, and
<var>query</var> is
</p>
<div class="example">
<pre class="example">(setq query
&quot;(binary_expression
(number_literal) @number-in-exp) @biexp&quot;)
</pre></div>
<p>Running that query would return
</p>
<div class="example">
<pre class="example">(treesit-query-capture node query)
&rArr; ((biexp . <var>&lt;node for &quot;1 + 2&quot;&gt;</var>)
(number-in-exp . <var>&lt;node for &quot;1&quot;&gt;</var>)
(number-in-exp . <var>&lt;node for &quot;2&quot;&gt;</var>))
</pre></div>
<p>As we mentioned earlier, a <var>query</var> could contain multiple
patterns. For example, it could have two top-level patterns:
</p>
<div class="example">
<pre class="example">(setq query
&quot;(binary_expression) @biexp
(number_literal) @number @biexp&quot;)
</pre></div>
<dl class="def">
<dt id="index-treesit_002dquery_002dstring"><span class="category">Function: </span><span><strong>treesit-query-string</strong> <em>string query language</em><a href='#index-treesit_002dquery_002dstring' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function parses <var>string</var> with <var>language</var>, pattern matches
its root node with <var>query</var>, and returns the result.
</p></dd></dl>
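<p>As a rough sketch, <code>treesit-query-string</code> makes it possible
to try a query on a short snippet without setting up a buffer. The
example below assumes that a C language definition is installed and
loadable, and that the C grammar uses the node types
<code>binary_expression</code> and <code>number_literal</code> as in the
examples above:
</p>
<div class="example">
<pre class="example">;; Parse a small piece of C source and run the query on its
;; root node.  Requires a C language definition (an assumption).
(treesit-query-string
 &quot;int x = 1 + 2;&quot;
 &quot;(binary_expression (number_literal) @number-in-exp) @biexp&quot;
 'c)
</pre></div>
<p>The result should have the same shape as the return value of
<code>treesit-query-capture</code>.
</p>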
<span id="More-query-syntax"></span><h3 class="heading">More query syntax</h3>
<p>Besides node type and capture, tree-sitter&rsquo;s query syntax can express
anonymous nodes, field names, wildcards, quantification, grouping,
alternation, anchors, and predicates.
</p>
<span id="Anonymous-node"></span><h4 class="subheading">Anonymous node</h4>
<p>An anonymous node is written verbatim, surrounded by quotes. A
pattern that matches (and captures) the keyword <code>return</code> would be
</p>
<div class="example">
<pre class="example">&quot;return&quot; @keyword
</pre></div>
<span id="Wild-card"></span><h4 class="subheading">Wild card</h4>
<p>In a query pattern, &lsquo;<samp>(_)</samp>&rsquo; matches any named node, and &lsquo;<samp>_</samp>&rsquo;
matches any node, named or anonymous. For example, to capture any
named child of a <code>binary_expression</code> node, the pattern would be
</p>
<div class="example">
<pre class="example">(binary_expression (_) @in_biexp)
</pre></div>
<span id="Field-name"></span><h4 class="subheading">Field name</h4>
<p>We can capture child nodes that have specific field names:
</p>
<div class="example">
<pre class="example">(function_definition
declarator: (_) @func-declarator
body: (_) @func-body)
</pre></div>
<p>We can also capture a node that doesn&rsquo;t have a certain field, say, a
<code>function_definition</code> without a <code>body</code> field.
</p>
<div class="example">
<pre class="example">(function_definition !body) @func-no-body
</pre></div>
<span id="Quantify-node"></span><h4 class="subheading">Quantify node</h4>
<p>Tree-sitter recognizes quantification operators &lsquo;<samp>*</samp>&rsquo;, &lsquo;<samp>+</samp>&rsquo; and
&lsquo;<samp>?</samp>&rsquo;. Their meanings are the same as in regular expressions:
&lsquo;<samp>*</samp>&rsquo; matches the preceding pattern zero or more times, &lsquo;<samp>+</samp>&rsquo;
matches one or more times, and &lsquo;<samp>?</samp>&rsquo; matches zero or one time.
</p>
<p>For example, this pattern matches <code>type_declaration</code> nodes
that have <em>zero or more</em> <code>long</code> keywords.
</p>
<div class="example">
<pre class="example">(type_declaration &quot;long&quot;*) @long-type
</pre></div>
<p>And this pattern matches a type declaration that has zero or one
<code>long</code> keyword:
</p>
<div class="example">
<pre class="example">(type_declaration &quot;long&quot;?) @long-type
</pre></div>
<span id="Grouping"></span><h4 class="subheading">Grouping</h4>
<p>Similar to groups in regular expressions, we can bundle patterns into a
group and apply quantification operators to it. For example, to
express a comma-separated list of identifiers, one could write
</p>
<div class="example">
<pre class="example">(identifier) (&quot;,&quot; (identifier))*
</pre></div>
<span id="Alternation"></span><h4 class="subheading">Alternation</h4>
<p>Again, similar to regular expressions, we can express &ldquo;match any one
of this group of patterns&rdquo; in the query pattern. The syntax is a
list of patterns enclosed in square brackets. For example, to capture
some keywords in C, the query pattern would be
</p>
<div class="example">
<pre class="example">[
&quot;return&quot;
&quot;break&quot;
&quot;if&quot;
&quot;else&quot;
] @keyword
</pre></div>
<span id="Anchor"></span><h4 class="subheading">Anchor</h4>
<p>The anchor operator &lsquo;<samp>.</samp>&rsquo; can be used to enforce juxtaposition,
i.e., to enforce two things to be directly next to each other. The
two &ldquo;things&rdquo; can be two nodes, or a child and the beginning or end of its parent.
For example, to capture the first child, the last child, or two
adjacent children:
</p>
<div class="example">
<pre class="example">;; Anchor the child with the end of its parent.
(compound_expression (_) @last-child .)
;; Anchor the child with the beginning of its parent.
(compound_expression . (_) @first-child)
;; Anchor two adjacent children.
(compound_expression
(_) @prev-child
.
(_) @next-child)
</pre></div>
<p>Note that the enforcement of juxtaposition ignores any anonymous
nodes.
</p>
<span id="Predicate"></span><h4 class="subheading">Predicate</h4>
<p>We can add predicate constraints to a pattern. For example, if we use
the following query pattern
</p>
<div class="example">
<pre class="example">(
(array . (_) @first (_) @last .)
(#equal @first @last)
)
</pre></div>
<p>then tree-sitter only matches arrays where the first element equals
the last element. To attach a predicate to a pattern, we need to
group them together. A predicate always starts with a &lsquo;<samp>#</samp>&rsquo;.
Currently there are two predicates, <code>#equal</code> and <code>#match</code>.
</p>
<dl class="def">
<dt id="index-equal-1"><span class="category">Predicate: </span><span><strong>equal</strong> <em>arg1 arg2</em><a href='#index-equal-1' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Matches if <var>arg1</var> equals <var>arg2</var>. Arguments can be either a
string or a capture name. Capture names represent the text that the
captured node spans in the buffer.
</p></dd></dl>
<dl class="def">
<dt id="index-match"><span class="category">Predicate: </span><span><strong>match</strong> <em>regexp capture-name</em><a href='#index-match' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Matches if the text that <var>capture-name</var>&rsquo;s node spans in the buffer
matches regular expression <var>regexp</var>. Matching is case-sensitive.
</p></dd></dl>
<p>Note that a predicate can only refer to capture names that appear in the
same pattern. Indeed, it makes little sense to refer to capture names
in other patterns anyway.
</p>
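<p>To make the predicate syntax more concrete, here is a small sketch
that uses <code>#match</code> in a string query. It assumes that
<var>node</var> comes from a C parser and that the grammar has an
<code>identifier</code> node type; the capture name
<code>constant</code> is arbitrary:
</p>
<div class="example">
<pre class="example">;; Capture only identifiers whose text consists of capital
;; letters and underscores.  The regexp comes first and the
;; capture name second, following the definition above.
(treesit-query-capture
 node
 &quot;((identifier) @constant
   (#match \&quot;^[A-Z_]+$\&quot; @constant))&quot;)
</pre></div>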
<span id="S_002dexpression-patterns"></span><h3 class="heading">S-expression patterns</h3>
<p>Besides strings, Emacs provides an s-expression-based syntax for query
patterns. It largely resembles the string-based syntax. For example,
the following pattern
</p>
<div class="example">
<pre class="example">(treesit-query-capture
node &quot;(addition_expression
left: (_) @left
\&quot;+\&quot; @plus-sign
right: (_) @right) @addition
[\&quot;return\&quot; \&quot;break\&quot;] @keyword&quot;)
</pre></div>
<p>is equivalent to
</p>
<div class="example">
<pre class="example">(treesit-query-capture
node '((addition_expression
left: (_) @left
&quot;+&quot; @plus-sign
right: (_) @right) @addition
[&quot;return&quot; &quot;break&quot;] @keyword))
</pre></div>
<p>Most pattern syntax can be written directly as strange but
nevertheless valid s-expressions. Only a few elements need
modification:
</p>
<ul>
<li> Anchor &lsquo;<samp>.</samp>&rsquo; is written as <code>:anchor</code>.
</li><li> &lsquo;<samp>?</samp>&rsquo; is written as &lsquo;<samp>:?</samp>&rsquo;.
</li><li> &lsquo;<samp>*</samp>&rsquo; is written as &lsquo;<samp>:*</samp>&rsquo;.
</li><li> &lsquo;<samp>+</samp>&rsquo; is written as &lsquo;<samp>:+</samp>&rsquo;.
</li><li> <code>#equal</code> is written as <code>:equal</code>. In general, predicates
change their &lsquo;<samp>#</samp>&rsquo; to &lsquo;<samp>:</samp>&rsquo;.
</li></ul>
<p>For example,
</p>
<div class="example">
<pre class="example">&quot;(
(compound_expression . (_) @first (_)* @rest)
(#match \&quot;love\&quot; @first)
)&quot;
</pre></div>
<p>is written in s-expression as
</p>
<div class="example">
<pre class="example">'((
(compound_expression :anchor (_) @first (_) :* @rest)
(:match &quot;love&quot; @first)
))
</pre></div>
<span id="Compiling-queries"></span><h3 class="heading">Compiling queries</h3>
<p>If a query will be used repeatedly, especially in tight loops, it is
important to compile that query, because a compiled query is much
faster than an uncompiled one. A compiled query can be used anywhere
a query is accepted.
</p>
<dl class="def">
<dt id="index-treesit_002dquery_002dcompile"><span class="category">Function: </span><span><strong>treesit-query-compile</strong> <em>language query</em><a href='#index-treesit_002dquery_002dcompile' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function compiles <var>query</var> for <var>language</var> into a compiled
query object and returns it.
</p>
<p>This function raises a <var>treesit-query-error</var> if <var>query</var> is
malformed. The signal data contains a description of the specific
error. You can use <code>treesit-query-validate</code> to debug the query.
</p></dd></dl>
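<p>A minimal sketch of the compile-once, use-many-times approach
follows. The variable name <code>my-number-query</code> is hypothetical,
and the example assumes that a C language definition is available:
</p>
<div class="example">
<pre class="example">;; Compile the query once (hypothetical variable name).
(defvar my-number-query
  (treesit-query-compile 'c &quot;(number_literal) @number&quot;))

;; Reuse the compiled query wherever a query is accepted.  Passing
;; the language symbol makes the function use the root node of a
;; C parser in the current buffer.
(treesit-query-capture 'c my-number-query)
</pre></div>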
<dl class="def">
<dt id="index-treesit_002dquery_002dexpand"><span class="category">Function: </span><span><strong>treesit-query-expand</strong> <em>query</em><a href='#index-treesit_002dquery_002dexpand' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function expands the s-expression <var>query</var> into a string
query.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dpattern_002dexpand"><span class="category">Function: </span><span><strong>treesit-pattern-expand</strong> <em>pattern</em><a href='#index-treesit_002dpattern_002dexpand' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function expands the s-expression <var>pattern</var> into a string
pattern.
</p></dd></dl>
<p>Finally, the tree-sitter project&rsquo;s documentation about
pattern-matching can be found at
<a href="https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries">https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries</a>.
</p>
</div>
<hr>
<div class="header">
<p>
Next: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>


@ -0,0 +1,362 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Retrieving Node (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Retrieving Node (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Retrieving Node (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Accessing-Node.html" rel="next" title="Accessing Node">
<link href="Using-Parser.html" rel="prev" title="Using Parser">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="section" id="Retrieving-Node">
<div class="header">
<p>
Next: <a href="Accessing-Node.html" accesskey="n" rel="next">Accessing Node Information</a>, Previous: <a href="Using-Parser.html" accesskey="p" rel="prev">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Retrieving-Node-1"></span><h3 class="section">37.3 Retrieving Node</h3>
<span id="index-tree_002dsitter-find-node"></span>
<span id="index-tree_002dsitter-get-node"></span>
<p>Before we continue, let&rsquo;s go over some conventions of tree-sitter
functions.
</p>
<p>We talk about a node being &ldquo;smaller&rdquo; or &ldquo;larger&rdquo;, and &ldquo;lower&rdquo; or
&ldquo;higher&rdquo;. A smaller and lower node is lower in the syntax tree and
therefore spans a smaller piece of text; a larger and higher node is
higher up in the syntax tree, containing many smaller nodes as its
children, and therefore spans a larger piece of text.
</p>
<p>When a function cannot find a node, it returns nil. For the
convenience of function chaining, all functions that take a node
as an argument and return a node also accept nil for that argument; in that
case, the function just returns nil.
</p>
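<p>The chaining convention can be illustrated with a short sketch. It
assumes that the current buffer already has a parser, and uses only
functions described in this section:
</p>
<div class="example">
<pre class="example">;; Grandparent of the node at point.  If any step fails to find
;; a node, the remaining calls receive nil and return nil, so
;; no explicit checks are needed in between.
(treesit-node-parent
 (treesit-node-parent
  (treesit-node-at (point))))
</pre></div>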
<span id="index-treesit_002dnode_002doutdated"></span>
<p>Nodes are not automatically updated when the associated buffer is
modified, and there is no way to update a node once it is retrieved.
Using an outdated node signals a <code>treesit-node-outdated</code> error.
</p>
<span id="Retrieving-node-from-syntax-tree"></span><h3 class="heading">Retrieving node from syntax tree</h3>
<dl class="def">
<dt id="index-treesit_002dnode_002dat"><span class="category">Function: </span><span><strong>treesit-node-at</strong> <em>point &amp;optional parser-or-lang named</em><a href='#index-treesit_002dnode_002dat' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the <em>smallest</em> node that starts at or after
<var>point</var>. In other words, the start of the node is greater than or
equal to <var>point</var>.
</p>
<p>When <var>parser-or-lang</var> is nil, this function uses the first parser
in <code>(treesit-parser-list)</code> in the current buffer. If
<var>parser-or-lang</var> is a parser object, it uses that parser; if
<var>parser-or-lang</var> is a language, it finds the first parser using
that language in <code>(treesit-parser-list)</code> and uses that.
</p>
<p>If <var>named</var> is non-nil, this function looks for a named node
only (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p>
<p>Example:
</p><div class="example">
<pre class="example">;; Find the node at point in a C parser's syntax tree.
(treesit-node-at (point) 'c)
</pre></div>
</dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002don"><span class="category">Function: </span><span><strong>treesit-node-on</strong> <em>beg end &amp;optional parser-or-lang named</em><a href='#index-treesit_002dnode_002don' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the <em>smallest</em> node that covers the span
from <var>beg</var> to <var>end</var>. In other words, the start of the node is
less than or equal to <var>beg</var>, and the end of the node is greater than or
equal to <var>end</var>.
</p>
<p><em>Beware</em> that calling this function on an empty line that is not
inside any top-level construct (function definition, etc.) most
probably will give you the root node, because the root node is the
smallest node that covers that empty line. Most of the time, you want
to use <code>treesit-node-at</code>.
</p>
<p>When <var>parser-or-lang</var> is nil, this function uses the first parser
in <code>(treesit-parser-list)</code> in the current buffer. If
<var>parser-or-lang</var> is a parser object, it uses that parser; if
<var>parser-or-lang</var> is a language, it finds the first parser using
that language in <code>(treesit-parser-list)</code> and uses that.
</p>
<p>If <var>named</var> is non-nil, this function looks for a named node only
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p></dd></dl>
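<p>As a small sketch of a typical use, the following finds the smallest
node that covers the active region. It assumes that the current buffer
has at least one parser, and leaves <var>parser-or-lang</var> unspecified
so that the first parser is used:
</p>
<div class="example">
<pre class="example">;; Smallest node covering the active region, if there is one.
(when (use-region-p)
  (treesit-node-on (region-beginning) (region-end)))
</pre></div>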
<dl class="def">
<dt id="index-treesit_002dparser_002droot_002dnode"><span class="category">Function: </span><span><strong>treesit-parser-root-node</strong> <em>parser</em><a href='#index-treesit_002dparser_002droot_002dnode' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the root node of the syntax tree generated by
<var>parser</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dbuffer_002droot_002dnode"><span class="category">Function: </span><span><strong>treesit-buffer-root-node</strong> <em>&amp;optional language</em><a href='#index-treesit_002dbuffer_002droot_002dnode' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the first parser that uses <var>language</var> in
<code>(treesit-parser-list)</code> in the current buffer, and returns the
root node of that parser. If it cannot find an appropriate parser,
nil is returned.
</p></dd></dl>
<p>Once we have a node, we can retrieve other nodes from it, or query for
information about this node.
</p>
<span id="Retrieving-node-from-other-nodes"></span><h3 class="heading">Retrieving node from other nodes</h3>
<span id="By-kinship"></span><h4 class="subheading">By kinship</h4>
<dl class="def">
<dt id="index-treesit_002dnode_002dparent"><span class="category">Function: </span><span><strong>treesit-node-parent</strong> <em>node</em><a href='#index-treesit_002dnode_002dparent' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the immediate parent of <var>node</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dchild"><span class="category">Function: </span><span><strong>treesit-node-child</strong> <em>node n &amp;optional named</em><a href='#index-treesit_002dnode_002dchild' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the <var>n</var>&rsquo;th child of <var>node</var>. If
<var>named</var> is non-nil, then it only counts named nodes
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>). For example, in a node
that represents a string: <code>&quot;text&quot;</code>, there are three child
nodes: the opening quote <code>&quot;</code>, the string content <code>text</code>, and
the closing quote <code>&quot;</code>. Among these nodes, the first child is
the opening quote <code>&quot;</code>, the first named child is the string
content <code>text</code>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dchildren"><span class="category">Function: </span><span><strong>treesit-node-children</strong> <em>node &amp;optional named</em><a href='#index-treesit_002dnode_002dchildren' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns all of <var>node</var>&rsquo;s children in a list. If
<var>named</var> is non-nil, then it only retrieves named nodes.
</p></dd></dl>
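<p>The difference between counting all children and counting only named
children can be sketched as follows. The example assumes that
<var>node</var> is a string node like the one described above, that child
indices start at zero, and that the grammar treats the string content as
a named node:
</p>
<div class="example">
<pre class="example">;; NODE is assumed to be a string node such as &quot;text&quot;.
(treesit-node-child node 0)     ; first child: the opening quote
(treesit-node-child node 0 t)   ; first named child: the content
(treesit-node-children node t)  ; list of named children only
</pre></div>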
<dl class="def">
<dt id="index-treesit_002dnext_002dsibling"><span class="category">Function: </span><span><strong>treesit-next-sibling</strong> <em>node &amp;optional named</em><a href='#index-treesit_002dnext_002dsibling' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the next sibling of <var>node</var>. If <var>named</var> is
non-nil, it finds the next named sibling.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dprev_002dsibling"><span class="category">Function: </span><span><strong>treesit-prev-sibling</strong> <em>node &amp;optional named</em><a href='#index-treesit_002dprev_002dsibling' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the previous sibling of <var>node</var>. If
<var>named</var> is non-nil, it finds the previous named sibling.
</p></dd></dl>
<span id="By-field-name"></span><h4 class="subheading">By field name</h4>
<p>To make the syntax tree easier to analyze, many language definitions
assign <em>field names</em> to child nodes (see <a href="Language-Definitions.html#tree_002dsitter-node-field-name">field name</a>). For example, a <code>function_definition</code> node
could have a <code>declarator</code> and a <code>body</code>.
</p>
<dl class="def">
<dt id="index-treesit_002dchild_002dby_002dfield_002dname"><span class="category">Function: </span><span><strong>treesit-child-by-field-name</strong> <em>node field-name</em><a href='#index-treesit_002dchild_002dby_002dfield_002dname' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the child of <var>node</var> that has <var>field-name</var>
as its field name.
</p>
<div class="example">
<pre class="example">;; Get the child that has &quot;body&quot; as its field name.
(treesit-child-by-field-name node &quot;body&quot;)
</pre></div>
</dd></dl>
<span id="By-position"></span><h4 class="subheading">By position</h4>
<dl class="def">
<dt id="index-treesit_002dfirst_002dchild_002dfor_002dpos"><span class="category">Function: </span><span><strong>treesit-first-child-for-pos</strong> <em>node pos &amp;optional named</em><a href='#index-treesit_002dfirst_002dchild_002dfor_002dpos' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the first child of <var>node</var> that extends beyond
<var>pos</var>. &ldquo;Extends beyond&rdquo; means that the child node ends at or
after <var>pos</var>. This function only looks at immediate children of
<var>node</var>, and doesn&rsquo;t look at its grandchildren. If <var>named</var> is
non-nil, it only looks for named children (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002ddescendant_002dfor_002drange"><span class="category">Function: </span><span><strong>treesit-node-descendant-for-range</strong> <em>node beg end &amp;optional named</em><a href='#index-treesit_002dnode_002ddescendant_002dfor_002drange' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the <em>smallest</em> descendant (child, grandchild, and so on) of
<var>node</var> that spans the range from <var>beg</var> to <var>end</var>. It is
similar to <code>treesit-node-at</code>. If <var>named</var> is non-nil, it only
looks for named descendants.
</p></dd></dl>
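<p>As a minimal sketch of the two position-based functions above, the
following assumes that <code>node</code> is already bound to a tree-sitter
node whose range covers point; that binding is an assumption made only
for illustration.
</p>
<div class="example">
<pre class="example">;; Assumes `node' is bound to a tree-sitter node that covers point.
;; First immediate child of NODE that ends at or after point.
(treesit-first-child-for-pos node (point))
;; Smallest named descendant of NODE that spans the current line.
(treesit-node-descendant-for-range
 node (line-beginning-position) (line-end-position) t)
</pre></div>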
<span id="Searching-for-node"></span><h3 class="heading">Searching for node</h3>
<dl class="def">
<dt id="index-treesit_002dsearch_002dsubtree"><span class="category">Function: </span><span><strong>treesit-search-subtree</strong> <em>node predicate &amp;optional all backward limit</em><a href='#index-treesit_002dsearch_002dsubtree' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function traverses the subtree of <var>node</var> (including
<var>node</var> itself), and matches <var>predicate</var> against each node along the way.
<var>predicate</var> is either a regexp that is matched (case-insensitively)
against each node&rsquo;s type, or a function that takes a node and returns
nil/non-nil. The first node that matches is returned; if no node
matches, nil is returned.
</p>
<p>By default, this function only traverses named nodes; if <var>all</var> is
non-nil, it traverses all nodes. If <var>backward</var> is non-nil, it
traverses backwards. If <var>limit</var> is non-nil, it only traverses
that number of levels down in the tree.
</p></dd></dl>
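<p>The two kinds of predicate might be used as sketched below. The
variable <code>root</code> is assumed to be bound to a node, and the node
type <code>function_definition</code> is borrowed from the field-name
discussion earlier on this page; the actual node types depend on the
language definition in use.
</p>
<div class="example">
<pre class="example">;; Assumes `root' is bound to a tree-sitter node.
;; Regexp predicate: matched case-insensitively against node types.
(treesit-search-subtree root &quot;function_definition&quot;)
;; Function predicate: accept only an exact type match.
(treesit-search-subtree
 root
 (lambda (n)
   (equal (treesit-node-type n) &quot;function_definition&quot;)))
</pre></div>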
<dl class="def">
<dt id="index-treesit_002dsearch_002dforward"><span class="category">Function: </span><span><strong>treesit-search-forward</strong> <em>start predicate &amp;optional all backward up</em><a href='#index-treesit_002dsearch_002dforward' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function is somewhat similar to <code>treesit-search-subtree</code>.
It also traverses the parse tree and matches each node against
<var>predicate</var> (except for <var>start</var>), where <var>predicate</var> can be
a (case-insensitive) regexp or a function. For a tree like the one below,
where <var>start</var> is marked 1, this function traverses as numbered:
</p>
<div class="example">
<pre class="example">              o
              |
     3--------4-----------8
     |        |           |
o--o-+--1  5--+--6    9---+-----12
|  |    |        |    |         |
o  o    2        7  +-+-+    +--+--+
                    |   |    |  |  |
                    10  11   13 14 15
</pre></div>
<p>Same as in <code>treesit-search-subtree</code>, this function only searches
for named nodes by default. But if <var>all</var> is non-nil, it searches
for all nodes. If <var>backward</var> is non-nil, it searches backwards.
</p>
<p>If <var>up</var> is non-nil, this function only traverses siblings
and parents. In the tree above, only nodes 1, 3, 4, and 8 would be traversed.
</p></dd></dl>
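<p>A short sketch of both modes of traversal follows. It assumes
<code>start</code> is bound to a node, and it uses <code>comment</code> as a
hypothetical node type; whether such a type exists depends on the
language definition.
</p>
<div class="example">
<pre class="example">;; Assumes `start' is bound to a tree-sitter node.
;; Find the next node after START whose type matches &quot;comment&quot;.
(treesit-search-forward start &quot;comment&quot;)
;; Same search, but only visit siblings and parents (UP is non-nil).
(treesit-search-forward start &quot;comment&quot; nil nil t)
</pre></div>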
<dl class="def">
<dt id="index-treesit_002dsearch_002dforward_002dgoto"><span class="category">Function: </span><span><strong>treesit-search-forward-goto</strong> <em>predicate side &amp;optional all backward up</em><a href='#index-treesit_002dsearch_002dforward_002dgoto' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function jumps to the start or end of the next node in the buffer
that matches <var>predicate</var>. Parameters <var>predicate</var>, <var>all</var>,
<var>backward</var>, and <var>up</var> are the same as in
<code>treesit-search-forward</code>. <var>side</var> controls which side of
the matched node to stop at; it can be <code>start</code> or <code>end</code>.
</p></dd></dl>
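<p>For instance, a command that moves between definitions could be
sketched as follows, using the argument order shown in the signature
above. The node type <code>function_definition</code> is an assumption
about the language definition.
</p>
<div class="example">
<pre class="example">;; Move point to the start of the next matching node.
(treesit-search-forward-goto &quot;function_definition&quot; 'start)
;; Search backward instead, and stop at the end of the match.
(treesit-search-forward-goto &quot;function_definition&quot; 'end nil t)
</pre></div>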
<dl class="def">
<dt id="index-treesit_002dinduce_002dsparse_002dtree"><span class="category">Function: </span><span><strong>treesit-induce-sparse-tree</strong> <em>root predicate &amp;optional process-fn limit</em><a href='#index-treesit_002dinduce_002dsparse_002dtree' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function creates a sparse tree from <var>root</var>&rsquo;s subtree.
</p>
<p>Basically, it takes the subtree under <var>root</var>, and combs it so only
the nodes that match <var>predicate</var> are left, like picking out grapes
on the vine. As in the previous functions, <var>predicate</var> can be a regexp
string that matches against each node&rsquo;s type case-insensitively, or a
function that takes a node and returns nil/non-nil.
</p>
<p>For example, for the subtree on the left, which consists of both numbers
and letters, if <var>predicate</var> is &ldquo;letter only&rdquo;, the returned tree
is the one on the right.
</p>
<div class="example">
<pre class="example">    a                 a                  a
    |                 |                  |
+---+---+         +---+---+          +---+---+
|   |   |         |   |   |          |   |   |
b   1   2         b   |   |          b   c   d
    |   |     =&gt;      |   |      =&gt;      |
    c   +--+          c   +              e
    |   |  |          |   |
 +--+   d  4       +--+   d
 |  |              |
 e  5              e
</pre></div>
<p>If <var>process-fn</var> is non-nil, instead of returning the matched
nodes, this function passes each node to <var>process-fn</var> and uses the
returned value instead. If non-nil, <var>limit</var> is the number of
levels to go down from <var>root</var>.
</p>
<p>Each node in the returned tree looks like <code>(<var>tree-sitter
node</var> . (<var>child</var> ...))</code>. The <var>tree-sitter node</var> of the root
of this tree will be nil if <var>root</var> doesn&rsquo;t match <var>predicate</var>. If
no node matches <var>predicate</var>, this function returns nil.
</p></dd></dl>
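<p>One possible use, sketched under the assumption that <code>root</code> is
bound to a node and that the language definition has a
<code>function_definition</code> node type, is to build an outline that
contains only definitions:
</p>
<div class="example">
<pre class="example">;; Assumes `root' is bound to a tree-sitter node.
;; Keep only nodes whose type matches the regexp, and store each
;; node's type string in the result instead of the node itself.
(treesit-induce-sparse-tree
 root &quot;function_definition&quot; #'treesit-node-type)
</pre></div>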
<span id="More-convenient-functions"></span><h3 class="heading">More convenient functions</h3>
<dl class="def">
<dt id="index-treesit_002dfilter_002dchild"><span class="category">Function: </span><span><strong>treesit-filter-child</strong> <em>node pred &amp;optional named</em><a href='#index-treesit_002dfilter_002dchild' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the immediate children of <var>node</var> that satisfy
<var>pred</var>.
</p>
<p>Function <var>pred</var> takes the child node as the argument and should
return non-nil to indicate that the child should be kept. If <var>named</var>
is non-nil, this function only searches for named nodes.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparent_002duntil"><span class="category">Function: </span><span><strong>treesit-parent-until</strong> <em>node pred</em><a href='#index-treesit_002dparent_002duntil' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function repeatedly finds the parent of <var>node</var>, and returns
the parent if it satisfies <var>pred</var> (which takes the parent as the
argument). If no parent satisfies <var>pred</var>, this function returns
nil.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparent_002dwhile"><span class="category">Function: </span><span><strong>treesit-parent-while</strong> <em>node pred</em><a href='#index-treesit_002dparent_002dwhile' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function repeatedly finds the parent of <var>node</var>, and keeps
doing so as long as the parent satisfies <var>pred</var> (which takes the
parent as the single argument). I.e., this function returns the
farthest parent that still satisfies <var>pred</var>.
</p></dd></dl>
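<p>The convenience functions above might be combined with
<code>treesit-node-type</code> as in the following sketch. It assumes
<code>node</code> is bound to a node; the node types
<code>identifier</code> and <code>function_definition</code> are
illustrative and depend on the language definition.
</p>
<div class="example">
<pre class="example">;; Assumes `node' is bound to a tree-sitter node.
;; Immediate named children of NODE whose type is &quot;identifier&quot;.
(treesit-filter-child
 node
 (lambda (child)
   (equal (treesit-node-type child) &quot;identifier&quot;))
 t)
;; Nearest parent of NODE that is a function definition, or nil.
(treesit-parent-until
 node
 (lambda (parent)
   (equal (treesit-node-type parent) &quot;function_definition&quot;)))
</pre></div>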
</div>
<hr>
<div class="header">
<p>
Next: <a href="Accessing-Node.html">Accessing Node Information</a>, Previous: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>

@ -0,0 +1,212 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Tree-sitter C API (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Tree-sitter C API (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Tree-sitter C API (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Multiple-Languages.html" rel="prev" title="Multiple Languages">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="section" id="Tree_002dsitter-C-API">
<div class="header">
<p>
Previous: <a href="Multiple-Languages.html" accesskey="p" rel="prev">Parsing Text in Multiple Languages</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Tree_002dsitter-C-API-Correspondence"></span><h3 class="section">37.7 Tree-sitter C API Correspondence</h3>
<p>Emacs&rsquo; tree-sitter integration doesn&rsquo;t expose every feature
tree-sitter&rsquo;s C API provides. Missing features include:
</p>
<ul>
<li> Creating a tree cursor and navigating the syntax tree with it.
</li><li> Setting timeout and cancellation flag for a parser.
</li><li> Setting the logger for a parser.
</li><li> Printing a DOT graph of the syntax tree to a file.
</li><li> Copying and modifying a syntax tree. (Emacs doesn&rsquo;t expose a tree
object.)
</li><li> Using (row, column) coordinates as position.
</li><li> Updating a node with changes. (In Emacs, retrieve a new node instead
of updating the existing one.)
</li><li> Querying statistics of a language definition.
</li></ul>
<p>In addition, Emacs makes some changes to the C API to make the API more
convenient and idiomatic:
</p>
<ul>
<li> Instead of using byte positions, the ELisp API uses character
positions.
</li><li> Null nodes are converted to nil.
</li></ul>
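<p>The difference between character positions and byte positions can be
seen with standard Emacs functions, independently of tree-sitter. The
sketch below is only an illustration of the two kinds of position; it
does not call any tree-sitter function.
</p>
<div class="example">
<pre class="example">(with-temp-buffer
  (insert &quot;λ = 1&quot;)
  ;; `point' counts characters, `position-bytes' counts bytes.
  ;; The two values differ here because &quot;λ&quot; occupies more than
  ;; one byte in the buffer.
  (list (point) (position-bytes (point))))
</pre></div>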
<p>Below is the correspondence between all C API functions and their
ELisp counterparts. Sometimes one ELisp function corresponds to
multiple C functions, and many C functions don&rsquo;t have an ELisp
counterpart.
</p>
<div class="example">
<pre class="example">ts_parser_new treesit-parser-create
ts_parser_delete
ts_parser_set_language
ts_parser_language treesit-parser-language
ts_parser_set_included_ranges treesit-parser-set-included-ranges
ts_parser_included_ranges treesit-parser-included-ranges
ts_parser_parse
ts_parser_parse_string treesit-parse-string
ts_parser_parse_string_encoding
ts_parser_reset
ts_parser_set_timeout_micros
ts_parser_timeout_micros
ts_parser_set_cancellation_flag
ts_parser_cancellation_flag
ts_parser_set_logger
ts_parser_logger
ts_parser_print_dot_graphs
ts_tree_copy
ts_tree_delete
ts_tree_root_node
ts_tree_language
ts_tree_edit
ts_tree_get_changed_ranges
ts_tree_print_dot_graph
ts_node_type treesit-node-type
ts_node_symbol
ts_node_start_byte treesit-node-start
ts_node_start_point
ts_node_end_byte treesit-node-end
ts_node_end_point
ts_node_string treesit-node-string
ts_node_is_null
ts_node_is_named treesit-node-check
ts_node_is_missing treesit-node-check
ts_node_is_extra treesit-node-check
ts_node_has_changes treesit-node-check
ts_node_has_error treesit-node-check
ts_node_parent treesit-node-parent
ts_node_child treesit-node-child
ts_node_field_name_for_child treesit-node-field-name-for-child
ts_node_child_count treesit-node-child-count
ts_node_named_child treesit-node-child
ts_node_named_child_count treesit-node-child-count
ts_node_child_by_field_name treesit-child-by-field-name
ts_node_child_by_field_id
ts_node_next_sibling treesit-next-sibling
ts_node_prev_sibling treesit-prev-sibling
ts_node_next_named_sibling treesit-next-sibling
ts_node_prev_named_sibling treesit-prev-sibling
ts_node_first_child_for_byte treesit-first-child-for-pos
ts_node_first_named_child_for_byte treesit-first-child-for-pos
ts_node_descendant_for_byte_range treesit-node-descendant-for-range
ts_node_descendant_for_point_range
ts_node_named_descendant_for_byte_range treesit-node-descendant-for-range
ts_node_named_descendant_for_point_range
ts_node_edit
ts_node_eq treesit-node-eq
ts_tree_cursor_new
ts_tree_cursor_delete
ts_tree_cursor_reset
ts_tree_cursor_current_node
ts_tree_cursor_current_field_name
ts_tree_cursor_current_field_id
ts_tree_cursor_goto_parent
ts_tree_cursor_goto_next_sibling
ts_tree_cursor_goto_first_child
ts_tree_cursor_goto_first_child_for_byte
ts_tree_cursor_goto_first_child_for_point
ts_tree_cursor_copy
ts_query_new
ts_query_delete
ts_query_pattern_count
ts_query_capture_count
ts_query_string_count
ts_query_start_byte_for_pattern
ts_query_predicates_for_pattern
ts_query_step_is_definite
ts_query_capture_name_for_id
ts_query_string_value_for_id
ts_query_disable_capture
ts_query_disable_pattern
ts_query_cursor_new
ts_query_cursor_delete
ts_query_cursor_exec treesit-query-capture
ts_query_cursor_did_exceed_match_limit
ts_query_cursor_match_limit
ts_query_cursor_set_match_limit
ts_query_cursor_set_byte_range
ts_query_cursor_set_point_range
ts_query_cursor_next_match
ts_query_cursor_remove_match
ts_query_cursor_next_capture
ts_language_symbol_count
ts_language_symbol_name
ts_language_symbol_for_name
ts_language_field_count
ts_language_field_name_for_id
ts_language_field_id_for_name
ts_language_symbol_type
ts_language_version
</pre></div>
</div>
<hr>
<div class="header">
<p>
Previous: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>

@ -0,0 +1,186 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Using Parser (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Using Parser (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Using Parser (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Retrieving-Node.html" rel="next" title="Retrieving Node">
<link href="Language-Definitions.html" rel="prev" title="Language Definitions">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="section" id="Using-Parser">
<div class="header">
<p>
Next: <a href="Retrieving-Node.html" accesskey="n" rel="next">Retrieving Node</a>, Previous: <a href="Language-Definitions.html" accesskey="p" rel="prev">Tree-sitter Language Definitions</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Using-Tree_002dsitter-Parser"></span><h3 class="section">37.2 Using Tree-sitter Parser</h3>
<span id="index-Tree_002dsitter-parser"></span>
<p>This section describes how to create and configure a tree-sitter
parser. In Emacs, each tree-sitter parser is associated with a
buffer. As we edit the buffer, the associated parser and its syntax
tree are automatically kept up-to-date.
</p>
<dl class="def">
<dt id="index-treesit_002dmax_002dbuffer_002dsize"><span class="category">Variable: </span><span><strong>treesit-max-buffer-size</strong><a href='#index-treesit_002dmax_002dbuffer_002dsize' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This variable contains the maximum size of buffers in which
tree-sitter can be activated. Major modes should check this value
when deciding whether to enable tree-sitter features.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dcan_002denable_002dp"><span class="category">Function: </span><span><strong>treesit-can-enable-p</strong><a href='#index-treesit_002dcan_002denable_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function checks whether the current buffer is suitable for
activating tree-sitter features. It basically checks
<code>treesit-available-p</code> and <code>treesit-max-buffer-size</code>.
</p></dd></dl>
<span id="index-Creating-tree_002dsitter-parsers"></span>
<dl class="def">
<dt id="index-treesit_002dparser_002dcreate"><span class="category">Function: </span><span><strong>treesit-parser-create</strong> <em>language &amp;optional buffer no-reuse</em><a href='#index-treesit_002dparser_002dcreate' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>To create a parser, we provide a <var>buffer</var> and the <var>language</var>
to use (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>). If <var>buffer</var> is nil, the
current buffer is used.
</p>
<p>By default, this function reuses a parser if one already exists for
<var>language</var> in <var>buffer</var>. If <var>no-reuse</var> is non-nil, this
function always creates a new parser.
</p></dd></dl>
<p>Given a parser, we can query information about it:
</p>
<dl class="def">
<dt id="index-treesit_002dparser_002dbuffer"><span class="category">Function: </span><span><strong>treesit-parser-buffer</strong> <em>parser</em><a href='#index-treesit_002dparser_002dbuffer' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Returns the buffer associated with <var>parser</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparser_002dlanguage"><span class="category">Function: </span><span><strong>treesit-parser-language</strong> <em>parser</em><a href='#index-treesit_002dparser_002dlanguage' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Returns the language that <var>parser</var> uses.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparser_002dp"><span class="category">Function: </span><span><strong>treesit-parser-p</strong> <em>object</em><a href='#index-treesit_002dparser_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function checks whether <var>object</var> is a tree-sitter parser. It
returns non-nil if it is, and nil otherwise.
</p></dd></dl>
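<p>Putting these functions together, a major mode might create a parser
and inspect it roughly as follows. This is a sketch that assumes a
language definition for C is installed and that languages are named by
symbols such as <code>c</code> (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>).
</p>
<div class="example">
<pre class="example">;; Assumes the C language definition is available.
(when (treesit-can-enable-p)
  (let ((parser (treesit-parser-create 'c)))
    (list (treesit-parser-p parser)
          (treesit-parser-language parser)
          (eq (treesit-parser-buffer parser) (current-buffer)))))
</pre></div>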
<p>There is no need to explicitly parse a buffer, because parsing is done
automatically and lazily. A parser only parses when we query for a
node in its syntax tree. Therefore, when a parser is first created,
it doesn&rsquo;t parse the buffer; it waits until we query for a node for
the first time. Similarly, when some change is made in the buffer, a
parser doesn&rsquo;t re-parse immediately.
</p>
<span id="index-treesit_002dbuffer_002dtoo_002dlarge"></span>
<p>When a parser does parse, it checks the size of the buffer.
Tree-sitter can only handle buffers no larger than about 4GB. If the
size exceeds that, Emacs signals <code>treesit-buffer-too-large</code>
with the buffer size as the signal data.
</p>
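<p>Code that queries for a node can guard against this signal with
<code>condition-case</code>. The sketch below assumes that the current
buffer already has a parser and that calling
<code>treesit-node-at</code> with only a position is enough to trigger a
parse; the message text is made up for illustration.
</p>
<div class="example">
<pre class="example">;; Assumes the current buffer already has a tree-sitter parser.
(condition-case err
    (treesit-node-at (point))
  (treesit-buffer-too-large
   ;; The signal data carries the buffer size.
   (message &quot;Buffer too large for tree-sitter: %S&quot; (cdr err))))
</pre></div>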
<p>Once a parser is created, Emacs automatically adds it to the
internal parser list. Every time a change is made to the buffer,
Emacs updates parsers in this list so they can update their syntax
tree incrementally.
</p>
<dl class="def">
<dt id="index-treesit_002dparser_002dlist"><span class="category">Function: </span><span><strong>treesit-parser-list</strong> <em>&amp;optional buffer</em><a href='#index-treesit_002dparser_002dlist' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the parser list of <var>buffer</var>.
<var>buffer</var> defaults to the current buffer.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparser_002ddelete"><span class="category">Function: </span><span><strong>treesit-parser-delete</strong> <em>parser</em><a href='#index-treesit_002dparser_002ddelete' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function deletes <var>parser</var>.
</p></dd></dl>
<span id="index-tree_002dsitter-narrowing"></span>
<span id="tree_002dsitter-narrowing"></span><p>Normally, a parser &ldquo;sees&rdquo; the whole
buffer, but when the buffer is narrowed (see <a href="Narrowing.html">Narrowing</a>), the
parser will only see the visible region. As far as the parser can
tell, the hidden region is deleted. And when the buffer is later
widened, the parser thinks text is inserted in the beginning and in
the end. Although parsers respect narrowing, narrowing shouldn&rsquo;t be
the means of handling a multi-language buffer; instead, set the ranges in
which a parser should operate. See <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>.
</p>
<p>Because a parser parses lazily, when we narrow the buffer, the parser
is not affected immediately; as long as we don&rsquo;t query for a node
while the buffer is narrowed, the parser is oblivious of the
narrowing.
</p>
<span id="index-tree_002dsitter-parse-string"></span>
<dl class="def">
<dt id="index-treesit_002dparse_002dstring"><span class="category">Function: </span><span><strong>treesit-parse-string</strong> <em>string language</em><a href='#index-treesit_002dparse_002dstring' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Besides creating a parser for a buffer, we can also just parse a
string. Unlike a buffer, parsing a string is a one-time deal, and
there is no way to update the result.
</p>
<p>This function parses <var>string</var> with <var>language</var>, and returns the
root node of the generated syntax tree.
</p></dd></dl>
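<p>A one-off parse might look like the sketch below, which assumes a
language definition for C is installed and prints the resulting tree
with <code>treesit-node-string</code>.
</p>
<div class="example">
<pre class="example">;; Assumes the C language definition is available.
(let ((root (treesit-parse-string &quot;int c = 0;&quot; 'c)))
  ;; Return the string representation of the syntax tree.
  (treesit-node-string root))
</pre></div>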
</div>
<hr>
<div class="header">
<p>
Next: <a href="Retrieving-Node.html">Retrieving Node</a>, Previous: <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>

@ -0,0 +1,23 @@
#!/bin/bash
MANUAL_DIR="../../../doc/lispref"
THIS_DIR=$(pwd)
echo "Build manual"
cd "${MANUAL_DIR}" || exit 1
make elisp.html HTML_OPTS="--html --css-ref=./manual.css" || exit 1
cd "${THIS_DIR}" || exit 1
echo "Copy manual"
cp -f "${MANUAL_DIR}/elisp.html/Parsing-Program-Source.html" .
cp -f "${MANUAL_DIR}/elisp.html/Language-Definitions.html" .
cp -f "${MANUAL_DIR}/elisp.html/Using-Parser.html" .
cp -f "${MANUAL_DIR}/elisp.html/Retrieving-Node.html" .
cp -f "${MANUAL_DIR}/elisp.html/Accessing-Node.html" .
cp -f "${MANUAL_DIR}/elisp.html/Pattern-Matching.html" .
cp -f "${MANUAL_DIR}/elisp.html/Multiple-Languages.html" .
cp -f "${MANUAL_DIR}/elisp.html/Tree_002dsitter-C-API.html" .
cp -f "${MANUAL_DIR}/elisp.html/Parser_002dbased-Font-Lock.html" .
cp -f "${MANUAL_DIR}/elisp.html/Parser_002dbased-Indentation.html" .

@ -0,0 +1,374 @@
/* Style-sheet to use for Emacs manuals */
/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. This file is offered as-is,
without any warranty.
*/
/* style.css begins here */
/* This stylesheet is used by manuals and a few older resources. */
/* reset.css begins here */
/*
Software License Agreement (BSD License)
Copyright (c) 2006, Yahoo! Inc.
All rights reserved.
Redistribution and use of this software in source and
binary forms, with or without modification, are permitted
provided that the following conditions are met:
* Redistributions of source code must retain the above
copyright notice, this list of conditions and the
following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the
following disclaimer in the documentation and/or other
materials provided with the distribution.
* Neither the name of Yahoo! Inc. nor the names of its
contributors may be used to endorse or promote products
derived from this software without specific prior
written permission of Yahoo! Inc.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
*/
html {
color: #000;
background: #FFF;
}
body, div, dl, dt, dd, ul, ol, li, h1, h2, h3, h4,
h5, h6, pre, code, form, fieldset, legend, input,
button, textarea, p, blockquote, th, td {
margin: 0;
padding: 0;
}
table {
border-collapse: collapse;
border-spacing: 0;
}
fieldset, img {
border: 0;
}
address, caption, cite, code, dfn, em, strong,
th, var, optgroup {
font-style: inherit;
font-weight: inherit;
}
del, ins {
text-decoration: none;
}
li {
list-style:none;
}
caption, th {
text-align: left;
}
h1, h2, h3, h4, h5, h6 {
font-size: 100%;
font-weight: normal;
}
q:before, q:after {
content:'';
}
abbr, acronym {
border: 0;
font-variant: normal;
}
sup {
vertical-align: baseline;
}
sub {
vertical-align: baseline;
}
legend {
color: #000;
}
input, button, textarea, select, optgroup, option {
font-family: inherit;
font-size: inherit;
font-style: inherit;
font-weight: inherit;
}
input, button, textarea, select {
*font-size: 100%;
}
/* reset.css ends here */
/*** PAGE LAYOUT ***/
html, body {
font-size: 1em;
text-align: left;
text-decoration: none;
}
html { background-color: #e7e7e7; }
body {
max-width: 74.92em;
margin: 0 auto;
padding: .5em 1em 1em 1em;
background-color: white;
border: .1em solid #c0c0c0;
}
/*** BASIC ELEMENTS ***/
/* Size and positioning */
p, pre, li, dt, dd, table, code, address { line-height: 1.3em; }
h1 { font-size: 2em; margin: 1em 0 }
h2 { font-size: 1.50em; margin: 1.0em 0 0.87em 0; }
h3 { font-size: 1.30em; margin: 1.0em 0 0.87em 0; }
h4 { font-size: 1.13em; margin: 1.0em 0 0.88em 0; }
h5 { font-size: 1.00em; margin: 1.0em 0 1.00em 0; }
p, pre { margin: 1em 0; }
pre { overflow: auto; padding-bottom: .3em; }
ul, ol, blockquote { margin-left: 1.5%; margin-right: 1.5%; }
hr { margin: 1em 0; }
/* Lists of underlined links are difficult to read. The top margin
gives a little more spacing between entries. */
ul li { margin: .5em 1em; }
ol li { margin: 1em; }
ol ul li { margin: .5em 1em; }
ul li p, ul ul li { margin-top: .3em; margin-bottom: .3em; }
ul ul, ol ul { margin-top: 0; margin-bottom: 0; }
/* Separate description lists from preceding text */
dl { margin: 1em 0 0 0; }
/* separate the "term" from subsequent "description" */
dt { margin: .5em 0; }
/* separate the "description" from subsequent list item
when the final <dd> child is an anonymous box */
dd { margin: .5em 3% 1em 3%; }
/* separate anonymous box (used to be the first element in <dd>)
from subsequent <p> */
dd p { margin: .5em 0; }
table {
display: block; overflow: auto;
margin-top: 1.5em; margin-bottom: 1.5em;
}
th { padding: .3em .5em; text-align: center; }
td { padding: .2em .5em; }
address { margin-bottom: 1em; }
caption { margin-bottom: .5em; text-align: center; }
sup { vertical-align: super; }
sub { vertical-align: sub; }
/* Style */
h1, h2, h3, h4, h5, h6, strong, dt, th { font-weight: bold; }
/* The default color (black) is too dark for large text in
bold font. */
h1, h2, h3, h4 { color: #333; }
h5, h6, dt { color: #222; }
a[href] { color: #005090; }
a[href]:visited { color: #100070; }
a[href]:active, a[href]:hover {
color: #100070;
text-decoration: none;
}
h1 a[href]:visited, h2 a[href]:visited, h3 a[href]:visited,
h4 a[href]:visited { color: #005090; }
h1 a[href]:hover, h2 a[href]:hover, h3 a[href]:hover,
h4 a[href]:hover { color: #100070; }
ol { list-style: decimal outside;}
ul { list-style: square outside; }
ul ul, ol ul { list-style: circle; }
li { list-style: inherit; }
hr { background-color: #ede6d5; }
table { border: 0; }
abbr,acronym {
border-bottom:1px dotted #000;
text-decoration: none;
cursor:help;
}
del { text-decoration: line-through; }
em { font-style: italic; }
small { font-size: .9em; }
img { max-width: 100%}
/*** SIMPLE CLASSES ***/
.center, .c { text-align: center; }
.nocenter{ text-align: left; }
.underline { text-decoration: underline; }
.nounderline { text-decoration: none; }
.no-bullet { list-style: none; }
.inline-list li { display: inline }
.netscape4, .no-display { display: none; }
/*** MANUAL PAGES ***/
/* This makes the very long tables of contents in Gnulib and other
manuals easier to read. */
.contents ul, .shortcontents ul { font-weight: bold; }
.contents ul ul, .shortcontents ul ul { font-weight: normal; }
.contents ul { list-style: none; }
/* For colored navigation bars (Emacs manual): make the bar extend
across the whole width of the page and give it a decent height. */
.header, .node { margin: 0 -1em; padding: 0 1em; }
.header p, .node p { line-height: 2em; }
/* For navigation links */
.node a, .header a { display: inline-block; line-height: 2em; }
.node a:hover, .header a:hover { background: #f2efe4; }
/* Inserts */
table.cartouche td { padding: 1.5em; }
div.display, div.lisp, div.smalldisplay,
div.smallexample, div.smalllisp { margin-left: 3%; }
div.example { padding: .8em 1.2em .4em; }
pre.example { padding: .8em 1.2em; }
div.example, pre.example {
margin: 1em 0 1em 3% ;
-webkit-border-radius: .3em;
-moz-border-radius: .3em;
border-radius: .3em;
border: 1px solid #d4cbb6;
background-color: #f2efe4;
}
div.example > pre.example {
padding: 0 0 .4em;
margin: 0;
border: none;
}
pre.menu-comment { padding-top: 1.3em; margin: 0; }
/*** FOR WIDE SCREENS ***/
@media (min-width: 40em) {
body { padding: .5em 3em 1em 3em; }
div.header, div.node { margin: 0 -3em; padding: 0 3em; }
}
/* style.css ends here */
/* makeinfo converts @deffn and similar functions to something inside
<blockquote>. style.css uses italic for blockquote. This looks poor
in the Emacs manuals, which make extensive use of @defun (etc).
In particular, references to function arguments appear as <var>
inside <blockquote>. Since <var> is also italic, it makes it
impossible to distinguish variables. We could change <var> to
e.g. bold-italic, or normal, or a different color, but that does
not look as good IMO. So we just override blockquote to be non-italic.
*/
blockquote { font-style: normal; }
var { font-style: italic; }
div.header {
background-color: #DDDDFF;
padding-top: 0.2em;
}
/*** Customization ***/
body {
font-family: Charter, serif;
font-size: 14pt;
line-height: 1.4;
background-color: #fefefc;
color: #202010;
}
pre.menu-comment {
font-family: Charter, serif;
font-size: 14pt;
}
body > *, body > div.display, body > div.lisp, body > div.smalldisplay,
body > div.example, body > div.smallexample, body > div.smalllisp {
width: 700px;
margin-left: auto;
margin-right: auto;
}
div.header {
width: 100%;
min-height: 3em;
font-size: 13pt;
}
/* Documentation block for functions and variables. Make them
narrower. */
dd {
margin: .5em 6% 1em 6%
}
code, pre, kbd, samp, tt {
font-size: 12pt;
font-family: monospace;
}
/* Each node has an index table of all its sub-nodes. Make more space
for the first column, which holds the name of each sub-node. */
table.menu tbody tr td:nth-child(1) {
white-space: nowrap;
}
div.header p {
text-align: center;
margin: 0.5em auto 0.5em auto;
}


@ -0,0 +1,442 @@
STARTER GUIDE ON WRITING MAJOR MODE WITH TREE-SITTER -*- org -*-
This document guides you on adding tree-sitter support to a major
mode.
TOC:
- Building Emacs with tree-sitter
- Install language definitions
- Setup
- Font-lock
- Indent
- Imenu
- Navigation
- Which-func
- More features?
- Common tasks (code snippets)
- Manual
* Building Emacs with tree-sitter
You can install tree-sitter either with your package manager or from
source:
git clone https://github.com/tree-sitter/tree-sitter.git
cd tree-sitter
make
make install
Then pull the tree-sitter branch (or the master branch, if the
tree-sitter branch has been merged) and rebuild Emacs.
* Install language definitions
Tree-sitter by itself doesn't know how to parse any particular
language. We need to install language definitions (or “grammars”) for
a language to be able to parse it. There are a couple of ways to get
them.
You can use this script that I put together:
https://github.com/casouri/tree-sitter-module
A copy of the script is also in this directory, under /build-module.
This script automatically pulls and builds language definitions for C,
C++, Rust, JSON, Go, HTML, JavaScript, CSS, Python, TypeScript,
and C#. Better yet, I pre-built these language definitions for
GNU/Linux and macOS; they can be downloaded here:
https://github.com/casouri/tree-sitter-module/releases/tag/v2.1
To build them yourself, run
git clone git@github.com:casouri/tree-sitter-module.git
cd tree-sitter-module
./batch.sh
and language definitions will be in the /dist directory. You can
either copy them to standard dynamic library locations of your system,
eg, /usr/local/lib, or leave them in /dist and later tell Emacs where
to find language definitions by setting treesit-extra-load-path.
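For instance, assuming the repository was cloned to
~/tree-sitter-module (a hypothetical location, adjust it to your own
setup), something like this in your init file should let Emacs find
the modules left in /dist:

#+begin_src elisp
;; Hypothetical clone location; change it to wherever dist/ really is.
(setq treesit-extra-load-path
      (list (expand-file-name "~/tree-sitter-module/dist")))
#+end_src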
Language definition sources can be found on GitHub under
tree-sitter/xxx, like tree-sitter/tree-sitter-python. The tree-sitter
organization has all the "official" language definitions:
https://github.com/tree-sitter
* Setting up for adding major mode features
Start Emacs, and load tree-sitter with
(require 'treesit)
Now check whether Emacs was built with the tree-sitter library:
(treesit-available-p)
For your major mode, first create a tree-sitter switch:
#+begin_src elisp
(defcustom python-use-tree-sitter nil
  "If non-nil, `python-mode' tries to use tree-sitter.
Currently `python-mode' can utilize tree-sitter for font-locking,
imenu, and movement functions."
  :type 'boolean)
#+end_src
Then, in other places, we decide whether to enable tree-sitter with
#+begin_src elisp
(and python-use-tree-sitter
     (treesit-can-enable-p))
#+end_src
* Font-lock
Tree-sitter works like this: You provide a query made of patterns and
capture names; tree-sitter finds the nodes that match these patterns,
tags them with the corresponding capture names, and returns them to
you. The query function returns a list of (capture-name . node). For
font-lock, we use face names as capture names, and each captured node
is fontified with the face named by its capture name. The capture
name can also be a function, in which case (START END NODE) is passed
to the function for font-lock. START and END are the start and end of
the captured NODE.
** Query syntax
There are two types of nodes: named nodes, like (identifier) and
(function_definition), and anonymous nodes, like "return", "def", "(",
and "}". A parent-child relationship is expressed as
(parent (child) (child) (child (grand_child)))
Eg, an argument list (1, "3", 1) could be:
(argument_list "(" (number) (string) (number) ")")
Children can have field names in their parent:
(function_definition name: (identifier) type: (identifier))
Match any one of a list of alternatives:
["true" "false" "none"]
Capture names can come after any node in the pattern:
(parent (child) @child) @parent
The query above captures both parent and child.
["return" "continue" "break"] @keyword
The query above captures all the keywords with capture name
"keyword".
This is the most common syntax; the manual describes all of it (see
the "Parsing Program Source" section).
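As a rough illustration of the syntax above, a query can be tried out
directly with treesit-query-capture. This is only a sketch: it assumes
the current buffer holds Python code and already has a parser (created
with treesit-parser-create), and the capture name @name is an
arbitrary choice for this example.

#+begin_src elisp
;; Capture the name of every function definition in the buffer.
;; Each element of the result is a cons (capture-name . node).
(treesit-query-capture
 (treesit-buffer-root-node)
 '((function_definition name: (identifier) @name)))
#+end_src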
** Query references
But how does one come up with the queries? Take Python as an
example: open any Python source file and evaluate
(treesit-parser-create 'python)
so that a parser is available, then enable treesit-inspect-mode.
Now you should see information about the node at point in the
mode line. Move around and you should be able to get a good
picture. Besides this, you can consult the grammar of the language
definition. For example, Python's grammar file is at
https://github.com/tree-sitter/tree-sitter-python/blob/master/grammar.js
Neovim also has a bunch of queries to reference:
https://github.com/nvim-treesitter/nvim-treesitter/tree/master/queries
The manual explains how to read grammar files at the bottom of the
section "Tree-sitter Language Definitions".
** Debugging queries
If your query has problems, it usually fails to compile. In that case,
use treesit-query-validate to debug the query. It pops up a buffer
containing the query (in text format) and marks the offending part in
red.
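A minimal sketch of such a check, assuming treesit-query-validate
takes the language followed by the query. The node type
function_defn is deliberately misspelled here (it is not a real node
type in the Python grammar) so that there is something to report:

#+begin_src elisp
;; "function_defn" is an intentional mistake for "function_definition".
(treesit-query-validate
 'python
 '((function_defn name: (identifier) @name)))
#+end_src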
** Code
To enable tree-sitter font-lock, set treesit-font-lock-settings
buffer-locally and call treesit-font-lock-enable. For example, see
python--treesit-settings in python.el. Below I paste a snippet of
it.
Note that, as with the current font-lock, if the to-be-fontified region
already has a face (ie, an earlier match fontified part or all of the
region), the new face is discarded rather than applied. If you want
later matches to always override earlier matches, use the :override
keyword.
#+begin_src elisp
(defvar python--treesit-settings
  (treesit-font-lock-rules
   :language 'python
   :override t
   `(;; Queries for def and class.
     (function_definition
      name: (identifier) @font-lock-function-name-face)
     (class_definition
      name: (identifier) @font-lock-type-face)
     ;; Comment and string.
     (comment) @font-lock-comment-face
     ...)))
#+end_src
Then in python-mode, enable tree-sitter font-lock:
#+begin_src elisp
(treesit-parser-create 'python)
;; This turns off the syntax-based font-lock for comments and
;; strings, so it doesn't override tree-sitter's fontification.
(setq-local font-lock-keywords-only t)
(setq-local treesit-font-lock-settings
            python--treesit-settings)
(treesit-font-lock-enable)
#+end_src
Concretely, something like this:
#+begin_src elisp
(define-derived-mode python-mode prog-mode "Python"
  ...
  (if (and python-use-tree-sitter
           (treesit-can-enable-p))
      ;; Tree-sitter.  Create the parser only in this branch, so the
      ;; mode still works when tree-sitter is unavailable.
      (progn
        (treesit-parser-create 'python)
        (setq-local font-lock-keywords-only t)
        (setq-local treesit-font-lock-settings
                    python--treesit-settings)
        (treesit-font-lock-enable))
    ;; No tree-sitter
    (setq-local font-lock-defaults ...))
  ...)
#+end_src
You'll notice that tree-sitter's font-lock doesn't respect
font-lock-maximum-decoration. Major modes are free to set
treesit-font-lock-settings based on the value of
font-lock-maximum-decoration, or to provide more fine-grained control
through other mode-specific means.
* Indent
Indent works like this: We have a bunch of rules that look like this:
(MATCHER ANCHOR OFFSET)
Initially, point is at the beginning of a line (BOL), and we want to
know which column to indent this line to. Let NODE be the node at
point. We pass this node to the MATCHER of each rule until one of
them matches the node ("this node is a closing bracket!"). Then we
pass the node to that rule's ANCHOR, which returns a point, eg, the
BOL of the previous line. We find the column number of that point
(eg, 4), add OFFSET to it (eg, 0), and that is the column we want to
indent the current line to (4 + 0 = 4).
For MATCHER we have
(parent-is TYPE)
(node-is TYPE)
(query QUERY) => matches if querying PARENT with QUERY
captures NODE.
(match NODE-TYPE PARENT-TYPE NODE-FIELD
NODE-INDEX-MIN NODE-INDEX-MAX)
=> checks everything. If an argument is nil, don't match that. Eg,
(match nil TYPE) is the same as (parent-is TYPE)
For ANCHOR we have
first-sibling => start of the first sibling
parent => start of parent
parent-bol => BOL of the line parent is on.
prev-sibling => start of the previous sibling
no-indent => don't indent
prev-line => same indent as previous line
There is also a manual section for indent: "Parser-based Indentation".
When writing indent rules, you can use treesit-check-indent to
check whether your indentation is correct. To debug what went wrong,
set treesit--indent-verbose to non-nil. Then, when you indent, Emacs
tells you in the echo area which rule was applied.
#+begin_src elisp
(defvar typescript-mode-indent-rules
  (let ((offset typescript-indent-offset))
    `((typescript
       ;; This rule matches if node at point is "}", ANCHOR is the
       ;; parent node's BOL, and offset is 0.
       ((node-is "}") parent-bol 0)
       ((node-is ")") parent-bol 0)
       ((node-is "]") parent-bol 0)
       ((node-is ">") parent-bol 0)
       ((node-is ".") parent-bol ,offset)
       ((parent-is "ternary_expression") parent-bol ,offset)
       ((parent-is "named_imports") parent-bol ,offset)
       ((parent-is "statement_block") parent-bol ,offset)
       ((parent-is "type_arguments") parent-bol ,offset)
       ((parent-is "variable_declarator") parent-bol ,offset)
       ((parent-is "arguments") parent-bol ,offset)
       ((parent-is "array") parent-bol ,offset)
       ((parent-is "formal_parameters") parent-bol ,offset)
       ((parent-is "template_substitution") parent-bol ,offset)
       ((parent-is "object_pattern") parent-bol ,offset)
       ((parent-is "object") parent-bol ,offset)
       ((parent-is "object_type") parent-bol ,offset)
       ((parent-is "enum_body") parent-bol ,offset)
       ((parent-is "arrow_function") parent-bol ,offset)
       ((parent-is "parenthesized_expression") parent-bol ,offset)
       ...))))
#+end_src
Then you set treesit-simple-indent-rules to your rules, and set
indent-line-function:
#+begin_src elisp
(setq-local treesit-simple-indent-rules typescript-mode-indent-rules)
(setq-local indent-line-function #'treesit-indent)
#+end_src
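While iterating on rules like the ones above, the two debugging aids
mentioned earlier can be used roughly as follows. This is a sketch
under the assumption that treesit-check-indent takes a major mode
symbol; typescript-mode is named only because the example rules are
for TypeScript.

#+begin_src elisp
;; Report in the echo area which rule matched, on every indent.
(setq treesit--indent-verbose t)
;; In a buffer with existing code, compare its current indentation
;; with what the indent rules would produce.
(treesit-check-indent 'typescript-mode)
#+end_src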
* Imenu
There is not much to say here, except that treesit-induce-sparse-tree
is useful for building the index. See
python--imenu-treesit-create-index-1 in python.el for an example.
Once you have the index builder, set imenu-create-index-function.
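For a first experiment, a flat index can also be built from a query
instead of treesit-induce-sparse-tree. The sketch below is not what
python.el does: xxx-imenu-create-index is a hypothetical name, the
node type assumes a Python-like grammar, and nesting is ignored.

#+begin_src elisp
(defun xxx-imenu-create-index ()
  "Return a flat imenu index of function definitions (sketch)."
  (mapcar (lambda (capture)
            ;; CAPTURE is (capture-name . node); build (NAME . POSITION).
            (let ((node (cdr capture)))
              (cons (treesit-node-text node t)
                    (treesit-node-start node))))
          (treesit-query-capture
           (treesit-buffer-root-node)
           '((function_definition name: (identifier) @name)))))

(setq-local imenu-create-index-function #'xxx-imenu-create-index)
#+end_src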
* Navigation
This mainly means setting beginning-of-defun-function and
end-of-defun-function. You can find the end of a defun with something
like
(treesit-search-forward-goto "function_definition" 'end)
where "function_definition" matches the node type of a function
definition node, and 'end means we want to go to the end of that
node.
Something like this should suffice:
#+begin_src elisp
(defun xxx-beginning-of-defun (&optional arg)
  "Move to the beginning of a defun.  ARG defaults to 1."
  ;; ARG is optional, so guard against nil before comparing it.
  (let ((arg (or arg 1)))
    (if (> arg 0)
        ;; Go backward.
        (while (and (> arg 0)
                    (treesit-search-forward-goto
                     "function_definition" 'start nil t))
          (setq arg (1- arg)))
      ;; Go forward.
      (while (and (< arg 0)
                  (treesit-search-forward-goto
                   "function_definition" 'start))
        (setq arg (1+ arg))))))

(setq-local beginning-of-defun-function #'xxx-beginning-of-defun)
#+end_src
And the same for end-of-defun.
* Which-func
You can find the current function by going up the tree and looking for
the function_definition node. See python-info-treesit-current-defun
in python.el for an example. Since Python allows nested function
definitions, that function keeps going until it reaches the root node,
and records all the function names along the way.
#+begin_src elisp
(defun python-info-treesit-current-defun (&optional include-type)
  "Identical to `python-info-current-defun' but use tree-sitter.
For INCLUDE-TYPE see `python-info-current-defun'."
  (let ((node (treesit-node-at (point)))
        (name-list ())
        (type nil))
    (cl-loop while node
             if (pcase (treesit-node-type node)
                  ("function_definition"
                   (setq type 'def))
                  ("class_definition"
                   (setq type 'class))
                  (_ nil))
             do (push (treesit-node-text
                       (treesit-node-child-by-field-name node "name")
                       t)
                      name-list)
             do (setq node (treesit-node-parent node))
             finally return (concat (if include-type
                                        (format "%s " type)
                                      "")
                                    (string-join name-list ".")))))
#+end_src
* More features?
Obviously this list is just a starting point. If there are features in
the major mode that would benefit from a parse tree, adding
tree-sitter support for them would be great. But in the minimal case,
just adding font-lock is awesome.
* Common tasks
How to...
** Get the buffer text corresponding to a node?
(treesit-node-text node)
Note that treesit-node-string does something different: it returns
the string (S-expression) representation of the node.
** Scan the whole tree for stuff?
(treesit-search-subtree)
(treesit-search-forward)
(treesit-induce-sparse-tree)
** Move to next node that...?
(treesit-search-forward-goto)
** Get the root node?
(treesit-buffer-root-node)
** Get the node at point?
(treesit-node-at (point))
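Several of the functions above can be combined. The sketch below
assumes a Python buffer that already has a parser, and relies only on
the first two arguments of treesit-search-subtree (a node and a node
type to match):

#+begin_src elisp
;; Find the first function definition under the root node and show
;; the text of its "name" field in the echo area.
(let* ((root (treesit-buffer-root-node))
       (defun-node (treesit-search-subtree root "function_definition")))
  (when defun-node
    (message "First definition: %s"
             (treesit-node-text
              (treesit-node-child-by-field-name defun-node "name")
              t))))
#+end_src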
* Manual
I suggest you read the manual section for tree-sitter in Info. The
section is "Parsing Program Source". Typing
C-h i d m elisp RET g Parsing Program Source RET
will bring you to that section. You can also read the HTML version
under /html-manual in this directory. I find the HTML version easier
to read. You don't need to read through every sentence; just read the
text paragraphs and glance over the function names.