<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.
Copyright © 1990-1996, 1998-2023 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Parser-based Indentation (GNU Emacs Lisp Reference Manual)</title>
<meta name="description" content="Parser-based Indentation (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Parser-based Indentation (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Auto_002dIndentation.html" rel="up" title="Auto-Indentation">
<link href="SMIE.html" rel="prev" title="SMIE">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">
</head>
<body lang="en">
<div class="subsection" id="Parser_002dbased-Indentation">
<div class="header">
<p>
Previous: <a href="SMIE.html" accesskey="p" rel="prev">Simple Minded Indentation Engine</a>, Up: <a href="Auto_002dIndentation.html" accesskey="u" rel="up">Automatic Indentation of code</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Parser_002dbased-Indentation-1"></span><h4 class="subsection">24.7.2 Parser-based Indentation</h4>
<span id="index-parser_002dbased-indentation"></span>
<p>When built with the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>), Emacs is capable of parsing the program source and producing
a syntax tree. This syntax tree can be used for guiding the program
source indentation commands. For maximum flexibility, it is possible
to write a custom indentation function that queries the syntax tree
and indents accordingly for each language, but that is a lot of work.
It is more convenient to use the simple indentation engine described
below: the major mode then needs only to supply some indentation rules,
and the engine takes care of the rest.
</p>
<p>To enable the parser-based indentation engine, either set
<code>treesit-simple-indent-rules</code> and call
<code>treesit-major-mode-setup</code>, or equivalently, set the value of
<code>indent-line-function</code> to <code>treesit-indent</code>.
</p>
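<p>As a minimal sketch of the first approach, a major mode might enable
the engine as shown here. The names <code>my-lang-ts-mode</code> and
<code>my-lang-ts-mode--indent-rules</code>, the language symbol
<code>mylang</code>, and the node type <code>source_file</code> are
hypothetical; substitute whatever the real mode and grammar define.
The sketch also assumes the one-argument calling convention of
<code>treesit-ready-p</code>.
</p>
<div class="example">
<pre class="example">;; Hypothetical rules: top-level constructs start at column 0.
(defvar my-lang-ts-mode--indent-rules
  '((mylang
     ((parent-is &quot;source_file&quot;) point-min 0)))
  &quot;Indentation rules for the hypothetical `mylang' grammar.&quot;)

(define-derived-mode my-lang-ts-mode prog-mode &quot;MyLang&quot;
  &quot;Hypothetical major mode that uses parser-based indentation.&quot;
  ;; Skip the tree-sitter setup when the grammar is not usable.
  (when (treesit-ready-p 'mylang)
    (treesit-parser-create 'mylang)
    (setq-local treesit-simple-indent-rules
                my-lang-ts-mode--indent-rules)
    ;; With the rules set, this call makes `treesit-indent' the
    ;; value of `indent-line-function'.
    (treesit-major-mode-setup)))
</pre></div>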
<dl class="def">
<dt id="index-treesit_002dindent_002dfunction"><span class="category">Variable: </span><span><strong>treesit-indent-function</strong><a href='#index-treesit_002dindent_002dfunction' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This variable stores the actual function called by
<code>treesit-indent</code>. By default, its value is
<code>treesit-simple-indent</code>. In the future we might add other,
more complex indentation engines.
</p></dd></dl>
<span id="Writing-indentation-rules"></span><h3 class="heading">Writing indentation rules</h3>
<span id="index-indentation-rules_002c-for-parser_002dbased-indentation"></span>
<dl class="def">
<dt id="index-treesit_002dsimple_002dindent_002drules"><span class="category">Variable: </span><span><strong>treesit-simple-indent-rules</strong><a href='#index-treesit_002dsimple_002dindent_002drules' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This local variable stores indentation rules for every language. It is
an alist with elements of the form <code>(<var>language</var>&nbsp;.&nbsp;<var>rules</var>)</code><!-- /@w -->, where
<var>language</var> is a language symbol, and <var>rules</var> is a list with elements of the
form <code>(<var>matcher</var>&nbsp;<var>anchor</var>&nbsp;<var>offset</var>)</code><!-- /@w -->.
</p>
<p>First, Emacs passes the tree-sitter node at the beginning of
the current line to <var>matcher</var>; if it returns non-<code>nil</code>, this
rule is applicable. Then Emacs passes the node to <var>anchor</var>, which
returns a buffer position. Emacs takes the column number of that
position and adds <var>offset</var> to it; the result is the indentation
column for the current line. <var>offset</var> can be an integer or a
variable whose value is an integer.
</p>
<p>The <var>matcher</var> and <var>anchor</var> are functions, and Emacs provides
convenient defaults for them.
</p>
<p>Each <var>matcher</var> or <var>anchor</var> is a function that takes three
arguments: <var>node</var>, <var>parent</var>, and <var>bol</var>. The argument
<var>bol</var> is the buffer position whose indentation is required: the
position of the first non-whitespace character after the beginning of
the line. The argument <var>node</var> is the largest (highest-in-tree)
node that starts at that position; and <var>parent</var> is the parent of
<var>node</var>. However, when that position is in whitespace or inside
a multi-line string, no node can start at that position, so
<var>node</var> is <code>nil</code>. In that case, <var>parent</var> is the
smallest node that spans that position.
</p>
<p>Emacs finds <var>bol</var>, <var>node</var> and <var>parent</var> and
passes them to each <var>matcher</var> and <var>anchor</var>. <var>matcher</var>
should return non-<code>nil</code> if the rule is applicable, and
<var>anchor</var> should return a buffer position.
</p></dd></dl>
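<p>Because matchers and anchors are ordinary functions of three
arguments, a mode can supply its own. The following is only a sketch
under stated assumptions: the function names, the variable
<code>my-lang--indent-rules</code>, the language symbol <code>mylang</code>, and
the node type <code>block</code> are hypothetical. With the second rule, if
the start of <var>parent</var> is at column 4, the line is indented to
column 4 + 2 = 6.
</p>
<div class="example">
<pre class="example">;; A matcher receives NODE, PARENT and BOL, and returns non-nil
;; if its rule applies.  This one applies when no node starts
;; at BOL.
(defun my-lang--no-node-p (node _parent _bol)
  &quot;Return non-nil if NODE is nil.&quot;
  (null node))

;; An anchor receives the same arguments and returns a buffer
;; position.  This one returns the start of PARENT.
(defun my-lang--parent-start (_node parent _bol)
  &quot;Return the buffer position where PARENT starts.&quot;
  (treesit-node-start parent))

;; Each rule has the form (MATCHER ANCHOR OFFSET).
(defvar my-lang--indent-rules
  '((mylang
     (my-lang--no-node-p my-lang--parent-start 0)
     ((parent-is &quot;block&quot;) my-lang--parent-start 2)))
  &quot;Hypothetical rules that use the custom functions above.&quot;)
</pre></div>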
<dl class="def">
<dt id="index-treesit_002dsimple_002dindent_002dpresets"><span class="category">Variable: </span><span><strong>treesit-simple-indent-presets</strong><a href='#index-treesit_002dsimple_002dindent_002dpresets' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This is a list of defaults for <var>matcher</var>s and <var>anchor</var>s in
<code>treesit-simple-indent-rules</code>. Each of them represents a function
that takes 3 arguments: <var>node</var>, <var>parent</var> and <var>bol</var>. The
available default functions are:
</p>
<dl compact="compact">
<dt id='index-no_002dnode'><span><code>no-node</code><a href='#index-no_002dnode' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function that is called with 3 arguments:
<var>node</var>, <var>parent</var>, and <var>bol</var>, and returns non-<code>nil</code>,
indicating a match, if <var>node</var> is <code>nil</code>, i.e., there is no
node that starts at <var>bol</var>. This is the case when <var>bol</var> is on
an empty line or inside a multi-line string, etc.
</p>
</dd>
<dt id='index-parent_002dis'><span><code>parent-is</code><a href='#index-parent_002dis' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function of one argument, <var>type</var>; it returns a
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
and <var>bol</var>, and returns non-<code>nil</code> (i.e., a match) if
<var>parent</var>&rsquo;s type matches regexp <var>type</var>.
</p>
</dd>
<dt id='index-node_002dis'><span><code>node-is</code><a href='#index-node_002dis' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function of one argument, <var>type</var>; it returns a
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
and <var>bol</var>, and returns non-<code>nil</code> if <var>node</var>&rsquo;s type matches
regexp <var>type</var>.
</p>
</dd>
<dt id='index-query'><span><code>query</code><a href='#index-query' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function of one argument, <var>query</var>; it returns a
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
and <var>bol</var>, and returns non-<code>nil</code> if querying <var>parent</var>
with <var>query</var> captures <var>node</var> (see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>).
</p>
</dd>
<dt id='index-match'><span><code>match</code><a href='#index-match' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function of 5 arguments: <var>node-type</var>,
<var>parent-type</var>, <var>node-field</var>, <var>node-index-min</var>, and
<var>node-index-max</var>. It returns a function that is called with 3
arguments: <var>node</var>, <var>parent</var>, and <var>bol</var>, and returns
non-<code>nil</code> if <var>node</var>&rsquo;s type matches regexp <var>node-type</var>,
<var>parent</var>&rsquo;s type matches regexp <var>parent-type</var>, <var>node</var>&rsquo;s
field name in <var>parent</var> matches regexp <var>node-field</var>, and
<var>node</var>&rsquo;s index among its siblings is between <var>node-index-min</var>
and <var>node-index-max</var>. If the value of an argument is <code>nil</code>,
this matcher doesn&rsquo;t check that argument. For example, to match the
first child where parent is <code>argument_list</code>, use
</p>
<div class="example">
<pre class="example">(match nil &quot;argument_list&quot; nil 0 0)
</pre></div>
</dd>
<dt id='index-comment_002dend'><span><code>comment-end</code><a href='#index-comment_002dend' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function that is called with 3 arguments:
<var>node</var>, <var>parent</var>, and <var>bol</var>, and returns non-<code>nil</code> if
point is before a comment ending token. Comment ending tokens are
defined by regular expression <code>treesit-comment-end</code>
(see <a href="Tree_002dsitter-major-modes.html">treesit-comment-end</a>).
</p>
</dd>
<dt id='index-first_002dsibling'><span><code>first-sibling</code><a href='#index-first_002dsibling' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the start of the first child
of <var>parent</var>.
</p>
</dd>
<dt id='index-parent'><span><code>parent</code><a href='#index-parent' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the start of <var>parent</var>.
</p>
</dd>
<dt id='index-parent_002dbol'><span><code>parent-bol</code><a href='#index-parent_002dbol' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the first non-space character
on the line of <var>parent</var>.
</p>
</dd>
<dt id='index-prev_002dsibling'><span><code>prev-sibling</code><a href='#index-prev_002dsibling' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the start of the previous
sibling of <var>node</var>.
</p>
</dd>
<dt id='index-no_002dindent'><span><code>no-indent</code><a href='#index-no_002dindent' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the start of <var>node</var>.
</p>
</dd>
<dt id='index-prev_002dline'><span><code>prev-line</code><a href='#index-prev_002dline' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the first non-whitespace
character on the previous line.
</p>
</dd>
<dt id='index-point_002dmin'><span><code>point-min</code><a href='#index-point_002dmin' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the beginning of the buffer.
This is useful as the beginning of the buffer is always at column 0.
</p>
</dd>
<dt id='index-comment_002dstart'><span><code>comment-start</code><a href='#index-comment_002dstart' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the position right after the
comment-start token. Comment-start tokens are defined by regular
expression <code>treesit-comment-start</code> (see <a href="Tree_002dsitter-major-modes.html">treesit-comment-start</a>). This function assumes <var>parent</var> is
the comment node.
</p>
</dd>
<dt id='index-coment_002dstart_002dskip'><span><code>comment-start-skip</code><a href='#index-coment_002dstart_002dskip' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the position after the
comment-start token and any whitespace characters following that
token. Comment-start tokens are defined by regular expression
<code>treesit-comment-start</code>. This function assumes <var>parent</var> is
the comment node.
</p></dd>
</dl>
</dd></dl>
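<p>To illustrate how these defaults combine, here is a sketch of a rule
list for an imaginary brace-delimited language. The language symbol
<code>mylang</code>, the variables <code>my-lang-indent-offset</code> and
<code>my-lang--preset-indent-rules</code>, and the node types
<code>source_file</code>, <code>block</code>, and <code>argument_list</code> are
assumptions made for the example; real node types come from the
grammar in use. The order matters in this sketch, on the assumption
that the first rule whose matcher returns non-<code>nil</code> is the one
applied.
</p>
<div class="example">
<pre class="example">(defvar my-lang-indent-offset 4
  &quot;Hypothetical option: number of columns per indentation level.&quot;)

(defvar my-lang--preset-indent-rules
  '((mylang
     ;; Top-level constructs go to column 0.
     ((parent-is &quot;source_file&quot;) point-min 0)
     ;; A closing brace lines up with the first non-space
     ;; character on the line where its parent starts.
     ((node-is &quot;}&quot;) parent-bol 0)
     ;; The first child of an argument list is indented one level.
     ((match nil &quot;argument_list&quot; nil 0 0)
      parent-bol my-lang-indent-offset)
     ;; Its other children align with the start of the first child.
     ((parent-is &quot;argument_list&quot;) first-sibling 0)
     ;; Other children of a block are indented one level.
     ((parent-is &quot;block&quot;) parent-bol my-lang-indent-offset)
     ;; Anything else with no node at BOL (for example, inside a
     ;; multi-line string) follows the previous line.
     (no-node prev-line 0)))
  &quot;Hypothetical indentation rules built from the defaults.&quot;)
</pre></div>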
<span id="Indentation-utilities"></span><h3 class="heading">Indentation utilities</h3>
<span id="index-utility-functions-for-parser_002dbased-indentation"></span>
<p>Here are some utility functions that can help you write parser-based
indentation rules.
</p>
<dl class="def">
<dt id="index-treesit_002dcheck_002dindent"><span class="category">Function: </span><span><strong>treesit-check-indent</strong> <em>mode</em><a href='#index-treesit_002dcheck_002dindent' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function checks the current buffer&rsquo;s indentation against major
mode <var>mode</var>. It indents the current buffer according to
<var>mode</var> and compares the results with the current indentation.
Then it pops up a buffer showing the differences. The correct
indentation (target) is shown in green, and the current indentation
is shown in red.</p></dd></dl>
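<p>For instance, when porting indentation rules from an existing mode,
one might reindent the buffer with the rules under development and then
compare the result against that mode. The command below is a
hypothetical sketch that uses <code>c-mode</code> as the reference; note that
it modifies the current buffer.
</p>
<div class="example">
<pre class="example">(defun my-lang-compare-indent-with-c-mode ()
  &quot;Reindent the buffer, then show how `c-mode' would differ.&quot;
  (interactive)
  ;; Apply the current mode's indentation to the whole buffer.
  (indent-region (point-min) (point-max))
  ;; Pop up the differences from what `c-mode' would produce.
  (treesit-check-indent 'c-mode))
</pre></div>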
<p>It is also helpful to use <code>treesit-inspect-mode</code> (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>) when writing indentation rules.
</p>
</div>
<hr>
<div class="header">
<p>
Previous: <a href="SMIE.html">Simple Minded Indentation Engine</a>, Up: <a href="Auto_002dIndentation.html">Automatic Indentation of code</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>