mirror of
git://git.sv.gnu.org/emacs.git
synced 2025-12-25 06:50:46 -08:00
* doc/lispref/modes.texi (Parser-based Font Lock): Replace :lang with :language. * doc/lispref/parsing.texi (Language Grammar): Replace treesit-load-suffixes with dynamic-library-suffixes. (Retrieving Nodes): Fix function names. (Tree-sitter Major Modes): Fix treesit-ready-p args. Fix pxref to Parser-based Indentation. (Tree-sitter C API): Fix function names. * lisp/treesit.el (treesit--simple-indent-eval): Remove cond BODY duplicated from CONDITION. (treesit)<define-short-documentation-group>: Fix function names.
401 lines
19 KiB
HTML
401 lines
19 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html>
|
|
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
<!-- This is the GNU Emacs Lisp Reference Manual
|
|
corresponding to Emacs version 29.0.50.
|
|
|
|
Copyright © 1990-1996, 1998-2023 Free Software Foundation, Inc.
|
|
|
|
Permission is granted to copy, distribute and/or modify this document
|
|
under the terms of the GNU Free Documentation License, Version 1.3 or
|
|
any later version published by the Free Software Foundation; with the
|
|
Invariant Sections being "GNU General Public License," with the
|
|
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
|
|
Texts as in (a) below. A copy of the license is included in the
|
|
section entitled "GNU Free Documentation License."
|
|
|
|
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
|
|
modify this GNU manual. Buying copies from the FSF supports it in
|
|
developing GNU and promoting software freedom." -->
|
|
<title>Language Definitions (GNU Emacs Lisp Reference Manual)</title>
|
|
|
|
<meta name="description" content="Language Definitions (GNU Emacs Lisp Reference Manual)">
|
|
<meta name="keywords" content="Language Definitions (GNU Emacs Lisp Reference Manual)">
|
|
<meta name="resource-type" content="document">
|
|
<meta name="distribution" content="global">
|
|
<meta name="Generator" content="makeinfo">
|
|
<meta name="viewport" content="width=device-width,initial-scale=1">
|
|
|
|
<link href="index.html" rel="start" title="Top">
|
|
<link href="Index.html" rel="index" title="Index">
|
|
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
|
|
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
|
|
<link href="Using-Parser.html" rel="next" title="Using Parser">
|
|
<style type="text/css">
|
|
<!--
|
|
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
|
|
a.summary-letter {text-decoration: none}
|
|
blockquote.indentedblock {margin-right: 0em}
|
|
div.display {margin-left: 3.2em}
|
|
div.example {margin-left: 3.2em}
|
|
kbd {font-style: oblique}
|
|
pre.display {font-family: inherit}
|
|
pre.format {font-family: inherit}
|
|
pre.menu-comment {font-family: serif}
|
|
pre.menu-preformatted {font-family: serif}
|
|
span.nolinebreak {white-space: nowrap}
|
|
span.roman {font-family: initial; font-weight: normal}
|
|
span.sansserif {font-family: sans-serif; font-weight: normal}
|
|
span:hover a.copiable-anchor {visibility: visible}
|
|
ul.no-bullet {list-style: none}
|
|
-->
|
|
</style>
|
|
<link rel="stylesheet" type="text/css" href="./manual.css">
|
|
|
|
|
|
</head>
|
|
|
|
<body lang="en">
|
|
<div class="section" id="Language-Definitions">
|
|
<div class="header">
|
|
<p>
|
|
Next: <a href="Using-Parser.html" accesskey="n" rel="next">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
|
</div>
|
|
<hr>
|
|
<span id="Tree_002dsitter-Language-Definitions"></span><h3 class="section">37.1 Tree-sitter Language Definitions</h3>
|
|
<span id="index-language-definitions_002c-for-tree_002dsitter"></span>
|
|
|
|
<span id="Loading-a-language-definition"></span><h3 class="heading">Loading a language definition</h3>
|
|
<span id="index-loading-language-definition-for-tree_002dsitter"></span>
|
|
|
|
<span id="index-language-argument_002c-for-tree_002dsitter"></span>
|
|
<p>Tree-sitter relies on language definitions to parse text in that
|
|
language. In Emacs, a language definition is represented by a symbol.
|
|
For example, the C language definition is represented as the symbol
|
|
<code>c</code>, and <code>c</code> can be passed to tree-sitter functions as the
|
|
<var>language</var> argument.
|
|
</p>
|
|
<span id="index-treesit_002dextra_002dload_002dpath"></span>
|
|
<span id="index-treesit_002dload_002dlanguage_002derror"></span>
|
|
<span id="index-treesit_002dload_002dsuffixes"></span>
|
|
<p>Tree-sitter language definitions are distributed as dynamic libraries.
|
|
In order to use a language definition in Emacs, you need to make sure
|
|
that the dynamic library is installed on the system. Emacs looks for
|
|
language definitions in several places, in the following order:
|
|
</p>
|
|
<ul>
|
|
<li> first, in the list of directories specified by the variable
|
|
<code>treesit-extra-load-path</code>;
|
|
</li><li> then, in the <samp>tree-sitter</samp> subdirectory of the directory
|
|
specified by <code>user-emacs-directory</code> (see <a href="Init-File.html">The Init File</a>);
|
|
</li><li> and finally, in the system’s default locations for dynamic libraries.
|
|
</li></ul>
|
|
|
|
<p>In each of these directories, Emacs looks for a file with file-name
|
|
extensions specified by the variable <code>dynamic-library-suffixes</code>.
|
|
</p>
|
|
<p>If Emacs cannot find the library or has problems loading it, Emacs
|
|
signals the <code>treesit-load-language-error</code> error. The data of
|
|
that signal could be one of the following:
|
|
</p>
|
|
<dl compact="compact">
|
|
<dt><span><code>(not-found <var>error-msg</var> …)</code></span></dt>
|
|
<dd><p>This means that Emacs could not find the language definition library.
|
|
</p></dd>
|
|
<dt><span><code>(symbol-error <var>error-msg</var>)</code></span></dt>
|
|
<dd><p>This means that Emacs could not find in the library the expected function
|
|
that every language definition library should export.
|
|
</p></dd>
|
|
<dt><span><code>(version-mismatch <var>error-msg</var>)</code></span></dt>
|
|
<dd><p>This means that the version of language definition library is incompatible
|
|
with that of the tree-sitter library.
|
|
</p></dd>
|
|
</dl>
|
|
|
|
<p>In all of these cases, <var>error-msg</var> might provide additional
|
|
details about the failure.
|
|
</p>
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language &optional detail</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This function returns non-<code>nil</code> if the language definitions for
|
|
<var>language</var> exist and can be loaded.
|
|
</p>
|
|
<p>If <var>detail</var> is non-<code>nil</code>, return <code>(t . nil)</code> when
|
|
<var>language</var> is available, and <code>(nil . <var>data</var>)</code> when it’s
|
|
unavailable. <var>data</var> is the signal data of
|
|
<code>treesit-load-language-error</code>.
|
|
</p></dd></dl>
|
|
|
|
<span id="index-treesit_002dload_002dname_002doverride_002dlist"></span>
|
|
<p>By convention, the file name of the dynamic library for <var>language</var> is
|
|
<samp>libtree-sitter-<var>language</var>.<var>ext</var></samp>, where <var>ext</var> is the
|
|
system-specific extension for dynamic libraries. Also by convention,
|
|
the function provided by that library is named
|
|
<code>tree_sitter_<var>language</var></code>. If a language definition library
|
|
doesn’t follow this convention, you should add an entry
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">(<var>language</var> <var>library-base-name</var> <var>function-name</var>)
|
|
</pre></div>
|
|
|
|
<p>to the list in the variable <code>treesit-load-name-override-list</code>, where
|
|
<var>library-base-name</var> is the basename of the dynamic library’s file name,
|
|
(usually, <samp>libtree-sitter-<var>language</var></samp>), and
|
|
<var>function-name</var> is the function provided by the library
|
|
(usually, <code>tree_sitter_<var>language</var></code>). For example,
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
|
|
</pre></div>
|
|
|
|
<p>for a language that considers itself too “cool” to abide by
|
|
conventions.
|
|
</p>
|
|
<span id="index-language_002ddefinition-version_002c-compatibility"></span>
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dlanguage_002dversion"><span class="category">Function: </span><span><strong>treesit-language-version</strong> <em>&optional min-compatible</em><a href='#index-treesit_002dlanguage_002dversion' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This function returns the version of the language-definition
|
|
Application Binary Interface (<acronym>ABI</acronym>) supported by the
|
|
tree-sitter library. By default, it returns the latest ABI version
|
|
supported by the library, but if <var>min-compatible</var> is
|
|
non-<code>nil</code>, it returns the oldest ABI version which the library
|
|
still can support. Language definition libraries must be built for
|
|
ABI versions between the oldest and the latest versions supported by
|
|
the tree-sitter library, otherwise the library will be unable to load
|
|
them.
|
|
</p></dd></dl>
|
|
|
|
<span id="Concrete-syntax-tree"></span><h3 class="heading">Concrete syntax tree</h3>
|
|
<span id="index-syntax-tree_002c-concrete"></span>
|
|
|
|
<p>A syntax tree is what a parser generates. In a syntax tree, each node
|
|
represents a piece of text, and is connected to each other by a
|
|
parent-child relationship. For example, if the source text is
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">1 + 2
|
|
</pre></div>
|
|
|
|
<p>its syntax tree could be
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example"> +--------------+
|
|
| root "1 + 2" |
|
|
+--------------+
|
|
|
|
|
+--------------------------------+
|
|
| expression "1 + 2" |
|
|
+--------------------------------+
|
|
| | |
|
|
+------------+ +--------------+ +------------+
|
|
| number "1" | | operator "+" | | number "2" |
|
|
+------------+ +--------------+ +------------+
|
|
</pre></div>
|
|
|
|
<p>We can also represent it as an s-expression:
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">(root (expression (number) (operator) (number)))
|
|
</pre></div>
|
|
|
|
<span id="Node-types"></span><h4 class="subheading">Node types</h4>
|
|
<span id="index-node-types_002c-in-a-syntax-tree"></span>
|
|
|
|
<span id="index-type-of-node_002c-tree_002dsitter"></span>
|
|
<span id="tree_002dsitter-node-type"></span><span id="index-named-node_002c-tree_002dsitter"></span>
|
|
<span id="tree_002dsitter-named-node"></span><span id="index-anonymous-node_002c-tree_002dsitter"></span>
|
|
<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>, and
|
|
<code>operator</code> specify the <em>type</em> of the nodes. However, not all
|
|
nodes in a syntax tree have a type. Nodes that don’t have a type are
|
|
known as <em>anonymous nodes</em>, and nodes with a type are <em>named
|
|
nodes</em>. Anonymous nodes are tokens with fixed spellings, including
|
|
punctuation characters like bracket ‘<samp>]</samp>’, and keywords like
|
|
<code>return</code>.
|
|
</p>
|
|
<span id="Field-names"></span><h4 class="subheading">Field names</h4>
|
|
|
|
<span id="index-field-name_002c-tree_002dsitter"></span>
|
|
<span id="index-tree_002dsitter-node-field-name"></span>
|
|
<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to analyze, many language definitions
|
|
assign <em>field names</em> to child nodes. For example, a
|
|
<code>function_definition</code> node could have a <code>declarator</code> and a
|
|
<code>body</code>:
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">(function_definition
|
|
declarator: (declaration)
|
|
body: (compound_statement))
|
|
</pre></div>
|
|
|
|
<span id="Exploring-the-syntax-tree"></span><h3 class="heading">Exploring the syntax tree</h3>
|
|
<span id="index-explore-tree_002dsitter-syntax-tree"></span>
|
|
<span id="index-inspection-of-tree_002dsitter-parse-tree-nodes"></span>
|
|
|
|
<p>To aid in understanding the syntax of a language and in debugging of
|
|
Lisp program that use the syntax tree, Emacs provides an “explore”
|
|
mode, which displays the syntax tree of the source in the current
|
|
buffer in real time. Emacs also comes with an “inspect mode”, which
|
|
displays information of the nodes at point in the mode-line.
|
|
</p>
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dexplore_002dmode"><span class="category">Command: </span><span><strong>treesit-explore-mode</strong><a href='#index-treesit_002dexplore_002dmode' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This mode pops up a window displaying the syntax tree of the source in
|
|
the current buffer. Selecting text in the source buffer highlights
|
|
the corresponding nodes in the syntax tree display. Clicking
|
|
on nodes in the syntax tree highlights the corresponding text in the
|
|
source buffer.
|
|
</p></dd></dl>
|
|
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dinspect_002dmode"><span class="category">Command: </span><span><strong>treesit-inspect-mode</strong><a href='#index-treesit_002dinspect_002dmode' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This minor mode displays on the mode-line the node that <em>starts</em>
|
|
at point. For example, the mode-line can display
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example"><var>parent</var> <var>field</var>: (<var>node</var> (<var>child</var> (…)))
|
|
</pre></div>
|
|
|
|
<p>where <var>node</var>, <var>child</var>, etc., are nodes which begin at point.
|
|
<var>parent</var> is the parent of <var>node</var>. <var>node</var> is displayed in
|
|
a bold typeface. <var>field-name</var>s are field names of <var>node</var> and
|
|
of <var>child</var>, etc.
|
|
</p>
|
|
<p>If no node starts at point, i.e., point is in the middle of a node,
|
|
then the mode line displays the earliest node that spans point, and
|
|
its immediate parent.
|
|
</p>
|
|
<p>This minor mode doesn’t create parsers on its own. It uses the first
|
|
parser in <code>(treesit-parser-list)</code> (see <a href="Using-Parser.html">Using Tree-sitter Parser</a>).
|
|
</p></dd></dl>
|
|
|
|
<span id="Reading-the-grammar-definition"></span><h3 class="heading">Reading the grammar definition</h3>
|
|
<span id="index-reading-grammar-definition_002c-tree_002dsitter"></span>
|
|
|
|
<p>Authors of language definitions define the <em>grammar</em> of a
|
|
programming language, which determines how a parser constructs a
|
|
concrete syntax tree out of the program text. In order to use the
|
|
syntax tree effectively, you need to consult the <em>grammar file</em>.
|
|
</p>
|
|
<p>The grammar file is usually <samp>grammar.js</samp> in a language
|
|
definition’s project repository. The link to a language definition’s
|
|
home page can be found on
|
|
<a href="https://tree-sitter.github.io/tree-sitter">tree-sitter’s
|
|
homepage</a>.
|
|
</p>
|
|
<p>The grammar definition is written in JavaScript. For example, the
|
|
rule matching a <code>function_definition</code> node looks like
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">function_definition: $ => seq(
|
|
$.declaration_specifiers,
|
|
field('declarator', $.declaration),
|
|
field('body', $.compound_statement)
|
|
)
|
|
</pre></div>
|
|
|
|
<p>The rules are represented by functions that take a single argument
|
|
<var>$</var>, representing the whole grammar. The function itself is
|
|
constructed by other functions: the <code>seq</code> function puts together
|
|
a sequence of children; the <code>field</code> function annotates a child
|
|
with a field name. If we write the above definition in the so-called
|
|
<em>Backus-Naur Form</em> (<acronym>BNF</acronym>) syntax, it would look like
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">function_definition :=
|
|
<declaration_specifiers> <declaration> <compound_statement>
|
|
</pre></div>
|
|
|
|
<p>and the node returned by the parser would look like
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">(function_definition
|
|
(declaration_specifier)
|
|
declarator: (declaration)
|
|
body: (compound_statement))
|
|
</pre></div>
|
|
|
|
<p>Below is a list of functions that one can see in a grammar definition.
|
|
Each function takes other rules as arguments and returns a new rule.
|
|
</p>
|
|
<dl compact="compact">
|
|
<dt><span><code>seq(<var>rule1</var>, <var>rule2</var>, …)</code></span></dt>
|
|
<dd><p>matches each rule one after another.
|
|
</p></dd>
|
|
<dt><span><code>choice(<var>rule1</var>, <var>rule2</var>, …)</code></span></dt>
|
|
<dd><p>matches one of the rules in its arguments.
|
|
</p></dd>
|
|
<dt><span><code>repeat(<var>rule</var>)</code></span></dt>
|
|
<dd><p>matches <var>rule</var> for <em>zero or more</em> times.
|
|
This is like the ‘<samp>*</samp>’ operator in regular expressions.
|
|
</p></dd>
|
|
<dt><span><code>repeat1(<var>rule</var>)</code></span></dt>
|
|
<dd><p>matches <var>rule</var> for <em>one or more</em> times.
|
|
This is like the ‘<samp>+</samp>’ operator in regular expressions.
|
|
</p></dd>
|
|
<dt><span><code>optional(<var>rule</var>)</code></span></dt>
|
|
<dd><p>matches <var>rule</var> for <em>zero or one</em> time.
|
|
This is like the ‘<samp>?</samp>’ operator in regular expressions.
|
|
</p></dd>
|
|
<dt><span><code>field(<var>name</var>, <var>rule</var>)</code></span></dt>
|
|
<dd><p>assigns field name <var>name</var> to the child node matched by <var>rule</var>.
|
|
</p></dd>
|
|
<dt><span><code>alias(<var>rule</var>, <var>alias</var>)</code></span></dt>
|
|
<dd><p>makes nodes matched by <var>rule</var> appear as <var>alias</var> in the syntax
|
|
tree generated by the parser. For example,
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example">alias(preprocessor_call_exp, call_expression)
|
|
</pre></div>
|
|
|
|
<p>makes any node matched by <code>preprocessor_call_exp</code> appear as
|
|
<code>call_expression</code>.
|
|
</p></dd>
|
|
</dl>
|
|
|
|
<p>Below are grammar functions of lesser importance for reading a
|
|
language definition.
|
|
</p>
|
|
<dl compact="compact">
|
|
<dt><span><code>token(<var>rule</var>)</code></span></dt>
|
|
<dd><p>marks <var>rule</var> to produce a single leaf node. That is, instead of
|
|
generating a parent node with individual child nodes under it,
|
|
everything is combined into a single leaf node. See <a href="Retrieving-Nodes.html">Retrieving Nodes</a>.
|
|
</p></dd>
|
|
<dt><span><code>token.immediate(<var>rule</var>)</code></span></dt>
|
|
<dd><p>Normally, grammar rules ignore preceding whitespace; this
|
|
changes <var>rule</var> to match only when there is no preceding
|
|
whitespaces.
|
|
</p></dd>
|
|
<dt><span><code>prec(<var>n</var>, <var>rule</var>)</code></span></dt>
|
|
<dd><p>gives <var>rule</var> the level-<var>n</var> precedence.
|
|
</p></dd>
|
|
<dt><span><code>prec.left([<var>n</var>,] <var>rule</var>)</code></span></dt>
|
|
<dd><p>marks <var>rule</var> as left-associative, optionally with level <var>n</var>.
|
|
</p></dd>
|
|
<dt><span><code>prec.right([<var>n</var>,] <var>rule</var>)</code></span></dt>
|
|
<dd><p>marks <var>rule</var> as right-associative, optionally with level <var>n</var>.
|
|
</p></dd>
|
|
<dt><span><code>prec.dynamic(<var>n</var>, <var>rule</var>)</code></span></dt>
|
|
<dd><p>this is like <code>prec</code>, but the precedence is applied at runtime
|
|
instead.
|
|
</p></dd>
|
|
</dl>
|
|
|
|
<p>The documentation of the tree-sitter project has
|
|
<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">more
|
|
about writing a grammar</a>. Read especially “The Grammar DSL”
|
|
section.
|
|
</p>
|
|
</div>
|
|
<hr>
|
|
<div class="header">
|
|
<p>
|
|
Next: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
|
</div>
|
|
|
|
|
|
|
|
</body>
|
|
</html>
|