<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html>
|
|
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
<!-- This is the GNU Emacs Lisp Reference Manual
|
|
corresponding to Emacs version 29.0.50.
|
|
|
|
Copyright © 1990-1996, 1998-2023 Free Software Foundation, Inc.
|
|
|
|
Permission is granted to copy, distribute and/or modify this document
|
|
under the terms of the GNU Free Documentation License, Version 1.3 or
|
|
any later version published by the Free Software Foundation; with the
|
|
Invariant Sections being "GNU General Public License," with the
|
|
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
|
|
Texts as in (a) below. A copy of the license is included in the
|
|
section entitled "GNU Free Documentation License."
|
|
|
|
(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
|
|
modify this GNU manual. Buying copies from the FSF supports it in
|
|
developing GNU and promoting software freedom." -->
|
|
<title>Multiple Languages (GNU Emacs Lisp Reference Manual)</title>
|
|
|
|
<meta name="description" content="Multiple Languages (GNU Emacs Lisp Reference Manual)">
|
|
<meta name="keywords" content="Multiple Languages (GNU Emacs Lisp Reference Manual)">
|
|
<meta name="resource-type" content="document">
|
|
<meta name="distribution" content="global">
|
|
<meta name="Generator" content="makeinfo">
|
|
<meta name="viewport" content="width=device-width,initial-scale=1">
|
|
|
|
<link href="index.html" rel="start" title="Top">
|
|
<link href="Index.html" rel="index" title="Index">
|
|
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
|
|
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
|
|
<link href="Tree_002dsitter-major-modes.html" rel="next" title="Tree-sitter major modes">
|
|
<link href="Pattern-Matching.html" rel="prev" title="Pattern Matching">
|
|
<style type="text/css">
|
|
<!--
|
|
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
|
|
a.summary-letter {text-decoration: none}
|
|
blockquote.indentedblock {margin-right: 0em}
|
|
div.display {margin-left: 3.2em}
|
|
div.example {margin-left: 3.2em}
|
|
kbd {font-style: oblique}
|
|
pre.display {font-family: inherit}
|
|
pre.format {font-family: inherit}
|
|
pre.menu-comment {font-family: serif}
|
|
pre.menu-preformatted {font-family: serif}
|
|
span.nolinebreak {white-space: nowrap}
|
|
span.roman {font-family: initial; font-weight: normal}
|
|
span.sansserif {font-family: sans-serif; font-weight: normal}
|
|
span:hover a.copiable-anchor {visibility: visible}
|
|
ul.no-bullet {list-style: none}
|
|
-->
|
|
</style>
|
|
<link rel="stylesheet" type="text/css" href="./manual.css">
|
|
|
|
|
|
</head>
|
|
|
|
<body lang="en">
|
|
<div class="section" id="Multiple-Languages">
|
|
<div class="header">
|
|
<p>
|
|
Next: <a href="Tree_002dsitter-major-modes.html" accesskey="n" rel="next">Developing major modes with tree-sitter</a>, Previous: <a href="Pattern-Matching.html" accesskey="p" rel="prev">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
|
</div>
|
|
<hr>
|
|
<span id="Parsing-Text-in-Multiple-Languages"></span><h3 class="section">37.6 Parsing Text in Multiple Languages</h3>
|
|
<span id="index-multiple-languages_002c-parsing-with-tree_002dsitter"></span>
|
|
<span id="index-parsing-multiple-languages-with-tree_002dsitter"></span>
|
|
<p>Sometimes the source of a program contains snippets written in
other languages; <acronym>HTML</acronym> + <acronym>CSS</acronym> + JavaScript is one
example.  In that case, text segments written in different languages
need to be assigned different parsers.  Traditionally, this is
achieved by using narrowing.  While tree-sitter works with narrowing
(see <a href="Using-Parser.html#tree_002dsitter-narrowing">narrowing</a>), the recommended way is
instead to set regions of buffer text (i.e., ranges) in which a parser
will operate.  This section describes functions for setting and
getting ranges for a parser.
|
|
</p>
|
|
<p>Lisp programs should call <code>treesit-update-ranges</code> to make sure
the ranges for each parser are correct before using parsers in a
buffer, and call <code>treesit-language-at</code> to figure out the language
responsible for the text at some position.  These two functions don’t
work by themselves; they rely on major modes to set
<code>treesit-range-settings</code> and
<code>treesit-language-at-point-function</code>, which do the actual work.
These functions and variables are explained in more detail towards the
end of the section.
|
|
</p>
|
|
<span id="Getting-and-setting-ranges"></span><h3 class="heading">Getting and setting ranges</h3>
|
|
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dparser_002dset_002dincluded_002dranges"><span class="category">Function: </span><span><strong>treesit-parser-set-included-ranges</strong> <em>parser ranges</em><a href='#index-treesit_002dparser_002dset_002dincluded_002dranges' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This function sets up <var>parser</var> to operate on <var>ranges</var>.  The
<var>parser</var> will only read the text of the specified ranges.  Each
range in <var>ranges</var> is a cons cell of the form <code>(<var>beg</var> . <var>end</var>)</code><!-- /@w -->.
|
|
</p>
|
|
<p>The ranges in <var>ranges</var> must come in order and must not overlap.
That is, the following form must return non-<code>nil</code>:
</p>
<div class="example">
<pre class="example">(require 'cl-lib)
;; Non-nil if each adjacent pair of ranges is ordered and disjoint.
(cl-loop for idx from 1 to (1- (length ranges))
         for prev = (nth (1- idx) ranges)
         for next = (nth idx ranges)
         always (&lt;= (car prev) (cdr prev)
                    (car next) (cdr next)))
</pre></div>
|
|
|
|
<span id="index-treesit_002drange_002dinvalid"></span>
|
|
<p>If <var>ranges</var> violates this constraint, or something else goes
wrong, this function signals the <code>treesit-range-invalid</code> error.
The signal data contains a specific error message and the ranges that
the function was trying to set.
|
|
</p>
|
|
<p>This function can also be used for disabling ranges. If <var>ranges</var>
|
|
is <code>nil</code>, the parser is set to parse the whole buffer.
|
|
</p>
|
|
<p>Example:
|
|
</p>
|
|
<div class="example">
<pre class="example">(treesit-parser-set-included-ranges
 parser '((1 . 9) (16 . 24) (24 . 25)))
</pre></div>
|
|
</dd></dl>
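<p>Ranges gathered from several sources may arrive unsorted or
overlapping.  As a minimal sketch, a hypothetical helper
<code>my-normalize-ranges</code> (not part of Emacs) could sort and merge
them so that they satisfy the ordering constraint required by
<code>treesit-parser-set-included-ranges</code>:
</p>
<div class="example">
<pre class="example">(defun my-normalize-ranges (ranges)
  "Return RANGES sorted by start, with overlapping ranges merged.
RANGES is a list of cons cells (BEG . END).  Hypothetical helper."
  (let ((sorted (sort (copy-sequence ranges)
                      (lambda (a b) (&lt; (car a) (car b)))))
        result)
    (dolist (range sorted)
      (let ((prev (car result)))
        (if (and prev (&lt; (car range) (cdr prev)))
            ;; RANGE overlaps the previous range: extend that range.
            (setcar result
                    (cons (car prev) (max (cdr prev) (cdr range))))
          ;; Otherwise start a new range (copied, so the input is untouched).
          (push (cons (car range) (cdr range)) result))))
    (nreverse result)))

(my-normalize-ranges '((16 . 24) (1 . 9) (5 . 12)))
     ⇒ ((1 . 12) (16 . 24))
</pre></div>
<p>Ranges that merely touch, such as <code>(16 . 24)</code> and
<code>(24 . 25)</code>, already satisfy the constraint, so this sketch
leaves them unmerged.
</p>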
|
|
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dparser_002dincluded_002dranges"><span class="category">Function: </span><span><strong>treesit-parser-included-ranges</strong> <em>parser</em><a href='#index-treesit_002dparser_002dincluded_002dranges' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This function returns the ranges set for <var>parser</var>.  The return
value is the same as the <var>ranges</var> argument of
<code>treesit-parser-set-included-ranges</code>: a list of cons cells of the form
<code>(<var>beg</var> . <var>end</var>)</code><!-- /@w -->.  If <var>parser</var> doesn’t have any
ranges, the return value is <code>nil</code>.
|
|
</p>
|
|
<div class="example">
<pre class="example">(treesit-parser-included-ranges parser)
     ⇒ ((1 . 9) (16 . 24) (24 . 25))
</pre></div>
|
|
</dd></dl>
|
|
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dquery_002drange"><span class="category">Function: </span><span><strong>treesit-query-range</strong> <em>source query &optional beg end</em><a href='#index-treesit_002dquery_002drange' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This function matches <var>source</var> with <var>query</var> and returns the
|
|
ranges of captured nodes. The return value is a list of cons cells of
|
|
the form <code>(<var>beg</var> . <var>end</var>)</code><!-- /@w -->, where <var>beg</var> and
|
|
<var>end</var> specify the beginning and the end of a region of text.
|
|
</p>
|
|
<p>For convenience, <var>source</var> can be a language symbol, a parser, or a
|
|
node. If it’s a language symbol, this function matches in the root
|
|
node of the first parser using that language; if a parser, this
|
|
function matches in the root node of that parser; if a node, this
|
|
function matches in that node.
|
|
</p>
|
|
<p>The argument <var>query</var> is the query used to capture nodes
|
|
(see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>). The capture names don’t matter. The
|
|
arguments <var>beg</var> and <var>end</var>, if both non-<code>nil</code>, limit the
|
|
range in which this function queries.
|
|
</p>
|
|
<p>Like other query functions, this function signals the
<code>treesit-query-error</code> error if <var>query</var> is malformed.
|
|
</p></dd></dl>
|
|
|
|
<span id="Supporting-multiple-languages-in-Lisp-programs"></span><h3 class="heading">Supporting multiple languages in Lisp programs</h3>
|
|
|
|
<p>It should suffice for general Lisp programs to call the following two
functions in order to support program sources that mix multiple
languages.
|
|
</p>
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dupdate_002dranges"><span class="category">Function: </span><span><strong>treesit-update-ranges</strong> <em>&optional beg end</em><a href='#index-treesit_002dupdate_002dranges' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This function updates ranges for parsers in the buffer. It makes sure
|
|
the parsers’ ranges are set correctly between <var>beg</var> and <var>end</var>,
|
|
according to <code>treesit-range-settings</code>. If omitted, <var>beg</var>
|
|
defaults to the beginning of the buffer, and <var>end</var> defaults to the
|
|
end of the buffer.
|
|
</p>
|
|
<p>For example, fontification functions use this function before querying
|
|
for nodes in a region.
|
|
</p></dd></dl>
|
|
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dlanguage_002dat"><span class="category">Function: </span><span><strong>treesit-language-at</strong> <em>pos</em><a href='#index-treesit_002dlanguage_002dat' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This function returns the language of the text at buffer position
<var>pos</var>.  Under the hood it calls the function stored in
<code>treesit-language-at-point-function</code> and returns its return
value.  If <code>treesit-language-at-point-function</code> is <code>nil</code>,
this function returns the language of the first parser in the list
returned by <code>treesit-parser-list</code>.  If there is no parser in the
buffer, it returns <code>nil</code>.
|
|
</p></dd></dl>
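<p>The two calls are typically combined: refresh the ranges for the
region of interest first, then ask for the language.  The following is
only a sketch; the function name <code>my-language-on-current-line</code>
is hypothetical, and it assumes the current major mode has set up
<code>treesit-range-settings</code> and
<code>treesit-language-at-point-function</code>.
</p>
<div class="example">
<pre class="example">(defun my-language-on-current-line ()
  "Return the language of the text at the start of the current line.
Hypothetical helper for illustration."
  (let ((beg (line-beginning-position))
        (end (line-end-position)))
    ;; Make sure parser ranges are up to date for this line first.
    (treesit-update-ranges beg end)
    ;; Then ask which language is responsible for the text at BEG.
    (treesit-language-at beg)))
</pre></div>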
|
|
|
|
<span id="Supporting-multiple-languages-in-major-modes"></span><h3 class="heading">Supporting multiple languages in major modes</h3>
|
|
|
|
<span id="index-host-language_002c-tree_002dsitter"></span>
|
|
<span id="index-tree_002dsitter-host-and-embedded-languages"></span>
|
|
<span id="index-embedded-language_002c-tree_002dsitter"></span>
|
|
<p>Normally, in a set of languages that can be mixed together, there is a
|
|
<em>host language</em> and one or more <em>embedded languages</em>. A Lisp
|
|
program usually first parses the whole document with the host
|
|
language’s parser, retrieves some information, sets ranges for the
|
|
embedded languages with that information, and then parses the embedded
|
|
languages.
|
|
</p>
|
|
<p>Take a buffer containing <acronym>HTML</acronym>, <acronym>CSS</acronym> and JavaScript
as an example.  A Lisp program will first parse the whole buffer with
an <acronym>HTML</acronym> parser, then query the parser for
<code>style_element</code> and <code>script_element</code> nodes, which
correspond to <acronym>CSS</acronym> and JavaScript text, respectively.  Then
it sets the ranges of the <acronym>CSS</acronym> and JavaScript parsers to the
ranges that their corresponding nodes span.
|
|
</p>
|
|
<p>Given a simple <acronym>HTML</acronym> document:
|
|
</p>
|
|
<div class="example">
<pre class="example">&lt;html&gt;
  &lt;script&gt;1 + 2&lt;/script&gt;
  &lt;style&gt;body { color: blue; }&lt;/style&gt;
&lt;/html&gt;
</pre></div>
|
|
|
|
<p>a Lisp program will first parse with an <acronym>HTML</acronym> parser, then set
ranges for <acronym>CSS</acronym> and JavaScript parsers:
|
|
</p>
|
|
<div class="example">
<pre class="example">;; Create parsers.
(setq html (treesit-get-parser-create 'html))
(setq css (treesit-get-parser-create 'css))
(setq js (treesit-get-parser-create 'javascript))
</pre><pre class="example">

</pre><pre class="example">;; Set CSS ranges.
(setq css-range
      (treesit-query-range
       'html
       "(style_element (raw_text) @capture)"))
(treesit-parser-set-included-ranges css css-range)
</pre><pre class="example">

</pre><pre class="example">;; Set JavaScript ranges.
(setq js-range
      (treesit-query-range
       'html
       "(script_element (raw_text) @capture)"))
(treesit-parser-set-included-ranges js js-range)
</pre></div>
|
|
|
|
<p>Emacs automates this process in <code>treesit-update-ranges</code>. A
|
|
multi-language major mode should set <code>treesit-range-settings</code> so
|
|
that <code>treesit-update-ranges</code> knows how to perform this process
|
|
automatically. Major modes should use the helper function
|
|
<code>treesit-range-rules</code> to generate a value that can be assigned to
|
|
<code>treesit-range-settings</code>. The settings in the following example
|
|
directly translate into operations shown above.
|
|
</p>
|
|
<div class="example">
<pre class="example">(setq-local treesit-range-settings
            (treesit-range-rules
             :embed 'javascript
             :host 'html
             '((script_element (raw_text) @capture))
</pre><pre class="example">

</pre><pre class="example">             :embed 'css
             :host 'html
             '((style_element (raw_text) @capture))))
</pre></div>
|
|
|
|
<dl class="def">
|
|
<dt id="index-treesit_002drange_002drules"><span class="category">Function: </span><span><strong>treesit-range-rules</strong> <em>&rest query-specs</em><a href='#index-treesit_002drange_002drules' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This function generates a value for <code>treesit-range-settings</code>.  It
takes care of compiling queries and other post-processing, and outputs
a value that <code>treesit-range-settings</code> can have.
|
|
</p>
|
|
<p>It takes a series of <var>query-spec</var>s, where each <var>query-spec</var> is
|
|
a <var>query</var> preceded by zero or more <var>keyword</var>/<var>value</var>
|
|
pairs. Each <var>query</var> is a tree-sitter query in either the
|
|
string, s-expression or compiled form, or a function.
|
|
</p>
|
|
<p>If <var>query</var> is a tree-sitter query, it should be preceded by two
<var>:keyword</var>/<var>value</var> pairs, where the <code>:embed</code> keyword
specifies the embedded language, and the <code>:host</code> keyword
specifies the host language.
|
|
</p>
|
|
<p><code>treesit-update-ranges</code> uses <var>query</var> to figure out how to set
the ranges for parsers for the embedded language.  It runs
<var>query</var> against a host language parser, computes the ranges that
the captured nodes span, and applies these ranges to embedded
language parsers.
|
|
</p>
|
|
<p>If <var>query</var> is a function, it doesn’t need any
<var>:keyword</var>/<var>value</var> pairs.  It should be a function that takes two arguments,
<var>start</var> and <var>end</var>, and sets the ranges for parsers in the
current buffer in the region between <var>start</var> and <var>end</var>.  It is
fine for this function to set ranges in a larger region that
encompasses the region between <var>start</var> and <var>end</var>.
|
|
</p></dd></dl>
|
|
|
|
<dl class="def">
|
|
<dt id="index-treesit_002drange_002dsettings"><span class="category">Variable: </span><span><strong>treesit-range-settings</strong><a href='#index-treesit_002drange_002dsettings' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This variable helps <code>treesit-update-ranges</code> in updating the
|
|
ranges for parsers in the buffer. It is a list of <var>setting</var>s
|
|
where the exact format of a <var>setting</var> is considered internal. You
|
|
should use <code>treesit-range-rules</code> to generate a value that this
|
|
variable can have.
|
|
</p>
|
|
</dd></dl>
|
|
|
|
|
|
<dl class="def">
|
|
<dt id="index-treesit_002dlanguage_002dat_002dpoint_002dfunction"><span class="category">Variable: </span><span><strong>treesit-language-at-point-function</strong><a href='#index-treesit_002dlanguage_002dat_002dpoint_002dfunction' class='copiable-anchor'> ¶</a></span></dt>
|
|
<dd><p>This variable’s value should be a function that takes a single
|
|
argument, <var>pos</var>, which is a buffer position, and returns the
|
|
language of the buffer text at <var>pos</var>. This variable is used by
|
|
<code>treesit-language-at</code>.
|
|
</p></dd></dl>
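<p>As an illustration, a major mode for the <acronym>HTML</acronym> example
above might implement this function by checking which parser’s ranges
contain <var>pos</var>.  This is only a sketch under stated assumptions:
the names <code>my-pos-in-ranges-p</code> and
<code>my-html-language-at-point</code> are hypothetical, the host language
is assumed to be <code>html</code>, and the end of each range is treated
as exclusive.  A parser without ranges returns <code>nil</code> from
<code>treesit-parser-included-ranges</code> and is therefore skipped.
</p>
<div class="example">
<pre class="example">(require 'cl-lib)

(defun my-pos-in-ranges-p (pos ranges)
  "Return non-nil if POS falls inside one of RANGES.
RANGES is a list of cons cells (BEG . END); END is exclusive."
  (cl-some (lambda (range)
             (and (&lt;= (car range) pos) (&lt; pos (cdr range))))
           ranges))

(defun my-html-language-at-point (pos)
  "Return the language at POS, falling back to `html'.
Hypothetical function for illustration."
  (or (cl-loop for parser in (treesit-parser-list)
               ;; Pick the first parser whose ranges contain POS.
               when (my-pos-in-ranges-p
                     pos (treesit-parser-included-ranges parser))
               return (treesit-parser-language parser))
      ;; No embedded parser covers POS: assume the host language.
      'html))

;; In the major mode's setup code:
(setq-local treesit-language-at-point-function
            #'my-html-language-at-point)
</pre></div>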
|
|
|
|
</div>
|
|
<hr>
|
|
<div class="header">
|
|
<p>
|
|
Next: <a href="Tree_002dsitter-major-modes.html">Developing major modes with tree-sitter</a>, Previous: <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
|
</div>
|
|
|
|
|
|
|
|
</body>
|
|
</html>
|