mirror of
git://git.sv.gnu.org/emacs.git
synced 2025-12-08 23:40:24 -08:00
; * admin/notes/tree-sitter/starter-guide: Update starter-guide.
parent 11592bcfda
commit e84f878e19
1 changed file with 80 additions and 77 deletions
|
|
@ -17,6 +17,7 @@ TOC:
|
||||||
- More features?
|
- More features?
|
||||||
- Common tasks (code snippets)
|
- Common tasks (code snippets)
|
||||||
- Manual
|
- Manual
|
||||||
|
- Appendix 1
|
||||||
|
|
||||||
* Building Emacs with tree-sitter
|
* Building Emacs with tree-sitter
|
||||||
|
|
||||||
|
|
@ -42,11 +43,9 @@ You can use this script that I put together here:
|
||||||
|
|
||||||
https://github.com/casouri/tree-sitter-module
|
https://github.com/casouri/tree-sitter-module
|
||||||
|
|
||||||
You can also find them under this directory in /build-modules.
|
|
||||||
|
|
||||||
This script automatically pulls and builds language definitions for C,
|
This script automatically pulls and builds language definitions for C,
|
||||||
C++, Rust, JSON, Go, HTML, JavaScript, CSS, Python, Typescript,
|
C++, Rust, JSON, Go, HTML, JavaScript, CSS, Python, Typescript,
|
||||||
and C#. Better yet, I pre-built these language definitions for
|
C#, etc. Better yet, I pre-built these language definitions for
|
||||||
GNU/Linux and macOS, they can be downloaded here:
|
GNU/Linux and macOS, they can be downloaded here:
|
||||||
|
|
||||||
https://github.com/casouri/tree-sitter-module/releases/tag/v2.1
|
https://github.com/casouri/tree-sitter-module/releases/tag/v2.1
|
||||||
|
|
@ -68,6 +67,10 @@ organization has all the "official" language definitions:
|
||||||
|
|
||||||
https://github.com/tree-sitter
|
https://github.com/tree-sitter
|
||||||
|
|
||||||
|
Alternatively, you can use the treesit-install-language-grammar command
|
||||||
|
and follow its instructions. If everything goes right, it should
|
||||||
|
automatically download and compile the language grammar for you.
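
As a rough sketch of that route: the command looks up where to fetch a
grammar in ‘treesit-language-source-alist’. The snippet below assumes
the Python grammar from the official tree-sitter organization, and the
helper name my-treesit-ensure-grammar is made up for illustration.

#+begin_src elisp
(require 'treesit)

;; Tell Emacs where to fetch the grammar source.  The URL assumes the
;; official Python grammar; replace it for other languages.
(setq treesit-language-source-alist
      '((python "https://github.com/tree-sitter/tree-sitter-python")))

;; Hypothetical helper: build the grammar only when it is missing.
;; Building needs git and a C compiler on the machine.
(defun my-treesit-ensure-grammar (lang)
  "Install the grammar for LANG unless Emacs can already load it."
  (unless (treesit-language-available-p lang)
    (treesit-install-language-grammar lang)))
#+end_src

Then call (my-treesit-ensure-grammar 'python), for instance from your
init file.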
|
||||||
|
|
||||||
* Setting up for adding major mode features
|
* Setting up for adding major mode features
|
||||||
|
|
||||||
Start Emacs and load tree-sitter with
|
Start Emacs and load tree-sitter with
|
||||||
|
|
@ -78,6 +81,10 @@ Now check if Emacs is built with tree-sitter library
|
||||||
|
|
||||||
(treesit-available-p)
|
(treesit-available-p)
|
||||||
|
|
||||||
|
Make sure Emacs can find the language grammar you want to use
|
||||||
|
|
||||||
|
(treesit-language-available-p 'lang)
|
||||||
|
|
||||||
* Tree-sitter major modes
|
* Tree-sitter major modes
|
||||||
|
|
||||||
Tree-sitter modes should be separate major modes, so other modes
|
Tree-sitter modes should be separate major modes, so other modes
|
||||||
|
|
@ -89,12 +96,15 @@ modes.
|
||||||
|
|
||||||
If the tree-sitter variant and the "native" variant could share some
|
If the tree-sitter variant and the "native" variant could share some
|
||||||
setup, you can create a "base mode", which only contains the common
|
setup, you can create a "base mode", which only contains the common
|
||||||
setup. For example, there is python-base-mode (shared), python-mode
|
setup. For example, python.el defines python-base-mode (shared),
|
||||||
(native), and python-ts-mode (tree-sitter).
|
python-mode (native), and python-ts-mode (tree-sitter).
|
||||||
|
|
||||||
In the tree-sitter mode, check if we can use tree-sitter with
|
In the tree-sitter mode, check if we can use tree-sitter with
|
||||||
treesit-ready-p, it will error out if tree-sitter is not ready.
|
treesit-ready-p, it will error out if tree-sitter is not ready.
|
||||||
|
|
||||||
|
In Emacs 30 we'll introduce some mechanism to more gracefully inherit
|
||||||
|
modes and fall back to other modes.
|
||||||
|
|
||||||
* Naming convention
|
* Naming convention
|
||||||
|
|
||||||
Use tree-sitter for text (documentation, comment), use treesit for
|
Use tree-sitter for text (documentation, comment), use treesit for
|
||||||
|
|
@ -180,18 +190,17 @@ mark the offending part in red.
|
||||||
To enable tree-sitter font-lock, set ‘treesit-font-lock-settings’ and
|
To enable tree-sitter font-lock, set ‘treesit-font-lock-settings’ and
|
||||||
‘treesit-font-lock-feature-list’ buffer-locally and call
|
‘treesit-font-lock-feature-list’ buffer-locally and call
|
||||||
‘treesit-major-mode-setup’. For example, see
|
‘treesit-major-mode-setup’. For example, see
|
||||||
‘python--treesit-settings’ in python.el. Below I paste a snippet of
|
‘python--treesit-settings’ in python.el. Below is a snippet of it.
|
||||||
it.
|
|
||||||
|
|
||||||
Note that like the current font-lock, if the to-be-fontified region
|
Just like the current font-lock, if the to-be-fontified region already
|
||||||
already has a face (ie, an earlier match fontified part/all of the
|
has a face (ie, an earlier match fontified part/all of the region),
|
||||||
region), the new face is discarded rather than applied. If you want
|
the new face is discarded rather than applied. If you want later
|
||||||
later matches always override earlier matches, use the :override
|
matches to always override earlier matches, use the :override keyword.
|
||||||
keyword.
|
|
||||||
|
|
||||||
Each rule should have a :feature, like function-name,
|
Each rule should have a :feature, like function-name,
|
||||||
string-interpolation, builtin, etc. Users can then enable/disable each
|
string-interpolation, builtin, etc. Users can then enable/disable each
|
||||||
feature individually.
|
feature individually. See Appendix 1 at the bottom for a set of common
|
||||||
|
feature names.
|
||||||
|
|
||||||
#+begin_src elisp
|
#+begin_src elisp
|
||||||
(defvar python--treesit-settings
|
(defvar python--treesit-settings
|
||||||
|
|
@ -247,8 +256,7 @@ Concretely, something like this:
|
||||||
(string-interpolation decorator)))
|
(string-interpolation decorator)))
|
||||||
(treesit-major-mode-setup))
|
(treesit-major-mode-setup))
|
||||||
(t
|
(t
|
||||||
;; No tree-sitter
|
;; No tree-sitter, do nothing or fall back to another mode.
|
||||||
(setq-local font-lock-defaults ...)
|
|
||||||
...)))
|
...)))
|
||||||
#+end_src
|
#+end_src
|
||||||
|
|
||||||
|
|
@ -289,6 +297,7 @@ For ANCHOR we have
|
||||||
first-sibling => start of the first sibling
|
first-sibling => start of the first sibling
|
||||||
parent => start of parent
|
parent => start of parent
|
||||||
parent-bol => BOL of the line parent is on.
|
parent-bol => BOL of the line parent is on.
|
||||||
|
standalone-parent => Like parent-bol but handles more edge cases
|
||||||
prev-sibling => start of previous sibling
|
prev-sibling => start of previous sibling
|
||||||
no-indent => current position (don’t indent)
|
no-indent => current position (don’t indent)
|
||||||
prev-line => start of previous line
|
prev-line => start of previous line
|
||||||
|
|
@ -329,7 +338,8 @@ tells you which rule is applied in the echo area.
|
||||||
...))))
|
...))))
|
||||||
#+end_src
|
#+end_src
|
||||||
|
|
||||||
Then you set ‘treesit-simple-indent-rules’ to your rules, and call
|
To set up indentation for your major mode, set
|
||||||
|
‘treesit-simple-indent-rules’ to your rules, and call
|
||||||
‘treesit-major-mode-setup’:
|
‘treesit-major-mode-setup’:
|
||||||
|
|
||||||
#+begin_src elisp
|
#+begin_src elisp
|
||||||
|
|
@ -339,36 +349,14 @@ Then you set ‘treesit-simple-indent-rules’ to your rules, and call
|
||||||
|
|
||||||
* Imenu
|
* Imenu
|
||||||
|
|
||||||
Not much to say except for utilizing ‘treesit-induce-sparse-tree’ (and
|
Set ‘treesit-simple-imenu-settings’ and call
|
||||||
explicitly pass a LIMIT argument: most of the time you don't need more
|
‘treesit-major-mode-setup’.
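
For instance, here is a minimal sketch for a language whose grammar has
function_definition and class_definition nodes (an assumption; check
your grammar with ‘treesit-explore-mode’). Each entry has the form
(CATEGORY REGEXP PRED NAME-FN); when NAME-FN is nil, ‘treesit-defun-name’
is used. The name my-lang-ts--setup-imenu is hypothetical.

#+begin_src elisp
(require 'treesit)

;; Hypothetical helper, meant to be called from the major mode body.
(defun my-lang-ts--setup-imenu ()
  "Set up tree-sitter Imenu in the current buffer."
  (setq-local treesit-simple-imenu-settings
              ;; Node type names are assumptions about the grammar.
              '(("Function" "\\`function_definition\\'" nil nil)
                ("Class" "\\`class_definition\\'" nil nil)))
  ;; Call this once, after all treesit-* variables are set.
  (treesit-major-mode-setup))
#+end_src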
|
||||||
than 10). See ‘js--treesit-imenu-1’ in js.el for an example.
|
|
||||||
|
|
||||||
Once you have the index builder, set ‘imenu-create-index-function’ to
|
|
||||||
it.
|
|
||||||
|
|
||||||
* Navigation
|
* Navigation
|
||||||
|
|
||||||
Mainly ‘beginning-of-defun-function’ and ‘end-of-defun-function’.
|
Set ‘treesit-defun-type-regexp’ and call
|
||||||
You can find the end of a defun with something like
|
‘treesit-major-mode-setup’. You can additionally set
|
||||||
|
‘treesit-defun-name-function’.
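
A sketch of both settings, assuming a grammar in which defuns are
function_definition and class_definition nodes that carry a "name"
field. All my-lang-ts-- names are made up for illustration.

#+begin_src elisp
(require 'treesit)

;; Matches exactly the node types function_definition and
;; class_definition (assumed node type names).
(defconst my-lang-ts--defun-type-regexp
  (rx bos (or "function" "class") "_definition" eos))

(defun my-lang-ts--defun-name (node)
  "Return the text of NODE's \"name\" field, or nil if it has none."
  (let ((name (treesit-node-child-by-field-name node "name")))
    (when name
      (treesit-node-text name t))))

;; Hypothetical helper, meant to be called from the major mode body.
(defun my-lang-ts--setup-navigation ()
  "Set up tree-sitter defun navigation in the current buffer."
  (setq-local treesit-defun-type-regexp my-lang-ts--defun-type-regexp)
  (setq-local treesit-defun-name-function #'my-lang-ts--defun-name)
  (treesit-major-mode-setup))
#+end_src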
|
||||||
(treesit-search-forward-goto "function_definition" 'end)
|
|
||||||
|
|
||||||
where "function_definition" matches the node type of a function
|
|
||||||
definition node, and ’end means we want to go to the end of that node.
|
|
||||||
|
|
||||||
Tree-sitter has default implementations for
|
|
||||||
‘beginning-of-defun-function’ and ‘end-of-defun-function’. So for
|
|
||||||
ordinary languages, it is enough to set ‘treesit-defun-type-regexp’
|
|
||||||
to something that matches all the defun struct types in the language,
|
|
||||||
and call ‘treesit-major-mode-setup’. For example,
|
|
||||||
|
|
||||||
#+begin_src emacs-lisp
|
|
||||||
(setq-local treesit-defun-type-regexp (rx bol
|
|
||||||
(or "function" "class")
|
|
||||||
"_definition"
|
|
||||||
eol))
|
|
||||||
(treesit-major-mode-setup)
|
|
||||||
#+end_src>
|
|
||||||
|
|
||||||
* Which-func
|
* Which-func
|
||||||
|
|
||||||
|
|
@ -376,36 +364,7 @@ If you have an imenu implementation, set ‘which-func-functions’ to
|
||||||
nil, and which-func will automatically use imenu’s data.
|
nil, and which-func will automatically use imenu’s data.
|
||||||
|
|
||||||
If you want an independent implementation for which-func, you can
|
If you want an independent implementation for which-func, you can
|
||||||
find the current function by going up the tree and looking for the
|
find the current function with ‘treesit-defun-at-point’.
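
A minimal sketch of such an implementation. It assumes
‘treesit-defun-type-regexp’ is already set in the buffer, and the name
my-lang-ts--which-func is hypothetical.

#+begin_src elisp
(require 'treesit)

(defun my-lang-ts--which-func ()
  "Return the name of the defun at point, or nil if there is none."
  (let ((node (treesit-defun-at-point)))
    (when node
      (treesit-defun-name node))))

;; In the major mode body, register it buffer-locally:
;; (add-hook 'which-func-functions #'my-lang-ts--which-func nil t)
#+end_src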
|
||||||
function_definition node. See the function below for an example.
|
|
||||||
Since Python allows nested function definitions, that function keeps
|
|
||||||
going until it reaches the root node, and records all the function
|
|
||||||
names along the way.
|
|
||||||
|
|
||||||
#+begin_src elisp
|
|
||||||
(defun python-info-treesit-current-defun (&optional include-type)
|
|
||||||
"Identical to `python-info-current-defun' but use tree-sitter.
|
|
||||||
For INCLUDE-TYPE see `python-info-current-defun'."
|
|
||||||
(let ((node (treesit-node-at (point)))
|
|
||||||
(name-list ())
|
|
||||||
(type nil))
|
|
||||||
(cl-loop while node
|
|
||||||
if (pcase (treesit-node-type node)
|
|
||||||
("function_definition"
|
|
||||||
(setq type 'def))
|
|
||||||
("class_definition"
|
|
||||||
(setq type 'class))
|
|
||||||
(_ nil))
|
|
||||||
do (push (treesit-node-text
|
|
||||||
(treesit-node-child-by-field-name node "name")
|
|
||||||
t)
|
|
||||||
name-list)
|
|
||||||
do (setq node (treesit-node-parent node))
|
|
||||||
finally return (concat (if include-type
|
|
||||||
(format "%s " type)
|
|
||||||
"")
|
|
||||||
(string-join name-list ".")))))
|
|
||||||
#+end_src
|
|
||||||
|
|
||||||
* More features?
|
* More features?
|
||||||
|
|
||||||
|
|
@ -449,7 +408,51 @@ section is Parsing Program Source. Typing
|
||||||
|
|
||||||
C-h i d m elisp RET g Parsing Program Source RET
|
C-h i d m elisp RET g Parsing Program Source RET
|
||||||
|
|
||||||
will bring you to that section. You can also read the HTML version
|
will bring you to that section. You don’t need to read through every
|
||||||
under /html-manual in this directory. I find the HTML version easier
|
sentence, just read the text paragraphs and glance over function
|
||||||
to read. You don’t need to read through every sentence, just read the
|
names.
|
||||||
text paragraphs and glance over function names.
|
|
||||||
|
* Appendix 1
|
||||||
|
|
||||||
|
Below is a set of common features used by built-in major modes.
|
||||||
|
|
||||||
|
Basic tokens:
|
||||||
|
|
||||||
|
delimiter ,.; (delimit things)
|
||||||
|
operator == != || (produces a value)
|
||||||
|
bracket []{}()
|
||||||
|
misc-punctuation (other punctuation that you want to highlight)
|
||||||
|
|
||||||
|
constant true, false, null
|
||||||
|
number
|
||||||
|
keyword
|
||||||
|
comment (includes doc-comments)
|
||||||
|
string (includes chars and docstrings)
|
||||||
|
string-interpolation f"text {variable}"
|
||||||
|
escape-sequence "\n\t\\"
|
||||||
|
function every function identifier
|
||||||
|
variable every variable identifier
|
||||||
|
type every type identifier
|
||||||
|
property a.b <--- highlight b
|
||||||
|
key { a: b, c: d } <--- highlight a, c
|
||||||
|
error highlight parse error
|
||||||
|
|
||||||
|
Abstract features:
|
||||||
|
|
||||||
|
assignment: the LHS of an assignment (thing being assigned to), eg:
|
||||||
|
|
||||||
|
a = b <--- highlight a
|
||||||
|
a.b = c <--- highlight b
|
||||||
|
a[1] = d <--- highlight a
|
||||||
|
|
||||||
|
definition: the thing being defined, eg:
|
||||||
|
|
||||||
|
int a(int b) { <--- highlight a
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
int a; <--- highlight a
|
||||||
|
|
||||||
|
struct a { <--- highlight a
|
||||||
|
int b; <--- highlight b
|
||||||
|
}
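
These names are what goes into ‘treesit-font-lock-feature-list’. Below
is a sketch of one possible grouping into four levels; which feature
belongs to which level is only an assumption here, and a real mode
should list only the features its rules define. The my-lang-ts-- names
are hypothetical.

#+begin_src elisp
;; Each sublist is one level; users choose how many levels to enable
;; with `treesit-font-lock-level'.
(defconst my-lang-ts--feature-list
  '((comment definition)
    (keyword string type)
    (assignment constant escape-sequence number string-interpolation)
    (bracket delimiter error function operator property variable)))

;; Hypothetical helper, meant to be called from the major mode body
;; before `treesit-major-mode-setup'.
(defun my-lang-ts--setup-feature-list ()
  "Install the feature list in the current buffer."
  (setq-local treesit-font-lock-feature-list my-lang-ts--feature-list))
#+end_src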
|
||||||
|
|
|
||||||