mirror of git://git.sv.gnu.org/emacs.git synced 2025-12-08 23:40:24 -08:00

; * admin/notes/tree-sitter/starter-guide: Update starter-guide.

Yuan Fu 2023-03-18 14:13:31 -07:00
parent 11592bcfda
commit e84f878e19
No known key found for this signature in database
GPG key ID: 56E19BC57664A442


@@ -17,6 +17,7 @@ TOC:
 - More features?
 - Common tasks (code snippets)
 - Manual
+- Appendix 1
 * Building Emacs with tree-sitter
@@ -42,11 +43,9 @@ You can use this script that I put together here:
 https://github.com/casouri/tree-sitter-module
-You can also find them under this directory in /build-modules.
 This script automatically pulls and builds language definitions for C,
 C++, Rust, JSON, Go, HTML, JavaScript, CSS, Python, Typescript,
-and C#. Better yet, I pre-built these language definitions for
+C#, etc. Better yet, I pre-built these language definitions for
 GNU/Linux and macOS, they can be downloaded here:
 https://github.com/casouri/tree-sitter-module/releases/tag/v2.1
@@ -68,6 +67,10 @@ organization has all the "official" language definitions:
 https://github.com/tree-sitter
+Alternatively, you can use treesit-install-language-grammar command
+and follow its instructions. If everything goes right, it should
+automatically download and compile the language grammar for you.
 * Setting up for adding major mode features
 Start Emacs and load tree-sitter with
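
A minimal sketch of the treesit-install-language-grammar route mentioned in this hunk. The command looks up where to fetch a grammar in treesit-language-source-alist. The entry below points at the official Python grammar under https://github.com/tree-sitter. The sketch assumes Emacs 29 with tree-sitter support, plus git and a C compiler on the system.

#+begin_src elisp
(require 'treesit)

;; Each entry is (LANG . (URL ...)); optional elements such as a
;; REVISION or SOURCE-DIR can follow the URL when a grammar needs them.
(add-to-list 'treesit-language-source-alist
             '(python "https://github.com/tree-sitter/tree-sitter-python"))

;; Clone, compile, and install the grammar into the "tree-sitter"
;; directory under `user-emacs-directory'.
(treesit-install-language-grammar 'python)
#+end_src

If the build succeeds, (treesit-language-available-p 'python) should then return non-nil.
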
@@ -78,6 +81,10 @@ Now check if Emacs is built with tree-sitter library
 (treesit-available-p)
+Make sure Emacs can find the language grammar you want to use
+(treesit-language-available-p 'lang)
 * Tree-sitter major modes
 Tree-sitter modes should be separate major modes, so other modes
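
The two checks in this hunk can be combined into one test, sketched here with Python as the example language. treesit-language-available-p only exists when Emacs is built with tree-sitter, so the sketch calls treesit-available-p first. Passing a non-nil second argument asks for details about why a grammar failed to load.

#+begin_src elisp
(require 'treesit)

(if (and (treesit-available-p)
         (treesit-language-available-p 'python))
    (message "Tree-sitter is ready for Python")
  ;; With a non-nil DETAIL argument, an unavailable grammar yields
  ;; (nil . DATA), where DATA describes the loading error.
  (message "Not ready: %S"
           (and (treesit-available-p)
                (treesit-language-available-p 'python t))))
#+end_src
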
@@ -89,12 +96,15 @@ modes.
 If the tree-sitter variant and the "native" variant could share some
 setup, you can create a "base mode", which only contains the common
-setup. For example, there is python-base-mode (shared), python-mode
-(native), and python-ts-mode (tree-sitter).
+setup. For example, python.el defines python-base-mode (shared),
+python-mode (native), and python-ts-mode (tree-sitter).
 In the tree-sitter mode, check if we can use tree-sitter with
 treesit-ready-p, it will error out if tree-sitter is not ready.
+In Emacs 30 we'll introduce some mechanism to more gracefully inherit
+modes and fallback to other modes.
 * Naming convention
 Use tree-sitter for text (documentation, comment), use treesit for
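
A sketch of the base-mode layout, written for a hypothetical language "mylang". All mylang names are placeholders; the shape mirrors python-base-mode and python-ts-mode. One hedge on the wording above: in Emacs 29, treesit-ready-p emits a warning and returns nil when tree-sitter or the grammar is missing, rather than signaling an error. For that reason the sketch guards the tree-sitter setup with `when`.

#+begin_src elisp
(require 'treesit)

;; Setup shared by every variant of the hypothetical mylang modes.
(define-derived-mode mylang-base-mode prog-mode "MyLang"
  "Major mode holding the setup shared by all MyLang modes."
  (setq-local comment-start "# ")
  (setq-local comment-end ""))

;; Tree-sitter variant; a "native" mylang-mode would also derive
;; from mylang-base-mode.
(define-derived-mode mylang-ts-mode mylang-base-mode "MyLang"
  "Tree-sitter based major mode for MyLang."
  ;; nil (after a warning) if tree-sitter or the grammar is missing,
  ;; in which case the buffer keeps only the base setup.
  (when (treesit-ready-p 'mylang)
    (treesit-parser-create 'mylang)
    ;; Set font-lock, indentation, Imenu, etc. variables here, then:
    (treesit-major-mode-setup)))
#+end_src
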
@@ -180,18 +190,17 @@ mark the offending part in red.
 To enable tree-sitter font-lock, set treesit-font-lock-settings and
 treesit-font-lock-feature-list buffer-locally and call
 treesit-major-mode-setup. For example, see
-python--treesit-settings in python.el. Below I paste a snippet of
-it.
+python--treesit-settings in python.el. Below is a snippet of it.
-Note that like the current font-lock, if the to-be-fontified region
-already has a face (ie, an earlier match fontified part/all of the
-region), the new face is discarded rather than applied. If you want
-later matches always override earlier matches, use the :override
-keyword.
+Just like the current font-lock, if the to-be-fontified region already
+has a face (ie, an earlier match fontified part/all of the region),
+the new face is discarded rather than applied. If you want later
+matches always override earlier matches, use the :override keyword.
 Each rule should have a :feature, like function-name,
 string-interpolation, builtin, etc. Users can then enable/disable each
-feature individually.
+feature individually. See Appendix 1 at the bottom for a set of common
+features names.
 #+begin_src elisp
 (defvar python--treesit-settings
@@ -247,8 +256,7 @@ Concretely, something like this:
 (string-interpolation decorator)))
 (treesit-major-mode-setup))
 (t
-;; No tree-sitter
-(setq-local font-lock-defaults ...)
+;; No tree-sitter, do nothing or fallback to another mode.
 ...)))
 #+end_src
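
A reduced sketch of font-lock rules in the style of python--treesit-settings. It does not reproduce that variable's actual contents, and the my-python-- names are hypothetical. The second rule needs :override t. The string rule has already given the whole string node a face, so without :override the interpolation face would be discarded, exactly as the font-lock text above describes.

#+begin_src elisp
(require 'treesit)

;; Hypothetical, reduced rule set for the Python grammar.
(defvar my-python--treesit-settings
  (treesit-font-lock-rules
   ;; Every rule group declares its grammar and its feature.
   :language 'python
   :feature 'string
   '((string) @font-lock-string-face)

   ;; The string rule already fontified this region, so this rule
   ;; only takes effect because of :override.
   :language 'python
   :feature 'string-interpolation
   :override t
   '((interpolation (identifier) @font-lock-variable-name-face)))
  "Hypothetical tree-sitter font-lock settings for Python.")

(defun my-python--setup-font-lock ()
  "Install `my-python--treesit-settings' in the current buffer.
Call this in the major mode body before `treesit-major-mode-setup'."
  (setq-local treesit-font-lock-settings my-python--treesit-settings)
  ;; One feature per level; `treesit-font-lock-level' decides how
  ;; many levels are enabled.
  (setq-local treesit-font-lock-feature-list
              '((string) (string-interpolation))))
#+end_src
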
@@ -289,6 +297,7 @@ For ANCHOR we have
 first-sibling => start of the first sibling
 parent => start of parent
 parent-bol => BOL of the line parent is on.
+standalone-parent => Like parent-bol but handles more edge cases
 prev-sibling => start of previous sibling
 no-indent => current position (don't indent)
 prev-line => start of previous line
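
A small sketch of how these anchors combine with matchers in treesit-simple-indent-rules, written for the C grammar. The node types compound_statement and "}" come from tree-sitter-c; the my-c-- name and the 4-column offset are hypothetical choices. Each rule has the form (MATCHER ANCHOR OFFSET), and the first matching rule wins.

#+begin_src elisp
(require 'treesit)

(defvar my-c--indent-rules
  '((c
     ;; A closing brace lines up with the first ancestor that has only
     ;; whitespace before it on its line (e.g. the `if' statement).
     ((node-is "}") standalone-parent 0)
     ;; Statements inside { ... } go 4 columns past that ancestor.
     ((parent-is "compound_statement") standalone-parent 4)
     ;; Anything else lines up with the line the parent is on.
     (catch-all parent-bol 0)))
  "Hypothetical `treesit-simple-indent-rules' value for C.")

;; In the major mode body:
;;   (setq-local treesit-simple-indent-rules my-c--indent-rules)
;;   (treesit-major-mode-setup)
#+end_src

standalone-parent is used here instead of parent-bol so that a brace opened after `if (x)` is anchored on the `if`, not on the compound_statement node that starts mid-line.
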
@@ -329,7 +338,8 @@ tells you which rule is applied in the echo area.
 ...))))
 #+end_src
-Then you set treesit-simple-indent-rules to your rules, and call
+To setup indentation for your major mode, set
+treesit-simple-indent-rules to your rules, and call
 treesit-major-mode-setup:
 #+begin_src elisp
@@ -339,36 +349,14 @@ Then you set treesit-simple-indent-rules to your rules, and call
 * Imenu
-Not much to say except for utilizing treesit-induce-sparse-tree (and
-explicitly pass a LIMIT argument: most of the time you don't need more
-than 10). See js--treesit-imenu-1 in js.el for an example.
-Once you have the index builder, set imenu-create-index-function to
-it.
+Set treesit-simple-imenu-settings and call
+treesit-major-mode-setup.
 * Navigation
-Mainly beginning-of-defun-function and end-of-defun-function.
-You can find the end of a defun with something like
-(treesit-search-forward-goto "function_definition" 'end)
-where "function_definition" matches the node type of a function
-definition node, and end means we want to go to the end of that node.
-Tree-sitter has default implementations for
-beginning-of-defun-function and end-of-defun-function. So for
-ordinary languages, it is enough to set treesit-defun-type-regexp
-to something that matches all the defun struct types in the language,
-and call treesit-major-mode-setup. For example,
-#+begin_src emacs-lisp
-(setq-local treesit-defun-type-regexp (rx bol
-                                           (or "function" "class")
-                                           "_definition"
-                                           eol))
-(treesit-major-mode-setup)
-#+end_src>
+Set treesit-defun-type-regexp and call
+treesit-major-mode-setup. You can additionally set
+treesit-defun-name-function.
 * Which-func
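
A sketch of the new Imenu and Navigation setup for the Python grammar. The my-python-- names are hypothetical; function_definition and class_definition are node types from tree-sitter-python. Both helpers only set buffer-local variables. Calling treesit-major-mode-setup afterwards in the mode body is what turns them into Imenu entries and defun navigation.

#+begin_src elisp
(require 'treesit)
(require 'rx)

(defun my-python--defun-name (node)
  "Return the name of NODE, a function or class definition, or nil."
  (let ((name-node (treesit-node-child-by-field-name node "name")))
    (when name-node
      (treesit-node-text name-node t))))

(defun my-python--setup-navigation ()
  "Set Imenu and defun variables; call before `treesit-major-mode-setup'."
  ;; Each entry is (CATEGORY REGEXP PRED NAME-FN); with NAME-FN nil,
  ;; entries are named via `treesit-defun-name'.
  (setq-local treesit-simple-imenu-settings
              '(("Function" "\\`function_definition\\'" nil nil)
                ("Class" "\\`class_definition\\'" nil nil)))
  ;; Node types that count as defuns for C-M-a / C-M-e.
  (setq-local treesit-defun-type-regexp
              (rx bos (or "function" "class") "_definition" eos))
  (setq-local treesit-defun-name-function #'my-python--defun-name))
#+end_src
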
@@ -376,36 +364,7 @@ If you have an imenu implementation, set which-func-functions to
 nil, and which-func will automatically use imenu's data.
 If you want an independent implementation for which-func, you can
-find the current function by going up the tree and looking for the
-function_definition node. See the function below for an example.
-Since Python allows nested function definitions, that function keeps
-going until it reaches the root node, and records all the function
-names along the way.
-#+begin_src elisp
-(defun python-info-treesit-current-defun (&optional include-type)
-  "Identical to `python-info-current-defun' but use tree-sitter.
-For INCLUDE-TYPE see `python-info-current-defun'."
-  (let ((node (treesit-node-at (point)))
-        (name-list ())
-        (type nil))
-    (cl-loop while node
-             if (pcase (treesit-node-type node)
-                  ("function_definition"
-                   (setq type 'def))
-                  ("class_definition"
-                   (setq type 'class))
-                  (_ nil))
-             do (push (treesit-node-text
-                       (treesit-node-child-by-field-name node "name")
-                       t)
-                      name-list)
-             do (setq node (treesit-node-parent node))
-             finally return (concat (if include-type
-                                        (format "%s " type)
-                                      "")
-                                    (string-join name-list ".")))))
-#+end_src
+find the current function by treesit-defun-at-point.
 * More features?
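
A minimal which-func function built on treesit-defun-at-point, as the new text suggests. The function name is hypothetical. The sketch assumes treesit-defun-type-regexp and treesit-defun-name-function are already set in the buffer; treesit-defun-name returns nil when no name function is set. With the default treesit-defun-tactic (nested), treesit-defun-at-point returns the innermost defun around point.

#+begin_src elisp
(require 'treesit)

(defun my-python--which-func ()
  "Return the name of the defun at point, or nil if there is none."
  (let ((node (treesit-defun-at-point)))
    (when node
      (treesit-defun-name node))))

;; In the major mode body, register it for this buffer only:
;;   (add-hook 'which-func-functions #'my-python--which-func nil t)
#+end_src
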
@@ -449,7 +408,51 @@ section is Parsing Program Source. Typing
 C-h i d m elisp RET g Parsing Program Source RET
-will bring you to that section. You can also read the HTML version
-under /html-manual in this directory. I find the HTML version easier
-to read. You don't need to read through every sentence, just read the
-text paragraphs and glance over function names.
+will bring you to that section. You don't need to read through every
+sentence, just read the text paragraphs and glance over function
+names.
+* Appendix 1
+Below is a set of common features used by built-in major mode.
+Basic tokens:
+delimiter            ,.; (delimit things)
+operator             == != || (produces a value)
+bracket              []{}()
+misc-punctuation     (other punctuation that you want to highlight)
+constant             true, false, null
+number
+keyword
+comment              (includes doc-comments)
+string               (includes chars and docstrings)
+string-interpolation f"text {variable}"
+escape-sequence      "\n\t\\"
+function             every function identifier
+variable             every variable identifier
+type                 every type identifier
+property             a.b <--- highlight b
+key                  { a: b, c: d } <--- highlight a, c
+error                highlight parse error
+Abstract features:
+assignment: the LHS of an assignment (thing being assigned to), eg:
+  a = b    <--- highlight a
+  a.b = c  <--- highlight b
+  a[1] = d <--- highlight a
+definition: the thing being defined, eg:
+  int a(int b) { <--- highlight a
+    return 0
+  }
+  int a; <-- highlight a
+  struct a { <--- highlight a
+    int b; <--- highlight b
+  }
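
To illustrate how entries in this list become rules, here are bracket and delimiter rules for the Python grammar. They use font-lock-bracket-face and font-lock-delimiter-face, both added in Emacs 29; the my-python-- name is hypothetical. Built-in modes often place punctuation features like these in the highest level of treesit-font-lock-feature-list.

#+begin_src elisp
(require 'treesit)

(defvar my-python--punctuation-settings
  (treesit-font-lock-rules
   ;; `bracket' feature: []{}()
   :language 'python
   :feature 'bracket
   '((["(" ")" "[" "]" "{" "}"]) @font-lock-bracket-face)

   ;; `delimiter' feature: , . : ;
   :language 'python
   :feature 'delimiter
   '((["," "." ":" ";"]) @font-lock-delimiter-face))
  "Hypothetical rules for the `bracket' and `delimiter' features.")

;; These rules only apply once `bracket' and `delimiter' appear in
;; some level of `treesit-font-lock-feature-list'.
#+end_src
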