From 46929f6b7308b9aab011b3d4ea4adaa4242076cd Mon Sep 17 00:00:00 2001
From: Eli Zaretskii
Date: Fri, 4 Nov 2022 15:12:29 +0200
Subject: [PATCH 1/3] ; Improve documentation of character classes in regexps

* doc/lispref/searching.texi (Char Classes): Add notes about the
dependence of character classes on case and syntax tables specific
to buffers and modes. (Bug#58992)
---
 doc/lispref/searching.texi | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index fe4de0abbb2..3365c0c9042 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -617,7 +617,7 @@ This matches any character whose code is in the range 0--31.
 This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
 matches any digit, as well as @samp{+} and @samp{-}.
 @item [:graph:]
-This matches graphic characters---everything except whitespace,
+This matches graphic characters---everything except spaces,
 @acronym{ASCII} and non-@acronym{ASCII} control characters,
 surrogates, and codepoints unassigned by Unicode, as indicated by the
 Unicode @samp{general-category} property (@pxref{Character
@@ -625,29 +625,39 @@ Properties}).
 @item [:lower:]
 This matches any lower-case letter, as determined by the current case
 table (@pxref{Case Tables}). If @code{case-fold-search} is
-non-@code{nil}, this also matches any upper-case letter.
+non-@code{nil}, this also matches any upper-case letter. Note that a
+buffer can have its own local case table different from the default
+one.
 @item [:multibyte:]
 This matches any multibyte character (@pxref{Text Representations}).
 @item [:nonascii:]
 This matches any non-@acronym{ASCII} character.
 @item [:print:]
-This matches any printing character---either whitespace, or a graphic
-character matched by @samp{[:graph:]}.
+This matches any printing character---either spaces or graphic
+characters matched by @samp{[:graph:]}.
 @item [:punct:]
 This matches any punctuation character. (At present, for multibyte
-characters, it matches anything that has non-word syntax.)
+characters, it matches anything that has non-word syntax, and thus its
+exact definition can vary from one major mode to another, since the
+syntax of a character depends on the major mode.)
 @item [:space:]
 This matches any character that has whitespace syntax
-(@pxref{Syntax Class Table}).
+(@pxref{Syntax Class Table}). Note that the syntax of a character,
+and thus which characters are considered ``whitespace'',
+depends on the major mode.
 @item [:unibyte:]
 This matches any unibyte character (@pxref{Text Representations}).
 @item [:upper:]
 This matches any upper-case letter, as determined by the current case
 table (@pxref{Case Tables}). If @code{case-fold-search} is
-non-@code{nil}, this also matches any lower-case letter.
+non-@code{nil}, this also matches any lower-case letter. Note that a
+buffer can have its own local case table different from the default
+one.
 @item [:word:]
 This matches any character that has word syntax (@pxref{Syntax Class
-Table}).
+Table}). Note that the syntax of a character, and thus which
+characters are considered ``word-constituent'', depends on the major
+mode.
 @item [:xdigit:]
 This matches the hexadecimal digits: @samp{0} through @samp{9},
 @samp{a} through @samp{f} and @samp{A} through @samp{F}.

From 5779df0c5bb7b1805196c09948be24bd5531a4b4 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii
Date: Fri, 4 Nov 2022 16:02:48 +0200
Subject: [PATCH 2/3] ; * doc/lispref/searching.texi: Remove reference to Posix.
 (Bug#58992)

---
 doc/lispref/searching.texi | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 3365c0c9042..087505ad246 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -395,13 +395,12 @@ range should not be the starting point of another one; for example,
 @samp{[a-m-z]} should be avoided.
 
 A character alternative can also specify named character classes
-(@pxref{Char Classes}). This is a POSIX feature. For example,
-@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
-Using a character class is equivalent to mentioning each of the
-characters in that class; but the latter is not feasible in practice,
-since some classes include thousands of different characters.
-A character class should not appear as the lower or upper bound
-of a range.
+(@pxref{Char Classes}). For example, @samp{[[:ascii:]]} matches any
+@acronym{ASCII} character. Using a character class is equivalent to
+mentioning each of the characters in that class; but the latter is not
+feasible in practice, since some classes include thousands of
+different characters. A character class should not appear as the
+lower or upper bound of a range.
 
 The usual regexp special characters are not special inside a
 character alternative. A completely different set of characters is

From 70fb03a49af07bd644e831c7d2e8d219aa910535 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii
Date: Fri, 4 Nov 2022 17:21:58 +0200
Subject: [PATCH 3/3] ; * doc/emacs/search.texi (Lax Search): Improve suggestion. (Bug#58992)

---
 doc/emacs/search.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/emacs/search.texi b/doc/emacs/search.texi
index c58cc363ad2..77226808859 100644
--- a/doc/emacs/search.texi
+++ b/doc/emacs/search.texi
@@ -1343,7 +1343,7 @@ Hence, @w{@samp{foo bar}} matches @w{@samp{foo bar}},
 @w{@samp{foo@ @ bar}}, @w{@samp{foo@ @ @ bar}}, and so on (but not @samp{foobar}).
 If you want to make spaces match sequences of newlines as well as
 spaces and tabs, customize the option to make its value be the regular
-expression @samp{[[:space:]\n]+}. (The default behavior of the
+expression @samp{[ \t\n]+}. (The default behavior of the
 incremental regexp search is different; see @ref{Regexp Search}.)
 
 If you want whitespace characters to match exactly, you can turn lax